1.
Genome Res ; 30(7): 1073-1081, 2020 07.
Article in English | MEDLINE | ID: mdl-32079618

ABSTRACT

Long noncoding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is key to understanding their role in determining phenotypes, including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, inclusive of over 109,000 coding and noncoding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original recount2 resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from published large studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue-specific transcription profiles for distinct classes of coding and noncoding genes, (b) perform differential expression analysis across thirteen cancer types, identifying novel noncoding genes potentially involved in tumor pathogenesis and progression, and (c) confirm the prognostic value of the expression of several enhancer lncRNAs in cancer. Our resource is instrumental for the systematic molecular characterization of lncRNAs by the FANTOM6 Consortium. In conclusion, comprising over 70,000 samples, the FC-R2 atlas will empower other researchers to investigate functions and biological roles of both known coding genes and novel lncRNAs.


Subject(s)
Transcriptome; Databases, Genetic; Enhancer Elements, Genetic; Gene Expression Profiling; Genome, Human; Humans; Neoplasms/genetics; Organ Specificity; Prognosis; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; RNA, Messenger/metabolism
2.
Eur Radiol ; 33(10): 6883-6891, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37083741

ABSTRACT

OBJECTIVES: To perform a systematic review comparing the diagnostic accuracy of MRI vs. CT for assessing pancreatic ductal adenocarcinoma (PDAC) vascular invasion. METHODS: MEDLINE, EMBASE, Cochrane Central, and Scopus were searched until December 2021 for diagnostic accuracy studies comparing MRI vs. CT to evaluate vascular invasion of pathologically confirmed PDAC in the same patients. Findings on resection or exploratory laparotomy were the preferred reference standard. Data extraction, risk of bias, and applicability assessment were performed by two authors using the Quality Assessment of Diagnostic Accuracy Studies-Comparative Tool. Bivariate random-effects meta-analysis and meta-regression were performed with 95% confidence intervals (95% CI). RESULTS: Three studies were included assessing 474 vessels without vascular invasion and 65 with vascular invasion in 107 patients. All patients were imaged using MRI at ≥ 1.5 T and a pancreatic protocol CT. No difference was shown between MRI and CT for diagnosing PDAC vascular invasion: MRI/CT sensitivity (95% CI) were 71% (47-87%)/74% (56-86%), and specificity were 97% (94-99%)/97% (94-98%). Sources of bias included selection bias from only a subset of CT patients undergoing MRI and verification bias from patients with unresectable disease not confirmed on surgery. No patients received neoadjuvant therapy prior to staging. CONCLUSIONS: Based on limited data, no difference was observed between MRI and pancreatic protocol CT for PDAC vascular invasion assessment. MRI may be an adequate substitute for pancreatic protocol CT in some patients, particularly those who have already had a single-phase CT. Larger and more recent cohort studies at low risk of bias, including patients who have received neoadjuvant therapy, are needed. 
CLINICAL RELEVANCE STATEMENT: Abdominal MRI performed similarly to pancreatic protocol CT at assessing pancreatic ductal adenocarcinoma vascular invasion, suggesting that local staging with MRI is adequate in some patients. More data are needed using larger, more recent cohorts including patients with neoadjuvant treatment.
KEY POINTS:
• Based on limited data, no difference was found between MRI and pancreatic protocol CT sensitivity and specificity for diagnosing PDAC vascular invasion (p = 0.81 and 0.73, respectively).
• Risk of bias could be reduced in future PDAC MRI vs. CT comparative diagnostic test accuracy research by ensuring that all enrolled patients undergo both imaging modalities being compared, in random order and regardless of the findings on either modality.
• More studies are needed that directly compare the diagnostic performance of MRI and CT for PDAC staging after neoadjuvant therapy.
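
For readers unfamiliar with the two accuracy measures reported above, the following is a minimal sketch of how sensitivity and specificity are derived from a single 2x2 table. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the review, whose pooled estimates come from a bivariate random-effects model rather than from this simple ratio.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the counts of a 2x2 table.

    tp/fn: vessels with invasion called positive/negative by imaging.
    tn/fp: vessels without invasion called negative/positive by imaging.
    """
    sensitivity = tp / (tp + fn)  # fraction of invaded vessels detected
    specificity = tn / (tn + fp)  # fraction of non-invaded vessels cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from the review).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```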


Subject(s)
Adenocarcinoma; Carcinoma, Pancreatic Ductal; Pancreatic Neoplasms; Humans; Pancreatic Neoplasms/pathology; Adenocarcinoma/diagnostic imaging; Tomography, X-Ray Computed/methods; Magnetic Resonance Imaging; Carcinoma, Pancreatic Ductal/diagnostic imaging; Sensitivity and Specificity; Diagnostic Tests, Routine
3.
Bioinformatics ; 37(18): 3014-3016, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33693500

ABSTRACT

MOTIVATION: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. RESULTS: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. AVAILABILITY AND IMPLEMENTATION: https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/megadepth. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
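
The core operation described here, summarizing per-base coverage within genomic intervals, can be illustrated with a small sketch. This is not Megadepth's implementation or API; the function name, the in-memory coverage list, and the half-open interval convention are assumptions made for the example.

```python
from itertools import accumulate


def interval_coverage_sums(coverage, intervals):
    """Sum per-base coverage over half-open [start, end) intervals.

    A prefix-sum array makes each interval query O(1) after an O(n) pass,
    which matters when many intervals are summarized over one track.
    """
    prefix = [0, *accumulate(coverage)]
    return [prefix[end] - prefix[start] for start, end in intervals]


# Toy per-base coverage for one short contig (hypothetical values).
coverage = [0, 2, 2, 5, 5, 1, 0, 3]
intervals = [(1, 3), (3, 6), (6, 8)]
sums = interval_coverage_sums(coverage, intervals)
print(sums)
```

Dividing each sum by the interval length would give the mean coverage per interval instead.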


Subject(s)
Genome; Genomics; Software; Molecular Sequence Annotation
4.
Bioinformatics ; 35(3): 421-432, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30020410

ABSTRACT

Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners. Results: We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling. Availability and implementation: Experiments for this study: https://github.com/BenLangmead/bowtie-scaling. Bowtie: http://bowtie-bio.sourceforge.net. Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2. HISAT: http://www.ccb.jhu.edu/software/hisat. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Genomics; Software; Computer Systems
5.
Proteomics ; 19(15): e1800315, 2019 08.
Article in English | MEDLINE | ID: mdl-30983154

ABSTRACT

Understanding the molecular profile of every human cell type is essential for understanding its role in normal physiology and disease. Technological advancements in DNA sequencing, mass spectrometry, and computational methods allow us to carry out multiomics analyses, although such approaches are not yet routine. Human umbilical vein endothelial cells (HUVECs) are a widely used model system to study pathological and physiological processes associated with the cardiovascular system. In this study, next-generation sequencing and high-resolution mass spectrometry are employed to profile the transcriptome and proteome of primary HUVECs. Analysis of 145 million paired-end reads from next-generation sequencing confirmed expression of 12 186 protein-coding genes (FPKM ≥0.1) and 439 novel long non-coding RNAs, and revealed 6089 novel isoforms that were not annotated in GENCODE. Proteomics analysis identifies 6477 proteins, including confirmation of N-termini for 1091 proteins, isoforms for 149 proteins, and 1034 phosphosites. A database search to specifically identify other post-translational modifications provides evidence for a number of modification sites on 117 proteins, which include ubiquitylation, lysine acetylation, and mono-, di- and tri-methylation events. Evidence for 11 "missing proteins," which are proteins for which there was insufficient or no protein-level evidence, is provided. Peptides supporting missing proteins and novel events are validated by comparison of MS/MS fragmentation patterns with synthetic peptides. Finally, 245 variant peptides derived from 207 expressed proteins, in addition to alternate translational start sites for seven proteins and evidence for novel proteoforms for five proteins resulting from alternative splicing, are identified. Overall, it is believed that the integrated approach employed in this study is widely applicable to study any primary cell type for deeper molecular characterization.
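
The expression threshold quoted above is stated in FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows the standard FPKM formula applied to made-up numbers; the function name and inputs are hypothetical, and only the 0.1 cutoff comes from the abstract.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2,000 bp long, 10 million mapped fragments.
value = fpkm(fragments=500, length_bp=2_000, total_mapped_fragments=10_000_000)
is_expressed = value >= 0.1  # cutoff quoted in the abstract
print(value, is_expressed)
```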


Subject(s)
Proteomics/methods; Transcriptome/genetics; Alternative Splicing/genetics; Human Umbilical Vein Endothelial Cells; Humans
6.
Bioinformatics ; 34(1): 114-116, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968689

ABSTRACT

Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Availability and implementation: Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
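
The kind of query described here, constraining both the junctions and the samples considered, can be sketched with two simple indexes. This is not Snaptron's code or API: a sorted list searched with `bisect` stands in for a B-tree over junction coordinates, a dictionary of word-to-sample sets stands in for the inverted index over sample metadata, and all names and data below are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy junctions as (start, end, {sample_id: read_count}), sorted by start.
JUNCTIONS = [
    (100, 200, {"s1": 5, "s2": 1}),
    (150, 400, {"s2": 7}),
    (300, 500, {"s1": 2, "s3": 9}),
]
STARTS = [junction[0] for junction in JUNCTIONS]

# Toy free-text sample metadata.
SAMPLE_TEXT = {"s1": "brain cortex", "s2": "liver", "s3": "brain cerebellum"}


def build_inverted_index(sample_text):
    """Map each metadata word to the set of samples whose text contains it."""
    index = defaultdict(set)
    for sample_id, text in sample_text.items():
        for word in text.split():
            index[word].add(sample_id)
    return index


def query(region_start, region_end, keyword, index):
    """Junctions contained in the region, restricted to samples matching keyword."""
    samples = index.get(keyword, set())
    lo = bisect_left(STARTS, region_start)   # first junction starting in region
    hi = bisect_right(STARTS, region_end)    # one past the last such junction
    hits = []
    for start, end, counts in JUNCTIONS[lo:hi]:
        if end > region_end:
            continue  # junction extends past the region
        kept = {s: c for s, c in counts.items() if s in samples}
        if kept:
            hits.append((start, end, kept))
    return hits


INDEX = build_inverted_index(SAMPLE_TEXT)
brain_hits = query(100, 500, "brain", INDEX)
print(brain_hits)
```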


Subject(s)
High-Throughput Nucleotide Sequencing/methods; RNA Splicing; Sequence Analysis, RNA/methods; Software; Exons; Humans
7.
Nucleic Acids Res ; 45(2): e9, 2017 01 25.
Article in English | MEDLINE | ID: mdl-27694310

ABSTRACT

Differential expression analysis of RNA sequencing (RNA-seq) data typically relies on reconstructing transcripts or counting reads that overlap known gene structures. We previously introduced an intermediate statistical approach called differentially expressed region (DER) finder that seeks to identify contiguous regions of the genome showing differential expression signal at single-base resolution without relying on existing annotation or potentially inaccurate transcript assembly. We present the derfinder software that improves our annotation-agnostic approach to RNA-seq analysis by: (i) implementing a computationally efficient bump-hunting approach to identify DERs that permits genome-scale analyses in a large number of samples, (ii) introducing a flexible statistical modeling framework, including multi-group and time-course analyses and (iii) introducing a new set of data visualizations for expressed region analysis. We apply this approach to public RNA-seq data from the Genotype-Tissue Expression (GTEx) project and BrainSpan project to show that derfinder permits the analysis of hundreds of samples at base resolution in R, identifies expression outside of known gene boundaries and can be used to visualize expressed regions at base resolution. In simulations, our base-resolution approaches enable discovery in the presence of incomplete annotation and are nearly as powerful as feature-level methods when the annotation is complete. derfinder analysis using expressed region-level and single base-level approaches provides a compromise between full transcript reconstruction and feature-level analysis. The package is available from Bioconductor at www.bioconductor.org/packages/derfinder.
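
The idea of an annotation-agnostic "expressed region" can be illustrated with a short sketch that finds contiguous runs of bases whose mean coverage exceeds a cutoff. This is a simplification for illustration, not the derfinder package's R implementation or its bump-hunting statistical model; the function name, cutoff, and coverage values are assumptions.

```python
def expressed_regions(mean_coverage, cutoff, min_length=1):
    """Return half-open (start, end) runs where mean coverage exceeds cutoff."""
    regions = []
    start = None
    for pos, value in enumerate(mean_coverage):
        if value > cutoff:
            if start is None:
                start = pos  # a new run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the track.
    if start is not None and len(mean_coverage) - start >= min_length:
        regions.append((start, len(mean_coverage)))
    return regions


# Hypothetical mean coverage across samples at nine consecutive bases.
mean_coverage = [0.0, 0.2, 3.1, 4.0, 2.5, 0.1, 0.0, 6.2, 5.9]
regions = expressed_regions(mean_coverage, cutoff=1.0, min_length=2)
print(regions)
```

No gene annotation is consulted at any point, which is why such regions can fall outside known gene boundaries.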


Subject(s)
Gene Expression Profiling/methods; Software; Gene Expression Regulation; Genomics/methods; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Organ Specificity/genetics; Transcriptome; Web Browser
8.
Bioinformatics ; 33(24): 4033-4040, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-27592709

ABSTRACT

MOTIVATION: RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it requires extra work to obtain analysis products that incorporate data from across samples. RESULTS: We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 h for US$0.91 per sample. Rail-RNA outputs alignments in SAM/BAM format; but it also outputs (i) base-level coverage bigWigs for each sample; (ii) coverage bigWigs encoding normalized mean and median coverages at each base across samples analyzed; and (iii) exon-exon splice junctions and indels (features) in columnar formats that juxtapose coverages in samples in which a given feature is found. Supplementary outputs are ready for use with downstream packages for reproducible statistical analysis. We use Rail-RNA to identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounding variables. AVAILABILITY AND IMPLEMENTATION: Rail-RNA is open-source software available at http://rail.bio. CONTACTS: anellore@gmail.com or langmea@cs.jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA Splicing; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Software; Exons; Gene Expression Profiling
9.
Bioinformatics ; 32(16): 2551-3, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153614

ABSTRACT

MOTIVATION: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data. RESULTS: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise. AVAILABILITY AND IMPLEMENTATION: Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/. CONTACTS: anellore@gmail.com or langmea@cs.jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology; Databases, Genetic; Software; Algorithms; High-Throughput Nucleotide Sequencing; Humans; RNA; Reproducibility of Results
10.
Nucleic Acids Res ; 40(Database issue): D1202-10, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22140109

ABSTRACT

The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is a genome database for Arabidopsis thaliana, an important reference organism for many fundamental aspects of biology as well as basic and applied plant biology research. TAIR serves as a central access point for Arabidopsis data, annotates gene function and expression patterns using controlled vocabulary terms, and maintains and updates the A. thaliana genome assembly and annotation. TAIR also provides researchers with an extensive set of visualization and analysis tools. Recent developments include several new genome releases (TAIR8, TAIR9 and TAIR10) in which the A. thaliana assembly was updated, pseudogenes and transposon genes were re-annotated, and new data from proteomics and next generation transcriptome sequencing were incorporated into gene models and splice variants. Other highlights include progress on functional annotation of the genome and the release of several new tools including Textpresso for Arabidopsis which provides the capability to carry out full text searches on a large body of research literature.


Subject(s)
Arabidopsis/genetics; Databases, Genetic; Genes, Plant; Molecular Sequence Annotation; Arabidopsis/metabolism; Arabidopsis Proteins/genetics; Genome, Plant; Software
11.
Genome Biol ; 24(1): 22, 2023 02 09.
Article in English | MEDLINE | ID: mdl-36759904

ABSTRACT

Alternative polyadenylation (APA) is an important post-transcriptional mechanism that has major implications in biological processes and diseases. Although specialized sequencing methods for polyadenylation exist, availability of these data is limited compared to RNA-sequencing data. We developed REPAC, a framework for the analysis of APA from RNA-sequencing data. Using REPAC, we investigate the landscape of APA caused by activation of B cells. We also show that REPAC is faster than alternative methods by at least 7-fold and that it scales well to hundreds of samples. Overall, the REPAC method offers an accurate, easy, and convenient solution for the exploration of APA.


Subject(s)
High-Throughput Nucleotide Sequencing; Polyadenylation; High-Throughput Nucleotide Sequencing/methods; 3' Untranslated Regions; RNA, Messenger; Sequence Analysis, RNA/methods
12.
Genome Biol ; 22(1): 323, 2021 11 29.
Article in English | MEDLINE | ID: mdl-34844637

ABSTRACT

We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.


Subject(s)
RNA Splicing; RNA-Seq/methods; RNA/genetics; Animals; Base Sequence; Computational Biology/methods; Exons; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Humans; Mice; Sequence Analysis, RNA/methods; Software
13.
Nucleic Acids Res ; 36(Database issue): D1009-14, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17986450

ABSTRACT

The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is the model organism database for the fully sequenced and intensively studied model plant Arabidopsis thaliana. Data in TAIR are derived in large part from manual curation of the Arabidopsis research literature and direct submissions from the research community. New developments at TAIR include the addition of the GBrowse genome viewer to the TAIR site; a redesigned home page, navigation structure and portal pages to make the site more intuitive and easier to use; the launch of several TAIR web services; and a new genome annotation release (TAIR7) in April 2007. A combination of manual and computational methods was used to generate this release, which contains 27,029 protein-coding genes, 3889 pseudogenes or transposable elements and 1123 ncRNAs (32,041 genes in all, 37,019 gene models). A total of 681 new genes and 1002 new splice variants were added. Overall, 10,098 loci (one-third of all loci from the previous TAIR6 release) were updated for the TAIR7 release.


Subject(s)
Arabidopsis Proteins/genetics; Arabidopsis/genetics; Databases, Genetic; Alternative Splicing; Genes, Plant; Genome, Plant; Genomics; Internet; RNA, Untranslated/genetics; User-Computer Interface; Vocabulary, Controlled
14.
Nat Commun ; 11(1): 137, 2020 01 09.
Article in English | MEDLINE | ID: mdl-31919425

ABSTRACT

Public archives of next-generation sequencing data are growing exponentially, but the difficulty of marshaling this data has led to its underutilization by scientists. Here, we present ASCOT, a resource that uses annotation-free methods to rapidly analyze and visualize splice variants across tens of thousands of bulk and single-cell data sets in the public archive. To demonstrate the utility of ASCOT, we identify novel cell type-specific alternative exons across the nervous system and leverage ENCODE and GTEx data sets to study the unique splicing of photoreceptors. We find that PTBP1 knockdown and MSI1 and PCBP2 overexpression are sufficient to activate many photoreceptor-specific exons in HepG2 liver cancer cells. This work demonstrates how large-scale analysis of public RNA-Seq data sets can yield key insights into cell type-specific control of RNA splicing and underscores the importance of considering both annotated and unannotated splicing events.


Subject(s)
Alternative Splicing/genetics; Computational Biology/methods; Data Analysis; Photoreceptor Cells/cytology; RNA Splice Sites/genetics; Animals; Cell Line, Tumor; Gene Expression/genetics; Hep G2 Cells; Heterogeneous-Nuclear Ribonucleoproteins/genetics; High-Throughput Nucleotide Sequencing; Humans; Liver Neoplasms/genetics; Mice; Nerve Tissue Proteins/biosynthesis; Nerve Tissue Proteins/genetics; Neurons/cytology; Polypyrimidine Tract-Binding Protein/genetics; RNA-Binding Proteins/biosynthesis; RNA-Binding Proteins/genetics; Retina/cytology; Sequence Analysis, RNA/methods
15.
Article in English | MEDLINE | ID: mdl-25267794

ABSTRACT

The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu.


Subject(s)
Database Management Systems; Databases, Genetic; Genomics/methods; Internet; Neoplasms/genetics; Computer Security; Electronic Health Records; Humans