Results 1 - 20 of 34
1.
Mol Cell ; 72(5): 849-861.e6, 2018 12 06.
Article in English | MEDLINE | ID: mdl-30318446

ABSTRACT

Alternative polyadenylation generates numerous 3' mRNA isoforms that can vary in biological properties, such as stability and localization. We developed methods to obtain transcriptome-scale structural information and protein binding on individual 3' mRNA isoforms in vivo. Strikingly, near-identical mRNA isoforms can possess dramatically different structures throughout the 3' UTR. Analyses of identical mRNAs in different species or refolded in vitro indicate that structural differences in vivo are often due to trans-acting factors. The level of Pab1 binding to poly(A)-containing isoforms is surprisingly variable, and differences in Pab1 binding correlate with the extent of structural variation for closely spaced isoforms. A pattern encompassing single-strandedness near the 3' terminus, double-strandedness of the poly(A) tail, and low Pab1 binding is associated with mRNA stability. Thus, individual 3' mRNA isoforms can be remarkably different physical entities in vivo. Sequences responsible for isoform-specific structures, differential Pab1 binding, and mRNA stability are evolutionarily conserved, indicating biological function.


Subject(s)
Fungal Gene Expression Regulation, Poly(A)-Binding Proteins/genetics, RNA Isoforms/chemistry, Fungal RNA/chemistry, Messenger RNA/chemistry, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae/genetics, Base Sequence, Nucleic Acid Conformation, Poly(A)-Binding Proteins/metabolism, Polyadenylation, Protein Binding, RNA Isoforms/genetics, RNA Isoforms/metabolism, RNA Stability, Fungal RNA/genetics, Fungal RNA/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Transcriptome
2.
Nucleic Acids Res ; 52(W1): W341-W347, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38709877

ABSTRACT

Genes commonly express multiple RNA products (RNA isoforms), which differ in exonic content and can have different functions. Making sense of the plethora of known and novel RNA isoforms being identified by transcriptomic approaches requires a user-friendly way to visualize gene isoforms and how they differ in exonic content, expression levels and potential functions. Here we introduce IsoVis, a freely available webserver that accepts user-supplied transcriptomic data and visualizes the expressed isoforms in a clear, intuitive manner. IsoVis contains numerous features, including the ability to visualize all RNA isoforms of a gene and their expression levels; the annotation of known isoforms from external databases; mapping of protein domains and features to exons, allowing changes to protein sequence and function between isoforms to be established; and extensive species compatibility. Datasets visualised on IsoVis remain private to the user, allowing analysis of sensitive data. IsoVis visualisations can be downloaded to create publication-ready figures. The IsoVis webserver enables researchers to perform isoform analyses without requiring programming skills, is free to use, and available at https://isomix.org/isovis/.


Subject(s)
Internet, Molecular Sequence Annotation, RNA Isoforms, Software, RNA Isoforms/genetics, RNA Isoforms/metabolism, RNA Isoforms/chemistry, Humans, Animals, Exons/genetics, Transcriptome/genetics, Alternative Splicing
3.
RNA ; 28(2): 162-176, 2022 02.
Article in English | MEDLINE | ID: mdl-34728536

ABSTRACT

Nanopore sequencing devices read individual RNA strands directly. This facilitates identification of exon linkages and nucleotide modifications; however, using conventional direct RNA nanopore sequencing, the 5' and 3' ends of poly(A) RNA cannot be identified unambiguously. This is due in part to RNA degradation in vivo and in vitro that can obscure transcription start and end sites. In this study, we aimed to identify individual full-length human RNA isoforms among ∼4 million nanopore poly(A)-selected RNA reads. First, to identify RNA strands bearing 5' m7G caps, we exchanged the biological cap for a modified cap attached to a 45-nt oligomer. This oligomer adaptation method improved 5' end sequencing and ensured correct identification of the 5' m7G capped ends. Second, among these 5'-capped nanopore reads, we screened for features consistent with a 3' polyadenylation site. Combining these two steps, we identified 294,107 individual high-confidence full-length RNA scaffolds from human GM12878 cells, most of which (257,721) aligned to protein-coding genes. Of these, 4876 scaffolds indicated unannotated isoforms that were often internal to longer, previously identified RNA isoforms. Orthogonal data for m7G caps and open chromatin, such as CAGE and DNase-HS seq, confirmed the validity of these high-confidence RNA scaffolds.


Subject(s)
RNA Isoforms/chemistry, Messenger RNA/chemistry, Tumor Cell Line, Humans, Nanopore Sequencing/methods, RNA 3' Polyadenylation Signals, RNA Isoforms/genetics, Messenger RNA/genetics, Transcriptome
4.
Genome Res ; 30(9): 1332-1344, 2020 09.
Article in English | MEDLINE | ID: mdl-32887688

ABSTRACT

Eukaryotic genes often generate a variety of RNA isoforms that can lead to functionally distinct protein variants. The synthesis and stability of RNA isoforms is poorly characterized because current methods to quantify RNA metabolism use short-read sequencing and cannot detect RNA isoforms. Here we present nanopore sequencing-based isoform dynamics (nano-ID), a method that detects newly synthesized RNA isoforms and monitors isoform metabolism. Nano-ID combines metabolic RNA labeling, long-read nanopore sequencing of native RNA molecules, and machine learning. Nano-ID derives RNA stability estimates and evaluates stability determining factors such as RNA sequence, poly(A)-tail length, secondary structure, translation efficiency, and RNA-binding proteins. Application of nano-ID to the heat shock response in human cells reveals that many RNA isoforms change their stability. Nano-ID also shows that the metabolism of individual RNA isoforms differs strongly from that estimated for the combined RNA signal at a specific gene locus. Nano-ID enables studies of RNA metabolism at the level of single RNA molecules and isoforms in different cell states and conditions.


Subject(s)
Nanopore Sequencing/methods, RNA Isoforms/chemistry, RNA Stability, Tumor Cell Line, Humans, Machine Learning, Computer Neural Networks, RNA Isoforms/chemical synthesis, Uridine/chemistry
5.
RNA Biol ; 19(1): 279-289, 2022.
Article in English | MEDLINE | ID: mdl-35188062

ABSTRACT

The Drosha cleavage of a pri-miRNA defines the mature microRNA sequence. Drosha cleavage at alternative positions generates 5' isoforms (isomiRs), which have distinctive functions. To understand how pri-miRNA structures influence Drosha cleavage, we performed a systematic analysis of the maturation of endogenous pri-miRNAs and their variants both in vitro and in vivo. We show that, in addition to previously known features, the overall structural flexibility of the pri-miRNA impacts Drosha cleavage fidelity. Internal loops and nearby G·U wobble pairs on the pri-miRNA stem induce the use of non-canonical cleavage sites by Drosha, resulting in 5' isomiR production. By analysing patient data deposited in the Cancer Genome Atlas, we provide evidence that alternative Drosha cleavage of pri-miRNAs is a tunable process that responds to the level of pri-miRNA-associated RNA-binding proteins. Together, our findings reveal that Drosha cleavage fidelity can be modulated by altering pri-miRNA structure, a potential mechanism underlying 5' isomiR biogenesis in tumours.


Subject(s)
MicroRNAs/chemistry, Nucleic Acid Conformation, RNA Isoforms/chemistry, Humans, MicroRNAs/genetics, MicroRNAs/metabolism, RNA Cleavage, RNA Isoforms/genetics, RNA Isoforms/metabolism, Ribonuclease III/metabolism, Structure-Activity Relationship
6.
Nucleic Acids Res ; 48(14): 7700-7711, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32652016

ABSTRACT

Arabidopsis thaliana transcriptomes have been extensively studied and characterized under different conditions. However, most of the current 'RNA-sequencing' technologies produce a relatively short read length and demand a reverse-transcription step, preventing effective characterization of transcriptome complexity. Here, we performed Direct RNA Sequencing (DRS) using the latest Oxford Nanopore Technology (ONT) with exceptional read length. We demonstrate that the complexity of the A. thaliana transcriptomes has been substantially under-estimated. The ONT direct RNA sequencing identified novel transcript isoforms at both the vegetative (14-day old seedlings, stage 1.04) and reproductive stages (stage 6.00-6.10) of development. Using in-house software called TrackCluster, we determined alternative transcription initiation (ATI), alternative polyadenylation (APA), alternative splicing (AS), and fusion transcripts. More than 38 500 novel transcript isoforms were identified, including six categories of fusion-transcripts that may result from differential RNA processing mechanisms. Aided by the Tombo algorithm, we found an enrichment of m5C modifications in the mobile mRNAs, consistent with a recent finding that m5C modification in mRNAs is crucial for their long-distance movement. In summary, ONT DRS offers an advantage in the identification and functional characterization of novel RNA isoforms and RNA base modifications, significantly improving annotation of the A. thaliana genome.


Subject(s)
Arabidopsis/genetics, Nanopore Sequencing/methods, Plant RNA/chemistry, Plant RNA/metabolism, RNA Sequence Analysis/methods, Transcriptome, Cytosine/metabolism, Methylation, RNA Isoforms/chemistry, RNA Isoforms/metabolism, Messenger RNA/chemistry, Messenger RNA/metabolism, RNA-Seq
8.
Nucleic Acids Res ; 47(14): 7262-7275, 2019 08 22.
Article in English | MEDLINE | ID: mdl-31305886

ABSTRACT

RNA-Seq is a powerful transcriptome profiling technology enabling transcript discovery and quantification. Whilst most commonly used for gene-level quantification, the data can be used for the analysis of transcript isoforms. However, when the underlying transcript assemblies are complex, current visualization approaches can be limiting, with splicing events a challenge to interpret. Here, we report on the development of a graph-based visualization method as a complementary approach to understanding transcript diversity from short-read RNA-Seq data. Following the mapping of reads to a reference genome, a read-to-read comparison is performed on all reads mapping to a given gene, producing a weighted similarity matrix between reads. This is used to produce an RNA assembly graph, where nodes represent reads and edges similarity scores between them. The resulting graphs are visualized in 3D space to better appreciate their sometimes large and complex topology, with other information being overlaid on to nodes, e.g. transcript models. Here we demonstrate the utility of this approach, including the unusual structure of these graphs and how they can be used to identify issues in assembly, repetitive sequences within transcripts and splice variants. We believe this approach has the potential to significantly improve our understanding of transcript complexity.


Subject(s)
Alternative Splicing, Computer Graphics, Gene Expression Profiling/methods, Messenger RNA/genetics, RNA Sequence Analysis/methods, Human Genome/genetics, Humans, Genetic Models, Molecular Models, Nucleic Acid Conformation, RNA Isoforms/chemistry, RNA Isoforms/genetics, RNA Isoforms/metabolism, Messenger RNA/chemistry, Messenger RNA/metabolism
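
The read-to-read comparison described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how similarity is scored, so the Jaccard overlap of mapped intervals, the function names, and the `min_score` cutoff are all assumptions made for illustration. Reads become nodes and each sufficiently similar pair becomes a weighted edge.

```python
from itertools import combinations


def interval_jaccard(a, b):
    """Jaccard overlap of two half-open genomic intervals (start, end).

    Assumed similarity measure; the abstract does not specify one.
    """
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def build_read_graph(reads, min_score=0.1):
    """Return weighted edges {(i, j): score} between read indices.

    Nodes are reads; an edge is kept when the pairwise similarity
    reaches the (hypothetical) min_score threshold.
    """
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(reads), 2):
        score = interval_jaccard(a, b)
        if score >= min_score:
            edges[(i, j)] = score
    return edges


# Four reads mapped to one gene, given as (start, end) coordinates.
reads = [(0, 100), (50, 150), (120, 220), (400, 500)]
graph = build_read_graph(reads)
```

In this toy input the first three reads form a chain of overlaps, while the fourth read shares no edge with the others and would appear as an isolated node in the graph.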
9.
Mol Cell ; 45(4): 447-58, 2012 Feb 24.
Article in English | MEDLINE | ID: mdl-22264824

ABSTRACT

A substantial amount of organismal complexity is thought to be encoded by enhancers which specify the location, timing, and levels of gene expression. In mammals there are more enhancers than promoters which are distributed both between and within genes. Here we show that activated, intragenic enhancers frequently act as alternative tissue-specific promoters producing a class of abundant, spliced, multiexonic poly(A)(+) RNAs (meRNAs) which reflect the host gene's structure. meRNAs make a substantial and unanticipated contribution to the complexity of the transcriptome, appearing as alternative isoforms of the host gene. The low protein-coding potential of meRNAs suggests that many meRNAs may be byproducts of enhancer activation or underlie as-yet-unidentified RNA-encoded functions. Distinguishing between meRNAs and mRNAs will transform our interpretation of dynamic changes in transcription both at the level of individual genes and of the genome as a whole.


Subject(s)
Genetic Enhancer Elements/physiology, Gene Expression Regulation, Genetic Promoter Regions/physiology, Animals, Cultured Cells, Erythroid Cells, Mice, Poly A, RNA/chemistry, RNA/physiology, RNA Isoforms/chemistry, Messenger RNA/chemistry, Messenger RNA/physiology, Transcriptome
10.
Proc Natl Acad Sci U S A ; 114(47): E10244-E10253, 2017 11 21.
Article in English | MEDLINE | ID: mdl-29109288

ABSTRACT

Chronic obstructive pulmonary disease (COPD) affects over 65 million individuals worldwide, where α-1-antitrypsin deficiency is a major genetic cause of the disease. The α-1-antitrypsin gene, SERPINA1, expresses an exceptional number of mRNA isoforms generated entirely by alternative splicing in the 5'-untranslated region (5'-UTR). Although all SERPINA1 mRNAs encode exactly the same protein, expression levels of the individual mRNAs vary substantially in different human tissues. We hypothesize that these transcripts behave unequally due to a posttranscriptional regulatory program governed by their distinct 5'-UTRs and that this regulation ultimately determines α-1-antitrypsin expression. Using whole-transcript selective 2'-hydroxyl acylation by primer extension (SHAPE) chemical probing, we show that splicing yields distinct local 5'-UTR secondary structures in SERPINA1 transcripts. Splicing in the 5'-UTR also changes the inclusion of long upstream ORFs (uORFs). We demonstrate that disrupting the uORFs results in markedly increased translation efficiencies in luciferase reporter assays. These uORF-dependent changes suggest that α-1-antitrypsin protein expression levels are controlled at the posttranscriptional level. A leaky-scanning model of translation based on Kozak translation initiation sequences alone does not adequately explain our quantitative expression data. However, when we incorporate the experimentally derived RNA structure data, the model accurately predicts translation efficiencies in reporter assays and improves α-1-antitrypsin expression prediction in primary human tissues. Our results reveal that RNA structure governs a complex posttranscriptional regulatory program of α-1-antitrypsin expression. Crucially, these findings describe a mechanism by which genetic alterations in noncoding gene regions may result in α-1-antitrypsin deficiency.


Subject(s)
Alternative Splicing/genetics, Biological Models, Protein Biosynthesis/genetics, Messenger RNA/chemistry, alpha 1-Antitrypsin/genetics, 5' Untranslated Regions/genetics, A549 Cells, Base Sequence, Hep G2 Cells, Humans, Mutagenesis, Open Reading Frames/genetics, Chronic Obstructive Pulmonary Disease/genetics, Quantitative Structure-Activity Relationship, RNA Isoforms/chemistry, RNA Isoforms/genetics, Messenger RNA/genetics, alpha 1-Antitrypsin Deficiency/genetics
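
As a rough illustration of the leaky-scanning idea mentioned in this abstract, the sketch below treats each upstream start codon as capturing a scanning ribosome with some probability and lets the rest continue toward the main ORF. This is a deliberately simplified toy model, not the model fitted in the paper: the per-site probabilities, the function name, and the omission of reinitiation and RNA structure are assumptions.

```python
def main_orf_reach_probability(uorf_init_probs):
    """Probability that a scanning ribosome bypasses every upstream start.

    uorf_init_probs: hypothetical initiation probabilities (0..1) for the
    upstream start codons, ordered 5' to 3'. Reinitiation after uORF
    translation and structural effects are ignored in this sketch.
    """
    reach = 1.0
    for p in uorf_init_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("initiation probabilities must lie in [0, 1]")
        reach *= 1.0 - p  # fraction of ribosomes that leak past this site
    return reach


# Two upstream start codons: a strong one and a weak one (made-up values).
with_uorfs = main_orf_reach_probability([0.5, 0.2])
# Disrupting both uORFs removes the upstream capture entirely.
without_uorfs = main_orf_reach_probability([])
```

Under these made-up numbers, removing the upstream starts raises the fraction of ribosomes reaching the main ORF, which is the qualitative direction reported for the uORF-disrupted reporters.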
11.
Nucleic Acids Res ; 43(15): e96, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25953852

ABSTRACT

Most mammalian genes have mRNA variants due to alternative promoter usage, alternative splicing, and alternative cleavage and polyadenylation. Expression of alternative RNA isoforms has been found to be associated with tumorigenesis, proliferation and differentiation. Detection of condition-associated transcription variation requires association methods. Traditional association methods such as the Pearson chi-square test and Fisher's exact test are single-test methods and do not work on count data with replicates. Although the Cochran-Mantel-Haenszel (CMH) approach can handle replicated count data, our simulations showed that multiple CMH tests still had very low power. To identify condition-associated variation of transcription, we propose a ranking analysis of chi-squares (RAX2) for large-scale association analysis. RAX2 is a nonparametric method and provides an accurate and conservative estimate of the FDR profile. Simulations demonstrated that RAX2 performs well in finding condition-associated transcription variants. We applied RAX2 to primary T-cell transcriptomic data and identified 1610 (16.3%) tags associated in transcription with immune stimulation at FDR < 0.05. Most of these tags also had differential expression. Analysis of two and three tags within genes revealed that under immune stimulation short RNA isoforms were preferentially used.


Subject(s)
Alternative Splicing, Gene Expression Profiling/methods, Polyadenylation, CD4-Positive T-Lymphocytes/metabolism, Cell Line, Chi-Square Distribution, Genetic Variation, Genomics/methods, Humans, RNA Isoforms/chemistry, RNA Isoforms/metabolism, Nonparametric Statistics, Genetic Transcription
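
For orientation, the single-test baseline that this abstract contrasts with RAX2 can be sketched as follows. The code computes the ordinary Pearson chi-square statistic for one 2x2 table of tag counts; it is not RAX2 itself, whose ranking procedure is not described in the abstract. The table layout (isoform tag by condition) and the counts are invented for illustration.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 d.f.) for a 2x2 table.

    table: [[a, b], [c, d]] of non-negative counts, e.g. rows = two
    isoform tags of a gene, columns = two conditions (assumed layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: short vs. long isoform tag, resting vs. stimulated cells.
counts = [[30, 10], [20, 40]]
chi2, p_value = pearson_chi_square_2x2(counts)
```

Because this test pools everything into one table, it has no place for replicate structure, which is the limitation the abstract raises before introducing RAX2.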
12.
Nucleic Acids Res ; 43(1): e1, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25056322

ABSTRACT

The preparation and high-throughput sequencing of cDNA libraries from samples of small RNA is a powerful tool to quantify known small RNAs (such as microRNAs) and to discover novel RNA species. Interest in identifying the small RNA repertoire present in tissues and in biofluids has grown substantially with the findings that small RNAs can serve as indicators of biological conditions and disease states. Here we describe a novel and straightforward method to clone cDNA libraries from small quantities of input RNA. This method permits the generation of cDNA libraries from sub-picogram quantities of RNA robustly, efficiently and reproducibly. We demonstrate that the method provides a significant improvement in sensitivity compared to previous cloning methods while maintaining reproducible identification of diverse small RNA species. This method should have widespread applications in a variety of contexts, including biomarker discovery from scarce samples of human tissue or body fluids.


Subject(s)
Molecular Cloning/methods, Gene Library, MicroRNAs/blood, Biotinylation, Complementary DNA/chemistry, Complementary DNA/isolation & purification, Deoxyuracil Nucleotides, High-Throughput Nucleotide Sequencing, Humans, MicroRNAs/chemistry, MicroRNAs/isolation & purification, RNA Isoforms/chemistry, Reverse Transcriptase Polymerase Chain Reaction
13.
Bioinformatics ; 31(14): 2400-2, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25617416

ABSTRACT

MOTIVATION: Analysis of RNA sequencing (RNA-Seq) data revealed that the vast majority of human genes express multiple mRNA isoforms, produced by alternative pre-mRNA splicing and other mechanisms, and that most alternative isoforms vary in expression between human tissues. As RNA-Seq datasets grow in size, it remains challenging to visualize isoform expression across multiple samples. RESULTS: To help address this problem, we present Sashimi plots, a quantitative visualization of aligned RNA-Seq reads that enables quantitative comparison of exon usage across samples or experimental conditions. Sashimi plots can be made using the Broad Integrated Genome Viewer or with a stand-alone command line program. AVAILABILITY AND IMPLEMENTATION: Software code and documentation freely available here: http://miso.readthedocs.org/en/fastmiso/sashimi.html


Subject(s)
Alternative Splicing, Exons, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Computer Graphics, Humans, RNA Isoforms/chemistry, RNA Isoforms/metabolism, Sequence Alignment
14.
Nucleic Acids Res ; 42(3): 1427-41, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24178030

ABSTRACT

MicroRNA (miRNA) 5'-isoforms, or 5'-isomiRs, are small-RNA species that originate from the same genomic loci as the major miRNAs with their 5' ends shifted from the 5' ends of the miRNAs by a few nucleotides. Although 5'-isomiRs have been reported, their origins, properties and potential functions remain to be examined. We systematically studied 5'-isomiRs in human, mouse, fruitfly and worm by analysing a large collection of small non-coding RNA and mRNA profiling data. The results revealed a broad existence of 5'-isomiRs in the four species, many of which were conserved and could arise from genomic loci of canonical and non-canonical miRNAs. The well-conserved 5'-isomiRs have several features, including a preference of the 3p over the 5p arms of hairpins of conserved mammalian miRNAs, altered 5'-isomiRs across species and across tissues, and association with structural variations of miRNA hairpins. Importantly, 5'-isomiRs and their major miRNAs may have different mRNA targets and thus potentially play distinct roles of gene regulation, as shown by an integrative analysis combining miRNA and mRNA profiling data from psoriatic and normal human skin and from murine miRNA knockout assays. Indeed, 18 5'-isomiRs had aberrant expression in psoriatic human skin, suggesting their potential function in psoriasis pathogenesis. The results of the current study deepened our understanding of the diversity and conservation of miRNAs, their plasticity in gene regulation and potential broad function in complex diseases.


Subject(s)
MicroRNAs/chemistry, MicroRNAs/metabolism, Animals, Base Sequence, Caenorhabditis elegans/genetics, Conserved Sequence, Drosophila melanogaster/genetics, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Humans, Mice, MicroRNAs/genetics, Animal Models, Psoriasis/genetics, RNA Isoforms/chemistry, RNA Isoforms/genetics, RNA Isoforms/metabolism, RNA Sequence Analysis, Skin/metabolism
15.
Bioinformatics ; 30(14): 1958-64, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24659106

ABSTRACT

MOTIVATION: High-throughput sequencing of RNA in vivo facilitates many applications, not the least of which is the cataloging of variant splice isoforms of protein-coding messenger RNAs. Although many solutions have been proposed for reconstructing putative isoforms from deep sequencing data, these generally take as their substrate the collective alignment structure of RNA-seq reads and ignore the biological signals present in the actual nucleotide sequence. The majority of these solutions are graph-theoretic, relying on a splice graph representing the splicing patterns and exon expression levels indicated by the spliced-alignment process. RESULTS: We show how to augment splice graphs with additional information reflecting the biology of transcription, splicing and translation, to produce what we call an ORF (open reading frame) graph. We then show how ORF graphs can be used to produce isoform predictions with higher accuracy than current state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION: RSVP is available as C++ source code under an open-source licence: http://ohlerlab.mdc-berlin.de/software/RSVP/.


Subject(s)
High-Throughput Nucleotide Sequencing/methods, Open Reading Frames, RNA Isoforms/chemistry, RNA Sequence Analysis/methods, Arabidopsis/genetics, Exons, Humans, RNA Isoforms/metabolism, RNA Splicing, Software
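
The splice-graph representation that this abstract builds on can be sketched briefly. In the toy example below, exons are nodes, observed splice junctions are directed edges, and each source-to-sink path is a candidate isoform. The ORF-graph augmentation used by RSVP is not reproduced here; the exon names, the dictionary layout, and the function name are assumptions for illustration only.

```python
def enumerate_isoforms(splice_graph, source, sink):
    """List every source-to-sink path in an acyclic splice graph.

    splice_graph: dict mapping an exon to the exons it can splice to.
    Each returned path is one candidate isoform (ordered exon list).
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for nxt in splice_graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return sorted(paths)


# Hypothetical four-exon gene in which E2 and E3 can each be skipped.
splice_graph = {"E1": ["E2", "E3"], "E2": ["E3", "E4"], "E3": ["E4"]}
isoforms = enumerate_isoforms(splice_graph, "E1", "E4")
```

Even this four-exon graph yields three candidate isoforms; the number of paths can grow very quickly with the number of skippable exons, which is why exhaustive enumeration is only practical for small genes.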
16.
Bioinformatics ; 30(17): 2447-55, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24813214

ABSTRACT

MOTIVATION: Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the (possibly exponentially large) set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the ℓ1-penalty are either restricted to genes with few exons or only run the regression algorithm on a small set of preselected isoforms. RESULTS: We introduce a new technique called FlipFlop, which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need for a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available. AVAILABILITY AND IMPLEMENTATION: Source code is freely available as an R package from the Bioconductor Web site (http://www.bioconductor.org/), and more information is available at http://cbio.ensmp.fr/flipflop. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods, RNA Isoforms/chemistry, RNA Sequence Analysis/methods, Algorithms, Exons, Humans, Statistical Models, RNA Isoforms/metabolism, Software
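
To make the Lasso-style regression named in this abstract concrete, here is a minimal sketch of the explicit-enumeration approach that FlipFlop is designed to avoid. It is not FlipFlop and does not use network flows: it fits non-negative isoform abundances by coordinate descent on a squared-error loss with an L1 penalty. The design matrix, coverage values, penalty weight, and function name are all hypothetical.

```python
def nonneg_lasso(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X @ theta||^2 + lam * sum(theta), theta >= 0.

    X: rows = exon bins, columns = candidate isoforms (1 if the isoform
    covers the bin). y: observed coverage per bin. Plain coordinate
    descent on a small, explicitly listed candidate set.
    """
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that ignores j.
            rho = 0.0
            for i in range(n):
                fit_others = sum(X[i][k] * theta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
            # Soft-threshold, then clip at zero (abundances are non-negative).
            theta[j] = max(0.0, (rho - lam) / col_sq[j])
    return theta


# Three exon bins, three candidate isoforms: A = all bins, B = bins 1 and 3,
# C = bin 2 only. Coverage was generated from A = 4, B = 6, C = 0.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]
y = [10, 4, 10]
theta = nonneg_lasso(X, y, lam=0.5)
```

The penalty drives the unused isoform C to exactly zero and slightly shrinks B, which is the sparsity behaviour that motivates L1 regularization for isoform selection. The columns of `X` had to be listed by hand here, and that is the step that becomes intractable for genes with many exons.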
17.
Bioinformatics ; 30(5): 644-51, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24130305

ABSTRACT

MOTIVATION: RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. RESULTS: On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. AVAILABILITY: ORMAN is available at http://orman.sf.net


Subject(s)
Gene Expression Profiling/methods, RNA Isoforms/metabolism, RNA Sequence Analysis/methods, Software, Algorithms, Alternative Splicing, Exons, Humans, RNA Isoforms/chemistry, Sequence Alignment
18.
Nucleic Acids Res ; 41(1): e6, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-22941640

ABSTRACT

RNA sequencing has become an important method to perform hypothesis-free characterization of global gene expression. One of the limitations of RNA sequencing is that most sequence reads represent highly expressed transcripts, whereas low level transcripts are challenging to detect. To combine the benefits of traditional expression arrays with the advantages of RNA sequencing, we have used whole exome enrichment prior to sequencing of total RNA. We show that whole exome capture can be successfully applied to cDNA to study the transcriptional landscape in human tissues. By introducing the exome enrichment step, we are able to identify transcripts present at very low levels, which are below the level of detection in conventional RNA sequencing. Although the enrichment increases the ability to detect presence of transcripts, it also lowers the accuracy of quantification of expression levels. Our results yield a large number of novel exons and splice isoforms, suggesting that conventional RNA sequencing methods only detect a small fraction of the full transcript diversity. We propose that whole exome enrichment of RNA is a suitable strategy for genome-wide discovery of novel transcripts, alternative splice variants and fusion genes.


Subject(s)
Alternative Splicing, Exome, RNA Isoforms/chemistry, RNA Sequence Analysis/methods, Exons, Humans, Single Nucleotide Polymorphism, RNA Splice Sites, Transcriptome
19.
BMC Bioinformatics ; 15: 135, 2014 May 09.
Article in English | MEDLINE | ID: mdl-24885830

ABSTRACT

BACKGROUND: The main goal of whole transcriptome analysis is to correctly identify all expressed transcripts within a specific cell/tissue, at a particular stage and condition, to determine their structures and to measure their abundances. RNA-seq data promise to allow identification and quantification of the transcriptome at an unprecedented level of resolution and accuracy and at low cost. Several computational methods have been proposed to achieve such purposes. However, it is still not clear which promises are already met and which challenges are still open and require further methodological developments. RESULTS: We carried out a simulation study to assess the performance of five widely used tools: CEM, Cufflinks, iReckon, RSEM, and SLIDE. All of them were used with default parameters. In particular, we considered the effect of the following three scenarios: the availability of complete annotation, incomplete annotation, and no annotation at all. Moreover, comparisons were carried out using the methods in three different modes of action. In the first mode, the methods were forced to deal only with those isoforms that are present in the annotation; in the second mode, they were allowed to detect novel isoforms using the annotation as a guide; in the third mode, they operated in a fully data-driven way (although with the support of the alignment to the reference genome). In the last mode, precision and recall are quite poor. In contrast, results are better with the support of the annotation, even when it is not complete. Finally, the abundance estimation error often shows a very skewed distribution. The performance strongly depends on the true abundance of the isoforms. Lowly (and sometimes also moderately) expressed isoforms are poorly detected and estimated. In particular, lowly expressed isoforms are identified mainly if they are provided in the original annotation as potential isoforms.
CONCLUSIONS: Both detection and quantification of all isoforms from RNA-seq data are still hard problems and they are affected by many factors. Overall, the performance changes significantly, since it depends on the modes of action and on the type of available annotation. Results obtained using complete or partial annotation are able to detect most of the expressed isoforms, even though the number of false positives is often high. Fully data-driven approaches require more attention, at least for complex eukaryotic genomes. Improvements are desirable especially for isoform quantification and for the detection of low-abundance isoforms.


Subject(s)
RNA Isoforms/analysis, Software, Algorithms, Gene Expression Profiling, Humans, RNA Isoforms/chemistry, RNA Isoforms/metabolism, RNA Sequence Analysis/methods
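
The precision and recall figures discussed in this abstract follow the usual definitions, which the short sketch below spells out for isoform detection. The transcript identifiers and the exact-match criterion are assumptions made for illustration; the study's own matching rules are not given in the abstract.

```python
def detection_metrics(true_isoforms, predicted_isoforms):
    """Precision and recall of a predicted isoform set against the truth.

    Both arguments are sets of transcript identifiers; a prediction
    counts as correct only if its identifier is in the true set
    (an assumed, exact-match criterion).
    """
    true_positives = len(true_isoforms & predicted_isoforms)
    precision = true_positives / len(predicted_isoforms) if predicted_isoforms else 0.0
    recall = true_positives / len(true_isoforms) if true_isoforms else 0.0
    return precision, recall


# Simulated truth with four expressed isoforms; a tool reports three,
# one of which ("t5") is a false positive. All identifiers are made up.
truth = {"t1", "t2", "t3", "t4"}
predicted = {"t1", "t2", "t5"}
precision, recall = detection_metrics(truth, predicted)
```

A tool that reports many unsupported isoforms lowers precision, while one that misses lowly expressed isoforms lowers recall, which are the two failure modes the abstract describes.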
20.
Bioinformatics ; 29(6): 810-2, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23396121

ABSTRACT

Next-generation sequencing is rapidly becoming the approach of choice for transcriptional analysis experiments. Substantial advances have been achieved in computational approaches to support these technologies. These approaches typically rely on existing transcript annotations, introducing a bias towards known genes, require specific experimental design and computational resources, or focus only on identification of splice variants (ignoring other biologically relevant transcribed features contained within the data that may be important for downstream analysis). Biologically relevant transcribed features also include large and small non-coding RNA, new transcription start sites, alternative promoters, RNA editing and processing of coding transcripts. Also, many existing solutions lack accessible interfaces required for wide scale adoption. We present a user-friendly, rapid and computation-efficient feature annotation framework (RNA-eXpress) that enables identification of transcripts and other genomic and transcriptional features independently of current annotations. RNA-eXpress accepts mapped reads in the standard binary alignment (BAM) format and produces a study-specific feature annotation in GTF format, comparison statistics, sequence extraction and feature counts. The framework is designed to be easily accessible while allowing advanced users to integrate new feature-identification algorithms through simple class extension, thus facilitating expansion to novel feature types or identification of study-specific feature types.


Subject(s)
Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Software, Algorithms, Genomics, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, RNA Isoforms/chemistry, RNA Splicing, Untranslated RNA/chemistry, Transcription Initiation Site, Untranslated Regions