1.
BMC Genomics ; 22(1): 412, 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-34088266

ABSTRACT

BACKGROUND: The development of RNA sequencing (RNAseq) and the corresponding emergence of public datasets have opened new avenues for the search for transcriptional markers. Long non-coding RNAs (lncRNAs) constitute an emerging class of transcripts with potential for high tissue specificity and function. We therefore tested the biomarker potential of lncRNAs in mesenchymal stem cells (MSCs), a complex type of adult multipotent stem cell of diverse tissue origins that is frequently used in the clinic but lacks extensive characterization. RESULTS: We developed a dedicated bioinformatics pipeline for building a cell-specific catalogue of unannotated lncRNAs. The pipeline performs ab initio transcript identification and pseudoalignment, and uses new methodologies such as a specific k-mer approach for naive quantification of expression across numerous RNAseq datasets. We next applied it to MSCs, where it highlighted novel lncRNAs with high cell specificity. Furthermore, using original and efficient approaches for functional prediction, we demonstrated that each candidate represents one specific state of MSC biology. CONCLUSIONS: We showed that our approach can be employed to harness lncRNAs as cell markers. More specifically, our results suggest different candidates as potential actors in MSC biology and propose promising directions for future experimental investigations.


Subject(s)
Mesenchymal Stem Cells , Long Non-coding RNA , Base Sequence , Computational Biology , Long Non-coding RNA/genetics , RNA Sequence Analysis
2.
Nucleic Acids Res ; 42(5): 2820-32, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24357408

ABSTRACT

Recent sequencing technologies that allow massively parallel production of short reads are the method of choice for transcriptome analysis. In particular, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and cross-species comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression and conservation. We analysed tags from a large DGE data set (designated 'TranscriRef'). We then annotated 750,000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands, categorized tags corresponding to protein-coding genes and to antisense, intronic or intergenic transcribed regions, and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34,000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method can easily be applied to the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
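
The categorization step described in this abstract can be illustrated with a minimal sketch. It assumes a toy annotation held in memory; the `Gene` class, the `categorize` function and the coordinates are hypothetical and are not taken from DigitagCT. When genes overlap, this sketch simply reports the first match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical, simplified protein-coding gene annotation."""
    chrom: str
    start: int
    end: int
    strand: str
    exons: tuple  # ((start, end), ...) in genomic coordinates, inclusive


def categorize(chrom, pos, strand, genes):
    """Assign a uniquely mapped tag to one of four coarse categories."""
    for gene in genes:
        if gene.chrom != chrom or not (gene.start <= pos <= gene.end):
            continue
        if strand != gene.strand:
            return "antisense"       # inside a gene, opposite strand
        if any(s <= pos <= e for s, e in gene.exons):
            return "protein-coding"  # same strand, inside an exon
        return "intronic"            # same strand, between exons
    return "intergenic"              # outside every annotated gene


# Toy annotation: one gene with two exons on the plus strand.
genes = [Gene("chr1", 100, 500, "+", ((100, 200), (400, 500)))]
print(categorize("chr1", 300, "+", genes))  # intronic
```

A real annotation would be read from a reference such as Ensembl and indexed (for example with an interval tree) rather than scanned linearly.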


Subject(s)
Gene Expression Profiling/methods , Human Genome , Untranslated RNA/analysis , RNA Sequence Analysis/methods , Cell Line , Humans , Molecular Sequence Annotation , Poly A/analysis , Software , Genetic Transcription
3.
Br J Haematol ; 157(3): 347-56, 2012 May.
Article in English | MEDLINE | ID: mdl-22390678

ABSTRACT

Chronic myelomonocytic leukaemia (CMML) is a heterogeneous haematopoietic disorder characterized by myeloproliferative or myelodysplastic features. At present, the pathogenesis of this malignancy is not completely understood. In this study, we analysed gene expression profiles of CMML in order to identify new molecular predictors of outcome. A learning set of 32 untreated CMML patients at diagnosis was available for TaqMan low-density array gene expression analysis. From 93 selected genes related to cancer and cell cycle, we built a five-gene prognostic index after multiplicity correction. Using this index, we characterized two categories of patients with distinct overall survival (94% vs. 19% for good and poor overall survival, respectively; P = 0.007) and we successfully validated its strength on an independent cohort of 21 CMML patients with Affymetrix gene expression data. We found no specific patterns of association with traditional prognostic stratification parameters in the learning cohort. However, the poor survival group strongly correlated with high-risk treated patients and transformation to acute myeloid leukaemia. We report here a new multigene prognostic index for CMML, independent of the gene expression measurement method, which could be used as a powerful tool to predict clinical outcome and help physicians evaluate treatment criteria.


Subject(s)
Tumor Biomarkers/metabolism , Chronic Myelomonocytic Leukemia/diagnosis , Aged , Aged 80 and over , Case-Control Studies , Female , Follow-Up Studies , Gene Expression Profiling/methods , Humans , Kaplan-Meier Estimate , Chronic Myelomonocytic Leukemia/therapy , Male , Middle Aged , Multigene Family , Oligonucleotide Array Sequence Analysis/methods , Polymerase Chain Reaction/methods , Prognosis , Neoplasm RNA/genetics , Treatment Outcome , U937 Cells
4.
NAR Genom Bioinform ; 3(3): lqab058, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34179780

ABSTRACT

The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that makes it possible to quantify the expression of known or novel transcripts in tissues. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, and so does not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling gene expression to be measured with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata, including library protocol, sample features or contaminations, from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.
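
The idea of a specific k-mer signature can be illustrated with a minimal sketch. It is not Kmerator's implementation, which the abstract does not detail: the function names and toy sequences below are assumptions, specificity is checked against a small list of other sequences rather than a full reference, and reverse complements are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def specific_kmers(target, others, k):
    """Return the k-mers of `target` found in none of the `others`."""
    background = set()
    for seq in others:
        background.update(kmers(seq, k))
    # dict.fromkeys removes duplicates while keeping first-seen order
    return [km for km in dict.fromkeys(kmers(target, k)) if km not in background]


def count_in_reads(signature, reads, k):
    """Count how often each signature k-mer occurs in the reads."""
    wanted = set(signature)
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in wanted:
                counts[km] += 1
    return counts


def mean_count(signature, counts):
    """Average count over the signature, a naive expression proxy."""
    if not signature:
        return 0.0
    return sum(counts[km] for km in signature) / len(signature)


# Toy example: two short "genes" sharing a common stretch.
gene_a = "ACGTACGGTCA"
gene_b = "TTGACGTACCA"
signature = specific_kmers(gene_a, [gene_b], k=5)
counts = count_in_reads(signature, ["ACGGTCA", "TTGACGT"], k=5)
print(signature, mean_count(signature, counts))
```

Averaging over the signature is only one possible summary; a median is less sensitive to a single over-represented k-mer.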

5.
Nucleic Acids Res ; 35(17): e108, 2007.
Article in English | MEDLINE | ID: mdl-17709346

ABSTRACT

Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to test this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict as yet unknown transcripts. Cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation.
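
The notion of a tag-delimited genomic sequence can be illustrated with a naive sketch: find every occurrence of two tags on one strand and keep the spans where the second tag follows the first within a maximum distance. The abstract does not describe the authors' algorithm, so this is plain string search, not their method; the function names, the `max_span` parameter and the toy sequence are assumptions.

```python
def find_all(genome, tag):
    """Return every start position of `tag` in `genome` (overlaps allowed)."""
    positions = []
    start = genome.find(tag)
    while start != -1:
        positions.append(start)
        start = genome.find(tag, start + 1)
    return positions


def tag_delimited_sequences(genome, tag_a, tag_b, max_span=50):
    """Return (start, end, sequence) for each tag_a ... tag_b span.

    tag_b must begin after tag_a ends, and the whole span, both tags
    included, must be at most `max_span` bases long.
    """
    spans = []
    b_positions = find_all(genome, tag_b)
    for a in find_all(genome, tag_a):
        for b in b_positions:
            end = b + len(tag_b)
            if b >= a + len(tag_a) and end - a <= max_span:
                spans.append((a, end, genome[a:end]))
    return spans


# Toy sequence with one occurrence of each tag.
genome = "TTCATGAAACCCGGGTTTGATCAAGG"
print(tag_delimited_sequences(genome, "CATGAAAC", "TTTGATC"))
```

On a real genome, the two position lists would come from an index rather than repeated scans, and both strands would be searched.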


Subject(s)
Algorithms , Gene Expression Profiling/methods , Messenger RNA/analysis , Sequence Tagged Sites , Base Sequence , Computational Biology , Gene Library , Genomics/methods , Humans , Oligonucleotide Array Sequence Analysis , Sequence Alignment , Genetic Transcription
6.
Methods Mol Biol ; 1769: 133-156, 2018.
Article in English | MEDLINE | ID: mdl-29564822

ABSTRACT

The RNA-Seq approach enables the detection and characterization of fusion or chimeric transcripts associated with complex genome rearrangements. Until now, these events have classically been identified at the DNA level. Here we describe a complete procedure, including a novel way of analyzing reads that combines genomic locations and local coverage to directly infer chimeric junctions with high sensitivity and specificity, allowing the identification of different classes of chimeric RNA events. We also recommend best practices for the bioinformatics analysis and describe the experimental process for RNA validation using real-time PCR and sequencing.


Subject(s)
Chromothripsis , Gene Rearrangement , High-Throughput Nucleotide Sequencing , RNA Sequence Analysis , Genetic Transcription , Algorithms , Computational Biology/methods , Gene Library , Molecular Sequence Annotation , Workflow
7.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-29623188

ABSTRACT

Background: High-throughput next generation sequencing (NGS) technologies enable the detection of biomarkers used for tumor classification, disease monitoring and cancer therapy. Whole-transcriptome analysis using RNA-seq is important, not only as a means of understanding the mechanisms responsible for complex diseases but also to efficiently identify novel genes/exons, splice isoforms, RNA editing, allele-specific mutations, differential gene expression and fusion transcripts or chimeric RNA (chRNA). Methods: We used Crac, a tool that uses genomic locations and local coverage to classify biological events and directly infer splice and chimeric junctions within a single read. Crac's algorithm extracts transcriptional chimeric events irrespective of annotation with high sensitivity, and CracTools was used to aggregate, annotate and filter the chRNA reads. The selected chRNA candidates were validated by real-time PCR and sequencing. To check the tumor-specific expression of chRNA, we analyzed a publicly available dataset using a new tag search approach. Results: We present data related to acute myeloid leukemia (AML) RNA-seq analysis. We highlight novel biological cases of chRNA, in addition to previously well-characterized leukemia chRNA. We identified and validated 17 chRNAs among 3 AML patients: 10 from an AML patient with a translocation between chromosomes 15 and 17 (AML-t(15;17)), 4 from a patient with a normal karyotype (AML-NK) and 3 from a patient with a chromosome 16 inversion (AML-inv16). The new fusion transcripts can be classified into four groups according to their exon organization. Conclusions: All groups suggest complex but distinct synthesis mechanisms involving either collinear exons of different genes, non-collinear exons, or exons of different chromosomes. Finally, we checked tumor-specific expression in a larger RNA-seq AML cohort and identified new AML biomarkers that could improve the diagnosis and prognosis of AML.
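
The exon arrangements named in the conclusions (collinear exons of different genes, non-collinear exons, exons of different chromosomes) can be illustrated with a rough sketch that labels a junction from its two breakpoints. The abstract does not define its four groups, so the three coarse labels, the `Breakpoint` type, the rules and the coordinates below are all assumptions rather than the CracTools classification.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    """Hypothetical description of one side of a chimeric junction."""
    chrom: str
    pos: int
    strand: str
    gene: str


def classify_junction(five_prime, three_prime):
    """Label a junction by the relative arrangement of its two sides."""
    if five_prime.chrom != three_prime.chrom:
        return "exons of different chromosomes"
    if five_prime.strand != three_prime.strand:
        return "non-collinear exons"
    # Same chromosome and strand: is the 3' side downstream of the 5' side
    # in the direction of transcription?
    if five_prime.strand == "+":
        downstream = three_prime.pos > five_prime.pos
    else:
        downstream = three_prime.pos < five_prime.pos
    if downstream and five_prime.gene != three_prime.gene:
        return "collinear exons of different genes"
    return "non-collinear exons"


# Toy breakpoints with made-up coordinates and gene names.
left = Breakpoint("chr16", 1000, "+", "GENE_C")
right = Breakpoint("chr16", 5000, "+", "GENE_D")
print(classify_junction(left, right))  # collinear exons of different genes
print(classify_junction(right, left))  # non-collinear exons
```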

8.
BioData Min ; 9: 34, 2016.
Article in English | MEDLINE | ID: mdl-27822312

ABSTRACT

BACKGROUND: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed specifically in disease can be used as biomarkers for both diagnosis and prognosis. RESULTS: The task of discriminating true chRNAs from false ones poses an interesting machine learning (ML) challenge. First, the sequencing data may contain false reads due to technical artifacts, and during analysis bioinformatics tools may generate false positives due to methodological biases. Moreover, even with a proper set of observations (enough sequencing data) about true chRNAs, chances are that the resulting model will not generalize beyond it. As in any other machine learning problem, the first big issue is finding good data to build models. To our knowledge, no common benchmark data are available for chRNA detection, and a classification baseline is also lacking in the related literature. In this work we move towards benchmark data and an evaluation of the fidelity of supervised classifiers in the prediction of chRNAs. CONCLUSIONS: We propose a modelling strategy, based on a simulated data generator that allows new complex chimeric events to be integrated continuously, that can be used to improve tool performance in chRNA classification. The pipeline incorporates a genome mutation process and simulated RNA-seq data. Reads at distinct sequencing depths were aligned and analysed by CRAC, which integrates genomic location and local coverage, allowing biological predictions at the read scale. Additionally, these reads were functionally annotated and aggregated to form chRNA events, making it possible to evaluate the performance of ML methods (classifiers) at both the read and event levels. Ensemble learning strategies proved more robust on this classification problem, providing an average AUC of 95 % (ACC = 94 %, Kappa = 0.87). The resulting classification models were also tested on real RNA-seq data from a set of twenty-seven patients with acute myeloid leukemia (AML).
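
Two of the metrics reported here, accuracy (ACC) and Cohen's kappa, can be computed from a binary confusion matrix as in the following minimal sketch. The function name and the counts are illustrative assumptions and are unrelated to the study's data. Kappa corrects the observed agreement for the agreement expected by chance and lies between -1 and 1.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    Assumes the chance agreement is below 1, i.e. both classes occur.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of the marginals for each class, summed.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative counts: 100 candidate chRNAs, 50 true and 50 false.
acc, kappa = accuracy_and_kappa(tp=45, fp=5, fn=5, tn=45)
print(f"ACC={acc:.2f} Kappa={kappa:.2f}")
```

With balanced classes, as in this toy case, the chance agreement is 0.5, so kappa is lower than accuracy for the same predictions.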
