1.
Cancers (Basel) ; 13(21), 2021 11 03.
Article in English | MEDLINE | ID: mdl-34771686

ABSTRACT

Anaplastic large cell lymphomas associated with ALK translocation have a good outcome after CHOP treatment; however, the 2-year relapse rate remains at 30%. Microarray gene-expression profiling of 48 samples obtained at diagnosis was used to identify 47 genes that were differentially expressed between patients with early relapse/progression and no relapse. In the relapsing group, the most significant overrepresented genes were related to the regulation of the immune response and T-cell activation while those in the non-relapsing group were involved in the extracellular matrix. Fluidigm technology gave concordant results for 29 genes, of which FN1, FAM179A, and SLC40A1 had the strongest predictive power after logistic regression and two classification algorithms. In parallel, with 39 samples, we used a Kallisto/Sleuth pipeline to analyze RNA sequencing data and identified 20 genes common to the 28 genes validated by Fluidigm technology, notably the FAM179A and FN1 genes. Interestingly, FN1 also belongs to the gene signature predicting longer survival in diffuse large B-cell lymphomas treated with CHOP. Thus, our molecular signatures indicate that the FN1 gene, a key matrix regulator, might also be involved in the prognosis and the therapeutic response in anaplastic lymphomas.

2.
NAR Genom Bioinform ; 3(3): lqab058, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34179780

ABSTRACT

The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a trove of functional information that makes it possible to quantify the expression of known or novel transcripts in tissues. However, transcript quantification commonly relies on alignment methods that require a lot of computational resources and processing time, and therefore does not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata including library protocol, sample features or contaminations from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.
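
The idea of a gene-specific k-mer signature can be illustrated with a minimal Python sketch. It is not the Kmerator implementation: the function names (`kmers`, `specific_kmers`, `quantify`) and the toy sequences are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors and indexing, which a real tool would have to handle.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def specific_kmers(target, background, k):
    """Return the k-mers of `target` found in none of the `background` sequences."""
    background_kmers = set()
    for seq in background:
        background_kmers.update(kmers(seq, k))
    return {km for km in kmers(target, k) if km not in background_kmers}


def quantify(reads, signature, k):
    """Count occurrences of signature k-mers across a collection of reads."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in signature:
                counts[km] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy sequences, not real transcripts.
    gene_a = "ACGTACGGTCA"
    gene_b = "TTGTACGGAAC"
    signature = specific_kmers(gene_a, [gene_b], k=5)
    reads = ["ACGTACG", "GTACGGA", "CGGTCA"]
    print(sorted(signature))
    print(sum(quantify(reads, signature, k=5).values()))
```

Summing the counts of a gene's signature k-mers gives a crude, alignment-free proxy for its expression in a library; any normalization by library size or by number of signature k-mers is left out of this sketch.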

3.
BMC Genomics ; 22(1): 412, 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-34088266

ABSTRACT

BACKGROUND: The development of RNA sequencing (RNAseq) and the corresponding emergence of public datasets have created new avenues of transcriptional marker search. The long non-coding RNAs (lncRNAs) constitute an emerging class of transcripts with a potential for high tissue specificity and function. Therefore, we tested the biomarker potential of lncRNAs on Mesenchymal Stem Cells (MSCs), a complex type of adult multipotent stem cells of diverse tissue origins that are frequently used in the clinic but lack extensive characterization. RESULTS: We developed a dedicated bioinformatics pipeline for the purpose of building a cell-specific catalogue of unannotated lncRNAs. The pipeline performs ab initio transcript identification and pseudoalignment, and uses new methodologies such as a specific k-mer approach for naive quantification of expression in numerous RNAseq data. We next applied it to MSCs, and our pipeline was able to highlight novel lncRNAs with high cell specificity. Furthermore, with original and efficient approaches for functional prediction, we demonstrated that each candidate represents one specific state of MSC biology. CONCLUSIONS: We showed that our approach can be employed to harness lncRNAs as cell markers. More specifically, our results suggest different candidates as potential actors in MSC biology and propose promising directions for future experimental investigations.


Subject(s)
Mesenchymal Stem Cells , Long Noncoding RNA , Base Sequence , Computational Biology , Long Noncoding RNA/genetics , RNA Sequence Analysis
4.
Stem Cell Reports ; 14(1): 1-8, 2020 01 14.
Article in English | MEDLINE | ID: mdl-31902703

ABSTRACT

Genomic integrity of human pluripotent stem cells (hPSCs) is essential for research and clinical applications. However, genetic abnormalities can accumulate during hPSC generation and routine culture and following gene editing. Their occurrence should be regularly monitored, but the current assays to assess hPSC genomic integrity are not fully suitable for such regular screening. To address this issue, we first carried out a large meta-analysis of all hPSC genetic abnormalities reported in more than 100 publications and identified 738 recurrent genetic abnormalities (i.e., overlapping abnormalities found in at least five distinct scientific publications). We then developed a test based on the droplet digital PCR technology that can potentially detect more than 90% of these hPSC recurrent genetic abnormalities in DNA extracted from culture supernatant samples. This test can be used to routinely screen genomic integrity in hPSCs.
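
The recurrence criterion stated in this abstract (overlapping abnormalities found in at least five distinct publications) can be sketched in Python as follows. This is only one possible reading: the record layout (`Abnormality`), the pairwise definition of overlap and the function names are assumptions made for illustration, not the authors' meta-analysis procedure.

```python
from collections import namedtuple

# Hypothetical record: one reported abnormality from one publication.
Abnormality = namedtuple("Abnormality", "publication chrom start end")


def overlaps(a, b):
    """True if two abnormalities share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def recurrent(abnormalities, min_publications=5):
    """Keep abnormalities overlapped by reports from at least `min_publications` distinct publications."""
    kept = []
    for a in abnormalities:
        # The abnormality overlaps itself, so its own publication is counted.
        publications = {b.publication for b in abnormalities if overlaps(a, b)}
        if len(publications) >= min_publications:
            kept.append(a)
    return kept
```

The quadratic scan is acceptable for a few thousand records; a larger catalogue would call for an interval tree or a sorted sweep.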


Subject(s)
Genetic Variation , Pluripotent Stem Cells/cytology , Pluripotent Stem Cells/metabolism , Biomarkers , Cell Culture Techniques , Cell Differentiation/genetics , Conditioned Culture Media , Gene Editing , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Immunophenotyping , Real-Time Polymerase Chain Reaction
5.
Methods Mol Biol ; 1769: 133-156, 2018.
Article in English | MEDLINE | ID: mdl-29564822

ABSTRACT

The RNA-Seq approach enables the detection and characterization of fusion or chimeric transcripts associated with complex genome rearrangements. Until now, these events have classically been identified at the DNA level. Here we describe a complete procedure including a novel way of analyzing reads that combines genomic locations and local coverage to directly infer chimeric junctions with a high sensitivity and specificity, allowing identification of different classes of chimeric RNA events. We also recommend the best practices for the bioinformatics analysis and describe the experimental process for RNA validation using real-time PCR and sequencing.


Subject(s)
Chromothripsis , Gene Rearrangement , High-Throughput Nucleotide Sequencing , RNA Sequence Analysis , Genetic Transcription , Algorithms , Computational Biology/methods , Gene Library , Molecular Sequence Annotation , Workflow
6.
Sci Rep ; 8(1): 2202, 2018 02 02.
Article in English | MEDLINE | ID: mdl-29396444

ABSTRACT

Progress in assisted reproductive technologies strongly relies on understanding the regulation of the dialogue between oocyte and cumulus cells (CCs). Little is known about the role of long non-coding RNAs (lncRNAs) in the human cumulus-oocyte complex (COC). To this aim, publicly available RNA-sequencing data were analyzed to identify lncRNAs that were abundant in metaphase II (MII) oocytes (BCAR4, C3orf56, TUNAR, OOEP-AS1, CASC18, and LINC01118) and CCs (NEAT1, MALAT1, ANXA2P2, MEG3, IL6STP1, and VIM-AS1). These data were validated by RT-qPCR analysis using independent oocytes and CC samples. The functions of the identified lncRNAs were then predicted by constructing lncRNA-mRNA co-expression networks. This analysis suggested that MII oocyte lncRNAs could be involved in chromatin remodeling, cell pluripotency and in driving early embryonic development. CC lncRNAs were co-expressed with genes involved in apoptosis and extracellular matrix-related functions. A bioinformatic analysis of RNA-sequencing data to identify CC lncRNAs that are affected by maternal age showed that lncRNAs with age-related altered expression in CCs are essential for oocyte growth. This comprehensive analysis of lncRNAs expressed in human MII oocytes and CCs could provide biomarkers of oocyte quality for the development of non-invasive tests to identify embryos with high developmental potential.


Subject(s)
Cumulus Cells/physiology , Gene Expression Profiling , Oocytes/physiology , Long Noncoding RNA/analysis , Computational Biology , Humans , Metaphase , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction
7.
Hepatology ; 68(1): 89-102, 2018 07.
Article in English | MEDLINE | ID: mdl-29152775

ABSTRACT

Surgery and cisplatin-based treatment of hepatoblastoma (HB) currently guarantee the survival of 70%-80% of patients. However, some important challenges remain in diagnosing high-risk tumors and identifying relevant targetable pathways offering new therapeutic avenues. Previously, two molecular subclasses of HB tumors have been described, C1 and C2, with C2 being the subgroup with the poorest prognosis, a more advanced tumor stage, and the worst overall survival rate. An associated 16-gene signature to discriminate the two tumoral subgroups was proposed, but it has not been transferred into clinical routine. To address these issues, we performed RNA sequencing of 25 tumors and matched normal liver samples from patients. The transcript profiling separated HB into three distinct subgroups named C1, C2A, and C2B, identifiable by a concise four-gene signature: hydroxysteroid 17-beta dehydrogenase 6, integrin alpha 6, topoisomerase 2-alpha, and vimentin, with topoisomerase 2-alpha being characteristic for the proliferative C2A tumors. Differential expression of these genes was confirmed by quantitative RT-PCR on an expanded cohort and by immunohistochemistry. We also revealed significant overexpression of genes involved in the Fanconi anemia (FA) pathway in the C2A subgroup. We then investigated the ability of several described FA inhibitors to block growth of HB cells in vitro and in vivo. We demonstrated that bortezomib, a Food and Drug Administration-approved proteasome inhibitor, strongly impairs the proliferation and survival of HB cell lines in vitro, blocks FA pathway-associated double-strand DNA repair, and significantly impedes HB growth in vivo. CONCLUSION: The highly proliferating C2A subtype is characterized by topoisomerase 2-alpha gene up-regulation and FA pathway activation, and the HB therapeutic arsenal could include bortezomib for the treatment of patients with the most aggressive tumors. (Hepatology 2018;68:89-102).


Subject(s)
Type II DNA Topoisomerases/metabolism , Hepatoblastoma/classification , Hepatoblastoma/genetics , Liver Neoplasms/classification , Liver Neoplasms/genetics , Poly-ADP-Ribose Binding Proteins/metabolism , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Biomarkers/metabolism , Bortezomib/pharmacology , Bortezomib/therapeutic use , DNA Repair/drug effects , Fanconi Anemia Complementation Group Proteins/metabolism , Gene Expression Profiling , Hep G2 Cells , Hepatoblastoma/drug therapy , Hepatoblastoma/enzymology , Humans , Liver Neoplasms/drug therapy , Liver Neoplasms/enzymology , RNA Sequence Analysis
8.
Genome Biol ; 18(1): 243, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29284518

ABSTRACT

We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.
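
The core idea of selecting k-mers by differential abundance, without any reference genome, can be sketched in Python as below. This is a deliberately simplified illustration and not DE-kupl itself: it uses a plain log2 fold change on mean raw counts with a pseudocount, with no library-size normalization and no statistical test, and the function names and thresholds are assumptions.

```python
import math
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer in a list of reads (one library)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(libs_a, libs_b, k, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {k-mer: log2 fold change of condition B over A} for k-mers passing the threshold."""
    counts_a = [count_kmers(lib, k) for lib in libs_a]
    counts_b = [count_kmers(lib, k) for lib in libs_b]
    all_kmers = set().union(*counts_a, *counts_b)
    selected = {}
    for km in all_kmers:
        mean_a = sum(c[km] for c in counts_a) / len(counts_a)
        mean_b = sum(c[km] for c in counts_b) / len(counts_b)
        # The pseudocount keeps the ratio finite when a k-mer is absent from one condition.
        log2fc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
        if abs(log2fc) >= min_abs_log2fc:
            selected[km] = log2fc
    return selected
```

Because no annotation is consulted, a k-mer that is absent from one condition (for example one spanning a novel junction) is retained just like a k-mer from a known gene, which is the property the abstract emphasizes.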


Subject(s)
Computational Biology/methods , Genetic Variation , RNA/genetics , Software , Alleles , Gene Expression Profiling , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Humans , Polyadenylation , RNA Splicing , Antisense RNA , Long Noncoding RNA/genetics , Messenger RNA/genetics , Reproducibility of Results , RNA Sequence Analysis , Transcriptome
9.
BMC Bioinformatics ; 18(1): 428, 2017 Sep 29.
Article in English | MEDLINE | ID: mdl-28969586

ABSTRACT

BACKGROUND: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. In order to provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well informed choices. RESULTS: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains to give an accurate performance evaluation in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved. CONCLUSION: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of data-sets that would allow accurate comparison between benchmarks performed by different groups and the publication of more benchmarks based on this public corpus. SimBA software and data-set are available at http://cractools.gforge.inria.fr/softwares/simba/ .
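
The comparison step described above, in which a pipeline's output is scored against the list of simulated events, can be sketched in Python. This is not BenchCT's code or file format: the event representation as `(chromosome, position, kind)` tuples, the positional tolerance and the function name `evaluate` are assumptions chosen to keep the example small.

```python
def evaluate(predicted, truth, tolerance=0):
    """Score predicted events against simulated (true) events.

    Events are (chrom, position, kind) tuples. A prediction matches a true
    event of the same kind on the same chromosome within `tolerance` bases;
    each true event can be matched at most once.
    """
    unmatched = list(truth)
    true_positives = 0
    for chrom, pos, kind in predicted:
        for event in unmatched:
            if event[0] == chrom and event[2] == kind and abs(event[1] - pos) <= tolerance:
                unmatched.remove(event)  # safe: we leave the inner loop immediately
                true_positives += 1
                break
    return {
        "tp": true_positives,
        "fp": len(predicted) - true_positives,
        "fn": len(unmatched),
        "precision": true_positives / len(predicted) if predicted else 0.0,
        "sensitivity": true_positives / len(truth) if truth else 0.0,
    }
```

Running the same truth set against several pipelines, or against the union or intersection of their predictions, gives directly comparable precision and sensitivity figures, which is the kind of comparison the abstract argues for.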


Subject(s)
Computational Biology/methods , Computer Simulation , RNA Sequence Analysis/methods , Software , Gene Fusion , Human Genome , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation/genetics , Single Nucleotide Polymorphism/genetics
10.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-29623188

ABSTRACT

Background: High-throughput next generation sequencing (NGS) technologies enable the detection of biomarkers used for tumor classification, disease monitoring and cancer therapy. Whole-transcriptome analysis using RNA-seq is important, not only as a means of understanding the mechanisms responsible for complex diseases but also to efficiently identify novel genes/exons, splice isoforms, RNA editing, allele-specific mutations, differential gene expression and fusion-transcripts or chimeric RNA (chRNA). Methods: We used Crac, a tool that uses genomic locations and local coverage to classify biological events and directly infer splice and chimeric junctions within a single read. Crac's algorithm extracts transcriptional chimeric events irrespective of annotation with a high sensitivity, and CracTools was used to aggregate, annotate and filter the chRNA reads. The selected chRNA candidates were validated by real time PCR and sequencing. In order to check the tumor specific expression of chRNA, we analyzed a publicly available dataset using a new tag search approach. Results: We present data related to acute myeloid leukemia (AML) RNA-seq analysis. We highlight novel biological cases of chRNA, in addition to previously well characterized leukemia chRNA. We have identified and validated 17 chRNAs among 3 AML patients: 10 from an AML patient with a translocation between chromosomes 15 and 17 (AML-t(15;17)), 4 from a patient with normal karyotype (AML-NK), and 3 from a patient with a chromosome 16 inversion (AML-inv16). The new fusion transcripts can be classified into four groups according to the exon organization. Conclusions: All groups suggest complex but distinct synthesis mechanisms involving either collinear exons of different genes, non-collinear exons, or exons of different chromosomes. Finally, we check tumor-specific expression in a larger RNA-seq AML cohort and identify new AML biomarkers that could improve diagnosis and prognosis of AML.

11.
BioData Min ; 9: 34, 2016.
Article in English | MEDLINE | ID: mdl-27822312

ABSTRACT

BACKGROUND: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed specifically in diseases can be used as potential biomarkers in both diagnosis and prognosis. RESULTS: The task of discriminating true chRNAs from false ones poses an interesting Machine Learning (ML) challenge. First of all, the sequencing data may contain false reads due to technical artifacts and, during the analysis process, bioinformatics tools may generate false positives due to methodological biases. Moreover, even if we succeed in obtaining a proper set of observations (enough sequencing data) about true chRNAs, chances are that the devised model will not be able to generalize beyond it. As in any other machine learning problem, the first big issue is finding good data to build models. As far as we know, there is no common benchmark data available for chRNA detection. The definition of a classification baseline is lacking in the related literature too. In this work we move towards benchmark data and an evaluation of the fidelity of supervised classifiers in the prediction of chRNAs. CONCLUSIONS: We propose a modelling strategy that can be used to increase tool performance in the context of chRNA classification, based on a simulated data generator that permits continuous integration of new complex chimeric events. The pipeline incorporates a genome mutation process and simulated RNA-seq data. The reads at distinct depths were aligned and analysed by CRAC, which integrates genomic location and local coverage, allowing biological predictions at the read scale. Additionally, these reads were functionally annotated and aggregated to form chRNA events, making it possible to evaluate the performance of ML methods (classifiers) at both the read and event levels. Ensemble learning strategies proved more robust for this classification problem, providing an average AUC of 95% (ACC = 94%, Kappa = 0.87). The resulting classification models were also tested on real RNA-seq data from a set of twenty-seven patients with acute myeloid leukemia (AML).
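
For readers unfamiliar with the metrics quoted above, the following Python sketch shows how accuracy (ACC) and Cohen's kappa are derived from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the study; the sketch only illustrates that kappa is a unitless agreement coefficient (at most 1) rather than a percentage.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the marginal totals of
    # predicted vs. actual positives and predicted vs. actual negatives.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical counts: 45 true chimeras found, 5 missed,
    # 5 false chimeras accepted, 45 correctly rejected.
    print(accuracy_and_kappa(tp=45, fp=5, fn=5, tn=45))
```

On a balanced data set kappa equals twice the accuracy minus one, so an accuracy of 94% with kappa near 0.87 is consistent with roughly balanced classes.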

12.
Hum Reprod Update ; 23(1): 19-40, 2016 12.
Article in English | MEDLINE | ID: mdl-27655590

ABSTRACT

BACKGROUND: Human long non-coding RNAs (lncRNAs) are an emerging category of transcripts with increasingly documented functional roles during development. The roles of lncRNAs during early human embryo development have recently begun to be unravelled. OBJECTIVE AND RATIONALE: This review summarizes the most recent knowledge on lncRNAs and focuses on their expression patterns and role during early human embryo development and in pluripotent stem cells (PSCs). Public mRNA sequencing (mRNA-seq) data were used to illustrate these expression signatures. SEARCH METHODS: The PubMed and EMBASE databases were first interrogated using specific terms, such as 'lncRNAs', to get an extensive overview on lncRNAs up to February 2016, and then using 'human lncRNAs' and 'embryo', 'development', or 'PSCs' to focus on lncRNAs involved in human embryo development or in PSC. Recently published RNA-seq data from human oocytes and pre-implantation embryos (including single-cell data), PSC and a panel of normal and malignant adult tissues were used to describe the specific expression patterns of some lncRNAs in early human embryos. OUTCOMES: The existence and the crucial role of lncRNAs in many important biological phenomena in each branch of the tree of life are now well documented. The number of identified lncRNAs is rapidly increasing and has already outnumbered that of protein-coding genes. Unlike small non-coding RNAs, a variety of mechanisms of action have been proposed for lncRNAs. The functional role of lncRNAs has been demonstrated in many biological and developmental processes, including cell pluripotency induction, X-inactivation or gene imprinting. Analysis of RNA-seq data highlights that lncRNA abundance changes significantly during human early embryonic development. This suggests that lncRNAs could represent candidate biomarkers for developing non-invasive tests for oocyte or embryo quality. Finally, some of these lncRNAs are also expressed in human cancer tissues, suggesting that reactivation of an embryonic lncRNA program may contribute to human malignancies. WIDER IMPLICATIONS: LncRNAs are emerging potential key players in gene expression regulation. Analysis of RNA-seq data from human pre-implantation embryos identified lncRNA signatures that are specific to this critical step. We anticipate that further studies will show that these new transcripts are major regulators of embryo development. These findings might also be used to develop new tests/treatments for improving the pregnancy success rate in IVF procedures or for regenerative medicine applications involving PSC.


Subject(s)
Embryonic Development/genetics , Gene Expression Regulation , Long Noncoding RNA/metabolism , Humans , Neoplasms/genetics , X Chromosome Inactivation
13.
Nat Commun ; 7: 10767, 2016 Feb 24.
Article in English | MEDLINE | ID: mdl-26908133

ABSTRACT

The cytidine analogues azacytidine and 5-aza-2'-deoxycytidine (decitabine) are commonly used to treat myelodysplastic syndromes, with or without a myeloproliferative component. It remains unclear whether the response to these hypomethylating agents results from a cytotoxic or an epigenetic effect. In this study, we address this question in chronic myelomonocytic leukaemia. We describe a comprehensive analysis of the mutational landscape of these tumours, combining whole-exome and whole-genome sequencing. We identify an average of 14±5 somatic mutations in coding sequences of sorted monocyte DNA and the signatures of three mutational processes. Serial sequencing demonstrates that the response to hypomethylating agents is associated with changes in DNA methylation and gene expression, without any decrease in the mutation allele burden or prevention of the occurrence of new genetic alterations. Our findings indicate that cytosine analogues restore a balanced haematopoiesis without decreasing the size of the mutated clone, arguing for a predominantly epigenetic effect.


Subject(s)
Antineoplastic Antimetabolites/pharmacology , Azacitidine/analogs & derivatives , Azacitidine/pharmacology , Cell Survival/drug effects , DNA Methylation/drug effects , Genetic Epigenesis/drug effects , Neoplastic Gene Expression Regulation/drug effects , Chronic Myelomonocytic Leukemia/genetics , Mutation , Aged , Aged, 80 and over , Alleles , Antineoplastic Antimetabolites/therapeutic use , Azacitidine/therapeutic use , Decitabine , Female , HEK293 Cells , High-Throughput Nucleotide Sequencing , Humans , Chronic Myelomonocytic Leukemia/drug therapy , Male , Middle Aged , DNA Sequence Analysis , RNA Sequence Analysis
14.
Biomed Res Int ; 2014: 423174, 2014.
Article in English | MEDLINE | ID: mdl-24883311

ABSTRACT

Despite the improvement in treatment options, chronic lymphocytic leukemia (CLL) remains an incurable disease and patients show a heterogeneous clinical course, with many of them requiring therapy. In the current work, we used microarray gene expression data to build a 20-gene expression (GE)-based risk score that predicts patients' overall survival and improves risk classification. The GE-based risk score identified a high-risk group associated with significantly shorter overall survival (OS) and time to treatment (TTT) (P ≤ .01), comprising 19.6% and 13.6% of the patients in two independent cohorts. The GE-based risk score and NRIP1 and TCF7 gene expression remained independent prognostic factors in multivariate Cox analyses, and combining the GE-based risk score with NRIP1 and TCF7 gene expression enabled the identification of three clinically distinct groups of CLL patients. Therefore, this GE-based risk score represents a powerful tool for risk stratification and outcome prediction of CLL patients and could thus be used to guide clinical and therapeutic decisions prospectively.


Subject(s)
Neoplastic Gene Expression Regulation , Chronic Lymphocytic B-Cell Leukemia/genetics , Neoplasm Proteins/biosynthesis , Prognosis , Signal Transducing Adaptor Proteins/biosynthesis , Humans , Chronic Lymphocytic B-Cell Leukemia/pathology , Microarray Analysis , Nuclear Proteins/biosynthesis , Nuclear Receptor Interacting Protein 1 , Survival Analysis , T Cell Transcription Factor 1/biosynthesis , Treatment Outcome
15.
Nucleic Acids Res ; 42(5): 2820-32, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24357408

ABSTRACT

Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as 'TranscriRef'). We then annotated 750,000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34,000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.


Subject(s)
Gene Expression Profiling/methods , Human Genome , Untranslated RNA/analysis , RNA Sequence Analysis/methods , Cell Line , Humans , Molecular Sequence Annotation , Poly A/analysis , Software , Genetic Transcription
16.
Genome Biol ; 14(3): R30, 2013 Mar 28.
Article in English | MEDLINE | ID: mdl-23537109

ABSTRACT

A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr.


Subject(s)
Algorithms , RNA Sequence Analysis/methods , Breast Neoplasms/genetics , Computer Simulation , Female , Gene Library , Genome , Humans , RNA Splice Sites/genetics
17.
Cancer Biol Ther ; 14(5): 401-10, 2013 May.
Article in English | MEDLINE | ID: mdl-23377825

ABSTRACT

The N-myc downstream regulated gene 1 (NDRG1) has been identified as a metastasis-suppressor gene in prostate cancer (PCa). Compounds targeting PCa cells deficient in NDRG1 could potentially decrease invasion/metastasis of PCa. A cell based screening strategy was employed to identify small molecules that selectively target NDRG1 deficient PCa cells. DU-145 PCa cells rendered deficient in NDRG1 expression by a lentiviral shRNA-mediated knockdown strategy were used in the primary screen. Compounds filtered from the primary screen were further validated through proliferation and clonogenic survival assays in parental and NDRG1 knockdown PCa cells. Screening of 3360 compounds revealed irinotecan and cetrimonium bromide (CTAB) as compounds that exhibited synthetic lethality against NDRG1 deficient PCa cells. A three-dimensional (3-D) invasion assay was utilized to test the ability of CTAB to inhibit invasion of DU-145 cells. CTAB was found to remarkably decrease invasion of DU-145 cells in collagen matrix. Our results suggest that CTAB and irinotecan could be further explored for their potential clinical benefit in patients with NDRG1 deficient PCa.


Subject(s)
Camptothecin/analogs & derivatives , Cell Cycle Proteins/deficiency , Cetrimonium Compounds/pharmacology , Intracellular Signaling Peptides and Proteins/deficiency , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/metabolism , Phytogenic Antineoplastic Agents/pharmacology , Camptothecin/pharmacology , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Cell Growth Processes/drug effects , Tumor Cell Line , Cetrimonium , Gene Knockdown Techniques , Humans , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , Irinotecan , Male , Middle Aged , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Small Interfering RNA/administration & dosage , Small Interfering RNA/genetics , Surface-Active Agents/pharmacology
18.
Oncotarget ; 3(8): 824-32, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22910040

ABSTRACT

Patients with a normal karyotype represent the single largest cytogenetic group of acute myeloid leukemia (AML), with highly heterogeneous clinical and molecular characteristics. In this study, we sought to identify new prognostic biomarkers in cytogenetically normal (CN)-AML patients. A gene expression (GE)-based risk score was built by summing the prognostic value of 22 genes whose expression is associated with a poor prognosis in a training cohort of 163 patients. The GE-based risk score identified a high-risk group of patients (53.4%) in two independent cohorts of CN-AML patients. The GE-based risk score and EVI1 gene expression remained independent prognostic factors in multivariate Cox analyses. Combining the GE-based risk score with EVI1 gene expression identified three clinically distinct groups of patients in two independent cohorts of CN-AML patients. Thus, the GE-based risk score is a powerful predictor of clinical outcome for CN-AML patients and may provide potential therapeutic advances.


Subject(s)
Biomarkers, Tumor/genetics; DNA-Binding Proteins/genetics; Gene Expression; Leukemia, Myeloid, Acute/diagnosis; Leukemia, Myeloid, Acute/genetics; Proto-Oncogenes/genetics; Transcription Factors/genetics; Adult; Cytogenetic Analysis; DNA-Binding Proteins/biosynthesis; Disease-Free Survival; Gene Expression Profiling; Humans; Karyotype; MDS1 and EVI1 Complex Locus Protein; Neoplasm Proteins/biosynthesis; Neoplasm Proteins/genetics; Prognosis; Risk; Trans-Activators/biosynthesis; Trans-Activators/genetics; Transcription Factors/biosynthesis; Transcriptional Regulator ERG; Tumor Suppressor Proteins/biosynthesis; Tumor Suppressor Proteins/genetics
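The abstract of record 18 describes a risk score obtained by summing the prognostic value of genes linked to poor outcome. The sketch below illustrates one plausible reading of that idea and is not the authors' published method: it assumes each gene contributes a fixed weight when its expression exceeds a per-gene cutoff. The gene names, weights, cutoffs, and expression values are all hypothetical placeholders.

```python
def risk_score(expression, weights, cutoffs):
    """Sum the weights of genes whose expression exceeds their cutoff.

    expression: dict gene -> measured expression (hypothetical units)
    weights:    dict gene -> prognostic weight (assumed, not from the paper)
    cutoffs:    dict gene -> expression threshold (assumed, not from the paper)
    Genes missing from `expression` are treated as not overexpressed.
    """
    return sum(
        weight
        for gene, weight in weights.items()
        if gene in expression and expression[gene] > cutoffs[gene]
    )


# Hypothetical three-gene example (the study itself used 22 genes).
weights = {"GENE_A": 1.5, "GENE_B": 0.8, "GENE_C": 2.0}
cutoffs = {"GENE_A": 10.0, "GENE_B": 5.0, "GENE_C": 7.0}
patient = {"GENE_A": 12.0, "GENE_B": 3.0, "GENE_C": 9.0}

score = risk_score(patient, weights, cutoffs)
print(score)
```

In this toy case GENE_A and GENE_C exceed their cutoffs, so only their weights are added. Patients would then be split into risk groups by comparing the score with a threshold chosen on a training cohort.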
19.
Br J Haematol ; 157(3): 347-56, 2012 May.
Article in English | MEDLINE | ID: mdl-22390678

ABSTRACT

Chronic myelomonocytic leukaemia (CMML) is a heterogeneous haematopoietic disorder characterized by myeloproliferative or myelodysplastic features. At present, the pathogenesis of this malignancy is not completely understood. In this study, we sought to analyse gene expression profiles of CMML in order to characterize new molecular outcome predictors. A learning set of 32 untreated CMML patients at diagnosis was available for TaqMan low-density array gene expression analysis. From 93 selected genes related to cancer and cell cycle, we built a five-gene prognostic index after multiplicity correction. Using this index, we characterized two categories of patients with distinct overall survival (94% vs. 19% for good and poor overall survival, respectively; P = 0·007) and we successfully validated its strength on an independent cohort of 21 CMML patients with Affymetrix gene expression data. We found no specific patterns of association with traditional prognostic stratification parameters in the learning cohort. However, the poor survival group strongly correlated with high-risk treated patients and transformation to acute myeloid leukaemia. We report here a new multigene prognostic index for CMML, independent of the gene expression measurement method, which could be used as a powerful tool to predict clinical outcome and help physicians to evaluate criteria for treatments.


Subject(s)
Biomarkers, Tumor/metabolism; Leukemia, Myelomonocytic, Chronic/diagnosis; Aged; Aged, 80 and over; Case-Control Studies; Female; Follow-Up Studies; Gene Expression Profiling/methods; Humans; Kaplan-Meier Estimate; Leukemia, Myelomonocytic, Chronic/therapy; Male; Middle Aged; Multigene Family; Oligonucleotide Array Sequence Analysis/methods; Polymerase Chain Reaction/methods; Prognosis; RNA, Neoplasm/genetics; Treatment Outcome; U937 Cells
20.
BMC Bioinformatics ; 12: 242, 2011 Jun 17.
Article in English | MEDLINE | ID: mdl-21682852

ABSTRACT

BACKGROUND: High Throughput Sequencing (HTS) is now heavily exploited for genome (re-)sequencing, metagenomics, epigenomics, and transcriptomics, and requires different but computationally intensive bioinformatic analyses. When a reference genome is available, mapping reads onto it is the first step of this analysis. Read mapping programs owe their efficiency to the use of sophisticated genome indexing data structures, such as the Burrows-Wheeler transform. Recent solutions index both the genome and the k-mers of the reads using hash tables to further increase efficiency and accuracy. In various contexts (e.g. assembly or transcriptome analysis), read processing requires determining the sub-collection of reads that are related to a given sequence, which is done by searching for some k-mers in the reads. Currently, many developments have focused on genome indexing structures for read mapping, but the question of read indexing remains broadly unexplored. However, the increase in sequence throughput calls for new algorithmic solutions to query large read collections efficiently. RESULTS: Here, we present a solution, named Gk arrays, to index large collections of reads, an algorithm to build the structure, and procedures to query it. Once constructed, the index structure is kept in main memory and is repeatedly accessed to answer queries like "given a k-mer, get the reads containing this k-mer (once/at least once)". We compared our structure to other solutions that adapt uncompressed indexing structures designed for long texts and show that it processes queries fast while requiring much less memory. Our structure can thus handle larger read collections. We provide examples where such queries are adapted to different types of read analysis (SNP detection, assembly, RNA-Seq). CONCLUSIONS: Gk arrays constitute a versatile data structure that enables fast and more accurate read analysis in various contexts. The Gk arrays provide a flexible building block for designing innovative programs that efficiently mine genomics, epigenomics, metagenomics, or transcriptomics reads. The Gk arrays library is available under the CeCILL (GPL-compliant) license from http://www.atgc-montpellier.fr/ngs/.


Subject(s)
Algorithms; Computational Biology/methods; Computers; High-Throughput Nucleotide Sequencing; Humans; Software
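The query described in the abstract of record 20, "given a k-mer, get the reads containing this k-mer (once/at least once)", can be illustrated with a naive in-memory index. The sketch below is not the Gk arrays structure and does not use the Gk arrays library; it swaps in a plain hash-table inverted index, which answers the same kind of query but without the memory savings the abstract reports. The function names and the sample reads are made up for illustration.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to {read_id: number of occurrences in that read}."""
    index = defaultdict(lambda: defaultdict(int))
    for read_id, read in enumerate(reads):
        # Slide a window of length k over the read.
        for start in range(len(read) - k + 1):
            index[read[start:start + k]][read_id] += 1
    return index


def reads_containing(index, kmer, exactly_once=False):
    """Return sorted ids of reads containing `kmer`.

    With exactly_once=True, keep only reads where the k-mer occurs once.
    """
    hits = index.get(kmer, {})  # .get avoids creating empty entries
    return sorted(
        read_id
        for read_id, count in hits.items()
        if not exactly_once or count == 1
    )


# Hypothetical toy read collection.
reads = ["ACGTACGT", "TTACGA", "GGGCCC"]
index = build_kmer_index(reads, k=3)

print(reads_containing(index, "ACG"))                     # at least once
print(reads_containing(index, "ACG", exactly_once=True))  # exactly once
```

Here "ACG" occurs twice in the first read and once in the second, so the two query modes return different read sets. A dictionary of dictionaries stores every k-mer string explicitly, which is the kind of memory cost a dedicated read index aims to reduce.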