1.
Nat Commun ; 11(1): 69, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31900418

ABSTRACT

Cancer driver gene alterations influence cancer development, occurring in oncogenes, tumor suppressors, and dual role genes. Discovering dual role cancer genes is difficult because of their elusive context-dependent behavior. We define oncogenic mediators as genes controlling biological processes. With them, we classify cancer driver genes, unveiling their roles in cancer mechanisms. To this end, we present Moonlight, a tool that incorporates multiple -omics data to identify critical cancer driver genes. With Moonlight, we analyze 8000+ tumor samples from 18 cancer types, discovering 3310 oncogenic mediators, 151 having dual roles. By incorporating additional data (amplification, mutation, DNA methylation, chromatin accessibility), we reveal 1000+ cancer driver genes, corroborating known molecular mechanisms. Additionally, we confirm critical cancer driver genes by analysing cell-line datasets. We discover inactivation of tumor suppressors in intron regions and that tissue type and subtype indicate dual role status. These findings help explain tumor heterogeneity and could guide therapeutic decisions.


Subjects
Computational Biology/methods, Tumor Suppressor Genes, Neoplasms/genetics, Oncogenes, DNA Methylation, Humans, Mutation, Software
2.
PLoS Comput Biol ; 15(3): e1006701, 2019 03.
Article in English | MEDLINE | ID: mdl-30835723

ABSTRACT

The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Nowadays, the amount of genomic data is massive and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that contains different genomic studies including the ones from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, they have the primary focus on the data storage in a unique place, and they do not provide a comprehensive toolkit for analyses and interpretation of the data. To fulfill this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data that do not renounce a robust statistical and theoretical framework are required. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as these factors promote confounding behavior regarding differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. 
We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, which is another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in the TCGAbiolinks version 2.8 and higher released in Bioconductor version 3.7.


Subjects
High-Throughput Nucleotide Sequencing, Neoplasms/genetics, Carcinogenesis, Datasets as Topic, Human Genome, Humans
3.
Oncotarget ; 9(45): 27605-27629, 2018 Jun 12.
Article in English | MEDLINE | ID: mdl-29963224

ABSTRACT

Colorectal cancer (CRC) is one of the most common cancers in humans and a leading cause of cancer-related deaths worldwide. As in the case of other cancers, CRC heterogeneity leads to a wide range of clinical outcomes and complicates therapy. Over the years, multiple factors have emerged as markers of CRC heterogeneity, improving tumor classification and selection of therapeutic strategies. Understanding the molecular mechanisms underlying this heterogeneity remains a major challenge. A considerable research effort is therefore devoted to identifying additional features of colorectal tumors, in order to better understand CRC etiology and to multiply therapeutic avenues. Recently, long noncoding RNAs (lncRNAs) have emerged as important players in physiological and pathological processes, including CRC. Here we looked for lncRNAs that might contribute to the various colorectal tumor phenotypes. We thus monitored the expression of 4898 lncRNA genes across 566 CRC samples and identified 282 lncRNAs reflecting CRC heterogeneity. We then inferred potential functions of these lncRNAs. Our results highlight lncRNAs that may participate in the major processes altered in distinct CRC cases, such as WNT/β-catenin and TGF-β signaling, immunity, the epithelial-to-mesenchymal transition (EMT), and angiogenesis. For several candidates, we provide experimental evidence supporting our functional predictions that they may be involved in the cell cycle or the EMT. Overall, our work identifies lncRNAs associated with key CRC characteristics and provides insights into their respective functions. Our findings constitute a further step towards understanding the contribution of lncRNAs to CRC heterogeneity. They may open new therapeutic opportunities.

4.
Int J Mol Sci ; 19(3)2018 Mar 19.
Article in English | MEDLINE | ID: mdl-29562723

ABSTRACT

Like other cancers, prostate cancer (PC) is caused by the accumulation of genetic alterations in cells that drive malignant growth. These alterations are revealed by gene profiling and copy number alteration (CNA) analysis. Moreover, recent evidence suggests that microRNAs also have an important role in PC development. Despite efforts to profile PC, the alterations (gene, CNA, and miRNA) and biological processes that correlate with disease development and progression remain partially elusive. Many gene signatures proposed as diagnostic or prognostic tools in cancer overlap poorly. Identifying co-expressed genes that are functionally related can reveal a core network of genes associated with PC with better reproducibility. By combining different approaches, including the integration of mRNA expression profiles, CNAs, and miRNA expression levels, we identified a signature of four genes that overlaps with other published gene signatures and distinguishes, in silico, high Gleason-scored PC from normal human tissue; gene co-expression analysis further enriched it to 19 genes. From the analysis of miRNAs possibly regulating this network, we found that hsa-miR-153 was highly connected to the genes in the network. Our results identify a four-gene signature with diagnostic and prognostic value in PC and suggest an interesting gene network that could play a key regulatory role in PC development and progression. Furthermore, hsa-miR-153, controlling this network, could be a potential biomarker for theranostics in high Gleason-scored PC.


Subjects
Computer Simulation, Gene Regulatory Networks, MicroRNAs/genetics, Prostatic Neoplasms/genetics, Prostatic Neoplasms/pathology, Adult, Aged, Area Under Curve, DNA Copy Number Variations/genetics, Down-Regulation/genetics, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Male, MicroRNAs/metabolism, Middle Aged, Neoplasm Invasiveness, Up-Regulation/genetics
5.
BMC Genomics ; 19(1): 25, 2018 01 06.
Article in English | MEDLINE | ID: mdl-29304754

ABSTRACT

BACKGROUND: Modern high-throughput genomic technologies represent a comprehensive hallmark of molecular changes in pan-cancer studies. Although different cancer gene signatures have been revealed, the mechanism of tumourigenesis has yet to be completely understood. Pathways and networks are important tools to explain the role of genes in functional genomic studies. However, few methods consider the functional non-equal roles of genes in pathways and the complex gene-gene interactions in a network. RESULTS: We present a novel method in pan-cancer analysis that identifies de-regulated genes with a functional role by integrating pathway and network data. A pan-cancer analysis of 7158 tumour/normal samples from 16 cancer types identified 895 genes with a central role in pathways and de-regulated in cancer. Comparing our approach with 15 current tools that identify cancer driver genes, we found that 35.6% of the 895 genes identified by our method have been found as cancer driver genes with at least 2/15 tools. Finally, we applied a machine learning algorithm on 16 independent GEO cancer datasets to validate the diagnostic role of cancer driver genes for each cancer. We obtained a list of the top-ten cancer driver genes for each cancer considered in this study. CONCLUSIONS: Our analysis 1) confirmed that there are several known cancer driver genes in common among different types of cancer, 2) highlighted that cancer driver genes are able to regulate crucial pathways.


Subjects
Tumor Biomarkers/genetics, Gene Regulatory Networks, Genomics/methods, Neoplasms/genetics, Signal Transduction, Algorithms, Case-Control Studies, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans
6.
Gastroenterology ; 154(4): 965-975, 2018 03.
Article in English | MEDLINE | ID: mdl-29158192

ABSTRACT

BACKGROUND & AIMS: Patients with severe alcoholic hepatitis (AH) have a high risk of death within 90 days. Corticosteroids, which can cause severe adverse events, are the only treatment that increases short-term survival. It is a challenge to predict outcomes of patients with severe AH. Therefore, we developed a scoring system to predict patient survival, integrating baseline molecular and clinical variables. METHODS: We obtained fixed liver biopsy samples from 71 consecutive patients diagnosed with severe AH and treated with corticosteroids from July 2006 through December 2013 in Brussels, Belgium (derivation cohort). Gene expression patterns were analyzed by microarrays and clinical data were collected for 180 days. We identified gene expression signatures and clinical data that are associated with survival without liver transplantation at 90 and 180 days after initiation of corticosteroid therapy. Findings were validated using liver biopsies from 48 consecutive patients with severe AH treated with corticosteroids, collected from March 2010 through February 2015 at hospitals in Belgium and Switzerland (validation cohort 1) and in liver biopsies from 20 patients (9 received corticosteroid treatment), collected from January 2012 through May 2015 in the United States (validation cohort 2). RESULTS: We integrated data on expression patterns of 123 genes and the model for end-stage liver disease (MELD) scores to assign patients to groups with poor survival (29% survived 90 days and 26% survived 180 days) and good survival (76% survived 90 days and 65% survived 180 days) (P < .001) in the derivation cohort. We named this assignment system the gene signature-MELD (gs-MELD) score. In validation cohort 1, the gs-MELD score discriminated patients with poor survival (43% survived 90 days) from those with good survival (96% survived 90 days) (P < .001). 
The gs-MELD score also discriminated between patients with a poor survival at 180 days (34% survived) and a good survival at 180 days (84% survived) (P < .001). The time-dependent area under the receiver operator characteristic curve for the score was 0.86 (95% confidence interval 0.73-0.99) for survival at 90 days, and 0.83 (95% confidence interval 0.71-0.96) for survival at 180 days. This score outperformed other clinical models to predict survival of patients with severe AH in validation cohort 1. In validation cohort 2, the gs-MELD discriminated patients with a poor survival at 90 days (12% survived) from those with a good survival at 90 days (100%) (P < .001). CONCLUSIONS: We integrated data on baseline liver gene expression pattern and the MELD score to create the gs-MELD scoring system, which identifies patients with severe AH, treated or not with corticosteroids, most and least likely to survive for 90 and 180 days.


Subjects
Decision Support Techniques, Gene Expression Profiling/methods, Alcoholic Hepatitis/diagnosis, Alcoholic Hepatitis/genetics, Transcriptome, Corticosteroids/therapeutic use, Adult, Area Under Curve, Belgium, Biopsy, Female, Genetic Markers, Genetic Predisposition to Disease, Alcoholic Hepatitis/drug therapy, Alcoholic Hepatitis/mortality, Humans, Kaplan-Meier Estimate, Male, Middle Aged, Oligonucleotide Array Sequence Analysis, Phenotype, Predictive Value of Tests, Proportional Hazards Models, ROC Curve, Reproducibility of Results, Risk Assessment, Risk Factors, Severity of Illness Index, Time Factors, Treatment Outcome
7.
IEEE Trans Neural Netw Learn Syst ; 29(8): 3784-3797, 2018 08.
Article in English | MEDLINE | ID: mdl-28920909

ABSTRACT

Detecting fraud in credit card transactions is perhaps one of the best testbeds for computational intelligence algorithms. In fact, this problem involves a number of relevant challenges, namely: concept drift (customers' habits evolve and fraudsters change their strategies over time), class imbalance (genuine transactions far outnumber frauds), and verification latency (only a small set of transactions are checked by investigators in a timely manner). However, the vast majority of learning algorithms that have been proposed for fraud detection rely on assumptions that hardly hold in a real-world fraud-detection system (FDS). This lack of realism concerns two main aspects: 1) the way and timing with which supervised information is provided and 2) the measures used to assess fraud-detection performance. This paper has three major contributions. First, we propose, with the help of our industrial partner, a formalization of the fraud-detection problem that realistically describes the operating conditions of FDSs that analyze massive streams of credit card transactions every day. We also illustrate the most appropriate performance measures to be used for fraud-detection purposes. Second, we design and assess a novel learning strategy that effectively addresses class imbalance, concept drift, and verification latency. Third, in our experiments, we demonstrate the impact of class imbalance and concept drift in a real-world data stream containing more than 75 million transactions, authorized over a time window of three years.

8.
Genome Med ; 9(1): 67, 2017 07 19.
Article in English | MEDLINE | ID: mdl-28724449

ABSTRACT

BACKGROUND: Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. METHODS: Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. RESULTS: Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. CONCLUSIONS: While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.


Subjects
Developmental Disabilities/metabolism, Exons, Guanylate Kinases/genetics, Intellectual Disability/metabolism, Genetic Promoter Regions, Tumor Suppressor Proteins/genetics, Animals, Child, Developmental Disabilities/genetics, Female, Humans, Intellectual Disability/genetics, Male, Membrane Proteins/genetics, Mice
9.
J Clin Invest ; 127(8): 3090-3102, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28714863

ABSTRACT

BACKGROUND: The tumor immune response is increasingly associated with better clinical outcomes in breast and other cancers. However, the evaluation of tumor-infiltrating lymphocytes (TILs) relies on histopathological measurements with limited accuracy and reproducibility. Here, we profiled DNA methylation markers to identify a methylation of TIL (MeTIL) signature that recapitulates TIL evaluations and their prognostic value for long-term outcomes in breast cancer (BC). METHODS: MeTIL signature scores were correlated with clinical endpoints reflecting overall or disease-free survival and a pathologic complete response to preoperative anthracycline therapy in 3 BC cohorts from the Jules Bordet Institute in Brussels and in other cancer types from The Cancer Genome Atlas. RESULTS: The MeTIL signature measured TIL distributions in a sensitive manner and predicted survival and response to chemotherapy in BC better than did histopathological assessment of TILs or gene expression-based immune markers, respectively. The MeTIL signature also improved the prediction of survival in other malignancies, including melanoma and lung cancer. Furthermore, the MeTIL signature predicted differences in survival for malignancies in which TILs were not known to have a prognostic value. Finally, we showed that MeTIL markers can be determined by bisulfite pyrosequencing of small amounts of DNA from formalin-fixed, paraffin-embedded tumor tissue, supporting clinical applications for this methodology. CONCLUSIONS: This study highlights the power of DNA methylation to evaluate tumor immune responses and the potential of this approach to improve the diagnosis and treatment of breast and other cancers. FUNDING: This work was funded by the Fonds National de la Recherche Scientifique (FNRS) and Télévie, the INNOVIRIS Brussels Region BRUBREAST Project, the IUAP P7/03 program, the Belgian "Foundation against Cancer," the Breast Cancer Research Foundation (BCRF), and the Fonds Gaston Ithier.


Subjects
Breast Neoplasms/diagnosis, DNA Methylation, Aged, Anthracyclines/therapeutic use, Breast Neoplasms/genetics, Breast Neoplasms/therapy, Tumor Cell Line, Cell Separation, Cohort Studies, Combined Modality Therapy, Disease-Free Survival, Female, Humans, Immune System, Lung Neoplasms/diagnosis, Lung Neoplasms/genetics, Lung Neoplasms/therapy, Tumor-Infiltrating Lymphocytes/cytology, Male, Melanoma/diagnosis, Melanoma/genetics, Melanoma/therapy, Middle Aged, Preoperative Period, Prognosis, Proportional Hazards Models, DNA Sequence Analysis, Skin Neoplasms/diagnosis, Skin Neoplasms/genetics, Skin Neoplasms/therapy, Treatment Outcome
10.
Bioinformatics ; 33(19): 3131-3133, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28605519

ABSTRACT

Summary: Identifying molecular cancer subtypes from multi-omics data is an important step in personalized medicine. We introduce CancerSubtypes, an R package for identifying cancer subtypes using multi-omics data, including gene expression, miRNA expression and DNA methylation data. CancerSubtypes integrates four main computational methods which are highly cited for cancer subtype identification and provides a standardized framework for data pre-processing, feature selection, and result follow-up analyses, including results computing, biology validation and visualization. The input and output of each step in the framework are packaged in the same data format, making it convenient to compare different methods. The package is useful for inferring cancer subtypes from an input genomic dataset, comparing the predictions from different well-known methods and testing new subtype discovery methods, as shown with different application scenarios in the Supplementary Material. Availability and implementation: The package is implemented in R and available under the GPL-2 license from the Bioconductor website (http://bioconductor.org/packages/CancerSubtypes/). Contact: thuc.le@unisa.edu.au or jiuyong.li@unisa.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms/classification, Neoplasms/genetics, Software, Computer Graphics, DNA Methylation, Gene Expression, Genomics, Humans, MicroRNAs/metabolism, Neoplasms/metabolism
11.
BioData Min ; 10: 15, 2017.
Article in English | MEDLINE | ID: mdl-28484519

ABSTRACT

BACKGROUND: Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has, therefore, naturally become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than the traditional approaches, which analyse individual datasets and therefore suffer from experimental biases and low sample numbers. To date, there are mainly two strategies for the problem of interest: the first one ("data merging") merges all datasets together and then infers a GRN, whereas the other ("networks ensemble") infers GRNs from every dataset separately and then aggregates them using some ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking. RESULTS: In this work, we present another meta-analysis approach for inferring GRNs from multiple studies. Our proposed meta-analysis approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating matrices of the pairwise measures from every dataset, followed by extracting the network from the meta-matrix. Afterwards, we evaluate the performance of the two commonly used approaches mentioned above and our presented approach with a systematic set of experiments based on in silico benchmarks. CONCLUSIONS: We propose a first systematic evaluation of different strategies for reverse engineering GRNs from multiple datasets. Experimental results strongly suggest that assembling matrices of pairwise dependencies is a better strategy for network inference than the two commonly used ones.
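
The two-step idea in this abstract (aggregate the pairwise-measure matrices from every dataset, then extract the network from the meta-matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of absolute Pearson correlation as the pairwise measure, the plain arithmetic mean as the aggregation rule, the fixed threshold, and all function names are assumptions made only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_matrix(dataset):
    """Map each unordered gene pair to |correlation| within one dataset.

    dataset: dict of gene name -> list of expression values (one per sample).
    """
    genes = sorted(dataset)
    return {
        (g, h): abs(pearson(dataset[g], dataset[h]))
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

def meta_network(datasets, threshold=0.8):
    """Step 1: average the per-dataset matrices. Step 2: keep edges above threshold.

    Assumes every dataset measures the same set of genes (hypothetical simplification).
    """
    matrices = [pairwise_matrix(d) for d in datasets]
    meta = {
        pair: sum(m[pair] for m in matrices) / len(matrices)
        for pair in matrices[0]
    }
    return {pair for pair, score in meta.items() if score >= threshold}

# Toy data: g2 is an exact multiple of g1 in both studies, g3 is unrelated.
study_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
study_b = {"g1": [1, 3, 2, 5], "g2": [2, 6, 4, 10], "g3": [1, 2, 2, 1]}
print(meta_network([study_a, study_b]))
```

In contrast, "data merging" would concatenate the samples before computing a single matrix, and "networks ensemble" would threshold each dataset's matrix first and then combine the resulting edge lists.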

12.
Int J Mol Sci ; 18(2)2017 Jan 27.
Article in English | MEDLINE | ID: mdl-28134831

ABSTRACT

Gene Regulatory Networks (GRNs) control many biological systems, but how such network coordination is shaped is still unknown. GRNs can be subdivided into basic connections that describe how the network members interact, e.g., co-expression, physical interaction, co-localization, genetic influence, pathways, and shared protein domains. The important regulatory mechanisms of these networks involve miRNAs. We developed an R/Bioconductor package, namely SpidermiR, which offers easy access to both GRNs and miRNAs to the end user, and integrates this information with differentially expressed genes obtained from The Cancer Genome Atlas. Specifically, SpidermiR allows the users to: (i) query and download GRNs and miRNAs from validated and predicted repositories; (ii) integrate miRNAs with GRNs in order to obtain miRNA-gene-gene and miRNA-protein-protein interactions, and to analyze miRNA GRNs in order to identify miRNA-gene communities; and (iii) graphically visualize the results of the analyses. These analyses can be performed through a single interface and without the need for any downloads. The full data sets are then rapidly integrated and processed locally.


Subjects
MicroRNAs/metabolism, Software, Statistics as Topic, Breast Neoplasms/genetics, Female, Humans, Male, Neoplasm Proteins/metabolism, Prostatic Neoplasms/genetics, Protein Binding
13.
Sci Adv ; 2(9): e1600220, 2016 09.
Article in English | MEDLINE | ID: mdl-27617288

ABSTRACT

Evidence is emerging that long noncoding RNAs (lncRNAs) may play a role in cancer development, but this role is not yet clear. We performed a genome-wide transcriptional survey to explore the lncRNA landscape across 995 breast tissue samples. We identified 215 lncRNAs whose genes are aberrantly expressed in breast tumors, as compared to normal samples. Unsupervised hierarchical clustering of breast tumors on the basis of their lncRNAs revealed four breast cancer subgroups that correlate tightly with PAM50-defined mRNA-based subtypes. Using multivariate analysis, we identified no fewer than 210 lncRNAs prognostic of clinical outcome. By analyzing the coexpression of lncRNA genes and protein-coding genes, we inferred potential functions of the 215 dysregulated lncRNAs. We then associated subtype-specific lncRNAs with key molecular processes involved in cancer. A correlation was observed, on the one hand, between luminal A-specific lncRNAs and the activation of phosphatidylinositol 3-kinase, fibroblast growth factor, and transforming growth factor-β pathways and, on the other hand, between basal-like-specific lncRNAs and the activation of epidermal growth factor receptor (EGFR)-dependent pathways and of the epithelial-to-mesenchymal transition. Finally, we showed that a specific lncRNA, which we called CYTOR, plays a role in breast cancer. We confirmed its predicted functions, showing that it regulates genes involved in the EGFR/mammalian target of rapamycin pathway and is required for cell proliferation, cell migration, and cytoskeleton organization. Overall, our work provides the most comprehensive analyses for lncRNA in breast cancers. Our findings suggest a wide range of biological functions associated with lncRNAs in breast cancer and provide a foundation for functional investigations that could lead to new therapeutic approaches.


Subjects
Breast Neoplasms/genetics, Human Genome, Neoplasm Proteins/genetics, Long Noncoding RNA/genetics, Adult, Aged, Breast Neoplasms/pathology, Cell Movement/genetics, Cell Proliferation/genetics, ErbB Receptors/genetics, Female, Neoplastic Gene Expression Regulation, Humans, Middle Aged, Long Noncoding RNA/isolation & purification
14.
F1000Res ; 5: 1542, 2016.
Article in English | MEDLINE | ID: mdl-28232861

ABSTRACT

Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no comprehensive tool that provides a complete integrative analysis harnessing the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative downstream analyses of different molecular data. We describe how to download, process and prepare TCGA data; by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data; and by using Roadmap and ENCODE data, we provide a workplan to identify candidate biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAtoolbox, TCGAbiolinks.

15.
BMC Bioinformatics ; 17(Suppl 12): 348, 2016 Nov 08.
Article in English | MEDLINE | ID: mdl-28185585

ABSTRACT

BACKGROUND: An important challenge in cancer biology is to understand the complex aspects of the disease. It is increasingly evident that genes are not isolated from each other, and understanding how different genes are related to each other could explain the biological mechanisms causing diseases. Biological pathways are important tools to reveal gene interaction and reduce the large number of genes to be studied by partitioning it into smaller paths. Furthermore, recent scientific evidence has shown that a combination of pathways, rather than a single element of the pathway or a single pathway, could be responsible for pathological changes in a cell. RESULTS: In this paper we develop a new method that can reveal miRNAs able to regulate, in a coordinated way, networks of gene pathways. We applied the method to subtypes of breast cancer. The basic idea is the identification of pathways significantly enriched with differentially expressed genes among the different breast cancer subtypes and normal tissue. Looking at the pairs of pathways that were found to be functionally related, we created a network of dependent pathways and we focused on identifying miRNAs that could act as miRNA drivers in a coordinated regulation process. CONCLUSIONS: Our approach enables the identification of miRNAs that could have an important role in the development of breast cancer.


Subjects
Breast Neoplasms/genetics, Gene Regulatory Networks, Genomics/methods, MicroRNAs/genetics, Breast Neoplasms/metabolism, Female, Gene Expression Profiling/methods, Humans, MicroRNAs/metabolism
16.
Nucleic Acids Res ; 44(8): e71, 2016 05 05.
Article in English | MEDLINE | ID: mdl-26704973

ABSTRACT

The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.


Subjects
Computational Biology/methods , Data Mining/methods , Databases, Genetic , Genome, Human/genetics , Genomics/methods , Neoplasms/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Biomarkers, Tumor/genetics , DNA Methylation/genetics , Humans , Neoplasms/classification , Statistics as Topic/methods
17.
Genom Data ; 4: 123-6, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26484195

ABSTRACT

Quantitative validation of gene regulatory networks (GRNs) inferred from observational expression data is a difficult task, usually involving time-intensive and costly laboratory experiments. We were able to show that gene knock-down experiments can be used to quantitatively assess the quality of large-scale GRNs via a purely data-driven approach (Olsen et al. 2014). Our new validation framework also enables the statistical comparison of multiple network inference techniques, which was a long-standing challenge in the field. In this Data in Brief we detail the contents and quality controls for the gene expression data (available from the NCBI Gene Expression Omnibus repository under accession number GSE53091) associated with our study published in Genomics (Olsen et al. 2014). We also provide R code to access the data and reproduce the analysis presented in this article.

18.
Biomed Res Int ; 2015: 831314, 2015.
Article in English | MEDLINE | ID: mdl-26240829

ABSTRACT

In this work an integrated approach was used to identify functional miRNAs regulating gene pathway cross-talk in breast cancer (BC). We first integrated gene expression profiles and biological pathway information to explore the underlying associations between genes differentially expressed between normal and BC samples and the pathways enriched in these genes. For each pair of pathways, a score quantifying their cross-talk was derived from the distribution of gene expression levels. Random forest classification allowed the identification of pairs of pathways with high cross-talk. We assessed miRNAs regulating the identified gene pathways by a mutual information analysis. A Fisher test was applied to demonstrate their significance in the regulated pathways. Our results suggest interesting networks of pathways that could be key regulators of target genes in BC, including stem cell pluripotency, coagulation, and hypoxia pathways. The miRNAs that control these networks could be potential biomarkers for diagnostic, prognostic, and therapeutic development in BC. This work shows that standard methods of predicting normal and tumor classes, such as differentially expressed miRNAs or transcription factors, could lose intrinsic features; instead, our approach revealed the molecules responsible for the disease.


Subjects
Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic/genetics , MicroRNAs/genetics , Models, Genetic , Receptor Cross-Talk , Signal Transduction/genetics , Biomarkers, Tumor/genetics , Breast Neoplasms/pathology , Computer Simulation , Gene Expression Profiling/methods , Genes, Neoplasm/genetics , Genetic Markers/genetics , Humans , Models, Statistical , Monte Carlo Method , Neoplasm Invasiveness , Systems Integration
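
The mutual information analysis mentioned in the abstract above can be illustrated with a small sketch. It is an assumption-laden example rather than the published pipeline: expression values are discretized with a simple median split (the abstract does not say how discretization was done), and `binarize`, `mutual_information`, and the toy vectors are hypothetical names and data.

```python
from collections import Counter
from math import log2
from statistics import median


def binarize(values):
    """Median split: 1 if the value is above the median, else 0 (assumed scheme)."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences of equal length."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data: a hypothetical miRNA whose expression is high exactly when
# a hypothetical pathway cross-talk score is low.
mirna_expression = [1.0, 2.0, 8.0, 9.0]
pathway_score = [0.9, 0.8, 0.2, 0.1]
mi = mutual_information(binarize(mirna_expression), binarize(pathway_score))
```

Because mutual information is symmetric and ignores the direction of an association, the perfectly anti-correlated toy vectors reach the maximum of 1 bit for binary variables.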
19.
Science ; 348(6237): 1262073, 2015 May 22.
Article in English | MEDLINE | ID: mdl-25999517

ABSTRACT

Species interaction networks are shaped by abiotic and biotic factors. Here, as part of the Tara Oceans project, we studied the photic zone interactome using environmental factors and organismal abundance profiles and found that environmental factors are incomplete predictors of community structure. We found associations across plankton functional types and phylogenetic groups to be nonrandomly distributed on the network and driven by both local and global patterns. We identified interactions among grazers, primary producers, viruses, and (mainly parasitic) symbionts and validated network-generated hypotheses using microscopy to confirm symbiotic relationships. We have thus provided a resource to support further research on ocean food webs and integrating biological components into ocean models.


Subjects
Food Chain , Plankton/classification , Plankton/physiology , Symbiosis , Animals , Host Specificity , Oceans and Seas , Phylogeny , Platyhelminths/classification , Platyhelminths/physiology , Sunlight , Viruses/classification
20.
Front Genet ; 5: 177, 2014.
Article in English | MEDLINE | ID: mdl-25009552

ABSTRACT

When inferring networks from high-throughput genomic data, one of the main challenges is the subsequent validation of these networks. In the best-case scenario, the true network is partially known from previous research results published in structured databases or research articles. Traditionally, inferred networks are validated against these known interactions. Whenever the recovery rate is gauged to be high enough, subsequent high-scoring but unknown inferred interactions are deemed good candidates for further experimental validation. Such a validation framework therefore depends strongly on the quantity and quality of published interactions and presents serious pitfalls: (1) availability of these known interactions for the studied problem might be sparse; (2) quantitatively comparing different inference algorithms is not trivial; and (3) the use of these known interactions for validation prevents their integration in the inference procedure. The latter is particularly relevant as it has recently been shown that integration of priors during network inference significantly improves the quality of inferred networks. To overcome these problems when validating inferred networks, we recently proposed a data-driven validation framework based on single gene knock-down experiments. Using this framework, we were able to demonstrate the benefits of integrating prior knowledge and expression data. In this paper we used this framework to assess the quality of different sources of prior knowledge on their own and in combination with different genomic data sets in colorectal cancer. We observed that most prior sources lead to significant F-scores. Furthermore, their integration with genomic data leads to a significant increase in F-scores, especially for priors extracted from full text PubMed articles, known co-expression modules and genetic interactions. Lastly, we observed that the results are consistent for three different data sets: experimental knock-down data and two human tumor data sets.
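
As a rough illustration of the F-scores reported in the abstract above, the sketch below compares the genes an inferred network predicts to be downstream of a knocked-down gene with the genes actually affected in the knock-down experiment. This is an assumed simplification, not the authors' framework: how the two gene sets are derived, and how significance is assessed, is not specified here, and the function name and gene labels are hypothetical.

```python
def f_score(predicted, reference, beta=1.0):
    """F-beta score between a predicted gene set and a reference gene set.

    predicted: genes the inferred network places downstream of the knocked-down gene
    reference: genes observed to change in the knock-down experiment
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data with hypothetical gene labels: two of four predictions are confirmed,
# and one affected gene is missed.
predicted_targets = {"A", "B", "C", "D"}
affected_genes = {"A", "B", "E"}
score = f_score(predicted_targets, affected_genes)
```

With precision 1/2 and recall 2/3, the toy example gives an F1 of 4/7, about 0.57.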
