ABSTRACT
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
Subjects
Chromosome Mapping/methods , Human Genome/genetics , Knowledge Bases , Genetic Models , DNA Sequence Analysis/methods , User-Computer Interface , Algorithms , Computer Simulation , Database Management Systems , Genetic Databases , Humans , Sequence Alignment/methods
ABSTRACT
The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially 'druggable' genes. DGIdb can be accessed at http://dgidb.org/.
Subjects
Data Mining/methods , Genetic Databases , Drug Discovery/methods , Antineoplastic Agents/chemistry , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Computational Biology/methods , Drug Interactions , Gene Expression Regulation/drug effects , Genetic Variation , Genome , Genomics/methods , Humans , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Mutation , Software , Pharmaceutical Technology/methods
ABSTRACT
Massively parallel sequencing technology and the associated rapidly decreasing sequencing costs have enabled systemic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and to ultimately separate the commonly abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. The integration of analytical operations in the MuSiC framework is widely applicable to a broad set of tumor types and offers the benefits of automation as well as standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from the TCGA ovarian cancer project. MuSiC correctly confirms many expected results, and identifies several potentially novel avenues for discovery.
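The idea of separating passenger mutations from significant events can be illustrated with a generic one-sided binomial test against a background mutation rate. This is a minimal sketch only: the abstract does not specify MuSiC's statistical tests, so this is not the MuSiC method, and the function names (`binomial_upper_tail`, `gene_p_value`), the uniform per-base background rate, and the numbers in the usage note are all assumptions made for illustration.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), using lgamma for the binomial coefficient
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def gene_p_value(observed_mutations: int, coding_bases: int,
                 n_samples: int, background_rate: float) -> float:
    """Hypothetical helper: p-value for seeing at least `observed_mutations`
    in a gene if every covered base mutated independently at `background_rate`.
    This uniform-rate model is an illustrative assumption."""
    covered_bases = coding_bases * n_samples
    return binomial_upper_tail(observed_mutations, covered_bases, background_rate)
```

Under these assumptions, a gene with many more mutations than the background rate predicts gets a small p-value (a candidate significant event), while a gene mutated at roughly the background rate does not. For example, `gene_p_value(20, 1000, 10, 1e-4)` is far smaller than `gene_p_value(2, 1000, 10, 1e-4)`. A real analysis would also need multiple-testing correction across genes, which this sketch omits.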