ABSTRACT
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
Subjects
Chromosome Mapping/methods; Genome, Human/genetics; Knowledge Bases; Models, Genetic; Sequence Analysis, DNA/methods; User-Computer Interface; Algorithms; Computer Simulation; Database Management Systems; Databases, Genetic; Humans; Sequence Alignment/methods
ABSTRACT
Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.
Subjects
Chromosomes, Human, Pair 2/genetics; Chromosomes, Human, Pair 4/genetics; Animals; Base Composition; Base Sequence; Centromere/genetics; Conserved Sequence/genetics; CpG Islands/genetics; Euchromatin/genetics; Expressed Sequence Tags; Gene Duplication; Genetic Variation/genetics; Genomics; Humans; Molecular Sequence Data; Physical Chromosome Mapping; Polymorphism, Genetic/genetics; Primates/genetics; Proteins/genetics; Pseudogenes/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics; RNA, Untranslated/analysis; RNA, Untranslated/genetics; Recombination, Genetic/genetics; Sequence Analysis, DNA
ABSTRACT
Multiple myeloma (MM) is a disease of copy number variants (CNVs), chromosomal translocations, and single-nucleotide variants (SNVs). To enable integrative studies across these diverse mutation types, we developed a capture-based sequencing platform to detect their occurrence in 465 genes altered in MM and used it to sequence 95 primary tumor-normal pairs to a mean depth of 104×. We detected cases of hyperdiploidy (23%), deletions of 1p (8%), 6q (21%), 8p (17%), 14q (16%), 16q (22%), and 17p (4%), and amplification of 1q (19%). We also detected IGH and MYC translocations near expected frequencies and non-silent SNVs in NRAS (24%), KRAS (21%), FAM46C (17%), TP53 (9%), DIS3 (9%), and BRAF (3%). We discovered frequent mutations in IGLL5 (18%) that were mutually exclusive of RAS mutations and associated with increased risk of disease progression (p = 0.03), suggesting that IGLL5 may be a stratifying biomarker. We identified novel IGLL5/IGH translocations in two samples. We subjected 15 of the pairs to ultra-deep sequencing (1259×) and found that although depth correlated with number of mutations detected (p = 0.001), depth past ~300× added little. The platform provides cost-effective genomic analysis for research and may be useful in individualizing treatment decisions in clinical settings.