Results 1 - 20 of 35
2.
Nature ; 572(7769): 323-328, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31367044

ABSTRACT

Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.

3.
Bioinformatics ; 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31218349

ABSTRACT

SUMMARY: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions, and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g., LUMPY), while providing fast and affordable joint analysis at the scale of ≥ 100,000 genomes. These tools will help enable the next generation of human genetics studies. AVAILABILITY AND IMPLEMENTATION: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Cell ; 177(1): 70-84, 2019 Mar 21.
Article in English | MEDLINE | ID: mdl-30901550

ABSTRACT

Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands (and soon millions) of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.

5.
Nat Commun ; 9(1): 4038, 2018 Oct 02.
Article in English | MEDLINE | ID: mdl-30279509

ABSTRACT

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.

6.
Cell Rep ; 23(9): 2758-2769, 2018 May 29.
Article in English | MEDLINE | ID: mdl-29847804

ABSTRACT

Although aneuploidy is found in the majority of tumors, the degree of aneuploidy varies widely. It is unclear how cancer cells become aneuploid or how highly aneuploid tumors are different from those of more normal ploidy. We developed a simple computational method that measures the degree of aneuploidy or structural rearrangements of large chromosome regions of 522 human breast tumors from The Cancer Genome Atlas (TCGA). Highly aneuploid tumors overexpress activators of mitotic transcription and the genes encoding proteins that segregate chromosomes. Overexpression of three mitotic transcriptional regulators, E2F1, MYBL2, and FOXM1, is sufficient to increase the rate of lagging anaphase chromosomes in a non-transformed vertebrate tissue, demonstrating that this event can initiate aneuploidy. Highly aneuploid human breast tumors are also enriched in TP53 mutations. TP53 mutations co-associate with the overexpression of mitotic transcriptional activators, suggesting that these events work together to provide fitness to breast tumors.

7.
Nature ; 550(7675): 239-243, 2017 Oct 11.
Article in English | MEDLINE | ID: mdl-29022581

ABSTRACT

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.


Subjects
Gene Expression Profiling; Genetic Variation/genetics; Organ Specificity/genetics; Bayes Theorem; Female; Genome, Human/genetics; Genomics; Genotype; Humans; Male; Models, Genetic; Sequence Analysis, RNA
8.
Nat Genet ; 49(5): 692-699, 2017 May.
Article in English | MEDLINE | ID: mdl-28369037

ABSTRACT

Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5-6.8% of eQTLs, a substantially higher fraction than prior estimates, and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.


Subjects
Gene Expression Regulation; Genetic Variation; Genome, Human/genetics; Quantitative Trait Loci/genetics; Sequence Analysis, DNA/methods; Algorithms; Chromosome Mapping; Genome-Wide Association Study/methods; Humans; INDEL Mutation; Linear Models; Polymorphism, Single Nucleotide
9.
Bioinformatics ; 33(7): 1083-1085, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-28031184

ABSTRACT

Summary: Here we present SVScore, a tool for in silico structural variation (SV) impact prediction. SVScore aggregates per-base single nucleotide polymorphism (SNP) pathogenicity scores across relevant genomic intervals for each SV in a manner that considers variant type, gene features and positional uncertainty. We show that the allele frequency spectrum of high-scoring SVs is strongly skewed toward lower frequencies, suggesting that they are under purifying selection, and that SVScore identifies deleterious variants more effectively than alternative methods. Notably, our results also suggest that duplications are under surprisingly strong selection relative to deletions, and that there are a similar number of strongly pathogenic SVs and SNPs in the human population. Availability and Implementation: SVScore is implemented in Perl and available freely at http://www.github.com/lganel/SVScore for use under the MIT license. Contact: ihall@wustl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genomic Structural Variation; Software; Gene Frequency; Genomics/methods; Humans; Polymorphism, Single Nucleotide; Sequence Deletion
10.
Neuron ; 89(6): 1223-1236, 2016 Mar 16.
Article in English | MEDLINE | ID: mdl-26948891

ABSTRACT

Somatic mutation in neurons is linked to neurologic disease and implicated in cell-type diversification. However, the origin, extent, and patterns of genomic mutation in neurons remain unknown. We established a nuclear transfer method to clonally amplify the genomes of neurons from adult mice for whole-genome sequencing. Comprehensive mutation detection and independent validation revealed that individual neurons harbor ∼100 unique mutations from all classes but lack recurrent rearrangements. Most neurons contain at least one gene-disrupting mutation and rare (0-2) mobile element insertions. The frequency and gene bias of neuronal mutations differ from other lineages, potentially due to novel mechanisms governing postmitotic mutation. Fertile mice were cloned from several neurons, establishing the compatibility of mutated adult neuronal genomes with reprogramming to pluripotency and development.


Subjects
Cloning, Molecular; Mutation/genetics; Neurons/physiology; Sequence Analysis, DNA; Age Factors; Animals; Animals, Newborn; Cadherins/genetics; Cadherins/metabolism; Cell Division/genetics; DNA Transposable Elements/genetics; Embryo, Mammalian; Female; Humans; Ki-67 Antigen/metabolism; Mice; Mice, Transgenic; Microsatellite Repeats/genetics; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Nuclear Transfer Techniques; Olfactory Bulb/cytology; Olfactory Bulb/embryology; Olfactory Bulb/growth & development; Oocytes/physiology
11.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subjects
Genome, Human; High-Throughput Nucleotide Sequencing/methods; Molecular Sequence Annotation/methods; Software; Genetic Variation; Humans; Neoplasms/genetics; Polymorphism, Single Nucleotide; Precision Medicine/methods; Workflow
12.
Genome Med ; 7(1): 6, 2015.
Article in English | MEDLINE | ID: mdl-25729435

ABSTRACT

Human cancers are frequently polyploid, containing multiple aneuploid subpopulations that differ in total DNA content. In this study we exploit this property to reconstruct evolutionary histories, by assuming that mutational complexity increases with time. We developed an experimental method called Ploidy-Seq that uses flow-sorting to isolate and enrich subpopulations with different ploidy prior to next-generation genome sequencing. We applied Ploidy-Seq to a patient with a triple-negative (ER-/PR-/HER2-) ductal carcinoma and performed whole-genome sequencing to trace the evolution of point mutations, indels, copy number aberrations, and structural variants in three clonal subpopulations during tumor growth. Our data show that few mutations (8% to 22%) were shared between all three subpopulations, and that the most aggressive clones comprised a minority of the tumor mass. We expect that Ploidy-Seq will have broad applications for delineating clonal diversity and investigating genome evolution in many human cancers.

13.
Bioinformatics ; 31(8): 1286-9, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25527832

ABSTRACT

Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA). AVAILABILITY AND IMPLEMENTATION: Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra. CONTACT: aaronquinlan@gmail.com or ihall@genome.wustl.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetics, Population; Genomic Structural Variation; Genomics/methods; Hydra/genetics; Software; Animals; Databases, Factual; Gene Deletion; Humans; Hydra/classification; Sequence Alignment
14.
Genome Biol ; 15(6): R84, 2014 Jun 26.
Article in English | MEDLINE | ID: mdl-24970577

ABSTRACT

Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. LUMPY is available at https://github.com/arq5x/lumpy-sv.


Subjects
Chromosome Breakpoints; DNA Mutational Analysis; Models, Genetic; Gene Frequency; Genetic Variation; Genome, Human; Homozygote; Humans; Models, Statistical; Neoplasms/genetics; ROC Curve
15.
Bioinformatics ; 30(17): 2503-5, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24812344

ABSTRACT

MOTIVATION: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times. RESULTS: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results. AVAILABILITY AND IMPLEMENTATION: SAMBLASTER is open-source C++ code and freely available for download from https://github.com/GregoryFaust/samblaster.


Subjects
Genomic Structural Variation; Sequence Analysis, DNA/methods; Software; Genomics/methods; Sequence Alignment
16.
Science ; 342(6158): 632-7, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24179226

ABSTRACT

We used single-cell genomic approaches to map DNA copy number variation (CNV) in neurons obtained from human induced pluripotent stem cell (hiPSC) lines and postmortem human brains. We identified aneuploid neurons, as well as numerous subchromosomal CNVs in euploid neurons. Neurotypic hiPSC-derived neurons had larger CNVs than fibroblasts, and several large deletions were found in hiPSC-derived neurons but not in matched neural progenitor cells. Single-cell sequencing of endogenous human frontal cortex neurons revealed that 13 to 41% of neurons have at least one megabase-scale de novo CNV, that deletions are twice as common as duplications, and that a subset of neurons have highly aberrant genomes marked by multiple alterations. Our results show that mosaic CNV is abundant in human neurons.


Subjects
DNA Copy Number Variations; Frontal Lobe/cytology; Mosaicism; Neural Stem Cells/cytology; Neurons/cytology; Aneuploidy; Humans; Induced Pluripotent Stem Cells/cytology; Male; Neurogenesis; Sequence Analysis, DNA; Sequence Deletion; Single-Cell Analysis
17.
Cancer Biol Ther ; 14(9): 840-52, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23792589

ABSTRACT

Prostate cancer is the second highest cause of male cancer deaths in the United States. A significant number of tumors advance to a highly invasive and metastatic stage, which is typically resistant to traditional cancer therapeutics. In order to identify chromosomal structural variants that may contribute to prostate cancer progression, we sequenced the genome of an HPV-18-immortalized nonmalignant human prostate epithelial cell line, RWPE1, and compared it to that of its malignant, metastatic derivative, WPE1-NB26. There were a total of 34 large (>1 Mbp) and 38 small (<100 kbp) copy number variants in WPE1-NB26 that were not present in the precursor cell line. We also identified and validated 46 structural variants present in the two cell lines, of which 23 were unique to WPE1-NB26. Structural variants unique to the malignant cell line inactivated: (1) the neurofibromin 2 (NF2) gene, a known tumor suppressor; (2) its neighboring gene NIPSNAP1, another putative tumor suppressor that inhibits TRPV6, an anti-apoptotic oncogene implicated in prostate cancer progression; (3) UGT2B17, a gene that inactivates dihydrotestosterone, a known activator of prostate cancer progression; and (4) LPIN2, a phosphatidic acid phosphatase and a co-factor of PGC1a that is important for lipid metabolism and for suppressing autoinflammation. Our results illustrate the value of comparing the genomes of defined related pairs of cell lines to discover chromosomal structural variants that may contribute to cancer progression.


Subjects
Epithelial Cells/pathology; Genomic Structural Variation; Glucuronosyltransferase/genetics; Nuclear Proteins/genetics; Prostatic Neoplasms/genetics; Carcinogenesis/genetics; Carcinogenesis/pathology; Cell Line, Tumor; Genes, Tumor Suppressor; Humans; Male; Minor Histocompatibility Antigens; Neoplasm Invasiveness/genetics; Neoplasm Invasiveness/pathology; Neoplasm Metastasis; Prostatic Neoplasms/pathology
18.
Genome Res ; 23(5): 762-76, 2013 May.
Article in English | MEDLINE | ID: mdl-23410887

ABSTRACT

Tumor genomes are generally thought to evolve through a gradual accumulation of mutations, but the observation that extraordinarily complex rearrangements can arise through single mutational events suggests that evolution may be accelerated by punctuated changes in genome architecture. To assess the prevalence and origins of complex genomic rearrangements (CGRs), we mapped 6179 somatic structural variation breakpoints in 64 cancer genomes from seven tumor types and screened for clusters of three or more interconnected breakpoints. We find that complex breakpoint clusters are extremely common: 154 clusters comprise 25% of all somatic breakpoints, and 75% of tumors exhibit at least one complex cluster. Based on copy number state profiling, 63% of breakpoint clusters are consistent with being CGRs that arose through a single mutational event. CGRs have diverse architectures including focal breakpoint clusters, large-scale rearrangements joining clusters from one or more chromosomes, and staggeringly complex chromothripsis events. Notably, chromothripsis has a significantly higher incidence in glioblastoma samples (39%) relative to other tumor types (9%). Chromothripsis breakpoints also show significantly elevated intra-tumor allele frequencies relative to simple SVs, which indicates that they arise early during tumorigenesis or confer selective advantage. Finally, assembly and analysis of 4002 somatic and 6982 germline breakpoint sequences reveal that somatic breakpoints show significantly less microhomology and fewer templated insertions than germline breakpoints, and this effect is stronger at CGRs than at simple variants. These results are inconsistent with replication-based models of CGR genesis and strongly argue that nonhomologous repair of concurrently arising DNA double-strand breaks is the predominant mechanism underlying complex cancer genome rearrangements.


Subjects
Chromosome Aberrations; Chromosome Breakpoints; Mutation/genetics; Neoplasms/genetics; Base Sequence; DNA Breaks, Double-Stranded; DNA Replication/genetics; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/pathology
19.
Bioinformatics ; 29(1): 1-7, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23129298

ABSTRACT

MOTIVATION: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. RESULTS: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units, by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. AVAILABILITY: https://github.com/arq5x/bits.


Subjects
Algorithms; Genomics/methods; Monte Carlo Method; Sequence Alignment; Sequence Analysis, DNA
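
The BITS abstract in this record describes counting interval intersections with binary searches. A minimal Python sketch of that counting idea follows. It is not taken from the BITS code base: the function names `build_index` and `count_intersections` are hypothetical, intervals are assumed to be closed `(start, end)` pairs on a single chromosome, and only the standard-library `bisect` module is used. The count is obtained by exclusion: every database interval overlaps the query unless it ends before the query starts or starts after the query ends.

```python
from bisect import bisect_left, bisect_right


def build_index(intervals):
    """Return independently sorted start and end coordinates.

    `intervals` is a list of closed (start, end) pairs with start <= end.
    The two lists are sorted separately; they are no longer paired.
    """
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    return starts, ends


def count_intersections(query, starts, ends):
    """Count database intervals that overlap the closed interval `query`."""
    q_start, q_end = query
    total = len(starts)
    # Intervals whose end lies strictly before the query start cannot overlap.
    ends_before = bisect_left(ends, q_start)
    # Intervals whose start lies strictly after the query end cannot overlap.
    starts_after = total - bisect_right(starts, q_end)
    # The two excluded groups are disjoint, so subtract both from the total.
    return total - ends_before - starts_after


if __name__ == "__main__":
    database = [(1, 5), (4, 8), (10, 12), (20, 25)]
    starts, ends = build_index(database)
    print(count_intersections((5, 10), starts, ends))
```

Each query costs two binary searches and never enumerates the overlapping intervals themselves, which is why this style of counting lends itself to running many independent queries in parallel, as in the Monte Carlo use case the abstract mentions.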