Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 48
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
Cell ; 185(18): 3426-3440.e19, 2022 09 01.
Artigo em Inglês | MEDLINE | ID: mdl-36055201

RESUMO

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.


Assuntos
Genoma Humano , Sequenciamento Completo do Genoma , Feminino , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Humanos , Mutação INDEL , Masculino , Polimorfismo de Nucleotídeo Único
2.
Cell ; 184(10): 2633-2648.e19, 2021 05 13.
Artigo em Inglês | MEDLINE | ID: mdl-33864768

RESUMO

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.


Assuntos
Doença/genética , Herança Multifatorial/genética , População/genética , RNA Longo não Codificante/genética , Transcriptoma , Doença da Artéria Coronariana/genética , Diabetes Mellitus Tipo 1/genética , Diabetes Mellitus Tipo 2/genética , Perfilação da Expressão Gênica , Variação Genética , Humanos , Doenças Inflamatórias Intestinais/genética , Especificidade de Órgãos/genética , Locos de Características Quantitativas
3.
Cell ; 177(1): 70-84, 2019 03 21.
Artigo em Inglês | MEDLINE | ID: mdl-30901550

RESUMO

Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands-and soon millions-of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.


Assuntos
Variação Genética/genética , Genoma Humano/genética , Análise de Sequência de DNA/tendências , Bancos de Espécimes Biológicos , Mapeamento Cromossômico/métodos , Predisposição Genética para Doença/genética , Testes Genéticos/tendências , Estudo de Associação Genômica Ampla , Genômica/métodos , Genômica/tendências , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Projeto Genoma Humano , Humanos , Polimorfismo de Nucleotídeo Único/genética , Análise de Sequência de DNA/métodos , Sequenciamento Completo do Genoma/métodos , Sequenciamento Completo do Genoma/tendências
4.
Nature ; 604(7906): 437-446, 2022 04.
Artigo em Inglês | MEDLINE | ID: mdl-35444317

RESUMO

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.


Assuntos
Genoma Humano , Genômica , Genoma Humano/genética , Haplótipos/genética , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Análise de Sequência de DNA
5.
Nature ; 583(7814): 83-89, 2020 07.
Artigo em Inglês | MEDLINE | ID: mdl-32460305

RESUMO

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline1 to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.


Assuntos
Variação Genética , Genoma Humano/genética , Sequenciamento Completo do Genoma , Alelos , Estudos de Casos e Controles , Epigênese Genética , Feminino , Dosagem de Genes/genética , Genética Populacional , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Masculino , Anotação de Sequência Molecular , Locos de Características Quantitativas , Grupos Raciais/genética , Software
6.
Am J Hum Genet ; 109(10): 1727-1741, 2022 10 06.
Artigo em Inglês | MEDLINE | ID: mdl-36055244

RESUMO

Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.


Assuntos
Estudo de Associação Genômica Ampla , Transcriptoma , Bilirrubina , Carnitina , Glicerofosfolipídeos , Humanos , Masculino , Metabolômica , Locos de Características Quantitativas/genética , Membro 5 da Família 22 de Carreadores de Soluto/genética , Transcriptoma/genética
8.
Nature ; 572(7769): 323-328, 2019 08.
Artigo em Inglês | MEDLINE | ID: mdl-31367044

RESUMO

Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power.


Assuntos
Sequenciamento do Exoma , Estudos de Associação Genética/métodos , Predisposição Genética para Doença/genética , Variação Genética/genética , Locos de Características Quantitativas/genética , Alelos , HDL-Colesterol/genética , Análise por Conglomerados , Determinação de Ponto Final , Finlândia , Mapeamento Geográfico , Humanos , Herança Multifatorial/genética , Reprodutibilidade dos Testes
9.
Am J Hum Genet ; 108(4): 583-596, 2021 04 01.
Artigo em Inglês | MEDLINE | ID: mdl-33798444

RESUMO

The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low-frequency SVs for association with 116 quantitative traits and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including 2 loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p = 1.47 × 10-54) and is also associated with increased levels of total cholesterol (p = 1.22 × 10-28) and 14 additional cholesterol-related traits, and (2) a multi-allelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 4.81 × 10-21) and alanine (p = 6.14 × 10-12) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs) and one linking recurrent HP gene deletion and cholesterol levels (p = 6.24 × 10-10), which was also found to be strongly associated with increased glycoprotein level (p = 3.53 × 10-35). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.


Assuntos
Doenças Cardiovasculares/genética , Variação Estrutural do Genoma/genética , Alelos , Colesterol/sangue , Variações do Número de Cópias de DNA/genética , Feminino , Finlândia , Genoma Humano/genética , Genótipo , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Masculino , Proteínas Mitocondriais/genética , Regiões Promotoras Genéticas/genética , Piruvato Desidrogenase (Lipoamida)-Fosfatase/genética , Ácido Pirúvico/metabolismo , Albumina Sérica Humana/genética
10.
Genome Res ; 31(12): 2249-2257, 2021 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-34544830

RESUMO

Structural variants (SVs) are an important source of human genome diversity, but their functional effects are poorly understood. We mapped 61,668 SVs in 613 individuals from the GTEx project and measured their effects on gene expression. We estimate that common SVs are causal at 2.66% of eQTLs, a 10.5-fold enrichment relative to their abundance in the genome. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was small (0.12% of eQTLs, 1.9-fold enriched). Multitissue analysis of eQTLs revealed that gene-altering SVs show more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with eQTL activity compared with 23.08% of coding SNV- and indel-eQTLs. Noncoding SVs, SNVs and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that affect gene expression but have modest enrichment at regulatory elements, showing that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current annotations. Both common and rare SVs often affect the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes, whereas SNV- and indel-eQTLs affect an average of 1.09 genes, and 21.34% of rare expression-altering SVs show effects on two to nine different genes. We also observe significant effects on rare gene expression changes extending 1 Mb from the SV. This provides a mechanism by which individual SVs may have strong or pleiotropic effects on phenotypic variation.

11.
Nature ; 550(7675): 239-243, 2017 10 11.
Artigo em Inglês | MEDLINE | ID: mdl-29022581

RESUMO

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.


Assuntos
Perfilação da Expressão Gênica , Variação Genética/genética , Especificidade de Órgãos/genética , Teorema de Bayes , Feminino , Genoma Humano/genética , Genômica , Genótipo , Humanos , Masculino , Modelos Genéticos , Análise de Sequência de RNA
12.
Hum Genomics ; 15(1): 34, 2021 06 07.
Artigo em Inglês | MEDLINE | ID: mdl-34099068

RESUMO

BACKGROUND: Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718). RESULTS: We identified two loci significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6 × 10-8), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0 × 10-8), which has been reported to protect against non-alcoholic fatty liver disease. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10-21) and other metabolic syndrome (metS)-related traits. Using a Mendelian randomization framework, we show evidence that MT-CN measured in blood is causally related to insulin levels. We then applied an MT-CN polygenic risk score (PRS) derived from Finnish data to the UK Biobank, where the association between the PRS and metS traits was replicated. Adjusting for cell counts largely eliminated these signals, suggesting that MT-CN affects metS via cell-type composition. CONCLUSION: These results suggest that measurements of MT-CN in blood-derived DNA partially reflect differences in cell-type composition and that these differences are causally linked to insulin and related traits.


Assuntos
Proteínas Reguladoras de Apoptose/genética , Variações do Número de Cópias de DNA/genética , DNA Mitocondrial/sangue , Proteínas de Ligação ao GTP/genética , Proteínas de Membrana/genética , Proteínas Proto-Oncogênicas c-myb/genética , Adulto , Idoso , Linhagem da Célula/genética , DNA Mitocondrial/genética , Feminino , Predisposição Genética para Doença , Genoma Mitocondrial/genética , Estudo de Associação Genômica Ampla , Humanos , Masculino , Análise da Randomização Mendeliana , Pessoa de Meia-Idade , Fenótipo , Polimorfismo de Nucleotídeo Único/genética , Análise de Sequência de DNA , Sequenciamento do Exoma
13.
Mol Psychiatry ; 26(9): 4884-4895, 2021 09.
Artigo em Inglês | MEDLINE | ID: mdl-33526825

RESUMO

Copy number variants (CNVs) are associated with syndromic and severe neurological and psychiatric disorders (SNPDs), such as intellectual disability, epilepsy, schizophrenia, and bipolar disorder. Although considered high-impact, CNVs are also observed in the general population. This presents a diagnostic challenge in evaluating their clinical significance. To estimate the phenotypic differences between CNV carriers and non-carriers regarding general health and well-being, we compared the impact of SNPD-associated CNVs on health, cognition, and socioeconomic phenotypes to the impact of three genome-wide polygenic risk score (PRS) in two Finnish cohorts (FINRISK, n = 23,053 and NFBC1966, n = 4895). The focus was on CNV carriers and PRS extremes who do not have an SNPD diagnosis. We identified high-risk CNVs (DECIPHER CNVs, risk gene deletions, or large [>1 Mb] CNVs) in 744 study participants (2.66%), 36 (4.8%) of whom had a diagnosed SNPD. In the remaining 708 unaffected carriers, we observed lower educational attainment (EA; OR = 0.77 [95% CI 0.66-0.89]) and lower household income (OR = 0.77 [0.66-0.89]). Income-associated CNVs also lowered household income (OR = 0.50 [0.38-0.66]), and CNVs with medical consequences lowered subjective health (OR = 0.48 [0.32-0.72]). The impact of PRSs was broader. At the lowest extreme of PRS for EA, we observed lower EA (OR = 0.31 [0.26-0.37]), lower-income (OR = 0.66 [0.57-0.77]), lower subjective health (OR = 0.72 [0.61-0.83]), and increased mortality (Cox's HR = 1.55 [1.21-1.98]). PRS for intelligence had a similar impact, whereas PRS for schizophrenia did not affect these traits. We conclude that the majority of working-age individuals carrying high-risk CNVs without SNPD diagnosis have a modest impact on morbidity and mortality, as well as the limited impact on income and educational attainment, compared to individuals at the extreme end of common genetic variation. Our findings highlight that the contribution of traditional high-risk variants such as CNVs should be analyzed in a broader genetic context, rather than evaluated in isolation.


Assuntos
Variações do Número de Cópias de DNA , Esquizofrenia , Cognição , Variações do Número de Cópias de DNA/genética , Escolaridade , Predisposição Genética para Doença/genética , Estudo de Associação Genômica Ampla , Humanos , Herança Multifatorial/genética , Esquizofrenia/genética
14.
Bioinformatics ; 35(22): 4782-4787, 2019 11 01.
Artigo em Inglês | MEDLINE | ID: mdl-31218349

RESUMO

SUMMARY: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps-including deletions, duplications, mobile element insertions, inversions and other rearrangements-in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. AVAILABILITY AND IMPLEMENTATION: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Genoma Humano , Software , Humanos , Deleção de Sequência , Sequenciamento Completo do Genoma
15.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-26258291

RESUMO

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Assuntos
Genoma Humano , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Anotação de Sequência Molecular/métodos , Software , Variação Genética , Humanos , Neoplasias/genética , Polimorfismo de Nucleotídeo Único , Medicina de Precisão/métodos , Fluxo de Trabalho
16.
Bioinformatics ; 33(7): 1083-1085, 2017 04 01.
Artigo em Inglês | MEDLINE | ID: mdl-28031184

RESUMO

Summary: Here we present SVScore, a tool for in silico structural variation (SV) impact prediction. SVScore aggregates per-base single nucleotide polymorphism (SNP) pathogenicity scores across relevant genomic intervals for each SV in a manner that considers variant type, gene features and positional uncertainty. We show that the allele frequency spectrum of high-scoring SVs is strongly skewed toward lower frequencies, suggesting that they are under purifying selection, and that SVScore identifies deleterious variants more effectively than alternative methods. Notably, our results also suggest that duplications are under surprisingly strong selection relative to deletions, and that there are a similar number of strongly pathogenic SVs and SNPs in the human population. Availability and Implementation: SVScore is implemented in Perl and available freely at {{ http://www.github.com/lganel/SVScore }} for use under the MIT license. Contact: ihall@wustl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Assuntos
Variação Estrutural do Genoma , Software , Frequência do Gene , Genômica/métodos , Humanos , Polimorfismo de Nucleotídeo Único , Deleção de Sequência
17.
Genome Res ; 23(5): 762-76, 2013 May.
Artigo em Inglês | MEDLINE | ID: mdl-23410887

RESUMO

Tumor genomes are generally thought to evolve through a gradual accumulation of mutations, but the observation that extraordinarily complex rearrangements can arise through single mutational events suggests that evolution may be accelerated by punctuated changes in genome architecture. To assess the prevalence and origins of complex genomic rearrangements (CGRs), we mapped 6179 somatic structural variation breakpoints in 64 cancer genomes from seven tumor types and screened for clusters of three or more interconnected breakpoints. We find that complex breakpoint clusters are extremely common: 154 clusters comprise 25% of all somatic breakpoints, and 75% of tumors exhibit at least one complex cluster. Based on copy number state profiling, 63% of breakpoint clusters are consistent with being CGRs that arose through a single mutational event. CGRs have diverse architectures including focal breakpoint clusters, large-scale rearrangements joining clusters from one or more chromosomes, and staggeringly complex chromothripsis events. Notably, chromothripsis has a significantly higher incidence in glioblastoma samples (39%) relative to other tumor types (9%). Chromothripsis breakpoints also show significantly elevated intra-tumor allele frequencies relative to simple SVs, which indicates that they arise early during tumorigenesis or confer selective advantage. Finally, assembly and analysis of 4002 somatic and 6982 germline breakpoint sequences reveal that somatic breakpoints show significantly less microhomology and fewer templated insertions than germline breakpoints, and this effect is stronger at CGRs than at simple variants. These results are inconsistent with replication-based models of CGR genesis and strongly argue that nonhomologous repair of concurrently arising DNA double-strand breaks is the predominant mechanism underlying complex cancer genome rearrangements.


Assuntos
Aberrações Cromossômicas , Pontos de Quebra do Cromossomo , Mutação/genética , Neoplasias/genética , Sequência de Bases , Quebras de DNA de Cadeia Dupla , Replicação do DNA/genética , Genoma Humano , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Neoplasias/patologia
18.
Bioinformatics ; 31(8): 1286-9, 2015 Apr 15.
Artigo em Inglês | MEDLINE | ID: mdl-25527832

RESUMO

UNLABELLED: Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA). AVAILABILITY AND IMPLEMENTATION: Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra. CONTACT: aaronquinlan@gmail.com or ihall@genome.wustl.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Genética Populacional , Variação Estrutural do Genoma , Genômica/métodos , Hydra/genética , Software , Animais , Bases de Dados Factuais , Deleção de Genes , Humanos , Hydra/classificação , Alinhamento de Sequência
19.
Trends Genet ; 28(1): 43-53, 2012 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-22094265

RESUMO

Genome structural variation (SV) is a major source of genetic diversity in mammals and a hallmark of cancer. Although SV is typically defined by its canonical forms (duplication, deletion, insertion, inversion and translocation), recent breakpoint mapping studies have revealed a surprising number of 'complex' variants that evade simple classification. Complex SVs are defined by clustered breakpoints that arose through a single mutation but cannot be explained by one simple end-joining or recombination event. Some complex variants exhibit profoundly complicated rearrangements between distinct loci from multiple chromosomes, whereas others involve more subtle alterations at a single locus. These diverse and unpredictable features present a challenge for SV mapping experiments. Here, we review current knowledge of complex SV in mammals, and outline techniques for identifying and characterizing complex variants using next-generation DNA sequencing.


Assuntos
Variação Estrutural do Genoma , Células Germinativas , Animais , Genômica , Células Germinativas/química , Células Germinativas/metabolismo , Humanos , Neoplasias/genética
20.
Bioinformatics ; 30(17): 2503-5, 2014 Sep 01.
Artigo em Inglês | MEDLINE | ID: mdl-24812344

RESUMO

MOTIVATION: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times. RESULTS: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results. AVAILABILITY AND IMPLEMENTATION: SAMBLASTER is open-source C+ + code and freely available for download from https://github.com/GregoryFaust/samblaster.


Assuntos
Variação Estrutural do Genoma , Análise de Sequência de DNA/métodos , Software , Genômica/métodos , Alinhamento de Sequência
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA