Results 1 - 19 of 19
1.
bioRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
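
The 20% threshold in this abstract can be illustrated with a minimal sketch. It computes an observed/expected missense ratio for a region and flags regions below 20% of expectation. The function names and the counts in the usage note are hypothetical; the null mutational model that produces the expected count, and the MPC score itself, are not reproduced here.

```python
def missense_oe(observed: int, expected: float) -> float:
    """Return the observed/expected missense ratio for one region."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def is_strongly_depleted(observed: int, expected: float,
                         threshold: float = 0.2) -> bool:
    """True when the region carries less than `threshold` (20% by default)
    of its expected missense variation."""
    return missense_oe(observed, expected) < threshold
```

For example, a region with 3 observed and 20.0 expected missense variants has a ratio of 0.15 and would be flagged, whereas 10 observed against 20.0 expected would not.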

3.
Nat Genet ; 56(1): 152-161, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057443

ABSTRACT

Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.


Subjects
Exome, High-Throughput Nucleotide Sequencing, Humans, Exome/genetics, Exome Sequencing, Genotype
4.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
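
The abstract does not state how a depletion of variation is scored. As a hedged illustration only, the sketch below assumes a Poisson-style z-score that compares the observed variant count in a genomic window with the count expected under a mutational model. The function name and the formula are assumptions made for this example and should not be read as the published Gnocchi definition.

```python
import math


def depletion_z(observed: int, expected: float) -> float:
    """Assumed Poisson-style score: positive values mean fewer variants
    were observed than expected (a depletion), negative values an excess."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)
```

Under this assumption, a window with 64 observed variants against 100.0 expected scores 3.6, and a window that matches its expectation scores 0.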


Subjects
Human Genome, Genomics, Genetic Models, Mutation, Humans, Access to Information, Genetic Databases, Datasets as Topic, Gene Frequency, Human Genome/genetics, Mutation/genetics, Genetic Selection
5.
Am J Hum Genet ; 110(9): 1496-1508, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37633279

ABSTRACT

Predicted loss of function (pLoF) variants are often highly deleterious and play an important role in disease biology, but many pLoF variants may not result in loss of function (LoF). Here we present a framework that advances interpretation of pLoF variants in research and clinical settings by considering three categories of LoF evasion: (1) predicted rescue by secondary sequence properties, (2) uncertain biological relevance, and (3) potential technical artifacts. We also provide recommendations on adjustments to ACMG/AMP guidelines' PVS1 criterion. Applying this framework to all high-confidence pLoF variants in 22 genes associated with autosomal-recessive disease from the Genome Aggregation Database (gnomAD v.2.1.1) revealed predicted LoF evasion or potential artifacts in 27.3% (304/1,113) of variants. The major reasons were location in the last exon, in a homopolymer repeat, in a low proportion expressed across transcripts (pext) scored region, or the presence of cryptic in-frame splice rescues. Variants predicted to evade LoF or to be potential artifacts were enriched for ClinVar benign variants. PVS1 was downgraded in 99.4% (162/163) of pLoF variants predicted as likely not LoF/not LoF, with 17.2% (28/163) downgraded as a result of our framework, adding to previous guidelines. Variant pathogenicity was affected (mostly from likely pathogenic to VUS) in 20 (71.4%) of these 28 variants. This framework guides assessment of pLoF variants beyond standard annotation pipelines and substantially reduces false positive rates, which is key to ensure accurate LoF variant prediction in both a research and clinical setting.


Subjects
Inheritance Patterns, Humans, Exons, Uncertainty
6.
bioRxiv ; 2023 Aug 21.
Article in English | MEDLINE | ID: mdl-36993580

ABSTRACT

Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n=125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10⁻⁴). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
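
One way to infer phase from population genotypes is sketched below. The abstract does not describe the estimation procedure, so the sketch assumes a standard expectation-maximization (EM) estimate of two-locus haplotype frequencies from a 3×3 table of genotype counts. All names here (`estimate_haplotypes`, `prob_trans`, the `gt` table layout) are hypothetical.

```python
from typing import List, Tuple


def estimate_haplotypes(gt: List[List[int]],
                        n_iter: int = 100) -> Tuple[float, float, float, float]:
    """EM estimate of haplotype frequencies (ref-ref, ref-alt, alt-ref, alt-alt).

    gt[i][j] is the number of individuals carrying i alternate alleles of
    variant 1 and j alternate alleles of variant 2 (i, j in 0..2).
    """
    n_ind = sum(sum(row) for row in gt)
    if n_ind == 0:
        raise ValueError("genotype table is empty")
    total = 2 * n_ind  # two haplotypes per individual
    # Haplotype counts from genotypes whose phase is unambiguous.
    rr = 2 * gt[0][0] + gt[0][1] + gt[1][0]
    ra = 2 * gt[0][2] + gt[0][1] + gt[1][2]
    ar = 2 * gt[2][0] + gt[1][0] + gt[2][1]
    aa = 2 * gt[2][2] + gt[1][2] + gt[2][1]
    dh = gt[1][1]  # double heterozygotes: the only ambiguous genotype
    # Start by splitting double heterozygotes evenly between cis and trans.
    p = [(rr + dh / 2) / total, (ra + dh / 2) / total,
         (ar + dh / 2) / total, (aa + dh / 2) / total]
    for _ in range(n_iter):
        cis_w, trans_w = p[0] * p[3], p[1] * p[2]
        denom = cis_w + trans_w
        frac_cis = cis_w / denom if denom > 0 else 0.5
        n_cis = dh * frac_cis      # E-step: expected cis double hets
        n_trans = dh - n_cis
        # M-step: re-estimate haplotype frequencies.
        p = [(rr + n_cis) / total, (ra + n_trans) / total,
             (ar + n_trans) / total, (aa + n_cis) / total]
    return p[0], p[1], p[2], p[3]


def prob_trans(gt: List[List[int]]) -> float:
    """Probability that a double heterozygote carries the variants in trans."""
    p_rr, p_ra, p_ar, p_aa = estimate_haplotypes(gt)
    cis_w, trans_w = p_rr * p_aa, p_ra * p_ar
    if cis_w + trans_w == 0:
        raise ValueError("at least one variant is absent from the table")
    return trans_w / (cis_w + trans_w)
```

The intuition behind this sketch: two rare variants that each appear in the population but are never seen together in the same individual get a high trans probability, while variants that are seen only together get a high cis probability.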

7.
medRxiv ; 2023 Mar 09.
Article in English | MEDLINE | ID: mdl-36945502

ABSTRACT

Predicted loss of function (pLoF) variants are highly deleterious and play an important role in disease biology, but many of these variants may not actually result in loss of function (LoF). Here we present a framework that advances interpretation of pLoF variants in research and clinical settings by considering three categories of LoF evasion: (1) predicted rescue by secondary sequence properties, (2) uncertain biological relevance, and (3) potential technical artifacts. We also provide recommendations on adjustments to the ACMG/AMP guidelines' PVS1 criterion. Applying this framework to all high-confidence pLoF variants in 22 autosomal recessive disease genes from the Genome Aggregation Database (gnomAD, v2.1.1) revealed predicted LoF evasion or potential artifacts in 27.3% (304/1,113) of variants. The major reasons were location in the last exon, in a homopolymer repeat, in low per-base expression (pext) score regions, or the presence of cryptic splice rescues. Variants predicted to be potential artifacts or to evade LoF were enriched for ClinVar benign variants. PVS1 was downgraded in 99.4% (162/163) of LoF-evading variants assessed, with 17.2% (28/163) downgraded as a result of our framework, adding to previous guidelines. Variant pathogenicity was affected (mostly from likely pathogenic to VUS) in 20 (71.4%) of these 28 variants. This framework guides assessment of pLoF variants beyond standard annotation pipelines and substantially reduces false positive rates, which is key to ensuring accurate LoF variant prediction in both research and clinical settings.
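
The major reasons for LoF evasion listed in this abstract lend themselves to a rule-based check. The sketch below is a simplified illustration under stated assumptions: the `PLoFVariant` record, its field names, and the `LOW_PEXT` cutoff of 0.2 are hypothetical and are not taken from the abstract, which gives no numeric pext threshold.

```python
from dataclasses import dataclass
from typing import List

LOW_PEXT = 0.2  # hypothetical cutoff; the abstract gives no numeric value


@dataclass
class PLoFVariant:
    """Hypothetical per-variant annotations used only for this example."""
    in_last_exon: bool
    in_homopolymer: bool
    pext: float
    cryptic_splice_rescue: bool


def evasion_flags(v: PLoFVariant) -> List[str]:
    """List the reasons a pLoF variant may evade LoF or be an artifact."""
    flags = []
    if v.in_last_exon:
        flags.append("last_exon")
    if v.in_homopolymer:
        flags.append("homopolymer_repeat")
    if v.pext < LOW_PEXT:
        flags.append("low_pext")
    if v.cryptic_splice_rescue:
        flags.append("cryptic_splice_rescue")
    return flags
```

A variant with an empty flag list would keep its high-confidence pLoF status in this sketch; any flag would mark it for manual review rather than decide its classification automatically.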

8.
Nat Genet ; 54(5): 541-547, 2022 05.
Article in English | MEDLINE | ID: mdl-35410376

ABSTRACT

We report results from the Bipolar Exome (BipEx) collaboration analysis of whole-exome sequencing of 13,933 patients with bipolar disorder (BD) matched with 14,422 controls. We find an excess of ultra-rare protein-truncating variants (PTVs) in patients with BD among genes under strong evolutionary constraint in both major BD subtypes. We find enrichment of ultra-rare PTVs within genes implicated from a recent schizophrenia exome meta-analysis (SCHEMA; 24,248 cases and 97,322 controls) and among binding targets of CHD8. Genes implicated from genome-wide association studies (GWASs) of BD, however, are not significantly enriched for ultra-rare PTVs. Combining gene-level results with SCHEMA, AKAP11 emerges as a definitive risk gene (odds ratio (OR) = 7.06, P = 2.83 × 10⁻⁹). At the protein level, AKAP-11 interacts with GSK3B, the hypothesized target of lithium, a primary treatment for BD. Our results lend support to BD's polygenicity, demonstrating a role for rare coding variation as a significant risk factor in BD etiology.


Subjects
Bipolar Disorder, Schizophrenia, A Kinase Anchor Proteins/genetics, Bipolar Disorder/genetics, Exome/genetics, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Schizophrenia/genetics, Exome Sequencing
9.
Genome Res ; 32(3): 569-582, 2022 03.
Article in English | MEDLINE | ID: mdl-35074858

ABSTRACT

Genomic databases of allele frequency are extremely helpful for evaluating clinical variants of unknown significance; however, until now, databases such as the Genome Aggregation Database (gnomAD) have focused on nuclear DNA and have ignored the mitochondrial genome (mtDNA). Here, we present a pipeline to call mtDNA variants that addresses three technical challenges: (1) detecting homoplasmic and heteroplasmic variants, present, respectively, in all or a fraction of mtDNA molecules; (2) handling the circular mtDNA genome; and (3) avoiding misalignment of nuclear sequences of mitochondrial origin (NUMTs). We observed that mtDNA copy number per cell varied across gnomAD cohorts and influenced the fraction of NUMT-derived false-positive variant calls, which can account for the majority of putative heteroplasmies. To avoid false positives, we excluded contaminated samples, cell lines, and samples prone to NUMT misalignment due to few mtDNA copies. Furthermore, we report only variants with heteroplasmy ≥10%. We applied this pipeline to 56,434 whole-genome sequences in the gnomAD v3.1 database that includes individuals of European (58%), African (25%), Latino (10%), and Asian (5%) ancestry. Our gnomAD v3.1 release contains population frequencies for 10,850 unique mtDNA variants at more than half of all mtDNA bases. Importantly, we report frequencies within each nuclear ancestral population and mitochondrial haplogroup. Homoplasmic variants account for most variant calls (98%) and unique variants (85%). We observed that 1/250 individuals carry a pathogenic mtDNA variant with heteroplasmy above 10%. These mtDNA population allele frequencies are freely accessible and will aid in diagnostic interpretation and research studies.
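
The ≥10% heteroplasmy reporting rule can be sketched as a simple read-count filter. The 10% threshold comes from the abstract; the 95% cutoff separating homoplasmic from heteroplasmic calls, the function name, and the use of raw read counts are assumptions made for this example.

```python
from typing import Optional

MIN_HETEROPLASMY = 0.10  # reporting threshold stated in the abstract
HOMOPLASMIC_MIN = 0.95   # assumed cutoff, not given in the abstract


def classify_mt_call(alt_reads: int, total_reads: int) -> Optional[str]:
    """Classify an mtDNA variant call by its heteroplasmy level.

    Returns None when the call falls below the reporting threshold.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    level = alt_reads / total_reads
    if level < MIN_HETEROPLASMY:
        return None  # likely noise or NUMT-derived; not reported
    return "homoplasmic" if level >= HOMOPLASMIC_MIN else "heteroplasmic"
```

For example, a site with 30 of 100 reads supporting the variant would be reported as heteroplasmic, while a site with 5 of 100 reads would be dropped.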


Subjects
Mitochondrial DNA, Mitochondrial Genome, Cell Nucleus/genetics, Mitochondrial DNA/genetics, Gene Frequency, Genome, Humans, Mitochondria/genetics, DNA Sequence Analysis
10.
Hum Mutat ; 43(8): 1012-1030, 2022 08.
Article in English | MEDLINE | ID: mdl-34859531

ABSTRACT

Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.


Subjects
Rare Diseases, Software, Genetic Databases, Gene Frequency, Humans, Rare Diseases/genetics
11.
Cell Genom ; 2(9): 100168, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36778668

ABSTRACT

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

16.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subjects
Disease/genetics, Genetic Variation, Medical Genetics/standards, Population Genetics/standards, Human Genome/genetics, Female, Genetic Testing, Genotyping Techniques, Humans, Male, Middle Aged, Mutation, Single Nucleotide Polymorphism/genetics, Racial Groups/genetics, Reference Standards, Genetic Selection, Whole Genome Sequencing
17.
Nature ; 581(7809): 452-458, 2020 05.
Article in English | MEDLINE | ID: mdl-32461655

ABSTRACT

The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) [1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project [2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.
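
The idea behind a 'proportion expressed across transcripts' score can be sketched for a single tissue. The sketch assumes the score is the summed expression of the transcripts on which the variant carries a given annotation, divided by the summed expression of all transcripts of the gene. The function name, the input layout, and the single-tissue simplification are assumptions; the published metric may normalize and aggregate across tissues differently.

```python
from typing import Dict, Set


def pext(transcript_expression: Dict[str, float], annotated: Set[str]) -> float:
    """Fraction of a gene's total expression that comes from transcripts
    on which the variant has the annotation of interest (one tissue)."""
    total = sum(transcript_expression.values())
    if total <= 0:
        raise ValueError("gene has no measurable expression in this tissue")
    carrying = sum(level for tx, level in transcript_expression.items()
                   if tx in annotated)
    return carrying / total
```

For example, if a pLoF annotation applies only to a transcript contributing 10 of the gene's 100 expression units, the score is 0.1, which marks the variant as sitting in a weakly expressed region.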


Subjects
Disease/genetics, Haploinsufficiency/genetics, Loss of Function Mutation/genetics, Molecular Sequence Annotation, Genetic Transcription, Transcriptome/genetics, Autism Spectrum Disorder/genetics, Datasets as Topic, Developmental Disabilities/genetics, Exons/genetics, Female, Genotype, Humans, Intellectual Disability/genetics, Male, Molecular Sequence Annotation/standards, Poisson Distribution, Messenger RNA/analysis, Messenger RNA/genetics, Rare Diseases/diagnosis, Rare Diseases/genetics, Reproducibility of Results, Exome Sequencing
18.
Nature ; 581(7809): 434-443, 2020 05.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subjects
Exome/genetics, Essential Genes/genetics, Genetic Variation/genetics, Human Genome/genetics, Adult, Brain/metabolism, Cardiovascular Diseases/genetics, Cohort Studies, Genetic Databases, Female, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study, Humans, Loss of Function Mutation/genetics, Male, Mutation Rate, Proprotein Convertase 9/genetics, Messenger RNA/genetics, Reproducibility of Results, Exome Sequencing, Whole Genome Sequencing
19.
Bioinformatics ; 33(4): 627-628, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27797780

ABSTRACT

Motivation: The ability to centralize and store data for long periods on an end user's computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed about emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet.

Results: To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and the optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as those provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain.

Availability and Implementation: BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS.

Contact: ffeltus@clemson.edu.


Subjects
Computational Biology/methods, Factual Databases, Software, Habits, Internet, Workflow