Results 1 - 20 of 43
1.
Genome Res ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

2.
bioRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
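
The 20% threshold described in this abstract can be illustrated with a minimal Python sketch. It assumes the expected missense count for a region has already been produced by some null mutational model, which is not implemented here; the function names `missense_oe` and `has_moderate_support` are hypothetical and are not taken from the paper or from any gnomAD tooling.

```python
def missense_oe(observed: int, expected: float) -> float:
    """Return the observed/expected missense ratio for one region.

    `expected` is assumed to come from an external null mutational
    model; this sketch does not compute it.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def has_moderate_support(observed: int, expected: float,
                         threshold: float = 0.2) -> bool:
    """Flag a region with less than `threshold` of its expected
    missense variation (0.2 mirrors the 20% figure in the abstract)."""
    return missense_oe(observed, expected) < threshold
```

For example, a region with 3 observed and 20 expected missense variants has a ratio of 0.15 and would be flagged, while a region with 10 observed and 20 expected (ratio 0.5) would not.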

3.
medRxiv ; 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38496558

ABSTRACT

Genes encoding long non-coding RNAs (lncRNAs) comprise a large fraction of the human genome, yet haploinsufficiency of a lncRNA has not been shown to cause a Mendelian disease. CHASERR is a highly conserved human lncRNA adjacent to CHD2, a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here we report three unrelated individuals each harboring an ultra-rare heterozygous de novo deletion in the CHASERR locus. We report similarities in severe developmental delay, facial dysmorphisms, and cerebral dysmyelination in these individuals, distinguishing them from the phenotypic spectrum of CHD2 haploinsufficiency. We demonstrate reduced CHASERR mRNA expression and corresponding increased CHD2 mRNA and protein in whole blood and patient-derived cell lines, specifically increased expression of the CHD2 allele in cis with the CHASERR deletion, as predicted from a prior mouse model of Chaserr haploinsufficiency. We show for the first time that de novo structural variants facilitated by Alu-mediated non-allelic homologous recombination led to deletion of a non-coding element (the lncRNA CHASERR) to cause a rare syndromic neurodevelopmental disorder. We also demonstrate that CHD2 has bidirectional dosage sensitivity in human disease. This work highlights the need to carefully evaluate other lncRNAs, particularly those upstream of genes associated with Mendelian disorders.

5.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-36747613

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

6.
Nat Genet ; 56(1): 152-161, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057443

ABSTRACT

Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.


Subjects
Exome, High-Throughput Nucleotide Sequencing, Humans, Exome/genetics, Exome Sequencing, Genotype
7.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subjects
Human Genome, Genomics, Genetic Models, Mutation, Humans, Access to Information, Genetic Databases, Datasets as Topic, Gene Frequency, Human Genome/genetics, Mutation/genetics, Genetic Selection
8.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.


Subjects
DNA, Trout, Humans, Animals, DNA Sequence Analysis/methods, Genotype, Homozygote, High-Throughput Nucleotide Sequencing/methods, Software
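
The idea in the CHARR abstract above, that reference reads appearing within homozygous alternate calls signal contamination, can be sketched in a few lines of Python. This is a simplified illustration and not the published CHARR formula: the record type `HomAltCall`, the function `estimate_contamination`, the depth filter, and the normalization by reference allele frequency are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HomAltCall:
    """One homozygous-alternate genotype call (hypothetical record type)."""
    ref_depth: int          # reads supporting the reference allele
    alt_depth: int          # reads supporting the alternate allele
    ref_allele_freq: float  # population frequency of the reference allele


def estimate_contamination(calls: Iterable[HomAltCall],
                           min_depth: int = 10) -> Optional[float]:
    """Average the fraction of reference reads at hom-alt sites.

    Each fraction is divided by the reference allele frequency, on the
    assumption (made for this sketch) that a contaminating read carries
    the reference allele with that probability. Returns None when no
    call passes the depth filter.
    """
    ratios = []
    for call in calls:
        total = call.ref_depth + call.alt_depth
        if total < min_depth or call.ref_allele_freq <= 0.0:
            continue  # skip shallow or uninformative sites
        ratios.append(call.ref_depth / (call.ref_allele_freq * total))
    return sum(ratios) / len(ratios) if ratios else None
```

Because only per-variant allele depths and a frequency are needed, an estimate of this kind can be computed from variant-level files without returning to the BAM/CRAM reads, which is the cost saving the abstract emphasizes.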
9.
bioRxiv ; 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37425834

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole genome and exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a new metric to estimate DNA sample contamination from variant-level whole genome and exome sequence data, CHARR, Contamination from Homozygous Alternate Reference Reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VDS format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole genome and exome sequencing datasets.

10.
bioRxiv ; 2023 Aug 21.
Article in English | MEDLINE | ID: mdl-36993580

ABSTRACT

Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n=125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10^-4). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.

11.
Hum Mutat ; 43(8): 1012-1030, 2022 08.
Article in English | MEDLINE | ID: mdl-34859531

ABSTRACT

Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.


Subjects
Rare Diseases, Software, Genetic Databases, Gene Frequency, Humans, Rare Diseases/genetics
12.
Cell Genom ; 2(9): 100168, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36778668

ABSTRACT

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

14.
Nat Commun ; 12(1): 3505, 2021 06 09.
Article in English | MEDLINE | ID: mdl-34108472

ABSTRACT

Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier develops the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenetic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we apply clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias display effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers average 60% or lower for most conditions. We assess epidemiologic and genetic factors contributing to risk prediction in monogenic variant carriers, demonstrating that inclusion of polygenic variation significantly improves biomarker estimation for two monogenic dyslipidemias.


Subjects
Type 2 Diabetes Mellitus/genetics, Dyslipidemias/genetics, Genetic Predisposition to Disease/genetics, Adult, Population Biological Variation, Biomarkers/metabolism, Type 2 Diabetes Mellitus/metabolism, Dyslipidemias/metabolism, Exome/genetics, Genotype, Humans, Multifactorial Inheritance, Penetrance, Risk Assessment
15.
J Exp Med ; 218(6)2021 06 07.
Article in English | MEDLINE | ID: mdl-33857290

ABSTRACT

Advances in genome sequencing have resulted in the identification of the causes for numerous rare diseases. However, many cases remain unsolved with standard molecular analyses. We describe a family presenting with a phenotype resembling inherited thrombocytopenia 2 (THC2). THC2 is generally caused by single nucleotide variants that prevent silencing of ANKRD26 expression during hematopoietic differentiation. Short-read whole-exome and genome sequencing approaches were unable to identify a causal variant in this family. Using long-read whole-genome sequencing, a large complex structural variant involving a paired-duplication inversion was identified. Through functional studies, we show that this structural variant results in a pathogenic gain-of-function WAC-ANKRD26 fusion transcript. Our findings illustrate how complex structural variants that may be missed by conventional genome sequencing approaches can cause human disease.


Subjects
Signal Transducing Adaptor Proteins/genetics, Intercellular Signaling Peptides and Proteins/genetics, Single Nucleotide Polymorphism/genetics, Thrombocytopenia/genetics, Adolescent, Adult, Aged, Cell Line, Tumor Cell Line, Child, Chromosome Breakage, Chromosome Disorders/genetics, Exome/genetics, Female, HEK293 Cells, HeLa Cells, Humans, Male, Middle Aged, Mutation/genetics, Pedigree, Thrombocytopenia/congenital
16.
Am J Hum Genet ; 108(5): 840-856, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33861953

ABSTRACT

JAG2 encodes the Notch ligand Jagged2. The conserved Notch signaling pathway contributes to the development and homeostasis of multiple tissues, including skeletal muscle. We studied an international cohort of 23 individuals with genetically unsolved muscular dystrophy from 13 unrelated families. Whole-exome sequencing identified rare homozygous or compound heterozygous JAG2 variants in all 13 families. The identified bi-allelic variants include 10 missense variants that disrupt highly conserved amino acids, a nonsense variant, two frameshift variants, an in-frame deletion, and a microdeletion encompassing JAG2. Onset of muscle weakness occurred from infancy to young adulthood. Serum creatine kinase (CK) levels were normal or mildly elevated. Muscle histology was primarily dystrophic. MRI of the lower extremities revealed a distinct, slightly asymmetric pattern of muscle involvement with cores of preserved and affected muscles in quadriceps and tibialis anterior, in some cases resembling patterns seen in POGLUT1-associated muscular dystrophy. Transcriptome analysis of muscle tissue from two participants suggested misregulation of genes involved in myogenesis, including PAX7. In complementary studies, Jag2 downregulation in murine myoblasts led to downregulation of multiple components of the Notch pathway, including Megf10. Investigations in Drosophila suggested an interaction between Serrate and Drpr, the fly orthologs of JAG1/JAG2 and MEGF10, respectively. In silico analysis predicted that many Jagged2 missense variants are associated with structural changes and protein misfolding. In summary, we describe a muscular dystrophy associated with pathogenic variants in JAG2 and evidence suggests a disease mechanism related to Notch pathway dysfunction.


Subjects
Jagged-2 Protein/genetics, Muscular Dystrophies/genetics, Adolescent, Adult, Amino Acid Sequence, Animals, Cell Line, Child, Preschool Child, Drosophila Proteins/genetics, Drosophila melanogaster/genetics, Female, Glucosyltransferases/genetics, Haplotypes/genetics, Humans, Jagged-1 Protein/genetics, Jagged-2 Protein/chemistry, Jagged-2 Protein/deficiency, Jagged-2 Protein/metabolism, Male, Membrane Proteins/genetics, Mice, Middle Aged, Molecular Models, Muscles/metabolism, Muscles/pathology, Muscular Dystrophies/pathology, Myoblasts/metabolism, Myoblasts/pathology, Pedigree, Phenotype, Notch Receptors/metabolism, Signal Transduction, Exome Sequencing, Young Adult
18.
Nat Med ; 26(6): 869-877, 2020 06.
Article in English | MEDLINE | ID: mdl-32461697

ABSTRACT

Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns [5-8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9], 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work [10], confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.


Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Loss of Function Mutation/genetics, Adult, Aged, Aged 80 and over, Biological Specimen Banks, Cell Line, Embryonic Stem Cells/metabolism, Female, Gain of Function Mutation/genetics, Heterozygote, Humans, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism, Longevity/genetics, Lymphocytes/metabolism, Male, Middle Aged, Cardiac Myocytes/metabolism, Parkinson Disease/drug therapy, Parkinson Disease/genetics, Phenotype
19.
mSystems ; 4(6)2019 Nov 26.
Article in English | MEDLINE | ID: mdl-31771976

ABSTRACT

Interactions between the gut microbiome and immunoglobulin A (IgA) in the gut during infancy are important for future health. IgM and IgG are also present in the gut; however, their interactions with the microbiome in the developing infant remain to be characterized. Using stool samples sampled 15 times in infancy from 32 healthy subjects at 4 locations in 3 countries, we characterized patterns of microbiome development in relation to fecal levels of IgA, IgG, and IgM. For 8 infants from a single location, we used fluorescence-activated cell sorting of microbial cells from stool by Ig-coating status over 18 months. We used 16S rRNA gene profiling on full and sorted microbiomes to assess patterns of antibody coating in relation to age and other factors. All antibodies decreased in concentration with age but were augmented by breastmilk feeding regardless of infant age. Levels of IgA correlated with relative abundances of operational taxonomic units (OTUs) belonging to the Bifidobacteria and Enterobacteriaceae, which dominated the early microbiome, and IgG levels correlated with Haemophilus. The diversity of Ig-coated microbiota was influenced by breastfeeding and age. IgA and IgM coated the same microbiota, which reflected the overall diversity of the microbiome, while IgG targeted a different subset. Blautia generally evaded antibody coating, while members of the Bifidobacteria and Enterobacteriaceae were high in IgA/M. IgA/M displayed similar dynamics, generally coating the microbiome proportionally, and were influenced by breastfeeding status. IgG only coated a small fraction of the commensal microbiota and differed from the proportion targeted by IgA and IgM.

IMPORTANCE: Antibodies are secreted into the gut and attach to roughly half of the trillions of bacterial cells present. When babies are born, the breastmilk supplies these antibodies until the baby's own immune system takes over this task after a few weeks. The vast majority of these antibodies are IgA, but two other types, IgG and IgM, are also present in the gut. Here, we ask if these three different antibody types target different types of bacteria in the infant gut as the infant develops from birth to 18 months old and how patterns of antibody coating of bacteria change with age. In this study of healthy infant samples over time, we found that IgA and IgM coat the same bacteria, which are generally representative of the diversity present, with a few exceptions that were more or less antibody coated than expected. IgG coated a separate suite of bacteria. These results provide a better understanding of how these antibodies interact with the developing infant gut microbiome.

20.
Cell Host Microbe ; 25(4): 553-564.e7, 2019 Apr 10.
Article in English | MEDLINE | ID: mdl-30974084

ABSTRACT

Host genetic variation influences microbiome composition. While studies have focused on associations between the gut microbiome and specific alleles, gene copy number (CN) also varies. We relate microbiome diversity to CN variation of the AMY1 locus, which encodes salivary amylase, facilitating starch digestion. After imputing AMY1-CN for ∼1,000 subjects, we identified taxa differentiating fecal microbiomes of high and low AMY1-CN hosts. In a month-long diet intervention study, we show that diet standardization drove gut microbiome convergence, and AMY1-CN correlated with oral and gut microbiome composition and function. The microbiomes of low-AMY1-CN subjects had enhanced capacity to break down complex carbohydrates. High-AMY1-CN subjects had higher levels of salivary Porphyromonas; their gut microbiota had increased abundance of resistant starch-degrading microbes, produced higher levels of short-chain fatty acids, and drove higher adiposity when transferred to germ-free mice. This study establishes AMY1-CN as a genetic factor associated with microbiome composition and function.


Subjects
Amylases/genetics, Gastrointestinal Tract/microbiology, Gene Dosage, Microbiota, Mouth/microbiology, Saliva/enzymology, Animals, Germ-Free Life, Humans, Mice