Results 1 - 20 of 44
1.
bioRxiv ; 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38915639

ABSTRACT

Incomplete penetrance, or absence of disease phenotype in an individual with a disease-associated variant, is a major challenge in variant interpretation. Studying individuals with apparent incomplete penetrance can shed light on underlying drivers of altered phenotype penetrance. Here, we investigate clinically relevant variants from ClinVar in 807,162 individuals from the Genome Aggregation Database (gnomAD), demonstrating improved representation in gnomAD version 4. We then conduct a comprehensive case-by-case assessment of 734 predicted loss of function variants (pLoF) in 77 genes associated with severe, early-onset, highly penetrant haploinsufficient disease. We identified explanations for the presumed lack of disease manifestation in 701 of the variants (95%). Individuals with unexplained lack of disease manifestation in this set of disorders rarely occur, underscoring the need and power of deep case-by-case assessment presented here to minimize false assignments of disease risk, particularly in unaffected individuals with higher rates of secondary properties that result in rescue.

2.
Genome Res ; 34(5): 796-809, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.


Subject(s)
Genetic Databases, Human Genome, Humans, Human Genome Project, High-Throughput Nucleotide Sequencing/methods, Genetic Variation, Genomics/methods
3.
bioRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
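
The 20% threshold mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that a region's observed and expected missense counts are already available; the function names and the example counts are hypothetical, and this is not the authors' pipeline or their null mutational model.

```python
def missense_oe(observed: int, expected: float) -> float:
    """Observed/expected ratio of missense variants for one region."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def is_strongly_depleted(observed: int, expected: float, threshold: float = 0.2) -> bool:
    """True when the region carries less than `threshold` of its expected
    missense variation (0.2 mirrors the 20% figure quoted in the abstract)."""
    return missense_oe(observed, expected) < threshold


# Hypothetical region: 3 missense variants observed where 20 were expected.
print(is_strongly_depleted(3, 20.0))
```

A ratio alone says nothing about statistical confidence; a real analysis would also need a significance test for the depletion.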

4.
medRxiv ; 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38496558

ABSTRACT

Genes encoding long non-coding RNAs (lncRNAs) comprise a large fraction of the human genome, yet haploinsufficiency of a lncRNA has not been shown to cause a Mendelian disease. CHASERR is a highly conserved human lncRNA adjacent to CHD2, a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here we report three unrelated individuals each harboring an ultra-rare heterozygous de novo deletion in the CHASERR locus. We report similarities in severe developmental delay, facial dysmorphisms, and cerebral dysmyelination in these individuals, distinguishing them from the phenotypic spectrum of CHD2 haploinsufficiency. We demonstrate reduced CHASERR mRNA expression and corresponding increased CHD2 mRNA and protein in whole blood and patient-derived cell lines, specifically increased expression of the CHD2 allele in cis with the CHASERR deletion, as predicted from a prior mouse model of Chaserr haploinsufficiency. We show for the first time that de novo structural variants facilitated by Alu-mediated non-allelic homologous recombination led to deletion of a non-coding element (the lncRNA CHASERR) to cause a rare syndromic neurodevelopmental disorder. We also demonstrate that CHD2 has bidirectional dosage sensitivity in human disease. This work highlights the need to carefully evaluate other lncRNAs, particularly those upstream of genes associated with Mendelian disorders.

6.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-36747613

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

7.
Nat Genet ; 56(1): 152-161, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057443

ABSTRACT

Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.


Subject(s)
Exome, High-Throughput Nucleotide Sequencing, Humans, Exome/genetics, Exome Sequencing, Genotype
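
The abstract of entry 7 does not spell out how phase is estimated from population genotypes. As a rough illustration only, the sketch below uses a textbook expectation-maximization (EM) estimate of two-locus haplotype frequencies as a stand-in and derives a probability that two variants are in trans. The function name, the 3x3 count layout, and the example counts are assumptions made for this sketch.

```python
def estimate_trans_probability(counts, iterations=50):
    """counts[i][j] = number of individuals carrying i alternate alleles at
    variant A and j alternate alleles at variant B (i, j in 0..2).
    Returns P(trans) for a double heterozygote, or None if undefined."""
    n = counts
    total_haps = 2 * sum(sum(row) for row in n)
    if total_haps == 0:
        return None
    # Haplotypes are keyed as "<allele at A><allele at B>", 0 = ref, 1 = alt.
    # Every genotype except the double heterozygote (1, 1) is unambiguous.
    base = {
        "00": 2 * n[0][0] + n[0][1] + n[1][0],
        "01": 2 * n[0][2] + n[0][1] + n[1][2],
        "10": 2 * n[2][0] + n[1][0] + n[2][1],
        "11": 2 * n[2][2] + n[1][2] + n[2][1],
    }
    double_hets = n[1][1]
    freq = {hap: 0.25 for hap in base}
    for _ in range(iterations):
        # E-step: split double heterozygotes between cis (00/11) and trans (01/10).
        cis = freq["00"] * freq["11"]
        trans = freq["01"] * freq["10"]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(base)
        expected["00"] += double_hets * p_cis
        expected["11"] += double_hets * p_cis
        expected["01"] += double_hets * (1 - p_cis)
        expected["10"] += double_hets * (1 - p_cis)
        # M-step: renormalise expected haplotype counts into frequencies.
        freq = {hap: c / total_haps for hap, c in expected.items()}
    cis = freq["00"] * freq["11"]
    trans = freq["01"] * freq["10"]
    if cis + trans == 0:
        return None
    return trans / (cis + trans)


# Hypothetical counts: the two variants are never seen in the same person.
never_together = [[96, 2, 0], [2, 0, 0], [0, 0, 0]]
print(estimate_trans_probability(never_together))
```

Variants that never co-occur in the reference population look like they sit on different haplotypes, so a patient carrying both is more likely to carry them in trans; variants that always co-occur point towards cis.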
8.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Human Genome, Genomics, Genetic Models, Mutation, Humans, Access to Information, Genetic Databases, Datasets as Topic, Gene Frequency, Human Genome/genetics, Mutation/genetics, Genetic Selection
9.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.


Subject(s)
DNA, Trout, Humans, Animals, DNA Sequence Analysis/methods, Genotype, Homozygote, High-Throughput Nucleotide Sequencing/methods, Software
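
The idea behind CHARR in entry 9, reference reads "infiltrating" homozygous alternate calls, can be sketched in a simplified form. The version below only averages the reference-read fraction over homozygous alternate genotypes; it is an assumption-laden illustration rather than the published estimator, which may apply further normalisation and filtering. The tuple layout, the `min_depth` parameter, and the example calls are hypothetical.

```python
def ref_read_fraction_in_hom_alt(calls, min_depth=10):
    """calls: iterable of (genotype, ref_reads, alt_reads) tuples for one sample.
    Returns the mean fraction of reference reads at homozygous alternate
    sites, or None when no site passes the filters."""
    fractions = []
    for genotype, ref_reads, alt_reads in calls:
        depth = ref_reads + alt_reads
        # Keep only homozygous alternate calls with enough coverage.
        if genotype != "1/1" or depth < min_depth:
            continue
        fractions.append(ref_reads / depth)
    if not fractions:
        return None
    return sum(fractions) / len(fractions)


# Hypothetical calls: two usable hom-alt sites, one het, one low-depth site.
sample_calls = [("1/1", 1, 19), ("1/1", 0, 20), ("0/1", 10, 10), ("1/1", 1, 3)]
print(ref_read_fraction_in_hom_alt(sample_calls))
```

In an uncontaminated sample, homozygous alternate sites should carry almost no reference reads, so a raised average hints at DNA from another individual. Only genotypes and allelic depths are needed, which is why such a metric can be computed from variant-level files rather than read-level BAM/CRAM data.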
10.
bioRxiv ; 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37425834

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole genome and exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a new metric to estimate DNA sample contamination from variant-level whole genome and exome sequence data, CHARR, Contamination from Homozygous Alternate Reference Reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VDS format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole genome and exome sequencing datasets.

11.
bioRxiv ; 2023 Aug 21.
Article in English | MEDLINE | ID: mdl-36993580

ABSTRACT

Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n = 125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10^-4). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.

12.
Hum Mutat ; 43(8): 1012-1030, 2022 08.
Article in English | MEDLINE | ID: mdl-34859531

ABSTRACT

Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.


Subject(s)
Rare Diseases, Software, Genetic Databases, Gene Frequency, Humans, Rare Diseases/genetics
13.
Cell Genom ; 2(9): 100168, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36778668

ABSTRACT

Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

15.
Nat Commun ; 12(1): 3505, 2021 06 09.
Article in English | MEDLINE | ID: mdl-34108472

ABSTRACT

Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier develops the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenetic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we apply clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias display effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers average 60% or lower for most conditions. We assess epidemiologic and genetic factors contributing to risk prediction in monogenic variant carriers, demonstrating that inclusion of polygenic variation significantly improves biomarker estimation for two monogenic dyslipidemias.


Subject(s)
Type 2 Diabetes Mellitus/genetics, Dyslipidemias/genetics, Genetic Predisposition to Disease/genetics, Adult, Population Biological Variation, Biomarkers/metabolism, Type 2 Diabetes Mellitus/metabolism, Dyslipidemias/metabolism, Exome/genetics, Genotype, Humans, Multifactorial Inheritance, Penetrance, Risk Assessment
16.
Am J Hum Genet ; 108(5): 840-856, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33861953

ABSTRACT

JAG2 encodes the Notch ligand Jagged2. The conserved Notch signaling pathway contributes to the development and homeostasis of multiple tissues, including skeletal muscle. We studied an international cohort of 23 individuals with genetically unsolved muscular dystrophy from 13 unrelated families. Whole-exome sequencing identified rare homozygous or compound heterozygous JAG2 variants in all 13 families. The identified bi-allelic variants include 10 missense variants that disrupt highly conserved amino acids, a nonsense variant, two frameshift variants, an in-frame deletion, and a microdeletion encompassing JAG2. Onset of muscle weakness occurred from infancy to young adulthood. Serum creatine kinase (CK) levels were normal or mildly elevated. Muscle histology was primarily dystrophic. MRI of the lower extremities revealed a distinct, slightly asymmetric pattern of muscle involvement with cores of preserved and affected muscles in quadriceps and tibialis anterior, in some cases resembling patterns seen in POGLUT1-associated muscular dystrophy. Transcriptome analysis of muscle tissue from two participants suggested misregulation of genes involved in myogenesis, including PAX7. In complementary studies, Jag2 downregulation in murine myoblasts led to downregulation of multiple components of the Notch pathway, including Megf10. Investigations in Drosophila suggested an interaction between Serrate and Drpr, the fly orthologs of JAG1/JAG2 and MEGF10, respectively. In silico analysis predicted that many Jagged2 missense variants are associated with structural changes and protein misfolding. In summary, we describe a muscular dystrophy associated with pathogenic variants in JAG2 and evidence suggests a disease mechanism related to Notch pathway dysfunction.


Subject(s)
Jagged-2 Protein/genetics, Muscular Dystrophies/genetics, Adolescent, Adult, Amino Acid Sequence, Animals, Cell Line, Child, Preschool Child, Drosophila Proteins/genetics, Drosophila melanogaster/genetics, Female, Glucosyltransferases/genetics, Haplotypes/genetics, Humans, Jagged-1 Protein/genetics, Jagged-2 Protein/chemistry, Jagged-2 Protein/deficiency, Jagged-2 Protein/metabolism, Male, Membrane Proteins/genetics, Mice, Middle Aged, Molecular Models, Muscles/metabolism, Muscles/pathology, Muscular Dystrophies/pathology, Myoblasts/metabolism, Myoblasts/pathology, Pedigree, Phenotype, Notch Receptors/metabolism, Signal Transduction, Exome Sequencing, Young Adult
17.
J Exp Med ; 218(6)2021 06 07.
Article in English | MEDLINE | ID: mdl-33857290

ABSTRACT

Advances in genome sequencing have resulted in the identification of the causes for numerous rare diseases. However, many cases remain unsolved with standard molecular analyses. We describe a family presenting with a phenotype resembling inherited thrombocytopenia 2 (THC2). THC2 is generally caused by single nucleotide variants that prevent silencing of ANKRD26 expression during hematopoietic differentiation. Short-read whole-exome and genome sequencing approaches were unable to identify a causal variant in this family. Using long-read whole-genome sequencing, a large complex structural variant involving a paired-duplication inversion was identified. Through functional studies, we show that this structural variant results in a pathogenic gain-of-function WAC-ANKRD26 fusion transcript. Our findings illustrate how complex structural variants that may be missed by conventional genome sequencing approaches can cause human disease.


Subject(s)
Signal Transducing Adaptor Proteins/genetics, Intercellular Signaling Peptides and Proteins/genetics, Single Nucleotide Polymorphism/genetics, Thrombocytopenia/genetics, Adolescent, Adult, Aged, Cell Line, Tumor Cell Line, Child, Chromosome Breakage, Chromosome Disorders/genetics, Exome/genetics, Female, HEK293 Cells, HeLa Cells, Humans, Male, Middle Aged, Mutation/genetics, Pedigree, Thrombocytopenia/congenital
19.
Nat Med ; 26(6): 869-877, 2020 06.
Article in English | MEDLINE | ID: mdl-32461697

ABSTRACT

Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns [5-8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9], 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work [10], confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.


Subject(s)
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Loss of Function Mutation/genetics, Adult, Aged, Aged 80 and over, Biological Specimen Banks, Cell Line, Embryonic Stem Cells/metabolism, Female, Gain of Function Mutation/genetics, Heterozygote, Humans, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism, Longevity/genetics, Lymphocytes/metabolism, Male, Middle Aged, Cardiac Myocytes/metabolism, Parkinson Disease/drug therapy, Parkinson Disease/genetics, Phenotype
20.
mSystems ; 4(6)2019 Nov 26.
Article in English | MEDLINE | ID: mdl-31771976

ABSTRACT

Interactions between the gut microbiome and immunoglobulin A (IgA) in the gut during infancy are important for future health. IgM and IgG are also present in the gut; however, their interactions with the microbiome in the developing infant remain to be characterized. Using stool samples sampled 15 times in infancy from 32 healthy subjects at 4 locations in 3 countries, we characterized patterns of microbiome development in relation to fecal levels of IgA, IgG, and IgM. For 8 infants from a single location, we used fluorescence-activated cell sorting of microbial cells from stool by Ig-coating status over 18 months. We used 16S rRNA gene profiling on full and sorted microbiomes to assess patterns of antibody coating in relation to age and other factors. All antibodies decreased in concentration with age but were augmented by breastmilk feeding regardless of infant age. Levels of IgA correlated with relative abundances of operational taxonomic units (OTUs) belonging to the Bifidobacteria and Enterobacteriaceae, which dominated the early microbiome, and IgG levels correlated with Haemophilus. The diversity of Ig-coated microbiota was influenced by breastfeeding and age. IgA and IgM coated the same microbiota, which reflected the overall diversity of the microbiome, while IgG targeted a different subset. Blautia generally evaded antibody coating, while members of the Bifidobacteria and Enterobacteriaceae were high in IgA/M. IgA/M displayed similar dynamics, generally coating the microbiome proportionally, and were influenced by breastfeeding status. IgG only coated a small fraction of the commensal microbiota and differed from the proportion targeted by IgA and IgM.

IMPORTANCE: Antibodies are secreted into the gut and attach to roughly half of the trillions of bacterial cells present. When babies are born, the breastmilk supplies these antibodies until the baby's own immune system takes over this task after a few weeks. The vast majority of these antibodies are IgA, but two other types, IgG and IgM, are also present in the gut. Here, we ask if these three different antibody types target different types of bacteria in the infant gut as the infant develops from birth to 18 months old and how patterns of antibody coating of bacteria change with age. In this study of healthy infant samples over time, we found that IgA and IgM coat the same bacteria, which are generally representative of the diversity present, with a few exceptions that were more or less antibody coated than expected. IgG coated a separate suite of bacteria. These results provide a better understanding of how these antibodies interact with the developing infant gut microbiome.

...