Results 1 - 20 of 158
1.
Genome Res ; 34(5): 796-809, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38749656

ABSTRACT

Underrepresented populations are often excluded from genomic studies owing in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4094 whole genomes from 80 populations in the HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also show substantial added value from this data set compared with the prior versions of the component resources, typically combined via liftOver and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared with previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.
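The site frequency spectra mentioned above count how many variant sites carry each allele count in the sample. The Python sketch below is illustrative only and is not the authors' pipeline. It assumes per-site alternate-allele counts have already been computed over n sampled chromosomes (two per diploid genome). The unfolded version also assumes the alternate allele is the derived one, which a real analysis would check against an ancestral reference.

```python
from collections import Counter


def site_frequency_spectrum(alt_allele_counts, n_chromosomes):
    """Unfolded SFS: number of sites at each allele count 1..n-1.

    Assumes the alternate allele is the derived allele (a simplification).
    Monomorphic sites (count 0 or n) are excluded.
    """
    counts = Counter(c for c in alt_allele_counts if 0 < c < n_chromosomes)
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]


def folded_sfs(alt_allele_counts, n_chromosomes):
    """Folded SFS indexed by minor-allele count 1..n//2 (no polarization needed)."""
    minor = (min(c, n_chromosomes - c) for c in alt_allele_counts)
    counts = Counter(m for m in minor if m > 0)
    return [counts.get(k, 0) for k in range(1, n_chromosomes // 2 + 1)]
```

For three diploid genomes (six chromosomes) with alternate-allele counts [1, 1, 2, 5, 6, 0, 3], the unfolded spectrum is [2, 1, 1, 0, 1] and the folded spectrum is [3, 1, 1].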


Subject(s)
Databases, Genetic , Genome, Human , Humans , Human Genome Project , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genomics/methods
2.
bioRxiv ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38746320

ABSTRACT

Pediatric solid tumors are rare malignancies that represent a leading cause of death by disease among children in developed countries. The early age-of-onset of these tumors suggests that germline genetic factors are involved, yet conventional germline testing for short coding variants in established predisposition genes only identifies pathogenic events in 10-15% of patients. Here, we examined the role of germline structural variants (SVs), an underexplored form of germline variation, in pediatric extracranial solid tumors using germline genome sequencing of 1,766 affected children, their 943 unaffected relatives, and 6,665 adult controls. We discovered a sex-biased association between very large (>1 megabase) germline chromosomal abnormalities and a four-fold increased risk of solid tumors in male children. The overall impact of germline SVs was greatest in neuroblastoma, where we revealed burdens of ultra-rare SVs that cause loss-of-function of highly expressed, mutationally intolerant, neurodevelopmental genes, as well as noncoding SVs predicted to disrupt three-dimensional chromatin domains in neural crest-derived tissues. Collectively, our results implicate rare germline SVs as a predisposing factor to pediatric solid tumors that may guide future studies and clinical practice.
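A case-control burden comparison like the one described above can be framed as a 2x2 table of carriers versus non-carriers in cases and controls. The abstract does not say which test the authors used, so the Python sketch below is a generic illustration: it computes a one-sided Fisher's exact p-value (enrichment in cases) from the hypergeometric distribution, using only the standard library. Any counts passed to it here are hypothetical.

```python
from math import comb


def fisher_one_sided_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in cases.

    Table layout (hypothetical counts):
        cases:    a carriers, b non-carriers
        controls: c carriers, d non-carriers
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_cases)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, x) * comb(n_total - n_carriers, n_cases - x)
        for x in range(a, upper + 1)
    )
    return tail / denom


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when a zero cell sits in the denominator."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)
```

For example, fisher_one_sided_greater(3, 0, 0, 3) returns 0.05, and a table with no case carriers returns 1.0.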

4.
bioRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation[1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD)[13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar[14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs)[15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
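The 20% threshold above is a cutoff on the ratio of observed to expected missense variation within a region. The Python sketch below applies that cutoff only. It assumes region boundaries and expected counts are already available from some null mutational model, and it does not reproduce the authors' region segmentation, mutational model, or the MPC score. The Region type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical container for one candidate region of a transcript."""
    name: str
    observed: int    # rare missense variants observed in the region
    expected: float  # count predicted by a null mutational model


def oe_ratio(region: Region) -> float:
    """Observed/expected missense ratio (lower means more constrained)."""
    if region.expected <= 0:
        raise ValueError(f"expected count must be positive for {region.name}")
    return region.observed / region.expected


def moderate_support_regions(regions, threshold=0.2):
    """Names of regions retaining less than `threshold` of expected missense variation."""
    return [r.name for r in regions if oe_ratio(r) < threshold]
```

With regions A (5 observed, 50 expected), B (30, 40), and C (12, 50), only A falls below 20% of its expected variation.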

5.
medRxiv ; 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38585811

ABSTRACT

Purpose: To identify genetic etiologies and genotype/phenotype associations for unsolved ocular congenital cranial dysinnervation disorders (oCCDDs). Methods: We coupled phenotyping with exome or genome sequencing of 467 pedigrees with genetically unsolved oCCDDs, integrating analyses of pedigrees, human and animal model phenotypes, and de novo variants to identify rare candidate single nucleotide variants, insertions/deletions, and structural variants disrupting protein-coding regions. Prioritized variants were classified for pathogenicity and evaluated for genotype/phenotype correlations. Results: Analyses elucidated phenotypic subgroups, identified pathogenic/likely pathogenic variant(s) in 43/467 probands (9.2%), and prioritized variants of uncertain significance in 70/467 additional probands (15.0%). These included known and novel variants in established oCCDD genes, genes associated with syndromes that sometimes include oCCDDs (e.g., MYH10, KIF21B, TGFBR2, TUBB6), genes that fit the syndromic component of the phenotype but had no prior oCCDD association (e.g., CDK13, TGFB2), genes with no reported association with oCCDDs or the syndromic phenotypes (e.g., TUBA4A, KIF5C, CTNNA1, KLB, FGF21), and genes associated with oCCDD phenocopies that had resulted in misdiagnoses. Conclusion: This study suggests that unsolved oCCDDs are clinically and genetically heterogeneous disorders often overlapping other Mendelian conditions and nominates many candidates for future replication and functional studies.

6.
bioRxiv ; 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38496583

ABSTRACT

Epigenome editing with DNA-targeting technologies such as CRISPR-dCas9 can be used to dissect gene regulatory mechanisms and potentially treat associated disorders. For example, Prader-Willi Syndrome (PWS) is caused by loss of paternally expressed imprinted genes on chromosome 15q11.2-q13.3, although the maternal allele is intact but epigenetically silenced. Using CRISPR repression and activation screens in human induced pluripotent stem cells (iPSCs), we identified genomic elements that control expression of the PWS gene SNRPN from the paternal and maternal chromosomes. We showed that either targeted transcriptional activation or DNA demethylation can activate the silenced maternal SNRPN and downstream PWS transcripts. However, these two approaches function at unique regions, preferentially activating different transcript variants and involving distinct epigenetic reprogramming mechanisms. Remarkably, transient expression of the targeted demethylase leads to stable, long-term maternal SNRPN expression in PWS iPSCs. This work uncovers targeted epigenetic manipulations to reprogram a disease-associated imprinted locus and suggests possible therapeutic interventions.

8.
Genet Med ; 26(5): 101076, 2024 05.
Article in English | MEDLINE | ID: mdl-38258669

ABSTRACT

PURPOSE: Genome sequencing (GS)-specific diagnostic rates in prospective, tightly ascertained, exome sequencing (ES)-negative intellectual disability (ID) cohorts have not been reported extensively. METHODS: ES, GS, epigenetic signatures, and long-read sequencing diagnoses were assessed in 74 trios with at least moderate ID. RESULTS: The ES diagnostic yield was 42 of 74 (57%). GS diagnoses were made in 9 of 32 (28%) ES-unresolved families. Repeating ES with a contemporary pipeline on the GS-diagnosed families identified 8 of the 9 single-nucleotide variations/copy-number variations that older ES had missed, confirming a GS-unique diagnostic rate of 1 in 32 (3%). Episignatures contributed diagnostic information in 9%, with GS corroboration in 1 of 32 (3%) and diagnostic clues in 2 of 32 (6%). A genetic etiology for ID was detected in 51 of 74 (69%) families. Twelve candidate disease genes were identified. Contemporary ES followed by GS cost US$4976 (95% CI: $3704; $6969) per diagnosis, whereas first-line GS cost $7062 (95% CI: $6210; $8475) per diagnosis. CONCLUSION: Performing GS only in ID trios would be cost-equivalent to ES if GS were available at $2435, about a 60% reduction from current prices. This study demonstrates that first-line GS achieves a higher diagnostic rate than contemporary ES but at a higher cost.
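The yields above are simple proportions, and cost per diagnosis is conventionally total testing cost divided by the number of diagnoses. The Python sketch below reproduces the reported percentages from the counts in the abstract as a consistency check. The cost_per_diagnosis helper is a generic illustration with hypothetical inputs; it does not reproduce the study's cost model or confidence intervals.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


def cost_per_diagnosis(total_cost: float, n_diagnosed: int) -> float:
    """Generic definition: total testing cost divided by diagnoses made."""
    if n_diagnosed == 0:
        raise ValueError("no diagnoses; cost per diagnosis is undefined")
    return total_cost / n_diagnosed


# Counts as reported in the abstract above.
reported = {
    "ES yield, all trios": pct(42, 74),
    "GS yield, ES-unresolved families": pct(9, 32),
    "GS-unique diagnoses": pct(1, 32),
    "any genetic etiology": pct(51, 74),
}
```

The computed values are 57, 28, 3, and 69, matching the rounded percentages in the abstract.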


Subject(s)
Exome Sequencing , Exome , Intellectual Disability , Humans , Intellectual Disability/genetics , Intellectual Disability/diagnosis , Male , Female , Exome/genetics , Exome Sequencing/economics , Cohort Studies , Genetic Testing/economics , Genetic Testing/methods , Whole Genome Sequencing/economics , Child , Genome, Human/genetics , DNA Copy Number Variations/genetics , Polymorphism, Single Nucleotide/genetics , Child, Preschool
11.
Prenat Diagn ; 44(4): 454-464, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38242839

ABSTRACT

Advances in sequencing and imaging technologies enable enhanced assessment in the prenatal space, with a goal to diagnose and predict the natural history of disease, to direct targeted therapies, and to implement clinical management, including transfer of care, election of supportive care, and selection of surgical interventions. The current lack of standardization and aggregation stymies variant interpretation and gene discovery, which hinders the provision of prenatal precision medicine, leaving clinicians and patients without an accurate diagnosis. With large amounts of data generated, it is imperative to establish standards for data collection, processing, and aggregation. Aggregated and homogeneously processed genetic and phenotypic data permits dissection of the genomic architecture of prenatal presentations of disease and provides a dataset on which data analysis algorithms can be tuned to the prenatal space. Here we discuss the importance of generating aggregate data sets and how the prenatal space is driving the development of interoperable standards and phenotype-driven tools.


Subject(s)
Precision Medicine , Prenatal Diagnosis , Pregnancy , Female , Humans , Phenotype , Genomics , Algorithms
12.
Sci Rep ; 14(1): 570, 2024 01 04.
Article in English | MEDLINE | ID: mdl-38177237

ABSTRACT

Familial dysautonomia (FD) is a rare recessive neurodevelopmental disease caused by a splice mutation in the Elongator acetyltransferase complex subunit 1 (ELP1) gene. This mutation results in a tissue-specific reduction of ELP1 protein, with the lowest levels in the central and peripheral nervous systems (CNS and PNS, respectively). FD patients exhibit complex neurological phenotypes due to the loss of sensory and autonomic neurons. Disease symptoms include decreased pain and temperature perception, impaired or absent myotatic reflexes, proprioceptive ataxia, and progressive retinal degeneration. While the involvement of the PNS in FD pathogenesis has been clearly recognized, the underlying mechanisms responsible for the preferential neuronal loss remain unknown. In this study, we aimed to elucidate the molecular mechanisms underlying FD by conducting a comprehensive transcriptome analysis of neuronal tissues from the phenotypic mouse model TgFD9; Elp1Δ20/flox. This mouse recapitulates the same tissue-specific ELP1 mis-splicing observed in patients while modeling many of the disease manifestations. Comparison of FD and control transcriptomes from dorsal root ganglion (DRG), trigeminal ganglion (TG), medulla (MED), cortex, and spinal cord (SC) showed significantly more differentially expressed genes (DEGs) in the PNS than the CNS. We then identified genes that were tightly co-expressed and functionally dependent on the level of full-length ELP1 transcript. These genes, defined as ELP1 dose-responsive genes, were combined with the DEGs to generate tissue-specific dysregulated FD signature genes and networks. Within the PNS networks, we observed direct connections between Elp1 and genes involved in tRNA synthesis and genes related to amine metabolism and synaptic signaling. 
Importantly, transcriptomic dysregulation in PNS tissues exhibited enrichment for neuronal subtype markers associated with peptidergic nociceptors and myelinated sensory neurons, which are known to be affected in FD. In summary, this study has identified critical tissue-specific gene networks underlying the etiology of FD and provides new insights into the molecular basis of the disease.


Subject(s)
Dysautonomia, Familial , Humans , Mice , Animals , Dysautonomia, Familial/genetics , Dysautonomia, Familial/metabolism , Dysautonomia, Familial/pathology , Carrier Proteins/metabolism , Peripheral Nervous System/metabolism , Sensory Receptor Cells/metabolism , Gene Expression Profiling , Gene Expression
13.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-36747613

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high-quality set of 4,094 whole genomes from HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality-control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

14.
Cell Rep Methods ; 4(1): 100672, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38091988

ABSTRACT

New technologies and large-cohort studies have enabled novel variant discovery and association at unprecedented scale, yet functional characterization of these variants remains paramount to deciphering disease mechanisms. Approaches that facilitate parallelized genome editing of cells of interest or induced pluripotent stem cells (iPSCs) have become critical tools toward this goal. Here, we developed an approach that incorporates libraries of CRISPR-Cas9 guide RNAs (gRNAs) together with inducible Cas9 into a piggyBac (PB) transposon system to engineer dozens to hundreds of genomic variants in parallel against isogenic cellular backgrounds. This method empowers loss-of-function (LoF) studies through the introduction of insertions or deletions (indels) and copy-number variants (CNVs), though generating specific nucleotide changes is possible with prime editing. The ability to rapidly establish high-quality mutational models at scale will facilitate the development of isogenic cellular collections and catalyze comparative functional genomic studies investigating the roles of hundreds of genes and mutations in development and disease.


Subject(s)
CRISPR-Cas Systems , Induced Pluripotent Stem Cells , Humans , Gene Editing/methods , Mutation , Genomics
15.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders[1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
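A depletion of variation, as described above, is often summarized by comparing observed and expected variant counts in a genomic window. The Python sketch below uses a Poisson-approximation z-score, (expected - observed) / sqrt(expected), so that positive values mean fewer variants than expected. This is a generic illustration; it is not claimed to match Gnocchi's exact definition, window size, or mutational model, and the window tuples are hypothetical.

```python
from math import sqrt


def depletion_z(observed: int, expected: float) -> float:
    """Poisson-approximation z-score; positive means depleted (more constrained)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / sqrt(expected)


def rank_windows(windows):
    """Sort (window_id, observed, expected) tuples from most to least depleted."""
    return sorted(windows, key=lambda w: depletion_z(w[1], w[2]), reverse=True)
```

A window with 64 observed variants against 100 expected scores 3.6, whereas a window with 120 observed against 100 expected scores -2.0.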


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
17.
bioRxiv ; 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37808686

ABSTRACT

Familial dysautonomia (FD) is a rare recessive neurodevelopmental disease caused by a splice mutation in the Elongator acetyltransferase complex subunit 1 (ELP1) gene. This mutation results in a tissue-specific reduction of ELP1 protein, with the lowest levels in the central and peripheral nervous systems (CNS and PNS, respectively). FD patients exhibit complex neurological phenotypes due to the loss of sensory and autonomic neurons. Disease symptoms include decreased pain and temperature perception, impaired or absent myotatic reflexes, proprioceptive ataxia, and progressive retinal degeneration. While the involvement of the PNS in FD pathogenesis has been clearly recognized, the underlying mechanisms responsible for the preferential neuronal loss remain unknown. In this study, we aimed to elucidate the molecular mechanisms underlying FD by conducting a comprehensive transcriptome analysis of neuronal tissues from the phenotypic mouse model TgFD9; Elp1Δ20/flox. This mouse recapitulates the same tissue-specific ELP1 mis-splicing observed in patients while modeling many of the disease manifestations. Comparison of FD and control transcriptomes from dorsal root ganglion (DRG), trigeminal ganglion (TG), medulla (MED), cortex, and spinal cord (SC) showed significantly more differentially expressed genes (DEGs) in the PNS than the CNS. We then identified genes that were tightly co-expressed and functionally dependent on the level of full-length ELP1 transcript. These genes, defined as ELP1 dose-responsive genes, were combined with the DEGs to generate tissue-specific dysregulated FD signature genes and networks. Within the PNS networks, we observed direct connections between Elp1 and genes involved in tRNA synthesis and genes related to amine metabolism and synaptic signaling. 
Importantly, transcriptomic dysregulation in PNS tissues exhibited enrichment for neuronal subtype markers associated with peptidergic nociceptors and myelinated sensory neurons, which are known to be affected in FD. In summary, this study has identified critical tissue-specific gene networks underlying the etiology of FD and provides new insights into the molecular basis of the disease.

18.
Am J Hum Genet ; 110(9): 1454-1469, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37595579

ABSTRACT

Short-read genome sequencing (GS) holds the promise of becoming the primary diagnostic approach for the assessment of autism spectrum disorder (ASD) and fetal structural anomalies (FSAs). However, few studies have comprehensively evaluated its performance against current standard-of-care diagnostic tests: karyotype, chromosomal microarray (CMA), and exome sequencing (ES). To assess the clinical utility of GS, we compared its diagnostic yield against these three tests in 1,612 quartet families including an individual with ASD and in 295 prenatal families. Our GS analytic framework identified a diagnostic variant in 7.8% of ASD probands, almost 2-fold more than CMA (4.3%) and 3-fold more than ES (2.7%). However, when we systematically captured copy-number variants (CNVs) from the exome data, the diagnostic yield of ES (7.4%) was brought much closer to, but did not surpass, GS. Similarly, we estimated that GS could achieve an overall diagnostic yield of 46.1% in unselected FSAs, representing a 17.2% increased yield over karyotype, 14.1% over CMA, and 4.1% over ES with CNV calling or 36.1% over ES without CNV discovery. Overall, GS provided an added diagnostic yield of 0.4% and 0.8% beyond the combination of all three standard-of-care tests in ASD and FSAs, respectively. This corresponded to nine GS unique diagnostic variants, including sequence variants in exons not captured by ES, structural variants (SVs) inaccessible to existing standard-of-care tests, and SVs where the resolution of GS changed variant classification. Overall, this large-scale evaluation demonstrated that GS significantly outperforms each individual standard-of-care test while also outperforming the combination of all three tests, thus warranting consideration as the first-tier diagnostic approach for the assessment of ASD and FSAs.
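The "almost 2-fold" and "3-fold" comparisons above are ratios of diagnostic yields, whereas the ES-with-CNV comparison is closer to a difference in percentage points. The Python sketch below makes both quantities explicit using the ASD yields reported in the abstract; it is plain arithmetic on the published percentages and does not reproduce the study's statistical analysis.

```python
# Diagnostic yields (%) in ASD probands, as reported in the abstract above.
ASD_YIELDS = {"GS": 7.8, "CMA": 4.3, "ES": 2.7, "ES+CNV": 7.4}


def fold_change(test: str, baseline: str, yields=ASD_YIELDS) -> float:
    """Ratio of two diagnostic yields (e.g. 'almost 2-fold')."""
    return yields[test] / yields[baseline]


def point_difference(test: str, baseline: str, yields=ASD_YIELDS) -> float:
    """Absolute difference in percentage points."""
    return yields[test] - yields[baseline]


for baseline in ("CMA", "ES", "ES+CNV"):
    print(f"GS vs {baseline}: {fold_change('GS', baseline):.2f}-fold, "
          f"{point_difference('GS', baseline):+.1f} points")
```

GS comes out at about 1.81-fold over CMA and 2.89-fold over ES, consistent with the rounded claims, but only 0.4 percentage points above ES with CNV calling.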


Subject(s)
Autism Spectrum Disorder , Female , Pregnancy , Humans , Autism Spectrum Disorder/diagnosis , Autism Spectrum Disorder/genetics , Pregnancy Trimester, First , Ultrasonography, Prenatal , Chromosome Mapping , Exome
19.
Nat Genet ; 55(9): 1589-1597, 2023 09.
Article in English | MEDLINE | ID: mdl-37604963

ABSTRACT

Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the Genome Analysis Toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
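Read-depth CNV discovery, as described above, rests on the idea that a deletion lowers and a duplication raises normalized coverage relative to a reference cohort. The Python sketch below is a deliberately naive threshold caller that illustrates only that idea. It is not GATK-gCNV, which uses a considerably more elaborate probabilistic model. The ratio thresholds and the minimum run length are hypothetical defaults.

```python
from statistics import median


def copy_ratios(sample_depths, cohort_depths):
    """Per-exon copy ratio of one sample relative to a reference cohort.

    sample_depths: read counts per exon for the sample of interest
    cohort_depths: list of per-sample read-count lists (same exon order)
    Each sample is scaled by its total depth to remove library-size effects.
    """
    def scale(depths):
        total = sum(depths)
        return [d / total for d in depths]

    sample = scale(sample_depths)
    reference = [scale(c) for c in cohort_depths]
    ratios = []
    for i, value in enumerate(sample):
        baseline = median(r[i] for r in reference)
        ratios.append(value / baseline if baseline > 0 else float("nan"))
    return ratios


def classify(ratio, del_max=0.75, dup_min=1.25):
    """Map a copy ratio to 'DEL', 'DUP', or None (NaN maps to None)."""
    if ratio < del_max:
        return "DEL"
    if ratio > dup_min:
        return "DUP"
    return None


def call_cnvs(ratios, min_exons=2):
    """Merge consecutive exons in the same state; keep runs of >= min_exons.

    Returns (state, first_exon_index, last_exon_index) tuples.
    """
    calls, state, start = [], None, 0
    for i, ratio in enumerate(ratios + [1.0]):  # sentinel closes the final run
        current = classify(ratio)
        if current != state:
            if state is not None and i - start >= min_exons:
                calls.append((state, start, i - 1))
            state, start = current, i
    return calls
```

For a sample with exon depths [100, 50, 50, 100, 150, 150] against a cohort with uniform depth of 100 per exon, the ratios are roughly [1.0, 0.5, 0.5, 1.0, 1.5, 1.5], and the caller reports a deletion over exons 1-2 and a duplication over exons 4-5.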


Subject(s)
DNA Copy Number Variations , Exome , Humans , Exome/genetics , Exome Sequencing , DNA Copy Number Variations/genetics , Chromosome Mapping , Exons
20.
Am J Hum Genet ; 110(8): 1229-1248, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37541186

ABSTRACT

Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.


Subject(s)
Exome , Genetic Testing , Humans , Exome/genetics , Sequence Analysis, DNA , Phenotype , Exome Sequencing , Rare Diseases