ABSTRACT
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Subjects
Exome/genetics, Genetic Variation/genetics, DNA Mutational Analysis, Datasets as Topic, Humans, Phenotype, Proteome/genetics, Rare Diseases/genetics, Sample Size
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1007329.]
ABSTRACT
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 148 AJ enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10-100 fold differences in prevalence between AJ and non-AJ populations of some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease, a common disease with an estimated two to four-fold excess prevalence in AJ. 
We specifically attempt to evaluate whether strong-acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder effect, contribute excess genetic risk to Crohn's disease (CD) in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2 that are enriched in AJ (p < 0.005), including several novel contributing alleles, show evidence of association with CD. Independently, we find that genome-wide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10^-16). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/) to pinpoint genetic variation that contributes to variable disease predisposition across populations.
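The enrichment figures quoted above compare an allele's frequency in AJ with its maximum frequency in other reference populations. A minimal sketch of that ratio follows; the function name, the population labels, and the example frequencies are hypothetical and are not taken from the data set, and the abstract does not describe how the authors tested for significance.

```python
from typing import Dict, Optional


def fold_enrichment(aj_af: float, other_afs: Dict[str, float]) -> Optional[float]:
    """Ratio of the AJ allele frequency to the maximum frequency elsewhere.

    Returns None when the allele is absent from every comparison population,
    because the ratio is undefined in that case.
    """
    max_other = max(other_afs.values(), default=0.0)
    if max_other == 0.0:
        return None
    return aj_af / max_other


# Hypothetical allele: 3% in AJ, at most 0.2% in the other populations.
ratio = fold_enrichment(0.03, {"NFE": 0.002, "AFR": 0.0005, "EAS": 0.0})
print(f"fold enrichment: {ratio:.1f}")
```

Returning None for alleles unseen elsewhere keeps undefined ratios out of any downstream average; a real analysis would also need to account for sampling error at low allele counts.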
Subjects
Crohn Disease/genetics, Genetic Predisposition to Disease/genetics, Jews/genetics, Rare Diseases/genetics, Algorithms, Crohn Disease/epidemiology, Population Genetics, Genome-Wide Association Study, Haplotypes, Humans, Genetic Models, Molecular Epidemiology, Single Nucleotide Polymorphism, Rare Diseases/epidemiology
ABSTRACT
The clinical interpretation of genetic variants has come to rely heavily on reference population databases such as the Exome Aggregation Consortium (ExAC) database. Pathogenic variants in genes associated with severe, pediatric-onset, highly penetrant, autosomal dominant conditions are assumed to be absent or rare in these databases. Exome sequencing of a 6-year-old female patient with seizures, developmental delay, dysmorphic features, and failure to thrive identified an ASXL1 variant previously reported as causative of Bohring-Opitz syndrome (BOS). Surprisingly, the variant was observed seven times in the ExAC database, presumably in individuals without BOS. Although the BOS phenotype fit, the presence of the variant in reference population databases introduced ambiguity in result interpretation. Review of the literature revealed that acquired somatic mosaicism of ASXL1 variants (including pathogenic variants) during hematopoietic clonal expansion can occur with aging in healthy individuals. We examined all ASXL1 truncating variants in the ExAC database and determined most are likely somatic. Failure to consider somatic mosaicism may lead to the inaccurate assumption that conditions like BOS have reduced penetrance, or the misclassification of potentially pathogenic variants.
Subjects
Craniosynostoses/diagnosis, Craniosynostoses/genetics, Genetic Association Studies, Germ-Line Mutation, Intellectual Disability/diagnosis, Intellectual Disability/genetics, Mutation, Repressor Proteins/genetics, Aged, Aged 80 and over, Alleles, Amino Acid Substitution, Preschool Child, Genetic Databases, Facies, Female, Genetic Association Studies/methods, Humans, Infant, Male, Middle Aged, Phenotype
ABSTRACT
PURPOSE: The accurate interpretation of variation in Mendelian disease genes has lagged behind data generation as sequencing has become increasingly accessible. Ongoing large sequencing efforts present huge interpretive challenges, but they also provide an invaluable opportunity to characterize the spectrum and importance of rare variation. METHODS: We analyzed sequence data from 7,855 clinical cardiomyopathy cases and 60,706 Exome Aggregation Consortium (ExAC) reference samples to obtain a better understanding of genetic variation in a representative autosomal dominant disorder. RESULTS: We found that in some genes previously reported as important causes of a given cardiomyopathy, rare variation is not clinically informative because there is an unacceptably high likelihood of false-positive interpretation. By contrast, in other genes, we find that diagnostic laboratories may be overly conservative when assessing variant pathogenicity. CONCLUSIONS: We outline improved analytical approaches that evaluate which genes and variant classes are interpretable and propose that these will increase the clinical utility of testing across a range of Mendelian diseases. Genet Med 19(2), 192-203.
Subjects
Cardiomyopathies/genetics, Inborn Genetic Diseases/genetics, Genetic Testing, Genetic Variation, Cardiomyopathies/epidemiology, Computational Biology, Genetic Databases, Exome/genetics, Inborn Genetic Diseases/physiopathology, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Mutation, Exome Sequencing
ABSTRACT
OBJECTIVES: To longitudinally characterize disease-relevant CSF and plasma biomarkers in individuals at risk for genetic prion disease up to disease conversion. METHODS: This single-center longitudinal cohort study has followed known carriers of PRNP pathogenic variants at risk for prion disease, individuals with a close relative who died of genetic prion disease but who have not undergone predictive genetic testing, and controls. All participants were asymptomatic at first visit and returned roughly annually. We determined PRNP genotypes, measured NfL and GFAP in plasma, and RT-QuIC, total PrP, NfL, T-tau, and beta-synuclein in CSF. RESULTS: Among 41 carriers and 21 controls enrolled, 28 (68%) and 15 (71%) were female, and mean ages were 47.5 and 46.1 years. At baseline, all individuals were asymptomatic. We observed RT-QuIC seeding activity in the CSF of 3 asymptomatic E200K carriers who subsequently converted to symptomatic and died of prion disease. One P102L carrier remained RT-QuIC negative through symptom conversion. No other individuals developed symptoms. The prodromal window from detection of RT-QuIC positivity to disease onset was 1 year long in an E200K individual homozygous (V/V) at PRNP codon 129 and 2.5 and 3.1 years in 2 codon 129 heterozygotes (M/V). Changes in neurodegenerative and neuroinflammatory markers were variably observed prior to onset, with increases observed for plasma NfL in 4/4 converters, and plasma GFAP, CSF NfL, CSF T-tau, and CSF beta-synuclein each in 2/4 converters, although values relative to age and fold changes relative to individual baseline were not remarkable for any of these markers. CSF PrP was longitudinally stable with mean coefficient of variation 9.0% across all individuals over up to 6 years, including data from converting individuals at RT-QuIC-positive timepoints. DISCUSSION: CSF prion seeding activity may represent the earliest detectable prodromal sign in E200K carriers.
Neuronal damage and neuroinflammation markers show limited sensitivity in the prodromal phase. CSF PrP levels remain stable even in the presence of RT-QuIC seeding activity. CLINICAL TRIALS REGISTRATION: ClinicalTrials.gov NCT05124392 posted 2017-12-01, updated 2023-01-27.
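The longitudinal stability of CSF PrP is summarized above as a mean coefficient of variation (CV). The abstract does not state exactly how the CV was computed, so the sketch below makes two assumptions: the CV is taken per individual across visits using the sample standard deviation, and the per-individual CVs are then averaged. All values and identifiers are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def mean_cv(per_individual: Dict[str, List[float]]) -> float:
    """Average the per-individual CVs, skipping anyone with a single visit."""
    cvs = [
        coefficient_of_variation(visits)
        for visits in per_individual.values()
        if len(visits) >= 2
    ]
    return mean(cvs)


# Hypothetical CSF PrP measurements (arbitrary units), one list per participant.
measurements = {
    "participant_a": [90.0, 100.0, 110.0],
    "participant_b": [200.0, 200.0, 200.0],
    "participant_c": [150.0],  # single visit, excluded from the average
}
print(f"mean CV: {mean_cv(measurements):.1f}%")
```

Individuals with a single visit are skipped because a standard deviation needs at least two measurements.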
Subjects
Biomarkers, Prion Diseases, Prion Proteins, Humans, Female, Male, Middle Aged, Biomarkers/cerebrospinal fluid, Biomarkers/blood, Prion Proteins/genetics, Prion Proteins/cerebrospinal fluid, Prion Proteins/blood, Prion Diseases/genetics, Prion Diseases/cerebrospinal fluid, Prion Diseases/blood, Prion Diseases/diagnosis, Longitudinal Studies, Adult, tau Proteins/cerebrospinal fluid, tau Proteins/blood, Neurofilament Proteins/cerebrospinal fluid, Neurofilament Proteins/blood, Heterozygote, Glial Fibrillary Acidic Protein/blood, Glial Fibrillary Acidic Protein/cerebrospinal fluid, Glial Fibrillary Acidic Protein/genetics, Disease Progression, alpha-Synuclein/cerebrospinal fluid, alpha-Synuclein/genetics, alpha-Synuclein/blood
ABSTRACT
Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns [5-8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9], 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work [10], confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.
Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Loss of Function Mutation/genetics, Adult, Aged, Aged 80 and over, Biological Specimen Banks, Cell Line, Embryonic Stem Cells/metabolism, Female, Gain of Function Mutation/genetics, Heterozygote, Humans, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism, Longevity/genetics, Lymphocytes/metabolism, Male, Middle Aged, Cardiac Myocytes/metabolism, Parkinson Disease/drug therapy, Parkinson Disease/genetics, Phenotype
ABSTRACT
There are few disease-modifying therapeutics for neurodegenerative diseases, but successes in the development of antisense oligonucleotide (ASO) therapeutics for spinal muscular atrophy and Duchenne muscular dystrophy predict a robust future for ASOs in medicine. Indeed, existing pipelines for the development of ASO therapies for spinocerebellar ataxias, Huntington disease, Alzheimer disease, amyotrophic lateral sclerosis, Parkinson disease, and others, and increased focus by the pharmaceutical industry on ASO development, strengthen the outlook for using ASOs for neurodegenerative diseases. Perhaps the most significant advantage to ASO therapeutics over other small molecule approaches is that acquisition of the target sequence provides immediate knowledge of putative complementary oligonucleotide therapeutics. In this review, we describe the various types of ASOs, how they are used therapeutically, and the present efforts to develop new ASO therapies that will contribute to a forthcoming toolkit for treating multiple neurodegenerative diseases.
ABSTRACT
This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release. Separate tables are generated for genome builds GRCh37 and GRCh38 as well as for mono-allelic variants and complex multi-allelic variants. Additionally, the tables are augmented with allele frequencies from the ExAC and gnomAD datasets as these are often consulted when analyzing ClinVar variants. Overall, this work provides ClinVar data in a format that is easier to work with and can be directly loaded into a variety of popular analysis tools such as R, Python pandas, and SQL databases.
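As an illustration of loading a tab-delimited table into an SQL database, the following sketch uses only the Python standard library (csv and sqlite3). The three-column sample and its column names are assumptions made for this example; the actual tables produced by the repository contain many more columns, whose names should be checked against the files themselves.

```python
import csv
import io
import sqlite3

# Hypothetical three-column excerpt; column names are illustrative assumptions.
SAMPLE_TSV = (
    "chrom\tpos\tclinical_significance\n"
    "1\t12345\tPathogenic\n"
    "2\t67890\tBenign\n"
)


def load_tsv_into_sqlite(tsv_text: str, table: str = "clinvar") -> sqlite3.Connection:
    """Create an in-memory SQLite table with one TEXT column per TSV header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames or []
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([row[name] for name in columns] for row in reader),
    )
    conn.commit()
    return conn


conn = load_tsv_into_sqlite(SAMPLE_TSV)
pathogenic = conn.execute(
    "SELECT chrom, pos FROM clinvar WHERE clinical_significance = 'Pathogenic'"
).fetchall()
print(pathogenic)  # [('1', '12345')]
```

Every column is stored as TEXT to keep the sketch short; for real analyses, positions and allele frequencies would normally be cast to numeric types.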
ABSTRACT
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.