ABSTRACT
We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.
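The abstract does not say how the false discovery rate is computed, and the study's own framework may well differ. Purely to illustrate what selecting genes "at a false discovery rate of 0.1 or less" means, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure applied to per-gene p-values. The function name and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.1) -> List[int]:
    """Return the indices of hypotheses rejected at the given FDR (BH step-up)."""
    m = len(p_values)
    # Rank hypotheses by ascending p-value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis up to and including that rank.
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values, for illustration only (not study data).
example = [0.5, 0.001, 0.9, 0.02]
selected = benjamini_hochberg(example, fdr=0.1)
print(selected)
```

With these made-up inputs, the second and fourth genes pass the threshold; the others do not.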
Subject(s)
Autistic Disorder/genetics , Cerebral Cortex/growth & development , Exome Sequencing/methods , Developmental Gene Expression Regulation , Neurobiology/methods , Case-Control Studies , Cell Lineage , Cohort Studies , Exome , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Male , Missense Mutation , Neurons/metabolism , Phenotype , Sex Factors , Single-Cell Analysis/methods
ABSTRACT
The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.
Subject(s)
Forecasting/methods , RNA Precursors/genetics , RNA Splicing/genetics , Algorithms , Alternative Splicing/genetics , Autistic Disorder/genetics , Deep Learning , Exons/genetics , Humans , Intellectual Disability/genetics , Introns/genetics , Computer Neural Networks , RNA Precursors/metabolism , RNA Splice Sites/genetics , RNA Splice Sites/physiology
ABSTRACT
Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes [1-5]. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes [6], our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.
Subject(s)
COVID-19 , Cardiovascular Diseases , Humans , Clonal Hematopoiesis/genetics , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics
ABSTRACT
Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care [1] or hospitalization [2-4] after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes in critical disease, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1). Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication; or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.
Subject(s)
COVID-19 , Critical Illness , Human Genome , Host-Pathogen Interactions , Whole Genome Sequencing , ATP-Binding Cassette Transporters , COVID-19/genetics , COVID-19/mortality , COVID-19/pathology , COVID-19/virology , Cell Adhesion Molecules , Critical Care , Critical Illness/mortality , E-Selectin , Factor VIII , Fucosyltransferases , Human Genome/genetics , Genome-Wide Association Study , Host-Pathogen Interactions/genetics , Humans , Interleukin-10 Receptor beta Subunit , C-Type Lectins , Mucin-1 , Nerve Tissue Proteins , Phospholipid Transfer Proteins , Cell Surface Receptors , Repressor Proteins , SARS-CoV-2/pathogenicity , Galactoside 2-alpha-L-fucosyltransferase
ABSTRACT
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing [1] to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study [2]. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Subject(s)
Biological Specimen Banks , Genetic Databases , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Heritable Quantitative Trait , United Kingdom
ABSTRACT
The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) [1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project [2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.
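The abstract gives no formula for the 'proportion expressed across transcripts' metric. As a rough sketch only, assume it is the share of a gene's total transcript expression in one tissue that comes from the transcripts on which a variant carries the annotation of interest. The function name, the transcript identifiers and the expression values below are all hypothetical.

```python
from typing import Dict, Optional, Set


def pext(transcript_tpm: Dict[str, float], annotated: Set[str]) -> Optional[float]:
    """Share of a gene's expression contributed by the annotated transcripts.

    transcript_tpm maps each transcript of the gene to its expression in one
    tissue; annotated is the set of transcripts on which the variant has the
    annotation of interest (for example, pLoF).
    """
    total = sum(transcript_tpm.values())
    if total == 0:
        return None  # gene not expressed in this tissue, so the ratio is undefined
    carrying = sum(tpm for tx, tpm in transcript_tpm.items() if tx in annotated)
    return carrying / total


# Hypothetical gene with two isoforms; the variant is pLoF only on the minor one.
tissue = {"tx_A": 8.0, "tx_B": 2.0}
score = pext(tissue, {"tx_B"})
print(score)
```

Under this assumption, a variant that disrupts only a weakly expressed isoform gets a low score and could be filtered or down-weighted, which matches the use described in the abstract.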
Subject(s)
Disease/genetics , Haploinsufficiency/genetics , Loss of Function Mutation/genetics , Molecular Sequence Annotation , Genetic Transcription , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Datasets as Topic , Developmental Disabilities/genetics , Exons/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Male , Molecular Sequence Annotation/standards , Poisson Distribution , Messenger RNA/analysis , Messenger RNA/genetics , Rare Diseases/diagnosis , Rare Diseases/genetics , Reproducibility of Results , Exome Sequencing
ABSTRACT
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
Subject(s)
Exome/genetics , Essential Genes/genetics , Genetic Variation/genetics , Human Genome/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Genetic Databases , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , Messenger RNA/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing
ABSTRACT
Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75-10.05, p = 5.41 × 10⁻⁷). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.
Subject(s)
COVID-19 , Exome , Humans , Exome/genetics , Genome-Wide Association Study , COVID-19/genetics , Genetic Predisposition to Disease , Toll-Like Receptor 7/genetics , SARS-CoV-2/genetics
ABSTRACT
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.
Subject(s)
COVID-19/diagnosis , COVID-19/genetics , Exome Sequencing , Exome/genetics , Genetic Predisposition to Disease , Hospitalization/statistics & numerical data , COVID-19/immunology , COVID-19/therapy , Female , Humans , Interferons/genetics , Male , Prognosis , SARS-CoV-2 , Sample Size
ABSTRACT
BACKGROUND: Open-label platform trials and a prospective meta-analysis suggest efficacy of anti-interleukin (IL)-6R therapies in hospitalized patients with coronavirus disease 2019 (COVID-19) receiving corticosteroids. This study evaluated the efficacy and safety of sarilumab, an anti-IL-6R monoclonal antibody, in the treatment of hospitalized patients with COVID-19. METHODS: In this adaptive, phase 2/3, randomized, double-blind, placebo-controlled trial, adults hospitalized with COVID-19 received intravenous sarilumab 400 mg or placebo. The phase 3 primary analysis population included patients with critical COVID-19 receiving mechanical ventilation (MV). The primary outcome was proportion of patients with ≥1-point improvement in clinical status from baseline to day 22. RESULTS: There were 457 and 1365 patients randomized and treated in phases 2 and 3, respectively. In phase 3 patients with critical COVID-19 receiving MV (n = 298; 28.2% on corticosteroids), the proportion with ≥1-point improvement in clinical status (alive, not receiving MV) at day 22 was 43.2% for sarilumab and 35.5% for placebo (risk difference, +7.5%; 95% confidence interval [CI], -7.4 to 21.3; P = .3261), a relative risk improvement of 21.7%. In post hoc analyses pooling phase 2 and 3 critical patients receiving MV, the hazard ratio for death for sarilumab vs placebo was 0.76 (95% CI, .51 to 1.13) overall and 0.49 (95% CI, .25 to .94) in patients receiving corticosteroids at baseline. CONCLUSIONS: This study did not establish the efficacy of sarilumab in hospitalized patients with severe/critical COVID-19. Post hoc analyses were consistent with other studies that found a benefit of sarilumab in patients receiving corticosteroids. CLINICAL TRIALS REGISTRATION: NCT04315298.
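The relative improvement quoted in the results can be reproduced from the two response proportions. The sketch below shows that arithmetic; the function names are illustrative and not taken from the trial. The unadjusted difference of the rounded proportions comes to 7.7 percentage points, slightly different from the reported +7.5%. That gap presumably reflects an adjusted or stratified estimate, which is an assumption here and not something the abstract states.

```python
def risk_difference(treated_pct: float, control_pct: float) -> float:
    """Absolute difference in response proportions, in percentage points."""
    return treated_pct - control_pct


def relative_improvement(treated_pct: float, control_pct: float) -> float:
    """Difference expressed as a percentage of the control proportion."""
    return (treated_pct - control_pct) / control_pct * 100.0


# Response proportions reported in the abstract (day 22, critical patients on MV).
sarilumab_pct, placebo_pct = 43.2, 35.5

print(round(risk_difference(sarilumab_pct, placebo_pct), 1))       # unadjusted points
print(round(relative_improvement(sarilumab_pct, placebo_pct), 1))  # percent of control
```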
Subject(s)
COVID-19 Drug Treatment , Adult , Humanized Monoclonal Antibodies , Humans , Prospective Studies , Treatment Outcome
ABSTRACT
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Subject(s)
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
ABSTRACT
The fate of alleles in the human population is believed to be highly affected by the stochastic force of genetic drift. Estimation of the strength of natural selection in humans generally necessitates a careful modeling of drift including complex effects of the population history and structure. Protein-truncating variants (PTVs) are expected to evolve under strong purifying selection and to have a relatively high per-gene mutation rate. Thus, it is appealing to model the population genetics of PTVs under a simple deterministic mutation-selection balance, as has been proposed earlier (Cassa et al. 2017). Here, we investigated the limits of this approximation using both computer simulations and data-driven approaches. Our simulations rely on a model of demographic history estimated from 33,370 individual exomes of the Non-Finnish European subset of the ExAC data set (Lek et al. 2016). Additionally, we compared the African and European subset of the ExAC study and analyzed de novo PTVs. We show that the mutation-selection balance model is applicable to the majority of human genes, but not to genes under the weakest selection.
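As a hedged illustration of the deterministic approximation discussed above: under mutation-selection balance with strong selection against heterozygous carriers, the expected cumulative PTV allele frequency in a gene is approximately the per-gene PTV mutation rate divided by the heterozygous selection coefficient. The symbols (`mutation_rate`, `s_het`), the simplified recursion and the numeric values below are assumptions introduced for this sketch. Drift, demography and recessive effects are deliberately ignored.

```python
def equilibrium_frequency(mutation_rate: float, s_het: float) -> float:
    """Deterministic mutation-selection balance: q is approximately U / s_het."""
    return mutation_rate / s_het


def iterate_frequency(mutation_rate: float, s_het: float,
                      generations: int, start: float = 0.0) -> float:
    """Iterate a simplified recursion: selection removes a fraction s_het of
    existing PTV alleles each generation and mutation adds new ones."""
    q = start
    for _ in range(generations):
        q = q * (1.0 - s_het) + mutation_rate
    return q


# Hypothetical values: per-gene PTV mutation rate 1e-6, s_het = 0.1.
U, S_HET = 1e-6, 0.1
q_after = iterate_frequency(U, S_HET, generations=500)
print(q_after, equilibrium_frequency(U, S_HET))
```

The recursion converges to the closed-form value at a rate set by `s_het`, which is one way to see why the approximation weakens for genes under the weakest selection: convergence is slow, so stochastic effects have more room to dominate.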
Subject(s)
Nonsense Codon , Genetic Drift , Genetic Models , Genetic Selection , Humans , Population Growth
ABSTRACT
The 16p11.2 600 kb copy-number variants (CNVs) are associated with mirror phenotypes on BMI, head circumference, and brain volume and represent frequent genetic lesions in autism spectrum disorders (ASDs) and schizophrenia. Here we interrogated the transcriptome of individuals carrying reciprocal 16p11.2 CNVs. Transcript perturbations correlated with clinical endophenotypes and were enriched for genes associated with ASDs, abnormalities of head size, and ciliopathies. Ciliary gene expression was also perturbed in orthologous mouse models, raising the possibility that ciliary dysfunction contributes to 16p11.2 pathologies. In support of this hypothesis, we found structural ciliary defects in the CA1 hippocampal region of 16p11.2 duplication mice. Moreover, by using an established zebrafish model, we show genetic interaction between KCTD13, a key driver of the mirrored neuroanatomical phenotypes of the 16p11.2 CNV, and ciliopathy-associated genes. Overexpression of BBS7 rescues head size and neuroanatomical defects of kctd13 morphants, whereas suppression or overexpression of CEP290 rescues phenotypes induced by KCTD13 under- or overexpression, respectively. Our data suggest that dysregulation of ciliopathy genes contributes to the clinical phenotypes of these CNVs.
Subject(s)
Pervasive Child Development Disorders/genetics , Human Chromosome Pair 16/genetics , DNA Copy Number Variations/genetics , Schizophrenia/genetics , Animals , Brain , Child , Pervasive Child Development Disorders/pathology , Chromosome Deletion , Ciliary Body/metabolism , Ciliary Body/pathology , Gene Expression Regulation , Humans , Mice , Voltage-Gated Potassium Channels/genetics , Schizophrenia/pathology , Transcriptome , Zebrafish , Zebrafish Proteins/genetics
ABSTRACT
Autism spectrum disorders (ASDs) are a highly heterogeneous group of conditions, both phenotypically and genetically, although the link between phenotypic variation and differences in genetic architecture is unclear. This study aimed to determine whether differences in cognitive impairment and symptom severity reflect variation in the degree to which ASD cases reflect de novo or familial influences. Using data from more than 2,000 simplex cases of ASD, we examined the relationship between intelligence quotient (IQ), behavior and language assessments, and rate of de novo loss of function (LOF) mutations and family history of broadly defined psychiatric disease (depressive disorders, bipolar disorder, and schizophrenia; history of psychiatric hospitalization). Proband IQ was negatively associated with de novo LOF rate (P = 0.03) and positively associated with family history of psychiatric disease (P = 0.003). Female cases had a higher frequency of sporadic genetic events across the severity distribution (P = 0.01). High rates of LOF mutation and low frequencies of family history of psychiatric illness were seen in individuals who were unable to complete a traditional IQ test, a group with the greatest degree of language and behavioral impairment. These analyses provide strong evidence that familial risk for neuropsychiatric disease becomes more relevant to ASD etiology as cases become higher functioning. The findings of this study reinforce that there are many routes to the diagnostic category of autism and could lead to genetic studies with more specific insights into individual cases.
Subject(s)
Pervasive Child Development Disorders/diagnosis , Pervasive Child Development Disorders/genetics , Behavior , Bipolar Disorder/genetics , Pervasive Child Development Disorders/epidemiology , Cognition Disorders , Female , Genetic Predisposition to Disease , Humans , Intelligence Tests , Male , Mutation , Phenotype , Regression Analysis , Risk Factors , Schizophrenia/genetics , Seizures
ABSTRACT
With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.
Subject(s)
Exome/genetics , Genetic Predisposition to Disease , Genetic Variation , Mutation/genetics , High-Throughput Nucleotide Sequencing , Humans , Phenotype
ABSTRACT
Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
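The 20% criterion above can be read as a threshold on a region's observed-to-expected missense ratio. The sketch below shows only that reading. The function names and the example counts are hypothetical, and the expected count is assumed to come from a separate mutational model that is not reproduced here.

```python
def obs_exp_ratio(observed: int, expected: float) -> float:
    """Observed / expected missense variant count for a region."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def is_strongly_depleted(observed: int, expected: float,
                         threshold: float = 0.2) -> bool:
    """True when the region carries less than `threshold` of its expected
    missense variation (0.2 corresponds to the 20% figure in the abstract)."""
    return obs_exp_ratio(observed, expected) < threshold


# Hypothetical regions: (observed missense variants, expected under the null model).
print(is_strongly_depleted(3, 20.0))   # ratio 0.15, below the 20% threshold
print(is_strongly_depleted(10, 20.0))  # ratio 0.50, not below the threshold
```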