ABSTRACT
BACKGROUND: Understanding the genetic basis of human diseases has become integral to drug development and precision medicine. Recent advances have enabled the identification of molecular pathways driving diseases, leading to targeted treatment strategies. The increasing investment in rare diseases by the biotech industry underscores the importance of genetic evidence in drug discovery and approval processes. Here we studied a monogenic Mendelian kidney disease, TRPC6-associated podocytopathy (TRPC6-AP), to present its natural history, genetic spectrum, and clinicopathological associations in a large cohort of patients with causal variants in TRPC6, in order to help define the specific features of the disease and facilitate drug development and clinical trial design. METHODS: The study involved 64 individuals from 39 families with TRPC6 causal missense variants. Clinical data, including age of onset, laboratory results, response to treatment, kidney biopsy findings, and genetic information, were collected from multiple centers nationally and internationally. Exome or targeted sequencing was performed, and variant classification was based on strict criteria. Structural and functional analyses of TRPC6 variants were conducted to understand their impact on protein function. In-depth re-analysis of light and electron microscopy specimens from 9 available kidney biopsies was conducted to identify pathological features and correlates of TRPC6-AP. RESULTS: Large-scale sequencing data did not support causality for TRPC6 protein-truncating variants. We identified 21 unique TRPC6 missense variants, clustering in three distinct regions of the protein and with different effects on TRPC6 3D protein structure. Kidney biopsy analysis revealed focal segmental glomerulosclerosis (FSGS) patterns of injury in most cases, along with distinctive podocyte features including diffuse foot process effacement and swollen cell bodies. The majority of patients presented in adolescence or early adulthood, but with ample variation (mean 22, SD ± 14 years). Progression to kidney failure was frequent, with variable time between presentation and end-stage kidney disease (ESKD). CONCLUSIONS: This study provides insights into the genetic spectrum, clinicopathological associations, and natural history of TRPC6-AP.
ABSTRACT
The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available ( http://milton.public.cgr.astrazeneca.com ).
Subject(s)
Biological Specimen Banks , Biomarkers , Genetic Predisposition to Disease , Genome-Wide Association Study , Machine Learning , Humans , United Kingdom , Genome-Wide Association Study/methods , Case-Control Studies , Multifactorial Inheritance/genetics , Proteomics/methods , Phenotype , Polymorphism, Single Nucleotide , Algorithms , Multiomics , UK Biobank
ABSTRACT
Telomeres protect chromosome ends from damage and their length is linked with human disease and aging. We developed a joint telomere length metric, combining quantitative PCR and whole-genome sequencing measurements from 462,666 UK Biobank participants. This metric increased SNP heritability, suggesting that it better captures genetic regulation of telomere length. Exome-wide rare-variant and gene-level collapsing association studies identified 64 variants and 30 genes significantly associated with telomere length, including allelic series in ACD and RTEL1. Notably, 16% of these genes are known drivers of clonal hematopoiesis, an age-related somatic mosaicism associated with myeloid cancers and several nonmalignant diseases. Somatic variant analyses revealed gene-specific associations with telomere length, including lengthened telomeres in individuals with large SRSF2-mutant clones, compared with shortened telomeres in individuals with clonal expansions driven by other genes. Collectively, our findings demonstrate the impact of rare variants on telomere length, with larger effects observed among genes also associated with clonal hematopoiesis.
Subject(s)
Biological Specimen Banks , Polymorphism, Single Nucleotide , Telomere , Whole Genome Sequencing , Humans , Telomere/genetics , United Kingdom , Whole Genome Sequencing/methods , Telomere Homeostasis/genetics , Male , Female , Clonal Hematopoiesis/genetics , Genome-Wide Association Study/methods , Aged , DNA Helicases/genetics , Middle Aged , UK Biobank
ABSTRACT
Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.
Subject(s)
Gene Expression Regulation , Humans , Sequence Analysis, RNA , Genetic Variation , Genomic Structural Variation/genetics , Transcriptome/genetics , Blood Donors
ABSTRACT
The etiology of prostate cancer, the second most common cancer in men globally, has a strong heritable component. While rare coding germline variants in several genes have been identified as risk factors from candidate gene and linkage studies, the exome-wide spectrum of causal rare variants remains to be fully explored. To more comprehensively address their contribution, we analysed data from 37,184 prostate cancer cases and 331,329 male controls from five cohorts with germline exome/genome sequencing and one cohort with imputed array data from a population enriched in low-frequency deleterious variants. Our gene-level collapsing analysis revealed that rare damaging variants in SAMHD1 as well as genes in the DNA damage response pathway (BRCA2, ATM and CHEK2) are associated with the risk of overall prostate cancer. We also found that rare damaging variants in AOX1 and BRCA2 were associated with increased severity of prostate cancer in a case-only analysis of aggressive versus non-aggressive prostate cancer. At the single-variant level, we found rare non-synonymous variants in three genes (HOXB13, CHEK2, BIK) significantly associated with increased risk of overall prostate cancer and in four genes (ANO7, SPDL1, AR, TERT) with decreased risk. Altogether, this study provides deeper insights into the genetic architecture and biological basis of prostate cancer risk and severity.
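As an illustration of what a gene-level collapsing analysis computes, the sketch below counts carriers of any qualifying rare variant in a gene among cases and controls and tests the resulting 2x2 table. It is a minimal sketch under stated assumptions, not the study's pipeline: the abstract does not name the statistical test, so the two-sided Fisher's exact test used here is an assumption, and the function names and sample identifiers are hypothetical.

```python
from math import comb


def collapse_gene(carriers: set[str], cases: set[str], controls: set[str]) -> tuple[int, int, int, int]:
    """Collapse a gene's qualifying variants into one carrier/non-carrier 2x2 table.

    `carriers` holds the IDs of samples with at least one qualifying variant
    in the gene (hypothetical input; qualifying-variant filters are not modelled).
    """
    a = len(carriers & cases)      # case carriers
    b = len(cases) - a             # case non-carriers
    c = len(carriers & controls)   # control carriers
    d = len(controls) - c          # control non-carriers
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    denom = comb(n_cases + n_controls, n_carriers)

    def prob(x: int) -> float:
        # Hypergeometric probability of x case carriers with all margins fixed.
        return comb(n_cases, x) * comb(n_controls, n_carriers - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, n_carriers - n_controls), min(n_cases, n_carriers)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Toy data only; these sample IDs are made up for illustration.
    cases = {"s1", "s2", "s3", "s4"}
    controls = {"s5", "s6", "s7", "s8"}
    carriers = {"s1", "s2", "s3", "s5"}
    table = collapse_gene(carriers, cases, controls)
    print(table, fisher_exact_two_sided(*table))
```

Collapsing pools many individually rare variants into a single per-gene carrier indicator, which is what gives the test power when no single variant is frequent enough to be tested alone.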
ABSTRACT
The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets, to assess gene-disease probabilities throughout the phenome. We use graph neural networks, capturing the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate a 6.9% average classification power boost, achieving a median receiver operating characteristic (ROC) area under curve (AUC) score of 0.90 across 5220 diseases from Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating against underpowered PheWAS associations. Results are exposed through an interactive web resource.
Subject(s)
Neural Networks, Computer , Humans , Algorithms , Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Genomics/methods , Phenomics/methods , Phenotype , UK Biobank , United Kingdom
ABSTRACT
Obesity is a major risk factor for many common diseases and has a substantial heritable component. To identify new genetic determinants, we performed exome-sequence analyses for adult body mass index (BMI) in up to 587,027 individuals. We identified rare loss-of-function variants in two genes (BSN and APBA1) with effects substantially larger than those of well-established obesity genes such as MC4R. In contrast to most other obesity-related genes, rare variants in BSN and APBA1 were not associated with normal variation in childhood adiposity. Furthermore, BSN protein-truncating variants (PTVs) magnified the influence of common genetic variants associated with BMI, with a common variant polygenic score exhibiting an effect twice as large in BSN PTV carriers as in noncarriers. Finally, we explored the plasma proteomic signatures of BSN PTV carriers as well as the functional consequences of BSN deletion in human induced pluripotent stem cell-derived hypothalamic neurons. Collectively, our findings implicate degenerative processes in synaptic function in the etiology of adult-onset obesity.
Subject(s)
Diabetes Mellitus, Type 2 , Induced Pluripotent Stem Cells , Liver Diseases , Nerve Tissue Proteins , Adult , Humans , Adaptor Proteins, Signal Transducing/genetics , Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease , Nerve Tissue Proteins/genetics , Obesity/complications , Obesity/genetics , Proteomics
ABSTRACT
N-methyl-D-aspartate receptors (NMDARs) are members of the glutamate receptor family and participate in excitatory postsynaptic transmission throughout the central nervous system. Genetic variants in GRIN genes encoding NMDAR subunits are associated with a spectrum of neurological disorders. The M3 transmembrane helices of the NMDAR couple directly to the agonist-binding domains and form a helical bundle crossing in the closed receptors that occludes the pore. The M3 helix functions as a transduction element whose conformational change couples ligand binding to opening of an ion-conducting pore. In this study, we report the functional consequences of 48 de novo missense variants in GRIN1, GRIN2A, and GRIN2B that alter residues in the M3 transmembrane helix. These de novo variants were identified in children with neurological and neuropsychiatric disorders including epilepsy, developmental delay, intellectual disability, hypotonia and attention deficit hyperactivity disorder. Of the 48 M3 variants for which comprehensive testing was completed, most produced a gain-of-function (28/48) rather than a loss-of-function (9/48); 11 variants had an indeterminate phenotype. This supports the idea that a key structural feature of the M3 gate exists to stabilize the closed state so that agonist binding can drive channel opening. Given that most M3 variants enhance channel gating, we assessed the potency of FDA-approved NMDAR channel blockers on these variant receptors. These data provide new insight into the structure-function relationship of the NMDAR gate, and suggest that variants within the M3 transmembrane helix predominantly produce a gain-of-function.
Subject(s)
Epilepsy , Receptors, N-Methyl-D-Aspartate , Child , Humans , Epilepsy/genetics , Mutation, Missense , Phenotype , Receptors, N-Methyl-D-Aspartate/genetics , Receptors, N-Methyl-D-Aspartate/metabolism , Signal Transduction
ABSTRACT
Genomic medicine has been transformed by next-generation sequencing (NGS), inclusive of exome sequencing (ES) and genome sequencing (GS). Currently, ES is offered widely in clinical settings, with a less prevalent alternative model consisting of hybrid programs that incorporate research ES along with clinical patient workflows. We were among the earliest to implement a hybrid ES clinic, have provided diagnoses to 45% of probands, and have identified several novel candidate genes. Our program is enabled by a cost-effective investment by the health system and is unique in encompassing all the processes that have been variably included in other hybrid/clinical programs. These include careful patient selection, utilization of a phenotype-agnostic bioinformatics pipeline followed by manual curation of variants and phenotype integration by clinicians, close collaborations between the clinicians and the bioinformatician, pursuit of interesting variants, communication of results to patients in categories that are predicated upon the certainty of a diagnosis, and tracking changes in results over time and the underlying mechanisms for such changes. Due to its effectiveness, scalability to GS and its resource efficiency, specific elements of our paradigm can be incorporated into existing clinical settings, or the entire hybrid model can be implemented within health systems that have genomic medicine programs, to provide NGS in a scientifically rigorous, yet pragmatic setting.
Subject(s)
Computational Biology , Exome , Humans , Exome/genetics , Phenotype , Exome Sequencing , High-Throughput Nucleotide Sequencing
ABSTRACT
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Subject(s)
Exome Sequencing , Genome, Human , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Genetics, Population , Genome, Human/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
ABSTRACT
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort [5]. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.
Subject(s)
Biological Specimen Banks , Blood Proteins , Genetic Association Studies , Genomics , Proteomics , Humans , Alleles , Biomarkers/blood , Blood Proteins/analysis , Blood Proteins/genetics , Databases, Factual , Exome/genetics , Hematopoiesis , Mutation , Plasma/chemistry , United Kingdom
ABSTRACT
The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetically proxied effects of protein targets, such as PCSK9, to additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics [1].
Subject(s)
Biological Specimen Banks , Blood Proteins , Databases, Factual , Genomics , Health , Proteome , Proteomics , Humans , ABO Blood-Group System/genetics , Blood Proteins/analysis , Blood Proteins/genetics , COVID-19/genetics , Drug Discovery , Epistasis, Genetic , Fucosyltransferases/metabolism , Genetic Predisposition to Disease , Plasma/chemistry , Proprotein Convertase 9/metabolism , Proteome/analysis , Proteome/genetics , Public-Private Sector Partnerships , Quantitative Trait Loci , United Kingdom , Galactoside 2-alpha-L-fucosyltransferase
ABSTRACT
Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.
Subject(s)
Genome-Wide Association Study , Thrombosis , Humans , Biological Specimen Banks , Hemostasis , Hemorrhage/genetics , Rare Diseases
ABSTRACT
Genome-wide association studies (GWASs) have established the contribution of common and low-frequency variants to metabolic blood measurements in the UK Biobank (UKB). To complement existing GWAS findings, we assessed the contribution of rare protein-coding variants in relation to 355 metabolic blood measurements, including 325 predominantly lipid-related nuclear magnetic resonance (NMR)-derived blood metabolite measurements (Nightingale Health Plc) and 30 clinical blood biomarkers, using 412,393 exome sequences from four genetically diverse ancestries in the UKB. Gene-level collapsing analyses were conducted to evaluate a diverse range of rare-variant architectures for the metabolic blood measurements. Altogether, we identified significant associations (p < 1 × 10⁻⁸) for 205 distinct genes that involved 1,968 significant relationships for the Nightingale blood metabolite measurements and 331 for the clinical blood biomarkers. These include associations for rare non-synonymous variants in PLIN1 and CREB3L3 with lipid metabolite measurements and SYT7 with creatinine, among others, which may not only provide insights into novel biology but also deepen our understanding of established disease mechanisms. Of the study-wide significant clinical biomarker associations, 40% were not previously detected on analyzing coding variants in a GWAS in the same cohort, reinforcing the importance of studying rare variation to fully understand the genetic architecture of metabolic blood measurements.
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Biological Specimen Banks , Biomarkers , Lipids , United Kingdom , Polymorphism, Single Nucleotide
ABSTRACT
Synonymous mutations change the DNA sequence of a gene without affecting the amino acid sequence of the encoded protein. Although some synonymous mutations can affect RNA splicing, translational efficiency, and mRNA stability, studies in human genetics, mutagenesis screens, and other experiments and evolutionary analyses have repeatedly shown that most synonymous variants are neutral or only weakly deleterious, with some notable exceptions. Based on a recent study in yeast, there have been claims that synonymous mutations could be as important as nonsynonymous mutations in causing disease, assuming the yeast findings hold up and translate to humans. Here, we argue that there is insufficient evidence to overturn the large, coherent body of knowledge establishing the predominant neutrality of synonymous variants in the human genome.
Subject(s)
Biological Evolution , Saccharomyces cerevisiae , Humans , Mutation/genetics , Amino Acid Sequence , Genome, Human/genetics
ABSTRACT
The druggability of targets is a crucial consideration in drug target selection. Here, we adopt a stochastic semi-supervised ML framework to develop DrugnomeAI, which estimates the druggability likelihood for every protein-coding gene in the human exome. DrugnomeAI integrates gene-level properties from 15 sources resulting in 324 features. The tool generates exome-wide predictions based on labelled sets of known drug targets (median AUC: 0.97), highlighting features from protein-protein interaction networks as top predictors. DrugnomeAI provides generic as well as specialised models stratified by disease type or drug therapeutic modality. The top-ranking DrugnomeAI genes were significantly enriched for genes previously selected for clinical development programs (p value < 1 × 10⁻³⁰⁸) and for genes achieving genome-wide significance in phenome-wide association studies of 450K UK Biobank exomes for binary (p value = 1.7 × 10⁻⁵) and quantitative traits (p value = 1.6 × 10⁻⁷). We accompany our method with a web application ( http://drugnomeai.public.cgr.astrazeneca.com ) to visualise the druggability predictions and the key features that define gene druggability, per disease type and modality.
Subject(s)
Machine Learning , Software , Humans , Drug Delivery Systems
ABSTRACT
Kidney disease is a complex disease with several different etiologies and underlying associated pathophysiology. This is reflected by the lack of effective treatment therapies in chronic kidney disease (CKD) that stop disease progression. However, novel strategies, recent scientific breakthroughs, and technological advances have revealed new possibilities for finding novel disease drivers in CKD. This review describes some of the latest advances in the field and brings them together in a more holistic framework as applied to identification and validation of disease drivers in CKD. It uses high-resolution 'patient-centric' omics data sets, advanced in silico tools (systems biology, connectivity mapping, and machine learning) and 'state-of-the-art' experimental systems (complex 3D systems in vitro, CRISPR gene editing, and various model biological systems in vivo). Application of such a framework is expected to increase the likelihood of successful identification of novel drug candidates based on strong human target validation and a better scientific understanding of underlying mechanisms.