ABSTRACT
While blood gene signatures have shown promise in tuberculosis (TB) diagnosis and treatment monitoring, most signatures derived from a single cohort may be insufficient to capture TB heterogeneity in populations and individuals. Here we report a new generalized approach combining a network-based meta-analysis with machine-learning modeling to leverage the power of heterogeneity among studies. The transcriptome datasets from 57 studies (37 TB and 20 viral infections) across demographics and TB disease states were used for gene signature discovery and model training and validation. The network-based meta-analysis identified a common 45-gene signature specific to active TB disease across studies. Two optimized random forest regression models, using the full or partial 45-gene signature, were then established to model the continuum from Mycobacterium tuberculosis infection to disease and treatment response. In model validation, using pooled multi-cohort datasets to mimic the real-world setting, the model provides robust predictive performance for incipient to active TB risk over a 2.5-year period with an AUROC of 0.85, 74.2% sensitivity, and 78.3% specificity, which approximates the minimum criteria (>75% sensitivity and >75% specificity) within the WHO target product profile for prediction of progression to TB. Moreover, the model strongly discriminates active TB from viral infection (AUROC 0.93, 95% CI 0.91-0.94). For treatment monitoring, the TB scores generated by the model statistically correlate with treatment responses over time and were predictive, even before treatment initiation, of standard treatment clinical outcomes. We demonstrate an end-to-end gene signature model development scheme that considers heterogeneity for TB risk estimation and treatment monitoring.
Subject(s)
Mycobacterium tuberculosis, Tuberculosis, Humans, Mycobacterium tuberculosis/genetics, Tuberculosis/diagnosis, Tuberculosis/drug therapy, Tuberculosis/genetics, Transcriptome/genetics, Treatment Outcome, Disease Progression
ABSTRACT
Data within biobanks capture broad yet detailed indices of human variation, but biobank-wide insights can be difficult to extract due to complexity and scale. Here, using large-scale factor analysis, we distill hundreds of variables (diagnoses, assessments and survey items) into 35 latent constructs, using data from unrelated individuals with predominantly estimated European genetic ancestry in UK Biobank. These factors recapitulate known disease classifications, disentangle elements of socioeconomic status, highlight the relevance of psychiatric constructs to health and improve measurement of pro-health behaviours. We go on to demonstrate the power of this approach to clarify genetic signal, enhance discovery and identify associations between underlying phenotypic structure and health outcomes. In building a deeper understanding of ways in which constructs such as socioeconomic status, trauma, or physical activity are structured in the dataset, we emphasize the importance of considering the interwoven nature of the human phenome when evaluating public health patterns.
Subject(s)
Biological Specimen Banks, Phenotype, Humans, United Kingdom, Male, Female, Social Class, Middle Aged, UK Biobank
ABSTRACT
Classical statistical genetics theory defines dominance as any deviation from a purely additive, or dosage, effect of a genotype on a trait, which is known as the dominance deviation. Dominance is well documented in plant and animal breeding. Outside of rare monogenic traits, however, evidence in humans is limited. We systematically examined common genetic variation across 1060 traits in a large population cohort (UK Biobank, N = 361,194 samples analyzed) for evidence of dominance effects. We then developed a computationally efficient method to rapidly assess the aggregate contribution of dominance deviations to heritability. Lastly, observing that dominance associations are inherently less correlated between sites at a genomic locus than their additive counterparts, we explored whether they may be leveraged to identify causal variants more confidently.
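The definition of dominance deviation above can be illustrated with a minimal single-locus sketch. It assumes the textbook parameterization in which the additive effect is half the difference between the two homozygote means and the dominance deviation is the distance of the heterozygote mean from the homozygote midpoint. The function name and the toy data are hypothetical, and this is not the computationally efficient method the abstract describes.

```python
from statistics import mean


def additive_and_dominance(genotypes, trait):
    """Estimate the additive effect and dominance deviation at one biallelic site.

    genotypes: allele counts coded 0, 1, or 2 (hypothetical input format)
    trait: one trait value per individual, in the same order
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, trait):
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")

    m0, m1, m2 = (mean(groups[k]) for k in (0, 1, 2))
    additive = (m2 - m0) / 2        # per-allele (dosage) effect
    dominance = m1 - (m0 + m2) / 2  # heterozygote departure from the midpoint
    return additive, dominance


# Purely additive toy trait: heterozygotes sit exactly at the midpoint.
a_add, d_add = additive_and_dominance([0, 0, 1, 1, 2, 2],
                                      [1.0, 1.0, 2.0, 2.0, 3.0, 3.0])

# Fully dominant toy trait: heterozygotes match the 2-copy homozygotes.
a_dom, d_dom = additive_and_dominance([0, 0, 1, 1, 2, 2],
                                      [1.0, 1.0, 3.0, 3.0, 3.0, 3.0])
```

A dominance deviation of zero corresponds to a purely additive effect; any non-zero value is dominance in the classical sense used in the abstract.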
Subject(s)
Biological Specimen Banks, Dominant Genes, Genetic Variation, Multifactorial Inheritance, Animals, Humans, Breeding, Genotype, Genetic Models, Phenotype, Single Nucleotide Polymorphism, United Kingdom
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Bipolar disorder is a highly heritable psychiatric disorder. We performed a genome-wide association study (GWAS) including 20,352 cases and 31,358 controls of European descent, with follow-up analysis of 822 variants with P < 1 × 10⁻⁴ in an additional 9,412 cases and 137,760 controls. Eight of the 19 variants that were genome-wide significant (P < 5 × 10⁻⁸) in the discovery GWAS were not genome-wide significant in the combined analysis, consistent with small effect sizes and limited power but also with genetic heterogeneity. In the combined analysis, 30 loci were genome-wide significant, including 20 newly identified loci. The significant loci contain genes encoding ion channels, neurotransmitter transporters and synaptic components. Pathway analysis revealed nine significantly enriched gene sets, including regulation of insulin secretion and endocannabinoid signaling. Bipolar I disorder is strongly genetically correlated with schizophrenia, driven by psychosis, whereas bipolar II disorder is more strongly correlated with major depressive disorder. These findings address key clinical questions and provide potential biological mechanisms for bipolar disorder.
Subject(s)
Bipolar Disorder/genetics, Genetic Loci, Bipolar Disorder/classification, Case-Control Studies, Major Depressive Disorder/genetics, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Male, Single Nucleotide Polymorphism, Psychotic Disorders/genetics, Schizophrenia/genetics, Systems Biology
ABSTRACT
To discover novel genes underlying amyotrophic lateral sclerosis (ALS), we aggregated exomes from 3,864 cases and 7,839 ancestry-matched controls. We observed a significant excess of rare protein-truncating variants among ALS cases, and these variants were concentrated in constrained genes. Through gene level analyses, we replicated known ALS genes including SOD1, NEK1 and FUS. We also observed multiple distinct protein-truncating variants in a highly constrained gene, DNAJC7. The signal in DNAJC7 exceeded genome-wide significance, and immunoblotting assays showed depletion of DNAJC7 protein in fibroblasts in a patient with ALS carrying the p.Arg156Ter variant. DNAJC7 encodes a member of the heat-shock protein family, HSP40, which, along with HSP70 proteins, facilitates protein homeostasis, including folding of newly synthesized polypeptides and clearance of degraded proteins. When these processes are not regulated, misfolding and accumulation of aberrant proteins can occur and lead to protein aggregation, which is a pathological hallmark of neurodegeneration. Our results highlight DNAJC7 as a novel gene for ALS.