Results 1 - 20 of 128
2.
Sci Transl Med ; 16(745): eade4510, 2024 May.
Article in English | MEDLINE | ID: mdl-38691621

ABSTRACT

Human inborn errors of immunity include rare disorders entailing functional and quantitative antibody deficiencies due to impaired B cells, collectively called the common variable immunodeficiency (CVID) phenotype. Patients with CVID face delayed diagnoses and treatments for 5 to 15 years after symptom onset because the disorders are rare (prevalence of ~1/25,000), and there is extensive heterogeneity in CVID phenotypes, ranging from infections to autoimmunity to inflammatory conditions, overlapping with other more common disorders. The prolonged diagnostic odyssey drives excessive system-wide costs before diagnosis. Because there is no single causal mechanism, there are no genetic tests to definitively diagnose CVID. Here, we present PheNet, a machine learning algorithm that identifies patients with CVID from their electronic health records (EHRs). PheNet learns phenotypic patterns from verified CVID cases and uses this knowledge to rank patients by likelihood of having CVID. PheNet could have diagnosed more than half of our patients with CVID 1 or more years earlier than they had been diagnosed. When applied to a large EHR dataset, followed by blinded chart review of the top 100 patients ranked by PheNet, we found that 74% were highly probable to have CVID. We externally validated PheNet using >6 million records from disparate medical systems in California and Tennessee. As artificial intelligence and machine learning make their way into health care, we show that algorithms such as PheNet can offer clinical benefits by expediting the diagnosis of rare diseases.


Subject(s)
Common Variable Immunodeficiency , Electronic Health Records , Humans , Common Variable Immunodeficiency/diagnosis , Machine Learning , Algorithms , Male , Female , Phenotype , Adult , Undiagnosed Diseases/diagnosis
3.
Nat Med ; 30(4): 1065-1074, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38443691

ABSTRACT

Type 2 diabetes (T2D) is a multifactorial disease with substantial genetic risk, for which the underlying biological mechanisms are not fully understood. In this study, we identified multi-ancestry T2D genetic clusters by analyzing genetic data from diverse populations in 37 published T2D genome-wide association studies representing more than 1.4 million individuals. We implemented soft clustering with 650 T2D-associated genetic variants and 110 T2D-related traits, capturing known and novel T2D clusters with distinct cardiometabolic trait associations across two independent biobanks representing diverse genetic ancestral populations (African, n = 21,906; Admixed American, n = 14,410; East Asian, n = 2,422; European, n = 90,093; and South Asian, n = 1,262). The 12 genetic clusters were enriched for specific single-cell regulatory regions. Several of the polygenic scores derived from the clusters differed in distribution among ancestry groups, including a significantly higher proportion of lipodystrophy-related polygenic risk in East Asian ancestry. T2D risk was equivalent at a body mass index (BMI) of 30 kg m⁻² in the European subpopulation and 24.2 (22.9-25.5) kg m⁻² in the East Asian subpopulation; after adjusting for cluster-specific genetic risk, the equivalent BMI threshold increased to 28.5 (27.1-30.0) kg m⁻² in the East Asian group. Thus, these multi-ancestry T2D genetic clusters encompass a broader range of biological mechanisms and provide preliminary insights to explain ancestry-associated differences in T2D risk profiles.


Subject(s)
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study , Risk Factors , Phenotype , Multifactorial Inheritance/genetics , Genetic Predisposition to Disease/genetics
4.
Am J Hum Genet ; 111(2): 323-337, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38306997

ABSTRACT

Genome-wide association studies (GWASs) have uncovered susceptibility loci associated with psychiatric disorders such as bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome, and the causal mechanisms of the link between genetic variation and disease risk are unknown. Expression quantitative trait locus (eQTL) analysis of bulk tissue is a common approach used for deciphering underlying mechanisms, although this can obscure cell-type-specific signals and thus mask trait-relevant mechanisms. Although single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell-type proportions and cell-type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-seq from 1,730 samples derived from whole blood in a cohort ascertained from individuals with BP and SCZ, this study estimated cell-type proportions and their relation with disease status and medication. For each cell type, we found between 2,875 and 4,629 eGenes (genes with an associated eQTL), including 1,211 that are not found on the basis of bulk expression alone. We performed a colocalization test between cell-type eQTLs and various traits and identified hundreds of associations that occur between cell-type eQTLs and GWASs but that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on the regulation of cell-type expression loci and found examples of genes that are differentially regulated according to lithium use. Our study suggests that applying computational methods to large bulk RNA-seq datasets of non-brain tissue can identify disease-relevant, cell-type-specific biology of psychiatric disorders and psychiatric medication.


Subject(s)
Genome-Wide Association Study , Lithium , Humans , Genome-Wide Association Study/methods , RNA-Seq , Quantitative Trait Loci/genetics , Phenotype , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
5.
Am J Hum Genet ; 111(2): 242-258, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38211585

ABSTRACT

Tumor mutational burden (TMB), the total number of somatic mutations in the tumor, and copy number burden (CNB), the corresponding measure of aneuploidy, are established fundamental somatic features and emerging biomarkers for immunotherapy. However, the genetic and non-genetic influences on TMB/CNB and, critically, the manner by which they influence patient outcomes remain poorly understood. Here, we present a large germline-somatic study of TMB/CNB with >23,000 individuals across 17 cancer types, of which 12,000 also have extensive clinical, treatment, and overall survival (OS) measurements available. We report dozens of clinical associations with TMB/CNB, observing older age and male sex to have a strong effect on TMB and weaker impact on CNB. We additionally identified significant germline influences on TMB/CNB, including fine-scale European ancestry and germline polygenic risk scores (PRSs) for smoking, tanning, white blood cell counts, and educational attainment. We quantify the causal effect of exposures on somatic mutational processes using Mendelian randomization. Many of the identified features associated with TMB/CNB were additionally associated with OS for individuals treated at a single tertiary cancer center. For individuals receiving immunotherapy, we observed a complex relationship between PRSs for educational attainment, self-reported college attainment, TMB, and survival, suggesting that the influence of this biomarker may be substantially modified by socioeconomic status. While the accumulation of somatic alterations is a stochastic process, our work demonstrates that it can be shaped by host characteristics including germline genetics.


Subject(s)
Neoplasms , Humans , Male , Mutation/genetics , Neoplasms/genetics , Neoplasms/pathology , Immunotherapy , Biomarkers, Tumor/genetics , Germ Cells/pathology
6.
Nat Genet ; 56(2): 234-244, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38036780

ABSTRACT

Attention deficit hyperactivity disorder (ADHD) is a complex disorder that manifests variability in long-term outcomes and clinical presentations. The genetic contributions to such heterogeneity are not well understood. Here we show several genetic links to clinical heterogeneity in ADHD in a case-only study of 14,084 diagnosed individuals. First, we identify one genome-wide significant locus by comparing cases with ADHD and autism spectrum disorder (ASD) to cases with ADHD but not ASD. Second, we show that cases with ASD and ADHD, substance use disorder and ADHD, or first diagnosed with ADHD in adulthood have unique polygenic score (PGS) profiles that distinguish them from complementary case subgroups and controls. Finally, a PGS for an ASD diagnosis in ADHD cases predicted cognitive performance in an independent developmental cohort. Our approach uncovered evidence of genetic heterogeneity in ADHD, helping us to understand its etiology and providing a model for studies of other disorders.


Subject(s)
Attention Deficit Disorder with Hyperactivity , Autism Spectrum Disorder , Humans , Autism Spectrum Disorder/genetics , Attention Deficit Disorder with Hyperactivity/genetics , Multifactorial Inheritance/genetics
7.
Genome Biol ; 24(1): 281, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38062486

ABSTRACT

GCLiPP is a global RNA interactome capture method that detects RNA-binding protein (RBP) occupancy transcriptome-wide. GCLiPP maps RBP-occupied sites at a higher resolution than phase separation-based techniques. GCLiPP sequence tags correspond with known RBP binding sites and are enriched for sites detected by RBP-specific crosslinking immunoprecipitation (CLIP) for abundant cytosolic RBPs. Comparison of human Jurkat T cells and mouse primary T cells uncovers shared peaks of GCLiPP signal across homologous regions of human and mouse 3' UTRs, including a conserved mRNA-destabilizing cis-regulatory element. GCLiPP signal overlapping with immune-related SNPs uncovers stabilizing cis-regulatory regions in CD5, STAT6, and IKZF1.


Subject(s)
RNA-Binding Proteins , Transcriptome , Animals , Humans , Mice , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Binding Sites/genetics , RNA/metabolism , Protein Binding , Immunoprecipitation
8.
Res Sq ; 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38045283

ABSTRACT

We present SLIViT, a deep-learning framework that accurately measures disease-related risk factors in volumetric biomedical imaging, such as magnetic resonance imaging (MRI) scans, optical coherence tomography (OCT) scans, and ultrasound videos. To evaluate SLIViT, we applied it to five different datasets of these three different data modalities tackling seven learning tasks (including both classification and regression) and found that it consistently and significantly outperforms domain-specific state-of-the-art models, typically improving performance (ROC AUC or correlation) by 0.1-0.4. Notably, compared to existing approaches, SLIViT can be applied even when only a small number of annotated training samples is available, which is often a constraint in medical applications. When trained on fewer than 700 annotated volumes, SLIViT obtained accuracy comparable to trained clinical specialists while reducing annotation time by a factor of 5,000, demonstrating its utility to automate and expedite ongoing research and other practical clinical scenarios.

9.
medRxiv ; 2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37961495

ABSTRACT

South Africa is among the world's top eight TB burden countries, and despite a focus on HIV-TB co-infection, most of the population living with TB are not HIV co-infected. The disease is endemic across the country with 80-90% exposure by adulthood. We investigated epidemiological risk factors for tuberculosis (TB) in the Northern Cape Province, South Africa: an understudied TB endemic region with extreme TB incidence (645/100,000) and the lowest provincial population density. We leveraged the population's high TB incidence and community transmission to design a case-control study with population-based controls, reflecting similar mechanisms of exposure between the groups. We recruited 1,126 participants with suspected TB from 12 community health clinics, and generated a cohort of 878 individuals (cases = 374, controls = 504) after implementing our enrollment criteria. All participants were GeneXpert Ultra tested for active TB by a local clinic. We assessed important risk factors for active TB using logistic regression and random forest modeling. Additionally, a subset of individuals were genotyped to determine genome-wide ancestry components. Male gender had the strongest effect on TB risk (OR: 2.87 [95% CI: 2.1-3.8]); smoking and alcohol consumption did not significantly increase TB risk. We identified two interactions: age by socioeconomic status (SES) and birthplace by residence locality on TB risk (OR = 3.05, p = 0.016), where rural birthplace but town residence was the highest risk category. Finally, participants had a majority Khoe-San ancestry, typically greater than 50%. Epidemiological risk factors for this cohort differ from other global populations. The significant interaction effects reflect rapid changes in SES and mobility over recent generations and strongly impact TB risk in the Northern Cape of South Africa. Our models show that such risk factors combined explain 16% of the variance (r²) in case/control status.

10.
Nat Genet ; 55(12): 2269-2276, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37985819

ABSTRACT

Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or 'fill-in' missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.


Subject(s)
Deep Learning , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Genotype , Biological Specimen Banks , Polymorphism, Single Nucleotide , Phenotype
11.
bioRxiv ; 2023 Sep 12.
Article in English | MEDLINE | ID: mdl-37745394

ABSTRACT

The contribution of epistasis (interactions among genes or genetic variants) to human complex trait variation remains poorly understood. Methods that aim to explicitly identify pairs of genetic variants, usually single nucleotide polymorphisms (SNPs), associated with a trait suffer from low power due to the large number of hypotheses tested while also having to deal with the computational problem of searching over a potentially large number of candidate pairs. An alternate approach involves testing whether a single SNP modulates variation in a trait against a polygenic background. While overcoming the limitation of low power, such tests of polygenic or marginal epistasis (ME) are infeasible on biobank-scale data where hundreds of thousands of individuals are genotyped over millions of SNPs. We present a method to test for ME of a SNP on a trait that is applicable to biobank-scale data. We performed extensive simulations to show that our method provides calibrated tests of ME. We applied our method to test for ME at SNPs that are associated with 53 quantitative traits across ≈300K unrelated white British individuals in the UK Biobank (UKBB). Testing 15,601 trait-loci associations that were significant in GWAS, we identified 16 trait-loci pairs across 12 traits that demonstrate strong evidence of ME signals (p-value p < 5×10⁻⁸/53). We further partitioned the significant ME signals across the genome to identify 6 trait-loci pairs with evidence of local (within-chromosome) ME while 15 show evidence of distal (cross-chromosome) ME. Across the 16 trait-loci pairs, we document that the proportion of trait variance explained by ME is about 12× as large as that explained by the GWAS effects on average (range: 0.59 to 43.89). Our results show, for the first time, evidence of interaction effects between individual genetic variants and overall polygenic background modulating complex trait variation.

12.
medRxiv ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37745486

ABSTRACT

Over three percent of people carry a dominant pathogenic mutation, yet only a fraction of carriers develop disease (incomplete penetrance), and phenotypes from mutations in the same gene range from mild to severe (variable expressivity). Here, we investigate underlying mechanisms for this heterogeneity: variable variant effect sizes, carrier polygenic backgrounds, and modulation of carrier effect by genetic background (epistasis). We leveraged exomes and clinical phenotypes from the UK Biobank and the Mt. Sinai BioMe Biobank to identify carriers of pathogenic variants affecting cardiometabolic traits. We employed recently developed methods to study these cohorts, observing strong statistical support and clinical translational potential for all three mechanisms of variable penetrance and expressivity. For example, scores from our recent model of variant pathogenicity were tightly correlated with phenotype amongst clinical variant carriers, they predicted effects of variants of unknown significance, and they distinguished gain- from loss-of-function variants. We also found that polygenic scores predicted phenotypes amongst pathogenic carriers and that epistatic effects can exceed main carrier effects by an order of magnitude.

13.
Am J Hum Genet ; 110(8): 1319-1329, 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37490908

ABSTRACT

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10⁻⁷). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.


Subject(s)
Genomics , Polymorphism, Single Nucleotide , Uncertainty , Genotype , Genomics/methods , Whole Genome Sequencing , Polymorphism, Single Nucleotide/genetics
14.
Nat Med ; 29(7): 1845-1856, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37464048

ABSTRACT

An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.


Subject(s)
Delivery of Health Care , Patient Acceptance of Health Care , Humans , Los Angeles , Iran , Ethnicity
15.
bioRxiv ; 2023 May 25.
Article in English | MEDLINE | ID: mdl-37293101

ABSTRACT

Genome-wide association studies (GWAS) have uncovered susceptibility loci associated with psychiatric disorders like bipolar disorder (BP) and schizophrenia (SCZ). However, most of these loci are in non-coding regions of the genome with unknown causal mechanisms of the link between genetic variation and disease risk. Expression quantitative trait loci (eQTL) analysis of bulk tissue is a common approach to decipher underlying mechanisms, though this can obscure cell-type specific signals, thus masking trait-relevant mechanisms. While single-cell sequencing can be prohibitively expensive in large cohorts, computationally inferred cell type proportions and cell type gene expression estimates have the potential to overcome these problems and advance mechanistic studies. Using bulk RNA-Seq from 1,730 samples derived from whole blood in a cohort ascertained for individuals with BP and SCZ, this study estimated cell type proportions and their relation with disease status and medication. We found between 2,875 and 4,629 eGenes for each cell type, including 1,211 eGenes that are not found using bulk expression alone. We performed a colocalization test between cell type eQTLs and various traits and identified hundreds of associations between cell type eQTLs and GWAS loci that are not detected in bulk eQTLs. Finally, we investigated the effects of lithium use on cell type expression regulation and found examples of genes that are differentially regulated dependent on lithium use. Our study suggests that computational methods can be applied to large bulk RNA-Seq datasets of non-brain tissue to identify disease-relevant, cell type specific biology of psychiatric disorders and psychiatric medication.

16.
Nat Genet ; 55(6): 952-963, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37231098

ABSTRACT

We explored ancestry-related differences in the genetic architecture of whole-blood gene expression using whole-genome and RNA sequencing data from 2,733 African Americans, Puerto Ricans and Mexican Americans. We found that heritability of gene expression significantly increased with greater proportions of African genetic ancestry and decreased with higher proportions of Indigenous American ancestry, reflecting the relationship between heterozygosity and genetic variance. Among heritable protein-coding genes, the prevalence of ancestry-specific expression quantitative trait loci (anc-eQTLs) was 30% in African ancestry and 8% for Indigenous American ancestry segments. Most anc-eQTLs (89%) were driven by population differences in allele frequency. Transcriptome-wide association analyses of multi-ancestry summary statistics for 28 traits identified 79% more gene-trait associations using transcriptome prediction models trained in our admixed population than models trained using data from the Genotype-Tissue Expression project. Our study highlights the importance of measuring gene expression across large and ancestrally diverse populations for enabling new discoveries and reducing disparities.


Subject(s)
Black or African American , Hispanic or Latino , Mexican Americans , Humans , Black or African American/genetics , Genome-Wide Association Study , Hispanic or Latino/genetics , Mexican Americans/genetics , Phenotype , Polymorphism, Single Nucleotide , Transcriptome
17.
Graefes Arch Clin Exp Ophthalmol ; 261(8): 2245-2255, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36917316

ABSTRACT

BACKGROUND: This study evaluated the relationship between statin use and the age of onset of age-related macular degeneration (AMD). METHODS: Electronic Health Records from 52,840 patients evaluated at University of California Los Angeles (UCLA) Ophthalmology Clinics and 9,977 patients evaluated at University of California San Francisco (UCSF) Ophthalmology Clinics were screened. Survival analysis was performed using Cox proportional hazards regression models and visualized using Kaplan-Meier survival curves, with the following covariates: sex, ethnicity, smoking history, fluoxetine use, obesity, diabetes mellitus, and hypertension. RESULTS: 5,498 of 52,840 patients at UCLA were diagnosed with AMD. Statin use was associated with a later AMD onset (HR = 0.8823, p < 0.0001), while female sex (HR = 1.0852, p = 0.0035), obesity (HR = 1.4555, p < 0.0001), and fluoxetine (HR = 1.3797, p = 0.0003) were associated with an earlier AMD onset. Non-Hispanic Black (HR = 0.5687, p < 0.0001) and Hispanic ethnicities (HR = 0.8269, p = 0.0028) were associated with a later AMD onset. When stratifying for ethnicity, statins, fluoxetine, sex, and obesity were significant only within non-Hispanic White subjects. Statin use was significant among patients with dry AMD (HR = 0.8410, p = 0.0001) but not wet AMD (HR = 0.9188, p = 0.0351). In the replication cohort, 526 of 9,977 patients at UCSF had AMD. Associations between statins (HR = 0.7643, p = 0.0033), non-Hispanic Black ethnicity (HR = 0.5043, p = 0.0035), and obesity (HR = 1.9602, p < 0.0001) on AMD onset were confirmed. CONCLUSIONS: In both cohorts, statin use and non-Hispanic Black ethnicity are associated with a later AMD onset, while obesity is associated with an earlier AMD onset.


Subject(s)
Hydroxymethylglutaryl-CoA Reductase Inhibitors , Macular Degeneration , Humans , Female , Retrospective Studies , Age of Onset , Fluoxetine , Risk Factors , Obesity
18.
J Allergy Clin Immunol ; 151(6): 1503-1512, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36796456

ABSTRACT

BACKGROUND: Albuterol is the drug most widely used as asthma treatment among African Americans despite having a lower bronchodilator drug response (BDR) than other populations. Although BDR is affected by gene and environmental factors, the influence of DNA methylation is unknown. OBJECTIVE: This study aimed to identify epigenetic markers in whole blood associated with BDR, study their functional consequences by multi-omic integration, and assess their clinical applicability in admixed populations with a high asthma burden. METHODS: We studied 414 children and young adults (8-21 years old) with asthma in a discovery and replication design. We performed an epigenome-wide association study on 221 African Americans and replicated the results on 193 Latinos. Functional consequences were assessed by integrating epigenomics with genomics, transcriptomics, and environmental exposure data. Machine learning was used to develop a panel of epigenetic markers to classify treatment response. RESULTS: We identified 5 differentially methylated regions and 2 CpGs genome-wide significantly associated with BDR in African Americans located in FGL2 (cg08241295, P = 6.8 × 10⁻⁹) and DNASE2 (cg15341340, P = 7.8 × 10⁻⁸), which were regulated by genetic variation and/or associated with gene expression of nearby genes (false discovery rate < 0.05). The CpG cg15341340 was replicated in Latinos (P = 3.5 × 10⁻³). Moreover, a panel of 70 CpGs showed good classification for those with response and nonresponse to albuterol therapy in African American and Latino children (area under the receiver operating characteristic curve for training, 0.99; for validation, 0.70-0.71). The DNA methylation model showed similar discrimination as clinical predictors (P > .05). CONCLUSIONS: We report novel associations of epigenetic markers with BDR in pediatric asthma and demonstrate for the first time the applicability of pharmacoepigenetics in precision medicine of respiratory diseases.


Subject(s)
Asthma , Bronchodilator Agents , Child , Young Adult , Humans , Adolescent , Adult , Bronchodilator Agents/therapeutic use , Epigenome , Multiomics , Asthma/drug therapy , Asthma/genetics , Asthma/metabolism , Albuterol/therapeutic use , DNA Methylation , Genome-Wide Association Study , Fibrinogen/metabolism
19.
bioRxiv ; 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36711575

ABSTRACT

Defining and accounting for subphenotypic structure has the potential to increase statistical power and provide a deeper understanding of the heterogeneity in the molecular basis of complex disease. Existing phenotype subtyping methods primarily rely on clinically observed heterogeneity or metadata clustering. However, they generally tend to capture the dominant sources of variation in the data, which often originate from variation that is not descriptive of the mechanistic heterogeneity of the phenotype of interest; in fact, such dominant sources of variation, such as population structure or technical variation, are, in general, expected to be independent of subphenotypic structure. We instead aim to find a subspace with signal that is unique to a group of samples for which we believe that subphenotypic variation exists (e.g., cases of a disease). To that end, we introduce Phenotype Aware Components Analysis (PACA), a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation. In the context of disease, PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls. We evaluated PACA using an extensive simulation study, as well as on various subtyping tasks using genotypes, transcriptomics, and DNA methylation data. Our results provide multiple lines of strong evidence that PACA allows us to robustly capture weak unknown variation of interest while being calibrated and well-powered, far exceeding the performance of alternative methods. This renders PACA a state-of-the-art tool for defining de novo subtypes that are more likely to reflect molecular heterogeneity, especially in challenging cases where the phenotypic heterogeneity may be masked by a myriad of strong unrelated effects in the data.

20.
bioRxiv ; 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38168200

ABSTRACT

Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into mechanisms underlying disease risk, explain sources of heritability, and improve the accuracy of genetic risk prediction. While biobanks that collect genetic and deep phenotypic data over large numbers of individuals offer the promise of obtaining novel insights into GxE, our understanding of the architecture of GxE in complex traits remains limited. We introduce a method that can estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to ≈500,000 common array SNPs (MAF ≥ 1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) measured across ≈300,000 unrelated white British individuals in the UK Biobank. We found 69 trait-environmental variable pairs with significant genome-wide GxE heritability (p < 0.05/200 correcting for the number of trait-E pairs tested) with an average ratio of GxE to additive heritability ≈ 6.8% that include BMI with smoking (ratio of GxE to additive heritability = 6.3 ± 1.1%), WHR (waist-to-hip ratio adjusted for BMI) with sex (ratio = 19.6 ± 2%), LDL cholesterol with age (ratio = 9.8 ± 3.9%), and HbA1c with statin usage (ratio = 11 ± 2%). Analyzing nearly 8 million common and low-frequency imputed SNPs (MAF ≥ 0.1%), we document an increase in genome-wide GxE heritability of about 28% on average over array SNPs.
We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium values (LD score) of each SNP to observe that analogous to the relationship that has been observed for additive allelic effects, the magnitude of GxE allelic effects tends to increase with decreasing MAF and LD. Testing whether GxE heritability is enriched around genes that are highly expressed in specific tissues, we find significant tissue-specific enrichments that include brain-specific enrichment for BMI and Basal Metabolic Rate in the context of smoking, adipose-specific enrichment for WHR in the context of sex, and cardiovascular tissue-specific enrichment for total cholesterol in the context of age. Our analyses provide detailed insights into the architecture of GxE underlying complex traits.
