ABSTRACT
OBJECTIVE: Individuals living with obesity are differentially susceptible to cardiometabolic diseases. We hypothesized that an integrative multi-omics approach might improve identification of subgroups of individuals with obesity who have distinct cardiometabolic disease patterns. METHODS: We performed machine learning-based, integrative unsupervised clustering to identify proteomics- and metabolomics-defined subpopulations of individuals living with obesity (BMI ≥ 30 kg/m2), leveraging data from 243 individuals in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort. Omics that contributed to the observed clusters were functionally characterized. We performed multivariate regression to assess whether the individuals in each cluster demonstrated differential patterns of cardiometabolic traits. RESULTS: We identified two distinct clusters (iCluster1 and 2). iCluster2 had significantly higher average BMI values, fasting blood glucose, and inflammation. iCluster1 was associated with higher levels of total cholesterol and high-density lipoprotein cholesterol. Pathways mediating cell growth, lipogenesis, and energy expenditures were positively associated with iCluster1. Inflammatory response and insulin resistance pathways were positively associated with iCluster2. CONCLUSIONS: Although the two identified clusters may represent progressive obesity-related pathologic processes measured at different stages, other mechanisms in combination could also underpin the identified clusters given no significant age difference between the comparative groups. For instance, clusters may reflect differences in dietary/behavioral patterns or differential rates of metabolic damage.
Subject(s)
Machine Learning; Obesity; Humans; Obesity/metabolism; Female; Male; Middle Aged; Aged; Cluster Analysis; Metabolomics/methods; Body Mass Index; Proteomics/methods; Inflammation; Blood Glucose/metabolism; Insulin Resistance; Aged, 80 and over; Multiomics
ABSTRACT
AIMS: Proteomic profiling offers an expansive approach to biomarker discovery and mechanistic hypothesis generation for LV remodelling, a critical component of heart failure (HF). We sought to identify plasma proteins cross-sectionally associated with left ventricular (LV) size and geometry in a diverse population-based cohort without known cardiovascular disease (CVD). METHODS AND RESULTS: Among participants of the Multi-Ethnic Study of Atherosclerosis (MESA), we quantified plasma abundances of 1305 proteins using an aptamer-based platform at exam 1 (2000-2002) and exam 5 (2010-2011) and assessed LV structure by cardiac magnetic resonance (CMR) at the same time points. We used multivariable linear regression with robust variance to assess cross-sectional associations between plasma protein abundances and LV structural characteristics at exam 1, reproduced findings in later-life at exam 5, and explored relationships of associated proteins using annotated enrichment analysis. We studied 763 participants (mean age 60 ± 10 years at exam 1; 53% female; 19% Black race; 31% Hispanic ethnicity). 
Following adjustment for renal function and traditional CVD risk factors, plasma levels of 3 proteins were associated with LV mass index at both time points with the same directionality (FDR < 0.05): leptin (LEP), renin (REN), and cathepsin-D (CTSD); 20 with LV end-diastolic volume index: LEP, NT-proBNP, histone-lysine N-methyltransferase (EHMT2), chordin-like protein 1 (CHRDL1), tumour necrosis factor-inducible gene 6 protein (TNFAIP6), NT-3 growth factor receptor (NTRK3), c5a anaphylatoxin (C5), neurogenic locus notch homologue protein 3 (NOTCH3), ephrin-B2 (EFNB2), osteomodulin (OMD), contactin-4 (CNTN4), gelsolin (GSN), stromal cell-derived factor 1 (CXCL12), calcineurin subunit B type 1 (PPP3R1), insulin-like growth factor 1 receptor (IGF1R), bone sialoprotein 2 (IBSP), interleukin-11 (IL-11), follistatin-related protein 1 (FSTL1), periostin (POSTN), and biglycan (BGN); and 4 with LV mass-to-volume ratio: RGM domain family member B (RGMB), transforming growth factor beta receptor type 3 (TGFBR3), ephrin-A2 (EFNA2), and cell adhesion molecule 3 (CADM3). Functional annotation implicated regulation of the PI3K-Akt pathway, bone morphogenic protein signalling, and cGMP-mediated signalling. CONCLUSIONS: We report proteomic profiling of LV size and geometry, which identified novel associations and reinforced previous findings on biomarker candidates for LV remodelling and HF. If validated, these proteins may help refine risk prediction and identify novel therapeutic targets for HF.
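The abstract above reports associations at FDR < 0.05 but does not name the correction procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and the example p-values are hypothetical and do not come from the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Assumption: the study may have used a different FDR procedure; this is
    only the standard step-up adjustment.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical p-values for four protein-trait tests.
    example = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(example, benjamini_hochberg(example)):
        print(f"p={p:.3f}  q={q:.4f}  significant={q < 0.05}")
```

In a proteome-wide scan, each of the 1305 proteins would contribute one p-value per LV trait, and a protein would be reported when its adjusted value falls below 0.05.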
ABSTRACT
AIMS/HYPOTHESIS: Several studies have reported associations between specific proteins and type 2 diabetes risk in European populations. To better understand the role played by proteins in type 2 diabetes aetiology across diverse populations, we conducted a large proteome-wide association study using genetic instruments across four racial and ethnic groups: African; Asian; Hispanic/Latino; and European. METHODS: Genome and plasma proteome data from the Multi-Ethnic Study of Atherosclerosis (MESA) study involving 182 African, 69 Asian, 284 Hispanic/Latino and 409 European individuals residing in the USA were used to establish protein prediction models by using potentially associated cis- and trans-SNPs. The models were applied to genome-wide association study summary statistics of 250,127 type 2 diabetes cases and 1,222,941 controls from different racial and ethnic populations. RESULTS: We identified three, 44 and one protein associated with type 2 diabetes risk in Asian, European and Hispanic/Latino populations, respectively. Meta-analysis identified 40 proteins associated with type 2 diabetes risk across the populations, including well-established as well as novel proteins not yet implicated in type 2 diabetes development. CONCLUSIONS/INTERPRETATION: Our study improves our understanding of the aetiology of type 2 diabetes in diverse populations. DATA AVAILABILITY: The summary statistics of multi-ethnic type 2 diabetes GWAS of MVP, DIAMANTE, Biobank Japan and other studies are available from The database of Genotypes and Phenotypes (dbGaP) under accession number phs001672.v3.p1. MESA genetic, proteome and covariate data can be accessed through dbGaP under phs000209.v13.p3. All code is available on GitHub ( https://github.com/Arthur1021/MESA-1K-PWAS ).
ABSTRACT
BACKGROUND: Most GWAS of plasma proteomics have focused on White individuals of European ancestry, limiting biological insight from other ancestry-enriched protein quantitative trait loci (pQTL). METHODS: We conducted a discovery GWAS of approximately 3,000 plasma proteins measured by the antibody-based Olink platform in 1,054 Black adults from the Jackson Heart Study (JHS) and validated our findings in the Multi-Ethnic Study of Atherosclerosis (MESA). The genetic architecture of identified pQTLs was further explored through fine mapping and admixture association analysis. Finally, using our pQTL findings, we performed a phenome-wide association study (PheWAS) across 2 large multiethnic electronic health record (EHR) systems in All of Us and BioMe. RESULTS: We identified 1,002 pQTLs for 925 protein assays. Fine mapping and admixture analyses suggested allelic heterogeneity of the plasma proteome across diverse populations. We identified associations for variants enriched in African ancestry, many in diseases that lack precise biomarkers, including cis-pQTLs for cathepsin L (CTSL) and Siglec-9, which were linked with sarcoidosis and non-Hodgkin's lymphoma, respectively. We found concordant associations across clinical diagnoses and laboratory measurements, elucidating disease pathways, including a cis-pQTL associated with circulating CD58, WBC count, and multiple sclerosis. CONCLUSIONS: Our findings emphasize the value of leveraging diverse populations to enhance biological insights from proteomics GWAS, and we have made this resource readily available as an interactive web portal. FUNDING: NIH K08 HL161445-01A1; 5T32HL160522-03; HHSN268201600034I; HL133870.
Subject(s)
Black or African American; Electronic Health Records; Genome-Wide Association Study; Proteogenomics; Humans; Black or African American/genetics; Female; Male; Proteogenomics/methods; Middle Aged; Quantitative Trait Loci; Aged; Adult; Blood Proteins/genetics
ABSTRACT
Most gene expression and alternative splicing quantitative trait loci (eQTL/sQTL) studies have been biased toward European ancestry individuals. Here, we performed eQTL and sQTL analyses using TOPMed whole-genome sequencing-derived genotype data and RNA-sequencing data from stored peripheral blood mononuclear cells in 1,012 African American participants from the Jackson Heart Study (JHS). At a false discovery rate of 5%, we identified 17,630 unique eQTL credible sets covering 16,538 unique genes; and 24,525 unique sQTL credible sets covering 9,605 unique genes, with lead QTL at P < 5e-8. About 24% of independent eQTLs and independent sQTLs with a minor allele frequency > 1% in JHS were rare (minor allele frequency < 0.1%), and therefore unlikely to be detected, in European ancestry individuals. Finally, we created an open database, which is freely available online, allowing fast query and bulk download of our QTL results.
Subject(s)
Black or African American; Quantitative Trait Loci; Humans; Black or African American/genetics; Alternative Splicing; Male; Gene Frequency; Leukocytes, Mononuclear/metabolism; Female; Genome-Wide Association Study; Polymorphism, Single Nucleotide
ABSTRACT
Chronic kidney disease (CKD) impacts about 1 in 7 adults in the United States, but African Americans (AAs) carry a disproportionately higher burden of disease. Epigenetic modifications, such as DNA methylation at cytosine-phosphate-guanine (CpG) sites, have been linked to kidney function and may have clinical utility in predicting the risk of CKD. Given the dynamic relationship between the epigenome, environment, and disease, AAs may be especially sensitive to environment-driven methylation alterations. Moreover, risk models incorporating CpG methylation have been shown to predict disease across multiple racial groups. In this study, we developed a methylation risk score (MRS) for CKD in cohorts of AAs. We selected nine CpG sites that were previously reported to be associated with estimated glomerular filtration rate (eGFR) in epigenome-wide association studies to construct an MRS in the Hypertension Genetic Epidemiology Network (HyperGEN). In logistic mixed models, the MRS was significantly associated with prevalent CKD and was robust to multiple sensitivity analyses, including CKD risk factors. There was modest replication in validation cohorts. In summary, we demonstrated that an eGFR-based CpG score is an independent predictor of prevalent CKD, suggesting that MRS should be further investigated for clinical utility in evaluating CKD risk and progression.
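The abstract does not state how the nine CpG sites were combined into a score. One common construction, assumed here purely for illustration, is a weighted sum of methylation beta values, with weights taken from previously published effect sizes. The CpG identifiers, weights, and beta values below are placeholders, not values from the study.

```python
from typing import Mapping


def methylation_risk_score(
    betas: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Weighted sum of methylation beta values over the scored CpG sites.

    betas:   per-individual methylation proportion (0..1) keyed by CpG ID.
    weights: per-CpG weight, e.g. an effect size from a prior EWAS (assumed).
    """
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        raise KeyError(f"methylation values missing for: {missing}")
    return sum(weights[cpg] * betas[cpg] for cpg in weights)


if __name__ == "__main__":
    # Placeholder CpG IDs and weights; the real score used nine eGFR-associated sites.
    weights = {"cpg_A": -0.50, "cpg_B": 0.25}
    person = {"cpg_A": 0.80, "cpg_B": 0.40}
    print(methylation_risk_score(person, weights))
```

The resulting per-person score would then enter a regression model for prevalent CKD as a single predictor alongside covariates.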
Subject(s)
CpG Islands; DNA Methylation; Glomerular Filtration Rate; Renal Insufficiency, Chronic; Humans; Renal Insufficiency, Chronic/genetics; Renal Insufficiency, Chronic/epidemiology; Male; Female; Middle Aged; Risk Factors; Black or African American/genetics; Aged; Genome-Wide Association Study; Epigenesis, Genetic; Adult; Genetic Predisposition to Disease
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that a multi-ancestry, whole-genome-based association study can provide, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Subject(s)
Biomarkers; Genome-Wide Association Study; Inflammation; Precision Medicine; Whole Genome Sequencing; Humans; Precision Medicine/methods; Inflammation/genetics; Genome-Wide Association Study/methods; Whole Genome Sequencing/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Genetic Predisposition to Disease; Female; Interleukin-6/genetics
ABSTRACT
Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-Omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 and 1,210 cis- and trans-acting associations that replicated in TOPMed (FDR < 0.05) with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than observed expression (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.
Subject(s)
Proteome; Transcriptome; Humans; Transcriptome/genetics; Proteome/genetics; Multifactorial Inheritance; Quantitative Trait Loci/genetics; Genome-Wide Association Study; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Rationale: Chronic obstructive pulmonary disease (COPD) and emphysema are associated with endothelial damage and altered pulmonary microvascular perfusion. The molecular mechanisms underlying these changes are poorly understood in patients, in part because of the inaccessibility of the pulmonary vasculature. Peripheral blood mononuclear cells (PBMCs) interact with the pulmonary endothelium. Objectives: To test the association between gene expression in PBMCs and pulmonary microvascular perfusion in COPD. Methods: The Multi-Ethnic Study of Atherosclerosis (MESA) COPD Study recruited two independent samples of COPD cases and controls with ⩾10 pack-years of smoking history. In both samples, pulmonary microvascular blood flow, pulmonary microvascular blood volume, and mean transit time were assessed on contrast-enhanced magnetic resonance imaging, and PBMC gene expression was assessed by microarray. Additional replication was performed in a third sample with pulmonary microvascular blood volume measures on contrast-enhanced dual-energy computed tomography. Differential expression analyses were adjusted for age, gender, race/ethnicity, educational attainment, height, weight, smoking status, and pack-years of smoking. Results: The 79 participants in the discovery sample had a mean age of 69 ± 6 years, 44% were female, 25% were non-White, 34% were current smokers, and 66% had COPD. There were large PBMC gene expression signatures associated with pulmonary microvascular perfusion traits, with several replicated in the replication sets with magnetic resonance imaging (n = 47) or dual-energy contrast-enhanced computed tomography (n = 157) measures. Many of the identified genes are involved in inflammatory processes, including nuclear factor-κB and chemokine signaling pathways. 
Conclusions: PBMC gene expression in nuclear factor-κB, inflammatory, and chemokine signaling pathways was associated with pulmonary microvascular perfusion in COPD, potentially offering new targetable candidates for novel therapies.
Subject(s)
Leukocytes, Mononuclear; Magnetic Resonance Imaging; Pulmonary Disease, Chronic Obstructive; Humans; Female; Male; Aged; Leukocytes, Mononuclear/metabolism; Pulmonary Disease, Chronic Obstructive/genetics; Pulmonary Disease, Chronic Obstructive/physiopathology; Middle Aged; Lung/blood supply; Lung/diagnostic imaging; Lung/metabolism; Atherosclerosis/genetics; Atherosclerosis/ethnology; Case-Control Studies; United States/epidemiology; Aged, 80 and over; Gene Expression; Tomography, X-Ray Computed; Pulmonary Circulation; Smoking; Microcirculation
ABSTRACT
Bulk-tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, and context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from the blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell-type proportions, we demonstrate that cell-type iQTLs could be considered as proxies for cell-type-specific QTL effects, particularly for the most abundant cell type in the tissue. The interpretation of age iQTLs, however, warrants caution because the moderation effect of age on the genotype and molecular phenotype association could be mediated by changes in cell-type composition. Finally, we show that cell-type iQTLs contribute to cell-type-specific enrichment of diseases that, in combination with additional functional data, could guide future functional studies. Overall, this study highlights the use of iQTLs to gain insights into the context specificity of regulatory effects.
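The interaction model described above can be sketched as an ordinary least-squares fit of a molecular phenotype on genotype, estimated cell-type proportion, and their product. This is a minimal illustration under stated assumptions: the abstract does not give the exact model, covariates, or software, and the function names and toy data here are hypothetical.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction_qtl(
    genotype: Sequence[float],
    cell_fraction: Sequence[float],
    phenotype: Sequence[float],
) -> Dict[str, float]:
    """Fit phenotype ~ 1 + genotype + cell_fraction + genotype:cell_fraction.

    The 'interaction' coefficient is the iQTL effect: how much the genotype
    effect changes per unit change in the cell-type proportion.
    """
    rows = [[1.0, g, c, g * c] for g, c in zip(genotype, cell_fraction)]
    p = 4
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    b0, bg, bc, bgc = solve_linear_system(xtx, xty)
    return {"intercept": b0, "genotype": bg, "cell_fraction": bc, "interaction": bgc}


if __name__ == "__main__":
    # Toy data: allele dosages 0/1/2 crossed with three cell-type proportions.
    g = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    c = [0.2, 0.2, 0.2, 0.5, 0.5, 0.5, 0.8, 0.8, 0.8]
    y = [1.0 + 0.5 * gi + 2.0 * ci + 1.5 * gi * ci for gi, ci in zip(g, c)]
    print(fit_interaction_qtl(g, c, y))
```

The same template applies to an age iQTL by swapping the cell-type proportion for age, which is also why the two can be confounded when cell composition shifts with age.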
Subject(s)
Gene Expression Regulation; Quantitative Trait Loci; Humans; Quantitative Trait Loci/genetics; Genotype; Phenotype
ABSTRACT
Despite the prognostic value of arterial stiffness (AS) and pulsatile hemodynamics (PH) for cardiovascular morbidity and mortality, epigenetic modifications that contribute to AS/PH remain unknown. To gain a better understanding of the link between epigenetics (DNA methylation) and AS/PH, we examined the relationship of eight measures of AS/PH with CpG sites and co-methylated regions using multi-ancestry participants from Trans-Omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA) with sample sizes ranging from 438 to 874. Epigenome-wide association analysis identified one genome-wide significant CpG (cg20711926-CYP1B1) associated with aortic augmentation index (AIx). Follow-up analyses, including gene set enrichment analysis, expression quantitative trait methylation analysis, and functional enrichment analysis on differentially methylated positions and regions, further prioritized three CpGs and their annotated genes (cg23800023-ETS1, cg08426368-TGFB3, and cg17350632-HLA-DPB1) for AIx. Among these, ETS1 and TGFB3 have been previously prioritized as candidate genes. Furthermore, both ETS1 and HLA-DPB1 have significant tissue correlations between Whole Blood and Aorta in GTEx, which suggests ETS1 and HLA-DPB1 could be potential biomarkers in understanding pathophysiology of AS/PH. Overall, our findings support the possible role of epigenetic regulation via DNA methylation of specific genes associated with AIx as well as identifying potential targets for regulation of AS/PH.
Subject(s)
Atherosclerosis; Epigenesis, Genetic; Humans; Epigenome; Transforming Growth Factor beta3/genetics; Precision Medicine; Genome-Wide Association Study; DNA Methylation; CpG Islands/genetics; Atherosclerosis/genetics
ABSTRACT
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.
Subject(s)
Genome-Wide Association Study; Transcriptome; Humans; Transcriptome/genetics; Quantitative Trait Loci/genetics; Gene Frequency; Linkage Disequilibrium
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight, which can be provided by a multi-ancestry, whole-genome based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits - E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin - that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
ABSTRACT
Blood lipid traits are treatable and heritable risk factors for heart disease, a leading cause of mortality worldwide. Although genome-wide association studies (GWAS) have discovered hundreds of variants associated with lipids in humans, most of the causal mechanisms of lipids remain unknown. To better understand the biological processes underlying lipid metabolism, we investigated the associations of plasma protein levels with total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) in blood. We trained protein prediction models based on samples in the Multi-Ethnic Study of Atherosclerosis (MESA) and applied them to conduct proteome-wide association studies (PWAS) for lipids using the Global Lipids Genetics Consortium (GLGC) data. Of the 749 proteins tested, 42 were significantly associated with at least one lipid trait. Furthermore, we performed transcriptome-wide association studies (TWAS) for lipids using 9,714 gene expression prediction models trained on samples from peripheral blood mononuclear cells (PBMCs) in MESA and 49 tissues in the Genotype-Tissue Expression (GTEx) project. We found that although PWAS and TWAS can show different directions of associations in an individual gene, 40 out of 49 tissues showed a positive correlation between PWAS and TWAS signed p-values across all the genes, which suggests a high-level consistency between proteome-lipid associations and transcriptome-lipid associations.
ABSTRACT
Although many novel gene-metabolite and gene-protein associations have been identified using high-throughput biochemical profiling, systematic studies that leverage human genetics to illuminate causal relationships between circulating proteins and metabolites are lacking. Here, we performed protein-metabolite association studies in 3,626 plasma samples from three human cohorts. We detected 171,800 significant protein-metabolite pairwise correlations between 1,265 proteins and 365 metabolites, including established relationships in metabolic and signaling pathways such as the protein thyroxine-binding globulin and the metabolite thyroxine, as well as thousands of new findings. In Mendelian randomization (MR) analyses, we identified putative causal protein-to-metabolite associations. We experimentally validated top MR associations in proof-of-concept plasma metabolomics studies in three murine knockout strains of key protein regulators. These analyses identified previously unrecognized associations between bioactive proteins and metabolites in human plasma. We provide publicly available data to be leveraged for studies in human metabolism and disease.
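The abstract does not say which Mendelian randomization estimator was used. As a hedged illustration of the basic idea, the sketch below computes a single-variant Wald ratio: the variant's effect on the metabolite divided by its effect on the protein, with a first-order standard error that ignores uncertainty in the variant-protein effect. The function name and the numbers are hypothetical.

```python
from typing import Tuple


def wald_ratio(
    beta_variant_protein: float,
    beta_variant_metabolite: float,
    se_variant_metabolite: float,
) -> Tuple[float, float]:
    """Single-instrument MR estimate of the protein-to-metabolite effect.

    Returns (estimate, standard_error). The standard error is the simple
    first-order approximation se_outcome / |beta_exposure|.
    """
    if beta_variant_protein == 0:
        raise ValueError("instrument has no effect on the protein (exposure)")
    estimate = beta_variant_metabolite / beta_variant_protein
    standard_error = abs(se_variant_metabolite / beta_variant_protein)
    return estimate, standard_error


if __name__ == "__main__":
    # Hypothetical summary statistics for one cis-pQTL used as the instrument.
    est, se = wald_ratio(
        beta_variant_protein=0.5,
        beta_variant_metabolite=0.2,
        se_variant_metabolite=0.05,
    )
    print(f"estimate={est:.3f}, se={se:.3f}, z={est / se:.2f}")
```

With several independent instruments per protein, the per-variant ratios are usually combined by a weighted method rather than reported individually.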
Subject(s)
Metabolomics; Proteomics; Humans; Animals; Mice; Signal Transduction; Genome-Wide Association Study; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Multi-omics datasets are becoming more common, necessitating better integration methods to realize their revolutionary potential. Here, we introduce multi-set correlation and factor analysis (MCFA), an unsupervised integration method tailored to the unique challenges of high-dimensional genomics data that enables fast inference of shared and private factors. We used MCFA to integrate methylation markers, protein expression, RNA expression, and metabolite levels in 614 diverse samples from the Trans-Omics for Precision Medicine/Multi-Ethnic Study of Atherosclerosis multi-omics pilot. Samples cluster strongly by ancestry in the shared space, even in the absence of genetic information, while private spaces frequently capture dataset-specific technical variation. Finally, we integrated genetic data by conducting a genome-wide association study (GWAS) of our inferred factors, observing that several factors are enriched for GWAS hits and trans-expression quantitative trait loci. Two of these factors appear to be related to metabolic disease. Our study provides a foundation and framework for further integrative analysis of ever larger multi-modal genomic datasets.
ABSTRACT
Bulk tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, while context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from the blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell type proportions, we demonstrate that cell type iQTLs could be considered as proxies for cell type-specific QTL effects. The interpretation of age iQTLs, however, warrants caution as the moderation effect of age on the genotype and molecular phenotype association may be mediated by changes in cell type composition. Finally, we show that cell type iQTLs contribute to cell type-specific enrichment of diseases that, in combination with additional functional data, may guide future functional studies. Overall, this study highlights the use of iQTLs to gain insights into the context-specificity of regulatory effects.
ABSTRACT
Rationale: Chronic obstructive pulmonary disease (COPD) is a complex disease characterized by airway obstruction and accelerated lung function decline. Our understanding of systemic protein biomarkers associated with COPD remains incomplete. Objectives: To determine which proteins and pathways are associated with impaired pulmonary function in a diverse population. Methods: We studied 6,722 participants across six cohort studies with both aptamer-based proteomic and spirometry data (4,566 predominantly White participants in a discovery analysis and 2,156 African American cohort participants in a validation analysis). In linear regression models, we examined protein associations with baseline forced expiratory volume in 1 second (FEV1) and FEV1/forced vital capacity (FVC). In linear mixed effects models, we investigated the associations of baseline protein levels with rate of FEV1 decline (ml/yr) in 2,777 participants with up to 7 years of follow-up spirometry. Results: We identified 254 proteins associated with FEV1 in our discovery analyses, with 80 proteins validated in the Jackson Heart Study. Novel validated protein associations include kallistatin serine protease inhibitor, growth differentiation factor 2, and tumor necrosis factor-like weak inducer of apoptosis (discovery β = 0.0561, Q = 4.05 × 10^-10; β = 0.0421, Q = 1.12 × 10^-3; and β = 0.0358, Q = 1.67 × 10^-3, respectively). In longitudinal analyses within cohorts with follow-up spirometry, we identified 15 proteins associated with FEV1 decline (Q < 0.05), including elafin leukocyte elastase inhibitor and mucin-associated TFF2 (trefoil factor 2; β = -4.3 ml/yr, Q = 0.049; β = -6.1 ml/yr, Q = 0.032, respectively). Pathways and processes highlighted by our study include aberrant extracellular matrix remodeling, enhanced innate immune response, dysregulation of angiogenesis, and coagulation.
Conclusions: In this study, we identify and validate novel biomarkers and pathways associated with lung function traits in a racially diverse population. In addition, we identify novel protein markers associated with FEV1 decline. Several protein findings are supported by previously reported genetic signals, highlighting the plausibility of certain biologic pathways. These novel proteins might represent markers for risk stratification, as well as novel molecular targets for treatment of COPD.
Subject(s)
Lung , Pulmonary Disease, Chronic Obstructive , Humans , Forced Expiratory Volume/physiology , Proteomics , Vital Capacity/physiology , Spirometry , Biomarkers
ABSTRACT
Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic systems biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment for blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in both cohorts: 39.0% to 50.0% in JHS and 38.9% to 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs.
This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA to various cohorts would help identify cohort-agnostic, biologically meaningful relationships between multi-omics data and phenotypic traits.
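To make the orthogonalization step concrete, the following is a minimal sketch of Gram-Schmidt applied to a matrix whose columns are canonical variables. It shows the generic (modified) Gram-Schmidt procedure on simulated toy data and is not the authors' SMCCA-GS implementation, whose details are not given here. The function name `gram_schmidt` and the toy CV matrix are assumptions for the example.

```python
import numpy as np

def gram_schmidt(cvs, tol=1e-12):
    """Orthonormalize the columns of `cvs` (samples x components), in order.

    Each column has its projections onto all earlier orthonormalized
    columns removed, then is scaled to unit length.
    """
    cvs = np.asarray(cvs, dtype=float)
    ortho = np.zeros_like(cvs)
    for k in range(cvs.shape[1]):
        v = cvs[:, k].copy()
        for j in range(k):
            v -= (ortho[:, j] @ v) * ortho[:, j]
        norm = np.linalg.norm(v)
        if norm < tol:
            raise ValueError(f"column {k} is linearly dependent on earlier columns")
        ortho[:, k] = v / norm
    return ortho

# Toy example: three correlated "CVs" sharing a common signal (hypothetical data).
rng = np.random.default_rng(1)
shared = rng.normal(size=(100, 1))
cvs = shared + 0.5 * rng.normal(size=(100, 3))

ortho = gram_schmidt(cvs)
gram = ortho.T @ ortho   # close to the identity matrix if columns are orthonormal
print(np.round(gram, 6))
```

The first CV keeps its direction and only later CVs are adjusted, so the ordering of components matters. This is one reason such a step is usually applied to components in the order they were extracted.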