1.
BMC Bioinformatics ; 25(1): 276, 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39179997

ABSTRACT

Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience. AVAILABILITY: This package is available in both CRAN: https://cran.r-project.org/web/packages/SmCCNet/index.html and GitHub: https://github.com/KechrisLab/SmCCNet under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.


Subject(s)
Machine Learning , Software , Genomics/methods , Gene Regulatory Networks , Computational Biology/methods , Humans , Multiomics
2.
BMC Genomics ; 25(1): 825, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223457

ABSTRACT

BACKGROUND: Studies have identified individual blood biomarkers associated with chronic obstructive pulmonary disease (COPD) and related phenotypes. However, complex diseases such as COPD typically involve changes in multiple molecules with interconnections that may not be captured when considering single molecular features. METHODS: Leveraging proteomic data from 3,173 COPDGene Non-Hispanic White (NHW) and African American (AA) participants, we applied sparse multiple canonical correlation network analysis (SmCCNet) to 4,776 proteins assayed on the SomaScan v4.0 platform to derive sparse networks of proteins associated with current vs. former smoking status, airflow obstruction, and emphysema quantitated from high-resolution computed tomography scans. We then used NetSHy, a dimension reduction technique leveraging network topology, to produce summary scores of each proteomic network, referred to as NetSHy scores. We next performed a genome-wide association study (GWAS) to identify variants associated with the NetSHy scores, or network quantitative trait loci (nQTLs). Finally, we evaluated the replicability of the networks in an independent cohort, SPIROMICS. RESULTS: We identified networks of 13 to 104 proteins for each phenotype and exposure in NHW and AA, and the derived NetSHy scores significantly associated with the variables of interest. Networks included known (sRAGE, ALPP, MIP1) and novel molecules (CA10, CPB1, HIS3, PXDN) and interactions involved in COPD pathogenesis. We observed 7 nQTL loci associated with NetSHy scores, 4 of which remained after conditional analysis. Networks for smoking status and emphysema, but not airflow obstruction, demonstrated a high degree of replicability across race groups and cohorts. CONCLUSIONS: In this work, we apply state-of-the-art molecular network generation and summarization approaches to proteomic data from COPDGene participants to uncover protein networks associated with COPD phenotypes. We further identify genetic associations with networks. This work discovers protein networks containing known and novel proteins and protein interactions associated with clinically relevant COPD phenotypes across race groups and cohorts.


Subject(s)
Genome-Wide Association Study , Proteomics , Pulmonary Disease, Chronic Obstructive , Smoking , Humans , Pulmonary Disease, Chronic Obstructive/genetics , Smoking/genetics , Male , Female , Middle Aged , Aged , Quantitative Trait Loci , Phenotype , Polymorphism, Single Nucleotide , Genetic Variation
3.
Respir Res ; 25(1): 289, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080656

ABSTRACT

BACKGROUND: Sarcoidosis is a heterogeneous granulomatous disease with no accurate biomarkers of disease progression. Therefore, we profiled and integrated the DNA methylome, mRNAs, and microRNAs to identify molecular changes associated with sarcoidosis and disease progression that might illuminate underlying mechanisms of disease and potential biomarkers. METHODS: Bronchoalveolar lavage cells from 64 sarcoidosis subjects and 16 healthy controls were used. DNA methylation was profiled on Illumina HumanMethylationEPIC arrays, mRNA by RNA-sequencing, and miRNAs by small RNA-sequencing. Linear models were fit to test for effect of sarcoidosis diagnosis and progression phenotype, adjusting for age, sex, smoking, and principal components of the data. We built a supervised multi-omics model using a subset of features from each dataset. RESULTS: We identified 1,459 CpGs, 64 mRNAs, and five miRNAs associated with sarcoidosis versus controls and four mRNAs associated with disease progression. Our integrated model emphasized the prominence of the PI3K/AKT1 pathway, which is important in T cell and mTOR function. Novel immune related genes and miRNAs including LYST, RGS14, SLFN12L, and hsa-miR-199b-5p, distinguished sarcoidosis from controls. Our integrated model also demonstrated differential expression/methylation of IL20RB, ABCC11, SFSWAP, AGBL4, miR-146a-3p, and miR-378b between non-progressive and progressive sarcoidosis. CONCLUSIONS: Leveraging the DNA methylome, transcriptome, and miRNA-sequencing in sarcoidosis BAL cells, we detected widespread molecular changes associated with disease, many of which are involved in immune response. These molecules may serve as diagnostic/prognostic biomarkers and/or drug targets, although future testing is required for confirmation.


Subject(s)
Bronchoalveolar Lavage Fluid , Multiomics , Sarcoidosis, Pulmonary , Adult , Female , Humans , Male , Middle Aged , Bronchoalveolar Lavage Fluid/cytology , Bronchoalveolar Lavage Fluid/chemistry , Bronchoalveolar Lavage Fluid/immunology , Case-Control Studies , Disease Progression , DNA Methylation , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/metabolism , RNA, Messenger/genetics , Sarcoidosis, Pulmonary/genetics , Sarcoidosis, Pulmonary/metabolism , Sarcoidosis, Pulmonary/diagnosis , Sarcoidosis, Pulmonary/pathology
4.
Mol Biol Evol ; 39(4), 2022 04 10.
Article in English | MEDLINE | ID: mdl-35446958

ABSTRACT

Because errors at the DNA level power pathogen evolution, a systematic understanding of the rate and molecular spectra of mutations could guide the avoidance and treatment of infectious diseases. We thus accumulated tens of thousands of spontaneous mutations in 768 repeatedly bottlenecked lineages of 18 strains from various geographical sites, temporal spread, and genetic backgrounds. Entailing over ∼1.36 million generations, the resultant data yield an average mutation rate of ∼0.0005 per genome per generation, with a significant within-species variation. This is one of the lowest bacterial mutation rates reported, giving direct support for a high genome stability in this pathogen resulting from high DNA-mismatch-repair efficiency and replication-machinery fidelity. Pathogenicity genes do not exhibit an accelerated mutation rate, and thus, elevated mutation rates may not be the major determinant for the diversification of toxin and secretion systems. Intriguingly, a low error rate at the transcript level is not observed, suggesting distinct fidelity of the replication and transcription machinery. This study urges more attention on the most basic evolutionary processes of even the best-known human pathogens and deepens the understanding of their genome evolution.


Subject(s)
Salmonella enterica , Salmonella , Genome, Bacterial , Mutation , Mutation Rate , Salmonella/genetics , Salmonella enterica/genetics
5.
Int J Obes (Lond) ; 47(2): 109-116, 2023 02.
Article in English | MEDLINE | ID: mdl-36463326

ABSTRACT

BACKGROUND/OBJECTIVES: Obesity, defined as excessive fat accumulation that represents a health risk, is increasing in adults and children, reaching global epidemic proportions. Body mass index (BMI) correlates with body fat and future health risk, yet differs in prediction by fat distribution, across populations and by age. Nonetheless, few genetic studies of BMI have been conducted in ancestrally diverse populations. Gene expression association with BMI was assessed in the Multi-Ethnic Study of Atherosclerosis (MESA) in four self-identified race and ethnicity (SIRE) groups to identify genes associated with obesity. SUBJECTS/METHODS: RNA-sequencing was performed on 1096 MESA participants (37.8% white, 24.3% Hispanic, 28.4% African American, and 9.5% Chinese American) and linear models were used to assess the association of expression from each gene for its effect on BMI, adjusting for age, sex, sequencing center, study site, five expression and four genetic principal components in each self-identified race group. Sample-size-weighted meta-analysis was performed to identify genes with BMI-associated expression across ancestry groups. RESULTS: Within individual SIRE groups, there were zero to three genes whose expression is significantly (p < 1.97 × 10⁻⁶) associated with BMI. Across all groups, 45 genes were identified by meta-analysis whose expression was significantly associated with BMI, explaining 29.7% of BMI variation. The 45 genes are expressed in a variety of tissues and cell types and are enriched for obesity-related processes including erythrocyte function, oxygen binding and transport, and JAK-STAT signaling. CONCLUSIONS: We have identified genes whose expression is significantly associated with obesity in a multi-ethnic cohort. We have identified novel genes associated with BMI as well as confirmed previously identified genes from earlier genetic analyses. These novel genes and their biological pathways represent new targets for understanding the biology of obesity as well as new therapeutic intervention to reduce obesity and improve global public health.
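
A sample-size-weighted meta-analysis of the kind named in this abstract is often carried out as a weighted Stouffer Z-score combination. The sketch below illustrates that general technique only; it is not the authors' code, and the abstract does not say which implementation was used. The function name, p-values, effect directions, and group sizes are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_meta(p_values, directions, sample_sizes):
    """Combine two-sided p-values across groups, weighting by sqrt(n).

    p_values:     two-sided p-value for one gene in each group
    directions:   +1 or -1, the sign of the effect in each group
    sample_sizes: number of participants in each group
    Returns (combined z-score, combined two-sided p-value).
    """
    # Convert each two-sided p-value to a signed z-score.
    z_scores = [
        sign * _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
        for p, sign in zip(p_values, directions)
    ]
    # Larger groups get more weight: w_i = sqrt(n_i).
    weights = [sqrt(n) for n in sample_sizes]
    z_meta = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    p_meta = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


if __name__ == "__main__":
    # Hypothetical per-group results for a single gene in four groups.
    z, p = weighted_z_meta(
        p_values=[0.002, 0.03, 0.01, 0.20],
        directions=[+1, +1, +1, -1],
        sample_sizes=[400, 270, 310, 100],
    )
    print(f"combined z = {z:.3f}, combined p = {p:.3g}")
```

Weighting by the square root of the sample size lets a consistent but modest signal in several groups reach significance even when no single group does, which matches the pattern the abstract reports (few genes per group, 45 across groups).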


Subject(s)
Body Mass Index , Gene Expression , Obesity , Adult , Child , Humans , Atherosclerosis , Obesity/epidemiology , Obesity/genetics
6.
Hum Genomics ; 16(1): 27, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35897116

ABSTRACT

RT-PCR is the foremost clinical test for diagnosis of COVID-19. Unfortunately, PCR-based testing has limitations and may not result in a positive test early in the course of infection before symptoms develop. Enveloped RNA viruses, such as coronaviruses, alter peripheral blood methylation, and DNA methylation signatures may characterize asymptomatic versus symptomatic infection. We used Illumina's Infinium MethylationEPIC BeadChip array to profile peripheral blood samples from 164 patients who tested positive for SARS-CoV-2 by RT-PCR, of whom 8 had no symptoms. Epigenome-wide association analysis identified 10 methylation sites associated with infection and a quantile-quantile plot showed little inflation. These preliminary results suggest that differences in methylation patterns may distinguish asymptomatic from symptomatic infection.


Subject(s)
COVID-19 , COVID-19/genetics , Epigenesis, Genetic , Epigenomics , Humans , SARS-CoV-2/genetics
7.
Am J Respir Crit Care Med ; 206(10): 1259-1270, 2022 11 15.
Article in English | MEDLINE | ID: mdl-35816432

ABSTRACT

Rationale: Common genetic variants have been associated with idiopathic pulmonary fibrosis (IPF). Objectives: To determine functional relevance of the 10 IPF-associated common genetic variants we previously identified. Methods: We performed expression quantitative trait loci (eQTL) and methylation quantitative trait loci (mQTL) mapping, followed by co-localization of eQTL and mQTL with genetic association signals and functional validation by luciferase reporter assays. Illumina multi-ethnic genotyping arrays, mRNA sequencing, and Illumina 850k methylation arrays were performed on lung tissue of participants with IPF (234 RNA and 345 DNA samples) and non-diseased controls (188 RNA and 202 DNA samples). Measurements and Main Results: Focusing on genetic variants within 10 IPF-associated genetic loci, we identified 27 eQTLs in controls and 24 eQTLs in cases (false-discovery-rate-adjusted P < 0.05). Among these signals, we identified associations of lead variants rs35705950 with expression of MUC5B and rs2076295 with expression of DSP in both cases and controls. mQTL analysis identified CpGs in gene bodies of MUC5B (cg17589883) and DSP (cg08964675) associated with the lead variants in these two loci. We also demonstrated strong co-localization of eQTL/mQTL and genetic signal in MUC5B (rs35705950) and DSP (rs2076295). Functional validation of the mQTL in MUC5B using luciferase reporter assays demonstrates that the CpG resides within a putative internal repressor element. Conclusions: We have established a relationship of the common IPF genetic risk variants rs35705950 and rs2076295 with respective changes in MUC5B and DSP expression and methylation. These results provide additional evidence that both MUC5B and DSP are involved in the etiology of IPF.
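
The false-discovery-rate adjustment mentioned in this abstract can be illustrated with the Benjamini-Hochberg procedure. The abstract does not name the FDR method, so Benjamini-Hochberg is an assumption here; the function name and the p-values are hypothetical and the sketch is not the authors' pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four variant-gene pairs.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        flag = "significant" if q < 0.05 else "not significant"
        print(f"raw p = {p:.3f}  adjusted p = {q:.3f}  {flag}")
```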


Subject(s)
Idiopathic Pulmonary Fibrosis , Humans , DNA , DNA Methylation/genetics , Gene Expression , Genetic Predisposition to Disease/genetics , Idiopathic Pulmonary Fibrosis/genetics , Mucin-5B/genetics , Quantitative Trait Loci/genetics , RNA
8.
Am J Respir Cell Mol Biol ; 67(6): 632-640, 2022 12.
Article in English | MEDLINE | ID: mdl-35972918

ABSTRACT

Chronic beryllium disease (CBD) is a Th1 granulomatous lung disease preceded by sensitization to beryllium (BeS). We profiled the methylome, transcriptome, and selected proteins in the lung to identify molecular signatures and networks associated with BeS and CBD. BAL cell DNA and RNA were profiled using microarrays from CBD (n = 30), BeS (n = 30), and control subjects (n = 12). BAL fluid proteins were measured using Olink Immune Response Panel proteins from CBD (n = 22) and BeS (n = 22) subjects. Linear models identified features associated with CBD, adjusting for covariation and batch effects. Multiomic integration methods identified correlated features between datasets. We identified 1,546 differentially expressed genes in CBD versus control subjects and 204 in CBD versus BeS. Of the 101 shared transcripts, 24 have significant cis relationships between gene expression and DNA methylation, assessed using expression quantitative trait methylation analysis, including genes not previously identified in CBD. A multiomic model of top DNA methylation and gene expression features demonstrated that the first component separated CBD from other samples and the second component separated control subjects from remaining samples. The top features on component one were enriched for T-lymphocyte function, and the top features on component two were enriched for innate immune signaling. We identified six differentially abundant proteins in CBD versus BeS, with two (SIT1 and SH2D1A) selected as important RNA features in the multiomic model. Our integrated analysis of DNA methylation, gene expression, and proteins in the lung identified multiomic signatures of CBD that differentiated it from BeS and control subjects.


Subject(s)
Berylliosis , Humans , Berylliosis/genetics , T-Lymphocytes , Bronchoalveolar Lavage , Bronchoalveolar Lavage Fluid , Immunity, Innate/genetics , RNA , Chronic Disease
9.
Am J Respir Cell Mol Biol ; 65(4): 430-441, 2021 10.
Article in English | MEDLINE | ID: mdl-34038697

ABSTRACT

Molecular patterns and pathways in idiopathic pulmonary fibrosis (IPF) have been extensively investigated, but few studies have assimilated multiomic platforms to provide an integrative understanding of molecular patterns that are relevant in IPF. Herein, we combine the coding and noncoding transcriptomes, DNA methylomes, and proteomes from IPF and healthy lung tissue to identify molecules and pathways associated with this disease. RNA sequencing, Illumina MethylationEPIC array, and liquid chromatography-mass spectrometry proteomic data were collected on lung tissue from 24 subjects with IPF and 14 control subjects. Significant differential features were identified by using linear models adjusting for age and sex, inflation, and bias when appropriate. Data Integration Analysis for Biomarker Discovery Using a Latent Component Method for Omics Studies was used for integrative multiomic analysis. We identified 4,643 differentially expressed transcripts aligning to 3,439 genes, 998 differentially abundant proteins, 2,500 differentially methylated regions, and 1,269 differentially expressed long noncoding RNAs (lncRNAs) that were significant after correcting for multiple tests (false discovery rate < 0.05). Unsupervised hierarchical clustering using 20 coding mRNA, protein, methylation, and lncRNA features with the highest loadings on the top latent variable from the four data sets demonstrates perfect separation of IPF and control lungs. Our analysis confirmed previously validated molecules and pathways known to be dysregulated in disease and implicated novel molecular features as potential drivers and modifiers of disease. For example, 4 proteins, 18 differentially methylated regions, and 10 lncRNAs were found to have strong correlations (|r| > 0.8) with MMP7 (matrix metalloproteinase 7). Therefore, by using a system biology approach, we have identified novel molecular relationships in IPF.


Subject(s)
Idiopathic Pulmonary Fibrosis/metabolism , Lung/metabolism , RNA, Long Noncoding/genetics , Transcriptome/physiology , Aged , Case-Control Studies , Female , Gene Expression Profiling/methods , Humans , Male , Matrix Metalloproteinase 7/metabolism , Middle Aged , RNA, Messenger/metabolism
10.
Am J Respir Crit Care Med ; 202(10): 1430-1444, 2020 11 15.
Article in English | MEDLINE | ID: mdl-32602730

ABSTRACT

Rationale: Chronic hypersensitivity pneumonitis (CHP) is caused by an immune response to antigen inhalation and is characterized by variable histopathological and clinical features. A subset of subjects with CHP have usual interstitial pneumonia and appear to be clinically similar to subjects with idiopathic pulmonary fibrosis (IPF). Objectives: To determine the common and unique molecular features of CHP and IPF. Methods: Transcriptome analysis of lung samples from CHP (n = 82), IPF (n = 103), and unaffected controls (n = 103) was conducted. Differential gene expression was determined adjusting for sex, race, age, and smoking history and using false discovery rate to control for multiple comparisons. Measurements and Main Results: When compared with controls, we identified 413 upregulated and 317 downregulated genes in CHP and 861 upregulated and 322 downregulated genes in IPF. Concordantly upregulated or downregulated genes in CHP and IPF were related to collagen catabolic processes and epithelial development, whereas genes specific to CHP (differentially expressed in CHP when compared with control and not differentially expressed in IPF) were related to chemokine-mediated signaling and immune responsiveness. Using weighted gene coexpression network analysis, we found that among subjects with CHP, genes involved in adaptive immunity or epithelial cell development were associated with improved or reduced lung function, respectively, and that MUC5B expression was associated with epithelial cell development. MUC5B expression was also associated with lung fibrosis and honeycombing. Conclusions: Gene expression analysis of CHP and IPF identified signatures common to CHP and IPF, as well as genes uniquely expressed in CHP. Select modules of gene expression are characterized by distinct clinical and pathological features of CHP.


Subject(s)
Alveolitis, Extrinsic Allergic/genetics , Alveolitis, Extrinsic Allergic/immunology , Gene Expression Profiling , Idiopathic Pulmonary Fibrosis/genetics , Idiopathic Pulmonary Fibrosis/immunology , Lung Diseases, Interstitial/genetics , Lung Diseases, Interstitial/immunology , Adult , Aged , Aged, 80 and over , Alveolitis, Extrinsic Allergic/physiopathology , Female , Gene Expression , Humans , Idiopathic Pulmonary Fibrosis/physiopathology , Lung Diseases, Interstitial/physiopathology , Male , Middle Aged
11.
Am J Respir Cell Mol Biol ; 60(1): 96-105, 2019 01.
Article in English | MEDLINE | ID: mdl-30141971

ABSTRACT

Epigenetic marks are likely to explain variability of response to antigen in granulomatous lung disease. The objective of this study was to identify DNA methylation and gene expression changes associated with chronic beryllium disease (CBD) and sarcoidosis in lung cells obtained by BAL. BAL cells from CBD (n = 8), beryllium-sensitized (n = 8), sarcoidosis (n = 8), and additional progressive sarcoidosis (n = 9) and remitting (n = 15) sarcoidosis were profiled on the Illumina 450k methylation and Affymetrix/Agilent gene expression microarrays. Statistical analyses were performed to identify DNA methylation and gene expression changes associated with CBD, sarcoidosis, and disease progression in sarcoidosis. DNA methylation array findings were validated by pyrosequencing. We identified 52,860 significant (P < 0.005 and q < 0.05) CpGs associated with CBD; 2,726 CpGs near 1,944 unique genes have greater than 25% methylation change. A total of 69% of differentially methylated genes are significantly (q < 0.05) differentially expressed in CBD, with many canonical inverse relationships of methylation and expression in genes critical to T-helper cell type 1 differentiation, chemokines and their receptors, and other genes involved in immunity. Testing of these CBD-associated CpGs in sarcoidosis reveals that methylation changes only approach significance, but are methylated in the same direction, suggesting similarities between the two diseases with more heterogeneity in sarcoidosis that limits power with the current sample size. Analysis of progressive versus remitting sarcoidosis identified 15,215 CpGs (P < 0.005 and q < 0.05), but only 801 of them have greater than 5% methylation change, demonstrating that DNA methylation marks of disease progression changes are more subtle. Our study highlights the significance of epigenetic marks in lung immune response in granulomatous lung disease.


Subject(s)
Berylliosis/genetics , Biomarkers/analysis , DNA Methylation , Gene Expression Regulation , Sarcoidosis, Pulmonary/genetics , Berylliosis/immunology , Berylliosis/pathology , Case-Control Studies , Chronic Disease , Female , Gene Expression Profiling , Genome, Human , Humans , Male , Middle Aged , Sarcoidosis, Pulmonary/immunology , Sarcoidosis, Pulmonary/pathology
14.
bioRxiv ; 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38045372

ABSTRACT

Summary: Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience. Availability: This package is available in both CRAN: https://cran.r-project.org/web/packages/SmCCNet/index.html and Github: https://github.com/KechrisLab/SmCCNet under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.

15.
Aging Cell ; 23(1): e14025, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37920126

ABSTRACT

Aging, human immunodeficiency virus (HIV) infection, and antiretroviral therapy modify the epigenetic profile and function of cells and tissues, including skeletal muscle (SkM). In some cells, accelerated epigenetic aging begins very soon after the initial HIV infection, potentially setting the stage for the early onset of frailty. Exercise imparts epigenetic modifications in SkM that may underpin some health benefits, including delayed frailty, in people living with HIV (PWH). In this first report of exercise-related changes in SkM DNA methylation among PWH, we investigated the impact of 24 weeks of aerobic and resistance exercise training on SkM (vastus lateralis) DNA methylation profiles and epigenetic age acceleration (EAA) in older, virally suppressed PWH (n = 12) and uninfected controls (n = 18), and associations of EAA with physical function at baseline. We identified 983 differentially methylated positions (DMPs) in PWH and controls at baseline and 237 DMPs after training. The influence of HIV serostatus on SkM methylation was more pronounced than that of exercise training. There was little overlap in the genes associated with the probes most significantly differentiated by exercise training within each group. Baseline EAA (mean ± SD) was similar between PWH (-0.4 ± 2.5 years) and controls (0.2 ± 2.6 years), and the exercise effect was not significant (p = 0.79). EAA and physical function at baseline were not significantly correlated (all p ≥ 0.10). This preliminary investigation suggests HIV-specific epigenetic adaptations in SkM with exercise training but confirmation in a larger study that includes transcriptomic analysis is warranted.


Subject(s)
Frailty , HIV Infections , Humans , Aged , DNA Methylation/genetics , Frailty/genetics , HIV Infections/genetics , Epigenesis, Genetic/genetics , Exercise/physiology , Muscle, Skeletal/metabolism , Aging/genetics
16.
medRxiv ; 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39040187

ABSTRACT

Most genetic variants identified through genome-wide association studies (GWAS) are suspected to be regulatory in nature, but only a small fraction colocalize with expression quantitative trait loci (eQTLs, variants associated with expression of a gene). Therefore, it is hypothesized but largely untested that integration of disease GWAS with context-specific eQTLs will reveal the underlying genes driving disease associations. We used colocalization and transcriptomic analyses to identify shared genetic variants and likely causal genes associated with critically ill COVID-19 and idiopathic pulmonary fibrosis. We first identified five genome-wide significant variants associated with both diseases. Four of the variants did not demonstrate clear colocalization between GWAS and healthy lung eQTL signals. Instead, two of the four variants colocalized only in cell-type and disease-specific eQTL datasets. These analyses pointed to higher ATP11A expression from the C allele of rs12585036, in monocytes and in lung tissue from primarily smokers, which increased risk of IPF and decreased risk of critically ill COVID-19. We also found lower DPP9 expression (and higher methylation at a specific CpG) from the G allele of rs12610495, acting in fibroblasts and in IPF lungs, and increased risk of IPF and critically ill COVID-19. We further found differential expression of the identified causal genes in diseased lungs when compared to non-diseased lungs, specifically in epithelial and immune cell types. These findings highlight the power of integrating GWAS, context-specific eQTLs, and transcriptomics of diseased tissue to harness human genetic variation to identify causal genes and where they function during multiple diseases.

17.
medRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38464285

ABSTRACT

Background: Studies have identified individual blood biomarkers associated with chronic obstructive pulmonary disease (COPD) and related phenotypes. However, complex diseases such as COPD typically involve changes in multiple molecules with interconnections that may not be captured when considering single molecular features. Methods: Leveraging proteomic data from 3,173 COPDGene Non-Hispanic White (NHW) and African American (AA) participants, we applied sparse multiple canonical correlation network analysis (SmCCNet) to 4,776 proteins assayed on the SomaScan v4.0 platform to derive sparse networks of proteins associated with current vs. former smoking status, airflow obstruction, and emphysema quantitated from high-resolution computed tomography scans. We then used NetSHy, a dimension reduction technique leveraging network topology, to produce summary scores of each proteomic network, referred to as NetSHy scores. We next performed a genome-wide association study (GWAS) to identify variants associated with the NetSHy scores, or network quantitative trait loci (nQTLs). Finally, we evaluated the replicability of the networks in an independent cohort, SPIROMICS. Results: We identified networks of 13 to 104 proteins for each phenotype and exposure in NHW and AA, and the derived NetSHy scores significantly associated with the variables of interest. Networks included known (sRAGE, ALPP, MIP1) and novel molecules (CA10, CPB1, HIS3, PXDN) and interactions involved in COPD pathogenesis. We observed 7 nQTL loci associated with NetSHy scores, 4 of which remained after conditional analysis. Networks for smoking status and emphysema, but not airflow obstruction, demonstrated a high degree of replicability across race groups and cohorts. Conclusions: In this work, we apply state-of-the-art molecular network generation and summarization approaches to proteomic data from COPDGene participants to uncover protein networks associated with COPD phenotypes. We further identify genetic associations with networks. This work discovers protein networks containing known and novel proteins and protein interactions associated with clinically relevant COPD phenotypes across race groups and cohorts.

18.
Sci Rep ; 14(1): 20618, 2024 09 04.
Article in English | MEDLINE | ID: mdl-39232179

ABSTRACT

Protein biomarkers are associated with mortality in cardiovascular disease, but their effect on predicting respiratory and all-cause mortality is not clear. We tested whether a protein risk score (protRS) can improve prediction of all-cause mortality over clinical risk factors in smokers. We utilized smoking-enriched (COPDGene, LSC, SPIROMICS) and general population-based (MESA) cohorts with SomaScan proteomic and mortality data. We split COPDGene into training and testing sets (50:50) and developed a protRS based on respiratory mortality effect size and parsimony. We tested multivariable associations of the protRS with all-cause, respiratory, and cardiovascular mortality, and performed meta-analysis, area-under-the-curve (AUC), and network analyses. We included 2232 participants. In COPDGene, a penalized regression-based protRS was most highly associated with respiratory mortality (OR 9.2) and parsimonious (15 proteins). This protRS was associated with all-cause mortality (random effects HR 1.79 [95% CI 1.31-2.43]). Adding the protRS to clinical covariates improved all-cause mortality prediction in COPDGene (AUC 0.87 vs 0.82) and SPIROMICS (0.74 vs 0.6), but not in LSC and MESA. Protein-protein interaction network analyses implicate cytokine signaling, innate immune responses, and extracellular matrix turnover. A blood-based protein risk score predicts all-cause and respiratory mortality, identifies potential drivers of mortality, and demonstrates heterogeneity in effects amongst cohorts.
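
The area-under-the-curve comparison reported in this abstract (for example, 0.87 vs 0.82) rests on a simple idea: AUC is the probability that a randomly chosen participant who died has a higher risk score than a randomly chosen survivor. The sketch below computes AUC directly from that definition. The function name and the scores are hypothetical, and the study's own models and software are not described in the abstract.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    case_scores:    risk scores for participants with the event (e.g. died)
    control_scores: risk scores for participants without the event
    Ties between a case and a control count as half a correct pair.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical risk scores from two models on the same participants.
    clinical_only = auc([0.6, 0.4, 0.7], [0.5, 0.3, 0.2, 0.45])
    clinical_plus_protein = auc([0.8, 0.6, 0.7], [0.5, 0.3, 0.2, 0.45])
    print(f"clinical only: {clinical_only:.2f}")
    print(f"clinical + protein score: {clinical_plus_protein:.2f}")
```

This pairwise form is quadratic in the number of participants, which is fine for illustration; rank-based formulas give the same value more efficiently on large cohorts.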


Subject(s)
Cardiovascular Diseases , Mortality , Respiratory Tract Diseases , Smoking , Aged , Female , Humans , Male , Middle Aged , Biomarkers , Black or African American , Cardiovascular Diseases/mortality , Proteomics , Risk Factors , White , Respiratory Tract Diseases/mortality
19.
medRxiv ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38826461

ABSTRACT

Rationale: Genetic variants and gene expression predict risk of chronic obstructive pulmonary disease (COPD), but their effect on COPD heterogeneity is unclear. Objectives: Define high-risk COPD subtypes using both genetics (polygenic risk score, PRS) and blood gene expression (transcriptional risk score, TRS) and assess differences in clinical and molecular characteristics. Methods: We defined high-risk groups based on PRS and TRS quantiles by maximizing differences in protein biomarkers in a COPDGene training set and identified these groups in COPDGene and ECLIPSE test sets. We tested multivariable associations of subgroups with clinical outcomes and compared protein-protein interaction networks and drug repurposing analyses between high-risk groups. Measurements and Main Results: We examined two high-risk omics-defined groups in non-overlapping test sets (n=1,133 NHW COPDGene, n=299 African American (AA) COPDGene, n=468 ECLIPSE). We defined "high activity" (low PRS/high TRS) and "severe risk" (high PRS/high TRS) subgroups. Participants in both subgroups had lower body-mass index (BMI), lower lung function, and alterations in metabolic, growth, and immune signaling processes compared to a low-risk (low PRS, low TRS) reference subgroup. "High activity" but not "severe risk" participants had greater prospective FEV1 decline (COPDGene: -51 mL/year; ECLIPSE: -40 mL/year) and their proteomic profiles were enriched in gene sets perturbed by treatment with 5-lipoxygenase inhibitors and angiotensin-converting enzyme (ACE) inhibitors. Conclusions: Concomitant use of polygenic and transcriptional risk scores identified clinical and molecular heterogeneity amongst high-risk individuals. Proteomic and drug repurposing analysis identified subtype-specific enrichment for therapies and suggest prior drug repurposing failures may be explained by patient selection.

20.
Am J Biol Anthropol ; 182(3): 487-498, 2023 11.
Article in English | MEDLINE | ID: mdl-37694912

ABSTRACT

OBJECTIVE: The degree of sexual dimorphism in certain traits between males and females differs from one sample to another. Although trait differences by sex are often reported in bioanthropological research, few studies test for statistical significance or make raw data available. TestDimorph is the first R package dedicated to testing and comparing the degree of sexual dimorphism among different samples by leveraging summary statistics. MATERIALS AND METHODS: We provide two approaches of analysis of inter-sample differences in degree of sexual dimorphism: univariate and multivariate for two or more samples. The methods follow upon publications primarily from the AJBA. Within-sex size variability between samples is compared using one-way ANOVA followed by control for multiple pairwise comparisons. In addition, we compute the overlapping area between the density functions of two normal distributions from the mixture intersection index or the non-overlapping area using the dissimilarity index as well as Hedges' g with inferential support using the 95% confidence interval. Finally, we use a multivariate analysis of differences in patterning of sexual dimorphism between samples. RESULTS: We demonstrate various results from applying TestDimorph functions to data supplied with the package. DISCUSSION: The package has many features including functionality for working with summary statistics, simulating data from summary statistics, and the extraction of summary statistics from raw data, so that the entire analysis can be performed through the package.
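
Hedges' g with a 95% confidence interval, as named in this abstract, can be computed from summary statistics alone (mean, standard deviation, and sample size per sex). The sketch below uses the common small-sample correction and a large-sample normal approximation for the interval. It is written in Python for illustration and does not reproduce the TestDimorph R package or its function names; the measurements in the example are hypothetical.

```python
from math import sqrt


def hedges_g(mean_m, sd_m, n_m, mean_f, sd_f, n_f, z_crit=1.96):
    """Hedges' g (male minus female) with an approximate 95% CI.

    Uses only summary statistics, so no raw data are required.
    Returns (g, (lower, upper)).
    """
    # Pooled standard deviation across the two sexes.
    pooled_sd = sqrt(
        ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
    )
    cohen_d = (mean_m - mean_f) / pooled_sd
    # Small-sample bias correction (approximate form).
    correction = 1.0 - 3.0 / (4.0 * (n_m + n_f) - 9.0)
    g = correction * cohen_d
    # Large-sample approximation to the standard error of g.
    se = sqrt((n_m + n_f) / (n_m * n_f) + g**2 / (2.0 * (n_m + n_f)))
    return g, (g - z_crit * se, g + z_crit * se)


if __name__ == "__main__":
    # Hypothetical summary statistics for one measurement in one sample.
    g, (lower, upper) = hedges_g(
        mean_m=45.2, sd_m=2.9, n_m=60,
        mean_f=41.0, sd_f=2.6, n_f=55,
    )
    print(f"g = {g:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes zero indicates a dimorphism signal in that sample; comparing intervals across samples gives a rough view of whether the degree of dimorphism differs between them.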


Subject(s)
Sex Characteristics , Male , Female , Humans , Multivariate Analysis , Analysis of Variance , Normal Distribution , Phenotype