ABSTRACT
Brain functional connectivity is typically represented as a brain functional network, where nodes represent regions of interest (ROIs) and edges represent their connections. Studying group differences in brain functional connectivity can help identify brain regions and recover the brain functional network linked to neurodegenerative diseases. This process, known as differential network analysis, focuses on the differences between the estimated precision matrices of two groups. Current methods struggle with individual heterogeneity in measuring brain connectivity, false discovery rate (FDR) control, and accounting for confounding factors, resulting in biased estimates and diminished power. To address these issues, we present a two-stage FDR-controlled feature selection method for differential network analysis using functional magnetic resonance imaging (fMRI) data. First, we create individual brain connectivity measures using a high-dimensional precision matrix estimation technique. Next, we devise a penalized logistic regression model that employs individual brain connectivity data and integrates a new knockoff filter for FDR control when detecting significant differential edges. Through extensive simulations, we show that our approach outperforms other methods. Additionally, we apply our technique to fMRI data to identify differential edges between Alzheimer's disease and control groups. Our results are consistent with prior experimental studies, emphasizing the practical applicability of our method.
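The FDR-control step can be illustrated with the standard knockoff+ selection rule. This is a minimal sketch, not the new knockoff filter the abstract refers to: the statistics `W` are made-up values (one per candidate edge, large positive values favouring the real feature over its knockoff copy), and the function name is hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Smallest t whose estimated false discovery proportion is at most q."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics estimate how many false positives sit above t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # nothing can be selected at this FDR level

# Illustrative statistics only (not from the study).
W = np.array([5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 0.5, 2.0, -0.2, 1.5])
tau = knockoff_plus_threshold(W, q=0.3)
selected = np.flatnonzero(W >= tau)
print(tau, selected)
```

Features whose statistic reaches the threshold are reported as differential edges; lowering `q` raises the threshold and shrinks the selected set.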
Subject(s)
Alzheimer Disease , Brain , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Brain/physiology , Computer Simulation , Logistic Models , Nerve Net/diagnostic imaging , Nerve Net/physiology , Connectome/methods
ABSTRACT
Growing evidence has shown that the brain connectivity network is altered in complex diseases such as Alzheimer's disease (AD). Network comparison, also known as differential network analysis, is thus particularly powerful for revealing disease pathologies and identifying clinical biomarkers for medical diagnosis (classification). Data from neurophysiological measurements are multidimensional and in matrix form. Naive vectorization is insufficient because it ignores the structural information within the matrix. In this article, we adopt the Kronecker product covariance matrix framework to capture both spatial and temporal correlations of the matrix-variate data, while the temporal covariance matrix is treated as a nuisance parameter. Recognizing that the strengths of network connections may vary across subjects, we develop an ensemble-learning procedure that identifies the differential interaction patterns of brain regions between the case group and the control group and simultaneously conducts medical diagnosis (classification) of the disease. Simulation studies are conducted to assess the performance of the proposed method. We apply the proposed procedure to the functional connectivity analysis of a functional magnetic resonance imaging study on AD. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies, and satisfactory out-of-sample classification performance is achieved for medical diagnosis of AD.
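The Kronecker product covariance structure can be checked numerically. The sketch below assumes a region-by-time data matrix X = A Z Bᵀ with spatial covariance AAᵀ and temporal covariance BBᵀ; the two small covariance matrices are illustrative values, not estimates from any dataset.

```python
import numpy as np

# Illustrative spatial (3 regions) and temporal (2 time points) covariances.
sigma_s = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.0]])
sigma_t = np.array([[1.0, 0.7],
                    [0.7, 1.0]])

A = np.linalg.cholesky(sigma_s)   # sigma_s = A @ A.T
B = np.linalg.cholesky(sigma_t)   # sigma_t = B @ B.T

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 2))   # i.i.d. standard normal entries
X = A @ Z @ B.T                   # one matrix-variate normal draw

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

# Identity: vec(A Z B^T) = (B kron A) vec(Z).
identity_holds = np.allclose(vec(X), np.kron(B, A) @ vec(Z))

# Hence Cov(vec(X)) = (B kron A)(B kron A)^T = sigma_t kron sigma_s.
cov_vec = np.kron(B, A) @ np.kron(B, A).T
kron_holds = np.allclose(cov_vec, np.kron(sigma_t, sigma_s))
print(identity_holds, kron_holds)
```

The 6 x 6 covariance of the vectorized data is fully determined by a 3 x 3 and a 2 x 2 matrix, which is the structural information that naive vectorization discards.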
Subject(s)
Alzheimer Disease , Brain , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods
ABSTRACT
BACKGROUND: Transcriptome-wide association studies (TWASs) have shown great promise in interpreting the findings from genome-wide association studies (GWASs) and exploring disease mechanisms by integrating GWAS and eQTL mapping studies. Almost all TWAS methods focus on one gene at a time; the only two published multiple-gene methods fail to account for the inter-dependence and the network structure among multiple genes, which may lead to power loss in TWAS analysis because complex diseases are often caused by multiple genes that interact with each other as a biological network. We therefore developed a Network Regression method in a two-stage TWAS framework (NeRiT) to detect whether a given network is associated with the traits of interest. NeRiT adopts the flexible Bayesian Dirichlet process regression to obtain the gene expression prediction weights in the first stage, uses pointwise mutual information to represent the general between-node correlation in the second stage, and can effectively take the network structure among different gene nodes into account. RESULTS: Comprehensive and realistic simulations indicated that NeRiT had calibrated type I error control for testing both the node effect and the edge effect, and yielded higher power than existing methods, especially in testing the edge effect. The results were consistent regardless of the GWAS sample size, the gene expression prediction model in the first step of TWAS, the network structure, and the correlation pattern among different gene nodes. Real data applications analyzing systolic blood pressure and diastolic blood pressure from UK Biobank showed that NeRiT can simultaneously identify the trait-related nodes as well as the trait-related edges. CONCLUSIONS: NeRiT is a powerful and efficient network regression method in TWAS.
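Pointwise mutual information, the between-node measure named above, compares a joint probability with the product of its marginals. The sketch below uses a made-up 2 x 2 joint distribution of two discretized variables; how NeRiT estimates these probabilities from predicted expression is not stated in the abstract, so this shows only the definition.

```python
import math

# Hypothetical joint distribution of two binary variables (rows: x, cols: y).
joint = [[0.4, 0.1],
         [0.1, 0.4]]

px = [sum(row) for row in joint]         # marginal distribution of x
py = [sum(col) for col in zip(*joint)]   # marginal distribution of y

def pmi(i, j):
    """log of p(x, y) / (p(x) p(y)); zero when x and y are independent."""
    return math.log(joint[i][j] / (px[i] * py[j]))

pmi_table = [[pmi(i, j) for j in range(2)] for i in range(2)]
print(pmi_table)
```

Positive entries mark value pairs that co-occur more often than independence would predict, and negative entries mark pairs that co-occur less often.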
Subject(s)
Genome-Wide Association Study , Transcriptome , Bayes Theorem , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Regression Analysis
ABSTRACT
BACKGROUND: Breast cancer (BC) is one of the most prevalent cancers worldwide, but its etiology remains unclear. Obesity is recognized as a risk factor for BC, and many obesity-related genes may be involved in its occurrence and development. Research assessing the complex genetic mechanisms of BC should not only consider the effect of a single gene on the disease, but also focus on the interaction between genes. This study sought to construct a gene interaction network to identify potential pathogenic BC genes. METHODS: The study included 953 BC patients and 963 control individuals. Chi-square analysis was used to assess the correlation between demographic characteristics and BC. The joint density-based non-parametric differential interaction network analysis and classification (JDINAC) method was used to build a BC gene interaction network using single nucleotide polymorphisms (SNPs). The odds ratio (OR) and 95% confidence interval (95% CI) of hub gene SNPs were evaluated using a logistic regression model. To assess reliability, the hub genes were quantified with the edgeR program using BC RNA-seq data from The Cancer Genome Atlas (TCGA), and identical edges were verified by logistic regression using UK Biobank datasets. GO and KEGG enrichment analyses were used to explore the biological functions of interactive genes. RESULTS: Body mass index (BMI) and menopause are important risk factors for BC. After adjusting for potential confounding factors, the BC gene interaction network was identified using JDINAC. LEP, LEPR, XRCC6, and RETN were identified as hub genes, and both hub genes and edges were verified. LEPR genetic polymorphisms (rs1137101 and rs4655555) were also significantly associated with BC. Enrichment analysis showed that the identified genes were mainly involved in energy regulation and fat-related signaling pathways. CONCLUSION: We explored the interaction network of genes derived from SNP data in BC progression. Gene interaction networks provide new insight into the underlying mechanisms of BC.
Subject(s)
Breast Neoplasms , Breast Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Machine Learning , Obesity/genetics , Polymorphism, Single Nucleotide , Reproducibility of Results
ABSTRACT
Brain functional connectivity reveals the synchronization of brain systems through correlations in neurophysiological measures of brain activities. Growing evidence now suggests that the brain connectivity network experiences alterations with the presence of numerous neurological disorders, thus differential brain network analysis may provide new insights into disease pathologies. The data from neurophysiological measurement are often multidimensional and in a matrix form, posing a challenge in brain connectivity analysis. Existing graphical model estimation methods either assume a vector normal distribution that in essence requires the columns of the matrix data to be independent or fail to address the estimation of differential networks across different populations. To tackle these issues, we propose an innovative matrix-variate differential network (MVDN) model. We exploit the D-trace loss function and a Lasso-type penalty to directly estimate the spatial differential partial correlation matrix and use an alternating direction method of multipliers algorithm for the optimization problem. Theoretical and simulation studies demonstrate that MVDN significantly outperforms other state-of-the-art methods in dynamic differential network analysis. We illustrate with a functional connectivity analysis of an attention deficit hyperactivity disorder dataset. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies.
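The idea of estimating a differential network directly can be sketched with the D-trace loss as it appears in the differential-network literature, here without the Lasso-type penalty and without the ADMM solver. The exact matrix-variate form used by MVDN may differ, and the two covariance matrices are illustrative values. The check confirms that the unpenalized loss is minimized at the difference of the two precision matrices.

```python
import numpy as np

def d_trace_loss(delta, sx, sy):
    """Unpenalized D-trace loss for a candidate differential matrix delta."""
    quad = 0.25 * (np.trace((sx @ delta).T @ (delta @ sy))
                   + np.trace((sy @ delta).T @ (delta @ sx)))
    return quad - np.trace(delta.T @ (sx - sy))

def d_trace_grad(delta, sx, sy):
    """Gradient of the loss with respect to a symmetric delta."""
    return 0.5 * (sx @ delta @ sy + sy @ delta @ sx) - (sx - sy)

# Illustrative group covariance matrices (not from real data).
sx = np.array([[2.0, 0.5], [0.5, 1.0]])
sy = np.array([[1.0, 0.2], [0.2, 1.5]])

# Difference of the precision matrices: the target of the estimation.
delta_star = np.linalg.inv(sy) - np.linalg.inv(sx)

grad_norm = np.linalg.norm(d_trace_grad(delta_star, sx, sy))
loss_at_star = d_trace_loss(delta_star, sx, sy)
loss_at_zero = d_trace_loss(np.zeros((2, 2)), sx, sy)
print(grad_norm, loss_at_star, loss_at_zero)
```

Because the loss takes the covariance matrices as input and targets the difference directly, neither precision matrix has to be estimated or assumed sparse on its own.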
Subject(s)
Brain , Magnetic Resonance Imaging , Algorithms , Brain/diagnostic imaging , Brain Mapping/methods , Magnetic Resonance Imaging/methods , Normal Distribution
ABSTRACT
BACKGROUND: Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the underlying mechanism of such associations remains largely unknown. Most disease-associated genetic variants have been shown to reside in noncoding regions, leading to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods for characterizing how gene expression mediates the effect of a genetic variant on disease often analyze one gene at a time and ignore the network structure. The impact of a genetic variant can propagate to other genes along the links in the network, and then to the final disease. There could be multiple pathways from the genetic variant to the final disease, each having a chain structure, since the first node is one specific SNP (single nucleotide polymorphism) variant and the last node is the disease outcome. One key but inadequately addressed question is how to measure the between-node connection strength and rank the effects of such chain-type pathways, which can provide statistical evidence for prioritizing pathways for potential drug development in a cost-effective manner. RESULTS: We first introduce the maximal correlation coefficient (MCC) to represent the between-node connection, and then integrate MCC with the K shortest paths algorithm to rank and identify the potential pathways from genetic variant to disease. A pathway importance score (PIS) is further provided to quantify the importance of each pathway. We term this method "MCC-SP". Various simulations illustrate that MCC is a better measure of the between-node connection strength than other quantities, including Pearson correlation, Spearman correlation, distance correlation, mutual information, and the maximal information coefficient. 
Finally, we applied MCC-SP to analyze one real dataset from the Religious Orders Study and the Memory and Aging Project, and successfully detected 2 typical pathways from APOE genotype to Alzheimer's disease (AD) through gene expression enriched in Alzheimer's disease pathway. CONCLUSIONS: MCC-SP has powerful and robust performance in identifying the pathway(s) from the genetic variant to the disease. The source code of MCC-SP is freely available at GitHub ( https://github.com/zhuyuchen95/ADnet ).
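The pathway-ranking idea can be sketched on a toy network. Everything here is hypothetical: the node names, the edge strengths (standing in for a between-node measure such as MCC), and the choice of -log(strength) as the edge length are assumptions made for illustration. Paths are enumerated by brute force instead of a dedicated K shortest paths algorithm, which is adequate only for very small graphs.

```python
import math

# Hypothetical directed edges with connection strengths in (0, 1].
strength = {
    ("SNP", "G1"): 0.9, ("SNP", "G2"): 0.6,
    ("G1", "G3"): 0.8, ("G2", "G3"): 0.9,
    ("G1", "Disease"): 0.3, ("G3", "Disease"): 0.7,
}

def simple_paths(source, target, path=None):
    """Yield every path from source to target that repeats no node."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for (u, v) in strength:
        if u == source and v not in path:
            yield from simple_paths(v, target, path)

def path_cost(path):
    """Sum of -log(strength): a short path is a chain of strong links."""
    return sum(-math.log(strength[(u, v)]) for u, v in zip(path, path[1:]))

def k_shortest_paths(source, target, k):
    return sorted(simple_paths(source, target), key=path_cost)[:k]

top = k_shortest_paths("SNP", "Disease", 2)
for p in top:
    print(" -> ".join(p), round(path_cost(p), 3))
```

With these toy values the three-edge chain through G1 and G3 ranks first, because its product of strengths (0.504) exceeds that of the shorter but weaker route through G1 alone (0.27).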
Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Algorithms , Alzheimer Disease/genetics , Computer Simulation , Genotype , Humans , Models, Genetic , Software
ABSTRACT
Multiomics or integrative omics data have been increasingly common in biomedical studies, holding a promise in better understanding human health and disease. In this article, we propose an integrative copula discrimination analysis classifier in the context of two-class classification, which relaxes the common Gaussian assumption and gains power by borrowing information from multiple omics data types in discriminant analysis. Numerical studies are conducted to assess the finite sample performance of the new classifier. We apply our model to the Religious Orders Study and Memory and Aging Project (ROSMAP) Study, integrating gene expression and DNA methylation data for better prediction.
Subject(s)
DNA Methylation , Discriminant Analysis , Humans , Normal Distribution
ABSTRACT
BACKGROUND: Exposure to air pollution is associated with chronic obstructive pulmonary disease (COPD). However, findings on the effects of air pollution on lung function and systemic inflammation in Chinese COPD patients are inconsistent and scarce. This study aims to evaluate the effects of ambient air pollution on lung function parameters and serum cytokine levels in a COPD cohort in Beijing, China. METHODS: We enrolled COPD participants on a rolling basis from December 2015 to September 2017 in Beijing, China. Follow-ups were performed every 3 months for each participant. Serum levels of 20 cytokines were measured every 6 months. Hourly ambient pollutant levels over the same periods were obtained from 35 monitoring stations across Beijing. Geocoded residential addresses of the participants were used to estimate daily mean pollution exposures. A linear mixed-effect model was applied to explore the effects of air pollutants on health in the first year of follow-up. RESULTS: A total of 84 COPD patients were enrolled at baseline. Of those, 75 COPD patients completed the first year of follow-up. We found adverse cumulative effects of particulate matter less than 2.5 µm in aerodynamic diameter (PM2.5), nitrogen dioxide (NO2), sulfur dioxide (SO2) and carbon monoxide (CO) on the forced vital capacity % predicted (FVC % pred) in patients with COPD. Further analyses illustrated that among COPD patients, air pollution exposure was associated with reduced levels of serum eotaxin, interleukin 4 (IL-4) and IL-13 and was correlated with increased serum IL-2, IL-12, IL-17A, interferon γ (IFNγ), monocyte chemoattractant protein 1 (MCP-1) and soluble CD40 ligand (sCD40L). CONCLUSION: Acute exposures to PM2.5, NO2, SO2 and CO were associated with a reduction in FVC % pred in COPD patients. Furthermore, short-term exposure to air pollutants increased systemic inflammation in COPD patients; this may be attributed to increased Th1 and Th17 cytokines and decreased Th2 cytokines.
Subject(s)
Air Pollution/adverse effects , Cytokines/blood , Inflammation/physiopathology , Lung/physiopathology , Pulmonary Disease, Chronic Obstructive/complications , Adult , Aged , Beijing , Female , Humans , Inflammation/chemically induced , Male , Middle Aged , Patients , Respiratory Function Tests , Serum/chemistry , Time Factors , Young Adult
ABSTRACT
PURPOSE: Previous studies have shown that serum carcinoembryonic antigen (CEA) is independently associated with metabolic syndrome (MetS). However, these studies were mainly cross-sectional analyses, and cause was not clarified. In the present study, two bidirectional cohort studies were conducted to investigate the bidirectional associations between CEA and MetS using a Chinese male sample cohort. METHODS: The initial longitudinal cohort included 9629 Chinese males enrolled from January 2010 to December 2015. Two bidirectional cohorts were conducted in the study: subcohort A (from CEA to MetS, n = 6439) included participants without MetS at baseline to estimate the risk of developing incident MetS; subcohort B (from MetS to CEA, n = 8533) included participants without an elevated CEA level (Hyper-CEA) at baseline to examine the risk of developing incident Hyper-CEA. Hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated using Cox proportional hazards models. RESULTS: In subcohort A, the incidence densities of MetS among participants with and without Hyper-CEA were 84.56 and 99.28 per 1000 person-years, respectively. No significant effects of Hyper-CEA on incident MetS were observed in subcohort A (HR, 0.89; 95% CI, 0.71 to 1.12; P = 0.326). In subcohort B, a higher incidence density of Hyper-CEA was found among participants with MetS (33.42 and 29.13 per 1000 person-years for those with and without MetS, respectively). For nonsmoking participants aged > 65 years, MetS increased the risk of incident Hyper-CEA (HR, 1.87; 95% CI, 1.09 to 3.20; P = 0.022). CONCLUSION: For the direction of CEA on incident MetS, no significant association was observed. For the direction of MetS on incident Hyper-CEA, MetS in nonsmoking elderly men could increase the risk of incident Hyper-CEA, while this association was not found in other stratified participants. The clinical implications of the association between CEA and MetS should be interpreted with caution.
Subject(s)
Carcinoembryonic Antigen/blood , Metabolic Syndrome/blood , Adult , Asian People , Cohort Studies , Humans , Incidence , Male , Metabolic Syndrome/epidemiology , Middle Aged , Smoking
ABSTRACT
BACKGROUND: In genomic studies, investigating how the structure of a genetic network differs between two experimental conditions is an interesting but challenging problem, especially in the high-dimensional setting. The existing literature mostly focuses on differential network modelling for continuous data. However, in real applications, we may encounter discrete or mixed data, which motivates a unified differential network model for various data types. RESULTS: We propose a unified latent Gaussian copula differential network model, which provides a deeper understanding of the unknown mechanism than the network among the observed variables. Adaptive rank-based estimation approaches are proposed under the assumption that the true differential network is sparse. The adaptive estimation approaches do not require the precision matrices to be sparse, and thus allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both estimation of the difference of the precision matrices and differential structure recovery, which means that the extra modeling flexibility comes at almost no cost in statistical efficiency. Besides theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with other state-of-the-art methods. The results show that the proposed methods work well for various data types. The proposed method is then applied to gene expression data associated with lung cancer to illustrate its empirical usefulness. CONCLUSIONS: The proposed latent variable differential network models allow for various data types and are thus more flexible; they also provide a deeper understanding of the unknown mechanism than the network among the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the advantages of latent differential network modelling, which is therefore highly recommended.
Subject(s)
Disease/genetics , Gene Regulatory Networks , Models, Theoretical , Computer Simulation , Gene Expression Regulation, Neoplastic , Gene Ontology , Humans , Lung Neoplasms/genetics , Normal Distribution , ROC Curve
ABSTRACT
MOTIVATION: A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. They are restrictive in real application. RESULTS: We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust confounding factors, without the need of the assumption of a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than that achieved by other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated the tumor and normal sample with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. AVAILABILITY AND IMPLEMENTATION: R scripts available at https://github.com/jijiadong/JDINAC. CONTACT: lxie@iscb.org. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Breast Neoplasms/genetics , Gene Regulatory Networks , Carcinoma/genetics , Computer Simulation , Female , Humans
ABSTRACT
BACKGROUND: Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. A key but inadequately addressed issue is how to test possible differences of the networks between two groups. Group-level comparison of network properties may shed light on underlying disease mechanisms and benefit the design of drug targets for complex diseases. We therefore proposed a powerful score-based statistic to detect group differences in weighted networks, which simultaneously captures the vertex changes and edge changes. RESULTS: Simulation studies indicated that the proposed network difference measure (NetDifM) was stable and outperformed other existing methods under various sample sizes and network topology structures. An application to real GWAS data on leprosy successfully identified the specific gene interaction network contributing to leprosy. For additional gene expression data on ovarian cancer, two candidate subnetworks, the PI3K-AKT and Notch signaling pathways, were considered and identified respectively. CONCLUSIONS: The proposed method, which accounts for the vertex changes and edge changes simultaneously, is valid and powerful for capturing the group difference of biological networks.
Subject(s)
Gene Regulatory Networks , Leprosy/genetics , Models, Statistical , Ovarian Neoplasms/genetics , Signal Transduction , Epistasis, Genetic , Female , Humans
ABSTRACT
BACKGROUND: We propose a novel Markov Blanket-based repeated-fishing strategy (MBRFS) in an attempt to increase the power of the existing Markov Blanket method (DASSO-MB) and maintain its advantages in omic data analysis. RESULTS: Both simulation and real data analysis were conducted to assess its performance by comparing it with other methods, including the χ² test with Bonferroni and B-H adjustment, the least absolute shrinkage and selection operator (LASSO) and DASSO-MB. A series of simulation studies showed that the true discovery rate (TDR) of the proposed MBRFS was always close to zero under the null hypothesis (odds ratio = 1 for each SNP) with excellent stability in all three scenarios: independent phenotype-related SNPs without linkage disequilibrium (LD) around them, correlated phenotype-related SNPs without LD around them, and phenotype-related SNPs with strong LD around them. As expected, under different odds ratios and minor allele frequencies (MAFs), MBRFS always had the best performance in capturing the true phenotype-related biomarkers, with a higher Matthews correlation coefficient (MCC) in all three scenarios above. More importantly, since the proposed MBRFS uses the repeated-fishing strategy, it still captures more phenotype-related SNPs with minor effects when phenotype-related SNPs are non-significant under the χ² test after Bonferroni multiple correction. Analyses of various real omics data, including GWAS data, DNA methylation data, gene expression data and metabolite data, indicated that the proposed MBRFS always detected relatively reasonable biomarkers. CONCLUSIONS: Our proposed MBRFS can accurately capture the true phenotype-related biomarkers with a reduced false negative rate when the phenotype-related biomarkers are independent or correlated, as well as when phenotype-related biomarkers are associated with non-phenotype-related ones.
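The two multiple-testing baselines named above, Bonferroni and Benjamini-Hochberg (B-H), can be contrasted in a few lines. This sketch does not implement MBRFS; the p-values are invented for illustration and the function names are hypothetical.

```python
def bonferroni(pvals, alpha):
    """Indices rejected when each test is held to alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]

def benjamini_hochberg(pvals, q):
    """Indices rejected by the B-H step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own cutoff.
        if pvals[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

# Made-up p-values for ten hypothetical SNPs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

bonf = bonferroni(pvals, alpha=0.05)
bh = benjamini_hochberg(pvals, q=0.05)
print(bonf, bh)
```

On these values Bonferroni keeps only the first SNP while B-H also keeps the second, which illustrates how markers with smaller effects can be lost under the stricter correction.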
Subject(s)
Genetic Markers , Genomics/methods , Markov Chains , Phenotype , Asian People/genetics , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Case-Control Studies , Computer Simulation , DNA Methylation , Databases, Genetic , Gene Frequency , Genome-Wide Association Study , Humans , Leprosy/diagnosis , Leprosy/genetics , Linkage Disequilibrium , Models, Theoretical , Polymorphism, Single Nucleotide , Schizophrenia/diagnosis , Schizophrenia/genetics
ABSTRACT
BACKGROUND: The genetic variants identified by genome-wide association studies (GWAS) can only account for a small proportion of the total heritability of complex disease. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, due not only to the traditional interaction under a nearly independent condition but also to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and the specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. RESULTS: Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS performs better than other existing methods, including the single SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. CONCLUSIONS: SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.
Subject(s)
Gene Regulatory Networks , Models, Genetic , Models, Statistical , Arthritis, Rheumatoid/genetics , Computer Simulation , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Inheritance Patterns , Polymorphism, Single Nucleotide , Principal Component Analysis
ABSTRACT
Traditional epidemiology often pays more attention to the identification of a single factor than to the pathway that is related to a disease, and therefore, it is difficult to explore the disease mechanism. Systems epidemiology aims to integrate putative lifestyle exposures and biomarkers extracted from multiple omics platforms to offer new insights into the pathway mechanisms that underlie disease at the human population level. One key but inadequately addressed question is how to develop powerful statistics to identify whether a candidate pathway is associated with a disease. Bearing in mind that a pathway difference can result from not only changes in the nodes but also changes in the edges, we propose a novel statistic for detecting group differences between pathways, which, in principle, captures the node changes and edge changes while simultaneously accounting for the pathway structure. The proposed test has been proven to follow the chi-square distribution, and various simulations have shown that it performs better than other existing methods. Integrating genome-wide DNA methylation data, we analyzed one real data set from the Bogalusa cohort study and identified a significant potential pathway, Smoking → SOCS3 → PIK3R1, which was strongly associated with abdominal obesity. The proposed test was powerful and efficient at identifying pathway differences between two groups, and it can be extended to other disciplines that involve statistical comparisons between pathways. The source code in R is available on our website. Copyright © 2016 John Wiley & Sons, Ltd.
Subject(s)
Chi-Square Distribution , Cohort Studies , Epidemiologic Studies , Humans
ABSTRACT
BACKGROUND: In stark contrast to the network-centric view of complex disease, regression-based methods are preferred in disease prediction, especially by epidemiologists and clinical professionals. It remains controversial whether network-based methods perform better than regression-based methods, and to what extent. METHODS: Simulations under different scenarios (the input variables are independent or in a network relationship) as well as an application were conducted to assess the prediction performance of four typical methods: Bayesian network, neural network, logistic regression and regression splines. RESULTS: The simulation results reveal that the Bayesian network performed better when the variables were in a network relationship or in a chain structure. For the special wheel network structure, logistic regression performed comparably to the other methods. A further application to GWAS data on leprosy shows that the Bayesian network still outperforms the other methods. CONCLUSION: Although regression-based methods are still popular and widely used, network-based approaches deserve more attention, since they capture the complex relationships between variables.
Subject(s)
Bayes Theorem , Neural Networks, Computer , Outcome Assessment, Health Care/methods , Regression Analysis , Computer Simulation , Diagnosis, Differential , Humans , Logistic Models , Outcome Assessment, Health Care/statistics & numerical data , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
BACKGROUND: The prevalence of cardiovascular disease has been increasing worldwide. As a common pathogenic risk factor, dyslipidemia plays a major role in the incidence and progression of these diseases. We sought accurate and up-to-date information on the prevalence of dyslipidemia and its associations with other lipid-related diseases in rural North China. METHODS: Using a complex, multistage, probability sampling design, we conducted a large-scale cross-sectional study of 8528 rural participants aged over 18 years in Shandong Province. The prevalence and characteristics of dyslipidemia were described. The odds ratios between dyslipidemia types and lipid-related diseases were further analyzed by logistic regression. RESULTS: Among the overall population, 45.8 % suffered from dyslipidemia. The prevalence of lipid abnormality (including high and very high levels) was 18.6, 12.7, 9.8 and 12.7 % for total cholesterol (TC), high-density lipoprotein (HDL), and low-density lipoprotein (LDL) cholesterol and triglycerides (TG), respectively. Among all participants with dyslipidemia, 23.9 % were aware, only 11.5 % were treated, and 10.0 % were controlled. For subjects with dyslipidemia, the risk for non-alcoholic fatty liver disease (NAFLD) was highest, at 3.3-fold that of subjects without dyslipidemia (OR = 3.30, P < 0.001), followed by hyperuricemia and diabetes mellitus (DM), each with an approximately 2-fold increase (OR = 1.99, P < 0.001; OR = 1.92, P < 0.001), and atherosclerosis (AS), with only about a 1.5-fold risk (OR = 1.47, P < 0.001). The presence of high cholesterol was mainly associated with AS, while abnormal TG was correlated with NAFLD and DM. CONCLUSIONS: Dyslipidemia has become a serious public health issue in rural North China. The rapid increase of high TC and incremental risk of high TG may contribute to the epidemic of AS, NAFLD and DM. It is imperative to develop individualized prevention and treatment guidelines according to dyslipidemia phenotypes.
Subject(s)
Atherosclerosis/epidemiology , Cardiovascular Diseases/epidemiology , Diabetes Mellitus/epidemiology , Dyslipidemias/epidemiology , Adolescent , Adult , Age Factors , Aged , Atherosclerosis/blood , Atherosclerosis/pathology , Cardiovascular Diseases/blood , Cardiovascular Diseases/pathology , Cholesterol/blood , Diabetes Mellitus/blood , Diabetes Mellitus/pathology , Dyslipidemias/blood , Dyslipidemias/pathology , Female , Humans , Lipids/blood , Lipoproteins, HDL/blood , Male , Middle Aged , Rural Population , Sex Characteristics , Triglycerides/blood
ABSTRACT
Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies with expression-mapping studies to identify genes with genetically predicted expression (GReX) associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that performs conditional TWAS analysis by explicitly controlling for GReX of all other genes residing in a local region to fine-map putatively causal genes. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT narrows down the set size of putatively causal genes by 32.16-91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressures and lipid metabolism for regulating lipid levels.
Subject(s)
Genome-Wide Association Study , Transcriptome , Humans , Genome-Wide Association Study/methods , Quantitative Trait Loci/genetics , Phenotype , Linkage Disequilibrium , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease/genetics
ABSTRACT
Mass spectrometry is a powerful and widely used tool for generating proteomics, lipidomics and metabolomics profiles, which are pivotal for elucidating biological processes and identifying biomarkers. However, missing values in mass spectrometry-based omics data pose a critical challenge for the comprehensive identification of biomarkers and the elucidation of the biological processes underlying complex human disorders. To alleviate this issue, various imputation methods for mass spectrometry-based omics data have been developed. A comprehensive comparison of these imputation methods is still lacking, however, and researchers are frequently confronted with a multitude of options without a clear rationale for method selection. To address this need, we developed omicsMIC (mass spectrometry-based omics with Missing values Imputation methods Comparison platform), an interactive platform that provides researchers with a versatile framework to evaluate the performance of 28 diverse imputation methods. omicsMIC acknowledges the inherent heterogeneity in biological data and the unique attributes of each dataset. Our platform enables researchers to make data-driven decisions in imputation method selection based on real-time visualizations of the outcomes associated with different imputation strategies. The comprehensive benchmarking and versatility of omicsMIC make it a valuable tool for the scientific community engaged in mass spectrometry-based omics research. omicsMIC is freely available at https://github.com/WQLin8/omicsMIC.
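The general idea of comparing imputation methods can be illustrated with a mask-and-recover experiment: hide known values in a complete matrix, impute them, and score each method by its error on the hidden cells. The sketch below is an illustration under stated assumptions, not the omicsMIC implementation. The data are simulated, the three baselines (column mean, column median, half of the column minimum) are generic choices that may or may not be among the 28 methods the platform covers, and all function names are hypothetical.

```python
import math
import random
import statistics

def mask_columns(matrix, k, rng):
    """Hide k entries per column (set to None); return masked copy and hidden cells."""
    masked = [row[:] for row in matrix]
    hidden = []
    for j in range(len(matrix[0])):
        for i in rng.sample(range(len(matrix)), k):
            masked[i][j] = None
            hidden.append((i, j))
    return masked, hidden

def impute_by_column(masked, fill):
    """Replace each missing entry with fill(observed values of its column)."""
    out = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        value = fill(observed)
        for row in out:
            if row[j] is None:
                row[j] = value
    return out

def nrmse(truth, imputed, hidden):
    """RMSE over hidden cells, divided by the standard deviation of their true values."""
    true_vals = [truth[i][j] for i, j in hidden]
    mse = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden) / len(hidden)
    return math.sqrt(mse) / statistics.pstdev(true_vals)

# Simulated log-normal intensities (assumption): 20 samples x 5 features.
rng = random.Random(0)
truth = [[rng.lognormvariate(0, 1) for _ in range(5)] for _ in range(20)]
masked, hidden = mask_columns(truth, k=4, rng=rng)

methods = {
    "mean": statistics.mean,
    "median": statistics.median,
    "half_min": lambda xs: min(xs) / 2,
}
scores = {name: nrmse(truth, impute_by_column(masked, fill), hidden)
          for name, fill in methods.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: NRMSE = {score:.3f}")
```

Entries here are hidden completely at random. Missingness in mass spectrometry data often depends on abundance, so a realistic benchmark would also mask low-intensity values preferentially, and the ranking of methods can change with the missingness mechanism.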