ABSTRACT
The crucial impact of the microbiome on human health and disease has gained significant scientific attention. Researchers seek to connect microbiome features with health conditions, aiming to predict diseases and develop personalized medicine strategies. However, the practicality of conventional models is limited by key characteristics of microbiome data. Specifically, the observed data are compositional, as the counts within each sample are bound by a fixed-sum constraint. Moreover, microbiome data are often high-dimensional, with the number of variables exceeding the number of available samples. In addition, microbiome features with phenotypic similarity usually have similar effects on the response variable. To address these challenges, we proposed Bayesian compositional generalized linear models for analyzing microbiome data (BCGLM), with a structured regularized horseshoe prior for the compositional coefficients and a soft sum-to-zero restriction on the coefficients imposed through the prior distribution. We fitted the proposed models using Markov chain Monte Carlo (MCMC) algorithms with the R package rstan. The performance of the proposed method was assessed by extensive simulation studies. The simulation results show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to find microorganisms linked to inflammatory bowel disease (IBD). To make this work reproducible, the code and data used in this article are available at https://github.com/Li-Zhang28/BCGLM.
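The soft sum-to-zero restriction is easiest to motivate with the log-contrast form of a compositional regression, where the linear predictor is eta = beta0 + sum_j beta_j * log(x_j). The sketch below is a minimal illustration, not the BCGLM implementation. It shows why the constraint matters: when the coefficients sum to zero, eta is the same whether x holds raw counts, proportions, or counts rescaled by a different sequencing depth. BCGLM imposes the restriction softly through the prior. The `soft_sum_to_zero_logpdf` helper and its `sd` value are hypothetical stand-ins for that idea, and the counts and coefficients are made up.

```python
import math

def closure(counts):
    """Rescale raw counts to proportions that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]

def log_contrast_eta(x, beta, intercept=0.0):
    """Linear predictor eta = intercept + sum_j beta_j * log(x_j); assumes every x_j > 0."""
    return intercept + sum(b * math.log(v) for b, v in zip(beta, x))

def soft_sum_to_zero_logpdf(beta, sd=1e-3):
    """Normal(0, sd) log-density of sum(beta): a hypothetical soft version of the constraint."""
    s = sum(beta)
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * (s / sd) ** 2

counts = [120, 45, 35]      # hypothetical taxon counts for one sample
beta = [0.8, -0.5, -0.3]    # hypothetical coefficients that sum to zero

eta_counts = log_contrast_eta(counts, beta)
eta_props = log_contrast_eta(closure(counts), beta)
eta_deeper = log_contrast_eta([10 * c for c in counts], beta)  # 10x sequencing depth
print(eta_counts, eta_props, eta_deeper)  # identical up to floating-point rounding
```

A coefficient vector that does not sum to zero, such as [0.8, -0.5, 0.2], shifts eta by (sum of beta) * log(total count). The estimated effect would then depend on sequencing depth rather than on relative abundance.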
Subjects
Microbiota , Humans , Linear Models , Bayes Theorem , Computer Simulation , Algorithms
ABSTRACT
In addition to main effects, understanding gene-environment (G × E) interactions is imperative for determining the etiology of diseases and the factors that affect their prognosis. In the existing statistical framework for censored survival outcomes, detecting G × E interactions faces several challenges, such as handling high-dimensional omics data, diverse environmental factors, and algorithmic complications in survival analysis. The effect heredity principle has been widely used in studies involving interaction identification because it incorporates the dependence between main and interaction effects. However, Bayesian survival models that incorporate this principle have not been developed. Therefore, we propose Bayesian heredity-constrained accelerated failure time (BHAFT) models for identifying main and interaction (M-I) effects, with novel spike-and-slab or regularized horseshoe priors that incorporate the effect heredity principle. The R package rstan was used to fit the proposed models. Extensive simulations demonstrated that BHAFT models outperformed other existing models in terms of signal identification, coefficient estimation, and prognosis prediction. Using the proposed models, we identified biologically plausible G × E interactions associated with the prognosis of lung adenocarcinoma. Notably, by incorporating the effect heredity principle, BHAFT models could identify both main and interaction effects, which makes them highly useful for exploring G × E interactions in high-dimensional survival analysis. The code and data used in our paper are available at https://github.com/SunNa-bayesian/BHAFT.
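The effect heredity principle constrains which interaction terms may enter a model, given the main effects that were selected. The sketch below only checks that constraint on a hypothetical selected model; the gene and environment names are made up. It does not reproduce the spike-and-slab or regularized horseshoe priors that BHAFT uses to encode the principle.

```python
def satisfies_heredity(main_effects, interactions, strong=True):
    """Check an effect-heredity constraint on a selected model.

    main_effects: set of selected main-effect names, e.g. {"G1", "E1"}
    interactions: iterable of (gene, environment) pairs selected as G x E terms
    strong=True : every interaction needs BOTH parent main effects selected
    strong=False: weak heredity, every interaction needs AT LEAST ONE parent selected
    """
    for g, e in interactions:
        parents_in = (g in main_effects, e in main_effects)
        allowed = all(parents_in) if strong else any(parents_in)
        if not allowed:
            return False
    return True

selected_main = {"G1", "E1"}  # hypothetical selected main effects
print(satisfies_heredity(selected_main, [("G1", "E1")]))                # True
print(satisfies_heredity(selected_main, [("G2", "E1")]))                # False
print(satisfies_heredity(selected_main, [("G2", "E1")], strong=False))  # True
```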
Subjects
Bayes Theorem , Computer Simulation , Gene-Environment Interaction , Lung Neoplasms , Humans , Survival Analysis , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Models, Statistical , Prognosis , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/mortality , Algorithms
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pmed.1003232.].
ABSTRACT
BACKGROUND: Little is known about the gut microbiome and the factors that lead to dysbiosis in pediatric intestinal failure (PIF). This study aimed to characterize the gut microbiome in PIF and determine factors that may affect microbial composition in these patients. METHODS: This is a single-center, prospective cohort study of children with PIF followed at our intestinal rehabilitation program. Stool samples were collected longitudinally at regular intervals over a 1-year period. Medical records were reviewed, and demographic and clinical data were collected. Medication history, including the use of acid blockers, scheduled prophylactic antibiotics, and bile acid sequestrants, was obtained. Gut microbial diversity was assessed and compared across host characteristics of interest. RESULTS: The final analysis included 74 specimens from 12 subjects. Scheduled prophylactic antibiotics, presence of a central line-associated bloodstream infection (CLABSI) at the time of specimen collection, use of acid blockers, and receiving ≥50% of calories via parenteral nutrition (PN) were associated with reduced alpha diversity, whereas increasing age was associated with higher alpha diversity at various microbial levels (P < 0.05). Beta diversity differed with age, presence of CLABSI, use of scheduled antibiotics, use of acid blockers, percent of calories via PN, and presence of oral feeds at various microbial levels (P < 0.05). Single-taxon analysis identified several taxa, at several microbial levels, that were significantly associated with various host characteristics. CONCLUSION: Gut microbial diversity in PIF subjects is influenced by various factors involved in the rehabilitation process, including medications, percent of calories received parenterally, CLABSI events, the degree of oral feeding, and age.
Additional multicenter investigation is needed to further understand the impact of these findings on important clinical outcomes in PIF.
Subjects
Gastrointestinal Microbiome , Intestinal Failure , Humans , Child , Prospective Studies , Energy Intake , Parenteral Nutrition
ABSTRACT
BACKGROUND: Early progression of feeding could influence the development of the gut microbiome. METHODS: We collected fecal samples from extremely preterm infants randomized to receive either early (feeding day 2) or delayed (feeding day 5) feeding progression. After study completion, we compared samples obtained at three different time points (week 1, week 2, and week 3) to determine longitudinal differences in specific taxa between the study groups using unadjusted and adjusted negative binomial and zero-inflated mixed models. Analyses were adjusted for mode of delivery, breastmilk intake, and exposure to antibiotics. RESULTS: We analyzed 137 fecal samples from 51 infants. In unadjusted and adjusted analyses, we did not observe an early transition to higher microbial diversity within samples (i.e., alpha diversity) or significant differences in microbial diversity between samples (i.e., beta diversity) in the early feeding group. Our longitudinal, single-taxon analysis found consistent differences in the genera Lactococcus, Veillonella, and Bilophila between groups. CONCLUSIONS: Differences in single-taxon analyses independent of the mode of delivery, exposure to antibiotics, and breastmilk feeding suggest potential benefits of early progression of enteral feeding volumes. However, this dietary intervention does not appear to increase the diversity of the gut microbiome in the first 28 days after birth. TRIAL REGISTRATION: ClinicalTrials.gov identifier: NCT02915549. IMPACT: Early progression of enteral feeding volumes with human milk reduces the duration of parenteral nutrition and the need for central venous access among extremely preterm infants. Early progression of enteral feeding leads to single-taxon differences in longitudinal analyses of the gut microbiome, but it does not appear to increase the diversity of the gut microbiome in the first 28 days after birth.
Randomization in enteral feeding trials creates appealing opportunities to evaluate the effects of human milk diets on the gut microbiome.
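Alpha diversity summarizes diversity within one sample, and beta diversity compares samples. The abstract does not name the indices used. The sketch below therefore assumes two common choices, the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity, applied to made-up taxon counts.

```python
import math

def shannon_index(counts):
    """Shannon alpha diversity H = -sum_i p_i * ln(p_i), skipping taxa with zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den

week1 = [90, 5, 5, 0]     # hypothetical counts dominated by one taxon
week3 = [30, 25, 25, 20]  # hypothetical, more even community
print(shannon_index(week1), shannon_index(week3))  # the more even sample has the higher H
print(bray_curtis(week1, week3))                   # 120 / 200 = 0.6
```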
Subjects
Enteral Nutrition , Gastrointestinal Microbiome , Anti-Bacterial Agents , Humans , Infant , Infant, Extremely Premature , Infant, Newborn , Milk, Human
ABSTRACT
There are proposals that extend the classical generalized additive models (GAMs) to accommodate high-dimensional data (p ≫ n) using group sparse regularization. However, sparse regularization may induce excess shrinkage when estimating smooth functions, damaging predictive performance. Moreover, most of these GAMs take an "all-in-all-out" approach to functional selection, making it difficult to determine whether nonlinear effects are necessary. While some Bayesian models can address these shortcomings, using Markov chain Monte Carlo algorithms for model fitting creates a new challenge: scalability. Hence, we propose Bayesian hierarchical generalized additive models as a solution: we incorporate the smoothing penalty, via reparameterization, to obtain proper shrinkage in curve interpolation. A novel two-part spike-and-slab LASSO prior for smooth functions is developed to address the sparsity of signals while providing extra flexibility to select the linear or nonlinear components of smooth functions. A scalable and deterministic algorithm, EM-Coordinate Descent, is implemented in the open-source R package BHAM. Simulation studies and metabolomics data analyses demonstrate improved predictive and computational performance relative to state-of-the-art models. Functional selection performance suggests that trade-offs exist regarding the effect hierarchy assumption.
Subjects
Algorithms , Data Analysis , Bayes Theorem , Computer Simulation , Humans , Monte Carlo Method
ABSTRACT
Spike-and-slab priors model predictors as arising from a mixture of distributions: those that should (slab) or should not (spike) remain in the model. The spike-and-slab lasso (SSL) is a mixture of double exponentials, extending the single lasso penalty by imposing different penalties on parameters based on their inclusion probabilities. The SSL was extended to generalized linear models (GLMs) for applications in genetics/genomics and can handle many highly correlated predictors of a scalar outcome, but it does not incorporate these relationships into variable selection. When images/spatial data are used to model a scalar outcome, relevant parameters tend to cluster spatially, and model performance may benefit from incorporating spatial structure into variable selection. We propose to incorporate spatial information by assigning intrinsic autoregressive priors to the logit prior probabilities of inclusion, which results in more similar shrinkage penalties among spatially adjacent parameters. Because using MCMC to fit Bayesian models can be computationally prohibitive for large-scale data, we fit the model by adapting a computationally efficient EM algorithm based on coordinate descent. A simulation study and an application to Alzheimer's disease imaging data show that incorporating spatial information can improve model fit.
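The adaptive penalty can be made concrete. Under the mixture prior beta_j ~ (1 - theta_j) DE(0, s0) + theta_j DE(0, s1), with a small spike scale s0 and a larger slab scale s1, an EM algorithm computes the posterior probability p_j that beta_j comes from the slab. It then penalizes |beta_j| with weight (1 - p_j)/s0 + p_j/s1. The sketch below is a minimal illustration of that E-step, assuming these standard SSL formulas. The scales and theta values are hypothetical. In the spatial model, theta_j would be the inverse logit of a parameter with an intrinsic autoregressive prior rather than a fixed number.

```python
import math

def slab_probability(beta, theta, s0, s1):
    """P(slab | beta) for the prior theta*DE(0, s1) + (1 - theta)*DE(0, s0).

    With DE(b | 0, s) = exp(-|b|/s) / (2s), the log-odds of slab vs spike is
    logit(theta) + log(s0/s1) + |beta| * (1/s0 - 1/s1).
    """
    z = (math.log(theta / (1.0 - theta)) + math.log(s0 / s1)
         + abs(beta) * (1.0 / s0 - 1.0 / s1))
    # numerically stable inverse logit (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def penalty_weight(p, s0, s1):
    """Expected inverse scale: the L1 weight applied to |beta| in the next M-step."""
    return (1.0 - p) / s0 + p / s1

s0, s1 = 0.04, 1.0  # hypothetical spike and slab scales
for beta in (0.01, 0.2, 1.0):
    p = slab_probability(beta, theta=0.5, s0=s0, s1=s1)
    print(f"beta={beta:<5} p={p:.3f} weight={penalty_weight(p, s0, s1):.2f}")
```

Large coefficients get p_j near 1 and therefore a light penalty close to 1/s1, while small ones receive the heavy spike penalty near 1/s0. Raising theta_j for the spatial neighbors of active locations lowers their penalty in the same way.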
ABSTRACT
MOTIVATION: Longitudinal metagenomics data, including both 16S rRNA and whole-metagenome shotgun sequencing data, have enhanced our ability to understand the dynamic associations between the human microbiome and various diseases. However, analytic tools have not been fully developed to simultaneously address the main challenges of longitudinal metagenomics data, i.e. high dimensionality, dependence among samples, and zero-inflation of observed counts. RESULTS: We propose a fast zero-inflated negative binomial mixed modeling (FZINBMM) approach to analyze high-dimensional longitudinal metagenomic count data. The FZINBMM approach is based on zero-inflated negative binomial mixed models (ZINBMMs) for modeling longitudinal metagenomic count data and a fast EM-IWLS algorithm for fitting ZINBMMs. FZINBMM takes advantage of a commonly used procedure for fitting linear mixed models, which allows us to include various types of fixed and random effects and within-subject correlation structures and to quickly analyze many taxa. We found that FZINBMM was markedly more computationally efficient than, and statistically comparable to, two R packages, GLMMadaptive and glmmTMB, that use numerical integration to fit ZINBMMs. Extensive simulations and real data applications showed that FZINBMM outperformed other previous methods, including linear mixed models, negative binomial mixed models and zero-inflated Gaussian mixed models. AVAILABILITY AND IMPLEMENTATION: FZINBMM has been implemented in the R package NBZIMM, available in the public GitHub repository http://github.com/nyiuab/NBZIMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
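At the core of ZINBMMs, the zero-inflated negative binomial distribution mixes a point mass at zero (probability pi) with a negative binomial count (mean mu, size or dispersion parameter). A minimal sketch of its probability mass and log-likelihood follows. It assumes, for illustration, the common mean-size parameterization. In a ZINBMM, log(mu) would carry the fixed effects, random effects, and a library-size offset, and logit(pi) would have its own linear predictor. The EM-IWLS fitting step is not shown.

```python
import math

def nb_logpmf(y, mu, size):
    """Negative binomial log-pmf with mean mu and size parameter; variance = mu + mu**2 / size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu)) + y * math.log(mu / (size + mu)))

def zinb_pmf(y, mu, size, pi):
    """Zero-inflated NB: a structural zero with probability pi, otherwise an NB draw."""
    nb = math.exp(nb_logpmf(y, mu, size))
    return pi + (1.0 - pi) * nb if y == 0 else (1.0 - pi) * nb

def zinb_loglik(ys, mu, size, pi):
    """Log-likelihood of independent counts sharing (mu, size, pi); a mixed model would vary mu per sample."""
    return sum(math.log(zinb_pmf(y, mu, size, pi)) for y in ys)

counts = [0, 0, 0, 3, 12, 0, 7, 0, 1, 25]  # hypothetical counts of one taxon
print(zinb_loglik(counts, mu=5.0, size=0.8, pi=0.3))
```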
Subjects
Metagenomics , Microbiota , Humans , Metagenome , Microbiota/genetics , Models, Statistical , RNA, Ribosomal, 16S/genetics
ABSTRACT
BACKGROUND: Microbiome/metagenomic data have specific characteristics, including varying total sequence reads, over-dispersion, and zero-inflation, which require tailored analytic tools. Many microbiome/metagenomic studies follow a longitudinal design to collect samples, which further complicates the analysis methods needed. A flexible and efficient R package is needed for analyzing processed multilevel or longitudinal microbiome/metagenomic data. RESULTS: NBZIMM is a freely available R package that provides functions for setting up and fitting negative binomial mixed models, zero-inflated negative binomial mixed models, and zero-inflated Gaussian mixed models. It also provides functions to summarize the results from fitted models, both numerically and graphically. The main functions are built on top of the commonly used R packages nlme and MASS, allowing us to incorporate the well-developed analytic procedures into the framework for analyzing over-dispersed and zero-inflated count or proportion data with multilevel structures (e.g., longitudinal studies). The statistical methods and their implementations in NBZIMM particularly address the data characteristics and the complex designs in microbiome/metagenomic studies. The package is freely available from the public GitHub repository https://github.com/nyiuab/NBZIMM . CONCLUSION: The NBZIMM package provides useful tools for complex microbiome/metagenomics data analysis.
Subjects
Data Analysis , Metagenomics , Microbiota/genetics , Models, Statistical , Algorithms , Humans , Metagenome , Multilevel Analysis
ABSTRACT
Chronic low back pain (cLBP) that cannot be attributed to a specific pathoanatomical change is associated with high personal and societal costs. Still, the underlying mechanism that causes and sustains such a phenotype is largely unknown. Emerging evidence suggests that epigenetic changes play a role in chronic pain conditions. Using reduced representation bisulfite sequencing (RRBS), we evaluated DNA methylation profiles of adults with non-specific cLBP (n = 50) and pain-free controls (n = 48). We identified 28,325 hypermethylated and 36,936 hypomethylated CpG sites (p < 0.05). After correcting for multiple testing, we identified 159 differentially methylated regions (DMRs; q < 0.01 and methylation difference > 10%), the majority of which were located in CpG islands (50%) and promoter regions (48%) of the associated genes. The genes associated with the DMRs were highly enriched in biological processes previously implicated in immune signaling, endochondral ossification, and G-protein-coupled transmission. Our findings support inflammatory alterations and a role of bone maturation in cLBP. This study suggests that epigenetic regulation has an important role in the pathophysiology of non-specific cLBP and provides a basis for future studies in biomarker development and targeted interventions.
Subjects
Chronic Pain/genetics , DNA Methylation/genetics , Low Back Pain/genetics , Adult , CpG Islands/genetics , Female , Genome, Human , Humans , Male , Principal Component Analysis
ABSTRACT
BACKGROUND: Obesity is closely related to the development of insulin resistance and type 2 diabetes (T2D). The prevention of T2D has become imperative to stem the rising rates of this disease. Weight loss is highly effective in preventing T2D; however, the at-risk pool is large, and a clinically meaningful metric for risk stratification to guide interventions remains a challenge. The objective of this study is to predict T2D risk using full-information continuous analysis of nationally sampled data from white and black American adults age ≥45 years. METHODS AND FINDINGS: A sample of 12,043 black (33%) and white individuals from a population-based cohort, REasons for Geographic And Racial Differences in Stroke (REGARDS) (enrolled 2003-2007), was observed through 2013-2016. The mean participant age was 63.12 ± 8.62 years, and 43.7% were male. Mean BMI was 28.55 ± 5.61 kg/m2. Risk factors for T2D regularly recorded in the primary care setting were used to evaluate future T2D risk using Bayesian logistic regression. External validation was performed using 9,710 participants (19% black) from Atherosclerotic Risk in Communities (ARIC) (enrolled 1987-1989), observed through 1996-1998. The mean participant age in this cohort was 53.86 ± 5.65 years, and 44.6% were male. Mean BMI was 27.15 ± 4.92 kg/m2. Predictive performance was assessed using the receiver operating characteristic (ROC) curves and area under the curve (AUC) statistics. The primary outcome was incident T2D. By 2016 in REGARDS, there were 1,602 incident cases of T2D. Risk factors used to predict T2D progression included age, sex, race, BMI, triglycerides, high-density lipoprotein, blood pressure, and blood glucose. The Bayesian logistic model (AUC = 0.79) outperformed the Framingham risk score (AUC = 0.76), the American Diabetes Association risk score (AUC = 0.64), and a cardiometabolic disease system (using Adult Treatment Panel III criteria) (AUC = 0.75). Validation in ARIC was robust (AUC = 0.85). 
Main limitations include limited generalizability, because the REGARDS sample comprises only older black and white Americans, and the lack of time-to-diagnosis data for T2D. CONCLUSIONS: Our results show that a Bayesian logistic model using full-information continuous predictors has high predictive discrimination and can be used to quantify race- and sex-specific T2D risk, providing a new, powerful predictive tool. This tool can support T2D prevention efforts, including weight loss therapy, by allowing clinicians to target high-risk individuals in a manner that could be used to optimize outcomes.
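AUC, the metric used to compare the models above, equals the probability that a randomly chosen person who develops T2D receives a higher predicted risk than a randomly chosen person who does not, with ties counted as one half (the Mann-Whitney form). A minimal sketch with made-up risk scores:

```python
def auc(scores, labels):
    """Mann-Whitney AUC: fraction of (case, non-case) pairs ranked correctly; ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))

risk = [0.9, 0.4, 0.6, 0.2]  # hypothetical predicted T2D risks
incident = [1, 1, 0, 0]      # 1 = developed T2D during follow-up
print(auc(risk, incident))   # 0.75: 3 of 4 case/non-case pairs are ordered correctly
```

This pairwise version costs O(cases × non-cases). A rank-based formula gives the same value more efficiently on cohorts the size of REGARDS.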
Subjects
Black or African American , Data Interpretation, Statistical , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/epidemiology , White People , Aged , Aged, 80 and over , Bayes Theorem , Blood Glucose/metabolism , Cohort Studies , Diabetes Mellitus, Type 2/diagnosis , Female , Follow-Up Studies , Humans , Incidence , Insulin Resistance/physiology , Logistic Models , Longitudinal Studies , Male , Middle Aged , Obesity/blood , Obesity/diagnosis , Obesity/epidemiology , Predictive Value of Tests , Reproducibility of Results
ABSTRACT
SUMMARY: BhGLM is a freely available R package that implements Bayesian hierarchical modeling for high-dimensional clinical and genomic data. It consists of functions for setting up various Bayesian hierarchical models, including generalized linear models (GLMs) and Cox survival models, with four types of prior distributions for coefficients, i.e. double-exponential, Student-t, mixture double-exponential and mixture Student-t. These functions adapt fast and stable algorithms to estimate parameters. BhGLM also provides functions for summarizing results numerically and graphically and for evaluating predictive values. The package is particularly useful for analyzing large-scale molecular data, i.e. detecting disease-associated variables and predicting disease outcomes. We here describe the models, algorithms and associated features implemented in BhGLM. AVAILABILITY AND IMPLEMENTATION: The package is freely available from the public GitHub repository, https://github.com/nyiuab/BhGLM.
Subjects
Algorithms , Genomics , Bayes Theorem , Linear Models , Proportional Hazards Models
ABSTRACT
BACKGROUND: Group structures among genes, encoded in functional relationships or biological pathways, are valuable and unique features of large-scale molecular data for survival analysis. However, most previous approaches for molecular data analysis ignore such group structures. It is desirable to develop powerful analytic methods that incorporate valuable pathway information for predicting disease survival outcomes and detecting associated genes. RESULTS: We propose a Bayesian hierarchical Cox survival model, called the group spike-and-slab lasso Cox (gsslasso Cox), for predicting disease survival outcomes and detecting associated genes by incorporating group structures of biological pathways. Our hierarchical model employs a novel prior on the coefficients of genes, i.e., the group spike-and-slab double-exponential distribution, to integrate group structures and to adaptively shrink the effects of genes. We have developed a fast and stable deterministic algorithm to fit the proposed models. We performed extensive simulation studies to assess the model fitting properties and the prognostic performance of the proposed method, and also applied our method to analyze three cancer data sets. CONCLUSIONS: Both the theoretical and empirical studies show that the proposed method can induce weaker shrinkage on predictors in an active pathway, thereby incorporating the biological similarity of genes within the same pathway into the hierarchical modeling. Compared with several existing methods, the proposed method can more accurately estimate gene effects and better predict survival outcomes. For the three cancer data sets, the results show that the proposed method generates more powerful models for survival prediction and detecting associated genes. The method has been implemented in the freely available R package BhGLM at https://github.com/nyiuab/BhGLM .
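The weaker shrinkage within an active pathway comes from giving each group its own prior inclusion probability theta_g. The sketch below illustrates that mechanism under stated assumptions. It uses the standard spike-and-slab double-exponential E-step, and it assumes a uniform Beta(1, 1) prior on theta_g, under which the M-step update is the within-group mean of the slab probabilities. We believe this matches the gsslasso update, but treat it as an assumption rather than the authors' code. The scales, coefficients, and pathway labels are hypothetical.

```python
import math
from collections import defaultdict

def slab_probability(beta, theta, s0, s1):
    """P(slab | beta) under theta*DE(0, s1) + (1 - theta)*DE(0, s0), via a stable inverse logit."""
    z = (math.log(theta / (1.0 - theta)) + math.log(s0 / s1)
         + abs(beta) * (1.0 / s0 - 1.0 / s1))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def penalty_weight(p, s0, s1):
    """L1 weight (1 - p)/s0 + p/s1 applied to |beta| in the next coordinate-descent pass."""
    return (1.0 - p) / s0 + p / s1

def update_group_theta(p, groups, eps=1e-6):
    """Group-specific theta_g = mean slab probability in group g (Beta(1, 1) prior assumed), kept inside (0, 1)."""
    sums, sizes = defaultdict(float), defaultdict(int)
    for pj, g in zip(p, groups):
        sums[g] += pj
        sizes[g] += 1
    return {g: min(max(sums[g] / sizes[g], eps), 1.0 - eps) for g in sums}

s0, s1 = 0.04, 1.0                             # hypothetical scales
betas = [1.2, 0.9, 0.05, 0.02, 0.03, 0.01]     # hypothetical current estimates
groups = ["P1", "P1", "P1", "P2", "P2", "P2"]  # hypothetical pathway labels
p = [slab_probability(b, 0.5, s0, s1) for b in betas]
theta = update_group_theta(p, groups)
print(theta)
# The same small effect (0.05) is penalized less inside the active pathway P1.
print(penalty_weight(slab_probability(0.05, theta["P1"], s0, s1), s0, s1),
      penalty_weight(slab_probability(0.05, theta["P2"], s0, s1), s0, s1))
```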
Subjects
Algorithms , Genetic Association Studies , Genetic Predisposition to Disease , Models, Theoretical , Bayes Theorem , Computer Simulation , Female , Humans , Neoplasms/genetics , Prognosis , Proportional Hazards Models , Survival Analysis
ABSTRACT
Microvascular injury is associated with accelerated kidney transplant dysfunction and allograft failure. Molecular pathology can identify new mechanisms of microvascular injury while improving on the diagnostic and prognostic capabilities of traditional histology. We conducted a case-control study of archived kidney biopsy specimens, stored for up to 10 years, with microvascular injury (n = 50) compared with biopsy specimens without histologic injury (n = 45) from patients of similar age, race, and sex. Given the importance of WNT reactivation to the wound response in the kidney microvasculature and other compartments, we measured WNT gene expression with a multiplex quantification platform using digital barcoding. Of 210 genes from a commercial WNT panel, 71 were associated with microvascular injury and 79 were associated with allograft failure, with considerable overlap between the two sets. Molecular pathology identified 46 biopsy specimens with molecular evidence of microvascular injury; 18 (39%) were C4d negative, donor-specific antibody negative, or had no microvascular injury by histology. The majority of cases with molecular evidence of microvascular injury had poor long-term outcomes. We identified novel WNT pathway genes associated with microvascular injury and allograft failure in residual clinical biopsy specimens obtained up to 10 years earlier. Further mechanistic studies may identify the WNT pathway as a new diagnostic and therapeutic target.
Subjects
Graft Rejection/diagnosis , Isoantibodies/adverse effects , Kidney Failure, Chronic/surgery , Kidney Transplantation/adverse effects , Microvessels/pathology , Postoperative Complications/diagnosis , Wnt Signaling Pathway , Biomarkers/metabolism , Case-Control Studies , Cross-Sectional Studies , Female , Follow-Up Studies , Graft Rejection/etiology , Graft Rejection/metabolism , Graft Survival , Humans , Longitudinal Studies , Male , Microvessels/injuries , Microvessels/metabolism , Middle Aged , Postoperative Complications/etiology , Postoperative Complications/metabolism , Prognosis , Risk Factors
ABSTRACT
Motivation: Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. Results: We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive amounts of shrinkage on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method in several simulated scenarios by varying the overlap among groups, group size, number of non-null groups, and the correlation within groups. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure to three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. Availability and implementation: The methods have been implemented in the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact: nyi@uab.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms , Computational Biology/methods , Genes , Metabolic Networks and Pathways , Models, Biological , Prognosis , Bayes Theorem , Humans , Linear Models , Risk Factors
ABSTRACT
Motivation: Molecular analyses suggest that myeloma is composed of distinct subtypes that have different molecular pathologies and varying response rates to certain treatments. Drug responses in multiple myeloma (MM) are usually recorded as a multi-level ordinal outcome. One goal of drug response studies is to predict, with high probability, which response category each patient belongs to based on their clinical and molecular features. However, because most genes have small effects, gene-based models may provide limited predictive accuracy. Methods for predicting multi-level ordinal drug responses that incorporate biological pathways are therefore desirable but have not yet been developed. Results: We propose a pathway-structured method for predicting multi-level ordinal responses using a two-stage approach. We first develop hierarchical ordinal logistic models and an efficient quasi-Newton algorithm for jointly analyzing numerous correlated variables. Our two-stage approach first obtains the linear predictor (called the pathway score) for each pathway by fitting all predictors within that pathway using the hierarchical ordinal logistic approach, and then combines the pathway scores as new predictors to build a predictive model. We applied the proposed method to two publicly available datasets for predicting multi-level ordinal drug responses in MM using large-scale gene expression data and pathway information. Our results show that our approach not only significantly improved predictive performance compared with the corresponding gene-based model but also allowed us to identify biologically relevant pathways. Availability and implementation: The proposed approach has been implemented in our R package BhGLM, which is freely available from the public GitHub repository https://github.com/abbyyan3/BhGLM.
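The two-stage idea can be sketched as a data transformation. Stage 1 yields, for each pathway, coefficients fitted using only that pathway's genes, and each patient's pathway score is the resulting linear predictor. Stage 2 then treats the scores as ordinary predictors. In the sketch below, the gene names, pathways, and stage-1 coefficients are hypothetical, and the stage-1 hierarchical ordinal fit itself is not shown.

```python
def pathway_scores(expression, pathways, stage1_coefs):
    """Return {pathway: linear predictor} for one patient.

    expression:   {gene: expression value}
    pathways:     {pathway: [genes]}; a gene may belong to several pathways
    stage1_coefs: {pathway: {gene: coefficient}} from per-pathway stage-1 fits
    """
    return {pw: sum(stage1_coefs[pw][g] * expression[g] for g in genes)
            for pw, genes in pathways.items()}

def stage2_design(patients, pathways, stage1_coefs):
    """Stack pathway scores into rows (one per patient) with a fixed column order."""
    names = sorted(pathways)
    rows = []
    for expr in patients:
        scores = pathway_scores(expr, pathways, stage1_coefs)
        rows.append([scores[pw] for pw in names])
    return names, rows

pathways = {"P1": ["G1", "G2"], "P2": ["G2", "G3"]}                    # hypothetical
coefs = {"P1": {"G1": 0.5, "G2": 0.25}, "P2": {"G2": -0.1, "G3": 0.4}}  # hypothetical
patients = [{"G1": 1.0, "G2": 2.0, "G3": -1.0}, {"G1": 0.0, "G2": 1.0, "G3": 2.0}]
print(stage2_design(patients, pathways, coefs))
```

Gene G2 appears in both pathways and receives a separate coefficient in each. Overlapping pathways therefore contribute distinct scores rather than sharing one gene effect.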
Subjects
Biological Phenomena , Multiple Myeloma , Algorithms , Bayes Theorem , Humans , Logistic Models , Multiple Myeloma/drug therapy
ABSTRACT
OBJECTIVES: To identify novel DNA methylation sites significant for rheumatoid arthritis (RA) and comprehensively understand their underlying pathological mechanism. METHODS: We performed (1) genome-wide DNA methylation and mRNA expression profiling in peripheral blood mononuclear cells from RA patients and healthy controls; (2) correlation analysis and causal inference tests for DNA methylation and mRNA expression data; (3) construction of a regulatory network of differentially methylated genes; (4) validation tests of 10 differentially methylated positions (DMPs) of interest and the corresponding gene expression; (5) correlation between PARP9 methylation and its mRNA expression level in Jurkat cells and T cells from patients with RA; (6) tests of the pathological functions of PARP9 in Jurkat cells. RESULTS: A total of 1046 DNA methylation positions were associated with RA. The identified DMPs have regulatory effects on mRNA expression. Causal inference tests identified six DNA methylation-mRNA-RA regulatory chains (eg, cg00959259-PARP9-RA). The identified DMPs and genes formed an interferon-inducible gene interaction network (eg, MX1, IFI44L, DTX3L and PARP9). Differences in key DMPs and corresponding genes were validated in additional samples. Methylation of PARP9 was correlated with its mRNA level in Jurkat cells and T lymphocytes isolated from patients with RA. The PARP9 gene exerted significant effects on Jurkat cells (eg, cell cycle, cell proliferation, cell activation and expression of the inflammatory factor IL-2). CONCLUSIONS: This multistage study identified an interferon-inducible gene interaction network associated with RA and highlighted the importance of the PARP9 gene in RA pathogenesis. The results enhance our understanding of the important role of DNA methylation in the pathology of RA.
Subjects
Arthritis, Rheumatoid/genetics , DNA Methylation/genetics , Leukocytes, Mononuclear/metabolism , RNA, Messenger/metabolism , Arthritis, Rheumatoid/blood , Case-Control Studies , Female , Gene Expression Profiling , Gene Regulatory Networks/genetics , Humans , Jurkat Cells/metabolism , Male , Middle Aged , Neoplasm Proteins/metabolism , Poly(ADP-ribose) Polymerases/metabolism , T-Lymphocytes/metabolism
ABSTRACT
MicroRNAs (miRNAs) can regulate gene expression by binding to complementary sites in the 3'-untranslated regions of target mRNAs, which leads to correlated expression between miRNAs and their target mRNAs. However, miRNA-mRNA correlation patterns are complex and remain largely unclear. To establish the global correlation patterns in human peripheral blood mononuclear cells (PBMCs), we conducted multiple miRNA-mRNA correlation analyses and an expression quantitative trait locus (eQTL) analysis. We predicted 861 miRNA-mRNA pairs (65 miRNAs, 412 mRNAs) using multiple bioinformatics programs and found global negative miRNA-mRNA correlations in PBMCs from all 46 study subjects. Of the 861 pairwise correlations, 19.5% were significant (P < 0.05) and ~70% were negative. The correlation network was complex and highlighted key miRNAs/genes in PBMCs. Some miRNAs, such as hsa-miR-29a and hsa-miR-148a, regulate a cluster of target genes. Some genes, e.g., TNRC6A, are regulated by multiple miRNAs. The identified genes tend to be enriched in the molecular functions of DNA and RNA binding and in biological processes such as protein transport, regulation of translation and chromatin modification. The results provide a global view of the miRNA-mRNA expression correlation profile in human PBMCs, which should facilitate in-depth investigation of the biological functions of key miRNAs/mRNAs and a better understanding of the pathogenesis of PBMC-related diseases.
Subjects
Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , MicroRNAs/genetics , Autoantigens/genetics , Chromatin Assembly and Disassembly/genetics , Computational Biology , Humans , Leukocytes, Mononuclear/metabolism , Leukocytes, Mononuclear/pathology , Quantitative Trait Loci/genetics , RNA, Messenger/genetics , RNA-Binding Proteins/genetics
ABSTRACT
MOTIVATION: Large-scale molecular profiling data have offered extraordinary opportunities to improve survival prediction for cancers and other diseases and to detect disease-associated genes. However, analyzing large-scale molecular data poses considerable challenges. RESULTS: We propose new Bayesian hierarchical Cox proportional hazards models, called the spike-and-slab lasso Cox, for predicting survival outcomes and detecting associated genes. We also develop an efficient algorithm to fit the proposed models by incorporating Expectation-Maximization steps into the extremely fast cyclic coordinate descent algorithm. The performance of the proposed method is assessed via extensive simulations and compared with that of lasso Cox regression. We demonstrate the proposed procedure on two cancer datasets with censored survival outcomes and thousands of molecular features. Our analyses suggest that the proposed procedure can generate powerful prognostic models for predicting cancer survival and can detect associated genes. AVAILABILITY AND IMPLEMENTATION: The methods have been implemented in a freely available R package BhGLM ( http://www.ssg.uab.edu/bhglm/ ). CONTACT: nyi@uab.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
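The Expectation-Maximization component can be illustrated with the E-step commonly used for spike-and-slab lasso priors. Each coefficient beta_j is assumed to have a double-exponential (Laplace) prior whose scale is either a small s0 (spike) or a large s1 (slab), with prior slab probability theta. The E-step computes the posterior slab probability p_j and the expected inverse scale (1 - p_j)/s0 + p_j/s1, which then serves as that coefficient's penalty weight in a weighted-lasso Cox fit (the M-step, not shown; in R, for example, glmnet accepts per-coefficient weights through its `penalty.factor` argument). This is a hedged sketch under those assumptions, not the BhGLM implementation; the function names, scale values and coefficient estimates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_de_density(beta: float, scale: float) -> float:
    """Log density of a double-exponential (Laplace) prior DE(0, scale) at beta."""
    return -math.log(2.0 * scale) - abs(beta) / scale


def ssl_e_step(
    betas: Sequence[float], theta: float, s0: float, s1: float
) -> Tuple[List[float], List[float]]:
    """E-step of a spike-and-slab lasso prior.

    Returns, for each coefficient, the posterior probability that it comes from the
    slab (scale s1) rather than the spike (scale s0), and the expected inverse prior
    scale, which acts as that coefficient's lasso penalty weight.
    """
    if not (0.0 < theta < 1.0 and 0.0 < s0 < s1):
        raise ValueError("require 0 < theta < 1 and 0 < s0 < s1")
    probs, weights = [], []
    for b in betas:
        log_slab = math.log(theta) + log_de_density(b, s1)
        log_spike = math.log(1.0 - theta) + log_de_density(b, s0)
        d = log_slab - log_spike
        # numerically stable logistic(d) = slab / (slab + spike)
        p = 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
        probs.append(p)
        weights.append((1.0 - p) / s0 + p / s1)
    return probs, weights


def update_theta(probs: Sequence[float], eps: float = 1e-6) -> float:
    """Update theta as the mean slab probability, kept strictly inside (0, 1)."""
    return min(max(sum(probs) / len(probs), eps), 1.0 - eps)


# Hypothetical current coefficient estimates and prior settings.
betas = [0.0, 0.05, 1.5]
probs, weights = ssl_e_step(betas, theta=0.5, s0=0.04, s1=1.0)
theta = update_theta(probs)
for b, p, w in zip(betas, probs, weights):
    print(f"beta = {b:4.2f}: P(slab) = {p:.3f}, penalty weight = {w:.2f}")
print(f"updated theta = {theta:.3f}")
```

The design point is adaptive shrinkage: coefficients near zero receive a weight close to 1/s0 and are pushed toward zero, while large coefficients receive a weight close to 1/s1 and are barely penalized.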
Subjects
Algorithms , Computational Biology/methods , Proportional Hazards Models , Bayes Theorem , Humans
ABSTRACT
BACKGROUND: Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Patients' responses to treatments such as bortezomib are heterogeneous. This motivates efforts to identify biomarkers from numerous molecular features and to build predictive models that identify patients who can benefit from a given treatment scheme. However, previous studies treated the multi-level ordinal drug response as binary, considering only responsive and non-responsive groups. METHODS: It is desirable to analyze the multi-level drug response directly rather than collapsing it into two groups. In this study, we present a novel method that identifies significantly associated biomarkers and then develops an ordinal genomic classifier using a hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model places a heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. RESULTS: We applied our hierarchical ordinal regression approach to two publicly available MM datasets with a five-level drug response and numerous gene expression measures. Our results show that the method can identify genes associated with the multi-level drug response and generate powerful models for predicting that response. CONCLUSIONS: The proposed method allows numerous correlated predictors to be fitted jointly and thus yields efficient models for predicting the multi-level drug response. A predictive model for the multi-level drug response can be more informative than previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and can have an important impact on cancer studies.
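A minimal sketch of the model class described above: a cumulative-logit (proportional-odds) model, P(y <= k | x) = logistic(c_k - x'beta), with independent Cauchy(0, scale) priors on the coefficients, whose posterior mode is found with BFGS, a quasi-Newton method. Several simplifications are assumptions, not the authors' method: the prior scale is fixed rather than given the hierarchical structure, gradients are approximated numerically by SciPy, the function names are hypothetical, and the data are simulated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def unpack(params: np.ndarray, n_levels: int):
    """Split parameters into ordered cutpoints and regression coefficients."""
    raw = params[: n_levels - 1]
    beta = params[n_levels - 1 :]
    # first cutpoint is free; later ones add exp(.) > 0, so cutpoints stay increasing
    cuts = np.cumsum(np.concatenate(([raw[0]], np.exp(raw[1:]))))
    return cuts, beta


def neg_log_posterior(params, X, y, n_levels, scale):
    """Negative log posterior of a cumulative-logit model with Cauchy(0, scale) priors."""
    cuts, beta = unpack(params, n_levels)
    eta = X @ beta
    n = len(y)
    # P(y <= k) = expit(cut_k - eta), padded with 0 and 1 for the open end categories
    cum = expit(cuts[None, :] - eta[:, None])
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    rows = np.arange(n)
    probs = cum[rows, y + 1] - cum[rows, y]  # P(y = k), with y coded 0..n_levels-1
    log_lik = np.sum(np.log(np.clip(probs, 1e-12, None)))
    log_prior = -np.sum(np.log1p((beta / scale) ** 2))  # Cauchy log density up to a constant
    return -(log_lik + log_prior)


def fit_ordinal_cauchy(X, y, n_levels, scale=1.0):
    """Posterior-mode fit with BFGS (quasi-Newton); gradients are approximated numerically."""
    x0 = np.zeros(n_levels - 1 + X.shape[1])
    res = minimize(neg_log_posterior, x0, args=(X, y, n_levels, scale), method="BFGS")
    return unpack(res.x, n_levels)


# Simulated data: 3 hypothetical features, 4 ordered response levels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
true_beta = np.array([1.5, 0.0, -1.0])
latent = X @ true_beta + rng.logistic(size=300)
y = np.digitize(latent, [-1.0, 0.5, 2.0])  # levels 0, 1, 2, 3

cuts, beta_hat = fit_ordinal_cauchy(X, y, n_levels=4)
print("cutpoints:", np.round(cuts, 2))
print("coefficients:", np.round(beta_hat, 2))
```

The Cauchy term log1p((beta/scale)^2) shrinks small coefficients toward zero but, because of its heavy tails, penalizes large coefficients only weakly; for thousands of genes an analytic gradient would be needed in place of finite differences.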