ABSTRACT
A mosaic of cross-phylum chemical interactions occurs between all metazoans and their microbiomes. A number of molecular families that are known to be produced by the microbiome have a marked effect on the balance between health and disease [1-9]. Considering the diversity of the human microbiome (which numbers over 40,000 operational taxonomic units [10]), the effect of the microbiome on the chemistry of an entire animal remains underexplored. Here we use mass spectrometry informatics and data visualization approaches [11-13] to provide an assessment of the effects of the microbiome on the chemistry of an entire mammal by comparing metabolomics data from germ-free and specific-pathogen-free mice. We found that the microbiota affects the chemistry of all organs. This included the amino acid conjugations of host bile acids that were used to produce phenylalanocholic acid, tyrosocholic acid and leucocholic acid, which have not previously been characterized despite extensive research on bile-acid chemistry [14]. These bile-acid conjugates were also found in humans, and were enriched in patients with inflammatory bowel disease or cystic fibrosis. These compounds agonized the farnesoid X receptor in vitro, and mice gavaged with the compounds showed reduced expression of bile-acid synthesis genes in vivo. Further studies are required to confirm whether these compounds have a physiological role in the host, and whether they contribute to gut diseases that are associated with microbiome dysbiosis.
Subjects
Bile Acids and Salts/biosynthesis, Bile Acids and Salts/chemistry, Metabolomics, Microbiota/physiology, Animals, Bile Acids and Salts/metabolism, Cholic Acid/biosynthesis, Cholic Acid/chemistry, Cholic Acid/metabolism, Cystic Fibrosis/genetics, Cystic Fibrosis/metabolism, Cystic Fibrosis/microbiology, Germ-Free Life, Humans, Inflammatory Bowel Diseases/genetics, Inflammatory Bowel Diseases/metabolism, Inflammatory Bowel Diseases/microbiology, Mice, Receptors, Cytoplasmic and Nuclear/genetics, Receptors, Cytoplasmic and Nuclear/metabolism
ABSTRACT
Inflammatory bowel diseases, which include Crohn's disease and ulcerative colitis, affect several million individuals worldwide. Crohn's disease and ulcerative colitis are complex diseases that are heterogeneous at the clinical, immunological, molecular, genetic, and microbial levels. Individual contributing factors have been the focus of extensive research. As part of the Integrative Human Microbiome Project (HMP2 or iHMP), we followed 132 subjects for one year each to generate integrated longitudinal molecular profiles of host and microbial activity during disease (up to 24 time points each; in total 2,965 stool, biopsy, and blood specimens). Here we present the results, which provide a comprehensive view of functional dysbiosis in the gut microbiome during inflammatory bowel disease activity. We demonstrate a characteristic increase in facultative anaerobes at the expense of obligate anaerobes, as well as molecular disruptions in microbial transcription (for example, among clostridia), metabolite pools (acylcarnitines, bile acids, and short-chain fatty acids), and levels of antibodies in host serum. Periods of disease activity were also marked by increases in temporal variability, with characteristic taxonomic, functional, and biochemical shifts. Finally, integrative analysis identified microbial, biochemical, and host factors central to this dysregulation. The study's infrastructure resources, results, and data, which are available through the Inflammatory Bowel Disease Multi'omics Database ( http://ibdmdb.org ), provide the most comprehensive description to date of host and microbial activities in inflammatory bowel diseases.
Subjects
Gastrointestinal Microbiome/genetics, Inflammatory Bowel Diseases/microbiology, Animals, Fungi/pathogenicity, Gastrointestinal Microbiome/immunology, Health, Humans, Inflammatory Bowel Diseases/immunology, Inflammatory Bowel Diseases/therapy, Inflammatory Bowel Diseases/virology, Phylogeny, Species Specificity, Transcriptome, Viruses/pathogenicity
ABSTRACT
With the growing commonality of multi-omics datasets, there is now increasing evidence that integrated omics profiles lead to more efficient discovery of clinically actionable biomarkers that enable better disease outcome prediction and patient stratification. Several methods exist to perform host phenotype prediction from cross-sectional, single-omics data modalities but decentralized frameworks that jointly analyze multiple time-dependent omics data to highlight the integrative and dynamic impact of repeatedly measured biomarkers are currently limited. In this article, we propose a novel Bayesian ensemble method to consolidate prediction by combining information across several longitudinal and cross-sectional omics data layers. Unlike existing frequentist paradigms, our approach enables uncertainty quantification in prediction as well as interval estimation for a variety of quantities of interest based on posterior summaries. We apply our method to four published multi-omics datasets and demonstrate that it recapitulates known biology in addition to providing novel insights while also outperforming existing methods in estimation, prediction, and uncertainty quantification. Our open-source software is publicly available at https://github.com/himelmallick/IntegratedLearner.
Subjects
Multiomics, Software, Humans, Bayes Theorem, Cross-Sectional Studies, Biomarkers
ABSTRACT
MOTIVATION: The discovery of biologically interpretable and clinically actionable communities in heterogeneous omics data is a necessary first step toward deriving mechanistic insights into complex biological phenomena. Here, we present a novel clustering approach, omeClust, for community detection in omics profiles by simultaneously incorporating similarities among measurements and the overall complex structure of the data. RESULTS: We show that omeClust outperforms published methods in inferring the true community structure as measured by both sensitivity and misclassification rate on simulated datasets. We further validated omeClust in diverse, multiple omics datasets, revealing new communities and functionally related groups in microbial strains, cell line gene expression patterns and fetal genomic variation. We also derived enrichment scores attributable to putatively meaningful biological factors in these datasets that can serve as hypothesis generators facilitating new sets of testable hypotheses. AVAILABILITY AND IMPLEMENTATION: omeClust is open-source software, and the implementation is available online at http://github.com/omicsEye/omeClust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA's model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. "taxa") or between features and "phenotypes" to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA's performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA's utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
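The zero-inflated log-normal marginal described above can be illustrated with a minimal sketch. This is not SparseDOSSA's implementation: the function name `sample_zero_inflated_lognormal` and the parameter values (`pi_zero`, `mu`, `sigma`) are hypothetical, and the sketch covers a single feature only, without the read-generation or interaction components.

```python
import random

def sample_zero_inflated_lognormal(n, pi_zero, mu, sigma, seed=0):
    """Draw n abundances: 0 with probability pi_zero, else exp(Normal(mu, sigma))."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < pi_zero:
            draws.append(0.0)  # structural zero (feature absent)
        else:
            draws.append(rng.lognormvariate(mu, sigma))  # positive abundance
    return draws

# Illustrative (assumed) parameters for one synthetic feature.
abundances = sample_zero_inflated_lognormal(n=1000, pi_zero=0.6, mu=0.0, sigma=1.0)
zero_fraction = sum(1 for a in abundances if a == 0.0) / len(abundances)
print(f"fraction of zeros: {zero_fraction:.2f}")
```

The share of zeros in the draws should sit close to `pi_zero`, which is the property a benchmark would rely on when it needs sparsity of known magnitude.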
Subjects
Microbiota, Models, Statistical, Algorithms, Benchmarking, Computational Biology/methods, Computer Simulation
ABSTRACT
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
Subjects
Computational Biology, Gastrointestinal Microbiome, Multivariate Analysis, Computer Simulation, Humans, Inflammatory Bowel Diseases/genetics, Inflammatory Bowel Diseases/metabolism, Inflammatory Bowel Diseases/pathology
ABSTRACT
The performance of computational methods and software to identify differentially expressed features in single-cell RNA-sequencing (scRNA-seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on their own assumptions to model scRNA-seq expression features. To model the technological variability in cross-platform scRNA-seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA-seq expression profiles across experimental platforms induced by platform- and gene-specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero-inflated Tweedie model that allows zero probability mass to exceed a traditional Tweedie distribution to model zero-inflated scRNA-seq data with excessive zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
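To make the Tweedie family concrete, the following sketch simulates a plain (not zero-inflated) Tweedie variable with power parameter 1 < p < 2 through its standard compound Poisson-gamma representation, which yields exact zeros alongside a continuous positive part. It is an illustration only and does not reproduce the Tweedieverse package; the function name `sample_tweedie` and the values of `mu`, `phi`, and `p` are assumptions chosen for the example.

```python
import math
import random

def sample_tweedie(mu, phi, p, rng):
    """Draw one Tweedie value (1 < p < 2) as a Poisson sum of gamma jumps."""
    lam = mu ** (2 - p) / (phi * (2 - p))   # Poisson rate
    shape = (2 - p) / (p - 1)               # gamma shape
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale
    # Count unit-rate exponential arrivals in [0, lam] to get a Poisson(lam) draw.
    n_jumps, t = 0, rng.expovariate(1.0)
    while t < lam:
        n_jumps += 1
        t += rng.expovariate(1.0)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_jumps))

rng = random.Random(1)
mu, phi, p = 2.0, 1.0, 1.5  # assumed illustrative values
draws = [sample_tweedie(mu, phi, p, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
zero_share = sum(d == 0 for d in draws) / len(draws)
expected_zero_share = math.exp(-mu ** (2 - p) / (phi * (2 - p)))
print(f"mean={mean:.2f}, zeros={zero_share:.3f}, expected zeros={expected_zero_share:.3f}")
```

With these parameters the mean is mu and the variance is phi * mu^p. A zero-inflated variant, as proposed in the abstract, would add further point mass at zero beyond the exp(-lambda) share produced here.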
Subjects
Gene Expression Profiling, Single-Cell Analysis, Gene Expression Profiling/methods, Humans, RNA-Seq, Sequence Analysis, RNA, Software
ABSTRACT
BACKGROUND & AIMS: Sulfur-metabolizing microbes, which convert dietary sources of sulfur into genotoxic hydrogen sulfide (H2S), have been associated with development of colorectal cancer (CRC). We identified a dietary pattern associated with sulfur-metabolizing bacteria in stool and then investigated its association with risk of incident CRC using data from a large prospective study of men. METHODS: We collected data from 51,529 men enrolled in the Health Professionals Follow-up Study since 1986 to determine the association between sulfur-metabolizing bacteria in stool and risk of CRC over 26 years of follow-up. First, in a subcohort of 307 healthy men, we profiled serial stool metagenomes and metatranscriptomes and assessed diet using semiquantitative food frequency questionnaires to identify food groups associated with 43 bacterial species involved in sulfur metabolism. We used these data to develop a sulfur microbial dietary score. We then used Cox proportional hazards modeling to evaluate adherence to this pattern among eligible individuals (n = 48,246) from 1986 through 2012 with risk for incident CRC. RESULTS: Foods associated with higher sulfur microbial diet scores included increased consumption of processed meats and low-calorie drinks and lower consumption of vegetables and legumes. Increased sulfur microbial diet scores were associated with risk of distal colon and rectal cancers, after adjusting for other risk factors (multivariable relative risk, highest vs lowest quartile, 1.43; 95% confidence interval 1.14-1.81; P-trend = .002). In contrast, sulfur microbial diet scores were not associated with risk of proximal colon cancer (multivariable relative risk 0.86; 95% CI 0.65-1.14; P-trend = .31). CONCLUSIONS: In an analysis of participants in the Health Professionals Follow-up Study, we found that long-term adherence to a dietary pattern associated with sulfur-metabolizing bacteria in stool was associated with an increased risk of distal CRC. Further studies are needed to determine how sulfur-metabolizing bacteria might contribute to CRC pathogenesis.
Subjects
Bacteria/metabolism, Colorectal Neoplasms/epidemiology, Feces/microbiology, Feeding Behavior/physiology, Gastrointestinal Microbiome/physiology, Aged, Bacteria/isolation & purification, Colorectal Neoplasms/microbiology, Colorectal Neoplasms/prevention & control, Diet Surveys/statistics & numerical data, Follow-Up Studies, Health Personnel/statistics & numerical data, Humans, Incidence, Male, Massachusetts/epidemiology, Middle Aged, Prospective Studies, Risk Factors, Sulfur/metabolism
ABSTRACT
A reciprocal LASSO (rLASSO) regularization employs a decreasing penalty function as opposed to conventional penalization approaches that use increasing penalties on the coefficients, leading to stronger parsimony and superior model selection relative to traditional shrinkage methods. Here we consider a fully Bayesian formulation of the rLASSO problem, which is based on the observation that the rLASSO estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters are assigned independent inverse Laplace priors. Bayesian inference from this posterior is possible using an expanded hierarchy motivated by a scale mixture of double Pareto or truncated normal distributions. On simulated and real datasets, we show that the Bayesian formulation outperforms its classical cousin in estimation, prediction, and variable selection across a wide range of scenarios while offering the advantage of posterior inference. Finally, we discuss other variants of this new approach and provide a unified framework for variable selection using flexible reciprocal penalties. All methods described in this article are publicly available as an R package at: https://github.com/himelmallick/BayesRecipe.
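The contrast between an increasing and a decreasing penalty can be sketched in a few lines. The abstract does not state the penalty's formula, so the sketch assumes the reciprocal form commonly given in the rLASSO literature (lambda / |beta_j| for nonzero coefficients, and no penalty for coefficients that are exactly zero). The function names are hypothetical and are not taken from the BayesRecipe package.

```python
def lasso_penalty(beta, lam):
    """Conventional L1 penalty: grows with the size of each coefficient."""
    return lam * sum(abs(b) for b in beta)

def rlasso_penalty(beta, lam):
    """Assumed reciprocal penalty: lam / |b| for nonzero b, nothing for exact zeros."""
    return lam * sum(1.0 / abs(b) for b in beta if b != 0)

small_effect = [0.0, 0.1]   # one tiny nonzero coefficient
large_effect = [0.0, 2.0]   # one large nonzero coefficient

print(lasso_penalty(small_effect, 1.0), lasso_penalty(large_effect, 1.0))
print(rlasso_penalty(small_effect, 1.0), rlasso_penalty(large_effect, 1.0))
```

Under this assumed form, small nonzero coefficients are penalized most heavily, so a coefficient is pushed either to exactly zero or away from zero, which is one way to read the abstract's claim of stronger parsimony.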
Subjects
Bayes Theorem, Humans, Linear Models
ABSTRACT
Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further showed that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which in addition to reproducing established ecological results revealed unique, competition-based roles for Proteobacteria in multiple distinct habitats.
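The difficulty described above, that the sum constraint distorts correlations, can be seen in a small simulation. The sketch below does not implement BAnOCC. It draws three independent log-normal "counts" per sample (all sample sizes and parameters are assumed for illustration), normalizes each sample to proportions, and compares the Pearson correlation before and after normalization.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

rng = random.Random(42)
# Unobserved basis: three mutually independent features per sample.
basis = [[rng.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(5000)]
# Observed composition: each sample rescaled to sum to 1.
composition = [[v / sum(row) for v in row] for row in basis]

basis_corr = pearson([r[0] for r in basis], [r[1] for r in basis])
comp_corr = pearson([r[0] for r in composition], [r[1] for r in composition])
print(f"basis correlation: {basis_corr:.2f}, composition correlation: {comp_corr:.2f}")
```

Although the basis features are independent, the proportions come out clearly negatively correlated, because with few features a gain in one share must be offset by the others. Recovering the basis-level structure from such data is the inference problem the abstract addresses.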
Subjects
Bayes Theorem, Computational Biology/methods, Computer Simulation, Models, Biological, Algorithms, Ecology, Humans, Markov Chains, Microbiota, Proteobacteria
ABSTRACT
In many biomedical applications, covariates are naturally grouped, with variables in the same group being systematically related or statistically correlated. Under such settings, variable selection must be conducted at both group and individual variable levels. Motivated by the widespread availability of zero-inflated count outcomes and grouped covariates in many practical applications, we consider group regularization for zero-inflated negative binomial regression models. Using a least squares approximation of the mixture likelihood and a variety of group-wise penalties on the coefficients, we propose a unified algorithm (Gooogle: Group Regularization for Zero-inflated Count Regression Models) to efficiently compute the entire regularization path of the estimators. We investigate the finite sample performance of these methods through extensive simulation experiments and the analysis of a German health care demand dataset. Finally, we derive theoretical properties of these methods under reasonable assumptions, which further provides deeper insight into the asymptotic behavior of these approaches. The open source software implementation of this method is publicly available at: https://github.com/himelmallick/Gooogle.
Subjects
Health Services Needs and Demand/statistics & numerical data, Models, Statistical, Algorithms, Germany, Humans, Least-Squares Analysis, Likelihood Functions, Software
ABSTRACT
BACKGROUND: Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. In addition to the well-known properties of microbiome count measurements, for example, varied total sequence reads across samples, over-dispersion and zero-inflation, microbiome studies usually collect samples with hierarchical structures, which introduce correlation among the samples and thus further complicate the analysis and interpretation of microbiome count data. RESULTS: In this article, we propose negative binomial mixed models (NBMMs) for detecting the association between the microbiome and host environmental/clinical factors for correlated microbiome count data. Although they do not address zero-inflation, the proposed mixed-effects models account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and can efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient IWLS (Iterative Weighted Least Squares) algorithm to fit the proposed NBMMs by taking advantage of the standard procedure for fitting linear mixed models. CONCLUSIONS: We evaluate and demonstrate the proposed method via extensive simulation studies and the application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms the previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM), providing a useful tool for analyzing microbiome data.
Subjects
Microbiota, Models, Statistical, Algorithms, Animals, Bacteria/genetics, High-Throughput Nucleotide Sequencing, Humans, Internet, Intestines/microbiology, Male, Mice, Mice, Inbred C57BL, RNA, Ribosomal, 16S/chemistry, RNA, Ribosomal, 16S/metabolism, User-Computer Interface
ABSTRACT
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.
ABSTRACT
Multiple comparisons or multiple testing has been viewed as a thorny issue in genetic association studies aiming to detect disease-associated genetic variants from a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from the existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. Thus, the hierarchical models yield more efficient estimates of parameters than the traditional methods that analyze genetic variants separately, and also coherently address the multiple comparisons problem due to largely reducing the effective number of genetic effects and the number of statistically "significant" effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
Subjects
Genetic Association Studies, Models, Genetic, Adiponectin/genetics, Algorithms, Bayes Theorem, Case-Control Studies, Colorectal Neoplasms/genetics, Computer Simulation, Gene Frequency, Genetic Predisposition to Disease, Genotype, Heart Diseases/genetics, Humans, Linear Models, Linkage Disequilibrium, Polymorphism, Single Nucleotide, Risk, Software
ABSTRACT
The development of congenital heart defects (CHDs) involves a complex interplay between genetic variants, epigenetic variants, and environmental exposures. Previous studies have suggested that susceptibility to CHDs is associated with maternal genotypes, fetal genotypes, and maternal-fetal genotype (MFG) interactions. We conducted a haplotype-based genetic association study of obstructive heart defects (OHDs), aiming to detect the genetic effects of 877 SNPs involved in the homocysteine, folate, and transsulfuration pathways. Genotypes were available for 285 mother-offspring pairs with OHD-affected pregnancies and 868 mother-offspring pairs with unaffected pregnancies. A penalized logistic regression model was applied with an adaptive least absolute shrinkage and selection operator (lasso), which dissects the maternal effect, fetal effect, and MFG interaction effects associated with OHDs. By examining the association between 140 haplotype blocks, we identified 9 blocks that are potentially associated with OHD occurrence. Four haplotype blocks, located in genes MGMT, MTHFS, CBS, and DNMT3L, were statistically significant using a Bayesian false-discovery probability threshold of 0.8. Two blocks in MGMT and MTHFS appear to have significant fetal effects, while the CBS and DNMT3L genes may have significant MFG interaction effects.
Subjects
Carbon-Nitrogen Ligases/genetics, DNA (Cytosine-5-)-Methyltransferases/genetics, DNA Modification Methylases/genetics, DNA Repair Enzymes/genetics, Haplotypes, Heart Defects, Congenital/genetics, Tumor Suppressor Proteins/genetics, Adult, Case-Control Studies, Female, Fetus, Genetic Association Studies, Genetic Predisposition to Disease, Genotype, Humans, Logistic Models, Polymorphism, Single Nucleotide, Pregnancy, United States
ABSTRACT
BACKGROUND: Wiping of the mouth and nose at birth is an alternative method to oronasopharyngeal suction in delivery-room management of neonates, but whether these methods have equivalent effectiveness is unclear. METHODS: For this randomised equivalency trial, neonates delivered at 35 weeks' gestation or later at the University of Alabama at Birmingham Hospital, Birmingham, AL, USA, between October, 2010, and November, 2011, were eligible. Before birth, neonates were randomly assigned gentle wiping of the face, mouth (implemented by the paediatric or obstetric resident), and nose with a towel (wipe group) or suction with a bulb syringe of the mouth and nostrils (suction group). The primary outcome was the respiratory rate in the first 24 h after birth. We hypothesised that respiratory rates would differ by fewer than 4 breaths per min between groups. Analysis was by intention to treat. This study is registered with ClinicalTrials.gov, number NCT01197807. FINDINGS: 506 neonates born at a median of 39 weeks' gestation (IQR 38-40) were randomised. Three parents withdrew consent and 15 non-vigorous neonates with meconium-stained amniotic fluid were excluded. Among the 488 treated neonates, the mean respiratory rates in the first 24 h were 51 (SD 8) breaths per min in the wipe group and 50 (6) breaths per min in the suction group (difference of means 1 breath per min, 95% CI -2 to 0, p<0·001). INTERPRETATION: Wiping the nose and mouth has equivalent efficacy to routine use of oronasopharyngeal suction in neonates born at or beyond 35 weeks' gestation. FUNDING: None.
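The equivalence hypothesis stated above (respiratory rates differing by fewer than 4 breaths per min) is commonly assessed by checking whether the confidence interval for the difference lies entirely inside the margin. The sketch below applies that generic rule to the interval reported in the abstract. It is an assumption that the trial used this exact decision rule, and the function name `within_equivalence_margin` is hypothetical.

```python
def within_equivalence_margin(ci_low, ci_high, margin):
    """True when the whole interval for the difference lies inside (-margin, +margin)."""
    return -margin < ci_low and ci_high < margin

# 95% CI for the difference of means as reported in the abstract, with the 4 breaths/min margin.
reported = within_equivalence_margin(ci_low=-2.0, ci_high=0.0, margin=4.0)
# A hypothetical interval that crosses the margin, for contrast.
counterexample = within_equivalence_margin(ci_low=-5.0, ci_high=1.0, margin=4.0)
print(reported, counterexample)
```

The reported interval falls well inside the margin, consistent with the abstract's conclusion of equivalence, whereas an interval reaching beyond either bound would leave equivalence unestablished.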
Subjects
Perinatal Care/methods, Respiratory Rate/physiology, Suction/methods, Female, Humans, Infant, Newborn, Male, Mouth, Nose, Oral Hygiene/methods, Treatment Outcome
ABSTRACT
Microbiome studies of inflammatory bowel diseases (IBD) have achieved a scale for meta-analysis of dysbioses among populations. To enable microbial community meta-analyses generally, we develop MMUPHin for normalization, statistical meta-analysis, and population structure discovery using microbial taxonomic and functional profiles. Applying it to ten IBD cohorts, we identify consistent associations, including novel taxa such as Acinetobacter and Turicibacter, and additional exposure and interaction effects. A single gradient of dysbiosis severity is favored over discrete types to summarize IBD microbiome population structure. These results provide a benchmark for characterization of IBD and a framework for meta-analysis of any microbial communities.
Subjects
Gastrointestinal Microbiome, Inflammatory Bowel Diseases, Microbiota, Dysbiosis, Humans
ABSTRACT
Microbial community metabolomics, particularly in the human gut, are beginning to provide a new route to identify functions and ecology disrupted in disease. However, these data can be costly and difficult to obtain at scale, while amplicon or shotgun metagenomic sequencing data are readily available for populations of many thousands. Here, we describe a computational approach to predict potentially unobserved metabolites in new microbial communities, given a model trained on paired metabolomes and metagenomes from the environment of interest. Focusing on two independent human gut microbiome datasets, we demonstrate that our framework successfully recovers community metabolic trends for more than 50% of associated metabolites. Similar accuracy is maintained using amplicon profiles of coral-associated, murine gut, and human vaginal microbiomes. We also provide an expected performance score to guide application of the model in new samples. Our results thus demonstrate that this 'predictive metabolomic' approach can aid in experimental design and provide useful insights into the thousands of community profiles for which only metagenomes are currently available.
Subjects
Gastrointestinal Microbiome/genetics, Metabolomics, Microbiota/genetics, Models, Genetic, Algorithms, Colitis, Ulcerative/microbiology, Crohn Disease/microbiology, Humans, Metagenomics
ABSTRACT
In the Supplementary Tables 2, 4 and 6 originally published with this Article, the authors mistakenly included sample identifiers in the form of UMCGs rather than UMCG IBDs in the validation cohort; this has now been amended.
ABSTRACT
The inflammatory bowel diseases (IBDs), which include Crohn's disease (CD) and ulcerative colitis (UC), are multifactorial chronic conditions of the gastrointestinal tract. While IBD has been associated with dramatic changes in the gut microbiota, changes in the gut metabolome (the molecular interface between host and microbiota) are less well understood. To address this gap, we performed untargeted metabolomic and shotgun metagenomic profiling of cross-sectional stool samples from discovery (n = 155) and validation (n = 65) cohorts of CD, UC and non-IBD control patients. Metabolomic and metagenomic profiles were broadly correlated with faecal calprotectin levels (a measure of gut inflammation). Across >8,000 measured metabolite features, we identified chemicals and chemical classes that were differentially abundant in IBD, including enrichments for sphingolipids and bile acids, and depletions for triacylglycerols and tetrapyrroles. While >50% of differentially abundant metabolite features were uncharacterized, many could be assigned putative roles through metabolomic 'guilt by association' (covariation with known metabolites). Differentially abundant species and functions from the metagenomic profiles reflected adaptation to oxidative stress in the IBD gut, and were individually consistent with previous findings. Integrating these data, however, we identified 122 robust associations between differentially abundant species and well-characterized differentially abundant metabolites, indicating possible mechanistic relationships that are perturbed in IBD. Finally, we found that metabolome- and metagenome-based classifiers of IBD status were highly accurate and, like the vast majority of individual trends, generalized well to the independent validation cohort. Our findings thus provide an improved understanding of perturbations of the microbiome-metabolome interface in IBD, including identification of many potential diagnostic and therapeutic targets.