ABSTRACT
Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
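As an illustration of what a compositional transformation does, the following is a minimal sketch of the centered log-ratio (CLR) transform, one commonly used option. The abstract does not say which transformations were compared, so the choice of CLR, the pseudocount of 0.5, and the sample counts are all assumptions made for illustration.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids taking log(0) for
    taxa that were not observed in the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is equivalent to dividing each
    # abundance by the geometric mean of the sample.
    return [v - mean_log for v in logs]


# Hypothetical read counts for four taxa in a single sample.
sample = [120, 30, 0, 850]
clr_values = clr_transform(sample)
print([round(v, 3) for v in clr_values])
```

CLR values sum to zero within a sample and preserve the rank order of the taxa, which is why the transform is often applied before models that assume unconstrained, real-valued features.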
ABSTRACT
Seaweeds are potentially sustainable crops and are receiving significant interest because of their rich bioactive compound content, including fatty acids, polyphenols, carotenoids, and complex polysaccharides. However, there is little information on the in vivo effects on gut health of the polysaccharides and their low-molecular-weight derivatives. Herein, we describe the first investigation into the prebiotic potential of low-molecular-weight polysaccharides (LMWPs) derived from alginate and agar in order to validate their in vivo efficacy. We conducted a randomized, placebo-controlled trial testing the impact of alginate and agar LMWPs on faecal weight and other markers of gut health and on composition of gut microbiota. We show that these LMWPs led to significantly increased faecal bulk (20-30%). Analysis of gut microbiome composition by sequencing indicated no significant changes attributable to treatment at the phylum and family level, although FISH analysis showed an increase in Faecalibacterium prausnitzii in subjects consuming agar LMWP. Sequence analysis of gut bacteria corroborated the FISH data, indicating that alginate and agar LMWPs do not alter human gut microbiome health markers. Crucially, our findings suggest an urgent need for robust and rigorous human in vivo testing, in particular using refined seaweed extracts.
ABSTRACT
Considerable recent research has indicated the presence of bacteria in a variety of human tumours and matched normal tissue. Rather than focusing on further identification of bacteria within tumour samples, we reversed the hypothesis to query if establishing the bacterial profile of a tissue biopsy could reveal its histology/malignancy status. The aim of the present study was therefore to differentiate between malignant and non-malignant fresh breast biopsy specimens, collected specifically for this purpose, based on bacterial sequence data alone. Fresh tissue biopsies were obtained from breast cancer patients and subjected to 16S rRNA gene sequencing. Progressive microbiological and bioinformatic contamination control practices were applied at all points of specimen handling and bioinformatic manipulation. Differences between breast tumour and matched normal tissues were probed using a variety of statistical and machine-learning-based strategies. Breast tumour and matched normal tissue microbiome profiles proved sufficiently different to indicate that a classification strategy using bacterial biomarkers could be effective. Leave-one-out cross-validation of the predictive model confirmed the ability to identify malignant breast tissue from its bacterial signature with 84.78% accuracy, with a corresponding area under the receiver operating characteristic curve of 0.888. This study provides proof-of-concept data, from fit-for-purpose study material, on the potential to use the bacterial signature of tissue biopsies to identify their malignancy status.
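To make the evaluation procedure concrete, the sketch below shows how leave-one-out cross-validation yields one held-out score per sample, from which accuracy and the area under the ROC curve can be computed. It is not the study's model: the nearest-centroid classifier, the two-feature toy data, and all function names are hypothetical stand-ins chosen to keep the example short.

```python
def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_scores(features, labels):
    """One held-out score per sample; higher means more tumour-like."""
    scores = []
    for i, held_out in enumerate(features):
        # Train on every sample except the one being scored.
        train = [(f, y) for j, (f, y) in enumerate(zip(features, labels)) if j != i]
        pos = centroid([f for f, y in train if y == 1])
        neg = centroid([f for f, y in train if y == 0])
        scores.append(sq_dist(held_out, neg) - sq_dist(held_out, pos))
    return scores


def accuracy(scores, labels):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s > 0 else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = tumour, 0 = matched normal tissue.
features = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.2, 0.9], [0.1, 0.7], [0.3, 0.8]]
labels = [1, 1, 1, 0, 0, 0]

scores = loocv_scores(features, labels)
print("accuracy:", accuracy(scores, labels))
print("AUC:", auc(scores, labels))
```

Because each sample is scored by a model that never saw it, the resulting accuracy and AUC estimate performance on unseen biopsies, although with small cohorts the estimates can still have high variance.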
Subjects
Bacteria/isolation & purification; Breast Neoplasms/microbiology; Breast/microbiology; Adult; Aged; Aged, 80 and over; Bacteria/genetics; Biopsy; Breast/pathology; Breast Neoplasms/diagnosis; Breast Neoplasms/pathology; Female; Genomics; Humans; Machine Learning; Male; Middle Aged; RNA, Ribosomal, 16S/genetics
ABSTRACT
OBJECTIVE: The microbiome contributes to the pathogenesis of inflammatory bowel disease (IBD), but the relative contribution of different lifestyle and environmental factors to the compositional variability of the gut microbiota is unclear. DESIGN: Here, we rank the effect size of disease activity, medications, diet and geographic location on the faecal microbiota composition (16S rRNA gene sequencing) in patients with Crohn's disease (CD; n=303), ulcerative colitis (UC; n=228) and controls (n=161), followed longitudinally (at three time points at 16-week intervals). RESULTS: Reduced microbiota diversity but increased variability was confirmed in CD and UC compared with controls. Significant compositional differences between diseases, particularly CD, and controls were evident. Longitudinal analyses revealed reduced temporal microbiota stability in IBD, particularly in patients with changes in disease activity. Machine learning separated disease from controls, and active from inactive disease, when consecutive time points were modelled. Geographic location accounted for most of the microbiota variance, second to the presence or absence of CD, followed by history of surgical resection, alcohol consumption and UC diagnosis, medications and diet, with most (90.3%) of the compositional variance stochastic or unexplained. CONCLUSION: The popular concept of precision medicine and rational design of any therapeutic manipulation of the microbiota will have to contend not only with the heterogeneity of the host response, but also with widely differing lifestyles and with much variance still unaccounted for.