Machine learning approaches in microbiome research: challenges and best practices.
Papoutsoglou, Georgios; Tarazona, Sonia; Lopes, Marta B; Klammsteiner, Thomas; Ibrahimi, Eliana; Eckenberger, Julia; Novielli, Pierfrancesco; Tonda, Alberto; Simeon, Andrea; Shigdel, Rajesh; Béreux, Stéphane; Vitali, Giacomo; Tangaro, Sabina; Lahti, Leo; Temko, Andriy; Claesson, Marcus J; Berland, Magali.
Affiliation
  • Papoutsoglou G; Department of Computer Science, University of Crete, Heraklion, Greece.
  • Tarazona S; JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece.
  • Lopes MB; Department of Applied Statistics and Operations Research and Quality, Polytechnic University of Valencia, Valencia, Spain.
  • Klammsteiner T; Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal.
  • Ibrahimi E; Research and Development Unit for Mechanical and Industrial Engineering (UNIDEMI), Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal.
  • Eckenberger J; Department of Ecology, Universität Innsbruck, Innsbruck, Austria.
  • Novielli P; Department of Microbiology, Universität Innsbruck, Innsbruck, Austria.
  • Tonda A; Department of Biology, University of Tirana, Tirana, Albania.
  • Simeon A; School of Microbiology, University College Cork, Cork, Ireland.
  • Shigdel R; APC Microbiome Ireland, Cork, Ireland.
  • Béreux S; Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy.
  • Vitali G; National Institute for Nuclear Physics, Bari Division, Bari, Italy.
  • Tangaro S; UMR 518 MIA-PS, INRAE, Paris-Saclay University, Palaiseau, France.
  • Lahti L; Complex Systems Institute of Paris Ile-de-France (ISC-PIF) - UAR 3611 CNRS, Paris, France.
  • Temko A; BioSense Institute, University of Novi Sad, Novi Sad, Serbia.
  • Claesson MJ; Department of Clinical Science, University of Bergen, Bergen, Norway.
  • Berland M; MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France.
Front Microbiol ; 14: 1261889, 2023.
Article in English | MEDLINE | ID: mdl-37808286
ABSTRACT
Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation, and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that compositional transformations and filtering methods applied during data preprocessing do not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm in combination with random forest modeling provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression, coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots, can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
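For readers unfamiliar with the compositional transformations mentioned in the abstract, the following is a minimal sketch of one common example, the centered log-ratio (CLR) transform, applied to a single sample of taxon counts. It is an illustration only and is not taken from the authors' pipeline: the function name `clr_transform`, the pseudocount value, and the example counts are all assumptions made for this sketch.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of one sample's taxon counts.

    The pseudocount is an illustrative assumption (not a value from the
    paper); it is added to every count so that zeros have a defined log.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")

    # Log of each shifted count.
    logs = [math.log(c + pseudocount) for c in counts]

    # Subtracting the mean log is equivalent to dividing each value by the
    # geometric mean of the sample before taking the log.
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for four taxa in one sample.
sample = [120, 0, 30, 850]
clr_values = clr_transform(sample)
```

The transformed values of a sample sum to zero, and without a pseudocount they are unchanged when all counts are multiplied by the same factor, which is why the transform is often used for relative-abundance data. As the abstract notes, applying such a transformation does not always improve predictive performance, so it is best treated as one preprocessing option to evaluate rather than a default.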
Keywords

Full text: 1 Collection: 01-international Study type: Guideline / Prognostic_studies Language: En Journal: Front Microbiol Year: 2023 Document type: Article Affiliation country: Greece
