A comparative study of supervised and unsupervised machine learning algorithms applied to human microbiome.
Kalluçi, E; Preni, B; Dhamo, X; Noka, E; Bardhi, S; Macchia, A; Bonetti, G; Dhuli, K; Donato, K; Bertelli, M; Zambrano, L J M; Janaqi, S.
Affiliations
  • Kalluçi E; Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania.
  • Preni B; Department of Mathematics, Faculty of Engineering Mathematics and Engineering Physics, Polytechnic University of Tirana, Tirana, Albania.
  • Dhamo X; Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania.
  • Noka E; Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania.
  • Bardhi S; Department of Applied Statistics and Informatics, University of Tirana, Tirana, Albania.
  • Macchia A; MAGI'S LAB, Rovereto (TN), Italy.
  • Bonetti G; MAGI'S LAB, Rovereto (TN), Italy.
  • Dhuli K; Department of Pharmaceutical Sciences, University of Perugia, Perugia, Italy.
  • Donato K; MAGI'S LAB, Rovereto (TN), Italy.
  • Bertelli M; MAGI EUREGIO, Bolzano, Italy.
  • Zambrano LJM; MAGISNAT, Atlanta Tech Park, Peachtree Corners, GA, USA.
  • Janaqi S; MAGI'S LAB, Rovereto (TN), Italy.
Clin Ter; 175(3): 98-116, 2024.
Article in English | MEDLINE | ID: mdl-38767067
ABSTRACT

Background:

The human microbiome, consisting of diverse bacterial, fungal, protozoan and viral species, exerts a profound influence on various physiological processes and on disease susceptibility. However, the complexity of microbiome data makes these intricate datasets difficult to analyze and interpret, which has led to the development of specialized software that uses machine learning algorithms for these purposes.

Methods:

In this paper, we analyze raw 16S rRNA gene sequencing data from three studies, comprising stool samples from healthy controls, patients with adenoma, and patients with colorectal cancer. First, we use network-based methods to reduce the dimensionality of the dataset and retain only the most important features. We then employ supervised machine learning algorithms to make predictions.
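The abstract does not say how the taxa network was built or which centrality measures were used. As a minimal sketch, assuming a network whose nodes are taxa and whose edges join pairs with absolute Pearson correlation above a threshold, taxa can be ranked by degree centrality and only the top-ranked ones kept. The helper name `centrality_feature_selection`, the 0.5 threshold and the choice of degree centrality are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def centrality_feature_selection(abundance, threshold=0.5, top_k=10):
    """Hypothetical helper: rank taxa by degree centrality in a correlation network.

    abundance: array of shape (n_samples, n_taxa).
    Returns (indices of the top_k taxa, degree centrality of every taxon).
    """
    # Pearson correlation between taxa (columns); constant columns yield NaN, mapped to 0
    corr = np.nan_to_num(np.corrcoef(abundance, rowvar=False))
    # Assumed network: an undirected edge wherever |r| exceeds the threshold, no self-loops
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    n_taxa = adj.shape[0]
    # Degree centrality: neighbour count normalised by the maximum possible (n_taxa - 1)
    degree_centrality = adj.sum(axis=1) / (n_taxa - 1)
    # Most central taxa first; a stable sort keeps ties in their original column order
    order = np.argsort(-degree_centrality, kind="stable")
    return order[:top_k], degree_centrality


# Toy data (not from the study): taxa 0-2 rise together, taxon 3 alternates
t = np.arange(10.0)
X = np.column_stack([t, 2 * t + 1, t ** 2, np.where(t % 2 == 0, 1.0, -1.0)])
selected, centrality = centrality_feature_selection(X, threshold=0.5, top_k=3)
print(selected.tolist())  # [0, 1, 2]
```

The threshold controls how sparse the network is; in practice a rank correlation or a compositional-data-aware association measure may be preferable for microbiome abundances.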

Results:

Results show that graph-based techniques reduce the dimensionality from 255 to 78 features, with a modularity score of 0.73, based on different centrality measures. Projection methods (non-negative matrix factorization and principal component analysis), on the other hand, reduce the dimensionality to 7 features. Furthermore, we apply supervised machine learning algorithms both to the most important features obtained from centrality measures and to those obtained from projection methods, finding that the evaluation metrics have approximately the same scores whether the algorithms are applied to the entire dataset, to the 78 features, or to the 7 features.
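The abstract reports a modularity score of 0.73 without naming the community-detection method. For reference, Newman's modularity of a partition compares the number of edges that fall inside communities with the number expected at random for the same node degrees. The sketch below only scores a partition supplied by the caller; it does not search for one, and the function name and toy graph are illustrative.

```python
import numpy as np


def modularity(adj, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j)
    """
    A = np.asarray(adj, dtype=float)
    labels = np.asarray(communities)
    k = A.sum(axis=1)                            # node degrees k_i
    two_m = k.sum()                              # 2m: every edge counted from both ends
    same = labels[:, None] == labels[None, :]    # delta(c_i, c_j)
    expected = np.outer(k, k) / two_m            # edges expected under the configuration model
    return float(((A - expected) * same).sum() / two_m)


# Toy graph (not from the study): two triangles {0,1,2} and {3,4,5} joined by edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(round(modularity(A, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571 (= 5/14)
```

Putting every node in a single community always gives Q = 0, so values well above zero indicate that the partition captures denser-than-random groups of taxa.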

Conclusions:

This study demonstrates the efficacy of graph-based and projection methods in interpreting 16S rRNA gene sequencing data. Supervised machine learning on the refined features from both approaches yields comparable predictive performance and highlights specific microbial features (Bacteroides, Prevotella, Fusobacterium, Lysinibacillus, Blautia, Sphingomonas and Faecalibacterium) as key predictors of patient condition from raw data.
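The comparison between a full feature set and a reduced one can be illustrated with a small, self-contained experiment. This is a sketch under stated assumptions: synthetic data rather than the study's samples, PCA computed via singular value decomposition (NMF, the other projection method mentioned in the Results, is not shown), and a nearest-centroid classifier standing in for the supervised algorithms, which the abstract does not name. All function names are hypothetical.

```python
import numpy as np


def pca_project(X_train, X_test, n_components):
    """Project both sets onto the top principal axes fitted on the training set."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal axes of the centred training matrix, by decreasing variance
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    axes = vt[:n_components]
    return (X_train - mean) @ axes.T, (X_test - mean) @ axes.T


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per class on the training set and return test accuracy."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test sample to every class centroid
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y_test).mean())


# Synthetic two-class data (not from the study): classes differ only along feature 0
rng = np.random.default_rng(42)
n_per_class, n_features = 60, 20
X = rng.normal(size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)
X[y == 1, 0] += 10.0

# Random 80/40 train/test split
idx = rng.permutation(len(y))
train, test = idx[:80], idx[80:]

acc_full = nearest_centroid_accuracy(X[train], y[train], X[test], y[test])
Z_train, Z_test = pca_project(X[train], X[test], n_components=7)
acc_reduced = nearest_centroid_accuracy(Z_train, y[train], Z_test, y[test])
print(f"all {n_features} features: {acc_full:.2f}, 7 PCA components: {acc_reduced:.2f}")
```

Because the class signal here lies along the direction of largest variance, both accuracies should be close to 1. Real microbiome counts would also need normalization and cross-validation before such a comparison means much.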

Database: MEDLINE | Main subjects: 16S Ribosomal RNA / Microbiota / Supervised Machine Learning / Unsupervised Machine Learning | Limits: Humans | Language: English | Journal: Clin Ter | Year of publication: 2024 | Document type: Article | Country of affiliation: Albania
