Investigation of machine learning algorithms for taxonomic classification of marine metagenomes.
Park, Helen; Lim, Shen Jean; Cosme, Jonathan; O'Connell, Kyle; Sandeep, Jilla; Gayanilo, Felimon; Cutter, George R; Montes, Enrique; Nitikitpaiboon, Chotinan; Fisher, Sam; Moustahfid, Hassan; Thompson, Luke R.
Affiliations
  • Park H; Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China.
  • Lim SJ; EPSRC/BBSRC Future Biomanufacturing Research Hub, EPSRC Synthetic Biology Research Centre SYNBIOCHEM Manchester Institute of Biotechnology and School of Chemistry, The University of Manchester, Manchester, United Kingdom.
  • Cosme J; Cooperative Institute for Marine and Atmospheric Studies, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA.
  • O'Connell K; Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, Florida, USA.
  • Sandeep J; College of Marine Science, University of South Florida, St Petersburg, Florida, USA.
  • Gayanilo F; Run:AI, Office of the CTO, Tel Aviv, Israel.
  • Cutter GR; Deloitte Consulting LLP, Biomedical Data Science Team, Arlington, Virginia, USA.
  • Montes E; Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Northwest, Washington, DC, USA.
  • Nitikitpaiboon C; Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA.
  • Fisher S; Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA.
  • Moustahfid H; Southwest Fisheries Science Center, Antarctic Ecosystem Research Division, National Oceanic and Atmospheric Administration, La Jolla, California, USA.
  • Thompson LR; Cooperative Institute for Marine and Atmospheric Studies, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA.
Microbiol Spectr; e0523722, 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37695074
ABSTRACT
Microbial communities play key roles in ocean ecosystems through regulation of biogeochemical processes such as carbon and nutrient cycling, food web dynamics, and gut microbiomes of invertebrates, fish, reptiles, and mammals. Assessments of marine microbial diversity are therefore critical to understanding spatiotemporal variations in microbial community structure and function in ocean ecosystems. With recent advances in DNA shotgun sequencing for metagenome samples and computational analysis, it is now possible to access the taxonomic and genomic content of ocean microbial communities to study their structural patterns, diversity, and functional potential. However, existing taxonomic classification tools depend upon manually curated phylogenetic trees, which can create inaccuracies in metagenomes from less well-characterized communities, such as from ocean water. Herein, we explore the utility of deep learning tools (DeepMicrobes and a novel Residual Network architecture) that leverage natural language processing and convolutional neural network architectures to map input sequence data (k-mers) to output labels (taxonomic groups) without reliance on a curated taxonomic tree. We trained both models using metagenomic reads simulated from marine microbial genomes in the MarRef database. The performance of both models (accuracy, precision, and percent microbe predicted) was compared with the standard taxonomic classification tool Kraken2 using 10 complex metagenomic data sets simulated from MarRef. Our results demonstrate that time, compute power, and microbial genomic diversity still pose challenges for machine learning (ML). Moreover, our results suggest that high genome coverage and rectification of class imbalance are prerequisites for a well-trained model, and therefore should be a major consideration in future ML work.

IMPORTANCE
Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops approaches in constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights on the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future machine learning approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement.
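
As a rough illustration of two ideas named in the abstract (representing reads as k-mers, and compensating for class imbalance), the following Python sketch may help. It is not the authors' pipeline: the k-mer length `k=4`, the function names, and the inverse-frequency weighting heuristic are all assumptions chosen for illustration; the abstract does not state which settings DeepMicrobes or the Residual Network model used.

```python
from collections import Counter
from itertools import product


def kmers(read: str, k: int = 4) -> list[str]:
    """Return overlapping k-mers of a read, skipping any with non-ACGT bases."""
    read = read.upper()
    return [
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= set("ACGT")
    ]


def build_vocab(k: int = 4) -> dict[str, int]:
    """Assign an integer id to every possible DNA k-mer (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def encode(read: str, vocab: dict[str, int], k: int = 4) -> list[int]:
    """Turn a read into the sequence of k-mer ids a model could consume."""
    return [vocab[km] for km in kmers(read, k)]


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each taxon by n / (n_classes * count), so rare taxa count more.

    This is one common heuristic for class imbalance, assumed here
    for illustration only.
    """
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {label: n / (n_classes * c) for label, c in counts.items()}
```

In a training loop, such weights would typically scale the per-read loss, so that taxa represented by few simulated reads are not overwhelmed by abundant ones.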

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Journal: Microbiol Spectr Year: 2023 Document type: Article Affiliation country: China
