Supervised Machine Learning Enables Geospatial Microbial Provenance.
Bhattacharya, Chandrima; Tierney, Braden T; Ryon, Krista A; Bhattacharyya, Malay; Hastings, Jaden J A; Basu, Srijani; Bhattacharya, Bodhisatwa; Bagchi, Debneel; Mukherjee, Somsubhro; Wang, Lu; Henaff, Elizabeth M; Mason, Christopher E.
Affiliation
  • Bhattacharya C; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY 10065, USA.
  • Tierney BT; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.
  • Ryon KA; Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA.
  • Bhattacharyya M; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.
  • Hastings JJA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA.
  • Basu S; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.
  • Bhattacharya B; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA.
  • Bagchi D; Center for Artificial Intelligence and Machine Learning, Indian Statistical Institute, Kolkata 700108, India.
  • Mukherjee S; Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
  • Wang L; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.
  • Henaff EM; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA.
  • Mason CE; Department of Medicine, Weill Cornell Medicine, New York, NY 10065, USA.
Genes (Basel); 13(10), 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36292799
ABSTRACT
The recent increase in publicly available metagenomic datasets with geospatial metadata has made it possible to determine location-specific microbial fingerprints from around the world. Such fingerprints can be useful for comparing microbial niches in environmental research, as well as for applications in forensic science and public health. To determine the regional specificity of environmental metagenomes, we examined 4305 shotgun-sequenced samples from the MetaSUB Consortium dataset, the most extensive public collection of urban microbiomes, which spans 60 cities, 30 countries, and 6 continents. We identified city-specific microbial fingerprints using supervised machine learning (SML) on the taxonomic classifications and compared the performance of ten SML classifiers. We then further evaluated the five algorithms with the highest accuracy, which achieved city-level accuracy of 85-89% and continent-level accuracy of 90-94%. Thereafter, we used these results to develop Cassandra, a random-forest-based classifier that identifies bioindicator species to aid in fingerprinting and can infer higher-order microbial interactions at each site. We further tested the Cassandra algorithm on the Tara Oceans dataset, the largest collection of marine-based microbial genomes, where it classified the oceanic sample locations with 83% accuracy. These results and code show the utility of SML methods and Cassandra for identifying bioindicator species across both oceanic and urban environments, which can help guide ongoing efforts in biotracing, environmental monitoring, and microbial forensics (MF).
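The abstract reports accuracy at two geographic resolutions, city and continent. The sketch below illustrates one way such a pair of figures can relate: city predictions are mapped to continents, so a sample assigned to the wrong city on the right continent still counts as correct at the coarser level. This is an assumption made for illustration only; the abstract does not say whether the authors derived continental accuracy this way or trained separate continent-level models. The city list, the `CITY_TO_CONTINENT` lookup, and all labels are made up and are not taken from MetaSUB.

```python
# Hypothetical city -> continent lookup (illustrative, not MetaSUB metadata).
CITY_TO_CONTINENT = {
    "New York": "North America",
    "Denver": "North America",
    "London": "Europe",
    "Berlin": "Europe",
    "Tokyo": "Asia",
}


def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def to_continents(cities):
    """Map each city label to its continent using the lookup above."""
    return [CITY_TO_CONTINENT[city] for city in cities]


# Made-up true and predicted city labels for six samples.
true_cities = ["New York", "Denver", "London", "Berlin", "Tokyo", "London"]
pred_cities = ["New York", "New York", "London", "London", "Tokyo", "Tokyo"]

city_acc = accuracy(true_cities, pred_cities)
continent_acc = accuracy(to_continents(true_cities), to_continents(pred_cities))

print(f"city-level accuracy:      {city_acc:.2f}")
print(f"continent-level accuracy: {continent_acc:.2f}")
```

Under this mapping, continent-level accuracy can never fall below city-level accuracy, because every correct city prediction is also a correct continent prediction.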

Full text: 1 Database: MEDLINE Main subject: Metagenomics / Microbiota Language: En Publication year: 2022 Document type: Article
