ABSTRACT
Due to the complex nature of microbiome data, the field of microbial ecology has many current and potential uses for machine learning (ML) modeling. With the increased use of predictive ML models across many disciplines, including microbial ecology, there is extensive published information on the specific ML algorithms available and how those algorithms have been applied. Thus, our goal is not to summarize the breadth of ML models available or compare their performances. Rather, our goal is to provide more concrete and actionable information to guide microbial ecologists in how to select, run, and interpret ML algorithms to predict the taxa or genes associated with particular sample categories or environmental gradients of interest. Such microbial data often have unique characteristics that require careful consideration of how to apply ML models and how to interpret the associated results. This review is intended for practicing microbial ecologists who may be unfamiliar with some of the intricacies of ML models. We provide examples and discuss common opportunities and pitfalls specific to applying ML models to the types of data sets most frequently collected by microbial ecologists.
Subjects
Machine Learning, Microbiota, Algorithms
ABSTRACT
Genomic information is now available for a broad diversity of bacteria, including uncultivated taxa. However, we have corresponding knowledge on environmental preferences (i.e. bacterial growth responses across gradients in oxygen, pH, temperature, salinity, and other environmental conditions) for a relatively narrow swath of bacterial diversity. These limits to our understanding of bacterial ecologies constrain our ability to predict how assemblages will shift in response to global change factors, design effective probiotics, or guide cultivation efforts. We need innovative approaches that take advantage of expanding genome databases to accurately infer the environmental preferences of bacteria and validate the accuracy of these inferences. By doing so, we can broaden our quantitative understanding of the environmental preferences of the majority of bacterial taxa that remain uncharacterized. With this perspective, we highlight why it is important to infer environmental preferences from genomic information and discuss the range of potential strategies for doing so. In particular, we highlight concrete examples of how both cultivation-independent and cultivation-dependent approaches can be integrated with genomic data to develop predictive models. We also emphasize the limitations and pitfalls of these approaches and the specific knowledge gaps that need to be addressed to successfully expand our understanding of the environmental preferences of bacteria.
Subjects
Bacteria, Bacteria/genetics, Bacteria/classification, Bacterial Genome, Genomics, Environment, Bacterial Physiological Phenomena
ABSTRACT
The environmental preferences of many microbes remain undetermined. This is the case for bacterial pH preferences, which can be difficult to predict a priori despite the importance of pH as a factor structuring bacterial communities in many systems. We compiled data on bacterial distributions from five datasets spanning pH gradients in soil and freshwater systems (1470 samples), quantified the pH preferences of bacterial taxa across these datasets, and compiled genomic data from representative bacterial taxa. While taxonomic and phylogenetic information were generally poor predictors of bacterial pH preferences, we identified genes consistently associated with pH preference across environments. We then developed and validated a machine learning model to estimate bacterial pH preferences from genomic information alone, a model that could aid in the selection of microbial inoculants, improve species distribution models, or help design effective cultivation strategies. More generally, we demonstrate the value of combining biogeographic and genomic data to infer and predict the environmental preferences of diverse bacterial taxa.
Subjects
Bacteria, Soil Microbiology, Phylogeny, Bacteria/genetics, Soil, Hydrogen-Ion Concentration
ABSTRACT
Wastewater microbial communities are not static and can vary significantly across time and space, but this variation and the factors driving the observed spatiotemporal variation often remain undetermined. We used a shotgun metagenomic approach to investigate changes in wastewater microbial communities across 17 locations in a sewer network, with samples collected from each location over a 3-week period. Fecal material-derived bacteria constituted a relatively small fraction of the taxa found in the collected samples, highlighting the importance of environmental sources to the sewage microbiome. The prokaryotic communities were highly variable in composition depending on the location within the sampling network, and this spatial variation was most strongly associated with location-specific differences in sewage pH. However, we also observed substantial temporal variation in the composition of the prokaryotic communities at individual locations. This temporal variation was asynchronous across sampling locations, emphasizing the importance of independently considering both spatial and temporal variation when assessing the wastewater microbiome. The spatiotemporal patterns in viral community composition closely tracked those of the prokaryotic communities, allowing us to putatively identify the bacterial hosts of some of the dominant viruses in these systems. Finally, we found that antibiotic resistance gene profiles also exhibit a high degree of spatiotemporal variability, with most of these genes unlikely to be derived from fecal bacteria. Together, these results emphasize the dynamic nature of the wastewater microbiome, the challenges associated with studying these systems, and the utility of metagenomic approaches for building a multifaceted understanding of these microbial communities and their functional attributes.
IMPORTANCE
Sewage systems harbor extensive microbial diversity, including microbes derived from both human and environmental sources. Studies of the sewage microbiome are useful for monitoring public health and the health of our infrastructure, but the sewage microbiome can be highly variable in ways that are often unresolved. We sequenced DNA recovered from wastewater samples collected over a 3-week period at 17 locations in a single sewer system to determine how these communities vary across time and space. Most of the wastewater bacteria, and the antibiotic resistance genes they harbor, were not derived from human feces, but human usage patterns did impact how the amounts and types of bacteria and bacterial genes we found in these systems varied over time. Likewise, the wastewater communities, including both bacteria and their viruses, varied depending on location within the sewage network, highlighting the challenges and opportunities in efforts to monitor and understand the sewage microbiome.