Results 1 - 8 of 8
1.
mSphere ; 8(5): e0033623, 2023 10 24.
Article in English | MEDLINE | ID: mdl-37615431

ABSTRACT

The ability to use 16S rRNA gene sequence data to train machine learning classification models offers the opportunity to diagnose patients based on the composition of their microbiome. In some applications, the taxonomic resolution that provides the best models may require the use of de novo operational taxonomic units (OTUs) whose composition changes when new data are added. We previously developed a new reference-based approach, OptiFit, that fits new sequence data to existing de novo OTUs without changing the composition of the original OTUs. While OptiFit produces OTUs that are as high quality as de novo OTUs, it is unclear whether this method for fitting new sequence data into existing OTUs will impact the performance of classification models relative to models trained and tested only using de novo OTUs. We used OptiFit to cluster sequences into existing OTUs and evaluated model performance in classifying a dataset containing samples from patients with and without colonic screen-relevant neoplasia (SRN). We compared the performance of this model to standard methods including de novo and database-reference-based clustering. We found that the OptiFit-based approach performed as well as or better than these methods in classifying SRNs. OptiFit can streamline the process of classifying new samples by avoiding the need to retrain models using reclustered sequences.

IMPORTANCE There is great potential for using microbiome data to aid in diagnosis. A challenge with de novo operational taxonomic unit (OTU)-based classification models is that 16S rRNA gene sequences are often assigned to OTUs based on similarity to other sequences in the dataset. If data are generated from new patients, the old and new sequences must be reclustered to OTUs and the classification model retrained. Yet there is a desire to have a single, validated model that can be widely deployed. To overcome this obstacle, we applied the OptiFit clustering algorithm to fit new sequence data to existing OTUs, allowing for reuse of the model. A random forest model implemented using OptiFit performed as well as the traditional reassign and retrain approach. This result shows that it is possible to train and apply machine learning models based on OTU relative abundance data that do not require retraining or the use of a reference database.


Subject(s)
Metagenomics , Microbiota , Humans , DNA Sequence Analysis/methods , 16S Ribosomal RNA/genetics , Metagenomics/methods , Algorithms , Microbiota/genetics
2.
mBio ; 13(4): e0190422, 2022 08 30.
Article in English | MEDLINE | ID: mdl-35900107

ABSTRACT

Susceptibility to Clostridioides difficile infection (CDI) typically follows the administration of antibiotics. Patients with inflammatory bowel disease (IBD) have increased incidence of CDI, even in the absence of antibiotic treatment. However, the mechanisms underlying this susceptibility are not well understood. To explore the intersection between CDI and IBD, we recently described a mouse model where colitis triggered by the murine gut bacterium, Helicobacter hepaticus, in IL-10-/- mice led to susceptibility to C. difficile colonization without antibiotic administration. The current work disentangles the relative contributions of inflammation and gut microbiota in colonization resistance to C. difficile in this model. We show that inflammation drives changes in microbiota composition, which leads to CDI susceptibility. Decreasing inflammation with an anti-p40 monoclonal antibody promotes a shift of the microbiota back toward a colonization-resistant state. Transferring microbiota from susceptible and resistant mice to germfree animals transfers the susceptibility phenotype, supporting the primacy of the microbiota in colonization resistance. These findings shine light on the complex interactions between the host, microbiota, and C. difficile in the context of intestinal inflammation, and may form a basis for the development of strategies to prevent or treat CDI in IBD patients.

IMPORTANCE Patients with inflammatory bowel disease (IBD) have an increased risk of developing C. difficile infection (CDI), even in the absence of antibiotic treatment. Yet, mechanisms regulating C. difficile colonization in IBD patients remain unclear. Here, we use an antibiotic-independent mouse model to demonstrate that intestinal inflammation alters microbiota composition to permit C. difficile colonization in mice with colitis. Notably, treating inflammation with an anti-p40 monoclonal antibody, a clinically relevant IBD therapeutic, restores microbiota-mediated colonization resistance to the pathogen. Through microbiota transfer experiments in germfree mice, we confirm that the microbiota shaped in the setting of IBD is the primary driver of susceptibility to C. difficile colonization. Collectively, our findings provide insight into CDI pathogenesis in the context of intestinal inflammation, which may inform methods to manage infection in IBD patients. More broadly, this work advances our understanding of mechanisms by which the host-microbiota interface modulates colonization resistance to C. difficile.


Subject(s)
Clostridioides difficile , Clostridium Infections , Colitis , Inflammatory Bowel Diseases , Microbiota , Animals , Anti-Bacterial Agents/therapeutic use , Monoclonal Antibodies , Clostridioides , Clostridium Infections/microbiology , Animal Disease Models , Inflammation , Mice
3.
Article in English | MEDLINE | ID: mdl-35224460

ABSTRACT

Inspired by well-established material and pedagogy provided by The Carpentries (Wilson, 2016), we developed a two-day workshop curriculum that teaches introductory R programming for managing, analyzing, plotting and reporting data using packages from the tidyverse (Wickham et al., 2019), the Unix shell, version control with git, and GitHub. While the official Software Carpentry curriculum is comprehensive, we found that it contains too much content for a two-day workshop. We also felt that the independent nature of the lessons left learners confused about how to integrate the newly acquired programming skills in their own work. Thus, we developed a new curriculum that aims to teach novices how to implement reproducible research principles in their own data analysis. The curriculum integrates live coding lessons with individual-level and group-based practice exercises, and also serves as a succinct resource that learners can reference both during and after the workshop. Moreover, it lowers the entry barrier for new instructors as they do not have to develop their own teaching materials or sift through extensive content. We developed this curriculum during a two-day sprint, successfully used it to host a two-day virtual workshop with almost 40 participants, and updated the material based on instructor and learner feedback. We hope that our new curriculum will prove useful to future instructors interested in teaching workshops with similar learning objectives.

4.
mSphere ; 7(1): e0091621, 2022 02 23.
Article in English | MEDLINE | ID: mdl-35107341

ABSTRACT

Assigning amplicon sequences to operational taxonomic units (OTUs) is an important step in characterizing microbial communities across large data sets. A notable difference between de novo clustering and database-dependent reference clustering methods is that OTU assignments from de novo methods may change when new sequences are added. However, one may wish to incorporate new samples to previously clustered data sets without clustering all sequences again, such as when comparing across data sets or deploying machine learning models. Existing reference-based methods produce consistent OTUs but only consider the similarity of each query sequence to a single reference sequence in an OTU, resulting in assignments that are worse than those generated by de novo methods. To provide an efficient method to fit sequences to existing OTUs, we developed the OptiFit algorithm. Inspired by the de novo OptiClust algorithm, OptiFit considers the similarity of all pairs of reference and query sequences to produce OTUs of the best possible quality. We tested OptiFit using four data sets with two strategies: (i) clustering to a reference database and (ii) splitting the data set into a reference and query set, clustering the references using OptiClust, and then clustering the queries to the references. The result is an improved implementation of reference-based clustering. OptiFit produces OTUs of a quality similar to that of OptiClust at faster speeds when using the split data set strategy. OptiFit provides a suitable option for users requiring consistent OTU assignments at the same quality as afforded by de novo clustering methods.

IMPORTANCE Advancements in DNA sequencing technology have allowed researchers to affordably generate millions of sequence reads from microorganisms in diverse environments. Efficient and robust software tools are needed to assign microbial sequences into taxonomic groups for characterization and comparison of communities. The OptiClust algorithm produces high-quality groups by comparing sequences to each other, but the assignments can change when new sequences are added to a data set, making it difficult to compare different studies. Other approaches assign sequences to groups by comparing them to sequences in a reference database to produce consistent assignments, but the quality of the groups produced is reduced compared to that with OptiClust. We developed OptiFit, a new reference-based algorithm that produces consistent yet high-quality assignments like OptiClust. OptiFit allows researchers to compare microbial communities across different studies or add new data to existing studies without sacrificing the quality of the group assignments.


Subject(s)
Metagenomics , Cluster Analysis , Metagenomics/methods , Phylogeny , 16S Ribosomal RNA/genetics , DNA Sequence Analysis/methods
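
The abstract of record 4 contrasts two ways of fitting a query sequence to existing OTUs: comparing it with a single reference sequence per OTU, or considering its similarity to all sequences in the OTU. The toy sketch below illustrates only that contrast. It is not the OptiFit algorithm or its implementation; the function names, the mismatch-fraction distance, the 0.2 threshold, the scoring rule, and the example sequences are all hypothetical choices made for illustration.

```python
def mismatch_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fit_to_single_reference(query, otus, threshold):
    """Assign the query to the OTU whose first member (treated here as its
    representative, an assumption of this sketch) is nearest, if that
    distance is within the threshold. Returns None when nothing qualifies."""
    best_otu, best_dist = None, None
    for name, members in otus.items():
        dist = mismatch_fraction(query, members[0])
        if dist <= threshold and (best_dist is None or dist < best_dist):
            best_otu, best_dist = name, dist
    return best_otu


def fit_to_all_members(query, otus, threshold):
    """Assign the query to the OTU in which the largest fraction of members
    lies within the threshold of the query. Returns None when no member of
    any OTU is close enough."""
    best_otu, best_score = None, 0.0
    for name, members in otus.items():
        close = sum(mismatch_fraction(query, m) <= threshold for m in members)
        score = close / len(members)
        if score > best_score:
            best_otu, best_score = name, score
    return best_otu


# Hypothetical reference OTUs; the first sequence in each list is the
# representative used by the single-reference strategy.
reference_otus = {
    "OTU1": ["AAAAAAAAAT", "CCAAAAAAAT", "GGAAAAAAAT"],
    "OTU2": ["AAAAAACCTT", "AAAAAACATT", "AAAAAAACTT"],
}
query = "AAAAAAAATT"

single = fit_to_single_reference(query, reference_otus, threshold=0.2)
pooled = fit_to_all_members(query, reference_otus, threshold=0.2)
print(single, pooled)
```

In this constructed example the query is nearest to the representative of OTU1 but is within the threshold of only one of that OTU's three members, whereas it is within the threshold of every member of OTU2, so the two strategies disagree. Real data would need aligned sequences and a quality criterion evaluated across all assignments rather than this per-query score.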
5.
Article in English | MEDLINE | ID: mdl-34414351

ABSTRACT

Machine learning (ML) for classification and prediction based on a set of features is used to make decisions in healthcare, economics, criminal justice and more. However, implementing an ML pipeline including preprocessing, model selection, and evaluation can be time-consuming, confusing, and difficult. Here, we present mikropml (pronounced "meek-ROPE em el"), an easy-to-use R package that implements ML pipelines using regression, support vector machines, decision trees, random forest, or gradient-boosted trees. The package is available on GitHub, CRAN, and conda.

6.
Proc Natl Acad Sci U S A ; 118(17)2021 04 27.
Article in English | MEDLINE | ID: mdl-33888580

ABSTRACT

The North American tiger salamander species complex, including its best-known species, the Mexican axolotl, has long been a source of biological fascination. The complex exhibits a wide range of variation in developmental life history strategies, including populations and individuals that undergo metamorphosis; those able to forego metamorphosis and retain a larval, aquatic lifestyle (i.e., paedomorphosis); and those that do both. The evolution of a paedomorphic life history state is thought to lead to increased population genetic differentiation and ultimately reproductive isolation and speciation, but the degree to which it has shaped population- and species-level divergence is poorly understood. Using a large multilocus dataset from hundreds of samples across North America, we identified genetic clusters across the geographic range of the tiger salamander complex. These clusters often contain a mixture of paedomorphic and metamorphic taxa, indicating that geographic isolation has played a larger role in lineage divergence than paedomorphosis in this system. This conclusion is bolstered by geography-informed analyses indicating no effect of life history strategy on population genetic differentiation and by model-based population genetic analyses demonstrating gene flow between adjacent metamorphic and paedomorphic populations. This fine-scale genetic perspective on life history variation establishes a framework for understanding how plasticity, local adaptation, and gene flow contribute to lineage divergence. Many members of the tiger salamander complex are endangered, and the Mexican axolotl is an important model system in regenerative and biomedical research. Our results chart a course for more informed use of these taxa in experimental, ecological, and conservation research.


Subject(s)
Ambystoma/genetics , Ambystoma/metabolism , Ambystoma mexicanum/genetics , Animals , Genetic Databases , Gene Flow , Population Genetics/methods , Geography , Larva/genetics , Biological Metamorphosis/genetics , North America , Phylogeny
7.
Article in English | MEDLINE | ID: mdl-35187422

ABSTRACT

We are bioinformatics trainees at the University of Michigan who started a local chapter of Girls Who Code to provide a fun and supportive environment for high school women to learn the power of coding. Our goal was to cover basic coding topics and data science concepts through live coding and hands-on practice. However, we could not find a resource that exactly met our needs. Therefore, over the past three years, we have developed a curriculum and instructional format using Jupyter notebooks to effectively teach introductory Python for data science. This method, inspired by The Carpentries organization, uses bite-sized lessons followed by independent practice time to reinforce coding concepts, and culminates in a data science capstone project using real-world data. We believe our open curriculum is a valuable resource to the wider education community and hope that educators will use and improve our lessons, practice problems, and teaching best practices. Anyone can contribute to our Open Educational Resources on GitHub.

...