Search | BVS - Ministry of Health

1.

Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay.

Keefe, Christopher R; Dillon, Matthew R; Gehret, Elizabeth; Herman, Chloe; Jewell, Mary; Wood, Colin V; Bolyen, Evan; Caporaso, J Gregory.

PLoS Comput Biol ; 19(11): e1011676, 2023 Nov.

Article in English | MEDLINE | ID: mdl-38011287

ABSTRACT

Study reproducibility is essential to corroborate, build on, and learn from the results of scientific research but is notoriously challenging in bioinformatics, which often involves large data sets and complex analytic workflows involving many different tools. Additionally, many biologists are not trained in how to effectively record their bioinformatics analysis steps to ensure reproducibility, so critical information is often missing. Software tools used in bioinformatics can automate provenance tracking of the results they generate, removing most barriers to bioinformatics reproducibility. Here we present an implementation of that idea, Provenance Replay, a tool for generating new executable code from results generated with the QIIME 2 bioinformatics platform, and discuss considerations for bioinformatics developers who wish to implement similar functionality in their software.

Subjects

Computational Biology, Software, Reproducibility of Results, Computational Biology/methods, Workflow

2.

Predicting Neurodegenerative Disease Using Prepathology Gut Microbiota Composition: a Longitudinal Study in Mice Modeling Alzheimer's Disease Pathologies.

Borsom, Emily M; Conn, Kathryn; Keefe, Christopher R; Herman, Chloe; Orsini, Gabrielle M; Hirsch, Allyson H; Palma Avila, Melanie; Testo, George; Jaramillo, Sierra A; Bolyen, Evan; Lee, Keehoon; Caporaso, J Gregory; Cope, Emily K.

Microbiol Spectr ; : e0345822, 2023 Mar 06.

Article in English | MEDLINE | ID: mdl-36877047

ABSTRACT

The gut microbiota-brain axis is suspected to contribute to the development of Alzheimer's disease (AD), a neurodegenerative disease characterized by amyloid-β plaque deposition, neurofibrillary tangles, and neuroinflammation. To evaluate the role of the gut microbiota-brain axis in AD, we characterized the gut microbiota of female 3xTg-AD mice modeling amyloidosis and tauopathy and wild-type (WT) genetic controls. Fecal samples were collected fortnightly from 4 to 52 weeks, and the V4 region of the 16S rRNA gene was amplified and sequenced on an Illumina MiSeq. RNA was extracted from the colon and hippocampus, converted to cDNA, and used to measure immune gene expression using reverse transcriptase quantitative PCR (RT-qPCR). Diversity metrics were calculated using QIIME 2, and a random forest classifier was applied to identify bacterial features that are important in predicting mouse genotype. Gene expression of glial fibrillary acidic protein (GFAP; indicating astrocytosis) was elevated in the colon at 24 weeks. Markers of Th1 inflammation (il6) and microgliosis (mrc1) were elevated in the hippocampus. Gut microbiota were compositionally distinct early in life between 3xTg-AD mice and WT mice (permutational multivariate analysis of variance [PERMANOVA]: 8 weeks, P = 0.001; 24 weeks, P = 0.039; and 52 weeks, P = 0.058). Mouse genotypes were correctly predicted 90 to 100% of the time using fecal microbiome composition. Finally, we show that the relative abundance of Bacteroides species increased over time in 3xTg-AD mice. Taken together, we demonstrate that changes in bacterial gut microbiota composition at prepathology time points are predictive of the development of AD pathologies. IMPORTANCE Recent studies have demonstrated alterations in the gut microbiota composition in mice modeling Alzheimer's disease (AD) pathologies; however, these studies have only included up to 4 time points. Our study is the first of its kind to characterize the gut microbiota of a transgenic AD mouse model, fortnightly, from 4 weeks of age to 52 weeks of age, to quantify the temporal dynamics in the microbial composition that correlate with the development of disease pathologies and host immune gene expression. In this study, we observed temporal changes in the relative abundances of specific microbial taxa, including the genus Bacteroides, that may play a central role in disease progression and the severity of pathologies. The ability to use features of the microbiota to discriminate between mice modeling AD and wild-type mice at prepathology time points indicates a potential role of the gut microbiota as a risk or protective factor in AD.

3.

Experiences and lessons learned from two virtual, hands-on microbiome bioinformatics workshops.

Dillon, Matthew R; Bolyen, Evan; Adamov, Anja; Belk, Aeriel; Borsom, Emily; Burcham, Zachary; Debelius, Justine W; Deel, Heather; Emmons, Alex; Estaki, Mehrbod; Herman, Chloe; Keefe, Christopher R; Morton, Jamie T; Oliveira, Renato R M; Sanchez, Andrew; Simard, Anthony; Vázquez-Baeza, Yoshiki; Ziemski, Michal; Miwa, Hazuki E; Kerere, Terry A; Coote, Carline; Bonneau, Richard; Knight, Rob; Oliveira, Guilherme; Gopalasingam, Piraveen; Kaehler, Benjamin D; Cope, Emily K; Metcalf, Jessica L; Robeson II, Michael S; Bokulich, Nicholas A; Caporaso, J Gregory.

PLoS Comput Biol ; 17(6): e1009056, 2021 06.

Article in English | MEDLINE | ID: mdl-34166363

ABSTRACT

In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn't work well, and what we plan to do differently in future workshops.

Subjects

COVID-19, Computational Biology, Microbiota, Computational Biology/education, Computational Biology/organization & administration, Feedback, Humans, SARS-CoV-2

4.

An Early Pandemic Analysis of SARS-CoV-2 Population Structure and Dynamics in Arizona.

Ladner, Jason T; Larsen, Brendan B; Bowers, Jolene R; Hepp, Crystal M; Bolyen, Evan; Folkerts, Megan; Sheridan, Krystal; Pfeiffer, Ashlyn; Yaglom, Hayley; Lemmer, Darrin; Sahl, Jason W; Kaelin, Emily A; Maqsood, Rabia; Bokulich, Nicholas A; Quirk, Grace; Watts, Thomas D; Komatsu, Kenneth K; Waddell, Victor; Lim, Efrem S; Caporaso, J Gregory; Engelthaler, David M; Worobey, Michael; Keim, Paul.

mBio ; 11(5), 2020 09 04.

Article in English | MEDLINE | ID: mdl-32887735

ABSTRACT

In December of 2019, a novel coronavirus, SARS-CoV-2, emerged in the city of Wuhan, China, causing severe morbidity and mortality. Since then, the virus has swept across the globe, causing millions of confirmed infections and hundreds of thousands of deaths. To better understand the nature of the pandemic and the introduction and spread of the virus in Arizona, we sequenced viral genomes from clinical samples tested at the TGen North Clinical Laboratory, the Arizona Department of Health Services, and those collected as part of community surveillance projects at Arizona State University and the University of Arizona. Phylogenetic analysis of 84 genomes from across Arizona revealed a minimum of 11 distinct introductions inferred to have occurred during February and March. We show that >80% of our sequences descend from strains that were initially circulating widely in Europe but have since dominated the outbreak in the United States. In addition, we show that the first reported case of community transmission in Arizona descended from the Washington state outbreak that was discovered in late February. Notably, none of the observed transmission clusters are epidemiologically linked to the original travel-related case in the state, suggesting successful early isolation and quarantine. Finally, we use molecular clock analyses to demonstrate a lack of identifiable, widespread cryptic transmission in Arizona prior to the middle of February 2020. IMPORTANCE As the COVID-19 pandemic swept across the United States, there was great differential impact on local and regional communities. One of the earliest and hardest hit regions was in New York, while at the same time Arizona (for example) had low incidence. That situation has changed dramatically, with Arizona now having the highest rate of disease increase in the country. Understanding the roots of the pandemic during the initial months is essential as the pandemic continues and reaches new heights. Genomic analysis and phylogenetic modeling of SARS-CoV-2 in Arizona can help to reconstruct population composition and predict the earliest undetected introductions. This foundational work represents the basis for future analysis and understanding as the pandemic continues.

Subjects

Betacoronavirus/genetics, Coronavirus Infections/epidemiology, Coronavirus Infections/transmission, Viral Pneumonia/epidemiology, Viral Pneumonia/transmission, Arizona/epidemiology, Betacoronavirus/classification, Betacoronavirus/isolation & purification, COVID-19, Coronavirus Infections/virology, Molecular Evolution, Viral Genome/genetics, Humans, Incidence, Mutation, Pandemics, Phylogeny, Viral Pneumonia/virology, SARS-CoV-2, Viral Proteins/genetics

5.

QIIME 2 Enables Comprehensive End-to-End Analysis of Diverse Microbiome Data and Comparative Studies with Publicly Available Data.

Estaki, Mehrbod; Jiang, Lingjing; Bokulich, Nicholas A; McDonald, Daniel; González, Antonio; Kosciolek, Tomasz; Martino, Cameron; Zhu, Qiyun; Birmingham, Amanda; Vázquez-Baeza, Yoshiki; Dillon, Matthew R; Bolyen, Evan; Caporaso, J Gregory; Knight, Rob.

Curr Protoc Bioinformatics ; 70(1): e100, 2020 06.

Article in English | MEDLINE | ID: mdl-32343490

ABSTRACT

QIIME 2 is a completely re-engineered microbiome bioinformatics platform based on the popular QIIME platform, which it has replaced. QIIME 2 facilitates comprehensive and fully reproducible microbiome data science, improving accessibility to diverse users by adding multiple user interfaces. QIIME 2 can be combined with Qiita, an open-source web-based platform, to re-use available data for meta-analysis. The following basic protocol describes how to install QIIME 2 on a single computer and analyze microbiome sequence data, from processing of raw DNA sequence reads through generating publishable interactive figures. These interactive figures allow readers of a study to interact with data with the same ease as its authors, advancing microbiome science transparency and reproducibility. We also show how plug-ins developed by the community to add analysis capabilities can be installed and used with QIIME 2, enhancing various aspects of microbiome analyses (e.g., improving taxonomic classification accuracy). Finally, we illustrate how users can perform meta-analyses combining different datasets using readily available public data through Qiita. In this tutorial, we analyze a subset of the Early Childhood Antibiotics and the Microbiome (ECAM) study, which tracked the microbiome composition and development of 43 infants in the United States from birth to 2 years of age, identifying microbiome associations with antibiotic exposure, delivery mode, and diet. For more information about QIIME 2, see https://qiime2.org. To troubleshoot or ask questions about QIIME 2 and microbiome analysis, join the active community at https://forum.qiime2.org. © 2020 The Authors. Basic Protocol: Using QIIME 2 with microbiome data. Support Protocol: Further microbiome analyses.

Subjects

Databases as Topic, Microbiota, Software, Biodiversity, Linear Models, Phylogeny

6.

Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity.

Bolyen, Evan; Dillon, Matthew R; Bokulich, Nicholas A; Ladner, Jason T; Larsen, Brendan B; Hepp, Crystal M; Lemmer, Darrin; Sahl, Jason W; Sanchez, Andrew; Holdgraf, Chris; Sewell, Chris; Choudhury, Aakash G; Stachurski, John; McKay, Matthew; Simard, Anthony; Engelthaler, David M; Worobey, Michael; Keim, Paul; Caporaso, J Gregory.

F1000Res ; 9: 657, 2020.

Article in English | MEDLINE | ID: mdl-33500774

ABSTRACT

The COVID-19 pandemic has led to a rapid accumulation of SARS-CoV-2 genomes, enabling genomic epidemiology on local and global scales. Collections of genomes from resources such as GISAID must be subsampled to enable computationally feasible phylogenetic and other analyses. We present genome-sampler, a software package that supports sampling collections of viral genomes across multiple axes including time of genome isolation, location of genome isolation, and viral diversity. The software is modular in design so that these or future sampling approaches can be applied independently and combined (or replaced with a random sampling approach) to facilitate custom workflows and benchmarking. genome-sampler is written as a QIIME 2 plugin, ensuring that its application is fully reproducible through QIIME 2's unique retrospective data provenance tracking system. genome-sampler can be installed in a conda environment on macOS or Linux systems. A complete default pipeline is available through a Snakemake workflow, so subsampling can be achieved using a single command. genome-sampler is open source, free for all to use, and available at https://caporasolab.us/genome-sampler. We hope that this will facilitate SARS-CoV-2 research and support evaluation of viral genome sampling approaches for genomic epidemiology.
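
As a rough illustration of sampling across one axis (time of isolation), the sketch below keeps at most a fixed number of genomes per calendar week and uses a seeded random generator so that the subsample is repeatable. It is not genome-sampler's algorithm or API: the function name `sample_across_time`, the record layout, and the choice of weekly bins are all assumptions made for this example.

```python
import random
from collections import defaultdict
from datetime import date


def sample_across_time(records, per_bin=1, seed=0):
    """Keep at most `per_bin` genomes from each ISO calendar week.

    `records` is a list of (genome_id, isolation_date) pairs.
    Hypothetical helper for illustration only; not part of genome-sampler.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample repeatable
    bins = defaultdict(list)
    for genome_id, isolated in records:
        iso_year, iso_week, _ = isolated.isocalendar()
        bins[(iso_year, iso_week)].append(genome_id)

    selected = []
    for key in sorted(bins):  # visit weeks in chronological order
        ids = sorted(bins[key])  # sort so the result ignores input order
        selected.extend(rng.sample(ids, min(per_bin, len(ids))))
    return selected


# Made-up records: g1 and g2 fall in the same week, g3 and g4 in later weeks.
records = [
    ("g1", date(2020, 3, 2)),
    ("g2", date(2020, 3, 3)),
    ("g3", date(2020, 3, 10)),
    ("g4", date(2020, 3, 24)),
]
subsample = sample_across_time(records, per_bin=1, seed=42)
```

Calling the function again with the same seed returns the same subsample, which is the property that makes a sampling step reproducible.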

Subjects

Viral Genome, Phylogeny, SARS-CoV-2/genetics, COVID-19, Computational Biology, Geography, Humans, Pandemics, Retrospective Studies, Software

7.

Author Correction: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2.

Bolyen, Evan; Rideout, Jai Ram; Dillon, Matthew R; Bokulich, Nicholas A; Abnet, Christian C; Al-Ghalith, Gabriel A; Alexander, Harriet; Alm, Eric J; Arumugam, Manimozhiyan; Asnicar, Francesco; Bai, Yang; Bisanz, Jordan E; Bittinger, Kyle; Brejnrod, Asker; Brislawn, Colin J; Brown, C Titus; Callahan, Benjamin J; Caraballo-Rodríguez, Andrés Mauricio; Chase, John; Cope, Emily K; Da Silva, Ricardo; Diener, Christian; Dorrestein, Pieter C; Douglas, Gavin M; Durall, Daniel M; Duvallet, Claire; Edwardson, Christian F; Ernst, Madeleine; Estaki, Mehrbod; Fouquier, Jennifer; Gauglitz, Julia M; Gibbons, Sean M; Gibson, Deanna L; Gonzalez, Antonio; Gorlick, Kestrel; Guo, Jiarong; Hillmann, Benjamin; Holmes, Susan; Holste, Hannes; Huttenhower, Curtis; Huttley, Gavin A; Janssen, Stefan; Jarmusch, Alan K; Jiang, Lingjing; Kaehler, Benjamin D; Kang, Kyo Bin; Keefe, Christopher R; Keim, Paul; Kelley, Scott T; Knights, Dan.

Nat Biotechnol ; 37(9): 1091, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31399723

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

8.

Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2.

Bolyen, Evan; Rideout, Jai Ram; Dillon, Matthew R; Bokulich, Nicholas A; Abnet, Christian C; Al-Ghalith, Gabriel A; Alexander, Harriet; Alm, Eric J; Arumugam, Manimozhiyan; Asnicar, Francesco; Bai, Yang; Bisanz, Jordan E; Bittinger, Kyle; Brejnrod, Asker; Brislawn, Colin J; Brown, C Titus; Callahan, Benjamin J; Caraballo-Rodríguez, Andrés Mauricio; Chase, John; Cope, Emily K; Da Silva, Ricardo; Diener, Christian; Dorrestein, Pieter C; Douglas, Gavin M; Durall, Daniel M; Duvallet, Claire; Edwardson, Christian F; Ernst, Madeleine; Estaki, Mehrbod; Fouquier, Jennifer; Gauglitz, Julia M; Gibbons, Sean M; Gibson, Deanna L; Gonzalez, Antonio; Gorlick, Kestrel; Guo, Jiarong; Hillmann, Benjamin; Holmes, Susan; Holste, Hannes; Huttenhower, Curtis; Huttley, Gavin A; Janssen, Stefan; Jarmusch, Alan K; Jiang, Lingjing; Kaehler, Benjamin D; Kang, Kyo Bin; Keefe, Christopher R; Keim, Paul; Kelley, Scott T; Knights, Dan.

Nat Biotechnol ; 37(8): 852-857, 2019 08.

Article in English | MEDLINE | ID: mdl-31341288

Subjects

Computational Biology, Data Science, Microbiota, Software, Factual Databases, Humans

9.

Domestic canines do not display evidence of gut microbial dysbiosis in the presence of Clostridioides (Clostridium) difficile, despite cellular susceptibility to its toxins.

Stone, Nathan E; Nunnally, Amalee E; Jimenez, Victor; Cope, Emily K; Sahl, Jason W; Sheridan, Krystal; Hornstra, Heidie M; Vinocur, Jacob; Settles, Erik W; Headley, Kyle C; Williamson, Charles H D; Rideout, Jai Ram; Bolyen, Evan; Caporaso, J Gregory; Terriquez, Joel; Monroy, Fernando P; Busch, Joseph D; Keim, Paul; Wagner, David M.

Anaerobe ; 58: 53-72, 2019 Aug.

Article in English | MEDLINE | ID: mdl-30946985

ABSTRACT

Clostridioides difficile infection (CDI) is an emerging public health threat and C. difficile is the most common cause of antimicrobial-associated diarrhea worldwide and the leading cause of hospital-associated infections in the US, yet the burden of community-acquired infections (CAI) is poorly understood. Characterizing C. difficile isolated from canines is important for understanding the role that canines may play in CAI. In addition, several studies have suggested that canines carry toxigenic C. difficile asymptomatically, which may imply that there are mechanisms responsible for resistance to CDI in canines that could be exploited to help combat human CDI. To assess the virulence potential of canine-derived C. difficile, we tested whether toxins TcdA and TcdB (hereafter toxins) derived from a canine isolate were capable of causing tight junction disruptions to colonic epithelial cells. Additionally, we addressed whether major differences exist between human and canine cells regarding C. difficile pathogenicity by exposing them to identical toxins. We then examined the canine gut microbiome associated with C. difficile carriage using 16S rRNA gene sequencing and searched for deviations from homeostasis as an indicator of CDI. Finally, we queried 16S rRNA gene sequences for bacterial taxa that may be associated with resistance to CDI in canines. Clostridioides difficile isolated from a canine produced toxins that reduced tight junction integrity in both human and canine cells in vitro. However, canine guts were not dysbiotic in the presence of C. difficile. These findings support asymptomatic carriage in canines and, furthermore, suggest that there are features of the gut microbiome and/or a canine-specific immune response that may protect canines against CDI. We identified two biologically relevant bacteria that may aid in CDI resistance in canines: 1) Clostridium hiranonis, which synthesizes secondary bile acids that have been shown to provide resistance to CDI in mice; and 2) Sphingobacterium faecium, which produces sphingophospholipids that may be associated with regulating homeostasis in the canine gut. Our findings suggest that canines may be cryptic reservoirs for C. difficile and, furthermore, that mechanisms of CDI resistance in the canine gut could provide insights into targeted therapeutics for human CDI.

Subjects

Biota, Clostridioides difficile/growth & development, Clostridium Infections/veterinary, Dog Diseases/microbiology, Dysbiosis, Gastrointestinal Tract/microbiology, Animals, Bacterial Proteins/toxicity, Bacterial Toxins/toxicity, Caco-2 Cells, Cell Survival/drug effects, Clostridioides difficile/pathogenicity, Clostridium Infections/microbiology, Dogs, Enterotoxins/toxicity, Epithelial Cells/drug effects, Epithelial Cells/microbiology, Epithelial Cells/physiology, Humans, Mice, Phospholipids/analysis, Tight Junctions/drug effects

10.

q2-longitudinal: Longitudinal and Paired-Sample Analyses of Microbiome Data.

Bokulich, Nicholas A; Dillon, Matthew R; Zhang, Yilong; Rideout, Jai Ram; Bolyen, Evan; Li, Huilin; Albert, Paul S; Caporaso, J Gregory.

mSystems ; 3(6), 2018.

Article in English | MEDLINE | ID: mdl-30505944

ABSTRACT

Studies of host-associated and environmental microbiomes often incorporate longitudinal sampling or paired samples in their experimental design. Longitudinal sampling provides valuable information about temporal trends and subject/population heterogeneity, offering advantages over cross-sectional and pre-post study designs. To support the needs of microbiome researchers performing longitudinal studies, we developed q2-longitudinal, a software plugin for the QIIME 2 microbiome analysis platform (https://qiime2.org). The q2-longitudinal plugin incorporates multiple methods for analysis of longitudinal and paired-sample data, including interactive plotting, linear mixed-effects models, paired differences and distances, microbial interdependence testing, first differencing, longitudinal feature selection, and volatility analyses. The q2-longitudinal package (https://github.com/qiime2/q2-longitudinal) is open-source software released under a 3-clause Berkeley Software Distribution (BSD) license and is freely available, including for commercial use. IMPORTANCE Longitudinal sampling provides valuable information about temporal trends and subject/population heterogeneity. We describe q2-longitudinal, a software plugin for longitudinal analysis of microbiome data sets in QIIME 2. The availability of longitudinal statistics and visualizations in the QIIME 2 framework will make the analysis of longitudinal data more accessible to microbiome researchers.
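
Of the methods listed in this abstract, first differencing is simple enough to sketch directly: for each subject, a series of measurements is replaced by the change between consecutive time points. The snippet below is a minimal plain-Python illustration of that idea, not q2-longitudinal's implementation; the function name `first_differences`, the input layout, and the sample values are assumptions.

```python
from collections import defaultdict


def first_differences(observations):
    """Return per-subject changes between consecutive time points.

    `observations` is a list of (subject, time, value) tuples.
    Hypothetical helper for illustration; not the q2-longitudinal API.
    """
    by_subject = defaultdict(list)
    for subject, time, value in observations:
        by_subject[subject].append((time, value))

    diffs = {}
    for subject, series in by_subject.items():
        series.sort()  # order each subject's measurements by time
        diffs[subject] = [
            (t1, v1 - v0)  # change since the previous time point
            for (_, v0), (t1, v1) in zip(series, series[1:])
        ]
    return diffs


# Made-up measurements; note that mouse_a's rows arrive out of order.
observations = [
    ("mouse_a", 0, 10), ("mouse_a", 2, 14), ("mouse_a", 1, 12),
    ("mouse_b", 0, 5), ("mouse_b", 1, 3),
]
result = first_differences(observations)
```

Each subject contributes one fewer value than it has time points, because the first measurement has no predecessor to difference against.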

11.

Qiita: rapid, web-enabled microbiome meta-analysis.

Gonzalez, Antonio; Navas-Molina, Jose A; Kosciolek, Tomasz; McDonald, Daniel; Vázquez-Baeza, Yoshiki; Ackermann, Gail; DeReus, Jeff; Janssen, Stefan; Swafford, Austin D; Orchanian, Stephanie B; Sanders, Jon G; Shorenstein, Joshua; Holste, Hannes; Petrus, Semar; Robbins-Pianka, Adam; Brislawn, Colin J; Wang, Mingxun; Rideout, Jai Ram; Bolyen, Evan; Dillon, Matthew; Caporaso, J Gregory; Dorrestein, Pieter C; Knight, Rob.

Nat Methods ; 15(10): 796-798, 2018 10.

Article in English | MEDLINE | ID: mdl-30275573

ABSTRACT

Multi-omic insights into microbiome function and composition typically advance one study at a time. However, in order for relationships across studies to be fully understood, data must be aggregated into meta-analyses. This makes it possible to generate new hypotheses by finding features that are reproducible across biospecimens and data layers. Qiita dramatically accelerates such integration tasks in a web-based microbiome-comparison platform, which we demonstrate with Human Microbiome Project and Integrative Human Microbiome Project (iHMP) data.

Subjects

Computational Biology/methods, Internet, Metagenomics, Microbiota, Software, Humans, User-Computer Interface

12.

Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Caporaso, J Gregory.

Microbiome ; 6(1): 90, 2018 05 17.

Article in English | MEDLINE | ID: mdl-29773078

ABSTRACT

BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. RESULTS: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). CONCLUSIONS: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

Subjects

Bacteria/genetics, Computer Simulation, Intergenic DNA/genetics, Fungi/genetics, Microbiota/genetics, 16S Ribosomal RNA/genetics, Sequence Alignment/methods, Algorithms, Base Sequence/genetics, Machine Learning, Software

13.

An Introduction to Applied Bioinformatics: a free, open, and interactive text.

Bolyen, Evan; Rideout, Jai Ram; Chase, John; Pitman, T Anders; Shiffer, Arron; Mercurio, Willow; Dillon, Matthew R; Caporaso, J Gregory.

J Open Source Educ ; 1(5), 2018.

Article in English | MEDLINE | ID: mdl-30687845

14.

q2-sample-classifier: machine-learning tools for microbiome classification and regression.

Bokulich, Nicholas A; Dillon, Matthew R; Bolyen, Evan; Kaehler, Benjamin D; Huttley, Gavin A; Caporaso, J Gregory.

J Open Res Softw ; 3(30), 2018.

Article in English | MEDLINE | ID: mdl-31552137

ABSTRACT

q2-sample-classifier is a plugin for the QIIME 2 microbiome bioinformatics platform that facilitates access, reproducibility, and interpretation of supervised learning (SL) methods for a broad audience of non-bioinformatics specialists.

15.

cual-id: Globally Unique, Correctable, and Human-Friendly Sample Identifiers for Comparative Omics Studies.

Chase, John H; Bolyen, Evan; Rideout, Jai Ram; Caporaso, J Gregory.

mSystems ; 1(1), 2016.

Article in English | MEDLINE | ID: mdl-27822516

ABSTRACT

The number of samples in high-throughput comparative "omics" studies is increasing rapidly due to declining experimental costs. To keep sample data and metadata manageable and to ensure the integrity of scientific results as the scale of these projects continues to increase, it is essential that we transition to better-designed sample identifiers. Ideally, sample identifiers should be globally unique across projects, project teams, and institutions; short (to facilitate manual transcription); correctable with respect to common types of transcription errors; opaque, meaning that they do not contain information about the samples; and compatible with existing standards. We present cual-id, a lightweight command line tool that creates, or mints, sample identifiers that meet these criteria without reliance on centralized infrastructure. cual-id allows users to assign universally unique identifiers, or UUIDs, that are globally unique to their samples. UUIDs are too long to be conveniently written on sampling materials, such as swabs or microcentrifuge tubes, however, so cual-id additionally generates human-friendly 4- to 12-character identifiers that map to their UUIDs and are unique within a project. By convention, we use "cual-id" to refer to the software, "CualID" to refer to the short, human-friendly identifiers, and "UUID" to refer to the globally unique identifiers. CualIDs are used by humans when they manually write or enter identifiers, while the longer UUIDs are used by computers to unambiguously reference a sample. Finally, cual-id optionally generates printable label sticker sheets containing Code 128 bar codes and CualIDs for labeling of sample collection and processing materials. IMPORTANCE The adoption of identifiers that are globally unique, correctable, and easily handwritten or manually entered into a computer will be a major step forward for sample tracking in comparative omics studies. As the fields transition to more-centralized sample management, for example, across labs within an institution, across projects funded under a common program, or in systems designed to facilitate meta- and/or integrated analysis, sample identifiers generated with cual-id will not need to change; thus, costly and error-prone updating of data and metadata identifiers will be avoided. Further, using cual-id will ensure that transcription errors in sample identifiers do not require the discarding of otherwise-useful samples that may have been expensive to obtain. Finally, cual-id is simple to install and use and is free for all use. No centralized infrastructure is required to ensure global uniqueness, so it is feasible for any lab to get started using these identifiers within their existing infrastructure.
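
The relationship between long UUIDs and short, project-unique identifiers can be sketched in a few lines of Python using the standard `uuid` module. This is an illustration only and assumes a simple scheme (take the last few hexadecimal characters of each UUID, lengthening them on collision); it is not cual-id's actual algorithm, it does not cover error correction, and `mint_ids` is a hypothetical name.

```python
import uuid


def mint_ids(count, min_len=4, max_len=12):
    """Mint `count` UUIDs and map each to a short, project-unique ID.

    Assumed scheme for illustration (not cual-id's implementation):
    use the shortest UUID suffix, between `min_len` and `max_len`
    hexadecimal characters, that is unique across all minted UUIDs.
    """
    full_ids = [uuid.uuid4().hex for _ in range(count)]
    for length in range(min_len, max_len + 1):
        short_ids = [full[-length:] for full in full_ids]
        if len(set(short_ids)) == count:  # no two samples share a suffix
            return dict(zip(short_ids, full_ids))
    raise ValueError("could not find unique short IDs; mint again")


project = mint_ids(20)
for short_id, full_id in list(project.items())[:3]:
    print(short_id, "->", full_id)
```

Because the UUIDs are generated randomly on the machine that runs the code, no central registry is consulted, which mirrors the "no centralized infrastructure" point made in the abstract.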

16.

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.

Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory.

Gigascience ; 5: 27, 2016 06 13.

Article in English | MEDLINE | ID: mdl-27296526

ABSTRACT

BACKGROUND: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. MAIN TEXT: We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. CONCLUSIONS: Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
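
To make the idea of tabular format validation concrete, the sketch below checks a tab-separated table for three kinds of problems and reports each with its row and column. The checks (empty column headers, rows with the wrong number of cells, duplicate identifiers in the first column) are assumptions picked for illustration; they are not Keemei's validation rules, and `validate_table` is a hypothetical name.

```python
import csv
import io


def validate_table(text):
    """Return a list of (row, column, message) problems in a TSV table.

    The checks here are assumptions chosen for illustration;
    they are not Keemei's validation rules.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [(1, 1, "table is empty")]

    problems = []
    header = rows[0]
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append((1, col, "empty column header"))

    seen = {}  # identifier -> row where it first appeared
    for row_num, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append((row_num, 1, "wrong number of cells"))
        sample_id = row[0] if row else ""
        if sample_id in seen:
            first = seen[sample_id]
            problems.append((row_num, 1, f"duplicate ID (first seen in row {first})"))
        else:
            seen[sample_id] = row_num
    return problems


# Made-up table: the last row repeats "s1" and has an extra cell.
table = "sample-id\tbody-site\ns1\tgut\ns2\ttongue\ns1\tgut\textra\n"
problems = validate_table(table)
```

Reporting a row and column with every message is what lets a spreadsheet front end point the person entering data at the exact cell to fix.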

Subjects

Computational Biology/methods, Cloud Computing, Humans, Information Storage and Retrieval, Software, User-Computer Interface

17.

Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses.

Fouquier, Jennifer; Rideout, Jai Ram; Bolyen, Evan; Chase, John; Shiffer, Arron; McDonald, Daniel; Knight, Rob; Caporaso, J Gregory; Kelley, Scott T.

Microbiome ; 4: 11, 2016 Feb 24.

Article in English | MEDLINE | ID: mdl-26905735

ABSTRACT

BACKGROUND: Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names such that each corresponding foundation-tree tip would branch into its new "extension tree" child. RESULTS: We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes. CONCLUSIONS: The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees. AVAILABILITY: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.
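
The grafting step described in this abstract (each foundation-tree tip branches into the extension tree that shares its taxonomic name) can be sketched with a toy tree structure. The nested-dictionary representation, the function names, and the taxon names below are assumptions for illustration, and branch lengths are ignored entirely; this is not ghost-tree's implementation.

```python
def graft(foundation, extensions):
    """Return a new tree with extension trees grafted onto matching tips.

    Trees are nested dicts of the form {"name": str, "children": [...]}.
    `extensions` maps a taxonomic name to the extension tree for that taxon.
    Illustrative data structure only; not ghost-tree's implementation.
    """
    children = foundation.get("children", [])
    if not children:  # a tip of the foundation tree
        extension = extensions.get(foundation["name"])
        grafted = [extension] if extension is not None else []
        return {"name": foundation["name"], "children": grafted}
    return {
        "name": foundation["name"],
        "children": [graft(child, extensions) for child in children],
    }


def tip_names(tree):
    """List the names of all tips, left to right."""
    if not tree["children"]:
        return [tree["name"]]
    return [name for child in tree["children"] for name in tip_names(child)]


# Hypothetical foundation tree with two family-level tips.
foundation = {"name": "root", "children": [
    {"name": "FamilyA", "children": []},
    {"name": "FamilyB", "children": []},
]}
# Hypothetical extension tree available for FamilyA only.
extensions = {"FamilyA": {"name": "FamilyA_extension", "children": [
    {"name": "species_1", "children": []},
    {"name": "species_2", "children": []},
]}}

hybrid = graft(foundation, extensions)
hybrid_tips = tip_names(hybrid)
```

In this toy case the `FamilyA` tip gains the two species as descendants, while `FamilyB`, which has no extension tree, stays a tip.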

Subjects

Intergenic DNA/genetics, Fungi/genetics, Microbiota/genetics, Mutant Chimeric Proteins/genetics, Phylogeny, Saliva/microbiology, Computational Biology, Molecular Evolution, Fungi/classification, High-Throughput Nucleotide Sequencing, Humans, Principal Component Analysis
