1.
PLoS Comput Biol ; 17(11): e1009581, 2021 11.
Article in English | MEDLINE | ID: mdl-34748542

ABSTRACT

Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.


Subjects
Database Management Systems, Genetic Databases/statistics & numerical data, Software, Animals, Classification, Computational Biology, Taxonomic DNA Barcoding, Nucleic Acid Databases, Genomics, Humans, Metagenome, Metagenomics, Microbiota/genetics, Phylogeny, 16S Ribosomal RNA/genetics, Sequence Analysis
2.
PLoS Comput Biol ; 17(6): e1009056, 2021 06.
Article in English | MEDLINE | ID: mdl-34166363

ABSTRACT

In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn't work well, and what we plan to do differently in future workshops.


Subjects
COVID-19, Computational Biology, Microbiota, Computational Biology/education, Computational Biology/organization & administration, Feedback, Humans, SARS-CoV-2
3.
J Theor Biol ; 420: 144-151, 2017 05 07.
Article in English | MEDLINE | ID: mdl-28286217

ABSTRACT

Understanding the evolutionary relationship among species is of fundamental importance to the biological sciences. The location of the root in any phylogenetic tree is critical as it gives an order to evolutionary events. None of the popular models of nucleotide evolution currently used in likelihood or Bayesian methods are able to infer the location of the root without exogenous information. It is known that the most general Markov models of nucleotide substitution also cannot identify the location of the root or be fitted to multiple sequence alignments with fewer than three sequences. We prove that the location of the root and the full model can be identified and statistically consistently estimated for a non-stationary, strand-symmetric substitution model given a multiple sequence alignment with two or more sequences. We also generalise earlier work to provide a practical means of overcoming the computationally intractable problem of labelling hidden states in a phylogenetic model.


Subjects
Molecular Evolution, Genetic Models, Phylogeny, Animals, Humans, Likelihood Functions, Markov Chains, Theoretical Models, Sequence Alignment
4.
Syst Biol ; 64(2): 281-93, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25503772

ABSTRACT

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.


Subjects
Molecular Evolution, Genetic Models, Animals, Humans, Mammals/classification, Mammals/genetics, Markov Chains, Phylogeny
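
The distance described in the abstract above can be sketched as follows. The notation is an assumption for illustration and is not taken from the abstract: $Q$ is the substitution rate matrix on an edge with diagonal entries $q_{ii}$, $\pi(0)$ is the state distribution at the start of the edge (a row vector), and $t$ is the elapsed time.

```latex
% State distribution after time s on the edge (time-homogeneous process)
\pi(s) = \pi(0)\, e^{Qs}

% Expected number of substitutions per site over the edge
d = -\int_{0}^{t} \sum_{i} \pi_i(s)\, q_{ii}\, \mathrm{d}s

% Stationary special case: if \pi(0) Q = 0 then \pi(s) = \pi(0) for all s, so
d = -t \sum_{i} \pi_i(0)\, q_{ii}
```

The last line is the familiar stationary definition, which matches the abstract's statement that the measure reduces to the standard formulation when the data are consistent with stationarity.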
5.
Front Microbiol ; 12: 644487, 2021.
Article in English | MEDLINE | ID: mdl-34220738

ABSTRACT

Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
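
The "perfect classifier" idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name and the toy records are hypothetical, and the code only measures the fraction of reference records whose sequence is unique to one species, that is, the records such a classifier could not get wrong.

```python
from collections import defaultdict


def unambiguous_fraction(references):
    """Fraction of (sequence, species) records whose sequence maps to one species.

    A classifier that only fails on sequences shared by several species
    is guaranteed to be right on exactly these records.
    """
    species_by_sequence = defaultdict(set)
    for sequence, species in references:
        species_by_sequence[sequence].add(species)
    unambiguous = sum(
        1 for sequence, _ in references if len(species_by_sequence[sequence]) == 1
    )
    return unambiguous / len(references)


# Hypothetical toy reference set: "ACGT" is shared by two species.
toy_references = [
    ("ACGT", "species_1"),
    ("ACGT", "species_2"),
    ("TTTT", "species_3"),
    ("GGGG", "species_3"),
]
fraction = unambiguous_fraction(toy_references)
print(fraction)
```

On real data the sequences would be the amplicon regions extracted from a reference database rather than short toy strings.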

6.
Comput Struct Biotechnol J ; 18: 4048-4062, 2020.
Article in English | MEDLINE | ID: mdl-33363701

ABSTRACT

Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.

7.
Nat Commun ; 10(1): 4643, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31604942

ABSTRACT

Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.


Subjects
Microbiota/genetics, Phylogeny, Bacteria/genetics, Classification/methods, Computational Biology, Metagenomics/methods, Population Density, Software
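
The effect of the uniform-prior assumption discussed in the abstract above can be sketched with a toy k-mer naive Bayes classifier. This is a minimal illustration and not the q2-clawback implementation: the function names, reference sequences, query, and prior weights are all hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3, alpha=1.0):
    """Fit per-taxon k-mer log-probabilities with add-alpha smoothing."""
    vocab = sorted({km for seq in references.values() for km in kmers(seq, k)})
    model = {}
    for taxon, seq in references.items():
        counts = Counter(kmers(seq, k))
        total = sum(counts.values()) + alpha * len(vocab)
        model[taxon] = {km: math.log((counts[km] + alpha) / total) for km in vocab}
    return model


def classify(model, query, priors, k=3):
    """Return the taxon that maximises log prior + log likelihood."""
    scores = {}
    for taxon, table in model.items():
        log_likelihood = sum(table[km] for km in kmers(query, k) if km in table)
        scores[taxon] = math.log(priors[taxon]) + log_likelihood
    return max(scores, key=scores.get)


# Hypothetical toy data: two reference taxa and one short query read.
references = {"taxon_A": "ACGTACGTAA", "taxon_B": "ACGTACGTTT"}
model = train(references)
query = "ACGTACG"

# Uniform prior: every reference taxon is assumed equally likely.
uniform = {"taxon_A": 0.5, "taxon_B": 0.5}
# Assumed environment-specific weights: taxon_B is far more common here.
weighted = {"taxon_A": 0.1, "taxon_B": 0.9}

uniform_call = classify(model, query, uniform)
weighted_call = classify(model, query, weighted)
print(uniform_call, weighted_call)
```

In this toy case the query is slightly better explained by taxon_A, so the uniform prior selects it, while the abundance-weighted prior tips the decision to taxon_B. Real classifiers work on far larger k-mer vocabularies and reference sets.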
9.
PLoS One ; 13(9): e0203948, 2018.
Article in English | MEDLINE | ID: mdl-30240428

ABSTRACT

Many of the challenges we currently face as an advanced society have been solved in unique ways by biological systems. One such challenge is developing strategies to avoid microbial infection. Social aculeates (wasps, bees and ants) mitigate the risk of infection to their colonies using a wide range of adaptations and mechanisms. These adaptations and mechanisms are reliant on intricate social structures and are energetically costly for the colony. It seems likely that these species must have had alternative and simpler mechanisms in place to ensure the maintenance of hygienic domicile conditions prior to the evolution of these complex behaviours. Features of the aculeate coiled-coil silk proteins are reminiscent of those of naturally occurring α-helical antimicrobial peptides (AMPs). In this study, we demonstrate that peptides derived from the aculeate silk proteins have antimicrobial activity. We reconstruct the predicted ancestral silk sequences of an aculeate ancestor that pre-dates the evolution of sociality and demonstrate that these ancestral sequences also contained peptides with antimicrobial properties. It is possible that the silks evolved as an antifouling material and facilitated the evolution of sociality. These materials serve as model materials for consideration in future biomaterial development.


Subjects
Antimicrobial Cationic Peptides/genetics, Antimicrobial Cationic Peptides/physiology, Insect Proteins/genetics, Insect Proteins/physiology, Silk/genetics, Silk/physiology, Amino Acid Sequence, Animals, Antimicrobial Cationic Peptides/chemistry, Ants/genetics, Ants/physiology, Bees/genetics, Bees/physiology, Molecular Evolution, Insect Proteins/chemistry, Phylogeny, Silk/chemistry, Social Behavior, Wasps/genetics, Wasps/physiology
10.
J Open Res Softw ; 3(30), 2018.
Article in English | MEDLINE | ID: mdl-31552137

ABSTRACT

q2-sample-classifier is a plugin for the QIIME 2 microbiome bioinformatics platform that facilitates access, reproducibility, and interpretation of supervised learning (SL) methods for a broad audience of non-bioinformatics specialists.

11.
Microbiome ; 6(1): 90, 2018 05 17.
Article in English | MEDLINE | ID: mdl-29773078

ABSTRACT

BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis.

RESULTS: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data).

CONCLUSIONS: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.


Subjects
Bacteria/genetics, Computer Simulation, Intergenic DNA/genetics, Fungi/genetics, Microbiota/genetics, 16S Ribosomal RNA/genetics, Sequence Alignment/methods, Algorithms, Base Sequence/genetics, Machine Learning, Software
12.
Genome Biol Evol ; 9(1): 134-149, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28175284

ABSTRACT

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of nonsynonymous to synonymous rates of substitution tends to be underestimated over three data sets of mammals, vertebrates, and insects. Our basis for comparison is a nonstationary codon substitution model that allows sequence composition to change. Goodness-of-fit results demonstrate that our new model tends to fit the data better. Direct measurement of nonstationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.


Subjects
Codon, Genetic Models, Proteins/genetics, Genetic Selection, Animals, Humans, Markov Chains
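
The distinction between synonymous and nonsynonymous substitutions that the abstract above relies on can be sketched with a crude codon-by-codon count. This is only an illustration: it is not the nonstationary codon model from the paper, it does not normalise by the number of synonymous and nonsynonymous sites, and the function name and example sequences are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_changes(seq1, seq2):
    """Count differing codons that keep or change the encoded amino acid.

    Both inputs must be aligned, gap-free coding sequences of equal length.
    Returns (synonymous, nonsynonymous) counts of differing codons.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq1), 3):
        codon1, codon2 = seq1[pos:pos + 3], seq2[pos:pos + 3]
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical aligned sequences: AAA->AAG and CTG->CTA are silent, GAT->GAA is not.
changes = count_codon_changes("ATGAAACTGGAT", "ATGAAGCTAGAA")
print(changes)
```

Model-based estimates of the nonsynonymous to synonymous rate ratio additionally account for multiple substitutions at a site and, in the work summarised above, for changing sequence composition.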
13.
Insect Biochem Mol Biol ; 59: 72-9, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25712559

ABSTRACT

Multiple gene duplication events in the precursor of the Aculeata (bees, ants, hornets) gave rise to four silk genes. Whilst these homologs encode proteins with similar amino acid composition and coiled coil structure, the retention of all four homologs implies they each are important. In this study we identified, produced and characterized the four silk proteins from Apis dorsata, the giant Asian honeybee. The proteins were readily purified, allowing us to investigate the folding behavior of solutions of individual proteins in comparison to mixtures of all four proteins at concentrations where they assemble into their native coiled coil structure. In contrast to solutions of any one protein type, solutions of a mixture of the four proteins formed coiled coils that were stable against dilution and detergent denaturation. The results are consistent with the formation of a heteromeric coiled coil protein complex. The mechanism of silk protein coiled coil formation and evolution is discussed in light of these results.


Subjects
Bees/genetics, Insect Proteins/genetics, Silk/genetics, Amino Acid Sequence, Animals, Bees/metabolism, Molecular Evolution, Insect Proteins/chemistry, Molecular Sequence Data, Protein Folding, Secondary Protein Structure, Sequence Homology, Silk/chemistry