1.
Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36263822

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community-submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.


Subjects
Data Curation; Databases, Factual; Drug Resistance, Microbial; Machine Learning; Anti-Bacterial Agents/pharmacology; Genes, Bacterial; Likelihood Functions; Software; Molecular Sequence Annotation
2.
Environ Microbiol ; 26(1): e16566, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38149467

ABSTRACT

Trimming of sequencing reads is a pre-processing step that aims to discard sequence segments such as primers, adapters and low quality nucleotides that will interfere with clustering and classification steps. We evaluated the impact of trimming length of paired-end 16S and 18S rRNA amplicon reads on the ability to reconstruct the taxonomic composition and relative abundances of communities with a known composition in both even and uneven proportions. We found that maximizing read retention maximizes recall but reduces precision by increasing false positives. The presence of expected taxa was accurately predicted across broad trim length ranges but recovering original relative proportions remains a difficult challenge. We show that parameters that maximize taxonomic recovery do not simultaneously maximize relative abundance accuracy. Trim length represents one of several experimental parameters that have non-uniform impact across microbial clades, making it a difficult parameter to optimize. This study offers insights, guidelines, and helps researchers assess the significance of their decisions when trimming raw reads in a microbiome analysis based on overlapping or non-overlapping paired-end amplicons.
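
The trade-off between trim length and read retention described above can be illustrated with a minimal sketch. It assumes the simplest form of trimming, fixed-length truncation that discards reads shorter than the chosen length; the abstract does not state which tool or rule the study used, and the function names `truncate_reads` and `retention` and the toy reads are hypothetical.

```python
def truncate_reads(reads, trim_length):
    """Cut every read to trim_length and drop reads that are too short.

    Assumption: fixed-length truncation; real pipelines may also trim by
    quality score or remove primers and adapters first.
    """
    return [read[:trim_length] for read in reads if len(read) >= trim_length]


def retention(reads, trim_length):
    """Fraction of the input reads that survive truncation."""
    if not reads:
        return 0.0
    return len(truncate_reads(reads, trim_length)) / len(reads)


# Toy reads of length 10, 6 and 8 (hypothetical data).
reads = ["ACGTACGTAC", "ACGTAC", "ACGTACGT"]
for length in (6, 8, 10):
    print(length, retention(reads, length))
```

Longer trim lengths keep more sequence per read but retain fewer reads, which is one way the retention effect discussed in the abstract can arise.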


Subjects
Microbiota; RNA, Ribosomal, 16S/genetics; Microbiota/genetics; Sequence Analysis, DNA; RNA, Ribosomal, 18S; DNA Primers/genetics; High-Throughput Nucleotide Sequencing
3.
Syst Biol ; 72(3): 559-574, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-35904761

ABSTRACT

Organismal traits can evolve in a coordinated way, with correlated patterns of gains and losses reflecting important evolutionary associations. Discovering these associations can reveal important information about the functional and ecological linkages among traits. Phylogenetic profiles treat individual genes as traits distributed across sets of genomes and can provide a fine-grained view of the genetic underpinnings of evolutionary processes in a set of genomes. Phylogenetic profiling has been used to identify genes that are functionally linked and to identify common patterns of lateral gene transfer in microorganisms. However, comparative analysis of phylogenetic profiles and other trait distributions should take into account the phylogenetic relationships among the organisms under consideration. Here, we propose the Community Coevolution Model (CCM), a new coevolutionary model to analyze the evolutionary associations among traits, with a focus on phylogenetic profiles. In the CCM, traits are considered to evolve as a community with interactions, and the transition rate for each trait depends on the current states of other traits. Surpassing other comparative methods for pairwise trait analysis, CCM has the additional advantage of being able to examine multiple traits as a community to reveal more dependency relationships. We also develop a simulation procedure to generate phylogenetic profiles with correlated evolutionary patterns that can be used as benchmark data for evaluation purposes. A simulation study demonstrates that CCM is more accurate than other methods including the Jaccard Index and three tree-aware methods. The parameterization of CCM makes the interpretation of the relations between genes more direct, which leads to Darwin's scenario being identified easily based on the estimated parameters. We show that CCM is more efficient and fits real data better than other methods resulting in higher likelihood scores with fewer parameters. An examination of 3786 phylogenetic profiles across a set of 659 bacterial genomes highlights linkages between genes with common functions, including many patterns that would not have been identified under a nonphylogenetic model of common distribution. We also applied the CCM to 44 proteins in the well-studied Mitochondrial Respiratory Complex I and recovered associations that mapped well onto the structural associations that exist in the complex. [Coevolution; evolutionary rates; gene network; graphical models; phylogenetic profiles; phylogeny.].
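
For readers unfamiliar with the Jaccard Index baseline named above, the following is a minimal sketch of how it compares two phylogenetic profiles. It is not the CCM: it ignores the tree entirely, which is the limitation the abstract points out. The function name `jaccard_index` and the toy presence/absence vectors are hypothetical.

```python
def jaccard_index(profile_a, profile_b):
    """Jaccard index of two presence/absence profiles of equal length.

    Each profile lists, genome by genome, whether a gene is present
    (1/True) or absent (0/False). The index is the number of genomes
    with both genes divided by the number with at least one of them.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Two hypothetical genes scored across five genomes.
gene_x = [1, 1, 0, 0, 1]
gene_y = [1, 0, 0, 1, 1]
print(jaccard_index(gene_x, gene_y))  # 0.5
```

Because closely related genomes tend to share genes through common descent alone, a score like this can be high without any functional link, which motivates the tree-aware alternatives the abstract compares against.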


Subjects
Biological Evolution; Proteins; Phylogeny; Phenotype; Genome, Bacterial
4.
Can J Microbiol ; 70(10): 446-460, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39079170

ABSTRACT

With antimicrobial resistance (AMR) rapidly evolving in pathogens, quick and accurate identification of genetic determinants of phenotypic resistance is essential for improving surveillance, stewardship, and clinical mitigation. Machine learning (ML) models show promise for AMR prediction in diagnostics but require a deep understanding of internal processes to use effectively. Our study utilised AMR gene, pangenomic, and predicted plasmid features from 647 Enterococcus faecium and Enterococcus faecalis genomes across the One Health continuum, along with corresponding resistance phenotypes, to develop interpretive ML classifiers. Vancomycin resistance could be predicted with 99% accuracy with AMR gene features, 98% with pangenome features, and 96% with plasmid clusters. Top pangenome features overlapped with the resistance genes of the vanA operon, which are often laterally transmitted via plasmids. Doxycycline resistance prediction achieved approximately 92% accuracy with pangenome features, with the top feature being elements of Tn916 conjugative transposon, a tet(M) carrier. Erythromycin resistance prediction models achieved about 90% accuracy, but top features were negatively correlated with resistance due to the confounding effect of population structure. This work demonstrates the importance of reviewing ML models' features to discern biological relevance even when achieving high-performance metrics. Our workflow offers the potential to propose hypotheses for experimental testing, enhancing the understanding of AMR mechanisms, which are crucial for combating the AMR crisis.


Subjects
Anti-Bacterial Agents; Drug Resistance, Bacterial; Enterococcus faecalis; Enterococcus faecium; Genome, Bacterial; Machine Learning; Plasmids; Enterococcus faecalis/genetics; Enterococcus faecalis/drug effects; Enterococcus faecium/genetics; Enterococcus faecium/drug effects; Anti-Bacterial Agents/pharmacology; Drug Resistance, Bacterial/genetics; Plasmids/genetics; Humans; Microbial Sensitivity Tests; Gram-Positive Bacterial Infections/microbiology; Bacterial Proteins/genetics
5.
Clin Microbiol Rev ; 35(3): e0017921, 2022 09 21.
Article in English | MEDLINE | ID: mdl-35612324

ABSTRACT

Antimicrobial resistance (AMR) is a global health crisis that poses a great threat to modern medicine. Effective prevention strategies are urgently required to slow the emergence and further dissemination of AMR. Given the availability of data sets encompassing hundreds or thousands of pathogen genomes, machine learning (ML) is increasingly being used to predict resistance to different antibiotics in pathogens based on gene content and genome composition. A key objective of this work is to advocate for the incorporation of ML into front-line settings but also highlight the further refinements that are necessary to safely and confidently incorporate these methods. The question of what to predict is not trivial given the existence of different quantitative and qualitative laboratory measures of AMR. ML models typically treat genes as independent predictors, with no consideration of structural and functional linkages; they also may not be accurate when new mutational variants of known AMR genes emerge. Finally, to have the technology trusted by end users in public health settings, ML models need to be transparent and explainable to ensure that the basis for prediction is clear. We strongly advocate that the next set of AMR-ML studies should focus on the refinement of these limitations to be able to bridge the gap to diagnostic implementation.


Subjects
Anti-Bacterial Agents; Drug Resistance, Bacterial; Anti-Bacterial Agents/pharmacology; Anti-Bacterial Agents/therapeutic use; Drug Resistance, Bacterial/genetics; Machine Learning
6.
Bioinformatics ; 38(11): 3051-3061, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35536192

ABSTRACT

MOTIVATION: There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein-protein interactions and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations. RESULTS: We introduce deepSimDEF, a deep learning method to automatically learn FS estimation of gene pairs given a set of genes and their GO annotations. deepSimDEF's key novelty is its ability to learn low-dimensional embedding vector representations of GO terms and gene products and then calculate FS using these learned vectors. We show that deepSimDEF can predict the FS of new genes using their annotations: it outperformed all other FS measures by >5-10% on yeast and human reference datasets on protein-protein interactions, gene co-expression and sequence homology tasks. Thus, deepSimDEF offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics, and its architecture is flexible enough to support its extension to any organism. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/ahmadpgh/deepSimDEF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology; Proteins; Humans; Gene Ontology; Computational Biology/methods; Molecular Sequence Annotation; Software; Saccharomyces cerevisiae; RNA
7.
BMC Microbiol ; 22(1): 270, 2022 11 10.
Article in English | MEDLINE | ID: mdl-36357861

ABSTRACT

BACKGROUND: Preterm birth is a global problem with about 12% of births in sub-Saharan Africa occurring before 37 weeks of gestation. Several studies have explored a potential association between vaginal microbiota and preterm birth, and some have found an association while others have not. We performed a study designed to determine whether there is an association with vaginal microbiota and/or placental microbiota and preterm birth in an African setting. METHODS: Women presenting to the study hospital in labor with a gestational age of 26 to 36 weeks plus six days were prospectively enrolled in a study of the microbiota in preterm labor along with controls matched for age and parity. A vaginal sample was collected at the time of presentation to the hospital in active labor. In addition, a placental sample was collected when available. Libraries were constructed using PCR primers to amplify the V6/V7/V8 variable regions of the 16S rRNA gene, followed by sequencing with an Illumina MiSeq machine and analysis using QIIME2 2022.2. RESULTS: Forty-nine women presenting with preterm labor and their controls were enrolled in the study of which 23 matched case-control pairs had sufficient sequence data for comparison. Lactobacillus was identified in all subjects, ranging in abundance from <1% to >99%, with Lactobacillus iners and Lactobacillus crispatus the most common species. Over half of the vaginal samples contained Gardnerella and/or Prevotella; both genera were associated with preterm birth in previous studies. However, we found no significant difference in composition between mothers with preterm and those with full-term deliveries, with both groups showing roughly equal representation of different Lactobacillus species and dysbiosis-associated genera. Placental samples generally had poor DNA recovery, with a mix of probable sequencing artifacts, contamination, and bacteria acquired during passage through the birth canal. However, several placental samples showed strong evidence for the presence of Streptococcus species, which are known to infect the placenta. CONCLUSIONS: The current study showed no association of preterm birth with composition of the vaginal community. It does provide important information on the range of sequence types in African women and supports other data suggesting that women of African ancestry have an increased frequency of non-Lactobacillus types, but without evidence of associated adverse outcomes.


Subjects
Microbiota; Obstetric Labor, Premature; Premature Birth; Humans; Female; Infant, Newborn; Pregnancy; Infant; RNA, Ribosomal, 16S/genetics; Premature Birth/microbiology; Case-Control Studies; Kenya; Placenta; Vagina/microbiology; Obstetric Labor, Premature/microbiology; Microbiota/genetics
8.
Nucleic Acids Res ; 48(D1): D517-D525, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31665441

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.


Subjects
Databases, Genetic; Drug Resistance, Bacterial; Genes, Bacterial; Software; Bacteria/drug effects; Bacteria/genetics; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Bacterial Proteins/metabolism
9.
Bioinformatics ; 36(10): 3043-3048, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32108861

ABSTRACT

MOTIVATION: Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and in other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We wished to also identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds. RESULTS: We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb's high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm's read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance. AVAILABILITY AND IMPLEMENTATION: Documentation, source code and docker containers are available for running PSORTm locally at https://www.psort.org/psortm/ (freely available, open-source software under GNU General Public License Version 3). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Archaea; Metagenomics; Archaea/genetics; Bacteria/genetics; Humans; Metagenome; Software
10.
Microb Ecol ; 77(3): 713-725, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30209585

ABSTRACT

Soil microorganisms are important mediators of carbon cycling in nature. Although cellulose- and hemicellulose-degrading bacteria have been isolated from Algerian ecosystems, the information on the composition of soil bacterial communities and thus the potential of their members to decompose plant residues is still limited. The objective of the present study was to describe and compare the bacterial community composition in Algerian soils (crop, forest, garden, and desert) and the activity of cellulose- and hemicellulose-degrading enzymes. Bacterial communities were characterized by high-throughput 16S amplicon sequencing followed by the in silico prediction of their functional potential. The highest lignocellulolytic activity was recorded in forest and garden soils whereas activities in the agricultural and desert soils were typically low. The bacterial phyla Proteobacteria (in particular classes α-proteobacteria, δ-proteobacteria, and γ-proteobacteria), Firmicutes, and Actinobacteria dominated in all soils. Forest and garden soils exhibited higher diversity than agricultural and desert soils. Endocellulase activity was elevated in forest and garden soils. In silico analysis predicted higher share of genes assigned to general metabolism in forest and garden soils compared with agricultural and arid soils, particularly in carbohydrate metabolism. The highest potential of lignocellulose decomposition was predicted for forest soils, which is in agreement with the highest activity of corresponding enzymes.


Subjects
Bacteria/enzymology; Bacterial Proteins/metabolism; Cellulase/metabolism; Glycoside Hydrolases/metabolism; Soil Microbiology; Soil/chemistry; Algeria; Bacteria/classification; Bacteria/genetics; Bacteria/isolation & purification; Bacterial Proteins/genetics; Cellulase/genetics; Ecosystem; Forests; Glycoside Hydrolases/genetics; Phylogeny
11.
Mol Ecol ; 27(20): 4026-4040, 2018 10.
Article in English | MEDLINE | ID: mdl-30152128

ABSTRACT

Conservation of exploited species requires an understanding of both genetic diversity and the dominant structuring forces, particularly near range limits, where climatic variation can drive rapid expansions or contractions of geographic range. Here, we examine population structure and landscape associations in Atlantic salmon (Salmo salar) across a heterogeneous landscape near the northern range limit in Labrador, Canada. Analysis of two amplicon-based data sets containing 101 microsatellites and 376 single nucleotide polymorphisms (SNPs) from 35 locations revealed clear differentiation between populations spawning in rivers flowing into a large marine embayment (Lake Melville) compared to coastal populations. The mechanisms influencing the differentiation of embayment populations were investigated using both multivariate and machine-learning landscape genetic approaches. We identified temperature as the strongest correlate with genetic structure, particularly warm temperature extremes and wider annual temperature ranges. The genomic basis of this divergence was further explored using a subset of locations (n = 17) and a 220K SNP array. SNPs associated with spatial structuring and temperature mapped to a diverse set of genes and molecular pathways, including regulation of gene expression, immune response, and cell development and differentiation. The results spanning molecular marker types and both novel and established methods clearly show climate-associated, fine-scale population structure across an environmental gradient in Atlantic salmon near its range limit in North America, highlighting valuable approaches for predicting population responses to climate change and managing species sustainability.


Subjects
Genetics, Population/methods; Microsatellite Repeats/genetics; Salmo salar/genetics; Animals; North America; Polymorphism, Single Nucleotide/genetics
12.
Nature ; 492(7427): 59-65, 2012 Dec 06.
Article in English | MEDLINE | ID: mdl-23201678

ABSTRACT

Cryptophyte and chlorarachniophyte algae are transitional forms in the widespread secondary endosymbiotic acquisition of photosynthesis by engulfment of eukaryotic algae. Unlike most secondary plastid-bearing algae, miniaturized versions of the endosymbiont nuclei (nucleomorphs) persist in cryptophytes and chlorarachniophytes. To determine why, and to address other fundamental questions about eukaryote-eukaryote endosymbiosis, we sequenced the nuclear genomes of the cryptophyte Guillardia theta and the chlorarachniophyte Bigelowiella natans. Both genomes have >21,000 protein genes and are intron rich, and B. natans exhibits unprecedented alternative splicing for a single-celled organism. Phylogenomic analyses and subcellular targeting predictions reveal extensive genetic and biochemical mosaicism, with both host- and endosymbiont-derived genes servicing the mitochondrion, the host cell cytosol, the plastid and the remnant endosymbiont cytosol of both algae. Mitochondrion-to-nucleus gene transfer still occurs in both organisms but plastid-to-nucleus and nucleomorph-to-nucleus transfers do not, which explains why a small residue of essential genes remains locked in each nucleomorph.


Subjects
Cell Nucleus/genetics; Cercozoa/genetics; Cryptophyta/genetics; Evolution, Molecular; Genome/genetics; Mosaicism; Symbiosis/genetics; Algal Proteins/genetics; Algal Proteins/metabolism; Alternative Splicing/genetics; Cercozoa/cytology; Cercozoa/metabolism; Cryptophyta/cytology; Cryptophyta/metabolism; Cytosol/metabolism; Gene Duplication/genetics; Gene Transfer, Horizontal/genetics; Genes, Essential/genetics; Genome, Mitochondrial/genetics; Genome, Plant/genetics; Genome, Plastid/genetics; Molecular Sequence Data; Phylogeny; Protein Transport; Proteome/genetics; Proteome/metabolism; Transcriptome/genetics
13.
Bioinformatics ; 32(9): 1380-7, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26708333

ABSTRACT

MOTIVATION: Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. RESULTS: We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by > 2.5% in F1 score for molecular function hierarchy. AVAILABILITY AND IMPLEMENTATION: Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF CONTACT: ahmad.pgh@dal.ca or beiko@cs.dal.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
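
The final scoring step described above, the cosine of the angle between two definition vectors, can be sketched as follows. This covers only the cosine computation; how simDEF builds and optimizes its definition vectors is not shown here, and the function name `cosine_similarity` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors.

    Returns 0.0 when either vector has zero length, a convention
    chosen for this sketch rather than taken from the paper.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical definition vectors for two proteins.
protein_a = [1.0, 2.0, 3.0]
protein_b = [2.0, 4.0, 6.0]
print(cosine_similarity(protein_a, protein_b))
```

Cosine similarity depends only on direction, so two proteins whose vectors differ merely in magnitude, as in this toy example, score as maximally similar.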


Subjects
Computational Biology; Gene Ontology; Algorithms; Animals; Humans; Proteins; Semantics
14.
BMC Genomics ; 16: 526, 2015 Jul 16.
Article in English | MEDLINE | ID: mdl-26173980

ABSTRACT

BACKGROUND: Lateral gene transfer (LGT) is an important evolutionary process in microbial evolution. In sewage treatment plants, LGT of antibiotic resistance and xenobiotic degradation-related proteins has been suggested, but the role of LGT outside these processes is unknown. Microbial communities involved in Enhanced Biological Phosphorus Removal (EBPR) have been used to treat wastewater in the last 50 years and may provide insights into adaptation to an engineered environment. We introduce two different types of analysis to identify LGT in EBPR sewage communities, based on identifying assembled sequences with more than one strong taxonomic match, and on unusual phylogenetic patterns. We applied these methods to investigate the role of LGT in six energy-related metabolic pathways. RESULTS: The analyses identified overlapping but non-identical sets of transferred enzymes. All of these were homologous with sequences from known mobile genetic elements, and many were also in close proximity to transposases and integrases in the EBPR data set. The taxonomic method had higher sensitivity than the phylogenetic method, identifying more potential LGTs. Both analyses identified the putative transfer of five enzymes within an Australian community, two in a Danish community, and none in a US-derived culture. CONCLUSIONS: Our methods were able to identify sequences with unusual phylogenetic or compositional properties as candidate LGT events. The association of these candidates with known mobile elements supports the hypothesis of transfer. The results of our analysis strongly suggest that LGT has influenced the development of functionally important energy-related pathways in EBPR systems, but transfers may be unique to each community due to different operating conditions or taxonomic composition.


Subjects
Gene Transfer, Horizontal; Phosphorus/metabolism; Bacteria/enzymology; Bacteria/genetics; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Contig Mapping; Energy Metabolism/genetics; Enzymes/genetics; Enzymes/metabolism; Sewage/microbiology
15.
Bioinformatics ; 30(21): 3123-4, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25061070

ABSTRACT

UNLABELLED: STAMP is a graphical software package that provides statistical hypothesis tests and exploratory plots for analysing taxonomic and functional profiles. It supports tests for comparing pairs of samples or samples organized into two or more treatment groups. Effect sizes and confidence intervals are provided to allow critical assessment of the biological relevancy of test results. A user-friendly graphical interface permits easy exploration of statistical results and generation of publication-quality plots. AVAILABILITY AND IMPLEMENTATION: STAMP is licensed under the GNU GPL. Python source code and binaries are available from our website at: http://kiwi.cs.dal.ca/Software/STAMP.
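
To make the idea of reporting an effect size with a confidence interval concrete, here is a minimal sketch for a two-sample comparison. It assumes the effect size is a difference in proportions with a Wald (normal-approximation) interval; the abstract does not say which statistics STAMP computes, so this is an illustration rather than STAMP's method. The function name `diff_proportions_ci` and the counts are hypothetical.

```python
import math


def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Difference in proportions with a Wald confidence interval.

    x1/n1 and x2/n2 are feature counts over total counts in each
    sample; z=1.96 gives an approximate 95% interval.
    Returns (difference, lower bound, upper bound).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    diff = p1 - p2
    std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * std_err, diff + z * std_err


# Hypothetical counts: a feature seen in 50/100 vs 30/100 sequences.
diff, lower, upper = diff_proportions_ci(50, 100, 30, 100)
print(f"difference={diff:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero but sits close to it suggests a statistically detectable yet small effect, which is the kind of judgment about biological relevance the abstract refers to.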


Subjects
Bacteria/classification; Software; Classification/methods; Confidence Intervals; Cyanobacteria/classification; Cyanobacteria/genetics; Data Interpretation, Statistical; Genome, Bacterial; Humans
16.
Syst Biol ; 63(4): 566-81, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24695589

ABSTRACT

Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and >50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. [Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson-Foulds; subtree prune-and-regraft; supertrees.].


Subjects
Bacteria/classification; Classification/methods; Computer Simulation; Phylogeny; Algorithms; Bacteria/genetics; Gene Transfer, Horizontal; Genome, Bacterial/genetics; Reproducibility of Results
17.
Bioinformatics ; 29(15): 1858-64, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23732273

ABSTRACT

BACKGROUND: Homology-based taxonomic assignment is impeded by differences between the unassigned read and reference database, forcing a rank-specific classification to the closest (and possibly incorrect) reference lineage. This assignment may be correct only to a general rank (e.g. order) and incorrect below that rank (e.g. family and genus). Algorithms like LCA avoid this by varying the predicted taxonomic rank based on matches to a set of taxonomic references. LCA and related approaches can be conservative, especially if best matches are taxonomically widespread because of events such as lateral gene transfer (LGT). RESULTS: Our extension to LCA called SPANNER (similarity profile annotater) uses the set of best homology matches (the LCA Profile) for a given sequence and compares this profile with a set of profiles inferred from taxonomic reference organisms. SPANNER provides an assignment that is less sensitive to LGT and other confounding phenomena. In a series of trials on real and artificial datasets, SPANNER outperformed LCA-style algorithms in terms of taxonomic precision and outperformed best BLAST at certain levels of taxonomic novelty in the dataset. We identify examples where LCA made an overly conservative prediction, but SPANNER produced a more precise and correct prediction. CONCLUSIONS: By using profiles of homology matches to represent patterns of genomic similarity that arise because of vertical and lateral inheritance, SPANNER offers an effective compromise between taxonomic assignment based on best BLAST scores, and the conservative approach of LCA and similar approaches. AVAILABILITY: C++ source code and binaries are freely available at http://kiwi.cs.dal.ca/Software/SPANNER. CONTACT: beiko@cs.dal.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
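
The basic LCA idea that SPANNER extends can be sketched as follows: a read is assigned to the deepest taxonomic rank on which all of its best-matching references agree. This is a generic lowest-common-ancestor sketch, not SPANNER's profile comparison, and it assumes each lineage is given as a list of names ordered from the most general rank downward. The function name and the example lineages are chosen for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the shared leading ranks of a set of lineages.

    Each lineage is a list of taxon names from the most general rank
    to the most specific. The result stops at the first rank where
    the lineages disagree (or where the shortest lineage ends).
    """
    if not lineages:
        return []
    shared = []
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared.append(names_at_rank[0])
        else:
            break
    return shared


# Lineages of three hypothetical best matches for one read.
matches = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales"],
    ["Bacteria", "Proteobacteria", "Betaproteobacteria", "Burkholderiales"],
]
print(lowest_common_ancestor(matches))  # ['Bacteria', 'Proteobacteria']
```

A single divergent match, for example one introduced by lateral gene transfer, pulls the assignment up to a very general rank, which is the conservatism the abstract describes.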


Subjects
Algorithms , Microbial Genome , Sequence Alignment/methods , Genomics/methods , Metagenome , Phylogeny
18.
Nucleic Acids Res ; 40(14): e111, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22532608

ABSTRACT

Determining the taxonomic lineage of DNA sequences is an important step in metagenomic analysis. Short DNA fragments from next-generation sequencing projects and microbes that lack close relatives in reference sequenced genome databases pose significant problems to taxonomic attribution methods. Our new classification algorithm, RITA (Rapid Identification of Taxonomic Assignments), uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and classified sequences with higher precision than the current best rank-flexible classifier. The accuracy on short reads can be increased by exploiting paired-end information, if available, which we demonstrate on a recently published bovine rumen data set. Finally, we develop a variant of RITA that incorporates accelerated homology search techniques, and generate predictions on a set of human gut metagenomes that were previously assigned to different 'enterotypes'. RITA is freely available in Web server and standalone versions.


Subjects
Algorithms , Metagenomics/methods , DNA Sequence Analysis , Animals , Cattle , Classification/methods , Humans , Ice Cover/microbiology , Metagenome , Rumen/microbiology , Nucleic Acid Sequence Homology , Stomach/microbiology
19.
Mol Biol Evol ; 29(12): 3947-58, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22915830

ABSTRACT

Environmental drivers of biodiversity can be identified by relating patterns of community similarity to ecological factors. Community variation has traditionally been assessed by considering changes in species composition and more recently by incorporating phylogenetic information to account for the relative similarity of taxa. Here, we describe how an important class of measures including Bray-Curtis, Canberra, and UniFrac can be extended to allow community variation to be computed on a phylogenetic network. We focus on phylogenetic split systems, networks that are produced by the widely used median network and neighbor-net methods, which can represent incongruence in the evolutionary history of a set of taxa. Calculating β diversity over a split system provides a measure of community similarity averaged over uncertainty or conflict in the available phylogenetic signal. Our freely available software, Network Diversity, provides 11 qualitative (presence-absence, unweighted) and 14 quantitative (weighted) network-based measures of community similarity that model different aspects of community richness and evenness. We demonstrate the broad applicability of network-based diversity approaches by applying them to three distinct data sets: pneumococcal isolates from distinct geographic regions, human mitochondrial DNA data from the Indonesian island of Nias, and proteorhodopsin sequences from the Sargasso and Mediterranean Seas. Our results show that major expected patterns of variation for these data sets are recovered using network-based measures, which indicates that these patterns are robust to phylogenetic uncertainty and conflict. Nonetheless, network-based measures of community similarity can differ substantially from measures ignoring phylogenetic relationships or from tree-based measures when incongruent signals are present in the underlying data. Network-based measures provide a methodology for assessing the robustness of β-diversity results in light of incongruent phylogenetic signal and allow β diversity to be calculated over widely used network structures such as median networks.
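
For orientation, the classic Bray-Curtis dissimilarity named in this abstract can be sketched as follows. This is the standard abundance-only form, not the network-based extension the paper describes; the function name and the example counts are hypothetical.

```python
def bray_curtis(a, b):
    """Classic Bray-Curtis dissimilarity between two abundance vectors.

    Computed as sum(|a_i - b_i|) / sum(a_i + b_i): 0 means identical
    communities, 1 means no taxa are shared.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts for three taxa in two samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
dissimilarity = bray_curtis(sample_a, sample_b)
```

Because this form treats every taxon as equally distinct from every other, it ignores the phylogenetic relationships that tree-based and network-based measures are designed to take into account.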


Subjects
Biodiversity , Biota , Genetic Variation , Theoretical Models , Phylogeny , Software , Mitochondrial DNA/genetics , Population Genetics/methods , Humans , Indonesia , Multilocus Sequence Typing , Rhodopsin/genetics , Microbial Rhodopsins , Streptococcus pneumoniae/genetics
20.
Sci Rep ; 13(1): 5210, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36997631

ABSTRACT

Using environmental DNA (eDNA) to monitor biodiversity in aquatic environments is becoming an efficient and cost-effective alternative to other methods such as visual and acoustic identification. Until recently, eDNA sampling was accomplished primarily through manual sampling methods; however, with technological advances, automated samplers are being developed to make sampling easier and more accessible. This paper describes a new eDNA sampler capable of self-cleaning and multi-sample capture and preservation, all within a single unit capable of being deployed by a single person. The first in-field test of this sampler took place in the Bedford Basin, Nova Scotia, Canada alongside parallel samples taken using the typical Niskin bottle collection and post-collection filtration method. Both methods were able to capture the same aquatic microbial community and counts of representative DNA sequences were well correlated between methods with R² values ranging from 0.71 to 0.93. The two collection methods returned the same top 10 families in near identical relative abundance, demonstrating that the sampler was able to capture the same community composition of common microbes as the Niskin. The presented eDNA sampler provides a robust alternative to manual sampling methods, is amenable to autonomous vehicle payload constraints, and will facilitate persistent monitoring of remote and inaccessible sites.


Subjects
Environmental DNA , Microbiota , Humans , Environmental DNA/genetics , Biodiversity , Filtration , Microbiota/genetics , Nova Scotia , Environmental Monitoring/methods , Taxonomic DNA Barcoding/methods