Results 1 - 5 of 5
1.
Front Microbiol ; 12: 769380, 2021.
Article in English | MEDLINE | ID: mdl-34912316

ABSTRACT

Aeromonas are Gram-negative rods widely distributed in the environment. They can cause severe infections in fish that lead to financial losses in the fish industry, and are considered opportunistic pathogens of humans, causing infections ranging from diarrhea to septicemia. The objective of this study was to determine in silico the contribution of genomic islands to A. hydrophila. The complete genomes of 17 A. hydrophila isolates, which were separated into two phylogenetic groups, were analyzed using a genomic island (GI) predictor. The number of predicted GIs and their characteristics varied among strains. Strains from group 1, which contains mainly fish pathogens, generally have a higher number of predicted GIs, of larger size, than strains from group 2, which comprises strains recovered from distinct sources. Only a few predicted GIs were shared among them, and these contained mostly genes from the core genome. Features related to virulence, metabolism, and resistance were found in the predicted GIs, but strains varied in their gene content. In strains from group 1, the O-antigen (O Ag) biosynthesis clusters OX1 and OX6 were identified, while strains from group 2 each had unique clusters. Metabolic pathways for myo-inositol, L-fucose, sialic acid, and a cluster encoding QueDEC, tgtA5, and proteins related to DNA metabolism were identified in strains of group 1, which share a high number of predicted GIs. No distinctive features of group 2 strains were identified in their predicted GIs, which are more diverse and possibly better represent GIs in this species. However, some strains have several resistance attributes encoded by their predicted GIs. Several predicted GIs encode hypothetical proteins and phage proteins whose functions have not been identified but may contribute to Aeromonas fitness. In summary, features with identified functions on predicted GIs may confer advantages for host colonization and competitiveness in the environment.

2.
Sci Rep ; 10(1): 91, 2020 01 09.
Article in English | MEDLINE | ID: mdl-31919449

ABSTRACT

Vectorial and alignment-free approaches to biological sequence representation have been explored in bioinformatics to efficiently handle big data. Even so, most current methods involve sequence comparisons via alignment-based heuristics and fail when applied to the analysis of large data sets. Here, we present "Spaced Words Projection (SWeeP)", a method for representing biological sequences using relatively small vectors while preserving intersequence comparability. SWeeP uses spaced-words by scanning the sequences and generating indices to create a higher-dimensional vector that is later projected onto a smaller randomly oriented orthonormal basis. We constructed phylogenetic trees for all organisms with mitochondrial and bacterial protein data in the NCBI database. SWeeP quickly built complete and accurate trees for these organisms with low computational cost. We compared SWeeP to other alignment-free methods, and SWeeP was 10 to 100 times faster than the other techniques. A tool to build SWeeP vectors is available at https://sourceforge.net/projects/spacedwordsprojection/.
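
The idea of indexing spaced words and projecting onto a random orthonormal basis can be sketched in a few lines of Python. This is a minimal illustration, not the published SWeeP implementation: the mask `1101`, the presence/absence encoding of the high-dimensional vector, the output size, and every function name below are assumptions made for the example.

```python
import math
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO)}


def spaced_word_indices(seq, mask="1101"):
    """Return the set of integer indices of the spaced words found in seq.

    Positions marked '1' in the mask are read, positions marked '0' are skipped.
    Each spaced word is encoded as a base-20 number (hypothetical encoding).
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    found = set()
    for start in range(len(seq) - len(mask) + 1):
        idx = 0
        for offset in keep:
            aa = seq[start + offset]
            if aa not in AA_INDEX:  # skip windows with non-standard residues
                break
            idx = idx * 20 + AA_INDEX[aa]
        else:
            found.add(idx)
    return found


def random_orthonormal_rows(n_rows, n_cols, seed=0):
    """Build n_rows orthonormal vectors of length n_cols by Gram-Schmidt."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        for u in rows:  # remove components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def sweep_vector(seq, basis, mask="1101"):
    """Project the (assumed binary) spaced-word vector of seq onto the basis."""
    indices = sorted(spaced_word_indices(seq, mask))
    return [sum(row[i] for i in indices) for row in basis]


# Usage: three match positions in the mask give 20**3 = 8000 possible words.
basis = random_orthonormal_rows(8, 20 ** 3, seed=42)
vec_a = sweep_vector("MKTAYIAKQRQISFVKSHFSRQ", basis)
vec_b = sweep_vector("MKTAYIAKQRQISFVKSHFARQ", basis)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
```

Because every sequence is projected with the same basis, the resulting short vectors stay comparable with one another, which is the property the abstract refers to as intersequence comparability.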


Subject(s)
Bacterial Proteins/metabolism , Computational Biology/methods , Mitochondria/metabolism , Mitochondrial Proteins/metabolism , Proteome/analysis , Software , Algorithms , Bacterial Proteins/genetics , Datasets as Topic , Humans , Mitochondrial Proteins/genetics , Phylogeny , Sequence Alignment
3.
BMC Bioinformatics ; 20(1): 392, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31307371

ABSTRACT

BACKGROUND: Clustering methods are essential for partitioning biological samples and are useful for reducing information complexity in large datasets. Tools in this context usually generate data with greedy algorithms that solve some data mining difficulties but can degrade biologically relevant information during the clustering process. The lack of standardized metrics and consistent databases also raises questions about the clustering efficiency of some methods. Benchmarks are needed to explore the full potential of clustering methods, among which alignment-free methods stand out, and a good choice of dataset is essential to them. RESULTS: Here we present a new approach to data mining in large protein sequence datasets, the Rapid Alignment Free Tool for Sequences Similarity Search to Groups (RAFTS3G), a clustering method that aims to lose less biological information while generating groups. The strategy developed in our algorithm is optimized to be more stringent, which is reflected in increased accuracy and sensitivity in the generation of clusters over a wide range of similarity. RAFTS3G is the better choice compared to three main methods when the user wants more reliable results, even without knowing the ideal clustering threshold. CONCLUSION: In general, RAFTS3G is able to group up to millions of biological sequences in large datasets, which makes it a remarkably efficient option for clustering. Compared to other "gold-standard" methods for clustering large biological data, RAFTS3G maintains the balance between reducing redundancy in biological information and creating consistent groups. We apply the binary search concept to grouped sequences, which maintains the sensitivity/accuracy relation and reduces the time needed to generate data with the RAFTS3G process.
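
For readers unfamiliar with the greedy clustering that the background refers to, the following Python sketch shows a generic greedy, alignment-free scheme. It is not the RAFTS3G algorithm: the k-mer Jaccard similarity, the threshold of 0.5, and the function names are assumptions chosen only to make the idea concrete.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Greedy clustering: each sequence joins the first representative
    it matches at or above the threshold, otherwise it founds a new cluster."""
    reps = []      # k-mer sets of the cluster representatives
    clusters = []  # lists of indices into seqs
    for i, seq in enumerate(seqs):
        ks = kmer_set(seq, k)
        for c, rep in enumerate(reps):
            if jaccard(ks, rep) >= threshold:
                clusters[c].append(i)
                break
        else:
            reps.append(ks)
            clusters.append([i])
    return clusters


# Usage: two near-identical sequences and one unrelated sequence.
example = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSGGSGGSG"]
groups = greedy_cluster(example)
```

The result depends on the input order and on the chosen threshold, since a sequence is never reassigned once placed. That is one way a greedy pass can lose biologically relevant groupings.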


Subject(s)
Proteins/chemistry , Software , Algorithms , Cluster Analysis , Data Mining , Databases, Protein
4.
Front Genet ; 9: 619, 2018.
Article in English | MEDLINE | ID: mdl-30631340

ABSTRACT

Tools for genomic island prediction use strategies for genomic comparison analysis and sequence composition analysis. The goal of comparative analysis is to identify unique regions in the genomes of related organisms, whereas sequence composition analysis evaluates and relates the composition of specific regions with other regions in the genome. The goal of this study was to qualitatively and quantitatively evaluate extant genomic island predictors. We chose tools reported to produce significant results using sequence composition prediction, comparative genomics, and hybrid genomics methods. To maintain diversity, the tools were applied to eight complete genomes of organisms with distinct characteristics and belonging to different families. Escherichia coli CFT073 was used as a control and considered the gold standard because its islands were previously curated in vitro. The results of predictions with the gold standard were manually curated, and the content and characteristics of each predicted island were analyzed. For the other organisms, we created GenBank (GBK) files using Artemis software for each predicted island. We copied only the amino acid sequences from the coding sequences and constructed a multi-FASTA file for each predictor. We used BLASTp to compare all results and generate hits to evaluate similarities and differences among the predictions. Comparison of the results with the gold standard revealed that GIPSy produced the best results, covering ~91% of the composition and regions of the islands, followed by Alien Hunter (81%), IslandViewer (47.8%), Predict Bias (31%), GI Hunter (17%), and Zisland Explorer (16%). The tools with the best results in the analyses of the full set of organisms were the same ones that performed best in the tests with the gold standard.
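
One simple way to express how much of a set of gold-standard islands a predictor covers is the fraction of gold-standard bases that fall inside any predicted island. The Python sketch below computes that fraction; it is an assumption about a reasonable metric, not the procedure used in the study, which compared predictions through BLASTp hits. Coordinates are taken to be 1-based and inclusive, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals, inclusive coordinates."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(gold, predicted):
    """Fraction of gold-standard bases that lie inside any predicted island."""
    gold_m = merge_intervals(gold)
    pred_m = merge_intervals(predicted)
    total = sum(end - start + 1 for start, end in gold_m)
    overlap = 0
    for g_start, g_end in gold_m:
        for p_start, p_end in pred_m:
            lo, hi = max(g_start, p_start), min(g_end, p_end)
            if lo <= hi:
                overlap += hi - lo + 1
    return overlap / total if total else 0.0


# Usage with made-up coordinates (not data from the study):
gold_islands = [(1, 100), (201, 300)]
predicted_islands = [(51, 150), (201, 250)]
fraction = covered_fraction(gold_islands, predicted_islands)  # 0.5
```

Merging intervals first prevents overlapping predictions from being counted twice.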

5.
Front Genet ; 8: 165, 2017.
Article in English | MEDLINE | ID: mdl-29163633

ABSTRACT

Nowadays, defining homology relationships among sequences is essential for biological research. Within homology, the analysis of orthologous sequences is of great importance for computational biology, genome annotation, and phylogenetic inference. Since 2007, with the increase in the number of new sequences being deposited in large biological databases, researchers have begun to analyse computerized methodologies and tools aimed at selecting the most promising ones for the prediction of orthologous groups. The literature in this field describes the problems that the majority of available tools show, such as those related to accuracy, the time required for analysis (especially in light of the increasing volume of data being submitted, which requires faster techniques), and the automation of the process without manual intervention. Conducting our search through BMC, Google Scholar, NCBI PubMed, and Expasy, we examined more than 600 articles in pursuit of the most recent techniques and tools developed to solve most of the problems that still exist in orthology detection. We listed the main computational tools created and developed between 2011 and 2017, taking into consideration the differences in the type of orthology analysis, outlining the main features of each tool and pointing to the problems that each one tries to address. We also observed that several tools still use the BLAST "all-against-all" methodology as their main algorithm, which entails some limitations, such as a limited number of queries, computational cost, and long processing time to complete the analysis. However, promising new tools are being developed, like OrthoVenn (which uses the Venn diagram to show the relationship of ortholog groups generated by its algorithm); or proteinOrtho (which improves the accuracy of ortholog groups); or ReMark (tackling the integration of the pipeline to make the input process automatic); or OrthAgogue (using algorithms developed to minimize processing time); and proteinOrtho (developed for dealing with large amounts of biological data). We compared the main features of four tools and tested them using four prokaryotic genomes. We hope that our review will be useful for researchers and will help them select the most appropriate tool for their work in the field of orthology.
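
To make the "all-against-all" approach concrete, the Python sketch below takes a table of pairwise similarity scores, such as an all-against-all BLAST run would produce, and extracts reciprocal best hits between genomes. Reciprocal best hits are a commonly used building block for ortholog detection; the abstract does not say that any particular tool works this way, and the data layout and function names here are assumptions for illustration.

```python
def best_hits(scores, genome_of):
    """For each query protein, keep its highest-scoring hit in each other genome.

    scores maps (query, subject) to a similarity score;
    genome_of maps each protein identifier to its genome.
    """
    best = {}
    for (query, subject), score in scores.items():
        q_genome, s_genome = genome_of[query], genome_of[subject]
        if q_genome == s_genome:
            continue  # ignore within-genome hits
        key = (query, s_genome)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best


def reciprocal_best_hits(scores, genome_of):
    """Return protein pairs that are each other's best hit across genomes."""
    best = best_hits(scores, genome_of)
    pairs = set()
    for (query, _target_genome), (subject, _score) in best.items():
        back = best.get((subject, genome_of[query]))
        if back is not None and back[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Usage with made-up scores for two genomes, A and B:
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "b1"): 200, ("a1", "b2"): 50,
    ("a2", "b1"): 120, ("a2", "b2"): 90,
    ("b1", "a1"): 195, ("b1", "a2"): 110,
    ("b2", "a1"): 40, ("b2", "a2"): 95,
}
orthologs = reciprocal_best_hits(scores, genome_of)  # {("a1", "b1")}
```

The score table grows with the square of the number of proteins, which is the computational cost and processing time limitation the abstract attributes to this methodology.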
