Results 1 - 8 of 8
1.
BMC Bioinformatics ; 23(1): 108, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-35354426

ABSTRACT

BACKGROUND: Biological sequence clustering is a complicated data clustering problem owing to the high computational cost of calculating pairwise sequence distances through sequence alignments, as well as the difficulty of determining parameters that yield robust clusters. While current approaches succeed in reducing the number of sequence alignments performed, the generated clusters are based on a single sequence identity threshold applied to every cluster. A poor choice of this identity threshold thus leads to low-quality clusters. There is, however, little support for users in selecting thresholds that are well matched to the input sequences. RESULTS: We present a novel sequence clustering approach called ALFATClust that exploits rapid pairwise alignment-free sequence distance calculations and graph community detection to generate clusters. Instead of applying a single threshold to every generated cluster, ALFATClust dynamically determines the cut-off threshold for each individual cluster by considering both cluster separation and intra-cluster sequence similarity. Benchmarking analysis shows that ALFATClust generally outperforms existing approaches by simultaneously maintaining cluster robustness and substantial cluster separation for the benchmark datasets. The software also provides an evaluation report for verifying the quality of the non-singleton clusters obtained. CONCLUSIONS: ALFATClust is able to generate sequence clusters with high intra-cluster sequence similarity and substantial separation between clusters without requiring users to decide on precise similarity cut-off thresholds.
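
To make the idea of an alignment-free distance concrete, the following is a minimal sketch that compares two sequences by the Jaccard distance between their k-mer sets. It is an illustration only and is not claimed to be the distance estimator ALFATClust uses; the function names `kmer_set` and `jaccard_distance` and the default `k=3` are assumptions made for this example.

```python
def kmer_set(seq, k=3):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """Alignment-free distance: 1 - |A & B| / |A | B| over the k-mer sets.

    Hypothetical helper for illustration; no alignment is computed.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: treat them as identical.
        return 0.0
    return 1.0 - len(ka & kb) / len(union)
```

A pairwise distance of this kind could then serve as the edge weight of a sequence graph on which community detection is run, which is the general pipeline the abstract describes.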


Subjects
Algorithms , Software , Benchmarking , Cluster Analysis , Sequence Alignment
2.
Sci Rep ; 11(1): 18091, 2021 09 10.
Article in English | MEDLINE | ID: mdl-34508122

ABSTRACT

This study aimed to characterize the alteration of the fecal microbiome and antimicrobial resistance (AMR) determinants in 24 piglets at day 3 pre-weaning (D.-3), weaning day (D.0), and days 3 (D.3) and 8 post-weaning (D.8), using whole-genome shotgun sequencing. Distinct clusters of microbiomes and AMR determinants were observed at D.8, when Prevotella (20.9%) was the major genus, whereas at D.-3 to D.3, Alistipes (6.9-12.7%) and Bacteroides (5.2-8.5%) were the major genera. Lactobacillus and Escherichia were notably observed at D.-3 (1.2%) and D.-3 to D.3 (0.2-0.4%), respectively. For AMR, a distinct cluster of AMR determinants was observed at D.8, mainly conferring resistance to macrolide-lincosamide-streptogramin (mefA), β-lactam (cfxA6 and aci1) and phenicol (rlmN). In contrast, at D.-3 to D.3, a high abundance of determinants with aminoglycoside (AMG) (sat, aac(6')-aph(2''), aadA and acrF), β-lactam (fus-1, cepA and mrdA), multidrug resistance (MDR) (gadW, mdtE, emrA, evgS, tolC and mdtB), phenicol (catB4 and cmlA4), and sulfonamide patterns (sul3) was observed. Canonical correlation analysis (CCA) plot associated Escherichia coli with aac(6')-aph(2''), emrA, mdtB, catB4 and cmlA4 at D.-3, D.0 and/or D.3, whereas at D.8 associations between Prevotella and mefA, cfxA6 and aci1 were identified. The weaning age and diet factor played an important role in the microbial community composition.


Subjects
Anti-Bacterial Agents/pharmacology , Feces/microbiology , Microbiota/drug effects , Weaning , Age Factors , Animals , Biodiversity , Metagenome , Metagenomics/methods , Swine
3.
Bioinformatics ; 35(14): 2466-2474, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30520940

ABSTRACT

MOTIVATION: Antimicrobial resistance is currently one of the main challenges in public health due to the excessive use of antimicrobials in medical treatments and agriculture. The advancements in high-throughput next-generation sequencing and development of bioinformatics tools allow simultaneous detection and identification of antimicrobial resistance genes (ARGs) from clinical, food and environment samples, to monitor the prevalence and track the dissemination of these ARGs. Such analyses are however reliant on a comprehensive database of ARGs with accurate sequence content and annotation. Most of the current ARG databases are therefore manually curated, but this is a time-consuming process and the resulting curation errors could be hard to detect. Several secondary ARG databases consolidate contents from different source ARG databases, and hence modifications in the primary databases might not be propagated and updated promptly in the secondary ARG databases. RESULTS: To address these problems, a validation and integration toolkit called ARGDIT was developed to validate ARG database fidelity, and merge multiple primary ARG databases into a single consolidated secondary ARG database with optional automated sequence re-annotation. Experimental results demonstrated the effectiveness of this toolkit in identifying errors such as sequence annotation typos in current ARG databases and generating an integrated non-redundant ARG database with structured annotation. A toolkit-oriented workflow is also proposed to minimize the efforts in validating, curating and merging multiple ARG protein or coding sequence databases. Database developers therefore benefit from faster update cycles and lower costs for database maintenance, while ARG pipeline users can easily evaluate the reference ARG database quality. AVAILABILITY AND IMPLEMENTATION: ARGDIT is available at https://github.com/phglab/ARGDIT. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Anti-Bacterial Agents , Databases, Nucleic Acid , Drug Resistance, Bacterial , High-Throughput Nucleotide Sequencing
4.
J Theor Biol ; 455: 131-139, 2018 10 14.
Article in English | MEDLINE | ID: mdl-30036526

ABSTRACT

Functionally similar non-coding RNAs are expected to be similar in certain regions of their secondary structures. These similar regions are called common structure motifs, and are structurally conserved throughout evolution to maintain their functional roles. Common structure motif identification is one of the critical tasks in RNA secondary structure analysis. Nevertheless, current approaches suffer from several limitations, and/or do not scale with both structure size and the number of input secondary structures. In this work, we present a method to transform the conserved base pair stems into transaction items and apply frequent itemset mining to identify common structure motifs existing in a majority of input structures. Our experimental results on telomerase and ribosomal RNA secondary structures report frequent stem patterns that are of biological significance. Moreover, the algorithms utilized in our method are scalable, and frequent stem patterns can be identified efficiently among many large structures.
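
As a rough illustration of frequent itemset mining over stems, the sketch below treats each input structure as a "transaction" whose items are stem labels, and reports every combination of stems that occurs in at least a given fraction of the structures. It assumes the stems have already been mapped to comparable labels (how the paper performs that transformation is not described in the abstract), and it uses brute-force enumeration rather than a scalable miner such as Apriori or FP-growth. The name `frequent_stem_patterns` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_stem_patterns(structures, min_support=0.5, max_size=3):
    """Return {itemset: support} for stem itemsets that appear in at least
    a min_support fraction of the input structures.

    structures: list of collections of stem labels (assumed precomputed).
    Brute-force enumeration up to max_size items; for illustration only.
    """
    n = len(structures)
    if n == 0:
        return {}
    counts = Counter()
    for stems in structures:
        items = sorted(set(stems))  # one transaction, duplicates removed
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}
```

For example, with three structures `{"s1", "s2", "s3"}`, `{"s1", "s2"}` and `{"s2", "s3"}` and `min_support=0.6`, the pairs `("s1", "s2")` and `("s2", "s3")` are reported as frequent, while `("s1", "s3")` is not.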


Subjects
Algorithms , Computer Simulation , Nucleic Acid Conformation , RNA, Ribosomal/chemistry , RNA/chemistry , Sequence Analysis, RNA , Telomerase/chemistry , RNA/genetics , RNA, Ribosomal/genetics , Telomerase/genetics
5.
Brief Bioinform ; 18(2): 291-305, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26984617

ABSTRACT

RNA secondary structure alignment has received more attention since the discovery of the structure-function relationships in some non-protein-encoding RNAs. However, unlike the pure sequence alignment problem, which has been solved in polynomial time, secondary structure alignment incorporates the base pairings as another information dimension in addition to the base sequence. This problem therefore becomes more challenging. In this study, we classify the selected approaches, and algorithmically illustrate how these methods address the alignment problems with different structure types. Other features such as the types of base pair edit operations supported and the time complexity are also compared.
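
For reference, the polynomial-time solution to pure sequence alignment mentioned above can be sketched with the classic Needleman-Wunsch dynamic program, which runs in time proportional to the product of the two sequence lengths. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for this example; the sketch returns only the optimal global alignment score and does not handle base pairings.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two rows of the dynamic programming table at a time.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Secondary structure alignment is harder than this because a score must also account for paired positions, so a choice made at one column constrains a distant column.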


Subjects
Algorithms , Base Sequence , Nucleic Acid Conformation , RNA , Sequence Alignment , Sequence Analysis, RNA
6.
Bioinformatics ; 31(24): 3914-21, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26275897

ABSTRACT

MOTIVATION: The regulatory functions performed by non-coding RNAs are related to their 3D structures, which are, in turn, determined by their secondary structures. Pairwise secondary structure alignment gives insight into the functional similarity between a pair of RNA sequences. Numerous exact or heuristic approaches have been proposed for computational alignment. However, the alignment becomes intractable when arbitrary pseudoknots are allowed. Also, since non-coding RNAs are, in general, more conserved in structures than sequences, it is more effective to perform alignment based on the common structural motifs discovered. RESULTS: We devised a method to approximate the true conserved stem pattern for a secondary structure pair, and constructed the alignment from it. Experimental results suggest that our method identified similar RNA secondary structures better than the existing tools, especially for large structures. It also successfully indicated the conservation of some pseudoknot features with biological significance. More importantly, even for large structures with arbitrary pseudoknots, the alignment can usually be obtained efficiently. AVAILABILITY AND IMPLEMENTATION: Our algorithm has been implemented in a tool called PSMAlign. The source code of PSMAlign is freely available at http://homepage.cs.latrobe.edu.au/ypchen/psmalign/.


Subjects
Algorithms , RNA, Untranslated/chemistry , Nucleic Acid Conformation , Sequence Alignment , Sequence Analysis, RNA/methods , Software
7.
IEEE Trans Biomed Eng ; 62(5): 1265-71, 2015 May.
Article in English | MEDLINE | ID: mdl-25474805

ABSTRACT

RNA secondary structures are vital in determining the 3-D structures of noncoding RNA molecules, which in turn affect their functions. Computational RNA secondary structure alignment and analysis are biologically significant, because they help identify numerous functionally important motifs. Unfortunately, many analysis methods suffer from computational intractability in the presence of pseudoknots. The conversion of knotted to knot-free secondary structures is an essential preprocessing step, and is referred to as pseudoknot removal. Although exact methods have been proposed for this task, their computational complexities are undetermined, and so their efficiencies in processing complex pseudoknots are currently unknown. We transformed the pseudoknot removal problem into a circle graph maximum weight independent set (MWIS) problem, in which each MWIS represents a unique optimal deknotted structure. An existing circle graph MWIS algorithm was extended to report either single or all solutions. Its time complexity depends on the number of MWISs, and it is guaranteed to report one solution in polynomial time. Experimental results suggest that our extended algorithm is much more efficient than the state-of-the-art tool. We also devised a novel concept called the structural scoring function, and investigated its effectiveness in more accurate solution candidate selection for a given criterion.
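
The connection to independent sets can be sketched as follows. If each base pair (i, j) is drawn as a chord, two pairs conflict exactly when they cross (i < k < j < l), so a knot-free structure is an independent set of chords. The code below is a minimal sketch that finds one largest non-crossing subset with a standard interval dynamic program, using unit weights (it maximizes the number of retained base pairs). It is not the extended circle graph MWIS algorithm the abstract describes, it reports a single solution only, and the name `remove_pseudoknots` is hypothetical. It assumes each position takes part in at most one pair.

```python
def remove_pseudoknots(pairs, length):
    """Return one maximum-size subset of non-crossing base pairs.

    pairs: iterable of (i, j) with 0 <= i < j < length, each position
    used by at most one pair. Unit weights; O(length^2) time and memory.
    """
    partner = {i: j for i, j in pairs}  # opening position -> closing position
    # score[lo][hi] = best count using only positions in the half-open range [lo, hi)
    score = [[0] * (length + 1) for _ in range(length + 1)]
    for lo in range(length - 1, -1, -1):
        for hi in range(lo + 2, length + 1):
            best = score[lo + 1][hi]  # option 1: leave position lo unpaired
            j = partner.get(lo)
            if j is not None and j < hi:
                # option 2: keep (lo, j); the rest splits into inside and outside
                best = max(best, 1 + score[lo + 1][j] + score[j + 1][hi])
            score[lo][hi] = best

    # Iterative traceback to recover the pairs of one optimal solution.
    kept, stack = [], [(0, length)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue
        j = partner.get(lo)
        if (j is not None and j < hi
                and score[lo][hi] == 1 + score[lo + 1][j] + score[j + 1][hi]):
            kept.append((lo, j))
            stack.append((lo + 1, j))
            stack.append((j + 1, hi))
        else:
            stack.append((lo + 1, hi))
    return sorted(kept)
```

When several optimal subsets exist, this sketch returns whichever one the traceback reaches first, which is where a scoring function for choosing among candidate solutions becomes relevant.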


Subjects
Models, Molecular , Nucleic Acid Conformation , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , Algorithms , RNA, Untranslated/genetics
8.
PLoS One ; 7(7): e39907, 2012.
Article in English | MEDLINE | ID: mdl-22792195

ABSTRACT

BACKGROUND: Current RNA secondary structure prediction approaches predict prevalent pseudoknots such as the H-pseudoknot and kissing hairpin. The number of possible structures increases drastically when more complex pseudoknots are considered, thus leading to computational limitations. On the other hand, the enormous population of possible structures means not all of them appear in real RNA molecules. Therefore, it is of interest to understand how many of them really exist and the reasons for their preferred existence over the others, as any new findings revealed by this study might enhance the capability of future structure prediction algorithms for more accurate prediction of complex pseudoknots. METHODOLOGY/PRINCIPAL FINDINGS: A novel algorithm was devised to estimate the exact number of structural possibilities for a pseudoknot constructed with a specified number of base pair stems. Then, topological classification was applied to classify RNA pseudoknotted structures from data in the RNA STRAND database. By showing the vast possibilities and the real population, it is clear that most of these plausible complex pseudoknots are not observed. Moreover, from these classified motifs that exist in nature, some features were identified for further investigation. It was found that some features are related to helical stacking. Other features are still left open to discover underlying tertiary interactions. CONCLUSIONS: Results from topological classification suggest that complex pseudoknots are usually some well-known motifs that are themselves complex or the interaction results of some special motifs. Heuristics can be proposed to predict the essential parts of these complex motifs, even if the required thermodynamic parameters are currently unknown.


Subjects
RNA/chemistry , Algorithms , Computational Biology/methods , Models, Molecular , Nucleic Acid Conformation