Results 1 - 20 of 33
1.
Nucleic Acids Res ; 50(W1): W500-W509, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524553

ABSTRACT

Multi-CSAR is a web server that can efficiently and more accurately order and orient the contigs in the assembly of a target genome into larger scaffolds based on multiple reference genomes. Given a target genome and multiple reference genomes, Multi-CSAR first identifies sequence markers shared between the target genome and each reference genome, then utilizes these sequence markers to compute a scaffold for the target genome based on each single reference genome, and finally combines all the single reference-derived scaffolds into a multiple reference-derived scaffold. To run Multi-CSAR, the users need to upload a target genome to be scaffolded and one or more reference genomes in multi-FASTA format. The users can also choose to use the 'weighting scheme of reference genomes' for Multi-CSAR to automatically calculate different weights for the reference genomes and choose either 'NUCmer on nucleotides' or 'PROmer on translated amino acids' for Multi-CSAR to identify sequence markers. In the output page, Multi-CSAR displays its multiple reference-derived scaffold in two graphical representations (i.e. Circos plot and dotplot) for the users to visually validate the correctness of scaffolded contigs and in a tabular representation to further validate the scaffold in detail. Multi-CSAR is available online at http://genome.cs.nthu.edu.tw/Multi-CSAR/.


Subjects
Genome; Software; Computers; Sequence Analysis, DNA; Algorithms
2.
BMC Bioinformatics ; 21(1): 528, 2020 Nov 17.
Article in English | MEDLINE | ID: mdl-33203354

ABSTRACT

BACKGROUND: Next-generation sequencing technologies revolutionized genomics by producing high-throughput reads at low cost, and this progress has prompted the recent development of de novo assemblers. Multiple assembly methods based on de Bruijn graphs have been shown to be efficient for Illumina reads. However, the sequencing errors generated by the sequencer complicate de novo assembly and affect the quality of downstream genomic research. RESULTS: In this paper, we develop a de Bruijn assembler, called Clover (clustering-oriented de novo assembler), that utilizes a novel k-mer clustering approach from the overlap-layout-consensus concept to deal with the sequencing errors generated by the Illumina platform. We further evaluate Clover's performance against several de Bruijn graph assemblers (ABySS, SOAPdenovo, SPAdes and Velvet), overlap-layout-consensus assemblers (Bambus2, CABOG and MSR-CA) and a string graph assembler (SGA) on three datasets (Staphylococcus aureus, Rhodobacter sphaeroides and human chromosome 14). The results show that Clover achieves superior assembly quality in terms of corrected N50 and E-size while remaining competitive in run time with every assembler except SOAPdenovo. In addition, Clover was used in the sequencing projects of the bacterial genomes Acinetobacter baumannii TYTH-1 and Morganella morganii KT. CONCLUSIONS: The novel clustering-based approach of Clover, which integrates the flexibility of the overlap-layout-consensus approach and the efficiency of the de Bruijn graph method, has high potential for de novo assembly. Clover is freely available as open source software from https://oz.nthu.edu.tw/~d9562563/src.html .
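For readers unfamiliar with the de Bruijn graph idea mentioned in this abstract, the following is a minimal, generic sketch of how reads can be broken into k-mers and linked through their (k-1)-mer prefixes and suffixes. It does not reproduce Clover's k-mer clustering step; the function name `de_bruijn_edges` and the toy reads are assumptions made purely for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph as {prefix: {suffix: multiplicity}}.

    Every k-mer in every read contributes one edge from its first
    (k-1) characters to its last (k-1) characters.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {prefix: dict(suffixes) for prefix, suffixes in graph.items()}


# Two hypothetical overlapping reads (illustrative data only).
edges = de_bruijn_edges(["ACGTAC", "CGTACG"], k=3)
```

Edges seen only once in deep sequencing data are often candidates for sequencing errors, which is the kind of noise that error-handling strategies in assemblers aim to remove.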


Subjects
Algorithms; High-Throughput Nucleotide Sequencing; Base Sequence; Chromosomes, Human, Pair 14/genetics; Cluster Analysis; Genome, Bacterial; Genomics/methods; Humans; Software; Time Factors
3.
Nucleic Acids Res ; 46(W1): W55-W59, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29733393

ABSTRACT

CSAR-web is a web-based tool that allows the users to efficiently and accurately scaffold (i.e. order and orient) the contigs of a target draft genome based on a complete or incomplete reference genome from a related organism. It takes as input a target genome in multi-FASTA format and a reference genome in FASTA or multi-FASTA format, depending on whether the reference genome is complete or incomplete, respectively. In addition, it requires the users to choose either 'NUCmer on nucleotides' or 'PROmer on translated amino acids' for CSAR-web to identify conserved genomic markers (i.e. matched sequence regions) between the target and reference genomes, which are used by the rearrangement-based scaffolding algorithm in CSAR-web to order and orient the contigs of the target genome based on the reference genome. In the output page, CSAR-web displays its scaffolding result in a graphical mode (i.e. scalable dotplot) allowing the users to visually validate the correctness of scaffolded contigs and in a tabular mode allowing the users to view the details of scaffolds. CSAR-web is available online at http://genome.cs.nthu.edu.tw/CSAR-web.


Subjects
Contig Mapping; Genome/genetics; Internet; Software; Algorithms; Computational Biology/methods; Genomics/methods; Sequence Analysis, DNA
4.
Bioinformatics ; 34(1): 109-111, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968788

ABSTRACT

Summary: Advances in next generation sequencing have generated massive amounts of short reads. However, assembling genome sequences from short reads remains a challenging task. Due to errors in reads and large repeats in the genome, many current assembly tools produce only collections of contigs whose relative positions and orientations along the genome being sequenced are unknown. To address this issue, a scaffolding process that orders and orients the contigs of a draft genome is needed to complete the genome sequence. In this work, we propose a new scaffolding tool called CSAR that can efficiently and more accurately order and orient the contigs of a given draft genome based on a reference genome of a related organism. In particular, the reference genome required by CSAR need not be complete. Our experimental results on real datasets show that CSAR outperforms similar tools such as Projector2, OSLay and Mauve Aligner in terms of average sensitivity, precision, F-score, genome coverage, NGA50 and running time. Availability and implementation: The program of CSAR can be downloaded from https://github.com/ablab-nthu/CSAR. Contact: hchiu@mail.ncku.edu.tw or cllu@cs.nthu.edu.tw. Supplementary information: Supplementary data are available at Bioinformatics online.
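To make "order and orient the contigs based on a reference genome" concrete, here is a deliberately naive sketch: each contig is placed by the median reference position of its markers and oriented by the majority strand of those markers. This is not CSAR's rearrangement-based algorithm; the function `naive_scaffold`, the marker tuple layout and the sample data are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def naive_scaffold(markers):
    """Order and orient contigs from (contig_id, ref_position, strand) markers.

    strand is +1 when the marker matches the reference in the forward
    direction and -1 when it matches as a reverse complement.
    """
    positions = defaultdict(list)
    strand_votes = defaultdict(int)
    for contig, ref_position, strand in markers:
        positions[contig].append(ref_position)
        strand_votes[contig] += strand
    # Sort contigs by the median position of their markers on the reference.
    ordered = sorted(positions, key=lambda c: median(positions[c]))
    # Orient each contig by majority vote (ties default to forward).
    return [(c, "+" if strand_votes[c] >= 0 else "-") for c in ordered]


# Hypothetical markers for three contigs (illustrative data only).
markers = [("c2", 900, -1), ("c1", 100, 1), ("c2", 700, -1),
           ("c3", 1500, 1), ("c1", 300, 1)]
scaffold = naive_scaffold(markers)
```

A heuristic like this breaks down when the target and reference genomes differ by rearrangements, which is the situation that rearrangement-aware scaffolders are designed to handle.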


Subjects
Contig Mapping/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Bacteria/genetics; Genome; Genomics/methods; Humans
5.
Nucleic Acids Res ; 44(W1): W328-32, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27185896

ABSTRACT

Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented by a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences for determining their global or local similarity. In this version, we have re-implemented iPARTS into a new web server iPARTS2 by constructing a totally new SA, which consists of 92 elements with each carrying both information of base and backbone geometry for a representative nucleotide. This SA is significantly different from the one used in iPARTS, because the latter consists of only 23 elements with each carrying only the backbone geometry information of a representative nucleotide. Our experimental results have shown that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. iPARTS2 takes as input two RNA 3D structures in the PDB format and outputs their global or local alignments with graphical display. iPARTS2 is now available online at http://genome.cs.nthu.edu.tw/iPARTS2/.


Subjects
Models, Statistical; Molecular Conformation; Nucleic Acid Conformation; RNA/chemistry; User-Computer Interface; Algorithms; Base Pairing; Computer Graphics; Internet; Nucleotide Motifs; Prokaryotic Cells/metabolism; RNA/genetics; RNA Folding; Sequence Alignment; Sequence Analysis, RNA; Sequence Homology, Nucleic Acid
6.
BMC Bioinformatics ; 18(Suppl 16): 574, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297283

ABSTRACT

BACKGROUND: RNA molecules have been known to play a variety of significant roles in cells. In principle, the functions of RNAs are largely determined by their three-dimensional (3D) structures. As more and more RNA 3D structures are available in the Protein Data Bank (PDB), a bioinformatics tool, which is able to rapidly and accurately search the PDB database for similar RNA 3D structures or substructures, is helpful to understand the structural and functional relationships of RNAs. RESULTS: Since its first release in 2011, R3D-BLAST has become a useful tool for searching the PDB database for similar RNA 3D structures and substructures. It was implemented by a structural-alphabet (SA)-based method, which utilizes an SA with 23 structural letters to encode RNA 3D structures into one-dimensional (1D) structural sequences and applies BLAST to the resulting structural sequences for searching similar substructures of RNAs. In this study, we have upgraded R3D-BLAST to develop a new web server named R3D-BLAST2 based on a higher quality SA newly constructed from a representative and sufficiently non-redundant list of RNA 3D structures. In addition, we have modified the kernel program in R3D-BLAST2 so that it can accept an RNA structure in the mmCIF format as an input. The results of our experiments on a benchmark dataset have demonstrated that R3D-BLAST2 indeed performs very well in comparison to its earlier version R3D-BLAST and other similar tools RNA FRABASE, FASTR3D and RAG-3D by searching a larger number of RNA 3D substructures resembling those of the input RNA. CONCLUSIONS: R3D-BLAST2 is a valuable BLAST-like search tool that can more accurately scan the PDB database for similar RNA 3D substructures. It is publicly available at http://genome.cs.nthu.edu.tw/R3D-BLAST2/ .


Subjects
Nucleic Acid Conformation; RNA/chemistry; Search Engine; Software; Algorithms; Amino Acid Sequence; Databases, Nucleic Acid; Nucleotides/genetics; Time Factors; User-Computer Interface
7.
BMC Bioinformatics ; 17(Suppl 17): 469, 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28155633

ABSTRACT

BACKGROUND: A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. To further obtain its complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the draft genome. Although several single reference-based scaffolding tools have been proposed, they may produce erroneous scaffolds if there are rearrangements between the target and reference genomes or their phylogenetic relationship is distant. This may suggest that a single reference genome may not be sufficient to produce correct scaffolds of a draft genome. RESULTS: In this study, we design a simple heuristic method to further revise our single reference-based scaffolding tool CAR into a new one called Multi-CAR such that it can utilize multiple complete genomes of related organisms as references to more accurately order and orient the contigs of a draft genome. In practical usage, our Multi-CAR does not require prior knowledge concerning phylogenetic relationships among the draft and reference genomes and libraries of paired-end reads. To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and also compared its accuracy performance with other multiple reference-based scaffolding tools Ragout and MeDuSa. Our experimental results have finally shown that Multi-CAR indeed outperforms Ragout and MeDuSa in terms of sensitivity, precision, genome coverage, scaffold number and scaffold N50 size. CONCLUSIONS: Multi-CAR serves as an efficient tool that can more accurately order and orient the contigs of a draft genome based on multiple reference genomes. The web server of Multi-CAR is freely available at http://genome.cs.nthu.edu.tw/Multi-CAR/ .
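The scaffold N50 size used as an evaluation metric above can be computed in a few lines. The sketch below uses the common definition (the largest length L such that scaffolds of length at least L together cover at least half of the total assembled length); the function name `n50` and the sample lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical scaffold lengths (illustrative data only).
scaffold_n50 = n50([80, 70, 50, 40, 30, 20])
```

A larger N50 with the same total length indicates that more of the assembly sits in long scaffolds.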


Subjects
Contig Mapping/methods; Sequence Analysis, DNA/methods; Software; Bacteria/genetics; Genome, Bacterial; Genomics/methods; High-Throughput Nucleotide Sequencing/methods
8.
BMC Bioinformatics ; 15: 381, 2014 Nov 28.
Article in English | MEDLINE | ID: mdl-25431302

ABSTRACT

BACKGROUND: Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed. RESULTS: In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named as CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Consequently, our experimental results have shown that CAR indeed performs better than all these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage. CONCLUSIONS: CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.


Subjects
Chromosome Mapping/methods; Computational Biology/methods; Contig Mapping/methods; Gene Rearrangement; Genome; Sequence Analysis, DNA/methods; Software; Algorithms; Prokaryotic Cells
9.
BMC Bioinformatics ; 14 Suppl 5: S9, 2013.
Article in English | MEDLINE | ID: mdl-23734866

ABSTRACT

The techniques of next generation sequencing allow an increasing number of draft genomes to be produced rapidly at a decreasing cost. However, these draft genomes usually are just partially sequenced as collections of unassembled contigs, which cannot be used directly by currently existing algorithms for studying their genome rearrangements and phylogeny reconstruction. In this work, we study the one-sided block (or contig) ordering problem with weighted reversal and block-interchange distance. Given a partially assembled genome π and a completely assembled genome σ, the problem is to find an optimal ordering to assemble (i.e., order and orient) the contigs of π such that the rearrangement distance measured by reversals and block-interchanges (also called generalized transpositions) with the weight ratio 1:2 between the assembled contigs of π and σ is minimized. In addition to genome rearrangements and phylogeny reconstruction, the one-sided block ordering problem particularly has a useful application in genome resequencing, because its algorithms can be used to assemble the contigs of a draft genome π based on a reference genome σ. By using permutation groups, we design an efficient algorithm to solve this one-sided block ordering problem in O(δn) time, where n is the number of genes or markers and δ is the number of used reversals and block-interchanges. We also show that the assembly of the partially assembled genome can be done in O(n) time and its weighted rearrangement distance from the completely assembled genome can be calculated in advance in O(n) time. Finally, we have implemented our algorithm into a program and used some simulated datasets to compare its accuracy with that of an existing similar tool called SIS, which was implemented with a heuristic algorithm that considers only reversals, in assembling the contigs of draft genomes based on their reference genomes. Our experimental results have shown that the accuracy of our program is better than that of SIS when the number of reversals and transpositions involved in the rearrangement events between the complete genomes of π and σ is increased. In particular, the more transpositions are involved in the rearrangement events, the wider the accuracy gap between our program and SIS becomes.


Subjects
Algorithms; Contig Mapping/methods; Genomics/methods; Genome; Phylogeny
10.
Nucleic Acids Res ; 39(Web Server issue): W45-9, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21624889

ABSTRACT

R3D-BLAST is a BLAST-like search tool that allows the user to quickly and accurately search against the PDB for RNA structures sharing similar substructures with a specified query RNA structure. The basic idea behind R3D-BLAST is that all the RNA 3D structures deposited in the PDB are first encoded as 1D structural sequences using a structural alphabet of 23 distinct nucleotide conformations, and BLAST is then applied to these 1D structural sequences to search for those RNA substructures whose 1D structural sequences are similar to that of the query RNA substructure. R3D-BLAST takes as input an RNA 3D structure in the PDB format and outputs all substructures of the hits similar to that of the query with a graphical display to show their structural superposition. In addition, each RNA substructure hit found by R3D-BLAST has an associated E-value to measure its statistical significance. R3D-BLAST is now available online at http://genome.cs.nthu.edu.tw/R3D-BLAST/ for public access.


Subjects
RNA/chemistry; Software; Algorithms; Databases, Protein; Nucleic Acid Conformation; RNA, Transfer/chemistry
11.
Nucleic Acids Res ; 38(Web Server issue): W340-7, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20507908

ABSTRACT

iPARTS is an improved web server for aligning two RNA 3D structures based on a structural alphabet (SA)-based approach. In particular, we first derive a Ramachandran-like diagram of RNAs by plotting nucleotides on a 2D axis using their two pseudo-torsion angles η and θ. Next, we apply the affinity propagation clustering algorithm to this η-θ plot to obtain an SA of 23-nt conformations. We finally use this SA to transform RNA 3D structures into 1D sequences of SA letters and continue to utilize classical sequence alignment methods to compare these 1D SA-encoded sequences and determine their structural similarities. iPARTS takes as input two RNA 3D structures in the PDB format and outputs their global alignment (for determining overall structural similarity), semiglobal alignments (for detecting structural motifs or substructures), local alignments (for finding locally similar substructures) and normalized local structural alignments (for identifying more similar local substructures without non-similar internal fragments), with graphical display that allows the user to visually view, rotate and enlarge the superposition of aligned RNA 3D structures. iPARTS is now available online at http://bioalgorithm.life.nctu.edu.tw/iPARTS/.
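The encoding step described above (turning each nucleotide's pair of pseudo-torsion angles into one structural-alphabet letter) can be sketched as a nearest-centroid lookup with wrap-around angle differences. The three centroids below are invented placeholders, not the 23 conformations derived by iPARTS, and the names `CENTROIDS`, `angular_diff` and `encode` are assumptions for illustration.

```python
import math

# Hypothetical cluster centres as (eta, theta) in degrees. A real structural
# alphabet would hold centres learned from known RNA structures.
CENTROIDS = {"A": (170.0, 210.0), "B": (60.0, 180.0), "C": (300.0, 90.0)}


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(angle_pairs):
    """Map each (eta, theta) pair to the letter of its nearest centroid."""
    letters = []
    for eta, theta in angle_pairs:
        nearest = min(
            CENTROIDS,
            key=lambda k: math.hypot(
                angular_diff(eta, CENTROIDS[k][0]),
                angular_diff(theta, CENTROIDS[k][1]),
            ),
        )
        letters.append(nearest)
    return "".join(letters)


# Three hypothetical nucleotides (illustrative data only).
sa_sequence = encode([(165.0, 215.0), (355.0, 100.0), (50.0, 175.0)])
```

Once structures are reduced to strings like this, ordinary sequence alignment methods can be applied to them.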


Subjects
RNA/chemistry; Software; Algorithms; Internet; Models, Molecular; Nucleic Acid Conformation; ROC Curve; Sequence Alignment
12.
Nucleic Acids Res ; 38(Web Server issue): W221-7, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20538651

ABSTRACT

SoRT(2) is a web server that allows the user to perform genome rearrangement analysis involving reversals, generalized transpositions and translocations (including fusions and fissions), and infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. It takes as input two or more linear/circular multi-chromosomal gene (or synteny block) orders in FASTA-like format. When the input is two genomes, SoRT(2) will quickly calculate their rearrangement distance, as well as a corresponding optimal scenario by highlighting the genes involved in each rearrangement operation. In the case of multiple genomes, SoRT(2) will also construct phylogenetic trees of these genomes based on a matrix of their pairwise rearrangement distances using distance-based approaches, such as neighbor-joining (NJ), unweighted pair group method with arithmetic mean (UPGMA) and Fitch-Margoliash (FM) methods. In addition, if the function of computing jackknife support values is selected, SoRT(2) will further perform the jackknife analysis to evaluate statistical reliability of the constructed NJ, UPGMA and FM trees. SoRT(2) is available online at http://bioalgorithm.life.nctu.edu.tw/SORT2/.
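As an illustration of the distance-based tree building mentioned above, the following is a compact sketch of UPGMA that returns only the tree topology as a nested string. It is a textbook version, not SoRT(2)'s implementation; the function name `upgma`, the dictionary-of-frozensets input layout and the toy distances are assumptions.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA and return a Newick-like topology string.

    dist maps frozenset({a, b}) to the pairwise distance between a and b.
    Branch lengths are omitted to keep the sketch short.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical pairwise rearrangement distances (illustrative data only).
distances = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6,
}
tree = upgma(["A", "B", "C"], distances)
```

Neighbor-joining and Fitch-Margoliash start from the same kind of matrix but use different criteria for joining clusters.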


Subjects
Chromosomes; Evolution, Molecular; Genomics/methods; Phylogeny; Software; Animals; Chromosome Inversion; Dogs; Gene Order; Genome; Genome, Bacterial; Genome, Mitochondrial; Humans; Internet; Mice; Rats; Synteny; Translocation, Genetic
13.
J Comput Biol ; 29(9): 961-973, 2022 09.
Article in English | MEDLINE | ID: mdl-35638936

ABSTRACT

Reference-based scaffolding is an important process used in genomic sequencing to order and orient the contigs in a draft genome based on a reference genome. In this study, we utilize the concept of genome rearrangement to formulate this process as an exemplar breakpoint distance (EBD)-based scaffolding problem, whose aim is to scaffold the contigs of two given draft genomes, both containing duplicate genes (or sequence markers) and acting with each other as a reference, such that the EBD between the scaffolded genomes is minimized. The EBD-based scaffolding problem is difficult to solve because it is non-deterministic polynomial-time (NP)-hard. In this work, we design an integer linear programming (ILP)-based algorithm to exactly solve the EBD-based scaffolding problem. Our experimental results on both simulated and biological data sets show that our ILP-based scaffolding algorithm can accurately and efficiently use a reference genome to scaffold the contigs of a draft genome. Moreover, our ILP-based scaffolding algorithm with considering duplicate genes indeed has better accuracy performance than that without considering duplicate genes, suggesting that duplicate genes and their exemplars are helpful for the application of genome rearrangement in the study of the reference-based scaffolding problem. When compared with RaGOO, a current state-of-the-art alignment-based scaffolder, our ILP-based scaffolding algorithm still has better accuracy performance on the biological data sets.
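The breakpoint distance underlying the EBD formulation can be illustrated for the simplest case: two linear signed gene orders over the same genes with no duplicates. The sketch below counts adjacencies of one genome that are not conserved in the other; it ignores chromosome ends and does not handle duplicate genes or exemplar selection, which is where the NP-hardness arises. The function name `breakpoints` and the example genomes are assumptions.

```python
def breakpoints(genome_a, genome_b):
    """Count adjacencies of genome_a that are absent from genome_b.

    Genomes are lists of signed integers. An adjacency (x, y) is the same
    as (-y, -x) read on the opposite strand.
    """
    conserved = set()
    for x, y in zip(genome_b, genome_b[1:]):
        conserved.add((x, y))
        conserved.add((-y, -x))
    return sum((x, y) not in conserved for x, y in zip(genome_a, genome_a[1:]))


# A single reversal of the block [2, 3] (illustrative data only).
distance = breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5])
```

With duplicate genes, one copy of each gene family has to be chosen as the exemplar before a count like this is well defined.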


Subjects
Genome; Programming, Linear; Algorithms; Base Sequence; Genes, Duplicate; Genome/genetics; Sequence Analysis, DNA/methods
14.
BMC Genomics ; 12 Suppl 3: S26, 2011 Nov 30.
Article in English | MEDLINE | ID: mdl-22369173

ABSTRACT

BACKGROUND: Genome rearrangements are studied on the basis of genome-wide analysis of gene orders and important in the evolution of species. In the last two decades, a variety of rearrangement operations, such as reversals, transpositions, block-interchanges, translocations, fusions and fissions, have been proposed to evaluate the differences between gene orders in two or more genomes. Usually, the computational studies of genome rearrangements are formulated as problems of sorting permutations by rearrangement operations. RESULT: In this article, we study a sorting problem by cut-circularize-linearize-and-paste (CCLP) operations, which aims to find a minimum number of CCLP operations to sort a signed permutation representing a chromosome. The CCLP is a genome rearrangement operation that cuts a segment out of a chromosome, circularizes the segment into a temporary circle, linearizes the temporary circle as a linear segment, and possibly inverts the linearized segment and pastes it into the remaining chromosome. The CCLP operation can model many well-known rearrangements, such as reversals, transpositions and block-interchanges, and others not reported in the biological literature. In addition, it really occurs in the immune response of higher animals. To distinguish those CCLP operations from the reversal, we call them as non-reversal CCLP operations. In this study, we use permutation groups in algebra to design an O(δn) time algorithm for solving the weighted sorting problem by CCLP operations when the weight ratio between reversals and non-reversal CCLP operations is 1:2, where n is the number of genes in the given chromosome and δ is the number of needed CCLP operations. CONCLUSION: The algorithm we propose in this study is very simple so that it can be easily implemented with 1-dimensional arrays and useful in the studies of phylogenetic tree reconstruction and human immune response to tumors.
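A single CCLP operation, as described in the abstract, can be simulated directly on a list of signed genes: cutting and re-opening the temporary circle amounts to rotating the excised segment. This is only a simulation of one operation, not the O(δn) sorting algorithm; the function name `cclp` and its parameters are assumptions chosen for illustration.

```python
def cclp(chromosome, i, j, rotate, invert, paste_at):
    """Apply one cut-circularize-linearize-and-paste step.

    chromosome[i:j] is cut out and circularized. The circle is re-opened
    `rotate` genes further along, optionally inverted (order reversed and
    signs flipped), and pasted at index `paste_at` of the remaining genes.
    """
    if not 0 <= i < j <= len(chromosome):
        raise ValueError("segment must be non-empty and inside the chromosome")
    segment = chromosome[i:j]
    rest = chromosome[:i] + chromosome[j:]
    r = rotate % len(segment)
    linear = segment[r:] + segment[:r]  # linearize the circle at a new point
    if invert:
        linear = [-gene for gene in reversed(linear)]
    return rest[:paste_at] + linear + rest[paste_at:]


# Reversal of the block [2, 3, 4].
reversal = cclp([1, 2, 3, 4, 5], 1, 4, 0, True, 1)
# Transposition: swap the adjacent blocks [2, 3] and [4, 5].
transposition = cclp([1, 2, 3, 4, 5], 1, 5, 2, False, 1)
# Block-interchange: swap the non-adjacent genes 2 and 4.
block_interchange = cclp([1, 2, 3, 4, 5], 1, 3, 1, False, 2)
```

The three calls show how one operation type covers reversals, transpositions and block-interchanges by varying the rotation, inversion and paste position.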


Subjects
Algorithms; Computational Biology/methods; Gene Rearrangement; Genome, Human; Humans
15.
Nucleic Acids Res ; 37(Web Server issue): W287-95, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19435878

ABSTRACT

FASTR3D is a web-based search tool that allows the user to quickly and accurately search the PDB database for structurally similar RNAs. Currently, it allows the user to input three types of queries: (i) a PDB code of an RNA tertiary structure (default), optionally with specified residue range, (ii) an RNA secondary structure, optionally with primary sequence, in the dot-bracket notation and (iii) an RNA primary sequence in the FASTA format. In addition, the user can run FASTR3D with additional filtering options: (i) the release date of RNA structures in the PDB database, and (ii) the experimental methods used to determine RNA structures and their least resolutions. In the output page, FASTR3D will show the user-queried RNA molecule, as well as user-specified options, followed by a detailed list of identified structurally similar RNAs. Particularly, when queried with RNA tertiary structures, FASTR3D provides a graphical display to show the structural superposition of the query structure and each of the identified structures. FASTR3D is now available online at http://bioalgorithm.life.nctu.edu.tw/FASTR3D/.
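The dot-bracket notation accepted for secondary-structure queries pairs each "(" with its matching ")" and marks unpaired positions with ".". A minimal sketch of reading such a string into base-pair indices follows; it handles only plain parentheses (no pseudoknot brackets), and the function name `parse_dot_bracket` and the example string are assumptions.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A small hairpin followed by a second pair (illustrative data only).
pairs = parse_dot_bracket("((..))..()")
```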


Subjects
RNA/chemistry; Software; Computer Graphics; Databases, Genetic; Models, Molecular; Nucleic Acid Conformation; RNA, Untranslated/chemistry; User-Computer Interface
16.
BMC Bioinformatics ; 11: 102, 2010 Feb 24.
Article in English | MEDLINE | ID: mdl-20181237

ABSTRACT

BACKGROUND: Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances. RESULTS: In this study, we therefore define a new OG order distance that is based on more biologically accurate rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints and that is applicable to both uni-chromosomal and multi-chromosomal genomes. In addition, we expand the term "gene" to include both its coding sequence and regulatory regions so that two adjacent genes whose coding sequences or regulatory regions overlap with each other are considered as a pair of overlapping genes. This is because overlapping of regulatory regions of distinct genes suggests that the regulation of expression for these genes should be more or less interrelated. Based on these modifications, we have reimplemented our OGtree as a new web server, named OGtree2, and have also evaluated its accuracy of genome tree reconstruction on a testing dataset consisting of 21 Proteobacteria genomes. 
Our experimental results have finally shown that our current OGtree2 indeed outperforms its previous version OGtree, as well as another similar server, called BPhyOG, significantly in the quality of genome tree reconstruction, because the phylogenetic tree obtained by OGtree2 is greatly congruent with the reference tree that coincides with the taxonomy accepted by biologists for these Proteobacteria. CONCLUSIONS: In this study, we have introduced a new web server OGtree2 at http://bioalgorithm.life.nctu.edu.tw/OGtree2.0/ that can serve as a useful tool for reconstructing more precise and robust genome trees of prokaryotes according to their overlapping genes.
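The basic definition used above (adjacent genes whose sequences overlap partially or entirely) can be sketched as an interval check over genes sorted by start coordinate. The sketch assumes inclusive coordinates on a single linear chromosome and ignores strand and regulatory regions; the function name `overlapping_pairs` and the gene tuples are illustrative assumptions, not OGtree2's data model.

```python
def overlapping_pairs(genes):
    """Return names of adjacent gene pairs whose intervals overlap.

    genes is an iterable of (name, start, end) with inclusive coordinates.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    return [
        (left[0], right[0])
        for left, right in zip(ordered, ordered[1:])
        if right[1] <= left[2]  # next gene starts before the current one ends
    ]


# Hypothetical annotations (illustrative data only).
genes = [("g3", 250, 300), ("g1", 1, 100), ("g2", 90, 200), ("g4", 260, 280)]
og_pairs = overlapping_pairs(genes)
```

Here g1/g2 overlap partially and g4 lies entirely inside g3, so both pairs are reported.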


Subjects
Gene Homology; Genome, Bacterial; Genomics/methods; Phylogeny; Databases, Genetic; Proteobacteria/genetics
17.
Nucleic Acids Res ; 36(Web Server issue): W19-24, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18502774

ABSTRACT

SARSA is a web tool that can be used to align two or more RNA tertiary structures. The basic idea behind SARSA is that we use the vector quantization approach to derive a structural alphabet (SA) of 23 nucleotide conformations, via which we transform RNA 3D structures into 1D sequences of SA letters and then utilize classical sequence alignment methods to compare these 1D SA-encoded sequences and determine their structural similarities. In SARSA, we provide two RNA structural alignment tools, PARTS for pairwise alignment of RNA tertiary structures and MARTS for multiple alignment of RNA tertiary structures. Particularly in PARTS, we have implemented four kinds of pairwise alignments for a variety of practical applications: (i) global alignment for comparing whole structural similarity, (ii) semiglobal alignment for detecting structural motifs, (iii) local alignment for finding locally similar substructures and (iv) normalized local alignment for eliminating the mosaic effect of local alignment. Both tools in SARSA take as input RNA 3D structures in the PDB format and in their outputs provide graphical display that allows the user to visually view, rotate and enlarge the superposition of aligned RNA molecules. SARSA is available online at http://bioalgorithm.life.nctu.edu.tw/SARSA/.
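Once structures are encoded as 1D strings of SA letters, the "classical sequence alignment" step can be illustrated with a standard global alignment score computed by dynamic programming (Needleman-Wunsch with a linear gap penalty). The scoring values below are arbitrary placeholders rather than SARSA's substitution scores, and the function name `global_alignment_score` is an assumption.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings s and t."""
    # prev[j] holds the best score for aligning the processed prefix of s
    # with the first j letters of t.
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Two hypothetical SA-encoded structures (illustrative data only).
score = global_alignment_score("ABC", "AC")
```

Semiglobal and local variants change only the boundary conditions and where the best score is read from the table.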


Subjects
RNA/chemistry; Sequence Alignment/methods; Sequence Analysis, RNA; Software; Internet; Models, Molecular; Nucleic Acid Conformation
18.
Nucleic Acids Res ; 36(Web Server issue): W475-80, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18456706

ABSTRACT

OGtree is a web-based tool for constructing genome trees of prokaryotic species based on a measure of combining overlapping-gene content and overlapping-gene order in their whole genomes. The overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, OGs are ubiquitous in microbial genomes and more conserved between species than non-OGs. Based on these properties, it has been suggested that OGs can serve as better phylogenetic characters than non-OGs for reconstructing the evolutionary relationships among microbial genomes. OGtree takes the accession numbers of prokaryotic genomes as its input. It then downloads their complete genomes from the National Centre for Biotechnology Information and identifies OGs in each genome and their orthologous OGs in other genomes. Next, OGtree computes an overlapping-gene distance between each pair of input genomes based on a combination of their OG content and orthologous OG order. Finally, it utilizes distance-based methods of building tree to reconstruct the genome trees of input prokaryotic genomes according to their pairwise OG distance. OGtree is available online at http://bioalgorithm.life.nctu.edu.tw/OGtree/.


Subjects
Bacteria/classification; Genome, Bacterial; Phylogeny; Software; Gene Order; Genes, Bacterial; Internet; Proteobacteria/classification; Proteobacteria/genetics; User-Computer Interface
19.
Nucleic Acids Res ; 35(Web Server issue): W639-44, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17488842

ABSTRACT

RE-MuSiC is a web-based multiple sequence alignment tool that can incorporate biological knowledge about structure, function, or conserved patterns in the sequences of interest. It accepts amino acid or nucleic acid sequences and a set of constraints as inputs. The constraints are pattern descriptions rather than exact positions of the fragments to be aligned together. The output is an alignment in which, for each pattern (constraint), an occurrence in each sequence is aligned with the occurrences in the other sequences, such that the overall alignment is optimized. Its predecessor, MuSiC, has been found useful by researchers since its release in 2004. However, applications have shown that the pattern formulation adopted in MuSiC, namely plain strings allowing mismatches, is not expressive or flexible enough. The constraint formulation in RE-MuSiC is therefore extended to regular expressions, which are convenient for expressing many biologically significant patterns, such as those collected in the PROSITE database, or structural consensuses that often involve variable ranges between conserved parts. Experiments demonstrate that RE-MuSiC can help predict important residues and locate phylogenetically conserved structural elements. RE-MuSiC is available on-line at http://140.113.239.131/RE-MUSIC.
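
To make the idea of regular-expression constraints concrete, here is a minimal sketch that converts a pattern written in PROSITE syntax into a Python regular expression and locates one occurrence in each input sequence. It does not reproduce RE-MuSiC's constrained alignment algorithm or its input format: the function names, the example pattern and the two protein fragments are assumptions for illustration, and only a subset of PROSITE syntax is handled.

```python
import re

# One PROSITE element: optional '<', a core (x, a residue, [set] or {exclusion}),
# an optional repeat count (n) or (n,m), and an optional '>'.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if n_term else "") + core + repeat + ("$" if c_term else ""))
    return "".join(parts)


def find_constraint(pattern, sequences):
    """Map each sequence name to the (start, end) of the first match, or None."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in sequences.items():
        m = regex.search(seq)
        hits[name] = (m.start(), m.end()) if m else None
    return hits


# Illustrative pattern in PROSITE syntax and two made-up protein fragments.
pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
hits = find_constraint(pattern, {
    "seq1": "MKCPECGKSFSQKSNLQKHQRTHTG",
    "seq2": "MKAPEAGKSFSQKSNLQKAQRTATG",
})
```

In a constrained alignment of the kind described above, the matched intervals would be the fragments forced into the same alignment columns; a sequence without a match (here `seq2`) would need separate handling.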


Subjects
Algorithms; Computational Biology/methods; Information Storage and Retrieval/methods; Proteins/chemistry; Proteins/genetics; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data; Sequence Analysis/methods; Software; User-Computer Interface; Amino Acid Sequence; Animals; Conserved Sequence; Humans; Internet; Molecular Sequence Data; Sequence Alignment/standards; Sequence Homology, Amino Acid
20.
Nucleic Acids Res ; 34(Web Server issue): W696-9, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845100

ABSTRACT

SPRING (http://algorithm.cs.nthu.edu.tw/tools/SPRING/) is a tool for analyzing genome rearrangement between two chromosomal genomes using reversals and/or block-interchanges. SPRING takes two or more chromosomes as its input and then computes, for any two input chromosomes, a minimum series of reversals and/or block-interchanges that transforms one chromosome into the other. The input of SPRING can be either bacterial-size sequences or gene/landmark orders. If the input is a set of chromosomal sequences, SPRING automatically searches for identical landmarks, which are homologous/conserved regions shared by all input sequences. SPRING also computes the breakpoint distance between any pair of chromosomes, which can be compared with the rearrangement distance to check whether the two are correlated. In addition, SPRING shows phylogenetic trees reconstructed from the rearrangement and breakpoint distance matrices.
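
The breakpoint distance mentioned above can be sketched for two linear chromosomes given as signed gene orders. This follows one common textbook definition (an adjacency of the first order is a breakpoint if it appears in the second order in neither orientation); the abstract does not say which exact definition SPRING uses, and the function names and example orders are assumptions for illustration.

```python
def _framed(order):
    # Frame the linear order with 0 at the left end and max+1 at the right end,
    # so that changes at the chromosome ends are counted as well.
    cap = max(abs(g) for g in order) + 1
    return [0] + list(order) + [cap]


def breakpoint_distance(order_a, order_b):
    """Number of adjacencies of order_a that are not conserved in order_b.

    Each order is a list of non-zero signed integers; the sign encodes
    the orientation of the gene or landmark.
    """
    if sorted(abs(g) for g in order_a) != sorted(abs(g) for g in order_b):
        raise ValueError("both orders must contain the same genes")
    if not order_a:
        return 0
    framed_a, framed_b = _framed(order_a), _framed(order_b)
    conserved = set()
    for left, right in zip(framed_b, framed_b[1:]):
        conserved.add((left, right))
        conserved.add((-right, -left))  # the same adjacency read backwards
    return sum(
        1 for pair in zip(framed_a, framed_a[1:]) if pair not in conserved
    )


# Reversing the segment "2 3" of the identity order creates two breakpoints.
distance = breakpoint_distance([1, -3, -2, 4], [1, 2, 3, 4])
```

Computing this value for every pair of input chromosomes would give a breakpoint distance matrix of the kind the abstract describes as input to tree reconstruction.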


Subjects
Chromosome Inversion; Chromosomes; Genomics/methods; Recombination, Genetic; Software; Chromosomes, Bacterial; Gammaproteobacteria/genetics; Gene Order; Gene Rearrangement; Internet; Phylogeny; User-Computer Interface; Vibrio/genetics