Results 1 - 6 of 6
1.
BMC Bioinformatics ; 20(1): 77, 2019 Feb 14.
Article in English | MEDLINE | ID: mdl-30764761

ABSTRACT

BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/.


Subjects
Algorithms , Benchmarking , Protein Databases , Proteins/chemistry , Amino Acid Sequence , Phylogeny , Tertiary Protein Structure , Sequence Alignment
2.
Genome Biol Evol ; 10(4): 1019-1038, 2018 Apr 01.
Article in English | MEDLINE | ID: mdl-29617800

ABSTRACT

Dinoflagellates are a group of unicellular protists with immense ecological and evolutionary significance and cell biological diversity. Of the photosynthetic dinoflagellates, the majority possess a plastid containing the pigment peridinin, whereas some lineages have replaced this plastid by serial endosymbiosis with plastids of distinct evolutionary affiliations, including a fucoxanthin pigment-containing plastid of haptophyte origin. Previous studies have described the presence of widespread substitutional RNA editing in peridinin and fucoxanthin plastid genes. Because reports of this process have been limited to manual assessment of individual lineages, global trends concerning this RNA editing and its effect on the biological function of the plastid are largely unknown. Using novel bioinformatic methods, we examine the dynamics and evolution of RNA editing over a large multispecies data set of dinoflagellates, including novel sequence data from the peridinin dinoflagellate Pyrocystis lunula and the fucoxanthin dinoflagellate Karenia mikimotoi. We demonstrate that while most individual RNA editing events in dinoflagellate plastids are restricted to single species, global patterns and functional consequences of editing are broadly conserved. We find that editing is biased toward specific codon positions and regions of genes, and generally corrects otherwise deleterious changes in the genome prior to translation, though this effect is more prevalent in peridinin than fucoxanthin lineages. Our results support a model for promiscuous editing application subsequently shaped by purifying selection, and suggest the presence of an underlying editing mechanism transferred from the peridinin-containing ancestor into fucoxanthin plastids postendosymbiosis, with remarkably conserved functional consequences in the new lineage.


Subjects
Conserved Sequence/genetics , Dinoflagellida/genetics , Molecular Evolution , Plastids/genetics , Genome , Phylogeny , RNA Editing/genetics , Symbiosis/genetics
3.
Article in English | MEDLINE | ID: mdl-26357264

ABSTRACT

Over the past few decades, discovery based on sequence homology has become a widely accepted practice. Consequently, comparative accuracy of retrieval algorithms (e.g., BLAST) has been rigorously studied for improvement. Unlike most components of retrieval algorithms, the E-value threshold criterion has yet to be thoroughly investigated. An investigation of the threshold is important as it exclusively dictates which sequences are declared relevant and irrelevant. In this paper, we introduce the false discovery rate (FDR) statistic as a replacement for the uniform threshold criterion in order to improve efficacy in retrieval systems. Using NCBI's BLAST and PSI-BLAST software packages, we demonstrate the applicability of such a replacement in both non-iterative (BLAST(FDR)) and iterative (PSI-BLAST(FDR)) homology searches. For each application, we performed an evaluation of retrieval efficacy with five different multiple testing methods on a large training database. For each algorithm, we chose the best performing method, Benjamini-Hochberg, as the default statistic. As measured by the threshold average precision, BLAST(FDR) yielded 14.1 percent better retrieval performance than BLAST on a large (5,161 queries) test database and PSI-BLAST(FDR) attained 11.8 percent better retrieval performance than PSI-BLAST. The C++ source code specific to BLAST(FDR) and PSI-BLAST(FDR) and instructions are available at http://www.cs.mtsu.edu/~hcarroll/blast_fdr/.
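
For readers unfamiliar with the Benjamini-Hochberg method named in this abstract, the following is a minimal sketch of the generic step-up procedure in Python. It is not the BLAST(FDR) source code: the function name `benjamini_hochberg`, the `alpha` parameter, and the assumption that each hit already has a p-value are illustrative choices, and the abstract does not say how BLAST(FDR) derives the statistics it tests.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Generic Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True where
    the corresponding hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for five database hits.
    print(benjamini_hochberg([0.01, 0.045, 0.02, 0.005, 0.5]))
```

Unlike a single uniform cutoff, the acceptance threshold here depends on how many hits are tested and on their ranks, which is the property that motivates replacing a fixed E-value threshold.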


Subjects
Computational Biology/methods , Protein Sequence Analysis/methods , Amino Acid Sequence Homology , Algorithms , Protein Databases , Software
4.
BMC Genomics ; 15: 31, 2014 Jan 17.
Article in English | MEDLINE | ID: mdl-24433288

ABSTRACT

BACKGROUND: The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. RESULTS: The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty five potential edits were found with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. CONCLUSION: These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts.
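
As a toy illustration of what a screen for C→U editing sites looks for, the sketch below compares a genomic sequence with an already aligned, equal-length transcript-derived (cDNA) sequence and reports positions where a genomic C appears as T in the cDNA. This is an assumption-laden simplification, not the study's pipeline: the function name `find_c_to_u_sites` is hypothetical, and a real screen on deep-sequencing data would also need read alignment, coverage, and frequency thresholds that the abstract does not specify.

```python
def find_c_to_u_sites(genomic, cdna):
    """Return 0-based positions where the genome has C and the cDNA has T.

    Both sequences must already be aligned and of equal length; a U in the
    RNA is read as T in the cDNA sequence.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        position
        for position, (g_base, c_base) in enumerate(zip(genomic.upper(), cdna.upper()))
        if g_base == "C" and c_base == "T"
    ]


if __name__ == "__main__":
    # Hypothetical example sequences, not data from the study.
    print(find_c_to_u_sites("ACCGTCA", "ATCGTTA"))
```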


Subjects
Mitochondrial Genes , Mitochondria/genetics , Nicotiana/genetics , Open Reading Frames/genetics , Transcriptome , High-Throughput Nucleotide Sequencing , Mitochondria/metabolism , Polyribosomes/metabolism , RNA Editing , Ribosomal RNA/isolation & purification , Ribosomal RNA/metabolism , Transfer RNA/metabolism
5.
Bioinformatics ; 26(14): 1708-13, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20505002

ABSTRACT

MOTIVATION: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROC(n)) score, the area under the ROC curve (AUC) of a 'pooled' ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROC(n) score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROC(n) score can be very sensitive to retrieval results from as little as a single query. METHODS: To replace the pooled ROC(n) score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the well-known average precision in information retrieval, but reflecting the usage of E-values in bioinformatics. Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy. RESULTS: PSI-BLAST, GLOBAL, HMMER and RPS-BLAST provided examples of using the TAP-k and pooled ROC(n) scores to evaluate sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROC(n) score, showing that it can produce evaluations skewing far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy. AVAILABILITY AND IMPLEMENTATION: The TAP-k web server and downloadable Perl script are freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/tap/
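
The abstract describes TAP-k as closely related to the well-known average precision from information retrieval. The sketch below computes that classic average precision for one query's ranked retrieval list. It is deliberately not TAP-k itself: the E-value threshold handling that distinguishes TAP-k is not defined in the abstract and is omitted here, and the function and parameter names are illustrative.

```python
def average_precision(ranked_relevance, total_relevant):
    """Classic average precision for a single query (not TAP-k).

    ranked_relevance: booleans for the retrieved records in rank order,
        True where the record is relevant to the query.
    total_relevant: number of relevant records in the whole database,
        including any that were never retrieved.
    """
    if total_relevant <= 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for position, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant record.
            precision_sum += hits / position

    # Relevant records that were never retrieved contribute zero.
    return precision_sum / total_relevant


if __name__ == "__main__":
    # Hypothetical ranking: relevant, irrelevant, relevant.
    print(average_precision([True, False, True], total_relevant=2))
```

Because the score is computed per query and then averaged across queries, no single query can dominate it the way the abstract says can happen with a pooled ROC(n) score.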


Subjects
Computational Biology/methods , Information Storage and Retrieval/methods , Software , Factual Databases , Internet , ROC Curve
6.
Int J Bioinform Res Appl ; 3(4): 493-503, 2007.
Article in English | MEDLINE | ID: mdl-18048315

ABSTRACT

Fundamental to Multiple Sequence Alignment (MSA) algorithms is modelling insertions and deletions (gaps). The most prevalent model is to use Gap Open Penalties (GOP) and Gap Extension Penalties (GEP). While GOP and GEP are well understood conceptually, their effects on MSA and consequently on phylogeny scores are not as well understood. We use exhaustive phylogeny searching to explore the effects of varying the GOP and GEP for three nuclear ribosomal data sets. Particular attention is given to optimal maximum likelihood and parsimony phylogeny scores for various alignments of a range of GOP and GEP and their respective distribution of phylogeny scores.
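
To make the GOP/GEP model concrete, here is a small sketch of an affine gap cost applied to one row of an alignment. It assumes one common convention, in which the first position of a gap costs GOP and each further position costs GEP; alignment programs differ on this detail, and the abstract does not state which convention the study's tools use. The function names and the penalty values in the example are hypothetical.

```python
from itertools import groupby


def gap_run_lengths(aligned_row, gap_char="-"):
    """Return the lengths of the consecutive gap runs in one aligned row."""
    return [
        sum(1 for _ in run)
        for is_gap, run in groupby(aligned_row, key=lambda ch: ch == gap_char)
        if is_gap
    ]


def alignment_gap_cost(aligned_row, gap_open, gap_extend):
    """Total affine gap cost of one row.

    Assumed convention: a gap of length n costs gap_open + gap_extend * (n - 1).
    """
    return sum(
        gap_open + gap_extend * (length - 1)
        for length in gap_run_lengths(aligned_row)
    )


if __name__ == "__main__":
    row = "AC--G-T---"  # hypothetical aligned sequence
    print(gap_run_lengths(row))
    print(alignment_gap_cost(row, gap_open=10, gap_extend=1))
```

Raising GOP relative to GEP favors fewer, longer gaps, while lowering it permits many short gaps; this is the trade-off whose downstream effect on phylogeny scores the abstract investigates.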


Subjects
Computational Biology/methods , Proteomics/methods , Cell Nucleus/metabolism , Cluster Analysis , DNA/chemistry , Gene Deletion , Likelihood Functions , Genetic Models , Statistical Models , Theoretical Models , Phylogeny , Reproducibility of Results , Ribosomes/metabolism , DNA Sequence Analysis