Search | BVS Doenças Infecciosas e Parasitárias

Missing genes in the annotation of prokaryotic genomes.

Warren, Andrew S; Archuleta, Jeremy; Feng, Wu-Chun; Setubal, João Carlos.

BMC Bioinformatics ; 11: 131, 2010 Mar 15.

Article in English | MEDLINE | ID: mdl-20230630

ABSTRACT

BACKGROUND: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene-finder programs have problems with small genes (either over-predicting or under-predicting them). The question therefore arises as to whether current genome annotations are systematically missing small genes.

RESULTS: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs that are readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs.

CONCLUSIONS: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
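As an illustration of the ORF length threshold mentioned in the abstract, the following is a minimal Python sketch that lists ORFs of at least 33 aa on both strands of a DNA sequence. It is not the authors' pipeline. The ORF definition used here (an ATG start codon followed in frame by the first stop codon), the restriction to the standard stop codons, and all function names are assumptions made for this example; the abstract does not state how ORFs were delimited.

```python
# Hypothetical sketch: enumerate ORFs of >= 33 aa on both strands.
# Assumption: an ORF runs from the first ATG to the next in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: alternative starts (GTG, TTG) are ignored
MIN_AA = 33             # length threshold quoted in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_aa=MIN_AA):
    """Yield (start, end) for each ORF on one strand.

    Coordinates are 0-based and half-open; `end` is the index of the
    stop codon, so the stop itself is excluded from the ORF.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in START_CODONS:
                    start = i
            elif codon in STOP_CODONS:
                # Length in codons, counting the start codon, not the stop.
                if (i - start) // 3 >= min_aa:
                    yield (start, i)
                start = None


def find_orfs(seq, min_aa=MIN_AA):
    """Return [(strand, start, end)]; coordinates are relative to the scanned strand."""
    seq = seq.upper()
    hits = [("+", s, e) for s, e in orfs_in_strand(seq, min_aa)]
    hits += [("-", s, e) for s, e in orfs_in_strand(reverse_complement(seq), min_aa)]
    return hits
```

For example, `find_orfs("ATG" + "GCT" * 32 + "TAA")` reports a single forward-strand ORF of exactly 33 codons, while the same construct with 31 `GCT` repeats falls below the threshold and yields nothing.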

Subjects

Genes, Bacterial; Genome, Bacterial; Genomics/methods; Open Reading Frames/genetics; Databases, Genetic; Prokaryotic Cells

A pluggable framework for parallel pairwise sequence search.

Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli.

Annu Int Conf IEEE Eng Med Biol Soc ; 2007: 127-30, 2007.

Article in English | MEDLINE | ID: mdl-18001905

ABSTRACT

The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into parallel versions that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.
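To make the idea of a pluggable parallel search concrete, here is a minimal Python sketch in which the scoring routine is a replaceable callable and the database sequences are scored concurrently. It does not reproduce the paper's framework or its mixin-layers architecture; the function names, the toy k-mer scoring function, and the use of a thread pool are all assumptions chosen for brevity. For CPU-bound scoring in CPython, a process pool would likely be needed to obtain real multi-core speedup.

```python
# Hypothetical sketch: pairwise sequence search with a pluggable scorer.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Any function taking (query, subject) and returning an integer score
# can be plugged in as the scorer.
Scorer = Callable[[str, str], int]


def shared_kmer_score(query: str, subject: str, k: int = 3) -> int:
    """Toy similarity: number of distinct k-mers shared by both sequences."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s)


def parallel_search(
    query: str,
    database: Dict[str, str],
    scorer: Scorer = shared_kmer_score,
    workers: int = 4,
) -> List[Tuple[str, int]]:
    """Score the query against every database entry and rank the hits.

    Hits are sorted by descending score, with ties broken by name.
    """
    names = list(database)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda name: scorer(query, database[name]), names))
    return sorted(zip(names, scores), key=lambda hit: (-hit[1], hit[0]))
```

Swapping in a different scoring routine only requires passing another callable as `scorer`; the parallel driver itself stays unchanged, which is the separation of concerns the abstract describes in general terms.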

Subjects

Computer Systems; Information Storage and Retrieval/methods; Software; Algorithms
