Mining microorganism EST databases in the quest for new proteins

Faria-Campos, A. C; Cerqueira, G. C; Anacleto, C; Carvalho, C. M. de; Ortega, J. M.

Affiliation

Faria-Campos, A. C; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Cerqueira, G. C; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Anacleto, C; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Carvalho, C. M. de; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Ortega, J. M; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR

Genet. mol. res. (Online) ; 2(1): 169-177, Mar. 2003.

Article in English | LILACS | ID: lil-417613

Responsible library: BR1.1

ABSTRACT

Microorganisms with large genomes are commonly the subjects of single-round partial sequencing of cDNA, generating expressed sequence tags (ESTs). Usually there is a wide gap between gene discovery by EST projects and submission of amino acid sequences to public databases. We analyzed the relationship between available ESTs and protein sequences and used the sequences available in the secondary database Clusters of Orthologous Groups (COG) to investigate ESTs from eight microorganisms of medical and/or economic relevance, selecting candidate ESTs that may be further pursued for protein characterization. The organisms chosen were Paracoccidioides brasiliensis, Dictyostelium discoideum, Fusarium graminearum, Plasmodium yoelii, Magnaporthe grisea, Emericella nidulans, Chlamydomonas reinhardtii and Eimeria tenella, each of which has more than 10,000 ESTs available in dbEST. A total of 77,114 protein sequences from COG were used, corresponding to 3,201 distinct genes. At least 212 of these were capable of identifying candidate ESTs for further studies (E. tenella). This number was extended to over 700 candidate ESTs (C. reinhardtii, F. graminearum). Remarkably, even the organism that presents the highest number of ESTs corresponding to known proteins, P. yoelii, showed a considerable number of candidate ESTs for protein characterization (477). For some organisms, such as P. brasiliensis, M. grisea and F. graminearum, bioinformatics has allowed for automatic annotation of up to about 20% of the ESTs that did not correspond to proteins already characterized in the organism. In conclusion, 4,093 ESTs from these eight organisms that are homologous to COG genes were selected as candidates for protein characterization.

Subjects

Animals; Databases, Protein; Expressed Sequence Tags; Sequence Analysis, Protein; Chlamydomonas reinhardtii/genetics; Dictyostelium/genetics; Eimeria tenella/genetics; Emericella/genetics; Fusarium/genetics; Genome; Magnaporthe/genetics; Paracoccidioides/genetics; Plasmodium yoelii/genetics; Proteins/genetics; Sequence Homology, Amino Acid

Full text: 1 | Collections: 01-internacional | Database: LILACS | Main subject: Expressed Sequence Tags / Sequence Analysis, Protein / Databases, Protein | Limit: Animals | Language: En | Year of publication: 2003 | Document type: Article
