Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

Tatusov, R L; Altschul, S F; Koonin, E V

Affiliation

Tatusov RL; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.

Proc Natl Acad Sci U S A; 91(25): 12091-5, 1994 Dec 06.

Article in English | MEDLINE | ID: mdl-7991589

ABSTRACT

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
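The abstract describes the procedure only in outline. The Python sketch below shows its general shape under simplifying assumptions and is not the authors' implementation. All function and parameter names (log_odds_matrix, score_distribution, cutoff_for, scan, iterate_block, pseudo, scale, false_positives, max_iter) are hypothetical. Pseudocounts proportional to background frequencies replace the Dirichlet-mixture priors that the abstract reports as most effective. Only the "fixed number of false positives" cutoff is shown, not the fixed-proportion variant. The random model assumes residues drawn independently from the background, and its score distribution is computed exactly by convolving the per-column score distributions. Convergence is declared when the set of hit positions stops changing.

```python
import math
from collections import defaultdict

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def log_odds_matrix(block, background, pseudo=1.0, scale=2.0):
    """Position-dependent weight matrix (one dict per column) built from an
    alignment block of equal-length segments. Scores are rounded log-odds
    in units of 1/scale bits.

    ASSUMPTION: background-proportional pseudocounts stand in for the
    Dirichlet-mixture priors described in the abstract.
    """
    matrix = []
    for j in range(len(block[0])):
        counts = defaultdict(float)
        for seg in block:
            counts[seg[j]] += 1.0
        total = len(block) + pseudo
        col = {}
        for a in AMINO:
            q = (counts[a] + pseudo * background[a]) / total
            col[a] = round(scale * math.log2(q / background[a]))
        matrix.append(col)
    return matrix


def score_distribution(matrix, background):
    """Exact distribution of segment scores under the random model:
    residues drawn independently from the background frequencies."""
    dist = {0: 1.0}
    for col in matrix:
        new = defaultdict(float)
        for s, p in dist.items():
            for a in AMINO:
                new[s + col[a]] += p * background[a]
        dist = new
    return dist


def cutoff_for(matrix, background, n_segments, false_positives=1.0):
    """Lowest attainable score c with n_segments * P(score >= c) no greater
    than false_positives (a fixed expected number of chance inclusions)."""
    dist = score_distribution(matrix, background)
    tail, best = 0.0, None
    for s in sorted(dist, reverse=True):
        tail += dist[s]  # tail is now P(score >= s)
        if n_segments * tail > false_positives:
            break
        best = s
    return best if best is not None else max(dist) + 1


def scan(matrix, sequences, cutoff):
    """Every (seq_id, start, segment) window scoring at least the cutoff."""
    width, hits = len(matrix), []
    for sid, seq in sequences.items():
        for i in range(len(seq) - width + 1):
            seg = seq[i:i + width]
            score = sum(col.get(a, 0) for col, a in zip(matrix, seg))
            if score >= cutoff:
                hits.append((sid, i, seg))
    return hits


def iterate_block(seed_block, sequences, background,
                  false_positives=1.0, max_iter=20):
    """Rebuild the matrix from the current block, recompute the cutoff,
    rescan, and repeat until the set of hit positions stops changing."""
    block = list(seed_block)
    width = len(block[0])
    n_segments = sum(max(len(s) - width + 1, 0) for s in sequences.values())
    previous = None
    for it in range(1, max_iter + 1):
        matrix = log_odds_matrix(block, background)
        cutoff = cutoff_for(matrix, background, n_segments, false_positives)
        hits = scan(matrix, sequences, cutoff)
        current = {(sid, i) for sid, i, _ in hits}
        if not hits or current == previous:
            return block, it  # converged, or nothing passed the cutoff
        previous = current
        block = [seg for _, _, seg in hits]
    return block, max_iter
```

For example, take a uniform background ({a: 1/20 for every residue}) and the toy database {"a": "GGGWCWGGG", "b": "KKWCWKK", "c": "LLLLLL"}. Seeding iterate_block with ["WCW"] collects both occurrences of WCW and converges on the second iteration. Raising false_positives lowers the cutoff and admits weaker segments. This mirrors the abstract's observation that convergence depends on choosing sufficiently high cutoff scores.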

Subjects

Amino Acid Sequence; Conserved Sequence; Databases, Factual; Proteins/chemistry; Proteins/genetics; Bacteria/enzymology; Bacteria/genetics; Biological Evolution; Consensus Sequence; DNA Topoisomerases, Type I/chemistry; DNA Topoisomerases, Type I/genetics; Models, Theoretical; Molecular Sequence Data; Saccharomyces cerevisiae/enzymology; Saccharomyces cerevisiae/genetics; Statistics as Topic

Collections: 01-internacional | Database: MEDLINE | Main subject: Proteins / Databases, Factual / Amino Acid Sequence / Conserved Sequence | Study type: Diagnostic_studies / Prognostic_studies | Language: En | Publication year: 1994 | Document type: Article