Searching the protein structure databank with weak sequence patterns and structural constraints.

Jonassen, I; Eidhammer, I; Grindhaug, S H; Taylor, W R

Affiliation

Jonassen I; Department of Informatics, University of Bergen, Hoyteknologisenteret (P.B. 7800), Bergen, N-5020, Norway.

J Mol Biol ; 304(4): 599-619, 2000 Dec 08.

Article in En | MEDLINE | ID: mdl-11099383

ABSTRACT

A method is described in which proteins that match PROSITE patterns are filtered by the root-mean-square deviation of the local 3D structures of the probe and target over the pattern components. This was found to increase the discrimination between true and false members of the protein family, but the improvement depended on how distinctive the structural features in the pattern were compared to equivalent fragments extracted from the structure databank (for example, if the pattern fell in an alpha-helix, discrimination was poor). We then generalised the sequence patterns (by widening the range of amino acid residues allowed at each position) and monitored how well the structural information helped retain specificity. While the discrimination of the pure sequence pattern had generally disappeared at information content values less than ten bits, the discrimination of the combined sequence-structure probe remained high at this point before following a similar decay. The displacement between these curves indicates that the structural component is, on average, equivalent to about ten bits. The sequence patterns were also filtered using the structure comparison program SAP, giving a global, rather than local, "view" of the proteins. This allowed the sequence patterns to become even less specific but raised the problem of whether some proteins encountered with the same fold but no PROSITE pattern should count as family members.
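
To make the "bits" scale in the abstract concrete, the sketch below estimates the information content of a PROSITE-style pattern and shows how widening the allowed residues lowers it. It is an illustration only: it assumes a uniform background over the 20 standard amino acids, so a position allowing k residues contributes log2(20/k) bits, and it counts variable-length elements at their minimum length. The abstract does not state which information measure the authors used, so these choices, the function names `parse_elements` and `information_bits`, and the two example patterns are all hypothetical.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def parse_elements(pattern):
    """Split a PROSITE-style pattern into (allowed_set, min_repeat, max_repeat)."""
    elements = []
    for token in pattern.strip(".").split("-"):
        match = _ELEMENT.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported pattern element: {token!r}")
        core, low, high = match.groups()
        if core == "x":
            allowed = set(AMINO_ACIDS)                    # wildcard
        elif core.startswith("["):
            allowed = set(core[1:-1])                     # any of these
        elif core.startswith("{"):
            allowed = set(AMINO_ACIDS) - set(core[1:-1])  # anything but these
        else:
            allowed = {core}                              # exactly this residue
        low = int(low) if low else 1
        high = int(high) if high else low
        elements.append((allowed, low, high))
    return elements


def information_bits(pattern):
    """Sum log2(20 / k) over positions (assumption: uniform background).

    Wildcards contribute 0 bits; variable-length elements are counted
    at their minimum repeat length.
    """
    return sum(
        low * math.log2(len(AMINO_ACIDS) / len(allowed))
        for allowed, low, _high in parse_elements(pattern)
    )


# Hypothetical patterns: a strict one and a widened version of it.
strict = "C-x(2)-C-H"
widened = "[CS]-x(2)-[CA]-[HK]"
print(f"{strict}: {information_bits(strict):.2f} bits")
print(f"{widened}: {information_bits(widened):.2f} bits")
```

Under this simplified measure, each position that is widened from one residue to two loses exactly one bit, which gives a feel for how far a pattern has to be relaxed before it drops below the ten-bit level mentioned above.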

Subjects

Databases as Topic; Pattern Recognition, Automated; Proteins/chemistry; Sequence Alignment; Amino Acid Motifs; Amino Acid Sequence; Cytochrome c Group/chemistry; Endopeptidases/chemistry; Epidermal Growth Factor/chemistry; Kringles; Molecular Sequence Data; Sensitivity and Specificity; Sequence Homology, Amino Acid; Software

Database: MEDLINE Main subject: Pattern Recognition, Automated / Proteins / Sequence Alignment / Databases as Topic Language: En Publication year: 2000 Document type: Article
