Mining frequent patterns in protein structures: a study of protease families.
Bioinformatics; 20 Suppl 1: i77-85, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262784
MOTIVATION: Analysis of protein sequence and structure databases usually reveals frequent patterns (FPs) associated with biological function. Data mining techniques generally consider the physicochemical and structural properties of amino acids and their microenvironment in the folded structures. Dynamics is usually not considered, even though proteins are not static and their function relates to conformational mobility in many cases.
RESULTS: This work describes a novel unsupervised learning approach to discover FPs in protein families, based on biochemical, geometric and dynamic features. Without any prior knowledge of functional motifs, the method discovers the FPs for each type of amino acid and identifies the conserved residues in three protease subfamilies: the chymotrypsin and subtilisin subfamilies of serine proteases and the papain subfamily of cysteine proteases. The catalytic triad residues are distinguished by their strong spatial coupling (high interconnectivity) to other conserved residues. Although the spatial arrangements of the catalytic residues in the two subfamilies of serine proteases are similar, their FPs are found to be quite different. The present approach appears to be a promising tool for detecting functional patterns in rapidly growing structure databases and for providing insights into the relationship among protein structure, dynamics and function.
AVAILABILITY: Available upon request from the authors.
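As a rough illustration of what mining frequent patterns per amino acid type can mean in practice, the sketch below counts feature combinations that recur across residues of one type. It is not the authors' algorithm, which the abstract does not specify. The function name `frequent_patterns`, the feature labels (`buried`, `low_mobility`, `near_His`) and the support threshold are all hypothetical, and the sketch assumes that biochemical, geometric and dynamic features have already been discretized into labels.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(residues, min_support, max_size=3):
    """Return feature combinations shared by at least `min_support` residues.

    `residues` is a list of sets of discretized feature labels, one set per
    residue. This is plain exhaustive itemset counting, an assumption made
    for illustration only.
    """
    counts = Counter()
    for features in residues:
        ordered = sorted(features)  # fixed order so each combination has one key
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical, hand-made labels for four serine residues (not real data).
serines = [
    {"buried", "low_mobility", "near_His"},
    {"buried", "low_mobility", "near_His"},
    {"exposed", "high_mobility"},
    {"buried", "low_mobility"},
]

patterns = frequent_patterns(serines, min_support=3)
for combo, support in sorted(patterns.items()):
    print(combo, support)
```

Exhaustive counting is only practical for a handful of labels per residue; a real analysis over a structure database would likely need a pruning strategy such as Apriori-style candidate generation.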
Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Peptide Hydrolases / Algorithms / Pattern Recognition, Automated / Information Storage and Retrieval / Sequence Analysis, Protein / Databases, Protein
Language: English
Publication year: 2004
Document type: Article