A new family of powerful multivariate statistical sequence analysis techniques.
van Heel, M.
Affiliation
  • van Heel M; Fritz Haber Institute of the Max Planck Society, Berlin Dahlem, Germany.
J Mol Biol; 220(4): 877-87, 1991 Aug 20.
Article in En | MEDLINE | ID: mdl-1880802
A novel multivariate statistical approach is presented for extracting and exploiting intrinsic information present in our ever-growing sequence data banks. The information extraction avoids the pitfalls of intersequence alignment by analyzing secondary invariant functions derived from the sequences in the data bank rather than the sequences themselves. A typical invariant function of this kind is a 20 x 20 histogram of occurrences of amino acid pairs in a given sequence or fragment thereof. To illustrate the potential of the approach, an analysis of 10,000 protein sequences from the National Biomedical Research Foundation Protein Identification Resource is presented, which already reveals great biological detail. For example, zeta-hemoglobin is found to lie close to amphibian and fish chi-hemoglobin which, in turn, is an important clue to the physiological function of this mammalian early embryonic hemoglobin. The multivariate statistical framework presented unifies such apparently unrelated issues as phylogenetic comparisons between a set of sequences and distance matrices between the constituents of the biological sequences. The Multivariate Statistical Sequence Analysis (MSSA) principles can be used for a wide spectrum of sequence analysis problems, such as assignment of family memberships to new sequences, validation of new incoming sequences to be entered into the database, prediction of structure from sequence, discrimination of coding from non-coding DNA regions, and automatic generation of an atlas of protein or DNA sequences. The MSSA techniques represent a self-contained approach to learning continuously and automatically from the growing stream of new sequences. The MSSA approach is particularly likely to play a significant role in major sequencing efforts such as the human genome project.
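As an illustration of the 20 x 20 pair histogram mentioned in the abstract, the following is a minimal sketch only. The abstract does not say how a "pair" is defined, so the code assumes pairs of residues separated by a fixed offset (adjacent residues by default). The names `pair_histogram`, `flatten`, and the `gap` parameter are hypothetical and are not taken from the paper.

```python
# Minimal sketch, not the author's implementation.
# Assumption: a "pair" is (residue at i, residue at i + gap), gap = 1 by default.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_histogram(sequence, gap=1):
    """Return a 20 x 20 table of counts of ordered residue pairs."""
    counts = [[0] * 20 for _ in range(20)]
    seq = sequence.upper()
    for a, b in zip(seq, seq[gap:]):
        # Pairs containing a non-standard symbol (e.g. X, B, Z) are skipped.
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
    return counts


def flatten(counts):
    """Flatten the 20 x 20 table into a 400-element feature vector."""
    return [c for row in counts for c in row]


if __name__ == "__main__":
    hist = pair_histogram("ACAC")
    print(hist[INDEX["A"]][INDEX["C"]])  # number of A->C pairs
    print(len(flatten(hist)))            # always 400, whatever the sequence length
```

Because the table always has 400 entries regardless of sequence length, sequences of different lengths can be compared as fixed-size vectors without first aligning them, which is the property the abstract emphasizes.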
Subjects
Collections: 01-international | Database: MEDLINE | Main subject: Multivariate Analysis / Sequence Alignment / Amino Acid Sequence | Study type: Prognostic_studies | Limits: Animals / Humans | Language: En | Journal: J Mol Biol | Publication year: 1991 | Document type: Article | Affiliation country: Germany