Shape based indexing for faster search of RNA family databases.
Janssen, Stefan; Reeder, Jens; Giegerich, Robert.
Affiliation
  • Janssen S; Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany. stefan.janssen@uni-bielefeld.de
BMC Bioinformatics ; 9: 131, 2008 Feb 29.
Article in English | MEDLINE | ID: mdl-18312625
ABSTRACT

BACKGROUND:

Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time-consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown how much sensitivity they sacrifice.

RESULTS:

We present a new filtering approach, which exploits the family-specific secondary structure and significantly reduces the number of CM searches. The filter eliminates approximately 85% of the queries and discards only 2.6% of true positives when evaluating Rfam against itself. First results also capture previously undetected non-coding RNAs in a recent human RNAz screen.

CONCLUSION:

The RNA shape index filter (RNAsifter) is based on the following rationale: an RNA family is characterised by structure much more succinctly than by sequence content. Structures of individual family members, which naturally have different length and sequence composition, may exhibit structural variation in detail, but overall they share a common shape in a more abstract sense. Given a fixed release of the Rfam database, we can compute these abstract shapes for all families; the result is called a shape index. If a query sequence belongs to a certain family, it must be able to fold into the family shape with reasonable free energy. Therefore, rather than matching the query against all families in the database, we can first (and quickly) compute its feasible shape(s), and use the shape index to access only those families where a good match is possible because they share a common shape with the query.
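The rationale above (abstract each family structure to a shape, index families by shape, then look up only the families that share a shape with the query) can be illustrated with a small sketch. It is not the RNAsifter implementation. The shape abstraction below is a simplified version of the most abstract shape level used by RNAshapes: unpaired bases and helix lengths are dropped, and stacks, bulges and interior loops collapse into a single helix written as `[]`. The function names, the family names `famA`/`famB`/`famC` and their structures are hypothetical placeholders. The sketch also assumes the query's feasible shapes are already known; in the actual method they come from folding the query sequence under a free-energy criterion, which is not modelled here.

```python
from collections import defaultdict


def pair_table(dot_bracket: str) -> list[int]:
    """Map each position of a dot-bracket string to its partner, or -1 if unpaired."""
    partner = [-1] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    return partner


def _top_level_pairs(partner: list[int], i: int, j: int) -> list[tuple[int, int]]:
    """Outermost base pairs (p, q) lying inside the interval [i, j]."""
    pairs, k = [], i
    while k <= j:
        if partner[k] > k:
            pairs.append((k, partner[k]))
            k = partner[k] + 1  # skip everything nested in this pair
        else:
            k += 1
    return pairs


def _shape(partner: list[int], i: int, j: int) -> str:
    out = []
    for p, q in _top_level_pairs(partner, i, j):
        # Collapse stacked pairs, bulges and interior loops: while the helix
        # encloses exactly one inner pair, treat it as the same helix.
        inner = _top_level_pairs(partner, p + 1, q - 1)
        while len(inner) == 1:
            p, q = inner[0]
            inner = _top_level_pairs(partner, p + 1, q - 1)
        out.append("[" + _shape(partner, p + 1, q - 1) + "]")
    return "".join(out)


def abstract_shape(dot_bracket: str) -> str:
    """Simplified abstract shape; an unstructured chain maps to ''."""
    return _shape(pair_table(dot_bracket), 0, len(dot_bracket) - 1)


def build_shape_index(families: dict[str, list[str]]) -> dict[str, set[str]]:
    """Shape index: abstract shape -> set of families having a member with that shape."""
    index = defaultdict(set)
    for family, structures in families.items():
        for db in structures:
            index[abstract_shape(db)].add(family)
    return dict(index)


def candidate_families(index: dict[str, set[str]], query_shapes: list[str]) -> set[str]:
    """Only these families would be passed on to the expensive CM search."""
    hits: set[str] = set()
    for shape in query_shapes:
        hits |= index.get(shape, set())
    return hits


if __name__ == "__main__":
    families = {  # hypothetical families and structures, for illustration only
        "famA": ["((((...))))..((...))", "(((....)))...(((...)))"],
        "famB": ["((..((...))..))"],
        "famC": ["((((...))..((...))))"],
    }
    index = build_shape_index(families)
    print(index)
    print(candidate_families(index, ["[]", "[[][]]"]))
```

In this toy index, the two `famA` structures differ in helix lengths and loop sizes but both abstract to `[][]`, which is the point of the shape abstraction. A query whose feasible shapes are `[]` and `[[][]]` is routed only to `famB` and `famC`, so `famA` would never be searched with its covariance model.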
Subjects

Database: MEDLINE | Main subject: Algorithms / Database Management Systems / RNA / Information Storage and Retrieval / Sequence Alignment / Sequence Analysis, RNA / Databases, Genetic | Language: English | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Publication year: 2008 | Document type: Article | Country of affiliation: Germany