Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets.
Bader, Kai Christian; Grothoff, Christian; Meier, Harald.
  • Bader KC; Services Department of Informatics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
Bioinformatics ; 27(11): 1546-54, 2011 Jun 01.
Article in En | MEDLINE | ID: mdl-21471017
ABSTRACT
MOTIVATION:

PCR, hybridization, DNA sequencing and other important methods in molecular diagnostics rely on both sequence-specific and sequence group-specific oligonucleotide primers and probes. Their design depends on the identification of oligonucleotide signatures in whole genome or marker gene sequences. Although genome and gene databases are generally available and regularly updated, collections of valuable signatures are rare. Even for single requests, the search for signatures becomes computationally expensive when working with large collections of target (and non-target) sequences. Moreover, with growing dataset sizes, the chance of finding exact group-matching signatures decreases, necessitating the application of relaxed search methods. The resultant substantial increase in complexity is exacerbated by the dearth of algorithms able to solve these problems efficiently.

RESULTS:

We have developed CaSSiS, a fast and scalable method for computing comprehensive collections of sequence- and sequence group-specific oligonucleotide signatures from large sets of hierarchically clustered nucleic acid sequence data. Based on the ARB Positional Tree (PT-)Server and a newly developed BGRT data structure, CaSSiS not only determines sequence-specific signatures and perfect group-covering signatures for every node within the cluster (i.e. target groups), but also signatures with maximal group coverage (sensitivity) within a user-defined range of non-target hits (specificity) for groups lacking a perfect common signature. An upper limit of tolerated mismatches within the target group, as well as the minimum number of mismatches with non-target sequences, can be predefined. Test runs with one of the largest phylogenetic gene sequence datasets available indicate good runtime and memory performance, and in silico spot tests have shown the usefulness of the resulting signature sequences as blueprints for group-specific oligonucleotide probes.
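
The relaxed search described above can be illustrated with a small sketch. It is not the CaSSiS implementation: it uses neither the ARB PT-Server nor the BGRT data structure, it matches signatures exactly (no mismatch tolerance), and it simply brute-forces over all k-mers. The function names `build_match_index` and `best_signatures`, the parameter `max_outgroup_hits`, and the toy sequences are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def build_match_index(seqs: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every length-k substring to the IDs of the sequences containing it."""
    index: Dict[str, Set[str]] = {}
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(seq_id)
    return index


def best_signatures(
    index: Dict[str, Set[str]],
    group: Set[str],
    max_outgroup_hits: int,
) -> Dict[int, Tuple[int, str]]:
    """For each number of non-target hits in 0..max_outgroup_hits, return the
    signature with maximal in-group coverage as (coverage, signature).

    Ties are broken by taking the lexicographically smallest signature.
    Non-target hit counts for which no signature exists are omitted.
    """
    best: Dict[int, Tuple[int, str]] = {}
    for kmer, matched in index.items():
        ingroup = len(matched & group)    # sensitivity: target sequences hit
        outgroup = len(matched - group)   # specificity: non-target sequences hit
        if ingroup == 0 or outgroup > max_outgroup_hits:
            continue
        current = best.get(outgroup)
        if (current is None
                or ingroup > current[0]
                or (ingroup == current[0] and kmer < current[1])):
            best[outgroup] = (ingroup, kmer)
    return best


# Toy data (assumption): four short sequences forming two groups.
toy_seqs = {"s1": "ACGTAC", "s2": "ACGTTT", "s3": "GGGTAC", "s4": "GGGCCC"}
toy_index = build_match_index(toy_seqs, k=4)
print(best_signatures(toy_index, {"s1", "s2"}, max_outgroup_hits=1))
print(best_signatures(toy_index, {"s3", "s4"}, max_outgroup_hits=1))
```

In this toy dataset the group {s1, s2} has a perfect signature (`ACGT` covers both members with no non-target hits), whereas no 4-mer covers both members of {s3, s4}. That second case is the one where the abstract's relaxed search would report the signature with maximal coverage within the permitted range of non-target hits.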

AVAILABILITY:

Software and Supplementary Material are available at http://cassis.in.tum.de/.
Subject(s)

Full text: 1 Database: MEDLINE Main subject: Algorithms / Oligonucleotide Probes / Sequence Analysis, RNA / Sequence Analysis, DNA Study type: Diagnostic_studies / Prognostic_studies Language: En Year: 2011 Document type: Article
