Generalised sequence signatures through symbolic clustering.
Int J Data Min Bioinform; 4(6): 656-74, 2010.
Article in English | MEDLINE | ID: mdl-21355500
Traditionally, sequence motifs and domains are defined such that insertions, deletions and mismatched regions are small compared with matched regions. We introduce an algorithm for identifying Generalised Sequence Signatures (GSS), which can be composed of windows distributed throughout the sequence. Our approach is based on clustering analysis of recurring subsequences of a predefined length, which we refer to as symbols. Sequences are grouped so as to maximise the number of symbols they share. We show that using GSS to derive sequence annotations yields higher confidence values than using other signature recognition approaches.
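The abstract does not give the algorithm in detail, so the following is only a minimal illustrative sketch of the general idea, not the authors' published method. It assumes that symbols are k-mers (all length-k substrings), that a symbol counts as "recurring" when it occurs in at least two sequences, and that grouping is done by a simple greedy pairwise merge that keeps the symbols shared by every member as the cluster's signature. The function names `symbols`, `recurring_symbols` and `greedy_cluster` and the `min_shared` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def symbols(seq: str, k: int) -> set[str]:
    """Return the set of length-k subsequences ("symbols") of seq (assumed k-mer definition)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recurring_symbols(seqs: dict[str, str], k: int, min_seqs: int = 2) -> dict[str, set[str]]:
    """For each sequence, keep only symbols found in at least min_seqs sequences (assumption)."""
    per_seq = {name: symbols(s, k) for name, s in seqs.items()}
    counts = Counter(sym for syms in per_seq.values() for sym in syms)
    return {name: {x for x in syms if counts[x] >= min_seqs} for name, syms in per_seq.items()}


def greedy_cluster(sym_sets: dict[str, set[str]], min_shared: int) -> list[tuple[set[str], set[str]]]:
    """Hypothetical greedy grouping: repeatedly merge the pair of clusters sharing the most
    symbols, stopping when no pair shares at least min_shared. Each cluster is returned as
    (member names, signature), where the signature is the set of symbols common to all members."""
    clusters = [({name}, set(syms)) for name, syms in sym_sets.items()]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            shared = clusters[i][1] & clusters[j][1]
            if len(shared) >= min_shared and (best is None or len(shared) > len(best[2])):
                best = (i, j, shared)
        if best is None:
            return clusters
        i, j, shared = best
        merged = (clusters[i][0] | clusters[j][0], shared)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]


if __name__ == "__main__":
    # Toy sequences: "a" and "b" share windows at both ends, separated by a mismatched region.
    seqs = {"a": "MKVLAAGIWW", "b": "MKVLTTGIWW", "c": "PPQRSTPPQR"}
    for members, signature in greedy_cluster(recurring_symbols(seqs, k=3), min_shared=2):
        print(sorted(members), sorted(signature))
```

On the toy input, sequences `a` and `b` end up in one cluster whose signature (`MKV`, `KVL`, `GIW`, `IWW`) consists of windows at both ends of the sequences, while `c` stays on its own. This loosely mirrors the point that a signature need not be a single contiguous motif. A real implementation would need a more careful objective and a more scalable clustering procedure than this quadratic greedy loop.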
Collections: 01-international
Database: MEDLINE
Main subject: Algorithms / Gene Expression Profiling / Genomics
Language: English
Year of publication: 2010
Document type: Article