Efficient Markov clustering algorithm for protein sequence grouping.

Szilágyi, László; Szilágyi, Sándor M.

Annu Int Conf IEEE Eng Med Biol Soc; 2013: 639-42, 2013.

Article in English | MEDLINE | ID: mdl-24109768

ABSTRACT

In this paper we propose an efficient reformulation of the Markov clustering algorithm, suitable for fast and accurate grouping of protein sequences based on pairwise similarity information. The proposed modification optimally reorders the rows and columns of the similarity matrix after every iteration, transforming it into a matrix with several compact blocks along the diagonal and zero similarities outside the blocks. These blocks are treated separately in later iterations, which reduces the computational burden of the algorithm. The proposed algorithm was tested on protein sequence databases such as SCOP95. In terms of efficiency, it achieves a speed-up factor of 15 to 50 over conventional Markov clustering, depending on input data size and parameter settings. This improvement in computation time is obtained without any loss of partition accuracy. Convergence is usually reached in 40 to 50 iterations. Combining the proposed method with a sparse matrix representation and parallel execution is expected to make the solution significantly more efficient in the future.
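The block-splitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard Markov clustering step (expansion by matrix squaring, then elementwise inflation and pruning) and uses connected components of the nonzero pattern as a simple stand-in for the paper's optimal row and column reordering. The function names and the parameters `inflation`, `eps`, and `max_iter` are hypothetical choices made for illustration only.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def mcl_step(m, inflation, eps):
    """One iteration: expansion (m * m), inflation, pruning, renormalization."""
    n = len(m)
    expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    # Raise each entry to the inflation power; drop values that fall below eps.
    inflated = [[v ** inflation if v ** inflation >= eps else 0.0 for v in row]
                for row in expanded]
    return normalize_columns(inflated)


def connected_blocks(m):
    """Return index groups that share no nonzero entries with each other."""
    n = len(m)
    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if not seen[v] and (m[u][v] > 0 or m[v][u] > 0):
                    seen[v] = True
                    stack.append(v)
        blocks.append(sorted(comp))
    return blocks


def block_mcl(similarity, inflation=2.0, eps=1e-6, max_iter=100):
    """Markov clustering that splits the matrix into independent blocks.

    Each work item is (global indices, local column-stochastic matrix).
    After every step, a matrix is split along its disconnected blocks, so
    later iterations multiply several small matrices instead of one large one.
    """
    n = len(similarity)
    active = [(list(range(n)), normalize_columns(similarity))]
    clusters = []
    for _ in range(max_iter):
        if not active:
            break
        next_active = []
        for ids, m in active:
            new_m = mcl_step(m, inflation, eps)
            converged = all(abs(a - b) < eps
                            for ra, rb in zip(new_m, m)
                            for a, b in zip(ra, rb))
            for comp in connected_blocks(new_m):
                sub_ids = [ids[i] for i in comp]
                if converged or len(comp) == 1:
                    clusters.append(sub_ids)  # block is final
                else:
                    sub = [[new_m[i][j] for j in comp] for i in comp]
                    next_active.append((sub_ids, sub))
        active = next_active
    # Blocks that did not converge within max_iter are reported as they are.
    clusters.extend(ids for ids, _ in active)
    return sorted(clusters)
```

Calling `block_mcl(similarity)` on a symmetric similarity matrix with a nonzero diagonal returns a sorted list of clusters, each a list of row indices. Because matrix squaring costs on the order of n³ operations in this dense form, splitting one large matrix into several small blocks is where the saving would come from; the speed-up figures quoted in the abstract refer to the authors' method, not to this sketch.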

Subject(s)

Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Cluster Analysis; Databases, Protein; Markov Chains

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Sequence Analysis, Protein | Type of study: Health_economic_evaluation / Prognostic_studies | Language: En | Journal: Annu Int Conf IEEE Eng Med Biol Soc | Year: 2013 | Document type: Article