Search | BVS Regional Portal

A Parallel Multiobjective Metaheuristic for Multiple Sequence Alignment.

Rubio-Largo, Álvaro; Castelli, Mauro; Vanneschi, Leonardo; Vega-Rodríguez, Miguel A.

J Comput Biol; 25(9): 1009-1022, 2018 Sep.

Article in English | MEDLINE | ID: mdl-29671616

ABSTRACT

The simultaneous alignment of three or more nucleotide or amino acid sequences is known as multiple sequence alignment (MSA), a nondeterministic polynomial time (NP)-hard optimization problem. The time complexity of finding an optimal alignment rises exponentially as the number of sequences to align increases. In this work, we deal with a multiobjective version of the MSA problem wherein the goal is to simultaneously optimize the accuracy and conservation of the alignment. A parallel version of the hybrid multiobjective memetic metaheuristic for MSA is proposed. To evaluate the parallel performance of our proposal, we selected a pool of data sets with different numbers of sequences (up to 1000 sequences) and studied its parallel performance against other well-known parallel metaheuristics published in the literature, such as MSAProbs, tree-based consistency objective function for alignment evaluation (T-Coffee), Clustal Ω, and multiple alignment using fast Fourier transform (MAFFT). The comparative study reveals that our parallel aligner obtains better results than MSAProbs, T-Coffee, Clustal Ω, and MAFFT. In addition, on 32 cores the parallel version is around 25 times faster than the sequential version, which corresponds to an efficiency of around 80%.
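
The speedup and efficiency figures quoted above are related by the usual definition, efficiency = speedup / number of cores. The following minimal sketch only checks that arithmetic; the function name `parallel_efficiency` is an illustrative assumption and is not taken from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return parallel efficiency, defined as speedup divided by core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: roughly 25x faster on 32 cores.
efficiency = parallel_efficiency(25.0, 32)
print(f"{efficiency:.1%}")  # 78.1%, consistent with "around 80%"
```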

Subject(s)

Algorithms, Computational Biology/methods, Sequence Alignment/methods, Sequence Analysis, Protein/methods, Amino Acid Sequence, Humans

A Characteristic-Based Framework for Multiple Sequence Aligners.

Rubio-Largo, Alvaro; Vanneschi, Leonardo; Castelli, Mauro; Vega-Rodriguez, Miguel A.

IEEE Trans Cybern; 48(1): 41-51, 2018 Jan.

Article in English | MEDLINE | ID: mdl-27831898

ABSTRACT

Multiple sequence alignment is a well-known bioinformatics problem that consists of aligning three or more biological sequences (protein or nucleic acid). In the literature, a number of tools have been proposed for dealing with this biological sequence alignment problem, such as progressive methods, consistency-based methods, and iterative methods, among others. These aligners often use a default parameter configuration for all the input sequences to align. However, the default configuration is not always the best choice; the alignment accuracy of the tool may be highly boosted if specific parameter configurations are used, depending on the biological characteristics of the input sequences. In this paper, we propose a characteristic-based framework for multiple sequence aligners. The idea of the framework is, given an input set of unaligned sequences, to extract its characteristics and run the aligner with the best parameter configuration found for another set of unaligned sequences with similar characteristics. In order to test the framework, we have used the well-known multiple sequence comparison by log-expectation (MUSCLE) v3.8 aligner with different benchmarks, such as benchmark alignments database v3.0, protein reference alignment benchmark v4.0, and sequence alignment benchmark v1.65. The results show that the alignment accuracy and conservation of MUSCLE might be greatly improved with the proposed framework, especially in those scenarios with a low percentage of identity. The characteristic-based framework for multiple sequence aligners is freely available for downloading at http://arco.unex.es/arl/fwk-msa/cbf-msa.zip.
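
The lookup idea described above (extract characteristics, then reuse the configuration stored for the most similar known set) can be sketched as a nearest-neighbor search. This is only an illustration under stated assumptions: the three characteristics, the distance measure, and the `gap_open` parameter name are hypothetical and are not taken from the framework or from MUSCLE.

```python
import math


def extract_characteristics(seqs):
    """Hypothetical characteristics: sequence count, mean length, relative length spread."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    lengths = [len(s) for s in seqs]
    mean_len = sum(lengths) / len(lengths)
    spread = (max(lengths) - min(lengths)) / mean_len
    return (float(len(seqs)), mean_len, spread)


def nearest_configuration(query, knowledge_base):
    """Return the configuration stored with the characteristics closest to `query`.

    `knowledge_base` is a list of (characteristics, configuration) pairs.
    """
    def distance(a, b):
        # Relative differences, so that no single characteristic dominates (an assumption).
        return math.sqrt(sum(((x - y) / (abs(x) + abs(y) + 1e-9)) ** 2
                             for x, y in zip(a, b)))

    return min(knowledge_base, key=lambda entry: distance(query, entry[0]))[1]


# Hypothetical knowledge base: characteristics paired with the best configuration found earlier.
knowledge_base = [
    ((10.0, 100.0, 0.1), {"gap_open": -3.0}),
    ((500.0, 300.0, 0.8), {"gap_open": -1.0}),
]
query = extract_characteristics(["ACDE" * 25] * 11 + ["ACDE" * 27])
print(nearest_configuration(query, knowledge_base))
```

In a real setting the selected configuration would then be passed to the aligner; that step is omitted here because it depends on the aligner's actual options.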

Reducing Alignment Time Complexity of Ultra-Large Sets of Sequences.

Rubio-Largo, Álvaro; Vanneschi, Leonardo; Castelli, Mauro; Vega-Rodríguez, Miguel A.

J Comput Biol; 24(11): 1144-1154, 2017 Nov.

Article in English | MEDLINE | ID: mdl-28686466

ABSTRACT

The alignment of three or more protein or nucleotide sequences is known as the multiple sequence alignment problem. The complexity of this problem increases exponentially with the number of sequences; therefore, many of the current approaches published in the literature suffer a computational overhead when thousands of sequences have to be aligned. We introduce a new approach for dealing with ultra-large sets of sequences. A two-level clustering method is considered. The first level clusters the input sequences by using their biological composition, that is, the number of positive, negative, polar, special, and hydrophobic amino acids. In the second level, each cluster is divided into different clusters according to their similarity. Then, each cluster is aligned by using any method/aligner. After aligning the centroid sequences of each second-level cluster, we extrapolate the new gaps to each cluster of sequences to obtain the final alignment. We present a study on biological data with up to ~100,000 sequences, showing that the proposed approach is able to obtain accurate alignments in a reduced amount of time; for example, on data sets with >10,000 sequences, it reduces the runtime of the well-known Kalign by a factor of up to ~45.
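
The first-level feature described above, the count of amino acids per biological class, can be sketched as follows. The assignment of residues to the five classes is a common textbook grouping and is an assumption here, as is the simple grouping by dominant class, which merely stands in for whatever clustering method the paper actually uses.

```python
from collections import Counter

# Assumed grouping of the 20 standard amino acids (one-letter codes).
GROUPS = {
    "positive": set("RHK"),
    "negative": set("DE"),
    "polar": set("STNQ"),
    "special": set("CGP"),
    "hydrophobic": set("AVILMFYW"),
}


def composition(seq):
    """Return the per-class residue counts, in the order of GROUPS."""
    counts = Counter(seq.upper())
    return tuple(sum(counts[aa] for aa in members) for members in GROUPS.values())


def first_level_clusters(seqs):
    """Group sequences by their dominant class (a stand-in for real clustering)."""
    names = list(GROUPS)
    clusters = {}
    for seq in seqs:
        vector = composition(seq)
        dominant = names[vector.index(max(vector))]
        clusters.setdefault(dominant, []).append(seq)
    return clusters


print(composition("RKDESTCGAV"))  # (2, 2, 2, 2, 2)
print(first_level_clusters(["KKRA", "AVIL", "DDEK"]))
```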

Subject(s)

Algorithms, Computational Biology/methods, Sequence Alignment/methods, Sequence Analysis, DNA/methods, Sequence Analysis, Protein/methods, Humans

Finding Patterns in Protein Sequences by Using a Hybrid Multiobjective Teaching Learning Based Optimization Algorithm.

González-Álvarez, David L; Vega-Rodríguez, Miguel A; Rubio-Largo, Álvaro.

IEEE/ACM Trans Comput Biol Bioinform; 12(3): 656-66, 2015.

Article in English | MEDLINE | ID: mdl-26357276

ABSTRACT

Proteins are molecules that form the mass of living beings. These proteins exist in dissociated forms such as amino acids and carry out various biological functions; in fact, almost all body reactions occur with the participation of proteins. This is one of the reasons why the analysis of proteins has become a major issue in biology. More concretely, the identification of conserved patterns in a set of related protein sequences can provide relevant biological information about the functions of these proteins. In this paper, we present a novel algorithm based on teaching-learning-based optimization (TLBO) combined with a local search function specialized to predict common patterns in sets of protein sequences. This population-based evolutionary algorithm defines a group of individuals (solutions) that enhance their knowledge (quality) by means of different learning stages. Thus, if we correctly adapt it to the biological context of the mentioned problem, we can get an acceptable set of quality solutions. To evaluate the performance of the proposed technique, we have used six instances composed of different related protein sequences obtained from the PROSITE database. As we will see, the designed approach makes good predictions and improves the quality of the solutions found by other well-known biological tools.
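
The learning stages mentioned above correspond, in standard TLBO, to a teacher phase and a learner phase. The sketch below shows plain single-objective TLBO on a toy continuous function (the sphere function); it is not the hybrid multiobjective variant or the pattern encoding used in the paper, and all names and parameter values are illustrative assumptions.

```python
import random


def sphere(x):
    """Toy objective to minimize: sum of squares."""
    return sum(v * v for v in x)


def tlbo(objective, dim=5, pop_size=20, iterations=50, bounds=(-5.0, 5.0), seed=0):
    """Minimal single-objective TLBO; returns (best solution, best fitness, initial best fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return max(lo, min(hi, v))

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    initial_best = min(fit)

    for _ in range(iterations):
        # Teacher phase: move each learner toward the best individual, away from the mean.
        teacher = pop[fit.index(min(fit))][:]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # teaching factor
            cand = [clip(pop[i][d] + rng.random() * (teacher[d] - tf * mean[d]))
                    for d in range(dim)]
            f = objective(cand)
            if f < fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, f

        # Learner phase: each learner interacts with one random partner.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if fit[i] < fit[j] else -1.0  # move away from a worse partner, toward a better one
            cand = [clip(pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d]))
                    for d in range(dim)]
            f = objective(cand)
            if f < fit[i]:
                pop[i], fit[i] = cand, f

    best = fit.index(min(fit))
    return pop[best], fit[best], initial_best


solution, best_fitness, initial_best = tlbo(sphere)
print(initial_best, best_fitness)
```

Because a candidate replaces a learner only when it improves that learner's fitness, the best fitness in the population can never get worse from one iteration to the next.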

Subject(s)

Algorithms, Amino Acid Sequence, Computational Biology/methods, Proteins/chemistry, Sequence Analysis, Protein/methods, Databases, Protein
