Pesquisa | Portal Regional da BVS

A fast exact sequential algorithm for the partial digest problem.

Abbas, Mostafa M; Bahig, Hazem M.

BMC Bioinformatics ; 17(Suppl 19): 510, 2016 Dec 22.

Artigo em Inglês | MEDLINE | ID: mdl-28155644

RESUMO

BACKGROUND: Restriction site analysis involves determining the locations of restriction sites after the process of digestion by reconstructing their positions based on the lengths of the cut DNA. Using different reaction times with a single enzyme to cut DNA is a technique known as a partial digestion. Determining the exact locations of restriction sites following a partial digestion is challenging due to the computational time required even with the best known practical algorithm. RESULTS: In this paper, we introduce an efficient algorithm to find the exact solution for the partial digest problem. The algorithm is able to find all possible solutions for the input and works by traversing the solution tree with a breadth-first search in two stages and deleting all repeated subproblems. Two types of simulated data, random and Zhang, are used to measure the efficiency of the algorithm. We also apply the algorithm to real data for the Luciferase gene and the E. coli K12 genome. CONCLUSION: Our algorithm is a fast tool to find the exact solution for the partial digest problem. The percentage of improvement is more than 75% over the best known practical algorithm for the worst case. For large numbers of inputs, our algorithm is able to solve the problem in a suitable time, while the best known practical algorithm is unable.

Assuntos

Algoritmos , Biologia Computacional/métodos , DNA Bacteriano/genética , Escherichia coli/genética , Genoma Bacteriano , Luciferases/genética , Mapeamento por Restrição/métodos

A hybrid method for the exact planted (l, d) motif finding problem and its parallelization.

Abbas, Mostafa M; Abouelhoda, Mohamed; Bahig, Hazem M.

BMC Bioinformatics ; 13 Suppl 17: S10, 2012.

Artigo em Inglês | MEDLINE | ID: mdl-23281969

RESUMO

BACKGROUND: Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M , not necessary existing in any of the input sequences, such that for each sequence si, 1 ≤ i ≤ t, there is at least one subsequence differing with at most d mismatches from M. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d. RESULTS: In this paper we present a new efficient method to improve the performance of the exact algorithms for the motif finding problem. Our method is composed of two main steps: First, we process q ≤ t sequences to find candidate motifs. Second, the candidate motifs are searched in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates currently existing algorithms to achieve the best running time. In this paper, we show how the optimal value of q is determined to achieve the best running time. Our experimental results show that there is about 24% speed-up achieved by our method compared to the best existing algorithm. Furthermore, we also present a parallel version of our method running on shared memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the (21, 8) challenging instance using 8 processors in 20.42 hours instead of 6.68 days of the serial version. CONCLUSIONS: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.

Assuntos

Biologia Computacional/métodos , Motivos de Nucleotídeos , Análise de Sequência de DNA/métodos , Software

Experimental study of modified voting algorithm for planted (l,d)-motif problem.

Bahig, Hazem M; Abbas, Mostafa M; Bhery, Ashraf.

Adv Exp Med Biol ; 680: 65-73, 2010.

Artigo em Inglês | MEDLINE | ID: mdl-20865487

RESUMO

We consider the planted (l,d)-motif search problem, which consists of finding a substring of length l that occurs in each s ( i ) in a set of input sequences {s (1),,s ( t )} with at most d substitutions. In this paper, we study the effect of using Balla, Davila, and Rajasekaran strategy on voting algorithm practically. We call this technique, modified voting algorithm. We present an experimental study between original and modified voting algorithms on simulated data from (9,d) to (15,d). The comparison shows that the voting algorithm is faster than its modification in all instances except the instance (15,3). We also study the effect of increasing h, which is proposed by Balla, Davila, and Rajasekaran on the modified voting algorithm. From this study, we obtained the values of the number of sequences that make the running time of modified voting algorithm less than the voting algorithm and minimum. Finally, we analyze the experimental results and give some observations according to the relations: (1) l is fixed and d is variable. (2) l is variable and d is fixed. (3) l and d are variables. (4) (l,d) is challenging.

Assuntos

Algoritmos , DNA/genética , Motivos de Aminoácidos , Sequência de Bases , Sítios de Ligação/genética , Biologia Computacional , Bases de Dados de Ácidos Nucleicos , Reconhecimento Automatizado de Padrão , Ferramenta de Busca , Alinhamento de Sequência/estatística & dados numéricos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA