Search | Virtual Health Library (Biblioteca Virtual em Saúde)

CSA: comprehensive comparison of pairwise protein structure alignments.

Wohlers, Inken; Malod-Dognin, Noël; Andonov, Rumen; Klau, Gunnar W.

Nucleic Acids Res ; 40(Web Server issue): W303-9, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22553365

ABSTRACT

CSA is a web server for the computation, evaluation and comprehensive comparison of pairwise protein structure alignments. Its exact alignment engine computes either optimal, top-scoring alignments or heuristic alignments with a quality guarantee for the inter-residue distance-based scorings of contact map overlap, PAUL, DALI and MATRAS. These and additional, uploaded alignments are compared using a number of quality measures and intuitive visualizations. CSA brings new insight into the structural relationship of the protein pairs under investigation and is a valuable tool for studying structural similarities. It is available at http://csa.project.cwi.nl.

Subjects

Software , Protein Structural Homology , Algorithms , Calmodulin/chemistry , Internet

Global exact optimisations for chloroplast structural haplotype scaffolding.

Epain, Victor; Andonov, Rumen.

Algorithms Mol Biol ; 19(1): 5, 2024 Feb 06.

Article in English | MEDLINE | ID: mdl-38321522

ABSTRACT

BACKGROUND: Scaffolding is an intermediate stage of fragment assembly. It consists of orienting and ordering the contigs obtained by assembling the sequencing reads. In the general case, the problem has been widely studied using distance data between the contigs. Here we focus on dedicated scaffolding for chloroplast genomes. As these genomes are small and circular, with few specific repeats, numerous approaches have been proposed to assemble them. However, their specificities have not been sufficiently exploited. RESULTS: We give a new formulation of scaffolding for chloroplast genomes as a discrete optimisation problem, whose decision version we prove to be NP-complete. We take advantage of the knowledge of chloroplast genomes and succeed in expressing the relationships between a few specific genomic repeats as mathematical constraints. Our approach is independent of the distances and adopts a genomic-regions view, giving priority to scaffolding the repeats first. In this way, we encode the structural haplotype issue in order to retrieve several genome forms that coexist in the same chloroplast cell. To solve the optimisation problem exactly, we develop an integer linear program that we implement in the Python 3 package khloraascaf. We test it on synthetic data to investigate its performance and its robustness against several chosen difficulties. CONCLUSIONS: We succeed in modelling biological knowledge of genomic structures to scaffold chloroplast genomes. Our results suggest that modelling genomic regions is sufficient for scaffolding repeats and is suitable for finding several solutions corresponding to several genome forms.

Complete assembly of circular and chloroplast genomes based on global optimization.

Andonov, Rumen; Djidjev, Hristo; François, Sebastien; Lavenier, Dominique.

J Bioinform Comput Biol ; 17(3): 1950014, 2019 Jun.

Article in English | MEDLINE | ID: mdl-31288643

ABSTRACT

This paper focuses on the last two stages of genome assembly, namely, scaffolding and gap-filling, and shows that they can be solved as part of a single optimization problem. Our approach is based on modeling genome assembly as the problem of finding a simple path in a specific graph that satisfies as many distance constraints as possible, where the constraints encode the insert-size information. We formulate it as a mixed-integer linear programming (MILP) problem and apply an optimization solver to find the exact solutions on a benchmark of chloroplasts. We show that the presence of repetitions in the set of unitigs is the main reason for the existence of multiple equivalent solutions that are associated with alternative subpaths. We also describe two sufficient conditions and design efficient algorithms for identifying these subpaths. Comparisons of the results achieved by our tool with those obtained with recent assemblers are presented.
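The idea of choosing an ordering that satisfies as many distance constraints as possible can be illustrated on a toy instance. The sketch below is only an illustration under strong assumptions: it enumerates all orderings by brute force instead of solving a MILP, it ignores unitig orientation, repeats and gap-filling, and the names `satisfied`, `best_order` and the `tolerance` parameter are hypothetical, not taken from the paper or its tool.

```python
from itertools import permutations


def satisfied(order, lengths, constraints, tolerance=50):
    """Count distance constraints met by a left-to-right ordering of unitigs.

    `constraints` holds (a, b, expected) triples: the start of `b` is expected
    to lie `expected` bases after the start of `a` (assumed encoding).
    """
    start, position = {}, 0
    for unitig in order:
        start[unitig] = position
        position += lengths[unitig]
    return sum(
        1
        for a, b, expected in constraints
        if abs((start[b] - start[a]) - expected) <= tolerance
    )


def best_order(lengths, constraints, tolerance=50):
    """Brute-force search; only feasible for a handful of unitigs."""
    return max(
        permutations(lengths),
        key=lambda order: satisfied(order, lengths, constraints, tolerance),
    )


# Toy data (invented for illustration only).
lengths = {"u1": 100, "u2": 200, "u3": 150}
constraints = [("u1", "u2", 100), ("u2", "u3", 200), ("u1", "u3", 300)]
order = best_order(lengths, constraints)
```

On this toy instance a single ordering satisfies all three constraints. With repeated unitigs, several orderings can reach the same count, which corresponds to the multiple equivalent solutions mentioned in the abstract.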

Subjects

Algorithms , Chloroplast Genome , Contig Mapping/methods , Plant Genome , Genetic Models

GenoFrag: software to design primers optimized for whole genome scanning by long-range PCR amplification.

Ben Zakour, Nouri; Gautier, Michel; Andonov, Rumen; Lavenier, Dominique; Cochet, Marie-Françoise; Veber, Philippe; Sorokin, Alexei; Le Loir, Yves.

Nucleic Acids Res ; 32(1): 17-24, 2004.

Article in English | MEDLINE | ID: mdl-14704339

ABSTRACT

Genome sequence data can be used to analyze genome plasticity by whole genome PCR scanning. Small chromosomes can be fully amplified by long-range PCR with a set of primers designed using a reference strain and applied to several other strains. Analysis of the resulting patterns can reveal the genome plasticity. To facilitate such analysis, we have developed GenoFrag, a software package for the design of primers optimized for whole genome scanning by long-range PCR. GenoFrag was developed for the analysis of Staphylococcus aureus genome plasticity by whole genome amplification in fragments approximately 10 kb long. A set of primers was generated from the genome sequence of S. aureus N315, employed here as a reference strain. Two subsets of primers were successfully used to amplify two portions of the N315 chromosome. This experimental validation demonstrates that GenoFrag is a robust and reliable tool for primer design and that whole genome PCR scanning can be envisaged for the analysis of genome diversity in S. aureus, one of the major public health concerns worldwide.

Subjects

DNA Primers/genetics , Genome , Polymerase Chain Reaction/methods , Software , Base Pairing , Bacterial Chromosomes/genetics , DNA Primers/chemistry , Bacterial Genome , Genomics/methods , Nucleic Acid Hybridization , Sensitivity and Specificity , Staphylococcus aureus/classification , Staphylococcus aureus/genetics , Thermodynamics

DALIX: optimal DALI protein structure alignment.

Wohlers, Inken; Andonov, Rumen; Klau, Gunnar W.

IEEE/ACM Trans Comput Biol Bioinform ; 10(1): 26-36, 2013.

Article in English | MEDLINE | ID: mdl-23702541

ABSTRACT

We present a mathematical model and exact algorithm for optimally aligning protein structures using the DALI scoring model. This scoring model is based on comparing the interresidue distance matrices of proteins and is used in the popular DALI software tool, a heuristic method for protein structure alignment. Our model and algorithm extend an integer linear programming approach that has previously been applied to the related, but simpler, contact map overlap problem. To this end, we introduce a novel type of constraint that handles negative score values and relax it in a Lagrangian fashion. The new algorithm, which we call DALIX, is applicable to any distance matrix-based scoring scheme. We also review options that allow fewer pairs of interresidue distances to be considered explicitly, because their large number hinders the optimization process. Using four known data sets of varying structural similarity, we compute many provably score-optimal DALI alignments. This allowed us, for the first time, to evaluate the DALI heuristic in sound mathematical terms. The results indicate that DALI usually computes optimal or close to optimal alignments. However, we detect a subset of small proteins for which DALI fails to generate any significant alignment, although such alignments do exist.
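To make the notion of a distance matrix-based score concrete, the following sketch evaluates a DALI-style elastic score for a given alignment. It is an assumption-laden illustration, not the DALIX algorithm: the formula follows the commonly cited elastic similarity term (threshold 0.2, Gaussian envelope with a 20 Å scale), the function names are hypothetical, and no optimization over alignments is performed.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def dali_style_score(dist_a, dist_b, alignment, theta=0.2, alpha=20.0):
    """Score an alignment given as a list of (i, k) residue pairs.

    Each pair of aligned pairs contributes (theta - relative deviation) damped
    by exp(-(mean distance / alpha)^2); theta and alpha are assumed defaults.
    """
    score = 0.0
    for i, k in alignment:
        for j, l in alignment:
            if i == j:
                score += theta  # diagonal term
                continue
            mean = (dist_a[i][j] + dist_b[k][l]) / 2.0
            if mean == 0.0:
                continue  # coincident residues carry no distance information
            deviation = abs(dist_a[i][j] - dist_b[k][l]) / mean
            score += (theta - deviation) * math.exp(-((mean / alpha) ** 2))
    return score


# Toy structures (invented): a short chain and the same chain stretched twofold.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
stretched = [(2 * x, y, z) for x, y, z in chain]
identity = [(0, 0), (1, 1), (2, 2)]

same = dali_style_score(distance_matrix(chain), distance_matrix(chain), identity)
distorted = dali_style_score(distance_matrix(chain), distance_matrix(stretched), identity)
```

Aligning the chain with itself gives a positive score, while the stretched copy gives a negative one because every relative deviation exceeds the threshold. Negative contributions of this kind are what the abstract's dedicated constraint is said to handle.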

Subjects

Algorithms , Computational Biology/methods , Chemical Models , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Protein Databases , Molecular Models

Maximum contact map overlap revisited.

Andonov, Rumen; Malod-Dognin, Noël; Yanev, Nicola.

J Comput Biol ; 18(1): 27-41, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21210730

ABSTRACT

Among the measures for quantifying the similarity between three-dimensional (3D) protein structures, maximum contact map overlap (CMO) has received sustained attention during the past decade. Despite this, the known algorithms exhibit modest performance and are not applicable to large-scale comparison. This article offers a clear advance in this respect. We present a new integer programming model for CMO and propose an exact branch-and-bound algorithm with bounds obtained by a novel Lagrangian relaxation. The efficiency of the approach is demonstrated on a popular small benchmark (Skolnick set, 40 domains). On this set, our algorithm significantly outperforms the best existing exact algorithms. Many hard CMO instances have been solved for the first time. To further assess our approach, we constructed a large-scale set of 300 protein domains. Computing the similarity measure for each of the 44850 pairs, we obtained a classification in excellent agreement with SCOP. Supplementary Material is available at www.liebertonline.com/cmb.
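The quantity being maximized can be illustrated by counting the contacts shared under one fixed alignment. The sketch below is a minimal illustration, not the branch-and-bound algorithm of the article: the 7.5 Å contact threshold, the function names and the toy coordinates are assumptions, and the search for the alignment that maximizes the overlap is omitted.

```python
from itertools import combinations
import math


def contact_map(coords, threshold=7.5):
    """Set of residue index pairs (i, j), i < j, whose distance is within `threshold`."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= threshold
    }


def overlap(contacts_a, contacts_b, alignment):
    """Number of contacts of A mapped onto contacts of B.

    `alignment` maps residue indices of A to residue indices of B and is
    assumed to be order-preserving and one-to-one.
    """
    count = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            k, l = sorted((alignment[i], alignment[j]))
            if (k, l) in contacts_b:
                count += 1
    return count


# Toy coordinates (invented): two straight chains with 3 Å spacing.
protein_a = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (9, 0, 0)]
protein_b = [(0, 0, 0), (3, 0, 0), (6, 0, 0)]
shared = overlap(contact_map(protein_a), contact_map(protein_b), {0: 0, 1: 1, 2: 2})

# All-against-all comparison of 300 domains requires C(300, 2) pairwise runs.
pair_count = math.comb(300, 2)
```

The last line reproduces the number of pairs quoted in the abstract: 300 domains give 300 × 299 / 2 = 44850 unordered pairs.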

Subjects

Algorithms , Molecular Models , Protein Conformation , Proteins/chemistry , Computer Simulation , Protein Databases , Protein Structural Homology
