1.
Nucleic Acids Res ; 40(Web Server issue): W303-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553365

ABSTRACT

CSA is a web server for the computation, evaluation and comprehensive comparison of pairwise protein structure alignments. Its exact alignment engine computes either optimal, top-scoring alignments or heuristic alignments with a quality guarantee for the inter-residue distance-based scoring schemes of contact map overlap, PAUL, DALI and MATRAS. These alignments, together with any additional uploaded alignments, are compared using a number of quality measures and intuitive visualizations. CSA brings new insight into the structural relationship of the protein pairs under investigation and is a valuable tool for studying structural similarities. It is available at http://csa.project.cwi.nl.


Subject(s)
Software; Structural Homology, Protein; Algorithms; Calmodulin/chemistry; Internet
2.
Algorithms Mol Biol ; 19(1): 5, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38321522

ABSTRACT

BACKGROUND: Scaffolding is an intermediate stage of fragment assembly. It consists of orienting and ordering the contigs obtained by assembling the sequencing reads. In the general case, the problem has been widely studied using distance data between the contigs. Here we focus on scaffolding dedicated to chloroplast genomes. As these genomes are small and circular, with few specific repeats, numerous approaches have been proposed to assemble them. However, their specificities have not been sufficiently exploited. RESULTS: We give a new formulation of scaffolding for chloroplast genomes as a discrete optimisation problem, whose decision version we prove to be [Formula: see text]-Complete. We take advantage of existing knowledge of chloroplast genomes and succeed in expressing the relationships between a few specific genomic repeats as mathematical constraints. Our approach is independent of the distances and adopts a genomic-regions view, giving priority to scaffolding the repeats first. In this way, we encode the structural haplotype issue in order to retrieve several genome forms that coexist in the same chloroplast cell. To solve the optimisation problem exactly, we develop an integer linear program that we implement in the Python3 package khloraascaf. We test it on synthetic data to investigate its performance and its robustness against several chosen difficulties. CONCLUSIONS: We succeed in modelling biological knowledge of genomic structures to scaffold chloroplast genomes. Our results suggest that modelling genomic regions is sufficient for scaffolding repeats and is suitable for finding several solutions corresponding to several genome forms.

3.
J Bioinform Comput Biol ; 17(3): 1950014, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31288643

ABSTRACT

This paper focuses on the last two stages of genome assembly, namely scaffolding and gap-filling, and shows that they can be solved as part of a single optimization problem. Our approach is based on modeling genome assembly as the problem of finding a simple path in a specific graph that satisfies as many distance constraints as possible, where the constraints encode the insert-size information. We formulate it as a mixed-integer linear programming (MILP) problem and apply an optimization solver to find exact solutions on a benchmark of chloroplasts. We show that the presence of repetitions in the set of unitigs is the main reason for the existence of multiple equivalent solutions that are associated with alternative subpaths. We also describe two sufficient conditions and design efficient algorithms for identifying these subpaths. Comparisons of the results achieved by our tool with those obtained with recent assemblers are presented.


Subject(s)
Algorithms; Genome, Chloroplast; Contig Mapping/methods; Genome, Plant; Models, Genetic
4.
Nucleic Acids Res ; 32(1): 17-24, 2004.
Article in English | MEDLINE | ID: mdl-14704339

ABSTRACT

Genome sequence data can be used to analyze genome plasticity by whole genome PCR scanning. Small chromosomes can indeed be fully amplified by long-range PCR with a set of primers designed using a reference strain and applied to several other strains. Analysis of the resulting patterns can reveal the genome plasticity. To facilitate such analysis, we have developed GenoFrag, a software package for the design of primers optimized for whole genome scanning by long-range PCR. GenoFrag was developed for the analysis of Staphylococcus aureus genome plasticity by whole genome amplification in fragments approximately 10 kb long. A set of primers was generated from the genome sequence of S. aureus N315, employed here as a reference strain. Two subsets of primers were successfully used to amplify two portions of the N315 chromosome. This experimental validation demonstrates that GenoFrag is a robust and reliable tool for primer design and that whole genome PCR scanning can be envisaged for the analysis of genome diversity in S. aureus, one of the major public health concerns worldwide.


Subject(s)
DNA Primers/genetics; Genome; Polymerase Chain Reaction/methods; Software; Base Pairing; Chromosomes, Bacterial/genetics; DNA Primers/chemistry; Genome, Bacterial; Genomics/methods; Nucleic Acid Hybridization; Sensitivity and Specificity; Staphylococcus aureus/classification; Staphylococcus aureus/genetics; Thermodynamics
5.
Article in English | MEDLINE | ID: mdl-23702541

ABSTRACT

We present a mathematical model and exact algorithm for optimally aligning protein structures using the DALI scoring model. This scoring model is based on comparing the interresidue distance matrices of proteins and is used in the popular DALI software tool, a heuristic method for protein structure alignment. Our model and algorithm extend an integer linear programming approach that has previously been applied to the related, but simpler, contact map overlap problem. To this end, we introduce a novel type of constraint that handles negative score values and relax it in a Lagrangian fashion. The new algorithm, which we call DALIX, is applicable to any distance matrix-based scoring scheme. We also review options that allow fewer pairs of interresidue distances to be considered explicitly, because their large number hinders the optimization process. Using four known data sets of varying structural similarity, we compute many provably score-optimal DALI alignments. This allowed us, for the first time, to evaluate the DALI heuristic in sound mathematical terms. The results indicate that DALI usually computes optimal or close to optimal alignments. However, we detect a subset of small proteins for which DALI fails to generate any significant alignment, although such alignments do exist.


Subject(s)
Algorithms; Computational Biology/methods; Models, Chemical; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Databases, Protein; Models, Molecular
6.
J Comput Biol ; 18(1): 27-41, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21210730

ABSTRACT

Among the measures for quantifying the similarity between three-dimensional (3D) protein structures, maximum contact map overlap (CMO) has received sustained attention during the past decade. Despite this, the known algorithms exhibit modest performance and are not applicable to large-scale comparison. This article offers a clear advance in this respect. We present a new integer programming model for CMO and propose an exact branch-and-bound algorithm with bounds obtained by a novel Lagrangian relaxation. The efficiency of the approach is demonstrated on a popular small benchmark (Skolnick set, 40 domains). On this set, our algorithm significantly outperforms the best existing exact algorithms. Many hard CMO instances have been solved for the first time. To further assess our approach, we constructed a large-scale set of 300 protein domains. Computing the similarity measure for each of the 44850 pairs, we obtained a classification in excellent agreement with SCOP. Supplementary Material is available at www.liebertonline.com/cmb.


Subject(s)
Algorithms; Models, Molecular; Protein Conformation; Proteins/chemistry; Computer Simulation; Databases, Protein; Structural Homology, Protein
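
Note on contact map overlap (entries 1, 5 and 6): the abstracts name the CMO measure without spelling it out. The Python sketch below is a minimal illustration and is not the algorithm from any of the listed papers. It only scores one given alignment by counting the contacts of structure A that map onto contacts of structure B. Maximum CMO, the problem addressed in entry 6, is the search for the alignment that maximises this count. The function name, the toy contact sets and the toy alignment are all hypothetical. The last line checks the pair count quoted in entry 6: 300 domains give 300 * 299 / 2 = 44850 unordered pairs.

```python
from math import comb


def contact_map_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A that are mapped onto contacts of B.

    contacts_a, contacts_b: sets of residue-index pairs (i, j) with i < j.
    alignment: dict mapping residue indices of A to residue indices of B
               (assumed one-to-one and order-preserving).
    """
    shared = 0
    for i, j in contacts_a:
        # Both endpoints of the contact must be aligned to count.
        if i in alignment and j in alignment:
            k, l = alignment[i], alignment[j]
            if (min(k, l), max(k, l)) in contacts_b:
                shared += 1
    return shared


# Hypothetical toy data: three contacts per structure.
contacts_a = {(0, 2), (1, 3), (2, 4)}
contacts_b = {(0, 2), (1, 4), (3, 5)}
alignment = {0: 0, 1: 1, 2: 2, 3: 4, 4: 5}

overlap = contact_map_overlap(contacts_a, contacts_b, alignment)
print(overlap)  # (0,2)->(0,2) and (1,3)->(1,4) are shared; (2,4)->(2,5) is not

# Number of unordered pairs among 300 domains, as quoted in entry 6.
print(comb(300, 2))
```

Scoring a fixed alignment this way takes time linear in the number of contacts. The hard part is the search over alignments, which is why entry 6 turns to integer programming and Lagrangian relaxation.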