Results 1 - 20 of 22
1.
RNA Biol ; 13(4): 391-9, 2016.
Article in English | MEDLINE | ID: mdl-26488198

ABSTRACT

The 5S rDNA gene, which encodes a non-coding RNA, can be found in 2 copies (type I and type II) in bony and cartilaginous fish. Previous studies have pointed out that the type II gene is a paralog derived from type I. We analyzed the molecular organization of 5S rDNA type II in elasmobranchs. Although the structure of the 5S rDNA is supposed to be highly conserved, our results show that the secondary structure in this group possesses some variability and differs from the consensus secondary structure. One of these differences in Selachii is an internal loop at nucleotides 7 and 112. These mutations observed in the transcribed region suggest an independent origin of the gene among Batoids and Selachii. All promoters were highly conserved with the exception of BoxA, possibly due to its affinity to polymerase III. This latter enzyme recognizes a dT4 sequence as a stop signal; however, in Rajiformes this signal was doubled in length to dT8. This could be an adaptation toward a higher efficiency in the termination process. Our results suggest that there is no TATA box in elasmobranchs in the NTS region. We also provide some evidence suggesting that the complexity of the microsatellites present in the NTS region plays an important role in the 5S rRNA gene, since it is significantly correlated with the length of the NTS.
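
The dT4 versus dT8 termination signal described above can be illustrated with a short scan for runs of consecutive T residues. This is only a minimal sketch: the function names and the two example sequences are invented for illustration and are not taken from the study.

```python
import re


def longest_t_run(seq: str) -> int:
    """Return the length of the longest uninterrupted run of T in a DNA string."""
    runs = re.findall(r"T+", seq.upper())
    return max((len(run) for run in runs), default=0)


def has_terminator(seq: str, min_len: int = 4) -> bool:
    """True if the sequence contains a poly-T run of at least min_len residues."""
    return longest_t_run(seq) >= min_len


# Hypothetical downstream sequences, invented for illustration only.
short_signal = "GCA" + "T" * 4 + "GCA"  # carries a dT4 run
long_signal = "GCA" + "T" * 8 + "GCA"   # carries a dT8 run

print(longest_t_run(short_signal), longest_t_run(long_signal))
```

Raising `min_len` to 8 separates the two hypothetical sequences, which mirrors the distinction the abstract draws for Rajiformes.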


Subject(s)
Elasmobranchii/genetics , RNA, Ribosomal, 5S/genetics , Animals , Mutation , Nucleic Acid Conformation , RNA, Ribosomal, 5S/chemistry , Species Specificity , Terminator Regions, Genetic , Transcription, Genetic
2.
BMC Struct Biol ; 13: 20, 2013 Oct 16.
Article in English | MEDLINE | ID: mdl-24131821

ABSTRACT

BACKGROUND: Assessing protein modularity is important to understand protein evolution. Still, the question of the existence of a sub-domain modular architecture remains. We propose a graph-theory approach with significance and power testing to identify modules in protein structures. In the first step, clusters are determined by optimizing the partition that maximizes the modularity score. Second, each cluster is tested for significance. Significant clusters are referred to as modules. Evolutionary modules are identified by analyzing homologous structures. Dynamic modules are inferred from sets of snapshots of molecular simulations. We present here a methodology that identifies sub-domain architecture in a way that is robust, biologically meaningful, and statistically supported. RESULTS: The robustness of this new method is tested using simulated data with known modularity. Modules are correctly identified even when there is a low correlation between landmarks within a module. We also analyzed the evolutionary modularity of a data set of α-amylase catalytic domain homologs, and the dynamic modularity of the Niemann-Pick C1 (NPC1) protein N-terminal domain. The α-amylase contains an (α/β)8 barrel (TIM barrel) with the polysaccharide cleavage site and a calcium-binding domain. In this data set we identified four robust evolutionary modules, one of which forms the minimal functional TIM barrel topology. The NPC1 protein is involved in intracellular lipid metabolism, coordinating sterol trafficking. The NPC1 N-terminus is the first luminal domain, which binds to cholesterol and its oxygenated derivatives. Our inferred dynamic modules in the protein NPC1 are also shown to match functional components of the protein related to the NPC1 disease. CONCLUSIONS: A domain compartmentalization can be found and described in correlation space. To our knowledge, there is no other method attempting to identify sub-domain architecture from the correlation among residues. Most attempts made focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservancy. We were able to describe functional/structural sub-domain architecture related to key residues for starch cleavage, calcium, and chloride binding sites in the α-amylase, and sterol opening-defining modules and disease-related residues in the NPC1. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.
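
The abstract does not say which modularity score is optimized. As a rough illustration, assuming the widely used Newman-Girvan modularity for an unweighted graph, the score of a candidate partition could be computed as follows. The function name, the toy graph, and the partitions are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Uses Q = sum over communities of [L_c / m - (d_c / 2m)^2], where L_c is the
    number of edges inside community c and d_c is the summed degree of its nodes.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)      # edges with both ends in the same community
    total_degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Toy graph (invented): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}

print(modularity(edges, two_modules), modularity(edges, one_module))
```

Splitting the toy graph into its two triangles gives a higher score than keeping it as one cluster, which is the kind of comparison a partition optimizer would make. The significance test described in the abstract is not part of this sketch.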


Subject(s)
Protein Structure, Tertiary , Proteins/chemistry , Amino Acid Sequence , Animals , Binding Sites , Carrier Proteins/chemistry , Carrier Proteins/metabolism , Catalytic Domain , Cholesterol/metabolism , Evolution, Molecular , Humans , Models, Chemical , Models, Molecular , Molecular Dynamics Simulation , Protein Binding , Proteins/metabolism , Sequence Homology, Amino Acid , alpha-Amylases/chemistry , alpha-Amylases/metabolism
3.
Genome Res ; 19(10): 1896-904, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19635847

ABSTRACT

The increasing availability of genetic sequence data associated with explicit geographic and ecological information is offering new opportunities to study the processes that shape biodiversity. The generation and testing of hypotheses using these data sets requires effective tools for mathematical and visual analysis that can integrate digital maps, ecological data, and large genetic, genomic, or metagenomic data sets. GenGIS is a free and open-source software package that supports the integration of digital map data with genetic sequences and environmental information from multiple sample sites. Essential bioinformatic and statistical tools are integrated into the software, allowing the user a wide range of analysis options for their sequence data. Data visualizations are combined with the cartographic display to yield a clear view of the relationship between geography and genomic diversity, with a particular focus on the hierarchical clustering of sites based on their similarity or phylogenetic proximity. Here we outline the features of GenGIS and demonstrate its application to georeferenced microbial metagenomic, HIV-1, and human mitochondrial DNA data sets.


Subject(s)
Databases, Genetic , Genomics/methods , Geographic Information Systems , Software , Africa , Biodiversity , Classification , DNA, Mitochondrial/analysis , DNA, Mitochondrial/genetics , Genetic Variation , HIV-1/classification , HIV-1/genetics , HIV-1/metabolism , Humans , Oceans and Seas , Phylogeny , Specimen Handling/methods
4.
J Biol Chem ; 285(12): 8605-14, 2010 Mar 19.
Article in English | MEDLINE | ID: mdl-20083605

ABSTRACT

Bacterial acyl carrier protein (ACP) is essential for the synthesis of fatty acids and serves as the major acyl donor for the formation of phospholipids and other lipid products. Acyl-ACP encloses attached fatty acyl groups in a hydrophobic pocket within a four-helix bundle, but must at least partially unfold to present the acyl chain to the active sites of its multiple enzyme partners. To further examine the constraints of ACP structure and function, we have constructed a cyclic version of Vibrio harveyi ACP, using split-intein technology to covalently join its closely apposed N and C termini. Cyclization stabilized ACP in a folded helical conformation as indicated by gel electrophoresis, circular dichroism, fluorescence, and mass spectrometry. Molecular dynamics simulations also indicated overall decreased polypeptide chain mobility in cyclic ACP, although no major conformational rearrangements over a 10-ns period were noted. In vivo complementation assays revealed that cyclic ACP can functionally replace the linear wild-type protein and support growth of an Escherichia coli ACP-null mutant strain. Cyclization of a folding-deficient ACP mutant (F50A) both restored its ability to adopt a folded conformation and enhanced complementation of growth. Our results thus suggest that ACP must be able to adopt a folded conformation for biological activity, and that its function does not require complete unfolding of the protein.


Subject(s)
Acyl Carrier Protein/chemistry , Inteins , Circular Dichroism , Escherichia coli/metabolism , Genetic Complementation Test , Models, Molecular , Molecular Conformation , Mutation , Phospholipids/chemistry , Protein Conformation , Protein Denaturation , Protein Folding , Protein Structure, Secondary , Tandem Mass Spectrometry/methods , Vibrio/metabolism
5.
Bioinformatics ; 25(23): 3093-8, 2009 Dec 01.
Article in English | MEDLINE | ID: mdl-19770262

ABSTRACT

MOTIVATION: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually remove them. Although considered necessary in some cases, manual editing is time-consuming and not reproducible. We present here an automated editing method based on the classification of 'valid' and 'invalid' sites. RESULTS: A support vector machine (SVM) classifier is trained to reproduce the decisions made during manual editing with an accuracy of 95.0%. This implies that manual editing can be made reproducible and applied to large-scale analyses. We further demonstrate that it is possible to retrain/extend the training of the classifier by providing examples of multiple sequence alignment (MSA) annotation. Near-optimal training can be achieved with only 1000 annotated sites, or roughly three samples of protein sequence alignments. AVAILABILITY: This method is implemented in the software MANUEL, licensed under the GPL. A web-based application for single and batch jobs is available at http://fester.cs.dal.ca/manuel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
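
The abstract does not list the features used to describe alignment sites. As a hedged sketch, two simple per-column descriptors that a site classifier of this kind could plausibly take as input are the gap fraction and the Shannon entropy of the residues. These feature choices, the function names, and the toy alignment are assumptions for illustration and do not describe MANUEL's actual implementation.

```python
import math
from collections import Counter


def shannon_entropy(residues):
    """Shannon entropy (bits) of the residue distribution; 0.0 for an empty list."""
    if not residues:
        return 0.0
    total = len(residues)
    probs = [n / total for n in Counter(residues).values()]
    return sum(-p * math.log2(p) for p in probs)


def column_features(alignment):
    """Return one (gap_fraction, entropy) tuple per column of an alignment.

    alignment: list of equal-length aligned strings using '-' for gaps.
    """
    n_seqs = len(alignment)
    features = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        gap_fraction = 1 - len(residues) / n_seqs
        features.append((gap_fraction, shannon_entropy(residues)))
    return features


# Toy alignment, invented for illustration only.
toy_alignment = ["MKV-", "MRV-", "MKLA", "MK-A"]
site_features = column_features(toy_alignment)
for index, (gaps, entropy) in enumerate(site_features):
    print(index, round(gaps, 2), round(entropy, 3))
```

Feature vectors like these, paired with 'valid' or 'invalid' labels from manual editing, are the form of input that a supervised classifier such as an SVM expects.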


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Databases, Protein , Proteins/chemistry , Software
6.
J Chem Inf Model ; 50(12): 2213-20, 2010 Dec 27.
Article in English | MEDLINE | ID: mdl-21090591

ABSTRACT

Although the α-helical secondary structure of proteins is well-defined, the exact causes and structures of helical kinks are not. This is especially important for transmembrane (TM) helices of integral membrane proteins, many of which contain kinks providing functional diversity despite predominantly helical structure. We have developed a Monte Carlo method based algorithm, MC-HELAN, to determine helical axes alongside positions and angles of helical kinks. Analysis of all nonredundant high-resolution α-helical membrane protein structures (842 TM helices from 205 polypeptide chains) revealed kinks in 64% of TM helices, demonstrating that a significantly greater proportion of TM helices are kinked than those indicated by previous analyses. The residue proline is over-represented by a factor >5 if it is two or three residues C-terminal to a bend. Prolines also cause kinks with larger kink angles than other residues. However, only 33% of TM kinks are in proximity to a proline. Machine learning techniques were used to test for sequence-based predictors of kinks. Although kinks are somewhat predicted by sequence, kink formation appears to be driven predominantly by other factors. This study provides an improved view of the prevalence and architecture of kinks in helical membrane proteins and highlights the fundamental inaccuracy of the typical topological depiction of helical membrane proteins as series of ideal helices.
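
A kink angle can be thought of as the angle between the axes of the two helical segments on either side of the kink. The sketch below computes only that angle from two direction vectors that are assumed to be already known. It does not reproduce MC-HELAN's Monte Carlo axis fitting, and the function name and example vectors are hypothetical.

```python
import math


def kink_angle(axis_a, axis_b):
    """Angle in degrees between two 3D helix-axis direction vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


# Invented axis directions for the segments before and after a kink.
before_kink = (0.0, 0.0, 1.0)
after_kink = (0.0, 1.0, 1.0)

print(round(kink_angle(before_kink, after_kink), 1))
```

An ideal, unkinked helix would give an angle near zero between consecutive segments. Larger angles correspond to the bends that the abstract reports in 64% of TM helices.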


Subject(s)
Algorithms , Computational Biology/methods , Membrane Proteins/chemistry , Databases, Protein , Internet , Models, Molecular , Protein Structure, Secondary
7.
PLoS One ; 13(4): e0196135, 2018.
Article in English | MEDLINE | ID: mdl-29698417

ABSTRACT

The Glycoside Hydrolase Family 13 (GH13) is both evolutionarily diverse and relevant to many industrial applications. Its members hydrolyze starch into smaller carbohydrates and members of the family have been bioengineered to improve catalytic function under industrial environments. We introduce a framework to analyze the response to selection of GH13 protein structures given some phylogenetic and simulated dynamic information. We find that the TIM-barrel (a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone, common to all amylases) is not selectable since it is under purifying selection. We also show a method to rank important residues with higher inferred response to selection. These residues can be altered to effect change in properties. In this work, we define fitness as inferred thermodynamic stability. We show that under the developed framework, residues 112Y, 122K, 124D, 125W, and 126P are good candidates to increase the stability of the truncated α-amylase protein from Geobacillus thermoleovorans (PDB code: 4E2O; α-1,4-glucan-4-glucanohydrolase; EC 3.2.1.1). Overall, this paper demonstrates the feasibility of a framework for the analysis of protein structures for any other fitness landscape.


Subject(s)
Glycoside Hydrolases/chemistry , Databases, Protein , Geobacillus/enzymology , Glycoside Hydrolases/classification , Glycoside Hydrolases/metabolism , Molecular Dynamics Simulation , Phylogeny , Protein Conformation , Thermodynamics , alpha-Amylases/chemistry , alpha-Amylases/metabolism
8.
BMC Bioinformatics ; 8: 444, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-18005425

ABSTRACT

BACKGROUND: In protein evolution, the mechanism of the emergence of novel protein domains is still an open question. The incremental growth of protein variable regions, produced by stochastic insertions, has the potential to generate large and complex sub-structures. In this study, a deterministic methodology is proposed to reconstruct phylogenies from protein structures, and to infer insertion events in protein evolution. The analysis was performed on a broad range of SCOP domain families. RESULTS: Phylogenies were reconstructed from protein 3D structural data. The phylogenetic trees were used to infer ancestral structures with a consensus method. From these ancestral reconstructions, 42.7% of the observed insertions are nested insertions, which are located in previous insert regions. The average size of inserts tends to increase with the insert rank or total number of insertions in the variable regions. We found that the structures of some nested inserts show complex or even domain-like fold patterns with helices, strands and loops. Furthermore, a basal level of structural innovation was found in inserts which displayed a significant structural similarity exclusively to themselves. The beta-Lactamase/D-ala carboxypeptidase domain family is provided as an example to illustrate the inference of insertion events, and how the incremental growth of a variable region is capable of generating novel structural patterns. CONCLUSION: Using 3D data, we proposed a method to reconstruct phylogenies. We applied the method to reconstruct the sequences of insertion events leading to the emergence of potentially novel structural elements within existing protein domains. The results suggest that structural innovation is possible via the stochastic process of insertions and rapid evolution within variable regions where inserts tend to be nested. We also demonstrate that the structure-based phylogeny enables the study of new questions relating to the evolution of protein domains and biological function.
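
The idea of a nested insertion, one that lands inside a region created by an earlier insertion, can be sketched with simple interval bookkeeping. The representation below (ordered events given as a position and a length in the coordinates of the sequence at the time of the event) is an assumption made for illustration. The study itself infers insertions from ancestral structure reconstructions, which this sketch does not attempt.

```python
def count_nested(events):
    """Count insertion events that fall strictly inside an earlier insert.

    events: ordered list of (position, length) pairs; position is the 0-based
    index, in the sequence as it exists at that moment, before which the new
    residues are inserted.
    """
    inserted = []  # half-open [start, end) intervals of inserted residues
    nested = 0
    for pos, length in events:
        is_nested = any(start < pos < end for start, end in inserted)
        nested += is_nested
        updated = []
        for start, end in inserted:
            if pos <= start:
                updated.append((start + length, end + length))  # shifted right
            elif pos < end:
                updated.append((start, end + length))  # earlier insert grows
            else:
                updated.append((start, end))  # unaffected, lies to the left
        if not is_nested:
            updated.append((pos, pos + length))  # a new, independent insert
        inserted = updated
    return nested


# Hypothetical event history, invented for illustration only.
history = [(10, 5), (12, 3), (30, 4)]
print(count_nested(history), "of", len(history), "insertions are nested")
```

In this invented history the second event lands inside the residues added by the first, so one of three insertions is nested. The 42.7% reported in the abstract is the analogous fraction over the inferred events.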


Subject(s)
DNA Transposable Elements/genetics , Evolution, Molecular , Models, Chemical , Models, Genetic , Models, Molecular , Proteins , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Phylogeny , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/ultrastructure , Structure-Activity Relationship
9.
Nucleic Acids Res ; 31(2): 790-7, 2003 Jan 15.
Article in English | MEDLINE | ID: mdl-12527789

ABSTRACT

Comparative sequence analysis has been used to study specific questions about the structure and function of proteins for many years. Here we propose a knowledge-based framework in which the maximum likelihood rate of evolution is used to quantify the level of constraint on the identity of a site. We demonstrate that site-rate mapping on 3D structures using datasets of rhodopsin-like G-protein receptors and alpha- and beta-tubulins provides an excellent tool for pinpointing the functional features shared between orthologous and paralogous proteins. In addition, functional divergence within protein families can be inferred by examining the differences in the site rates, the differences in the chemical properties of the side chains or amino acid usage between aligned sites. Two novel analytical methods are introduced to characterize rate-independent functional divergence. These are tested using a dataset of two classes of HMG-CoA reductases for which only one class can perform both the forward and reverse reaction. We show that functionally divergent sites occur in a cluster of sites interacting with the catalytic residues and that this information should facilitate the design of experimental strategies to directly test functional properties of residues.


Subject(s)
Phylogeny , Protein Conformation , Proteins/genetics , Animals , Evolution, Molecular , GTP-Binding Proteins/metabolism , Genetic Variation , Humans , Hydroxymethylglutaryl CoA Reductases/chemistry , Hydroxymethylglutaryl CoA Reductases/genetics , Phosphopyruvate Hydratase/chemistry , Phosphopyruvate Hydratase/genetics , Proteins/chemistry , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/genetics , Receptors, Cell Surface/metabolism , Rhodopsin/chemistry , Rhodopsin/genetics , Tubulin/chemistry , Tubulin/genetics
10.
Nucleic Acids Res ; 31(14): 4227-37, 2003 Jul 15.
Article in English | MEDLINE | ID: mdl-12853641

ABSTRACT

A number of methods have recently been published that use phylogenetic information extracted from large multiple sequence alignments to detect sites that have changed properties in related protein families. In this study we use such methods to assess functional divergence between eukaryotic EF-1alpha (eEF-1alpha), archaebacterial EF-1alpha (aEF-1alpha) and two eukaryote-specific EF-1alpha paralogs: eukaryotic release factor 3 (eRF3) and Hsp70 subfamily B suppressor 1 (HBS1). Overall, the evolutionary modes of aEF-1alpha, HBS1 and eRF3 appear to significantly differ from that of eEF-1alpha. However, functionally divergent (FD) sites detected between aEF-1alpha and eEF-1alpha only weakly overlap with sites implicated as putative EF-1beta or aminoacyl-tRNA (aa-tRNA) binding residues in EF-1alpha, as expected based on the shared ancestral primary translational functions of these two orthologs. In contrast, FD sites detected between eEF-1alpha and its paralogs significantly overlap with the putative EF-1beta and/or aa-tRNA binding sites in EF-1alpha. In eRF3 and HBS1, these sites appear to be released from functional constraints, indicating that they bind neither eEF-1beta nor aa-tRNA. These results are consistent with experimental observations that eRF3 does not bind to aa-tRNA, but do not support the 'EF-1alpha-like' function recently proposed for HBS1. We re-assess the available genetic data for HBS1 in light of our analyses, and propose that this protein may function in stop codon-independent peptide release.


Subject(s)
Eukaryotic Cells/metabolism , Peptide Elongation Factor 1/genetics , Amino Acid Sequence , Animals , Archaea/genetics , Bacteria/genetics , Binding Sites/genetics , DNA, Complementary/chemistry , DNA, Complementary/genetics , DNA, Protozoan/chemistry , DNA, Protozoan/genetics , Dictyostelium/genetics , Diplomonadida/genetics , Genetic Variation , Giardia lamblia/genetics , Molecular Sequence Data , Peptide Elongation Factor 1/chemistry , Phylogeny , Protein Conformation , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Trichomonas vaginalis/genetics , Trypanosoma brucei brucei/genetics
11.
Nucleic Acids Res ; 30(2): 532-44, 2002 Jan 15.
Article in English | MEDLINE | ID: mdl-11788716

ABSTRACT

Class 1 release factor in eukaryotes (eRF1) recognizes stop codons and promotes peptide release from the ribosome. The 'molecular mimicry' hypothesis suggests that domain 1 of eRF1 is analogous to the tRNA anticodon stem-loop. Recent studies strongly support this hypothesis and several models for specific interactions between stop codons and residues in domain 1 have been proposed. In this study we have sequenced and identified novel eRF1 sequences across a wide diversity of eukaryotes and re-evaluated the codon-binding site by bioinformatic analyses of a large eRF1 dataset. Analyses of the eRF1 structure combined with estimates of evolutionary rates at amino acid sites allow us to define the residues that are under structural (i.e. those involved in intramolecular interactions) versus non-structural selective constraints. Furthermore, we have re-assessed convergent substitutions in the ciliate variant code eRF1s using maximum likelihood-based phylogenetic approaches. Our results favor the model proposed by Bertram et al. that stop codons bind to three 'cavities' on the protein surface, although we suggest that the stop codon may bind in the opposite orientation to the original model. We assess the feasibility of this alternative binding orientation with a triplet stop codon and the eRF1 domain 1 structures using molecular modeling techniques.


Subject(s)
Codon, Terminator/metabolism , Eukaryotic Cells/chemistry , Evolution, Molecular , Peptide Termination Factors/chemistry , Peptide Termination Factors/metabolism , Amino Acid Sequence , Animals , Anticodon/chemistry , Anticodon/genetics , Anticodon/metabolism , Base Sequence , Binding Sites , Codon, Terminator/chemistry , Codon, Terminator/genetics , Conserved Sequence/genetics , Databases, Genetic , Fungal Proteins/chemistry , Fungal Proteins/genetics , Fungal Proteins/metabolism , Humans , Models, Biological , Models, Molecular , Molecular Mimicry , Molecular Sequence Data , Nucleic Acid Conformation , Peptide Termination Factors/genetics , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Plant Proteins/metabolism , Protein Structure, Secondary , Protein Structure, Tertiary , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Substrate Specificity
12.
Curr Protein Pept Sci ; 17(1): 62-71, 2016.
Article in English | MEDLINE | ID: mdl-26412786

ABSTRACT

Protein structures can be conceptualized as context-aware self-organizing systems. One of their emerging properties is a modular architecture. Such modular architecture has been identified as domains, which are defined as the units of evolution and function. However, this modular architecture is not exclusively defined by domains. Also, the definition of a domain is an ongoing debate. Here we propose differentiating structural, evolutionary and functional domains as distinct concepts. Defining domains or modules is confounded by diverse definitions of the concept, and also by other elements inherent to protein structures. An apparent hierarchy in protein structure architecture is one of these elements, where lower-level interactions may create noise for the definition of higher levels. Diverse modularity-molding factors, such as folding, function, and selection, can have a misleading effect when trying to define a given type of module. It is thus important to keep this complexity in mind when defining modularity in protein structures and interpreting the outcome of modularity inference approaches.


Subject(s)
Models, Molecular , Protein Conformation , Proteins/chemistry , Semantics , Animals , Binding Sites , Biological Evolution , Protein Binding , Protein Interaction Domains and Motifs , Structure-Activity Relationship
13.
IEEE J Biomed Health Inform ; 20(1): 424-31, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25494516

ABSTRACT

Non-small cell lung cancer (NSCLC) constitutes the most common type of lung cancer and is frequently diagnosed at advanced stages. Clinical studies have shown that molecular targeted therapies increase survival and improve quality of life in patients. Nevertheless, the realization of personalized therapies for NSCLC faces a number of challenges including the integration of clinical and genetic data and a lack of clinical decision support tools to assist physicians with patient selection. To address this problem, we used frequent pattern mining to establish the relationships of patient characteristics and tumor response in advanced NSCLC. Univariate analysis determined that smoking status, histology, epidermal growth factor receptor (EGFR) mutation, and targeted drug were significantly associated with response to targeted therapy. We applied four classifiers to predict treatment outcome from EGFR tyrosine kinase inhibitors. Overall, the highest classification accuracy was 76.56% and the area under the curve was 0.76. The decision tree used a combination of EGFR mutations, histology, and smoking status to predict tumor response and the output was both easily understandable and in keeping with current knowledge. Our findings suggest that support vector machines and decision trees are a promising approach for clinical decision support in the patient selection for targeted therapy in advanced NSCLC.
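
The two figures quoted above, classification accuracy and area under the ROC curve, can be computed from labels and classifier scores as sketched here. The helper names, labels, and scores are invented for illustration and have no connection to the study's patient data.

```python
def accuracy(labels, predictions):
    """Fraction of cases where the predicted class equals the true class."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count one half).

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = responder, 0 = non-responder.
true_labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
predicted = [1 if s >= 0.5 else 0 for s in model_scores]

print(round(accuracy(true_labels, predicted), 3), round(auc(true_labels, model_scores), 3))
```

Accuracy depends on the chosen decision threshold (0.5 here), whereas the area under the curve summarizes ranking quality across all thresholds, which is why the two are usually reported together.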


Subject(s)
Antineoplastic Agents/therapeutic use , Carcinoma, Non-Small-Cell Lung/drug therapy , Decision Trees , Lung Neoplasms/drug therapy , Models, Biological , Precision Medicine , Aged , Carcinoma, Non-Small-Cell Lung/classification , Carcinoma, Non-Small-Cell Lung/genetics , Data Mining , Databases, Factual , ErbB Receptors/genetics , Female , Humans , Lung Neoplasms/classification , Lung Neoplasms/genetics , Male , Middle Aged , Mutation/genetics , Pattern Recognition, Automated , Support Vector Machine
14.
BMC Bioinformatics ; 6: 138, 2005 Jun 06.
Article in English | MEDLINE | ID: mdl-15938750

ABSTRACT

BACKGROUND: An increasing number of bioinformatics methods are considering the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time consuming task. RESULTS: The bioinformatics library libcov is a collection of C++ classes that provides a high and low-level interface to maximum likelihood phylogenetics, sequence analysis and a data structure for structural biological methods. libcov can be used to compute likelihoods, search tree topologies, estimate site rates, cluster sequences, manipulate tree structures and compare phylogenies for a broad selection of applications. CONCLUSION: Using this library, it is possible to rapidly prototype applications that use the sophistication of phylogenetic likelihoods without getting involved in a major software engineering project. libcov is thus a potentially valuable building block to develop in-house methodologies in the field of protein phylogenetics.


Subject(s)
Computational Biology/instrumentation , Computational Biology/methods , Programming Languages , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software Design , Algorithms , Cluster Analysis , Computers , Data Interpretation, Statistical , Databases, Protein , Evolution, Molecular , Gene Library , Likelihood Functions , Phylogeny , Sequence Analysis, DNA , Software
15.
Protein Sci ; 13(3): 608-16, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14978301

ABSTRACT

The rapidly evolving subsets of a protein are often evident in multiple sequence alignments as poorly defined, gap-containing regions. We investigated the 3D context of these regions observed in 28 protein structures containing a GTP-binding domain assumed to be homologous to the transforming factor p21-RAS. The phylogenetic depth of this data set is such that it is possible to observe lineages sharing a common protein core that diverged early in the eukaryotic cell history. The sequence variability among these homolog proteins is directly linked to the structural variability of surface loops. We demonstrate that these regions are self-contained and thus mostly free of the evolutionary constraints imposed by the conserved core of the domain. These intraloop interactions have the property to create stem-like structures. Interestingly, these stem-like structures can be observed in loops of varying size, up to the size of small protein domains. We propose a model under which the diversity of protein topologies observed in these loops can be the product of a stochastic sampling of sequence and conformational space in a near-neutral fashion, while the proximity of the functional features of the domain core allows novel beneficial traits to be fixed. Our comparative observations, limited here to the proteins containing the RAS-like GTP-binding domain, suggest that a stochastic process of insertion/deletion analogous to "budding" of loops is a likely mechanism of structural innovation. Such a framework could be experimentally exploited to investigate the folding of increasingly complex model inserts.


Subject(s)
Evolution, Molecular , GTP-Binding Proteins/chemistry , Amino Acid Sequence , Animals , Binding Sites/genetics , Eukaryotic Initiation Factor-2/chemistry , Eukaryotic Initiation Factor-2/genetics , GTP-Binding Protein alpha Subunits, Gs/chemistry , GTP-Binding Protein alpha Subunits, Gs/genetics , GTP-Binding Proteins/genetics , Gene Deletion , Humans , Models, Genetic , Models, Molecular , Molecular Sequence Data , Mutagenesis, Insertional , Phylogeny , Protein Conformation , Protein Structure, Secondary , Proteins/chemistry , Proteins/genetics , Proto-Oncogene Proteins p21(ras)/chemistry , Proto-Oncogene Proteins p21(ras)/genetics , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Sequence Alignment , Stochastic Processes , Structural Homology, Protein , rab GTP-Binding Proteins/chemistry , rab GTP-Binding Proteins/genetics
16.
PLoS One ; 9(11): e113438, 2014.
Article in English | MEDLINE | ID: mdl-25409022

ABSTRACT

Community structure detection is an important tool in graph analysis. This can be done, among other ways, by solving for the partition set which optimizes the modularity score. Here it is shown that topological constraints in correlation graphs induce over-fragmentation of community structures. A refinement step to this optimization based on Linear Discriminant Analysis (LDA) and a statistical test for significance is proposed. In structured simulation constrained by topology, this novel approach performs better than the optimization of modularity alone. This method was also tested with two empirical datasets: the Roll-Call voting in the 110th US Senate constrained by geographic adjacency, and a biological dataset of 135 protein structures constrained by inter-residue contacts. The former dataset showed sub-structures in the communities that revealed a regional bias in the votes which transcends party affiliations. This is an interesting pattern given that the 110th Legislature was assumed to be a highly polarized government. The α-amylase catalytic domain dataset (biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints showed differences from the topology-constrained ones, but the LDA filtering did not change the outcome of the latter. This suggests that the LDA filtering is a robust way to solve the possible over-fragmentation when present, and that this method will not affect the results where there is no evidence of over-fragmentation.


Subject(s)
Algorithms , Catalytic Domain , Factual Databases , Discriminant Analysis , alpha-Amylases/chemistry , alpha-Amylases/metabolism
17.
Evol Bioinform Online ; 4: 17-27, 2008 Feb 09.
Article in English | MEDLINE | ID: mdl-19204804

ABSTRACT

The subtree prune and regraft distance (d(SPR)) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although the computation of d(SPR) and similar metrics between rooted trees has been studied extensively, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees; the heuristic element concerns only the algorithm's computation time. Our method is a heuristic version of a fixed-parameter tractability (FPT) approach, and our experiments indicate that the running time behaves similarly to that of FPT algorithms. For real data sets, our algorithm was able to quickly compute d(SPR) for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.

18.
J Mol Evol ; 64(1): 80-9, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17160642

ABSTRACT

We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods.
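
The idea of reading node support off a set of candidate trees can be sketched as follows. This is a generic majority-rule tally, not the covSEARCH program: each tree is assumed to be given as a set of splits, and each split is assumed to be encoded as the frozenset of taxa on the side that excludes a fixed reference taxon. The function name `majority_rule_splits` and the example trees are hypothetical.

```python
from collections import Counter


def majority_rule_splits(trees, threshold=0.5):
    """Return splits whose frequency across the trees exceeds the threshold.

    trees: list of trees, each an iterable of frozensets (one per split).
    Returns a dict mapping each retained split to its support (0..1).
    """
    if not trees:
        raise ValueError("need at least one tree")
    counts = Counter()
    for splits in trees:
        counts.update(set(splits))  # count each split once per tree
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}


# Hypothetical confidence set of three trees on taxa A..E (E is the reference).
AB = frozenset("AB")
ABC = frozenset("ABC")
ABD = frozenset("ABD")
TREES = [{AB, ABC}, {AB, ABD}, {AB, ABC}]

if __name__ == "__main__":
    for split, support in majority_rule_splits(TREES).items():
        print(sorted(split), round(support, 2))
```

In this toy set the split grouping A and B appears in every tree, so it would count as a well-supported node even though the trees disagree elsewhere.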


Subject(s)
Algorithms , Biological Models , Phylogeny , Ascomycota/genetics , Likelihood Functions , Mitochondrial Proteins/genetics , Sequence Alignment/methods , Tubulin
19.
J Mol Model ; 12(2): 221-8, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16247602

ABSTRACT

In this study, a new ab initio method named CLOOP has been developed to build all-atom loop conformations. In this method, a loop main-chain conformation is generated by sampling main-chain dihedral angles from a restrained φ/ψ set, and the side-chain conformations are built randomly. The CHARMM all-atom force field was used to evaluate the loop conformations. Soft core potentials were used to treat the non-bond interactions, and a designed energy-minimization technique was used to close and optimize the loop conformations. It is shown that the two strategies improve the computational efficiency and the loop-closure rate substantially compared to normal minimization methods. CLOOP was used to construct the conformations of 4-, 8-, and 12-residue loops in Fiser's test set. The average main-chain root-mean-square deviations obtained in 1,000 trials for the 10 different loops of each size are 0.33, 1.27, and 2.77 Å, respectively. CLOOP can build all-atom loop conformations with a sampling accuracy comparable with previous loop main-chain construction algorithms.
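
The accuracy figures above are root-mean-square deviations between atomic coordinates. A minimal sketch of that measure, assuming the two coordinate sets are already superimposed and list the same atoms in the same order, is given below; the function name `rmsd` and the sample coordinates are made up for illustration and do not come from CLOOP.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z).

    No superposition is performed: both structures are assumed to be in
    the same reference frame already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Å: the model is the reference shifted 1 Å along z.
REFERENCE = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
MODEL = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

if __name__ == "__main__":
    print(rmsd(REFERENCE, MODEL))
```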


Subject(s)
Computational Biology/methods , Molecular Models , Protein Conformation , Algorithms , Amino Acids/chemistry , Biochemistry/methods , Molecular Conformation , Molecular Structure
20.
Biochemistry ; 44(25): 9013-21, 2005 Jun 28.
Article in English | MEDLINE | ID: mdl-15966725

ABSTRACT

Mandelate racemase (MR, EC 5.1.2.2) from Pseudomonas putida catalyzes the Mg²⁺-dependent 1,1-proton transfer that interconverts the enantiomers of mandelate. Crystal structures of MR reveal that the phenyl group of all ground-state ligands is located within a hydrophobic cavity, remote from the site of proton abstraction. MR forms numerous electrostatic and H-bonding interactions with the α-OH and carboxyl groups of the substrate, suggesting that these polar groups may remain relatively fixed in position during catalysis while the phenyl group is free to move between two binding sites [i.e., the R-pocket and the S-pocket for binding the phenyl group of (R)-mandelate and (S)-mandelate, respectively]. We show that MR binds benzilate (K_i = 0.67 ± 0.12 mM) and (S)-cyclohexylphenylglycolate (K_i = 0.50 ± 0.03 mM) as competitive inhibitors with affinities similar to that which the enzyme exhibits for the substrate. Therefore, the active site can simultaneously accommodate two phenyl groups, consistent with the existence of an R-pocket and an S-pocket. Wild-type MR exhibits a slightly higher affinity for (S)-mandelate [i.e., K_m((S)-man) < K_m((R)-man)] but catalyzes the turnover of (R)-mandelate slightly more rapidly [i.e., k_cat(R→S) > k_cat(S→R)]. Upon introduction of steric bulk into the S-pocket using site-directed mutagenesis (i.e., the F52W, Y54W, and F52W/Y54W mutants), this catalytic preference is reversed. Although the catalytic efficiency (k_cat/K_m) of all the mutants was reduced (11-280-fold), all mutants exhibited a higher affinity for (R)-mandelate than for (S)-mandelate, and higher turnover numbers with (S)-mandelate as the substrate, relative to those with (R)-mandelate. (R)- and (S)-2-hydroxybutyrate are expected to be less sensitive to the additional steric bulk in the S-pocket. Unlike those for mandelate, the relative binding affinities for these substrate analogues are not reversed. These results are consistent with steric obstruction in the S-pocket and support the hypothesis that the phenyl group of the substrate may move between an R-pocket and an S-pocket during racemization. These conclusions were also supported by modeling of the binary complexes of the wild-type and F52W/Y54W enzymes with the substrate analogues (R)- and (S)-atrolactate, and of wild-type MR with bound benzilate using molecular dynamics simulations.
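
For readers unfamiliar with how a competitive inhibition constant enters the rate law, the sketch below evaluates the textbook Michaelis-Menten equation with a competitive inhibitor, v = Vmax·[S] / (K_m·(1 + [I]/K_i) + [S]). Only the benzilate K_i of 0.67 mM is taken from the abstract; the function name `competitive_rate` and the K_m and Vmax values are placeholders assumed for illustration, not measured values for mandelate racemase.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Initial rate under competitive inhibition (Michaelis-Menten form).

    s, i: substrate and inhibitor concentrations (same units as km, ki).
    The inhibitor raises the apparent K_m by the factor (1 + i / ki)
    and leaves vmax unchanged.
    """
    if min(s, i) < 0 or min(vmax, km, ki) <= 0:
        raise ValueError("concentrations must be >= 0 and constants > 0")
    apparent_km = km * (1.0 + i / ki)
    return vmax * s / (apparent_km + s)


KI_BENZILATE_MM = 0.67  # from the abstract
KM_MM = 1.0             # hypothetical placeholder
VMAX = 1.0              # hypothetical placeholder, arbitrary units

if __name__ == "__main__":
    # With substrate at K_m, adding inhibitor at its K_i doubles the apparent K_m.
    print(competitive_rate(KM_MM, 0.0, VMAX, KM_MM, KI_BENZILATE_MM))
    print(competitive_rate(KM_MM, KI_BENZILATE_MM, VMAX, KM_MM, KI_BENZILATE_MM))
```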


Subject(s)
Motion , Phenol/chemistry , Phenol/metabolism , Racemases and Epimerases/chemistry , Racemases and Epimerases/metabolism , Binding Sites , Catalysis , Computer Simulation , Hydrophobic and Hydrophilic Interactions , Hydroxybutyrates/chemistry , Hydroxybutyrates/pharmacology , Isomerism , Kinetics , Mandelic Acids/chemistry , Mandelic Acids/metabolism , Molecular Models , Mutation/genetics , Phenylalanine/genetics , Phenylalanine/metabolism , Tertiary Protein Structure , Pseudomonas putida/enzymology , Pseudomonas putida/genetics , Racemases and Epimerases/genetics