Results 1 - 8 of 8
1.
Nucleic Acids Res ; 39(Database issue): D277-82, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21071426

ABSTRACT

The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.


Subject(s)
Databases, Protein , RNA-Binding Proteins/chemistry , RNA/chemistry , Amino Acids/chemistry , Binding Sites , Nucleic Acid Conformation , Protein Conformation , Ribonucleotides/chemistry , User-Computer Interface
2.
BMC Bioinformatics ; 13: 89, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22574904

ABSTRACT

BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
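The performance measures named in this abstract can be illustrated with a short sketch. It uses the common textbook definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), plus MCC); the paper itself may define specificity differently, for example as TP/(TP+FP), which is reported here as `precision`. All function names and the input layout (one list of 0/1 labels per protein) are assumptions made for illustration. The last two helpers show why a residue-level (pooled) score can differ from a protein-level (averaged) one.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = interface residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def residue_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and MCC (textbook definitions)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "precision": safe_div(tp, tp + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }

def pooled_metrics(proteins):
    """Residue-level view: pool all residues of all proteins, then score once."""
    y_true = [t for truth, _ in proteins for t in truth]
    y_pred = [p for _, pred in proteins for p in pred]
    return residue_metrics(y_true, y_pred)

def per_protein_mean(proteins):
    """Protein-level view: score each protein separately, then average."""
    per = [residue_metrics(truth, pred) for truth, pred in proteins]
    return {key: sum(m[key] for m in per) / len(per) for key in per[0]}

if __name__ == "__main__":
    # Two toy proteins given as (true labels, predicted labels); invented data.
    proteins = [
        ([1, 1, 0, 0], [1, 1, 0, 0]),
        ([1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]),
    ]
    print("pooled MCC:     ", pooled_metrics(proteins)["mcc"])
    print("per-protein MCC:", per_protein_mean(proteins)["mcc"])
```

On the toy data the two views disagree because the short, perfectly predicted protein carries the same weight as the longer, poorly predicted one in the per-protein average but contributes fewer residues to the pooled counts.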


Subject(s)
Artificial Intelligence , RNA-Binding Proteins/chemistry , RNA/chemistry , Algorithms , Amino Acids/chemistry , Bayes Theorem , Humans , Position-Specific Scoring Matrices , Protein Conformation , RNA/metabolism , RNA-Binding Proteins/metabolism , Sequence Analysis, Protein , Support Vector Machine
3.
Nucleic Acids Res ; 35(Web Server issue): W578-84, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17483510

ABSTRACT

Understanding interactions between proteins and RNA is key to deciphering the mechanisms of many important biological processes. Here we describe RNABindR, a web-based server that identifies and displays RNA-binding residues in known protein-RNA complexes and predicts RNA-binding residues in proteins of unknown structure. RNABindR uses a distance cutoff to identify which amino acids contact RNA in solved complex structures (from the Protein Data Bank) and provides a labeled amino acid sequence and a Jmol graphical viewer in which RNA-binding residues are displayed in the context of the three-dimensional structure. Alternatively, RNABindR can use a Naive Bayes classifier trained on a non-redundant set of protein-RNA complexes from the PDB to predict which amino acids in a protein sequence of unknown structure are most likely to bind RNA. RNABindR automatically displays 'high specificity' and 'high sensitivity' predictions of RNA-binding residues. RNABindR is freely available at http://bindr.gdcb.iastate.edu/RNABindR.
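The distance-cutoff idea described above can be sketched as follows. This is only an illustration, not the RNABindR implementation: the 5.0 Å default, the function names, and the input layout (atoms as plain coordinate tuples rather than parsed PDB records) are all assumptions, and the abstract does not state the cutoff value.

```python
def interface_residues(protein_atoms, rna_atoms, cutoff=5.0):
    """Return sorted ids of protein residues with an atom within `cutoff` of RNA.

    protein_atoms: list of (residue_id, (x, y, z)) tuples, one per atom.
    rna_atoms:     list of (x, y, z) tuples, one per RNA atom.
    cutoff:        distance threshold in angstroms (assumed value, not from the source).
    """
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid sqrt
    contacts = set()
    for res_id, (px, py, pz) in protein_atoms:
        if res_id in contacts:
            continue  # residue already known to contact RNA
        for rx, ry, rz in rna_atoms:
            d_sq = (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
            if d_sq <= cutoff_sq:
                contacts.add(res_id)
                break
    return sorted(contacts)

def label_sequence(sequence, first_res_id, contacts):
    """Mark each residue '+' (contacts RNA) or '-' (does not), aligned to `sequence`."""
    contact_set = set(contacts)
    return "".join(
        "+" if first_res_id + i in contact_set else "-"
        for i in range(len(sequence))
    )

if __name__ == "__main__":
    # Invented coordinates for a three-residue protein and two RNA atoms.
    protein = [(1, (0, 0, 0)), (1, (1, 0, 0)), (2, (10, 0, 0)), (3, (20, 0, 0))]
    rna = [(0, 0, 4), (12, 0, 0)]
    hits = interface_residues(protein, rna)
    print("MKR")
    print(label_sequence("MKR", 1, hits))
```

The brute-force double loop is fine for a sketch; for whole complexes a spatial index would normally replace it. Changing `cutoff` changes which residues count as interface residues, so labels produced with different thresholds are not directly comparable.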


Subject(s)
Amino Acids/chemistry , Computational Biology/methods , Internet , Proteins/chemistry , Proteins/metabolism , RNA/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Bayes Theorem , Binding Sites , Databases, Protein , Models, Molecular , Molecular Sequence Data , Protein Conformation , Software
4.
Appl Biochem Biotechnol ; 136(3): 291-308, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17625235

ABSTRACT

Transglutaminase (TGase) is a multifunctional enzyme vital for many physiologic processes, such as cell differentiation, tissue regeneration, and plant pathogenicity. The acyl transfer function of the enzyme can activate primary amines and, consequently, attach them onto a peptidyl glutamine, a reaction important for various in vivo and in vitro protein crosslinking and modification processes. To understand better the structure-function relationship of the enzyme and to develop it further as an industrial biocatalyst, we studied TGase secreted by several Streptomyces species and Phytophthora cactorum. We purified the enzyme from S. lydicus, S. platensis, S. nigrescens, S. cinnamoneus, and S. hachijoensis. The pH and temperature profiles of S. lydicus, S. platensis, and S. nigrescens TGases were determined. The specificity of S. lydicus TGase toward its acyl-accepting amine substrates was characterized. Correlation of the electronic and steric features of the substrates with their reactivity supported the mechanism previously proposed for Streptomyces mobaraensis TGase.


Subject(s)
Streptomycetaceae/enzymology , Transglutaminases/metabolism , Amino Acid Sequence , Hydrogen-Ion Concentration , Molecular Sequence Data , Phytophthora/enzymology , Sequence Homology, Amino Acid , Streptomycetaceae/genetics , Structure-Activity Relationship , Substrate Specificity , Temperature , Transglutaminases/chemistry , Transglutaminases/isolation & purification
5.
BMC Bioinformatics ; 7: 262, 2006 May 19.
Article in English | MEDLINE | ID: mdl-16712732

ABSTRACT

BACKGROUND: Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions. RESULTS: We start with a Naïve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. 
This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions. CONCLUSION: Naïve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.
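The classifier input described in this abstract (the target residue plus 4 sequence neighbors on each side) can be sketched as a small Naive Bayes model over sequence windows. This is a minimal illustration, not the authors' code: the padding character, the add-one (Laplace) smoothing, the log-odds decision rule, and all names are assumptions.

```python
import math
from collections import defaultdict

PAD = "-"            # assumed placeholder for positions beyond the sequence ends
ALPHABET_SIZE = 21   # 20 amino acids + PAD, used in the smoothing denominator

def windows(sequence, flank=4):
    """Return one window of length 2*flank+1 centred on each residue."""
    padded = PAD * flank + sequence + PAD * flank
    return [padded[i:i + 2 * flank + 1] for i in range(len(sequence))]

class WindowNaiveBayes:
    """Naive Bayes over window positions; label 1 = binding residue, 0 = not."""

    def __init__(self, flank=4):
        self.flank = flank
        self.class_counts = {0: 0, 1: 0}
        # counts[label][position_in_window][amino_acid] -> occurrences
        self.counts = {c: defaultdict(lambda: defaultdict(int)) for c in (0, 1)}

    def fit(self, sequences, labels):
        """sequences: list of strings; labels: list of 0/1 lists, one per sequence."""
        for seq, labs in zip(sequences, labels):
            for win, y in zip(windows(seq, self.flank), labs):
                self.class_counts[y] += 1
                for pos, aa in enumerate(win):
                    self.counts[y][pos][aa] += 1
        return self

    def log_odds(self, window):
        """log P(binding | window) - log P(non-binding | window), add-one smoothed."""
        total = sum(self.class_counts.values())
        score = {}
        for c in (0, 1):
            n_c = self.class_counts[c]
            log_p = math.log((n_c + 1) / (total + 2))
            for pos, aa in enumerate(window):
                log_p += math.log((self.counts[c][pos][aa] + 1) / (n_c + ALPHABET_SIZE))
            score[c] = log_p
        return score[1] - score[0]

    def predict(self, sequence, threshold=0.0):
        """Label a residue 1 when its window's log-odds exceeds `threshold`."""
        return [int(self.log_odds(w) > threshold) for w in windows(sequence, self.flank)]

if __name__ == "__main__":
    # Invented toy data: lysines are labelled as binding residues.
    model = WindowNaiveBayes(flank=4).fit(["AKAKAAAA", "GGKGGKGG"],
                                          [[0, 1, 0, 1, 0, 0, 0, 0],
                                           [0, 0, 1, 0, 0, 1, 0, 0]])
    print(model.predict("AAKAA"))
```

The abstract also reports adding the sequence entropy of the target residue as an extra input; that feature is not included in this sketch.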


Subject(s)
Binding Sites , Computational Biology/methods , Genetic Techniques , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Algorithms , Bayes Theorem , Databases, Protein , Entropy , Humans , Models, Molecular , ROC Curve , Reproducibility of Results , Sensitivity and Specificity , Software
6.
Pac Symp Biocomput ; : 501-12, 2008.
Article in English | MEDLINE | ID: mdl-18229711

ABSTRACT

Telomerase is a ribonucleoprotein enzyme that adds telomeric DNA repeat sequences to the ends of linear chromosomes. The enzyme plays pivotal roles in cellular senescence and aging, and because it provides a telomere maintenance mechanism for approximately 90% of human cancers, it is a promising target for cancer therapy. Despite its importance, a high-resolution structure of the telomerase enzyme has been elusive, although a crystal structure of an N-terminal domain (TEN) of the telomerase reverse transcriptase subunit (TERT) from Tetrahymena has been reported. In this study, we used a comparative strategy, in which sequence-based machine learning approaches were integrated with computational structural modeling, to explore the potential conservation of structural and functional features of TERT in phylogenetically diverse species. We generated structural models of the N-terminal domains from human and yeast TERT using a combination of threading and homology modeling with the Tetrahymena TEN structure as a template. Comparative analysis of predicted and experimentally verified DNA and RNA binding residues, in the context of these structures, revealed significant similarities in nucleic acid binding surfaces of Tetrahymena and human TEN domains. In addition, the combined evidence from machine learning and structural modeling identified several specific amino acids that are likely to play a role in binding DNA or RNA, but for which no experimental evidence is currently available.


Subject(s)
Telomerase/chemistry , Amino Acid Sequence , Animals , Artificial Intelligence , Binding Sites/genetics , Computational Biology , Computer Simulation , Conserved Sequence , DNA/chemistry , DNA/metabolism , Databases, Protein , Humans , Macromolecular Substances , Models, Molecular , Molecular Sequence Data , Protein Conformation , Protein Structure, Tertiary , RNA/chemistry , RNA/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Sequence Homology, Amino Acid , Telomerase/genetics , Telomerase/metabolism , Tetrahymena/enzymology , Tetrahymena/genetics
7.
RNA ; 12(8): 1450-62, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16790841

ABSTRACT

RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.).
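The calibration mentioned above (trading specificity against sensitivity) can be pictured as moving a decision threshold over per-residue classifier scores. The sketch below is an assumption about how such a trade-off might be explored, not a description of RNABindR's interface: the scores, labels, and function names are invented, and specificity is taken as TN/(TN+FP), which may differ from the definition used in the paper.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when residues scoring >= threshold are called binding."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def threshold_for_sensitivity(scores, labels, target):
    """Highest observed score usable as a threshold whose sensitivity reaches `target`.

    Returns None when no threshold reaches the target.
    """
    for t in sorted(set(scores), reverse=True):
        if sens_spec(scores, labels, t)[0] >= target:
            return t
    return None

if __name__ == "__main__":
    # Invented per-residue scores and true labels (1 = interface residue).
    scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
    labels = [1, 0, 1, 1, 0, 0]
    for t in sorted(set(scores), reverse=True):
        print(t, sens_spec(scores, labels, t))
```

A strict threshold labels few residues as binding and makes few false calls; a lenient one recovers more true interface residues at the cost of more false positives.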


Subject(s)
Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , RNA/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Artificial Intelligence , Bayes Theorem , Binding Sites , Databases, Protein , Humans , Models, Molecular , Molecular Sequence Data , Predictive Value of Tests , Protein Conformation , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, Protein , Software
8.
Pac Symp Biocomput ; : 415-26, 2006.
Article in English | MEDLINE | ID: mdl-17094257

ABSTRACT

Protein-protein and protein-nucleic acid interactions are vitally important for a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed machine learning approaches for predicting which amino acids of a protein participate in its interactions with other proteins and/or nucleic acids, using only the protein sequence as input. In this paper, we describe an application of classifiers trained on datasets of well-characterized protein-protein and protein-RNA complexes for which experimental structures are available. We apply these classifiers to the problem of predicting protein and RNA binding sites in the sequence of a clinically important protein for which the structure is not known: the regulatory protein Rev, essential for the replication of HIV-1 and other lentiviruses. We compare our predictions with published biochemical, genetic and partial structural information for HIV-1 and EIAV Rev and with our own published experimental mapping of RNA binding sites in EIAV Rev. The predicted and experimentally determined binding sites are in very good agreement. The ability to predict reliably the residues of a protein that directly contribute to specific binding events, without the requirement for structural information regarding either the protein or complexes in which it participates, can potentially generate new disease intervention strategies.


Subject(s)
Gene Products, rev/chemistry , Gene Products, rev/metabolism , HIV-1/metabolism , Infectious Anemia Virus, Equine/metabolism , Amino Acid Sequence , Animals , Artificial Intelligence , Binding Sites/genetics , Computational Biology , Databases, Protein , Gene Products, rev/genetics , HIV-1/genetics , Humans , Infectious Anemia Virus, Equine/genetics , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary , RNA, Viral/genetics , RNA, Viral/metabolism , rev Gene Products, Human Immunodeficiency Virus