Search | VHL Regional Portal

A motif-based method for predicting interfacial residues in both the RNA and protein components of protein-RNA complexes.

Muppirala, Usha; Lewis, Benjamin A; Mann, Carla M; Dobbs, Drena.

Pac Symp Biocomput ; 21: 445-455, 2016.

Article in English | MEDLINE | ID: mdl-26776208

ABSTRACT

Efforts to predict interfacial residues in protein-RNA complexes have largely focused on predicting RNA-binding residues in proteins. Predicting protein-binding residues in RNA sequences, however, is a problem that has received relatively little attention to date. Although the value of sequence motifs for classifying and annotating protein sequences is well established, sequence motifs have not been widely applied to predicting interfacial residues in macromolecular complexes. Here, we propose a novel sequence motif-based method for "partner-specific" interfacial residue prediction. Given a specific protein-RNA pair, the goal is to simultaneously predict RNA-binding residues in the protein sequence and protein-binding residues in the RNA sequence. In 5-fold cross-validation experiments, our method, PS-PRIP, achieved 92% Specificity and 61% Sensitivity, with a Matthews correlation coefficient (MCC) of 0.58 in predicting RNA-binding sites in proteins. The method achieved 69% Specificity and 75% Sensitivity, but with a low MCC of 0.13, in predicting protein-binding sites in RNAs. Similar performance results were obtained when PS-PRIP was tested on two independent "blind" datasets of experimentally validated protein-RNA interactions, suggesting the method should be widely applicable and valuable for identifying potential interfacial residues in protein-RNA complexes for which structural information is not available. The PS-PRIP webserver and datasets are available at: http://pridb.gdcb.iastate.edu/PSPRIP/.

Subject(s)

RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , RNA/chemistry , RNA/metabolism , Amino Acid Motifs , Amino Acid Sequence , Base Sequence , Binding Sites/genetics , Computational Biology/methods , Computational Biology/statistics & numerical data , Nucleic Acid Databases/statistics & numerical data , Protein Databases/statistics & numerical data , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Molecular Models , Protein Binding , RNA/genetics , Bacterial RNA/chemistry , Bacterial RNA/genetics , Bacterial RNA/metabolism , 16S Ribosomal RNA/chemistry , 16S Ribosomal RNA/genetics , 16S Ribosomal RNA/metabolism , RNA-Binding Proteins/genetics , Ribosomal Proteins/chemistry , Ribosomal Proteins/genetics , Ribosomal Proteins/metabolism , Software

Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.

Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant.

BMC Bioinformatics ; 13: 89, 2012 May 10.

Article in English | MEDLINE | ID: mdl-22574904

ABSTRACT

BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.

RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues.

CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs, as well as including evaluations on independent test sets in the comparisons.
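To make the idea of a sequence-window representation concrete, the sketch below one-hot encodes a window of amino acid identities centered on each residue, in the spirit of the identity-based (IDSeq) encoding the abstract mentions. It is an illustration under stated assumptions, not the authors' implementation: the function name `window_features`, the window size, and the padding symbol are all hypothetical. A PSSM-based variant would replace each one-hot block with the corresponding profile column.

```python
# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, window=5, pad="-"):
    """One-hot encode a window of residues centered on each position.

    Returns one feature vector per residue, each of length window * 20.
    Positions beyond the sequence ends are padded and encode as all zeros.
    The window size and padding symbol are illustrative assumptions.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    padded = pad * half + sequence + pad * half
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            # One 20-element block per window position.
            vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
        features.append(vector)
    return features


# Hypothetical five-residue sequence; each residue gets a 3 * 20 = 60 element vector.
feats = window_features("MKRLA", window=3)
print(len(feats), len(feats[0]))
```

Each vector would then be paired with a binary interface/non-interface label for the central residue and passed to a classifier such as Naïve Bayes or an SVM.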

Subject(s)

Artificial Intelligence , RNA-Binding Proteins/chemistry , RNA/chemistry , Algorithms , Amino Acids/chemistry , Bayes Theorem , Humans , Position-Specific Scoring Matrices , Protein Conformation , RNA/metabolism , RNA-Binding Proteins/metabolism , Protein Sequence Analysis , Support Vector Machine

Human telomerase model shows the role of the TEN domain in advancing the double helix for the next polymerization step.

Steczkiewicz, Kamil; Zimmermann, Michael T; Kurcinski, Mateusz; Lewis, Benjamin A; Dobbs, Drena; Kloczkowski, Andrzej; Jernigan, Robert L; Kolinski, Andrzej; Ginalski, Krzysztof.

Proc Natl Acad Sci U S A ; 108(23): 9443-8, 2011 Jun 07.

Article in English | MEDLINE | ID: mdl-21606328

ABSTRACT

Telomerases constitute a group of specialized ribonucleoprotein enzymes that remediate chromosomal shrinkage resulting from the "end-replication" problem. Defects in telomere length regulation are associated with several diseases as well as with aging and cancer. Despite significant progress in understanding the roles of telomerase, the complete structure of the human telomerase enzyme bound to telomeric DNA remains elusive, with the detailed molecular mechanism of telomere elongation still unknown. By application of computational methods for distant homology detection, comparative modeling, and molecular docking, guided by available experimental data, we have generated a three-dimensional structural model of a partial telomerase elongation complex composed of three essential protein domains bound to a single-stranded telomeric DNA sequence in the form of a heteroduplex with the template region of the human RNA subunit, TER. This model provides a structural mechanism for the processivity of telomerase and offers new insights into elongation. We conclude that the RNA-DNA heteroduplex is constrained by the telomerase TEN domain through repeated extension cycles and that the TEN domain controls the process by moving the template ahead one base at a time by translation and rotation of the double helix. The RNA region directly following the template can bind complementarily to the newly synthesized telomeric DNA, while the template itself is reused in the telomerase active site during the next reaction cycle. This first structural model of the human telomerase enzyme provides many details of the molecular mechanism of telomerase and immediately provides an important target for rational drug design.

Subject(s)

DNA/chemistry , Tertiary Protein Structure , Telomerase/chemistry , Telomere/chemistry , Amino Acid Sequence , Binding Sites/genetics , Catalytic Domain , Computer Simulation , DNA/genetics , DNA/metabolism , Humans , Kinetics , Molecular Models , Molecular Sequence Data , Nucleic Acid Conformation , Nucleic Acid Heteroduplexes/chemistry , Nucleic Acid Heteroduplexes/genetics , Nucleic Acid Heteroduplexes/metabolism , Polymerization , Protein Binding , Secondary Protein Structure , RNA/chemistry , RNA/genetics , RNA/metabolism , Amino Acid Sequence Homology , Telomerase/genetics , Telomerase/metabolism , Telomere/genetics , Telomere/metabolism

PRIDB: a Protein-RNA interface database.

Lewis, Benjamin A; Walia, Rasna R; Terribilini, Michael; Ferguson, Jeff; Zheng, Charles; Honavar, Vasant; Dobbs, Drena.

Nucleic Acids Res ; 39(Database issue): D277-82, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21071426

ABSTRACT

The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.

Subject(s)

Protein Databases , RNA-Binding Proteins/chemistry , RNA/chemistry , Amino Acids/chemistry , Binding Sites , Nucleic Acid Conformation , Protein Conformation , Ribonucleotides/chemistry , User-Computer Interface
