Estimation of model accuracy in CASP13.

Cheng, Jianlin; Choe, Myong-Ho; Elofsson, Arne; Han, Kun-Sop; Hou, Jie; Maghrabi, Ali H A; McGuffin, Liam J; Menéndez-Hurtado, David; Olechnovic, Kliment; Schwede, Torsten; Studer, Gabriel; Uziela, Karolis; Venclovas, Ceslovas; Wallner, Björn.

Proteins; 87(12): 1361-1377, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31265154

ABSTRACT

Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.

Subject(s)

Computational Biology, Protein Conformation, Proteins/ultrastructure, Software, Algorithms, Protein Databases, Molecular Models, Protein Folding, Proteins/chemistry, Proteins/genetics, Sequence Alignment, Protein Sequence Analysis

Improved protein model quality assessments by changing the target function.

Uziela, Karolis; Menéndez Hurtado, David; Shu, Nanjiang; Wallner, Björn; Elofsson, Arne.

Proteins; 86(6): 654-663, 2018 Jun.

Article in English | MEDLINE | ID: mdl-29524250

ABSTRACT

Protein modeling quality is an important part of protein structure prediction. We have for more than a decade developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact-based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take advantage of both of these features we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.
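As a rough illustration of a superposition-based target function, the S-score is commonly written as S_i = 1 / (1 + (d_i / d_0)^2), where d_i is the deviation of residue i after superposition. The abstract does not give the formula or the value of d_0, so the default `d0=3.0` (in Å), the function names, and the sample deviations below are all assumptions made for this sketch.

```python
def s_score(distance, d0=3.0):
    """Map a per-residue deviation (in Å) to a score in (0, 1].

    d0 is an assumed scaling constant; the abstract does not state it.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + (distance / d0) ** 2)


def global_score(distances, d0=3.0):
    """Average the per-residue S-scores into one whole-model value."""
    if not distances:
        raise ValueError("need at least one residue")
    return sum(s_score(d, d0) for d in distances) / len(distances)


# Hypothetical deviations (Å) between a model and the native structure.
deviations = [0.0, 3.0, 6.0]
local = [s_score(d) for d in deviations]
overall = global_score(deviations)
```

A perfectly placed residue scores 1, and the score decays smoothly as the deviation grows, so a few badly placed residues do not dominate the average.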

Subject(s)

Molecular Models, Proteins/chemistry, Algorithms, Protein Databases, Machine Learning, Protein Conformation, Software, Structure-Activity Relationship

Methods for estimation of model accuracy in CASP12.

Elofsson, Arne; Joo, Keehyoung; Keasar, Chen; Lee, Jooyoung; Maghrabi, Ali H A; Manavalan, Balachandran; McGuffin, Liam J; Menéndez Hurtado, David; Mirabello, Claudio; Pilstål, Robert; Sidi, Tomer; Uziela, Karolis; Wallner, Björn.

Proteins; 86 Suppl 1: 361-373, 2018 Mar.

Article in English | MEDLINE | ID: mdl-28975666

ABSTRACT

Methods to reliably estimate the quality of 3D models of proteins are essential drivers for the wide adoption and serious acceptance of protein structure predictions by life scientists. In this article, the most successful groups in CASP12 describe their latest methods for estimates of model accuracy (EMA). We show that pure single model accuracy estimation methods have shown clear progress since CASP11; the 3 top methods (MESHI, ProQ3, SVMQA) all perform better than the top method of CASP11 (ProQ2). Although the pure single model accuracy estimation methods outperform quasi-single (ModFOLD6 variations) and consensus methods (Pcons, ModFOLDclust2, Pcomb-domain, and Wallner) in model selection, they are still not as good as those methods in absolute model quality estimation and predictions of local quality. Finally, we show that when using contact-based model quality measures (CAD, lDDT) the single model quality methods perform relatively better.
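The abstract does not say how model selection is measured. One common choice, assumed here, is the first-ranked loss: the true quality of the best available model minus the true quality of the model the predictor ranks first. The function name and the sample scores are hypothetical.

```python
def first_ranked_loss(predicted, true):
    """Best true score minus the true score of the top-predicted model.

    A loss of 0 means the predictor picked the best available model.
    """
    if len(predicted) != len(true) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # Index of the model the predictor considers best.
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


# Hypothetical predicted and true quality scores for three models of one target.
predicted = [0.62, 0.71, 0.55]
true = [0.58, 0.66, 0.70]
loss = first_ranked_loss(predicted, true)
```

Here the predictor prefers the second model, while the third is actually best, so the loss is the gap between their true scores.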

Subject(s)

Computational Biology/methods, Molecular Models, Protein Conformation, Proteins/chemistry, Protein Databases, Humans, Sequence Alignment, Protein Sequence Analysis

Large-scale structure prediction by improved contact predictions and model quality assessment.

Michel, Mirco; Menéndez Hurtado, David; Uziela, Karolis; Elofsson, Arne.

Bioinformatics; 33(14): i23-i29, 2017 Jul 15.

Article in English | MEDLINE | ID: mdl-28881974

ABSTRACT

MOTIVATION: Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known. RESULTS: We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that can be reliably identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before. AVAILABILITY AND IMPLEMENTATION: Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net/. All programs used here are freely available. CONTACT: arne@bioinfo.se.

Subject(s)

Computational Biology/methods, Molecular Models, Protein Conformation, Software, Machine Learning, Sensitivity and Specificity

ProQ3D: improved model quality assessments using deep learning.

Uziela, Karolis; Menéndez Hurtado, David; Shu, Nanjiang; Wallner, Björn; Elofsson, Arne.

Bioinformatics; 33(10): 1578-1580, 2017 May 15.

Article in English | MEDLINE | ID: mdl-28052925

ABSTRACT

SUMMARY: Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-the-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3 mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network. This improves the Pearson correlation to 0.90 (0.85 using ProQ2 input features). AVAILABILITY AND IMPLEMENTATION: ProQ3D is freely available both as a webserver and a stand-alone program at http://proq3.bioinfo.se/. CONTACT: arne@bioinfo.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
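The Pearson correlation quoted above measures how linearly the predicted quality scores track the true ones. A minimal sketch of that calculation follows; the function name and its inputs are illustrative and are not taken from ProQ3D.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Perfectly linear toy data: the correlation is 1 up to rounding.
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A value near 1 means predictions rank and scale with the true scores; it says nothing about whether the single best model is picked first.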

Subject(s)

Computational Biology/methods, Neural Networks (Computer), Protein Conformation, Software, Support Vector Machine, Molecular Models

ProQ2: estimation of model accuracy implemented in Rosetta.

Uziela, Karolis; Wallner, Björn.

Bioinformatics; 32(9): 1411-3, 2016 May 1.

Article in English | MEDLINE | ID: mdl-26733453

ABSTRACT

MOTIVATION: Model quality assessment programs are used to predict the quality of modeled protein structures. They can be divided into two groups depending on the information they are using: ensemble methods using consensus of many alternative models and methods using only a single model to make their predictions. The consensus methods excel in achieving high correlations between prediction and true quality measures. However, they frequently fail to pick out the best possible model, nor can they be used to generate and score new structures. Single-model methods on the other hand do not have these inherent shortcomings and can be used both to sample new structures and to improve existing consensus methods. RESULTS: Here, we present an implementation of the ProQ2 program to estimate both local and global model accuracy as part of the Rosetta modeling suite. The current implementation does not only make it possible to run large batch runs locally, but it also opens up a whole new arena for conformational sampling using machine learned scoring functions and to incorporate model accuracy estimation into various existing modeling schemes. ProQ2 participated in CASP11 and results from CASP11 are used to benchmark the current implementation. Based on results from CASP11 and CAMEO-QE, a continuous benchmark of quality estimation methods, it is clear that ProQ2 is the single-model method that performs best in both local and global model accuracy. AVAILABILITY AND IMPLEMENTATION: https://github.com/bjornwallner/ProQ_scripts. CONTACT: bjornw@ifm.liu.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Molecular Models, Proteins/chemistry, Protein Conformation

ProQ3: Improved model quality assessments using Rosetta energy terms.

Uziela, Karolis; Shu, Nanjiang; Wallner, Björn; Elofsson, Arne.

Sci Rep; 6: 33509, 2016 Oct 4.

Article in English | MEDLINE | ID: mdl-27698390

ABSTRACT

Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-the-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that combining the input features from all three predictors, the resulting predictor ProQ3 performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and stand-alone programs at http://proq3.bioinfo.se/.

Subject(s)

Algorithms, Molecular Models, Proteins/chemistry, Protein Databases, Nonparametric Statistics, Support Vector Machine, Thermodynamics, Time Factors

Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability.

Uziela, Karolis; Honkela, Antti.

PLoS One; 10(5): e0126545, 2015.

Article in English | MEDLINE | ID: mdl-25966034

ABSTRACT

Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from the TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package "prebs."
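The first step described above, counting RNA-seq reads that overlap each microarray probe region, can be sketched as follows. This is not the "prebs" package (an R/Bioconductor package) or its API; the function name, the half-open coordinate convention, and the sample data are assumptions, and the later microarray summarisation step is left out.

```python
def count_overlapping_reads(probes, reads):
    """Count reads overlapping each probe region.

    probes: dict mapping a probe name to a (start, end) interval.
    reads:  list of (start, end) intervals.
    Intervals are half-open and assumed to lie on the same chromosome and strand.
    """
    counts = {name: 0 for name in probes}
    for read_start, read_end in reads:
        for name, (probe_start, probe_end) in probes.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if read_start < probe_end and probe_start < read_end:
                counts[name] += 1
    return counts


# Hypothetical probe regions and aligned reads.
probes = {"probe_a": (100, 125), "probe_b": (300, 325)}
reads = [(90, 140), (120, 170), (125, 175), (310, 360)]
counts = count_overlapping_reads(probes, reads)
```

The nested loop is quadratic and only suits toy inputs; a real pipeline would use a sorted or indexed interval structure.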

Subject(s)

Gene Expression Profiling/methods, Gene Expression Regulation, Oligonucleotide Array Sequence Analysis/methods, RNA/biosynthesis, Base Sequence, Genetic Databases, Humans, RNA/genetics, RNA Sequence Analysis
