Search | Biblioteca Virtual em Saúde Fiocruz

1.

Outcomes of the EMDataResource cryo-EM Ligand Modeling Challenge.

Lawson, Catherine L; Kryshtafovych, Andriy; Pintilie, Grigore D; Burley, Stephen K; Cerný, Jirí; Chen, Vincent B; Emsley, Paul; Gobbi, Alberto; Joachimiak, Andrzej; Noreng, Sigrid; Prisant, Michael G; Read, Randy J; Richardson, Jane S; Rohou, Alexis L; Schneider, Bohdan; Sellers, Benjamin D; Shao, Chenghua; Sourial, Elizabeth; Williams, Chris I; Williams, Christopher J; Yang, Ying; Abbaraju, Venkat; Afonine, Pavel V; Baker, Matthew L; Bond, Paul S; Blundell, Tom L; Burnley, Tom; Campbell, Arthur; Cao, Renzhi; Cheng, Jianlin; Chojnowski, Grzegorz; Cowtan, K D; DiMaio, Frank; Esmaeeli, Reza; Giri, Nabin; Grubmüller, Helmut; Hoh, Soon Wen; Hou, Jie; Hryc, Corey F; Hunte, Carola; Igaev, Maxim; Joseph, Agnel P; Kao, Wei-Chun; Kihara, Daisuke; Kumar, Dilip; Lang, Lijun; Lin, Sean; Maddhuri Venkata Subramaniya, Sai R; Mittal, Sumit; Mondal, Arup.

Nat Methods ; 2024 Jun 25.

Article in English | MEDLINE | ID: mdl-38918604

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of submitted ligand models and surrounding atoms was analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.

2.

Enhancing cryo-EM structure prediction with DeepTracer and AlphaFold2 integration.

Chen, Jason; Zia, Ayisha; Luo, Albert; Meng, Hanze; Wang, Fengbin; Hou, Jie; Cao, Renzhi; Si, Dong.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38609330

ABSTRACT

Understanding protein structures is invaluable in various biomedical applications, such as vaccine development. Protein structure model building from experimental electron density maps is a time-consuming and labor-intensive task. To address the challenge, machine learning approaches have been proposed to automate this process. Currently, the majority of the experimental maps in the database lack atomic resolution features, making it challenging for machine learning-based methods to precisely determine protein structures from cryogenic electron microscopy density maps. On the other hand, protein structure prediction methods, such as AlphaFold2, leverage evolutionary information from protein sequences and have recently achieved groundbreaking accuracy. However, these methods often require manual refinement, which is labor-intensive and time-consuming. In this study, we present DeepTracer-Refine, an automated method that refines AlphaFold predicted structures by aligning them to DeepTracer's modeled structure. Our method was evaluated on 39 multi-domain proteins and we improved the average residue coverage from 78.2 to 90.0% and the average local Distance Difference Test score from 0.67 to 0.71. We also compared DeepTracer-Refine with Phenix's AlphaFold refinement and demonstrated that our method not only performs better when the initial AlphaFold model is less precise but also surpasses Phenix in run-time performance.

Subjects

Biological Evolution , Machine Learning , Cryoelectron Microscopy , Amino Acid Sequence , Factual Databases
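
The local Distance Difference Test (lDDT) score reported in the abstract above can be illustrated with a minimal sketch. This is a simplified, C-alpha-only variant written for illustration, not the scoring code used by the authors: the function name `lddt_ca` and the toy coordinates are hypothetical, and the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances are the commonly used lDDT defaults, assumed here rather than taken from the paper. The full score also works on all atoms and applies stereochemistry checks, which are omitted.

```python
import math

def lddt_ca(model, reference, inclusion_radius=15.0,
            thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT (illustrative sketch).

    model, reference: equal-length lists of (x, y, z) tuples, one per residue,
    already matched residue by residue. No superposition is needed because
    only intra-structure distances are compared.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same length")
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= inclusion_radius:
                continue  # only local pairs in the reference are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            for t in thresholds:
                total += 1
                if diff < t:
                    preserved += 1
    # Fraction of (pair, tolerance) checks that the model preserves.
    return preserved / total if total else 0.0

# Toy example: three residues on a line; the model displaces the last one by 3 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(lddt_ca(reference, reference))  # identical structures score 1.0
print(lddt_ca(model, reference))
```

Because the score is built from internal distances, it does not depend on how the two structures are superimposed, which is one reason it is often used alongside coverage-style measures.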

3.

Fast and automated protein-DNA/RNA macromolecular complex modeling from cryo-EM maps.

Nakamura, Andrew; Meng, Hanze; Zhao, Minglei; Wang, Fengbin; Hou, Jie; Cao, Renzhi; Si, Dong.

Brief Bioinform ; 24(2), 2023 03 19.

Article in English | MEDLINE | ID: mdl-36682003

ABSTRACT

Cryo-electron microscopy (cryo-EM) allows macromolecular structures such as protein-DNA/RNA complexes to be reconstructed in a three-dimensional Coulomb potential map. The structural information of these macromolecular complexes forms the foundation for understanding molecular mechanisms, including those of many human diseases. However, the model building of large macromolecular complexes is often difficult and time-consuming. We recently developed DeepTracer-2.0, an artificial-intelligence-based pipeline that can build amino acid and nucleic acid backbones from a single cryo-EM map, and even predict the best-fitting residues according to the density of side chains. The experiments showed improved accuracy and efficiency when benchmarking the performance on independent experimental maps of protein-DNA/RNA complexes and demonstrated the promising future of macromolecular modeling from cryo-EM maps. Our method and pipeline could benefit researchers worldwide who work in molecular biomedicine and drug discovery, and substantially increase the throughput of cryo-EM model building. The pipeline has been integrated into the web portal https://deeptracer.uw.edu/.

Subjects

DNA , RNA , Humans , Cryoelectron Microscopy/methods , Molecular Models , Protein Conformation , Macromolecular Substances/chemistry

4.

ComplexQA: a deep graph learning approach for protein complex structure assessment.

Zhang, Lei; Wang, Sheng; Hou, Jie; Si, Dong; Zhu, Junyong; Cao, Renzhi.

Brief Bioinform ; 24(6), 2023 09 22.

Article in English | MEDLINE | ID: mdl-37930021

ABSTRACT

MOTIVATION: In recent years, the end-to-end deep learning method for single-chain protein structure prediction has achieved high accuracy. For example, the state-of-the-art method AlphaFold, developed by Google, has largely increased the accuracy of protein structure predictions to near experimental accuracy in some of the cases. At the same time, there are few methods that can evaluate the quality of protein complexes at the residue level. In particular, evaluating the quality of residues at the interface of protein complexes can lead to a wide range of applications, such as protein function analysis and drug design. In this paper, we introduce a new deep graph neural network-based method, ComplexQA, to evaluate the local quality of interfaces for protein complexes by utilizing the residue-level structural information in 3D space and the sequence-level constraints. RESULTS: We benchmark our method against other state-of-the-art quality assessment approaches on the HAF2 and DBM55-AF2 datasets (high-quality structural models predicted by AlphaFold-Multimer), and the BM5 docking dataset. The experimental results show that our proposed method achieves better or similar performance compared with other state-of-the-art methods, especially on difficult targets which only contain a few acceptable models. Our method is able to suggest a score for each interface residue, which makes it a powerful assessment tool for the ever-increasing number of protein complexes. AVAILABILITY: https://github.com/Cao-Labs/ComplexQA.git. Contact: caora@plu.edu.

Subjects

Computer Neural Networks , Proteins , Proteins/chemistry

5.

Integrated bulk and single-cell transcriptomes reveal pyroptotic signature in prognosis and therapeutic options of hepatocellular carcinoma by combining deep learning.

Liu, Yang; Li, Hanlin; Zeng, Tianyu; Wang, Yang; Zhang, Hongqi; Wan, Ying; Shi, Zheng; Cao, Renzhi; Tang, Hua.

Brief Bioinform ; 25(1), 2023 11 22.

Article in English | MEDLINE | ID: mdl-38197309

ABSTRACT

Although some pyroptosis-related (PR) prognostic models for cancers have been reported, pyroptosis-based features have not been fully discovered at the single-cell level in hepatocellular carcinoma (HCC). In this study, by deeply integrating single-cell and bulk transcriptome data, we systematically investigated the significance of the shared pyroptotic signature at both single-cell and bulk levels in HCC prognosis. Based on the pyroptotic signature, a robust PR risk system was constructed to quantify the prognostic risk of individual patients. To further verify the capacity of the pyroptotic signature to predict patients' prognosis, an attention mechanism-based deep neural network classification model was constructed. The mechanisms of prognostic difference in patients with distinct PR risk were dissected in terms of tumor stemness, cancer pathways, transcriptional regulation, immune infiltration and cell communications. A nomogram model combining PR risk with clinicopathologic data was constructed to evaluate the prognosis of individual patients in the clinic. The PR risk could also evaluate therapeutic response to neoadjuvant therapies in HCC patients. In conclusion, the constructed PR risk system enables a comprehensive assessment of tumor microenvironment characteristics, accurate prognosis prediction and rational therapeutic options in HCC.

Subjects

Hepatocellular Carcinoma , Deep Learning , Liver Neoplasms , Humans , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/therapy , Transcriptome , Liver Neoplasms/genetics , Liver Neoplasms/therapy , Cell Communication , Tumor Microenvironment/genetics

6.

Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge.

Lawson, Catherine L; Kryshtafovych, Andriy; Adams, Paul D; Afonine, Pavel V; Baker, Matthew L; Barad, Benjamin A; Bond, Paul; Burnley, Tom; Cao, Renzhi; Cheng, Jianlin; Chojnowski, Grzegorz; Cowtan, Kevin; Dill, Ken A; DiMaio, Frank; Farrell, Daniel P; Fraser, James S; Herzik, Mark A; Hoh, Soon Wen; Hou, Jie; Hung, Li-Wei; Igaev, Maxim; Joseph, Agnel P; Kihara, Daisuke; Kumar, Dilip; Mittal, Sumit; Monastyrskyy, Bohdan; Olek, Mateusz; Palmer, Colin M; Patwardhan, Ardan; Perez, Alberto; Pfab, Jonas; Pintilie, Grigore D; Richardson, Jane S; Rosenthal, Peter B; Sarkar, Daipayan; Schäfer, Luisa U; Schmid, Michael F; Schröder, Gunnar F; Shekhar, Mrinal; Si, Dong; Singharoy, Abishek; Terashi, Genki; Terwilliger, Thomas C; Vaiana, Andrea; Wang, Liguo; Wang, Zhe; Wankowicz, Stephanie A; Williams, Christopher J; Winn, Martyn; Wu, Tianqi.

Nat Methods ; 18(2): 156-164, 2021 02.

Article in English | MEDLINE | ID: mdl-33542514

ABSTRACT

This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.

Subjects

Cryoelectron Microscopy/methods , Molecular Models , X-Ray Crystallography , Protein Conformation , Proteins/chemistry

7.

ZoomQA: residue-level protein model accuracy estimation with machine learning on sequential and 3D structural features.

Hippe, Kyle; Lilley, Cade; William Berkenpas, Joshua; Chandana Pocha, Ciri; Kishaba, Kiyomi; Ding, Hui; Hou, Jie; Si, Dong; Cao, Renzhi.

Brief Bioinform ; 23(1), 2022 01 17.

Article in English | MEDLINE | ID: mdl-34553747

ABSTRACT

MOTIVATION: The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. As of CASP14, there are 79 global QA methods, and a minority of 39 residue-level QA methods with very few of them working on protein complexes. Here, we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure/complex prediction at residue level, which has many applications such as drug discovery. ZoomQA differs from others by considering the change in chemical and physical features of a fragment structure (a portion of a protein within a radius r of the target amino acid) as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and grade their placement within the protein as a whole. Moreover, we have shown the potential of ZoomQA to identify problematic regions of the SARS-CoV-2 protein complex. RESULTS: We benchmark ZoomQA on CASP14, and it outperforms other state-of-the-art local QA methods and rivals state-of-the-art QA methods in global prediction metrics. Our experiment shows the efficacy of these new features and shows that our method is able to match the performance of other state-of-the-art methods without the use of homology searching against databases or PSSM matrices. AVAILABILITY: http://zoomQA.renzhitech.com.

Subjects

COVID-19 , Caspases/chemistry , Machine Learning , Molecular Models , SARS-CoV-2/chemistry , Viral Proteins/chemistry , Humans , Quaternary Protein Structure , Tertiary Protein Structure , Protein Sequence Analysis
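
The idea of tracking how a fragment's features change as the contact radius grows, described in the ZoomQA abstract above, can be sketched as follows. This is an illustrative toy, not the ZoomQA implementation: the function names `fragment_profile` and `profile_changes`, the single per-residue property and the one-point-per-residue coordinates are all assumptions made for the example, whereas the abstract describes fourteen properties and a trained model on top of them.

```python
import math

def fragment_profile(coords, values, target, radii):
    """Mean of a per-residue property over the fragment around one residue.

    coords: list of (x, y, z) tuples, one representative point per residue.
    values: per-residue property values (for example a hydrophobicity scale).
    target: index of the residue at the center of the fragment.
    radii:  increasing contact radii in Å.
    """
    center = coords[target]
    profile = []
    for r in radii:
        # The fragment is every residue within r of the target, target included.
        members = [v for xyz, v in zip(coords, values)
                   if math.dist(xyz, center) <= r]
        profile.append(sum(members) / len(members))
    return profile

def profile_changes(profile):
    """Change in the fragment feature between consecutive radii."""
    return [b - a for a, b in zip(profile, profile[1:])]

# Toy example: four residues on a line, 4 Å apart.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
values = [1.0, 3.0, 5.0, -9.0]
profile = fragment_profile(coords, values, target=0, radii=(5.0, 10.0))
print(profile)                   # [2.0, 3.0]
print(profile_changes(profile))  # [1.0]
```

In a full pipeline, one such profile per property and per residue would form the feature vector passed to a learned scoring model.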

8.

Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design.

Lv, Hao; Shi, Lei; Berkenpas, Joshua William; Dao, Fu-Ying; Zulfiqar, Hasan; Ding, Hui; Zhang, Yang; Yang, Liming; Cao, Renzhi.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34410360

ABSTRACT

The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.

Subjects

COVID-19 Drug Treatment , COVID-19 Vaccines/genetics , Drug Discovery , SARS-CoV-2/genetics , Artificial Intelligence , COVID-19/genetics , COVID-19/virology , COVID-19 Vaccines/chemistry , Drug Design , Humans , Machine Learning , Pandemics , SARS-CoV-2/chemistry , SARS-CoV-2/pathogenicity

9.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13.

Hou, Jie; Wu, Tianqi; Cao, Renzhi; Cheng, Jianlin.

Proteins ; 87(12): 1165-1178, 2019 12.

Article in English | MEDLINE | ID: mdl-30985027

ABSTRACT

Predicting residue-residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

Subjects

Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Protein Databases , Deep Learning , Molecular Models , Computer Neural Networks , Protein Folding , Tertiary Protein Structure/genetics , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis

10.

A Glycine max sodium/hydrogen exchanger enhances salt tolerance through maintaining higher Na⁺ efflux rate and K⁺/Na⁺ ratio in Arabidopsis.

Sun, Tian-Jie; Fan, Long; Yang, Jun; Cao, Ren-Zhi; Yang, Chun-Yan; Zhang, Jie; Wang, Dong-Mei.

BMC Plant Biol ; 19(1): 469, 2019 Nov 05.

Article in English | MEDLINE | ID: mdl-31690290

ABSTRACT

BACKGROUND: Soybean (Glycine max (L.)) is one of the most important oil-yielding cash crops. However, soybean production has been seriously restricted by salinization. It is therefore crucial to identify salt tolerance-related genes and reveal molecular mechanisms underlying salt tolerance in soybean crops. A better understanding of how plants resist salt stress provides insights into improving existing soybean varieties as well as cultivating novel salt-tolerant varieties. In this study, the biological function of GmNHX1, an NHX-like gene, and the molecular basis underlying GmNHX1-mediated salt stress resistance have been revealed. RESULTS: We found that the transcription level of GmNHX1 was up-regulated under salt stress condition in soybean, reaching its peak at 24 h after salt treatment. By employing the virus-induced gene silencing technique (VIGS), we also found that soybean plants became more susceptible to salt stress after silencing GmNHX1 than wild-type and more silenced plants wilted than wild-type under salt treatment. Furthermore, Arabidopsis thaliana expressing GmNHX1 grew taller and generated more rosette leaves under salt stress condition compared to wild-type. Exogenous expression of GmNHX1 resulted in an increase of Na⁺ transportation to leaves along with a reduction of Na⁺ absorption in roots, and the consequent maintenance of a high K⁺/Na⁺ ratio under salt stress condition. GmNHX1-GFP-transformed onion bulb endothelium cells showed a fluorescence pattern in which GFP signals were enriched in vacuolar membranes. Using the non-invasive micro-test technique (NMT), we found that the Na⁺ efflux rate of both wild-type and transformed plants after salt treatment was significantly higher than before salt treatment. Additionally, the Na⁺ efflux rate of transformed plants after salt treatment was significantly higher than that of wild-type. Meanwhile, the transcription levels of three osmotic stress-related genes, SKOR, SOS1 and AKT1, were all up-regulated in GmNHX1-expressing plants under salt stress condition. CONCLUSION: Vacuolar membrane-localized GmNHX1 enhances plant salt tolerance through maintaining a high K⁺/Na⁺ ratio along with inducing the expression of SKOR, SOS1 and AKT1. Our findings provide molecular insights on the roles of GmNHX1 and similar sodium/hydrogen exchangers in regulating salt tolerance.

Subjects

Glycine max/metabolism , Plant Proteins/metabolism , Salt Tolerance/genetics , Salt-Tolerant Plants/metabolism , Sodium-Hydrogen Exchangers/metabolism , Arabidopsis/genetics , Gene Silencing , Plant Proteins/genetics , Potassium/metabolism , Salt-Tolerant Plants/genetics , Sodium/metabolism , Sodium-Hydrogen Exchangers/genetics , Glycine max/genetics , Physiological Stress/genetics , Up-Regulation , Vacuoles/metabolism

11.

QAcon: single model quality assessment using protein structural and contact information with machine learning techniques.

Cao, Renzhi; Adhikari, Badri; Bhattacharya, Debswapna; Sun, Miao; Hou, Jie; Cheng, Jianlin.

Bioinformatics ; 33(4): 586-588, 2017 02 15.

Article in English | MEDLINE | ID: mdl-28035027

ABSTRACT

Motivation: Protein model quality assessment (QA) plays a very important role in protein structure prediction. It can be divided into two groups of methods: single-model and consensus QA methods. The consensus QA methods may fail when there is a large portion of low-quality models in the model pool. Results: In this paper, we develop a novel single-model quality assessment method, QAcon, utilizing structural features, physicochemical properties, and residue contact predictions. We apply residue-residue contact information predicted by two protein contact prediction methods, PSICOV and DNcon, to generate a new score as a feature for quality assessment. This novel feature and 11 other features are used as input to train a two-layer neural network on CASP9 datasets to predict the quality of a single protein model. We blindly benchmarked our method QAcon on the CASP11 dataset as the MULTICOM-CLUSTER server. Based on the evaluation, our method is ranked as one of the top single-model QA methods. The good performance of the features based on contact prediction illustrates the value of using contact information in protein quality assessment. Availability and Implementation: The web server and the source code of QAcon are freely available at: http://cactus.rnet.missouri.edu/QAcon. Contact: chengji@missouri.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Machine Learning , Molecular Models , Proteins/chemistry , Animals , Humans , Protein Conformation , Proteins/metabolism , Quality Control

12.

3Drefine: an interactive web server for efficient protein structure refinement.

Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin.

Nucleic Acids Res ; 44(W1): W406-9, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27131371

ABSTRACT

3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of the hydrogen bonding network combined with atomic-level energy minimization on the optimized model using composite physics- and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification and a provided example submission, and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/.

Subjects

Internet , Molecular Models , Proteins/chemistry , Software , Caspases/chemistry , Datasets as Topic , Hydrogen Bonding , Knowledge Bases , Protein Conformation

13.

UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling.

Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin.

Bioinformatics ; 32(18): 2791-9, 2016 09 15.

Article in English | MEDLINE | ID: mdl-27259540

ABSTRACT

MOTIVATION: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. In parallel, the latest computational developments based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often randomly sample protein conformational space as opposed to experimentally suggested stepwise sampling. RESULTS: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. AVAILABILITY AND IMPLEMENTATION: Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/. CONTACT: chengji@missouri.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Statistical Models , Protein Conformation , Benchmarking , Computational Biology/methods , Molecular Models , Physics , Probability , Proteins

14.

Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks.

Cao, Renzhi; Cheng, Jianlin.

Methods ; 93: 84-91, 2016 Jan 15.

Article in English | MEDLINE | ID: mdl-26370280

ABSTRACT

MOTIVATION: Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein-protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information such as spatial gene-gene interaction networks generated from chromosomal conformation data together to improve protein function prediction. RESULTS: In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein-protein interaction and spatial gene-gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein-protein interaction and spatial gene-gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three base-line methods and an advanced method based on protein profile-sequence comparison, profile-profile comparison, and domain co-occurrence networks according to the maximum F-measure.

Subjects

Data Mining/methods , Gene Regulatory Networks/genetics , Protein Interaction Domains and Motifs/genetics , Protein Interaction Mapping/methods , Protein Sequence Analysis/methods , Protein Databases
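
The maximum F-measure named in the abstract above can be made concrete with a short sketch. It follows the protein-centric definition commonly used in CAFA evaluations, which is an assumption here since the abstract does not spell the formula out: at each score threshold, precision is averaged over proteins with at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and the best harmonic mean across thresholds is reported. The function name `f_max`, the term labels and the toy scores are hypothetical, and propagation of terms through the Gene Ontology hierarchy is omitted.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric maximum F-measure (illustrative sketch).

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at this threshold.
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best

# Toy benchmark with hypothetical term labels.
truth = {"P1": {"T1", "T2"}, "P2": {"T3"}}
predictions = {"P1": {"T1": 0.9, "T4": 0.4}, "P2": {"T3": 0.6}}
print(round(f_max(predictions, truth), 4))  # 0.8571
```

In the toy case the best trade-off occurs for thresholds just above 0.4, where the low-confidence false positive T4 is dropped while both proteins still have a prediction.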

15.

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network.

Cao, Renzhi; Freitas, Colton; Chan, Leong; Sun, Miao; Jiang, Haiqing; Chen, Zhangxin.

Molecules ; 22(10), 2017 Oct 17.

Article in English | MEDLINE | ID: mdl-29039790

ABSTRACT

With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge to fill the gap between the huge amount of protein sequences and the known function. In this paper, we propose a novel method that converts the protein function problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate the "ProLan" language into the "GOLan" language. We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the performance of our method on selected proteins whose function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model for protein function prediction.

Subjects

Computational Biology/methods , Computer Neural Networks , Proteins/metabolism , Software , Algorithms , Protein Databases , Gene Ontology , Machine Learning , Reproducibility of Results

16.

DeepQA: improving the estimation of single protein model quality with deep belief networks.

Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin.

BMC Bioinformatics ; 17(1): 495, 2016 Dec 05.

Article in English | MEDLINE | ID: mdl-27919220

ABSTRACT

BACKGROUND: Protein quality assessment (QA), useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. RESULTS: We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. CONCLUSION: DeepQA is a useful deep learning tool for protein single-model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .

Subjects

Machine Learning , Molecular Models , Computer Neural Networks , Proteins/chemistry , Support Vector Machine , Algorithms , Data Accuracy , Protein Tertiary Structure , Proteins/metabolism

17.

Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11.

Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin.

Proteins ; 84 Suppl 1: 247-59, 2016 09.

Article in English | MEDLINE | ID: mdl-26369671

ABSTRACT

Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc.

Subjects

Benchmarking , Computational Biology/statistics & numerical data , Molecular Models , Statistical Models , Proteins/chemistry , Software , Algorithms , Computational Biology/methods , Computer Simulation , Protein Databases , Humans , Internet , Protein Folding , Protein Interaction Domains and Motifs , Protein Secondary Structure , Protein Tertiary Structure , Quality Control , Protein Structural Homology , Thermodynamics

18.

Large-scale model quality assessment for improving protein tertiary structure prediction.

Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin.

Bioinformatics ; 31(12): i116-23, 2015 Jun 15.

Article in English | MEDLINE | ID: mdl-26072473

ABSTRACT

MOTIVATION: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select relatively better models and rank a large number of models well. RESULTS: Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It unprecedentedly applied 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. AVAILABILITY AND IMPLEMENTATION: The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/.

Subjects

Molecular Models , Protein Tertiary Structure , Cluster Analysis , Humans , Software

19.

A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11.

Li, Jilong; Cao, Renzhi; Cheng, Jianlin.

BMC Bioinformatics ; 16: 337, 2015 Oct 23.

Article in English | MEDLINE | ID: mdl-26493701

ABSTRACT

BACKGROUND: With more and more protein sequences produced in the genomic era, predicting protein structures from sequences becomes very important for elucidating the molecular details and functions of these proteins for biomedical research. Traditional template-based protein structure prediction methods tend to focus on identifying the best templates, generating the best alignments, and applying the best energy function to rank models, which often cannot achieve the best performance because of the difficulty of obtaining best templates, alignments, and models. METHODS: We developed a large-scale conformation sampling and evaluation method and its servers to improve the reliability and robustness of protein structure prediction. In the first step, our method used a variety of alignment methods to sample relevant and complementary templates and to generate alternative and diverse target-template alignments, used a template and alignment combination protocol to combine alignments, and used template-based and template-free modeling methods to generate a pool of conformations for a target protein. In the second step, it used a large number of protein model quality assessment methods to evaluate and rank the models in the protein model pool, in conjunction with an exception handling strategy to deal with any additional failure in model ranking. RESULTS: The method was implemented as two protein structure prediction servers: MULTICOM-CONSTRUCT and MULTICOM-CLUSTER that participated in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. The two servers were ranked among the best 10 server predictors. CONCLUSIONS: The good performance of our servers in CASP11 demonstrates the effectiveness and robustness of the large-scale conformation sampling and evaluation. The MULTICOM server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/.

Subjects

Genomics/methods , Protein Tertiary Structure , Proteins/chemistry , Amino Acid Sequence , Chemical Models , Protein Conformation

20.

Iterative reconstruction of three-dimensional models of human chromosomes from chromosomal contact data.

Nowotny, Jackson; Ahmed, Sharif; Xu, Lingfei; Oluwadare, Oluwatosin; Chen, Hannah; Hensley, Noelan; Trieu, Tuan; Cao, Renzhi; Cheng, Jianlin.

BMC Bioinformatics ; 16: 338, 2015 Oct 23.

Article in English | MEDLINE | ID: mdl-26493399

ABSTRACT

BACKGROUND: The entire collection of genetic information resides within the chromosomes, which themselves reside within almost every cell nucleus of eukaryotic organisms. Each individual chromosome is found to have its own preferred three-dimensional (3D) structure independent of the other chromosomes. The structure of each chromosome plays vital roles in controlling certain genome operations, including gene interaction and gene regulation. As a result, knowing the structure of chromosomes assists in the understanding of how the genome functions. Fortunately, the 3D structure of chromosomes proves possible to construct through computational methods via contact data recorded from the chromosome. We developed a unique computational approach based on optimization procedures known as adaptation, simulated annealing, and genetic algorithm to construct 3D models of human chromosomes, using chromosomal contact data. RESULTS: Our models were evaluated using a percentage-based scoring function. Analysis of the scores of the final 3D models demonstrated their effective construction from our computational approach. Specifically, the models resulting from our approach yielded an average score of 80.41%, with a high of 91%, across models for all chromosomes of a normal human B-cell. Comparisons made with other methods affirmed the effectiveness of our strategy. Particularly, juxtaposition with models generated through the publicly available method Markov chain Monte Carlo 5C (MCMC5C) illustrated the outperformance of our approach, as seen through a higher average score for all chromosomes. Our methodology was further validated using two consistency checking techniques known as convergence testing and robustness checking, which both proved successful. CONCLUSIONS: The pursuit of constructing accurate 3D chromosomal structures is fueled by the benefits revealed by the findings as well as any possible future areas of study that arise. This motivation has led to the development of our computational methodology. The implementation of our approach proved effective in constructing 3D chromosome models and proved consistent with, and more effective than, some other methods thereby achieving our goal of creating a tool to help advance certain research efforts. The source code, test data, test results, and documentation of our method, Gen3D, are available at our sourceforge site at: http://sourceforge.net/projects/gen3d/.

Subjects

Human Chromosomes/genetics , Genome/genetics , B-Lymphocytes , Humans , Theoretical Models , Sequence Alignment , Genetic Templates

