Search | VHL Regional Portal

1.

Enhancing cryo-EM structure prediction with DeepTracer and AlphaFold2 integration.

Chen, Jason; Zia, Ayisha; Luo, Albert; Meng, Hanze; Wang, Fengbin; Hou, Jie; Cao, Renzhi; Si, Dong.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38609330

ABSTRACT

Understanding protein structures is invaluable in various biomedical applications, such as vaccine development. Protein structure model building from experimental electron density maps is a time-consuming and labor-intensive task. To address this challenge, machine learning approaches have been proposed to automate the process. Currently, the majority of the experimental maps in the database lack atomic-resolution features, making it challenging for machine learning-based methods to precisely determine protein structures from cryogenic electron microscopy density maps. On the other hand, protein structure prediction methods, such as AlphaFold2, leverage evolutionary information from protein sequences and have recently achieved groundbreaking accuracy. However, these methods often require manual refinement, which is labor-intensive and time-consuming. In this study, we present DeepTracer-Refine, an automated method that refines AlphaFold-predicted structures by aligning them to DeepTracer's modeled structure. Our method was evaluated on 39 multi-domain proteins, and we improved the average residue coverage from 78.2% to 90.0% and the average local Distance Difference Test score from 0.67 to 0.71. We also compared DeepTracer-Refine with Phenix's AlphaFold refinement and demonstrated that our method not only performs better when the initial AlphaFold model is less precise but also surpasses Phenix in run-time performance.

Subjects

Biological Evolution , Machine Learning , Cryoelectron Microscopy , Amino Acid Sequence , Databases, Factual

2.

Outcomes of the EMDataResource Cryo-EM Ligand Modeling Challenge.

Lawson, Catherine L; Kryshtafovych, Andriy; Pintilie, Grigore D; Burley, Stephen K; Cerný, Jirí; Chen, Vincent B; Emsley, Paul; Gobbi, Alberto; Joachimiak, Andrzej; Noreng, Sigrid; Prisant, Michael; Read, Randy J; Richardson, Jane S; Rohou, Alexis L; Schneider, Bohdan; Sellers, Benjamin D; Shao, Chenghua; Sourial, Elizabeth; Williams, Chris I; Williams, Christopher J; Yang, Ying; Abbaraju, Venkat; Afonine, Pavel V; Baker, Matthew L; Bond, Paul S; Blundell, Tom L; Burnley, Tom; Campbell, Arthur; Cao, Renzhi; Cheng, Jianlin; Chojnowski, Grzegorz; Cowtan, Kevin D; DiMaio, Frank; Esmaeeli, Reza; Giri, Nabin; Grubmüller, Helmut; Hoh, Soon Wen; Hou, Jie; Hryc, Corey F; Hunte, Carola; Igaev, Maxim; Joseph, Agnel P; Kao, Wei-Chun; Kihara, Daisuke; Kumar, Dilip; Lang, Lijun; Lin, Sean; Maddhuri Venkata Subramaniya, Sai R; Mittal, Sumit; Mondal, Arup.

Res Sq ; 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38343795

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.

3.

ComplexQA: a deep graph learning approach for protein complex structure assessment.

Zhang, Lei; Wang, Sheng; Hou, Jie; Si, Dong; Zhu, Junyong; Cao, Renzhi.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37930021

ABSTRACT

MOTIVATION: In recent years, the end-to-end deep learning method for single-chain protein structure prediction has achieved high accuracy. For example, the state-of-the-art method AlphaFold, developed by Google, has largely increased the accuracy of protein structure predictions to near experimental accuracy in some of the cases. At the same time, there are few methods that can evaluate the quality of protein complexes at the residue level. In particular, evaluating the quality of residues at the interface of protein complexes can lead to a wide range of applications, such as protein function analysis and drug design. In this paper, we introduce a new deep graph neural network-based method, ComplexQA, to evaluate the local quality of interfaces for protein complexes by utilizing the residue-level structural information in 3D space and the sequence-level constraints. RESULTS: We benchmark our method against other state-of-the-art quality assessment approaches on the HAF2 and DBM55-AF2 datasets (high-quality structural models predicted by AlphaFold-Multimer), and the BM5 docking dataset. The experimental results show that our proposed method achieves better or similar performance compared with other state-of-the-art methods, especially on difficult targets which only contain a few acceptable models. Our method is able to suggest a score for each interface residue, making it a powerful assessment tool for the ever-increasing number of protein complexes. AVAILABILITY: https://github.com/Cao-Labs/ComplexQA.git. Contact: caora@plu.edu.

Subjects

Neural Networks, Computer , Proteins , Proteins/chemistry

4.

Fast and automated protein-DNA/RNA macromolecular complex modeling from cryo-EM maps.

Nakamura, Andrew; Meng, Hanze; Zhao, Minglei; Wang, Fengbin; Hou, Jie; Cao, Renzhi; Si, Dong.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36682003

ABSTRACT

Cryo-electron microscopy (cryo-EM) allows a macromolecular structure such as a protein-DNA/RNA complex to be reconstructed as a three-dimensional Coulomb potential map. The structural information of these macromolecular complexes forms the foundation for understanding molecular mechanisms, including those of many human diseases. However, the model building of large macromolecular complexes is often difficult and time-consuming. We recently developed DeepTracer-2.0, an artificial-intelligence-based pipeline that can build amino acid and nucleic acid backbones from a single cryo-EM map, and even predict the best-fitting residues according to the density of side chains. The experiments showed improved accuracy and efficiency when benchmarking the performance on independent experimental maps of protein-DNA/RNA complexes and demonstrated the promising future of macromolecular modeling from cryo-EM maps. Our method and pipeline could benefit researchers worldwide who work in molecular biomedicine and drug discovery, and substantially increase the throughput of cryo-EM model building. The pipeline has been integrated into the web portal https://deeptracer.uw.edu/.

Subjects

DNA , RNA , Humans , Cryoelectron Microscopy/methods , Models, Molecular , Protein Conformation , Macromolecular Substances/chemistry

5.

Integrated bulk and single-cell transcriptomes reveal pyroptotic signature in prognosis and therapeutic options of hepatocellular carcinoma by combining deep learning.

Liu, Yang; Li, Hanlin; Zeng, Tianyu; Wang, Yang; Zhang, Hongqi; Wan, Ying; Shi, Zheng; Cao, Renzhi; Tang, Hua.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38197309

ABSTRACT

Although some pyroptosis-related (PR) prognostic models for cancers have been reported, pyroptosis-based features have not been fully discovered at the single-cell level in hepatocellular carcinoma (HCC). In this study, by deeply integrating single-cell and bulk transcriptome data, we systematically investigated the significance of the shared pyroptotic signature at both single-cell and bulk levels in HCC prognosis. Based on the pyroptotic signature, a robust PR risk system was constructed to quantify the prognostic risk of individual patients. To further verify the capacity of the pyroptotic signature to predict patients' prognosis, an attention mechanism-based deep neural network classification model was constructed. The mechanisms of prognostic difference in patients with distinct PR risk were dissected in terms of tumor stemness, cancer pathways, transcriptional regulation, immune infiltration and cell communications. A nomogram model combining PR risk with clinicopathologic data was constructed to evaluate the prognosis of individual patients in the clinic. The PR risk could also evaluate therapeutic response to neoadjuvant therapies in HCC patients. In conclusion, the constructed PR risk system enables a comprehensive assessment of tumor microenvironment characteristics, accurate prognosis prediction and rational therapeutic options in HCC.

Subjects

Carcinoma, Hepatocellular , Deep Learning , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/therapy , Transcriptome , Liver Neoplasms/genetics , Liver Neoplasms/therapy , Cell Communication , Tumor Microenvironment/genetics

6.

The Development of Machine Learning Methods in Discriminating Secretory Proteins of Malaria Parasite.

Liu, Ting; Chen, Jiamao; Zhang, Qian; Hippe, Kyle; Hunt, Cassandra; Le, Thu; Cao, Renzhi; Tang, Hua.

Curr Med Chem ; 29(5): 807-821, 2022.

Article in English | MEDLINE | ID: mdl-34636289

ABSTRACT

Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to develop an effective method to predict secretory proteins of malaria parasites in order to develop effective cures and treatments. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.

Subjects

Malaria, Falciparum , Malaria , Parasites , Animals , Humans , Machine Learning , Malaria/diagnosis , Malaria, Falciparum/diagnosis , Malaria, Falciparum/parasitology , Parasites/metabolism , Plasmodium falciparum/chemistry , Protozoan Proteins/chemistry , Protozoan Proteins/metabolism

7.

ZoomQA: residue-level protein model accuracy estimation with machine learning on sequential and 3D structural features.

Hippe, Kyle; Lilley, Cade; William Berkenpas, Joshua; Chandana Pocha, Ciri; Kishaba, Kiyomi; Ding, Hui; Hou, Jie; Si, Dong; Cao, Renzhi.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34553747

ABSTRACT

MOTIVATION: The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. As of CASP14, there are 79 global QA methods, and a minority of 39 residue-level QA methods, with very few of them working on protein complexes. Here, we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure/complex prediction at residue level, which has many applications such as drug discovery. ZoomQA differs from others by considering the change in chemical and physical features of a fragment structure (a portion of a protein within a radius r of the target amino acid) as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and grade their placement within the protein as a whole. Moreover, we have shown the potential of ZoomQA to identify problematic regions of the SARS-CoV-2 protein complex. RESULTS: We benchmark ZoomQA on CASP14, and it outperforms other state-of-the-art local QA methods and rivals state-of-the-art QA methods in global prediction metrics. Our experiment shows the efficacy of these new features and shows that our method is able to match the performance of other state-of-the-art methods without the use of homology searching against databases or PSSM matrices. AVAILABILITY: http://zoomQA.renzhitech.com.
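The fragment idea in the abstract above (the residues within a radius r of a target amino acid, examined as r grows) can be illustrated with a minimal sketch. This is not the ZoomQA implementation: the function names, the toy coordinates, the choice of radii, and the use of one representative point per residue are all assumptions made for illustration, and none of the fourteen physicochemical features is computed here.

```python
import math

def residues_within_radius(coords, target_index, radius):
    """Return indices of residues whose representative point lies within
    `radius` (in angstroms) of the target residue. Hypothetical helper."""
    target = coords[target_index]
    return [i for i, xyz in enumerate(coords) if math.dist(xyz, target) <= radius]

def fragment_sizes(coords, target_index, radii):
    """Count how many residues fall in the fragment at each radius,
    showing how the fragment grows as the radius of contact increases."""
    return [len(residues_within_radius(coords, target_index, r)) for r in radii]

# Toy coordinates: four residues spaced 3.8 angstroms apart on a line (assumed data).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
sizes = fragment_sizes(coords, target_index=0, radii=[4.0, 8.0, 12.0])
print(sizes)
```

A feature-based method would replace the simple count with per-fragment summaries of residue properties, recomputed at each radius.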

Subjects

COVID-19 , Caspases/chemistry , Machine Learning , Models, Molecular , SARS-CoV-2/chemistry , Viral Proteins/chemistry , Humans , Protein Structure, Quaternary , Protein Structure, Tertiary , Sequence Analysis, Protein

8.

Recent Progress of Machine Learning in Gene Therapy.

Hunt, Cassandra; Montgomery, Sandra; Berkenpas, Joshua William; Sigafoos, Noel; Oakley, John Christian; Espinosa, Jacob; Justice, Nicola; Kishaba, Kiyomi; Hippe, Kyle; Si, Dong; Hou, Jie; Ding, Hui; Cao, Renzhi.

Curr Gene Ther ; 22(2): 132-143, 2022.

Article in English | MEDLINE | ID: mdl-34161210

ABSTRACT

With new developments in biomedical technology, it is now a viable therapeutic treatment to alter genes with techniques like CRISPR. At the same time, it is increasingly cheaper to perform whole genome sequencing, resulting in rapid advancement in gene therapy and editing in precision medicine. Understanding the current industry and academic applications of gene therapy provides an important backdrop to future scientific developments. Additionally, machine learning and artificial intelligence techniques allow for the reduction of time and money spent in the development of new gene therapy products and techniques. In this paper, we survey the current progress of gene therapy treatments for several diseases and explore machine learning applications in gene therapy. We also discuss the ethical implications of gene therapy and the use of machine learning in precision medicine. Machine learning and gene therapy are both topics gaining popularity in various publications, and we conclude that there is still room for continued research and application of machine learning techniques in the gene therapy field.

Subjects

Artificial Intelligence , Machine Learning , Genetic Therapy , Precision Medicine

9.

Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design.

Lv, Hao; Shi, Lei; Berkenpas, Joshua William; Dao, Fu-Ying; Zulfiqar, Hasan; Ding, Hui; Zhang, Yang; Yang, Liming; Cao, Renzhi.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34410360

ABSTRACT

The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.

Subjects

COVID-19 Drug Treatment , COVID-19 Vaccines/genetics , Drug Discovery , SARS-CoV-2/genetics , Artificial Intelligence , COVID-19/genetics , COVID-19/virology , COVID-19 Vaccines/chemistry , Drug Design , Humans , Machine Learning , Pandemics , SARS-CoV-2/chemistry , SARS-CoV-2/pathogenicity

10.

Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method.

Zulfiqar, Hasan; Khan, Rida Sarwar; Hassan, Farwa; Hippe, Kyle; Hunt, Cassandra; Ding, Hui; Song, Xiao-Ming; Cao, Renzhi.

Math Biosci Eng ; 18(4): 3348-3363, 2021 04 15.

Article in English | MEDLINE | ID: mdl-34198389

ABSTRACT

N4-methylcytosine (4mC) is a kind of DNA modification which could regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, DNA sequences were encoded by k-mer, enhanced nucleic acid composition and composition of k-spaced nucleic acid pairs. Subsequently, these features were optimized by using minimum redundancy maximum relevance (mRMR) with incremental feature selection (IFS) and five-fold cross-validation. The obtained optimal features were input into a random forest classifier for discriminating 4mC from non-4mC sites in the mouse genome. On the independent dataset, our model yielded an overall accuracy of 85.41%, which was approximately 3.8%-6.3% higher than the two existing models, i4mC-Mouse and 4mCpred-EL, respectively. The data and source code of the model can be freely downloaded from https://github.com/linDing-groups/model_4mc.
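Of the three encodings named in the abstract above, k-mer composition is the simplest to show. The sketch below is an illustration only, not the authors' code: the function name, the normalization by the number of valid windows, and the example sequence are assumptions, and the enhanced nucleic acid composition and k-spaced pair encodings are not covered.

```python
from itertools import product

def kmer_frequencies(sequence, k):
    """Encode a DNA sequence as normalized k-mer frequencies.

    The vector has 4**k entries, ordered lexicographically over ACGT.
    Windows containing other characters (for example N) are skipped.
    Hypothetical helper for illustration.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]

# Example: the 2-mers of "ACGTAC" are AC, CG, GT, TA, AC.
vec = kmer_frequencies("ACGTAC", 2)
print(len(vec), vec[1])
```

Feature vectors of this kind would then be passed to a feature-selection step and a classifier; those stages are outside the scope of this sketch.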

Subjects

Cytosine , DNA , Animals , Computational Biology , Genome , Machine Learning , Mice , Software

11.

Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge.

Lawson, Catherine L; Kryshtafovych, Andriy; Adams, Paul D; Afonine, Pavel V; Baker, Matthew L; Barad, Benjamin A; Bond, Paul; Burnley, Tom; Cao, Renzhi; Cheng, Jianlin; Chojnowski, Grzegorz; Cowtan, Kevin; Dill, Ken A; DiMaio, Frank; Farrell, Daniel P; Fraser, James S; Herzik, Mark A; Hoh, Soon Wen; Hou, Jie; Hung, Li-Wei; Igaev, Maxim; Joseph, Agnel P; Kihara, Daisuke; Kumar, Dilip; Mittal, Sumit; Monastyrskyy, Bohdan; Olek, Mateusz; Palmer, Colin M; Patwardhan, Ardan; Perez, Alberto; Pfab, Jonas; Pintilie, Grigore D; Richardson, Jane S; Rosenthal, Peter B; Sarkar, Daipayan; Schäfer, Luisa U; Schmid, Michael F; Schröder, Gunnar F; Shekhar, Mrinal; Si, Dong; Singharoy, Abishek; Terashi, Genki; Terwilliger, Thomas C; Vaiana, Andrea; Wang, Liguo; Wang, Zhe; Wankowicz, Stephanie A; Williams, Christopher J; Winn, Martyn; Wu, Tianqi.

Nat Methods ; 18(2): 156-164, 2021 02.

Article in English | MEDLINE | ID: mdl-33542514

ABSTRACT

This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.

Subjects

Cryoelectron Microscopy/methods , Models, Molecular , Crystallography, X-Ray , Protein Conformation , Proteins/chemistry

12.

Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps.

Si, Dong; Moritz, Spencer A; Pfab, Jonas; Hou, Jie; Cao, Renzhi; Wang, Liguo; Wu, Tianqi; Cheng, Jianlin.

Sci Rep ; 10(1): 4282, 2020 03 09.

Article in English | MEDLINE | ID: mdl-32152330

ABSTRACT

Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein's backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture comprised of multiple CNNs, each predicting a specific aspect of a protein's structure. This model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, combining the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. This method is largely automatic and only requires a recommended threshold value for each protein density map. A specialized tabu-search path walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm made further improvements to the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to effectively map protein sequences onto Cα traces to obtain full-atom protein structures. This method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods including Rosetta de-novo, MAINMAST, and a Phenix based method by producing the most complete predicted protein structures, as measured by percentage of found Cα atoms. This method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein's backbone structure surpassing the 66.8% mark achieved by the leading alternate method (Phenix based fully automatic method) on the same set of density maps. The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps which was tested by the Phenix based fully automatic method. The source code and demo of this research has been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
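The two evaluation figures quoted above (the percentage of Cα atoms found within 3 Å, and the RMSD) can be made concrete with a small sketch. The abstract does not say how predicted and reference atoms were paired, so the nearest-neighbor matching used here is an assumption, as are the function names and the toy coordinates; this is not the authors' evaluation code.

```python
import math

def nearest_distance(point, others):
    """Distance from `point` to the closest point in `others`."""
    return min(math.dist(point, o) for o in others)

def coverage_and_rmsd(reference_ca, predicted_ca, cutoff=3.0):
    """Fraction of reference C-alpha atoms with a predicted C-alpha within
    `cutoff` angstroms, plus the RMSD over those matched atoms.
    Nearest-neighbor matching is an assumed pairing rule."""
    matched = []
    for ref in reference_ca:
        d = nearest_distance(ref, predicted_ca)
        if d <= cutoff:
            matched.append(d)
    coverage = len(matched) / len(reference_ca)
    if matched:
        rmsd = math.sqrt(sum(d * d for d in matched) / len(matched))
    else:
        rmsd = float("nan")
    return coverage, rmsd

# Toy data (assumed): the third predicted atom is too far from any reference atom.
reference_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted_ca = [(0.0, 1.0, 0.0), (3.8, 0.0, 0.0), (7.6, 5.0, 0.0)]
coverage, rmsd = coverage_and_rmsd(reference_ca, predicted_ca)
print(round(coverage, 3), round(rmsd, 3))
```

Because RMSD is computed only over matched atoms, it should be read together with the coverage value: a low RMSD on a small matched subset says little about the completeness of the model.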

Subjects

Algorithms , Cryoelectron Microscopy/methods , Deep Learning , Neural Networks, Computer , Protein Conformation , Proteins/chemistry , Software , Amino Acid Sequence , Humans , Models, Molecular , Sequence Homology

13.

A Glycine max sodium/hydrogen exchanger enhances salt tolerance through maintaining higher Na⁺ efflux rate and K⁺/Na⁺ ratio in Arabidopsis.

Sun, Tian-Jie; Fan, Long; Yang, Jun; Cao, Ren-Zhi; Yang, Chun-Yan; Zhang, Jie; Wang, Dong-Mei.

BMC Plant Biol ; 19(1): 469, 2019 Nov 05.

Article in English | MEDLINE | ID: mdl-31690290

ABSTRACT

BACKGROUND: Soybean (Glycine max (L.)) is one of the most important oil-yielding cash crops. However, soybean production has been seriously restricted by salinization. It is therefore crucial to identify salt tolerance-related genes and reveal molecular mechanisms underlying salt tolerance in soybean crops. A better understanding of how plants resist salt stress provides insights into improving existing soybean varieties as well as cultivating novel salt-tolerant varieties. In this study, the biological function of GmNHX1, an NHX-like gene, and the molecular basis underlying GmNHX1-mediated salt stress resistance have been revealed. RESULTS: We found that the transcription level of GmNHX1 was up-regulated under salt stress conditions in soybean, reaching its peak at 24 h after salt treatment. By employing the virus-induced gene silencing (VIGS) technique, we also found that soybean plants became more susceptible to salt stress after silencing GmNHX1 than wild-type plants, and more silenced plants than wild-type plants wilted under salt treatment. Furthermore, Arabidopsis thaliana expressing GmNHX1 grew taller and generated more rosette leaves under salt stress conditions compared to wild-type. Exogenous expression of GmNHX1 resulted in an increase of Na+ transportation to leaves along with a reduction of Na+ absorption in roots, and the consequent maintenance of a high K+/Na+ ratio under salt stress conditions. GmNHX1-GFP-transformed onion bulb endothelium cells showed a fluorescence pattern in which GFP signals were enriched in vacuolar membranes. Using the non-invasive micro-test technique (NMT), we found that the Na+ efflux rates of both wild-type and transformed plants after salt treatment were significantly higher than those before salt treatment. Additionally, the Na+ efflux rate of transformed plants after salt treatment was significantly higher than that of wild-type plants. Meanwhile, the transcription levels of three osmotic stress-related genes, SKOR, SOS1 and AKT1, were all up-regulated in GmNHX1-expressing plants under salt stress conditions. CONCLUSION: Vacuolar membrane-localized GmNHX1 enhances plant salt tolerance through maintaining a high K+/Na+ ratio along with inducing the expression of SKOR, SOS1 and AKT1. Our findings provide molecular insights into the roles of GmNHX1 and similar sodium/hydrogen exchangers in regulating salt tolerance.

Subjects

Glycine max/metabolism , Plant Proteins/metabolism , Salt Tolerance/genetics , Salt-Tolerant Plants/metabolism , Sodium-Hydrogen Exchangers/metabolism , Arabidopsis/genetics , Gene Silencing , Plant Proteins/genetics , Potassium/metabolism , Salt-Tolerant Plants/genetics , Sodium/metabolism , Sodium-Hydrogen Exchangers/genetics , Glycine max/genetics , Stress, Physiological/genetics , Up-Regulation , Vacuoles/metabolism

14.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13.

Hou, Jie; Wu, Tianqi; Cao, Renzhi; Cheng, Jianlin.

Proteins ; 87(12): 1165-1178, 2019 12.

Article in English | MEDLINE | ID: mdl-30985027

ABSTRACT

Predicting residue-residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
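To make the notion of a residue-residue contact concrete, the sketch below derives a binary contact map from known coordinates, which is the kind of target a contact predictor is trained to reproduce. It is not part of MULTICOM: the 8 Å distance threshold and the minimum sequence separation are assumptions based on the convention commonly used in contact-prediction assessments, and the function name and toy coordinates are hypothetical.

```python
import math

def contact_map(coords, threshold=8.0, min_separation=6):
    """Build a symmetric binary contact map from one representative point
    per residue (for example the C-beta atom). Residues i and j are in
    contact when their distance is below `threshold` angstroms and they are
    at least `min_separation` positions apart in sequence (assumed defaults)."""
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts[i][j] = 1
                contacts[j][i] = 1
    return contacts

# Toy example with a small separation so that four residues are enough.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (4.0, 4.0, 0.0)]
cm = contact_map(coords, threshold=8.0, min_separation=2)
for row in cm:
    print(row)
```

A distance predictor generalizes this by estimating the distance, or a distribution over distance bins, for each residue pair rather than a single yes/no contact.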

Subjects

Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Databases, Protein , Deep Learning , Models, Molecular , Neural Networks, Computer , Protein Folding , Protein Structure, Tertiary/genetics , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein

15.

Survey of Machine Learning Techniques in Drug Discovery.

Stephenson, Natalie; Shane, Emily; Chase, Jessica; Rowland, Jason; Ries, David; Justice, Nicola; Zhang, Jie; Chan, Leong; Cao, Renzhi.

Curr Drug Metab ; 20(3): 185-193, 2019.

Article in English | MEDLINE | ID: mdl-30124147

ABSTRACT

BACKGROUND: Drug discovery, which is the process of discovering new candidate medications, is very important for pharmaceutical industries. At its current stage, discovering new drugs is still a very expensive and time-consuming process, requiring Phases I, II and III for clinical trials. Recently, machine learning techniques in Artificial Intelligence (AI), especially the deep learning techniques which allow a computational model to generate multiple layers, have been widely applied and achieved state-of-the-art performance in different fields, such as speech recognition, image classification, bioinformatics, etc. One very important application of these AI techniques is in the field of drug discovery. METHODS: We did a large-scale literature search on existing scientific websites (e.g., ScienceDirect, arXiv) and startup companies to understand the current status of machine learning techniques in drug discovery. RESULTS: Our experiments demonstrated that there are different patterns in machine learning fields and drug discovery fields. For example, keywords like prediction, brain, discovery, and treatment are usually in drug discovery fields. Also, the total number of papers published in drug discovery fields with machine learning techniques is increasing every year. CONCLUSION: The main focus of this survey is to understand the current status of machine learning techniques in the drug discovery field within both academic and industrial settings, and discuss its potential future applications. Several interesting patterns for machine learning techniques in drug discovery fields are discussed in this survey.

Subjects

Drug Discovery , Machine Learning , Computational Biology/methods , Drug Industry , Humans , Surveys and Questionnaires

16.

An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12.

Keasar, Chen; McGuffin, Liam J; Wallner, Björn; Chopra, Gaurav; Adhikari, Badri; Bhattacharya, Debswapna; Blake, Lauren; Bortot, Leandro Oliveira; Cao, Renzhi; Dhanasekaran, B K; Dimas, Itzhel; Faccioli, Rodrigo Antonio; Faraggi, Eshel; Ganzynkowicz, Robert; Ghosh, Sambit; Ghosh, Soma; Gieldon, Artur; Golon, Lukasz; He, Yi; Heo, Lim; Hou, Jie; Khan, Main; Khatib, Firas; Khoury, George A; Kieslich, Chris; Kim, David E; Krupa, Pawel; Lee, Gyu Rie; Li, Hongbo; Li, Jilong; Lipska, Agnieszka; Liwo, Adam; Maghrabi, Ali Hassan A; Mirdita, Milot; Mirzaei, Shokoufeh; Mozolewska, Magdalena A; Onel, Melis; Ovchinnikov, Sergey; Shah, Anand; Shah, Utkarsh; Sidi, Tomer; Sieradzan, Adam K; Slusarz, Magdalena; Slusarz, Rafal; Smadbeck, James; Tamamis, Phanourios; Trieber, Nicholas; Wirecki, Tomasz; Yin, Yanping; Zhang, Yang.

Sci Rep ; 8(1): 9939, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29967418

ABSTRACT

Every two years, groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field, but many hurdles still remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website, providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines, showing improvements according to the CASP assessment as well as areas that require further adjustments and research.

Assuntos

Caspase 12/metabolismo , Caspases/metabolismo , Biologia Computacional/métodos , Modelos Moleculares , Software , Caspase 12/química , Caspases/química , Humanos , Conformação Proteica

17.

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network.

Cao, Renzhi; Freitas, Colton; Chan, Leong; Sun, Miao; Jiang, Haiqing; Chen, Zhangxin.

Molecules ; 22(10)2017 Oct 17.

Article in English | MEDLINE | ID: mdl-29039790

ABSTRACT

With the development of next-generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from them because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in filling the gap between the huge number of protein sequences and the known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" into "GOLan". We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated its performance on selected proteins whose functions were released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to protein function prediction.

Subjects

Computational Biology/methods , Computer Neural Networks , Proteins/metabolism , Software , Algorithms , Protein Databases , Gene Ontology , Machine Learning , Reproducibility of Results

18.

QAcon: single model quality assessment using protein structural and contact information with machine learning techniques.

Cao, Renzhi; Adhikari, Badri; Bhattacharya, Debswapna; Sun, Miao; Hou, Jie; Cheng, Jianlin.

Bioinformatics ; 33(4): 586-588, 2017 02 15.

Article in English | MEDLINE | ID: mdl-28035027

ABSTRACT

Motivation: Protein model quality assessment (QA) plays a very important role in protein structure prediction. QA methods can be divided into two groups: single-model and consensus methods. Consensus QA methods may fail when a large portion of the models in the pool are of low quality. Results: In this paper, we develop a novel single-model quality assessment method, QAcon, which utilizes structural features, physicochemical properties, and residue contact predictions. We apply residue-residue contact information predicted by two protein contact prediction methods, PSICOV and DNcon, to generate a new score as a feature for quality assessment. This novel feature and 11 other features are used as input to train a two-layer neural network on CASP9 datasets to predict the quality of a single protein model. We blindly benchmarked QAcon on the CASP11 dataset as the MULTICOM-CLUSTER server. Based on the evaluation, our method is ranked as one of the top single-model QA methods. The good performance of the features based on contact prediction illustrates the value of using contact information in protein quality assessment. Availability and Implementation: The web server and the source code of QAcon are freely available at: http://cactus.rnet.missouri.edu/QAcon. Contact: chengji@missouri.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Machine Learning , Molecular Models , Proteins/chemistry , Animals , Humans , Protein Conformation , Proteins/metabolism , Quality Control

19.

Assessing Predicted Contacts for Building Protein Three-Dimensional Models.

Adhikari, Badri; Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin.

Methods Mol Biol ; 1484: 115-126, 2017.

Article in English | MEDLINE | ID: mdl-27787823

ABSTRACT

Recent successes of contact-guided protein structure prediction methods have revived interest in solving the long-standing problem of ab initio protein structure prediction. With homology modeling failing for many protein sequences that do not have templates, contact-guided structure prediction has shown promise, and consequently, contact prediction has gained a lot of interest recently. Although a few dozen contact prediction tools are already available as web servers and downloadable software, not enough research has been done on using existing measures such as precision and recall to evaluate these contacts with the goal of building three-dimensional models. Moreover, when no native structure is available for a set of predicted contacts, the only analysis we can perform is a simple contact map visualization of the predicted contacts. A wider and more rigorous assessment of the predicted contacts is needed in order to build tertiary structure models. This chapter discusses instructions and protocols for using tools and applying techniques to assess predicted contacts for building three-dimensional models.

Subjects

Computational Biology/methods , Proteins/chemistry , Software , Algorithms , Protein Databases , Computer Neural Networks , Protein Conformation , Protein Folding , Proteins/genetics , Protein Sequence Analysis
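
Note on entry 19: the abstract refers to precision and recall as existing measures for evaluating predicted contacts. The following is a minimal Python sketch of how those two measures are commonly computed when a native contact set is available. It is not taken from the chapter: the function names, the representation of a contact as a pair of residue indices, and the example data are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumption: a contact is a pair of residue indices (i, j).
Contact = Tuple[int, int]


def normalize(contacts: Iterable[Contact]) -> Set[Contact]:
    """Store each pair as (low, high) so (i, j) and (j, i) compare equal."""
    return {(min(i, j), max(i, j)) for i, j in contacts if i != j}


def contact_precision_recall(
    predicted: Iterable[Contact], native: Iterable[Contact]
) -> Tuple[float, float]:
    """Return (precision, recall) of predicted contacts against native ones."""
    pred = normalize(predicted)
    nat = normalize(native)
    true_positives = len(pred & nat)
    # Precision: fraction of predicted contacts that are native contacts.
    precision = true_positives / len(pred) if pred else 0.0
    # Recall: fraction of native contacts that were predicted.
    recall = true_positives / len(nat) if nat else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical data: four predicted contacts, three native contacts.
    predicted = [(1, 10), (12, 3), (5, 20), (7, 30)]
    native = [(1, 10), (3, 12), (8, 25)]
    p, r = contact_precision_recall(predicted, native)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

In practice, contact evaluations often restrict the predicted set to the top-ranked pairs and to a minimum sequence separation; those filters are omitted here to keep the sketch short.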

20.

DeepQA: improving the estimation of single protein model quality with deep belief networks.

Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin.

BMC Bioinformatics ; 17(1): 495, 2016 Dec 05.

Article in English | MEDLINE | ID: mdl-27919220

ABSTRACT

BACKGROUND: Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges in protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting mostly of low-quality models, is still a largely unsolved problem. RESULTS: We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network performs better than Support Vector Machines and Neural Networks on the protein model quality assessment problem, and that DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of mostly low-quality models generated by ab initio modeling methods. CONCLUSION: DeepQA is a useful deep learning tool for protein single-model quality assessment and protein structure prediction. The source code, executable, documentation, and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .

Subjects

Machine Learning , Molecular Models , Computer Neural Networks , Proteins/chemistry , Support Vector Machine , Algorithms , Data Accuracy , Protein Tertiary Structure , Proteins/metabolism
