ABSTRACT
For decades, machine learning (ML) has been an important part of the computational biology arsenal used to elucidate protein function. With the recent surge of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology that deal with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction; protein engineering using sequence modifications to achieve stability and druggability characteristics; molecular docking in terms of protein-ligand binding, including allosteric effects; protein-protein interactions; and protein-centric drug discovery. Quantifying the mechanisms underlying protein function requires a holistic approach that takes structure, flexibility, stability, and dynamics into account, because these aspects are interdependent and cannot be treated separately. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. This review includes computational methods that use ML to generate representative conformational ensembles and to quantify the differences between conformational ensembles that are important for function. Future opportunities are highlighted for each of these topics.
Subject(s)
Computational Biology , Proteins , Computational Biology/methods , Ligands , Machine Learning , Molecular Docking Simulation , Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry
ABSTRACT
G protein-coupled receptors (GPCRs) are a large family of integral membrane proteins responsible for cellular signal transduction. Identification of therapeutic compounds to regulate physiological processes is an important first step of drug discovery. We propose MAGELLAN, a novel hierarchical virtual-screening (VS) pipeline, which starts with low-resolution protein structure prediction and structure-based binding-site identification, followed by detection of homologous GPCRs through structure and orthosteric binding-site comparisons. Ligand profiles constructed from the homologous ligand-GPCR complexes are then used to thread through compound databases for VS. The pipeline was first tested in a large-scale retrospective screening experiment against 224 human Class A GPCRs, where MAGELLAN achieved a median enrichment factor (EF) of 14.38, significantly higher than that obtained using individual ligand profiles. Next, MAGELLAN was examined on 5 and 20 GPCRs from two public VS databases (DUD-E and GPCR-Bench) and achieved average EFs of 9.75 and 13.70, respectively, which compare favorably with other state-of-the-art docking- and ligand-based methods, including AutoDock Vina (EF = 1.48/3.16 in DUD-E and GPCR-Bench), DOCK 6 (2.12/3.47 in DUD-E and GPCR-Bench), PoLi (2.2 in DUD-E), and FINDSITEcomb2.0 (2.90 in DUD-E). Detailed data analyses show that the major advantage of MAGELLAN is attributable to the power of ligand profiling, which integrates complementary methods for ligand-GPCR interaction recognition and thus significantly improves the coverage and sensitivity of VS models. Finally, case studies on opioid and motilin receptors show that new connections between functionally related GPCRs can be visualized in the minimum spanning tree built on the similarities of predicted ligand-binding ensembles, suggesting a novel use of MAGELLAN for GPCR deorphanization.
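The enrichment factor used as the benchmark metric above can be illustrated with a minimal sketch. It assumes the common definition of EF (the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library). The abstract does not state the cutoff or the exact formula MAGELLAN uses, so the function name `enrichment_factor`, the `top_fraction` parameter, and the toy data are hypothetical and serve only as an illustration.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Enrichment factor of a ranked screen (assumed standard definition).

    scores       -- predicted score per compound, higher means more likely active
    is_active    -- parallel list of booleans marking the known actives
    top_fraction -- fraction of the library treated as "selected" (an assumption;
                    the abstract does not say which cutoff was used)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the selected top slice (at least one).
    n_top = max(1, round(n_total * top_fraction))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, active in ranked[:n_top] if active)

    # Hit rate in the selection divided by the hit rate of a random pick.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library: 100 compounds, 10 actives, 5 of which are ranked in the top 10.
toy_scores = list(range(100, 0, -1))
toy_labels = [i < 5 or 50 <= i < 55 for i in range(100)]
toy_ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.10)
print(f"EF at 10%: {toy_ef:.2f}")
```

Under this definition an EF of 1 corresponds to random selection, so the reported values indicate how many times more actives appear in the selected slice than chance alone would place there.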