Search | BVS Enfermería Search Portal

1.

Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models.

Si, Yunda; Yan, Chengfei.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36759333

ABSTRACT

The knowledge of contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting pairs among hundreds of thousands of inter-protein residue pairs is extremely challenging, and the performance of state-of-the-art inter-protein contact prediction methods is still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features for residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric and homomeric PPIs, we show that DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, GLINTER and DeepHomo, even though both of these methods leveraged the native structures of the interacting proteins in the prediction, whereas DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.

Subject(s)

Algorithms , Computational Biology , Computational Biology/methods , Proteins/chemistry

2.

Protein complex structure prediction powered by multiple sequence alignments of interologs from multiple taxonomic ranks and AlphaFold2.

Si, Yunda; Yan, Chengfei.

Brief Bioinform ; 23(4)2022 07 18.

Article in English | MEDLINE | ID: mdl-35649388

ABSTRACT

AlphaFold2 can predict protein complex structures as long as a multiple sequence alignment (MSA) of the interologs of the target protein-protein interaction (PPI) can be provided. In this study, a simplified phylogeny-based approach was applied to generate the MSA of interologs, which was then used as the input to AlphaFold2 for protein complex structure prediction. Extensively benchmarking this protocol on a nonredundant PPI dataset, including 107 bacterial PPIs and 442 eukaryotic PPIs, we show that complex structures of 79.5% of the bacterial PPIs and 49.8% of the eukaryotic PPIs can be successfully predicted, a significantly better performance than that obtained with MSAs of interologs prepared by two existing approaches. Considering that PPIs may not be conserved in species separated by long evolutionary distances, we further restricted interologs in the MSA to different taxonomic ranks of the species of the target PPI in protein complex structure prediction. We found that the success rates can be increased to 87.9% for the bacterial PPIs and 56.3% for the eukaryotic PPIs if interologs in the MSA are restricted to a specific taxonomic rank of the species of each target PPI. Finally, we show that the optimal taxonomic ranks for protein complex structure prediction can be selected with the application of the predicted template modeling (TM) scores of the output models.

Subject(s)

Protein Interaction Mapping , Proteins , Phylogeny , Protein Interaction Mapping/methods , Proteins/chemistry , Sequence Alignment

3.

A reproducibility analysis-based statistical framework for residue-residue evolutionary coupling detection.

Si, Yunda; Zhang, Yi; Yan, Chengfei.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35037015

ABSTRACT

Direct coupling analysis (DCA) has been widely used to infer evolutionarily coupled residue pairs from the multiple sequence alignment (MSA) of homologous sequences. However, effectively selecting residue pairs with significant evolutionary couplings according to the result of DCA is a non-trivial task. In this study, we developed a general statistical framework for significant evolutionary coupling detection, referred to as irreproducible discovery rate (IDR)-DCA, which is based on reproducibility analysis of the coupling scores obtained from DCA on manually created MSA replicates. IDR-DCA was applied to select residue pairs for contact prediction for monomeric proteins, protein-protein interactions and monomeric RNAs, in which three different versions of DCA were applied. We demonstrated that with the application of IDR-DCA, the residue pairs selected using a universal threshold always yielded stable performance for contact prediction. Compared with the application of carefully tuned coupling score cutoffs, IDR-DCA always showed better performance. The robustness of IDR-DCA was also supported by the MSA downsampling analysis. We further demonstrated the effectiveness of applying constraints obtained from residue pairs selected by IDR-DCA to assist RNA secondary structure prediction.

Subject(s)

Algorithms , Proteins , Protein Structure, Secondary , Proteins/chemistry , RNA , Reproducibility of Results , Sequence Alignment

4.

Supervised enhancer prediction with epigenetic pattern recognition and targeted validation.

Sethi, Anurag; Gu, Mengting; Gumusgoz, Emrah; Chan, Landon; Yan, Koon-Kiu; Rozowsky, Joel; Barozzi, Iros; Afzal, Veena; Akiyama, Jennifer A; Plajzer-Frick, Ingrid; Yan, Chengfei; Novak, Catherine S; Kato, Momoe; Garvin, Tyler H; Pham, Quan; Harrington, Anne; Mannion, Brandon J; Lee, Elizabeth A; Fukuda-Yuzawa, Yoko; Visel, Axel; Dickel, Diane E; Yip, Kevin Y; Sutton, Richard; Pennacchio, Len A; Gerstein, Mark.

Nat Methods ; 17(8): 807-814, 2020 08.

Article in English | MEDLINE | ID: mdl-32737473

ABSTRACT

Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters.

Subject(s)

Epigenesis, Genetic/physiology , Pattern Recognition, Automated/methods , Animals , Cell Line , Drosophila , Histones/genetics , Histones/metabolism , Humans , Mice , Mice, Transgenic , Reproducibility of Results

5.

Improved protein contact prediction using dimensional hybrid residual networks and singularity enhanced loss function.

Si, Yunda; Yan, Chengfei.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34448830

ABSTRACT

Deep residual learning has shown great success in protein contact prediction. In this study, a new deep residual learning-based protein contact prediction model was developed. Compared with previous models, a new type of residual block hybridizing 1D and 2D convolutions was designed to increase the effective receptive field of the residual network, and a new loss function emphasizing the easily misclassified residue pairs was proposed to enhance the model training. The developed protein contact prediction model, referred to as DRN-1D2D, was first evaluated on 105 CASP11 targets, 76 CAMEO hard targets and 398 membrane proteins together with two in-house-developed reference models based on either the standard 2D residual block or the traditional BCE loss function, from which we confirmed that both the dimensional hybrid residual block and the singularity enhanced loss function can be employed to improve the model performance for protein contact prediction. DRN-1D2D was further evaluated on 39 CASP13 and CASP14 free modeling targets together with the two reference models and six state-of-the-art protein contact prediction models including DeepCov, DeepCon, DeepConPred2, SPOT-Contact, RaptorX-Contact and TripleRes. The results show that DRN-1D2D consistently achieved the best performance among all these models.

Subject(s)

Carrier Proteins/chemistry , Computational Biology/methods , Deep Learning , Protein Interaction Mapping/methods , Proteins/chemistry , Carrier Proteins/metabolism , Protein Binding , Proteins/metabolism , Reproducibility of Results , Software

6.

Preparation and pH Detection Performance of Rosin-Based Fluorescent Polyurethane Microspheres.

Yu, Caili; Lu, Guangjie; Yan, Chengfei; Xu, Jianben; Zhang, Faai.

J Fluoresc ; 33(4): 1593-1602, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36790631

ABSTRACT

Rosin-based fluorescent polyurethane emulsion (FPU) was prepared using isophorone diisocyanate, ester of acrylic rosin and glycidyl methacrylate, 1,5-dihydroxy naphthalene (1,5-DN), and 1,4-butanediol as the raw materials. Then, rosin-based fluorescent polyurethane microspheres (FPUMs) were successfully prepared by suspension polymerization method using FPU as the main material, azodiisobutyronitrile as the initiator, and gelatin as the dispersant. FPUMs were characterized by Fourier transform infrared spectra, thermogravimetric analysis, optical microscopy, scanning electron microscopy and fluorescence spectra, and the response performance of FPUMs to pH was studied. The results showed that FPUMs were successfully prepared. With the increase of the level of 1,5-DN, the particle size of FPUMs increased gradually, and the fluorescence intensity increased first and then decreased. When the level of 1,5-DN was 3 wt.%, the average particle size was 49.3 µm, the particle distribution index (PDI) was 1.05, and the fluorescence intensity was the largest (3662 a.u.). The fluorescence intensity of FPUMs increased linearly with the decrease of pH, which can be used for pH detection in solution. Furthermore, the FPUMs exhibited good thermal stability, anti-interference and recoverability.

7.

MDockPeP: An ab-initio protein-peptide docking server.

Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin.

J Comput Chem ; 39(28): 2409-2413, 2018 10 30.

Article in English | MEDLINE | ID: mdl-30368849

ABSTRACT

Protein-peptide interactions play a crucial role in a variety of cellular processes. The protein-peptide complex structure is key to understanding the mechanisms underlying protein-peptide interactions and is critical for peptide therapeutic development. We present a user-friendly protein-peptide docking server, MDockPeP. Starting from a peptide sequence and a protein receptor structure, the MDockPeP Server globally docks the all-atom, flexible peptide to the protein receptor. The produced modes are then evaluated with a statistical potential-based scoring function, ITScorePeP. This method was systematically validated using the peptiDB benchmarking database. At least one near-native peptide binding mode was ranked among the top 10 (or top 500) in 59% (85%) of the bound cases, and in 40.6% (71.9%) of the challenging unbound cases. The server can be used for both protein-peptide complex structure prediction and initial-stage sampling of the protein-peptide binding modes for other docking or simulation methods. MDockPeP Server is freely available at http://zougrouptoolkit.missouri.edu/mdockpep. © 2018 Wiley Periodicals, Inc.

Subject(s)

Computers , Internet , Molecular Docking Simulation , Peptides/chemistry , Proteins/chemistry , Databases, Protein , Protein Binding , Protein Conformation

8.

HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps.

Yan, Koon-Kiu; Yardimci, Galip Gürkan; Yan, Chengfei; Noble, William S; Gerstein, Mark.

Bioinformatics ; 33(14): 2199-2201, 2017 Jul 15.

Article in English | MEDLINE | ID: mdl-28369339

ABSTRACT

SUMMARY: Genome-wide proximity ligation based assays like Hi-C have opened a window to the 3D organization of the genome. In so doing, they present data structures that are different from conventional 1D signal tracks. To exploit the 2D nature of Hi-C contact maps, matrix techniques like spectral analysis are particularly useful. Here, we present HiC-spector, a collection of matrix-related functions for analyzing Hi-C contact maps. In particular, we introduce a novel reproducibility metric for quantifying the similarity between contact maps based on spectral decomposition. The metric successfully separates contact maps mapped from Hi-C data coming from biological replicates, pseudo-replicates and different cell types. AVAILABILITY AND IMPLEMENTATION: Source code in Julia and Python, and detailed documentation, are available at https://github.com/gersteinlab/HiC-spector . CONTACT: koonkiu.yan@gmail.com or mark@gersteinlab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Chromosomes/chemistry , Genetic Techniques , Genome , Biotinylation , DNA/chemistry , Gene Library , Humans , Reproducibility of Results

9.

Performance of MDockPP in CAPRI rounds 28-29 and 31-35 including the prediction of water-mediated interactions.

Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Ma, Zhiwei; Grinter, Sam Z; Zou, Xiaoqin.

Proteins ; 85(3): 424-434, 2017 03.

Article in English | MEDLINE | ID: mdl-27802576

ABSTRACT

Protein-protein interactions are either through direct contacts between two binding partners or mediated by structural waters. Both direct contacts and water-mediated interactions are crucial to the formation of a protein-protein complex. During the recent CAPRI rounds, a novel parallel searching strategy for predicting water-mediated interactions was introduced into our protein-protein docking method, MDockPP. Briefly, an FFT-based docking algorithm is employed in generating putative binding modes, and an iteratively derived statistical potential-based scoring function, ITScorePP, in conjunction with biological information is used to assess and rank the binding modes. Up to 10 binding modes are selected as the initial protein-protein complex structures for MD simulations in explicit solvent. Water molecules near the interface are clustered based on the snapshots extracted from independent equilibrated trajectories. Then, protein-ligand docking is employed for a parallel search for water molecules near the protein-protein interface. The water molecules generated by ligand docking and the clustered water molecules generated by MD simulations are merged, and are referred to as the predicted structural water molecules. Here, we report the performance of this protocol for CAPRI rounds 28-29 and 31-35 containing 20 valid docking targets and 11 scoring targets. In the docking experiments, we predicted correct binding modes for nine targets, including one high-accuracy, two medium-accuracy, and six acceptable predictions. Regarding the two targets for the prediction of water-mediated interactions, we achieved models ranked as "excellent" in accordance with the CAPRI evaluation criteria; one of these two targets is considered a difficult target for structural water prediction. Proteins 2017; 85:424-434. © 2016 Wiley Periodicals, Inc.

Subject(s)

Algorithms , Computational Biology/methods , Molecular Docking Simulation/methods , Proteins/chemistry , Water/chemistry , Benchmarking , Binding Sites , Molecular Dynamics Simulation , Protein Binding , Protein Conformation , Protein Interaction Mapping , Protein Multimerization , Research Design , Software , Structural Homology, Protein , Thermodynamics

10.

Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015.

Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin.

J Comput Aided Mol Des ; 31(8): 689-699, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28668990

ABSTRACT

The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy that uses the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search for a receptor structure whose bound ligand shares high similarity with the query ligand, for use in docking. The strategy was applied to the two datasets (HSP90 and MAP4K4) in the recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performance was achieved for both binding mode and binding affinity predictions compared with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved performance comparable to that of ensemble docking. Our method for receptor conformational selection and our iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.

Subject(s)

HSP90 Heat-Shock Proteins/chemistry , Intracellular Signaling Peptides and Proteins/chemistry , Molecular Docking Simulation , Protein Serine-Threonine Kinases/chemistry , Binding Sites , Databases, Protein , Drug Design , Humans , Ligands , Protein Binding , Protein Conformation
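
The receptor-selection step described in record 10 can be illustrated with a minimal sketch. The abstract does not name the ligand similarity method, so the Tanimoto coefficient over sets of ligand features is used here purely as an assumption; the structure identifiers and feature names are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two feature sets, in [0, 1]."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_receptor(query_features, known_complexes):
    """Return (structure_id, similarity) for the known complex whose bound
    ligand is most similar to the query ligand.

    known_complexes maps a structure id to the feature set of its bound ligand.
    """
    best_id = max(
        known_complexes,
        key=lambda sid: tanimoto(query_features, known_complexes[sid]),
    )
    return best_id, tanimoto(query_features, known_complexes[best_id])


# Hypothetical data: two known complexes and one query ligand.
known = {
    "structure_A": {"aromatic_ring", "amide", "hydroxyl"},
    "structure_B": {"aromatic_ring", "sulfonamide", "chloro"},
}
query = {"aromatic_ring", "amide", "hydroxyl", "methyl"}

best_id, score = pick_receptor(query, known)
print(best_id, score)
```

The query ligand would then be docked into the receptor conformation taken from the selected complex, rather than into an arbitrary or averaged conformation.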

11.

Iterative Knowledge-Based Scoring Functions Derived from Rigid and Flexible Decoy Structures: Evaluation with the 2013 and 2014 CSAR Benchmarks.

Yan, Chengfei; Grinter, Sam Z; Merideth, Benjamin Ryan; Ma, Zhiwei; Zou, Xiaoqin.

J Chem Inf Model ; 56(6): 1013-21, 2016 06 27.

Article in English | MEDLINE | ID: mdl-26389744

ABSTRACT

In this study, we developed two iterative knowledge-based scoring functions, ITScore_pdbbind(rigid) and ITScore_pdbbind(flex), using rigid decoy structures and flexible decoy structures, respectively, that were generated from the protein-ligand complexes in the refined set of PDBbind 2012. These two scoring functions were evaluated using the 2013 and 2014 CSAR benchmarks. The results were compared with the results of two other scoring functions, the Vina scoring function and ITScore, the scoring function that we previously developed from rigid decoy structures for a smaller set of protein-ligand complexes. A graph-based method was developed to evaluate the root-mean-square deviation between two conformations of the same ligand with different atom names and orders due to different file preparations, and the program is freely available. Our study showed that the two new scoring functions developed from the larger training set yielded significantly improved performance in binding mode predictions. For binding affinity predictions, all four scoring functions showed protein-dependent performance. We suggest the development of protein-family-dependent scoring functions for accurate binding affinity prediction.

Subject(s)

Drug Discovery/methods , Molecular Docking Simulation , Benchmarking , Ligands , Protein Binding , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Structure-Activity Relationship
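
Record 11 mentions a graph-based method for computing the root-mean-square deviation between two conformations of the same ligand whose atoms are named and ordered differently. The sketch below is not the authors' program; it is a small backtracking matcher, written under the assumption that atoms can be matched by element and bond connectivity alone, that returns the lowest in-place RMSD over all valid atom correspondences (no superposition is performed). Ligands are assumed to be non-empty.

```python
import math


def atom_mappings(elems_a, bonds_a, elems_b, bonds_b):
    """Yield every atom mapping {index_in_a: index_in_b} that preserves
    element types and bond connectivity (simple backtracking search)."""
    n = len(elems_a)
    adj_a = {frozenset(bond) for bond in bonds_a}
    adj_b = {frozenset(bond) for bond in bonds_b}
    if n != len(elems_b) or len(adj_a) != len(adj_b):
        return
    mapping, used = {}, set()

    def extend(i):
        if i == n:
            yield dict(mapping)
            return
        for j in range(n):
            if j in used or elems_a[i] != elems_b[j]:
                continue
            # Bonds between atom i and already-mapped atoms must match in B.
            consistent = all(
                (frozenset((i, k)) in adj_a)
                == (frozenset((j, mapping[k])) in adj_b)
                for k in mapping
            )
            if consistent:
                mapping[i] = j
                used.add(j)
                yield from extend(i + 1)
                del mapping[i]
                used.discard(j)

    yield from extend(0)


def symmetry_aware_rmsd(elems_a, bonds_a, xyz_a, elems_b, bonds_b, xyz_b):
    """Minimum RMSD over all valid atom mappings, or None if none exists."""
    best = None
    for m in atom_mappings(elems_a, bonds_a, elems_b, bonds_b):
        sq = sum(math.dist(xyz_a[i], xyz_b[j]) ** 2 for i, j in m.items())
        value = math.sqrt(sq / len(m))
        if best is None or value < best:
            best = value
    return best


# The same water-like pose written with two different atom orders.
elems_1, bonds_1 = ["O", "H", "H"], [(0, 1), (0, 2)]
xyz_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
elems_2, bonds_2 = ["H", "H", "O"], [(2, 0), (2, 1)]
xyz_2 = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

print(symmetry_aware_rmsd(elems_1, bonds_1, xyz_1, elems_2, bonds_2, xyz_2))
```

A naive index-by-index RMSD on these two files would be non-zero even though the poses are identical; matching atoms through the bond graph removes that artifact. For drug-sized ligands, a dedicated graph-isomorphism routine would be a more practical choice than this exhaustive search.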

12.

A chemical genetic approach demonstrates that MPK3/MPK6 activation and NADPH oxidase-mediated oxidative burst are two independent signaling events in plant immunity.

Xu, Juan; Xie, Jie; Yan, Chengfei; Zou, Xiaoqin; Ren, Dongtao; Zhang, Shuqun.

Plant J ; 77(2): 222-34, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24245741

ABSTRACT

Plant recognition of pathogen-associated molecular patterns (PAMPs) such as bacterial flagellin-derived flg22 triggers rapid activation of mitogen-activated protein kinases (MAPKs) and generation of reactive oxygen species (ROS). Arabidopsis has at least four PAMP/pathogen-responsive MAPKs: MPK3, MPK6, MPK4 and MPK11. It was speculated that these MAPKs may function downstream of ROS in plant immunity because of their activation by exogenously added H2O2. MPK3/MPK6 or their orthologs in other plant species have also been reported to be involved in the ROS burst from the plant respiratory burst oxidase homolog (Rboh) of the human neutrophil gp91phox. However, detailed genetic analysis is lacking. Using a chemical genetic approach, we generated a conditional loss-of-function mpk3 mpk6 double mutant. Consistent with results obtained using a conditionally rescued mpk3 mpk6 double mutant generated previously, the results obtained using the new conditional loss-of-function mpk3 mpk6 double mutant demonstrate that the flg22-triggered ROS burst is independent of MPK3/MPK6. In Arabidopsis mutants lacking a functional AtRbohD, the flg22-induced ROS burst was completely blocked. However, activation of MPK3/MPK6 was not affected. Based on these results, we conclude that the rapid ROS burst and MPK3/MPK6 activation are two independent early signaling events in plant immunity, downstream of FLS2. We also found that MPK4 negatively affects the flg22-induced ROS burst. In addition, salicylic acid pre-treatment enhances the AtRbohD-mediated ROS burst, which is again independent of MPK3/MPK6 based on analysis of the mpk3 mpk6 double mutant. The establishment of an mpk3 mpk6 double mutant system using a chemical genetic approach provides a powerful tool to investigate the function of MPK3/MPK6 in the plant defense signaling pathway.

Subject(s)

Arabidopsis Proteins/metabolism , Arabidopsis/immunology , Mitogen-Activated Protein Kinase Kinases/metabolism , Mitogen-Activated Protein Kinases/metabolism , NADPH Oxidases/metabolism , Respiratory Burst , Signal Transduction , Arabidopsis/enzymology , Arabidopsis/metabolism , Enzyme Activation , Reactive Oxygen Species/metabolism

13.

Predicting peptide binding sites on protein surfaces by clustering chemical interactions.

Yan, Chengfei; Zou, Xiaoqin.

J Comput Chem ; 36(1): 49-61, 2015 Jan 05.

Article in English | MEDLINE | ID: mdl-25363279

ABSTRACT

Short peptides play important roles in cellular processes including signal transduction, immune response, and transcription regulation. Correct identification of the peptide binding site on a given protein surface is of great importance not only for mechanistic investigation of these biological processes but also for therapeutic development. In this study, we developed a novel computational approach, referred to as ACCLUSTER, for predicting the peptide binding sites on protein surfaces. Specifically, we use the 20 standard amino acids as probes to globally scan the protein surface. The poses forming good chemical interactions with the protein are identified, followed by clustering with the density-based spatial clustering of applications with noise technique. Finally, these clusters are ranked based on their sizes. The cluster with the largest size is predicted as the putative binding site. Assessment of ACCLUSTER was performed on a diverse test set of 251 nonredundant protein-peptide complexes. The results were compared with the performance of POCASA, a pocket detection method for ligand binding site prediction. Peptidb, another protein-peptide database that contains both bound structures and unbound or homologous structures, was used to test the robustness of ACCLUSTER. The performance of ACCLUSTER was also compared with PepSite2 and PeptiMap, two recently developed methods for identifying peptide binding sites. The results showed that ACCLUSTER is a promising method for peptide binding site prediction. Additionally, ACCLUSTER was shown to be applicable to nonpeptide ligand binding site prediction.

Subject(s)

Computational Biology , Peptides/chemistry , Proteins/chemistry , Algorithms , Amino Acids/chemistry , Binding Sites , Cluster Analysis , Molecular Docking Simulation , Surface Properties
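
The clustering and ranking steps described in record 13 (density-based clustering of well-scoring probe poses, then ranking clusters by size) can be sketched as follows. This is a plain textbook DBSCAN on 3D points, not the ACCLUSTER code; the probe coordinates and the `eps` and `min_pts` values are hypothetical choices made only for illustration.

```python
from collections import deque
from math import dist


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def rank_clusters(labels):
    """Return (cluster_id, size) pairs sorted from largest to smallest."""
    sizes = {}
    for label in labels:
        if label != -1:
            sizes[label] = sizes.get(label, 0) + 1
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical probe positions: a patch of four, a patch of three, one outlier.
probes = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0),
    (10.0, 10.0, 10.0), (11.0, 10.0, 10.0), (10.0, 11.0, 10.0),
    (50.0, 50.0, 50.0),
]
labels = dbscan(probes, eps=1.5, min_pts=3)
ranking = rank_clusters(labels)
print(labels, ranking)
```

Under the scheme in the abstract, the first entry of `ranking` would correspond to the predicted binding site, while isolated poses labelled as noise are ignored.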

14.

Protein language model-embedded geometric graphs power inter-protein contact prediction.

Si, Yunda; Yan, Chengfei.

Elife ; 12, 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38564241

ABSTRACT

Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although significant improvement has been made in inter-protein contact prediction recently, there is still considerable room for improving the prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we show that the prediction of PLMGraph-Inter can complement the result of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.

Subject(s)

Language , Neural Networks, Computer

15.

Inclusion of the orientational entropic effect and low-resolution experimental information for protein-protein docking in Critical Assessment of PRedicted Interactions (CAPRI).

Huang, Sheng-You; Yan, Chengfei; Grinter, Sam Z; Chang, Shan; Jiang, Lin; Zou, Xiaoqin.

Proteins ; 81(12): 2183-91, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24227686

ABSTRACT

Inclusion of entropy is important and challenging for protein-protein binding prediction. Here, we present a statistical mechanics-based approach to empirically consider the effect of orientational entropy. Specifically, we globally sample the possible binding orientations based on a simple shape-complementarity scoring function using an FFT-type docking method. Then, for each generated orientation, we calculate the probability through the partition function of the ensemble of accessible states, which are assumed to be represented by the set of nearby binding modes. For each mode, the interaction energy is calculated using our ITScorePP scoring function that was developed in our laboratory based on principles of statistical mechanics. Using the above protocol, we present the results of our participation in Rounds 22-27 of the Critical Assessment of PRedicted Interactions (CAPRI) experiment for 10 targets (T46-T58). Additional experimental information, such as low-resolution small-angle X-ray scattering data, was used when available. In the prediction (or docking) experiments of the 10 target complexes, we achieved correct binding modes for six targets: one with high accuracy (T47), two with medium accuracy (T48 and T57), and three with acceptable accuracy (T49, T50, and T58). In the scoring experiments of seven target complexes, we obtained correct binding modes for six targets: one with high accuracy (T47), two with medium accuracy (T49 and T50), and three with acceptable accuracy (T46, T51, and T53).

Subject(s)

Molecular Docking Simulation , Protein Interaction Maps , Proteins/chemistry , Software , Algorithms , Computational Biology , Crystallography, X-Ray , Databases, Protein , Entropy , Models, Molecular , Protein Binding , Protein Conformation
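
The partition-function idea in record 15 can be made concrete with a short sketch. The abstract does not give the exact formula, so the code below assumes the standard Boltzmann form: each orientation is scored by the free energy of its ensemble of nearby binding modes, F = -kT ln Σ exp(-E_j / kT), where E_j are the interaction energies of those modes. The energies, the orientation names and the value of `kT` are hypothetical.

```python
import math


def ensemble_free_energy(energies, kT=1.0):
    """Free energy -kT * ln(sum_j exp(-E_j / kT)) of a set of states.

    The minimum energy is factored out so the exponentials cannot overflow.
    """
    e_min = min(energies)
    z = sum(math.exp(-(e - e_min) / kT) for e in energies)
    return e_min - kT * math.log(z)


def rank_orientations(nearby_mode_energies, kT=1.0):
    """Sort orientations from most to least favourable free energy.

    nearby_mode_energies maps an orientation id to the energies of the
    binding modes assumed to represent its accessible states.
    """
    scored = {
        oid: ensemble_free_energy(energies, kT)
        for oid, energies in nearby_mode_energies.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical energies: a deep but isolated mode versus a broad, shallower well.
modes = {
    "narrow_well": [-5.0],
    "broad_well": [-4.5, -4.5, -4.5, -4.5],
}
for orientation, free_energy in rank_orientations(modes):
    print(orientation, round(free_energy, 3))
```

In this toy case the broad well ranks first even though its best single energy is worse, which is the kind of entropic effect that ranking by the lowest energy alone would miss.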

16.

Automated large-scale file preparation, docking, and scoring: evaluation of ITScore and STScore using the 2012 Community Structure-Activity Resource benchmark.

Grinter, Sam Z; Yan, Chengfei; Huang, Sheng-You; Jiang, Lin; Zou, Xiaoqin.

J Chem Inf Model ; 53(8): 1905-14, 2013 Aug 26.

Article in English | MEDLINE | ID: mdl-23656179

ABSTRACT

In this study, we use the recently released 2012 Community Structure-Activity Resource (CSAR) data set to evaluate two knowledge-based scoring functions, ITScore and STScore, and a simple force-field-based potential (VDWScore). The CSAR data set contains 757 compounds, most with known affinities, and 57 crystal structures. With the help of the script files for docking preparation, we use the full CSAR data set to evaluate the performances of the scoring functions on binding affinity prediction and active/inactive compound discrimination. The CSAR subset that includes crystal structures is used as well, to evaluate the performances of the scoring functions on binding mode and affinity predictions. Within this structure subset, we investigate the importance of accurate ligand and protein conformational sampling and find that the binding affinity predictions are less sensitive to non-native ligand and protein conformations than the binding mode predictions. We also find the full CSAR data set to be more challenging in making binding mode predictions than the subset with structures. The script files used for preparing the CSAR data set for docking, including scripts for canonicalization of the ligand atoms, are offered freely to the academic community.

Subject(s)

Databases, Pharmaceutical , Electronic Data Processing , Molecular Docking Simulation/methods , Automation , Crystallography, X-Ray , Protein Conformation , Structure-Activity Relationship

17.

Genomics and data science: an application within an umbrella.

Navarro, Fábio C P; Mohsen, Hussein; Yan, Chengfei; Li, Shantao; Gu, Mengting; Meyerson, William; Gerstein, Mark.

Genome Biol ; 20(1): 109, 2019 05 29.

Article in English | MEDLINE | ID: mdl-31142351

ABSTRACT

Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural "exports" and "imports" between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.

Subject(s)

Data Science , Genomics

18.

Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions.

Wang, Bo; Yan, Chengfei; Lou, Shaoke; Emani, Prashant; Li, Bian; Xu, Min; Kong, Xiangmeng; Meyerson, William; Yang, Yucheng T; Lee, Donghoon; Gerstein, Mark.

Structure ; 27(9): 1469-1481.e3, 2019 09 03.

Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ~10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.
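Editorial note on the AUROC figure quoted in this abstract: the following is a minimal sketch of how an area under the ROC curve can be computed from binary labels and classifier scores, using the pairwise (Mann-Whitney) formulation. It is not the authors' code; the function name `auroc` and the toy data are assumptions made purely for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: sequence of 0/1 values (1 = positive class).
    scores: sequence of numbers; a higher score means "more likely positive".
    A tie between a positive and a negative counts as half a correct ordering.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the paper).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

The value can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The quadratic loop is kept for clarity; a rank-based implementation would be preferable for large data sets.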

Subject(s)

Computational Biology/methods, Single Nucleotide Polymorphism, Proteins/chemistry, Proteins/genetics, Protein Databases, Drug Design, Humans, Ligands, Machine Learning, Statistical Models, Molecular Docking Simulation, Protein Binding, Protein Conformation, Proteins/metabolism

19.

Comprehensive functional genomic resource and integrative model for the human brain.

Wang, Daifeng; Liu, Shuang; Warrell, Jonathan; Won, Hyejung; Shi, Xu; Navarro, Fabio C P; Clarke, Declan; Gu, Mengting; Emani, Prashant; Yang, Yucheng T; Xu, Min; Gandal, Michael J; Lou, Shaoke; Zhang, Jing; Park, Jonathan J; Yan, Chengfei; Rhie, Suhn Kyong; Manakongtreecheep, Kasidet; Zhou, Holly; Nathan, Aparna; Peters, Mette; Mattei, Eugenio; Fitzgerald, Dominic; Brunetti, Tonya; Moore, Jill; Jiang, Yan; Girdhar, Kiran; Hoffman, Gabriel E; Kalayci, Selim; Gümüs, Zeynep H; Crawford, Gregory E; Roussos, Panos; Akbarian, Schahram; Jaffe, Andrew E; White, Kevin P; Weng, Zhiping; Sestan, Nenad; Geschwind, Daniel H; Knowles, James A; Gerstein, Mark B.

Science ; 362(6420)2018 12 14.

Article in English | MEDLINE | ID: mdl-30545857

ABSTRACT

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.

Subject(s)

Brain/metabolism, Gene Expression Regulation, Mental Disorders/genetics, Datasets as Topic, Deep Learning, Genetic Enhancer Elements, Genetic Epigenesis, Epigenomics, Gene Regulatory Networks, Genome-Wide Association Study, Humans, Quantitative Trait Loci, Single-Cell Analysis, Transcriptome

20.

The Usage of ACCLUSTER for Peptide Binding Site Prediction.

Yan, Chengfei; Xu, Xianjin; Zou, Xiaoqin.

Methods Mol Biol ; 1561: 3-9, 2017.

Article in English | MEDLINE | ID: mdl-28236229

ABSTRACT

Peptides mediate up to 40% of protein-protein interactions in a variety of cellular processes and are also attractive drug candidates. Thus, predicting peptide binding sites on the given protein structure is of great importance for mechanistic investigation of protein-peptide interactions and peptide therapeutics development. In this chapter, we describe the usage of our web server, referred to as ACCLUSTER, for peptide binding site prediction for a given protein structure. ACCLUSTER is freely available for users without registration at http://zougrouptoolkit.missouri.edu/accluster.

Subject(s)

Protein Databases, Peptide Fragments/metabolism, Proteins/metabolism, Animals, Binding Sites, Eukaryota/chemistry, Eukaryota/metabolism, Humans, Molecular Docking Simulation, Peptide Fragments/chemistry, Proteins/chemistry, Software, Web Browser
