Results 1 - 13 of 13
1.
J Chem Inf Model ; 63(16): 5120-5132, 2023 08 28.
Article in English | MEDLINE | ID: mdl-37578123

ABSTRACT

DNA-encoded libraries (DELs) provide the means to make and screen millions of diverse compounds against a target of interest in a single experiment. However, despite producing large volumes of binding data at a relatively low cost, the DEL selection process is susceptible to noise, necessitating computational follow-up to increase signal-to-noise ratios. In this work, we present a set of informatics tools to employ data from prior DEL screen(s) to gain information about which building blocks are most likely to be productive when designing new DELs for the same target. We demonstrate that similar building blocks have similar probabilities of forming compounds that bind. We then build a model from the inference that the combined behavior of individual building blocks is predictive of whether an overall compound binds. We illustrate our approach on a set of three-cycle OpenDEL libraries screened against soluble epoxide hydrolase (sEH) and report performance of more than an order of magnitude greater than random guessing on a holdout set, demonstrating that our model can serve as a baseline for comparison against other machine learning models on DEL data. Lastly, we provide a discussion on how we believe this informatics workflow could be applied to benefit researchers in their specific DEL campaigns.
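
The idea that the combined behavior of individual building blocks predicts whether a compound binds can be illustrated with a small sketch. The abstract does not specify how the model combines building-block information, so everything here is an assumption made for illustration: the function names, the Laplace-smoothed per-cycle hit rates, the product used to combine them, and the toy screen data are all hypothetical.

```python
from collections import defaultdict


def building_block_rates(screen, pseudocount=1.0):
    """Smoothed hit fraction for each (cycle, building block) pair.

    `screen` is a list of (blocks, is_hit) records from a prior selection,
    where `blocks` is a tuple with one building-block ID per cycle.
    """
    hits = defaultdict(float)
    totals = defaultdict(float)
    for blocks, is_hit in screen:
        for cycle, block in enumerate(blocks):
            totals[(cycle, block)] += 1.0
            if is_hit:
                hits[(cycle, block)] += 1.0
    # Laplace smoothing keeps rarely seen blocks away from 0 and 1.
    return {
        key: (hits[key] + pseudocount) / (totals[key] + 2.0 * pseudocount)
        for key in totals
    }


def compound_score(blocks, rates, default=0.5):
    """Combine per-block rates by a simple product (an illustrative choice)."""
    score = 1.0
    for cycle, block in enumerate(blocks):
        score *= rates.get((cycle, block), default)
    return score


# Hypothetical three-cycle screen: compounds containing A1 and C1 were hits.
toy_screen = [
    (("A1", "B1", "C1"), True),
    (("A1", "B2", "C1"), True),
    (("A2", "B1", "C2"), False),
    (("A2", "B2", "C2"), False),
]
toy_rates = building_block_rates(toy_screen)
ranked = sorted(
    [("A1", "B1", "C1"), ("A2", "B1", "C2")],
    key=lambda blocks: compound_score(blocks, toy_rates),
    reverse=True,
)
```

Ranking candidate compounds by such a score is one simple way a baseline of this kind could be compared against random selection on a holdout set.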


Subject(s)
Drug Discovery , Small Molecule Libraries , Small Molecule Libraries/chemistry , DNA/chemistry , Machine Learning
2.
J Chem Inf Model ; 60(5): 2522-2532, 2020 05 26.
Article in English | MEDLINE | ID: mdl-31872764

ABSTRACT

Cryo-EM has become one of the prime methods for protein structure elucidation, frequently yielding density maps with near-atomic or medium resolution. If protein structures cannot be deduced unambiguously from the density maps, computational structure refinement tools are needed to generate protein structural models. We have previously developed an iterative Rosetta-MDFF protocol that used cryo-EM densities to refine protein structures. Here we show that, in addition to cryo-EM densities, incorporation of other experimental restraints into the Rosetta-MDFF protocol further improved refined structures. We used NMR chemical shift (CS) data integrated with cryo-EM densities in our hybrid protocol in both the Rosetta step and the molecular dynamics (MD) simulations step. In 15 out of 18 cases for all MD rounds, the refinement results obtained when density maps and NMR chemical shift data were used in combination outperformed those of density map-only refinement. Notably, the improvement in refinement was highest when medium and low-resolution density maps were used. With our hybrid method, the RMSDs of final models obtained were always better than the RMSDs obtained by our previous protocol with just density refinement for both medium (6.9 Å) and low (9 Å) resolution maps. For all the six test proteins with medium resolution density maps (6.9 Å), the final refined structure RMSDs were lower for the hybrid method than for the cryo-EM only refinement. The final refined RMSDs were less than 1.5 Å when our hybrid protocol was used with 4 Å density maps. For four out of the six proteins the final RMSDs were even less than 1 Å. This study demonstrates that by using a combination of cryo-EM and NMR restraints, it is possible to refine structures to atomic resolution, outperforming single restraint refinement. This hybrid protocol will be a valuable tool when only low-resolution cryo-EM density data and NMR chemical shift data are available to refine structures.


Subject(s)
Molecular Dynamics Simulation , Proteins , Cryoelectron Microscopy , Magnetic Resonance Imaging , Protein Conformation
3.
BMC Bioinformatics ; 17(1): 328, 2016 Aug 30.
Article in English | MEDLINE | ID: mdl-27578239

ABSTRACT

BACKGROUND: Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as Blosum62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins and treat substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also have a topology specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information. RESULTS: We obtain statistically significant improved sequence matching scores for 73 % of the alpha helical test cases. On average, 61 % of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average z-scores for homology detection are improved by more than 54 % for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices.
Our topology specific similarity matrices also outperform other traditional similarity matrices and single matrix based structure methods. When the default amino acid substitution matrix in the Psi-blast algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional Psi-blast. It also outperforms results obtained for the corresponding HMM profiles generated for each topology. CONCLUSIONS: We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology specific similarity matrices outperform other traditional similarity matrices and single matrix based structure methods, and also show improvement over conventional Psi-blast and HMM profile based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs.


Subject(s)
Sequence Analysis, Protein/methods , Structural Homology, Protein , Algorithms , Amino Acid Substitution , Protein Structure, Secondary
4.
Beilstein J Org Chem ; 12: 2694-2718, 2016.
Article in English | MEDLINE | ID: mdl-28144341

ABSTRACT

The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein-ligand docking, pharmacophore modeling and QSAR techniques are reviewed.

5.
Phys Biol ; 9(1): 014001, 2012 02.
Article in English | MEDLINE | ID: mdl-22314977

ABSTRACT

Loops in proteins that connect secondary structures such as alpha-helices and beta-sheets are often on the surface and may play a critical role in some functions of a protein. The mobility of loops is central to the motional freedom and flexibility requirements of active-site loops and may play a critical role in some functions. The structures and behaviors of loops have not been studied much in the context of the whole structure and its overall motions, especially how these might be coupled. Here we investigate loop motions by using coarse-grained structures (Cα atoms only) to solve the motions of the system by applying Lagrange equations with elastic network models to learn about which loops move in an independent fashion and which move in coordination with domain motions, faster and slower, respectively. The normal modes of the system are calculated using eigen-decomposition of the stiffness matrix. The contribution of individual modes and groups of modes is investigated for their effects on all residues in each loop by using Fourier analyses. Our results indicate overall that the motions of functional sets of loops behave in ways similar to the whole structure. But overall only relatively few loops move in coordination with the dominant slow modes of motion, and these are often closely related to function.
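
The step of obtaining normal modes by eigen-decomposition of a stiffness matrix can be sketched as follows. This is not the authors' formulation: it assumes a Gaussian-network-style connectivity (Kirchhoff) matrix built from Cα coordinates with a hypothetical 7 Å cutoff, and it uses a plain cyclic Jacobi diagonalization so that the example needs only the standard library. The function names are invented for illustration.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Connectivity matrix: -1 for residue pairs within `cutoff`, degree on the diagonal."""
    n = len(coords)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                k[i][j] = k[j][i] = -1.0
                k[i][i] += 1.0
                k[j][j] += 1.0
    return k


def jacobi_eigen(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues (ascending) and eigenvectors of a symmetric matrix via Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i])
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors


# Three collinear pseudo-residues 3.8 Å apart; with a 4 Å cutoff only chain neighbours connect.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mode_values, mode_vectors = jacobi_eigen(kirchhoff(toy_coords, cutoff=4.0))
```

In such a model the zero eigenvalue corresponds to rigid-body motion, and the smallest nonzero eigenvalues correspond to the slow, collective modes with which the abstract compares loop motions.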

6.
J Struct Funct Genomics ; 12(2): 137-47, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21674234

ABSTRACT

We propose a novel method of calculation of free energy for coarse grained models of proteins by combining our newly developed multibody potentials with entropies computed from elastic network models of proteins. Multi-body potentials have been of much interest recently because they take into account three dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Combining four-body non-sequential, four-body sequential and pairwise short range potentials with optimized weights for each term, our coarse-grained potential improved recognition of native structure among misfolded decoys, outperforming all other contact potentials for CASP8 decoy sets and performing comparably to the fully atomic empirical DFIRE potentials. By combining statistical contact potentials with entropies from elastic network models of the same structures we can compute free energy changes and improve coarse-grained modeling of protein structure and dynamics. The consideration of protein flexibility and dynamics should improve protein structure prediction and refinement of computational models. This work is the first to combine coarse-grained multibody potentials with an entropic model that takes into account contributions of the entire structure, investigating native-like decoy selection.


Subject(s)
Protein Conformation , Proteins/chemistry , Thermodynamics , Algorithms , Amino Acids/chemistry , Computer Simulation , Data Interpretation, Statistical , Entropy , Models, Molecular
7.
Proteins ; 79(6): 1923-9, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21560165

ABSTRACT

Multibody potentials have been of much interest recently because they take into account three dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Our goal was to combine long range multibody potentials and short range potentials to improve recognition of native structure among misfolded decoys. We optimized the weights for four-body nonsequential, four-body sequential, and short range potentials to obtain optimal model ranking results for threading and have compared these data against results obtained with other potentials (26 different coarse-grained potentials from the Potentials 'R'Us web server have been used). Our optimized multibody potentials outperform all other contact potentials in the recognition of the native structure among decoys, both for models from homology template-based modeling and from template-free modeling in CASP8 decoy sets. We have compared the results obtained for these optimized coarse-grained potentials, where each residue is represented by a single point, with results obtained by using the DFIRE potential, which takes into account atomic level information of proteins. We found that for all proteins larger than 80 amino acids our optimized coarse-grained potentials yield results comparable to those obtained with the atomic DFIRE potential.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Amino Acids/chemistry , Models, Molecular , Protein Conformation
8.
J Chem Phys ; 134(23): 235101, 2011 Jun 21.
Article in English | MEDLINE | ID: mdl-21702580

ABSTRACT

Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable and many sequences can assume that fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. The unique conformation giving the minimum energy is identified for each sequence and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms the fact that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
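
The enumeration-and-threading procedure described above can be sketched on the smallest interesting case, a 3 x 3 square lattice. The energy function is an assumption (each non-bonded H-H contact contributes -1; the abstract does not state the parameters used), conformations are identified by their contact maps so that rotations and reflections are not counted twice, and all function names are hypothetical.

```python
from itertools import product


def compact_walks(n):
    """All directed self-avoiding walks that visit every site of an n x n lattice."""
    walks = []

    def extend(path, visited):
        if len(path) == n * n:
            walks.append(tuple(path))
            return
        r, c = path[-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(n), repeat=2):
        extend([start], {start})
    return walks


def contact_map(walk):
    """Residue pairs that are lattice neighbours but not adjacent along the chain."""
    index = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for (r, c), i in index.items():
        for neighbour in ((r + 1, c), (r, c + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(n):
    """Number of HP sequences whose unique lowest-energy conformation is each contact map."""
    maps = {contact_map(walk) for walk in compact_walks(n)}
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=n * n):
        # Assumed HP energy: -1 for every H-H contact, 0 otherwise.
        energies = {
            m: -sum(1 for i, j in m if seq[i] == "H" and seq[j] == "H")
            for m in maps
        }
        best = min(energies.values())
        winners = [m for m, e in energies.items() if e == best]
        if len(winners) == 1:  # sequences with degenerate ground states are not counted
            counts[winners[0]] += 1
    return counts


walks_3x3 = compact_walks(3)
designability_3x3 = designability(3)
```

Graph features of each contact map (for example, degree distributions of the contact graph) could then be paired with these counts as inputs and targets for a learning algorithm, which is the role machine learning plays in the study.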


Subject(s)
Proteins/chemistry , Computational Biology , Protein Conformation , Protein Folding
9.
J Chem Theory Comput ; 13(10): 5131-5145, 2017 Oct 10.
Article in English | MEDLINE | ID: mdl-28949136

ABSTRACT

Knowing atomistic details of proteins is essential not only for the understanding of protein function but also for the development of drugs. Experimental methods such as X-ray crystallography, NMR, and cryo-electron microscopy (cryo-EM) are the preferred forms of protein structure determination and have achieved great success over the most recent decades. Computational methods may be an alternative when experimental techniques fail. However, computational methods are severely limited when it comes to predicting larger macromolecule structures with little sequence similarity to known structures. The incorporation of experimental restraints in computational methods is becoming increasingly important to more reliably predict protein structure. One such experimental input used in structure prediction and refinement is cryo-EM densities. Recent advances in cryo-EM have arguably revolutionized the field of structural biology. Our previously developed cryo-EM-guided Rosetta-MD protocol has shown great promise in the refinement of soluble protein structures. In this study, we extended cryo-EM density-guided iterative Rosetta-MD to membrane proteins. We also improved the methodology in general by picking models based on a combination of their score and fit-to-density during the Rosetta model selection. By doing so, we have been able to pick models superior to those with the previous selection based on Rosetta score only and we have been able to further improve our previously refined models of soluble proteins. The method was tested with five membrane spanning protein structures. By applying density-guided Rosetta-MD iteratively we were able to refine the predicted structures of these membrane proteins to atomic resolutions. We also showed that the resolution of the density maps determines the improvement and quality of the refined models. 
By incorporating high-resolution density maps (∼4 Å), we were able to more significantly improve the quality of the models than when medium-resolution maps (6.9 Å) were used. Beginning from an average starting structure root mean square deviation (RMSD) to native of 4.66 Å, our protocol was able to refine the structures to bring the average refined structure RMSD to 1.66 Å when 4 Å density maps were used. The protocol also successfully refined the HIV-1 CTD guided by an experimental 5 Å density map.
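
The RMSD values quoted above measure the average distance between matched atoms of a model and the native structure. A minimal sketch of the formula follows, under the assumption that the two coordinate sets are already superimposed and listed in the same atom order; optimal superposition, which structure-comparison tools normally perform first, is deliberately left out. The function name is hypothetical.

```python
import math


def rmsd(model, native):
    """Root mean square deviation between two equally ordered coordinate lists (in Å)."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


# Toy example: every atom of the model is displaced by exactly 1 Å along z.
toy_native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_model, toy_native)
```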


Subject(s)
Cryoelectron Microscopy , Membrane Proteins/chemistry , Molecular Dynamics Simulation , Protein Conformation
10.
Methods Mol Biol ; 1484: 35-44, 2017.
Article in English | MEDLINE | ID: mdl-27787818

ABSTRACT

Predicting the secondary structure of a protein from its sequence still remains a challenging problem. The prediction accuracies remain around 80 % across very diverse methods. Using evolutionary information and machine learning algorithms in particular has had the most impact. In this chapter, we will first define secondary structures, then we will review the Consensus Data Mining (CDM) technique based on the robust GOR algorithm and Fragment Database Mining (FDM) approach. GOR V is an empirical method utilizing a sliding window approach to model the secondary structural elements of a protein by making use of generalized evolutionary information. FDM uses data mining from experimental structure fragments, and is able to successfully predict the secondary structure of a protein by combining experimentally determined structural fragments based on sequence similarities of the fragments. The CDM method combines predictions from GOR V and FDM in a hierarchical manner to produce consensus predictions for secondary structure. In other words, if sequence fragments are not available, then it uses GOR V to make the secondary structure prediction. The online server of CDM is available at http://gor.bb.iastate.edu/cdm/ .
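
The hierarchical fallback described above can be sketched in a few lines. This is a simplification made for illustration, not the CDM server's actual logic: it assumes both predictors emit one state per residue (H, E, or C), that positions without a usable fragment are marked `None` in the fragment-based prediction, and that the fallback is applied residue by residue. The function name is hypothetical.

```python
def consensus_prediction(gor_states, fdm_states):
    """Prefer the fragment-based state where one exists, otherwise fall back to GOR V."""
    if len(gor_states) != len(fdm_states):
        raise ValueError("predictions must cover the same residues")
    return "".join(
        fdm if fdm is not None else gor
        for gor, fdm in zip(gor_states, fdm_states)
    )


# Toy example: fragments cover residues 2, 3 and 5 only.
toy_consensus = consensus_prediction("CCHHHC", [None, "E", "H", None, "H", None])
```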


Subject(s)
Protein Structure, Secondary/genetics , Proteins/genetics , Software , Algorithms , Amino Acid Sequence/genetics , Data Mining , Proteins/chemistry , Sequence Alignment/methods
11.
J Comput Biol ; 23(5): 400-11, 2016 05.
Article in English | MEDLINE | ID: mdl-27159634

ABSTRACT

Highly designable structures can be distinguished based on certain geometric graphical features of the interactions, confirming the fact that the topology of a protein structure and its residue-residue interaction network are important determinants of its designability. The most designable structures and least designable structures obtained for sets of proteins having the same number of residues are compared. It is shown that the most designable structures predicted by the graph features of the contact diagrams are more densely packed, whereas the poorly designable structures are more open structures or structures that are loosely packed. Interestingly enough, it can also be seen that the highly designable structures identified are also common structural motifs found in nature.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Computer Simulation , Models, Molecular , Protein Binding , Protein Conformation , Protein Folding
12.
Curr Pharm Des ; 20(8): 1208-22, 2014.
Article in English | MEDLINE | ID: mdl-23713774

ABSTRACT

Computing volumes and surface areas of molecular structures is generally considered to be a solved problem; however, comparisons presented in this review show that different ways of computing surface areas and volumes can yield dramatically different values. Volumes and surface areas are the most basic geometric properties of structures, and estimating these becomes especially important for large scale simulations when individual components are being assembled in protein complexes or drugs being fitted into proteins. Good approximations of volumes and surfaces are derived from Delaunay tessellations, but these values can differ significantly from those from the rolling ball approach of Lee and Richards (3V webserver). The origin of these differences lies in the extended parts and the less well packed parts of the proteins, which are ignored in some approaches. Even though surface areas and volumes from the two approaches differ significantly, their correlations are high. Atomic models have been compared, and the poorly packed regions of proteins are found to be most different between the two approaches. The Delaunay complexes have been explored for both fully atomic and for coarse-grained representations of proteins based on only Cα atoms. The scaling relationships between the fully atomic models and the coarse-grained model representations of proteins are reported, and the fitted lines yield simple relationships for the surface areas and volumes as a function of the number of protein residues and the number of heavy atoms. Further, the atomic and coarse-grained values are strongly correlated and simple relationships are reported.


Subject(s)
Computational Biology/methods , Drug Design , Models, Chemical , Models, Molecular , Proteins/chemistry , Binding Sites , Particle Size , Protein Binding , Protein Interaction Mapping , Protein Structure, Secondary , Surface Properties
13.
J Phys Chem B ; 116(23): 6725-31, 2012 Jun 14.
Article in English | MEDLINE | ID: mdl-22490366

ABSTRACT

Protein structure prediction and protein-protein docking are important and widely used tools, but methods to confidently evaluate the quality of a predicted structure or binding pose have had limited success. Typically, either knowledge-based or physics-based energy functions are employed to evaluate a set of predicted structures (termed "decoys" in structure prediction and "poses" in docking), with the lowest energy structure being assumed to be the one closest to the native state. While successful for many cases, failures are still common. Thus, improvements to structure evaluation methods are essential for future improvements. In this work, we combine multibody statistical potentials with dynamics models, evaluating fluctuation-based entropies that include contributions from the entire structure. This leads to enhanced selection of native-like structures for CASP9 decoys, refined ClusPro docking poses, as well as large sets of docking poses from the Benchmark 3.0 and Dockground data sets. The data used include both bound and unbound docking, and positive results are found for each type. Not only does this method yield improved average results, but for high quality docking poses, we often pick the best pose.


Subject(s)
Entropy , Molecular Dynamics Simulation , Proteins/chemistry , Protein Conformation