1.
ArXiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38764597

ABSTRACT

Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). In this work we have provided a curated set of sidechain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro and Rosetta in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) for folded proteins, and combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions.

2.
Bioinformatics ; 39(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38060268

ABSTRACT

SUMMARY: The Local Disordered Region Sampling (LDRS, pronounced loaders) tool is a new module developed for IDPConformerGenerator, a previously validated approach to model intrinsically disordered proteins (IDPs). The IDPConformerGenerator LDRS module provides a method for generating all-atom conformations of intrinsically disordered protein regions at N- and C-termini of and in loops or linkers between folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures. Requiring only a pre-existing PDB or mmCIF formatted structural template of the protein with missing coordinates or with predicted confidence scores and its full-length primary sequence, LDRS will automatically generate physically meaningful conformational ensembles of the missing flexible regions to complete the full-length protein. The capabilities of the LDRS tool of IDPConformerGenerator include modeling phosphorylation sites using enhanced Monte Carlo-Side Chain Entropy, transmembrane proteins within an all-atom bilayer, and multi-chain complexes. The modeling capacity of LDRS capitalizes on the modularity, the ability to be used as a library and via command-line, and the computational speed of the IDPConformerGenerator platform. AVAILABILITY AND IMPLEMENTATION: The LDRS module is part of the IDPConformerGenerator modeling suite, which can be downloaded from GitHub at https://github.com/julie-forman-kay-lab/IDPConformerGenerator. IDPConformerGenerator is written in Python3 and works on Linux, Microsoft Windows, and Mac OS versions that support DSSP. Users can utilize LDRS's Python API for scripting the same way they can use any part of IDPConformerGenerator's API, by importing functions from the "idpconfgen.ldrs_helper" library. Otherwise, LDRS can be used as a command line interface application within IDPConformerGenerator. 
Full documentation is available within the command-line interface as well as on IDPConformerGenerator's official documentation pages (https://idpconformergenerator.readthedocs.io/en/latest/).


Subjects
Intrinsically Disordered Proteins , Software , Gene Library , Membrane Proteins , Documentation
3.
ACS Cent Sci ; 9(11): 2161-2170, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38033801

ABSTRACT

We leveraged the power of ChatGPT and Bayesian optimization in the development of a multi-AI-driven system, backed by seven large language model-based assistants and equipped with machine learning algorithms, that seamlessly orchestrates a multitude of research aspects in a chemistry laboratory (termed the ChatGPT Research Group). Our approach accelerated the discovery of optimal microwave synthesis conditions, enhancing the crystallinity of MOF-321, MOF-322, and COF-323 and achieving the desired porosity and water capacity. In this system, human researchers gained assistance from these diverse AI collaborators, each with a unique role within the laboratory environment, spanning strategy planning, literature search, coding, robotic operation, labware design, safety inspection, and data analysis. Such a comprehensive approach enables a single researcher working in concert with AI to achieve productivity levels analogous to those of an entire traditional scientific team. Furthermore, by reducing human biases in screening experimental conditions and deftly balancing the exploration and exploitation of synthesis parameters, our Bayesian search approach precisely zeroed in on optimal synthesis conditions from a pool of 6 million within a significantly shortened time scale. This work serves as a compelling proof of concept for an AI-driven revolution in the chemistry laboratory, painting a future where AI becomes an efficient collaborator, liberating us from routine tasks to focus on pushing the boundaries of innovation.

4.
bioRxiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37546943

ABSTRACT

The Local Disordered Region Sampling (LDRS, pronounced loaders) tool, developed for the IDPConformerGenerator platform (Teixeira et al. 2022), provides a method for generating all-atom conformations of intrinsically disordered regions (IDRs) at N- and C-termini of and in loops or linkers between folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures. Requiring only a pre-existing PDB structure of the protein with missing coordinates or with predicted confidence scores and its full-length primary sequence, LDRS will automatically generate physically meaningful conformational ensembles of the missing flexible regions to complete the full-length protein. The capabilities of the LDRS tool of IDPConformerGenerator include modeling phosphorylation sites using enhanced Monte Carlo Side Chain Entropy (MC-SCE) (Bhowmick and Head-Gordon 2015), transmembrane proteins within an all-atom bilayer, and multi-chain complexes. The modeling capacity of LDRS capitalizes on the modularity, ability to be used as a library and via command-line, and computational speed of the IDPConformerGenerator platform.

5.
J Am Chem Soc ; 145(32): 18048-18062, 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-37548379

ABSTRACT

We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic framework (MOF) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information, an issue that previously made the use of large language models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different trade-offs among labor, speed, and accuracy. We deploy this system to extract 26 257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the data set built by text mining, we constructed a machine-learning model with over 87% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions about chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry subdisciplines.
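
As a reading aid for the precision, recall, and F1 scores quoted in this abstract, the following is a minimal sketch of how the three metrics are conventionally computed from extraction counts. The function name and the counts in the usage note are illustrative assumptions, not values or code from the paper.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    predicted = true_positives + false_positives   # items the extractor reported
    actual = true_positives + false_negatives      # items present in the ground truth
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, `precision_recall_f1(90, 10, 10)` describes a hypothetical extractor that reported 100 parameters, 90 of them correct, while missing 10 that were in the text.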

6.
ArXiv ; 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37645037

ABSTRACT

Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is controversial as to whether new SFs are actually improving since the general, refined, and core datasets of PDBBind are cross-contaminated with proteins and ligands with high similarity, and hence they may not perform comparably well in binding prediction of new protein-ligand complexes. In this work we have carefully prepared a cleaned PDBBind data set of non-covalent binders that are split into training, validation, and test datasets to control for data leakage. The resulting leak-proof (LP)-PDBBind data is used to retrain four popular SFs: AutoDock vina, Random Forest (RF)-Score, InteractionGraphNet (IGN), and DeepDTA, to better test their capabilities when applied to new protein-ligand complexes. In particular we have formulated a new independent data set, BDB2020+, by matching high quality binding free energies from BindingDB with co-crystalized ligand-protein complexes from the PDB that have been deposited since 2020. Based on all the benchmark results, the retrained models using LP-PDBBind that rely on 3D information perform consistently among the best, with IGN especially being recommended for scoring and ranking applications for new protein-ligand systems.
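
The abstract does not spell out how the leak-proof split is built, so the following is only a sketch of the general idea of a group-aware split. It assumes each complex has already been given a similarity cluster label (the `cluster_of` mapping, the 80/10/10 fractions, and the function name are all hypothetical) and assigns whole clusters to one split, so that no cluster contributes to both training and test data.

```python
import random
from collections import defaultdict


def cluster_split(cluster_of, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clusters to train/validation/test.

    cluster_of: dict mapping item id -> cluster id (hypothetical input,
    e.g. the result of a prior similarity clustering of the complexes).
    """
    members = defaultdict(list)
    for item, cluster in cluster_of.items():
        members[cluster].append(item)

    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)  # reproducible cluster order

    names = ("train", "validation", "test")
    splits = {name: [] for name in names}
    targets = [f * len(cluster_of) for f in fractions]

    idx = 0
    for cluster in clusters:
        # Move to the next split once the current one has reached its target size.
        while idx < len(names) - 1 and len(splits[names[idx]]) >= targets[idx]:
            idx += 1
        # A cluster is never divided, so similar items stay in the same split.
        splits[names[idx]].extend(members[cluster])
    return splits
```

A single cluster label is a simplification: the abstract refers to similarity among both proteins and ligands, which a real procedure would need to control for jointly.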

7.
J Chem Phys ; 158(17), 2023 May 07.
Article in English | MEDLINE | ID: mdl-37144719

ABSTRACT

The structural characterization of proteins with a disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic resonance enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.


Subjects
Intrinsically Disordered Proteins , Proteins , Biomolecular Nuclear Magnetic Resonance , Proteins/chemistry , Magnetic Resonance Spectroscopy , Protein Conformation , Intrinsically Disordered Proteins/chemistry
8.
J Chem Theory Comput ; 19(14): 4689-4700, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-36749957

ABSTRACT

We consider a generic representation problem of internal coordinates (bond lengths, valence angles, and dihedral angles) and their transformation to 3-dimensional Cartesian coordinates of a biomolecule. We show that the internal-to-Cartesian process relies on correctly predicting chemically subtle correlations among the internal coordinates themselves, and learning these correlations increases the fidelity of the Cartesian representation. We developed a machine learning algorithm, Int2Cart, to predict bond lengths and bond angles from backbone torsion angles and residue types of a protein, which allows reconstruction of protein structures better than using fixed bond lengths and bond angles or a static library method that relies on backbone torsion angles and residue types in a local environment. The method is able to be used for structure validation, as we show that the agreement between Int2Cart-predicted bond geometries and those from an AlphaFold 2 model can be used to estimate model quality. Additionally, by using Int2Cart to reconstruct an IDP ensemble, we are able to decrease the clash rate during modeling. The Int2Cart algorithm has been implemented as a publicly accessible python package at https://github.com/THGLab/int2cart.


Subjects
Algorithms , Proteins , Proteins/chemistry , Machine Learning
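
To illustrate the internal-to-Cartesian step that record 8 discusses, here is a minimal sketch of the standard reference-frame construction for placing one atom from a bond length, a valence angle, and a dihedral angle. It is not the Int2Cart algorithm, which predicts the bond lengths and angles themselves. All function names are assumptions made for this example, and the geometry values in the usage note are arbitrary.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)


def place_atom(a, b, c, bond, angle, torsion):
    """Return d with |cd| = bond, angle(b, c, d) = angle and
    dihedral(a, b, c, d) = torsion. Angles are in radians."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal to the a-b-c plane
    m = cross(n, bc)                 # completes the orthonormal local frame
    # Coordinates of d in the local frame (bc, m, n) centred on c.
    local = (-bond * math.cos(angle),
             bond * math.sin(angle) * math.cos(torsion),
             bond * math.sin(angle) * math.sin(torsion))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))


def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in radians, used here as a check."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2))
```

Calling `place_atom((0, 1, 0), (0, 0, 0), (1, 0, 0), 1.5, math.radians(110), math.radians(-60))` and passing the result back to `dihedral` should recover the requested torsion. Repeating the placement along a chain is how a backbone is rebuilt from internal coordinates, which is why small errors in the assumed bond lengths and angles accumulate.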
10.
J Phys Chem A ; 126(35): 5985-6003, 2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36030416

ABSTRACT

The power of structural information for informing biological mechanisms is clear for stable folded macromolecules, but similar structure-function insight is more difficult to obtain for highly dynamic systems such as intrinsically disordered proteins (IDPs) which must be described as structural ensembles. Here, we present IDPConformerGenerator, a flexible, modular open-source software platform for generating large and diverse ensembles of disordered protein states that builds conformers that obey geometric, steric, and other physical restraints on the input sequence. IDPConformerGenerator samples backbone phi (φ), psi (ψ), and omega (ω) torsion angles of relevant sequence fragments from loops and secondary structure elements extracted from folded protein structures in the RCSB Protein Data Bank and builds side chains from robust Monte Carlo algorithms using expanded rotamer libraries. IDPConformerGenerator has many user-defined options enabling variable fractional sampling of secondary structures, supports Bayesian models for assessing the agreement of IDP ensembles for consistency with experimental data, and introduces a machine learning approach to transform between internal and Cartesian coordinates with reduced error. IDPConformerGenerator will facilitate the characterization of disordered proteins to ultimately provide structural insights into these states that have key biological functions.


Subjects
Intrinsically Disordered Proteins , Bayes Theorem , Protein Databases , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Protein Secondary Structure , Software
11.
Digit Discov ; 1(3): 333-343, 2022 Jun 13.
Article in English | MEDLINE | ID: mdl-35769203

ABSTRACT

We report a new deep learning message passing network that takes inspiration from Newton's equations of motion to learn interatomic potentials and forces. With the advantage of directional information from trainable force vectors, and physics-infused operators that are inspired by Newtonian physics, the entire model remains rotationally equivariant, and many-body interactions are inferred by more interpretable physical features. We test NewtonNet on the prediction of several reactive and non-reactive high quality ab initio data sets including single small molecules, a large set of chemically diverse molecules, and methane and hydrogen combustion reactions, achieving state-of-the-art test performance on energies and forces with far greater data and computational efficiency than other deep learning models.

12.
Sci Data ; 9(1): 215, 2022 May 17.
Article in English | MEDLINE | ID: mdl-35581204

ABSTRACT

The generation of reference data for deep learning models is challenging for reactive systems, and more so for combustion reactions due to the extreme conditions that create radical species and alternative spin states during the combustion process. Here, we extend intrinsic reaction coordinate (IRC) calculations with ab initio MD simulations and normal mode displacement calculations to more extensively cover the potential energy surface for 19 reaction channels for hydrogen combustion. A total of ∼290,000 potential energies and ∼1,270,000 nuclear force vectors are evaluated with a high quality range-separated hybrid density functional, ωB97X-V, to construct the reference data set, including transition state ensembles, for the deep learning models to study hydrogen combustion reaction.

13.
J Phys Chem B ; 126(9): 1885-1894, 2022 Mar 10.
Article in English | MEDLINE | ID: mdl-35213160

ABSTRACT

Intrinsically disordered proteins and unfolded proteins have fluctuating conformational ensembles that are fundamental to their biological function and impact protein folding, stability, and misfolding. Despite the importance of protein dynamics and conformational sampling, time-dependent data types are not fully exploited when defining and refining disordered protein ensembles. Here we introduce a computational framework using an elastic network model and normal-mode displacements to generate a dynamic disordered ensemble consistent with NMR-derived dynamics parameters, including transverse R2 relaxation rates and Lipari-Szabo order parameters (S2 values). We illustrate our approach using the unfolded state of the drkN SH3 domain to show that the dynamical ensembles give better agreement than a static ensemble for a wide range of experimental validation data including NMR chemical shifts, J-couplings, nuclear Overhauser effects, paramagnetic relaxation enhancements, residual dipolar couplings, hydrodynamic radii, single-molecule fluorescence Förster resonance energy transfer, and small-angle X-ray scattering.


Subjects
Intrinsically Disordered Proteins , Protein Folding , Fluorescence Resonance Energy Transfer , Intrinsically Disordered Proteins/chemistry , Biomolecular Nuclear Magnetic Resonance , Protein Conformation , src Homology Domains
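
For readers unfamiliar with the order parameters (S2 values) mentioned in record 13, the sketch below shows the textbook ensemble estimate of S² from the orientations of a bond vector across conformers. It assumes the conformers are already in a common reference frame, and it is not the elastic network and normal-mode procedure of the paper. The function name is an assumption.

```python
import math


def order_parameter(vectors):
    """Estimate S^2 from 3D bond vectors sampled across an ensemble.

    Uses S^2 = (3/2) * sum_ij <u_i u_j>^2 - 1/2, where u is the unit bond
    vector and <...> is the average over the ensemble.
    """
    units = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        units.append(tuple(x / norm for x in v))

    n = len(units)
    total = 0.0
    for i in range(3):
        for j in range(3):
            mean = sum(u[i] * u[j] for u in units) / n
            total += mean * mean
    return 1.5 * total - 0.5
```

A vector that never moves gives S² = 1, and orientations spread evenly over all directions give S² = 0, so values in between report how restricted the bond vector is.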