1.
J Mol Biol ; : 168551, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38548261

ABSTRACT

CATH (https://www.cathdb.info) classifies domain structures from experimental protein structures in the PDB and predicted structures in the AlphaFold Database (AFDB). To cope with the scale of the predicted data, a new NextFlow workflow (CATH-AlphaFlow) has been developed to classify high-quality domains into CATH superfamilies and identify novel fold groups and superfamilies. CATH-AlphaFlow uses a novel state-of-the-art structure-based domain boundary prediction method (ChainSaw) for identifying domains in multi-domain proteins. We applied CATH-AlphaFlow to process PDB structures not classified in CATH and AFDB structures from 21 model organisms, expanding CATH by over 100%. Domains not classified in existing CATH superfamilies or fold groups were used to seed novel folds, giving 253 new folds from PDB structures (September 2023 release) and 96 from AFDB structures of proteomes of 21 model organisms. Where possible, functional annotations were obtained using (i) predictions from publicly available methods and (ii) annotations from structural relatives in AFDB/UniProt50. We also predicted functional sites and highly conserved residues. Some folds are associated with important functions such as photosynthetic acclimation (in flowering plants), iron permease activity (in fungi) and post-natal spermatogenesis (in mice). CATH-AlphaFlow will allow us to identify many more CATH relatives in the AFDB, further characterising the protein structure landscape.

2.
Commun Biol ; 6(1): 160, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36755055

ABSTRACT

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.


Subject(s)
Furylfuramide , Proteins , Humans , Databases, Protein , Proteins/chemistry
3.
Trends Biochem Sci ; 48(4): 345-359, 2023 04.
Article in English | MEDLINE | ID: mdl-36504138

ABSTRACT

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Computational Biology/methods , Protein Conformation
4.
Curr Opin Struct Biol ; 70: 108-122, 2021 10.
Article in English | MEDLINE | ID: mdl-34225010

ABSTRACT

Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.


Subject(s)
Biological Evolution , Proteins , Binding Sites , Catalysis , Computational Biology , Humans , Machine Learning , Protein Engineering , Proteins/genetics
5.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully-classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
6.
J Chem Phys ; 153(1): 014101, 2020 Jul 07.
Article in English | MEDLINE | ID: mdl-32640817

ABSTRACT

We consider the prediction of a basic thermodynamic property, hydration free energies, across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety of the training dataset. Our study highlights the dangers of fitting machine-learning models to databases of a narrow chemical range.

7.
Monatsh Chem ; 149(1): 1-9, 2018.
Article in English | MEDLINE | ID: mdl-29290634

ABSTRACT

Cyclobutane thymine dimerization is the most prominent DNA photoinduced damage. While the ultrafast mechanism that proceeds in the singlet manifold is nowadays well established, the triplet-state pathway is not completely understood. Here we report the underlying mechanism of the photosensitized dimerization process in the triplet state. Quantum chemical calculations, combined with wavefunction analysis, and nonadiabatic molecular dynamics simulations demonstrate that this is a stepwise reaction, traversing a long-lived triplet biradical intermediate, which is characterized as a Frenkel exciton with very small charge-transfer character. The low yield of the reaction is regulated by two factors: (i) a relatively large energy barrier that needs to be overcome to form the exciton intermediate, and (ii) a bifurcation of the ground-state potential-energy surface that mostly leads back to the Franck-Condon region because dimerization requires a very restricted combination of coordinates and velocities at the event of non-radiative decay to the ground state.

8.
J Am Chem Soc ; 138(49): 15911-15916, 2016 12 14.
Article in English | MEDLINE | ID: mdl-27682199

ABSTRACT

The formation of cyclobutane thymine dimers is one of the most important DNA carcinogenic photolesions induced by ultraviolet irradiation. The long-debated question whether thymine dimerization after direct light excitation involves singlet or triplet states is investigated here for the first time using nonadiabatic molecular dynamics simulations. We find that the precursor of this [2 + 2] cycloaddition reaction is the singlet doubly π²π*² excited state, which is spectroscopically rather dark. Excitation to the bright ¹ππ* or dark ¹nπ* excited states does not lead to thymine dimer formation. In all cases, intersystem crossing to the triplet states is not observed during the simulated time, indicating that ultrafast dimerization occurs in the singlet manifold. The dynamics simulations also show that dimerization takes place only when conformational control happens in the doubly excited state.

9.
J Chem Phys ; 144(10): 101102, 2016 Mar 14.
Article in English | MEDLINE | ID: mdl-26979674

ABSTRACT

Full multiple spawning is a formally exact method to describe the excited-state dynamics of molecular systems beyond the Born-Oppenheimer approximation. However, it has been limited until now to the description of radiationless transitions taking place between electronic states with the same spin multiplicity. This Communication presents a generalization of the full and ab initio multiple spawning methods to both internal conversion (mediated by nonadiabatic coupling terms) and intersystem crossing events (triggered by spin-orbit coupling matrix elements) based on a spin-diabatic representation. The results of two numerical applications, a model system and the deactivation of thioformaldehyde, validate the presented formalism and its implementation.

10.
J Am Chem Soc ; 137(13): 4368-81, 2015 Apr 08.
Article in English | MEDLINE | ID: mdl-25763596

ABSTRACT

The excited-state dynamics of the purine free base and 9-methylpurine are investigated using experimental and theoretical methods. Femtosecond broadband transient absorption experiments reveal that excitation of these purine derivatives in aqueous solution at 266 nm results primarily in ultrafast conversion of the S2(ππ*) state to the vibrationally excited ¹nπ* state. Following vibrational and conformational relaxation, the ¹nπ* state acts as a doorway state in the efficient population of the triplet manifold with an intersystem crossing lifetime of hundreds of picoseconds. Experiments show an almost 2-fold increase in the intersystem crossing rate on going from polar aprotic to nonpolar solvents, suggesting that a solvent-dependent energy barrier must be surmounted to access the singlet-to-triplet crossing region. Ab initio static and surface-hopping dynamics simulations lend strong support to the proposed relaxation mechanism. Collectively, the experimental and computational results demonstrate that the accessibility of the nπ* states and the topology of the potential energy surfaces in the vicinity of conical intersections are key elements in controlling the excited-state dynamics of the purine derivatives. From a structural perspective, it is shown that the purine chromophore is not responsible for the ultrafast internal conversion in the adenine and guanine monomers. Instead, C6 functionalization plays an important role in regulating the rates of radiative and nonradiative relaxation. C6 functionalization inhibits access to the ¹nπ* state while simultaneously facilitating access to the ¹ππ*(La)/S0 conical intersection, such that population of the ¹nπ* state cannot compete with the relaxation pathways to the ground state involving ring puckering at the C2 position.


Subject(s)
Electrons , Purines/chemistry , Quantum Theory , Absorption, Physicochemical , Models, Molecular , Molecular Conformation , Thermodynamics , Vibration
11.
J Chem Phys ; 140(13): 134504, 2014 Apr 07.
Article in English | MEDLINE | ID: mdl-24712798

ABSTRACT

High-density amorphous water is simulated by use of isothermal-isobaric molecular dynamics at a pressure of 0.3 GPa making use of several water models (SPC/E, TIP3P, TIP4P variants, and TIP5P). Heating/cooling cycles are performed in the temperature range 80-280 K and quantities like density, total energy, and mobility are analysed. Raw data as well as the glass transition temperatures Tg observed in our studies depend on the water model used as well as on the treatment of intramolecular bonds and angles. However, clear-cut evidence for the occurrence of a glass-to-liquid transition is found in all cases. Thus, all models indicate that high-density amorphous ice found experimentally may be a low-temperature proxy of an ultraviscous high-density liquid.
