Results 1 - 4 of 4
1.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947930

ABSTRACT

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design.
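The validity criterion in this abstract reduces to a thresholded fraction. A generated backbone counts as valid when its self-consistency TM-score (scTM) is at least 0.45. The scTM is the TM-score between the designed backbone and the structure obtained by inverse-folding it to a sequence and then forward-folding that sequence. The Python below is a minimal sketch that assumes the scTM scores have already been computed by external inverse-folding and folding tools. The helper `validity_rate` and the example scores are hypothetical and are not taken from the RNA-FrameFlow code.

```python
from typing import Sequence

# Threshold quoted in the abstract: scTM >= 0.45 is taken to mean that the
# designed backbone and its refolded prediction share the same global fold.
SC_TM_THRESHOLD = 0.45


def validity_rate(sc_tm_scores: Sequence[float], threshold: float = SC_TM_THRESHOLD) -> float:
    """Return the fraction of generated backbones whose scTM score meets the threshold.

    Hypothetical helper: each score is assumed to come from an
    inverse-folding -> forward-folding -> TM-score pipeline run elsewhere.
    """
    if not sc_tm_scores:
        raise ValueError("need at least one scTM score")
    # Count designs at or above the threshold (>=, matching the abstract).
    n_valid = sum(1 for score in sc_tm_scores if score >= threshold)
    return n_valid / len(sc_tm_scores)


if __name__ == "__main__":
    # Made-up scores for five designs, for illustration only.
    scores = [0.62, 0.31, 0.45, 0.28, 0.50]
    print(f"validity: {validity_rate(scores):.0%}")
```

The comparison is inclusive (`>=`), so a design scoring exactly 0.45 counts as valid, as stated in the abstract.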

2.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947934

ABSTRACT

We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representations and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

3.
Nat Comput Sci ; 4(5): 367-378, 2024 May.
Article in English | MEDLINE | ID: mdl-38730184

ABSTRACT

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, totalling over 170 µs of simulation time. We give examples of machine learning (ML) baseline models whose accuracy improves when they are trained on our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.


Subject(s)
Drug Discovery , Machine Learning , Molecular Dynamics Simulation , Proteins , Ligands , Drug Discovery/methods , Proteins/chemistry , Proteins/metabolism , Quantum Theory

4.
Nat Commun ; 13(1): 7845, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36543777

ABSTRACT

The assembly of biomolecules into condensates is a fundamental process underlying the organisation of the intracellular space and the regulation of many cellular functions. Mapping and characterising phase behaviour of biomolecules is essential to understand the mechanisms of condensate assembly, and to develop therapeutic strategies targeting biomolecular condensate systems. A central concept for characterising phase-separating systems is the phase diagram. Phase diagrams are typically built from numerous individual measurements sampling different parts of the parameter space. However, even when performed in microwell plate format, this process is slow, low throughput and requires significant sample consumption. To address this challenge, we present here a combinatorial droplet microfluidic platform, termed PhaseScan, for rapid and high-resolution acquisition of multidimensional biomolecular phase diagrams. Using this platform, we characterise the phase behaviour of a wide range of systems under a variety of conditions and demonstrate that this approach allows the quantitative characterisation of the effect of small molecules on biomolecular phase transitions.


Subject(s)
Biomolecular Condensates , Microfluidics , Intracellular Space , Phase Transition
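The PhaseScan abstract describes building a phase diagram from many individual measurements sampling the parameter space. The Python below is an illustrative sketch only and not PhaseScan's published analysis pipeline. It assumes each droplet has already been assigned the concentrations of two components and a binary label (phase-separated or homogeneous). It then bins the droplets on a regular concentration grid and reports the fraction of phase-separated droplets in each occupied bin. The helper `phase_map`, the bin width and the example droplets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One droplet: (concentration of component A, concentration of component B, phase-separated?)
Droplet = Tuple[float, float, bool]


def phase_map(droplets: Iterable[Droplet], bin_width: float) -> Dict[Tuple[int, int], float]:
    """Bin droplets on a 2D concentration grid and return, for each occupied bin,
    the fraction of droplets labelled as phase-separated.

    Hypothetical helper: concentrations are in arbitrary but consistent units, and
    the phase-separated label is assumed to come from an upstream image classifier.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = defaultdict(lambda: [0, 0])  # bin -> [phase-separated count, total count]
    for conc_a, conc_b, separated in droplets:
        # Integer grid coordinates of the bin containing this droplet.
        key = (int(conc_a // bin_width), int(conc_b // bin_width))
        counts[key][0] += int(separated)
        counts[key][1] += 1
    return {key: sep / total for key, (sep, total) in counts.items()}


if __name__ == "__main__":
    # Made-up droplets for illustration only.
    droplets = [
        (0.5, 0.2, False),
        (0.7, 0.4, False),
        (2.1, 1.3, True),
        (2.4, 1.8, True),
        (2.2, 1.1, False),
    ]
    for bin_key, fraction in sorted(phase_map(droplets, bin_width=1.0).items()):
        print(bin_key, f"{fraction:.2f}")
```

Under these assumptions, the phase boundary lies roughly where the per-bin fraction switches from near 0 to near 1. A finer grid resolves the boundary more sharply but needs more droplets per bin.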