Results 1 - 5 of 5
1.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947930

ABSTRACT

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design.
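The validity criterion in this abstract amounts to a simple thresholded fraction. A minimal sketch follows, assuming the self-consistency TM-scores have already been computed by some external inverse-folding and forward-folding pipeline; the function name `validity_rate` and the example scores are hypothetical and are not taken from the RNA-FrameFlow codebase.

```python
def validity_rate(sc_tm_scores: list[float], threshold: float = 0.45) -> float:
    """Fraction of generated backbones whose self-consistency TM-score
    is at or above the threshold (0.45 is the cutoff quoted in the abstract)."""
    if not sc_tm_scores:
        raise ValueError("need at least one score")
    passing = sum(score >= threshold for score in sc_tm_scores)
    return passing / len(sc_tm_scores)


# Made-up scores for illustration only; these are not results from the paper.
example_scores = [0.30, 0.45, 0.62, 0.41, 0.50]
print(validity_rate(example_scores))
```

A rate above 0.4 under this definition would correspond to the "over 40%" figure reported in the abstract.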

2.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947934

ABSTRACT

We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pretraining and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representations and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

3.
Front Bioeng Biotechnol ; 12: 1375626, 2024.
Article in English | MEDLINE | ID: mdl-39070163

ABSTRACT

DNA sequences of nearly any desired composition, length, and function can be synthesized to alter the biology of an organism for purposes ranging from the bioproduction of therapeutic compounds to invasive pest control. Yet despite offering many great benefits, engineered DNA poses a risk due to its possible misuse or abuse by malicious actors, or its unintentional introduction into the environment. Monitoring the presence of engineered DNA in biological or environmental systems is therefore crucial for routine and timely detection of emerging biological threats, and for improving public acceptance of genetic technologies. To address this, we developed Synsor, a tool for identifying engineered DNA sequences in high-throughput sequencing data. Synsor leverages the k-mer signature differences between naturally occurring and engineered DNA sequences and uses an artificial neural network to classify whether a DNA sequence is natural or engineered. By querying suspected sequences against the model, Synsor can identify sequences that are likely to have been engineered. Using natural plasmid and engineered vector sequences, we showed that Synsor identifies engineered DNA with >99% accuracy. We demonstrate how Synsor can be used to detect potential genetically engineered organisms and locate where engineered DNA is being introduced into the environment by analysing genomic and metagenomic data from yeast and wastewater samples, respectively. Synsor is therefore a powerful tool that will streamline the process of identifying engineered DNA in poorly characterized biological or environmental systems, thereby allowing for enhanced monitoring of emerging biological threats.
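To make the notion of a k-mer signature concrete, here is a minimal sketch of one common way to turn a DNA sequence into a fixed-length feature vector that a classifier could consume. The abstract does not state the value of k, the normalization, or the network architecture Synsor uses, so the choice of k = 4, the relative-frequency normalization, and the function name `kmer_profile` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every possible k-mer over the alphabet A, C, G, T.

    Windows containing any other symbol (for example N) are skipped.
    k = 4 is an illustrative default, not a value taken from Synsor.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i : i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i : i + k]) <= valid
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    # Every one of the 4**k k-mers gets an entry, so vectors are comparable
    # across sequences of different lengths.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


profile = kmer_profile("ACGTACGTTTGACA", k=2)
print(len(profile))  # 16 entries for k = 2
```

The values of such a profile, taken in a fixed k-mer order, could serve as the input layer of a small neural network trained to separate natural from engineered sequences.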

4.
Nat Comput Sci ; 4(5): 367-378, 2024 May.
Article in English | MEDLINE | ID: mdl-38730184

ABSTRACT

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 µs. We give examples of machine learning (ML) baseline models whose accuracy improves when they employ our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.


Subjects
Drug Discovery, Machine Learning, Molecular Dynamics Simulation, Proteins, Ligands, Drug Discovery/methods, Proteins/chemistry, Proteins/metabolism, Quantum Theory
5.
Nat Commun ; 13(1): 7845, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36543777

ABSTRACT

The assembly of biomolecules into condensates is a fundamental process underlying the organisation of the intracellular space and the regulation of many cellular functions. Mapping and characterising phase behaviour of biomolecules is essential to understand the mechanisms of condensate assembly, and to develop therapeutic strategies targeting biomolecular condensate systems. A central concept for characterising phase-separating systems is the phase diagram. Phase diagrams are typically built from numerous individual measurements sampling different parts of the parameter space. However, even when performed in microwell plate format, this process is slow, low throughput and requires significant sample consumption. To address this challenge, we present here a combinatorial droplet microfluidic platform, termed PhaseScan, for rapid and high-resolution acquisition of multidimensional biomolecular phase diagrams. Using this platform, we characterise the phase behaviour of a wide range of systems under a variety of conditions and demonstrate that this approach allows the quantitative characterisation of the effect of small molecules on biomolecular phase transitions.


Subjects
Biomolecular Condensates, Microfluidics, Intracellular Space, Phase Transition