Search | BVS - MINISTÉRIO DA SAÚDE

1.

CryoTEN: Efficiently Enhancing Cryo-EM Density Maps Using Transformers.

Selvaraj, Joel; Wang, Liguo; Cheng, Jianlin.

bioRxiv ; 2024 Sep 11.

Article in English | MEDLINE | ID: mdl-39314387

ABSTRACT

Motivation: Cryogenic Electron Microscopy (cryo-EM) is a core experimental technique used to determine the structure of macromolecules such as proteins. However, the effectiveness of cryo-EM is often hindered by the noise and missing density values in cryo-EM density maps caused by experimental conditions such as low contrast and conformational heterogeneity. Although various global and local map sharpening techniques are widely employed to improve cryo-EM density maps, it is still challenging to efficiently improve their quality for building better protein structures from them. Results: In this study, we introduce CryoTEN, a three-dimensional U-Net style transformer, to improve cryo-EM maps effectively. CryoTEN is trained using a diverse set of 1,295 cryo-EM maps as inputs and their corresponding simulated maps generated from known protein structures as targets. An independent test set containing 150 maps is used to evaluate CryoTEN, and the results demonstrate that it can robustly enhance the quality of cryo-EM density maps. In addition, the automatic de novo protein structure modeling shows that the protein structures built from the density maps processed by CryoTEN have substantially better quality than those built from the original maps. Compared to the existing state-of-the-art deep learning methods for enhancing cryo-EM density maps, CryoTEN ranks second in improving the quality of density maps, while running more than 10 times faster and requiring much less GPU memory. Availability and implementation: The source code and data are freely available at https://github.com/jianlin-cheng/cryoten.

2.

Improving protein function prediction by learning and integrating representations of protein sequences and function labels.

Boadu, Frimpong; Cheng, Jianlin.

Bioinform Adv ; 4(1): vbae120, 2024.

Article in English | MEDLINE | ID: mdl-39233898

ABSTRACT

Motivation: As fewer than 1% of proteins have protein function information determined experimentally, computationally predicting the function of proteins is critical for obtaining functional information for most proteins and has been a major challenge in protein bioinformatics. Despite the significant progress made in protein function prediction by the community in the last decade, the general accuracy of protein function prediction is still not high, particularly for rare function terms associated with few proteins in protein function annotation databases such as UniProt. Results: We introduce TransFew, a new transformer model, to learn the representations of both protein sequences and function labels [Gene Ontology (GO) terms] to predict the function of proteins. TransFew leverages a large pre-trained protein language model (ESM2-t48) to learn function-relevant representations of proteins from raw protein sequences and uses a biological natural language model (BioBert) and a graph convolutional neural network-based autoencoder to generate semantic representations of GO terms from their textual definitions and hierarchical relationships, which are combined to predict protein function via cross-attention. Integrating the protein sequence and label representations not only enhances overall function prediction accuracy but also delivers robust performance in predicting rare function terms with limited annotations by facilitating annotation transfer between GO terms. Availability and implementation: https://github.com/BioinfoMachineLearning/TransFew.

3.

Evaluating Representation Learning on the Protein Structure Universe.

Jamasb, Arian R; Morehead, Alex; Joshi, Chaitanya K; Zhang, Zuobai; Didi, Kieran; Mathis, Simon; Harris, Charles; Tang, Jian; Cheng, Jianlin; Liò, Pietro; Blundell, Tom L.

ArXiv ; 2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38947934

ABSTRACT

We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representations and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

4.

De novo atomic protein structure modeling for cryoEM density maps using 3D transformer and HMM.

Giri, Nabin; Cheng, Jianlin.

Nat Commun ; 15(1): 5511, 2024 Jun 29.

Article in English | MEDLINE | ID: mdl-38951555

ABSTRACT

Accurately building 3D atomic structures from cryo-EM density maps is a crucial step in cryo-EM-based protein structure determination. Converting density maps into 3D atomic structures for proteins lacking accurate homologous or predicted structures as templates remains a significant challenge. Here, we introduce Cryo2Struct, a fully automated de novo cryo-EM structure modeling method. Cryo2Struct utilizes a 3D transformer to identify atoms and amino acid types in cryo-EM density maps, followed by an innovative Hidden Markov Model (HMM) to connect predicted atoms and build protein backbone structures. Cryo2Struct produces substantially more accurate and complete protein structural models than the widely used ab initio method Phenix. Additionally, its performance in building atomic structural models is robust against changes in the resolution of density maps and the size of protein structures.

Subjects

Cryoelectron Microscopy , Markov Chains , Models, Molecular , Protein Conformation , Proteins , Cryoelectron Microscopy/methods , Proteins/chemistry , Proteins/ultrastructure , Algorithms , Software

5.

Geometry-complete diffusion for 3D molecule generation and optimization.

Morehead, Alex; Cheng, Jianlin.

Commun Chem ; 7(1): 150, 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38961141

ABSTRACT

Generative deep learning methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a denoising diffusion framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively. Importantly, we demonstrate that GCDM's generative denoising process enables the model to generate a significant proportion of valid and energetically stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Code and data are freely available on GitHub.

6.

Deep learning methods for protein function prediction.

Boadu, Frimpong; Lee, Ahhyun; Cheng, Jianlin.

Proteomics ; : e2300471, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38996351

ABSTRACT

Predicting protein function from protein sequence, structure, interaction, and other relevant information is important for generating hypotheses for biological experiments and studying biological systems, and therefore has been a major challenge in protein bioinformatics. Numerous computational methods have been developed to advance protein function prediction gradually in the last two decades. Particularly, in recent years, leveraging the revolutionary advances in artificial intelligence (AI), more and more deep learning methods have been developed to improve protein function prediction at a faster pace. Here, we provide an in-depth review of the recent developments of deep learning methods for protein function prediction. We summarize the significant advances in the field, identify several remaining major challenges to be tackled, and suggest some potential directions to explore. The data sources and evaluation metrics widely used in protein function prediction are also discussed to assist the machine learning, AI, and bioinformatics communities to develop more cutting-edge methods to advance protein function prediction.

7.

Outcomes of the EMDataResource cryo-EM Ligand Modeling Challenge.

Lawson, Catherine L; Kryshtafovych, Andriy; Pintilie, Grigore D; Burley, Stephen K; Cerný, Jirí; Chen, Vincent B; Emsley, Paul; Gobbi, Alberto; Joachimiak, Andrzej; Noreng, Sigrid; Prisant, Michael G; Read, Randy J; Richardson, Jane S; Rohou, Alexis L; Schneider, Bohdan; Sellers, Benjamin D; Shao, Chenghua; Sourial, Elizabeth; Williams, Chris I; Williams, Christopher J; Yang, Ying; Abbaraju, Venkat; Afonine, Pavel V; Baker, Matthew L; Bond, Paul S; Blundell, Tom L; Burnley, Tom; Campbell, Arthur; Cao, Renzhi; Cheng, Jianlin; Chojnowski, Grzegorz; Cowtan, K D; DiMaio, Frank; Esmaeeli, Reza; Giri, Nabin; Grubmüller, Helmut; Hoh, Soon Wen; Hou, Jie; Hryc, Corey F; Hunte, Carola; Igaev, Maxim; Joseph, Agnel P; Kao, Wei-Chun; Kihara, Daisuke; Kumar, Dilip; Lang, Lijun; Lin, Sean; Maddhuri Venkata Subramaniya, Sai R; Mittal, Sumit; Mondal, Arup.

Nat Methods ; 21(7): 1340-1348, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38918604

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of submitted ligand models and surrounding atoms was analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.

Subjects

Cryoelectron Microscopy , Models, Molecular , Cryoelectron Microscopy/methods , Ligands , SARS-CoV-2 , COVID-19/virology , Escherichia coli , beta-Galactosidase/chemistry , beta-Galactosidase/metabolism , Protein Conformation , Reproducibility of Results

8.

HiCDiff: single-cell Hi-C data denoising with diffusion models.

Wang, Yanli; Cheng, Jianlin.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38856167

ABSTRACT

The genome-wide single-cell chromosome conformation capture technique, i.e. single-cell Hi-C (ScHi-C), was recently developed to interrogate the conformation of the genome of individual cells. However, single-cell Hi-C data are much sparser than bulk Hi-C data of a population of cells, and noise in single-cell Hi-C makes it difficult to apply and analyze them in biological research. Here, we developed the first generative diffusion models (HiCDiff) to denoise single-cell Hi-C data in the form of chromosomal contact matrices. HiCDiff uses a deep residual network to remove the noise in the reverse process of diffusion and can be trained in both unsupervised and supervised learning modes. Benchmarked on several single-cell Hi-C test datasets, the diffusion models substantially remove the noise in single-cell Hi-C data. The unsupervised HiCDiff outperforms most supervised non-diffusion deep learning methods and achieves performance comparable to that of the state-of-the-art supervised deep learning method in terms of multiple metrics, demonstrating that diffusion models are a useful approach to denoising single-cell Hi-C data. Moreover, its good performance holds on denoising bulk Hi-C data.

Subjects

Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Deep Learning , Algorithms

9.

CryoSegNet: accurate cryo-EM protein particle picking by integrating the foundational AI image segmentation model and attention-gated U-Net.

Gyawali, Rajan; Dhakal, Ashwin; Wang, Liguo; Cheng, Jianlin.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38860738

ABSTRACT

Picking protein particles in cryo-electron microscopy (cryo-EM) micrographs is a crucial step in cryo-EM-based structure determination. However, existing methods trained on a limited amount of cryo-EM data still cannot accurately pick protein particles from noisy cryo-EM images. General foundational artificial intelligence-based image segmentation models such as Meta's Segment Anything Model (SAM) cannot segment protein particles well because their training data do not include cryo-EM images. Here, we present a novel approach (CryoSegNet) of integrating an attention-gated U-shape network (U-Net) specially designed and trained for cryo-EM particle picking and the SAM. The U-Net is first trained on a large cryo-EM image dataset and then used to generate input from original cryo-EM images for SAM to pick particles. CryoSegNet shows both high precision and recall in segmenting protein particles from cryo-EM micrographs, irrespective of protein type, shape and size. On several independent datasets of various protein types, CryoSegNet outperforms two top machine learning particle pickers crYOLO and Topaz as well as SAM itself. The average resolution of density maps reconstructed from the particles picked by CryoSegNet is 3.33 Å, 7% better than 3.58 Å of Topaz and 14% better than 3.87 Å of crYOLO. It is publicly available at https://github.com/jianlin-cheng/CryoSegNet.

Subjects

Cryoelectron Microscopy , Image Processing, Computer-Assisted , Cryoelectron Microscopy/methods , Image Processing, Computer-Assisted/methods , Proteins/chemistry , Artificial Intelligence , Algorithms , Databases, Protein
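
The relative improvements quoted in the CryoSegNet abstract (3.33 Å versus 3.58 Å and 3.87 Å) can be checked with a short calculation. The sketch below is not from the paper; it assumes that "better" means the percent reduction in average resolution relative to each baseline, and the function name `relative_improvement` is purely illustrative.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percent reduction of `improved` relative to `baseline`.

    For cryo-EM resolution in angstroms, a smaller number is better,
    so a positive result means `improved` beats `baseline`.
    """
    return 100.0 * (baseline - improved) / baseline


# Average resolutions (in angstroms) as reported in the abstract.
cryosegnet = 3.33
topaz = 3.58
cryolo = 3.87

vs_topaz = relative_improvement(topaz, cryosegnet)
vs_cryolo = relative_improvement(cryolo, cryosegnet)

print(f"vs Topaz:  {vs_topaz:.1f}%")
print(f"vs crYOLO: {vs_cryolo:.1f}%")
```

Under this assumed definition the two values round to the 7% and 14% figures given in the abstract.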

10.

Multi-omics analyses and machine learning prediction of oviductal responses in the presence of gametes and embryos.

Finnerty, Ryan M; Carulli, Daniel J; Hegde, Akshata; Wang, Yanli; Baodu, Frimpong; Winuthayanon, Sarayut; Cheng, Jianlin; Winuthayanon, Wipawee.

bioRxiv ; 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38915688

ABSTRACT

The oviduct is the site of fertilization and preimplantation embryo development in mammals. Evidence suggests that gametes alter oviductal gene expression. To delineate the adaptive interactions between the oviduct and gamete/embryo, we performed a multi-omics characterization of oviductal tissues utilizing bulk RNA-sequencing (RNA-seq), single-cell RNA-sequencing (scRNA-seq), and proteomics collected from the distal and proximal regions at various stages after mating in mice. We observed robust region-specific transcriptional signatures. Specifically, the presence of sperm induces genes involved in pro-inflammatory responses in the proximal region at 0.5 days post-coitus (dpc). Genes involved in inflammatory responses were expressed specifically by secretory epithelial cells in the oviduct. At 1.5 and 2.5 dpc, genes involved in pyruvate and glycolysis were enriched in the proximal region, potentially providing metabolic support for developing embryos. Abundant proteins in the oviductal fluid were differentially observed between naturally fertilized and superovulated samples. RNA-seq data were used to identify transcription factors predicted to influence protein abundance in the proteomic data via a novel transformer-based machine learning model for integrating transcriptomics and proteomics data. The transformers identified influential transcription factors and correlated predictive protein expressions in alignment with the in vivo-derived data. In conclusion, our multi-omics characterization and subsequent in vivo confirmation of proteins/RNAs indicate that the oviduct is adaptive and responsive to the presence of sperm and embryos in a spatiotemporal manner.

11.

Deep Learning for Protein-Ligand Docking: Are We There Yet?

Morehead, Alex; Giri, Nabin; Liu, Jian; Cheng, Jianlin.

ArXiv ; 2024 Jul 07.

Article in English | MEDLINE | ID: mdl-38827451

ABSTRACT

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the practical context of (1) using predicted (apo) protein structures for docking (e.g., for broad applicability); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for practical protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that all recent DL docking methods but one fail to generalize to multi-ligand protein targets and also that template-based docking algorithms perform equally well or better for multi-ligand docking as recent single-ligand DL docking methods, suggesting areas of improvement for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

12.

Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures.

Giri, Nabin; Wang, Liguo; Cheng, Jianlin.

Sci Data ; 11(1): 458, 2024 May 06.

Article in English | MEDLINE | ID: mdl-38710720

ABSTRACT

The advent of single-particle cryo-electron microscopy (cryo-EM) has brought forth a new era of structural biology, enabling the routine determination of large biological molecules and their complexes at atomic resolution. The high-resolution structures of biological macromolecules and their complexes significantly expedite biomedical research and drug discovery. However, automatically and accurately building atomic models from high-resolution cryo-EM density maps is still time-consuming and challenging when template-based models are unavailable. Artificial intelligence (AI) methods such as deep learning trained on a limited amount of labeled cryo-EM density maps generate inaccurate atomic models. To address this issue, we created a dataset called Cryo2StructData consisting of 7,600 preprocessed cryo-EM density maps whose voxels are labelled according to their corresponding known atomic structures for training and testing AI methods to build atomic models from cryo-EM density maps. Cryo2StructData is larger than existing, publicly available datasets for training AI methods to build atomic protein structures from cryo-EM density maps. We trained and tested deep learning models on Cryo2StructData to validate its quality, showing that it is ready to be used to train and test AI methods for building atomic models.

Subjects

Artificial Intelligence , Cryoelectron Microscopy , Proteins , Cryoelectron Microscopy/methods , Proteins/chemistry , Proteins/ultrastructure , Models, Molecular , Protein Conformation

13.

A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models.

Chen, Xiao; Liu, Jian; Park, Nolan; Cheng, Jianlin.

Biomolecules ; 14(5), 2024 May 13.

Article in English | MEDLINE | ID: mdl-38785981

ABSTRACT

The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.

Subjects

Deep Learning , Protein Structure, Quaternary , Proteins , Proteins/chemistry , Models, Molecular , Humans

14.

Integrating transformer-based machine learning with SERS technology for the analysis of hazardous pesticides in spinach.

Hajikhani, Mehdi; Hegde, Akashata; Snyder, John; Cheng, Jianlin; Lin, Mengshi.

J Hazard Mater ; 470: 134208, 2024 May 15.

Article in English | MEDLINE | ID: mdl-38593663

ABSTRACT

This study introduces an innovative strategy for the rapid and accurate identification of pesticide residues in agricultural products by combining surface-enhanced Raman spectroscopy (SERS) with a state-of-the-art transformer model, termed SERSFormer. Gold-silver core-shell nanoparticles were synthesized and served as high-performance SERS substrates, which possess well-defined structures, uniform dispersion, and a core-shell composition with an average diameter of 21.44 ± 4.02 nm, as characterized by TEM-EDS. SERSFormer employs sophisticated, task-specific data processing techniques and CNN embedders, powered by an architecture that features weight-shared multi-head self-attention transformer encoder layers. The SERSFormer model demonstrated exceptional proficiency in qualitative analysis, successfully classifying six categories, including five pesticides (coumaphos, oxamyl, carbophenothion, thiabendazole, and phosmet) and a control group of spinach data, with 98.4% accuracy. For quantitative analysis, the model accurately predicted pesticide concentrations with a mean absolute error of 0.966, a mean squared error of 1.826, and an R² score of 0.849. This novel approach, which combines SERS with machine learning and is supported by robust transformer models, showcases the potential for real-time pesticide detection to improve food safety in the agricultural and food industries.

Subjects

Gold , Machine Learning , Metal Nanoparticles , Pesticides , Silver , Spectrum Analysis, Raman , Spinacia oleracea , Spectrum Analysis, Raman/methods , Spinacia oleracea/chemistry , Metal Nanoparticles/chemistry , Silver/chemistry , Gold/chemistry , Pesticides/analysis , Food Contamination/analysis , Pesticide Residues/analysis
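
The SERSFormer abstract above reports three regression metrics: mean absolute error, mean squared error, and an R² score. As a reminder of how these are conventionally defined, here is a minimal sketch using the standard textbook formulas. It is not the authors' evaluation code, the function name `regression_metrics` is illustrative, and the sample values are hypothetical, chosen only to exercise the formulas.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) using the standard definitions.

    MAE = mean(|pred - true|)
    MSE = mean((pred - true)^2)
    R^2 = 1 - SS_res / SS_tot, where SS_tot is taken about mean(true)
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


# Hypothetical concentrations; these are NOT data from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae, mse, r2 = regression_metrics(y_true, y_pred)
print(mae, mse, r2)
```

MAE and MSE carry the units (or squared units) of the predicted quantity, whereas R² is dimensionless, so the three reported numbers are not directly comparable to one another.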

15.

Biochemical, structural, and computational analyses of two new clinically identified missense mutations of ALDH7A1.

Korasick, David A; Buckley, David P; Palpacelli, Alessandra; Cursio, Ida; Cesaroni, Elisabetta; Cheng, Jianlin; Tanner, John J.

Chem Biol Interact ; 394: 110993, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38604394

ABSTRACT

Aldehyde dehydrogenase 7A1 (ALDH7A1) catalyzes a step of lysine catabolism. Certain missense mutations in the ALDH7A1 gene cause pyridoxine dependent epilepsy (PDE), a rare autosomal neurometabolic disorder with recessive inheritance that affects almost 1:65,000 live births and is classically characterized by recurrent seizures from the neonatal period. We report a biochemical, structural, and computational study of two novel ALDH7A1 missense mutations that were identified in a child with rare recurrent seizures from the third month of life. The mutations affect two residues in the oligomer interfaces of ALDH7A1, Arg134 and Arg441 (Arg162 and Arg469 in the HGVS nomenclature). The corresponding enzyme variants R134S and R441C (p.Arg162Ser and p.Arg469Cys in the HGVS nomenclature) were expressed in Escherichia coli and purified. R134S and R441C have 10,000- and 50-fold lower catalytic efficiency than wild-type ALDH7A1, respectively. Sedimentation velocity analytical ultracentrifugation shows that R134S is defective in tetramerization, remaining locked in a dimeric state even in the presence of the tetramer-inducing coenzyme NAD+. Because the tetramer is the active form of ALDH7A1, the defect in oligomerization explains the very low catalytic activity of R134S. In contrast, R441C exhibits wild-type oligomerization behavior, and the 2.0 Å resolution crystal structure of R441C complexed with NAD+ revealed no obvious structural perturbations when compared to the wild-type enzyme structure. Molecular dynamics simulations suggest that the mutation of Arg441 to Cys may increase intersubunit ion pairs and alter the dynamics of the active site gate. Our biochemical, structural, and computational data on two novel clinical variants of ALDH7A1 add to the complexity of the molecular determinants underlying pyridoxine dependent epilepsy.

Subjects

Aldehyde Dehydrogenase , Mutation, Missense , Aldehyde Dehydrogenase/genetics , Aldehyde Dehydrogenase/chemistry , Aldehyde Dehydrogenase/metabolism , Humans , Molecular Dynamics Simulation , Crystallography, X-Ray , Models, Molecular , Epilepsy/genetics , Infant , Male

16.

Diffusion models in bioinformatics and computational biology.

Guo, Zhiye; Liu, Jian; Wang, Yanli; Chen, Mengrui; Wang, Duolin; Xu, Dong; Cheng, Jianlin.

Nat Rev Bioeng ; 2(2): 136-154, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38576453

ABSTRACT

Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein-ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.

17.

Pigs: Large Animal Preclinical Cancer Models.

Joshi, Kirtan; Katam, Tejas; Hegde, Akshata; Cheng, Jianlin; Prather, Randall S; Whitworth, Kristin; Wells, Kevin; Bryan, Jeffrey N; Hoffman, Timothy; Telugu, Bhanu P; Kaifi, Jussuf T; Rachagani, Satyanarayana.

World J Oncol ; 15(2): 149-168, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38545477

ABSTRACT

Pigs are playing an increasingly vital role as translational biomedical models for studying human pathophysiology. The annotation of the pig genome was a huge step forward in translatability of pigs as a biomedical model for various human diseases. Similarities between humans and pigs in terms of anatomy, physiology, genetics, and immunology have allowed pigs to become a comprehensive preclinical model for human diseases. With a diverse range, from craniofacial and ophthalmology to reproduction, wound healing, musculoskeletal, and cancer, pigs have provided a seminal understanding of human pathophysiology. This review focuses on the current research using pigs as preclinical models for cancer research and highlights the strengths and opportunities for studying various human cancers.

18.

Geometry-complete perceptron networks for 3D molecular graphs.

Morehead, Alex; Cheng, Jianlin.

Bioinformatics ; 40(2), 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38373819

ABSTRACT

MOTIVATION: The field of geometric deep learning has recently had a profound impact on several scientific domains such as protein structure prediction and design, leading to methodological advancements within and outside of the realm of traditional machine learning. Within this spirit, in this work, we introduce GCPNet, a new chirality-aware SE(3)-equivariant graph neural network designed for representation learning of 3D biomolecular graphs. We show that GCPNet, unlike previous representation learning methods for 3D biomolecules, is widely applicable to a variety of invariant or equivariant node-level, edge-level, and graph-level tasks on biomolecular structures while being able to (1) learn important chiral properties of 3D molecules and (2) detect external force fields. RESULTS: Across four distinct molecular-geometric tasks, we demonstrate that GCPNet's predictions (1) for protein-ligand binding affinity achieve a statistically significant correlation of 0.608, more than 5% greater than current state-of-the-art methods; (2) for protein structure ranking achieve statistically significant target-local and dataset-global correlations of 0.616 and 0.871, respectively; (3) for Newtonian many-body systems modeling achieve a task-averaged mean squared error less than 0.01, more than 15% better than current methods; and (4) for molecular chirality recognition achieve a state-of-the-art prediction accuracy of 98.7%, better than any other machine learning method to date. AVAILABILITY AND IMPLEMENTATION: The source code, data, and instructions to train new models or reproduce our results are freely available at https://github.com/BioinfoMachineLearning/GCPNet.

Subjects

Machine Learning , Neural Networks, Computer , Software

19.

CryoTransformer: a transformer model for picking protein particles from cryo-EM micrographs.

Dhakal, Ashwin; Gyawali, Rajan; Wang, Liguo; Cheng, Jianlin.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38407301

ABSTRACT

MOTIVATION: Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of large protein complexes. Picking single protein particles from cryo-EM micrographs (images) is a crucial step in reconstructing protein structures from them. However, the widely used template-based particle picking process requires some manual particle picking and is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) can potentially automate particle picking, the current AI methods pick particles with low precision or low recall. The erroneously picked particles can severely reduce the quality of reconstructed protein structures, especially for the micrographs with low signal-to-noise ratio. RESULTS: To address these shortcomings, we devised CryoTransformer based on transformers, residual networks, and image processing techniques to accurately pick protein particles from cryo-EM micrographs. CryoTransformer was trained and tested on the largest labeled cryo-EM protein particle dataset, CryoPPP. It outperforms the current state-of-the-art machine learning methods of particle picking in terms of the resolution of 3D density maps reconstructed from the picked particles as well as F1-score, and is poised to facilitate the automation of cryo-EM protein particle picking. AVAILABILITY AND IMPLEMENTATION: The source code and data for CryoTransformer are openly available at: https://github.com/jianlin-cheng/CryoTransformer.

Subjects

Artificial Intelligence , Software , Cryoelectron Microscopy/methods , Machine Learning , Image Processing, Computer-Assisted/methods , Proteins

20.

Outcomes of the EMDataResource Cryo-EM Ligand Modeling Challenge.

Lawson, Catherine L; Kryshtafovych, Andriy; Pintilie, Grigore D; Burley, Stephen K; Cerný, Jirí; Chen, Vincent B; Emsley, Paul; Gobbi, Alberto; Joachimiak, Andrzej; Noreng, Sigrid; Prisant, Michael; Read, Randy J; Richardson, Jane S; Rohou, Alexis L; Schneider, Bohdan; Sellers, Benjamin D; Shao, Chenghua; Sourial, Elizabeth; Williams, Chris I; Williams, Christopher J; Yang, Ying; Abbaraju, Venkat; Afonine, Pavel V; Baker, Matthew L; Bond, Paul S; Blundell, Tom L; Burnley, Tom; Campbell, Arthur; Cao, Renzhi; Cheng, Jianlin; Chojnowski, Grzegorz; Cowtan, Kevin D; DiMaio, Frank; Esmaeeli, Reza; Giri, Nabin; Grubmüller, Helmut; Hoh, Soon Wen; Hou, Jie; Hryc, Corey F; Hunte, Carola; Igaev, Maxim; Joseph, Agnel P; Kao, Wei-Chun; Kihara, Daisuke; Kumar, Dilip; Lang, Lijun; Lin, Sean; Maddhuri Venkata Subramaniya, Sai R; Mittal, Sumit; Mondal, Arup.

Res Sq ; 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38343795

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.
