Results 1 - 20 of 194
1.
Chem Sci ; 15(27): 10366-10380, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38994407

ABSTRACT

Dual-target drug design has gained significant attention in the treatment of complex diseases, such as cancers and autoimmune disorders. A widely employed design strategy is combining pharmacophores to leverage the knowledge of structure-activity relationships of both targets. Unfortunately, pharmacophore combination often requires long and expensive trial and error, because the protein pockets of the two targets impose complex structural constraints. In this study, we propose AIxFuse, a structure-aware dual-target drug design method that learns pharmacophore fusion patterns to satisfy the dual-target structural constraints simulated by molecular docking. AIxFuse employs two self-play reinforcement learning (RL) agents to learn pharmacophore selection and fusion from comprehensive feedback including dual-target molecular docking scores. Collaboratively, the molecular docking scores are learned by active learning (AL). Through collaborative RL and AL, AIxFuse learns to generate molecules with multiple desired properties. AIxFuse is shown to outperform state-of-the-art methods in generating dual-target drugs against glycogen synthase kinase-3 beta (GSK3β) and c-Jun N-terminal kinase 3 (JNK3). When applied to another task against retinoic acid receptor-related orphan receptor γ-t (RORγt) and dihydroorotate dehydrogenase (DHODH), AIxFuse exhibits consistent performance while the compared methods suffer performance drops, leading to a success rate 5 times higher. Docking studies demonstrate that AIxFuse can generate molecules concurrently satisfying the binding mode required by both targets. Further free energy perturbation calculations indicate that the generated candidates have promising binding free energies against both targets.

2.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38960404

ABSTRACT

Recent advances in microfluidics and sequencing technologies allow researchers to explore cellular heterogeneity at single-cell resolution. In recent years, deep learning frameworks, such as generative models, have brought great changes to the analysis of transcriptomic data. Nevertheless, relying on the potential space of these generative models alone is insufficient to generate biological explanations. In addition, most of the previous work based on generative models is limited to shallow neural networks with one to three layers of latent variables, which may limit the capabilities of the models. Here, we propose a deep interpretable generative model called d-scIGM for single-cell data analysis. d-scIGM combines sawtooth connectivity techniques and residual networks, thereby constructing a deep generative framework. In addition, d-scIGM incorporates hierarchical prior knowledge of biological domains to enhance the interpretability of the model. We show that d-scIGM achieves excellent performance in a variety of fundamental tasks, including clustering, visualization, and pseudo-temporal inference. Through topic pathway studies, we found that d-scIGM-learned topics are better enriched for biologically meaningful pathways compared to the baseline models. Furthermore, the analysis of drug response data shows that d-scIGM can capture drug response patterns in large-scale experiments, which provides a promising way to elucidate the underlying biological mechanisms. Lastly, in the melanoma dataset, d-scIGM accurately identified different cell types and revealed multiple melanin-related driver genes and key pathways, which are critical for understanding disease mechanisms and drug development.


Subject(s)
Deep Learning , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Computational Biology/methods , Algorithms , Sequence Analysis, RNA/methods , Neural Networks, Computer , Single-Cell Gene Expression Analysis
3.
J Adv Res ; 2024 Jun 09.
Article in English | MEDLINE | ID: mdl-38862035

ABSTRACT

INTRODUCTION: The Frailty Index (FI) is a common measure of frailty, which has been advocated as a routine clinical test by many guidelines. The genetic and phenotypic relationships of FI with cardiovascular indicators (CIs) and behavioral characteristics (BCs) are unclear, which has hampered the ability to monitor FI using easily collected data. OBJECTIVES: This study is designed to investigate the genetic and phenotypic associations of frailty with CIs and BCs, and further to construct a model to predict FI. METHOD: Genetic relationships of FI with 288 CIs and 90 BCs were assessed by cross-trait LD score regression (LDSC) and Mendelian randomization (MR). The phenotypic data of these CIs and BCs were integrated with a machine-learning model to predict the FI of individuals in UK-biobank. The relationships of the predicted FI with risks of type 2 diabetes (T2D) and neurodegenerative diseases were tested by the Kaplan-Meier estimator and the Cox proportional hazards model. RESULTS: MR revealed putative causal effects of seven CIs and eight BCs on FI. These CIs and BCs were integrated to establish a model for predicting FI. The predicted FI is significantly correlated with the observed FI (Pearson correlation coefficient = 0.660, P-value = 4.96 × 10^-62). The prediction model indicated that "usual walking pace" contributes the most to prediction. Patients who were predicted to have a high FI are at significantly higher risk of T2D (HR = 2.635, P < 2 × 10^-16) and neurodegenerative diseases (HR = 2.307, P = 1.62 × 10^-3) than other patients. CONCLUSION: This study supports associations of FI with CIs and BCs from genetic and phenotypic perspectives. The model developed by integrating easily collected CI and BC data to predict FI has the potential to monitor disease risk.

4.
Nucleic Acids Res ; 52(W1): W248-W255, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38738636

ABSTRACT

Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.


Subject(s)
Proteins , Software , Proteins/chemistry , Proteins/metabolism , Protein Conformation , Sequence Analysis, Protein , Deep Learning , Binding Sites , Molecular Sequence Annotation , Neural Networks, Computer , Amino Acid Sequence , Humans , Internet
5.
Molecules ; 29(9), 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38731487

ABSTRACT

The wheat scab caused by Fusarium graminearum (F. graminearum) has seriously affected the yield and quality of wheat in China. In this study, gallic acid (GA), a natural polyphenol, was used to synthesize three azole-modified gallic acid derivatives (AGAs1-3). The antifungal activity of GA and its derivatives against F. graminearum was studied through mycelial growth rate experiments and field efficacy experiments. The results of the mycelial growth rate test showed that the EC50 of AGAs-2 was 0.49 mg/mL, and that of AGAs-3 was 0.42 mg/mL. The biological activity of AGAs-3 on F. graminearum is significantly better than that of GA. The results of field efficacy tests showed that AGAs-2 and AGAs-3 significantly reduced the incidence rate and disease index of wheat scab, and the control effect reached 68.86% and 72.11%, respectively. In addition, preliminary investigation was performed on the possible interaction between AGAs-3 and F. graminearum using density functional theory (DFT). These results indicate that compound AGAs-3, because of its characteristic of imidazolium salts, has potential for use as a green and environmentally friendly plant-derived antifungal agent for plant pathogenic fungi.


Subject(s)
Antifungal Agents , Azoles , Fusarium , Gallic Acid , Triticum , Fusarium/drug effects , Fusarium/growth & development , Gallic Acid/chemistry , Gallic Acid/pharmacology , Antifungal Agents/pharmacology , Antifungal Agents/chemistry , Triticum/microbiology , Azoles/pharmacology , Azoles/chemistry , Plant Diseases/microbiology , Plant Diseases/prevention & control , Microbial Sensitivity Tests
6.
Nat Commun ; 15(1): 4476, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796523

ABSTRACT

Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing the multi-scale information, these methods tend to rely intensively on a single scale and under-fit the others, which is likely attributable to the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variant of expectation maximization that optimizes different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between the atomic structure and molecular network scales through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.


Subject(s)
Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Machine Learning , Drug Interactions , Humans , Protein Binding
7.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605640

ABSTRACT

Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models have been developed for genomic sequences, and those were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acid (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown to be effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.


Subject(s)
RNA Splicing , Vertebrates , Animals , Humans , Base Sequence , Vertebrates/genetics , RNA , Supervised Machine Learning
8.
Elife ; 13, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630609

ABSTRACT

Revealing protein binding sites with other molecules, such as nucleic acids, peptides, or small ligands, sheds light on disease mechanism elucidation and novel drug design. With the explosive growth of proteins in sequence databases, how to accurately and efficiently identify these binding sites from sequences becomes essential. However, current methods mostly rely on expensive multiple sequence alignments or experimental protein structures, limiting their genome-scale applications. Besides, these methods haven't fully explored the geometry of the protein structures. Here, we propose GPSite, a multi-task network for simultaneously predicting binding residues of DNA, RNA, peptide, protein, ATP, HEM, and metal ions on proteins. GPSite was trained on informative sequence embeddings and predicted structures from protein language models, while comprehensively extracting residual and relational geometric contexts in an end-to-end manner. Experiments demonstrate that GPSite substantially surpasses state-of-the-art sequence-based and structure-based approaches on various benchmark datasets, even when the structures are not well-predicted. The low computational cost of GPSite enables rapid genome-scale binding residue annotations for over 568,000 sequences, providing opportunities to unveil unexplored associations of binding sites with molecular functions, biological processes, and genetic variants. The GPSite webserver and annotation database can be freely accessed at https://bio-web1.nscc-gz.cn/app/GPSite.


Subject(s)
Deep Learning , Protein Binding , Proteins/metabolism , Binding Sites , Peptides/metabolism
9.
Nat Comput Sci ; 4(4): 285-298, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38600256

ABSTRACT

The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.


Subject(s)
Chromatin , Single-Cell Analysis , Humans , Algorithms , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation Sequencing/methods , Computational Biology/methods , Genome/genetics , Genomics/methods , Neoplasms/genetics , Single-Cell Analysis/methods , Transposases/genetics , Transposases/metabolism
10.
Hum Genet ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38575818

ABSTRACT

Genetic diseases are mostly implicated with genetic variants, including missense, synonymous, non-sense, and copy number variants. Previous studies indicate that these different kinds of variants affect phenotypes in various ways. It remains essential but challenging to understand the functional consequences of these genetic variants, especially the noncoding ones, due to the lack of corresponding annotations. While many computational methods have been proposed to identify risk variants, most of them have curated only DNA-level and protein-level annotations to predict the pathogenicity of the variants, and others have been restricted to missense variants exclusively. In this study, we have curated DNA-, RNA-, and protein-level features to discriminate disease-causing variants in both coding and noncoding regions. The features of protein sequences and protein structures have been shown to be essential for analyzing missense variants in coding regions, while the features related to RNA splicing and RBP binding are significant for variants in noncoding regions and synonymous variants in coding regions. Through the integration of these features, we have formulated the Multi-level feature Genomic Variants Predictor (ML-GVP) using the gradient boosting tree. The method has been trained on more than 400,000 variants in the Sherloc-training set from the 6th critical assessment of genome interpretation with superior performance. The method is one of the two best-performing predictors on the blind test in the Sherloc assessment, and is further confirmed by another independent test dataset of de novo variants.

11.
Comput Biol Med ; 173: 108365, 2024 May.
Article in English | MEDLINE | ID: mdl-38537563

ABSTRACT

BACKGROUND: Most of the methods using digital pathological images for predicting hepatocellular carcinoma (HCC) prognosis have not considered the paracancerous tissue microenvironment (PTME), which is potentially important for tumour initiation and metastasis. This study aimed to identify the roles of image features of the PTME in predicting prognosis and tumour recurrence of HCC patients. METHODS: We collected whole slide images (WSIs) of 146 HCC patients from Sun Yat-sen Memorial Hospital (SYSM dataset). For each WSI, five types of regions of interest (ROIs) in the PTME and tumours were manually annotated. These ROIs were used to construct a Lasso Cox survival model for predicting the prognosis of HCC patients. To make the model broadly useful, we established a deep learning method to automatically segment WSIs, and further used it to construct a prognosis prediction model. This model was tested on the samples of 225 HCC patients from the Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC). RESULTS: In predicting the prognosis of the HCC patients, using the image features of manually annotated ROIs in the PTME achieved a C-index of 0.668 in the SYSM testing dataset, which is higher than the C-index of 0.648 reached by the model using only image features of tumours. Integrating ROIs of the PTME and tumours achieved a C-index of 0.693 in the SYSM testing dataset. The model using automatically segmented ROIs of the PTME and tumours achieved a C-index of 0.665 (95% CI: 0.556-0.774) in the TCGA-LIHC samples, which is better than the widely used methods WSISA (0.567), DeepGraphSurv (0.593), and SeTranSurv (0.642). Finally, we found that the Texture SumAverage Skew HV on immune cell infiltration and Texture-related features on desmoplastic reaction are the most important features of the PTME in predicting HCC prognosis. We additionally used the model to predict HCC recurrence for patients from the SYSM-training, SYSM-testing, and TCGA-LIHC datasets, indicating the important roles of the PTME in the prediction. CONCLUSIONS: Our results indicate that image features of the PTME are critical for improving the prognosis prediction of HCC. Moreover, the image features related to immune cell infiltration and desmoplastic reaction of the PTME are the most important factors associated with the prognosis of HCC.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/diagnostic imaging , Liver Neoplasms/diagnostic imaging , Hospitals , Tumor Microenvironment
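
Note on the C-index reported in entry 11: the abstract compares models by concordance index but does not say how it was computed. The following is only a minimal sketch of Harrell's C-index for right-censored survival data, not the authors' implementation. The function name `concordance_index`, its argument names, and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the paper's code).

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk scores (higher means worse predicted prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped here (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
print(toy_c_index)
```

On this reading, a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is how figures such as 0.665 versus 0.567 in the abstract can be compared.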
12.
J Chem Inf Model ; 64(6): 1945-1954, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38484468

ABSTRACT

Self-supervised molecular representation learning has demonstrated great promise in bridging machine learning and chemical science to accelerate the development of new drugs. Due to the limited reaction data, existing methods are mostly pretrained by augmenting the intrinsic topology of molecules without effectively incorporating chemical reaction prior information, which makes them difficult to generalize to chemical reaction-related tasks. To address this issue, we propose ReaKE, a reaction knowledge embedding framework, which formulates chemical reactions as a knowledge graph. Specifically, we constructed a chemical synthesis knowledge graph with reactants and products as nodes and reaction rules as the edges. Based on the knowledge graph, we further proposed novel contrastive learning at both molecule and reaction levels to capture the reaction-related functional group information within and between molecules. Extensive experiments demonstrate the effectiveness of ReaKE compared with state-of-the-art methods on several downstream tasks, including reaction classification, product prediction, and yield prediction.


Subject(s)
Machine Learning , Pattern Recognition, Automated
13.
Comput Biol Med ; 170: 108048, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38310804

ABSTRACT

Illuminating associations between diseases and genes can help reveal the pathogenesis of syndromes and contribute to treatments, but a large number of associations remain unexplored. To identify novel disease-gene associations, many computational methods have been developed using disease- and gene-related prior knowledge. However, the performance of these methods remains relatively poor due to the limited external data sources and the inevitable noise in the prior knowledge. In this study, we have developed a new method, Self-Supervised Mutual Infomax Graph Convolution Network (MiGCN), to predict disease-gene associations under the guidance of external disease-disease and gene-gene collaborative graphs. The noise within the collaborative graphs was eliminated by maximizing the mutual information between nodes and neighbors through a graphical mutual infomax layer. In parallel, the node interactions were strengthened by a novel informative message passing layer to improve the learning ability of the graph neural network. Extensive experiments showed that our model improved on the state-of-the-art method by more than 8% in AUC. The datasets, source code and trained models of MiGCN are available at https://github.com/biomed-AI/MiGCN.


Subject(s)
Learning , Neural Networks, Computer , Humans , Software , Syndrome
14.
J Exp Med ; 221(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38324068

ABSTRACT

TH17 differentiation is critically controlled by "signal 3" of cytokines (IL-6/IL-23) through STAT3. However, cytokines alone induced only a moderate level of STAT3 phosphorylation. Surprisingly, TCR stimulation alone induced STAT3 phosphorylation through Lck/Fyn, and synergistically with IL-6/IL-23 induced robust and optimal STAT3 phosphorylation at Y705. Inhibition of Lck/Fyn kinase activity by Srci1 or disrupting the interaction between Lck/Fyn and STAT3 by disease-causing STAT3 mutations selectively impaired TCR stimulation, but not cytokine-induced STAT3 phosphorylation, which consequently abolished TH17 differentiation and converted them to FOXP3+ Treg cells. Srci1 administration or disrupting the interaction between Lck/Fyn and STAT3 significantly ameliorated TH17 cell-mediated EAE disease. These findings uncover an unexpected deterministic role of TCR signaling in fate determination between TH17 and Treg cells through Lck/Fyn-dependent phosphorylation of STAT3, which can be exploited to develop therapeutics selectively against TH17-related autoimmune diseases. Our study thus provides insight into how TCR signaling could integrate with cytokine signal to direct T cell differentiation.


Subject(s)
Encephalomyelitis, Autoimmune, Experimental , Receptors, Antigen, T-Cell , Th17 Cells , Cell Differentiation , Cytokines , Interleukin-23 , Interleukin-6 , Lymphocyte Specific Protein Tyrosine Kinase p56(lck) , Phosphorylation , Encephalomyelitis, Autoimmune, Experimental/immunology , Animals
15.
BMC Bioinformatics ; 25(1): 88, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38418940

ABSTRACT

BACKGROUND: Predicting the outcome of breast cancer is important for selecting appropriate treatments and prolonging the survival of patients. Recently, different deep learning-based methods have been carefully designed for cancer outcome prediction. However, the application of these methods is still challenged by interpretability. In this study, we proposed a novel multitask deep neural network called UISNet to predict the outcome of breast cancer. UISNet is able to interpret the importance of features for the prediction model via an uncertainty-based integrated gradients algorithm. UISNet improved the prediction by introducing prior biological pathway knowledge and utilizing patient heterogeneity information. RESULTS: The model was tested on seven public datasets of breast cancer, and showed better performance (average C-index = 0.691) than the state-of-the-art methods (average C-index = 0.650, ranging from 0.619 to 0.677). Importantly, UISNet identified 20 genes as associated with breast cancer, among which 11 have been proven to be associated with breast cancer by previous studies, and the others are novel findings of this study. CONCLUSIONS: Our proposed method is accurate and robust in predicting breast cancer outcomes, and it is an effective way to identify breast cancer-associated genes. The method code is available at: https://github.com/chh171/UISNet.


Subject(s)
Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/genetics , Uncertainty , Neural Networks, Computer , Algorithms
16.
Int J Biol Macromol ; 260(Pt 2): 129526, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38242387

ABSTRACT

A model system of gelatinized wheat starch (GWS) and lauric acid (LA) was used to examine the effect of residual short-range molecular order in GWS on the formation of starch-lipid complexes. The extent of residual short-range molecular order, as determined by Raman spectroscopy, decreased with increasing water content or heating duration of gelatinization. The enthalpy changes, crystallinity, short-range molecular order and the in vitro enzymic digestion of GWS-LA complexes increased initially to a maximum and then declined as the short-range molecular order in GWS decreased, showing that there was an optimal amount of residual short-range molecular order in GWS for maximizing GWS-LA complexes formation. Below this optimum amount, the limited disruption of short-range molecular order may constrain the mobility of amylose chains for complexation with LA, whereas with excessive disruption above this amount the amylose chains may be too disorganized or entangled to form complexes with LA. The susceptibility of GWS-LA complexes to enzymatic hydrolysis was influenced by both long- and short-range structural order, and to a lesser extent the amounts of complexes. This study showed clearly the role of short-range molecular order in gelatinized starch in influencing the formation of GWS-LA complexes.


Subject(s)
Amylose , Starch , Starch/chemistry , Amylose/chemistry , Lauric Acids/chemistry , Hydrolysis
17.
J Chem Inf Model ; 64(7): 2554-2564, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38267393

ABSTRACT

In molecular optimization, one popular way is R-group decoration on molecular scaffolds, and many efforts have been made to generate R-groups based on deep generative models. However, these methods mostly use information on known binding ligands, without fully utilizing target structure information. In this study, we proposed a new method, DiffDec, to involve 3D pocket constraints by a modified diffusion technique for optimizing molecules through molecular scaffold decoration. For end-to-end generation of R-groups with different sizes, we designed a novel fake atom mechanism. DiffDec was shown to be able to generate structure-aware R-groups with realistic geometric substructures by the analysis of bond angles and dihedral angles and simultaneously generate multiple R-groups for one scaffold on different growth anchors. The growth anchors could be provided by users or automatically determined by our model. DiffDec achieved R-group recovery rates of 69.67% and 45.34% in the single and multiple R-group decoration tasks, respectively, and these values were significantly higher than competing methods (37.33% and 26.85%). According to the molecular docking study, our decorated molecules obtained a better average binding affinity than baseline methods. The docking pose analysis revealed that DiffDec could decorate scaffolds with R-groups that exhibited improved binding affinities and more favorable interactions with the pocket. These results demonstrated the potential and applicability of DiffDec in real-world scaffold decoration for molecular optimization.


Subject(s)
Quantitative Structure-Activity Relationship , Molecular Docking Simulation
18.
Hum Genet ; 143(1): 49-58, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38180560

ABSTRACT

Observational studies have revealed that ischemic heart disease (IHD) has a unique manifestation on the electrocardiogram (ECG). However, the genetic relationships between IHD and ECG remain unclear. We took 12-lead ECG features as phenotypes to conduct genome-wide association studies (GWAS) for 41,960 samples from UK-Biobank (UKB). By leveraging large-scale GWAS summary statistics of ECG and IHD (downloaded from the FinnGen database), we performed LD score regression (LDSC), Mendelian randomization (MR), and polygenic risk score (PRS) regression to explore genetic relationships between IHD and ECG. Finally, we constructed an XGBoost model to predict IHD by integrating PRS and ECG. The GWAS identified 114 independent SNPs significantly (P value < 5 × 10^-8/800, where 800 denotes the number of ECG features) associated with ECG. LDSC analysis indicated significant (P value < 0.05) genetic correlations between 39 ECG features and IHD. MR analysis performed by five approaches showed a putative causal effect of IHD on four S wave-related ECG features at lead III. Integrating PRS for these ECG features with age and gender, the XGBoost model achieved an Area Under Curve (AUC) of 0.72 in predicting IHD. Here, we provide genetic evidence supporting the use of S wave-related ECG features at lead III to monitor IHD risk, and open up a unique approach to integrate ECG with genetic factors for early warning of IHD.


Subject(s)
Genome-Wide Association Study , Myocardial Ischemia , Humans , Mendelian Randomization Analysis/methods , Myocardial Ischemia/genetics , Polymorphism, Single Nucleotide , Phenotype , Genetic Risk Score
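
Note on the significance threshold in entry 18: the abstract states the cut-off as the genome-wide level 5 × 10^-8 divided by 800, the number of ECG features. Dividing a significance level by the number of tests is a Bonferroni-style correction. The short sketch below only works out that arithmetic; the constant and function names are hypothetical and not taken from the paper.

```python
# Genome-wide significance level quoted in the abstract.
GENOME_WIDE_ALPHA = 5e-8
# Number of ECG features tested, as stated in the abstract.
N_ECG_FEATURES = 800


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test P-value threshold after dividing by the test count."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_ECG_FEATURES)
print(f"{threshold:.3e}")  # prints 6.250e-11
```

Under this reading, an SNP counts as significant for an ECG feature only if its P value is below 6.25 × 10^-11.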
19.
J Chem Inf Model ; 64(3): 666-676, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38241022

ABSTRACT

Fragment-based drug discovery (FBDD) is widely used in drug design. One useful strategy in FBDD is designing linkers for linking fragments to optimize their molecular properties. In the current study, we present a novel generative fragment linking model, GRELinker, which utilizes a gated-graph neural network combined with reinforcement and curriculum learning to generate molecules with desirable attributes. The model has been shown to be efficient in multiple tasks, including controlling log P, optimizing synthesizability or predicted bioactivity of compounds, and generating molecules with high 3D similarity but low 2D similarity to the lead compound. Specifically, our model outperforms the previously reported reinforcement learning (RL) built-in method DRlinker on these benchmark tasks. Moreover, GRELinker has been successfully used in an actual FBDD case to generate optimized molecules with enhanced affinities by employing the docking score as the scoring function in RL. Besides, the implementation of curriculum learning in our framework enables the generation of structurally complex linkers more efficiently. These results demonstrate the benefits and feasibility of GRELinker in linker design for molecular optimization and drug discovery.


Subject(s)
Drug Design , Drug Discovery , Neural Networks, Computer , Learning , Curriculum
20.
Nucleic Acids Res ; 52(D1): D98-D106, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953349

ABSTRACT

Long noncoding RNAs (lncRNAs) have emerged as crucial regulators across diverse biological processes and diseases. While high-throughput sequencing has enabled lncRNA discovery, functional characterization remains limited. The EVLncRNAs database is the first and exclusive repository for all experimentally validated functional lncRNAs from various species. After previous releases in 2018 and 2021, this update marks a major expansion through exhaustive manual curation of nearly 25,000 publications from 15 May 2020 to 15 May 2023. It incorporates substantial growth across all categories: a 154% increase in functional lncRNAs, 160% in associated diseases, 186% in lncRNA-disease associations, 235% in interactions, 138% in structures, 234% in circular RNAs, 235% in resistant lncRNAs and 4724% in exosomal lncRNAs. More importantly, it incorporates additional information including functional classifications, detailed interaction pathways, homologous lncRNAs, lncRNA locations, and COVID-19, phase-separation and organoid-related lncRNAs. The web interface was substantially improved for browsing, visualization, and searching. ChatGPT was tested for information extraction and functional overview, with its limitations noted. EVLncRNAs 3.0 represents the most extensive curated resource of experimentally validated functional lncRNAs and will serve as an indispensable platform for unravelling emerging lncRNA functions. The updated database is freely available at https://www.sdklab-biophysics-dzu.net/EVLncRNAs3/.


Subject(s)
Databases, Nucleic Acid , RNA, Long Noncoding , Data Management , Information Storage and Retrieval , RNA, Long Noncoding/genetics