Search | Secretaria de Estado da Saúde

1.

Multi-Cover Persistence (MCP)-based machine learning for polymer property prediction.

Zhang, Yipeng; Shen, Cong; Xia, Kelin.

Brief Bioinform ; 25(6)2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39323091

ABSTRACT

Accurate and efficient prediction of polymer properties is crucial for polymer design. Recently, data-driven artificial intelligence (AI) models have demonstrated great promise in polymer property analysis. Despite this progress, a pivotal challenge for all AI-driven models remains the effective representation of molecules. Here we introduce Multi-Cover Persistence (MCP)-based molecular representation and featurization for the first time. Our MCP-based polymer descriptors are combined with machine learning models, in particular Gradient Boosting Tree (GBT) models, for polymer property prediction. Unlike all previous molecular representations, polymer molecular structure and interactions are represented as MCP, which utilizes Delaunay slices at different dimensions and Rhomboid tiling to characterize the complicated geometric and topological information within the data. Statistical features from the generated persistent barcodes are used as polymer descriptors and further combined with the GBT model. Our model has been extensively validated on polymer benchmark datasets. It has been found that our models can outperform traditional fingerprint-based models and have accuracy similar to that of geometric deep learning models. In particular, our model tends to be more effective on large-sized monomer structures, demonstrating the great potential of MCP in characterizing more complicated polymer data. This work underscores the potential of MCP in polymer informatics, presenting a novel perspective on molecular representation and its application in polymer science.

Subjects

Machine Learning , Polymers , Polymers/chemistry , Algorithms

2.

Persistent Tor-algebra for protein-protein interaction analysis.

Liu, Xiang; Feng, Huitao; Lü, Zhi; Xia, Kelin.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36790858

ABSTRACT

Protein-protein interactions (PPIs) play crucial roles in almost all biological processes from cell-signaling and membrane transport to metabolism and immune systems. Efficient characterization of PPIs at the molecular level is key to the fundamental understanding of PPI mechanisms. Even with the gigantic amount of PPI models from graphs, networks, geometry and topology, it remains as a great challenge to design functional models that efficiently characterize the complicated multiphysical information within PPIs. Here we propose persistent Tor-algebra (PTA) model for a unified algebraic representation of the multiphysical interactions. Mathematically, our PTA is inherently algebraic data analysis. In our PTA model, protein structures and interactions are described as a series of face rings and Tor modules, from which PTA model is developed. The multiphysical information within/between biomolecules are implicitly characterized by PTA and further represented as PTA barcodes. To test our PTA models, we consider PTA-based ensemble learning for PPI binding affinity prediction. The two most commonly used datasets, i.e. SKEMPI and AB-Bind, are employed. It has been found that our model outperforms all the existing models as far as we know. Mathematically, our PTA model provides a highly efficient way for the characterization of molecular structures and interactions.

Subjects

Protein Interaction Mapping , Proteins , Proteins/chemistry , Protein Interaction Maps

3.

Persistent spectral based ensemble learning (PerSpect-EL) for protein-protein binding affinity prediction.

Wee, JunJie; Xia, Kelin.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35189639

ABSTRACT

Protein-protein interactions (PPIs) play a significant role in nearly all cellular and biological activities. Data-driven machine learning models have demonstrated great power in PPIs. However, the design of efficient molecular featurization poses a great challenge for all learning models for PPIs. Here, we propose persistent spectral (PerSpect) based PPI representation and featurization, and PerSpect-based ensemble learning (PerSpect-EL) models for PPI binding affinity prediction, for the first time. In our model, a sequence of Hodge (or combinatorial) Laplacian (HL) matrices at various different scales are generated from a specially designed filtration process. PerSpect attributes, which are statistical and combinatorial properties of spectrum information from these HL matrices, are used as features for PPI characterization. Each PerSpect attribute is input into a 1D convolutional neural network (CNN), and these CNN networks are stacked together in our PerSpect-based ensemble learning models. We systematically test our model on the two most commonly used datasets, i.e. SKEMPI and AB-Bind. It has been found that our model can achieve state-of-the-art results and outperform all existing models to the best of our knowledge.

Subjects

Machine Learning , Neural Networks, Computer , Protein Binding

4.

Multiphysical graph neural network (MP-GNN) for COVID-19 drug design.

Li, Xiao-Shuang; Liu, Xiang; Lu, Le; Hua, Xian-Sheng; Chi, Ying; Xia, Kelin.

Brief Bioinform ; 23(4)2022 07 18.

Article in English | MEDLINE | ID: mdl-35696650

ABSTRACT

Graph neural networks (GNNs) are the most promising deep learning models that can revolutionize non-Euclidean data analysis. However, their full potential is severely curtailed by poorly represented molecular graphs and features. Here, we propose a multiphysical graph neural network (MP-GNN) model based on the developed multiphysical molecular graph representation and featurization. All kinds of molecular interactions, between different atom types and at different scales, are systematically represented by a series of scale-specific and element-specific graphs with distance-related node features. From these graphs, graph convolution network (GCN) models are constructed with specially designed weight-sharing architectures. Base learners are constructed from GCN models from different elements at different scales, and further consolidated together using both one-scale and multi-scale ensemble learning schemes. Our MP-GNN has two distinct properties. First, our MP-GNN incorporates multiscale interactions using more than one molecular graph. Atomic interactions from various different scales are not modeled by one specific graph (as in traditional GNNs), instead they are represented by a series of graphs at different scales. Second, it is free from the complicated feature generation process as in conventional GNN methods. In our MP-GNN, various atom interactions are embedded into element-specific graph representations with only distance-related node features. A unique GNN architecture is designed to incorporate all the information into a consolidated model. Our MP-GNN has been extensively validated on the widely used benchmark test datasets from PDBbind, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016. Our model can outperform all existing models as far as we know. Further, our MP-GNN is used in coronavirus disease 2019 drug design. Based on a dataset with 185 complexes of inhibitors for severe acute respiratory syndrome coronavirus (SARS-CoV/SARS-CoV-2), we evaluate their binding affinities using our MP-GNN. It has been found that our MP-GNN is of high accuracy. This demonstrates the great potential of our MP-GNN for the screening of potential drugs for SARS-CoV-2. Availability: The Multiphysical graph neural network (MP-GNN) model can be found in https://github.com/Alibaba-DAMO-DrugAI/MGNN. Additional data or code will be available upon reasonable request.

Subjects

COVID-19 Drug Treatment , Data Analysis , Drug Design , Humans , Neural Networks, Computer , SARS-CoV-2

5.

Persistent spectral simplicial complex-based machine learning for chromosomal structural analysis in cellular differentiation.

Gong, Weikang; Wee, JunJie; Wu, Min-Chun; Sun, Xiaohan; Li, Chunhua; Xia, Kelin.

Brief Bioinform ; 23(4)2022 07 18.

Article in English | MEDLINE | ID: mdl-35536545

ABSTRACT

The three-dimensional (3D) chromosomal structure plays an essential role in all DNA-templated processes, including gene transcription, DNA replication and other cellular processes. Although chromosome conformation capture (3C) methods, such as Hi-C, can generate chromosomal contact data that characterize genome-wide chromosomal structural properties, an understanding of the 3D nature of the genome based on Hi-C data remains lacking. Here, we propose a persistent spectral simplicial complex (PerSpectSC) model to describe Hi-C data for the first time. Specifically, a filtration process is introduced to generate a series of nested simplicial complexes at different scales. For each of these simplicial complexes, its spectral information can be calculated from the corresponding Hodge Laplacian matrix. The PerSpectSC model describes the persistence and variation of the spectral information of the nested simplicial complexes during the filtration process. Different from all previous models, our PerSpectSC-based features provide a quantitative global-scale characterization of chromosome structures and topology. Our descriptors can successfully classify cell types and also cellular differentiation stages for all the 24 types of chromosomes simultaneously. In particular, persistent minimum best characterizes cell types and Dim (1) persistent multiplicity best characterizes cellular differentiation. These results demonstrate the great potential of our PerSpectSC-based models in polymeric data analysis.

Subjects

Chromosomes , Genomics , Cell Differentiation , Chromosomes/genetics , Genomics/methods , Machine Learning , Molecular Conformation

6.

Molecular persistent spectral image (Mol-PSI) representation for machine learning models in drug design.

Jiang, Peiran; Chi, Ying; Li, Xiao-Shuang; Liu, Xiang; Hua, Xian-Sheng; Xia, Kelin.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34958660

ABSTRACT

Artificial intelligence (AI)-based drug design has great promise to fundamentally change the landscape of the pharmaceutical industry. Even though there has been great progress from handcrafted feature-based machine learning models, 3D convolutional neural networks (CNNs) and graph neural networks, effective and efficient representations that characterize the structural, physical, chemical and biological properties of molecular structures and interactions remain a great challenge. Here, we propose an equal-sized molecular 2D image representation, known as the molecular persistent spectral image (Mol-PSI), and combine it with a CNN model for AI-based drug design. Mol-PSI provides a unique one-to-one image representation for molecular structures and interactions. In general, deep models are empowered to achieve better performance with systematically organized representations in image format. A well-designed parallel CNN architecture for adapting Mol-PSIs is developed for protein-ligand binding affinity prediction. Our results, for the three most commonly used databases, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, are better than all traditional machine learning models, as far as we know. Our Mol-PSI model provides a powerful molecular representation that can be widely used in AI-based drug design and molecular data analysis.

Subjects

Drug Design , Machine Learning , Protein Binding , Artificial Intelligence , Ligands , Models, Molecular , Models, Theoretical , Molecular Structure , Neural Networks, Computer , Protein Binding/drug effects

7.

Forman persistent Ricci curvature (FPRC)-based machine learning models for protein-ligand binding affinity prediction.

Wee, JunJie; Xia, Kelin.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-33940588

ABSTRACT

Artificial intelligence (AI) techniques have already been gradually applied to the entire drug design process, from target discovery, lead discovery, lead optimization and preclinical development to the final three phases of clinical trials. Currently, one of the central challenges for AI-based drug design is molecular featurization, which is to identify or design appropriate molecular descriptors or fingerprints. Efficient and transferable molecular descriptors are key to the success of all AI-based drug design models. Here we propose Forman persistent Ricci curvature (FPRC)-based molecular featurization and feature engineering, for the first time. Molecular structures and interactions are modeled as simplicial complexes, which are generalization of graphs to their higher dimensional counterparts. Further, a multiscale representation is achieved through a filtration process, during which a series of nested simplicial complexes at different scales are generated. Forman Ricci curvatures (FRCs) are calculated on the series of simplicial complexes, and the persistence and variation of FRCs during the filtration process is defined as FPRC. Moreover, persistent attributes, which are FPRC-based functions and properties, are employed as molecular descriptors, and combined with machine learning models, in particular, gradient boosting tree (GBT). Our FPRC-GBT models are extensively trained and tested on three most commonly-used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. It has been found that our results are better than the ones from machine learning models with traditional molecular descriptors.

Subjects

Databases, Protein , Machine Learning , Proteins/chemistry , Ligands , Protein Binding , Proteins/metabolism
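
The filtration-plus-curvature pipeline described in the FPRC abstract above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the paper: it stops at graphs (1-dimensional complexes) rather than higher-dimensional simplicial complexes, it uses the unweighted graph form of Forman curvature, F(e) = 4 - deg(u) - deg(v), and the coordinates, cutoffs and function names are hypothetical.

```python
import itertools
import math


def forman_curvatures(points, cutoff):
    """Forman Ricci curvature 4 - deg(u) - deg(v) for every edge of the
    unweighted graph that links points no farther apart than `cutoff`."""
    n = len(points)
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= cutoff]
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {(i, j): 4 - degree[i] - degree[j] for i, j in edges}


def persistent_attributes(points, cutoffs):
    """(cutoff, min, max, mean) of the edge curvatures at each filtration
    value; scales without any edge are reported as zeros."""
    rows = []
    for r in cutoffs:
        values = list(forman_curvatures(points, r).values())
        if values:
            rows.append((r, min(values), max(values), sum(values) / len(values)))
        else:
            rows.append((r, 0.0, 0.0, 0.0))
    return rows


# Hypothetical toy coordinates (a unit square), not molecular data.
POINTS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

for row in persistent_attributes(POINTS, [0.5, 1.0, 1.5]):
    print(row)
```

Each row is a small feature vector per scale; in a setting like the one the abstract describes, such vectors would be concatenated and passed to a regressor such as a gradient boosting tree.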

8.

Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction.

Liu, Xiang; Feng, Huitao; Wu, Jie; Xia, Kelin.

Brief Bioinform ; 22(5)2021 09 02.

Article in English | MEDLINE | ID: mdl-33837771

ABSTRACT

Molecular descriptors are essential to not only quantitative structure activity/property relationship (QSAR/QSPR) models, but also machine learning based chemical and biological data analysis. In this paper, we propose persistent spectral hypergraph (PSH) based molecular descriptors or fingerprints for the first time. Our PSH-based molecular descriptors are used in the characterization of molecular structures and interactions, and further combined with machine learning models, in particular gradient boosting tree (GBT), for protein-ligand binding affinity prediction. Different from traditional molecular descriptors, which are usually based on molecular graph models, a hypergraph-based topological representation is proposed for protein-ligand interaction characterization. Moreover, a filtration process is introduced to generate a series of nested hypergraphs at different scales. For each of these hypergraphs, its eigen spectrum information can be obtained from the corresponding (Hodge) Laplacian matrix. PSH studies the persistence and variation of the eigen spectrum of the nested hypergraphs during the filtration process. Molecular descriptors or fingerprints can be generated from persistent attributes, which are statistical or combinatorial functions of PSH, and combined with machine learning models, in particular, GBT. We test our PSH-GBT model on the three most commonly used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results, for all these databases, are better than all existing machine learning models with traditional molecular descriptors, as far as we know.

Subjects

Drug Design , Drug Discovery/methods , Machine Learning , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Quantitative Structure-Activity Relationship , Algorithms , Binding, Competitive , Computational Biology/methods , Databases, Protein , Humans , Ligands , Models, Molecular , Molecular Structure , Pharmaceutical Preparations/metabolism , Protein Binding , Proteins/metabolism

9.

Hypergraph-based persistent cohomology (HPC) for molecular representations in drug design.

Liu, Xiang; Wang, Xiangjun; Wu, Jie; Xia, Kelin.

Brief Bioinform ; 22(5)2021 09 02.

Article in English | MEDLINE | ID: mdl-33480394

ABSTRACT

Artificial intelligence (AI) based drug design has demonstrated great potential to fundamentally change the pharmaceutical industries. Currently, a key issue in AI-based drug design is the lack of efficient, transferable molecular descriptors or fingerprints. Here, we present hypergraph-based molecular topological representation, hypergraph-based (weighted) persistent cohomology (HPC/HWPC) and HPC/HWPC-based molecular fingerprints for machine learning models in drug design. Molecular structures and their atomic interactions are highly complicated and pose great challenges for efficient mathematical representations. We develop the first hypergraph-based topological framework to characterize detailed molecular structures and interactions at the atomic level. Inspired by the elegant path complex model, hypergraph-based embedded homology and persistent homology have been proposed recently. Based on them, we construct HPC/HWPC, and use them to generate molecular descriptors for learning models in protein-ligand binding affinity prediction, one of the key steps in drug design. Our models are tested on the three most commonly used databases, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, and outperform all existing machine learning models with traditional molecular descriptors. Our HPC/HWPC models have demonstrated great potential in AI-based drug design.

Subjects

Databases, Protein , Drug Design , Machine Learning , Models, Chemical

10.

Dowker complex based machine learning (DCML) models for protein-ligand binding affinity prediction.

Liu, Xiang; Feng, Huitao; Wu, Jie; Xia, Kelin.

PLoS Comput Biol ; 18(4): e1009943, 2022 04.

Article in English | MEDLINE | ID: mdl-35385478

ABSTRACT

With the great advancements in experimental data, computational power and learning algorithms, artificial intelligence (AI) based drug design has begun to gain momentum recently. AI-based drug design has great promise to revolutionize pharmaceutical industries by significantly reducing the time and cost in drug discovery processes. However, a major issue remains for all AI-based learning models: efficient molecular representation. Here we propose Dowker complex (DC) based molecular interaction representations and Riemann zeta function based molecular featurization, for the first time. Molecular interactions between proteins and ligands (or others) are modeled as Dowker complexes. A multiscale representation is generated by using a filtration process, during which a series of DCs are generated at different scales. Combinatorial (Hodge) Laplacian matrices are constructed from these DCs, and the Riemann zeta functions from their spectral information can be used as molecular descriptors. To validate our models, we consider protein-ligand binding affinity prediction. Our DC-based machine learning (DCML) models, in particular, DC-based gradient boosting tree (DC-GBT), are tested on the three most commonly used datasets, i.e., PDBbind-2007, PDBbind-2013 and PDBbind-2016, and extensively compared with other existing state-of-the-art models. It has been found that our DC-based descriptors can achieve state-of-the-art results and have better performance than all machine learning models with traditional molecular descriptors. Our Dowker complex based machine learning models can be used in other tasks in AI-based drug design and molecular data analysis.

Subjects

Artificial Intelligence , Machine Learning , Ligands , Protein Binding , Proteins/chemistry
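
The zeta-function featurization mentioned in the DCML abstract above can be sketched as follows. The abstract does not spell out the definition, so this assumes the usual spectral zeta function, the sum of lambda^(-s) over the non-zero Laplacian eigenvalues; the eigenvalue lists, the exponents s and the function name are hypothetical placeholders, not values from the paper.

```python
def spectral_zeta(eigenvalues, s, tol=1e-9):
    """Sum of lambda ** (-s) over eigenvalues larger than `tol`
    (zero eigenvalues are skipped because their inverse is undefined)."""
    return sum(lam ** (-s) for lam in eigenvalues if lam > tol)


# Hypothetical Laplacian spectra, one list per filtration scale.
SPECTRA = {
    1.5: [0.0, 0.0, 2.0, 2.0],
    4.5: [0.0, 0.5, 2.0, 4.0],
}

# One descriptor per (scale, s) pair; the exponents are an arbitrary choice.
features = {scale: [spectral_zeta(eigs, s) for s in (1, 2)]
            for scale, eigs in SPECTRA.items()}
print(features)
```

Evaluating the function at a handful of exponents for every filtration scale yields a fixed-length vector regardless of how many eigenvalues each Laplacian has, which is what makes it usable as input to a tree-based model.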

11.

Multiscale Topological Indices for the Quantitative Prediction of SARS CoV-2 Binding Affinity Change upon Mutations.

Bi, Jialin; Wee, JunJie; Liu, Xiang; Qu, Cunquan; Wang, Guanghui; Xia, Kelin.

J Chem Inf Model ; 63(13): 4216-4227, 2023 07 10.

Article in English | MEDLINE | ID: mdl-37381769

ABSTRACT

The coronavirus disease 2019 (COVID-19) has affected people's lives and the development of the global economy. Biologically, protein-protein interactions between the SARS-CoV-2 surface spike (S) protein and the human ACE2 protein are the key mechanism behind COVID-19. In this study, we provide insights into interactions between the SARS-CoV-2 S-protein and ACE2, and propose topological indices to quantitatively characterize the impact of mutations on binding affinity changes (ΔΔG). In our model, a series of nested simplicial complexes and their related adjacency matrices at various scales are generated from a specially designed filtration process, based on the 3D structures of spike-ACE2 protein complexes. We develop a set of multiscale simplicial complex-based topological indices, for the first time. Unlike previous graph network models, which give only a qualitative analysis, our topological indices can provide a quantitative prediction of the binding affinity change caused by mutations and achieve great accuracy. In particular, for mutations at specific amino acids, such as polar amino acids or arginine, the correlation between our topological gravity model index and binding affinity change, in terms of Pearson correlation coefficient, can be higher than 0.8. As far as we know, this is the first time multiscale topological indices have been used in the quantitative analysis of protein-protein interactions.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , Angiotensin-Converting Enzyme 2/metabolism , Protein Binding , Mutation , Spike Glycoprotein, Coronavirus/metabolism

12.

Fingerprint-Enhanced Graph Attention Network (FinGAT) Model for Antibiotic Discovery.

Choo, Hou Yee; Wee, JunJie; Shen, Cong; Xia, Kelin.

J Chem Inf Model ; 63(10): 2928-2935, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37167016

ABSTRACT

Artificial Intelligence (AI) techniques are of great potential to fundamentally change antibiotic discovery industries. Efficient and effective molecular featurization is key to all highly accurate learning models for antibiotic discovery. In this paper, we propose a fingerprint-enhanced graph attention network (FinGAT) model by the combination of sequence-based 2D fingerprints and structure-based graph representation. In our feature learning process, sequence information is transformed into a fingerprint vector, and structural information is encoded through a GAT module into another vector. These two vectors are concatenated and input into a multilayer perceptron (MLP) for antibiotic activity classification. Our model is extensively tested and compared with existing models. It has been found that our FinGAT can outperform various state-of-the-art GNN models in antibiotic discovery.

Subjects

Anti-Bacterial Agents , Artificial Intelligence , Anti-Bacterial Agents/pharmacology , Learning , Neural Networks, Computer

13.

Laplacian Spectra of Persistent Structures in Taiwan, Singapore, and US Stock Markets.

Yen, Peter Tsung-Wen; Xia, Kelin; Cheong, Siew Ann.

Entropy (Basel) ; 25(6)2023 May 25.

Article in English | MEDLINE | ID: mdl-37372190

ABSTRACT

An important challenge in the study of complex systems is to identify appropriate effective variables at different times. In this paper, we explain why structures that are persistent with respect to changes in length and time scales are proper effective variables, and illustrate how persistent structures can be identified from the spectra and Fiedler vector of the graph Laplacian at different stages of the topological data analysis (TDA) filtration process for twelve toy models. We then investigated four market crashes, three of which were related to the COVID-19 pandemic. In all four crashes, a persistent gap opens up in the Laplacian spectra when we go from a normal phase to a crash phase. In the crash phase, the persistent structure associated with the gap remains distinguishable up to a characteristic length scale ϵ* where the first non-zero Laplacian eigenvalue changes most rapidly. Before ϵ*, the distribution of components in the Fiedler vector is predominantly bi-modal, and this distribution becomes uni-modal after ϵ*. Our findings hint at the possibility of understanding market crashes in terms of both continuous and discontinuous changes. Beyond the graph Laplacian, we can also employ Hodge Laplacians of higher order for future research.
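
To make the idea of tracking Laplacian spectra along a filtration concrete, here is a minimal sketch under stated assumptions. It builds the graph Laplacian L = D - A at each length scale ϵ and reports the multiplicity of the zero eigenvalue (the number of connected components) together with the first non-zero eigenvalue. The point set, the scales and all function names are hypothetical, and a small Jacobi rotation routine stands in for a library eigensolver so that the example needs only the standard library.

```python
import math


def graph_laplacian(points, eps):
    """Laplacian L = D - A of the graph joining points at distance <= eps."""
    n = len(points)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                lap[i][j] = lap[j][i] = -1.0
                lap[i][i] += 1.0
                lap[j][j] += 1.0
    return lap


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Sorted eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectrum_summary(points, scales, zero_tol=1e-9):
    """(eps, zero-eigenvalue multiplicity, first non-zero eigenvalue or None)."""
    rows = []
    for eps in scales:
        eigs = symmetric_eigenvalues(graph_laplacian(points, eps))
        nonzero = [v for v in eigs if v > zero_tol]
        rows.append((eps, len(eigs) - len(nonzero), nonzero[0] if nonzero else None))
    return rows


# Hypothetical toy data: two well-separated pairs of points on a line.
POINTS = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]

for row in spectrum_summary(POINTS, [0.5, 1.5, 4.5]):
    print(row)
```

In this toy case the two pairs stay separate at the middle scale (two zero eigenvalues) and merge into a single chain at the largest one, where the first non-zero eigenvalue drops sharply. That is the kind of change in the spectrum the abstract associates with a characteristic scale.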

14.

Hom-Complex-Based Machine Learning (HCML) for the Prediction of Protein-Protein Binding Affinity Changes upon Mutation.

Liu, Xiang; Feng, Huitao; Wu, Jie; Xia, Kelin.

J Chem Inf Model ; 62(17): 3961-3969, 2022 09 12.

Article in English | MEDLINE | ID: mdl-36040839

ABSTRACT

Protein-protein interactions (PPIs) are involved in almost all biological processes in the cell. Understanding protein-protein interactions holds the key for the understanding of biological functions, diseases and the development of therapeutics. Recently, artificial intelligence (AI) models have demonstrated great power in PPIs. However, a key issue for all AI-based PPI models is efficient molecular representations and featurization. Here, we propose Hom-complex-based PPI representation, and Hom-complex-based machine learning models for the prediction of PPI binding affinity changes upon mutation, for the first time. In our model, various Hom complexes Hom(G1, G) can be generated for the graph representation G of protein-protein complex by using different graphs G1, which reveal G1-related inner connections within the graph representation G of protein-protein complex. Further, for a specific graph G1, a series of nested Hom complexes are generated to give a multiscale characterization of the PPIs. Its persistent homology and persistent Euler characteristic are used as molecular descriptors and further combined with the machine learning model, in particular, gradient boosting tree (GBT). We systematically test our model on the two most-commonly used data sets, that is, SKEMPI and AB-Bind. It has been found that our model outperforms all the existing models as far as we know, which demonstrates the great potential of our model for the analysis of PPIs. Our model can be used for the analysis and design of efficient antibodies for SARS-CoV-2.

Subjects

Artificial Intelligence , COVID-19 , Humans , Machine Learning , Mutation , Protein Binding , SARS-CoV-2/genetics

15.

Biomolecular Topology: Modelling and Analysis.

Liu, Jian; Xia, Ke-Lin; Wu, Jie; Yau, Stephen Shing-Toung; Wei, Guo-Wei.

Acta Math Sin Engl Ser ; 38(10): 1901-1938, 2022.

Article in English | MEDLINE | ID: mdl-36407804

ABSTRACT

With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, the nonlinearity, and entanglements of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, molecular assembly, to others at the macromolecular level, pose a severe challenge in their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis, have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originated from the biomolecular systems. More specifically, the biomolecular topology encompasses topological structures, properties and relations that are emerged from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms, is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and the topology-based machine learning models.

16.

Ollivier Persistent Ricci Curvature-Based Machine Learning for the Protein-Ligand Binding Affinity Prediction.

Wee, JunJie; Xia, Kelin.

J Chem Inf Model ; 61(4): 1617-1626, 2021 04 26.

Article in English | MEDLINE | ID: mdl-33724038

ABSTRACT

Efficient molecular featurization is one of the major issues for machine learning models in drug design. Here, we propose a persistent Ricci curvature (PRC), in particular, Ollivier PRC (OPRC), for the molecular featurization and feature engineering, for the first time. The filtration process proposed in the persistent homology is employed to generate a series of nested molecular graphs. Persistence and variation of Ollivier Ricci curvatures on these nested graphs are defined as OPRC. Moreover, persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, are used as molecular descriptors and further combined with machine learning models, in particular, gradient boosting tree (GBT). Our OPRC-GBT model is used in the prediction of the protein-ligand binding affinity, which is one of the key steps in drug design. Based on three of the most commonly used data sets from the well-established protein-ligand binding databank, that is, PDBbind, we intensively test our model and compare with existing models. It has been found that our model can achieve the state-of-the-art results and has advantages over traditional molecular descriptors.

Subjects

Machine Learning , Proteins , Databases, Protein , Ligands , Protein Binding , Proteins/metabolism

17.

Understanding Changes in the Topology and Geometry of Financial Market Correlations during a Market Crash.

Yen, Peter Tsung-Wen; Xia, Kelin; Cheong, Siew Ann.

Entropy (Basel) ; 23(9)2021 Sep 14.

Article in English | MEDLINE | ID: mdl-34573837

ABSTRACT

In econophysics, the achievements of information filtering methods over the past 20 years, such as the minimal spanning tree (MST) by Mantegna and the planar maximally filtered graph (PMFG) by Tumminello et al., should be celebrated. Here, we show how one can systematically improve upon this paradigm along two separate directions. First, we used topological data analysis (TDA) to extend the notions of nodes and links in networks to faces, tetrahedrons, or k-simplices in simplicial complexes. Second, we used the Ollivier-Ricci curvature (ORC) to acquire geometric information that cannot be provided by simple information filtering. In this sense, MSTs and PMFGs are but first steps to revealing the topological backbones of financial networks. This is something that TDA can elucidate more fully, following which the ORC can help us flesh out the geometry of financial networks. We applied these two approaches to a recent stock market crash in Taiwan and found that, beyond fusions and fissions, other non-fusion/fission processes such as cavitation, annihilation, rupture, healing, and puncture might also be important. We also successfully identified neck regions that emerged during the crash, based on their negative ORCs, and performed a case study on one such neck region.

18.

Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes.

Wu, Zhenliang; Zhang, Yuwei; Zhang, John Zenghui; Xia, Kelin; Xia, Fei.

J Comput Chem ; 41(1): 14-20, 2020 01 05.

Article in English | MEDLINE | ID: mdl-31568566

ABSTRACT

The development of ultracoarse-grained models for large biomolecules needs to derive the optimal number of coarse-grained (CG) sites to represent the targets. In this work, we propose to use the statistical internal cluster validation indexes to determine the optimal number of CG sites that are optimized based on the essential dynamics coarse-graining method. The calculated curves of Calinski-Harabasz and Silhouette Coefficient indexes exhibit the extrema corresponding to the similar CG numbers. The calculated ratios of the optimal CG numbers to the residue numbers of fine-grained models are in the range from 4 to 2. The comparison of the stability of index results indicates that Calinski-Harabasz index is the better choice to determine the optimal CG representation in coarse-graining. © 2019 Wiley Periodicals, Inc.
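
As a worked illustration of the Calinski-Harabasz index named in the abstract above: the index is the between-cluster dispersion divided by the within-cluster dispersion, each normalised by its degrees of freedom (k - 1 and n - k). The sketch below evaluates it for two candidate partitions. The data points, the labels and the variable names are hypothetical, and the partitions are written by hand rather than produced by the essential dynamics coarse-graining method the abstract refers to.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)], where B and W
    are the between- and within-cluster sums of squared distances."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need more than one cluster and fewer clusters than points")

    def centroid(members):
        return [sum(m[d] for m in members) / len(members) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sqdist(center, overall)
        within += sum(sqdist(m, center) for m in members)
    # Raises ZeroDivisionError if every point coincides with its centroid.
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 1D coordinates forming two obvious groups.
POINTS = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
TWO_SITES = [0, 0, 0, 1, 1, 1]
THREE_SITES = [0, 0, 0, 1, 1, 2]

for name, labels in (("2 sites", TWO_SITES), ("3 sites", THREE_SITES)):
    print(name, round(calinski_harabasz(POINTS, labels), 2))
```

Scanning the candidate number of clusters and taking the value at which the index peaks is the usual way such a curve is read; here the two-group partition scores higher than the three-group one.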

19.

General Recognition of U-G, U-A, and C-G Pairs by Double-Stranded RNA-Binding PNAs Incorporated with an Artificial Nucleobase.

Ong, Alan Ann Lerk; Toh, Desiree-Faye Kaixin; Patil, Kiran M; Meng, Zhenyu; Yuan, Zhen; Krishna, Manchugondanahalli S; Devi, Gitali; Haruehanroengra, Phensinee; Lu, Yunpeng; Xia, Kelin; Okamura, Katsutomo; Sheng, Jia; Chen, Gang.

Biochemistry ; 58(10): 1319-1331, 2019 Mar 12.

Article in English | MEDLINE | ID: mdl-30775913

ABSTRACT

Chemically modified peptide nucleic acids (PNAs) show great promise in the recognition of RNA duplexes by major-groove PNA·RNA-RNA triplex formation. Triplex formation is favored for RNA duplexes with a purine tract within one of the RNA duplex strands, and is severely destabilized if the purine tract is interrupted by pyrimidine residues. Here, we report the synthesis of a PNA monomer incorporated with an artificial nucleobase S, followed by the binding studies of a series of S-modified PNAs. Our data suggest that an S residue incorporated into short 8-mer dsRNA-binding PNAs (dbPNAs) can recognize internal Watson-Crick C-G and U-A, and wobble U-G base pairs (but not G-C, A-U, and G-U pairs) in RNA duplexes. The short S-modified PNAs show no appreciable binding to DNA duplexes or single-stranded RNAs. Interestingly, replacement of the C residue in an S·C-G triple with a 5-methyl C results in the disruption of the triplex, probably due to a steric clash between S and 5-methyl C. Previously reported PNA E base shows recognition of U-A and A-U pairs, but not a U-G pair. Thus, S-modified dbPNAs may be uniquely useful for the general recognition of RNA U-G, U-A, and C-G pairs. Shortening the succinyl linker of our PNA S monomer by one carbon atom to have a malonyl linker causes a severe destabilization of triplex formation. Our experimental and modeling data indicate that part of the succinyl moiety in a PNA S monomer may serve to expand the S base forming stacking interactions with adjacent PNA bases.

Subjects

Peptide Nucleic Acids/chemical synthesis, Peptide Nucleic Acids/physiology, RNA/chemistry, Base Pairing/genetics, Base Pairing/physiology, Computer Simulation, DNA/chemistry, Biological Models, Nucleic Acid Conformation, Peptide Nucleic Acids/chemistry, RNA/metabolism, Double-Stranded RNA

20.

Sequence- And Structure-Specific Probing of RNAs by Short Nucleobase-Modified dsRNA-Binding PNAs Incorporating a Fluorescent Light-up Uracil Analog.

Krishna, Manchugondanahalli S; Toh, Desiree-Faye Kaixin; Meng, Zhenyu; Ong, Alan Ann Lerk; Wang, Zhenzhang; Lu, Yunpeng; Xia, Kelin; Prabakaran, Mookkan; Chen, Gang.

Anal Chem ; 91(8): 5331-5338, 2019 Apr 16.

Article in English | MEDLINE | ID: mdl-30873827

ABSTRACT

RNAs are emerging as important biomarkers and therapeutic targets. The strategy of directly targeting double-stranded RNA (dsRNA) by triplex formation is relatively underexplored, mainly due to the weak binding of traditional triplex-forming oligonucleotides (TFOs) under physiological conditions. Compared to DNA and RNA, peptide nucleic acids (PNAs) are chemically stable and have a neutral peptide-like backbone, and thus, they show significantly enhanced binding to natural nucleic acids. We have successfully developed nucleobase-modified dsRNA-binding PNAs (dbPNAs) to facilitate structure-specific and selective recognition of dsRNA over single-stranded RNA (ssRNA) and dsDNA regions under near-physiological conditions. The triplex formation strategy facilitates the targeting of not only the sequence but also the secondary structure of RNA. Here, we report the development of novel dbPNA-based fluorescent light-up probes through the incorporation of A-U pair-recognizing 5-benzothiophene uracil (btU). The incorporation of btU into dbPNAs does not affect the binding affinity toward dsRNAs significantly, in most cases, as evidenced by our nondenaturing gel shift assay data. The blue fluorescence emission intensity of btU-modified dbPNAs is sequence- and structure-specifically enhanced by dsRNAs, including the influenza viral RNA panhandle duplex and the HIV-1 -1 ribosomal frameshift-inducing RNA hairpin, but not ssRNAs or DNAs, at 200 mM NaCl, pH 7.5. Thus, dbPNAs incorporating btU-modified and other further modified fluorescent nucleobases will be useful biochemical tools for probing and detecting RNA structures, interactions, and functions.

Subjects

Fluorescence, Peptide Nucleic Acids/chemistry, RNA/chemistry, Uracil/chemistry, Base Sequence, Binding Sites, Molecular Structure, Nucleic Acid Conformation, Fluorescence Spectrometry, Uracil/analogs & derivatives
