Results 1 - 20 of 25
1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38632952

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.


Subjects
Algorithms; Single-Cell Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Transcriptome; Cluster Analysis
2.
Biophys J ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38356263

ABSTRACT

Electrostatics is of paramount importance to chemistry, physics, biology, and medicine. The Poisson-Boltzmann (PB) theory is a primary model for electrostatic analysis. However, it is highly challenging to compute accurate PB electrostatic solvation free energies for macromolecules due to the nonlinearity, dielectric jumps, charge singularity, and geometric complexity associated with the PB equation. The present work introduces a PB-based machine learning (PBML) model for biomolecular electrostatic analysis. Trained with the second-order accurate MIBPB solver, the proposed PBML model is found to be more accurate and faster than several eminent PB solvers in electrostatic analysis. The proposed PBML model can provide highly accurate PB electrostatic solvation free energy of new biomolecules or new conformations generated by molecular dynamics with much reduced computational cost.

3.
Nat Methods ; 20(2): 218-228, 2023 02.
Article in English | MEDLINE | ID: mdl-36690742

ABSTRACT

Spatial transcriptomic technologies and spatially annotated single-cell RNA sequencing datasets provide unprecedented opportunities to dissect cell-cell communication (CCC). However, incorporation of the spatial information and complex biochemical processes required in the reconstruction of CCC remains a major challenge. Here, we present COMMOT (COMMunication analysis by Optimal Transport) to infer CCC in spatial transcriptomics, which accounts for the competition between different ligand and receptor species as well as spatial distances between cells. A collective optimal transport method is developed to handle complex molecular interactions and spatial constraints. Furthermore, we introduce downstream analysis tools to infer spatial signaling directionality and genes regulated by signaling using machine learning models. We apply COMMOT to simulation data and eight spatial datasets acquired with five different technologies to show its effectiveness and robustness in identifying spatial CCC in data with varying spatial resolutions and gene coverages. Finally, COMMOT identifies new CCCs during skin morphogenesis in a case study of human epidermal development.


Subjects
Cell Communication; Transcriptome; Humans; Cell Communication/genetics; Gene Expression Profiling; Signal Transduction; Computer Simulation; Single-Cell Analysis
4.
Nat Commun ; 13(1): 4076, 2022 07 14.
Article in English | MEDLINE | ID: mdl-35835774

ABSTRACT

One major challenge in analyzing spatial transcriptomic datasets is to simultaneously incorporate the cell transcriptome similarity and their spatial locations. Here, we introduce SpaceFlow, which generates spatially-consistent low-dimensional embeddings by incorporating both expression similarity and spatial information using spatially regularized deep graph networks. Based on the embedding, we introduce a pseudo-Spatiotemporal Map that integrates the pseudotime concept with spatial locations of the cells to unravel spatiotemporal patterns of cells. By comparing with multiple existing methods on several spatial transcriptomic datasets at both spot and single-cell resolutions, SpaceFlow is shown to produce a robust domain segmentation and identify biologically meaningful spatiotemporal patterns. Applications of SpaceFlow reveal evolving lineage in heart developmental data and tumor-immune interactions in human breast cancer data. Our study provides a flexible deep learning framework to incorporate spatiotemporal information in analyzing spatial transcriptomic data.


Subjects
Transcriptome; Humans; Transcriptome/genetics
5.
Commun Biol ; 5(1): 220, 2022 03 10.
Article in English | MEDLINE | ID: mdl-35273328

ABSTRACT

The rapid development of spatial transcriptomics (ST) techniques has allowed the measurement of transcriptional levels across many genes together with the spatial positions of cells. This has led to an explosion of interest in computational methods and techniques for harnessing both spatial and transcriptional information in analysis of ST datasets. The wide diversity of approaches in aim, methodology and technology for ST provides great challenges in dissecting cellular functions in spatial contexts. Here, we synthesize and review the key problems in analysis of ST data and methods that are currently applied, while also expanding on open questions and areas of future development.


Subjects
Transcriptome
6.
Cell Rep ; 37(12): 110140, 2021 12 21.
Article in English | MEDLINE | ID: mdl-34936864

ABSTRACT

Neural crest (NC) cells migrate throughout vertebrate embryos to give rise to a huge variety of cell types, but when and where lineages emerge and their regulation remain unclear. We have performed single-cell RNA sequencing (RNA-seq) of cranial NC cells from the first pharyngeal arch in zebrafish over several stages during migration. Computational analysis combining pseudotime and real-time data reveals that these NC cells first adopt a transitional state, becoming specified mid-migration, with the first lineage decisions being skeletal and pigment, followed by neural and glial progenitors. In addition, by computationally integrating these data with RNA-seq data from a transgenic Wnt reporter line, we identify gene cohorts with similar temporal responses to Wnts during migration and show that one, Atp6ap2, is required for melanocyte differentiation. Together, our results show that cranial NC cell lineages arise progressively and uncover a series of spatially restricted cell interactions likely to regulate such cell-fate decisions.


Subjects
Cell Lineage; Neural Crest/metabolism; Wnt Proteins/metabolism; Zebrafish Proteins/genetics; Zebrafish Proteins/metabolism; Zebrafish/genetics; Zebrafish/metabolism; Animals; Animals, Genetically Modified; Branchial Region/metabolism; Cell Communication; Cell Differentiation; Cell Movement; Cranial Nerves/metabolism; Embryo, Nonmammalian/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation, Developmental; RNA-Seq; Signal Transduction; Single-Cell Analysis
7.
Ann Biomed Eng ; 49(12): 3524-3539, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34585335

ABSTRACT

Genetic mutations to the Lamin A/C gene (LMNA) can cause heart disease, but the mechanisms making cardiac tissues uniquely vulnerable to the mutations remain largely unknown. Further, patients with LMNA mutations have highly variable presentation of heart disease progression and type. In vitro patient-specific experiments could provide a powerful platform for studying this phenomenon, but the use of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CM) introduces heterogeneity in maturity and function thus complicating the interpretation of the results of any single experiment. We hypothesized that integrating single cell RNA sequencing (scRNA-seq) with analysis of the tissue architecture and contractile function would elucidate some of the probable mechanisms. To test this, we investigated five iPSC-CM lines, three controls and two patients with a (c.357-2A>G) mutation. The patient iPSC-CM tissues had significantly weaker stress generation potential than control iPSC-CM tissues demonstrating the viability of our in vitro approach. Through scRNA-seq, differentially expressed genes between control and patient lines were identified. Some of these genes, linked to quantitative structural and functional changes, were cardiac specific, explaining the targeted nature of the disease progression seen in patients. The results of this work demonstrate the utility of combining in vitro tools in exploring heart disease mechanics.


Subjects
Cardiomyopathy, Dilated/genetics; Cardiomyopathy, Dilated/physiopathology; Gene Expression; Induced Pluripotent Stem Cells/cytology; Lamin Type A/genetics; Myocardial Contraction; Myocytes, Cardiac/physiology; Adult; Aged; Cell Line; Humans; Middle Aged
8.
Insect Biochem Mol Biol ; 137: 103625, 2021 10.
Article in English | MEDLINE | ID: mdl-34358664

ABSTRACT

Scorpion α-toxins bind at the pharmacologically-defined site-3 on the sodium channel and inhibit channel inactivation by preventing the outward movement of the voltage sensor in domain IV (IVS4), whereas scorpion β-toxins bind at site-4 on the sodium channel and enhance channel activation by trapping the voltage sensor of domain II (IIS4) in its outward position. However, limited information is available on the role of the voltage-sensing modules (VSM, comprising S1-S4) of domains I and III in toxin actions. We have previously shown that charge reversing substitutions of the innermost positively-charged residues in IIIS4 (R4E, R5E) increase the activity of an insect-selective site-4 scorpion toxin, Lqh-dprIT3-c, on BgNav1-1a, a cockroach sodium channel. Here we show that substitutions R4E and R5E in IIIS4 also increase the activity of two site-3 toxins, LqhαIT from Leiurus quinquestriatus hebraeus and insect-selective Av3 from Anemonia viridis. Furthermore, charge reversal of either of two conserved negatively-charged residues, D1K and E2K, in IIIS2 also increases the action of the site-3 and site-4 toxins. Homology modeling suggests that S2-D1 and S2-E2 interact with S4-R4 and S4-R5 in the VSM of domain III (III-VSM), respectively, in the activated state of the channel. However, charge swapping between S2-D1 and S4-R4 had no compensatory effects on gating or toxin actions, suggesting that charged residue interactions are complex. Collectively, our results highlight the involvement of III-VSM in the actions of both site 3 and site 4 toxins, suggesting that charge reversing substitutions in III-VSM allosterically facilitate IIS4 or IVS4 voltage sensor trapping by these toxins.


Subjects
Cnidarian Venoms/pharmacology; Drosophila melanogaster/genetics; Insect Proteins/genetics; Scorpion Venoms/pharmacology; Sodium Channels/genetics; Animals; Drosophila melanogaster/drug effects; Drosophila melanogaster/metabolism; Insect Proteins/metabolism; Oocytes/drug effects; Oocytes/metabolism; Sodium Channels/metabolism
9.
Curr Opin Syst Biol ; 26: 12-23, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33969247

ABSTRACT

Cell-cell communication is a fundamental process that shapes biological tissue. Historically, studies of cell-cell communication have been feasible for one or two cell types and a few genes. With the emergence of single-cell transcriptomics, we are now able to examine the genetic profiles of individual cells at unprecedented scale and depth. The availability of such data presents an exciting opportunity to construct a more comprehensive description of cell-cell communication. This review discusses the recent explosion of methods that have been developed to infer cell-cell communication from non-spatial and spatial single-cell transcriptomics, two promising technologies which have complementary strengths and limitations. We propose several avenues to propel this rapidly expanding field forward in meaningful ways.

10.
Front Genet ; 12: 636743, 2021.
Article in English | MEDLINE | ID: mdl-33833776

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) data provides unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset through analyzing known spatial expression patterns of a small subset of genes known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, an improved balance between precision and robustness is achieved. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. Our implementation with a uniform API can serve as a portal with access to all the methods investigated in this work for spatial exploration of cell fate decisions in scRNA-seq data. All methods evaluated in this work are implemented as an open-source software with a uniform interface.

11.
PLoS Comput Biol ; 17(3): e1008571, 2021 03.
Article in English | MEDLINE | ID: mdl-33684098

ABSTRACT

During early mammalian embryo development, a small number of cells make robust fate decisions at particular spatial locations in a tight time window to form inner cell mass (ICM), and later epiblast (Epi) and primitive endoderm (PE). While recent single-cell transcriptomics data allow scrutinization of the heterogeneity of individual cells, the consistent spatial and temporal mechanisms that the early embryo utilizes to robustly form the Epi/PE layers from ICM remain elusive. Here we build a multiscale three-dimensional model of the mammalian embryo to recapitulate the observed patterning process from zygote to late blastocyst. By integrating the spatiotemporal information reconstructed from multiple single-cell transcriptomic datasets, the data-informed modeling analysis suggests two major processes critical to the formation of Epi/PE layers: a selective cell-cell adhesion mechanism (via EphA4/EphrinB2) for fate-location coordination and a temporal attenuation mechanism of cell signaling (via Fgf). Spatial imaging data and distinct subsets of single-cell gene expression data are then used to validate the predictions. Together, our study provides a multiscale framework that incorporates single-cell gene expression datasets to analyze gene regulations, cell-cell communications, and physical interactions among cells in complex geometries at single-cell resolution, with direct application to late-stage development of embryogenesis.


Subjects
Embryonic Development/genetics; Germ Layers; Models, Biological; Transcriptome/genetics; Animals; Embryo, Mammalian/cytology; Embryo, Mammalian/metabolism; Embryo, Mammalian/physiology; Germ Layers/cytology; Germ Layers/metabolism; Germ Layers/physiology; Mice; Single-Cell Analysis
12.
BMVC ; 32, 2021 Nov.
Article in English | MEDLINE | ID: mdl-36227018

ABSTRACT

Complex biological tissues consist of numerous cells in a highly coordinated manner and carry out various biological functions. Therefore, segmenting a tissue into spatial and functional domains is critically important for understanding and controlling the biological functions. The emerging spatial transcriptomics technologies allow simultaneous measurements of thousands of genes with precise spatial information, providing an unprecedented opportunity for dissecting biological tissues. However, how to utilize such noisy, sparse, and high dimensional data for tissue segmentation remains a major challenge. Here, we develop a deep learning-based method, named SCAN-IT by transforming the spatial domain identification problem into an image segmentation problem, with cells mimicking pixels and expression values of genes within a cell representing the color channels. Specifically, SCAN-IT relies on geometric modeling, graph neural networks, and an informatics approach, DeepGraphInfomax. We demonstrate that SCAN-IT can handle datasets from a wide range of spatial transcriptomics techniques, including the ones with high spatial resolution but low gene coverage as well as those with low spatial resolution but high gene coverage. We show that SCAN-IT outperforms state-of-the-art methods using a benchmark dataset with ground truth domain annotations.

13.
Proc Natl Acad Sci U S A ; 117(36): 22146-22156, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32848056

ABSTRACT

Packing interaction is a critical driving force in the folding of helical membrane proteins. Despite this importance, packing defects (i.e., cavities including voids, pockets, and pores) are prevalent in membrane-integral enzymes, channels, transporters, and receptors, playing essential roles in function. Then, a question arises regarding how the two competing requirements, packing for stability vs. cavities for function, are reconciled in membrane protein structures. Here, using the intramembrane protease GlpG of Escherichia coli as a model and cavity-filling mutation as a probe, we tested the impacts of native cavities on the thermodynamic stability and function of a membrane protein. We find several stabilizing mutations which induce substantial activity reduction without distorting the active site. Notably, these mutations are all mapped onto the regions of conformational flexibility and functional importance, indicating that the cavities facilitate functional movement of GlpG while compromising the stability. Experiment and molecular dynamics simulation suggest that the stabilization is induced by the coupling between enhanced protein packing and weakly unfavorable lipid desolvation, or solely by favorable lipid solvation on the cavities. Our result suggests that, stabilized by the relatively weak interactions with lipids, cavities are accommodated in membrane proteins without severe energetic cost, which, in turn, serve as a platform to fine-tune the balance between stability and flexibility for optimal activity.


Subjects
DNA-Binding Proteins/chemistry; Endopeptidases/chemistry; Escherichia coli Proteins/chemistry; Membrane Proteins/chemistry; Catalytic Domain; DNA-Binding Proteins/metabolism; Endopeptidases/metabolism; Escherichia coli Proteins/metabolism; Humans; Membrane Proteins/metabolism; Models, Molecular; Molecular Dynamics Simulation; Mutation; Protein Conformation; Protein Folding; Protein Stability; Serine Endopeptidases/chemistry
14.
Nat Commun ; 11(1): 2084, 2020 04 29.
Article in English | MEDLINE | ID: mdl-32350282

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) provides details for individual cells; however, crucial spatial information is often lost. We present SpaOTsc, a method relying on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. A spatial metric for individual cells in scRNA-seq data is first established based on a map connecting it with the spatial measurements. The cell-cell communications are then obtained by "optimally transporting" signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene-gene information flow to estimate the spatial regulations between genes across cells. Four datasets are employed for cross-validation of spatial gene expression prediction and comparison to known cell-cell communications. SpaOTsc has broader applications, both in integrating non-spatial single-cell measurements with spatial data, and directly in spatial single-cell transcriptomics data to reconstruct spatial cellular dynamics in tissues.


Subjects
Signal Transduction/genetics; Single-Cell Analysis; Transcriptome/genetics; Animals; Cell Communication; Cluster Analysis; Databases, Genetic; Drosophila/embryology; Drosophila/genetics; Gene Expression Regulation, Developmental; Reproducibility of Results; Sequence Analysis, RNA; Visual Cortex/metabolism; Zebrafish/embryology; Zebrafish/genetics
15.
Cell Rep ; 30(11): 3932-3947.e6, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32187560

ABSTRACT

Our knowledge of transcriptional heterogeneities in epithelial stem and progenitor cell compartments is limited. Epidermal basal cells sustain cutaneous tissue maintenance and drive wound healing. Previous studies have probed basal cell heterogeneity in stem and progenitor potential, but a comprehensive dissection of basal cell dynamics during differentiation is lacking. Using single-cell RNA sequencing coupled with RNAScope and fluorescence lifetime imaging, we identify three non-proliferative and one proliferative basal cell state in homeostatic skin that differ in metabolic preference and become spatially partitioned during wound re-epithelialization. Pseudotemporal trajectory and RNA velocity analyses predict a quasi-linear differentiation hierarchy where basal cells progress from Col17a1^Hi/Trp63^Hi state to early-response state, proliferate at the juncture of these two states, or become growth arrested before differentiating into spinous cells. Wound healing induces plasticity manifested by dynamic basal-spinous interconversions at multiple basal transcriptional states. Our study provides a systematic view of epidermal cellular dynamics, supporting a revised "hierarchical-lineage" model of homeostasis.


Subjects
Epidermis/metabolism; Epidermis/pathology; Gene Expression Profiling; Homeostasis/genetics; Single-Cell Analysis; Wound Healing/genetics; Animals; Cell Movement/genetics; Female; Inflammation/genetics; Inflammation/pathology; Mice, Inbred C57BL; Mice, Transgenic; Up-Regulation/genetics
16.
Phys Chem Chem Phys ; 22(8): 4343-4367, 2020 Feb 26.
Article in English | MEDLINE | ID: mdl-32067019

ABSTRACT

Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.


Subjects
Computational Biology; Models, Biological; Algorithms; Molecular Sequence Data
17.
Nat Mach Intell ; 2(2): 116-123, 2020.
Article in English | MEDLINE | ID: mdl-34170981

ABSTRACT

The ability to predict protein-protein interactions is crucial to our understanding of a wide range of biological activities and functions in the human body, and for guiding drug discovery. Despite considerable efforts to develop suitable computational methods, predicting protein-protein interaction binding affinity changes following mutation (ΔΔG) remains a severe challenge. Algebraic topology, a champion in recent worldwide competitions for protein-ligand binding affinity predictions, is a promising approach to simplifying the complexity of biological structures. Here we introduce element- and site-specific persistent homology (a new branch of algebraic topology) to simplify the structural complexity of protein-protein complexes and embed crucial biological information into topological invariants. We also propose a new deep learning algorithm called NetTree to take advantage of convolutional neural networks and gradient-boosting trees. A topology-based network tree is constructed by integrating the topological representation and NetTree for predicting protein-protein interaction ΔΔG. Tests on major benchmark datasets indicate that the proposed topology-based network tree is an important improvement over the current state of the art in predicting ΔΔG.

18.
J Appl Comput Topol ; 4(4): 481-507, 2020 Dec.
Article in English | MEDLINE | ID: mdl-34179350

ABSTRACT

While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.

19.
SIAM J Math Data Sci ; 2(2): 396-418, 2020.
Article in English | MEDLINE | ID: mdl-34222831

ABSTRACT

Persistent homology is a powerful tool for characterizing the topology of a data set at various geometric scales. When applied to the description of molecular structures, persistent homology can capture the multiscale geometric features and reveal certain interaction patterns in terms of topological invariants. However, in addition to the geometric information, there is a wide variety of nongeometric information of molecular structures, such as element types, atomic partial charges, atomic pairwise interactions, and electrostatic potential functions, that is not described by persistent homology. Although element-specific homology and electrostatic persistent homology can encode some nongeometric information into geometry based topological invariants, it is desirable to have a mathematical paradigm to systematically embed both geometric and nongeometric information, i.e., multicomponent heterogeneous information, into unified topological representations. To this end, we propose a persistent cohomology based framework for the enriched representation of data. In our framework, nongeometric information can either be distributed globally or reside locally on the datasets in the geometric sense and can be properly defined on topological spaces, i.e., simplicial complexes. Using the proposed persistent cohomology based framework, enriched barcodes are extracted from datasets to represent heterogeneous information. We consider a variety of datasets to validate the present formulation and illustrate the usefulness of the proposed method based on persistent cohomology. It is found that the proposed framework outperforms or at least matches the state-of-the-art methods in the protein-ligand binding affinity prediction from massive biomolecular datasets without resorting to any deep learning formulation.

20.
J Comput Aided Mol Des ; 33(1): 71-82, 2019 01.
Article in English | MEDLINE | ID: mdl-30116918

ABSTRACT

Advanced mathematics, such as multiscale weighted colored subgraph and element specific persistent homology, and machine learning including deep neural networks were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 focused on the pose prediction, binding affinity ranking and free energy prediction for Farnesoid X receptor ligands. Our models obtained the top place in absolute free energy prediction for free energy set 1 in stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered as the most difficult challenge so far. It has five subchallenges involving Cathepsin S and five other kinase targets, namely VEGFR2, JAK2, p38-α, TIE2, and ABL1. There is a total of 26 official competitive tasks for GC3. Our predictions were ranked 1st in 10 out of these 26 tasks.


Subjects
Deep Learning; Molecular Docking Simulation/methods; Receptors, Cytoplasmic and Nuclear/chemistry; Binding Sites; Cathepsins/chemistry; Computer-Aided Design; Crystallography, X-Ray; Databases, Protein; Drug Design; Ligands; Protein Binding; Protein Conformation; Protein Kinases/chemistry; Thermodynamics