2.
J Chem Inf Model ; 64(8): 3558-3568, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38572676

ABSTRACT

RNA velocity can capture cell dynamic information in biological processes; yet a comprehensive analysis of cell state transitions and their associated chemical and biological processes is still lacking. In this work, we apply the Hodge decomposition, coupled with discrete exterior calculus (DEC), to unveil cell dynamics by examining the curl-free, divergence-free, and harmonic components of the RNA velocity field in a low-dimensional representation, such as a UMAP or t-SNE embedding. The decomposed components distinctly reveal key cell dynamic features such as the cell cycle, bifurcation, and cell lineage differentiation, regardless of the choice of low-dimensional representation. This consistency across representations demonstrates that the Hodge decomposition is a reliable and robust way to extract these cell dynamic features, offering unique analysis and insightful visualization of single-cell RNA velocity fields.
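The three-way split described in this abstract can be illustrated on a graph rather than on a DEC mesh. The following is a minimal sketch, assuming the velocity field has already been turned into a flow on the oriented edges of a small cell graph (a hypothetical preprocessing step that is not shown) and using the combinatorial Hodge decomposition with least-squares projections. It is not the authors' DEC implementation; `hodge_decompose` and the toy graph are illustrative.

```python
import numpy as np


def hodge_decompose(n_nodes, edges, triangles, flow):
    """Split an edge flow into gradient (curl-free), curl (divergence-free) and harmonic parts.

    edges: oriented edges (i, j) with i < j; triangles: filled triangles (i, j, k) with i < j < k.
    """
    edge_index = {e: n for n, e in enumerate(edges)}

    # Gradient operator: (G @ p)[e] = p[j] - p[i] for the oriented edge e = (i, j).
    G = np.zeros((len(edges), n_nodes))
    for n, (i, j) in enumerate(edges):
        G[n, i], G[n, j] = -1.0, 1.0

    # Curl operator: (C @ f)[t] = f[i,j] + f[j,k] - f[i,k] for the triangle t = (i, j, k).
    C = np.zeros((len(triangles), len(edges)))
    for t, (i, j, k) in enumerate(triangles):
        C[t, edge_index[(i, j)]] = 1.0
        C[t, edge_index[(j, k)]] = 1.0
        C[t, edge_index[(i, k)]] = -1.0

    # C @ G == 0, so im(G) and im(C.T) are orthogonal and the two
    # least-squares projections can be computed independently.
    potential = np.linalg.lstsq(G, flow, rcond=None)[0]
    gradient = G @ potential

    curl = np.zeros_like(flow)
    if triangles:
        vorticity = np.linalg.lstsq(C.T, flow, rcond=None)[0]
        curl = C.T @ vorticity

    # The remainder is orthogonal to both images: curl-free and divergence-free.
    harmonic = flow - gradient - curl
    return gradient, curl, harmonic, G, C


# Toy graph: square 0-1-2-3 with diagonal 0-2; only triangle (0, 1, 2) is filled,
# so the unfilled triangle (0, 2, 3) acts as a hole that supports a harmonic flow.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
triangles = [(0, 1, 2)]
flow = np.array([2.0, 2.0, 2.0, 2.0, 2.0])

grad, curl, harm, G, C = hodge_decompose(4, edges, triangles, flow)
print("gradient:", np.round(grad, 3))
print("curl:    ", np.round(curl, 3))
print("harmonic:", np.round(harm, 3))
```

For this toy flow the gradient part is [1, 1, 1, 3, 2], the curl part is (2/3)·[1, 1, 0, 0, -1], and the harmonic remainder is [1/3, 1/3, 1, -1, 2/3], up to floating-point rounding. The harmonic part is nonzero only because triangle (0, 2, 3) is left unfilled; filling it would leave no room for a harmonic flow on this graph.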


Subject(s)
RNA , Single-Cell Analysis , RNA/chemistry , RNA/metabolism , Humans
3.
Comput Biol Med ; 175: 108497, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38678944

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited to analysis at a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1-norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many of the limitations of traditional persistent homology. Rather than inducing filtration by varying a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Non-Negative Matrix Factorization (NMF), by significant margins.
For example, tPCA provides up to 628%, 78%, and 149% improvements over UMAP, tSNE, and NMF, respectively, on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements over UMAP, tSNE, and NMF, respectively, on clustering in the ARI metric.
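A filtration in which the scale parameter is the neighbor count k, rather than a distance threshold, can be sketched with ordinary graph Laplacians. This is a minimal sketch, assuming Euclidean distances and a symmetrized (OR) kNN graph; it records only the spectrum of the graph Laplacian at each k (whose zero eigenvalues count connected components) and does not compute the persistent Laplacians between filtration steps that the paper uses. The names `knn_adjacency` and `knn_laplacian_filtration` are illustrative.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency of the k-nearest-neighbor graph on the rows of X."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # A stable sort keeps neighbor lists nested: the k-NN set is a prefix of the (k+1)-NN set.
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]
    A = np.zeros(dist.shape)
    A[np.repeat(np.arange(len(X)), k), neighbors.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selects the other


def knn_laplacian_filtration(X, k_values, tol=1e-9):
    """For each k: (number of zero eigenvalues, smallest nonzero eigenvalue) of L = D - A."""
    summary = {}
    for k in k_values:
        A = knn_adjacency(X, k)
        L = np.diag(A.sum(axis=1)) - A
        eigenvalues = np.linalg.eigvalsh(L)        # real and ascending, since L is symmetric
        n_zero = int(np.sum(eigenvalues < tol))    # equals the number of connected components
        smallest_nonzero = float(eigenvalues[n_zero]) if n_zero < len(eigenvalues) else 0.0
        summary[k] = (n_zero, smallest_nonzero)
    return summary


# Two tight, well-separated clusters of five points each.
cluster = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
X = np.vstack([cluster, cluster + 10.0])
summary = knn_laplacian_filtration(X, k_values=range(1, 7))
for k, (n_zero, lam) in summary.items():
    print(f"k={k}: components={n_zero}, smallest nonzero eigenvalue={lam:.3f}")
```

With these two clusters the number of zero eigenvalues is 2 at k = 4 and drops to 1 at k = 5, when each point's fifth neighbor lies in the other cluster. Because the kNN graphs are nested as k grows, this count can only stay the same or decrease along the filtration.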


Subject(s)
Principal Component Analysis , Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Algorithms , RNA-Seq/methods
4.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38499497

ABSTRACT

The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study pursued an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We began by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation approach to identify key genes from a protein-protein interaction network derived from the differentially expressed genes (DEGs). This method uses persistent Laplacians to accurately single out pivotal nodes within the network, conducting the analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis, and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5, and NMDAR, for drug repurposing from DrugBank. We built machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging the binding affinities of DrugBank compounds to the selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multifaceted and comprehensive analytical framework, amalgamating bioinformatics, topological data analysis, and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows for applications across a range of diseases and transcriptomic datasets.


Subject(s)
Drug Repositioning , Transcriptome , United States , Drug Repositioning/methods , Reproducibility of Results , Gene Expression Profiling , Computational Biology/methods
5.
J Comput Appl Math ; 445, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38464901

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to the meta-gene interpretation of its low-dimensional components. However, NMF approaches lack multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). Using a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also applied TNMF and rTNMF to visualization with the popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
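The idea of regularizing NMF with Laplacians from several scales can be sketched with the standard graph-regularized NMF multiplicative updates, applied to a weighted sum of graph Laplacians built at several distance thresholds. This is a minimal sketch, assuming that sum of ordinary (non-persistent) Laplacians as a stand-in for the persistent Laplacian term in TNMF; the thresholds, weights, `lam`, the toy data, and the function names are illustrative, not the authors' settings.

```python
import numpy as np


def threshold_adjacency(samples, eps):
    """0/1 adjacency linking samples (rows) that lie closer than eps, without self-loops."""
    dist = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    A = (dist < eps).astype(float)
    np.fill_diagonal(A, 0.0)
    return A


def objective(X, U, V, W, D, lam):
    """||X - U V^T||_F^2 + lam * tr(V^T L V) with L = D - W."""
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ (D - W) @ V)


def multiscale_graph_nmf(X, rank, scales, weights, lam=0.1, n_iter=300, seed=0, tiny=1e-10):
    """Factor a nonnegative X (features x samples) as U @ V.T with a multiscale graph penalty on V."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Weighted sum of per-scale adjacencies; its Laplacian is the weighted sum of per-scale Laplacians.
    W = sum(w * threshold_adjacency(X.T, s) for s, w in zip(scales, weights))
    D = np.diag(W.sum(axis=1))
    U, V = rng.random((m, rank)), rng.random((n, rank))
    history = [objective(X, U, V, W, D, lam)]
    for _ in range(n_iter):
        # Multiplicative updates of graph-regularized NMF; U and V stay nonnegative.
        U *= (X @ V) / (U @ V.T @ V + tiny)
        V *= (X.T @ U + lam * W @ V) / (V @ U.T @ U + lam * D @ V + tiny)
        history.append(objective(X, U, V, W, D, lam))
    return U, V, history


# Toy data: 20 samples drawn around two nonnegative expression patterns over 6 genes.
rng = np.random.default_rng(1)
pattern_a = np.array([3.0, 3.0, 3.0, 0.1, 0.1, 0.1])
pattern_b = pattern_a[::-1]
X = np.column_stack([pattern_a + 0.1 * rng.random(6) for _ in range(10)]
                    + [pattern_b + 0.1 * rng.random(6) for _ in range(10)])
U, V, history = multiscale_graph_nmf(X, rank=2, scales=[0.3, 1.0], weights=[1.0, 0.5])
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Summing weighted adjacencies before forming D - W is equivalent to summing the per-scale Laplacians, which is why the single-graph update rule can be reused unchanged.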

6.
ArXiv ; 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38495558

ABSTRACT

As COVID-19 enters its fifth year, it continues to pose a significant global health threat, with the constantly mutating SARS-CoV-2 virus challenging drug effectiveness. A comprehensive understanding of virus-drug interactions is essential for predicting and improving drug effectiveness, especially in combating drug resistance during the pandemic. In response, the Path Laplacian Transformer-based Prospective Analysis Framework (PLFormer-PAF) has been proposed, integrating historical data analysis and predictive modeling strategies. This dual-strategy approach uses path topology to transform protein-ligand complexes into topological sequences, enabling the use of advanced large language models to analyze protein-ligand interactions, and enhances its reliability with factual insights drawn from historical data. It has shown unparalleled performance in binding affinity prediction tasks across various benchmarks, including specific evaluations related to SARS-CoV-2, and it assesses the impact of virus mutations on drug efficacy, offering crucial insights into potential drug resistance. The predictions align with observed mutation patterns in SARS-CoV-2, indicating that the widespread use of the Pfizer drug has led to viral evolution and reduced drug efficacy. PLFormer-PAF's capabilities extend beyond identifying drug-resistant strains, positioning it as a key tool in drug discovery research and the development of new therapeutic strategies against fast-mutating viruses such as SARS-CoV-2.

7.
Res Sq ; 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38405777

ABSTRACT

Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. However, in computational biology, essentially all Transformers are built upon biological sequences, which ignores vital stereochemical information and may result in crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformers and natural language processing (NLP) models in general. This work addresses this foundational challenge with a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP with a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into topological sequences. TopoFormer outperforms conventional algorithms and recent deep learning variants, achieving exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks on a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.

8.
J Magn Reson Imaging ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38358090

ABSTRACT

In recent years, magnetic particle imaging (MPI) has emerged as a promising imaging technique offering high sensitivity and spatial resolution. It originated in the early 2000s as a new approach to overcoming the low spatial resolution achieved when relaxometry is used to measure magnetic fields. MPI produces 2D and 3D images with high temporal resolution, uses no ionizing radiation, and offers optimal visual contrast because it lacks a background tissue signal. Traditionally, images were reconstructed from the induced-voltage signal using system matrix and X-space based methods. Because image reconstruction and analysis play an integral role in obtaining precise information from MPI signals, newer artificial intelligence-based methods are continuously being researched and developed. In this work, we summarize and review the significance and use of machine learning and deep learning models in MPI applications and the potential they hold for the future. LEVEL OF EVIDENCE: 5. TECHNICAL EFFICACY: Stage 1.

9.
Biophys J ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38356263

ABSTRACT

Electrostatics is of paramount importance to chemistry, physics, biology, and medicine. The Poisson-Boltzmann (PB) theory is a primary model for electrostatic analysis. However, it is highly challenging to compute accurate PB electrostatic solvation free energies for macromolecules due to the nonlinearity, dielectric jumps, charge singularities, and geometric complexity associated with the PB equation. The present work introduces a PB-based machine learning (PBML) model for biomolecular electrostatic analysis. Trained with the second-order accurate MIBPB solver, the proposed PBML model is found to be more accurate and faster than several established PB solvers in electrostatic analysis. The proposed PBML model can provide highly accurate PB electrostatic solvation free energies for new biomolecules, or for new conformations generated by molecular dynamics, at much reduced computational cost.

10.
Comput Biol Med ; 171: 108211, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38422960

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology, offering unparalleled insights into the intricate landscape of cellular diversity and gene expression dynamics. scRNA-seq analysis represents a challenging and cutting-edge frontier within the field of biological research. Differential geometry serves as a powerful mathematical tool in various applications of scientific research. In this study, we introduce, for the first time, a multiscale differential geometry (MDG) strategy for addressing the challenges encountered in scRNA-seq data analysis. We assume that intrinsic properties of cells lie on a family of low-dimensional manifolds embedded in the high-dimensional space of scRNA-seq data. Multiscale cell-cell interactive manifolds are constructed to reveal complex relationships in the cell-cell network, where curvature-based features for cells can decipher the intricate structural and biological information. We showcase the utility of our novel approach by demonstrating its effectiveness in classifying cell types. This innovative application of differential geometry in scRNA-seq analysis opens new avenues for understanding the intricacies of biological networks and holds great potential for network analysis in other fields.


Subject(s)
Data Analysis , Gene Expression Profiling , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Cluster Analysis
11.
Comput Biol Med ; 169: 107918, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38194782

ABSTRACT

Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of the normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures available for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.


Subject(s)
Machine Learning , Solubility , Amino Acid Sequence , Mutation
12.
Pain ; 165(4): 908-921, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-37851391

ABSTRACT

Pain is a significant global health issue, and the current treatment options for pain management have limitations in terms of effectiveness, side effects, and potential for addiction. There is a pressing need for improved pain treatments and the development of new drugs. Voltage-gated sodium channels, particularly Nav1.3, Nav1.7, Nav1.8, and Nav1.9, play a crucial role in neuronal excitability and are predominantly expressed in the peripheral nervous system. Targeting these channels may provide a means to treat pain while minimizing central and cardiac adverse effects. In this study, we construct protein-protein interaction (PPI) networks based on pain-related sodium channels and develop a corresponding drug-target interaction network to identify potential lead compounds for pain management. To ensure reliable machine learning predictions, we carefully select 111 inhibitor data sets from a pool of more than 1000 targets in the PPI network. We employ 3 distinct machine learning algorithms combined with advanced natural language processing (NLP)-based embeddings, specifically pretrained transformer and autoencoder representations. Through a systematic screening process, we evaluate the side effects and repurposing potential of more than 150,000 drug candidates targeting Nav1.7 and Nav1.8 sodium channels. In addition, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of these candidates to identify leads with near-optimal characteristics. Our strategy provides an innovative platform for the pharmacological development of pain treatments, offering the potential for improved efficacy and reduced side effects.


Subject(s)
Voltage-Gated Sodium Channels , Humans , Voltage-Gated Sodium Channels/metabolism , Pain/drug therapy , NAV1.7 Voltage-Gated Sodium Channel/genetics , NAV1.7 Voltage-Gated Sodium Channel/metabolism
13.
J Chem Inf Model ; 64(7): 2829-2838, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37402705

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. We present, for the first time, Correlated Clustering and Projection (CCP), a new data-domain dimensionality reduction method. CCP projects each cluster of similar genes into a supergene defined as the accumulated pairwise nonlinear gene-gene correlations among all cells. Using 14 benchmark data sets, we demonstrate that CCP has significant advantages over classical principal component analysis (PCA) for clustering and/or classification problems with intrinsically high dimensionality. In addition, we introduce the Residue-Similarity index (RSI) as a novel metric for clustering and classification and the R-S plot as a new visualization tool. We show that the RSI correlates with accuracy without requiring knowledge of the true labels. The R-S plot provides a unique alternative to uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) for data with a large number of cell types.


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Principal Component Analysis , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
14.
J Chem Inf Model ; 64(7): 2405-2420, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37738663

ABSTRACT

Over the years, Principal Component Analysis (PCA) has served as the baseline approach for dimensionality reduction in gene expression data analysis. Its primary objective is to identify a subset of disease-causing genes from a vast pool of thousands of genes. However, PCA possesses inherent limitations that hinder its interpretability, introduce class ambiguity, and fail to capture complex geometric structures in the data. Although these limitations have been partially addressed in the literature by incorporating various regularizers, such as graph Laplacian regularization, existing PCA-based methods still face challenges related to multiscale analysis and capturing higher-order interactions in the data. To address these challenges, we propose a novel approach called Persistent Laplacian-enhanced Principal Component Analysis (PLPCA). PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory, specifically persistent Laplacians derived from algebraic topology. In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and can incorporate higher-order simplicial complexes to capture higher-order interactions in the data. We evaluate and validate the performance of PLPCA using ten benchmark microarray data sets that exhibit a wide range of dimensions and data imbalance ratios. Our extensive studies on these data sets demonstrate that PLPCA provides up to a 12% improvement over the current state-of-the-art PCA models on five evaluation metrics for classification tasks after dimensionality reduction.
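Graph-Laplacian-regularized PCA, the kind of earlier regularized PCA this abstract builds on, has a closed-form solution when the sample embedding is constrained to be orthonormal. This is a minimal sketch, assuming the objective ||X - U Qᵀ||² + alpha·tr(Qᵀ L Q) with QᵀQ = I, and a weighted sum of distance-threshold graph Laplacians as a stand-in for persistent Laplacians; it omits any further regularization terms the authors may use, and `laplacian_pca`, the scales, and `alpha` are illustrative.

```python
import numpy as np


def graph_laplacian(samples, eps):
    """Laplacian D - A of the graph linking samples (rows) that lie closer than eps."""
    dist = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    A = (dist < eps).astype(float)
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A


def laplacian_pca(X, n_components, scales, weights, alpha):
    """X is features x samples. Returns (U, Q) with Q of shape samples x n_components."""
    L = sum(w * graph_laplacian(X.T, s) for s, w in zip(scales, weights))
    # Substituting the optimal U = X @ Q leaves tr(Q.T (alpha * L - X.T @ X) Q) + const,
    # so Q is spanned by the eigenvectors with the smallest eigenvalues (eigh sorts ascending).
    _, eigenvectors = np.linalg.eigh(alpha * L - X.T @ X)
    Q = eigenvectors[:, :n_components]
    U = X @ Q
    return U, Q


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))             # 30 features (genes) x 12 samples
X -= X.mean(axis=1, keepdims=True)        # center each feature across samples
U0, Q0 = laplacian_pca(X, 2, scales=[5.0], weights=[1.0], alpha=0.0)
U1, Q1 = laplacian_pca(X, 2, scales=[5.0, 8.0], weights=[1.0, 0.5], alpha=10.0)
```

With alpha = 0 the Laplacian term vanishes and Q spans the same subspace as the two leading right singular vectors of X, that is, ordinary PCA of the centered data; increasing alpha trades retained variance for smoothness of the embedding over the multiscale sample graph.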


Subject(s)
Algorithms , Gene Expression Profiling , Principal Component Analysis , Gene Expression Profiling/methods , Data Analysis , Benchmarking
15.
Small ; 20(5): e2305300, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37735143

ABSTRACT

Caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), coronavirus disease 2019 (COVID-19) has shown extensive lung manifestations in vulnerable individuals, putting lung imaging and monitoring at the forefront of early detection and treatment. Magnetic particle imaging (MPI) is an imaging modality that can bring excellent contrast, sensitivity, and signal-to-noise ratios to lung imaging for the development of new theranostic approaches for respiratory diseases. Advances in MPI tracers would offer additional improvements and increase the potential for clinical translation of MPI. Here, a high-performance nanotracer based on the shape anisotropy of magnetic nanoparticles is developed, and its use in MPI of the lung is demonstrated. Shape anisotropy proves to be a critical parameter for increasing signal intensity and resolution beyond those of conventional spherical nanoparticles. Compared with VivoTrax, the 0D nanoparticles exhibit a 2-fold increase in signal intensity, while the 1D nanorods show a more than 5-fold increase. The newly designed 1D nanorods displayed high signal intensities and excellent resolution in lung images. A spatiotemporal lung imaging study in mice revealed that this tracer offers new opportunities for monitoring disease and guiding intervention.


Subject(s)
Magnetite Nanoparticles , Nanoparticles , Mice , Animals , Anisotropy , Diagnostic Imaging/methods , Magnetics , Magnetic Phenomena , Magnetic Resonance Imaging
16.
J Comput Chem ; 45(6): 306-320, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-37830273

ABSTRACT

The Poisson-Boltzmann (PB) model is a widely used electrostatic model for biomolecular solvation analysis. Formulated as an elliptic interface problem, the PB model can be numerically solved on either Eulerian meshes using finite difference/finite element methods or Lagrangian meshes using boundary element methods. Molecular surface generators, which produce the discretized dielectric interfaces between solutes and solvents, are critical factors in determining the accuracy and efficiency of the PB solvers. In this work, we investigate the utility of the Eulerian Solvent Excluded Surface (ESES) software for rendering conjugated Eulerian and Lagrangian surface representations, which enables us to numerically validate and compare the quality of Eulerian PB solvers, such as the MIBPB solver, and the Lagrangian PB solvers, such as the TABI-PB solver. Furthermore, with the ESES software and its associated PB solvers, we are able to numerically validate an interesting and useful but often neglected source-target symmetric property associated with the linearized PB model.
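The Eulerian finite-difference route mentioned in this abstract can be illustrated on the simplest possible case. This is a minimal sketch, assuming a one-dimensional, linearized (Debye-Hückel) PB equation in dimensionless form, -φ'' + κ²φ = 0 on [0, ℓ], with Dirichlet data taken from the exact solution φ = e^(-κx). It has no dielectric interface, singular charges, or molecular surface, which are exactly the difficulties that solvers such as MIBPB and TABI-PB are built to handle; `solve_linear_pb_1d` and the parameter values are illustrative.

```python
import numpy as np


def solve_linear_pb_1d(kappa, length, n_interior):
    """Finite differences for -phi'' + kappa^2 * phi = 0 on [0, length] with exact Dirichlet data."""
    h = length / (n_interior + 1)
    x = np.linspace(0.0, length, n_interior + 2)   # uniform grid including both boundary nodes
    left, right = 1.0, np.exp(-kappa * length)     # boundary values of phi(x) = exp(-kappa x)
    # Second-order stencil: (-phi[i-1] + 2 phi[i] - phi[i+1]) / h^2 + kappa^2 phi[i] = 0
    main = np.full(n_interior, 2.0 / h**2 + kappa**2)
    off = np.full(n_interior - 1, -1.0 / h**2)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = np.zeros(n_interior)
    rhs[0] += left / h**2    # known boundary values move to the right-hand side
    rhs[-1] += right / h**2
    phi = np.empty(n_interior + 2)
    phi[0], phi[-1] = left, right
    phi[1:-1] = np.linalg.solve(A, rhs)
    return x, phi


errors = []
for n in (20, 40, 80):
    x, phi = solve_linear_pb_1d(kappa=2.0, length=3.0, n_interior=n)
    errors.append(np.max(np.abs(phi - np.exp(-2.0 * x))))
    print(f"n={n:3d}  max error={errors[-1]:.2e}")
```

Doubling the number of interior points roughly quarters the maximum error, the second-order behavior expected from this stencil on a smooth solution.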

18.
J Chem Inf Model ; 63(22): 7189-7209, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37956228

ABSTRACT

The birth of ChatGPT, a cutting-edge language model-based chatbot developed by OpenAI, ushered in a new era in AI. However, due to potential pitfalls, its role in rigorous scientific research is not clear yet. This paper vividly showcases its innovative application within the field of drug discovery. Focused specifically on developing anticocaine addiction drugs, the study employs GPT-4 as a virtual guide, offering strategic and methodological insights to researchers working on generative models for drug candidates. The primary objective is to generate optimal drug-like molecules with desired properties. By leveraging the capabilities of ChatGPT, the study introduces a novel approach to the drug discovery process. This symbiotic partnership between AI and researchers transforms how drug development is approached. Chatbots become facilitators, steering researchers toward innovative methodologies and productive paths for creating effective drug candidates. This research sheds light on the collaborative synergy between human expertise and AI assistance, wherein ChatGPT's cognitive abilities enhance the design and development of pharmaceutical solutions. This paper not only explores the integration of advanced AI in drug discovery but also reimagines the landscape by advocating for AI-powered chatbots as trailblazers in revolutionizing therapeutic innovation.


Subject(s)
Drug Development , Substance-Related Disorders , Humans , Drug Discovery , Language , Research Personnel
19.
Appl Intell (Dordr) ; 53(12): 15727-15746, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38031564

ABSTRACT

Machine learning has greatly influenced many fields, including science. However, despite the tremendous accomplishments of machine learning, one of the key limitations of most existing machine learning approaches is their reliance on large labeled sets, and thus data with limited labeled samples remain a challenge. Moreover, the performance of machine learning methods is often severely hindered in the case of diverse data, usually associated with smaller data sets or with areas of study where the size of the data sets is constrained by high experimental cost and/or ethics. These challenges call for innovative strategies for dealing with these types of data. In this work, the aforementioned challenges are addressed by integrating graph-based frameworks, semi-supervised techniques, multiscale structures, and modified and adapted optimization procedures. This results in two innovative multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for tackling data with limited samples, diverse data, and small data sets. The first approach, multikernel manifold learning (MML), integrates manifold learning with multikernel information and incorporates a warped kernel regularizer using multiscale graph Laplacians. The second approach, the multiscale MBO (MMBO) method, introduces multiscale Laplacians into a modification of the well-known classical Merriman-Bence-Osher (MBO) scheme and makes use of fast solvers. We demonstrate the performance of our algorithms experimentally on a variety of benchmark data sets and show that they compare favorably with state-of-the-art approaches.
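The classical MBO scheme that the second approach modifies alternates a graph-diffusion step with a thresholding step, while a fidelity term keeps labeled samples near their labels. This is a minimal sketch of that scheme for two classes, assuming an average of Gaussian-kernel graph Laplacians at a few bandwidths as the multiscale operator and an implicit diffusion step solved directly. It is not the authors' MMBO algorithm (no fast eigen-solver, no multiclass formulation), and `graph_mbo_binary`, the bandwidths, `dt`, and `fidelity` are illustrative.

```python
import numpy as np


def gaussian_laplacian(X, sigma):
    """Laplacian D - W of the fully connected Gaussian-kernel graph on the rows of X."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / sigma**2)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W


def graph_mbo_binary(X, labeled_idx, labeled_values, sigmas, dt=0.1, fidelity=10.0, n_iter=20):
    """Two-class graph MBO: implicit diffusion plus a fidelity term, then thresholding at 1/2."""
    n = len(X)
    # Multiscale operator: average of Laplacians built at several kernel bandwidths.
    L = sum(gaussian_laplacian(X, s) for s in sigmas) / len(sigmas)
    chi = np.zeros(n)                 # indicator of labeled points
    chi[labeled_idx] = 1.0
    target = np.zeros(n)
    target[labeled_idx] = labeled_values
    u = np.full(n, 0.5)               # unlabeled points start undecided
    u[labeled_idx] = labeled_values
    system = np.eye(n) + dt * L
    for _ in range(n_iter):
        # Diffusion step (implicit in L) that also pulls labeled points toward their labels.
        v = np.linalg.solve(system, u + dt * fidelity * chi * (target - u))
        # Thresholding step: project back onto {0, 1}.
        u_new = (v >= 0.5).astype(float)
        if np.array_equal(u_new, u):
            break
        u = u_new
    return u


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)), rng.normal(5.0, 0.3, size=(10, 2))])
u = graph_mbo_binary(X, labeled_idx=[0, 10], labeled_values=[0.0, 1.0], sigmas=[0.5, 1.0, 2.0])
print(u)
```

With one labeled point per cluster, the diffusion step spreads each label through its own cluster far faster than across the weak inter-cluster edges, so thresholding assigns the first ten points to class 0 and the last ten to class 1.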

20.
ArXiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961732

ABSTRACT

Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of the normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures available for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.
