Search | BVS España Search Portal

1.

scGHOST: identifying single-cell 3D genome subcompartments.

Xiong, Kyle; Zhang, Ruochi; Ma, Jian.

Nat Methods ; 21(5): 814-822, 2024 May.

Article in English | MEDLINE | ID: mdl-38589516

ABSTRACT

Single-cell Hi-C (scHi-C) technologies allow for probing of genome-wide cell-to-cell variability in three-dimensional (3D) genome organization from individual cells. Computational methods have been developed to reveal single-cell 3D genome features based on scHi-C, including A/B compartments, topologically associating domains and chromatin loops. However, no method exists for annotating single-cell subcompartments, which is important for understanding chromosome spatial localization in single cells. Here we present scGHOST, a single-cell subcompartment annotation method using graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and contact maps derived from single-cell 3D genome imaging demonstrate reliable identification of single-cell subcompartments, offering insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from complex tissues, scGHOST identifies cell-type-specific or allele-specific subcompartments linked to gene transcription across various cell types and developmental stages, suggesting functional implications of single-cell subcompartments. scGHOST is an effective method for annotating single-cell 3D genome subcompartments in a broad range of biological contexts.

Subject(s)

Single-Cell Analysis; Single-Cell Analysis/methods; Animals; Humans; Genome; Mice; Chromatin/genetics; Chromatin/metabolism; Imaging, Three-Dimensional/methods

2.

Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis.

Duan, Meiyu; Wang, Yueying; Zhao, Dong; Liu, Hongmei; Zhang, Gongyou; Li, Kewei; Zhang, Haotian; Huang, Lan; Zhang, Ruochi; Zhou, Fengfeng.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37427963

ABSTRACT

Survival analysis is critical to cancer prognosis estimation. High-throughput technologies facilitate the increase in the dimension of genic features, but the number of clinical samples in cohorts is relatively small due to various reasons, including difficulties in participant recruitment and high data-generation costs. Transcriptome is one of the most abundantly available OMIC (referring to the high-throughput data, including genomic, transcriptomic, proteomic and epigenomic) data types. This study introduced a multitask graph attention network (GAT) framework DQSurv for the survival analysis task. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of the gene regulatory relations. The multitask survival analysis framework DQSurv used the idea of transfer learning to initiate the GAT model with the pretrained HealthModel and further fine-tuned this model using two tasks i.e. the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted as DiseaseModel. We fused the original transcriptomic features with the difference vector between the latent features encoded by the HealthModel and DiseaseModel for the final task of survival analysis. The proposed DQSurv model stably outperformed the existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. The ablation study also supported the necessity of the main modules. We released the codes and the pretrained HealthModel to facilitate the feature encodings and survival analysis of transcriptome-based future studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.

Subject(s)

Algorithms; Neoplasms; Humans; Proteomics; Survival Analysis

3.

MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning.

Zhang, Ruochi; Wu, Chao; Yang, Qian; Liu, Chang; Wang, Yan; Li, Kewei; Huang, Lan; Zhou, Fengfeng.

Bioinformatics ; 40(4)2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38426310

ABSTRACT

MOTIVATION: Predicting molecular properties is a pivotal task in various scientific domains, including drug discovery, material science, and computational chemistry. This problem is often hindered by the lack of annotated data and imbalanced class distributions, which pose significant challenges in developing accurate and robust predictive models. RESULTS: This study tackles these issues by employing pretrained molecular models within a few-shot learning framework. A novel dynamic contrastive loss function is utilized to further improve model performance in the situation of class imbalance. The proposed MolFeSCue framework not only facilitates rapid generalization from minimal samples, but also employs a contrastive loss function to extract meaningful molecular representations from imbalanced datasets. Extensive evaluations and comparisons of MolFeSCue and state-of-the-art algorithms have been conducted on multiple benchmark datasets, and the experimental data demonstrate our algorithm's effectiveness in molecular representations and its broad applicability across various pretrained models. Our findings underscore MolFeSCue's potential to accelerate advancements in drug discovery. AVAILABILITY AND IMPLEMENTATION: We have made all the source code utilized in this study publicly accessible via GitHub at http://www.healthinformaticslab.org/supp/ or https://github.com/zhangruochi/MolFeSCue. The code (MolFeSCue-v1-00) is also available as the supplementary file of this paper.

Subject(s)

Algorithms; Benchmarking; Drug Discovery; Models, Molecular; Software

4.

HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer.

Xu, Xiaopeng; Xu, Chencheng; He, Wenjia; Wei, Lesong; Li, Haoyang; Zhou, Juexiao; Zhang, Ruochi; Wang, Yu; Xiong, Yuanpeng; Gao, Xin.

Bioinformatics ; 40(6)2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38867692

ABSTRACT

MOTIVATION: Macrocyclic peptides hold great promise as therapeutics targeting intracellular proteins. This stems from their remarkable ability to bind flat protein surfaces with high affinity and specificity while potentially traversing the cell membrane. Research has already explored their use in developing inhibitors for intracellular proteins, such as KRAS, a well-known driver in various cancers. However, computational approaches for de novo macrocyclic peptide design remain largely unexplored. RESULTS: Here, we introduce HELM-GPT, a novel method that combines the strength of the hierarchical editing language for macromolecules (HELM) representation and generative pre-trained transformer (GPT) for de novo macrocyclic peptide design. Through reinforcement learning (RL), our experiments demonstrate that HELM-GPT has the ability to generate valid macrocyclic peptides and optimize their properties. Furthermore, we introduce a contrastive preference loss during the RL process, which further enhances the optimization performance. Finally, to co-optimize peptide permeability and KRAS binding affinity, we propose a step-by-step optimization strategy, demonstrating its effectiveness in generating molecules fulfilling both criteria. In conclusion, the HELM-GPT method can be used to identify novel macrocyclic peptides to target intracellular proteins. AVAILABILITY AND IMPLEMENTATION: The code and data of HELM-GPT are freely available on GitHub (https://github.com/charlesxu90/helm-gpt).

Subject(s)

Peptides, Cyclic; Peptides, Cyclic/chemistry; Computational Biology/methods; Drug Design; Peptides/chemistry; Humans; Algorithms; Software

5.

MOCHI enables discovery of heterogeneous interactome modules in 3D nucleome.

Tian, Dechao; Zhang, Ruochi; Zhang, Yang; Zhu, Xiaopeng; Ma, Jian.

Genome Res ; 30(2): 227-238, 2020 02.

Article in English | MEDLINE | ID: mdl-31907193

ABSTRACT

The composition of the cell nucleus is highly heterogeneous, with different constituents forming complex interactomes. However, the global patterns of these interwoven heterogeneous interactomes remain poorly understood. Here we focus on two different interactomes, chromatin interaction network and gene regulatory network, as a proof of principle to identify heterogeneous interactome modules (HIMs), each of which represents a cluster of gene loci that is in spatial contact more frequently than expected and that is regulated by the same group of transcription factors. HIM integrates transcription factor binding and 3D genome structure to reflect "transcriptional niche" in the nucleus. We develop a new algorithm, MOCHI, to facilitate the discovery of HIMs based on network motif clustering in heterogeneous interactomes. By applying MOCHI to five different cell types, we found that HIMs have strong spatial preference within the nucleus and show distinct functional properties. Through integrative analysis, this work shows the utility of MOCHI to identify HIMs, which may provide new perspectives on the interplay between transcriptional regulation and 3D genome organization.

Subject(s)

Chromatin/genetics; Epistasis, Genetic/genetics; Gene Expression Regulation/genetics; Gene Regulatory Networks/genetics; Algorithms; Cluster Analysis; Genome, Human/genetics; Humans; Protein Binding/genetics; Transcription Factors/genetics

6.

XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties.

Deng, Daiguo; Chen, Xiaowei; Zhang, Ruochi; Lei, Zengrong; Wang, Xiaojian; Zhou, Fengfeng.

J Chem Inf Model ; 61(6): 2697-2705, 2021 06 28.

Article in English | MEDLINE | ID: mdl-34009965

ABSTRACT

Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms demonstrated satisfying prediction accuracies of molecular properties. A molecule cannot be directly loaded into a machine learning model, and a set of engineered features needs to be designed and calculated from a molecule. Such hand-crafted features rely heavily on the experiences of the investigating researchers. The concept of graph neural networks (GNNs) was recently introduced to describe the chemical molecules. The features may be automatically and objectively extracted from the molecules through various types of GNNs, e.g., GCN (graph convolution network), GGNN (gated graph neural network), DMPNN (directed message passing neural network), etc. However, the training of a stable GNN model requires a huge number of training samples and a large amount of computing power, compared with the conventional machine learning strategies. This study proposed the integrated framework XGraphBoost to extract the features using a GNN and build an accurate prediction model of molecular properties using the classifier XGBoost. The proposed framework XGraphBoost fully inherits the merits of the GNN-based automatic molecular feature extraction and XGBoost-based accurate prediction performance. Both classification and regression problems were evaluated using the framework XGraphBoost. The experimental results strongly suggest that XGraphBoost may facilitate the efficient and accurate predictions of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.

Subject(s)

Machine Learning; Neural Networks, Computer; Algorithms; Software

7.

Predicting CTCF-mediated chromatin loops using CTCF-MP.

Zhang, Ruochi; Wang, Yuchuan; Yang, Yang; Zhang, Yang; Ma, Jian.

Bioinformatics ; 34(13): i133-i141, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29949986

ABSTRACT

Motivation: The three dimensional organization of chromosomes within the cell nucleus is highly regulated. It is known that CCCTC-binding factor (CTCF) is an important architectural protein to mediate long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop. Results: In this article, we directly ask whether and what sequence-based features (other than the motif itself) may be important to establish CTCF-mediated chromatin loops. We found that motif conservation measured by 'branch-of-origin' that accounts for motif turn-over in evolution is an important feature. We developed a new machine learning algorithm called CTCF-MP based on word2vec to demonstrate that sequence-based features alone have the capability to predict if a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step further to understand the sequence determinants that may guide the formation of complex chromatin architectures. Availability and implementation: The source code of CTCF-MP can be accessed at: https://github.com/ma-compbio/CTCF-MP. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

CCCTC-Binding Factor/metabolism; Chromatin/ultrastructure; Genomics/methods; Sequence Analysis, DNA/methods; Software; Chromatin/metabolism; Chromatin Immunoprecipitation/methods; Chromosomes, Human/metabolism; Chromosomes, Human/ultrastructure; HeLa Cells; Humans
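
The abstract of record 7 refers to CTCF motif pairs in "convergent orientation". As a minimal sketch of what that term means, and not a reproduction of the CTCF-MP code, the check below assumes the usual convention that the upstream motif lies on the forward strand and the downstream motif on the reverse strand. The `Motif` class, its field names, and `is_convergent` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """One CTCF motif occurrence (illustrative fields, not CTCF-MP's data model)."""
    chrom: str
    start: int
    strand: str  # "+" for forward strand, "-" for reverse strand


def is_convergent(a: Motif, b: Motif) -> bool:
    """Return True if the two motifs point toward each other.

    Assumed convention: the upstream motif is on "+" and the
    downstream motif is on "-", both on the same chromosome.
    """
    if a.chrom != b.chrom:
        return False
    upstream, downstream = sorted((a, b), key=lambda m: m.start)
    return upstream.strand == "+" and downstream.strand == "-"


left = Motif("chr1", 100_000, "+")
right = Motif("chr1", 350_000, "-")
print(is_convergent(left, right))
```

Such a check only selects candidate pairs; according to the abstract, whether a convergent pair actually forms a loop is the question CTCF-MP addresses with sequence-based and functional genomic features.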

8.

pyHIVE, a health-related image visualization and engineering system using Python.

Zhang, Ruochi; Zhao, Ruixue; Zhao, Xinyang; Wu, Di; Zheng, Weiwei; Feng, Xin; Zhou, Fengfeng.

BMC Bioinformatics ; 19(1): 452, 2018 Nov 26.

Article in English | MEDLINE | ID: mdl-30477418

ABSTRACT

BACKGROUND: Imaging is one of the major biomedical technologies to investigate the status of a living object. But the biomedical image based data mining problem requires extensive knowledge across multiple disciplines, e.g. biology, mathematics and computer science. RESULTS: pyHIVE (a Health-related Image Visualization and Engineering system using Python) was implemented as an image processing system, providing five widely used image feature engineering algorithms. A standard binary classification pipeline was also provided to help researchers build data models immediately after the data is collected. pyHIVE can run the five widely used image feature engineering algorithms efficiently using multiple computing cores, and also features modules for Principal Component Analysis (PCA) based preprocessing and normalization. CONCLUSIONS: The demonstrative example shows that the image features generated by pyHIVE achieved very good classification performances based on the gastrointestinal endoscopic images. This system pyHIVE and the demonstrative example are freely available and maintained at http://www.healthinformaticslab.org/supp/resources.php .

Subject(s)

Image Processing, Computer-Assisted/methods; Algorithms; Endoscopy, Gastrointestinal; Humans; Principal Component Analysis; Programming Languages

9.

Exploiting sequence-based features for predicting enhancer-promoter interactions.

Yang, Yang; Zhang, Ruochi; Singh, Shashank; Ma, Jian.

Bioinformatics ; 33(14): i252-i260, 2017 Jul 15.

Article in English | MEDLINE | ID: mdl-28881991

ABSTRACT

MOTIVATION: A large number of distal enhancers and proximal promoters form enhancer-promoter interactions to regulate target genes in the human genome. Although recent high-throughput genome-wide mapping approaches have allowed us to more comprehensively recognize potential enhancer-promoter interactions, it is still largely unknown whether sequence-based features alone are sufficient to predict such interactions. RESULTS: Here, we develop a new computational method (named PEP) to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. The two modules in PEP (PEP-Motif and PEP-Word) use different but complementary feature extraction strategies to exploit sequence-based information. The results across six different cell types demonstrate that our method is effective in predicting enhancer-promoter interactions as compared to the state-of-the-art methods that use functional genomic signals. Our work demonstrates that sequence-based features alone can reliably predict enhancer-promoter interactions genome-wide, which could potentially facilitate the discovery of important sequence determinants for long-range gene regulation. AVAILABILITY AND IMPLEMENTATION: The source code of PEP is available at: https://github.com/ma-compbio/PEP . CONTACT: jianma@cs.cmu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Enhancer Elements, Genetic; Genome, Human; Genomics/methods; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Algorithms; Cell Line; Gene Expression Regulation; Humans

10.

Correction to "XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties".

Deng, Daiguo; Chen, Xiaowei; Zhang, Ruochi; Lei, Zengrong; Wang, Xiaojian; Zhou, Fengfeng.

J Chem Inf Model ; 61(9): 4820-4822, 2021 Sep 27.

Article in English | MEDLINE | ID: mdl-34477362

11.

GAGE-seq concurrently profiles multiscale 3D genome organization and gene expression in single cells.

Zhou, Tianming; Zhang, Ruochi; Jia, Deyong; Doty, Raymond T; Munday, Adam D; Gao, Daniel; Xin, Li; Abkowitz, Janis L; Duan, Zhijun; Ma, Jian.

Nat Genet ; 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744973

ABSTRACT

The organization of mammalian genomes features a complex, multiscale three-dimensional (3D) architecture, whose functional significance remains elusive because of limited single-cell technologies that can concurrently profile genome organization and transcriptional activities. Here, we introduce genome architecture and gene expression by sequencing (GAGE-seq), a scalable, robust single-cell co-assay measuring 3D genome structure and transcriptome simultaneously within the same cell. Applied to mouse brain cortex and human bone marrow CD34+ cells, GAGE-seq characterized the intricate relationships between 3D genome and gene expression, showing that multiscale 3D genome features inform cell-type-specific gene expression and link regulatory elements to target genes. Integration with spatial transcriptomic data revealed in situ 3D genome variations in mouse cortex. Observations in human hematopoiesis unveiled discordant changes between 3D genome organization and gene expression, underscoring a complex, temporal interplay at the single-cell level. GAGE-seq provides a powerful, cost-effective approach for exploring genome structure and gene expression relationships at the single-cell level across diverse biological contexts.

12.

Fiber-based free-space optical coherent receiver with vibration compensation mechanism.

Zhang, Ruochi; Wang, Jianmin; Zhao, Guang; Lv, Junyi.

Opt Express ; 21(15): 18434-41, 2013 Jul 29.

Article in English | MEDLINE | ID: mdl-23938715

ABSTRACT

We propose a novel fiber-based free-space optical (FSO) coherent receiver for inter-satellite communication. The receiver takes advantage of established fiber-optic components and utilizes the fine-pointing subsystem installed in FSO terminals to minimize the influence of satellite platform vibrations. The received beam is coupled to a single-mode fiber, and the coupling efficiency of the system is investigated both analytically and experimentally. A receiving sensitivity of -38 dBm is obtained at the forward error correction limit with a transmission rate of 22.4 Gbit/s. The proposed receiver is shown to be a promising component for inter-satellite optical communication.

Subject(s)

Artifacts; Fiber Optic Technology/instrumentation; Models, Theoretical; Spacecraft/instrumentation; Telecommunications/instrumentation; Computer Simulation; Computer-Aided Design; Equipment Design; Equipment Failure Analysis; Feedback; Light; Scattering, Radiation
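
The abstract of record 12 reports a receiving sensitivity of -38 dBm at 22.4 Gbit/s. As a worked illustration of what those two figures imply, the sketch below converts the dBm value to linear power and divides by the bit rate to get the received energy per bit. It is plain unit arithmetic, not taken from the paper, and the function name is a hypothetical one chosen for this example.

```python
def dbm_to_milliwatts(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts: P[mW] = 10 ** (P[dBm] / 10)."""
    return 10 ** (power_dbm / 10)


# Figures quoted in the abstract.
sensitivity_dbm = -38.0
bit_rate_bps = 22.4e9

sensitivity_mw = dbm_to_milliwatts(sensitivity_dbm)   # about 1.58e-4 mW
sensitivity_w = sensitivity_mw * 1e-3                  # milliwatts to watts
energy_per_bit_j = sensitivity_w / bit_rate_bps        # joules per received bit

print(f"{sensitivity_mw:.3e} mW, {energy_per_bit_j:.3e} J/bit")
```

Converting the energy per bit into photons per bit would also need the optical wavelength, which the abstract does not state, so that step is left out.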

13.

scGHOST: Identifying single-cell 3D genome subcompartments.

Xiong, Kyle; Zhang, Ruochi; Ma, Jian.

bioRxiv ; 2023 May 25.

Article in English | MEDLINE | ID: mdl-37292994

ABSTRACT

New single-cell Hi-C (scHi-C) technologies enable probing of the genome-wide cell-to-cell variability in 3D genome organization from individual cells. Several computational methods have been developed to reveal single-cell 3D genome features based on scHi-C data, including A/B compartments, topologically-associating domains, and chromatin loops. However, no scHi-C analysis method currently exists for annotating single-cell subcompartments, which are crucial for providing a more refined view of large-scale chromosome spatial localization in single cells. Here, we present scGhost, a single-cell subcompartment annotation method based on graph embedding with constrained random walk sampling. Applications of scGhost to scHi-C data and single-cell 3D genome imaging data demonstrate the reliable identification of single-cell subcompartments and offer new insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from the human prefrontal cortex, scGhost identifies cell type-specific subcompartments that are strongly connected to cell type-specific gene expression, suggesting the functional implications of single-cell subcompartments. Overall, scGhost is an effective new method for single-cell 3D genome subcompartment annotation based on scHi-C data for a broad range of biological contexts.

14.

PhyGCN: Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning.

Deng, Yihe; Zhang, Ruochi; Xu, Pan; Ma, Jian; Gu, Quanquan.

bioRxiv ; 2023 Oct 02.

Article in English | MEDLINE | ID: mdl-37873233

ABSTRACT

Hypergraphs are powerful tools for modeling complex interactions across various domains, including biomedicine. However, learning meaningful node representations from hypergraphs remains a challenge. Existing supervised methods often lack generalizability, thereby limiting their real-world applications. We propose a new method, Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning (PhyGCN), which leverages hypergraph structure for self-supervision to enhance node representations. PhyGCN introduces a unique training strategy that integrates variable hyperedge sizes with self-supervised learning, enabling improved generalization to unseen data. Applications on multi-way chromatin interactions and polypharmacy side-effects demonstrate the effectiveness of PhyGCN. As a generic framework for high-order interaction datasets with abundant unlabeled data, PhyGCN holds strong potential for enhancing hypergraph node representations across various domains.

15.

AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning.

Xu, Xiaopeng; Xu, Tiantian; Zhou, Juexiao; Liao, Xingyu; Zhang, Ruochi; Wang, Yu; Zhang, Lu; Gao, Xin.

Genomics Proteomics Bioinformatics ; 21(5): 1043-1053, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37364719

ABSTRACT

Antibody leads must fulfill multiple desirable properties to be clinical candidates. Primarily due to the low throughput of the experimental procedure, the need for such multi-property optimization creates a bottleneck in preclinical antibody discovery and development, because addressing one issue usually causes another. We developed a reinforcement learning (RL) method, named AB-Gen, for antibody library design using a generative pre-trained transformer (GPT) as the policy network of the RL agent. We showed that this model can learn the antibody space of heavy chain complementarity determining region 3 (CDRH3) and generate sequences with similar property distributions. In addition, when using human epidermal growth factor receptor-2 (HER2) as the target, the agent model of AB-Gen was able to generate novel CDRH3 sequences that fulfill multi-property constraints. In total, 509 generated sequences passed all property filters, and three highly conserved residues were identified. The importance of these residues was further demonstrated by molecular dynamics simulations, confirming that the agent model was capable of grasping important information in this complex optimization task. Overall, the AB-Gen method is able to design novel antibody sequences with a higher success rate than the traditional propose-then-filter approach. It has the potential to be used in practical antibody design, thus empowering the antibody discovery and development process. The source code of AB-Gen is freely available at Zenodo (https://doi.org/10.5281/zenodo.7657016) and BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007341).

Subject(s)

Antibodies; Molecular Dynamics Simulation; Humans; Gene Library; Software

16.

A comprehensive benchmarking with practical guidelines for cellular deconvolution of spatial transcriptomics.

Li, Haoyang; Zhou, Juexiao; Li, Zhongxiao; Chen, Siyuan; Liao, Xingyu; Zhang, Bin; Zhang, Ruochi; Wang, Yu; Sun, Shiwei; Gao, Xin.

Nat Commun ; 14(1): 1548, 2023 03 21.

Article in English | MEDLINE | ID: mdl-36941264

ABSTRACT

Spatial transcriptomics technologies are used to profile transcriptomes while preserving spatial information, which enables high-resolution characterization of transcriptional patterns and reconstruction of tissue architecture. Due to the existence of low-resolution spots in recent spatial transcriptomics technologies, uncovering cellular heterogeneity is crucial for disentangling the spatial patterns of cell types, and many related methods have been proposed. Here, we benchmark 18 existing methods resolving a cellular deconvolution task with 50 real-world and simulated datasets by evaluating the accuracy, robustness, and usability of the methods. We compare these methods comprehensively using different metrics, resolutions, spatial transcriptomics technologies, spot numbers, and gene numbers. In terms of performance, CARD, Cell2location, and Tangram are the best methods for conducting the cellular deconvolution task. To refine our comparative results, we provide decision-tree-style guidelines and recommendations for method selection and their additional features, which will help users easily choose the best method for fulfilling their concerns.

Subject(s)

Benchmarking; Transcriptome; Transcriptome/genetics; Gene Expression Profiling; Technology

17.

Concurrent profiling of multiscale 3D genome organization and gene expression in single mammalian cells.

Zhou, Tianming; Zhang, Ruochi; Jia, Deyong; Doty, Raymond T; Munday, Adam D; Gao, Daniel; Xin, Li; Abkowitz, Janis L; Duan, Zhijun; Ma, Jian.

bioRxiv ; 2023 Jul 25.

Article in English | MEDLINE | ID: mdl-37546900

ABSTRACT

The organization of mammalian genomes within the nucleus features a complex, multiscale three-dimensional (3D) architecture. The functional significance of these 3D genome features, however, remains largely elusive due to limited single-cell technologies that can concurrently profile genome organization and transcriptional activities. Here, we report GAGE-seq, a highly scalable, robust single-cell co-assay that simultaneously measures 3D genome structure and transcriptome within the same cell. Employing GAGE-seq on mouse brain cortex and human bone marrow CD34+ cells, we comprehensively characterized the intricate relationships between 3D genome and gene expression. We found that these multiscale 3D genome features collectively inform cell type-specific gene expressions, hence contributing to defining cell identity at the single-cell level. Integration of GAGE-seq data with spatial transcriptomic data revealed in situ variations of the 3D genome in mouse cortex. Moreover, our observations of lineage commitment in normal human hematopoiesis unveiled notable discordant changes between 3D genome organization and gene expression, underscoring a complex, temporal interplay at the single-cell level that is more nuanced than previously appreciated. Together, GAGE-seq provides a powerful, cost-effective approach for interrogating genome structure and gene expression relationships at the single-cell level across diverse biological contexts.

18.

Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning.

Xu, Xiaopeng; Zhou, Juexiao; Zhu, Chen; Zhan, Qing; Li, Zhongxiao; Zhang, Ruochi; Wang, Yu; Liao, Xingyu; Gao, Xin.

F1000Res ; 12: 757, 2023.

Article in English | MEDLINE | ID: mdl-38434657

ABSTRACT

Background: The key challenge in drug discovery is to discover novel compounds with desirable properties. Among the properties, binding affinity to a target is one of the prerequisites and usually evaluated by molecular docking or quantitative structure activity relationship (QSAR) models. Methods: In this study, we developed SGPT-RL, which uses a generative pre-trained transformer (GPT) as the policy network of the reinforcement learning (RL) agent to optimize the binding affinity to a target. SGPT-RL was evaluated on the Moses distribution learning benchmark and two goal-directed generation tasks, with Dopamine Receptor D2 (DRD2) and Angiotensin-Converting Enzyme 2 (ACE2) as the targets. Both QSAR model and molecular docking were implemented as the optimization goals in the tasks. The popular Reinvent method was used as the baseline for comparison. Results: The results on the Moses benchmark showed that SGPT-RL learned good property distributions and generated molecules with high validity and novelty. On the two goal-directed generation tasks, both SGPT-RL and Reinvent were able to generate valid molecules with improved target scores. The SGPT-RL method achieved better results than Reinvent on the ACE2 task, where molecular docking was used as the optimization goal. Further analysis shows that SGPT-RL learned conserved scaffold patterns during exploration. Conclusions: The superior performance of SGPT-RL in the ACE2 task indicates that it can be applied to the virtual screening process where molecular docking is widely used as the criteria. Besides, the scaffold patterns learned by SGPT-RL during the exploration process can assist chemists to better design and discover novel lead candidates.

Subject(s)

Angiotensin-Converting Enzyme 2; Learning; Alanine Transaminase; Molecular Docking Simulation; Benchmarking

19.

OCMR: A comprehensive framework for optical chemical molecular recognition.

Wang, Yan; Zhang, Ruochi; Zhang, Shengde; Guo, Liming; Zhou, Qiong; Zhao, Bowen; Mo, Xiaotong; Yang, Qian; Huang, Yajuan; Li, Kewei; Fan, Yusi; Huang, Lan; Zhou, Fengfeng.

Comput Biol Med ; 163: 107187, 2023 09.

Article in English | MEDLINE | ID: mdl-37393787

ABSTRACT

Artificial intelligence (AI) has achieved significant progress in the field of drug discovery. AI-based tools have been used in all aspects of drug discovery, including chemical structure recognition. We propose a chemical structure recognition framework, Optical Chemical Molecular Recognition (OCMR), to improve the data extraction capability in practical scenarios compared with the rule-based and end-to-end deep learning models. The proposed OCMR framework enhances the recognition performances via the integration of local information in the topology of molecular graphs. OCMR handles complex tasks like non-canonical drawing and atomic group abbreviation and substantially improves the current state-of-the-art results on multiple public benchmark datasets and one internally curated dataset.

Subject(s)

Artificial Intelligence; Benchmarking; Drug Discovery

20.

Predicting the antigenic evolution of SARS-COV-2 with deep learning.

Han, Wenkai; Chen, Ningning; Xu, Xinzhou; Sahil, Adil; Zhou, Juexiao; Li, Zhongxiao; Zhong, Huawen; Gao, Elva; Zhang, Ruochi; Wang, Yu; Sun, Shiwei; Cheung, Peter Pak-Hang; Gao, Xin.

Nat Commun ; 14(1): 3478, 2023 06 13.

Article in English | MEDLINE | ID: mdl-37311849

ABSTRACT

The relentless evolution of SARS-CoV-2 poses a significant threat to public health, as it adapts to immune pressure from vaccines and natural infections. Gaining insights into potential antigenic changes is critical but challenging due to the vast sequence space. Here, we introduce the Machine Learning-guided Antigenic Evolution Prediction (MLAEP), which combines structure modeling, multi-task learning, and genetic algorithms to predict the viral fitness landscape and explore antigenic evolution via in silico directed evolution. By analyzing existing SARS-CoV-2 variants, MLAEP accurately infers variant order along antigenic evolutionary trajectories, correlating with corresponding sampling time. Our approach identified novel mutations in immunocompromised COVID-19 patients and emerging variants like XBB.1.5. Additionally, MLAEP predictions were validated through in vitro neutralizing antibody binding assays, demonstrating that the predicted variants exhibited enhanced immune evasion. By profiling existing variants and predicting potential antigenic changes, MLAEP aids in vaccine development and enhances preparedness against future SARS-CoV-2 variants.

Subject(s)

COVID-19; Deep Learning; Humans; SARS-CoV-2/genetics; Antibodies, Neutralizing
