1.
Biophys Rep ; 10(3): 135-151, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39027316

ABSTRACT

Determining correlations between molecules at various levels is an important topic in molecular biology. Large language models have demonstrated a remarkable ability to capture correlations from large amounts of data in natural language processing as well as image generation, and the correlations they capture can be applied to a wide range of specific tasks; hence, large language models are also referred to as foundation models. The massive amount of data available in molecular biology provides an excellent basis for developing foundation models, and their recent emergence has pushed the entire field forward. We summarize the foundation models developed from RNA sequence data, DNA sequence data, protein sequence data, single-cell transcriptome data, and spatial transcriptome data, respectively, and further discuss research directions for the development of foundation models in molecular biology.

2.
Genome Med ; 15(1): 105, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38041202

ABSTRACT

BACKGROUND: The precise characterization of individual tumors and immune microenvironments using transcriptome sequencing has provided a great opportunity for successful personalized cancer treatment. However, the cancer treatment response is often characterized by in vitro assays or bulk transcriptomes that neglect the heterogeneity of malignant tumors in vivo and the immune microenvironment, motivating the need to use single-cell transcriptomes for personalized cancer treatment. METHODS: Here, we present comboSC, a computational proof-of-concept study to explore the feasibility of personalized cancer combination therapy optimization using single-cell transcriptomes. ComboSC stratifies individual patient samples based on a quantitative evaluation of their personalized immune microenvironment with single-cell RNA sequencing. It maximizes the translational potential of in vitro cellular response data to identify, from a large collection of small molecules and drugs, synergistic drug/small molecule combinations or small molecules that can be paired with immune checkpoint inhibitors to boost immunotherapy, and finally prioritizes them for personalized clinical use based on bipartition graph optimization. RESULTS: We apply comboSC to publicly available single-cell transcriptome data from a comprehensive set of 119 tumor samples across 15 cancer types and validate the predicted drug combinations with literature evidence, mined clinical trial data, perturbation data from patient-derived cell lines, and finally in vivo samples.
CONCLUSIONS: Overall, comboSC provides a feasible and one-stop computational prototype and a proof-of-concept study to predict potential drug combinations for further experimental validation and clinical usage using the single-cell transcriptome, which will facilitate and accelerate personalized tumor treatment by reducing screening time from a large drug combination space and saving valuable treatment time for individual patients. A user-friendly web server of comboSC for both clinical and research users is available at www.combosc.top . The source code is also available on GitHub at https://github.com/bm2-lab/comboSC .


Subject(s)
Neoplasms , Transcriptome , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Combined Modality Therapy , Software , Drug Combinations , Tumor Microenvironment , Single-Cell Analysis
3.
Nat Commun ; 14(1): 7521, 2023 11 18.
Article in English | MEDLINE | ID: mdl-37980345

ABSTRACT

The powerful CRISPR genome editing system is hindered by its off-target effects, and existing computational tools have achieved limited performance in genome-wide off-target prediction due to the lack of a deep understanding of the CRISPR molecular mechanism. In this study, we propose to incorporate molecular dynamics (MD) simulations in the computational analysis of the CRISPR system, and present CRISOT, an integrated tool suite containing four related modules, i.e., CRISOT-FP, CRISOT-Score, CRISOT-Spec, and CRISOT-Opti, for RNA-DNA molecular interaction fingerprint generation, genome-wide CRISPR off-target prediction, sgRNA specificity evaluation, and sgRNA optimization of the Cas9 system, respectively. Our comprehensive computational and experimental tests reveal that CRISOT outperforms existing tools with extensive in silico validations and proof-of-concept experimental validations. In addition, CRISOT shows potential in accurately predicting off-target effects of the base editors and prime editors, indicating that the derived RNA-DNA molecular interaction fingerprint captures the underlying mechanisms of RNA-DNA interaction among distinct CRISPR systems. Collectively, CRISOT provides an efficient and generalizable framework for genome-wide CRISPR off-target prediction, evaluation and sgRNA optimization for improved targeting specificity in CRISPR genome editing.


Subject(s)
CRISPR-Cas Systems , RNA , CRISPR-Cas Systems/genetics , RNA/genetics , RNA, Guide, CRISPR-Cas Systems , Gene Editing , DNA/genetics
6.
Article in English | MEDLINE | ID: mdl-35792260

ABSTRACT

Base editing technology is being increasingly applied in genome engineering, but the current strategy for designing guide RNA (gRNA) relies substantially on empirical experience rather than a dependable and efficient in silico design. Furthermore, the pleiotropic effect of base editing on disease treatment remains unexplored, which prevents its further clinical usage. Here, we presented BExplorer, an integrated and comprehensive computational pipeline to optimize the design of gRNAs for 26 existing types of base editors in silico. Using BExplorer, we described its results for two types of mainstream base editors, BE3 and ABE7.10, and evaluated the pleiotropic effect of the corresponding base editing loci. BExplorer revealed 524 and 900 editable pathogenic Single Nucleotide Polymorphism (SNP) loci in the human genome together with the selected optimized gRNAs for BE3 and ABE7.10, respectively. In addition, the impact of 707 edited pathogenic SNP loci following base editing on 151 diseases was systematically explored by revealing their pleiotropic effects, indicating that base editing should be carefully utilized given the potential pleiotropic effects. Collectively, the systematic exploration of optimized base editing gRNA design and the corresponding pleiotropic effects with BExplorer provides a computational basis for applying base editing in disease treatment.

7.
Genome Biol ; 23(1): 20, 2022 01 12.
Article in English | MEDLINE | ID: mdl-35022082

ABSTRACT

Here, we present a multi-modal deep generative model, the single-cell Multi-View Profiler (scMVP), which is designed for handling sequencing data that simultaneously measure gene expression and chromatin accessibility in the same cell, including SNARE-seq, sci-CAR, Paired-seq, SHARE-seq, and Multiome from 10X Genomics. scMVP generates common latent representations for dimensionality reduction, cell clustering, and developmental trajectory inference and generates separate imputations for differential analysis and cis-regulatory element identification. scMVP can help mitigate data sparsity issues with imputation and accurately identify cell groups for different joint profiling techniques with common latent embedding, and we demonstrate its advantages on several realistic datasets.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin , Cluster Analysis , RNA-Seq , Regulatory Sequences, Nucleic Acid , Single-Cell Analysis/methods
8.
Chem Sci ; 12(43): 14459-14472, 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-34880997

ABSTRACT

Various computational methods have been developed for quantitative modeling of organic chemical reactions; however, the lack of universality as well as the requirement of large amounts of experimental data limit their broad applications. Here, we present DeepReac+, an efficient and universal computational framework for prediction of chemical reaction outcomes and identification of optimal reaction conditions based on deep active learning. Under this framework, DeepReac is designed as a graph-neural-network-based model, which directly takes 2D molecular structures as inputs and automatically adapts to different prediction tasks. In addition, carefully-designed active learning strategies are incorporated to substantially reduce the number of necessary experiments for model training. We demonstrate the universality and high efficiency of DeepReac+ by achieving the state-of-the-art results with a minimum of labeled data on three diverse chemical reaction datasets in several scenarios. Collectively, DeepReac+ has great potential and utility in the development of AI-aided chemical synthesis. DeepReac+ is freely accessible at https://github.com/bm2-lab/DeepReac.

9.
Bioinformatics ; 36(22-23): 5492-5498, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33289524

ABSTRACT

MOTIVATION: Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaborations among pharmaceutical institutions can lead to better performance in QSAR prediction; however, intellectual property and related financial interests still substantially hinder inter-institutional collaborations in QSAR modeling for drug discovery. RESULTS: For the first time, we verified the feasibility of applying horizontal federated learning (HFL), a recently developed collaborative and privacy-preserving learning framework, to perform QSAR analysis. A prototype platform of federated-learning-based QSAR modeling for collaborative drug discovery, i.e. FL-QSAR, is presented accordingly. We first compared the HFL framework with a classic privacy-preserving computation framework, i.e. secure multiparty computation, to indicate how they differ from various perspectives. Then we compared FL-QSAR with public collaboration in terms of QSAR modeling. Our extensive experiments demonstrated that (i) collaboration by FL-QSAR outperforms a single client using only its private data, and (ii) collaboration by FL-QSAR achieves almost the same performance as collaboration via cleartext learning algorithms using all shared information. Taken together, our results indicate that FL-QSAR under the HFL framework provides an efficient solution to break the barriers between pharmaceutical institutions in QSAR modeling, thereby promoting the development of collaborative and privacy-preserving drug discovery, with the ability to extend to other privacy-related biomedical areas. AVAILABILITY AND IMPLEMENTATION: The source code of FL-QSAR is available on GitHub: https://github.com/bm2-lab/FL-QSAR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drug Discovery , Quantitative Structure-Activity Relationship , Algorithms , Humans , Privacy
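
To make the idea of horizontal federated learning in the abstract above concrete, here is a minimal sketch of one aggregation round. It assumes the common federated averaging rule (each client's parameters weighted by its number of training samples); the abstract does not say which aggregation rule FL-QSAR uses, and the function name `federated_average` and the toy numbers are hypothetical.

```python
from typing import List


def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    """Combine per-client model parameters into one global parameter vector.

    Assumption: sample-size-weighted averaging (FedAvg-style). Only model
    parameters are shared; each client's raw data never leaves the client.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size per client is required")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical institutions with locally trained two-parameter models.
local_models = [[1.0, 2.0], [3.0, 4.0]]
sample_counts = [1, 3]
global_model = federated_average(local_models, sample_counts)
print(global_model)
```

The client with three times as many samples pulls the global parameters three times as strongly, which is the design choice that distinguishes this rule from a plain unweighted mean.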
10.
Nucleic Acids Res ; 48(20): 11370-11379, 2020 11 18.
Article in English | MEDLINE | ID: mdl-33137817

ABSTRACT

Systematic evaluation of genome-wide Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) off-target profiles is a fundamental step for the successful application of the CRISPR system to clinical therapies. Many experimental techniques and in silico tools have been proposed for detecting and predicting genome-wide CRISPR off-target profiles. These techniques and tools, however, have not been systematically benchmarked. A comprehensive benchmark study and an integrated strategy that takes advantage of the currently available tools to improve predictions of genome-wide CRISPR off-target profiles are needed. We focused on the specificity of the traditional CRISPR SpCas9 system for gene knockout. First, we benchmarked 10 available genome-wide off-target cleavage site (OTS) detection techniques with the published OTS detection datasets. Second, taking the datasets generated from OTS detection techniques as the benchmark datasets, we benchmarked 17 available in silico genome-wide OTS prediction tools to evaluate their genome-wide CRISPR off-target prediction performances. Finally, we present the first one-stop integrated Genome-Wide Off-target cleavage Search platform (iGWOS) that was specifically designed for the optimal genome-wide OTS prediction by integrating the available OTS prediction algorithms with an AdaBoost ensemble framework.


Subject(s)
CRISPR-Cas Systems , Gene Editing/methods , Genomics/methods , Algorithms , Benchmarking , Cell Line, Tumor , Computer Simulation , Databases, Genetic , Gene Knockout Techniques , Genome , Humans , Models, Molecular , RNA, Guide, Kinetoplastida , Whole Genome Sequencing
11.
Sci Adv ; 6(44)2020 10.
Article in English | MEDLINE | ID: mdl-33127686

ABSTRACT

Efficient single-cell assignment without prior marker gene annotations is essential for single-cell sequencing data analysis. Current methods, however, have limited effectiveness for distinct single-cell assignment. They fail to achieve well-generalized performance across different tasks because of the inherent heterogeneity of different single-cell sequencing datasets and different single-cell types. Furthermore, current methods are inefficient at identifying novel cell types that are absent from the reference datasets. To this end, we present scLearn, a learning-based framework that automatically infers the quantitative measurement/similarity and threshold that can be used for different single-cell assignment tasks, achieving well-generalized assignment performance on different single-cell types. We evaluated scLearn on a comprehensive set of publicly available benchmark datasets. We showed that scLearn outperformed comparable existing methods for single-cell assignment in various respects, demonstrating state-of-the-art effectiveness with a reliable and generalized ability to identify and categorize single-cell types.

12.
Brief Bioinform ; 21(4): 1448-1454, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31267129

ABSTRACT

For genome-wide CRISPR off-target cleavage site (OTS) prediction, an important issue is data imbalance: the number of true OTS recognized by whole-genome off-target detection techniques is much smaller than the number of all possible nucleotide mismatch loci, making the training of machine learning models very challenging. Therefore, computational models proposed for OTS prediction and scoring should be carefully designed and properly evaluated in order to avoid bias. In our study, two tools are taken as examples to further emphasize the data imbalance issue in CRISPR off-target prediction to achieve better sensitivity and specificity for optimized CRISPR gene editing. We would like to indicate that (1) the benchmark of CRISPR off-target prediction should be properly evaluated and not overestimated, by considering the data imbalance issue; and (2) incorporation of efficient computational techniques (including ensemble learning and data synthesis techniques) can help to address the data imbalance issue and improve the performance of CRISPR off-target prediction. Taken together, we call for more efforts to address the data imbalance issue in CRISPR off-target prediction to facilitate the clinical utility of CRISPR-based gene editing techniques.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Gene Editing/methods , Machine Learning
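
The data imbalance point in the abstract above can be illustrated with a small sketch. The numbers are invented for illustration only (10 true off-target sites among 1000 candidate loci) and are not taken from the paper; the function name `evaluate` is hypothetical. The sketch shows why accuracy alone can overestimate a predictor on imbalanced data.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = true OTS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented toy data: 10 true off-target sites among 1000 candidate loci.
y_true = [1] * 10 + [0] * 990
# A useless predictor that never reports an off-target site.
always_negative = [0] * 1000

result = evaluate(y_true, always_negative)
print(result)
```

The trivial predictor reaches 99% accuracy while finding none of the true sites, so metrics that focus on the rare positive class (precision, recall, or precision-recall curves) are the more informative choice for this kind of benchmark.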
13.
Genome Med ; 11(1): 67, 2019 10 30.
Article in English | MEDLINE | ID: mdl-31666118

ABSTRACT

BACKGROUND: Cancer neoantigens are expressed only in cancer cells and presented on the tumor cell surface in complex with major histocompatibility complex (MHC) class I proteins for recognition by cytotoxic T cells. Accurate and rapid identification of neoantigens plays a pivotal role in cancer immunotherapy. Although several in silico tools for neoantigen prediction have been presented, these tools have limitations. RESULTS: We developed pTuneos, a computational pipeline for prioritizing tumor neoantigens from next-generation sequencing data. We tested the performance of pTuneos on the melanoma cancer vaccine cohort data and tumor-infiltrating lymphocyte (TIL)-recognized neopeptide data. pTuneos is able to predict the MHC presentation and T cell recognition ability of the candidate neoantigens, and the actual immunogenicity of single-nucleotide variant (SNV)-based neopeptides considering their natural processing and presentation, surpassing the existing tools with a comprehensive and quantitative benchmark of their neoantigen prioritization performance and running time. pTuneos was further tested on The Cancer Genome Atlas (TCGA) cohort data as well as the melanoma and non-small cell lung cancer (NSCLC) cohort data undergoing checkpoint blockade immunotherapy. The overall neoantigen immunogenicity score proposed by pTuneos is demonstrated to be a powerful and pan-cancer marker for survival prediction compared to traditional well-established biomarkers. CONCLUSIONS: In summary, pTuneos provides a state-of-the-art, one-stop and user-friendly solution for prioritizing SNV-based candidate neoepitopes, which could help to advance research on next-generation cancer immunotherapies and personalized cancer vaccines. pTuneos is available at https://github.com/bm2-lab/pTuneos , with a Docker version for quick deployment at https://cloud.docker.com/u/bm2lab/repository/docker/bm2lab/ptuneos .


Subject(s)
Antigens, Neoplasm/immunology , Cancer Vaccines/immunology , Carcinoma, Non-Small-Cell Lung/immunology , High-Throughput Nucleotide Sequencing/methods , Lung Neoplasms/immunology , Melanoma/immunology , Software , Antigens, Neoplasm/analysis , Antigens, Neoplasm/genetics , Cancer Vaccines/genetics , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Cohort Studies , Genome , Humans , Immunotherapy/methods , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Melanoma/drug therapy , Melanoma/genetics , T-Lymphocytes, Cytotoxic
14.
Genome Biol ; 19(1): 80, 2018 06 26.
Article in English | MEDLINE | ID: mdl-29945655

ABSTRACT

A major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity. Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. In addition, DeepCRISPR fully automates the identification of sequence and epigenetic features that may affect sgRNA knockout efficacy in a data-driven manner. DeepCRISPR is available at http://www.deepcrispr.net/ .


Subject(s)
CRISPR-Cas Systems/genetics , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , RNA, Guide, Kinetoplastida/genetics , Cell Line , Cell Line, Tumor , Computational Biology/methods , Computer Simulation , HCT116 Cells , HEK293 Cells , HL-60 Cells , HeLa Cells , Humans , Machine Learning , RNA Editing/genetics
15.
Brief Bioinform ; 19(4): 721-724, 2018 07 20.
Article in English | MEDLINE | ID: mdl-28203699

ABSTRACT

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based gene editing has been widely implemented in various cell types and organisms. A major challenge in the effective application of the CRISPR system is the need to design highly efficient single-guide RNA (sgRNA) with minimal off-target cleavage. Several tools are available for sgRNA design, but few of them have been compared. In our opinion, benchmarking the performance of the available tools and indicating their applicable scenarios are important issues. Moreover, whether the reported sgRNA design rules are reproducible across different sgRNA libraries, cell types and organisms remains unclear. In our study, a systematic and unbiased benchmark of sgRNA efficacy prediction was performed on nine representative on-target design tools, based on six benchmark data sets covering five different cell types. The benchmark study presented here provides novel quantitative insights into the available CRISPR tools.


Subject(s)
Benchmarking/methods , CRISPR-Cas Systems , Gene Editing , RNA, Guide, Kinetoplastida/genetics , Computer Simulation , Humans
16.
Sheng Wu Gong Cheng Xue Bao ; 33(10): 1744-1756, 2017 Oct 25.
Article in Chinese | MEDLINE | ID: mdl-29082722

ABSTRACT

CRISPR-based genome editing has been widely implemented in various cell types. In-silico single guide RNA (sgRNA) design is a key step for successful gene editing using CRISPR system. Continuing efforts are made to refine in-silico sgRNA design with high on-target efficacy and reduced off-target effects. In this paper, we summarize the present sgRNA design tools, and show that efficient in-silico models can be built that integrate current heterogeneous genome-editing data to derive unbiased sgRNA design rules and identify key features for improving sgRNA design. Our review shows that systematic comparisons and evaluation of on-target and off-target effects of sgRNA will allow more precise genome editing and gene therapies using the CRISPR system.


Subject(s)
CRISPR-Cas Systems , Computer Simulation , RNA, Guide, Kinetoplastida , Clustered Regularly Interspaced Short Palindromic Repeats
17.
BMC Genomics ; 18(Suppl 1): 962, 2017 01 25.
Article in English | MEDLINE | ID: mdl-28198670

ABSTRACT

BACKGROUND: Deciphering taxonomical structures based on high-dimensional sequencing data is still challenging in metagenomics studies. Moreover, the common workflow in this field fails to identify microbial communities and their effect on a specific disease status. Even the relationships and interactions between different bacteria in a microbial community remain unknown. RESULTS: MetaTopics can efficiently extract the latent microbial communities, which reflect the intrinsic relations or interactions among several major microbes. Furthermore, a quantitative measurement, the Quetelet Index, is defined to estimate the influence of a latent sub-community on a certain disease status for given samples. An analysis of our in-house oral metagenomics data and public gut microbe data is presented to demonstrate the application and usefulness of MetaTopics. To present a user-friendly R package, we have built a dedicated website, https://github.com/bm2-lab/MetaTopics , which includes free downloads, detailed tutorials and illustrative examples. CONCLUSIONS: MetaTopics is the first interactive R package to integrate the state-of-the-art topic model derived from the statistical learning community to analyze and visualize metagenomics taxonomy data.


Subject(s)
Computational Biology/methods , Metagenome , Metagenomics/methods , Microbiota , Software , Web Browser
18.
Trends Biotechnol ; 35(1): 12-21, 2017 01.
Article in English | MEDLINE | ID: mdl-27418421

ABSTRACT

CRISPR-based genome editing has been widely implemented in various cell types. In silico single guide RNA (sgRNA) design is a key step for successful gene editing using the CRISPR system, and continuing efforts are aimed at refining in silico sgRNA design with high on-target efficacy and reduced off-target effects. Many sgRNA design tools are available, but careful assessments of their application scenarios and performance benchmarks across different types of genome-editing data are needed. Efficient in silico models can be built that integrate current heterogeneous genome-editing data to derive unbiased sgRNA design rules and identify key features for improving sgRNA design. Comprehensive evaluation of on-target and off-target effects of sgRNA will allow more precise genome editing and gene therapies using the CRISPR system.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Gene Editing/methods , Gene Targeting/methods , High-Throughput Nucleotide Sequencing/methods , RNA Editing/genetics , RNA, Guide, Kinetoplastida/genetics , Sequence Analysis, RNA/methods , Animals , Computer Simulation , Humans , Models, Genetic