Results 1 - 20 of 166
1.
Nat Immunol ; 25(8): 1383-1394, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38942990

ABSTRACT

The immunological mechanisms underlying chronic colitis are poorly understood. T follicular helper (TFH) cells are critical in helping B cells during germinal center reactions. In a T cell transfer colitis model, a lymphoid structure composed of mature dendritic cells (DCs) and TFH cells was found within T cell zones of colonic lymphoid follicles. TFH cells were required for mature DC accumulation, the formation of DC-T cell clusters and colitis development. Moreover, DCs promoted TFH cell differentiation, contributing to colitis development. A lineage-tracing analysis showed that, following migration to the lamina propria, TFH cells transdifferentiated into long-lived pathogenic TH1 cells, promoting colitis development. Our findings have therefore demonstrated the reciprocal regulation of TFH cells and DCs in colonic lymphoid follicles, which is critical in chronic colitis pathogenesis.


Subject(s)
Cell Differentiation , Colitis , Dendritic Cells , T Follicular Helper Cells , Animals , Dendritic Cells/immunology , Colitis/immunology , Colitis/pathology , T Follicular Helper Cells/immunology , Mice , Cell Differentiation/immunology , Mice, Inbred C57BL , Disease Models, Animal , Th1 Cells/immunology , Colon/immunology , Colon/pathology , Mice, Knockout , Germinal Center/immunology , Mice, Transgenic
2.
Nat Methods ; 21(8): 1481-1491, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38844628

ABSTRACT

Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named 'xTrimoscFoundationα', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms
3.
Genome Res ; 33(10): 1788-1805, 2023 10.
Article in English | MEDLINE | ID: mdl-37827697

ABSTRACT

Cell-cell communication (CCC) is critical for determining cell fates and functions in multicellular organisms. With the advent of single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), an increasing number of CCC inference methods have been developed. Nevertheless, a thorough comparison of their performances is yet to be conducted. To fill this gap, we developed a systematic benchmark framework called ESICCC to evaluate 18 ligand-receptor (LR) inference methods and five ligand/receptor-target inference methods using a total of 116 data sets, including 15 ST data sets, 15 sets of cell line perturbation data, two sets of cell type-specific expression/proteomics data, and 84 sets of sampled or unsampled scRNA-seq data. We evaluated and compared the agreement, accuracy, robustness, and usability of these methods. Regarding accuracy evaluation, RNAMagnet, CellChat, and scSeqComm emerge as the three best-performing methods for intercellular ligand-receptor inference based on scRNA-seq data, whereas stMLnet and HoloNet are the best methods for predicting ligand/receptor-target regulation using ST data. To facilitate the practical applications, we provide a decision-tree-style guideline for users to easily choose best tools for their specific research concerns in CCC inference, and develop an ensemble pipeline CCCbank that enables versatile combinations of methods and databases. Moreover, our comparative results also uncover several critical influential factors for CCC inference, such as prior interaction information, ligand-receptor scoring algorithm, intracellular signaling complexity, and spatial relationship, which may be considered in the future studies to advance the development of new methodologies.


Subject(s)
Single-Cell Analysis , Software , Ligands , Single-Cell Analysis/methods , Algorithms , Cell Communication/genetics , Sequence Analysis, RNA/methods
4.
Genome Res ; 33(10): 1757-1773, 2023 10.
Article in English | MEDLINE | ID: mdl-37903634

ABSTRACT

Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increase the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST, which effectively integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and enables scalable application via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference and pseudotime analysis. Also, we highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Compared with existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, which expands the applicability of ST technology.


Subject(s)
Gene Expression Profiling , Transcriptome , Bayes Theorem , Neural Networks, Computer , Spatial Analysis
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493343

ABSTRACT

Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially in the single-cell RNA and ATAC data. The joint analysis across scRNA-seq data and scATAC-seq data has paved the way to comprehending the cellular heterogeneity and complex cellular regulatory networks. Multi-omics integration is gaining attention as an important step in joint analysis, and the number of computational tools in this field is growing rapidly. In this paper, we benchmarked 12 multi-omics integration methods on three integration tasks via qualitative visualization and quantitative metrics, considering six main aspects that matter in multi-omics data analysis. Overall, we found that different methods have their own advantages on different aspects, while some methods outperformed other methods in most aspects. We therefore provided guidelines for selecting appropriate methods for specific scenarios and tasks to help obtain meaningful insights from multi-omics data integration.


Subject(s)
Benchmarking , Multiomics , Algorithms , Cell Cycle , RNA
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38324621

ABSTRACT

Single-cell clustered regularly interspaced short palindromic repeats-sequencing (scCRISPR-seq) is an emerging high-throughput CRISPR screening technology where the true cellular response to perturbation is coupled with infected proportion bias of guide RNAs (gRNAs) across different cell clusters. The mixing of these effects introduces noise into scCRISPR-seq data analysis and thus obstacles to relevant studies. We developed scDecouple to decouple true cellular response of perturbation from the influence of infected proportion bias. scDecouple first models the distribution of gene expression profiles in perturbed cells and then iteratively finds the maximum likelihood of cell cluster proportions as well as the cellular response for each gRNA. We demonstrated its performance in a series of simulation experiments. By applying scDecouple to real scCRISPR-seq data, we found that scDecouple enhances the identification of biologically perturbation-related genes. scDecouple can benefit scCRISPR-seq data analysis, especially in the case of heterogeneous samples or complex gRNA libraries.


Subject(s)
High-Throughput Screening Assays , RNA, Guide, CRISPR-Cas Systems
7.
Nucleic Acids Res ; 2024 Oct 18.
Article in English | MEDLINE | ID: mdl-39420637

ABSTRACT

Investigating mutations, including single nucleotide variations (SNVs), gene fusions, alternative splicing and copy number variations (CNVs), is fundamental to cancer study. Recent computational methods and biological research have demonstrated the reliability and biological significance of detecting mutations from single-cell transcriptomic data. However, there is a lack of a single-cell-level database containing comprehensive mutation information in all types of cancer. Establishing a single-cell mutation landscape from the huge emerging single-cell transcriptomic data can provide a critical resource for elucidating the mechanisms of tumorigenesis and evolution. Here, we developed scTML (http://sctml.xglab.tech/), the first database offering a pan-cancer single-cell landscape of multiple mutation types. It includes SNVs, insertions/deletions, gene fusions, alternative splicing and CNVs, along with gene expression, cell states and other phenotype information. The data are from 74 datasets with 2 582 633 cells, including 35 full-length (Smart-seq2) transcriptomic single-cell datasets (all publicly available data with raw sequencing files), 23 datasets from 10X technology and 16 spatial transcriptomic datasets. scTML enables users to interactively explore multiple mutation landscapes across tumors or cell types, analyze single-cell-level mutation-phenotype associations and detect cell subclusters of interest. scTML is an important resource that will significantly advance deciphering intra-tumor and inter-tumor heterogeneity, and how mutations shape cell phenotypes.

8.
Proc Natl Acad Sci U S A ; 120(15): e2216698120, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37023129

ABSTRACT

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.


Subject(s)
Algorithms , Neural Networks, Computer , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Databases, Factual
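
The abstract above describes motifs reported as position weight matrices. As a minimal sketch of that representation only (not of the NeuronMotif algorithm itself), the following builds a per-position base-frequency matrix from aligned sequences and reads off a consensus. The function names and the four example sequences are illustrative assumptions; a full position weight matrix would usually convert these frequencies to log-odds scores against a background.

```python
from collections import Counter


def position_frequency_matrix(sequences, alphabet="ACGT"):
    """Return one dict of base frequencies per alignment column."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        # Count the bases observed at column i across all sequences.
        counts = Counter(s[i] for s in sequences)
        matrix.append({b: counts[b] / len(sequences) for b in alphabet})
    return matrix


def consensus(matrix):
    """Pick the most frequent base in each column."""
    return "".join(max(col, key=col.get) for col in matrix)


# Hypothetical aligned sequences, chosen only for illustration.
seqs = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pfm = position_frequency_matrix(seqs)
motif = consensus(pfm)
```

Each column of `pfm` sums to 1, so the matrix can be compared column by column with entries in a motif database.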
9.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37824741

ABSTRACT

Cell-cell communication events (CEs) are mediated by multiple ligand-receptor (LR) pairs. Usually only a particular subset of CEs directly works for a specific downstream response in a particular microenvironment. We name them functional communication events (FCEs) of the target responses. Decoding FCE-target gene relations is important for understanding the mechanisms of many biological processes, but has been intractable due to the mixing of multiple factors and the lack of direct observations. We developed a method, HoloNet, for decoding FCEs using spatial transcriptomic data by integrating LR pairs, cell-type spatial distribution and downstream gene expression into a deep learning model. We modeled CEs as a multi-view network, developed an attention-based graph learning method to train the model for generating target gene expression with the CE networks, and decoded the FCEs for specific downstream genes by interpreting trained models. We applied HoloNet on three Visium datasets of breast cancer and liver cancer. The results detangled the multiple factors of FCEs by revealing how LR signals and cell types affect specific biological processes, and specified FCE-induced effects in each single cell. We conducted simulation experiments and showed that HoloNet is more reliable on LR prioritization in comparison with existing methods. HoloNet is a powerful tool to illustrate cell-cell communication landscapes and reveal vital FCEs that shape cellular phenotypes. HoloNet is available as a Python package at https://github.com/lhc17/HoloNet.


Subject(s)
Liver Neoplasms , Transcriptome , Humans , Gene Expression Profiling , Cell Communication/genetics , Computer Simulation , Tumor Microenvironment
10.
Bioinformatics ; 40(9)2024 09 02.
Article in English | MEDLINE | ID: mdl-39171840

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at single-cell level. However, it is still challenging to obtain enough high-quality scRNA-seq data. To mitigate the limited availability of data, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless, the data generated with current models are not very realistic yet, especially when we need to generate data with controlled conditions. In the meantime, diffusion models have shown their power in generating data with high fidelity, providing a new opportunity for scRNA-seq generation. RESULTS: In this study, we developed scDiffusion, a generative model combining the diffusion model and foundation model to generate high-quality scRNA-seq data with controlled conditions. We designed multiple classifiers to guide the diffusion process simultaneously, enabling scDiffusion to generate data under multiple condition combinations. We also proposed a new control strategy called Gradient Interpolation. This strategy allows the model to generate continuous trajectories of cell development from a given cell state. Experiments showed that scDiffusion could generate single-cell gene expression data closely resembling real scRNA-seq data. Also, scDiffusion can conditionally produce data on specific cell types including rare cell types. Furthermore, we could use the multiple-condition generation of scDiffusion to generate cell type that was out of the training data. Leveraging the Gradient Interpolation strategy, we generated a continuous developmental trajectory of mouse embryonic cells. These experiments demonstrate that scDiffusion is a powerful tool for augmenting the real scRNA-seq data and can provide insights into cell fate research. AVAILABILITY AND IMPLEMENTATION: scDiffusion is openly available at the GitHub repository https://github.com/EperLuo/scDiffusion or Zenodo https://zenodo.org/doi/10.5281/zenodo.13268742.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Animals , Mice , Sequence Analysis, RNA/methods , Algorithms , Computational Biology/methods
11.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34472588

ABSTRACT

Quantifying cell proportions, especially for rare cell types in some scenarios, is of great value in tracking signals associated with certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multicomponent bulk data, they are substantially less effective for estimating the proportions of rare cell types which are highly sensitive to feature outliers and collinearity. Here we proposed a new deconvolution algorithm named ARIC to estimate cell type proportions from gene expression or DNA methylation data. ARIC employs a novel two-step marker selection strategy, including collinear feature elimination based on the component-wise condition number and adaptive removal of outlier markers. This strategy can systematically obtain effective markers for weighted $\upsilon$-support vector regression to ensure a robust and precise rare proportion prediction. We showed that ARIC can accurately estimate fractions in both DNA methylation and gene expression data from different experiments. We further applied ARIC to the survival prediction of ovarian cancer and the condition monitoring of chronic kidney disease, and the results demonstrate the high accuracy and robustness as well as clinical potentials of ARIC. Taken together, ARIC is a promising tool to solve the deconvolution problem of bulk data where rare components are of vital importance.


Subject(s)
Algorithms , DNA Methylation , Biomarkers , Gene Expression
12.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37494428

ABSTRACT

MOTIVATION: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate to provide credible ground-truth labels for method evaluation. RESULTS: We present simCAS, an embedding-based simulator, for generating high-fidelity scCAS data from both cell- and peak-wise embeddings. We demonstrate simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels more than cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate simCAS will be a reliable and flexible simulator for evaluating the ongoing computational methods applied on scCAS data. AVAILABILITY AND IMPLEMENTATION: simCAS is freely available at https://github.com/Chen-Li-17/simCAS.


Subject(s)
Chromatin , Gene Expression Regulation , Computer Simulation , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods
13.
Bioinformatics ; 38(11): 2996-3003, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35394015

ABSTRACT

MOTIVATION: Single-cell technologies play a crucial role in revolutionizing biological research over the past decade, which strengthens our understanding in cell differentiation, development and regulation from a single-cell level perspective. Single-cell RNA sequencing (scRNA-seq) is one of the most common single cell technologies, which enables probing transcriptional states in thousands of cells in one experiment. Identification of cell types from scRNA-seq measurements is a fundamental and crucial question to answer. Most previous studies directly take gene expression as input while ignoring the comprehensive gene-gene interactions. RESULTS: We propose scGraph, an automatic cell identification algorithm leveraging gene interaction relationships to enhance the performance of the cell-type identification. scGraph is based on a graph neural network to aggregate the information of interacting genes. In a series of experiments, we demonstrate that scGraph is accurate and outperforms eight comparison methods in the task of cell-type identification. Moreover, scGraph automatically learns the gene interaction relationships from biological data and the pathway enrichment analysis shows consistent findings with previous analysis, providing insights on the analysis of regulatory mechanism. AVAILABILITY AND IMPLEMENTATION: scGraph is freely available at https://github.com/QijinYin/scGraph and https://figshare.com/articles/software/scGraph/17157743. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software , Neural Networks, Computer
14.
Eur Radiol ; 33(2): 893-903, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36001124

ABSTRACT

OBJECTIVES: To quantify intra-tumor heterogeneity (ITH) in non-small cell lung cancer (NSCLC) from computed tomography (CT) images. METHODS: We developed a quantitative ITH measurement-ITHscore-by integrating local radiomic features and global pixel distribution patterns. The associations of ITHscore with tumor phenotypes, genotypes, and patient's prognosis were examined on six patient cohorts (n = 1399) to validate its effectiveness in characterizing ITH. RESULTS: For stage I NSCLC, ITHscore was consistent with tumor progression from stage IA1 to IA3 (p < 0.001) and captured key pathological change in terms of malignancy (p < 0.001). ITHscore distinguished the presence of lymphovascular invasion (p = 0.003) and pleural invasion (p = 0.001) in tumors. ITHscore also separated patient groups with different overall survival (p = 0.004) and disease-free survival conditions (p = 0.005). Radiogenomic analysis showed that the level of ITHscore in stage I and stage II NSCLC is correlated with heterogeneity-related pathways. In addition, ITHscore was proved to be a stable measurement and can be applied to ITH quantification in head-and-neck cancer (HNC). CONCLUSIONS: ITH in NSCLC can be quantified from CT images by ITHscore, which is an indicator for tumor phenotypes and patient's prognosis. KEY POINTS: • ITHscore provides a radiomic quantification of intra-tumor heterogeneity in NSCLC. • ITHscore is an indicator for tumor phenotypes and patient's prognosis. • ITHscore has the potential to be generalized to other cancer types such as HNC.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Head and Neck Neoplasms , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/pathology , Prognosis , Tomography, X-Ray Computed/methods
15.
Nucleic Acids Res ; 49(W1): W483-W490, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33999180

ABSTRACT

Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collection, time-consuming processing, and manual chromatin accessibility (openness) annotation of genomic regions. To fill this gap, we developed OpenAnnotate (http://health.tsinghua.edu.cn/openannotate/) as the first web server for efficiently annotating openness of massive genomic regions across various biosample types, tissues, and biological systems. In addition to the annotation resource from 2729 comprehensive profiles of 614 biosample types of human and mouse, OpenAnnotate provides user-friendly functionalities, ultra-efficient calculation, real-time browsing, intuitive visualization, and elaborate application notebooks. We show its unique advantages compared to existing databases and toolkits by effectively revealing cell type-specificity, identifying regulatory elements and 3D chromatin contacts, deciphering gene functional relationships, inferring functions of transcription factors, and unprecedentedly promoting single-cell data analyses. We anticipate OpenAnnotate will provide a promising avenue for researchers to construct a more holistic perspective to understand regulatory mechanisms.


Subject(s)
Chromatin/metabolism , Genomics/methods , Molecular Sequence Annotation/methods , Software , Internet , Regulatory Sequences, Nucleic Acid , Single-Cell Analysis , Transcription Factors/metabolism
16.
BMC Anesthesiol ; 23(1): 160, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37161402

ABSTRACT

OBJECTIVE: To examine the prognostic value of HRV measurements during anesthesia for postoperative clinical outcomes prediction using machine learning models. DATA SOURCES: VitalDB, a comprehensive database of 6388 surgical patients admitted to Seoul National University Hospital. ELIGIBILITY CRITERIA FOR STUDY SELECTION: Cases with ECG lead II recording duration of less than one hour were excluded. Cases with more than 20% of missing HRV measurements were also excluded. A total of 5641 cases were eligible for the analyses. METHODS: Six machine learning models including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Trees (GBT), Extreme Gradient Boosting (XGB), and an ensemble of the five baseline models were developed to predict postoperative clinical outcomes. The prediction models were trained using only clinical information, and using both clinical information and HRV features, respectively. Feature importance based on the SHAP method was used to assess the contribution of the HRV measurements to the outcome predictions. Subgroup analysis was also performed to evaluate the risk association between postoperative ICU stay and various HRV measurements such as heart rate, low-frequency power (LFP), and short-term fluctuation DFA $\alpha_1$. RESULT: The final cohort included 5641 unique cases, among whom 4678 (83.0%) cases had ages over 40, 2877 (51.0%) were male, 1073 (19.0%) stayed in ICU after surgery, 52 (0.9%) suffered in-hospital death, and 3167 (56.1%) had a total length of hospital stay longer than 7 days. In the final test set, the highest AUROC performance with only clinical information was 0.79 for postoperative ICU stay, 0.58 for in-hospital mortality, and 0.76 for the total length of hospital stay prediction. Importantly, using both clinical information and HRV features, the AUROC performance was 0.83, 0.70, and 0.76 for the three clinical outcome predictions, respectively. Subgroup analysis found that patients with an average heart rate higher than 70, low-frequency power (LFP) < 33, and short-term fluctuation DFA $\alpha_1$ < 0.95 during anesthesia, had a significantly higher risk of entering the ICU after surgery. CONCLUSION: This study suggested that HRV measurements during anesthesia are feasible and effective for predicting postoperative clinical outcomes.


Subject(s)
Anesthesia , Anesthesiology , Humans , Heart Rate , Hospital Mortality , Prognosis
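
The study above reports model performance as AUROC. As a reminder of what that number measures, here is a minimal sketch that computes AUROC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions and are unrelated to the study's data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = outcome occurred, 0 = it did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations typically sort the scores instead.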
17.
BMC Bioinformatics ; 23(Suppl 4): 129, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428192

ABSTRACT

BACKGROUND: Drug resistance is a critical obstacle in cancer therapy. Discovering cancer drug response is important to improve anti-cancer drug treatment and guide anti-cancer drug design. Abundant genomic and drug response resources of cancer cell lines provide unprecedented opportunities for such study. However, cancer cell lines cannot fully reflect heterogeneous tumor microenvironments. Transferring knowledge studied from in vitro cell lines to single-cell and clinical data will be a promising direction to better understand drug resistance. Most current studies include single nucleotide variants (SNV) as features and focus on improving predictive ability of cancer drug response on cell lines. However, obtaining accurate SNVs from clinical tumor samples and single-cell data is not reliable. This makes it difficult to generalize such SNV-based models to clinical tumor data or single-cell level studies in the future. RESULTS: We present a new method, DualGCN, a unified Dual Graph Convolutional Network model to predict cancer drug response. DualGCN encodes both chemical structures of drugs and omics data of biological samples using graph convolutional networks. Then the two embeddings are fed into a multilayer perceptron to predict drug response. DualGCN incorporates prior knowledge on cancer-related genes and protein-protein interactions, and outperforms most state-of-the-art methods while avoiding using large-scale SNV data. CONCLUSIONS: The proposed method outperforms most state-of-the-art methods in predicting cancer drug response without the use of large-scale SNV data. These favorable results indicate its potential to be extended to clinical and single-cell tumor samples and advancements in precision medicine.


Subject(s)
Antineoplastic Agents , Neoplasms , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Genomics , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer , Tumor Microenvironment
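
The abstract above says both drugs and samples are encoded with graph convolutional networks. A minimal sketch of one widely used graph convolution step (self-loops, symmetric degree normalization, linear map, ReLU) is given below in plain Python. It is a generic illustration under those assumptions; the layers actually used in DualGCN may differ, and all names and the two-node example are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by node degree.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Two connected nodes, one input feature, two output features.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weight = [[1.0, -1.0]]
out = gcn_layer(adj, features, weight)
```

In this toy graph both nodes end up with the same embedding because each averages its own feature with its single neighbor's.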
18.
Bioinformatics ; 37(23): 4392-4398, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34165490

ABSTRACT

MOTIVATION: Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue microenvironments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task. Their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. RESULTS: We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ∼5 min in large datasets of more than 20 000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde free for academic use. AVAILABILITY AND IMPLEMENTATION: SOMDE is available for download from PyPI, and the source code is openly available from the GitHub repository https://github.com/XuegongLab/somde. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Software , Normal Distribution
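
The speed-up described above comes from fitting the model on a small number of nodes instead of on every sequencing site. The sketch below illustrates only that data-reduction idea, and it swaps the self-organizing map for plain square-grid binning to stay short; it does not use the `somde` package, and the function name, `bin_size` parameter and toy values are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_to_nodes(coords, expression, bin_size):
    """Average one gene's expression over sites falling in the same grid bin."""
    groups = defaultdict(list)
    for (x, y), value in zip(coords, expression):
        # Sites that share a grid cell are merged into one node.
        node = (int(x // bin_size), int(y // bin_size))
        groups[node].append(value)
    return {node: sum(v) / len(v) for node, v in groups.items()}


# Four sites, one gene; a bin size of 4 merges them into two nodes.
coords = [(0, 0), (1, 1), (5, 5), (6, 5)]
expression = [2, 4, 10, 20]
nodes = aggregate_to_nodes(coords, expression, bin_size=4)
```

A larger `bin_size` gives fewer nodes and a faster downstream fit at the cost of spatial resolution, which mirrors the adjustable-resolution trade-off the abstract mentions.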
19.
Bioinformatics ; 37(21): 3964-3965, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34096998

ABSTRACT

SUMMARY: Clustering is a key step in revealing heterogeneities in single-cell data. Most existing single-cell clustering methods output a fixed number of clusters without the hierarchical information. Classical hierarchical clustering (HC) provides dendrograms of cells, but cannot scale to large datasets due to high computational complexity. We present HGC, a fast Hierarchical Graph-based Clustering tool to address both problems. It combines the advantages of graph-based clustering and HC. On the shared nearest-neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data and can scale to large datasets. AVAILABILITY AND IMPLEMENTATION: The R package of HGC is available at https://bioconductor.org/packages/HGC/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Cluster Analysis , Benchmarking , Genetic Heterogeneity
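
HGC, as described above, builds its hierarchy on a shared nearest-neighbor (SNN) graph of cells. A minimal sketch of constructing such a graph is shown below, weighting each edge by the Jaccard overlap of the two cells' k-nearest-neighbor sets. This is one common SNN definition and is an assumption here; it is not taken from the HGC R package, and the brute-force neighbor search is only suitable for tiny examples.

```python
import math


def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def snn_graph(points, k):
    """Edges (i, j) weighted by the Jaccard overlap of the neighbor sets."""
    nn = knn(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nn[i] & nn[j])
            if shared:
                edges[(i, j)] = shared / len(nn[i] | nn[j])
    return edges


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = snn_graph(points, k=2)
```

With these points the graph contains edges only within each group, which is the structure a graph-based clustering step would then exploit.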
20.
Bioinformatics ; 37(2): 285-287, 2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33416830

ABSTRACT

SUMMARY: Recent advances in long-term time-lapse microscopy have made it easy for researchers to quantify cell behavior and molecular dynamics at single-cell resolution. However, the lack of easy-to-use software tools optimized for customized research is still a major challenge for quantitatively understanding biological processes through microscopy images. Here, we present CellTracker, a highly integrated graphical user interface software, for automated cell segmentation and tracking of time-lapse microscopy images. It covers essential steps in image analysis including project management, image pre-processing, cell segmentation, cell tracking, manual correction and statistical analysis such as the quantification of cell size and fluorescence intensity. Furthermore, CellTracker provides an annotation tool and supports model training from scratch, thus proposing a flexible and scalable solution for customized dataset analysis. AVAILABILITY AND IMPLEMENTATION: CellTracker is an open-source software under the GPL-3.0 license. It is implemented in Python and provides an easy-to-use graphical user interface. The source code, instruction manual and demos can be found at https://github.com/WangLabTHU/CellTracker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
