Results 1 - 6 of 6
1.
Nat Methods ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844628

ABSTRACT

Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named 'xTrimoscFoundationα', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design enable it to capture complex context relations among genes effectively in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performance in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

2.
Commun Biol ; 7(1): 56, 2024 01 06.
Article in English | MEDLINE | ID: mdl-38184694

ABSTRACT

Profiling spatial variations of cellular composition and transcriptomic characteristics is important for understanding the physiology and pathology of tissues. Spatial transcriptomics (ST) data depict spatial gene expression, but the currently dominant high-throughput technology is not yet at single-cell resolution. Single-cell RNA-sequencing (SC) data provide high-throughput transcriptomic information at the single-cell level but lack spatial information. Integrating these two types of data would be ideal for revealing transcriptomic landscapes at single-cell resolution. We develop the method STEM (SpaTially aware EMbedding) for this purpose. It uses deep transfer learning to encode both ST and SC data into a unified spatially aware embedding space, and then uses the embeddings to infer SC-ST mapping and predict pseudo-spatial adjacency between cells in SC data. Semi-simulation and real data experiments verify that the embeddings preserved spatial information and eliminated technical biases between SC and ST data. We apply STEM to human squamous cell carcinoma and hepatic lobule datasets to uncover the localization of rare cell types and reveal cell-type-specific gene expression variation along a spatial axis. STEM is powerful for mapping SC and ST data to build single-cell level spatial transcriptomic landscapes, and can provide mechanistic insights into the spatial heterogeneity and microenvironments of tissues.


Subject(s)
Carcinoma, Squamous Cell , Learning , Humans , Gene Expression Profiling , Transcriptome , Machine Learning , Tumor Microenvironment
3.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37824741

ABSTRACT

Cell-cell communication events (CEs) are mediated by multiple ligand-receptor (LR) pairs. Usually only a particular subset of CEs directly works for a specific downstream response in a particular microenvironment. We call them the functional communication events (FCEs) of the target responses. Decoding FCE-target gene relations is important for understanding the mechanisms of many biological processes, but has been intractable due to the mixing of multiple factors and the lack of direct observations. We developed a method, HoloNet, for decoding FCEs using spatial transcriptomic data by integrating LR pairs, cell-type spatial distribution and downstream gene expression into a deep learning model. We modeled CEs as a multi-view network, developed an attention-based graph learning method to train the model for generating target gene expression with the CE networks, and decoded the FCEs for specific downstream genes by interpreting trained models. We applied HoloNet to three Visium datasets of breast cancer and liver cancer. The results disentangled the multiple factors of FCEs by revealing how LR signals and cell types affect specific biological processes, and specified FCE-induced effects in each single cell. We conducted simulation experiments and showed that HoloNet is more reliable on LR prioritization in comparison with existing methods. HoloNet is a powerful tool to illustrate cell-cell communication landscapes and reveal vital FCEs that shape cellular phenotypes. HoloNet is available as a Python package at https://github.com/lhc17/HoloNet.


Subject(s)
Liver Neoplasms , Transcriptome , Humans , Gene Expression Profiling , Cell Communication/genetics , Computer Simulation , Tumor Microenvironment
4.
STAR Protoc ; 3(3): 101589, 2022 09 16.
Article in English | MEDLINE | ID: mdl-35942342

ABSTRACT

Human Ensemble Cell Atlas (hECA) provides a unified informatics framework and the cell-centric-assembled single-cell transcriptome data of 1,093,299 labeled human cells from 116 published datasets. In this protocol, we provide three applications of hECA: "quantitative portraiture" exploration with websites, customizable reference creation for automatic cell type annotation, and "in data" cell sorting with logical conditions. We provide detailed steps for connecting to the database, searching cells with conditions, downloading data, and annotating new datasets with a customized reference. For complete details on the use and execution of this protocol, please refer to Chen et al. (2022).


Subject(s)
Neoplasm Proteins , Transcriptome , Humans , Transcriptome/genetics
5.
iScience ; 25(5): 104318, 2022 May 20.
Article in English | MEDLINE | ID: mdl-35602947

ABSTRACT

The accumulation of massive single-cell omics data provides growing resources for building biomolecular atlases of all cells of human organs or the whole body. The true assembly of a cell atlas should be cell-centric rather than file-centric. We developed a unified informatics framework for seamless cell-centric data assembly and built the human Ensemble Cell Atlas (hECA) from scattered data. hECA v1.0 assembled 1,093,299 labeled human cells from 116 published datasets, covering 38 organs and 11 systems. We invented three new methods of atlas applications based on the cell-centric assembly: "in data" cell sorting for targeted data retrieval with customizable logic expressions, "quantitative portraiture" for multi-view representations of biological entities, and customizable reference creation for generating references for automatic annotations. Case studies on agile construction of user-defined sub-atlases and "in data" investigation of CAR-T off-targets in multiple organs showed the great potential enabled by the cell-centric ensemble atlas.
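
As a rough illustration of what "in data" cell sorting with customizable logic expressions could look like, the following is a minimal sketch in plain Python. It does not use the hECA software or its API: the record fields (`cell_id`, `organ`, `cell_type`, `expr`), the helper names, and the toy values are all hypothetical and chosen only to show how logical conditions can be combined to select cells.

```python
from typing import Callable, Dict, List

Cell = Dict[str, object]
Condition = Callable[[Cell], bool]

# Hypothetical toy records; the field names are illustrative, not the hECA schema.
cells: List[Cell] = [
    {"cell_id": "c1", "organ": "Heart", "cell_type": "Cardiomyocyte",
     "expr": {"CD19": 0.0, "MYH7": 5.2}},
    {"cell_id": "c2", "organ": "Blood", "cell_type": "B cell",
     "expr": {"CD19": 3.1, "MYH7": 0.0}},
    {"cell_id": "c3", "organ": "Lung", "cell_type": "Fibroblast",
     "expr": {"CD19": 0.4, "MYH7": 0.0}},
    {"cell_id": "c4", "organ": "Lung", "cell_type": "B cell",
     "expr": {"CD19": 2.0, "MYH7": 0.0}},
]


def expresses(gene: str, threshold: float) -> Condition:
    """Condition: expression of `gene` is strictly above `threshold`."""
    return lambda cell: cell["expr"].get(gene, 0.0) > threshold


def not_in_organ(organ: str) -> Condition:
    """Condition: the cell does not come from `organ`."""
    return lambda cell: cell["organ"] != organ


def all_of(*conditions: Condition) -> Condition:
    """Logical AND over any number of conditions."""
    return lambda cell: all(cond(cell) for cond in conditions)


def select_cells(records: List[Cell], condition: Condition) -> List[Cell]:
    """Return the cells that satisfy the condition, in their original order."""
    return [cell for cell in records if condition(cell)]


# Example query: cells outside blood that express CD19 above 0.3.
hits = select_cells(cells, all_of(expresses("CD19", 0.3), not_in_organ("Blood")))
hit_ids = [cell["cell_id"] for cell in hits]
```

In this toy query, `hit_ids` contains the two lung cells. Composing small predicate functions keeps each condition independently reusable, which is one plausible way to express the kind of targeted retrieval the abstract describes.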

6.
Bioinformatics ; 37(23): 4392-4398, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34165490

ABSTRACT

MOTIVATION: Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue microenvironments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task. Their high computational complexity limited their scalability to the latest and future large-scale spatial expression data. RESULTS: We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ∼5 min in large datasets of more than 20 000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde free for academic use. AVAILABILITY AND IMPLEMENTATION: SOMDE is available for download from PyPI, and the source code is openly available from the GitHub repository https://github.com/XuegongLab/somde. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Software , Normal Distribution
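
The idea of condensing neighboring sequencing sites into a smaller set of nodes before model fitting, as described in the SOMDE abstract, can be illustrated with a deliberately simplified sketch. The code below is not the SOMDE package and does not train a self-organizing map: it substitutes a fixed square grid for the SOM and omits the Gaussian process step entirely. The function name `aggregate_to_nodes`, the `node_size` parameter, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
NodeKey = Tuple[int, int]


def aggregate_to_nodes(
    coords: List[Coord], expr: List[float], node_size: float
) -> Dict[NodeKey, float]:
    """Group sites into square grid nodes and average one gene's expression.

    A regular grid stands in for the self-organizing map here; a larger
    `node_size` gives fewer nodes, i.e. a coarser resolution.
    """
    buckets: Dict[NodeKey, List[float]] = defaultdict(list)
    for (x, y), value in zip(coords, expr):
        key = (int(x // node_size), int(y // node_size))
        buckets[key].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Toy data: five sites with 2-D coordinates and one gene's expression values.
coords = [(0.0, 0.0), (0.5, 0.5), (1.5, 0.2), (1.8, 0.9), (0.2, 1.4)]
expr = [1.0, 3.0, 10.0, 12.0, 0.0]

# Five sites collapse into three nodes at this resolution.
nodes = aggregate_to_nodes(coords, expr, node_size=1.0)
```

Any downstream spatial model would then be fitted to the node-level values rather than to every site, which is where a reduction in computational cost would come from under these assumptions.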