1.
Bioinformatics ; 39(39 Suppl 1): i404-i412, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37387141

ABSTRACT

MOTIVATION: Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. A substantial amount of evolutionary information between genes that could be used to align the species is discarded by most current methods, since they use only one-to-one orthologous genes. Some methods try to retain this information by explicitly including the relations between genes, but not without caveats. RESULTS: In this work, we present a model to transfer and align cell types in cross-species analysis (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterward, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS to scRNA-seq data of the primary motor cortex of human, mouse, and marmoset. Our model can accurately match and align cell types on these datasets. Moreover, our model outperforms Seurat and the state-of-the-art method SAMap. Finally, we show that our gene matching method results in better cell type matches than BLAST in our model. AVAILABILITY AND IMPLEMENTATION: The implementation is available on GitHub (https://github.com/kbiharie/TACTiCS). The preprocessed datasets and trained models can be downloaded from Zenodo (https://doi.org/10.5281/zenodo.7582460).


Subject(s)
Biological Evolution , Genetic Techniques , Humans , Animals , Mice , Amino Acid Sequence , Natural Language Processing , Machine Learning
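
The abstract above says that TACTiCS matches genes across species from their protein sequences using a natural language processing model. As a rough illustration only, the sketch below assumes each gene already has an embedding vector and ranks candidate genes in the other species by cosine similarity. The function names (`cosine`, `match_genes`), the `top_k` parameter, the gene names, and the toy vectors are all hypothetical; the actual matching rule used by TACTiCS is not described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_u * norm_v)


def match_genes(emb_a, emb_b, top_k=1):
    """For each gene in species A, return the top_k most similar genes in species B.

    emb_a, emb_b: dicts mapping gene name -> embedding (list of floats).
    Returns a dict mapping gene name -> list of (gene_b, similarity) pairs.
    """
    matches = {}
    for gene_a, vec_a in emb_a.items():
        scored = sorted(
            ((cosine(vec_a, vec_b), gene_b) for gene_b, vec_b in emb_b.items()),
            reverse=True,
        )
        matches[gene_a] = [(gene_b, score) for score, gene_b in scored[:top_k]]
    return matches


# Made-up embeddings, for illustration only (assumption, not real data).
human = {"GENE_H1": [1.0, 0.0, 0.2], "GENE_H2": [0.0, 1.0, 0.1]}
mouse = {
    "gene_m1": [0.9, 0.1, 0.2],
    "gene_m2": [0.1, 0.8, 0.0],
    "gene_m3": [0.5, 0.5, 0.5],
}
matches = match_genes(human, mouse, top_k=2)
```

Keeping more than one candidate per gene (here `top_k=2`) reflects the motivation stated in the abstract: restricting the analysis to one-to-one orthologs discards information that many-to-many gene relations could retain.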
2.
Nucleic Acids Res ; 45(10): e83, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28132031

ABSTRACT

Spatial and temporal brain transcriptomics has recently emerged as an invaluable data source for molecular neuroscience. The complexity of such data poses considerable challenges for analysis and visualization. We present BrainScope: a web portal for fast, interactive visual exploration of the Allen Atlases of the adult and developing human brain transcriptome. Through a novel methodology to explore high-dimensional data (dual t-SNE), BrainScope enables the linked, all-in-one visualization of genes and samples across the whole brain and genome, and across developmental stages. We show that densities in t-SNE scatter plots of the spatial samples coincide with anatomical regions, and that densities in t-SNE scatter plots of the genes represent gene co-expression modules that are significantly enriched for biological functions. We also show that the topography of the gene t-SNE maps reflects brain region-specific gene functions, enabling hypothesis- and data-driven research. We demonstrate the discovery potential of BrainScope through three examples: (i) analysis of cell type-specific gene sets, (ii) analysis of a set of stable gene co-expression modules across the adult human donors and (iii) analysis of the evolution of co-expression of oligodendrocyte-specific genes over developmental stages. BrainScope is publicly accessible at www.brainscope.nl.


Subject(s)
Brain/metabolism , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Genome, Human , Software , Transcriptome , Adolescent , Adult , Atlases as Topic , Brain/growth & development , Child , Child, Preschool , Chromosome Mapping/methods , Genetic Markers , Humans , Infant , Molecular Sequence Annotation , Oligodendroglia/cytology , Oligodendroglia/metabolism
3.
Front Bioinform ; 4: 1347276, 2024.
Article in English | MEDLINE | ID: mdl-38501113

ABSTRACT

Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.

4.
NAR Genom Bioinform ; 5(3): lqad070, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37502708

ABSTRACT

Single-cell genomics is now producing an ever-increasing number of datasets that, when integrated, could provide large-scale reference atlases of tissue in health and disease. Such large-scale atlases increase the scale and generalizability of analyses and enable combining knowledge generated by individual studies. Specifically, individual studies often differ regarding cell annotation terminology and depth, with different groups specializing in different cell type compartments, often using distinct terminology. Understanding how these distinct sets of annotations are related and complement each other would mark a major step towards a consensus-based cell-type annotation reflecting the latest knowledge in the field. Whereas recent computational techniques, referred to as 'reference mapping' methods, facilitate the usage and expansion of existing reference atlases by mapping new datasets (i.e. queries) onto an atlas, a systematic approach towards harmonizing dataset-specific cell-type terminology and annotation depth is still lacking. Here, we present 'treeArches', a framework to automatically build and extend reference atlases while enriching them with an updatable hierarchy of cell-type annotations across different datasets. We demonstrate various use cases for treeArches, from automatically resolving relations between reference and query cell types to identifying unseen cell types absent in the reference, such as disease-associated cell states. We envision treeArches enabling data-driven construction of consensus atlas-level cell-type hierarchies and facilitating efficient usage of reference atlases.

5.
Nat Commun ; 12(1): 2799, 2021 May 14.
Article in English | MEDLINE | ID: mdl-33990598

ABSTRACT

Supervised methods are increasingly used to identify cell populations in single-cell data. Yet, current methods are limited in their ability to learn from multiple datasets simultaneously, are hampered by the annotation of datasets at different resolutions, and do not preserve annotations when retrained on new datasets. The latter point is especially important as researchers cannot rely on downstream analysis performed using earlier versions of the dataset. Here, we present scHPL, a hierarchical progressive learning method which allows continuous learning from single-cell data by leveraging the different resolutions of annotations across multiple datasets to learn and continuously update a classification tree. We evaluate the classification and tree learning performance using simulated as well as real datasets and show that scHPL can successfully learn known cellular hierarchies from multiple datasets while preserving the original annotations. scHPL is available at https://github.com/lcmmichielsen/scHPL.


Subject(s)
Cells/classification , Deep Learning , Single-Cell Analysis/statistics & numerical data , Animals , Brain/cytology , Computer Simulation , Databases, Factual/statistics & numerical data , Humans , Leukocytes, Mononuclear/classification , Mice , Software , Supervised Machine Learning
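
The scHPL abstract above describes a classification tree that is updated as datasets annotated at different resolutions arrive, while the original annotations are preserved. The sketch below illustrates only the data-structure side of that idea: a label tree in which finer labels from a later dataset are attached beneath an existing coarser label, so earlier labels are never renamed or removed. The `LabelTree` class and the cell-type names are hypothetical, and the placement of new labels is given by hand here; how scHPL infers that placement from the data is not specified in the abstract.

```python
class LabelTree:
    """Minimal label hierarchy: each label stores its parent; the root has parent None."""

    def __init__(self, root="root"):
        self.parent = {root: None}

    def add(self, label, parent="root"):
        """Attach a new label under an existing one without altering existing labels."""
        if label in self.parent:
            raise ValueError(f"label already in tree: {label}")
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        self.parent[label] = parent

    def path(self, label):
        """Return the labels from the root down to `label`."""
        path = []
        while label is not None:
            path.append(label)
            label = self.parent[label]
        return path[::-1]


# A first dataset annotated at coarse resolution (illustrative labels).
tree = LabelTree()
tree.add("T cell")
tree.add("B cell")

# A later dataset annotated at finer resolution: its labels nest under "T cell".
tree.add("CD4+ T cell", parent="T cell")
tree.add("CD8+ T cell", parent="T cell")

lineage = tree.path("CD4+ T cell")
```

Because a cell labeled `CD4+ T cell` still lies on the path through `T cell`, an analysis that relied on the coarser label remains valid after the tree grows, which is the property the abstract emphasizes.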
6.
Genome Biol ; 20(1): 194, 2019 Sep 09.
Article in English | MEDLINE | ID: mdl-31500660

ABSTRACT

BACKGROUND: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification. RESULTS: Here, we benchmarked 22 classification methods that automatically assign cell identities, including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. We use 2 experimental setups to evaluate the performance of each method for within-dataset predictions (intra-dataset) and across datasets (inter-dataset) based on accuracy, percentage of unclassified cells, and computation time. We further evaluate the methods' sensitivity to the input features, number of cells per population, and their performance across different annotation levels and datasets. We find that most classifiers perform well on a variety of datasets with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine classifier has overall the best performance across the different experiments. CONCLUSIONS: We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA sequencing data. All the code used for the evaluation is available on GitHub (https://github.com/tabdelaal/scRNAseq_Benchmark). Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support the extension of new methods and new datasets.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Support Vector Machine
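
The benchmark abstract above names accuracy and the percentage of unclassified cells among its evaluation criteria. The sketch below shows one plausible way to compute both from true and predicted labels. It rests on assumptions not stated in the abstract: rejected cells carry a placeholder label (here the hypothetical `"Unassigned"`), and accuracy is taken over all cells, so a rejected cell counts as not correct. The function name `evaluate` and the example labels are illustrative.

```python
def evaluate(true_labels, predicted, unlabeled="Unassigned"):
    """Return accuracy and percentage of unclassified cells.

    true_labels, predicted: equal-length sequences of labels, one per cell.
    unlabeled: placeholder a classifier emits when it declines to assign a label.
    """
    if len(true_labels) != len(predicted):
        raise ValueError("true_labels and predicted must have the same length")
    n_cells = len(true_labels)
    if n_cells == 0:
        raise ValueError("at least one cell is required")

    n_unclassified = sum(1 for p in predicted if p == unlabeled)
    n_correct = sum(1 for t, p in zip(true_labels, predicted) if t == p)

    return {
        # Assumption: accuracy over all cells, rejected cells included.
        "accuracy": n_correct / n_cells,
        "pct_unclassified": 100.0 * n_unclassified / n_cells,
    }


# Toy labels for five cells (made up for illustration).
truth = ["B cell", "T cell", "T cell", "NK cell", "B cell"]
pred = ["B cell", "T cell", "Unassigned", "T cell", "B cell"]
scores = evaluate(truth, pred)
```

Reporting the two numbers side by side matters because a classifier can raise its accuracy on the cells it does label simply by rejecting more of them.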