Results 1 - 20 of 92
1.
Bioinformatics ; 40(6), 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38889275

ABSTRACT

MOTIVATION: Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has instilled a significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes' as the reasoning behind predictions is often unknown and not transparent to the user. This has stimulated an increasing body of research for addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS: In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of the recent interpretable deep learning models applied to various single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.


Subject(s)
Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Genomics/methods
2.
Cell Rep ; 43(5): 114219, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38748874

ABSTRACT

Defining the molecular networks orchestrating human brain formation is crucial for understanding neurodevelopment and neurological disorders. Challenges in acquiring early brain tissue have incentivized the use of three-dimensional human pluripotent stem cell (hPSC)-derived neural organoids to recapitulate neurodevelopment. To elucidate the molecular programs that drive this highly dynamic process, here, we generate a comprehensive trans-omic map of the phosphoproteome, proteome, and transcriptome of the exit of pluripotency and neural differentiation toward human cerebral organoids (hCOs). These data reveal key phospho-signaling events and their convergence on transcriptional factors to regulate hCO formation. Comparative analysis with developing human and mouse embryos demonstrates the fidelity of our hCOs in modeling embryonic brain development. Finally, we demonstrate that biochemical modulation of AKT signaling can control hCO differentiation. Together, our data provide a comprehensive resource to study molecular controls in human embryonic brain development and provide a guide for the future development of hCO differentiation protocols.


Subject(s)
Brain , Cell Differentiation , Organoids , Humans , Organoids/metabolism , Brain/metabolism , Brain/embryology , Animals , Mice , Pluripotent Stem Cells/metabolism , Pluripotent Stem Cells/cytology , Proteome/metabolism , Signal Transduction , Transcriptome/genetics , Proteomics/methods , Neurogenesis , Proto-Oncogene Proteins c-akt/metabolism
3.
Dev Cell ; 59(6): 705-722.e8, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38354738

ABSTRACT

Wnt signaling is a critical determinant of cell lineage development. This study used Wnt dose-dependent induction programs to gain insights into molecular regulation of stem cell differentiation. We performed single-cell RNA sequencing of hiPSCs responding to a dose escalation protocol with Wnt agonist CHIR-99021 during the exit from pluripotency to identify cell types and genetic activity driven by Wnt stimulation. Results of activated gene sets and cell types were used to build a multiple regression model that predicts the efficiency of cardiomyocyte differentiation. Cross-referencing Wnt-associated gene expression profiles to the Connectivity Map database, we identified the small-molecule drug, tranilast. We found that tranilast synergistically activates Wnt signaling to promote cardiac lineage differentiation, which we validate by in vitro analysis of hiPSC differentiation and in vivo analysis of developing quail embryos. Our study provides an integrated workflow that links experimental datasets, prediction models, and small-molecule databases to identify drug-like compounds that control cell differentiation.


Subject(s)
Myocytes, Cardiac , Wnt Signaling Pathway , ortho-Aminobenzoates , Myocytes, Cardiac/metabolism , Cell Differentiation/genetics , Cell Lineage/genetics , Wnt Signaling Pathway/genetics , Mesoderm
4.
Genome Biol ; 25(1): 18, 2024 01 15.
Article in English | MEDLINE | ID: mdl-38225676

ABSTRACT

BACKGROUND: The identification of genes that vary across spatial domains in tissues and cells is an essential step for spatial transcriptomics data analysis. Given the critical role it serves for downstream data interpretations, various methods for detecting spatially variable genes (SVGs) have been proposed. However, the lack of benchmarking complicates the selection of a suitable method. RESULTS: Here we systematically evaluate a panel of popular SVG detection methods on a large collection of spatial transcriptomics datasets, covering various tissue types, biotechnologies, and spatial resolutions. We address questions including whether different methods select a similar set of SVGs, how reliable is the reported statistical significance from each method, how accurate and robust is each method in terms of SVG detection, and how well the selected SVGs perform in downstream applications such as clustering of spatial domains. Besides these, practical considerations such as computational time and memory usage are also crucial for deciding which method to use. CONCLUSIONS: Our study evaluates the performance of each method from multiple aspects and highlights the discrepancy among different methods when calling statistically significant SVGs across diverse datasets. Overall, our work provides useful considerations for choosing methods for identifying SVGs and serves as a key reference for the future development of related methods.


Subject(s)
Benchmarking , Gene Expression Profiling , Biotechnology , Cluster Analysis , Histocompatibility Testing , Transcriptome
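
As an illustration of what a spatially variable gene score can look like, the sketch below computes Moran's I, a classical spatial autocorrelation statistic. The abstract above does not say which methods were benchmarked, so this is only an assumed example of the kind of statistic involved; the function name, the binary neighbourhood weights, and the `radius` parameter are illustrative choices, not taken from the study.

```python
from math import dist


def morans_i(values, coords, radius):
    """Moran's I for one gene, using binary weights.

    values: expression of the gene at each spot.
    coords: (x, y) position of each spot, same order as `values`.
    radius: two distinct spots are neighbours (weight 1) if they lie
            within this distance; this weighting is an assumption.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant expression: treated as no spatial signal
    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                weight_sum += 1.0
    if weight_sum == 0:
        return 0.0  # no neighbouring spots at this radius
    return (n / weight_sum) * (num / denom)
```

Values near 1 indicate that neighbouring spots have similar expression, values near -1 indicate an alternating pattern, and values near 0 indicate no spatial structure under this weighting.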
5.
Bioinform Adv ; 3(1): vbad141, 2023.
Article in English | MEDLINE | ID: mdl-37928340

ABSTRACT

Motivation: The advent of highly multiplexed in situ imaging cytometry assays has revolutionized the study of cellular systems, offering unparalleled detail in observing cellular activities and characteristics. These assays provide comprehensive insights by concurrently profiling the spatial distribution and molecular features of numerous cells. In navigating this complex data landscape, unsupervised machine learning techniques, particularly clustering algorithms, have become essential tools. They enable the identification and categorization of cell types and subsets based on their molecular characteristics. Despite their widespread adoption, most clustering algorithms in use were initially developed for cell suspension technologies, leading to a potential mismatch in application. There is a critical gap in the systematic evaluation of these methods, particularly in determining the properties that make them optimal for in situ imaging assays. Addressing this gap is vital for ensuring accurate, reliable analyses and fostering advancements in cellular biology research. Results: In our extensive investigation, we evaluated a range of similarity metrics, which are crucial in determining the relationships between cells during the clustering process. Our findings reveal substantial variations in clustering performance, contingent on the similarity metric employed. These variations underscore the importance of selecting appropriate metrics to ensure accurate cell type and subset identification. In response to these challenges, we introduce FuseSOM, a novel ensemble clustering algorithm that integrates hierarchical multiview learning of similarity metrics with self-organizing maps. Through a rigorous stratified subsampling analysis framework, we demonstrate that FuseSOM outperforms existing best-practice clustering methods specifically tailored for in situ imaging cytometry data. Our work not only provides critical insights into the performance of clustering algorithms in this novel context but also offers a robust solution, paving the way for more accurate and reliable in situ imaging cytometry data analysis. Availability and implementation: The FuseSOM R package is available on Bioconductor under the GPL-3 license. All code for the analysis can be found on GitHub.

6.
NAR Genom Bioinform ; 5(4): lqad099, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37954574

ABSTRACT

A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positives) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a 'pseudo-positive' learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce SnapKin, an ensemble deep learning model for predicting substrates of kinases from large-scale phosphoproteomics data, which incorporates the above two learning strategies into a 'snapshot' ensemble learning algorithm. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
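
The abstract does not describe how the 'snapshot' ensemble is trained. In the general snapshot-ensembling technique, a single network is trained with a cyclic cosine-annealed learning rate and a copy of the model is saved at the end of each cycle. The sketch below shows only that generic schedule; whether SnapKin uses exactly this schedule is an assumption, and the function and parameter names are hypothetical.

```python
import math


def snapshot_lr(step, total_steps, n_cycles, base_lr):
    """Cyclic cosine-annealed learning rate (steps are 1-indexed).

    The rate restarts at `base_lr` at the start of each cycle and decays
    towards zero by the end of the cycle.
    """
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = (step - 1) % cycle_len  # position within the current cycle
    return base_lr / 2 * (math.cos(math.pi * pos / cycle_len) + 1)


def snapshot_steps(total_steps, n_cycles):
    """Steps at which a model snapshot would be saved (end of each cycle)."""
    cycle_len = math.ceil(total_steps / n_cycles)
    return [s for s in range(1, total_steps + 1) if s % cycle_len == 0]
```

At prediction time, the saved snapshots are typically combined by averaging their outputs, which gives an ensemble for roughly the training cost of one model.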

7.
Genome Biol ; 24(1): 259, 2023 11 10.
Article in English | MEDLINE | ID: mdl-37950331

ABSTRACT

BACKGROUND: Feature selection is an essential task in single-cell RNA-seq (scRNA-seq) data analysis and can be critical for gene dimension reduction and downstream analyses, such as gene marker identification and cell type classification. Most popular methods for feature selection from scRNA-seq data are based on the concept of differential distribution wherein a statistical model is used to detect changes in gene expression among cell types. Recent development of deep learning-based feature selection methods provides an alternative approach compared to traditional differential distribution-based methods in that the importance of a gene is determined by neural networks. RESULTS: In this work, we explore the utility of various deep learning-based feature selection methods for scRNA-seq data analysis. We sample from Tabula Muris and Tabula Sapiens atlases to create scRNA-seq datasets with a range of data properties and evaluate the performance of traditional and deep learning-based feature selection methods for cell type classification, feature selection reproducibility and diversity, and computational time. CONCLUSIONS: Our study provides a reference for future development and application of deep learning-based feature selection methods for single-cell omics data analyses.


Subject(s)
Deep Learning , Gene Expression Profiling , Gene Expression Profiling/methods , Reproducibility of Results , Single-Cell Analysis/methods , Data Analysis , Sequence Analysis, RNA/methods , Cluster Analysis , Algorithms
8.
Geroscience ; 45(6): 3307-3331, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37782439

ABSTRACT

Alzheimer's disease (AD) is an age-related disease, with loss of integrity of the blood-brain barrier (BBB) being an early feature. Cellular senescence is one of the reported nine hallmarks of aging. Here, we show for the first time the presence of senescent cells in the vasculature in AD patients and mouse models of AD. Senescent endothelial cells and pericytes are present in APP/PS1 transgenic mice but not in wild-type littermates at the time of amyloid deposition. In vitro, senescent endothelial cells display altered VE-cadherin expression and loss of cell junction formation and increased permeability. Consistent with this, senescent endothelial cells in APP/PS1 mice are present at areas of vascular leak that have decreased claudin-5 and VE-cadherin expression confirming BBB breakdown. Furthermore, single cell sequencing of endothelial cells from APP/PS1 transgenic mice confirms that adhesion molecule pathways are among the most highly altered pathways in these cells. At the pre-plaque stage, the vasculature shows significant signs of breakdown, with a general loss of VE-cadherin, leakage within the microcirculation, and obvious pericyte perturbation. Although senescent vascular cells were not directly observed at sites of vascular leak, senescent cells were close to the leak area. Thus, we would suggest in AD that there is a progressive induction of senescence in constituents of the neurovascular unit contributing to an increasing loss of vascular integrity. Targeting the vasculature early in AD, either with senolytics or with drugs that improve the integrity of the BBB may be valid therapeutic strategies.


Subject(s)
Alzheimer Disease , Blood-Brain Barrier , Humans , Mice , Animals , Blood-Brain Barrier/metabolism , Alzheimer Disease/metabolism , Endothelial Cells , Mice, Transgenic , Aging
9.
NPJ Syst Biol Appl ; 9(1): 51, 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37857632

ABSTRACT

Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.


Subject(s)
Gene Regulatory Networks , Multiomics , Gene Regulatory Networks/genetics
10.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37314966

ABSTRACT

MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').


Subject(s)
Deep Learning , Algorithms , Cluster Analysis , Chromatin , Single-Cell Analysis
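
The abstract mentions generating a consensus clustering from several clustering runs but does not say how the consensus is formed. One common generic approach, sketched here purely as an assumption, builds a co-association matrix (the fraction of runs in which two cells share a cluster) and groups cells that co-cluster in more than a chosen fraction of runs. The function names and the `threshold` parameter are hypothetical and are not part of the SnapCCESS package.

```python
def co_association(labelings):
    """Fraction of clustering runs in which each pair of cells shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    mat = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    mat[i][j] += 1.0 / m
    return mat


def consensus_clusters(labelings, threshold=0.5):
    """Connected components of the graph linking cells whose co-association
    exceeds `threshold`; returns one consensus label per cell."""
    mat = co_association(labelings)
    n = len(mat)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and mat[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned
```

Because only co-membership is counted, the cluster labels from different runs do not need to be aligned with one another before they are combined.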
11.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37096588

ABSTRACT

Advances in single-cell transcriptomic technologies have led to increasing use of single-cell RNA sequencing (scRNA-seq) data in large-scale patient cohort studies. The resulting high-dimensional data can be summarized and incorporated into patient outcome prediction models in several ways; however, there is a pressing need to understand the impact of analytical decisions on the quality of such models. In this study, we evaluate the impact of analytical choices, namely model choices, ensemble learning strategies and integration approaches, on patient outcome prediction using five scRNA-seq COVID-19 datasets. First, we examine the difference in performance between using single-view feature space versus multi-view feature space. Next, we survey multiple learning platforms from classical machine learning to modern deep learning methods. Lastly, we compare different integration approaches when combining datasets is necessary. Through benchmarking such analytical combinations, our study highlights the power of ensemble learning, consistency among different learning methods and robustness to dataset normalization when using multiple datasets as the model input.


Subject(s)
Benchmarking , COVID-19 , Humans , Gene Expression Profiling , Machine Learning , Sequence Analysis, RNA/methods
12.
Cytometry A ; 103(7): 593-599, 2023 07.
Article in English | MEDLINE | ID: mdl-36879360

ABSTRACT

Highly multiplexed in situ imaging cytometry assays have made it possible to study the spatial organization of numerous cell types simultaneously. We have addressed the challenge of quantifying complex multi-cellular relationships by proposing a statistical method which clusters local indicators of spatial association. Our approach successfully identifies distinct tissue architectures in datasets generated from three state-of-the-art high-parameter assays demonstrating its value in summarizing the information-rich data generated from these technologies.


Subject(s)
Image Cytometry , Spatial Analysis
13.
Nucleic Acids Res ; 51(8): e45, 2023 05 08.
Article in English | MEDLINE | ID: mdl-36912104

ABSTRACT

Multimodal single-cell omics technologies enable multiple molecular programs to be simultaneously profiled at a global scale in individual cells, creating opportunities to study biological systems at a resolution that was previously inaccessible. However, the analysis of multimodal single-cell omics data is challenging due to the lack of methods that can integrate across multiple data modalities generated from such technologies. Here, we present Matilda, a multi-task learning method for integrative analysis of multimodal single-cell omics data. By leveraging the interrelationship among tasks, Matilda learns to perform data simulation, dimension reduction, cell type classification, and feature selection in a single unified framework. We compare Matilda with other state-of-the-art methods on datasets generated from some of the most popular multimodal single-cell omics technologies. Our results demonstrate the utility of Matilda for addressing multiple key tasks on integrative multimodal single-cell omics data analysis. Matilda is implemented in PyTorch and is freely available from https://github.com/PYangLab/Matilda.


Subject(s)
Genomics , Single-Cell Analysis , Genomics/methods , Computer Simulation
14.
STAR Protoc ; 4(2): 102203, 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37000617

ABSTRACT

Characterizing transcription factor (TF) genomic colocalization is essential for identifying cooperative binding of TFs in controlling gene expression. Here, we introduce a protocol for using PAD2, an interactive web application that enables the investigation of colocalization of various TFs and chromatin-regulating proteins from mouse embryonic stem cells at various functional genomic regions. We describe steps for accessing and searching the PAD2 database and selecting and submitting genomic regions. We then detail protein colocalization analysis using heatmap and ranked correlation plot. For complete details on the use and execution of this protocol, please refer to Kim et al. (2022).

15.
Nat Commun ; 14(1): 923, 2023 02 18.
Article in English | MEDLINE | ID: mdl-36808134

ABSTRACT

The failure of metabolic tissues to appropriately respond to insulin ("insulin resistance") is an early marker in the pathogenesis of type 2 diabetes. Protein phosphorylation is central to the adipocyte insulin response, but how adipocyte signaling networks are dysregulated upon insulin resistance is unknown. Here we employ phosphoproteomics to delineate insulin signal transduction in adipocyte cells and adipose tissue. Across a range of insults causing insulin resistance, we observe a marked rewiring of the insulin signaling network. This includes both attenuated insulin-responsive phosphorylation, and the emergence of phosphorylation uniquely insulin-regulated in insulin resistance. Identifying dysregulated phosphosites common to multiple insults reveals subnetworks containing non-canonical regulators of insulin action, such as MARK2/3, and causal drivers of insulin resistance. The presence of several bona fide GSK3 substrates among these phosphosites led us to establish a pipeline for identifying context-specific kinase substrates, revealing widespread dysregulation of GSK3 signaling. Pharmacological inhibition of GSK3 partially reverses insulin resistance in cells and tissue explants. These data highlight that insulin resistance is a multi-nodal signaling defect that includes dysregulated MARK2/3 and GSK3 activity.


Subject(s)
Diabetes Mellitus, Type 2 , Insulin Resistance , Humans , Diabetes Mellitus, Type 2/metabolism , Glycogen Synthase Kinase 3/metabolism , Insulin/metabolism , Insulin Resistance/physiology , Phosphorylation , Signal Transduction/physiology , Proteome/metabolism
16.
Stem Cell Reports ; 18(1): 175-189, 2023 01 10.
Article in English | MEDLINE | ID: mdl-36630901

ABSTRACT

Characterizing cell identity in complex tissues such as the human retina is essential for studying its development and disease. While retinal organoids derived from pluripotent stem cells have been widely used to model development and disease of the human retina, there is a lack of studies that have systematically evaluated the molecular and cellular fidelity of the organoids derived from various culture protocols in recapitulating their in vivo counterpart. To this end, we performed an extensive meta-atlas characterization of cellular identities of the human eye, covering a wide range of developmental stages. The resulting map uncovered previously unknown biomarkers of major retinal cell types and those associated with cell-type-specific maturation. Using our retinal-cell-identity map from the fetal and adult tissues, we systematically assessed the fidelity of the retinal organoids in mimicking the human eye, enabling us to comprehensively benchmark the current protocols for retinal organoid generation.


Subject(s)
Induced Pluripotent Stem Cells , Pluripotent Stem Cells , Adult , Humans , Retina/metabolism , Pluripotent Stem Cells/metabolism , Neurons , Organoids , Cell Differentiation , Induced Pluripotent Stem Cells/metabolism
17.
Proteomics ; 23(3-4): e2200068, 2023 02.
Article in English | MEDLINE | ID: mdl-35580145

ABSTRACT

Protein phosphorylation plays an essential role in modulating cell signalling and its downstream transcriptional and translational regulations. Until recently, protein phosphorylation has been studied mostly using low-throughput biochemical assays. The advancement of mass spectrometry (MS)-based phosphoproteomics transformed the field by enabling measurement of proteome-wide phosphorylation events, where tens of thousands of phosphosites are routinely identified and quantified in an experiment. This has brought a significant challenge in analysing large-scale phosphoproteomic data, making computational methods and systems approaches integral parts of phosphoproteomics. Previous works have primarily focused on reviewing the experimental techniques in MS-based phosphoproteomics, yet a systematic survey of the computational landscape in this field is still missing. Here, we review computational methods and tools, and systems approaches that have been developed for phosphoproteomics data analysis. We categorise them into four aspects including data processing, functional analysis, phosphoproteome annotation and their integration with other omics, and in each aspect, we discuss the key methods and example studies. Lastly, we highlight some of the potential research directions on which future work would make a significant contribution to this fast-growing field. We hope this review provides a useful snapshot of the field of computational systems phosphoproteomics and stimulates new research that drives future development.


Subject(s)
Phosphoproteins , Protein Processing, Post-Translational , Phosphoproteins/metabolism , Phosphorylation , Proteome/metabolism , Systems Analysis
18.
F1000Res ; 12: 261, 2023.
Article in English | MEDLINE | ID: mdl-38434622

ABSTRACT

Background: Globally, scientists now have the ability to generate a vast amount of high throughput biomedical data that carry critical information for important clinical and public health applications. This data revolution in biology is now creating a plethora of new single-cell datasets. Concurrently, there have been significant methodological advances in single-cell research. Integrating these two resources, creating tailor-made, efficient, and purpose-specific data analysis approaches can assist in accelerating scientific discovery. Methods: We developed a series of living workshops for building data stories, using Single-cell data integrative analysis (scdney). scdney is a wrapper package with a collection of single-cell analysis R packages incorporating data integration, cell type annotation, higher order testing and more. Results: Here, we illustrate two specific workshops. The first workshop examines how to characterise the identity and/or state of cells and the relationship between them, known as phenotyping. The second workshop focuses on extracting higher-order features from cells to predict disease progression. Conclusions: Through these workshops, we not only showcase current solutions, but also highlight critical thinking points. In particular, we highlight the Thinking Process Template that provides a structured framework for the decision-making process behind such single-cell analyses. Furthermore, our workshop will incorporate dynamic contributions from the community in a collaborative learning approach, thus the term 'living'.

19.
iScience ; 25(10): 105049, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36124234

ABSTRACT

Lysine-specific demethylase 1 (LSD1) is well-known for its role in decommissioning enhancers during mouse embryonic stem cell (ESC) differentiation. Its role in gene promoters remains poorly understood despite its widespread presence at these sites. Here, we report that LSD1 promotes RNA polymerase II (RNAPII) pausing, a rate-limiting step in transcription regulation, in ESCs. We found the knockdown of LSD1 preferentially affects genes with higher RNAPII pausing. Next, we demonstrate that the co-localization sites of LSD1 and MYC, a factor known to regulate pause-release, are enriched for other RNAPII pausing factors. We show that LSD1 and MYC directly interact and MYC recruitment to genes co-regulated with LSD1 is dependent on LSD1 but not vice versa. The co-regulated gene set is significantly enriched for housekeeping processes and depleted of transcription factors compared to those bound by LSD1 alone. Collectively, our integrative analysis reveals a pleiotropic role of LSD1 in promoting RNAPII pausing.

20.
Bioinformatics ; 38(20): 4745-4753, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36040148

ABSTRACT

MOTIVATION: With the recent surge of large-cohort scale single cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies to summary statistics that represent each sample (e.g. individuals). RESULTS: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarizing a broad collection of features at the sample level is both important for understanding underlying disease mechanisms in different experimental studies and for accurately classifying disease status of individuals. AVAILABILITY AND IMPLEMENTATION: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available with accession ID reported in the Section 2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Humans
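
To make the idea of a sample-level representation concrete, the sketch below summarizes per-cell annotations into one feature vector per sample using cell-type proportions. The abstract does not list the feature types scFeatures computes, so cell-type proportion is an assumed example here, and the function is a hypothetical illustration in Python rather than the API of the scFeatures R package.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Summarize cells into per-sample cell-type proportions.

    cells: iterable of (sample_id, cell_type) pairs, one per cell.
    Returns {sample_id: {cell_type: proportion}}, with every cell type
    present as a key for every sample so the vectors are comparable.
    """
    counts = defaultdict(Counter)
    for sample_id, cell_type in cells:
        counts[sample_id][cell_type] += 1
    all_types = sorted({t for c in counts.values() for t in c})
    return {
        sample_id: {t: c[t] / sum(c.values()) for t in all_types}
        for sample_id, c in counts.items()
    }
```

Each sample then has a fixed-length vector that could be passed to a standard classifier, for example to predict disease status.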