ABSTRACT
MOTIVATION: Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has attracted significant interest in single-cell omics research owing to its remarkable success in analysing heterogeneous, high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes': the reasoning behind their predictions is often unknown and not transparent to the user. This has stimulated a growing body of research on addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where identifying and understanding molecular regulators is crucial for interpreting model predictions and directing downstream experimental validation. RESULTS: In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of recent interpretable deep learning models applied in various areas of single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Subject(s)
Deep Learning, Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Computational Biology/methods, Genomics/methods
ABSTRACT
Protein phosphorylation plays an essential role in modulating cell signalling and its downstream transcriptional and translational regulation. Until recently, protein phosphorylation had been studied mostly using low-throughput biochemical assays. The advancement of mass spectrometry (MS)-based phosphoproteomics transformed the field by enabling the measurement of proteome-wide phosphorylation events, with tens of thousands of phosphosites routinely identified and quantified in one experiment. This has created significant challenges in analysing large-scale phosphoproteomic data, making computational methods and systems approaches integral parts of phosphoproteomics. Previous works have primarily focused on reviewing the experimental techniques in MS-based phosphoproteomics, yet a systematic survey of the computational landscape in this field is still missing. Here, we review the computational methods, tools, and systems approaches that have been developed for phosphoproteomics data analysis. We categorise them into four aspects: data processing, functional analysis, phosphoproteome annotation, and integration with other omics. For each aspect, we discuss the key methods and example studies. Lastly, we highlight some potential research directions in which future work could make a significant contribution to this fast-growing field. We hope this review provides a useful snapshot of the field of computational systems phosphoproteomics and stimulates new research that drives future development.
Subject(s)
Phosphoproteins, Protein Processing, Post-Translational, Phosphoproteins/metabolism, Phosphorylation, Proteome/metabolism, Systems Analysis
ABSTRACT
BACKGROUND: The identification of genes that vary across spatial domains in tissues and cells is an essential step in spatial transcriptomics data analysis. Given the critical role it serves for downstream data interpretation, various methods for detecting spatially variable genes (SVGs) have been proposed. However, the lack of benchmarking complicates the selection of a suitable method. RESULTS: Here we systematically evaluate a panel of popular SVG detection methods on a large collection of spatial transcriptomics datasets, covering various tissue types, biotechnologies, and spatial resolutions. We address questions including whether different methods select similar sets of SVGs, how reliable the statistical significance reported by each method is, how accurate and robust each method is in SVG detection, and how well the selected SVGs perform in downstream applications such as clustering of spatial domains. Beyond these, practical considerations such as computational time and memory usage are also crucial for deciding which method to use. CONCLUSIONS: Our study evaluates the performance of each method from multiple aspects and highlights the discrepancies among different methods when calling statistically significant SVGs across diverse datasets. Overall, our work provides useful considerations for choosing methods for identifying SVGs and serves as a key reference for the future development of related methods.
Subject(s)
Benchmarking, Gene Expression Profiling, Biotechnology, Cluster Analysis, Histocompatibility Testing, Transcriptome
ABSTRACT
Defining the molecular networks orchestrating human brain formation is crucial for understanding neurodevelopment and neurological disorders. Challenges in acquiring early brain tissue have incentivized the use of three-dimensional human pluripotent stem cell (hPSC)-derived neural organoids to recapitulate neurodevelopment. To elucidate the molecular programs that drive this highly dynamic process, here we generate a comprehensive trans-omic map of the phosphoproteome, proteome, and transcriptome during the exit from pluripotency and neural differentiation toward human cerebral organoids (hCOs). These data reveal key phospho-signaling events and their convergence on transcription factors to regulate hCO formation. Comparative analysis with developing human and mouse embryos demonstrates the fidelity of our hCOs in modeling embryonic brain development. Finally, we demonstrate that biochemical modulation of AKT signaling can control hCO differentiation. Together, our data provide a comprehensive resource for studying molecular controls in human embryonic brain development and a guide for the future development of hCO differentiation protocols.
Subject(s)
Brain, Cell Differentiation, Organoids, Humans, Organoids/metabolism, Brain/metabolism, Brain/embryology, Animals, Mice, Pluripotent Stem Cells/metabolism, Pluripotent Stem Cells/cytology, Proteome/metabolism, Signal Transduction, Transcriptome/genetics, Proteomics/methods, Neurogenesis, Proto-Oncogene Proteins c-akt/metabolism
ABSTRACT
The use of single-cell RNA sequencing (scRNA-seq) allows observation of different cells at multi-tiered complexity within the same microenvironment. To gain insights into cell identity using scRNA-seq data, we present Cepo, which generates cell-type-specific gene statistics of differentially stable genes from scRNA-seq data to define cell identity. When applied to multiple datasets, Cepo outperforms current methods in assigning cell identity and enhances several cell identification applications, such as cell-type characterisation, spatial mapping of single cells, and lineage inference of single cells.
ABSTRACT
Binary expression systems like the LexA-LexAop system provide a powerful experimental toolkit for studying gene and tissue function in developmental biology, neurobiology, and physiology. However, the number of well-defined LexA enhancer trap insertions remains limited. In this study, we present the molecular characterization and initial tissue expression analysis of nearly 100 novel StanEx LexA enhancer traps, derived from the StanEx1 index line. These include 76 insertions into novel, distinct gene loci not previously associated with enhancer traps or targeted LexA constructs. Additionally, our studies revealed evidence for selective transposase-dependent replacement of a previously undetected KP element on chromosome III within the StanEx1 genetic background during hybrid dysgenesis, suggesting a molecular basis for the over-representation of LexA insertions at the NK7.1 locus in our screen. Production and characterization of the novel fly lines were performed by students and teachers in experiment-based genetics classes within a geographically diverse network of public and independent high schools. Thus, unique partnerships between secondary schools and university-based programs have produced and characterized novel genetic and molecular resources in Drosophila for open-source distribution, and provide paradigms for the development of science education through experience-based pedagogy.