Results 1 - 20 of 60
1.
Genome Res ; 2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34667120

ABSTRACT

Bivalent chromatin is characterized by the simultaneous presence of H3K4me3 and H3K27me3, histone modifications generally associated with transcriptionally active and repressed chromatin, respectively. Prevalent in embryonic stem cells (ESCs), bivalency is postulated to poise/prime lineage-controlling developmental genes for rapid activation during embryogenesis while maintaining a transcriptionally repressed state in the absence of activation cues; however, this hypothesis remains to be directly tested. Most gene promoters DNA-hypermethylated in adult human cancers are bivalently marked in ESCs, and it was speculated that bivalency predisposes them for aberrant de novo DNA methylation and irreversible silencing in cancer, but evidence supporting this model is largely lacking. Here we show that bivalent chromatin does not poise genes for rapid activation but protects promoters from de novo DNA methylation. Genome-wide studies in differentiating ESCs reveal that activation of bivalent genes is no more rapid than that of other transcriptionally silent genes, challenging the premise that H3K4me3 is instructive for transcription. H3K4me3 at bivalent promoters, a product of the underlying DNA sequence, persists in nearly all cell types irrespective of gene expression and confers protection from de novo DNA methylation. Bivalent genes in ESCs that are frequent targets of aberrant hypermethylation in cancer are particularly strongly associated with loss of H3K4me3/bivalency in cancer. Altogether, our findings suggest that bivalency protects reversibly repressed genes from irreversible silencing and that loss of H3K4me3 may make them more susceptible to aberrant DNA methylation in diseases such as cancer. Bivalency may thus represent a distinct regulatory mechanism for maintaining epigenetic plasticity.

2.
Nat Commun ; 12(1): 4992, 2021 08 17.
Article in English | MEDLINE | ID: mdl-34404777

ABSTRACT

Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, which run for several weeks or even years in data acquisition. This inevitably introduces unwanted intra- and inter-batch variations over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large scale metabolomics studies. Our tools not only provide a strategy for large scale data normalisation, but also provide guidance on the design strategy for large omics studies.


Subjects
Metabolomics/methods , Liquid Chromatography , Humans , Mass Spectrometry/methods , Biological Models , Workflow
3.
Elife ; 10, 2021 07 13.
Article in English | MEDLINE | ID: mdl-34253290

ABSTRACT

The phosphoinositide 3-kinase (PI3K)-Akt network is tightly controlled by feedback mechanisms that regulate signal flow and ensure signal fidelity. A rapid overshoot in insulin-stimulated recruitment of Akt to the plasma membrane has previously been reported, which is indicative of negative feedback operating on acute timescales. Here, we show that Akt itself engages this negative feedback by phosphorylating insulin receptor substrate (IRS) 1 and 2 on a number of residues. Phosphorylation results in the depletion of plasma membrane-localised IRS1/2, reducing the pool available for interaction with the insulin receptor. Together these events limit plasma membrane-associated PI3K and phosphatidylinositol (3,4,5)-trisphosphate (PIP3) synthesis. We identified two Akt-dependent phosphorylation sites in IRS2 at S306 (S303 in mouse) and S577 (S573 in mouse) that are key drivers of this negative feedback. These findings establish a novel mechanism by which the kinase Akt acutely controls PIP3 abundance, through post-translational modification of the IRS scaffold.

4.
STAR Protoc ; 2(2): 100585, 2021 Jun 18.
Article in English | MEDLINE | ID: mdl-34151303

ABSTRACT

Analysis of phosphoproteomic data requires advanced computational methodologies. To this end, we developed PhosR, a set of tools and methodologies implemented in R to allow the comprehensive analysis of phosphoproteomic data. PhosR enables processing steps such as imputation and normalization, and functional analyses such as kinase activity inference and signalome construction. Together, these tools facilitate interpretation and discovery from large-scale phosphoproteomic data sets. For complete details on the use and execution of this protocol, please refer to Kim et al. (2021).

5.
Cell Regen ; 10(1): 20, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33931812

ABSTRACT

Identifying genes that define cell identity is a requisite step for characterising cell types and cell states and predicting cell fate choices. By far the most widely used approach for this task is based on differential expression (DE) of genes, whereby shifts in mean expression are used as the primary statistic for identifying gene transcripts that are specific to cell types and states. While DE-based methods are useful for pinpointing genes that discriminate cell types, their reliance on measuring differences in mean expression may not reflect the biological attributes of cell identity genes. Here, we highlight the quest for non-DE methods and provide an overview of these methods and their applications to identify genes that define cell identity and functionality.

6.
Cell Death Discov ; 7(1): 81, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33863878

ABSTRACT

Both tumour suppressive and oncogenic functions have been reported for dual-specificity tyrosine phosphorylation-regulated kinase 1A (DYRK1A). Herein, we performed a detailed investigation to delineate the role of DYRK1A in glioblastoma. Our phosphoproteomic and mechanistic studies show that DYRK1A induces degradation of cyclin B by phosphorylating CDC23, which is necessary for the function of the anaphase-promoting complex, a ubiquitin ligase that degrades mitotic proteins. DYRK1A inhibition leads to the accumulation of cyclin B and activation of CDK1. Importantly, we established that the phenotypic response of glioblastoma cells to DYRK1A inhibition depends on both retinoblastoma (RB) expression and the degree of residual DYRK1A activity. Moderate DYRK1A inhibition leads to moderate cyclin B accumulation, CDK1 activation and increased proliferation in RB-deficient cells. In RB-proficient cells, cyclin B/CDK1 activation in response to DYRK1A inhibition is neutralized by the RB pathway, resulting in an unchanged proliferation rate. In contrast, complete DYRK1A inhibition with high doses of inhibitors results in massive cyclin B accumulation, saturation of CDK1 activity and cell cycle arrest, regardless of RB status. These findings provide new insights into the complexity of context-dependent DYRK1A signalling in cancer cells.

7.
iScience ; 24(2): 102118, 2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33659881

ABSTRACT

Insulin's activation of PI3K/Akt signaling stimulates glucose uptake by enhancing delivery of GLUT4 to the cell surface. Here we examined the origins of intercellular heterogeneity in insulin signaling. Akt activation alone accounted for ~25% of the variance in GLUT4, indicating that additional sources of variance exist. The Akt and GLUT4 responses were highly reproducible within the same cell, suggesting the variance is between cells (extrinsic) and not within cells (intrinsic). Generalized mechanistic models (supported by experimental observations) demonstrated that the correlation between the steady-state levels of two measured signaling processes decreases with increasing distance from each other and that intercellular variation in protein expression (as an example of extrinsic variance) is sufficient to account for the variance in and between Akt and GLUT4. Thus, the response of a population to insulin signaling is underpinned by considerable single-cell heterogeneity that is largely driven by variance in gene/protein expression between cells.
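
As a rough reading aid only: assuming the ~25% figure is the coefficient of determination from a single-predictor linear fit of the GLUT4 response on Akt activation (the abstract does not say how it was computed), the implied correlation and the unexplained share of variance would be:

```latex
R^2 = r^2 \approx 0.25
\quad\Longrightarrow\quad
|r| \approx 0.5,
\qquad
1 - R^2 \approx 0.75
```

Under that assumption, about three quarters of the GLUT4 variance would not be attributable to Akt activation, which matches the abstract's statement that additional sources of variance exist.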

8.
J Hematol Oncol ; 14(1): 22, 2021 02 02.
Article in English | MEDLINE | ID: mdl-33531041

ABSTRACT

Genetic heterogeneity of tumors is closely related to their clonal evolution, phenotypic diversity and treatment resistance, and such heterogeneity has only been characterized at single-cell sub-chromosomal scale in liver cancer. Here we reconstructed the single-variant resolution clonal evolution in human liver cancer based on single-cell mutational profiles. The results indicated that key genetic events occurred early during tumorigenesis, and an early metastasis followed by independent evolution was observed in primary liver tumor and intrahepatic metastatic portal vein tumor thrombus. By parallel single-cell RNA-Seq, the transcriptomic phenotype of HCC was found to be related to genetic heterogeneity. For the first time we reconstructed the single-cell and single-variant clonal evolution in human liver cancer, and dissection of both genetic and phenotypic heterogeneity will facilitate better understanding of their relationship.


Subjects
Hepatocellular Carcinoma/genetics , Clonal Evolution , Liver Neoplasms/genetics , Humans , Mutation , Single-Cell Analysis , Cultured Tumor Cells
9.
Cell Rep ; 34(8): 108771, 2021 02 23.
Article in English | MEDLINE | ID: mdl-33626354

ABSTRACT

Mass spectrometry (MS)-based phosphoproteomics has revolutionized our ability to profile phosphorylation-based signaling in cells and tissues on a global scale. To infer the action of kinases and signaling pathways in phosphoproteomic experiments, we present PhosR, a set of tools and methodologies implemented in a suite of R packages facilitating comprehensive analysis of phosphoproteomic data. By applying PhosR to both published and new phosphoproteomic datasets, we demonstrate capabilities in data imputation and normalization by using a set of "stably phosphorylated sites" and in functional analysis for inferring active kinases and signaling pathways. In particular, we introduce a "signalome" construction method for identifying a collection of signaling modules to summarize and visualize the interaction of kinases and their collective actions on signal transduction. Together, our data and findings demonstrate the utility of PhosR in processing and generating biological knowledge from MS-based phosphoproteomic data.

10.
Stem Cells Transl Med ; 10(3): 492-505, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33145960

ABSTRACT

The differentiation of human stem cells into insulin-secreting beta-like cells holds great promise to treat diabetes. Current protocols drive stem cells through stages of directed differentiation and maturation and produce cells that secrete insulin in response to glucose. Further refinements are now needed to faithfully phenocopy the responses of normal beta cells. A critical factor in normal beta cell behavior is the islet microenvironment, which plays a central role in beta cell survival, proliferation, gene expression and secretion. One important influence on native cell responses is the capillary basement membrane. In adult islets, each beta cell makes a point of contact with basement membrane protein secreted by vascular endothelial cells, resulting in structural and functional polarization. Interaction with basement membrane proteins triggers local activation of focal adhesions, cell orientation, and targeting of insulin secretion. This study aims to identify the role of basement membrane proteins in the structure and function of human embryonic stem cell and induced pluripotent stem cell-derived beta cells. Here, we show that differentiated human stem cell-derived spheroids do contain basement membrane proteins as a diffuse web-like structure. However, the beta-like cells within the spheroid do not polarize in response to this basement membrane. We demonstrate that 2D culture of the differentiated beta cells onto basement membrane proteins enforces cell polarity and favorably alters glucose-dependent insulin secretion.

11.
Mol Cell Proteomics ; 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938751

ABSTRACT

Many cell surface and secreted proteins are modified by the covalent addition of glycans that play an important role in the development of multicellular organisms. These glycan modifications enable communication between cells and the extracellular matrix via interactions with specific glycan-binding lectins and the regulation of receptor-mediated signaling. Aberrant protein glycosylation has been associated with the development of several muscular diseases, suggesting essential glycan- and lectin-mediated functions in myogenesis and muscle development, but our molecular understanding of the precise glycans, catalytic enzymes and lectins involved remains only partial. Here, we quantified dynamic remodeling of the membrane-associated proteome during a time-course of myogenesis in cell culture. We observed wide-spread changes in the abundance of several important lectins and enzymes facilitating glycan biosynthesis. Glycomics-based quantification of released N-linked glycans confirmed remodeling of the glycome consistent with the regulation of glycosyltransferases and glycosidases responsible for their formation, including a previously unknown di-galactose-to-sialic acid switch supporting a functional role of these glycoepitopes in myogenesis. Furthermore, dynamic quantitative glycoproteomic analysis with multiplexed stable isotope labelling and analysis of enriched glycopeptides with multiple fragmentation approaches identified glycoproteins modified by these regulated glycans, including several integrins and growth factor receptors. Myogenesis was also associated with the regulation of several lectins, most notably the up-regulation of galectin-1 (LGALS1). CRISPR/Cas9-mediated deletion of Lgals1 inhibited differentiation and myotube formation, suggesting an early functional role of galectin-1 in the myogenic program.
Importantly, similar changes in N-glycosylation and the up-regulation of galectin-1 during postnatal skeletal muscle development were observed in mice. Treatment of new-born mice with recombinant adeno-associated viruses to overexpress galectin-1 in the musculature resulted in enhanced muscle mass. Our data form a valuable resource to further understand the glycobiology of myogenesis and will aid the development of intervention strategies to promote healthy muscle development or regeneration.

12.
NPJ Syst Biol Appl ; 6(1): 22, 2020 07 16.
Article in English | MEDLINE | ID: mdl-32678105

ABSTRACT

Temporal changes in omics events can now be routinely measured; however, current analysis methods are often inadequate, especially for multiomics experiments. We report a novel analysis method that can infer event ordering at better temporal resolution than the experiment, and integrates omic events into two concise visualizations (event maps and sparklines). Testing our method gave results well-correlated with prior knowledge and indicated it streamlines analysis of time-series data.


Subjects
Computational Biology/methods , Proteomics/methods , Algorithms , Computer Simulation , Statistical Data Interpretation , Software , Spatio-Temporal Analysis
13.
Mol Syst Biol ; 16(6): e9389, 2020 06.
Article in English | MEDLINE | ID: mdl-32567229

ABSTRACT

Automated cell type identification is a key computational challenge in single-cell RNA-sequencing (scRNA-seq) data. To capitalise on the large collection of well-annotated scRNA-seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single-cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state-of-the-art methodology in automated cell type identification from scRNA-seq data.

14.
Bioinformatics ; 36(14): 4137-4143, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32353146

ABSTRACT

MOTIVATION: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand-receptor interaction analysis and interactive web-based visualization of CITE-seq data. RESULTS: We demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single-modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate CiteFuse for predicting ligand-receptor interactions by using multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data. AVAILABILITY AND IMPLEMENTATION: CiteFuse is freely available at http://shiny.maths.usyd.edu.au/CiteFuse/ as an online web service and at https://github.com/SydneyBioX/CiteFuse/ as an R package. CONTACT: pengyi.yang@sydney.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Transcriptome , Epitopes , Gene Expression Profiling , RNA , RNA Sequence Analysis , Single-Cell Analysis
15.
J Invest Dermatol ; 140(1): 212-222.e11, 2020 01.
Article in English | MEDLINE | ID: mdl-31254517

ABSTRACT

Actinic keratosis, Bowen's disease and cutaneous squamous cell carcinoma (cSCC) are heterogeneous keratinocytic skin lesions. Biomarkers that can accurately stratify these lesion types are needed to support a new paradigm of personalized and precise management of skin neoplasia. In this paper, we used a data independent acquisition proteomics workflow, sequential window acquisition of all theoretical mass spectra, to analyze formalin-fixed paraffin-embedded samples of normal skin and keratinocytic skin lesions, including well-differentiated, moderately differentiated and poorly differentiated cSCC lesions. We quantified 3,574 proteins across the 93 samples studied. Differential abundance analysis identified 19, 5, and 6 protein markers exclusive to actinic keratosis, Bowen's disease and cSCC lesions, respectively. Among cSCC lesions of various levels of tumor differentiation, 118, 230, and 17 proteins showed potential as biomarkers of well-differentiated, moderately differentiated and poorly differentiated cSCC lesions, respectively. Bioinformatics analysis revealed that actinic keratosis and cSCC lesions were associated with decreased apoptosis, and Bowen's disease lesions with over-representation of the DNA damage repair pathway. Differential expression of alternatively spliced FGFR2, Rho guanosine triphosphatase signaling, and RNA metabolism proteins was associated with the level of cSCC tumor differentiation. Proteome profiles also separated keratinocytic skin lesion subtypes on principal components analysis. Overall, protein markers have excellent potential to discriminate keratinocytic skin lesion subtypes and facilitate new diagnostic and therapeutic strategies.


Subjects
Biomarkers/metabolism , Bowen's Disease/metabolism , Squamous Cell Carcinoma/metabolism , Actinic Keratosis/metabolism , Proteomics/methods , Skin Neoplasms/metabolism , Skin/metabolism , Bowen's Disease/diagnosis , Carcinogenesis , Squamous Cell Carcinoma/diagnosis , Cell Differentiation , Computational Biology , DNA Repair , Differential Diagnosis , Disease Progression , GTP Phosphohydrolases/genetics , GTP Phosphohydrolases/metabolism , Humans , Actinic Keratosis/diagnosis , Principal Component Analysis , Fibroblast Growth Factor Receptor Type 2/genetics , Fibroblast Growth Factor Receptor Type 2/metabolism , Signal Transduction , Skin/pathology , Skin Neoplasms/diagnosis
16.
Nucleic Acids Res ; 48(4): 1828-1842, 2020 02 28.
Article in English | MEDLINE | ID: mdl-31853542

ABSTRACT

The developmental potential of cells, termed pluripotency, is highly dynamic and progresses through a continuum of naive, formative and primed states. Pluripotency progression of mouse embryonic stem cells (ESCs) from naive to formative and primed state is governed by transcription factors (TFs) and their target genes. Genomic techniques have uncovered a multitude of TF binding sites in ESCs, yet a major challenge lies in identifying target genes from functional binding sites and reconstructing dynamic transcriptional networks underlying pluripotency progression. Here, we integrated time-resolved 'trans-omic' datasets together with TF binding profiles and chromatin conformation data to identify target genes of a panel of TFs. Our analyses revealed that naive TF target genes are more likely to be TFs themselves than those of formative TFs, suggesting denser hierarchies among naive TFs. We also discovered that formative TF target genes are marked by permissive epigenomic signatures in the naive state, indicating that they are poised for expression prior to the initiation of pluripotency transition to the formative state. Finally, our reconstructed transcriptional networks pinpointed the precise timing from naive to formative pluripotency progression and enabled the spatiotemporal mapping of differentiating ESCs to their in vivo counterparts in developing embryos.


Subjects
Embryonic Development/genetics , Mouse Embryonic Stem Cells/metabolism , Pluripotent Stem Cells/metabolism , Transcription Factors/genetics , Animals , Binding Sites/genetics , Cell Differentiation/genetics , Chromatin/genetics , Developmental Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome/genetics , Mice
17.
Mol Cell Proteomics ; 20: 100030, 2020 Dec 19.
Article in English | MEDLINE | ID: mdl-33583770

ABSTRACT

Many cell surface and secreted proteins are modified by the covalent addition of glycans that play an important role in the development of multicellular organisms. These glycan modifications enable communication between cells and the extracellular matrix via interactions with specific glycan-binding lectins and the regulation of receptor-mediated signaling. Aberrant protein glycosylation has been associated with the development of several muscular diseases, suggesting essential glycan- and lectin-mediated functions in myogenesis and muscle development, but our molecular understanding of the precise glycans, catalytic enzymes, and lectins involved remains only partially understood. Here, we quantified dynamic remodeling of the membrane-associated proteome during a time-course of myogenesis in cell culture. We observed wide-spread changes in the abundance of several important lectins and enzymes facilitating glycan biosynthesis. Glycomics-based quantification of released N-linked glycans confirmed remodeling of the glycome consistent with the regulation of glycosyltransferases and glycosidases responsible for their formation including a previously unknown digalactose-to-sialic acid switch supporting a functional role of these glycoepitopes in myogenesis. Furthermore, dynamic quantitative glycoproteomic analysis with multiplexed stable isotope labeling and analysis of enriched glycopeptides with multiple fragmentation approaches identified glycoproteins modified by these regulated glycans including several integrins and growth factor receptors. Myogenesis was also associated with the regulation of several lectins, most notably the upregulation of galectin-1 (LGALS1). CRISPR/Cas9-mediated deletion of Lgals1 inhibited differentiation and myotube formation, suggesting an early functional role of galectin-1 in the myogenic program. 
Importantly, similar changes in N-glycosylation and the upregulation of galectin-1 during postnatal skeletal muscle development were observed in mice. Treatment of new-born mice with recombinant adeno-associated viruses to overexpress galectin-1 in the musculature resulted in enhanced muscle mass. Our data form a valuable resource to further understand the glycobiology of myogenesis and will aid the development of intervention strategies to promote healthy muscle development or regeneration.

18.
BMC Genomics ; 20(Suppl 9): 913, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874628

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a fast-emerging technology allowing global transcriptome profiling at the single-cell level. Cell type identification from scRNA-seq data is a critical task in a variety of research areas such as developmental biology, cell reprogramming, and cancers. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Due to the incompleteness of our current knowledge and the subjectivity involved in this process, a small number of cells may be subject to mislabelling. RESULTS: Here, we propose a semi-supervised learning framework, named scReClassify, for 'post hoc' cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and next applies a semi-supervised learning method to learn and subsequently reclassify cells that are likely mislabelled initially to the most probable cell types. By using both simulated and real-world experimental datasets that profiled various tissues and biological systems, we demonstrate that scReClassify is able to accurately identify and reclassify misclassified cells to their correct cell types. CONCLUSIONS: scReClassify can be used for scRNA-seq data as a post hoc cell type classification tool to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify.


Subjects
RNA-Seq/methods , Animals , Humans , Machine Learning , Mice , Single-Cell Analysis/methods , Software
19.
BMC Bioinformatics ; 20(Suppl 19): 660, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31870278

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. RESULTS: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. CONCLUSIONS: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. 
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS.
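
The final ensemble step described above can be sketched in a few lines. This is a minimal illustration, not the scCCESS implementation: the abstract does not say which consensus method is used, so the sketch assumes a co-association (evidence accumulation) approach, omits the random projection and autoencoder steps entirely, and takes the per-run cluster labels as given. The names `co_association`, `consensus_clusters` and the `threshold` parameter are hypothetical.

```python
def co_association(partitions):
    """Fraction of clustering runs in which each pair of cells shares a cluster.

    `partitions` is a list of label lists, one per clustering run,
    all covering the same cells in the same order.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Assumed consensus rule: link cells co-clustered in more than
    `threshold` of the runs, then take connected components."""
    co = co_association(partitions)
    n = len(co)
    consensus = [-1] * n
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if consensus[j] == -1 and co[i][j] > threshold:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus


# Three runs over four cells; label names differ between runs,
# which is why the comparison is made on co-membership, not raw labels.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(runs))
```

Working on pairwise co-membership rather than raw labels sidesteps the fact that cluster numbering is arbitrary from run to run.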


Assuntos
Análise de Sequência de RNA , Algoritmos , Análise por Conglomerados , Análise de Dados , Humanos , Redes Neurais de Computação , RNA-Seq , Análise de Célula Única , Transcriptoma
20.
BMC Bioinformatics ; 20(Suppl 19): 721, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31870280

ABSTRACT

BACKGROUND: Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advancements in single cell sequencing technologies enable cell-types to be identified at the single cell level, and as a result, cell-type composition of tissues can now be studied in exquisite detail. However, a number of challenges remain with cell-type composition analysis - none of the existing methods can identify cell-types perfectly and variability related to cell sampling exists in any single cell experiment. This necessitates the development of a method for estimating uncertainty in cell-type composition. RESULTS: We developed a novel single cell differential composition (scDC) analysis method that performs differential cell-type composition analysis via bootstrap resampling. scDC captures the uncertainty associated with cell-type proportions of each subject via bias-corrected and accelerated bootstrap confidence intervals. We assessed the performance of our method using a number of simulated datasets and synthetic datasets curated from publicly available single cell datasets. In simulated datasets, scDC correctly recovered the true cell-type proportions. In synthetic datasets, the cell-type compositions returned by scDC were highly concordant with reference cell-type compositions from the original data. Since the majority of datasets tested in this study have only 2 to 5 subjects per condition, the addition of confidence intervals enabled better comparisons of compositional differences between subjects and across conditions. CONCLUSIONS: scDC is a novel statistical method for performing differential cell-type composition analysis for scRNA-seq data. It uses bootstrap resampling to estimate the standard errors associated with cell-type proportion estimates and performs significance testing through GLM and GLMM models. 
We have made this method available to the scientific community as part of the scdney (Single Cell Data Integrative Analysis) R package, available from https://github.com/SydneyBioX/scdney.
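
The bootstrap idea behind these intervals can be sketched as follows. This is a simplified illustration, not scDC itself: it uses plain percentile intervals rather than the bias-corrected and accelerated (BCa) intervals named in the abstract, it handles a single subject, and it leaves out the clustering and GLM/GLMM steps. The function name `bootstrap_proportion_ci` and its parameters are hypothetical.

```python
import random
from collections import Counter


def bootstrap_proportion_ci(labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the cell-type proportions of one subject.

    `labels` holds one cell-type label per cell. Returns
    {cell_type: (observed_proportion, lower_bound, upper_bound)}.
    """
    rng = random.Random(seed)
    n = len(labels)
    cell_types = sorted(set(labels))
    observed = {ct: count / n for ct, count in Counter(labels).items()}

    # Resample cells with replacement and record each type's proportion.
    samples = {ct: [] for ct in cell_types}
    for _ in range(n_boot):
        counts = Counter(rng.choices(labels, k=n))
        for ct in cell_types:
            samples[ct].append(counts.get(ct, 0) / n)

    # Indices of the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)

    result = {}
    for ct in cell_types:
        ordered = sorted(samples[ct])
        result[ct] = (observed[ct], ordered[lo_idx], ordered[hi_idx])
    return result


# Toy subject with 70 cells of type "A" and 30 of type "B".
cells = ["A"] * 70 + ["B"] * 30
for cell_type, (obs, low, high) in bootstrap_proportion_ci(cells).items():
    print(cell_type, round(obs, 2), round(low, 2), round(high, 2))
```

With only a few subjects per condition, as the abstract notes, intervals of this kind give a sense of how much a proportion could move through cell sampling alone.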


Subjects
Single-Cell Analysis/methods , Humans