ABSTRACT
MOTIVATION: Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. RESULTS: To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, using two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces results highly similar to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results while reducing the time spent on manual segmentation. AVAILABILITY AND IMPLEMENTATION: The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.
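The soft clustering described above can be illustrated with a minimal Python sketch. GammaGateR itself is an R package, and this is not its code. The sketch assumes a two-component gamma mixture fitted by expectation-maximization in which every M-step uses a weighted form of a closed-form gamma estimator (scale taken as the weighted covariance of x and ln x, shape as the weighted mean divided by that scale), so no inner numerical optimization is needed. The function names, the median-split initialization and the fixed iteration count are illustrative assumptions, and the user-specified constraints mentioned in the abstract are not modeled.

```python
import math
import random

# Hypothetical sketch of a closed-form gamma mixture; not GammaGateR's R implementation.


def weighted_gamma_fit(x, w):
    """Closed-form gamma (shape, scale) estimates with per-cell weights w.

    Assumed weighted analogue of the closed-form estimator:
    scale = weighted cov(x, ln x), shape = weighted mean(x) / scale.
    """
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swl = sum(wi * math.log(xi) for wi, xi in zip(w, x))
    swxl = sum(wi * xi * math.log(xi) for wi, xi in zip(w, x))
    theta = (sw * swxl - swl * swx) / sw ** 2
    return (swx / sw) / theta, theta


def gamma_logpdf(x, k, theta):
    return (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)


def posterior(xi, pi_pos, neg, pos):
    """Soft-cluster probability that a cell with intensity xi is marker-positive."""
    a = math.log(1 - pi_pos) + gamma_logpdf(xi, *neg)
    b = math.log(pi_pos) + gamma_logpdf(xi, *pos)
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return math.exp(b - m) / (math.exp(a - m) + math.exp(b - m))


def fit_gamma_mixture(x, n_iter=100):
    """EM for a two-component gamma mixture; each M-step is closed form."""
    med = sorted(x)[len(x) // 2]
    post = [1.0 if xi > med else 0.0 for xi in x]  # start from a median split
    for _ in range(n_iter):
        neg = weighted_gamma_fit(x, [1.0 - p for p in post])  # marker-negative
        pos = weighted_gamma_fit(x, post)  # marker-positive
        pi_pos = sum(post) / len(x)
        post = [posterior(xi, pi_pos, neg, pos) for xi in x]
    return sum(post) / len(x), neg, pos, post
```

The first return value estimates the marker-positive cell proportion, and each entry of the last one is a probability-valued phenotype rather than a hard gate; thresholding it at 0.5 would recover a conventional positive/negative call.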
Subjects
Algorithms , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Software , Image Processing, Computer-Assisted/methods , Female , Ovarian Neoplasms/metabolism , Immunofluorescence/methods , Biomarkers/metabolism
ABSTRACT
Colorectal cancer exhibits dynamic cellular and genetic heterogeneity during progression from precursor lesions toward malignancy. Analysis of spatial multi-omic data from 31 human colorectal specimens enabled phylogeographic mapping of tumor evolution that revealed individualized progression trajectories and accompanying microenvironmental and clonal alterations. Phylogeographic mapping ordered genetic events, classified tumors by their evolutionary dynamics, and placed clonal regions along global pseudotemporal progression trajectories encompassing the chromosomal instability (CIN+) and hypermutated (HM) pathways. Integrated single-cell and spatial transcriptomic data revealed recurring epithelial programs and infiltrating immune states along progression pseudotime. We discovered an immune exclusion signature (IEX), consisting of the extracellular matrix regulators DDR1, TGFBI, PAK4, and DPEP1, that tracks with CIN+ tumor progression, is associated with reduced cytotoxic cell infiltration, and shows prognostic value in independent cohorts. This spatial multi-omic atlas provides insights into colorectal tumor-microenvironment co-evolution, serving as a resource for stratification and targeted treatments.
Subjects
Colorectal Neoplasms , Microsatellite Instability , Tumor Microenvironment , Humans , Chromosomal Instability/genetics , Colorectal Neoplasms/pathology , Gene Expression Profiling , p21-Activated Kinases/genetics , Phylogeny , Mutation , Disease Progression , Prognosis
ABSTRACT
Droplet-based single-cell RNA-seq (scRNA-seq) data are plagued by ambient contamination caused by nucleic acid material released by dead and dying cells. This material is mixed into the buffer and co-encapsulated with cells, lowering the signal-to-noise ratio. Although computational methods exist to remove ambient contamination post hoc, it remains uncertain how reliably algorithms can generate high-quality data from low-quality sources. Here, we assess data quality before filtering using a set of quantitative, contamination-based metrics that perform better than standard metrics. Through a series of controlled experiments, we report improvements beyond tissue dissociation that can minimize ambient contamination, namely cell fixation, improved cell loading, microfluidic dilution, and nuclei versus cell preparation; many of these parameters are inaccessible on commercial platforms. We provide end-users with insights into factors that can guide their decisions about optimizations that minimize ambient contamination, along with metrics to assess data quality.
ABSTRACT
Advanced solid cancers are complex assemblies of tumor, immune, and stromal cells characterized by high intratumoral variation. We use highly multiplexed tissue imaging, 3D reconstruction, spatial statistics, and machine learning to identify cell types and states underlying morphological features of known diagnostic and prognostic significance in colorectal cancer. Quantitation of these features in high-plex marker space reveals recurrent transitions from one tumor morphology to the next, some of which are coincident with long-range gradients in the expression of oncogenes and epigenetic regulators. At the tumor invasive margin, where tumor, normal, and immune cells compete, T cell suppression involves multiple cell types and 3D imaging shows that seemingly localized 2D features such as tertiary lymphoid structures are commonly interconnected and have graded molecular properties. Thus, while cancer genetics emphasizes the importance of discrete changes in tumor state, whole-specimen imaging reveals large-scale morphological and molecular gradients analogous to those in developing tissues.
Subjects
Adenocarcinoma , Colorectal Neoplasms , Humans , Adenocarcinoma/pathology , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Colorectal Neoplasms/pathology , Image Processing, Computer-Assisted , Oncogenes , Tumor Microenvironment
ABSTRACT
Defining the complex role of the microbiome in colorectal cancer and the discovery of novel, protumorigenic microbes are areas of active investigation. In the present study, culturing and reassociation experiments revealed that toxigenic strains of Clostridioides difficile drove the tumorigenic phenotype of a subset of colorectal cancer patient-derived mucosal slurries in germ-free ApcMin/+ mice. Tumorigenesis was dependent on the C. difficile toxin TcdB and was associated with induction of Wnt signaling, reactive oxygen species, and protumorigenic mucosal immune responses marked by the infiltration of activated myeloid cells and IL17-producing lymphoid and innate lymphoid cell subsets. These findings suggest that chronic colonization with toxigenic C. difficile is a potential driver of colorectal cancer in patients. SIGNIFICANCE: Colorectal cancer is a leading cause of cancer and cancer-related deaths worldwide, with a multifactorial etiology that likely includes procarcinogenic bacteria. Using human colon cancer specimens, culturing, and murine models, we demonstrate that chronic infection with the enteric pathogen C. difficile is a previously unrecognized contributor to colonic tumorigenesis. See related commentary by Jain and Dudeja, p. 1838. This article is highlighted in the In This Issue feature, p. 1825.
Subjects
Bacterial Toxins , Clostridioides difficile , Colonic Neoplasms , Colorectal Neoplasms , Animals , Bacterial Toxins/genetics , Bacterial Toxins/metabolism , Carcinogenesis , Clostridioides , Humans , Immunity, Innate , Lymphocytes/metabolism , Mice
ABSTRACT
Increasingly, highly multiplexed tissue imaging methods are used to profile protein expression at the single-cell level. However, a critical limitation is the lack of robust cell segmentation tools for tissue sections. We present Multiplexed Image Resegmentation of Internal Aberrant Membranes (MIRIAM) that combines (a) a pipeline for cell segmentation and quantification that incorporates machine learning-based pixel classification to define cellular compartments, (b) a novel method for extending incomplete cell membranes, and (c) a deep learning-based cell shape descriptor. Using human colonic adenomas as an example, we show that MIRIAM is superior to widely utilized segmentation methods and provides a pipeline that is broadly applicable to different imaging platforms and tissue types.
Subjects
Deep Learning , Cell Shape , Humans , Image Processing, Computer-Assisted/methods , Machine Learning
ABSTRACT
Colorectal cancers (CRCs) arise from precursor polyps whose cellular origins, molecular heterogeneity, and immunogenic potential may reveal diagnostic and therapeutic insights when analyzed at high resolution. We present a single-cell transcriptomic and imaging atlas of the two most common human colorectal polyps, conventional adenomas and serrated polyps, and their resulting CRC counterparts. Integrative analysis of 128 datasets from 62 participants reveals adenomas arise from WNT-driven expansion of stem cells, while serrated polyps derive from differentiated cells through gastric metaplasia. Metaplasia-associated damage is coupled to a cytotoxic immune microenvironment preceding hypermutation, driven partly by antigen-presentation differences associated with tumor cell-differentiation status. Microsatellite unstable CRCs contain distinct non-metaplastic regions where tumor cells acquire stem cell properties and cytotoxic immune cells are depleted. Our multi-omic atlas provides insights into malignant progression of colorectal polyps and their microenvironment, serving as a framework for precision surveillance and prevention of CRC.
Subjects
Colonic Polyps/pathology , Colorectal Neoplasms/pathology , Tumor Microenvironment , Adaptive Immunity , Adenoma/genetics , Adenoma/pathology , Adult , Aged , Animals , Carcinogenesis/genetics , Carcinogenesis/pathology , Cell Death , Cell Differentiation , Colonic Polyps/genetics , Colonic Polyps/immunology , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Disease Progression , Female , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genetic Heterogeneity , Humans , Male , Mice , Middle Aged , Mutation/genetics , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , RNA-Seq , Reproducibility of Results , Single-Cell Analysis , Tumor Microenvironment/immunology
ABSTRACT
The Gut Cell Atlas (GCA), an initiative funded by the Helmsley Charitable Trust, seeks to create a reference platform for understanding the human gut, with a specific focus on Crohn's disease. Although the GCA's primary focus is single-cell profiling, we seek to provide a framework for integrating analyses of other multimodal data, such as electronic health record data, radiological images, and histology tissue and images. Herein, we use the Research Electronic Data Capture (REDCap) system as the central tool for a secure web application that supports restricted access to protected health information (PHI). Our innovations address the challenges of tracking all specimens and biopsies, validating manual data entry at scale, and sharing organizational data across the group. We present a scalable, cross-platform barcode printing and record system that integrates with REDCap. The central informatics infrastructure supporting our design is a tuple table that tracks longitudinal data entry and samples. The data collected as of December 2020 are illustrated with the types and formats of data that the system collects. We estimate that one terabyte of data storage is needed per patient study. Our proposed informatics system for data sharing addresses the challenges of integrating physical sample tracking, large files, and manual data entry with REDCap.
ABSTRACT
INTRODUCTION: The COVID-19 pandemic, caused by SARS-CoV-2, has exposed health disparities throughout the USA, particularly among racial and ethnic minorities. As a result, there is a need for data-driven approaches to pinpoint the unique constellation of clinical and social determinants of health (SDOH) risk factors that give rise to poor patient outcomes following infection in US communities. METHODS: We combined county-level COVID-19 testing data, COVID-19 vaccination rates, and SDOH information in Tennessee. Between February and May 2021, we trained machine learning models semimonthly on these datasets to predict COVID-19 incidence in Tennessee counties. We then analyzed SDOH data features at each time point to rank the impact of each feature on model performance. RESULTS: Our results indicate that COVID-19 vaccination rates play a crucial role in determining future COVID-19 disease risk. Beginning in mid-March 2021, higher vaccination rates were significantly correlated with lower predicted COVID-19 case growth. Further, as the relative importance of COVID-19 vaccination features grew, the impact of demographic SDOH features such as age, race, and ethnicity decreased, while the impact of socioeconomic and environmental factors, including access to healthcare and transportation, increased. CONCLUSION: Incorporating a data framework to track the evolving patterns of community-level SDOH risk factors could provide policy-makers with additional data resources to improve health equity and resilience to future public health emergencies.
Subjects
COVID-19 , Social Determinants of Health , Vaccination/statistics & numerical data , COVID-19/epidemiology , COVID-19 Testing , COVID-19 Vaccines/administration & dosage , Humans , Machine Learning , Models, Theoretical , Tennessee/epidemiology
ABSTRACT
Single-cell RNA sequencing data require several processing steps to arrive at interpretable results. While commercial platforms can serve as "one-stop shops" for data analysis, they sacrifice the flexibility required for customized analyses and often do not transfer well between experimental systems. For instance, there is no universal solution for discriminating informative from uninformative encapsulated cellular material, so pipeline flexibility takes priority. Here, we demonstrate a full data analysis pipeline, constructed modularly from open-source software, including tools that we have contributed. For complete details on the use and execution of this protocol, please refer to Petukhov et al. (2018), Heiser et al. (2020), and Heiser and Lau (2020).
Subjects
Databases, Nucleic Acid , RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis , Software
ABSTRACT
A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in data sets with disparate library sizes confounded by high technical noise (i.e., batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining data set-specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against conventional thresholding approaches and EmptyDrops, a popular computational method, showing greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low- and high-background data sets that dropkick's weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to data set-specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell Python packages.
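The weak-supervision idea summarized above (heuristic training labels, then a gene-based classifier that scores every barcode) can be sketched in a few lines of Python. This is a hypothetical illustration, not dropkick's implementation: it assumes total counts as the only global heuristic, labels the largest and smallest barcodes as cells and empty droplets, and uses a plain L2-penalized logistic regression fitted by gradient descent in place of dropkick's own model. All function names and parameter values are invented for the example.

```python
import math

# Hypothetical sketch of weakly supervised barcode filtering; not dropkick's code.


def heuristic_labels(totals, n_cells, n_empty):
    """Weak labels from one global heuristic (total counts): top n_cells -> 1,
    bottom n_empty -> 0, barcodes in between left unlabelled (None)."""
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    labels = [None] * len(totals)
    for i in order[:n_cells]:
        labels[i] = 1
    for i in order[len(order) - n_empty:]:
        labels[i] = 0
    return labels


def featurize(row, scale=10.0):
    """log1p of library-size-normalized counts: the model sees composition, not depth."""
    total = sum(row) or 1
    return [math.log1p(scale * c / total) for c in row]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, l2=1e-3, epochs=500):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def cell_probabilities(counts, n_cells, n_empty):
    """Cell probability for every barcode, including unlabelled borderline ones."""
    X = [featurize(row) for row in counts]
    labels = heuristic_labels([sum(row) for row in counts], n_cells, n_empty)
    train = [(x, lab) for x, lab in zip(X, labels) if lab is not None]
    w, b = train_logistic([x for x, _ in train], [lab for _, lab in train])
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
```

Because the classifier sees normalized gene composition rather than library size, a low-count barcode whose profile resembles the confidently labeled cells can still score above 0.5, which is the kind of borderline recovery the abstract describes.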
Subjects
Single-Cell Analysis , Software , Gene Expression Profiling/methods , Quality Control , RNA/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
ABSTRACT
High-dimensional data, such as those generated by single-cell RNA sequencing (scRNA-seq), present challenges in interpretation and visualization. Numerical and computational methods for dimensionality reduction allow for low-dimensional representation of genome-scale expression data for downstream clustering, trajectory reconstruction, and biological interpretation. However, a comprehensive and quantitative evaluation of the performance of these techniques has not been established. We present an unbiased framework that defines metrics of global and local structure preservation in dimensionality reduction transformations. Using discrete and continuous real-world and synthetic scRNA-seq datasets, we show how input cell distribution and method parameters largely determine the preservation of global, local, and organizational data structure by 11 common dimensionality reduction methods.
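The abstract does not spell out its metrics, but two common choices make the global/local distinction concrete. The Python sketch below assumes (i) local structure is scored as the fraction of each cell's k nearest neighbors that remain neighbors after reduction, and (ii) global structure as the Pearson correlation between unique pairwise distances before and after reduction. Both functions are illustrative stand-ins and may differ from the framework's actual definitions.

```python
import math

# Illustrative (assumed) structure-preservation metrics; not the paper's definitions.


def pairwise_distances(points):
    return [[math.dist(p, q) for q in points] for p in points]


def knn_sets(dist, k):
    """Index sets of each cell's k nearest neighbours (ties broken by index)."""
    n = len(dist)
    return [set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
            for i in range(n)]


def knn_preservation(high, low, k):
    """Local structure: mean fraction of k nearest neighbours kept after reduction."""
    hi = knn_sets(pairwise_distances(high), k)
    lo = knn_sets(pairwise_distances(low), k)
    return sum(len(a & b) for a, b in zip(hi, lo)) / (k * len(high))


def distance_correlation(high, low):
    """Global structure: Pearson correlation of unique pairwise distances."""
    dh, dl = pairwise_distances(high), pairwise_distances(low)
    n = len(high)
    a = [dh[i][j] for i in range(n) for j in range(i + 1, n)]
    b = [dl[i][j] for i in range(n) for j in range(i + 1, n)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

A uniformly rescaled embedding scores 1.0 on both metrics (up to rounding for the correlation), whereas an embedding that reassigns which cell sits at which position lowers the neighbor score even though the set of coordinates is unchanged.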
Subjects
Gene Expression Profiling , Sequence Analysis, RNA , Single-Cell Analysis , Transcriptome/genetics , Algorithms , Cluster Analysis , Humans , RNA, Small Cytoplasmic/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Exome Sequencing/methods
ABSTRACT
Swi-independent 3a and 3b (Sin3a and Sin3b) are paralogous transcriptional coregulators that direct cellular differentiation, survival, and function. Here, we report that mouse Sin3a and Sin3b are coproduced in most pancreatic cells during embryogenesis but become much more enriched in endocrine cells in adults, implying continued essential roles in mature endocrine cell function. Mice with loss of Sin3a in endocrine progenitors were normal during early postnatal stages but gradually developed diabetes before weaning. These physiological defects were preceded by the compromised survival, insulin-vesicle packaging, insulin secretion, and nutrient-induced Ca2+ influx of Sin3a-deficient β-cells. RNA sequencing coupled with candidate chromatin immunoprecipitation assays revealed several genes that could be directly regulated by Sin3a in β-cells, which modulate Ca2+/ion transport, cell survival, vesicle/membrane trafficking, glucose metabolism, and stress responses. Finally, mice with loss of both Sin3a and Sin3b in multipotent embryonic pancreatic progenitors had significantly reduced islet cell mass at birth, caused by decreased endocrine progenitor production and increased β-cell death. These findings highlight the stage-specific requirements for the presumed "general" coregulators Sin3a and Sin3b in islet β-cells, with Sin3a being dispensable for differentiation but required for postnatal function and survival.