ABSTRACT
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
Subject(s)
Algorithms , Image Processing, Computer-Assisted , Machine Learning , Semantics
ABSTRACT
Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
Subject(s)
Artificial Intelligence
ABSTRACT
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: while taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although the focus is on biomedical image analysis, the addressed pitfalls generalize across application domains, with potential for transfer to other fields, and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
ABSTRACT
For medicine to fulfill its promise of personalized treatments based on a better understanding of disease biology, computational and statistical tools must exist to analyze the increasing amount of patient data that is becoming available. A particular challenge is that several types of data are being measured to cope with the complexity of the underlying systems, enhance predictive modeling and enrich molecular understanding. Here we review a number of recent approaches that specialize in the analysis of multimodal data in the context of predictive biomedicine. We focus on methods that combine different OMIC measurements with image or genome variation data. Our overview shows the diversity of methods that address analysis challenges and reveals new avenues for novel developments.
ABSTRACT
Hematopoietic stem cells (HSCs) mediate regeneration of the hematopoietic system following injury, such as following infection or inflammation. These challenges impair HSC function, but whether this functional impairment extends beyond the duration of inflammatory exposure is unknown. Unexpectedly, we observed an irreversible depletion of functional HSCs following challenge with inflammation or bacterial infection, with no evidence of any recovery up to 1 year afterward. HSCs from challenged mice demonstrated multiple cellular and molecular features of accelerated aging and developed clinically relevant blood and bone marrow phenotypes not normally observed in aged laboratory mice but commonly seen in elderly humans. In vivo HSC self-renewal divisions were absent or extremely rare during both challenge and recovery periods. The progressive, irreversible attrition of HSC function demonstrates that temporally discrete inflammatory events elicit a cumulative inhibitory effect on HSCs. This work positions early/mid-life inflammation as a mediator of lifelong defects in tissue maintenance and regeneration.
Subject(s)
Hematopoiesis , Hematopoietic Stem Cells , Aged , Aging , Animals , Bone Marrow , Humans , Inflammation , Mice
ABSTRACT
The heterogeneous nature of human CD34+ hematopoietic stem cells (HSCs) has hampered our understanding of the cellular and molecular trajectories that HSCs navigate during lineage commitment. Using various platforms, including single-cell RNA sequencing and extensive xenotransplantation, we have uncovered a previously uncharacterized human CD34+ HSC population. These CD34+EPCR+(CD38/CD45RA)- (hereafter EPCR+) HSCs have high repopulating and self-renewal abilities, reaching a stem cell frequency of ~1 in 3 cells, the highest described to date. Their unique transcriptomic wiring, which includes many gene modules associated with differentiated cell lineages, confers their multilineage output both in vivo and in vitro. At the single-cell level, EPCR+ HSCs are the most transcriptomically and functionally homogeneous human HSC population defined to date, and they can also be easily identified in post-natal tissues. Therefore, this EPCR+ population not only offers high human HSC resolution but also a well-structured hierarchical organization of human hematopoiesis at the most primitive level.
Subject(s)
Hematopoietic Stem Cells , Single-Cell Analysis , Antigens, CD34 , Cell Adhesion Molecules , Cell Lineage , Endothelial Protein C Receptor , Humans
ABSTRACT
Acute myeloid leukemia (AML) is an aggressive blood cancer with a poor prognosis. We report a comprehensive proteogenomic analysis of bone marrow biopsies from 252 uniformly treated AML patients to elucidate the molecular pathophysiology of AML in order to inform future diagnostic and therapeutic approaches. In addition to in-depth quantitative proteomics, our analysis includes cytogenetic profiling and DNA/RNA sequencing. We identify five proteomic AML subtypes, each reflecting specific biological features spanning genomic boundaries. Two of these proteomic subtypes correlate with patient outcome, but none is exclusively associated with specific genomic aberrations. Remarkably, one subtype (Mito-AML), which is captured only in the proteome, is characterized by high expression of mitochondrial proteins and confers poor outcome, with reduced remission rate and shorter overall survival on treatment with intensive induction chemotherapy. Functional analyses reveal that Mito-AML is metabolically wired toward stronger complex I-dependent respiration and is more responsive to treatment with the BCL2 inhibitor venetoclax.
Subject(s)
Leukemia, Myeloid, Acute , Proteogenomics , Humans , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , Proteomics
ABSTRACT
Standard cancer therapy targets tumor cells without considering possible damage to the tumor microenvironment that could impair therapy response. In rectal cancer patients, we find that inflammatory cancer-associated fibroblasts (iCAFs) are associated with poor chemoradiotherapy response. Employing a murine rectal cancer model or patient-derived tumor organoids and primary stroma cells, we show that, upon irradiation, interleukin-1α (IL-1α) not only polarizes cancer-associated fibroblasts toward the inflammatory phenotype but also triggers oxidative DNA damage, thereby predisposing iCAFs to p53-mediated therapy-induced senescence, which in turn results in chemoradiotherapy resistance and disease progression. Consistently, IL-1 inhibition, prevention of iCAF senescence, or senolytic therapy sensitizes mice to irradiation, while lower serum levels of IL-1 receptor antagonist in rectal cancer patients correlate with poor prognosis. Collectively, we unravel a critical role for iCAFs in rectal cancer therapy resistance and identify IL-1 signaling as an attractive target for stroma repolarization and prevention of cancer-associated fibroblast senescence.
Subject(s)
Cancer-Associated Fibroblasts/metabolism , Drug Resistance, Neoplasm , Rectal Neoplasms/metabolism , Tumor Microenvironment , Animals , Biomarkers , Cancer-Associated Fibroblasts/pathology , Cell Line, Tumor , Cellular Senescence/drug effects , Cellular Senescence/genetics , Cytokines/genetics , Cytokines/metabolism , DNA Damage , Disease Models, Animal , Disease Susceptibility , Gene Expression Profiling , Heterografts , High-Throughput Nucleotide Sequencing , Humans , Immunohistochemistry , Kaplan-Meier Estimate , Mice , Neoadjuvant Therapy , Prognosis , Rectal Neoplasms/drug therapy , Rectal Neoplasms/etiology , Rectal Neoplasms/pathology , Signal Transduction , Tumor Microenvironment/genetics
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Recent developments in stem cell biology have enabled the study of cell fate decisions in early human development that are impossible to study in vivo. However, how development varies across individuals, and in particular the influence of common genetic variants during this process, has not been characterised. Here, we exploit human iPS cell lines from 125 donors, a pooled experimental design, and single-cell RNA-sequencing to study population variation of endoderm differentiation. We identify molecular markers that are predictive of the differentiation efficiency of individual lines, and utilise heterogeneity in the genetic background across individuals to map hundreds of expression quantitative trait loci that influence expression dynamically during differentiation and across cellular contexts.
Subject(s)
Cell Differentiation/genetics , Gene Expression/genetics , Induced Pluripotent Stem Cells/cytology , Cell Line , Endoderm/cytology , Female , Gene Expression Profiling , Gene-Environment Interaction , Genetic Association Studies , Genetic Heterogeneity , Humans , Male , Quantitative Trait Loci , Single-Cell Analysis
ABSTRACT
Formation of the three primary germ layers during gastrulation is an essential step in the establishment of the vertebrate body plan and is associated with major transcriptional changes [1-5]. Global epigenetic reprogramming accompanies these changes [6-8], but the role of the epigenome in regulating early cell-fate choice remains unresolved, and the coordination between different molecular layers is unclear. Here we describe a single-cell multi-omics map of chromatin accessibility, DNA methylation and RNA expression during the onset of gastrulation in mouse embryos. The initial exit from pluripotency coincides with the establishment of a global repressive epigenetic landscape, followed by the emergence of lineage-specific epigenetic patterns during gastrulation. Notably, cells committed to mesoderm and endoderm undergo widespread coordinated epigenetic rearrangements at enhancer marks, driven by ten-eleven translocation (TET)-mediated demethylation and a concomitant increase of accessibility. By contrast, the methylation and accessibility landscape of ectodermal cells is already established in the early epiblast. Hence, regulatory elements associated with each germ layer are either epigenetically primed or remodelled before cell-fate decisions, providing the molecular framework for a hierarchical emergence of the primary germ layers.
Subject(s)
DNA Methylation , Epigenesis, Genetic , Gastrula/cytology , Gastrula/metabolism , Gastrulation/genetics , Gene Expression Regulation, Developmental , RNA/genetics , Single-Cell Analysis , Animals , Cell Differentiation/genetics , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Demethylation , Embryoid Bodies/cytology , Endoderm/cytology , Endoderm/embryology , Endoderm/metabolism , Enhancer Elements, Genetic/genetics , Epigenome/genetics , Erythropoiesis , Factor Analysis, Statistical , Gastrula/embryology , Gastrulation/physiology , Mesoderm/cytology , Mesoderm/embryology , Mesoderm/metabolism , Mice , Pluripotent Stem Cells/cytology , Pluripotent Stem Cells/metabolism , RNA/analysis , Time Factors , Zinc Fingers
ABSTRACT
An intricate link is becoming apparent between metabolism and cellular identities. Here, we explore the basis for such a link in an in vitro model for early mouse embryonic development: from naïve pluripotency to the specification of primordial germ cells (PGCs). Using single-cell RNA-seq with statistical modelling and modulation of energy metabolism, we demonstrate a functional role for oxidative mitochondrial metabolism in naïve pluripotency. We link mitochondrial tricarboxylic acid cycle activity to IDH2-mediated production of alpha-ketoglutarate and through it, the activity of key epigenetic regulators. Accordingly, this metabolite has a role in the maintenance of naïve pluripotency as well as in PGC differentiation, likely through preserving a particular histone methylation status underlying the transient state of developmental competence for the PGC fate. We reveal a link between energy metabolism and epigenetic control of cell state transitions during a developmental trajectory towards germ cell specification, and establish a paradigm for stabilizing fleeting cellular states through metabolic modulation.
Subject(s)
Cell Differentiation/drug effects , Embryonic Stem Cells/drug effects , Germ Cells/drug effects , Ketoglutaric Acids/pharmacology , Pluripotent Stem Cells/drug effects , Animals , Cell Differentiation/genetics , Cells, Cultured , Embryo, Mammalian , Embryonic Stem Cells/physiology , Epigenesis, Genetic/drug effects , Epigenesis, Genetic/genetics , Female , Gene Expression Regulation, Developmental/drug effects , Germ Cells/physiology , Ketoglutaric Acids/metabolism , Male , Metabolic Networks and Pathways/drug effects , Metabolic Networks and Pathways/genetics , Mice , Mice, Inbred C57BL , Mice, Transgenic , Pluripotent Stem Cells/physiology
ABSTRACT
BACKGROUND AND PURPOSE: To evaluate spatial differences in dose distributions of the ano-rectal wall (ARW) using dose-surface maps (DSMs) between prostate cancer patients receiving intensity-modulated radiation therapy with and without an implantable rectum spacer (IMRT+IRS and IMRT-IRS, respectively), and to correlate these differences with late gastro-intestinal (GI) toxicities using validated spatial and non-spatial normal-tissue complication probability (NTCP) models. MATERIALS AND METHODS: DSMs of the ARW were generated for 26 patients. From the DSMs, various shape-based dose measures were calculated at different dose levels: lateral extent, longitudinal extent, and eccentricity. The contiguity of the ARW dose distribution was assessed by the contiguous dose-surface histogram (cDSH). Predicted complication rates between IMRT+IRS and IMRT-IRS plans were assessed using a spatial NTCP model and compared against a non-spatial NTCP model. RESULTS: Dose-surface maps were generated for prostate radiotherapy using an IRS. Lateral extent, longitudinal extent and cDSH were significantly lower for IMRT+IRS than for IMRT-IRS at high dose levels. The largest significant differences were observed for cDSH at dose levels >50 Gy, followed by lateral extent at doses >57 Gy, and longitudinal extent in the anterior and superior-inferior directions. Significant decreases (p = 0.01) in median rectal and anal NTCPs (Gr 2 late rectal bleeding and subjective sphincter control, respectively) were predicted when using an IRS. CONCLUSIONS: Local-dose effects are predicted to be significantly reduced by an IRS. The spatial NTCP model predicts a significant decrease in Gr 2 late rectal bleeding and subjective sphincter control. Dose constraints can be improved for current clinical treatment planning.
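As a rough illustration of the shape-based dose measures named above, the sketch below computes a lateral and a longitudinal extent from a dose-surface map stored as a 2D grid. The grid layout (rows as longitudinal slices, columns as circumferential positions), the simplified definitions (lateral extent as the largest fraction of the circumference at or above the dose level in any slice; longitudinal extent as the fraction of slices with any point at or above it) and the function name `dsm_extents` are assumptions for illustration only. Published DSM analyses typically use contiguous regions or fitted ellipses, and eccentricity and cDSH are not computed here.

```python
from typing import List, Tuple


def dsm_extents(dsm: List[List[float]], dose_level: float) -> Tuple[float, float]:
    """Hypothetical simplified extents of the region receiving >= dose_level.

    dsm[i][j] is the dose (Gy) at longitudinal slice i and circumferential
    position j (assumed layout). Points above threshold are counted per slice
    without checking that they are contiguous.
    """
    n_slices = len(dsm)
    n_positions = len(dsm[0])
    lateral = 0.0
    slices_hit = 0
    for row in dsm:
        above = sum(1 for dose in row if dose >= dose_level)
        # Widest slice determines the lateral extent (fraction of circumference).
        lateral = max(lateral, above / n_positions)
        if above:
            slices_hit += 1
    # Fraction of the map's length that contains any high-dose point.
    return lateral, slices_hit / n_slices


# Toy 3-slice x 4-position map (invented values, not patient data).
dsm = [
    [10.0, 60.0, 60.0, 10.0],
    [10.0, 55.0, 10.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
]
print(dsm_extents(dsm, 50.0))  # lateral 0.5, longitudinal 2/3
```

Repeating the call over a range of dose levels gives extent-versus-dose curves of the kind compared between plans above.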
ABSTRACT
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
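For orientation, the factor model behind MOFA can be summarized as one shared factor matrix Z and one weight matrix W(m) per data modality (view). The notation below (M views, N samples, K factors, D_m features in view m) is introduced here for illustration; the sketch shows the Gaussian case only and omits the sparsity-inducing priors and non-Gaussian likelihoods used in the actual method.

```latex
\mathbf{Y}^{(m)} = \mathbf{Z}\,\mathbf{W}^{(m)\top} + \boldsymbol{\varepsilon}^{(m)},
\qquad
\mathbf{Y}^{(m)} \in \mathbb{R}^{N \times D_m},\;
\mathbf{Z} \in \mathbb{R}^{N \times K},\;
\mathbf{W}^{(m)} \in \mathbb{R}^{D_m \times K},\;
m = 1, \dots, M
```

In this reading, a factor k is shared across modalities when its weight column is non-zero in several views, and modality-specific when it is non-zero in only one.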
Subject(s)
Computational Biology/methods , Datasets as Topic , Antineoplastic Agents/therapeutic use , Computer Simulation , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Models, Statistical , Oxidative Stress , Software , Transcriptome
ABSTRACT
PURPOSE: The purpose of this study is to investigate whether machine learning with dosiomic, radiomic, and demographic features allows more precise xerostomia risk assessment than normal tissue complication probability (NTCP) models based on the mean radiation dose to the parotid glands. MATERIAL AND METHODS: A cohort of 153 head-and-neck cancer patients was used to model xerostomia at 0-6 months (early), 6-15 months (late), 15-24 months (long-term), and at any time (a longitudinal model) after radiotherapy. The predictive power of the features was evaluated by the area under the receiver operating characteristic curve (AUC) of univariate logistic regression models. The multivariate NTCP models were tuned and tested with single and nested cross-validation, respectively. We compared the predictive performance of seven classification algorithms, six feature selection methods, and ten data cleaning/class balancing techniques using the Friedman test and the Nemenyi post hoc analysis. RESULTS: NTCP models based on the parotid mean dose failed to predict xerostomia (AUCs < 0.60). The most informative predictors were found for late and long-term xerostomia. Late xerostomia correlated with the contralateral dose gradient in the anterior-posterior (AUC = 0.72) and the right-left (AUC = 0.68) direction, whereas long-term xerostomia was associated with parotid volumes (AUCs > 0.85) and dose gradients in the right-left (AUCs > 0.78) and the anterior-posterior (AUCs > 0.72) direction. Multivariate models of long-term xerostomia were typically based on the parotid volume, the parotid eccentricity, and the dose-volume histogram (DVH) spread, with generalization AUCs ranging from 0.74 to 0.88. On average, support vector machines and extra-trees were the top-performing classifiers, whereas algorithms based on logistic regression were the best choice for feature selection. We found no advantage in using data cleaning or class balancing methods.
CONCLUSION: We demonstrated that incorporating organ- and dose-shape descriptors is beneficial for xerostomia prediction in highly conformal radiotherapy treatments. Owing to the strong reliance on patient-specific, dose-independent factors, our results underscore the need to develop personalized data-driven risk profiles for NTCP models of xerostomia. The machine learning pipeline used is described in detail and can serve as a valuable reference for future work in radiomic and dosiomic NTCP modeling.
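The univariate screening described above rests on the fact that a one-feature logistic model is a monotonic transform of that feature, so its in-sample AUC depends only on how the feature ranks patients. The sketch below computes that rank-based AUC directly; the function names are hypothetical, it does not reproduce the study's cross-validation, and `univariate_auc` assumes the fitted slope points in whichever direction ranks the cases better, which is usual but not guaranteed.

```python
from typing import Sequence


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case (label 1) scores
    higher than a random negative case (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def univariate_auc(feature: Sequence[float], labels: Sequence[int]) -> float:
    """In-sample AUC of a hypothetical one-feature logistic model, taking the
    better of the two ranking directions."""
    auc = rank_auc(feature, labels)
    return max(auc, 1.0 - auc)


# Invented feature values for four patients; 1 = xerostomia observed.
print(rank_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 0.75
```

The pairwise loop is quadratic in cohort size, which is unproblematic for cohorts of a few hundred patients.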
ABSTRACT
Single-cell RNA-sequencing (scRNA-seq) allows studying heterogeneity in gene expression in large cell populations. Such heterogeneity can arise due to technical or biological factors, making decomposing sources of variation difficult. We here describe f-scLVM (factorial single-cell latent variable model), a method based on factor analysis that uses pathway annotations to guide the inference of interpretable factors underpinning the heterogeneity. Our model jointly estimates the relevance of individual factors, refines gene set annotations, and infers factors without annotation. In applications to multiple scRNA-seq datasets, we find that f-scLVM robustly decomposes scRNA-seq datasets into interpretable components, thereby facilitating the identification of novel subpopulations.
Subject(s)
Factor Analysis, Statistical , Sequence Analysis, RNA , Single-Cell Analysis , Software , Animals , Computer Simulation , Databases as Topic , Gene Expression Regulation , Mice , Models, Theoretical , Mouse Embryonic Stem Cells/metabolism , Neurons/metabolism , Reproducibility of Results
ABSTRACT
PURPOSE: Xerostomia is a common side effect of radiotherapy resulting from excessive irradiation of the salivary glands. Typically, xerostomia is modeled by the mean dose-response characteristic of the parotid glands and prevented by mean dose constraints to either the contralateral or both parotid glands. The aim of this study was to investigate whether normal tissue complication probability (NTCP) models based on the mean radiation dose to the parotid glands are suitable for predicting xerostomia in the highly conformal low-dose regime of modern intensity-modulated radiotherapy (IMRT) techniques. MATERIAL AND METHODS: We present a retrospective analysis of 153 head and neck cancer patients treated with radiotherapy. The Lyman-Kutcher-Burman (LKB) model was used to evaluate the predictive power of the parotid gland mean dose with respect to xerostomia at 6 and 12 months after treatment. The predictive performance of the model was evaluated by receiver operating characteristic (ROC) curves and precision-recall (PR) curves. RESULTS: Average mean doses to the ipsilateral and contralateral parotid glands were 25.4 Gy and 18.7 Gy, respectively. QUANTEC constraints were met in 74% of patients. The prevalence of mild to severe (G1+) xerostomia was 67% at both 6 and 12 months. The prevalence of moderate to severe (G2+) xerostomia at 6 and 12 months was 20% and 15%, respectively. G1+ xerostomia was predicted reasonably well, with the area under the ROC curve ranging from 0.69 to 0.76. The LKB model failed to provide reliable G2+ xerostomia predictions at both time points. CONCLUSIONS: Reduction of the mean dose to the parotid glands below QUANTEC guidelines resulted in low G2+ xerostomia rates. In this dose domain, the mean dose models predicted G1+ xerostomia fairly well but failed to recognize patients at risk of G2+ xerostomia. There is a need for more flexible models able to capture the complexity of dose response in this dose regime.
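The LKB model mentioned above maps a dose distribution to an NTCP value through a probit function of the generalized equivalent uniform dose (gEUD): NTCP = Φ((gEUD − TD50)/(m·TD50)), with gEUD = (Σ v_i·D_i^(1/n))^n over a differential dose-volume histogram. With volume parameter n = 1 the gEUD reduces to the mean dose, which is the mean-dose setting examined in this study. The sketch below is the standard textbook form; the function names and the TD50 and m values in the usage line are illustrative assumptions, not parameters fitted to this cohort.

```python
import math
from typing import Sequence


def geud(doses: Sequence[float], volumes: Sequence[float], n: float) -> float:
    """Generalized EUD from a differential DVH: (sum_i v_i * D_i**(1/n))**n.
    Volumes are normalized to fractions, so they need not sum to 1 on input."""
    total = sum(volumes)
    return sum((v / total) * d ** (1.0 / n) for d, v in zip(doses, volumes)) ** n


def lkb_ntcp(doses: Sequence[float], volumes: Sequence[float],
             td50: float, m: float, n: float = 1.0) -> float:
    """LKB NTCP = Phi(t), t = (gEUD - TD50) / (m * TD50), Phi = standard normal CDF."""
    t = (geud(doses, volumes, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Two equal-volume dose bins (20 Gy, 30 Gy) give a mean dose of 25 Gy; with a
# hypothetical TD50 of 25 Gy the predicted complication probability is 50 %.
print(lkb_ntcp([20.0, 30.0], [0.5, 0.5], td50=25.0, m=0.4))  # prints 0.5
```

Because the n = 1 model depends on the dose distribution only through its mean, two plans with the same mean parotid dose but very different dose shapes receive the same prediction, which is one way to read the limitation reported above.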
Subject(s)
Head and Neck Neoplasms/radiotherapy , Parotid Gland/pathology , Radiotherapy Planning, Computer-Assisted/methods , Radiotherapy, Intensity-Modulated/adverse effects , Xerostomia/diagnosis , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Parotid Gland/radiation effects , ROC Curve , Radiotherapy Dosage , Retrospective Studies , Xerostomia/etiology
ABSTRACT
Dormant hematopoietic stem cells (dHSCs) sit atop the hematopoietic hierarchy. The molecular identity of dHSCs and the mechanisms regulating their maintenance or exit from dormancy remain uncertain. Here, we use single-cell RNA sequencing (RNA-seq) analysis to show that the transition from dormancy toward cell-cycle entry is a continuous developmental path associated with upregulation of biosynthetic processes rather than a stepwise progression. In addition, low Myc levels and high expression of a retinoic acid program are characteristic of dHSCs. To follow the behavior of dHSCs in situ, a Gprc5c-controlled reporter mouse was established. Treatment with all-trans retinoic acid antagonizes stress-induced activation of dHSCs by restricting protein translation and levels of reactive oxygen species (ROS) and Myc. Mice maintained on a vitamin A-free diet lose HSCs and show disrupted re-entry into dormancy after exposure to inflammatory stress stimuli. Our results highlight the impact of dietary vitamin A on the regulation of cell-cycle-mediated stem cell plasticity.
Subject(s)
Hematopoietic Stem Cells/cytology , Signal Transduction , Tretinoin/pharmacology , Vitamin A/administration & dosage , Animals , Biosynthetic Pathways , Cell Culture Techniques , Cell Cycle/drug effects , Cell Survival , Diet , Gene Expression Profiling , Hematopoietic Stem Cells/drug effects , Mice , Poly I-C/pharmacology , Reactive Oxygen Species/metabolism , Receptors, G-Protein-Coupled/metabolism , Single-Cell Analysis , Stress, Physiological , Vitamin A/pharmacology , Vitamins/administration & dosage , Vitamins/pharmacology
ABSTRACT
Accessing gene expression at the single-cell level has revealed often large heterogeneity among seemingly homogeneous cells, which remains obscured when using traditional population-based approaches. The computational analysis of single-cell transcriptomics data, however, still poses unresolved challenges with respect to normalization, visualization and modeling of the data. One such issue is differences in cell size, which introduce additional variability into the data and for which appropriate normalization techniques are needed. Otherwise, these differences in cell size may obscure genuine heterogeneities among cell populations and lead to overdispersed steady-state distributions of mRNA transcript numbers. We present cgCorrect, a statistical framework to correct for differences in cell size that are due to cell growth in single-cell transcriptomics data. We derive the probability of the cell-growth-corrected mRNA transcript number given the measured, cell size-dependent mRNA transcript number, based on the assumption that the average number of transcripts in a cell increases proportionally to the cell's volume during the cell cycle. cgCorrect can be used both for data normalization and for analyzing the steady-state distributions used to infer the gene expression mechanism. We demonstrate its applicability on simulated data, on single-cell quantitative real-time polymerase chain reaction (PCR) data from mouse blood stem and progenitor cells, and on quantitative single-cell RNA-sequencing data obtained from mouse embryonic stem cells. We show that correcting for differences in cell size affects the interpretation of the data obtained by typically performed computational analyses.
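To illustrate why uncorrected cell-size differences produce overdispersion, the simulation below adopts the proportionality assumption stated above (expected transcript number scales with cell volume) and adds two further assumptions of its own: relative volume is uniform between 1 and 2 over the cell cycle, and counts are Poisson given volume. This is not the cgCorrect inference procedure, only a demonstration of the effect it corrects for; all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson sample via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(rate_per_volume: float, n_cells: int, seed: int = 0) -> List[int]:
    """Counts whose mean scales with a random relative cell volume in [1, 2]."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        volume = rng.uniform(1.0, 2.0)  # assumed volume distribution, for illustration
        counts.append(poisson(rng, rate_per_volume * volume))
    return counts


def fano(values: Sequence[float]) -> float:
    """Variance-to-mean ratio; about 1 for Poisson data, above 1 if overdispersed."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
    return var / mean


counts = simulate_counts(20.0, 20000)
print(round(fano(counts), 2))  # well above 1, although each cell's count is Poisson
```

Under these assumptions the extra variance comes entirely from volume variation, which is the component a cell-growth correction aims to remove before the steady-state distribution is interpreted.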
Subject(s)
Cell Enlargement , Cell Size , Gene Expression Profiling/methods , Gene Expression , RNA, Messenger/metabolism , Computational Biology , Models, Genetic
ABSTRACT
Differentiation alters molecular properties of stem and progenitor cells, leading to changes in their shape and movement characteristics. We present a deep neural network that prospectively predicts lineage choice in differentiating primary hematopoietic progenitors using image patches from brightfield microscopy and cellular movement. Surprisingly, lineage choice can be detected up to three generations before conventional molecular markers are observable. Our approach allows identification of cells with differentially expressed lineage-specifying genes without molecular labeling.