ABSTRACT
In profiling assays, thousands of biological properties are measured in a single test, yielding biological discoveries by capturing the state of a cell population, often at the single-cell level. However, for profiling datasets, it has been challenging to evaluate the phenotypic activity of a sample and the phenotypic consistency among samples, due to profiles' high dimensionality, heterogeneous nature, and non-linear properties. Existing methods leave researchers uncertain where to draw boundaries between meaningful biological response and technical noise. Here, we developed a statistical framework that uses the well-established mean average precision (mAP) as a single, data-driven metric to bridge this gap. We validated the mAP framework against established metrics through simulations and real-world data applications, revealing its ability to capture subtle and meaningful biological differences in cell state. Specifically, we used mAP to assess both the phenotypic activity of a given perturbation (or sample) and the consistency within groups of perturbations (or samples) across diverse high-dimensional datasets. We evaluated the framework on different profile types (image, protein, and mRNA profiles), perturbation types (CRISPR gene editing, gene overexpression, and small molecules), and profile resolutions (single-cell and bulk). Our open-source software allows this framework to be applied to identify interesting biological phenomena and promising therapeutics from large-scale profiling data.
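As a rough illustration of the metric named above, the following minimal sketch computes mean average precision for replicate retrieval: each profile is used as a query, the remaining profiles are ranked by cosine similarity, and precision is averaged over the ranks at which profiles sharing the query's label appear. The function names, the choice of cosine similarity, and the toy data are assumptions made for this example; this is not the authors' open-source software and it omits any significance testing that the framework may apply.

```python
import numpy as np


def average_precision(similarities, is_match):
    """Average precision for one query.

    similarities: similarity of the query to each candidate profile.
    is_match: True where the candidate shares the query's label.
    """
    # Rank candidates from most to least similar.
    order = np.argsort(-np.asarray(similarities, dtype=float), kind="stable")
    hits = np.asarray(is_match, dtype=bool)[order]
    if not hits.any():
        return float("nan")  # no positives to retrieve for this query
    ranks = np.arange(1, hits.size + 1)
    precision_at_k = np.cumsum(hits) / ranks
    # Average the precision values at the ranks where a match occurs.
    return float(precision_at_k[hits].mean())


def mean_average_precision(profiles, labels):
    """Mean of per-query average precision, using cosine similarity (an assumption)."""
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    sims = unit @ unit.T
    aps = []
    for i in range(len(labels)):
        others = np.arange(len(labels)) != i  # exclude the query itself
        aps.append(average_precision(sims[i, others], labels[others] == labels[i]))
    return float(np.nanmean(aps))


# Toy example (hypothetical data): two perturbations, two replicates each.
toy_profiles = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
toy_labels = ["a", "a", "b", "b"]
toy_map = mean_average_precision(toy_profiles, toy_labels)
```

In this toy case every replicate's nearest neighbour is its partner, so each query retrieves its match at rank 1. Real profiles would be noisier, and a score near the value expected from random ranking would suggest no detectable phenotypic activity.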
ABSTRACT
Measuring the phenotypic effect of treatments on cells through imaging assays is an efficient and powerful way of studying cell biology, and requires computational methods for transforming images into quantitative data. Here, we present an improved strategy for learning representations of treatment effects from high-throughput imaging, following a causal interpretation. We use weakly supervised learning for modeling associations between images and treatments, and show that it encodes both confounding factors and phenotypic features in the learned representation. To facilitate their separation, we constructed a large training dataset with images from five different studies to maximize experimental diversity, following insights from our causal analysis. Training a model with this dataset successfully improves downstream performance, and produces a reusable convolutional network for image-based profiling, which we call Cell Painting CNN. We evaluated our strategy on three publicly available Cell Painting datasets, and observed that the Cell Painting CNN improves performance in downstream analysis by up to 30% with respect to classical features, while also being more computationally efficient.
Subject(s)
Neural Networks, Computer
ABSTRACT
Technological advances in high-throughput microscopy have facilitated the acquisition of cell images at a rapid pace, and data pipelines can now extract and process thousands of image-based features from microscopy images. These features represent valuable single-cell phenotypes that contain information about cell state and biological processes. The use of these features for biological discovery is known as image-based or morphological profiling. However, these raw features need processing before use and image-based profiling lacks scalable and reproducible open-source software. Inconsistent processing across studies makes it difficult to compare datasets and processing steps, further delaying the development of optimal pipelines, methods, and analyses. To address these issues, we present Pycytominer, an open-source software package with a vibrant community that establishes an image-based profiling standard. Pycytominer has a simple, user-friendly Application Programming Interface (API) that implements image-based profiling functions for processing high-dimensional morphological features extracted from microscopy images of cells. Establishing Pycytominer as a standard image-based profiling toolkit ensures consistent data processing pipelines with data provenance, therefore minimizing potential inconsistencies and enabling researchers to confidently derive accurate conclusions and discover novel insights from their data, thus driving progress in our field.
ABSTRACT
Morphological and gene expression profiling can cost-effectively capture thousands of features in thousands of samples across perturbations by disease, mutation, or drug treatments, but it is unclear to what extent the two modalities capture overlapping versus complementary information. Here, using both the L1000 and Cell Painting assays to profile gene expression and cell morphology, respectively, we perturb human A549 lung cancer cells with 1,327 small molecules from the Drug Repurposing Hub across six doses, providing a data resource including dose-response data from both assays. The two assays capture both shared and complementary information for mapping cell state. Cell Painting profiles from compound perturbations are more reproducible and show more diversity but measure fewer distinct groups of features. Applying unsupervised and supervised methods to predict compound mechanisms of action (MOAs) and gene targets, we find that the two assays provide not only a partially shared but also a complementary view of drug mechanisms. Given the numerous applications of profiling in biology, our analyses provide guidance for planning experiments that profile cells for detecting distinct cell types, disease phenotypes, and response to chemical or genetic perturbations.
Subject(s)
Gene Expression Profiling, Humans, Gene Expression Profiling/methods, Phenotype
ABSTRACT
The objective of this study was to develop instant thin-layer chromatography (ITLC) conditions for the determination of radiochemical purity of 68Ga-DOTATATE in a shorter time period than those stated in the NETSPOT (Advanced Accelerator Applications, Saint-Genis-Pouilly, France; AAA) kit package insert (PI). A faster ITLC system is needed to reduce the 48- to 50-min development time so that more radioactivity is available for single patient use and wait times are shorter in the event of kit failure. Methods: Variations of the PI mobile system were evaluated with microfiber chromatography paper impregnated with silica gel (ITLC-SG). After a more suitable mobile system was identified, evaluation began by attempting to shorten the 10-cm development distance to 7, 8, and 9 cm. Results: Experiments using variations of the PI mobile phase showed that increasing the proportion of methanol in the mobile phase decreased development time. Additionally, when the proportion of 1 M ammonium acetate was reduced to 10% or less, retention factor values fell outside specification. Reducing the development distance shortened development time as expected; however, it also affected the resolution of the radiochromatogram. Conclusion: The fastest developing ITLC system that maintained resolution and peak shape was methanol:1 M ammonium acetate (80:20 v/v) with ITLC-SG using a development distance of 8 cm.
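For readers unfamiliar with the quantity mentioned in the results, the retention factor is conventionally the distance a spot migrates from the origin divided by the distance the solvent front travels from the origin. The short sketch below only illustrates that definition; the function name and the example distances are hypothetical and are not taken from the study or the package insert.

```python
def retention_factor(spot_distance_cm, solvent_front_distance_cm):
    """Retention factor (Rf): spot migration divided by solvent-front migration.

    Both distances are measured from the origin line of the strip.
    """
    if solvent_front_distance_cm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance_cm <= solvent_front_distance_cm:
        raise ValueError("spot distance must lie between 0 and the solvent front")
    return spot_distance_cm / solvent_front_distance_cm


# Hypothetical example: a spot that migrated 4 cm on a strip developed to 8 cm.
example_rf = retention_factor(4.0, 8.0)
```

Because Rf is a ratio, shortening the development distance does not by itself change the expected value, but the physical separation between spots shrinks, which is consistent with the reported effect on resolution.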