ABSTRACT
Widespread sequencing has yielded thousands of missense variants predicted or confirmed as disease-causing. This creates a new bottleneck: determining the functional impact of each variant, typically a painstaking, customized process undertaken one or a few genes and variants at a time. Here, we established a high-throughput imaging platform to assay the impact of coding variation on protein localization, evaluating 3,448 missense variants of over 1,000 genes and phenotypes. We discovered that mislocalization is a common consequence of coding variation, affecting about one-sixth of all pathogenic missense variants, all cellular compartments, and recessive and dominant disorders alike. Mislocalization is primarily driven by effects on protein stability and membrane insertion rather than disruptions of trafficking signals or specific interactions. Furthermore, mislocalization patterns help explain pleiotropy and disease severity and provide insights on variants of uncertain significance. Our publicly available resource extends our understanding of coding variation in human diseases.
ABSTRACT
Quantitative optical microscopy (an emerging, transformative approach to single-cell biology) has seen dramatic methodological advancements over the past few years. However, its impact has been hampered by challenges in the areas of data generation, management, and analysis. Here we outline these technical and cultural challenges and provide our perspective on the trajectory of this field, ushering in a new era of quantitative, data-driven microscopy. We also contrast it to the three decades of enormous advances in the field of genomics that have significantly enhanced the reproducibility and wider adoption of a plethora of genomic approaches.
Subject(s)
Genomics/trends , Microscopy/trends , Optical Imaging/trends , Single-Cell Analysis/trends , Animals , Diffusion of Innovation , Genomics/history , High-Throughput Screening Assays/trends , History, 20th Century , History, 21st Century , Humans , Microscopy/history , Optical Imaging/history , Reproducibility of Results , Research Design/trends , Single-Cell Analysis/history
ABSTRACT
The identification of genetic and chemical perturbations with similar impacts on cell morphology can elucidate compounds' mechanisms of action or novel regulators of genetic pathways. Research on methods for identifying such similarities has lagged due to a lack of carefully designed and well-annotated image sets of cells treated with chemical and genetic perturbations. Here we create such a Resource dataset, CPJUMP1, in which each perturbed gene's product is a known target of at least two chemical compounds in the dataset. We systematically explore the directionality of correlations among perturbations that target the same protein encoded by a given gene, and we find that identifying matches between chemical and genetic perturbations is a challenging task. Our dataset and baseline analyses provide a benchmark for evaluating methods that measure perturbation similarities and impact, and more generally, learn effective representations of cellular state from microscopy images. Such advancements would accelerate the applications of image-based profiling of cellular states, such as uncovering drug mode of action or probing functional genomics.
Subject(s)
Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Microscopy/methods
ABSTRACT
The mechanism by which cells decide to skip mitosis to become polyploid is largely undefined. Here we used a high-content image-based screen to identify small-molecule probes that induce polyploidization of megakaryocytic leukemia cells and serve as perturbagens to help understand this process. Our study implicates five networks of kinases that regulate the switch to polyploidy. Moreover, we find that dimethylfasudil (diMF, H-1152P) selectively increased polyploidization, mature cell-surface marker expression, and apoptosis of malignant megakaryocytes. An integrated target identification approach employing proteomic and shRNA screening revealed that a major target of diMF is Aurora kinase A (AURKA). We further find that MLN8237 (Alisertib), a selective inhibitor of AURKA, induced polyploidization and expression of mature megakaryocyte markers in acute megakaryocytic leukemia (AMKL) blasts and displayed potent anti-AMKL activity in vivo. Our findings provide a rationale to support clinical trials of MLN8237 and other inducers of polyploidization and differentiation in AMKL.
Subject(s)
Azepines/pharmacology , Drug Discovery , Leukemia, Megakaryoblastic, Acute/drug therapy , Megakaryocytes/metabolism , Polyploidy , Pyrimidines/pharmacology , Small Molecule Libraries , 1-(5-Isoquinolinesulfonyl)-2-Methylpiperazine/analogs & derivatives , 1-(5-Isoquinolinesulfonyl)-2-Methylpiperazine/pharmacology , Animals , Aurora Kinase A , Aurora Kinases , Cell Differentiation/drug effects , Cell Proliferation/drug effects , Humans , Leukemia, Megakaryoblastic, Acute/genetics , Megakaryocytes/cytology , Megakaryocytes/pathology , Mice , Mice, Inbred C57BL , Protein Interaction Maps , Protein Serine-Threonine Kinases/antagonists & inhibitors , Protein Serine-Threonine Kinases/metabolism , rho-Associated Kinases/metabolism
ABSTRACT
The nutrient- and growth factor-responsive kinase mTOR complex 1 (mTORC1) regulates many processes that control growth, including protein synthesis, autophagy, and lipogenesis. Through unknown mechanisms, mTORC1 promotes the function of SREBP, a master regulator of lipo- and sterolgenic gene transcription. Here, we demonstrate that mTORC1 regulates SREBP by controlling the nuclear entry of lipin 1, a phosphatidic acid phosphatase. Dephosphorylated, nuclear, catalytically active lipin 1 promotes nuclear remodeling and mediates the effects of mTORC1 on SREBP target gene, SREBP promoter activity, and nuclear SREBP protein abundance. Inhibition of mTORC1 in the liver significantly impairs SREBP function and makes mice resistant, in a lipin 1-dependent fashion, to the hepatic steatosis and hypercholesterolemia induced by a high-fat and -cholesterol diet. These findings establish lipin 1 as a key component of the mTORC1-SREBP pathway.
Subject(s)
Nuclear Proteins/metabolism , Proteins/metabolism , Signal Transduction , Sterol Regulatory Element Binding Protein 1/metabolism , Sterol Regulatory Element Binding Protein 2/metabolism , Animals , Humans , Lipid Metabolism , Male , Mechanistic Target of Rapamycin Complex 1 , Mice , Multiprotein Complexes , Phosphatidate Phosphatase , TOR Serine-Threonine Kinases
ABSTRACT
Cells can be perturbed by various chemical and genetic treatments and the impact on gene expression and morphology can be measured via transcriptomic profiling and image-based assays, respectively. The patterns observed in these high-dimensional profile data can power a dozen applications in drug discovery and basic biology research, but both types of profiles are rarely available for large-scale experiments. Here, we provide a collection of four datasets with both gene expression and morphological profile data useful for developing and testing multimodal methodologies. Roughly a thousand features are measured for each of the two data types, across more than 28,000 chemical and genetic perturbations. We define biological problems that use the shared and complementary information in these two data modalities, provide baseline analysis and evaluation metrics for multi-omic applications, and make the data resource publicly available ( https://broad.io/rosetta/ ).
Subject(s)
Drug Discovery , Gene Expression Profiling , Gene Expression Profiling/methods , Gene Expression
ABSTRACT
Drug-induced liver injury (DILI) has been a significant challenge in drug discovery, often leading to clinical trial failures and necessitating drug withdrawals. Over the last decade, the existing suite of in vitro proxy-DILI assays has generally improved at identifying compounds with hepatotoxicity. However, there is considerable interest in enhancing the in silico prediction of DILI because it allows for evaluating large sets of compounds more quickly and cost-effectively, particularly in the early stages of projects. In this study, we investigate ML models for DILI prediction that first predict nine proxy-DILI labels and then use them as features in addition to chemical structural features to predict DILI. The features include in vitro (e.g., mitochondrial toxicity, bile salt export pump inhibition) data, in vivo (e.g., preclinical rat hepatotoxicity studies) data, pharmacokinetic parameters of maximum concentration, structural fingerprints, and physicochemical parameters. We trained DILI-prediction models on 888 compounds from the DILI data set (composed of DILIst and DILIrank) and tested them on a held-out external test set of 223 compounds from the DILI data set. The best model, DILIPredictor, attained an AUC-PR of 0.79. This model enabled the detection of the top 25 toxic compounds (2.68 LR+, positive likelihood ratio) compared to models using only structural features (1.65 LR+ score). Using feature interpretation from DILIPredictor, we identified the chemical substructures causing DILI and differentiated cases of DILI caused by compounds in animals but not in humans. For example, DILIPredictor correctly recognized 2-butoxyethanol as nontoxic in humans despite its hepatotoxicity in mouse models. Overall, the DILIPredictor model improves the detection of compounds causing DILI with an improved differentiation between animal and human sensitivity and the potential for mechanism evaluation. DILIPredictor requires only chemical structures as input for prediction and is publicly available at https://broad.io/DILIPredictor for use via web interface and with all code available for download.
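For readers unfamiliar with the LR+ figures quoted in the abstract above, the positive likelihood ratio is conventionally defined as sensitivity divided by the false positive rate (1 − specificity). The snippet below is only a minimal sketch of that standard definition; the function name and the confusion-matrix counts are hypothetical and are not taken from the DILIPredictor study.

```python
def positive_likelihood_ratio(tp: int, fp: int, tn: int, fn: int) -> float:
    """Return LR+ = sensitivity / (1 - specificity) from confusion-matrix counts."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = tp / (tp + fn)          # true positive rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    if false_positive_rate == 0:
        return float("inf")               # no false positives at all
    return sensitivity / false_positive_rate


# Hypothetical counts, for illustration only (not data from the paper).
example_lr_plus = positive_likelihood_ratio(tp=30, fp=25, tn=75, fn=30)
print(example_lr_plus)
```

An LR+ above 1 means a positive prediction is more likely for a truly toxic compound than for a nontoxic one, so a higher value indicates a more informative positive call.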
Subject(s)
Chemical and Drug Induced Liver Injury , Humans , Animals , Rats
ABSTRACT
Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.
Subject(s)
Computational Biology , Budgets , Cooperative Behavior , Humans , Interdisciplinary Research , Mentoring , Motivation , Publications , Reward , Software
ABSTRACT
Drug-induced cardiotoxicity (DICT) is a major concern in drug development, accounting for 10-14% of postmarket withdrawals. In this study, we explored the capabilities of chemical and biological data to predict cardiotoxicity, using the recently released DICTrank data set from the United States FDA. We found that such data, including protein targets, especially those related to ion channels (e.g., hERG), physicochemical properties (e.g., electrotopological state), and peak concentration in plasma, offer strong predictive ability for DICT. Compounds annotated with mechanisms of action such as cyclooxygenase inhibition could distinguish between most-concern and no-concern DICT. Cell Painting features for ER stress discerned most-concern cardiotoxic from nontoxic compounds. Models based on physicochemical properties provided substantial predictive accuracy (AUCPR = 0.93). With the availability of omics data in the future, using biological data promises enhanced predictability and deeper mechanistic insights, paving the way for safer drug development. All models from this study are available at https://broad.io/DICTrank_Predictor.
Subject(s)
Cardiotoxicity , Drug Development , Humans , Cardiotoxicity/etiology , Cardiotoxicity/metabolism
ABSTRACT
Obesity and its associated metabolic syndrome are a leading cause of morbidity and mortality. Given the disease's heavy burden on patients and the healthcare system, there has been increased interest in identifying pharmacological targets for the treatment and prevention of obesity. Towards this end, genome-wide association studies (GWAS) have identified hundreds of human genetic variants associated with obesity. The next challenge is to experimentally define which of these variants are causally linked to obesity, and could therefore become targets for the treatment or prevention of obesity. Here we employ high-throughput in vivo RNAi screening to test for causality 293 C. elegans orthologs of human obesity-candidate genes reported in GWAS. We RNAi screened these 293 genes in C. elegans subject to two different feeding regimens: (1) regular diet, and (2) high-fructose diet, which we developed and present here as an invertebrate model of diet-induced obesity (DIO). We report 14 genes that promote obesity and 3 genes that prevent DIO when silenced in C. elegans. Further, we show that knock-down of the 3 DIO genes not only prevents excessive fat accumulation in primary and ectopic fat depots but also improves the health and extends the lifespan of C. elegans overconsuming fructose. Importantly, the direction of the association between expression variants in these loci and obesity in mice and humans matches the phenotypic outcome of the loss-of-function of the C. elegans ortholog genes, supporting the notion that some of these genes would be causally linked to obesity across phylogeny. Therefore, in addition to defining causality for several genes so far merely correlated with obesity, this study demonstrates the value of model systems compatible with in vivo high-throughput genetic screening to causally link GWAS gene candidates to human diseases.
Subject(s)
Caenorhabditis elegans/genetics , Genetic Predisposition to Disease , Obesity/genetics , Animals , Dietary Carbohydrates/administration & dosage , Fructose/administration & dosage , Gene Expression , Homeostasis , Humans , Meta-Analysis as Topic , Phenotype
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Quantitative microscopy is a powerful method for performing phenotypic screens from which image-based profiling can extract a wealth of information, termed profiles. These profiles can be used to elucidate the changes in cellular phenotypes across cell populations from different patient samples or following genetic or chemical perturbations. One such image-based profiling method is the Cell Painting assay, which provides morphological insight through the imaging of eight cellular compartments. Here, we examine the performance of the Cell Painting assay across multiple high-throughput microscope systems and find that all are compatible with this assay. Furthermore, we determine independently for each microscope system the best performing settings, providing those who wish to adopt this assay an ideal starting point for their own assays. We also explore the impact of microscopy setting changes in the Cell Painting assay and find that few dramatically reduce the quality of a Cell Painting profile, regardless of the microscope used.
Subject(s)
Biological Assay , Microscopy , Humans , Microscopy/methods , Biological Assay/methods
ABSTRACT
Pollen and tracheophyte spores are ubiquitous environmental indicators at local and global scales. Palynology is typically performed manually by microscopic analysis; a specialised and time-consuming task limited in taxonomical precision and sampling frequency, therefore restricting data quality used to inform climate change and pollen forecasting models. We build on the growing work using AI (artificial intelligence) for automated pollen classification to design a flexible network that can deal with the uncertainty of broad-scale environmental applications. We combined imaging flow cytometry with Guided Deep Learning to identify and accurately categorise pollen in environmental samples; here, pollen grains captured within c. 5500 Cal yr BP old lake sediments. Our network discriminates not only pollen included in training libraries to the species level but, depending on the sample, can classify previously unseen pollen to the likely phylogenetic order, family and even genus. Our approach offers valuable insights into the development of a widely transferable, rapid and accurate exploratory tool for pollen classification in 'real-world' environmental samples with improved accuracy over pure deep learning techniques. This work has the potential to revolutionise many aspects of palynology, allowing a more detailed spatial and temporal understanding of pollen in the environment with improved taxonomical resolution.
Subject(s)
Deep Learning , Artificial Intelligence , Flow Cytometry , Phylogeny , Pollen
ABSTRACT
A variational autoencoder (VAE) is a machine learning algorithm, useful for generating a compressed and interpretable latent space. These representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, standard vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated using other types of VAEs such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated these three VAE variants (Vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs using gene expression data of the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts from certain compounds thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could aid researchers in developing and identifying targeted therapeutics and categorizing off-target effects in the future.
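The abstract above names latent space arithmetic (LSA) without spelling out the computation. One common formulation, assumed here rather than taken from the paper, adds the mean latent vectors of two single-mechanism treatments and subtracts the mean latent vector of the control; in practice the result would then be passed through the trained decoder. The sketch below uses plain Python lists in place of encoder outputs, and every name and number in it is hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Average a list of equal-length latent vectors, dimension by dimension."""
    return [fmean(column) for column in zip(*rows)]


def latent_space_arithmetic(z_a, z_b, z_control):
    """Assumed LSA rule: mean(z_a) + mean(z_b) - mean(z_control).

    z_a, z_b:   latent vectors of profiles treated with mechanism A or B alone
    z_control:  latent vectors of untreated (control) profiles
    The returned vector is a guess at the latent position of a compound that
    hits both A and B; decoding it is left to the trained VAE decoder.
    """
    mu_a = mean_vector(z_a)
    mu_b = mean_vector(z_b)
    mu_control = mean_vector(z_control)
    return [a + b - c for a, b, c in zip(mu_a, mu_b, mu_control)]


# Toy two-dimensional latent vectors, for illustration only.
z_a = [[1.0, 0.0], [3.0, 0.0]]
z_b = [[0.0, 2.0], [0.0, 4.0]]
z_control = [[1.0, 1.0], [1.0, 1.0]]
predicted_latent = latent_space_arithmetic(z_a, z_b, z_control)
print(predicted_latent)
```

Subtracting the control mean keeps the shared baseline from being counted twice when the two treatment means are added.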
Subject(s)
Machine Learning , Polypharmacology , Algorithms
ABSTRACT
Stored red blood cells (RBCs) are needed for life-saving blood transfusions, but they undergo continuous degradation. RBC storage lesions are often assessed by microscopic examination or biochemical and biophysical assays, which are complex, time-consuming, and destructive to fragile cells. Here we demonstrate the use of label-free imaging flow cytometry and deep learning to characterize RBC lesions. Using brightfield images, a trained neural network achieved 76.7% agreement with experts in classifying seven clinically relevant RBC morphologies associated with storage lesions, comparable to 82.5% agreement between different experts. Given that human observation and classification may not optimally discern RBC quality, we went further and eliminated subjective human annotation in the training step by training a weakly supervised neural network using only storage duration times. The feature space extracted by this network revealed a chronological progression of morphological changes that better predicted blood quality, as measured by physiological hemolytic assay readouts, than the conventional expert-assessed morphology classification system. With further training and clinical testing across multiple sites, protocols, and instruments, deep learning and label-free imaging flow cytometry might be used to routinely and objectively assess RBC storage lesions. This would automate a complex protocol, minimize laboratory sample handling and preparation, and reduce the impact of procedural errors and discrepancies between facilities and blood donors. The chronology-based machine-learning approach may also improve upon humans' assessment of morphological changes in other biomedically important progressions, such as differentiation and metastasis.
Subject(s)
Blood Banks , Deep Learning , Erythrocytes/cytology , Humans
ABSTRACT
Segmenting the nuclei of cells in microscopy images is often the first step in the quantitative analysis of imaging data for biological and biomedical applications. Many bioimage analysis tools can segment nuclei in images but need to be selected and configured for every experiment. The 2018 Data Science Bowl attracted 3,891 teams worldwide to make the first attempt to build a segmentation method that could be applied to any two-dimensional light microscopy image of stained nuclei across experiments, with no human interaction. Top participants in the challenge succeeded in this task, developing deep-learning-based models that identified cell nuclei across many image types and experimental conditions without the need to manually adjust segmentation parameters. This represents an important step toward configuration-free bioimage analysis software tools.
Subject(s)
Cell Nucleus/ultrastructure , Image Processing, Computer-Assisted/methods , Data Science , Humans , Microscopy, Fluorescence/methods
ABSTRACT
SUMMARY: Image-based experiments can yield many thousands of individual measurements describing each object of interest, such as cells in microscopy screens. CellProfiler Analyst is a free, open-source software package designed for the exploration of quantitative image-derived data and the training of machine learning classifiers with an intuitive user interface. We have now released CellProfiler Analyst 3.0, which in addition to enhanced performance adds support for neural network classifiers, identifying rare object subsets, and direct transfer of objects of interest from visualization tools into the Classifier tool for use as training data. This release also increases interoperability with the recently released CellProfiler 4, making it easier for users to detect and measure particular classes of objects in their analyses. AVAILABILITY: CellProfiler Analyst binaries for Windows and MacOS are freely available for download at https://cellprofileranalyst.org/. Source code is implemented in Python 3 and is available at https://github.com/CellProfiler/CellProfiler-Analyst/. A sample dataset is available at https://cellprofileranalyst.org/examples, based on images freely available from the Broad Bioimage Benchmark Collection.
Subject(s)
Machine Learning , Software , Microscopy/methods , Image Processing, Computer-Assisted/methods , Neural Networks, Computer
ABSTRACT
Forums and email lists play a major role in assisting scientists in using software. Previously, each open-source bioimaging software package had its own distinct forum or email list. Although each provided access to experts from various software teams, this fragmentation resulted in many scientists not knowing where to begin with their projects. Thus, the scientific imaging community lacked a central platform where solutions could be discussed in an open, software-independent manner. In response, we introduce the Scientific Community Image Forum, where users can pose software-related questions about digital image analysis, acquisition, and data management.
Subject(s)
Diagnostic Imaging/trends , Information Dissemination/methods , Electronic Mail , Humans , Image Processing, Computer-Assisted , Internet , Software , Surveys and Questionnaires
ABSTRACT
BACKGROUND: Imaging data contains a substantial amount of information which can be difficult to evaluate by eye. With the expansion of high throughput microscopy methodologies producing increasingly large datasets, automated and objective analysis of the resulting images is essential to effectively extract biological information from this data. CellProfiler is a free, open source image analysis program which enables researchers to generate modular pipelines with which to process microscopy images into interpretable measurements. RESULTS: Herein we describe CellProfiler 4, a new version of this software with expanded functionality. Based on user feedback, we have made several user interface refinements to improve the usability of the software. We introduced new modules to expand the capabilities of the software. We also evaluated performance and made targeted optimizations to reduce the time and cost associated with running common large-scale analysis pipelines. CONCLUSIONS: CellProfiler 4 provides significantly improved performance in complex workflows compared to previous versions. This release will ensure that researchers will have continued access to CellProfiler's powerful computational tools in the coming years.