ABSTRACT
Understanding how a subset of expressed genes dictates cellular phenotype is a considerable challenge owing to the large numbers of molecules involved, their combinatorics and the plethora of cellular behaviours that they determine [1,2]. Here we reduced this complexity by focusing on cellular organization (a key readout and driver of cell behaviour [3,4]) at the level of major cellular structures that represent distinct organelles and functional machines, and generated the WTC-11 hiPSC Single-Cell Image Dataset v1, which contains more than 200,000 live cells in 3D, spanning 25 key cellular structures. The scale and quality of this dataset permitted the creation of a generalizable analysis framework to convert raw image data of cells and their structures into dimensionally reduced, quantitative measurements that can be interpreted by humans, and to facilitate data exploration. This framework embraces the vast cell-to-cell variability that is observed within a normal population, facilitates the integration of cell-by-cell structural data and allows quantitative analyses of distinct, separable aspects of organization within and across different cell populations. We found that the integrated intracellular organization of interphase cells was robust to the wide range of variation in cell shape in the population; that the average locations of some structures became polarized in cells at the edges of colonies while maintaining the 'wiring' of their interactions with other structures; and that, by contrast, changes in the location of structures during early mitotic reorganization were accompanied by changes in their wiring.
Subject(s)
Induced Pluripotent Stem Cells , Intracellular Space , Humans , Induced Pluripotent Stem Cells/cytology , Single-Cell Analysis , Datasets as Topic , Interphase , Cell Shape , Mitosis , Cell Polarity , Cell Survival
ABSTRACT
We introduce a framework for end-to-end integrative modeling of 3D single-cell multi-channel fluorescent image data of diverse subcellular structures. We employ stacked conditional β-variational autoencoders to first learn a latent representation of cell morphology, and then learn a latent representation of subcellular structure localization which is conditioned on the learned cell morphology. Our model is flexible and can be trained on images of arbitrary subcellular structures and at varying degrees of sparsity and reconstruction fidelity. We train our full model on 3D cell image data and explore design trade-offs in the 2D setting. Once trained, our model can be used to predict plausible locations of structures in cells where these structures were not imaged. The trained model can also be used to quantify the variation in the location of subcellular structures by generating plausible instantiations of each structure in arbitrary cell geometries. We apply our trained model to a small drug perturbation screen to demonstrate its applicability to new data. We show how the latent representations of drugged cells differ from unperturbed cells as expected by on-target effects of the drugs.
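As an illustration of the trade-off between sparsity and reconstruction fidelity mentioned above, the following is a minimal sketch of a generic β-VAE objective, not the authors' implementation. It assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term; the function names `kl_diag_gaussian` and `beta_vae_loss` are hypothetical. In a conditional model, the conditioning variable would enter through the encoder and decoder inputs, which are not shown here.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Squared-error reconstruction term plus beta times the KL term.

    Assumption: squared error stands in for the reconstruction likelihood.
    Larger beta penalizes the latent code more strongly (sparser latent usage);
    smaller beta favors reconstruction fidelity.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_diag_gaussian(mu, log_var)
```

For example, `beta_vae_loss([1.0, 2.0], [1.0, 1.0], mu=[1.0], log_var=[0.0], beta=2.0)` combines a reconstruction error of 1.0 with a KL term of 0.5 weighted by β = 2.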
Subject(s)
Cell Nucleus/physiology , Cell Shape/physiology , Induced Pluripotent Stem Cells/cytology , Intracellular Space , Biological Models , Cultured Cells , Computational Biology , Humans , Three-Dimensional Imaging , Intracellular Space/chemistry , Intracellular Space/metabolism , Intracellular Space/physiology , Fluorescence Microscopy , Single-Cell Analysis
ABSTRACT
We performed a comprehensive analysis of the transcriptional changes occurring during human induced pluripotent stem cell (hiPSC) differentiation to cardiomyocytes. Using single cell RNA-seq, we sequenced > 20,000 single cells from 55 independent samples representing two differentiation protocols and multiple hiPSC lines. Samples included experimental replicates ranging from undifferentiated hiPSCs to mixed populations of cells at D90 post-differentiation. Differentiated cell populations clustered by time point, with differential expression analysis revealing markers of cardiomyocyte differentiation and maturation changing from D12 to D90. We next performed a complementary cluster-independent sparse regression analysis to identify and rank genes that best assigned cells to differentiation time points. The two highest ranked genes between D12 and D24 (MYH7 and MYH6) resulted in an accuracy of 0.84, and the three highest ranked genes between D24 and D90 (A2M, H19, IGF2) resulted in an accuracy of 0.94, revealing that low dimensional gene features can identify differentiation or maturation stages in differentiating cardiomyocytes. Expression levels of select genes were validated using RNA FISH. Finally, we interrogated differences in cardiac gene expression resulting from two differentiation protocols, experimental replicates, and three hiPSC lines in the WTC-11 background to identify sources of variation across these experimental variables.
Subject(s)
Biomarkers/metabolism , Cell Differentiation , Gene Expression Regulation , Induced Pluripotent Stem Cells/metabolism , Cardiac Myocytes/cytology , Cardiac Myocytes/metabolism , Transcriptome , Humans , Induced Pluripotent Stem Cells/cytology , RNA-Seq
ABSTRACT
Although some cell types may be defined anatomically or by physiological function, a rigorous definition of cell state remains elusive. Here, we develop a quantitative, imaging-based platform for the systematic and automated classification of subcellular organization in single cells. We use this platform to quantify subcellular organization and gene expression in >30,000 individual human induced pluripotent stem cell-derived cardiomyocytes, producing a publicly available dataset that describes the population distributions of local and global sarcomere organization, mRNA abundance, and correlations between these traits. While the mRNA abundance of some phenotypically important genes correlates with subcellular organization (e.g., the beta-myosin heavy chain, MYH7), these two cellular metrics are heterogeneous and often uncorrelated, which suggests that gene expression alone is not sufficient to classify cell states. Instead, we posit that cell state should be defined by observing full distributions of quantitative, multidimensional traits in single cells that also account for space, time, and function.
Subject(s)
Induced Pluripotent Stem Cells , Cell Differentiation/genetics , Humans , Cardiac Myocytes/metabolism , Transcriptome/genetics
ABSTRACT
Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking. Here, we develop a scalable footprinting workflow using two state-of-the-art algorithms: Wellington and HINT. We apply our workflow to detect footprints in 192 ENCODE DNase-seq experiments and predict the genomic occupancy of 1,515 human TFs in 27 human tissues. We validate that these footprints overlap true-positive TF binding sites from ChIP-seq. We demonstrate that the locations, depth, and tissue specificity of footprints predict effects of genetic variants on gene expression and capture a substantial proportion of genetic risk for complex traits.
Subject(s)
Binding Sites/genetics , Deoxyribonucleases/metabolism , Genomics/methods , Transcription Factors/metabolism , Humans
ABSTRACT
Motivated by the extremely high computing costs associated with estimates of free energies for biological systems using molecular simulations, we further the exploration of existing "belief propagation" (BP) algorithms for fixed-backbone peptide and protein systems. The precalculation of pairwise interactions among discretized libraries of side-chain conformations, along with representation of protein side chains as nodes in a graphical model, enables direct application of the BP approach, which requires only ~1 s of single-processor run time after the precalculation stage. We use a "loopy BP" algorithm, which can be seen as an approximate generalization of the transfer-matrix approach to highly connected (i.e., loopy) graphs, and it has previously been applied to protein calculations. We examine the application of loopy BP to several peptides as well as the binding site of the T4 lysozyme L99A mutant. The present study reports on (i) the comparison of the approximate BP results with estimates from unbiased estimators based on the Amber99SB force field; (ii) investigation of the effects of varying library size on BP predictions; and (iii) a theoretical discussion of the discretization effects that can arise in BP calculations. The data suggest that, despite their approximate nature, BP free-energy estimates are highly accurate; indeed, they never fall outside confidence intervals from unbiased estimators for the systems where independent results could be obtained. Furthermore, we find that libraries of sufficiently fine discretization (which diminish library-size sensitivity) can be obtained with standard computing resources in most cases. Altogether, the extremely low computing times and accurate results suggest the BP approach warrants further study.
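The connection between BP and the transfer-matrix approach can be sketched on the simplest case, a linear chain of side chains, where sum-product message passing is exact. The sketch below is illustrative only and is not the loopy BP algorithm or the force field used in the study: the energy tables `single` and `pair`, the function names, and the unit choice `kT = 1.0` are all assumptions. Loopy BP applies the same message update on graphs with cycles, where the result becomes approximate.

```python
import itertools
import math


def chain_free_energy_bp(single, pair, kT=1.0):
    """Free energy -kT ln Z of a chain via forward message passing.

    single[i][a]  : energy of node i in discrete state a (hypothetical table)
    pair[i][a][b] : interaction energy of node i in state a with node i+1 in state b
    """
    msg = [1.0] * len(single[0])  # no upstream neighbor: uniform incoming message
    for i in range(len(single) - 1):
        # Absorb node i's own Boltzmann factor, then sum over its states.
        weighted = [msg[a] * math.exp(-single[i][a] / kT) for a in range(len(single[i]))]
        msg = [
            sum(weighted[a] * math.exp(-pair[i][a][b] / kT) for a in range(len(weighted)))
            for b in range(len(single[i + 1]))
        ]
    z = sum(msg[b] * math.exp(-single[-1][b] / kT) for b in range(len(single[-1])))
    return -kT * math.log(z)


def chain_free_energy_brute(single, pair, kT=1.0):
    """Reference value by explicit enumeration of every joint state."""
    z = 0.0
    for states in itertools.product(*(range(len(s)) for s in single)):
        energy = sum(single[i][s] for i, s in enumerate(states))
        energy += sum(pair[i][states[i]][states[i + 1]] for i in range(len(states) - 1))
        z += math.exp(-energy / kT)
    return -kT * math.log(z)
```

Message passing costs time linear in the chain length, whereas enumeration grows exponentially with the number of nodes, which is the reason a precalculated pairwise table makes the BP stage so cheap.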
Subject(s)
Algorithms , Molecular Dynamics Simulation , Peptides/chemistry , Proteins/chemistry , Thermodynamics
ABSTRACT
The ATP synthase (F-ATPase) is a highly complex rotary machine that synthesizes ATP, powered by a proton electrochemical gradient. Why did evolution select such an elaborate mechanism over arguably simpler alternating-access processes that can be reversed to perform ATP synthesis? We studied a systematic enumeration of alternative mechanisms, using numerical and theoretical means. When the alternative models are optimized subject to fundamental thermodynamic constraints, they fail to match the kinetic ability of the rotary mechanism over a wide range of conditions, particularly under low-energy conditions. We used a physically interpretable, closed-form solution for the steady-state rate for an arbitrary chemical cycle, which clarifies kinetic effects of complex free-energy landscapes. Our analysis also yields insights into the debated "kinetic equivalence" of ATP synthesis driven by transmembrane pH and potential difference. Overall, our study suggests that the complexity of the F-ATPase may have resulted from positive selection for its kinetic advantage.
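The steady-state rate of a single chemical cycle can also be obtained numerically, which is what the sketch below does; it is not the closed-form solution used in the study. It assumes a cycle of n states with hypothetical rate lists `fwd[i]` (state i to state i+1, cyclically) and `bwd[i]` (state i+1 back to state i), and solves the master equation with a small Gaussian elimination routine. The function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cycle_steady_state_flux(fwd, bwd):
    """Return (steady-state probabilities, net flux) for a single cycle of states."""
    n = len(fwd)
    a = [[0.0] * n for _ in range(n)]  # rate matrix: dp/dt = a p
    for i in range(n):
        j = (i + 1) % n
        a[j][i] += fwd[i]  # gain of state j from i
        a[i][i] -= fwd[i]  # loss of state i
        a[i][j] += bwd[i]  # gain of state i from j
        a[j][j] -= bwd[i]  # loss of state j
    a[n - 1] = [1.0] * n  # replace one redundant equation by normalization
    rhs = [0.0] * (n - 1) + [1.0]
    p = solve_linear(a, rhs)
    flux = fwd[0] * p[0] - bwd[0] * p[1 % n]  # net flux is the same across every edge
    return p, flux
```

When the product of forward rates equals the product of backward rates, the cycle has no net driving force and the computed flux vanishes, which is a convenient check that a rate set respects the thermodynamic constraint.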