ABSTRACT
Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
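As a rough illustration of the E-distance mentioned above, the sketch below computes the standard energy distance between two sets of profiles: twice the mean between-set distance minus the mean within-set distance of each set. It assumes plain Euclidean distance on low-dimensional coordinates (for example, principal components). The function names `mean_pairwise_distance` and `e_distance` are hypothetical and are not taken from the scPerturb software, whose exact distance and normalization choices may differ.

```python
from itertools import product
from math import dist


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (x, y) with x in a and y in b."""
    return sum(dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def e_distance(x, y):
    """Energy distance between two sets of points.

    x and y are lists of equal-length coordinate tuples, one tuple per cell.
    Assumption: Euclidean distance; the within-set means include the zero
    self-distances (the V-statistic form of the estimator).
    """
    delta_xy = mean_pairwise_distance(x, y)  # between-set spread
    sigma_x = mean_pairwise_distance(x, x)   # within-set spread of x
    sigma_y = mean_pairwise_distance(y, y)   # within-set spread of y
    return 2 * delta_xy - sigma_x - sigma_y


# Toy data: two "cells" per condition in a 2-D embedding (made-up values).
control = [(0.0, 0.0), (0.0, 1.0)]
perturbed = [(3.0, 0.0), (3.0, 1.0)]
effect = e_distance(control, perturbed)
```

A set compared with itself gives zero, and the value grows as the two sets move apart relative to their internal spread. A significance test could be built on top of this by recomputing the distance after shuffling the condition labels.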
Subjects
Proteomics, Software, Gene Expression Profiling/methods, Epigenomics, Single-Cell Analysis
ABSTRACT
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased toward synergistic agents and results do not generalize out of distribution. During 5 rounds of experimentation, we employ sequential model optimization with a deep learning model to select drug combinations increasingly enriched for synergism and active against a cancer cell line, evaluating only ~5% of the total search space. Moreover, we find that learned drug embeddings (using structural information) begin to reflect biological mechanisms. In silico benchmarking suggests search queries are ~5-10× enriched for highly synergistic drug combinations by using sequential rounds of evaluation when compared with random selection or ~3× when using a pretrained model.
Subjects
Computational Biology, Neoplasms, Humans, Drug Synergism, Computational Biology/methods, Drug Combinations, Neoplasms/drug therapy
ABSTRACT
Cellular differentiation requires extensive alterations in chromatin structure and function, which are elicited by the coordinated action of chromatin and transcription factors. In contrast to transcription factors, the roles of chromatin factors in differentiation have not been systematically characterized. Here, we combine bulk ex vivo and single-cell in vivo CRISPR screens to characterize the role of chromatin factor families in hematopoiesis. We uncover marked lineage specificities for 142 chromatin factors, revealing functional diversity among related chromatin factors (i.e. barrier-to-autointegration factor subcomplexes) as well as shared roles for unrelated repressive complexes that restrain excessive myeloid differentiation. Using epigenetic profiling, we identify functional interactions between lineage-determining transcription factors and several chromatin factors that explain their lineage dependencies. Studying chromatin factor functions in leukemia, we show that leukemia cells engage homeostatic chromatin factor functions to block differentiation, generating specific chromatin factor-transcription factor interactions that might be therapeutically targeted. Together, our work elucidates the lineage-determining properties of chromatin factors across normal and malignant hematopoiesis.
Subjects
Chromatin, Leukemia, Humans, Chromatin/genetics, Cell Lineage/genetics, Hematopoiesis/genetics, Cell Differentiation/genetics, Transcription Factors/genetics
ABSTRACT
MOTIVATION: A common strategy to infer and quantify interactions between components of a biological system is to deduce them from the network's response to targeted perturbations. Such perturbation experiments are often challenging and costly. Therefore, optimizing the experimental design is essential to achieve a meaningful characterization of biological networks. However, it remains difficult to predict which combination of perturbations allows specific interaction strengths to be inferred in a given network topology. Yet, such a description of identifiability is necessary to select perturbations that maximize the number of inferable parameters. RESULTS: We show analytically that the identifiability of network parameters can be determined by an intuitive maximum-flow problem. Furthermore, we used the theory of matroids to describe identifiability relationships between sets of parameters in order to build identifiable effective network models. Collectively, these results allowed us to devise strategies for an optimal design of the perturbation experiments. We benchmarked these strategies on a database of human pathways. Remarkably, full network identifiability was achieved, on average, with fewer than a third of the perturbations that are needed in a random experimental design. Moreover, we determined perturbation combinations that additionally decreased experimental effort compared to single-target perturbations. In summary, we provide a framework for inferring a maximal number of interaction strengths with a minimal number of perturbation experiments. AVAILABILITY AND IMPLEMENTATION: IdentiFlow is available at github.com/GrossTor/IdentiFlow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Biological Models, Research Design, Humans
ABSTRACT
MOTIVATION: A major challenge in molecular and cellular biology is to map out the regulatory networks of cells. As regulatory interactions typically cannot be observed directly in experiments, various computational methods have been proposed to disentangle direct and indirect effects. Most of these rely on assumptions that are rarely met or cannot be adapted to a given context. RESULTS: We present a network inference method that is based on a simple response logic with minimal presumptions. It requires that we can experimentally observe whether or not some of the system's components respond to perturbations of some other components, and then identifies the directed networks that most accurately account for the observed propagation of the signal. To cope with the intractable number of possible networks, we developed a logic programming approach that can infer networks of hundreds of nodes, while being robust to noisy, heterogeneous or missing data. This approach allows direct integration of prior network knowledge and additional constraints such as sparsity. We systematically benchmark our method on KEGG pathways, and show that it outperforms existing approaches in DREAM3 and DREAM4 challenges. Applied to a novel perturbation dataset on PI3K and MAPK pathways in isogenic models of a colon cancer cell line, it generates plausible network hypotheses that explain distinct sensitivities toward various targeted inhibitors due to different PI3K mutants. AVAILABILITY AND IMPLEMENTATION: A Python/Answer Set Programming implementation can be accessed at github.com/GrossTor/response-logic. Data and analysis scripts are available at github.com/GrossTor/response-logic-projects. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Logic, Algorithms, Computational Biology
ABSTRACT
Motivation: Intracellular signalling is realized by complex signalling networks, which are almost impossible to understand without network models, especially if feedbacks are involved. Modular Response Analysis (MRA) is a convenient modelling method to study signalling networks in various contexts. Results: We developed the software package STASNet (STeady-STate Analysis of Signalling Networks) that provides an augmented and extended version of MRA suited to model signalling networks from incomplete perturbation schemes and multi-perturbation data. Using data from the Dialogue on Reverse Engineering Assessment and Methods challenge, we show that predictions from STASNet models are among the top-performing methods. We applied the method to study the effect of SHP2, a protein that has been implicated in resistance to targeted therapy in colon cancer, using a novel dataset from the colon cancer cell line Widr and a SHP2-depleted derivative. We find that SHP2 is required for mitogen-activated protein kinase signalling, whereas AKT signalling only partially depends on SHP2. Availability and implementation: An R-package is available at https://github.com/molsysbio/STASNet. Supplementary information: Supplementary data are available at Bioinformatics online.
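To make the idea behind Modular Response Analysis concrete, the sketch below implements only the classical MRA relation, not STASNet's augmented version: given a full matrix R of global (steady-state) responses, where R[i][j] is the response of node i to a perturbation that directly targets node j alone, the local response coefficients are r[i][j] = -(R^-1)[i][j] / (R^-1)[i][i]. The helper names `invert` and `local_responses` and the three-node example values are hypothetical; incomplete perturbation schemes, which STASNet is designed to handle, are outside this sketch.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def local_responses(global_response):
    """Classical MRA: local coefficients from the global response matrix.

    Assumes one perturbation per node, each acting directly on that node only.
    The diagonal of the result is -1 by construction.
    """
    inv = invert(global_response)
    n = len(inv)
    return [[-inv[i][j] / inv[i][i] for j in range(n)] for i in range(n)]


# Hypothetical chain 1 -> 2 -> 3: node 3 responds globally to node 1,
# but only through node 2.
R = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [1.0, 2.0, 1.0]]
r = local_responses(R)
```

In this toy chain the global response of node 3 to node 1 is nonzero, while the recovered local coefficient r[2][0] is zero, which is the sense in which MRA separates direct from indirect effects.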
Subjects
Signal Transduction, Software, Tumor Cell Line, Colonic Neoplasms, Computational Biology, Humans, Protein Tyrosine Phosphatase Non-Receptor Type 11/genetics
ABSTRACT
Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces the research on the structure and functional interactions of these RNA gene sequences. We mine the evolutionary sequence record to derive precise information about the function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, for example at switch points in riboswitches and at a complex nucleation site in HIV. Aided by increasing sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA.
Subjects
Nucleic Acid Conformation, Untranslated RNA/chemistry, Entropy, Molecular Evolution, Molecular Models, RNA Folding, Untranslated RNA/genetics, Untranslated RNA/metabolism, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/metabolism, Ribosomes/metabolism
ABSTRACT
We describe the emergence and interactions of breather modes and resonant wave modes within a two-dimensional ringlike oscillator chain in a microcanonical situation. Our analytical results identify different dynamical regimes characterized by the potential dominance of either type of mode. The chain is initially placed in a metastable state, which it can leave by passing over the brim of the applied Mexican-hat-like potential. We elucidate the influence of the different wave modes on the mean first-passage time. A central finding is that in this complex potential landscape, too, a fast noise-free escape scenario relying solely on nonlinear cooperative effects is achievable even in a low-energy setting.