Results 1 - 20 of 103
1.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37289551

ABSTRACT

MOTIVATION: Mathematical models of biological processes altered in cancer are built using knowledge of complex networks of signaling pathways, detailing the molecular regulations inside different cell types, such as tumor cells, immune cells and other stromal cells. While these models mainly focus on intracellular information, they often omit a description of the spatial organization of cells, of their interactions with one another and of their interactions with the tumor microenvironment. RESULTS: We present here a model of tumor cell invasion simulated with PhysiBoSS, a multiscale framework that combines agent-based modeling and continuous-time Markov processes applied to Boolean network models. With this model, we aim to study the different modes of cell migration and to predict means of blocking it by considering not only spatial information obtained from the agent-based simulation but also intracellular regulation obtained from the Boolean model. Our multiscale model integrates the impact of gene mutations with perturbations of the environmental conditions and allows the results to be visualized in 2D and 3D representations. The model successfully reproduces single and collective migration processes and is validated against published experiments on cell invasion. In silico experiments are proposed to search for possible targets that can block the more invasive tumoral phenotypes. AVAILABILITY AND IMPLEMENTATION: https://github.com/sysbio-curie/Invasion_model_PhysiBoSS.
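In a continuous-time Boolean formalism of the kind described above, each node whose logical rule disagrees with its current value can flip after an exponentially distributed waiting time. The sketch below is a minimal Gillespie-style simulation under stated assumptions: the two-node toggle switch, its rules and the rates `RATE_UP`/`RATE_DOWN` are hypothetical, and the code neither uses PhysiBoSS or MaBoSS nor reproduces the invasion model.

```python
import numpy as np

# Hypothetical toy network (not the invasion model): A and B repress each other.
RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}
RATE_UP = {"A": 1.0, "B": 0.5}    # assumed rate of 0 -> 1 when the rule evaluates to True
RATE_DOWN = {"A": 1.0, "B": 1.0}  # assumed rate of 1 -> 0 when the rule evaluates to False


def gillespie(state, t_max, rng):
    """Simulate one asynchronous continuous-time trajectory up to t_max."""
    t, state = 0.0, dict(state)
    while True:
        nodes, rates = [], []
        for node, rule in RULES.items():
            target = rule(state)
            if target != state[node]:  # node is allowed to flip
                nodes.append(node)
                rates.append(RATE_UP[node] if target else RATE_DOWN[node])
        total = sum(rates)
        if total == 0.0:
            return state  # fixed point: no node can flip any more
        t += rng.exponential(1.0 / total)  # waiting time until the next flip
        if t > t_max:
            return state
        k = rng.choice(len(nodes), p=np.array(rates) / total)  # which node flips
        state[nodes[k]] = not state[nodes[k]]


def final_state_probabilities(init, t_max=50.0, n_runs=2000, seed=0):
    """Monte Carlo estimate of the probability of each state at t_max."""
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_runs):
        final = gillespie(init, t_max, rng)
        key = tuple(int(final[n]) for n in RULES)
        counts[key] = counts.get(key, 0) + 1
    return {key: c / n_runs for key, c in counts.items()}


probs = final_state_probabilities({"A": False, "B": False})
print(probs)  # approximately 2/3 for (1, 0) and 1/3 for (0, 1)
```

From (0, 0) only the ratio of the two competing rates matters, so the exact answer is 1 / (1 + 0.5) = 2/3 for (1, 0); the Monte Carlo estimate fluctuates around it.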


Subject(s)
Models, Biological , Models, Theoretical , Humans , Computer Simulation , Signal Transduction/genetics , Neoplasm Invasiveness , Tumor Microenvironment
2.
BMC Bioinformatics ; 24(1): 83, 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36879200

ABSTRACT

BACKGROUND: Exploring the function or the developmental history of cells in various organisms provides insights into a given cell type's core molecular characteristics and putative evolutionary mechanisms. Numerous computational methods now exist for analyzing single-cell data and identifying cell states. These methods mostly rely on the expression of genes considered as markers for a given cell state. Yet, there is a lack of scRNA-seq computational tools to study the evolution of cell states, particularly how cell states change their molecular profiles. This can include novel gene activation or the novel deployment of programs already existing in other cell types, known as co-option. RESULTS: Here we present scEvoNet, a Python tool for predicting cell type evolution in cross-species or cancer-related scRNA-seq datasets. scEvoNet builds the confusion matrix of cell states and a bipartite network connecting genes and cell states. It allows a user to obtain a set of genes shared by the characteristic signature of two cell states, even between distantly related datasets. These genes can be used as indicators of either evolutionary divergence or co-option occurring during organism or tumor evolution. Our results on cancer and developmental datasets indicate that scEvoNet is a helpful tool for the initial screening of such genes as well as for measuring cell state similarities. CONCLUSION: The scEvoNet package is implemented in Python and is freely available from https://github.com/monsoro/scEvoNet . Utilizing this framework and exploring the continuum of transcriptome states between developmental stages and species will help explain cell state dynamics.


Subject(s)
Single-Cell Gene Expression Analysis , Software , Transcriptome , Computational Biology
3.
Bioinformatics ; 38(4): 1045-1051, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34871374

ABSTRACT

MOTIVATION: Single-cell RNA-seq (scRNAseq) datasets are characterized by large ambient dimensionality, and their analyses can be affected by various manifestations of the dimensionality curse. One of these manifestations is the hubness phenomenon, i.e. the existence of data points with surprisingly large incoming connectivity degree in the data point neighbourhood graph. The conventional approach to dampening the unwanted effects of high dimension consists of applying drastic dimensionality reduction. It remains unexplored whether this step can be avoided, thus retaining more information than is contained in the low-dimensional projections, by correcting hubness directly. RESULTS: We investigated hubness in scRNAseq data. We show that hub cells do not represent any visible technical or biological bias. The effect of various hubness reduction methods is investigated with respect to the clustering, trajectory inference and visualization tasks in scRNAseq datasets. We show that hubness reduction generates neighbourhood graphs with properties more suitable for applying machine learning methods, and that it outperforms other state-of-the-art methods for improving neighbourhood graphs. As a consequence, clustering, trajectory inference and visualization perform better, especially for datasets characterized by large intrinsic dimensionality. Hubness is an important phenomenon characterizing data point neighbourhood graphs computed for various types of sequencing datasets. Reducing hubness can be beneficial for the analysis of scRNAseq data with large intrinsic dimensionality, in which case it can be an alternative to drastic dimensionality reduction. AVAILABILITY AND IMPLEMENTATION: The code used to analyze the datasets and produce the figures of this article is available from https://github.com/sysbio-curie/schubness. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
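Hubness is commonly quantified by the k-occurrence N_k(x), the number of times a point x appears among the k nearest neighbours of other points; a strongly right-skewed N_k distribution signals the presence of hubs. The sketch below, assuming Euclidean distances and an arbitrary k = 10, computes N_k with NumPy and compares its skewness for low- and high-dimensional Gaussian data. It does not implement any of the hubness-reduction methods evaluated in the article.

```python
import numpy as np


def k_occurrence(X, k=10):
    """N_k: how often each point is among the k nearest neighbours of the others."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbour
    knn = np.argsort(d2, axis=1)[:, :k]             # k nearest neighbours of each point
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def skewness(values):
    """Sample skewness (third standardized moment)."""
    v = np.asarray(values, dtype=float)
    return ((v - v.mean()) ** 3).mean() / v.std() ** 3


rng = np.random.default_rng(0)
n_low = k_occurrence(rng.normal(size=(1000, 3)))
n_high = k_occurrence(rng.normal(size=(1000, 200)))
skew_low, skew_high = skewness(n_low), skewness(n_high)
print(f"skewness of N_10: d=3 -> {skew_low:.2f}, d=200 -> {skew_high:.2f}")
```

Points with very large N_k are the hubs; the skewness of N_k tends to grow with the intrinsic dimensionality of the data.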


Subject(s)
Single-Cell Analysis , Transcriptome , Gene Expression Profiling , Sequence Analysis, RNA , Cluster Analysis
4.
Bioinformatics ; 38(10): 2963-2964, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561190

ABSTRACT

SUMMARY: We developed BIODICA, an integrated computational environment for applying independent component analysis (ICA) to bulk and single-cell molecular profiles, interpreting the results in terms of biological functions and correlating them with metadata. The computational core is the novel Python package stabilized-ica, which provides an interface to several ICA algorithms, a stabilization procedure, meta-analysis and component interpretation tools. BIODICA is equipped with a user-friendly graphical user interface, allowing inexperienced users to perform ICA-based omics data analysis. The results are presented interactively, thus facilitating communication with biology experts. AVAILABILITY AND IMPLEMENTATION: BIODICA is implemented in Java, Python and JavaScript. The source code is freely available on GitHub under the MIT and the GNU LGPL licenses. BIODICA is supported on all major operating systems. URL: https://sysbio-curie.github.io/biodica-environment/.


Subject(s)
Algorithms , Software , Computational Biology/methods , Metadata
5.
Adv Exp Med Biol ; 1385: 259-279, 2022.
Article in English | MEDLINE | ID: mdl-36352218

ABSTRACT

In recent cancer genomics programs, large-scale profiling of microRNAs has been routinely used to better understand the role of microRNAs in gene regulation and disease. To support the analysis of such amounts of data, the scalability of bioinformatics pipelines is increasingly important for handling larger datasets. Here, we describe a scalable implementation of the clustered miRNA Master Regulator Analysis (clustMMRA) pipeline, developed to search for genomic clusters of microRNAs potentially driving cancer molecular subtyping. Genomically clustered microRNAs can be expressed simultaneously, acting in a combined manner to jointly regulate cell phenotypes. However, most computational approaches for the identification of microRNA master regulators are designed to detect the regulatory effect of a single microRNA. We have applied the clustMMRA pipeline to multiple pediatric tumor datasets of up to a hundred samples, demonstrating satisfactory performance of the software on large datasets. The results highlighted genomic clusters of microRNAs potentially involved in several subgroups of the different pediatric cancers or specifically involved in the phenotype of a subgroup. In particular, we confirmed that the cluster of microRNAs at the 14q32 locus is involved in multiple pediatric cancers, showing its specific downregulation in tumor subgroups with an aggressive phenotype.


Subject(s)
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , Gene Expression Profiling/methods , Neoplasms/genetics , Cluster Analysis , Gene Expression Regulation , Computational Biology , Gene Expression Regulation, Neoplastic
6.
Entropy (Basel) ; 25(1)2022 Dec 24.
Article in English | MEDLINE | ID: mdl-36673174

ABSTRACT

Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications, leading to reduced dataset representations that take into account possible divergence between source and target domains.
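The pairwise-weighted PCA that DAPCA generalizes can be written as maximizing the sum over pairs of W_ij times the squared distance between projected points; for symmetric W this reduces to an eigenproblem of 2 Xᵀ(D − W)X, where D holds the row sums of W. The sketch below shows one such step with fixed weights, under assumptions: the labeling scheme (+1 for pairs from different classes, −alpha for same-class pairs) is a hypothetical choice, and DAPCA's source/target weights and its iterative re-weighting are not reproduced.

```python
import numpy as np


def weighted_pca(X, W, n_components=1):
    """Orthonormal directions V maximizing sum_ij W_ij * ||V^T (x_i - x_j)||^2.

    W must be symmetric. Positive weights push pairs apart, negative weights pull them together.
    """
    L = np.diag(W.sum(axis=1)) - W    # Laplacian of the weight matrix
    Q = 2.0 * X.T @ L @ X             # equals sum_ij W_ij (x_i - x_j)(x_i - x_j)^T
    evals, evecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
    return evecs[:, np.argsort(evals)[::-1][:n_components]]


def supervised_weights(labels, alpha=1.0):
    """Hypothetical scheme: +1 for pairs of different classes, -alpha for same-class pairs."""
    same = labels[:, None] == labels[None, :]
    W = np.where(same, -alpha, 1.0)
    np.fill_diagonal(W, 0.0)
    return W


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = np.column_stack([
    5.0 * labels + 0.1 * rng.normal(size=200),  # axis 0 separates the classes
    4.0 * rng.normal(size=200),                 # axis 1 has the largest variance
    rng.normal(size=200),
])
X = X - X.mean(axis=0)

v_pca = weighted_pca(X, np.ones((200, 200)) - np.eye(200))[:, 0]  # uniform weights give plain PCA
v_sup = weighted_pca(X, supervised_weights(labels))[:, 0]
print(np.round(np.abs(v_pca), 2), np.round(np.abs(v_sup), 2))
```

With uniform weights the matrix Q is proportional to the covariance, so plain PCA picks the high-variance axis 1, whereas the class-aware weights rotate the leading direction onto the class-separating axis 0.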

7.
Brief Bioinform ; 20(4): 1238-1249, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29237040

ABSTRACT

Mathematical models can serve as a tool to formalize biological knowledge from diverse sources, to investigate biological questions in a formal way, to test experimental hypotheses, to predict the effect of perturbations and to identify underlying mechanisms. We present a pipeline of computational tools that performs a series of analyses to explore a logical model's properties. A logical model of initiation of the metastatic process in cancer is used as a transversal example. We start by analysing the structure of the interaction network constructed from the literature or existing databases. Next, we show how to translate this network into a mathematical object, specifically a logical model, and how robustness analyses can be applied to it. We explore the visualization of the stable states, defined as specific attractors of the model, and match them to cellular fates or biological read-outs. With the different tools we present here, we explain how to assign to each solution of the model a probability and how to identify genetic interactions using mutant phenotype probabilities. Finally, we connect the model to relevant experimental data: we present how some data analyses can direct the construction of the network, and how the solutions of a mathematical model can also be compared with experimental data, with a particular focus on high-throughput data in cancer biology. A step-by-step tutorial is provided as a Supplementary Material and all models, tools and scripts are provided on an accompanying website: https://github.com/sysbio-curie/Logical_modelling_pipeline.
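Stable states are the attractors in which every node's logical rule returns the node's current value. For small models they can be found by exhaustive enumeration of all 2^n states, as in the sketch below; the three-node model and the B knockout are hypothetical, cyclic attractors are not detected, and dedicated tools use far more efficient algorithms than brute force.

```python
from itertools import product

# Hypothetical three-node logical model (not the metastasis model of the article).
RULES = {
    "Input": lambda s: s["Input"],                # input node keeps its value
    "A": lambda s: s["Input"] and not s["B"],
    "B": lambda s: not s["A"],
}


def stable_states(rules):
    """Enumerate all 2^n states and keep those where every rule returns the current value."""
    names = list(rules)
    found = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if all(rules[n](state) == state[n] for n in names):
            found.append(tuple(int(state[n]) for n in names))
    return found


wild_type = stable_states(RULES)
b_knockout = stable_states({**RULES, "B": lambda s: False})  # mutant: B forced to 0
print(wild_type)   # [(0, 0, 1), (1, 0, 1), (1, 1, 0)]
print(b_knockout)  # [(0, 0, 0), (1, 1, 0)]
```

Comparing the stable states of the wild type and of a mutant in this way is the simplest form of the mutant-phenotype analysis described above; the probabilistic version requires a stochastic simulation of the model.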


Subject(s)
Models, Biological , Signal Transduction , Computational Biology/methods , Computer Simulation , Databases, Factual , Disease , Epistasis, Genetic , Gene Regulatory Networks , Humans , Logistic Models , Mathematical Concepts , Metabolic Networks and Pathways , Mutation , Neoplasm Metastasis/genetics , Neoplasm Metastasis/pathology , Neoplasm Metastasis/physiopathology , Software , Systems Biology/statistics & numerical data
8.
Brief Bioinform ; 20(2): 701-716, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29726961

ABSTRACT

Cancer initiation and progression are associated with multiple molecular mechanisms. The knowledge of these mechanisms is expanding and should be converted into guidelines for tackling the disease. Here, we discuss the formalization of biological knowledge into a comprehensive resource: the Atlas of Cancer Signalling Network (ACSN) and the Google Maps-based tool NaviCell, which supports map navigation. The application of ACSN to omics data visualization, in the context of signalling maps, is possible via the NaviCell Web Service module and through the NaviCom tool. This allows the generation of network-based molecular portraits of cancer using multilevel omics data. We review how these resources and tools are applied in preclinical cancer studies. Structural analysis of the maps together with omics data helps to rationalize the synergistic effects of drugs and allows the design of complex, disease stage-specific druggable interventions. The use of ACSN modules and maps as signatures of biological functions can help in cancer data analysis and interpretation. In addition, they have enabled the discovery of associations between perturbations in particular molecular mechanisms and the risk of developing a specific type of cancer. These approaches are helpful, among other things, for studying the interplay between molecular mechanisms of cancer. This opens an opportunity to decipher how gene interactions govern the hallmarks of cancer in specific contexts. We discuss a perspective to develop a flexible methodology and a pipeline enabling systematic omics data analysis in the context of signalling network maps, for stratifying patients and suggesting intervention points and drug repositioning in cancer and other diseases.


Subject(s)
Atlases as Topic , Neoplasms/metabolism , Signal Transduction , Computational Biology/methods , Humans , Neoplasms/genetics
9.
Brief Bioinform ; 20(2): 659-670, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29688273

ABSTRACT

The Disease Maps Project builds on a network of scientific and clinical groups that exchange best practices, share information and develop systems biomedicine tools. The project aims for an integrated, highly curated and user-friendly platform for disease-related knowledge. The primary focus of disease maps is on interconnected signaling, metabolic and gene regulatory network pathways represented in standard formats. The involvement of domain experts ensures that the key disease hallmarks are covered and relevant, up-to-date knowledge is adequately represented. Expert-curated and computer readable, disease maps may serve as a compendium of knowledge, allow for data-supported hypothesis generation or serve as a scaffold for the generation of predictive mathematical models. This article summarizes the 2nd Disease Maps Community meeting, highlighting its important topics and outcomes. We outline milestones on the roadmap for the future development of disease maps, including creating and maintaining standardized disease maps; sharing parts of maps that encode common human disease mechanisms; providing technical solutions for complexity management of maps; and Web tools for in-depth exploration of such maps. A dedicated discussion was focused on mathematical modeling approaches, as one of the main goals of disease map development is the generation of mathematically interpretable representations to predict disease comorbidity or drug response and to suggest drug repositioning, altogether supporting clinical decisions.


Subject(s)
Gene Regulatory Networks , Genetic Predisposition to Disease , Computational Biology , Humans , Models, Statistical , Translational Research, Biomedical
10.
Bioinformatics ; 36(8): 2620-2622, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31904823

ABSTRACT

MOTIVATION: CellDesigner is a well-established biological map editor used in many large-scale scientific efforts. However, the interoperability between the Systems Biology Graphical Notation (SBGN) Markup Language (SBGN-ML) and CellDesigner's proprietary Systems Biology Markup Language (SBML) extension formats remains a challenge due to the proprietary extensions used in CellDesigner files. RESULTS: We introduce a library named cd2sbgnml and an associated web service for bidirectional conversion between CellDesigner's proprietary SBML extension and SBGN-ML formats. We discuss the functionality of the cd2sbgnml converter, which was successfully used for the translation of comprehensive large-scale diagrams such as the RECON Human Metabolic network and the complete Atlas of Cancer Signalling Network, from the CellDesigner file format into SBGN-ML. AVAILABILITY AND IMPLEMENTATION: The cd2sbgnml conversion library and the web service were developed in Java, and distributed under the GNU Lesser General Public License v3.0. The sources along with a set of examples are available on GitHub (https://github.com/sbgn/cd2sbgnml and https://github.com/sbgn/cd2sbgnml-webservice, respectively). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Systems Biology , Humans , Metabolic Networks and Pathways , Signal Transduction
11.
PLoS Comput Biol ; 16(2): e1007652, 2020 02.
Article in English | MEDLINE | ID: mdl-32069277

ABSTRACT

English Wikipedia, containing more than five million articles, has approximately eleven thousand web pages devoted to proteins or genes, most of which were generated by the Gene Wiki project. These pages contain information about interactions between proteins and their functional relationships. At the same time, they are interconnected with other Wikipedia pages describing biological functions, diseases, drugs and other topics curated by independent, uncoordinated collective efforts. Therefore, Wikipedia contains a directed network of protein functional relations or physical interactions embedded into the global network of encyclopedia terms, which defines hidden (indirect) functional proximity between proteins. We applied the recently developed reduced Google Matrix (REGOMAX) algorithm to extract the network of hidden functional connections between proteins in Wikipedia. In this network we discovered tight communities which reflect areas of interest in molecular biology or medicine and can be considered as definitions of biological functions shaped by collective intelligence. Moreover, by comparing two snapshots of the Wikipedia graph (from 2013 and 2017), we studied the evolution of the network of direct and hidden protein connections. We concluded that the hidden connections are more dynamic than the direct ones and that the size of the hidden interaction communities grows with time. We summarize the results of the Wikipedia protein community analysis and annotation in the form of an interactive online map, which can serve as a portal to the Gene Wiki project.
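The reduced Google matrix of a chosen subset r of nodes is usually written as G_R = G_rr + G_rs (1 − G_ss)⁻¹ G_sr, where s denotes all remaining nodes; G_R stays column-stochastic, and its leading eigenvector is the PageRank vector restricted to r. The sketch below assumes a random toy graph, a damping factor of 0.85 and uniform teleportation for dangling nodes; the further decomposition of G_R used to isolate hidden links is not shown, and no Wikipedia data is involved.

```python
import numpy as np


def google_matrix(adj, alpha=0.85):
    """Column-stochastic Google matrix; adj[i, j] = 1 for a link j -> i."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=0)
    safe = np.where(out_deg == 0, 1, out_deg)
    S = np.where(out_deg > 0, adj / safe, 1.0 / n)  # dangling columns become uniform
    return alpha * S + (1.0 - alpha) / n


def pagerank(G, tol=1e-12, max_iter=10_000):
    """Leading eigenvector of G by power iteration (entries sum to 1)."""
    p = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        p_next = G @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def reduced_google_matrix(G, nodes):
    """G_R = G_rr + G_rs (1 - G_ss)^{-1} G_sr for the chosen subset r of nodes."""
    r = np.asarray(nodes)
    s = np.setdiff1d(np.arange(G.shape[0]), r)
    G_rr, G_rs = G[np.ix_(r, r)], G[np.ix_(r, s)]
    G_sr, G_ss = G[np.ix_(s, r)], G[np.ix_(s, s)]
    return G_rr + G_rs @ np.linalg.solve(np.eye(len(s)) - G_ss, G_sr)


rng = np.random.default_rng(1)
adj = (rng.random((30, 30)) < 0.1).astype(float)  # toy directed graph
np.fill_diagonal(adj, 0.0)
G = google_matrix(adj)
subset = [0, 1, 2, 3, 4]
G_R = reduced_google_matrix(G, subset)
p = pagerank(G)
p_r = p[subset] / p[subset].sum()
print(np.round(G_R, 3))
```

Because G_R keeps all paths that leave the subset and come back through the rest of the network, its entries can be non-zero even between subset nodes that share no direct link, which is what makes indirect (hidden) proximity measurable.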


Subject(s)
Biological Phenomena , Computational Biology/methods , Protein Interaction Mapping , Proteins/chemistry , Search Engine , Algorithms , Cluster Analysis , Databases, Genetic , Internet , Markov Chains , Probability
12.
Nucleic Acids Res ; 47(5): 2205-2215, 2019 03 18.
Article in English | MEDLINE | ID: mdl-30657980

ABSTRACT

MicroRNAs play important roles in many biological processes. Their aberrant expression can have an oncogenic or tumor suppressor function, directly participating in carcinogenesis, malignant transformation, invasiveness and metastasis. Indeed, miRNA profiles can not only distinguish between normal and cancerous tissue but also successfully classify different subtypes of a particular cancer. Here, we focus on a particular class of transcripts encoding polycistronic miRNA genes that yield multiple miRNA components. We describe 'clustered MiRNA Master Regulator Analysis (ClustMMRA)', a fully redesigned release of the MMRA computational pipeline (MiRNA Master Regulator Analysis), developed to search for clustered miRNAs potentially driving cancer molecular subtyping. Genomically clustered miRNAs are frequently co-expressed to target different components of pro-tumorigenic signaling pathways. By applying ClustMMRA to breast cancer patient data, we identified key miRNA clusters driving the phenotype of different tumor subgroups. The pipeline was applied to two independent breast cancer datasets, providing statistically concordant results between the two analyses. We validated in cell lines miR-199/miR-214 as a novel cluster of miRNAs promoting the triple negative breast cancer (TNBC) phenotype through its control of proliferation and EMT.


Subject(s)
Epithelial-Mesenchymal Transition/genetics , MicroRNAs/genetics , Multigene Family/genetics , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Cell Line, Tumor , Cell Proliferation , Datasets as Topic , Gene Silencing , Humans , Neoplasm Invasiveness/genetics , Reproducibility of Results , Triple Negative Breast Neoplasms/classification
13.
Nucleic Acids Res ; 47(D1): D614-D624, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30371894

ABSTRACT

A multitude of factors contribute to complex diseases and can be measured with 'omics' methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic Human (VMH, www.vmh.life) database, encapsulating current knowledge of human metabolism within five interlinked resources: 'Human metabolism', 'Gut microbiome', 'Disease', 'Nutrition', and 'ReconMaps'. The VMH captures 5180 unique metabolites, 17 730 unique reactions, 3695 human genes, 255 Mendelian diseases, 818 microbes, 632 685 microbial genes and 8790 food items. The VMH's unique features are (i) the hosting of metabolic reconstructions of human and gut microbes amenable to metabolic modeling; (ii) seven human metabolic maps for data visualization; (iii) a nutrition designer; (iv) a user-friendly webpage and application programming interface to access its content; (v) a user feedback option for community engagement and (vi) the connection of its entities to 57 other web resources. The VMH represents a novel, interdisciplinary database for data interpretation and hypothesis generation for the biomedical community.


Subject(s)
Databases, Genetic , Gastrointestinal Microbiome , Genomics/methods , Metabolome , Metabolomics/methods , Genome, Human , Host-Pathogen Interactions , Humans , Software
14.
Entropy (Basel) ; 23(10)2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34682092

ABSTRACT

Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators, based on the scikit-learn application programming interface, to evaluate the global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widely used in the literature. The package is developed with tools for assessing code quality and coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation for real-life and synthetic data.
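A nearest-neighbour estimator such as TwoNN illustrates what a global ID estimator computes: for locally uniform data of dimension d, the ratio r2/r1 of second- to first-neighbour distances follows a Pareto law with exponent d, giving the maximum-likelihood estimate d = N / Σ log(r2/r1). The NumPy sketch below is written from scratch and does not call scikit-dimension; it omits the discarding of extreme ratios that some implementations apply, assumes no duplicate points, and uses arbitrary synthetic datasets.

```python
import numpy as np


def twonn_dimension(X):
    """TwoNN maximum-likelihood estimate of global intrinsic dimension.

    Assumes no duplicate points (a zero nearest-neighbour distance would break the ratio).
    """
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    np.fill_diagonal(d2, np.inf)                                     # ignore self-distances
    two = np.sort(np.partition(d2, 1, axis=1)[:, :2], axis=1)        # squared r1, r2
    mu = np.sqrt(two[:, 1] / two[:, 0])                              # ratio r2 / r1 >= 1
    return len(mu) / np.log(mu).sum()


rng = np.random.default_rng(0)
plane = rng.random((1000, 2)) @ rng.normal(size=(2, 10))      # 2-D square linearly embedded in 10-D
blob = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 20))  # 5-D Gaussian embedded in 20-D
d_plane, d_blob = twonn_dimension(plane), twonn_dimension(blob)
print(f"plane: {d_plane:.2f}, 5-D blob: {d_blob:.2f}")
```

The ambient dimensions (10 and 20) do not enter the estimate; only the local geometry of the nearest neighbours does, which is the sense in which such estimators measure intrinsic rather than ambient dimensionality.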

15.
BMC Bioinformatics ; 21(1): 241, 2020 Jun 11.
Article in English | MEDLINE | ID: mdl-32527218

ABSTRACT

BACKGROUND: Solutions to stochastic Boolean models are usually estimated by Monte Carlo simulations, but as the state space of these models can be enormous, there is an inherent uncertainty about the accuracy of Monte Carlo estimates and whether simulations have reached all attractors. Moreover, these models have timescale parameters (transition rates) on which the probability values of stationary solutions depend in complex ways, making parameter sensitivity analysis necessary. We address these two issues with an exact calculation method for this class of models. RESULTS: We show that the stationary probability values of the attractors of stochastic (asynchronous) continuous-time Boolean models can be calculated exactly. The calculation does not require Monte Carlo simulations; instead, it uses graph-theoretical and matrix calculation methods previously applied in the context of chemical kinetics. In this version of the asynchronous updating framework, the states of a logical model define a continuous-time Markov chain, and for a given initial condition the stationary solution is fully defined by the right and left nullspaces of the master equation's kinetic matrix. We use topological sorting of the state transition graph and the dependencies between the nullspaces and the kinetic matrix to derive the stationary solution without simulations. We apply this calculation to several published Boolean models to analyze the under-explored question of the effect of transition rates on the stationary solutions and show that they can be sensitive to parameter changes. The analysis distinguishes processes that are robust to parameter values from those that are sensitive to them, providing both methodological and biological insights. CONCLUSION: Up to an intermediate size (the biggest model analyzed has 23 nodes), stochastic Boolean models can be efficiently solved by an exact matrix method, without using Monte Carlo simulations. Sensitivity analysis with respect to the model's timescale parameters often reveals a small subset of all parameters that primarily determine the stationary probability of attractor states.
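When every attractor is a fixed point, exact attractor probabilities can be obtained from the kinetic (generator) matrix K with the standard absorption formula for continuous-time Markov chains, (−K_TT) B = K_TA, where T indexes transient and A absorbing states. This is a textbook substitute, not the topological-sorting and nullspace procedure of the article, and it does not handle cyclic attractors; the toggle-switch rules and rates below are hypothetical.

```python
import numpy as np
from itertools import product

# Hypothetical toy network and rates: a two-node toggle switch.
RULES = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
RATE_UP = {"A": 1.0, "B": 0.5}
RATE_DOWN = {"A": 1.0, "B": 1.0}


def kinetic_matrix(rules, up, down):
    """Generator K of the asynchronous CTMC: K[i, j] = rate of state i -> state j."""
    names = list(rules)
    states = list(product([0, 1], repeat=len(names)))
    index = {st: i for i, st in enumerate(states)}
    K = np.zeros((len(states), len(states)))
    for st in states:
        s = dict(zip(names, map(bool, st)))
        for k, n in enumerate(names):
            target = rules[n](s)
            if target != s[n]:  # node n is allowed to flip
                new = list(st)
                new[k] = int(target)
                K[index[st], index[tuple(new)]] = up[n] if target else down[n]
    np.fill_diagonal(K, -K.sum(axis=1))  # rows of a generator sum to zero
    return states, K


def absorption_probabilities(states, K, start):
    """Exact probability of ending in each fixed-point attractor, starting from `start`.

    Only valid if every trajectory ends in a fixed point (no cyclic attractors).
    """
    diag = np.diag(K)
    absorbing = np.flatnonzero(diag == 0.0)
    transient = np.flatnonzero(diag != 0.0)
    i0 = states.index(start)
    if i0 in absorbing:
        return {start: 1.0}
    # B[t, a] = P(absorbed in a | start in t) solves (-K_TT) B = K_TA
    B = np.linalg.solve(-K[np.ix_(transient, transient)], K[np.ix_(transient, absorbing)])
    row = B[list(transient).index(i0)]
    return {states[a]: float(p) for a, p in zip(absorbing, row)}


states, K = kinetic_matrix(RULES, RATE_UP, RATE_DOWN)
probs = absorption_probabilities(states, K, (0, 0))
print(probs)  # approximately {(0, 1): 0.333, (1, 0): 0.667}
```

From (0, 0) the exact probability of (1, 0) is 1 / (1 + RATE_UP["B"]), so changing a single transition rate shifts the attractor probabilities, which is the kind of rate sensitivity analyzed in the abstract.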


Subject(s)
Models, Biological , Monte Carlo Method , Stochastic Processes
16.
Bioinformatics ; 35(7): 1188-1196, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30169736

ABSTRACT

MOTIVATION: Due to the complexity and heterogeneity of multicellular biological systems, mathematical models that take into account cell signalling, cell population behaviour and the extracellular environment are particularly helpful. We present PhysiBoSS, an open source software which combines intracellular signalling using Boolean modelling (MaBoSS) and multicellular behaviour using agent-based modelling (PhysiCell). RESULTS: PhysiBoSS provides a flexible and computationally efficient framework to explore the effect of environmental and genetic alterations of individual cells at the population level, bridging the critical gap from single-cell genotype to single-cell phenotype and emergent multicellular behaviour. PhysiBoSS thus becomes very useful when studying heterogeneous population response to treatment, mutation effects, different modes of invasion or isomorphic morphogenesis events. To concretely illustrate a potential use of PhysiBoSS, we studied heterogeneous cell fate decisions in response to TNF treatment. We explored the effect of different treatments and the behaviour of several resistant mutants. We highlighted the importance of spatial information on the population dynamics by considering the effect of competition for resources like oxygen. AVAILABILITY AND IMPLEMENTATION: PhysiBoSS is freely available on GitHub (https://github.com/sysbio-curie/PhysiBoSS), with a Docker image (https://hub.docker.com/r/gletort/physiboss/). It is distributed as open source under the BSD 3-clause license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Models, Genetic , Signal Transduction , Software , Genotype , Humans , Signal Transduction/genetics , Systems Analysis
17.
Bioinformatics ; 35(21): 4307-4313, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30938767

ABSTRACT

MOTIVATION: Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). MF algorithms have never been compared based on the between-datasets reproducibility of their outputs in similar independent datasets. Lack of this knowledge might have a crucial impact when generalizing the predictions made in one study to others. RESULTS: We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts of evolutionary bioinformatics, we design a novel framework based on Reciprocally Best Hit (RBH) graphs to benchmark the MF methods for their ability to produce generalizable components. We show that a particular protocol of application of independent component analysis (ICA), accompanied by a stabilization procedure, leads to a significant increase in the between-datasets reproducibility. Moreover, we show that the signals detected through this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool for performing the Stabilized ICA-based RBH meta-analysis. We apply this methodology to the study of colorectal cancer (CRC), for which 14 independent transcriptomic datasets can be collected. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or as robust, tumor-type-specific transcriptomic signatures of tumoral cells or of the tumoral microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping. AVAILABILITY AND IMPLEMENTATION: The RBH construction tool is available from http://goo.gl/DzpwYp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
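An RBH edge links component i of one dataset to component j of another when each is the other's best match. The sketch below assumes absolute Pearson correlation over shared genes as the similarity (absolute because ICA component signs are arbitrary), which may differ from the measure and thresholds used in the article; the toy components are synthetic. Building a full RBH graph would repeat this for every pair of datasets.

```python
import numpy as np


def reciprocal_best_hits(S1, S2):
    """Reciprocal best hits between two sets of components.

    S1, S2: genes x components matrices over the same genes, in the same order.
    Returns (i, j, |r|) for each pair where i and j are each other's best match.
    """
    k1 = S1.shape[1]
    C = np.abs(np.corrcoef(S1.T, S2.T)[:k1, k1:])  # cross-correlation block
    best_in_2 = C.argmax(axis=1)  # best S2 match for each S1 component
    best_in_1 = C.argmax(axis=0)  # best S1 match for each S2 component
    return [(i, int(j), float(C[i, j])) for i, j in enumerate(best_in_2) if best_in_1[j] == i]


rng = np.random.default_rng(0)
S1 = rng.laplace(size=(500, 5))  # 5 toy "metagenes" over 500 genes
perm = [3, 0, 4, 1, 2]
# Second toy "dataset": same components, shuffled, with sign flips and noise
S2 = S1[:, perm] * np.array([1, -1, 1, -1, 1]) + 0.3 * rng.normal(size=(500, 5))
hits = reciprocal_best_hits(S1, S2)
print([(i, j) for i, j, _ in hits])  # [(0, 1), (1, 3), (2, 4), (3, 0), (4, 2)]
```

Requiring reciprocity rather than a one-sided best match prunes spurious edges: a component that has no true counterpart in the other dataset usually fails to be anyone's reciprocal best hit.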


Subject(s)
Transcriptome , Algorithms , Breast Neoplasms , Gene Expression Profiling , Humans , Reproducibility of Results , Tumor Microenvironment
18.
Entropy (Basel) ; 22(11)2020 Nov 11.
Article in English | MEDLINE | ID: mdl-33287042

ABSTRACT

Construction of graph-based approximations for multi-dimensional data point clouds is widely used in a variety of areas. Notable examples of applications of such approximators are cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets, and skeletonization of images. Several methods have been proposed to construct such approximating graphs, some based on computation of minimum spanning trees and some based on principal graphs generalizing principal curves. In this article we propose a methodology to compare and benchmark these two graph-based data approximation approaches, as well as to define their hyperparameters. The main idea is to avoid comparing graphs directly: first, the graph approximation is used to induce a clustering of the data point cloud, and second, well-established methods are used to compare and score the data cloud partitions induced by the graphs. In particular, mutual information-based approaches prove to be useful in this context. The induced clustering is based on decomposing a graph into non-branching segments and then clustering the data point cloud by the nearest segment. Such a method allows efficient comparison of graph-based data approximations of arbitrary topology and complexity. The method is implemented in Python using the standard scikit-learn library, which provides high speed and efficiency. As a demonstration of the methodology, we analyse and compare graph-based data approximation methods using synthetic as well as real-life single-cell datasets.
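The induced-clustering idea can be sketched as follows, under assumptions: graphs are given as node coordinates plus an edge list, each point is assigned to the segment of its nearest edge, and partitions are compared with normalized mutual information using arithmetic-mean normalization (the default of scikit-learn's `normalized_mutual_info_score`, which this sketch does not call). The star-shaped toy graphs and data are made up for illustration.

```python
import numpy as np
from collections import defaultdict


def graph_segments(edges):
    """Map each edge (as a frozenset) to a non-branching segment id.

    Segments run between nodes whose degree is not 2; pure cycles are not handled.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seg_of, seg = {}, 0
    for start in list(adj):
        if len(adj[start]) == 2:
            continue  # segments start at leaves or branching nodes
        for nxt in adj[start]:
            if frozenset((start, nxt)) in seg_of:
                continue
            prev, cur = start, nxt
            seg_of[frozenset((prev, cur))] = seg
            while len(adj[cur]) == 2:  # walk through degree-2 nodes
                prev, cur = cur, next(n for n in adj[cur] if n != prev)
                seg_of[frozenset((prev, cur))] = seg
            seg += 1
    return seg_of


def segment_partition(X, nodes, edges):
    """Label each data point by the segment of its nearest graph edge."""
    seg_of = graph_segments(edges)
    dists = np.empty((len(X), len(edges)))
    for k, (a, b) in enumerate(edges):
        pa, pb = nodes[a], nodes[b]
        d = pb - pa
        t = np.clip((X - pa) @ d / (d @ d), 0.0, 1.0)  # projection onto the edge
        dists[:, k] = np.linalg.norm(X - (pa + t[:, None] * d), axis=1)
    nearest = dists.argmin(axis=1)
    return np.array([seg_of[frozenset(edges[k])] for k in nearest])


def nmi(u, v):
    """Normalized mutual information (arithmetic-mean normalization)."""
    _, ui = np.unique(u, return_inverse=True)
    _, vi = np.unique(v, return_inverse=True)
    P = np.zeros((ui.max() + 1, vi.max() + 1))
    np.add.at(P, (ui, vi), 1.0)  # contingency table
    P /= P.sum()
    pu, pv = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pu, pv)[nz])).sum()
    hu, hv = -(pu * np.log(pu)).sum(), -(pv * np.log(pv)).sum()
    return mi / ((hu + hv) / 2) if hu + hv > 0 else 1.0


nodes = {i: np.array(p, dtype=float) for i, p in
         enumerate([(0, 0), (1, 0), (-1, 1), (-1, -1), (2, 0)])}
rng = np.random.default_rng(0)
arm = rng.integers(0, 3, 300)
ends = np.array([[2.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
X = rng.random(300)[:, None] * ends[arm] + 0.05 * rng.normal(size=(300, 2))

star_a = [(0, 1), (1, 4), (0, 2), (0, 3)]  # node 1 has degree 2, so 3 segments
star_b = [(0, 4), (0, 2), (0, 3)]          # same geometry without the degree-2 node
line = [(2, 3)]                            # a single segment
part_a = segment_partition(X, nodes, star_a)
part_b = segment_partition(X, nodes, star_b)
print(nmi(part_a, part_b), nmi(part_a, segment_partition(X, nodes, line)))
```

The two star graphs differ in their node sets but induce the same partition, so their NMI is 1, while the single-segment graph induces only one cluster and scores 0; comparing partitions rather than graphs is what makes differently sized graphs comparable.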

19.
Entropy (Basel) ; 22(3)2020 Mar 04.
Article in English | MEDLINE | ID: mdl-33286070

ABSTRACT

Multidimensional data point clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology, which can be recovered by unsupervised machine learning approaches, in particular by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space, with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in various fields such as biology, where it can be used, for example, with single-cell transcriptomic or epigenomic datasets to infer gene expression dynamics and recover differentiation landscapes.
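As a rough illustration of the elastic energy concept mentioned above, the sketch below computes the basic form used for elastic principal graphs: the mean squared distance of points to their nearest node, plus a stretching term (coefficient `lam`) over edges, plus a bending term (coefficient `mu`) over star subgraphs. This is an assumption-laden simplification: ElPiGraph's own refinements and its graph-grammar topology search are not reproduced, and the function name and default coefficients are hypothetical.

```python
import numpy as np


def elastic_energy(x, nodes, edges, stars, lam=0.01, mu=0.1):
    """Basic elastic energy of a graph embedded in data space.

    x:      (n_points, d) data
    nodes:  (n_nodes, d) node positions
    edges:  list of (i, j) node index pairs
    stars:  list of (center, [leaf, ...]) tuples, one per star subgraph
    lam, mu: stretching and bending coefficients (illustrative defaults)
    """
    # Approximation term: mean squared distance of each point to its nearest node.
    d2 = ((x[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    mse = d2.min(axis=1).mean()
    # Stretching term: penalizes long edges.
    stretch = lam * sum(((nodes[i] - nodes[j]) ** 2).sum() for i, j in edges)
    # Bending term: penalizes a star center that deviates from the mean of its leaves.
    bend = mu * sum(((nodes[c] - nodes[leaves].mean(axis=0)) ** 2).sum()
                    for c, leaves in stars)
    return mse + stretch + bend


# Three collinear points approximated by a 3-node chain, straight or bent.
xs = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
edges = [(0, 1), (1, 2)]
stars = [(1, [0, 2])]
bent_nodes = xs.copy()
bent_nodes[1, 1] = 0.5  # push the middle node off the line
straight = elastic_energy(xs, xs.copy(), edges, stars)
bent = elastic_energy(xs, bent_nodes, edges, stars)
print(straight < bent)  # True: the straight chain has lower energy
```

Minimizing this energy over node positions, and over candidate topologies produced by graph grammar operations, is what drives a principal graph to follow the data while staying smooth.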

20.
BMC Bioinformatics ; 20(Suppl 4): 140, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999838

ABSTRACT

BACKGROUND: The interplay between metabolic processes and signalling pathways remains poorly understood. Global, detailed and comprehensive reconstructions of human metabolism and signalling pathways exist in the form of molecular maps, but they have never been integrated together. We aim to fill this gap by integrating signalling and metabolic pathways, allowing visual exploration of multi-level omics data and the study of cross-regulatory circuits between these processes in health and disease. RESULTS: We combined two comprehensive, manually curated network maps: the Atlas of Cancer Signalling Network (ACSN), containing mechanisms frequently implicated in cancer, and ReconMap 2.0, a comprehensive reconstruction of the human metabolic network. We linked the ACSN and ReconMap 2.0 maps via common players and represented the two maps as interconnected layers using the NaviCell platform for map exploration ( https://navicell.curie.fr/pages/maps_ReconMap%202.html ). Moreover, proteins catalysing metabolic reactions in ReconMap 2.0 were previously not represented visually on the map canvas, which precluded visualisation of omics data in the context of ReconMap 2.0. We suggested a solution for displaying protein nodes on the ReconMap 2.0 map in the vicinity of the corresponding reaction or process nodes. This permits multi-omics data visualisation in the context of both map layers. Exploration of, and shuttling between, the two map layers is possible using the Google Maps-like features of NaviCell. The integrated ACSN-ReconMap 2.0 networks are accessible online and allow data visualisation through various modes such as markers, heat maps, bar-plots, glyphs and map staining. The integrated networks were applied to compare immunoreactive and proliferative ovarian cancer subtypes using transcriptomic, copy number and mutation multi-omics data. A number of metabolic and signalling processes specifically deregulated in each of the ovarian cancer subtypes were identified. CONCLUSIONS: As knowledge evolves and new omics data become more heterogeneous, gathering existing domains of biology under common platforms is essential. We believe that the integrated ACSN-ReconMap 2.0 networks will help in understanding various disease mechanisms and in discovering new interactions at the intersection of cell signalling and metabolism. In addition, the successful integration of metabolic and signalling networks allows broader application of systems biology approaches to data interpretation and to the retrieval of intervention points for simultaneously targeting the key players that coordinate signalling and metabolism in human diseases.


Subject(s)
Data Analysis , Genomics/methods , Metabolic Networks and Pathways , Neoplasms/genetics , Signal Transduction , Female , Humans , Software , Systems Biology
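Linking two map layers "via common players", as described in the abstract above, can be pictured with the minimal sketch below. It assumes each layer is a mapping from node id to the set of player identifiers (for example gene symbols) the node contains, and that a cross-layer link is drawn wherever an identifier is shared. The node ids and gene names are placeholders, the function name `link_layers` is hypothetical, and the matching rules actually used for ACSN and ReconMap 2.0 are not reproduced.

```python
from collections import defaultdict


def link_layers(signalling, metabolic):
    """Return sorted (signalling_node, metabolic_node, shared_player) links.

    signalling, metabolic: dict mapping node id -> set of player identifiers.
    """
    # Index metabolic-layer nodes by the players they contain.
    by_player = defaultdict(set)
    for node, players in metabolic.items():
        for p in players:
            by_player[p].add(node)
    # Emit one link per shared player between a signalling and a metabolic node.
    links = set()
    for node, players in signalling.items():
        for p in players:
            for target in by_player.get(p, ()):
                links.add((node, target, p))
    return sorted(links)


# Placeholder layers for illustration only.
signalling = {"s1": {"GENE_A", "GENE_B"}, "s2": {"GENE_C"}}
metabolic = {"r1": {"GENE_A"}, "r2": {"GENE_B", "GENE_D"}}
print(link_layers(signalling, metabolic))
# [('s1', 'r1', 'GENE_A'), ('s1', 'r2', 'GENE_B')]
```

Indexing one layer by player keeps the matching linear in the total number of node-player memberships rather than comparing every pair of nodes.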