ABSTRACT
The introduction of Next-Generation Sequencing technologies in the clinic has improved rare disease diagnosis. Nonetheless, for very heterogeneous or very rare diseases, more than half of cases still lack a molecular diagnosis. Novel strategies are needed to prioritize variants within a single individual. The Population Sampling Probability (PSAP) method was developed to meet this aim but only for coding variants in exome data. Here, we propose an extension of the PSAP method to the non-coding genome called PSAP-genomic-regions. In this extension, instead of considering genes as testing units (PSAP-genes strategy), we use genomic regions defined over the whole genome that pinpoint potential functional constraints. We conceived an evaluation protocol for our method using artificially generated disease exomes and genomes, by inserting coding and non-coding pathogenic ClinVar variants in large data sets of exomes and genomes from the general population. PSAP-genomic-regions significantly improves the ranking of these variants compared to using a pathogenicity score alone. Using PSAP-genomic-regions, more than 50% of non-coding ClinVar variants were among the top 10 variants of the genome. On real sequencing data from six patients with Cerebral Small Vessel Disease and nine patients with male infertility, all causal variants were ranked in the top 100 variants with PSAP-genomic-regions. By revisiting the testing units used in the PSAP method to include non-coding variants, we have developed PSAP-genomic-regions, an efficient whole-genome prioritization tool which offers promising results for the diagnosis of unresolved rare diseases.
ABSTRACT
Many genetic syndromes are linked to mutations in genes encoding factors that guide chromatin organization. Among them, several distinct rare genetic diseases are linked to mutations in SMCHD1 that encodes the structural maintenance of chromosomes flexible hinge domain containing 1 chromatin-associated factor. In humans, its function as well as the impact of its mutations remains poorly defined. To fill this gap, we determined the episignature associated with heterozygous SMCHD1 variants in primary cells and cell lineages derived from induced pluripotent stem cells for Bosma arhinia and microphthalmia syndrome (BAMS) and type 2 facioscapulohumeral dystrophy (FSHD2). In human tissues, SMCHD1 regulates the distribution of methylated CpGs, H3K27 trimethylation and CTCF at repressed chromatin but also at euchromatin. Based on the exploration of tissues affected either in FSHD or in BAMS, i.e. skeletal muscle fibers and neural crest stem cells, respectively, our results emphasize multiple functions for SMCHD1, in chromatin compaction, chromatin insulation and gene regulation with variable targets or phenotypical outcomes. We concluded that in rare genetic diseases, SMCHD1 variants impact gene expression in two ways: (i) by changing the chromatin context at a number of euchromatin loci or (ii) by directly regulating some loci encoding master transcription factors required for cell fate determination and tissue differentiation.
Subject(s)
Microphthalmos , Muscular Dystrophy, Facioscapulohumeral , Humans , Muscular Dystrophy, Facioscapulohumeral/genetics , Neural Crest/metabolism , Microphthalmos/genetics , Euchromatin/genetics , Chromosomal Proteins, Non-Histone/metabolism , Muscle, Skeletal/metabolism , Phenotype , Chromatin/genetics
ABSTRACT
BACKGROUND: Biological networks have proven invaluable for representing biological knowledge. Multilayer networks, which gather different types of nodes and edges in multiplex, heterogeneous and bipartite networks, provide a natural way to integrate diverse and multi-scale data sources into a common framework. Recently, we developed MultiXrank, a Random Walk with Restart algorithm able to explore such multilayer networks. MultiXrank outputs scores reflecting the proximity between an initial set of seed node(s) and all the other nodes in the multilayer network. We illustrate here the versatility of bioinformatics tasks that can be performed using MultiXrank. RESULTS: We first show that MultiXrank can be used to prioritise genes and drugs of interest by exploring multilayer networks containing interactions between genes, drugs, and diseases. In a second study, we illustrate how MultiXrank scores can also be used in a supervised strategy to train a binary classifier to predict gene-disease associations. The classifier performance is validated using outdated and novel gene-disease associations for training and evaluation, respectively. Finally, we show that MultiXrank scores can be used to compute diffusion profiles and use them as disease signatures. We computed the diffusion profiles of more than 100 immune diseases using a multilayer network that includes cell-type specific genomic information. The clustering of the immune disease diffusion profiles reveals shared phenotypic characteristics. CONCLUSION: Overall, we illustrate here diverse applications of MultiXrank to showcase its versatility. We expect that this can lead to further and broader bioinformatics applications.
Subject(s)
Algorithms , Computational Biology , Genomics
ABSTRACT
CONTEXT: Identifying clusters (i.e., subgroups) of patients from the analysis of medico-administrative databases is particularly important to better understand disease heterogeneity. However, these databases contain different types of longitudinal variables which are measured over different follow-up periods, generating truncated data. It is therefore fundamental to develop clustering approaches that can handle this type of data. OBJECTIVE: We propose here cluster-tracking approaches to identify clusters of patients from truncated longitudinal data contained in medico-administrative databases. MATERIAL AND METHODS: We first cluster patients at each age. We then track the identified clusters over ages to construct cluster-trajectories. We compared our novel approaches with three classical longitudinal clustering approaches by calculating the silhouette score. As a use-case, we analyzed the use of antithrombotic drugs from 2008 to 2018 recorded in the Échantillon Généraliste des Bénéficiaires (EGB), a French national cohort. RESULTS: Our cluster-tracking approaches allow us to identify several cluster-trajectories with clinical significance without any imputation of data. The comparison of the silhouette scores obtained with the different approaches highlights the better performance of the cluster-tracking approaches. CONCLUSION: The cluster-tracking approaches are a novel and efficient alternative to identify patient clusters from medico-administrative databases by taking into account their specificities.
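The two-step idea in this abstract (cluster at each age, then track clusters across ages) can be illustrated with a minimal sketch. The abstract does not say how clusters are linked, so the Jaccard-overlap rule below is an assumption chosen for illustration, and the names `jaccard`, `track_clusters` and `min_overlap` are hypothetical rather than taken from the authors' code. Per-age cluster memberships are assumed to have been computed already.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of patient identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_clusters(clusters_by_age, min_overlap=0.0):
    """Link each cluster at one age to its best-overlapping cluster at the next age.

    clusters_by_age maps an age to {cluster_label: set_of_patient_ids}.
    Linking by maximal Jaccard overlap is an illustrative assumption,
    not necessarily the rule used in the paper.
    """
    ages = sorted(clusters_by_age)
    links = []
    for age, next_age in zip(ages, ages[1:]):
        for label, members in clusters_by_age[age].items():
            best_label, best_score = None, min_overlap
            for next_label, next_members in clusters_by_age[next_age].items():
                score = jaccard(members, next_members)
                if score > best_score:
                    best_label, best_score = next_label, score
            if best_label is not None:
                links.append(((age, label), (next_age, best_label), best_score))
    return links


# Toy data: patient 6 is only observed at age 61 (truncated follow-up).
toy = {
    60: {"a": {1, 2, 3}, "b": {4, 5}},
    61: {"x": {1, 2}, "y": {3, 4, 5, 6}},
}
toy_links = track_clusters(toy)
```

In this sketch a patient observed at only some ages is simply absent from the other ages' sets, so no imputation step is involved. Chaining the returned links across consecutive ages gives one possible reading of a cluster-trajectory.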
Subject(s)
Clinical Relevance , Data Management , Humans , Databases, Factual , Cluster Analysis
ABSTRACT
BACKGROUND: Enrichment analyses are widely applied to investigate lists of genes of interest. However, such analyses often result in long lists of annotation terms with high redundancy, making the interpretation and reporting difficult. Long annotation lists and redundancy also complicate the comparison of results obtained from different enrichment analyses. An approach to overcome these issues is using down-sized annotation collections composed of non-redundant terms. However, down-sized collections are generic and the level of detail may not fit the user's study. Other available approaches include clustering and filtering tools, which are based on similarity measures and thresholds that can be complicated to comprehend and set. RESULT: We propose orsum, a Python package to filter enrichment results. orsum can filter multiple enrichment results collectively and highlight common and specific annotation terms. Filtering in orsum is based on a simple principle: a term is discarded if there is a more significant term that annotates at least the same genes; the remaining more significant term becomes the representative term for the discarded term. This principle ensures that the main biological information is preserved in the filtered results while reducing redundancy. In addition, as the representative terms are selected from the original enrichment results, orsum outputs filtered terms tailored to the study. As a use case, we applied orsum to the enrichment analyses of four lists of genes, each associated with a neurodegenerative disease. CONCLUSION: orsum provides a comprehensible and effective way of filtering and comparing enrichment results. It is available at https://anaconda.org/bioconda/orsum .
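The filtering principle stated in this abstract (a term is discarded if a more significant term annotates at least the same genes) can be sketched in a few lines. This is an illustration of the principle only, not the orsum implementation or its interface. The function name `filter_terms` and the input layout (a list of term and gene-set pairs, already sorted from most to least significant) are assumptions.

```python
def filter_terms(ranked_terms):
    """Keep representative terms and record which terms each one represents.

    ranked_terms: list of (term_id, genes) pairs, most significant first.
    A term is discarded when an earlier (more significant) kept term
    annotates at least the same genes, i.e. its gene set is a subset.
    Returns {representative_term_id: [discarded_term_ids]}.
    """
    kept = []          # (term_id, frozenset_of_genes) for representative terms
    represented = {}   # representative term -> list of discarded terms
    for term_id, genes in ranked_terms:
        genes = frozenset(genes)
        for rep_id, rep_genes in kept:
            if genes <= rep_genes:
                represented[rep_id].append(term_id)
                break
        else:
            # No more significant term covers these genes: keep the term.
            kept.append((term_id, genes))
            represented[term_id] = []
    return represented


# Hypothetical enrichment result, most significant term first.
example_terms = [
    ("T1", {"A", "B", "C"}),
    ("T2", {"A", "B"}),
    ("T3", {"C", "D"}),
    ("T4", {"D"}),
]
filtered = filter_terms(example_terms)
```

Comparing each term only against the kept terms is enough here because the subset relation is transitive: any term covered by a discarded term is also covered by that term's representative.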
Subject(s)
Computational Biology , Neurodegenerative Diseases , Cluster Analysis , Computational Biology/methods , Humans , Software
ABSTRACT
The identification of subnetworks of interest, or active modules, by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated with its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease. Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html. Contact: anais.baudot@univ-amu.fr.
Subject(s)
Algorithms , Models, Biological , Computational Biology , Computer Simulation , Databases, Nucleic Acid , Gene Regulatory Networks , Humans , Models, Genetic , Muscular Dystrophy, Facioscapulohumeral/genetics , Muscular Dystrophy, Facioscapulohumeral/metabolism , RNA-Seq , Software , Systems Biology , Systems Integration , Systems Theory , Transcriptome
ABSTRACT
Motivation: Recent years have witnessed an exponential growth in the number of identified interactions between biological molecules. These interactions are usually represented as large and complex networks, calling for the development of appropriate tools to exploit the functional information they contain. Random walk with restart (RWR) is the state-of-the-art guilt-by-association approach. It explores the network vicinity of gene/protein seeds to study their functions, based on the premise that nodes related to similar functions tend to lie close to each other in the networks. Results: In this study, we extended the RWR algorithm to multiplex and heterogeneous networks. The walk can now explore different layers of physical and functional interactions between genes and proteins, such as protein-protein interactions and co-expression associations. In addition, the walk can also jump to a network containing different sets of edges and nodes, such as phenotype similarities between diseases. We devised a leave-one-out cross-validation strategy to evaluate the algorithm's ability to predict disease-associated genes. We demonstrate the improved performance of the multiplex-heterogeneous RWR as compared to several random walks on monoplex or heterogeneous networks. Overall, our framework is able to leverage the different interaction sources to outperform current approaches. Finally, we applied the algorithm to predict candidate genes for the Wiedemann-Rautenstrauch syndrome, and to explore the network vicinity of the SHORT syndrome. Availability and implementation: The source code is available on GitHub at: https://github.com/alberto-valdeolivas/RWR-MH. In addition, an R package is freely available through Bioconductor at: http://bioconductor.org/packages/RandomWalkRestartMH/. Supplementary information: Supplementary data are available at Bioinformatics online.
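As background for the random walk with restart described in this abstract, the sketch below runs the classical iteration on a single unweighted, undirected network: at each step the walker moves to a uniformly chosen neighbour with probability 1 - r, or returns to a seed with probability r. It covers only the monoplex case, not the multiplex-heterogeneous extension, and is not the RandomWalkRestartMH code. The function name, the restart value of 0.7 and the toy graph are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Proximity of every node to the seeds on a monoplex network.

    adjacency: {node: set_of_neighbours}, undirected. Every node is assumed
    to have at least one neighbour, and every seed to be a key of adjacency.
    Iterates p_next = (1 - restart) * W * p + restart * p0, where W is the
    column-normalised adjacency matrix and p0 is uniform over the seeds.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Spread the walking mass of n equally over its neighbours.
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with A as the only seed.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
rwr_scores = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted(rwr_scores, key=rwr_scores.get, reverse=True)
```

Sorting nodes by their score yields the guilt-by-association ranking: nodes closer to the seed in the network receive higher scores.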
Subject(s)
Algorithms , Computational Biology , Phenotype , Software
ABSTRACT
Matrix factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology. Here, we challenge MF in depicting the molecular bases of epidemiologically described disease-disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer's disease (AD) and lung cancer (LC), described as a lower than expected probability of developing LC in AD patients. To this day, the molecular mechanisms underlying DD relationships remain poorly explained and their better characterization might offer unprecedented clinical opportunities. To this end, we extend our previously designed MF-based framework for the molecular characterization of DD relationships. Considering AD-LC inverse comorbidity as a case study, we highlight multiple molecular mechanisms, among which we confirm the involvement of processes related to the immune system and mitochondrial metabolism. We then distinguish mechanisms specific to LC from those shared with other cancers through a pan-cancer analysis. Additionally, new candidate molecular players, such as estrogen receptor (ER), cadherin 1 (CDH1) and histone deacetylase (HDAC), are pinpointed as factors that might underlie the inverse relationship, opening the way to new investigations. Finally, some lung cancer subtype-specific factors are also detected, suggesting the existence of heterogeneity across patients in the context of inverse comorbidity.
Subject(s)
Alzheimer Disease/epidemiology , Computational Biology , Lung Neoplasms/epidemiology , Models, Biological , Algorithms , Alzheimer Disease/complications , Alzheimer Disease/etiology , Comorbidity , Computational Biology/methods , Humans , Lung Neoplasms/complications , Lung Neoplasms/etiology
ABSTRACT
There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower than expected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity is driven by molecular processes that are common to CNS disorders and Cancers and are deregulated in opposite directions. We conducted transcriptomic meta-analyses of three CNS disorders (Alzheimer's disease, Parkinson's disease and Schizophrenia) and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap was observed between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genes downregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in opposite directions at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which could increase the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulation of another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing the Cancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, and reveal potential new candidates, in particular related to protein degradation processes.
Subject(s)
Alzheimer Disease/genetics , Comorbidity , Neoplasms/genetics , Parkinson Disease/genetics , Schizophrenia/genetics , Alzheimer Disease/epidemiology , Alzheimer Disease/pathology , Central Nervous System/pathology , Gene Expression Profiling , Gene Expression Regulation , Humans , NIMA-Interacting Peptidylprolyl Isomerase , Neoplasms/epidemiology , Neoplasms/pathology , Parkinson Disease/epidemiology , Parkinson Disease/pathology , Peptidylprolyl Isomerase/genetics , Schizophrenia/epidemiology , Schizophrenia/pathology , Signal Transduction
ABSTRACT
Discovery of efficient anti-cancer drug combinations is a major challenge, since experimental testing of all possible combinations is clearly impossible. Recent efforts to computationally predict drug combination responses retain this experimental search space, as model definitions typically rely on extensive drug perturbation data. We developed a dynamical model representing a cell fate decision network in the AGS gastric cancer cell line, relying on background knowledge extracted from literature and databases. We defined a set of logical equations recapitulating AGS data observed in cells in their baseline proliferative state. Using the modeling software GINsim, model reduction and simulation compression techniques were applied to cope with the vast state space of large logical models and enable simulations of pairwise applications of specific signaling inhibitory chemical substances. Our simulations predicted synergistic growth inhibitory action of five combinations from a total of 21 possible pairs. Four of the predicted synergies were confirmed in AGS cell growth real-time assays, including known effects of combined MEK-AKT or MEK-PI3K inhibitions, along with novel synergistic effects of combined TAK1-AKT or TAK1-PI3K inhibitions. Our strategy reduces the dependence on a priori drug perturbation experimentation for well-characterized signaling networks, by demonstrating that a model predictive of combinatorial drug effects can be inferred from background knowledge on unperturbed and proliferating cancer cells. Our modeling approach can thus contribute to preclinical discovery of efficient anticancer drug combinations, and thereby to development of strategies to tailor treatment to individual cancer patients.
Subject(s)
Antineoplastic Agents/pharmacology , Computational Biology/methods , Drug Synergism , Stomach Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Cell Line, Tumor , Cell Proliferation/drug effects , Computer Simulation , Drug Discovery , Humans , Models, Biological
ABSTRACT
Previously, we identified the stress-induced chaperone, Hsp27, as highly overexpressed in castration-resistant prostate cancer and developed an Hsp27 inhibitor (OGX-427) currently tested in phase I/II clinical trials as a chemosensitizing agent in different cancers. To better understand the poorly defined cytoprotective functions of Hsp27 in cancers and increase the pharmacological safety of OGX-427, we established the Hsp27-protein interaction network using a yeast two-hybrid approach and identified 226 interaction partners. As an example, we showed that targeting the interaction of Hsp27 with TCTP, a partner protein identified in our screen, increases therapy sensitivity, opening a promising new field of research for therapeutic approaches that could decrease or abolish toxicity for normal cells. Results of an in-depth bioinformatics network analysis integrating the Hsp27 interaction map into the human interactome underlined the multifunctional character of this protein. We identified interactions of Hsp27 with proteins involved in eight well-known functions previously related to Hsp27 and uncovered 17 potential new ones, such as DNA repair and RNA splicing. Validation of Hsp27 involvement in both processes in human prostate cancer cells supports our systems biology-predicted functions and provides new insights into Hsp27 roles in cancer cells.
Subject(s)
Biomarkers, Tumor/metabolism , DNA Repair , Gene Expression Regulation, Neoplastic , HSP27 Heat-Shock Proteins/metabolism , Prostatic Neoplasms, Castration-Resistant/metabolism , Alternative Splicing , Antineoplastic Agents/chemical synthesis , Antineoplastic Agents/metabolism , Biomarkers, Tumor/genetics , Cell Line, Tumor , Clinical Trials as Topic , Female , HSP27 Heat-Shock Proteins/antagonists & inhibitors , HSP27 Heat-Shock Proteins/genetics , HeLa Cells , Heat-Shock Proteins , Humans , Male , Molecular Chaperones , Molecular Targeted Therapy , Oligonucleotides/chemical synthesis , Oligonucleotides/genetics , Oligonucleotides/metabolism , Prostatic Neoplasms, Castration-Resistant/drug therapy , Prostatic Neoplasms, Castration-Resistant/genetics , Prostatic Neoplasms, Castration-Resistant/pathology , Protein Binding , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Signal Transduction , Tumor Protein, Translationally-Controlled 1 , Two-Hybrid System Techniques
ABSTRACT
Muscular dystrophies (MDs) are inherited genetic diseases causing weakness and degeneration of muscles. The distribution of muscle weakness differs between MDs, involving distal muscles or proximal muscles. While the mutations in most of the MD-associated genes lead to either distal or proximal onset, there are also genes whose mutations can cause both types of onsets. We hypothesized that the genes associated with different MD onsets code proteins with distinct cellular functions. To investigate this, we collected the MD-associated genes and assigned them to three onset groups: genes mutated only in distal onset dystrophies, genes mutated only in proximal onset dystrophies, and genes mutated in both types of onsets. We then systematically evaluated the cellular functions of these gene sets with computational strategies based on functional enrichment analysis and biological network analysis. Our analyses demonstrate that genes mutated in either distal or proximal onset MDs code proteins linked with two distinct sets of cellular processes. Interestingly, these two sets of cellular processes are relevant for the genes that are associated with both onsets. Moreover, the genes associated with both onsets display high centrality and connectivity in the network of muscular dystrophy genes. Our findings support the hypothesis that the proteins associated with distal or proximal onsets have distinct functional characteristics, whereas the proteins associated with both onsets are multifunctional.
Subject(s)
Muscle Weakness , Muscular Dystrophies , Mutation , Humans , Muscular Dystrophies/genetics , Muscle Weakness/genetics , Gene Regulatory Networks , Computational Biology/methods , Muscle, Skeletal/metabolism , Muscle, Skeletal/physiopathology , Muscle, Skeletal/pathology
ABSTRACT
Premature Aging (PA) diseases are rare genetic disorders that mimic some aspects of physiological aging at an early age. Various causative genes of PA diseases have been identified in recent years, providing insights into some dysfunctional cellular processes. However, the identification of PA genes also revealed significant genetic heterogeneity and highlighted the gaps in the understanding of PA-associated molecular mechanisms. Furthermore, many patients remain undiagnosed. Overall, the current lack of knowledge about PA diseases hinders the development of effective diagnosis and therapies and poses significant challenges to improving patient care. Here, a network-based approach to systematically unravel the cellular functions disrupted in PA diseases is presented. Leveraging a network community identification algorithm, a vast multilayer network of biological interactions is explored to extract the communities of 67 PA diseases from their 132 associated genes. It is found that these communities can be grouped into six distinct clusters, each reflecting specific cellular functions: DNA repair, cell cycle, transcription regulation, inflammation, cell communication, and vesicle-mediated transport. It is proposed that these clusters collectively represent the landscape of the molecular mechanisms that are perturbed in PA diseases, providing a framework for better understanding their pathogenesis. Intriguingly, most clusters also exhibited a significant enrichment in genes associated with physiological aging, suggesting a potential overlap between the molecular underpinnings of PA diseases and natural aging.
ABSTRACT
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
Subject(s)
Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Genomics , Genetic Testing/methods
ABSTRACT
Congenital Anomalies of the Kidney and Urinary Tract (CAKUT) is the leading cause of childhood chronic kidney failure and a significant cause of chronic kidney disease in adults. Genetic and environmental factors are known to influence CAKUT development, but the current understanding of the disease mechanism remains incomplete. Our goal is to identify affected pathways and networks in CAKUT, and thereby aid in getting a better understanding of its pathophysiology. With this goal, the miRNome, peptidome, and proteome of over 30 amniotic fluid samples of patients with non-severe CAKUT were compared with those of patients with severe CAKUT. These omics data sets were made findable, accessible, interoperable, and reusable (FAIR) to facilitate their integration with external data resources. Furthermore, we analysed and integrated the omics data sets using three different bioinformatics strategies: integrative analysis with mixOmics, joint dimensionality reduction and pathway analysis. The three bioinformatics analyses provided complementary features, but all pointed towards an important role for collagen and the PI3K-AKT signalling pathway in CAKUT development. Additionally, several key genes (CSF1, IGF2, ITGB1, and RAC1) and microRNAs were identified. We published the three analysis strategies as containerized workflows. These workflows can be applied to other FAIR data sets and help gain knowledge on other rare diseases.
Subject(s)
Collagen , Phosphatidylinositol 3-Kinases , Proto-Oncogene Proteins c-akt , Signal Transduction , Humans , Proto-Oncogene Proteins c-akt/metabolism , Proto-Oncogene Proteins c-akt/genetics , Phosphatidylinositol 3-Kinases/metabolism , Phosphatidylinositol 3-Kinases/genetics , Collagen/metabolism , Collagen/genetics , Computational Biology/methods , MicroRNAs/genetics , MicroRNAs/metabolism , Vesico-Ureteral Reflux/genetics , Vesico-Ureteral Reflux/metabolism , Female , Proteome/metabolism , Amniotic Fluid/metabolism , Urinary Tract/metabolism , Multiomics , Urogenital Abnormalities
ABSTRACT
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
ABSTRACT
MOTIVATION: Assessing functional associations between an experimentally derived gene or protein set of interest and a database of known gene/protein sets is a common task in the analysis of large-scale functional genomics data. For this purpose, a frequently used approach is to apply an over-representation-based enrichment analysis. However, this approach has four drawbacks: (i) it can only score functional associations of overlapping gene/protein sets; (ii) it disregards genes with missing annotations; (iii) it does not take into account the network structure of physical interactions between the gene/protein sets of interest; and (iv) tissue-specific gene/protein set associations cannot be recognized. RESULTS: To address these limitations, we introduce an integrative analysis approach and web-application called EnrichNet. It combines a novel graph-based statistic with an interactive sub-network visualization to accomplish two complementary goals: improving the prioritization of putative functional gene/protein set associations by exploiting information from molecular interaction networks and tissue-specific gene expression data and enabling a direct biological interpretation of the results. By using the approach to analyse sets of genes with known involvement in human diseases, new pathway associations are identified, reflecting a dense sub-network of interactions between their corresponding proteins. AVAILABILITY: EnrichNet is freely available at http://www.enrichnet.org. CONTACT: Natalio.Krasnogor@nottingham.ac.uk, reinhard.schneider@uni.lu or avalencia@cnio.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online.
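To make drawback (i) concrete, the sketch below computes the over-representation p-value in its usual form, a one-sided hypergeometric test on the size of the overlap. It illustrates the baseline that the abstract criticizes, not the graph-based statistic of EnrichNet. The function and parameter names, and the numbers in the example, are assumptions.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least n_overlap annotated genes when drawing
    n_selected genes without replacement from a universe of n_universe
    genes, of which n_annotated belong to the reference gene set.
    """
    upper = min(n_annotated, n_selected)
    favourable = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_universe, n_selected)


# Hypothetical numbers: 10 genes in total, 3 in the pathway, 3 in the query.
p_full_overlap = overrepresentation_pvalue(10, 3, 3, 3)
p_no_overlap = overrepresentation_pvalue(10, 3, 3, 0)
```

With no shared genes the p-value is 1 whatever the network distance between the two sets, which is why an overlap-only statistic cannot score associations between non-overlapping gene/protein sets.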
Subject(s)
Gene Regulatory Networks , Protein Interaction Mapping/methods , Software , Data Interpretation, Statistical , Databases, Genetic , Gene Expression Profiling , Genes , Humans , Internet , Neoplasms/genetics , Neoplasms/metabolism , Parkinson Disease/genetics , Parkinson Disease/metabolism , Protein Interaction Maps
ABSTRACT
Giant axonal neuropathy (GAN) is a fatal neurodegenerative disorder for which there is currently no treatment. Affecting the nervous system, GAN starts in infancy with motor deficits that rapidly evolve toward total loss of ambulation. Using the gan zebrafish model that reproduces the loss of motility as seen in patients, we conducted the first pharmacological screening for the GAN pathology. Here, we established a multilevel pipeline to identify small molecules restoring both the physiological and the cellular deficits in GAN. We combined behavioral, in silico, and high-content imaging analyses to refine our hits to five drugs that restore locomotion and axonal outgrowth and stabilize neuromuscular junctions in the gan zebrafish. The postsynaptic nature of the drugs' cellular targets provides direct evidence for the pivotal role the neuromuscular junction holds in the restoration of motility. Our results identify the first drug candidates that can now be integrated in a repositioning approach to hasten therapy for GAN. Moreover, we anticipate both our methodological development and the identified hits to be of benefit to other neuromuscular diseases.
Subject(s)
Giant Axonal Neuropathy , Animals , Giant Axonal Neuropathy/diagnosis , Giant Axonal Neuropathy/pathology , Giant Axonal Neuropathy/therapy , Cytoskeletal Proteins , Zebrafish , Neuromuscular Junction
ABSTRACT
Integration of the many available sources of cancer gene information, such as large-scale tumour-resequencing studies, identifies the 'usual suspect' genes, mutated in many tumour types, as well as different sets of mutated genes according to the specific tumour type. Scaling up the analysis reveals that this large collection of mutated genes clusters into a smaller number of signalling pathways and processes. From this, we draw a map of the altered processes, and their combinations, in more than 10 tumour types. Literature searches identify pathways and processes that are covered sparsely in the literature, and invite the proposal of new hypotheses to investigate cancer initiation and progression.
Subject(s)
Gene Expression Regulation, Neoplastic , Genetic Predisposition to Disease/genetics , Multigene Family , Mutation , Neoplasms/genetics , Cluster Analysis , Databases, Genetic , Genes, Neoplasm , Signal Transduction
ABSTRACT
The phosphatidylinositol 3-kinase-mammalian target of rapamycin (PI3K-mTOR) pathway plays pivotal roles in cell survival, growth, and proliferation downstream of growth factors. Its perturbations are associated with cancer progression, type 2 diabetes, and neurological disorders. To better understand the mechanisms of action and regulation of this pathway, we initiated a large scale yeast two-hybrid screen for 33 components of the PI3K-mTOR pathway. Identification of 67 new interactions was followed by validation by co-affinity purification and exhaustive literature curation of existing information. We provide a nearly complete, functionally annotated interactome of 802 interactions for the PI3K-mTOR pathway. Our screen revealed a predominant place for glycogen synthase kinase-3 (GSK3) A and B and the AMP-activated protein kinase. In particular, we identified the deformed epidermal autoregulatory factor-1 (DEAF1) transcription factor as an interactor and in vitro substrate of GSK3A and GSK3B. Moreover, GSK3 inhibitors increased DEAF1 transcriptional activity on the 5-HT1A serotonin receptor promoter. We propose that DEAF1 may represent a therapeutic target of lithium and other GSK3 inhibitors used in bipolar disease and depression.