ABSTRACT
A wide range of approaches can be used to detect microRNA (miRNA)-target gene pairs (mTPs) from expression data, differing in the ways the gene and miRNA expression profiles are calculated, combined and correlated. However, there is no clear consensus on which approach is best across all datasets. Here, we have implemented multiple strategies and applied them to three distinct rare disease datasets that comprise smallRNA-Seq and RNA-Seq data obtained from the same samples, obtaining mTPs related to the disease pathology. All datasets were preprocessed using a standardized, freely available computational workflow, DEG_workflow. This workflow includes coRmiT, a method to compare multiple strategies for mTP detection. We used it to investigate the overlap of the detected mTPs with predicted and validated mTPs from 11 different databases. Results show that there is no clear best strategy for mTP detection applicable to all situations. We therefore propose integrating the results of the different strategies by selecting, for each miRNA, the strategy with the highest odds ratio. We applied this selection-integration method to the datasets and showed it to be robust to changes in the predicted and validated mTP databases. Our findings have important implications for miRNA analysis. coRmiT is implemented as part of the ExpHunterSuite Bioconductor package available from https://bioconductor.org/packages/ExpHunterSuite.
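The per-miRNA selection step described in this abstract can be illustrated with a minimal Python sketch. It is not coRmiT's implementation: the function names, the toy data, the gene universe and the 0.5 continuity correction (added to avoid division by zero) are all assumptions made for illustration. For each strategy and miRNA, the detected targets are compared with a reference set of database targets in a 2x2 table, and the strategy with the highest odds ratio is kept.

```python
def odds_ratio(detected, reference, universe):
    """Odds ratio of the overlap between detected and reference targets.

    A 0.5 continuity correction is added to every cell (an assumption of
    this sketch, not something stated in the abstract).
    """
    a = len(detected & reference)                # detected and in databases
    b = len(detected - reference)                # detected only
    c = len((universe - detected) & reference)   # in databases only
    d = len(universe - detected - reference)     # in neither
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def best_strategy_per_mirna(targets_by_strategy, reference_targets, universe):
    """Return {miRNA: (strategy, odds_ratio)} keeping the top strategy."""
    best = {}
    for strategy, targets_by_mirna in targets_by_strategy.items():
        for mirna, detected in targets_by_mirna.items():
            reference = reference_targets.get(mirna, set())
            score = odds_ratio(detected, reference, universe)
            if mirna not in best or score > best[mirna][1]:
                best[mirna] = (strategy, score)
    return best


# Hypothetical toy data: ten genes, one miRNA, two detection strategies.
universe = {f"g{i}" for i in range(1, 11)}
reference_targets = {"miR-1": {"g1", "g2", "g3"}}
targets_by_strategy = {
    "strategy_A": {"miR-1": {"g1", "g2"}},
    "strategy_B": {"miR-1": {"g1", "g7", "g8", "g9"}},
}
selected = best_strategy_per_mirna(targets_by_strategy, reference_targets, universe)
```

In this toy case strategy_A is selected for miR-1, because both of its detected targets are supported by the reference set while most of those from strategy_B are not.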
Subject(s)
MicroRNAs, Consensus, Factual Databases, MicroRNAs/genetics, Odds Ratio, RNA-Seq
ABSTRACT
Meniere Disease (MD) is a chronic inner ear disorder characterized by vertigo attacks, sensorineural hearing loss, tinnitus, and aural fullness. Extensive evidence supports an inflammatory etiology of MD; we therefore used transcriptome analysis to describe the inflammatory variants of MD. We performed bulk RNA-seq on 45 patients with definite MD and 15 healthy controls. MD patients were classified according to their basal levels of IL-1β into two groups: high and low. Differential expression analysis was performed using the ExpHunter Suite, and cell type proportions were evaluated using the estimation algorithms xCell, ABIS, and CIBERSORTx. MD patients showed 15 differentially expressed genes (DEGs) compared to controls. The top DEGs include IGHG1 (p = 1.64 × 10⁻⁶) and IGLV3-21 (p = 6.28 × 10⁻³), supporting a role for the adaptive immune response. Cytokine profiling defines a subgroup of patients with high levels of IL-1β with up-regulation of the IL6 (p = 7.65 × 10⁻⁸) and INHBA (p = 3.39 × 10⁻⁷) genes. Transcriptomic data from peripheral blood mononuclear cells support a proinflammatory subgroup of MD patients with high levels of IL6 and an increase in naïve B cells and memory CD8+ T cells.
Subject(s)
Meniere Disease, Humans, Meniere Disease/metabolism, Mononuclear Leukocytes/metabolism, Interleukin-6/metabolism, CD8-Positive T-Lymphocytes/metabolism, Gene Expression Profiling
ABSTRACT
BACKGROUND: Angiogenesis is regulated by multiple genes whose variants can lead to different disorders. Among them, rare diseases are a heterogeneous group of pathologies, most of them genetic, whose information may be of interest to determine the still unknown genetic and molecular causes of other diseases. In this work, we use the information on rare diseases dependent on angiogenesis to investigate the genes that are associated with this biological process and to determine if there are interactions between the genes involved in its deregulation. RESULTS: We propose a systemic approach supported by the use of pathological phenotypes to group diseases by semantic similarity. We grouped 158 angiogenesis-related rare diseases into 18 clusters based on their phenotypes. Of them, 16 clusters had traceable gene connections in a high-quality interaction network. These disease clusters are associated with 130 different genes. We searched for genes associated with angiogenesis through ClinVar pathogenic variants. Of the seven retrieved genes, our system confirms six. Furthermore, it allowed us to identify common affected functions among these disease clusters. AVAILABILITY: https://github.com/ElenaRojano/angio_cluster. CONTACT: seoanezonjic@uma.es and elenarojano@uma.es.
Subject(s)
Computational Biology, Rare Diseases, Algorithms, Cluster Analysis, Humans, Phenotype, Rare Diseases/genetics, Semantics
ABSTRACT
BACKGROUND: Schaaf-Yang syndrome (SYS) is caused by truncating mutations in MAGEL2, mapping to the Prader-Willi region (15q11-q13), with an observed phenotype partially overlapping that of Prader-Willi syndrome. MAGEL2 plays a role in retrograde transport and protein recycling regulation. Our aim is to contribute to the characterisation of SYS pathophysiology at clinical, genetic and molecular levels. METHODS: We performed an extensive phenotypic and mutational review of previously reported patients with SYS. We analysed the secretion levels of amyloid-β 1-40 peptide (Aβ1-40) and performed targeted metabolomic and transcriptomic profiling in fibroblasts of patients with SYS (n=7) compared with controls (n=11). We also transfected cell lines with vectors encoding wild-type (WT) or mutated MAGEL2 to assess stability and subcellular localisation of the truncated protein. RESULTS: Functional studies show significantly decreased levels of secreted Aβ1-40 and intracellular glutamine in SYS fibroblasts compared with WT. We also identified 132 differentially expressed genes, including non-coding RNAs (ncRNAs) such as HOTAIR, many of them related to developmental processes and mitotic mechanisms. The truncated form of MAGEL2 displayed a stability similar to the WT, but its localisation shifted significantly to the nucleus, compared with a mainly cytoplasmic distribution of the WT MAGEL2. Based on the updated knowledge, we offer guidelines for the clinical management of patients with SYS. CONCLUSION: A truncated MAGEL2 protein is stable and localises mainly in the nucleus, where it might exert a pathogenic neomorphic effect. Aβ1-40 secretion levels and HOTAIR mRNA levels might be promising biomarkers for SYS. Our findings may improve SYS understanding and clinical management.
Subject(s)
Prader-Willi Syndrome, Humans, Prader-Willi Syndrome/genetics, Phenotype, Mutation, Proteins/genetics, Biomarkers
ABSTRACT
Angiogenesis is essential for tumor growth and cancer metastasis. Identifying the molecular pathways involved in this process is the first step in the rational design of new therapeutic strategies to improve cancer treatment. In recent years, RNA-seq data analysis has helped to determine the genetic and molecular factors associated with different types of cancer. In this work we performed integrative analysis using RNA-seq data from human umbilical vein endothelial cells (HUVEC) and patients with angiogenesis-dependent diseases to find genes that serve as potential candidates to improve the prognosis of tumor angiogenesis deregulation and understand how this process is orchestrated at the genetic and molecular level. We downloaded four RNA-seq datasets (including cellular models of tumor angiogenesis and ischaemic heart disease) from the Sequence Read Archive. Our integrative analysis includes a first step to determine differentially and co-expressed genes. For this, we used the ExpHunter Suite, an R package that performs differential expression, co-expression and functional analysis of RNA-seq data. We used both differentially and co-expressed genes to explore the human gene interaction network and determine which genes were found in the different datasets that may be key for the angiogenesis deregulation. Finally, we performed drug repositioning analysis to find potential targets related to angiogenesis inhibition. We found that, among the transcriptional alterations identified, the SEMA3D and IL33 genes are deregulated in all datasets. Microenvironment remodeling, cell cycle, lipid metabolism and vesicular transport are the main molecular pathways affected. In addition, interacting genes are involved in intracellular signaling pathways, especially in immune system and semaphorins, respiratory electron transport and fatty acid metabolism.
The methodology presented here can be used for finding common transcriptional alterations in other genetically-based diseases.
Subject(s)
Gene Expression Profiling, Gene Regulatory Networks, Humans, Gene Expression Profiling/methods, Endothelial Cells, Signal Transduction/genetics
ABSTRACT
Genetic and molecular analysis of rare disease is made difficult by the small numbers of affected patients. Phenotypic comorbidity analysis can help rectify this by combining information from individuals with similar phenotypes and looking for overlap in terms of shared genes and underlying functional systems. However, few studies have combined comorbidity analysis with genomic data. We present a computational approach that connects patient phenotypes based on phenotypic co-occurrence and uses genomic information related to the patient mutations to assign genes to the phenotypes, which are used to detect enriched functional systems. These phenotypes are clustered using network analysis to obtain functionally coherent phenotype clusters. We applied the approach to the DECIPHER database, containing phenotypic and genomic information for thousands of patients with heterogeneous rare disorders and copy number variants. Validity was demonstrated through overlap with known diseases, co-mention within the biomedical literature, semantic similarity measures, and patient cluster membership. These connected pairs formed multiple phenotype clusters, showing functional coherence, and mapped to genes and systems involved in similar pathological processes. Examples include claudin genes from the 22q11 genomic region associated with a cluster of phenotypes related to DiGeorge syndrome and genes related to the GO term anterior/posterior pattern specification associated with abnormal development. The clusters generated can help with the diagnosis of rare diseases, by suggesting additional phenotypes for a given patient and potential underlying functional systems. Other tools to find causal genes based on phenotype were also investigated. The approach has been implemented as a workflow, named PhenCo, which can be adapted to any set of patients for which phenomic and genomic data are available.
Full details of the analysis, including the clusters formed, their constituent functional systems and underlying genes are given. Code to implement the workflow is available from GitHub.
Subject(s)
Comorbidity, Genetic Predisposition to Disease, Genomics, Rare Diseases/genetics, DNA Copy Number Variations/genetics, Genetic Databases, Genetic Association Studies, Human Genome/genetics, Genotype, Humans, Mutation/genetics, Phenotype, Rare Diseases/diagnosis, Rare Diseases/pathology
ABSTRACT
BACKGROUND: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. RESULTS: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, out of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer's method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and better results were found for GO molecular function compared to GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of [Formula: see text] and [Formula: see text]. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found similar results to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer's method led to the top performance in almost all scenarios. CONCLUSIONS: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome.
It performs best when using FunFams, combining Simpson index derived domain-function associations using Stouffer's method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun . Code maintained at https://github.com/ElenaRojano/DomFun . Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project .
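As a minimal illustration of the combination step named in this abstract, the following Python sketch implements the unweighted form of Stouffer's method for one-sided p-values. How DomFun turns domain-function association scores into the values being combined is not described here, so the function name and the inputs are hypothetical and this should not be read as the gem's actual code.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided p-values with the unweighted Stouffer method.

    Each p-value must lie strictly between 0 and 1, otherwise the
    inverse normal CDF is undefined and NormalDist raises an error.
    """
    norm = NormalDist()  # standard normal distribution
    # Convert each p-value to a z-score (small p gives large positive z).
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    # Sum of k independent z-scores has variance k, so rescale by sqrt(k).
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    # Convert the combined z-score back to a one-sided p-value.
    return 1.0 - norm.cdf(z_combined)
```

For example, `stouffer_combine([0.05, 0.05])` gives a combined value below 0.05, reflecting that two domains independently pointing to the same function provide stronger support than either alone.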
Subject(s)
Computational Biology, Proteins, Protein Databases, Gene Ontology, Molecular Sequence Annotation, Proteins/genetics
ABSTRACT
Copy number variation (CNV) related disorders tend to show complex phenotypic profiles that do not match known diseases. This makes it difficult to ascertain their underlying molecular basis. A potential solution is to compare the affected genomic regions for multiple patients that share a pathological phenotype, looking for commonalities. Here, we present a novel approach to associate phenotypes with functional systems (FunSys), in terms of GO categories and KEGG and Reactome pathways, based on patient data. The approach uses genomic and phenomic data from the same patients, finding shared genomic regions between patients with similar phenotypes. These regions are mapped to genes to find associated functional systems. We applied the approach to analyse patients in the DECIPHER database with de novo CNVs, finding functional systems associated with most phenotypes, often due to mutations affecting related genes in the same genomic region. Manual inspection of the ten top-scoring phenotypes found multiple FunSys connections supported by previous studies for seven of them. The workflow also produces reports focussed on the genes and FunSys connected to the different phenotypes, alongside patient-specific reports, which give details of the associated genes and FunSys for each individual in the cohort. These can be run in "confidential" mode, preserving patient confidentiality. The workflow presented here can be used to associate phenotypes with functional systems using data at the level of a whole cohort of patients, identifying important connections that could not be found when considering them individually. The full workflow is available for download, enabling it to be run on any patient cohort for which phenotypic and CNV data are available.
Subject(s)
DNA Copy Number Variations, Genetic Predisposition to Disease, Genotype, Phenotype, Cohort Studies, Genetic Databases, Humans
ABSTRACT
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin-chromatin and chromatin-protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they impact the underlying mechanisms.
Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats, Nucleic Acid Regulatory Sequences, Chromatin/metabolism, Human Genome, Humans, Machine Learning, Protein Binding
ABSTRACT
Almost 90% of disease-associated genetic variants found using genome-wide association studies (GWAS) are located in non-coding regions of the genome. Such variants can affect phenotype by altering important regulatory elements such as promoters, enhancers or repressors, leading to changes in gene expression and consequently disease, such as thyroid cancer and allergic diseases. A number of allergy and atopy related diseases such as asthma and atopic dermatitis are related to histamine receptors; however, these diseases are not fully characterized at the molecular level. Moreover, candidate gene based studies of common variants known as single nucleotide polymorphisms (SNPs) located in the coding regions of these receptors have given mixed results. It is important to complement these approaches by identifying and characterising non-coding variants in order to further elucidate the role of these receptors in disease. Here we present an analysis of histamine receptor genes using the tool AnNCR-SNP to characterise variants in non-coding genomic regions. AnNCR-SNP combines bioinformatics and experimental data sets from various sources to predict the effects of genetic variation on gene expression regulation. We find many SNPs located in areas of open chromatin, overlapping with transcription factor binding sites and associated with changes in gene expression in expression quantitative trait loci (eQTL) experiments. We present the results as a catalogue of non-coding variation in histamine receptor genes to aid histamine researchers in identifying putative functional SNPs found in GWAS for further validation, and to help select variants for candidate gene studies.
Subject(s)
Single Nucleotide Polymorphism, Heritable Quantitative Trait, Histamine Receptors/genetics, DNA Sequence Analysis/methods, Software, Genome-Wide Association Study, Humans
ABSTRACT
Hirschsprung's disease (HSCR) is a rare developmental disorder in which enteric ganglia are missing along a portion of the intestine. HSCR has a complex inheritance, with RET as the major disease-causing gene. However, the pathogenesis of HSCR is still not completely understood. Therefore, we applied a computational approach based on multi-omics network characterization and clustering analysis for HSCR-related gene/miRNA identification and biomarker discovery. Protein-protein interaction (PPI) and miRNA-target interaction (MTI) networks were analyzed by DPClusO and BiClusO, respectively, and finally, the biomarker potential of miRNAs was computationally screened by miRNA-BD. In this study, a total of 55 significant gene-disease modules were identified, allowing us to propose 178 new HSCR candidate genes and two biological pathways. Moreover, we identified 12 key miRNAs with biomarker potential among 137 predicted HSCR-associated miRNAs. Functional analysis of new candidates showed that enrichment terms related to gene ontology (GO) and pathways were associated with HSCR. In conclusion, this approach has allowed us to uncover new clues to the etiopathogenesis of HSCR, although further molecular experiments are needed for clinical validation.
Subject(s)
Hirschsprung Disease, MicroRNAs, Humans, Hirschsprung Disease/genetics, Multiomics, MicroRNAs/genetics, Computational Biology, Biomarkers
ABSTRACT
High-throughput gene expression analysis is widely used. However, analysis is not straightforward. Multiple approaches should be applied and methods to combine their results implemented and investigated. We present methodology for the comprehensive analysis of expression data, including co-expression module detection and result integration via data-fusion, threshold-based methods, and a Naïve Bayes classifier trained on simulated data. Application to rare-disease model datasets confirms existing knowledge related to immune cell infiltration and suggests novel hypotheses including the role of calcium channels. Application to simulated and spike-in experiments shows that combining multiple methods using consensus and classifiers leads to optimal results. ExpHunter Suite is implemented as an R/Bioconductor package available from https://bioconductor.org/packages/ExpHunterSuite . It can be applied to model and non-model organisms and can be run modularly in R; it can also be run from the command line, allowing scalability with large datasets. Code and reports for the studies are available from https://github.com/fmjabato/ExpHunterSuiteExamples .
Subject(s)
Gene Expression Profiling/methods, Gene Expression Regulation/genetics, RNA-Seq/methods, Software, Algorithms, Arabidopsis/genetics, Bayes Theorem, Calcium Channels/genetics, Humans, Rare Diseases/genetics, Rare Diseases/metabolism
ABSTRACT
Exhaustive and comprehensive analysis of pathological traits is essential to understanding genetic diseases, performing precise diagnosis and prescribing personalized treatments. It is particularly important for disease cohorts, as thoroughly detailed phenotypic profiles allow patients to be compared and contrasted. However, many disease cohorts contain patients that have been ascribed low numbers of very general and relatively uninformative phenotypes. We present Cohort Analyzer, a tool that measures the phenotyping quality of patient cohorts. It calculates multiple statistics to give a general overview of the cohort status in terms of the depth and breadth of phenotyping, allowing us to detect less well-phenotyped patients for re-examining or excluding from further analyses. In addition, it performs clustering analysis to find subgroups of patients that share similar phenotypic profiles. We used it to analyse three cohorts of patients with genetic diseases that have very different properties. We found that cohorts with the most specific and complete phenotypic characterization form more informative clusters, and so give more potential insights into the disease, than those that were less deeply characterised. For two of the cohorts, we also analysed genomic data related to the patients, and linked the genomic data to the patient subgroups by mapping shared variants to genes and functions. The work highlights the need for improved phenotyping in this era of personalized medicine. The tool itself is freely available alongside a workflow to allow the analyses shown in this work to be applied to other datasets.
ABSTRACT
Copy number variations (CNVs) are genomic structural variations (deletions, duplications, or translocations) that represent 4.8-9.5% of human genome variation in healthy individuals. In some cases, CNVs can also lead to disease, being the etiology of many known rare genetic/genomic disorders. Despite recent advances in genomic sequencing and diagnosis, the pathological effects of many rare genetic variations remain unresolved, largely due to the low number of patients available for these cases, which makes it difficult to identify consistent patterns of genotype-phenotype relationships. We aimed to improve the identification of statistically consistent genotype-phenotype relationships by integrating all the genetic and clinical data of thousands of patients with rare genomic disorders (obtained from the DECIPHER database) into a phenotype-patient-genotype tripartite network. We then assessed how our network approach could help in the characterization and diagnosis of novel cases in clinical genetics. The systematic approach implemented in this work is able to better define the relationships between phenotypes and specific loci, by exploiting large-scale association networks of phenotypes and genotypes in thousands of rare disease patients. The application of the described methodology facilitated the diagnosis of novel clinical cases, ranking phenotypes by locus specificity and reporting putative new clinical features that may suggest additional clinical follow-ups. In this work, the proof of concept developed over a set of novel clinical cases demonstrates that this network-based methodology might help improve the precision of patient clinical records and the characterization of rare syndromes.
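The idea of a phenotype-patient-genotype tripartite network can be illustrated with a minimal Python sketch. The abstract does not state which association measure the authors use, so the Jaccard index below is a stand-in chosen for illustration, and the function name, identifiers and toy data are hypothetical. Patients form the middle layer, and each phenotype-locus pair is scored by the patients the two share.

```python
from collections import defaultdict


def phenotype_locus_associations(patient_phenotypes, patient_loci):
    """Score phenotype-locus pairs through the patients that link them.

    The Jaccard index over patient sets is an assumption of this sketch,
    not the measure reported in the abstract.
    """
    patients_by_phenotype = defaultdict(set)
    patients_by_locus = defaultdict(set)
    # Invert the two patient-centred layers of the tripartite network.
    for patient, phenotypes in patient_phenotypes.items():
        for phenotype in phenotypes:
            patients_by_phenotype[phenotype].add(patient)
    for patient, loci in patient_loci.items():
        for locus in loci:
            patients_by_locus[locus].add(patient)

    scores = {}
    for phenotype, with_phenotype in patients_by_phenotype.items():
        for locus, with_locus in patients_by_locus.items():
            shared = with_phenotype & with_locus
            if shared:  # keep only pairs connected through some patient
                union = with_phenotype | with_locus
                scores[(phenotype, locus)] = len(shared) / len(union)
    return scores


# Hypothetical toy cohort: three patients, two phenotypes, two loci.
patient_phenotypes = {"P1": {"HP:A"}, "P2": {"HP:A", "HP:B"}, "P3": {"HP:B"}}
patient_loci = {"P1": {"locus_1"}, "P2": {"locus_1"}, "P3": {"locus_2"}}
associations = phenotype_locus_associations(patient_phenotypes, patient_loci)
```

Sorting the resulting scores for a given locus would give a simple ranking of phenotypes by how specifically they co-occur with that locus in the cohort.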