ABSTRACT
Exclusive enteral nutrition (EEN) is a first-line therapy for pediatric Crohn's disease (CD), but protective mechanisms remain unknown. We established a prospective pediatric cohort to characterize the function of fecal microbiota and metabolite changes of treatment-naive CD patients in response to EEN (German Clinical Trials DRKS00013306). Integrated multi-omics analysis identified network clusters from individually variable microbiome profiles, with Lachnospiraceae and medium-chain fatty acids as protective features. Bioorthogonal non-canonical amino acid tagging selectively identified bacterial species in response to medium-chain fatty acids. Metagenomic analysis identified high strain-level dynamics in response to EEN. Functional changes in diet-exposed fecal microbiota were further validated using gut chemostat cultures and microbiota transfer into germ-free Il10-deficient mice. Dietary model conditions induced individual patient-specific strain signatures to prevent or cause inflammatory bowel disease (IBD)-like inflammation in gnotobiotic mice. Hence, we provide evidence that EEN therapy operates through explicit functional changes of temporally and individually variable microbiome profiles.
ABSTRACT
BACKGROUND: Patients with eosinophilic esophagitis (EoE) require long-lasting resolution of inflammation to prevent fibrostenosis and dysphagia. However, the dissociation between symptoms and histologic improvement suggests persistent molecular drivers despite histologic remission. OBJECTIVE: We characterized persisting molecular alterations in pediatric patients with EoE using tissue transcriptomics and proteomics. METHODS: Esophageal biopsy samples (n = 247) collected prospectively during 189 endoscopies from pediatric patients with EoE (n = 36, up to 11 follow-up endoscopies) and pediatric controls (n = 44, single endoscopies) were subjected to bulk transcriptomics (n = 96) and proteomics (n = 151). Intercellular junctions (desmoglein-1/3, desmoplakin, E-cadherin) and epithelial-to-mesenchymal transition (vimentin:E-cadherin ratio) were assessed by immunofluorescence staining. RESULTS: Active EoE (≥15 eosinophils per high-power field [eos/hpf]), inactive EoE (<15 eos/hpf), and deep-remission EoE (0 eos/hpf) were diagnosed in 107 of 185, 78 of 185, and 41 of 185 biopsy samples, respectively. Among the dysregulated genes (up-/downregulated 310/112) and proteins (up-/downregulated 68/16) between active EoE and controls, 17 genes and 6 proteins remained dysregulated in inactive EoE. Using only the persistently upregulated genes (n = 9) and proteins (n = 3), such as ALOX15, CXCL1, CXCL6, CTSG, CDH26, PRRX1, CLC, EPX, and periostin (POSTN), was sufficient to separate inactive EoE and deep-remission biopsy samples from control tissue. While 32 differentially expressed genes persisted in deep-remission EoE compared to controls, the proteome normalized except for persistently upregulated POSTN. Epithelial-to-mesenchymal transition normalized in inactive EoE, whereas desmosome recovery remained impaired as a result of desmoglein-1 downregulation.
CONCLUSION: The analysis of molecular changes shows persistent EoE-associated esophageal dysregulation despite histologic remission. These data expand our understanding of inflammatory processes and possible mechanisms that underlie tissue remodeling in EoE.
ABSTRACT
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
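One common way to guard against the kind of leakage described above is to split by group rather than by sample, so that related samples (for example, several measurements from the same patient) never end up on both sides of the split. The following is a minimal sketch in plain Python and is not taken from the article; the function name `group_train_test_split` and the toy patient labels are hypothetical.

```python
import random
from collections import defaultdict


def group_train_test_split(sample_ids, groups, test_fraction=0.25, seed=0):
    """Split samples so that no group appears in both the training and test set."""
    # Collect the samples that belong to each group (e.g. each patient).
    by_group = defaultdict(list)
    for sample, group in zip(sample_ids, groups):
        by_group[group].append(sample)

    # Shuffle the groups, not the samples, so whole groups are assigned together.
    group_keys = sorted(by_group)
    random.Random(seed).shuffle(group_keys)

    n_test_groups = max(1, round(len(group_keys) * test_fraction))
    test_groups = set(group_keys[:n_test_groups])

    train = [s for g in group_keys if g not in test_groups for s in by_group[g]]
    test = [s for g in group_keys if g in test_groups for s in by_group[g]]
    return train, test


# Toy data: eight samples from four hypothetical patients.
samples = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
patients = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
train, test = group_train_test_split(samples, patients)
```

Which grouping variable is appropriate (patient, protein family, experimental batch, and so on) depends on the dependencies in the dataset at hand.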
Subjects
Machine Learning , Humans , Computational Biology/methods , Algorithms
ABSTRACT
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
Subjects
Epistasis, Genetic , Polymorphism, Single Nucleotide , Humans , Quantum Theory , Multifactorial Inheritance/genetics , Disease/genetics , Computational Biology/methods , Algorithms , Genetic Predisposition to Disease
ABSTRACT
Mitochondrial dysfunction is associated with inflammatory bowel diseases (IBDs). To understand how microbial-metabolic circuits contribute to intestinal injury, we disrupt mitochondrial function in the epithelium by deleting the mitochondrial chaperone, heat shock protein 60 (Hsp60Δ/ΔIEC). This metabolic perturbation causes self-resolving tissue injury. Regeneration is disrupted in the absence of the aryl hydrocarbon receptor (Hsp60Δ/ΔIEC;AhR-/-) involved in intestinal homeostasis or inflammatory regulator interleukin (IL)-10 (Hsp60Δ/ΔIEC;Il10-/-), causing IBD-like pathology. Injury is absent in the distal colon of germ-free (GF) Hsp60Δ/ΔIEC mice, highlighting bacterial control of metabolic injury. Colonizing GF Hsp60Δ/ΔIEC mice with the synthetic community OMM12 reveals expansion of metabolically flexible Bacteroides, and B. caecimuris mono-colonization recapitulates the injury. Transcriptional profiling of the metabolically impaired epithelium reveals gene signatures involved in oxidative stress (Ido1, Nos2, Duox2). These signatures are observed in samples from Crohn's disease patients, distinguishing active from inactive inflammation. Thus, mitochondrial perturbation of the epithelium causes microbiota-dependent injury with discriminative inflammatory gene profiles relevant for IBD.
Subjects
Chaperonin 60 , Gastrointestinal Microbiome , Mitochondria , Animals , Mice , Mitochondria/metabolism , Humans , Chaperonin 60/genetics , Chaperonin 60/metabolism , Inflammatory Bowel Diseases/microbiology , Intestinal Mucosa/microbiology , Intestinal Mucosa/metabolism , Interleukin-10/genetics , Interleukin-10/metabolism , Oxidative Stress , Bacteroides/genetics , Mice, Inbred C57BL , Mice, Knockout , Receptors, Aryl Hydrocarbon/metabolism , Receptors, Aryl Hydrocarbon/genetics , Gene Expression Profiling , Intestines/microbiology , Intestines/pathology , Disease Models, Animal , Crohn Disease/microbiology
ABSTRACT
The traditional nomenclature of enteroendocrine cells (EECs), established in 1977, applied the "one cell - one hormone" dogma, which distinguishes subpopulations based on the secretion of a specific hormone. These hormone-specific subpopulations included S cells for secretin (SCT), K cells for glucose-dependent insulinotropic polypeptide (GIP), N cells producing neurotensin (NTS), I cells producing cholecystokinin (CCK), D cells producing somatostatin (SST), and others. In the past 15 years, however, reinvestigations of murine and human organoid-derived EECs have strongly questioned this dogma and established that certain EECs coexpress multiple hormones. Using the Gut Cell Atlas, the largest available single-cell transcriptome dataset of human intestinal cells, this study confirms that the original dogma is outdated not only for murine and human organoid-derived EECs, but also for primary human EECs, showing that the expression of certain hormones is not restricted to their designated cell type. Moreover, specific analyses of SCT-expressing cells reject the presence of any cell population that exhibits significantly elevated secretin expression compared to other cell populations, previously referred to as S cells. Instead, this investigation indicates that secretin production is realized jointly by other enteroendocrine subpopulations, extending corresponding observations in murine EECs to human EECs. Furthermore, our findings corroborate that SCT expression peaks in mature EECs; in contrast, progenitor EECs exhibit markedly lower expression levels, supporting the hypothesis that SCT expression is a hallmark of EEC maturation.
Subjects
Enteroendocrine Cells , Gene Expression Profiling , Secretin , Single-Cell Analysis , Humans , Enteroendocrine Cells/metabolism , Secretin/metabolism , Secretin/genetics , Single-Cell Analysis/methods , Mice , Animals , Transcriptome , Cell Differentiation , Organoids/metabolism , Organoids/cytology , Cholecystokinin/metabolism , Cholecystokinin/genetics , Somatostatin/metabolism , Somatostatin/genetics , Single-Cell Gene Expression Analysis
ABSTRACT
Thromboembolic events are common in patients with essential thrombocythemia (ET). However, the pathophysiological mechanisms underlying the increased thrombotic risk remain to be determined. Here, we perform the first phenotypical characterization of platelet expression using single-cell mass cytometry in six ET patients and six age- and sex-matched healthy individuals. A large panel of 18 transmembrane regulators of platelet function and activation were analyzed, at baseline and after ex-vivo stimulation with thrombin receptor-activating peptide (TRAP). We detected a significant overexpression of the activation marker CD62P (p-Selectin) (p = .049) and the collagen receptor GPVI (p = .044) in non-stimulated ET platelets. In contrast, ET platelets had a lower expression of the integrin subunits of the fibrinogen receptor GPIIb/IIIa CD41 (p = .036) and CD61 (p = .044) and of the von Willebrand factor receptor CD42b (p = .044). Using the FlowSOM algorithm, we identified 2 subclusters of ET platelets with a prothrombotic expression profile, one of them (cluster 3) significantly overrepresented in ET (22.13% of the total platelets in ET, 2.94% in controls, p = .035). Platelet counts were significantly increased in ET compared to controls (p = .0123). In ET, MPV inversely correlated with platelet count (r=-0.96). These data highlight the prothrombotic phenotype of ET and postulate GPVI as a potential target to prevent thrombosis in these patients.
Essential thrombocythemia (ET) is a rare disease characterized by an increased number of platelets in the blood. As a complication, many of these patients develop a blood clot, which can be life-threatening. So far, the reason behind the higher risk of blood clots is unclear. In this study, we analyzed platelet surface markers that play a critical role in platelet function and platelet activation using a modern technology called mass cytometry. For this purpose, blood samples from 6 patients with ET and 6 healthy control individuals were analyzed. We found significant differences between ET platelets and healthy platelets. ET platelets had higher expression levels of p-Selectin (CD62P), a key marker of platelet activation, and of the collagen receptor GPVI, which is important for clot formation. These results may be driven by a specific platelet subcluster overrepresented in ET. Other surface markers, such as the fibrinogen receptor GPIIb/IIIa subunits CD41 and CD61 and the von Willebrand factor receptor CD42b, were expressed at lower levels in ET platelets. When ET platelets were treated with thrombin receptor-activating peptide (TRAP), which mimics the clotting factor thrombin, we found a differential response in platelet activation compared to healthy platelets. In conclusion, our results show an increased activation and clotting potential of ET platelets. The platelet surface protein GPVI may be a potential drug target to prevent abnormal blood clotting in ET patients.
Subjects
Blood Platelets , Thrombocythemia, Essential , Thrombosis , Humans , Thrombocythemia, Essential/metabolism , Thrombocythemia, Essential/complications , Blood Platelets/metabolism , Male , Female , Thrombosis/metabolism , Thrombosis/etiology , Middle Aged , Aged , Flow Cytometry/methods , Platelet Activation , Case-Control Studies , Adult
ABSTRACT
In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.
Subjects
Drug Repositioning , Software , Drug Repositioning/methods , Humans , Internet , Drug Discovery/methods , Systems Biology/methods , Computational Biology/methods
ABSTRACT
Microbiota assembly in the infant gut is influenced by diet. Breastfeeding and human breastmilk oligosaccharides promote the colonization of beneficial bifidobacteria. Infant formulas are supplemented with bifidobacteria or complex oligosaccharides, notably galacto-oligosaccharides (GOS), to mimic breast milk. To compare microbiota development across feeding modes, this randomized controlled intervention study (German Clinical Trial DRKS00012313) longitudinally sampled infant stool during the first year of life, revealing similar fecal bacterial communities between formula- and breast-fed infants (N = 210) but differences across age. Infant formula containing GOS sustained high levels of bifidobacteria compared with formula containing B. longum and B. breve or placebo. Metabolite and bacterial profiling revealed 24-h oscillations and circadian networks. Rhythmicity in bacterial diversity, specific taxa, and functional pathways increased with age and was strongest following breastfeeding and GOS supplementation. Circadian rhythms in dominant taxa were further maintained ex vivo in a chemostat model. Hence, microbiota rhythmicity develops early in life and is impacted by diet.
Subjects
Infant Formula , Microbiota , Female , Humans , Infant , Bifidobacterium , Breast Feeding , Circadian Rhythm , Feces/microbiology , Infant Formula/microbiology , Milk, Human , Oligosaccharides/metabolism
ABSTRACT
Summary: Diseases can be caused by molecular perturbations that induce specific changes in regulatory interactions and their coordinated expression, also referred to as network rewiring. However, the detection of complex changes in regulatory connections remains a challenging task and would benefit from the development of novel nonparametric approaches. We develop a new ensemble method called BoostDiff (boosted differential regression trees) to infer a differential network discriminating between two conditions. BoostDiff builds an adaptively boosted (AdaBoost) ensemble of differential trees with respect to a target condition. To build the differential trees, we propose differential variance improvement as a novel splitting criterion. Variable importance measures derived from the resulting models are used to reflect changes in gene expression predictability and to build the output differential networks. BoostDiff outperforms existing differential network methods on simulated data evaluated in four different complexity settings. We then demonstrate the power of our approach when applied to real transcriptomics data in COVID-19, Crohn's disease, breast cancer, prostate adenocarcinoma, and stress response in Bacillus subtilis. BoostDiff identifies context-specific networks that are enriched with genes of known disease-relevant pathways and complements standard differential expression analyses. Availability and implementation: BoostDiff is available at https://github.com/scibiome/boostdiff_inference.
ABSTRACT
Summary: Transcriptome deconvolution has emerged as a reliable technique to estimate cell-type abundances from bulk RNA sequencing data. Unlike their human equivalents, methods to quantify the cellular composition of complex tissues from murine transcriptomics are sparse and sometimes not easy to use. We extended the immunedeconv R package to facilitate the deconvolution of mouse transcriptomics, enabling the quantification of murine immune-cell types using 13 different methods. Through immunedeconv, we further offer the possibility of tweaking cell signatures used by deconvolution methods, providing custom annotations tailored for specific cell types and tissues. These developments strongly facilitate the study of the immune-cell composition of mouse models and further open new avenues in the investigation of the cellular composition of other tissues and organisms. Availability and implementation: The R package and the documentation are available at https://github.com/omnideconv/immunedeconv.
ABSTRACT
Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test set, performances become random. Moreover, baseline models directly leveraging sequence similarity and network topology show good performances at a fraction of the computational cost. Thus, we advocate that any improvements should be reported relative to baseline methods in the future. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
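To make the idea of a topology-only baseline concrete, the sketch below scores a candidate pair purely by the product of the two proteins' degrees in the training network. This is a generic preferential-attachment heuristic chosen for illustration, not the baseline implemented by the authors; the protein names and the function `degree_product_scores` are hypothetical.

```python
from collections import Counter


def degree_product_scores(train_edges, candidate_pairs):
    """Score each candidate pair by the product of its node degrees in the training graph."""
    degree = Counter()
    for a, b in train_edges:
        degree[a] += 1
        degree[b] += 1
    # Proteins that never occur in the training graph have degree 0.
    return {(a, b): degree[a] * degree[b] for a, b in candidate_pairs}


# Toy training network with hypothetical protein identifiers.
train_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
candidates = [("B", "D"), ("A", "E"), ("C", "D")]
scores = degree_product_scores(train_edges, candidates)
ranked = sorted(candidates, key=lambda pair: scores[pair], reverse=True)
```

A pair involving a protein absent from the training network (here `E`) always receives a score of zero, which illustrates why such a baseline cannot say anything about previously unstudied proteins.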
Subjects
Machine Learning
ABSTRACT
Molecular profiling techniques such as metagenomics, metatranscriptomics or metabolomics offer important insights into the functional diversity of the microbiome. In contrast, 16S rRNA gene sequencing, a widespread and cost-effective technique to measure microbial diversity, only allows for indirect estimation of microbial function. To mitigate this, tools such as PICRUSt2, Tax4Fun2, PanFP and MetGEM infer functional profiles from 16S rRNA gene sequencing data using different algorithms. Prior studies have cast doubts on the quality of these predictions, motivating us to systematically evaluate these tools using matched 16S rRNA gene sequencing, metagenomic datasets, and simulated data. Our contribution is threefold: (i) using simulated data, we investigate if technical biases could explain the discordance between inferred and expected results; (ii) considering human cohorts for type 2 diabetes, colorectal cancer and obesity, we test if health-related differential abundance measures of functional categories are concordant between 16S rRNA gene-inferred and metagenome-derived profiles; and (iii) since 16S rRNA gene copy number is an important confounder in functional profile inference, we investigate if a customised copy number normalisation with the rrnDB database could improve the results. Our results show that 16S rRNA gene-based functional inference tools generally do not have the necessary sensitivity to delineate health-related functional changes in the microbiome and should thus be used with care. Furthermore, we outline important differences in the individual tools tested and offer recommendations for tool selection.
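The principle behind copy number normalisation can be illustrated with a short sketch: each taxon's read count is divided by its 16S rRNA gene copy number, and the corrected values are rescaled to relative abundances. This is a generic illustration, not the study's pipeline; the function name, taxon labels, and copy numbers are made up and are not taken from rrnDB.

```python
def normalise_by_copy_number(read_counts, copy_numbers, default_copies=1.0):
    """Divide read counts by 16S copy number, then rescale to relative abundance."""
    corrected = {
        taxon: count / copy_numbers.get(taxon, default_copies)
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to normalise")
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon_a carries 7 copies of the 16S gene, taxon_b only 1.
read_counts = {"taxon_a": 700, "taxon_b": 100}
copy_numbers = {"taxon_a": 7, "taxon_b": 1}
abundances = normalise_by_copy_number(read_counts, copy_numbers)
```

In this toy case the raw reads suggest a 7:1 ratio, while the corrected profile assigns both taxa equal abundance, which is the kind of confounding the normalisation is meant to remove.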
Subjects
Metagenome , Microbiota , Humans , RNA, Ribosomal, 16S/genetics , Genes, rRNA , Microbiota/genetics , Algorithms
ABSTRACT
Bulk RNA sequencing (RNA-seq) of blood is typically used for gene expression analysis in biomedical research but is still rarely used in clinical practice. In this study, we propose that RNA-seq should be considered a diagnostic tool, as it offers not only insights into aberrant gene expression and splicing but also delivers additional readouts on immune cell type composition as well as B-cell and T-cell receptor (BCR/TCR) repertoires. We demonstrate that RNA-seq offers insights into a patient's immune status via integrative analysis of RNA-seq data from patients infected with various SARS-CoV-2 variants (in total 196 samples with up to 200 million reads sequencing depth). We compare the results of computational cell-type deconvolution methods (e.g., MCP-counter, xCell, EPIC, quanTIseq) to complete blood count (CBC) data, the current gold standard in clinical practice. We observe varying levels of lymphocyte depletion and significant differences in neutrophil levels between SARS-CoV-2 variants. Additionally, we identify B and T cell receptor (BCR/TCR) sequences using the tools MiXCR and TRUST4 to show that, combined with sequence alignments and BLASTp, they could be used to classify a patient's disease. Finally, we investigated the sequencing depth required for such analyses and concluded that 10 million reads per sample is sufficient. In conclusion, our study reveals that computational cell-type deconvolution and BCR/TCR methods using bulk RNA-seq analyses can supplement missing CBC data and offer insights into immune responses, disease severity, and pathogen-specific immunity, all achievable with a sequencing depth of 10 million reads per sample.
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Gene Expression Profiling , Receptors, Antigen, T-Cell/genetics , Sequence Analysis, RNA/methods , Immunity
ABSTRACT
RNA sequencing offers unique insights into transcriptome diversity, and a plethora of tools have been developed to analyze alternative splicing. One important task is to detect changes in the relative transcript abundance in differential transcript usage (DTU) analysis. The choice of the right analysis tool is non-trivial and depends on experimental factors such as the availability of single- or paired-end and bulk or single-cell data. To help users select the most promising tool for their task, we performed a comprehensive benchmark of DTU detection tools. We cover a wide array of experimental settings, using simulated bulk and single-cell RNA-seq data as well as real transcriptomics datasets, including time-series data. Our results suggest that DEXSeq, edgeR, and LimmaDS are better choices for paired-end data, while DSGseq and DEXSeq can be used for single-end data. In single-cell simulation settings, we showed that satuRn performs better than DTUrtle. In addition, we showed that Spycone is optimal for time series DTU/IS analysis based on the evidence provided using GO terms enrichment analysis.
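The quantity that DTU analysis examines, relative transcript abundance within a gene, can be illustrated with a toy calculation. The sketch below is not one of the benchmarked tools and performs no statistical testing; the transcript names and counts are hypothetical.

```python
def transcript_usage(counts_by_transcript):
    """Return each transcript's share of the gene's total expression."""
    total = sum(counts_by_transcript.values())
    if total == 0:
        return {tx: 0.0 for tx in counts_by_transcript}
    return {tx: count / total for tx, count in counts_by_transcript.items()}


# Hypothetical counts for one gene with two transcripts in two conditions.
control = {"tx1": 80, "tx2": 20}
treated = {"tx1": 90, "tx2": 90}

usage_control = transcript_usage(control)
usage_treated = transcript_usage(treated)

# Change in usage per transcript between the two conditions.
delta_usage = {tx: usage_treated[tx] - usage_control[tx] for tx in control}
```

Here the count of `tx1` barely changes, yet its share of the gene's output drops from 0.8 to 0.5, which is why DTU is analysed separately from differential expression.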
ABSTRACT
Bulk RNA sequencing (RNA-seq) of blood is typically used for gene expression analysis in biomedical research but is still rarely used in clinical practice. In this study, we argue that RNA-seq should be considered a routine diagnostic tool, as it offers not only insights into aberrant gene expression and splicing but also delivers additional readouts on immune cell type composition as well as B-cell and T-cell receptor (BCR/TCR) repertoires. We demonstrate that RNA-seq offers vital insights into a patient's immune status via integrative analysis of RNA-seq data from patients infected with various SARS-CoV-2 variants (in total 240 samples with up to 200 million reads sequencing depth). We compare the results of computational cell-type deconvolution methods (e.g., MCP-counter, xCell, EPIC, quanTIseq) to complete blood count (CBC) data, the current gold standard in clinical practice. We observe varying levels of lymphocyte depletion and significant differences in neutrophil levels between SARS-CoV-2 variants. Additionally, we identify B and T cell receptor (BCR/TCR) sequences using the tools MiXCR and TRUST4 to show that, combined with sequence alignments and BLASTp, they could be used to classify a patient's disease. Finally, we investigated the sequencing depth required for such analyses and concluded that 10 million reads per sample is sufficient. In conclusion, our study reveals that computational cell-type deconvolution and BCR/TCR methods using bulk RNA-seq analyses can supplement missing CBC data and offer insights into immune responses, disease severity, and pathogen-specific immunity, all achievable with a sequencing depth of 10 million reads per sample.
ABSTRACT
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
ABSTRACT
A key problem in systems biology is the discovery of regulatory mechanisms that drive phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions leading to missing data or partially measured omics types for subsets of individuals due to cost restrictions. In such scenarios, in which missing data is present, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not been adopted for regulatory network inference yet. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best on balanced omics-layers dimensions. Our results show that robust multi-omics network inference in the presence of missing data with KiMONo is feasible and thus allows users to leverage available multi-omics data to its full extent.
Subjects
Benchmarking , Multiomics , Humans , Systems Biology
ABSTRACT
MicroRNAs (miRNAs) are small non-coding RNA molecules that bind to target sites in different gene regions and regulate post-transcriptional gene expression. Approximately 95% of human multi-exon genes can be spliced alternatively, which enables the production of functionally diverse transcripts and proteins from a single gene. Through alternative splicing, transcripts might lose the exon with the miRNA target site and become unresponsive to miRNA regulation. To check this hypothesis, we studied the role of miRNA target sites in both coding and non-coding regions using six cancer data sets from The Cancer Genome Atlas (TCGA) and Parkinson's disease data from PPMI. First, we predicted miRNA target sites on mRNAs from their sequence using TarPmiR. To check whether alternative splicing interferes with this regulation, we trained linear regression models to predict miRNA expression from transcript expression. Using nested models, we compared the predictive power of transcripts with miRNA target sites in the coding regions to that of transcripts without target sites. Models containing transcripts with target sites perform significantly better. We conclude that alternative splicing does interfere with miRNA regulation by skipping exons with miRNA target sites within the coding region.
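The nested-model comparison described above can be sketched as a partial F-test: a reduced linear model using only transcripts without target sites is compared against a full model that also includes transcripts with target sites. The code below is a minimal illustration on simulated data and assumes NumPy is available; the variable names, effect sizes, and the helper `nested_f_statistic` are hypothetical and do not reproduce the study's analysis.

```python
import numpy as np


def residual_sum_of_squares(X, y):
    """Fit ordinary least squares with an intercept; return RSS and parameter count."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals), design.shape[1]


def nested_f_statistic(X_reduced, X_full, y):
    """Partial F statistic for the extra predictors in the full model."""
    rss_reduced, p_reduced = residual_sum_of_squares(X_reduced, y)
    rss_full, p_full = residual_sum_of_squares(X_full, y)
    n = len(y)
    return ((rss_reduced - rss_full) / (p_full - p_reduced)) / (rss_full / (n - p_full))


# Simulated expression values (assumption: 50 samples, two transcripts per class).
rng = np.random.default_rng(0)
n_samples = 50
without_sites = rng.normal(size=(n_samples, 2))
with_sites = rng.normal(size=(n_samples, 2))
mirna = (0.5 * without_sites[:, 0]
         - 1.5 * with_sites[:, 0]
         + rng.normal(scale=0.5, size=n_samples))

full_design = np.column_stack([without_sites, with_sites])
f_stat = nested_f_statistic(without_sites, full_design, mirna)
```

A large F statistic indicates that adding the transcripts with target sites explains substantially more variance in miRNA expression than the reduced model alone; a p-value would follow from the F distribution with the corresponding degrees of freedom.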
ABSTRACT
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Thus, there is an ongoing development of computational tools that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. Here, we give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.