Results 1 - 20 of 124
1.
Animals (Basel) ; 14(11)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38891742

ABSTRACT

Complex traits are widely considered to be the result of a compound regulation of genes, environmental factors, and genotype-by-environment interaction (G × E). The inclusion of G × E in genome-wide association analyses is essential to understand animal environmental adaptations and improve the efficiency of breeding decisions. Here, we systematically investigated the G × E of growth traits (including weaning weight, yearling weight, 18-month body weight, and 24-month body weight) with environmental factors (farm and temperature) using genome-wide genotype-by-environment interaction association studies (GWEIS) with a dataset of 1350 cattle. We validated the robust estimator's effectiveness in GWEIS and detected 29 independent interacting SNPs at a significance threshold of 1.67 × 10⁻⁶, indicating that these SNPs, which do not show main effects in traditional genome-wide association studies (GWAS), may have non-additive effects across genotypes that are masked when averaged across environments. The gene-based analysis using MAGMA identified three genes that overlapped with the GWEIS results exhibiting G × E, namely SMAD2, PALMD, and MECOM. Further, functional exploration in gene-set analysis revealed the biological mechanisms through which cattle growth responds to environmental changes, such as mitosis or cytokinesis, fatty acid β-oxidation, neurotransmitter activity, gap junctions, and keratan sulfate degradation. This study not only reveals novel genetic loci and underlying mechanisms influencing growth traits but also transforms our understanding of environmental adaptation in beef cattle, thereby paving the way for more targeted and efficient breeding strategies.
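The interaction test at the core of a GWEIS can be sketched as an ordinary least-squares fit of y = b0 + b1·g + b2·e + b3·(g×e) for each SNP, with the interaction coefficient b3 tested using a heteroskedasticity-robust (Huber-White, HC0) sandwich standard error. This is a minimal sketch under stated assumptions: the abstract specifies neither the model form nor which robust estimator was used, so the model, the HC0 choice, the simulated data, and the function name `gxe_test` are all illustrative; only the 1.67 × 10⁻⁶ threshold comes from the text.

```python
import math
import numpy as np


def gxe_test(y, g, e):
    """Test a SNP-by-environment interaction with a robust (HC0) standard error.

    Assumed model: y = b0 + b1*g + b2*e + b3*(g*e) + error, where g is the
    allele dosage (0/1/2) and e an environmental covariate (e.g. farm code).
    Returns (b3, robust_se, two_sided_p).
    """
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 X' diag(resid^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv
    b3, se3 = beta[3], math.sqrt(cov[3, 3])
    p = math.erfc(abs(b3 / se3) / math.sqrt(2.0))  # two-sided normal p-value
    return b3, se3, p


# Hypothetical example: the SNP effect flips sign between two farms, so its
# average (main) effect is near zero while the interaction is strong.
rng = np.random.default_rng(1)
n = 1350
g = rng.binomial(2, 0.3, n).astype(float)
e = rng.integers(0, 2, n).astype(float)
y = 0.5 * g * (2 * e - 1) + rng.normal(0.0, 1.0, n)
b3, se3, p = gxe_test(y, g, e)
print(f"b3={b3:.2f}, p={p:.1e}, significant={p < 1.67e-6}")
```

Running one such test per SNP and environmental factor, then applying the stated threshold, gives the kind of interaction scan described above; a real analysis would typically also adjust for covariates and population structure.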

2.
Curr Issues Mol Biol ; 46(5): 4701-4720, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38785552

ABSTRACT

A crucial feature of life is its spatial organization and compartmentalization on the molecular, cellular, and tissue levels. Spatial transcriptomics (ST) technology has opened a new chapter of the sequencing revolution, emerging rapidly with transformative effects across biology. This technique produces extensive and complex sequencing data, raising the need for computational methods for their comprehensive analysis and interpretation. We developed the ST browser web tool for the interactive discovery of ST images, focusing on different functional aspects such as single gene expression, the expression of functional gene sets, as well as the inspection of the spatial patterns of cell-cell interactions. As a unique feature, our tool applies self-organizing map (SOM) machine learning to the ST data. Our SOM data portrayal method generates individual gene expression landscapes for each spot in the ST image, enabling its downstream analysis with high resolution. The performance of the spatial browser is demonstrated by disentangling the intra-tumoral heterogeneity of melanoma and the microarchitecture of the mouse brain. The integration of machine-learning-based SOM portrayal into an interactive ST analysis environment opens novel perspectives for the comprehensive knowledge mining of the organization and interactions of cellular ecosystems.
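The SOM portrayal idea can be illustrated with a minimal self-organizing map trained on genes (rows) described by their expression across spots (columns): each map node then holds a "metagene" profile, and reading one spot's column off all nodes gives that spot's 2D expression landscape. This is a hedged sketch in the spirit of SOM portrayal, not the ST browser's implementation; the grid size, the learning-rate and neighbourhood schedules, the toy data, and the function names `train_som` and `spot_portraits` are assumptions.

```python
import numpy as np


def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a minimal rectangular SOM on the rows of `data` (items x features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every node, used for neighbourhood distances.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weight vectors with randomly chosen input rows.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
            x = data[i]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))               # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def spot_portraits(weights, grid):
    """Column j of `weights` is spot j's value on every node; reshape it onto the grid."""
    return weights.T.reshape(-1, *grid)


rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 8))  # hypothetical: 500 genes x 8 spots
w = train_som(expr, grid=(6, 6), epochs=5)
portraits = spot_portraits(w, (6, 6))
print(portraits.shape)
```

Because genes, not spots, are the training items, every spot receives a portrait on the same map, which is what makes spot-to-spot comparison of the landscapes possible.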

3.
Neuroimage ; 293: 120622, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38648869

ABSTRACT

Correlating transcriptional profiles with imaging-derived phenotypes has the potential to reveal possible molecular architectures associated with cognitive functions, brain development and disorders. Competitive null models built by resampling genes and self-contained null models built by spinning brain regions, along with varying test statistics, have been used to determine the significance of transcriptional associations. However, there has been no systematic evaluation of their performance in imaging transcriptomics analyses. Here, we evaluated the performance of eight different test statistics (mean, mean absolute value, mean squared value, max mean, median, Kolmogorov-Smirnov (KS), Weighted KS and the number of significant correlations) in both competitive null models and self-contained null models. Simulated brain maps (n = 1,000) and gene sets (n = 500) were used to calculate the probability of significance (Psig) for each statistical test. Our results suggested that competitive null models may result in false positive results driven by co-expression within gene sets. Furthermore, we demonstrated that the self-contained null models may fail to account for distribution characteristics (e.g., bimodality) of correlations between all available genes and brain phenotypes, leading to false positives. These two confounding factors interacted differently with test statistics, resulting in varying outcomes. Specifically, the sign-sensitive test statistics (i.e., mean, median, KS, Weighted KS) were influenced by co-expression bias in the competitive null models, while median and sign-insensitive test statistics were sensitive to the bimodality bias in the self-contained null models. Additionally, KS-based statistics produced conservative results in the self-contained null models, which increased the risk of false negatives. Comprehensive supplementary analyses with various configurations, including realistic scenarios, supported the results. 
These findings suggest using sign-insensitive test statistics, such as the mean absolute value and max mean, in competitive null models, and the mean as the test statistic in self-contained null models. Additionally, adopting confounder-matched (e.g., co-expression-matched) null models as an alternative to standard null models can be a viable strategy. Overall, the present study offers insights into the selection of statistical tests for imaging transcriptomics studies, highlighting areas for further investigation and refinement in the evaluation of novel and commonly used tests.


Subject(s)
Brain , Phenotype , Brain/diagnostic imaging , Brain/anatomy & histology , Humans , Transcriptome , Models, Statistical , Gene Expression Profiling/methods
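The test statistics compared in this study summarize the correlations r between each gene in a set and an imaging phenotype; a competitive null then compares the observed statistic with that of randomly drawn gene sets of the same size. The sketch below assumes plain uniform resampling of genes (no co-expression matching) and uses hypothetical function names and toy data; it illustrates exactly the kind of null model that, per the abstract, can be biased by within-set co-expression.

```python
import numpy as np


def set_statistics(r):
    """Summary statistics of gene-phenotype correlations r within one gene set."""
    pos = np.maximum(r, 0.0).mean()
    neg = np.maximum(-r, 0.0).mean()
    return {
        "mean": r.mean(),                        # sign-sensitive
        "mean_abs": np.abs(r).mean(),            # sign-insensitive
        "mean_sq": (r ** 2).mean(),              # sign-insensitive
        "max_mean": pos if pos > neg else -neg,  # larger of the positive/negative part means
        "median": np.median(r),                  # sign-sensitive
    }


def competitive_p(r_all, set_idx, stat="mean_abs", n_perm=5000, seed=0):
    """Competitive null: compare the set against random gene sets of equal size."""
    rng = np.random.default_rng(seed)
    obs = set_statistics(r_all[set_idx])[stat]
    k = len(set_idx)
    null = np.array([
        set_statistics(r_all[rng.choice(len(r_all), k, replace=False)])[stat]
        for _ in range(n_perm)
    ])
    # Signed statistics are compared in absolute value; non-negative ones upper-tail.
    if stat in ("mean", "max_mean", "median"):
        exceed = np.sum(np.abs(null) >= abs(obs))
    else:
        exceed = np.sum(null >= obs)
    return (1 + exceed) / (1 + n_perm)


rng = np.random.default_rng(0)
r_all = rng.uniform(-0.3, 0.3, 2000)  # hypothetical gene-phenotype correlations
set_idx = np.arange(20)
r_all[set_idx] += 0.4                 # make the first 20 genes strongly associated
print(competitive_p(r_all, set_idx, stat="mean_abs", n_perm=2000))
```

Since the random sets here are drawn without regard to co-expression, a set of highly co-expressed genes would show inflated significance under this null, which is the bias the study describes.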
4.
Article in English | MEDLINE | ID: mdl-38305800

ABSTRACT

The establishment of a comprehensive 3'aQTL database provides an opportunity to explore the functional interpretation of genome-wide association study (GWAS) data on psychiatric disorders. In this study, we aimed to search for novel susceptibility genes, pathways, and related chemicals for five psychiatric disorders using GWAS and 3'aQTL datasets. The GWAS datasets of five psychiatric disorders were collected from the open platform of the Psychiatric Genomics Consortium (PGC, https://www.med.unc.edu/pgc/) and iPSYCH (https://ipsych.dk/) (Demontis et al. in Nat Genet 51(1):63-75, 2019; Grove et al. in Nat Genet 51:431-444, 2019; Genomic Dissection of Bipolar Disorder and Schizophrenia in Cell 173: 1705-1715.e1716, 2018; Mullins et al. in Nat Genet 53: 817-829; Howard et al. in Nat Neurosci 22: 343-352, 2019). The 3' untranslated region (3'UTR) alternative polyadenylation (APA) quantitative trait loci (3'aQTLs) summary datasets of 12 brain regions were obtained from another public platform (https://wlcb.oit.uci.edu/3aQTLatlas/) (Cui et al. in Nucleic Acids Res 50: D39-D45, 2022). First, we aligned the GWAS-associated SNPs of psychiatric disorders with the 3'aQTL datasets, and the GWAS-associated 3'aQTLs were then identified from the overlap. Second, gene ontology (GO) and pathway analyses were applied to investigate the potential biological functions of the matching genes, based on the methods provided by MAGMA. Finally, chemical-related gene-set analysis (GSA) was also conducted with MAGMA to explore potential interactions between GWAS-associated 3'aQTLs and multiple chemicals in the mechanism of psychiatric disorders. A number of susceptibility genes with 3'aQTLs were found to be associated with psychiatric disorders, and some of them showed brain-region specificity. For schizophrenia (SCZ), HLA-A showed an association in all 12 brain regions, such as the cerebellar hemisphere (P = 1.58 × 10⁻³⁶) and cortex (P = 1.58 × 10⁻³⁶).
GO and pathway analyses identified several associated pathways, such as Phenylpropanoid Metabolic Process (GO:0009698, P = 6.24 × 10⁻⁷ for SCZ). Chemical-related GSA detected several chemical-related gene sets associated with psychiatric disorders. For example, gene sets of Ferulic Acid (P = 6.24 × 10⁻⁷), Morin (P = 4.47 × 10⁻²) and Vanillic Acid (P = 6.24 × 10⁻⁷) were found to be associated with SCZ. By integrating the functional information from 3'aQTLs, we identified several susceptibility genes and associated pathways, especially chemical-related gene sets, for five psychiatric disorders. Our results provide new insights into the etiology and mechanisms of psychiatric disorders.
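The first analysis step (aligning GWAS-associated SNPs with 3'aQTLs and keeping the overlap) amounts to a join on SNP identifiers, after which the overlapping genes can be grouped by brain region. The sketch below assumes in-memory records with hypothetical field names, toy SNP ids and genes, and the conventional genome-wide threshold of 5 × 10⁻⁸; the abstract states neither the threshold nor the data layout actually used.

```python
from collections import defaultdict


def gwas_associated_3aqtls(gwas_p, aqtls, threshold=5e-8):
    """Keep 3'aQTL records whose SNP is GWAS-significant for the disorder.

    gwas_p: dict mapping SNP id -> GWAS p-value (hypothetical layout)
    aqtls:  list of dicts with keys "snp", "gene", "region" (hypothetical layout)
    """
    hits = {snp for snp, p in gwas_p.items() if p < threshold}
    return [rec for rec in aqtls if rec["snp"] in hits]


def genes_by_region(records):
    """Group the overlapping genes by brain region to expose region specificity."""
    out = defaultdict(set)
    for rec in records:
        out[rec["region"]].add(rec["gene"])
    return dict(out)


# Toy data (hypothetical SNP ids, genes and regions)
gwas_p = {"rs1": 1e-10, "rs2": 0.3, "rs3": 2e-9}
aqtls = [
    {"snp": "rs1", "gene": "GENE_A", "region": "cortex"},
    {"snp": "rs1", "gene": "GENE_A", "region": "cerebellar hemisphere"},
    {"snp": "rs2", "gene": "GENE_B", "region": "cortex"},
    {"snp": "rs3", "gene": "GENE_C", "region": "cortex"},
]
overlap = gwas_associated_3aqtls(gwas_p, aqtls)
print(genes_by_region(overlap))
```

The resulting per-region gene lists are what downstream gene-set tools would take as input.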

5.
BMC Bioinformatics ; 24(1): 408, 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37904108

ABSTRACT

BACKGROUND: Gene-wise differential expression is usually the first major step in the statistical analysis of high-throughput data obtained from techniques such as microarrays or RNA-sequencing. The gene-level analysis is often complemented by interrogating the data in a broader biological context, using as the unit of measure groups of genes that may share a common function or biological trait. Among the vast number of publications on gene set analysis (GSA), the rotation test for gene set analysis, also referred to as roast, is a general sample randomization approach that maintains the intra-gene-set correlation structure when defining the null distribution of the test. RESULTS: We present roastgsa, an R package that contains several enrichment score functions that feed the roast algorithm for hypothesis testing. These implemented methods are evaluated using both simulated and benchmarking data in microarray and RNA-seq datasets. We find that computationally intensive measures based on Kolmogorov-Smirnov (KS) statistics fail to improve on the rates of simpler GSA measures such as the mean and maxmean scores. We also show the importance of accounting for the gene linear dependence structure of the testing set, which is linked to a loss of effective signature size. A complete graphical representation of the results, including an approximation of the effective signature size, can be obtained as part of the roastgsa output. CONCLUSIONS: We encourage the use of the absmean (non-directional), mean (directional) and maxmean (directional) scores for roast GSA, as these simple enrichment measures outperformed the more complex KS measures in all of the analyses provided.


Subject(s)
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Rotation , Oligonucleotide Array Sequence Analysis/methods , Phenotype
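The key property of a rotation test is that, under the null hypothesis, randomly rotating the sample space leaves the joint distribution of the data unchanged while preserving the gene-gene correlation inside the set. The sketch below is a simplified one-sample version (H0: every gene in the set has mean zero, with samples i.i.d. multivariate normal) using ordinary t-statistics and the absmean score. It is not the roastgsa or limma `roast` implementation, which works with the residual space of a general linear model and moderated statistics; the function names and toy data are hypothetical.

```python
import numpy as np


def random_rotation(n, rng):
    """Haar-distributed random orthogonal n x n matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))  # column sign fix makes the distribution uniform


def gene_t(y):
    """Ordinary (unmoderated) one-sample t-statistics for each gene (rows of y)."""
    n = y.shape[1]
    return y.mean(axis=1) / (y.std(axis=1, ddof=1) / np.sqrt(n))


def absmean(t):
    """Non-directional set score: mean absolute gene statistic."""
    return np.abs(t).mean()


def rotation_test(y, set_idx, score=absmean, n_rot=2000, seed=0):
    """Rotation test of H0: all genes in the set have mean zero.

    y: genes x samples matrix, e.g. paired log-fold changes.
    The same rotation is applied to every gene, so within-set correlation is kept.
    """
    rng = np.random.default_rng(seed)
    ys = y[set_idx]
    obs = score(gene_t(ys))
    n = ys.shape[1]
    null = np.array([score(gene_t(ys @ random_rotation(n, rng))) for _ in range(n_rot)])
    return (1 + np.sum(null >= obs)) / (1 + n_rot)


rng = np.random.default_rng(0)
y = rng.normal(size=(100, 12))  # hypothetical: 100 genes x 12 paired log-fold changes
set_idx = np.arange(30)
y[set_idx] += 1.5               # genes in the set are shifted away from zero
print(rotation_test(y, set_idx, n_rot=500))
```

Unlike sample permutation, rotation still yields a usable null distribution when the number of samples is small, because the space of rotations is continuous.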
6.
Int J Cancer ; 153(10): 1819-1828, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37551617

ABSTRACT

Genome-scale screening experiments in cancer produce long lists of candidate genes that require extensive interpretation for biological insight and prioritization for follow-up studies. Interrogation of gene lists frequently represents a significant and time-consuming undertaking, in which experimental biologists typically combine results from a variety of bioinformatics resources in an attempt to portray and understand cancer relevance. As a means to simplify and strengthen the support for this endeavor, we have developed oncoEnrichR, a flexible bioinformatics tool that allows cancer researchers to comprehensively interrogate a given gene list along multiple facets of cancer relevance. oncoEnrichR differs from general gene set analysis frameworks through the integration of an extensive set of prior knowledge specifically relevant for cancer, including ranked gene-tumor type associations, literature-supported proto-oncogene and tumor suppressor gene annotations, target druggability data, regulatory interactions, synthetic lethality predictions, as well as prognostic associations, gene aberrations and co-expression patterns across tumor types. The software produces a structured and user-friendly analysis report as its main output, where versions of all underlying data resources are explicitly logged, the latter being a critical component for reproducible science. We demonstrate the usefulness of oncoEnrichR through interrogation of two candidate lists from proteomic and CRISPR screens. oncoEnrichR is freely available as a web-based service hosted by the Galaxy platform (https://oncotools.elixir.no), and can also be accessed as a stand-alone R package (https://github.com/sigven/oncoEnrichR).


Subject(s)
Neoplasms , Proteomics , Humans , Computational Biology/methods , Software , Neoplasms/genetics
7.
Front Immunol ; 14: 1135859, 2023.
Article in English | MEDLINE | ID: mdl-37304268

ABSTRACT

Background: Sepsis is a dysfunctional host response to infection. The syndrome leads to millions of deaths annually (19.7% of all deaths in 2017) and is the cause of most deaths from severe Covid infections. High throughput sequencing or 'omics' experiments in molecular and clinical sepsis research have been widely utilized to identify new diagnostics and therapies. Transcriptomics, quantifying gene expression, has dominated these studies, due to the efficiency of measuring gene expression in tissues and the technical accuracy of technologies like RNA-Seq. Objective: Most of these studies seek to uncover novel mechanistic insights into sepsis pathogenesis and diagnostic gene signatures by identifying genes differentially expressed between two or more relevant conditions. However, little effort has been made, to date, to aggregate this knowledge from such studies. In this study we sought to build a compendium of previously described gene sets that combines knowledge gained from sepsis-associated studies. This would enable the identification of genes most associated with sepsis pathogenesis, and the description of the molecular pathways commonly associated with sepsis. Methods: PubMed was searched for studies using transcriptomics to characterize acute infection/sepsis and severe sepsis (i.e., sepsis combined with organ failure). Several studies were identified that used transcriptomics to identify differentially expressed (DE) genes, predictive/prognostic signatures, and underlying molecular responses and pathways. The molecules included in each gene set were collected, in addition to the relevant study metadata (e.g., patient groups used for comparison, sample collection time point, tissue type, etc.). Results: After performing extensive literature curation of 74 sepsis-related publications involving transcriptomics, 103 unique gene sets (comprising 20,899 unique genes) from thousands of patients were collated together with associated metadata. 
Frequently described genes included in gene sets as well as the molecular mechanisms they were involved in were identified. These mechanisms included neutrophil degranulation, generation of second messenger molecules, IL-4 and -13 signaling, and IL-10 signaling among many others. The database, which we named SeptiSearch, is made available in a web application created using the Shiny framework in R (available at https://septisearch.ca). Conclusions: SeptiSearch provides members of the sepsis community the bioinformatic tools needed to leverage and explore the gene sets contained in the database. This will allow the gene sets to be further scrutinized and analyzed for their enrichment in user-submitted gene expression data and used for validation of in-house gene sets/signatures.


Subject(s)
COVID-19 , Sepsis , Humans , COVID-19/genetics , Sepsis/genetics , Computational Biology , Databases, Factual , Gene Expression Profiling
8.
OMICS ; 27(5): 193-204, 2023 05.
Article in English | MEDLINE | ID: mdl-37145884

ABSTRACT

Advanced integrative analysis of DNA methylation and transcriptomics data may provide deeper insights into smoke-induced epigenetic alterations, their effects on gene expression and related biological processes, linking cigarette smoking and related diseases. We hypothesize that accumulation of DNA methylation changes in CpG sites across genomic locations of different genes might have biological significance. We tested the hypothesis by performing gene set based integrative analysis of blood DNA methylation and transcriptomics data to identify potential transcriptomic consequences of smoking via changes in DNA methylation in the Young Finns Study (YFS) participants (n = 1114, aged 34-49 years, women: 54%, men: 46%). First, we performed epigenome-wide association study (EWAS) of smoking. We then defined sets of genes based on DNA methylation status within their genomic regions, for example, sets of genes containing hyper- or hypomethylated CpG sites in their body or promoter regions. Gene set analysis was performed using transcriptomics data from the same participants. Two sets of genes, one containing 49 genes with hypomethylated CpG sites in their body region and the other containing 33 genes with hypomethylated CpG sites in their promoter region, were differentially expressed among the smokers. Genes in the two gene sets are involved in bone formation, metal ion transport, cell death, peptidyl-serine phosphorylation, and cerebral cortex development process, revealing epigenetic-transcriptomic pathways to smoking-related diseases such as osteoporosis, atherosclerosis, and cognitive impairment. These findings contribute to a deeper understanding of the pathophysiology of smoking-related diseases and may provide potential therapeutic targets.


Subject(s)
Cigarette Smoking , Male , Humans , Female , Epigenome , Genome-Wide Association Study , DNA Methylation/genetics , Gene Expression Profiling , CpG Islands/genetics , Epigenesis, Genetic
9.
Front Cell Dev Biol ; 11: 1091047, 2023.
Article in English | MEDLINE | ID: mdl-36875765

ABSTRACT

Feature identification and manual inspection are still an integral part of biological data analysis in single-cell sequencing. Features such as expressed genes and open chromatin status are selectively studied in specific contexts, cell states or experimental conditions. While conventional analysis methods construct a relatively static view of gene candidates, artificial neural networks have been used to model their interactions after hierarchical gene regulatory networks. However, it is challenging to identify consistent features in this modeling process due to the inherently stochastic nature of these methods. Therefore, we propose using ensembles of autoencoders and subsequent rank aggregation to extract consensus features in a less biased manner. Here, we performed sequencing data analyses of different modalities, either independently or simultaneously, as well as with other analysis tools. Our resVAE ensemble method can successfully complement existing analyses and find additional unbiased biological insights with minimal data processing or feature selection steps while giving a measure of confidence, especially for models using stochastic or approximation algorithms. In addition, unlike most conventional tools, our method can also work with overlapping cluster identity assignments, which suits transitional cell types or cell fates.
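Consensus features from an ensemble can be illustrated with a simple mean-rank (Borda-style) aggregation of per-model feature scores, with the spread of ranks across models serving as a rough confidence measure. The abstract does not say which rank aggregation scheme resVAE uses, so mean-rank aggregation, the toy scores, and the function name `aggregate_ranks` are assumptions.

```python
import numpy as np


def aggregate_ranks(scores):
    """Consensus feature ranking from an ensemble of models.

    scores: models x features array of importance scores (higher = more important).
    Returns (feature order by mean rank, mean rank per feature, rank spread per feature).
    """
    # Within each model, rank 0 = most important feature.
    ranks = np.argsort(np.argsort(-scores, axis=1), axis=1)
    mean_rank = ranks.mean(axis=0)
    spread = ranks.std(axis=0)  # low spread = features ranked consistently across models
    order = np.argsort(mean_rank, kind="stable")
    return order, mean_rank, spread


# Hypothetical scores from three ensemble members for three features
scores = np.array([
    [0.9, 0.1, 0.5],
    [0.8, 0.3, 0.2],
    [0.7, 0.2, 0.6],
])
order, mean_rank, spread = aggregate_ranks(scores)
print(order, mean_rank, spread)
```

Feature 0 is ranked first by every model and therefore has zero spread, while features 1 and 2 swap places in one model, which shows up as non-zero spread.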

10.
Front Cell Dev Biol ; 11: 1065586, 2023.
Article in English | MEDLINE | ID: mdl-36998245

ABSTRACT

Background: The impact of gene sets on a spatial phenotype is not necessarily uniform across different locations of cancer tissue. This study introduces a computational platform, GWLCT, that combines gene set analysis with spatial data modeling to provide a new statistical test for the location-specific association of phenotypes and molecular pathways in spatial single-cell RNA-seq data collected from an input tumor sample. Methods: The main advantage of GWLCT is that its analysis goes beyond global significance, allowing the association between the gene set and the phenotype to vary across the tumor space. At each location, the most significant linear combination is found using a geographically weighted shrunken covariance matrix and kernel function. The bandwidth, whether fixed or adaptive, is determined by a cross-validation procedure. Our proposed method is compared with the global version of the linear combination test (LCT), and with bulk and random-forest-based gene set enrichment analyses, using data created by the Visium Spatial Gene Expression technique on an invasive breast cancer tissue sample, as well as 144 different simulation scenarios. Results: In an illustrative example, the new geographically weighted linear combination test, GWLCT, identifies the cancer hallmark gene sets that are significantly associated at each location with the five spatially continuous phenotypic contexts in the tumors defined by different well-known markers of cancer-associated fibroblasts. Scan statistics revealed clustering in the number of significant gene sets. A spatial heatmap of combined significance over all selected gene sets is also produced. Extensive simulation studies demonstrate that our proposed approach outperforms other methods in the considered scenarios, especially when the spatial association increases. Conclusion: Our proposed approach considers the spatial covariance of gene expression to detect the most significant gene sets affecting a continuous phenotype.
It reveals spatially detailed information in tissue space and can thus play a key role in understanding the contextual heterogeneity of cancer cells.
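The geographically weighted part of GWLCT can be sketched as follows: each spot gets Gaussian kernel weights for all other spots, using either a fixed bandwidth or an adaptive one (here, the distance to the k-th nearest spot), and those weights define a local gene covariance that is then shrunk toward its diagonal. The Gaussian kernel, the k-nearest-neighbour adaptive rule, the diagonal shrinkage target with a fixed intensity, the toy grid, and all function names are illustrative assumptions; the cross-validated bandwidth choice and the linear combination test itself are not shown.

```python
import numpy as np


def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all spots relative to spot i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def adaptive_bandwidth(coords, i, k):
    """Adaptive bandwidth: distance from spot i to its k-th nearest neighbour."""
    d = np.sort(np.linalg.norm(coords - coords[i], axis=1))
    return d[k]  # d[0] is the spot itself (distance 0)


def local_covariance(expr, w, shrink=0.1):
    """Weighted covariance of genes (columns of expr), shrunk toward its diagonal."""
    w = w / w.sum()
    mu = w @ expr
    xc = expr - mu
    cov = (xc * w[:, None]).T @ xc
    return (1 - shrink) * cov + shrink * np.diag(np.diag(cov))


coords = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)  # 3x3 grid of spots
expr = np.random.default_rng(0).normal(size=(9, 4))                            # 9 spots x 4 genes
h = adaptive_bandwidth(coords, 4, k=4)
w = gaussian_weights(coords, 4, h)
cov = local_covariance(expr, w)
print(h, cov.shape)
```

An adaptive bandwidth keeps the effective number of neighbouring spots roughly constant, which helps where spot density varies across the tissue.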

11.
Entropy (Basel) ; 25(3)2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36981431

ABSTRACT

Gene sets are being increasingly leveraged to make high-level biological inferences from transcriptomic data; however, existing gene set analysis methods rely on overly conservative, heuristic approaches for quantifying the statistical significance of gene set enrichment. We created Nonparametric analytical-Rank-based Enrichment Analysis (NaRnEA) to facilitate accurate and robust gene set analysis with an optimal null model derived using the information theoretic Principle of Maximum Entropy. By measuring the differential activity of ~2500 transcriptional regulatory proteins based on the differential expression of each protein's transcriptional targets between primary tumors and normal tissue samples in three cohorts from The Cancer Genome Atlas (TCGA), we demonstrate that NaRnEA critically improves on two widely used gene set analysis methods: Gene Set Enrichment Analysis (GSEA) and analytical-Rank-based Enrichment Analysis (aREA). We show that the NaRnEA-inferred differential protein activity is significantly correlated with differential protein abundance inferred from independent, phenotype-matched mass spectrometry data in the Clinical Proteomic Tumor Analysis Consortium (CPTAC), confirming the statistical and biological accuracy of our approach. Additionally, our analysis crucially demonstrates that the sample-shuffling empirical null models leveraged by GSEA and aREA for gene set analysis are overly conservative, a shortcoming that is avoided by the newly developed Maximum Entropy analytical null model employed by NaRnEA.

12.
Int J Mol Sci ; 24(4)2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36835433

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a fatal late-onset motor neuron disease characterized by the loss of the upper and lower motor neurons. Our understanding of the molecular basis of ALS pathology remains elusive, complicating the development of efficient treatment. Gene-set analyses of genome-wide data have offered insight into the biological processes and pathways of complex diseases and can suggest new hypotheses regarding causal mechanisms. Our aim in this study was to identify and explore biological pathways and other gene sets having genomic association to ALS. Two cohorts of genomic data from the dbGaP repository were combined: (a) the largest available ALS individual-level genotype dataset (N = 12,319), and (b) a similarly sized control cohort (N = 13,210). Following comprehensive quality control pipelines, imputation and meta-analysis, we assembled a large European descent ALS-control cohort of 9244 ALS cases and 12,795 healthy controls represented by genetic variants of 19,242 genes. Multi-marker analysis of genomic annotation (MAGMA) gene-set analysis was applied to an extensive collection of 31,454 gene sets from the molecular signatures database (MSigDB). Statistically significant associations were observed for gene sets related to immune response, apoptosis, lipid metabolism, neuron differentiation, muscle cell function, synaptic plasticity and development. We also report novel interactions between gene sets, suggestive of mechanistic overlaps. A manual meta-categorization and enrichment mapping approach is used to explore the overlap of gene membership between significant gene sets, revealing a number of shared mechanisms.


Subject(s)
Amyotrophic Lateral Sclerosis , Humans , Amyotrophic Lateral Sclerosis/genetics , Genome-Wide Association Study , Genotype , Motor Neurons
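MAGMA's competitive gene-set analysis asks whether genes in a set are more strongly associated with the trait than other genes, by regressing gene-level z-scores on a set-membership indicator. The sketch below is a deliberately simplified version: it ignores the gene-gene correlations and gene-size/density covariates that MAGMA itself accounts for, uses a normal approximation for the one-sided p-value, and the function name and toy data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def competitive_gene_set_test(gene_p, in_set):
    """Simplified competitive test: regress gene z-scores on set membership.

    gene_p: gene-level association p-values
    in_set: boolean array marking gene-set membership
    Returns (beta_set, one-sided p-value for beta_set > 0).
    """
    nd = NormalDist()
    p = np.clip(np.asarray(gene_p, dtype=float), 1e-300, 1 - 1e-16)
    z = np.array([-nd.inv_cdf(pi) for pi in p])  # z = Phi^-1(1 - p)
    X = np.column_stack([np.ones(len(z)), np.asarray(in_set, dtype=float)])
    beta, _, _, _ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / (len(z) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    return beta[1], nd.cdf(-t)  # upper-tail p-value (normal approximation)


rng = np.random.default_rng(0)
gene_p = rng.uniform(size=1000)  # hypothetical background genes
in_set = np.zeros(1000, dtype=bool)
in_set[:50] = True
gene_p[:50] = 1e-4               # set genes carry a strong signal
beta, p_set = competitive_gene_set_test(gene_p, in_set)
print(beta, p_set)
```

The one-sided test reflects the competitive question: the set is of interest only if its genes are more associated than average, not less.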
13.
Biol Psychiatry Glob Open Sci ; 2(4): 389-399, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36324656

ABSTRACT

Background: To gain more insight into the biological factors that mediate vulnerability to display externalizing behaviors, we leveraged genome-wide association study summary statistics on 13 externalizing phenotypes. Methods: After data classification based on genetic resemblance, we performed multivariate genome-wide association meta-analyses and conducted extensive bioinformatic analyses, including genetic correlation assessment with other traits, Mendelian randomization, and gene set and gene expression analyses. Results: The genetic data could be categorized into disruptive behavior (DB) and risk-taking behavior (RTB) factors, and subsequent genome-wide association meta-analyses provided association statistics for DB and RTB (N_eff = 523,150 and 1,506,537, respectively), yielding 50 and 257 independent genetic signals. The statistics of DB, much more than RTB, signaled genetic predisposition to adverse cognitive, mental health, and personality outcomes. We found evidence for bidirectional causal influences between DB and substance use behaviors. Gene set analyses implicated contributions of neuronal cell development (DB/RTB) and synapse formation and transcription (RTB) mechanisms. Gene-brain mapping confirmed involvement of the amygdala and hypothalamus and highlighted other candidate regions (cerebellar dentate, cuneiform nucleus, claustrum, paracentral cortex). At the cell-type level, we noted enrichment of glutamatergic neurons for DB and RTB. Conclusions: This bottom-up, data-driven study provides new insights into the genetic signals of externalizing behaviors and indicates that commonalities in genetic architecture contribute to the frequent co-occurrence of different DBs and different RTBs, respectively. Bioinformatic analyses supported the DB versus RTB categorization and indicated relevant biological mechanisms. Generally similar gene-brain mappings indicate that neuroanatomical differences, if any, escaped the resolution of our methods.

14.
J Pers Med ; 12(11)2022 Nov 20.
Article in English | MEDLINE | ID: mdl-36422108

ABSTRACT

The rapid increase in the number of genetic variants identified to be associated with Amyotrophic Lateral Sclerosis (ALS) through genome-wide association studies (GWAS) has created an emerging need to understand the functional pathways that are implicated in the pathology of ALS. Gene-set analysis (GSA) is a powerful method that can provide insight into the associated biological pathways, determining the joint effect of multiple genetic markers. The main contribution of this review is the collection of ALS GSA studies that employ GWAS or individual-based genotype data, investigating their methodology and results related to ALS-associated molecular pathways. Furthermore, the limitations in standard single-gene analyses are summarized, highlighting the power of gene-set analysis, and a brief overview of the statistical properties of gene-set analysis and related concepts is provided. The main aims of this review are to investigate the reproducibility of the collected studies and identify their strengths and limitations, in order to enhance the experimental design and therefore the quality of the results of future studies, deepening our understanding of this devastating disease.

15.
Atherosclerosis ; 361: 1-9, 2022 11.
Article in English | MEDLINE | ID: mdl-36252457

ABSTRACT

AIM: We aimed at identifying the shared biological processes underlying atherosclerosis-osteoporosis co/multimorbidity. METHODS: We performed gene set analysis (GSA) of whole-blood transcriptomic data to identify biological processes shared by the early markers of these two diseases. Early markers of diseases, carotid intima-media thickness (CIMT) for atherosclerosis and trabecular bone mineral density (BMD) from distal radius and tibia for osteoporosis, were used to categorize the study participants into cases and controls. Participants with high CIMT (>90th percentile) were defined as cases for subclinical atherosclerosis. Study population-based T-scores for BMD were calculated and T-score ≤ -1 was used for the definition of low BMD cases i.e., early indicator of osteoporosis. RESULTS: We did not identify any gene sets jointly associated with early markers of atherosclerosis and osteoporosis. We identified three novel and replicated 234 gene sets significantly associated with high CIMT with false discovery rate (FDR) ≤ 0.01. Only two genes, both related to the immune system, were identified to be associated with high CIMT by traditional differential gene expression analysis. However, none of the studied gene sets or individual genes were significantly associated with tibial or radial BMD. The three novel CIMT associated gene sets contained genes involved in copper homeostasis, neural crest cell migration and nicotinate and nicotinamide metabolism. The 234 replicated gene sets in this study are related to the immune system, hypoxia and apoptosis, consistent with the existing literature on atherosclerosis. CONCLUSIONS: This study identified novel biological processes associated with high CIMT but not with reduced BMD.


Subject(s)
Atherosclerosis , Biological Phenomena , Osteoporosis , Humans , Carotid Intima-Media Thickness , Multimorbidity , Transcriptome , Finland , Cross-Sectional Studies , Osteoporosis/epidemiology , Osteoporosis/genetics , Osteoporosis/complications , Atherosclerosis/diagnosis , Atherosclerosis/epidemiology , Atherosclerosis/genetics , Biomarkers , Risk Factors
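The case definitions and the FDR threshold used in this study translate directly into a few lines of code. The sketch assumes the T-score is computed against the mean and standard deviation of the whole study sample (the abstract says only "study population-based"), uses NumPy's default linear-interpolation percentile, and implements Benjamini-Hochberg as the FDR procedure, which the abstract does not name; function names and toy values are hypothetical.

```python
import numpy as np


def define_cases(cimt, bmd):
    """High-CIMT and low-BMD case labels as defined in the abstract.

    High CIMT: above the study population's 90th percentile.
    Low BMD: T-score <= -1, with T = (BMD - sample mean) / sample SD.
    """
    cimt = np.asarray(cimt, dtype=float)
    bmd = np.asarray(bmd, dtype=float)
    high_cimt = cimt > np.percentile(cimt, 90)
    t_score = (bmd - bmd.mean()) / bmd.std(ddof=1)
    low_bmd = t_score <= -1.0
    return high_cimt, t_score, low_bmd


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    q = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out


cimt = np.arange(1, 11, dtype=float)  # toy CIMT values
bmd = np.arange(1, 11, dtype=float)   # toy trabecular BMD values
high_cimt, t_score, low_bmd = define_cases(cimt, bmd)
q = bh_fdr([0.01, 0.04, 0.03, 0.2])
print(high_cimt.sum(), low_bmd.sum(), q)
```

A gene set would then be reported as significant when its adjusted value satisfies q ≤ 0.01, matching the threshold quoted above.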
16.
Front Med (Lausanne) ; 9: 965908, 2022.
Article in English | MEDLINE | ID: mdl-36035404

ABSTRACT

Gene Set Analysis (GSA) is one of the most commonly used strategies to analyze omics data. Hundreds of GSA-related papers have been published, giving birth to a GSA field in Bioinformatics studies. However, as the field grows, it is becoming more difficult to obtain a clear view of all available methods, resources, and their quality. In this paper, we introduce a web platform called "GSA Central" which, as its name indicates, acts as a focal point to centralize GSA information and tools useful to beginners, average users, and experts in the GSA field. "GSA Central" contains five different resources: A Galaxy instance containing GSA tools ("Galaxy-GSA"), a portal to educational material ("GSA Classroom"), a comprehensive database of articles ("GSARefDB"), a set of benchmarking tools ("GSA BenchmarKING"), and a blog ("GSA Blog"). We expect that "GSA Central" will become a useful resource for users looking for introductory learning, state-of-the-art updates, method/tool selection guidelines and insights, tool usage, tool integration under a Galaxy environment, tool design, and tool validation/benchmarking. Moreover, we expect this kind of platform to become an example of a "thematic platform" containing all the resources that people in the field might need, an approach that could be extended to other bioinformatics topics or scientific fields.

18.
Front Genet ; 13: 818683, 2022.
Article in English | MEDLINE | ID: mdl-35495143

ABSTRACT

A common application of differential expression analysis is finding genes that are differentially expressed upon treatment in only one out of several groups of samples. One approach is to test for a significant difference in expression between treatment and control separately in the two groups, and then select genes that show statistical significance in one group only. This approach is often combined with gene set enrichment analysis to find pathways and gene sets regulated by treatment in only this group. Here we show that this procedure is statistically incorrect and that the interaction between treatment and group should be tested instead. Moreover, we show that gene set enrichment analysis applied to such incorrectly defined group-specific genes may produce misleading artifacts. Because of false negatives, genes significant in one group but not the other are enriched in gene sets that correspond to the overall effect of the treatment. Thus, the results appear related to the problem at hand but do not reflect the group-specific effect of a treatment. A literature search revealed that more than a quarter of papers that used a Venn diagram to illustrate the results of separate differential analyses also applied this incorrect reasoning.
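For a single gene, the interaction test recommended above can be sketched as an ordinary least-squares fit with a treatment × group term, compared here with the two separate per-group tests. This is a minimal illustration with hypothetical helper names (`two_sample_t`, `interaction_t`) and toy values, not the authors' code; a genome-wide analysis would also need multiple-testing correction across genes.

```python
import numpy as np


def two_sample_t(control, treated):
    """Pooled-variance two-sample t-statistic (treated minus control) and its df."""
    a = np.asarray(control, dtype=float)
    b = np.asarray(treated, dtype=float)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2


def interaction_t(expr, treated, group):
    """t-statistic and df for the treatment x group term in an OLS fit of one gene."""
    y = np.asarray(expr, dtype=float)
    t = np.asarray(treated, dtype=float)
    g = np.asarray(group, dtype=float)
    # Design matrix columns: intercept, treatment, group, treatment:group interaction
    X = np.column_stack([np.ones_like(y), t, g, t * g])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - rank
    sigma2 = resid @ resid / df              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the coefficient estimates
    return beta[3] / np.sqrt(cov[3, 3]), df


# Hypothetical expression values for one gene
g0_ctrl, g0_trt = [0, 1, 2], [3, 4, 5]
g1_ctrl, g1_trt = [0, 1, 2], [2, 3, 4]

t0, df0 = two_sample_t(g0_ctrl, g0_trt)   # separate test in group 0
t1, df1 = two_sample_t(g1_ctrl, g1_trt)   # separate test in group 1
expr = g0_ctrl + g0_trt + g1_ctrl + g1_trt
treated = [0, 0, 0, 1, 1, 1] * 2
group = [0] * 6 + [1] * 6
t_int, df_int = interaction_t(expr, treated, group)
print(f"group 0: t={t0:.2f} (df={df0}); group 1: t={t1:.2f} (df={df1})")
print(f"interaction: t={t_int:.2f} (df={df_int})")
```

In this toy case, group 0 gives t ≈ 3.67 on 4 df, which exceeds the two-sided 5% critical value of about 2.78, while group 1 gives t ≈ 2.45. The separate-tests ("Venn diagram") approach would therefore call the gene specific to group 0. The interaction statistic, however, is only about -0.87 on 8 df, so there is no evidence that the treatment effect actually differs between the groups. `scipy.stats.t.sf` can convert these statistics to p-values if needed.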

19.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35453140

ABSTRACT

Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors that may not be accounted for can affect the results of such an analysis. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish guidelines, meaningful results are still hampered by a lack of consensus or gold standards on how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, ranging from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting, or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges arising from the outlined factors.


Subject(s)
Databases, Factual , Factor Analysis, Statistical , Longitudinal Studies
20.
Biomedicines ; 10(3)2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35327392

ABSTRACT

Statistical methods for enrichment analysis are important tools for extracting biological information from omics experiments. Although these methods have been widely used to analyze gene and protein lists, the development of high-throughput technologies for regulatory elements demands dedicated statistical and bioinformatics tools. Here, we present a set of enrichment analysis methods for regulatory elements, including CpG sites, miRNAs, and transcription factors. Target genes are weighted with a power weighting function, and statistical significance is tested with the Wallenius noncentral hypergeometric distribution model to avoid selection bias. These new methods have been applied to a set of miRNAs associated with arrhythmia, showing the potential of this tool to extract biological information from a list of regulatory elements. They are available in GeneCodis 4, a web tool that performs singular and modular enrichment analysis and allows the integration of heterogeneous information.
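The abstract does not give GeneCodis 4's formulas, so the following is only a generic Python sketch of a one-sided enrichment test under Wallenius' noncentral hypergeometric distribution. It evaluates the standard integral form of the probability mass function, after substituting t = u^d, with Gauss-Legendre quadrature. The single bias weight `omega` is an assumption that stands in for whatever the power weighting function produces, and the names `wallenius_pmf` and `wallenius_upper_tail` are hypothetical.

```python
from math import comb

import numpy as np

# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1]
_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(200)
_U = 0.5 * (_NODES + 1.0)
_W = 0.5 * _WEIGHTS


def wallenius_pmf(x, m1, m2, n, omega):
    """P(X = x) under Wallenius' noncentral hypergeometric distribution.

    m1 annotated items, each with selection weight omega (an assumed bias),
    m2 other items with weight 1, and n items drawn without replacement.
    """
    if x < max(0, n - m2) or x > min(n, m1):
        return 0.0
    d = omega * (m1 - x) + (m2 - (n - x))
    if d == 0:  # every item was drawn, so only x == m1 is possible
        return 1.0
    # Integral form after substituting t = u**d:
    # d * integral_0^1 u^(d-1) * (1 - u^omega)^x * (1 - u)^(n-x) du
    integrand = d * _U ** (d - 1) * (1 - _U ** omega) ** x * (1 - _U) ** (n - x)
    return comb(m1, x) * comb(m2, n - x) * float(np.sum(_W * integrand))


def wallenius_upper_tail(k, m1, m2, n, omega):
    """P(X >= k): one-sided enrichment p-value under the weighted model."""
    return sum(wallenius_pmf(x, m1, m2, n, omega) for x in range(k, min(n, m1) + 1))


# Hypothetical numbers: 5 of 20 genes carry an annotation, 6 genes are in the
# target list, 3 of them annotated; omega = 2 is an assumed selection bias.
p_central = wallenius_upper_tail(3, 5, 15, 6, 1.0)
p_biased = wallenius_upper_tail(3, 5, 15, 6, 2.0)
print(p_central, p_biased)
```

With `omega = 1` the weighting disappears and the function reduces to the ordinary (central) hypergeometric test, which is a convenient sanity check. When `omega > 1`, annotated genes are assumed to be more likely to enter the list for reasons unrelated to biology, so the same overlap yields a larger, more conservative p-value.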
