Search | VHL Regional Portal

1.

Gene selection and cancer classification using interaction-based feature clustering and improved-binary Bat algorithm.

Esfandiari, Ahmad; Nasiri, Niki.

Comput Biol Med ; 181: 109071, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39205342

ABSTRACT

In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable diagnosis of diseases. This paper proposes a two-stage hybrid model for gene selection based on clustering and a swarm intelligence algorithm to identify the most informative genes with high accuracy. First, a clustering-based multivariate filter approach is performed to explore the interactions between the features and eliminate any redundant or irrelevant ones. Then, by controlling for the problem of premature convergence in the binary Bat algorithm, the optimal gene subset is determined using different classifiers with the Monte Carlo cross-validation data partitioning model. The effectiveness of our proposed framework is evaluated using eight gene expression datasets, by comparison with other recently published algorithms in the literature. Experiments confirm that in seven out of eight datasets, the proposed method can achieve superior results in terms of classification accuracy and gene subset size. In particular, it achieves a classification accuracy of 100% in Lymphoma and Ovarian datasets and above 97.4% in the rest with a minimum number of genes. The results demonstrate that our proposed algorithm has the potential to solve the feature selection problem in different applications with high-dimensional datasets.

Subject(s)

Algorithms , Neoplasms , Humans , Neoplasms/genetics , Neoplasms/classification , Cluster Analysis , Gene Expression Profiling/methods , Databases, Genetic , Computational Biology/methods , Female
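
The Monte Carlo cross-validation partitioning mentioned in the abstract above can be illustrated with a minimal sketch. It assumes repeated random train/test splits with a fixed test fraction. The function names, the 30% test fraction, and the number of repeats are illustrative assumptions and are not taken from the paper.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=5, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random partitioning.

    Unlike k-fold cross-validation, each repeat draws a fresh random test
    set, so a sample may appear in several test sets or in none.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def mean_accuracy(evaluate, n_samples, **kwargs):
    """Average a user-supplied evaluate(train_idx, test_idx) over all splits."""
    scores = [evaluate(train, test)
              for train, test in monte_carlo_splits(n_samples, **kwargs)]
    return sum(scores) / len(scores)
```

In a wrapper-style gene selection loop, `evaluate` would train a classifier on the candidate gene subset and return its test accuracy. That callback is left to the caller here.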

2.

TopMarker: Computational screening biomarkers of hepatocellular carcinoma from transcriptome and interactome based on differential network topological parameters.

Wang, Yanqiu; Wang, Tong; Cao, Yi; Qiao, Xu; Han, Xianhua; Liu, Zhi-Ping.

Comput Biol Chem ; 112: 108166, 2024 Aug 02.

Article in English | MEDLINE | ID: mdl-39111022

ABSTRACT

Identifying diagnostic biomarkers for cancer is crucial in the field of personalized medicine. The available transcriptome and interactome provide unprecedented opportunities and challenges for biomarker screening. From a systematic perspective, network-based medicine methods provide alternative approaches to organizing the available high-throughput omics data for deciphering molecular interactions and their associations with phenotypic states. In this work, we propose a bioinformatics strategy named TopMarker for discovering diagnostic biomarkers by comparing the network topology differences in control and disease samples. Specifically, we build up gene-gene interaction networks in the two states of control and disease respectively. The network rewiring status across the two networks results in differential network topologies reflecting dynamics and changes in normal samples when compared with those in disease. Thus, we identify the potential biomarker genes with differential network topological parameters between the control and disease gene networks. For a proof-of-concept study, we introduce the computational pipeline of biomarker discovery in hepatocellular carcinoma (HCC). We prove the effectiveness of the proposed TopMarker method using these candidate biomarkers in classifying HCC samples and validate its signature capability across numerous independent datasets. We also compare the discriminant power of biomarker genes identified by TopMarker with those identified by other baseline methods. The higher classification performances and functional implications indicate the advantages of our proposed method for discovering biomarkers from differential network topology.
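
The idea of comparing network topology between control and disease states can be sketched with the simplest topological parameter, node degree. This is an illustrative assumption only: the abstract does not say which parameters TopMarker uses, and the function names and edge-list input format below are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each gene in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def differential_degree(control_edges, disease_edges):
    """Rank genes by the change in degree between the two networks.

    Returns (gene, disease_degree - control_degree) pairs, largest
    absolute change first, ties broken alphabetically.
    """
    ctrl, dis = degrees(control_edges), degrees(disease_edges)
    genes = set(ctrl) | set(dis)
    diff = {g: dis[g] - ctrl[g] for g in genes}
    return sorted(diff.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
```

Genes at the top of the ranking are those whose connectivity is most rewired between the two states. Other parameters, such as betweenness or clustering coefficient, could be compared in the same way.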

3.

Reconstruction of gene regulatory networks for Caenorhabditis elegans using tree-shaped gene expression data.

Wu, Yida; Zhou, Da; Hu, Jie.

Brief Bioinform ; 25(5), 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39133097

ABSTRACT

Constructing gene regulatory networks is a widely adopted approach for investigating gene regulation, offering diverse applications in biology and medicine. A great deal of research focuses on using time series data or single-cell RNA-sequencing data to infer gene regulatory networks. However, such gene expression data lack either cellular or temporal information. Fortunately, the advent of time-lapse confocal laser microscopy enables biologists to obtain tree-shaped gene expression data of Caenorhabditis elegans, achieving both cellular and temporal resolution. Although such tree-shaped data provide abundant knowledge, they pose challenges like non-pairwise time series, leading to inaccuracy in downstream analysis. To address this issue, a comprehensive framework for data integration and a novel Bayesian approach based on Boolean network with time delay are proposed. The pre-screening process and Markov Chain Monte Carlo algorithm are applied to obtain the parameter estimates. Simulation studies show that our method outperforms existing Boolean network inference algorithms. Leveraging the proposed approach, gene regulatory networks for five subtrees are reconstructed based on the real tree-shaped datasets of Caenorhabditis elegans, where some gene regulatory relationships confirmed in previous genetic studies are recovered. Also, heterogeneity of regulatory relationships in different cell lineage subtrees is detected. Furthermore, the exploration of potential gene regulatory relationships that bear importance in human diseases is undertaken. All source code is available at the GitHub repository https://github.com/edawu11/BBTD.git.

Subject(s)

Algorithms , Caenorhabditis elegans , Gene Regulatory Networks , Caenorhabditis elegans/genetics , Animals , Bayes Theorem , Computational Biology/methods , Markov Chains , Gene Expression Profiling/methods

4.

Inhibitory effect of human interleukin-24 on the proliferation, migration, and invasion of cervical cancer cells.

Song, Min; Yuan, Hongtao; Zhang, Jie; Wang, Jing; Yu, Jianhua; Wang, Wei.

J Int Med Res ; 52(7): 3000605241259655, 2024 Jul.

Article in English | MEDLINE | ID: mdl-39068529

ABSTRACT

OBJECTIVE: This study aimed to identify significantly differentially expressed genes (DEGs) related to cervical cancer by exploring extensive gene expression datasets to unveil new therapeutic targets. METHODS: Gene expression profiles were extracted from the Gene Expression Omnibus, The Cancer Genome Atlas, and the Genotype-Tissue Expression platforms. A differential expression analysis identified DEGs in cervical cancer cases. Weighted gene co-expression network analysis (WGCNA) was implemented to locate genes closely linked to the clinical traits of diseases. Machine learning algorithms, including LASSO regression and the random forest algorithm, were applied to pinpoint key genes. RESULTS: The investigation successfully isolated DEGs pertinent to cervical cancer. Interleukin-24 was recognized as a pivotal gene via WGCNA and machine learning techniques. Experimental validations demonstrated that human interleukin (hIL)-24 inhibited proliferation, migration, and invasion, while promoting apoptosis, in SiHa and HeLa cervical cancer cells, affirming its role as a therapeutic target. CONCLUSION: The multi-database analysis strategy employed herein emphasized hIL-24 as a principal gene in cervical cancer pathogenesis. The findings suggest hIL-24 as a promising candidate for targeted therapy, offering a potential avenue for innovative treatment modalities. This study enhances the understanding of molecular mechanisms of cervical cancer and aids in the pursuit of novel oncological therapies.

Subject(s)

Apoptosis , Cell Movement , Cell Proliferation , Gene Expression Regulation, Neoplastic , Interleukins , Neoplasm Invasiveness , Uterine Cervical Neoplasms , Humans , Uterine Cervical Neoplasms/genetics , Uterine Cervical Neoplasms/pathology , Uterine Cervical Neoplasms/metabolism , Female , Cell Proliferation/genetics , Cell Movement/genetics , Interleukins/genetics , Interleukins/metabolism , Apoptosis/genetics , Gene Regulatory Networks , Gene Expression Profiling , HeLa Cells , Machine Learning , Cell Line, Tumor

5.

Clustering algorithm based on DINNSM and its application in gene expression data analysis.

Li, Zongjin; Song, Changxin; Yang, Jiyu; Jia, Zeyu; Chen, Dongzhen; Yan, Chengying; Tian, Liqin; Wu, Xiaoming.

Technol Health Care ; 32(S1): 229-239, 2024.

Article in English | MEDLINE | ID: mdl-38759052

ABSTRACT

BACKGROUND: Selecting an appropriate similarity measurement method is crucial for obtaining biologically meaningful clustering modules. Commonly used measurement methods are insufficient in capturing the complexity of biological systems and fail to accurately represent their intricate interactions. OBJECTIVE: This study aimed to obtain biologically meaningful gene modules by using the clustering algorithm based on a similarity measurement method. METHODS: A new algorithm called the Dual-Index Nearest Neighbor Similarity Measure (DINNSM) was proposed. This algorithm calculated the similarity matrix between genes using Pearson's or Spearman's correlation. It was then used to construct a nearest-neighbor table based on the similarity matrix. The final similarity matrix was reconstructed using the positions of shared genes in the nearest neighbor table and the number of shared genes. RESULTS: Experiments were conducted on five different gene expression datasets and compared with five widely used similarity measurement techniques for gene expression data. The findings demonstrate that when utilizing DINNSM as the similarity measure, the clustering results performed better than using alternative measurement techniques. CONCLUSIONS: DINNSM provided more accurate insights into the intricate biological connections among genes, facilitating the identification of more accurate and biological gene co-expression modules.

Subject(s)

Algorithms , Gene Expression Profiling , Cluster Analysis , Humans , Gene Expression Profiling/methods , Computational Biology/methods
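
The first steps described in the abstract above (a correlation-based similarity matrix, then a nearest-neighbor table, then a similarity based on shared neighbors) can be sketched as follows. This is a simplified shared-nearest-neighbor count, not DINNSM itself: the abstract does not give the formula that combines neighbor positions with the number of shared genes, so the position weighting is omitted. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def nearest_neighbor_table(profiles, k):
    """Map each gene to its k most correlated genes, best first."""
    table = {}
    for g, x in profiles.items():
        others = [(pearson(x, y), h) for h, y in profiles.items() if h != g]
        others.sort(key=lambda t: (-t[0], t[1]))
        table[g] = [h for _, h in others[:k]]
    return table


def shared_neighbor_similarity(table, g, h):
    """Number of genes that appear in both neighbor lists."""
    return len(set(table[g]) & set(table[h]))
```

Constant expression profiles have zero variance and would cause a division by zero in `pearson`, so they would need to be filtered out beforehand.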

6.

Stable feature selection utilizing Graph Convolutional Neural Network and Layer-wise Relevance Propagation for biomarker discovery in breast cancer.

Chereda, Hryhorii; Leha, Andreas; Beißbarth, Tim.

Artif Intell Med ; 151: 102840, 2024 May.

Article in English | MEDLINE | ID: mdl-38658129

ABSTRACT

High-throughput technologies are becoming increasingly important in discovering prognostic biomarkers and in identifying novel drug targets. With Mammaprint, Oncotype DX, and many other prognostic molecular signatures, breast cancer is one of the paradigmatic examples of the utility of high-throughput data to deliver prognostic biomarkers that can be represented in the form of a rather short gene list. Such gene lists can be obtained as a set of features (genes) that are important for the decisions of a Machine Learning (ML) method applied to high-dimensional gene expression data. Several studies have identified predictive gene lists for patient prognosis in breast cancer, but these lists are unstable and have only a few genes in common. Instability of feature selection impedes biological interpretability: genes that are relevant for cancer pathology should be members of any predictive gene list obtained for the same clinical type of patients. Stability and interpretability of selected features can be improved by including information on molecular networks in ML methods. Graph Convolutional Neural Network (GCNN) is a contemporary deep learning approach applicable to gene expression data structured by a prior knowledge molecular network. Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanations (SHAP) are methods to explain individual decisions of deep learning models. We used both GCNN+LRP and GCNN+SHAP techniques to construct feature sets by aggregating individual explanations. We suggest a methodology to systematically and quantitatively analyze the stability, the impact on the classification performance, and the interpretability of the selected feature sets. We used this methodology to compare GCNN+LRP to GCNN+SHAP and to more classical ML-based feature selection approaches. Utilizing a large breast cancer gene expression dataset we show that, while feature selection with SHAP is useful in applications where selected features have to be impactful for classification performance, among all studied methods GCNN+LRP delivers the most stable (reproducible) and interpretable gene lists.

Subject(s)

Biomarkers, Tumor , Breast Neoplasms , Neural Networks, Computer , Humans , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Biomarkers, Tumor/genetics , Female , Gene Expression Profiling/methods , Deep Learning , Prognosis , Machine Learning
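
The abstract above notes that published gene lists "have only a few genes in common". One common way to quantify that kind of stability is the mean pairwise Jaccard index over the gene lists selected in repeated runs. The sketch below assumes this measure purely for illustration; the abstract does not state which stability metric the authors use.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene lists: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(gene_lists):
    """Mean pairwise Jaccard index; needs at least two gene lists."""
    pairs = list(combinations(gene_lists, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value of 1.0 means every run selected the same genes, and values near 0 mean the lists barely overlap.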

7.

A reconstructed genome-scale metabolic model of Helicobacter pylori for predicting putative drug targets in clarithromycin and rifampicin resistance conditions.

Mofidifar, Sepideh; Yadegar, Abbas; Karimi-Jafari, Mohammad Hossein.

Helicobacter ; 29(2): e13074, 2024.

Article in English | MEDLINE | ID: mdl-38615332

ABSTRACT

BACKGROUND: Helicobacter pylori is considered a true human pathogen for which rising drug resistance constitutes a drastic concern globally. The present study aimed to reconstruct a genome-scale metabolic model (GSMM) to decipher the metabolic capability of H. pylori strains in response to clarithromycin and rifampicin along with identification of novel drug targets. MATERIALS AND METHODS: The iIT341 model of H. pylori was updated based on genome annotation data, and biochemical knowledge from literature and databases. Context-specific models were generated by integrating the transcriptomic data of clarithromycin and rifampicin resistance into the model. Flux balance analysis was employed for identifying essential genes in each strain, which were further prioritized based on non-homology to human genes, virulence factor analysis, druggability, and broad-spectrum analysis. Additionally, metabolic differences between sensitive and resistant strains were also investigated based on flux variability analysis and pathway enrichment analysis of transcriptomic data. RESULTS: The reconstructed GSMM was named the HpM485 model. Pathway enrichment and flux variability analyses demonstrated reduced activity in the ribosomal pathway in both clarithromycin- and rifampicin-resistant strains. Also, a significant decrease was detected in the activity of metabolic pathways of the clarithromycin-resistant strain. Moreover, 23 and 16 essential genes were exclusively detected in clarithromycin- and rifampicin-resistant strains, respectively. Based on prioritization analysis, cyclopropane fatty acid synthase and phosphoenolpyruvate synthase were identified as putative drug targets in clarithromycin- and rifampicin-resistant strains, respectively. CONCLUSIONS: We present a robust and reliable metabolic model of H. pylori. This model can predict novel drug targets to combat drug resistance and explore the metabolic capability of H. pylori in various conditions.

Subject(s)

Helicobacter Infections , Helicobacter pylori , Humans , Helicobacter pylori/genetics , Clarithromycin/pharmacology , Rifampin/pharmacology , Helicobacter Infections/drug therapy , Databases, Factual

8.

HIHISIV: a database of gene expression in HIV and SIV host immune response.

Costa, Raquel L; Gadelha, Luiz; D'arc, Mirela; Ribeiro-Alves, Marcelo; Robertson, David L; Schwartz, Jean-Marc; Soares, Marcelo A; Porto, Fábio.

BMC Bioinformatics ; 25(1): 125, 2024 Mar 22.

Article in English | MEDLINE | ID: mdl-38519883

ABSTRACT

In the battle of the host against lentiviral pathogenesis, the immune response is crucial. However, several questions remain unanswered about the interaction with different viruses and their influence on disease progression. The simian immunodeficiency virus (SIV) infecting nonhuman primates (NHP) is widely used as a model for the study of the human immunodeficiency virus (HIV) both because they are evolutionarily linked and because they share physiological and anatomical similarities that are largely explored to understand the disease progression. The HIHISIV database was developed to support researchers to integrate and evaluate the large number of transcriptional data associated with the presence/absence of the pathogen (SIV or HIV) and the host response (NHP and human). The datasets are composed of microarray and RNA-Seq gene expression data that were selected, curated, analyzed, enriched, and stored in a relational database. Six query templates comprise the main data analysis functions and the resulting information can be downloaded. The HIHISIV database, available at https://hihisiv.github.io , provides accurate resources for browsing and visualizing results and for more robust analyses of pre-existing data in transcriptome repositories.

Subject(s)

HIV Infections , Simian Acquired Immunodeficiency Syndrome , Simian Immunodeficiency Virus , Animals , Humans , Simian Immunodeficiency Virus/genetics , HIV , Simian Acquired Immunodeficiency Syndrome/genetics , Disease Progression , Immunity , Gene Expression

9.

Systems and computational analysis of gene expression datasets reveals GRB-2 suppression as an acute immunomodulatory response against enteric infections in endemic settings.

Naidu, Akshayata; Lulu S, Sajitha.

Front Immunol ; 15: 1285785, 2024.

Article in English | MEDLINE | ID: mdl-38433833

ABSTRACT

Introduction: Enteric infections are a major cause of under-5 (age) mortality in low/middle-income countries. Although vaccines against these infections have already been licensed, unwavering efforts are required to boost suboptimal efficacy and effectiveness in regions that are highly endemic to enteric pathogens. The role of baseline immunological profiles in influencing vaccine-induced immune responses is increasingly becoming clearer for several vaccines. Hence, for the development of advanced and region-specific enteric vaccines, insights into differences in immune responses to perturbations in endemic and non-endemic settings become crucial. Materials and methods: For this reason, we employed a two-tiered system and computational pipeline (i) to study the variations in differentially expressed genes (DEGs) associated with immune responses to enteric infections in endemic and non-endemic study groups, and (ii) to derive features (genes) of importance that keenly distinguish between these two groups using unsupervised machine learning algorithms on an aggregated gene expression dataset. The derived genes were further curated using topological analysis of the constructed STRING networks. The findings from these two tiers were validated using a multilayer perceptron classifier and were further explored using correlation and regression analysis for the retrieval of associated gene regulatory modules. Results: Our analysis reveals aggressive suppression of GRB-2, an adaptor molecule integral for TCR signaling, as a primary immunomodulatory response against S. typhi infection in endemic settings. Moreover, using retrieved correlation modules and multivariant regression models, we found a positive association between regulators of activated T cells and mediators of Hedgehog signaling in the endemic population, which indicates the initiation of an effector (involving differentiation and homing) rather than an inductive response upon infection. On further exploration, we found STAT3 to be instrumental in designating T-cell functions upon early responses to enteric infections in endemic settings. Conclusion: Overall, through a systems and computational biology approach, we characterized distinct molecular players involved in immune responses to enteric infections in endemic settings, in the process contributing to the mounting evidence of endemicity being a major determiner of pathogen/vaccine-induced immune responses. The gained insights will have important implications in the design and development of region/endemicity-specific vaccines.

Subject(s)

Hedgehog Proteins , Vaccines , Immunomodulation , Immunity , Gene Expression

10.

Benchmarking enrichment analysis methods with the disease pathway network.

Buzzao, Davide; Castresana-Aguirre, Miguel; Guala, Dimitri; Sonnhammer, Erik L L.

Brief Bioinform ; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38436561

ABSTRACT

Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.

Subject(s)

Benchmarking , RNA-Seq
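
For readers unfamiliar with the overlap-based category of enrichment analysis mentioned in the abstract above, a minimal sketch is a one-sided hypergeometric test on the overlap between a pathway and a selected gene list. This is a generic textbook formulation, not the benchmark's own code, and the function and parameter names are assumptions.

```python
from math import comb


def overlap_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_pathway of which are in the pathway.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Under a true null, P-values from a well-calibrated test are roughly uniform. Running a method on randomized gene lists and inspecting the P-value distribution is one way to see the kind of skew the abstract reports.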

11.

SurvConvMixer: robust and interpretable cancer survival prediction based on ConvMixer using pathway-level gene expression images.

Wang, Shuo; Liu, Yuanning; Zhang, Hao; Liu, Zhen.

BMC Bioinformatics ; 25(1): 133, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38539106

ABSTRACT

Cancer is one of the leading causes of death worldwide. Survival analysis and prediction of cancer patients is of great significance for their precision medicine. The robustness and interpretability of the survival prediction models are important, where robustness tells whether a model has learned the knowledge, and interpretability means whether a model can show humans what it has learned. In this paper, we propose a robust and interpretable model, SurvConvMixer, which uses pathway-customized gene expression images and ConvMixer for cancer short-term, mid-term and long-term overall survival prediction. With ConvMixer, the representation of each pathway can be learned separately. We show the robustness of our model by testing the trained model on entirely unseen external datasets. The interpretability of SurvConvMixer depends on gradient-weighted class activation mapping (Grad-CAM), by which we can obtain the pathway-level activation heat map. Then Wilcoxon rank-sum tests are conducted to obtain the statistically significant pathways, thereby revealing which pathways the model focuses on more. SurvConvMixer achieves remarkable performance on the short-term, mid-term and long-term overall survival of lung adenocarcinoma, lung squamous cell carcinoma and skin cutaneous melanoma, and the external validation tests show that SurvConvMixer can generalize to external datasets and is therefore robust. Finally, we investigate the activation maps generated by Grad-CAM; after the Wilcoxon rank-sum test and Kaplan-Meier estimation, we find that some survival-related pathways play an important role in SurvConvMixer.

Subject(s)

Adenocarcinoma of Lung , Lung Neoplasms , Melanoma , Skin Neoplasms , Humans , Gene Expression

12.

A Data-Distribution and Successive Spline Points based discretization approach for evolving gene regulatory networks from scRNA-Seq time-series data using Cartesian Genetic Programming.

da Silva, José Eduardo H; de Carvalho, Patrick C; Camata, José J; de Oliveira, Itamar L; Bernardino, Heder S.

Biosystems ; 236: 105126, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38278505

ABSTRACT

The inference of gene regulatory networks (GRNs) is a widely addressed problem in Systems Biology. GRNs can be modeled as Boolean networks, which is the simplest approach for this task. However, Boolean models need binarized data. Several approaches have been developed for the discretization of gene expression data (GED). Also, the advance of data extraction technologies, such as single-cell RNA-Sequencing (scRNA-Seq), provides a new vision of gene expression and brings new challenges for dealing with its specificities, such as a large occurrence of zero data. This work proposes a new discretization approach for dealing with scRNA-Seq time-series data, named Distribution and Successive Spline Points Discretization (DSSPD), which considers the data distribution and a proper preprocessing step. Here, Cartesian Genetic Programming (CGP) is used to infer GRNs using the results of DSSPD. The proposal is compared with CGP with the standard data handling and five state-of-the-art algorithms on curated models and experimental data. The results show that the proposal improves the results of CGP in all tested cases and outperforms the state-of-the-art algorithms in most cases.

Subject(s)

Gene Regulatory Networks , Single-Cell Gene Expression Analysis , Tosyl Compounds , Gene Regulatory Networks/genetics , Algorithms , Systems Biology , Gene Expression Profiling/methods
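
To make the statement that "Boolean models need binarized data" concrete, the sketch below binarizes an expression time series with a plain median threshold and then scores how well a candidate Boolean rule predicts a target gene one step ahead. This is a generic baseline chosen for illustration, not the DSSPD method, whose details the abstract does not give. The function names are hypothetical.

```python
from statistics import median


def binarize(series):
    """Map each value to 1 if it is above the series median, else 0."""
    threshold = median(series)
    return [1 if v > threshold else 0 for v in series]


def rule_consistency(target, inputs, rule):
    """Fraction of time steps t where rule(inputs at t) == target at t + 1.

    target: binarized series of the regulated gene.
    inputs: list of binarized series of candidate regulators.
    rule:   Boolean function taking one value per regulator.
    """
    steps = len(target) - 1
    hits = sum(
        rule(*(s[t] for s in inputs)) == target[t + 1] for t in range(steps)
    )
    return hits / steps
```

With the many zeros typical of scRNA-Seq data the median can itself be zero, so this baseline can map a gene's low but nonzero values and its high values to the same state. That is the kind of issue a distribution-aware discretization sets out to address.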

13.

Weighted Combination of Lukasiewicz implication and Fuzzy Jaccard similarity in Hybrid Ensemble Framework (WCLFJHEF) for Gene Selection.

Roy, Sukriti; Singh, Joginder; Ray, Shubhra Sankar.

Comput Biol Med ; 170: 107981, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38262204

ABSTRACT

A framework is developed for gene expression analysis by introducing fuzzy Jaccard similarity (FJS) and combining Lukasiewicz implication with it through weights in hybrid ensemble framework (WCLFJHEF) for gene selection in cancer. The method is called weighted combination of Lukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). While the fuzziness in Jaccard similarity is incorporated by using the existing Gödel fuzzy logic, the weights are obtained by maximizing the average F-score of selected genes in classifying the cancer patients. The patients are first divided into different clusters, based on the number of patient groups, using average linkage agglomerative clustering and a new score, called WCLFJ (weighted combination of Lukasiewicz implication and fuzzy Jaccard similarity). The genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created by considering the union of selected features for all the clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified using Naïve Bayes (NB) and Support Vector Machine (SVM) separately, using the selected genes, and the related F-scores are calculated. The weights in WCLFJ are then updated iteratively to maximize the average F-score obtained from the results of the classifier. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average values of accuracy, F-score, recall, precision and MCC over all the datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values and this information is further used to rank them. The relevance of the selected gene set is biologically validated using the KEGG Pathway, Gene Ontology (GO), and existing literature. It is seen that the genes that are selected by WCLFJHEF are candidates for genomic alterations in the various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.

Subject(s)

Gene Expression Profiling , Neoplasms , Humans , Bayes Theorem , Gene Expression Profiling/methods , Algorithms , Neoplasms/metabolism , Software
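
The two building blocks named in the abstract above have standard textbook definitions that can be sketched directly: a fuzzy Jaccard similarity using the Gödel operators (minimum for intersection, maximum for union) and the Lukasiewicz implication min(1, 1 - x + y). How the paper combines them is not stated in the abstract, so the `weighted_score` function below, including the averaging of the implication and the single weight `w`, is a hypothetical illustration. Inputs are assumed to be membership values already scaled to [0, 1].

```python
def fuzzy_jaccard(a, b):
    """Sum of element-wise minima over sum of element-wise maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0


def lukasiewicz_implication(x, y):
    """Lukasiewicz implication x -> y for truth values in [0, 1]."""
    return min(1.0, 1.0 - x + y)


def weighted_score(a, b, w):
    """Hypothetical combination: w * mean implication + (1 - w) * fuzzy Jaccard."""
    impl = sum(lukasiewicz_implication(x, y) for x, y in zip(a, b)) / len(a)
    return w * impl + (1 - w) * fuzzy_jaccard(a, b)
```

Note that the implication is not symmetric in its arguments, whereas the fuzzy Jaccard term is. A weight tuned against a downstream F-score, as the abstract describes, would balance the two.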

14.

SurvIAE: Survival prediction with Interpretable Autoencoders from Diffuse Large B-Cells Lymphoma gene expression data.

Zaccaria, Gian Maria; Altini, Nicola; Mezzolla, Giuseppe; Vegliante, Maria Carmela; Stranieri, Marianna; Pappagallo, Susanna Anita; Ciavarella, Sabino; Guarini, Attilio; Bevilacqua, Vitoantonio.

Comput Methods Programs Biomed ; 244: 107966, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38091844

ABSTRACT

BACKGROUND: In Diffuse Large B-Cell Lymphoma (DLBCL), several methodologies are emerging to derive novel biomarkers to be incorporated in the risk assessment. We realized a pipeline that relies on autoencoders (AE) and Explainable Artificial Intelligence (XAI) to stratify prognosis and derive a gene-based signature. METHODS: AE was exploited to learn an unsupervised representation of the gene expression (GE) from three publicly available datasets, each with its own technology. Multi-layer perceptron (MLP) was used to classify prognosis from latent representation. GE data were preprocessed as normalized, scaled, and standardized. Four different AE architectures (Large, Medium, Small and Extra Small) were compared to find the most suitable for GE data. The joint AE-MLP classified patients on six different outcomes: overall survival at 12, 36, 60 months and progression-free survival (PFS) at 12, 36, 60 months. XAI techniques were used to derive a gene-based signature aimed at refining the Revised International Prognostic Index (R-IPI) risk, which was validated in a fourth independent publicly available dataset. We named our tool SurvIAE: Survival prediction with Interpretable AE. RESULTS: From the latent space of AEs, we observed that scaled and standardized data reduced the batch effect. SurvIAE models outperformed R-IPI with Matthews Correlation Coefficient up to 0.42 vs. 0.18 for the validation-set (PFS36) and to 0.30 vs. 0.19 for the test-set (PFS60). We selected the SurvIAE-Small-PFS36 as the best model and, from its gene signature, we stratified patients in three risk groups: R-IPI Poor patients with High levels of GAB1, R-IPI Poor patients with Low levels of GAB1 or R-IPI Good/Very Good patients with Low levels of GPR132, and R-IPI Good/Very Good patients with High levels of GPR132. CONCLUSIONS: SurvIAE showed the potential to derive a gene signature with translational purpose in DLBCL. The pipeline was made publicly available and can be reused for other pathologies.

Subject(s)

Artificial Intelligence , Lymphoma, Large B-Cell, Diffuse , Humans , Antineoplastic Combined Chemotherapy Protocols , Lymphoma, Large B-Cell, Diffuse/genetics , Lymphoma, Large B-Cell, Diffuse/drug therapy , Prognosis , Gene Expression , Retrospective Studies
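
The Matthews Correlation Coefficient used above to compare SurvIAE with R-IPI is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition. Returning 0.0 when a marginal total is zero is a common convention assumed here, not something taken from the paper.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which puts the reported values of 0.42 and 0.18 in context.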

15.

Deciphering temporal gene expression dynamics in multiple coral species exposed to heat stress: Implications for predicting resilience.

Han, Tingyu; Liao, Xin; Guo, Zhuojun; Chen, J-Y; He, Chunpeng; Lu, Zuhong.

Sci Total Environ ; 912: 169021, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38061659

ABSTRACT

Coral reefs are facing unprecedented threats due to global climate change, particularly elevated sea surface temperatures causing coral bleaching. Understanding coral responses at the molecular level is crucial for predicting their resilience and developing effective conservation strategies. In this study, we conducted a comprehensive gene expression analysis of four coral species to investigate their long-term molecular response to heat stress. We identified distinct gene expression patterns among the coral species, with laminar corals exhibiting a stronger response compared to branching corals. Heat shock proteins (HSPs) showed an overall decreasing expression trend, indicating the high energy cost associated with sustaining elevated HSP levels during prolonged heat stress. Peroxidases and oxidoreductases involved in oxidative stress response demonstrated significant upregulation, highlighting their role in maintaining cellular redox balance. Differential expression of genes related to calcium homeostasis and bioluminescence suggested distinct mechanisms for coping with heat stress among the coral species. Furthermore, the impact of heat stress on coral biomineralization varied, with downregulation of carbonic anhydrase and skeletal organic matrix proteins indicating reduced capacity for biomineralization in the later stages of heat stress. Our findings provide insights into the molecular mechanisms underlying coral responses to heat stress and highlight the importance of considering species-specific responses in assessing coral resilience. The identified biomarkers may serve as indicators of heat stress and contribute to early detection of coral bleaching events. These findings contribute to our understanding of coral resilience and provide a basis for future research aimed at enhancing coral survival in the face of climate change.

Subject(s)

Anthozoa , Resilience, Psychological , Animals , Anthozoa/physiology , Heat-Shock Response , Coral Reefs , Gene Expression

16.

Review on application of artificial intelligence in tumor gene expression data analysis / 中国医学物理学杂志

Kunpeng LI; Zepeng WANG; Yu ZHOU; Sihai LI.

Chinese Journal of Medical Physics ; (6): 389-396, 2024.

Article in Chinese | WPRIM (Western Pacific) | ID: wpr-1026238

ABSTRACT

Tumors are serious diseases threatening human health, and early diagnosis is essential to improve treatment success and patient survival. The study of tumor gene expression data has become a major tool for revealing tumor disease mechanisms, in which artificial intelligence plays an important role. The potential advantages of supervised learning, unsupervised learning and deep learning in tumor prediction and classification are explored from the perspective of machine learning methods. Special attention is paid to the impact of feature selection algorithms on gene screening and their importance in high-dimensional gene expression data. By providing a comprehensive overview of the application and development of artificial intelligence in the analysis of tumor gene expression data, the study aims to provide an outlook for future research directions and promote further development.

17.

GeneCompete: an integrative tool of a novel union algorithm with various ranking techniques for multiple gene expression data.

Janyasupab, Panisa; Suratanee, Apichat; Plaimas, Kitiporn.

PeerJ Comput Sci ; 9: e1686, 2023.

Article in English | MEDLINE | ID: mdl-38077583

ABSTRACT

Background: Identifying the genes responsible for diseases requires precise prioritization of significant genes. Gene expression analysis enables differentiation between gene expression in disease and normal samples. Increasing the number of high-quality samples enhances the strength of evidence regarding gene involvement in diseases. This process has led to the discovery of disease biomarkers through the collection of diverse gene expression data. Methods: This study presents GeneCompete, a web-based tool that integrates gene expression data from multiple platforms and experiments to identify the most promising biomarkers. GeneCompete incorporates a novel union strategy and eight well-established ranking methods, including Win-Loss, Massey, Colley, Keener, Elo, Markov, PageRank, and Bi-directional PageRank algorithms, to prioritize genes across multiple gene expression datasets. Each gene in the competition is assigned a score based on log-fold change values, and significant genes are determined as winners. Results: We tested the tool on expression datasets of hypertrophic cardiomyopathy (HCM) and on datasets from the Microarray Quality Control (MAQC) project, which include both microarray and RNA-Sequencing techniques. The results demonstrate that all ranking scores have more power to predict new occurrence datasets than the classical method. Moreover, the PageRank method with a union strategy delivers the best performance for both up-regulated and down-regulated genes. Furthermore, the top-ranking genes exhibit a strong association with the disease. For MAQC, the two-sided ranking score shows a strong relationship with the TaqMan validation set at all log-fold change thresholds. Conclusion: GeneCompete is a powerful web-based tool that revolutionizes the identification of disease-causing genes through the integration of gene expression data from multiple platforms and experiments.
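To make the idea of a gene "competition" scored on log-fold change values more concrete, the following is a minimal Python sketch of a Win-Loss style ranking. It is not GeneCompete's implementation: the function name `win_loss_ranking`, the pairwise comparison rule, and the tie-breaking by gene name are assumptions made for illustration, and keeping every gene seen in any dataset is only one plausible reading of the "union strategy" named in the abstract.

```python
from itertools import combinations


def win_loss_ranking(datasets):
    """Rank genes by (wins - losses) over pairwise log-fold-change comparisons.

    `datasets` is a list of dicts mapping gene name -> log-fold change.
    A gene "wins" against another gene from the same dataset when its
    log-fold change is larger. Genes missing from a dataset simply play
    no games there (hypothetical union-style handling).
    """
    score = {}
    for logfc in datasets:
        # Register every gene so that genes seen in any dataset are kept.
        for gene in logfc:
            score.setdefault(gene, 0)
        # Play one "game" per pair of genes measured in this dataset.
        for a, b in combinations(logfc, 2):
            if logfc[a] > logfc[b]:
                score[a] += 1
                score[b] -= 1
            elif logfc[b] > logfc[a]:
                score[b] += 1
                score[a] -= 1
    # Highest score first; ties broken alphabetically for determinism.
    ranking = sorted(score, key=lambda g: (-score[g], g))
    return ranking, score
```

With two toy datasets, `{"A": 2.0, "B": 1.0, "C": -0.5}` and `{"A": 1.5, "C": 0.2, "D": 3.0}`, genes A and D tie at the top, followed by B and then C. Ranking down-regulated genes would need the comparison reversed, which this sketch does not do.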

18.

Mdwgan-gp: data augmentation for gene expression data based on multiple discriminator WGAN-GP.

Li, Rongyuan; Wu, Jingli; Li, Gaoshi; Liu, Jiafei; Xuan, Junbo; Zhu, Qi.

BMC Bioinformatics ; 24(1): 427, 2023 Nov 13.

Article in English | MEDLINE | ID: mdl-37957576

ABSTRACT

BACKGROUND: Although gene expression data play significant roles in biological and medical studies, their applications are hampered by the difficulty and high expense of gathering them through biological experiments. Generating high-quality gene expression data with computational methods is an urgent problem. WGAN-GP, a generative adversarial network-based method, has been successfully applied in augmenting gene expression data. However, mode collapse or over-fitting may take place for small training samples because only one discriminator is adopted in the method. RESULTS: In this study, an improved data augmentation approach MDWGAN-GP, a generative adversarial network model with multiple discriminators, is proposed. In addition, a novel method is devised for enriching training samples based on a linear graph convolutional network. Extensive experiments were conducted on real biological data. CONCLUSIONS: The experimental results demonstrate that, compared with other state-of-the-art methods, the MDWGAN-GP method can produce higher-quality generated gene expression data in most cases.
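For readers unfamiliar with WGAN-GP, the single-discriminator (critic) loss that the abstract refers to is usually written as below. This is the standard WGAN-GP formulation, not a formula taken from this article; the abstract does not say how MDWGAN-GP combines its multiple discriminators, so that part is left out. The symbols are the conventional ones: $P_r$ is the real data distribution, $P_g$ the generator distribution, $D$ the critic, and $\lambda$ the gradient penalty weight.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
    - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalizes critic gradients whose norm departs from 1 at points interpolated between real and generated samples.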

Subject(s)

Data Accuracy , Gene Expression

19.

A New Filter Approach Based on Effective Ranges for Classification of Gene Expression Data.

Turfan, Derya; Altunkaynak, Bulent; Yeniay, Özgür.

Big Data ; 2023 Sep 04.

Article in English | MEDLINE | ID: mdl-37668992

ABSTRACT

Over the years, many studies have been carried out to reduce and eliminate the effects of diseases on human health. Gene expression data sets play a critical role in diagnosing and treating diseases. These data sets consist of thousands of genes and a small number of samples. This situation creates the curse of dimensionality, which makes such data sets difficult to analyze. One of the most effective strategies for solving this problem is feature selection. Feature selection is a preprocessing step that improves classification accuracy by selecting the most relevant and informative features. In this article, we propose a new statistically based filter method for feature selection named Effective Range-based Feature Selection Algorithm (FSAER). As an extension of the previous Effective Range based Gene Selection (ERGS) and Improved Feature Selection based on Effective Range (IFSER) algorithms, our novel method includes the advantages of both methods while taking into account the disjoint area. To illustrate the efficacy of the proposed algorithm, experiments were conducted on six benchmark gene expression data sets. The results of FSAER and the other filter methods were compared in terms of classification accuracy to demonstrate the effectiveness of the proposed method. Support vector machines, the naive Bayes classifier, and the k-nearest neighbor algorithm were used as classifiers.
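The general idea behind an effective-range filter can be sketched in Python as follows. This is a simplified illustration, not FSAER, ERGS, or IFSER as published: the range definition (class mean plus or minus `gamma` standard deviations), the overlap score, and the names `overlap_score` and `rank_features` are all assumptions, and neither class-prior weighting nor the "disjoint area" mentioned in the abstract is modeled here.

```python
from statistics import mean, pstdev


def overlap_score(values_by_class, gamma=1.0):
    """Overlap of per-class ranges for one feature, normalized by total span.

    Each class gets the range [mean - gamma*std, mean + gamma*std]
    (a hypothetical stand-in for an "effective range"). Lower is better:
    0.0 means the class ranges do not overlap at all.
    """
    ranges = []
    for vals in values_by_class.values():
        m, s = mean(vals), pstdev(vals)
        ranges.append((m - gamma * s, m + gamma * s))
    overlap = 0.0
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            lo = max(ranges[i][0], ranges[j][0])
            hi = min(ranges[i][1], ranges[j][1])
            overlap += max(0.0, hi - lo)
    span = max(r[1] for r in ranges) - min(r[0] for r in ranges)
    # A zero span means the feature is constant, so treat it as uninformative.
    return overlap / span if span > 0 else 1.0


def rank_features(X, y, gamma=1.0):
    """Return feature indices sorted from least to most class overlap."""
    scores = []
    for f in range(len(X[0])):
        by_class = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row[f])
        scores.append((overlap_score(by_class, gamma), f))
    return [f for _, f in sorted(scores)]
```

A filter of this kind scores each gene independently of any classifier, which is why the selected genes can then be passed to different classifiers such as those named in the abstract.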

20.

GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning.

Ersoz, Nur Sebnem; Bakir-Gungor, Burcu; Yousef, Malik.

Front Genet ; 14: 1139082, 2023.

Article in English | MEDLINE | ID: mdl-37671046

ABSTRACT

Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product. Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype. Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model. 
Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
