Weighted Combination of Lukasiewicz implication and Fuzzy Jaccard similarity in Hybrid Ensemble Framework (WCLFJHEF) for Gene Selection.
Roy, Sukriti; Singh, Joginder; Ray, Shubhra Sankar.
Affiliation
  • Roy S; Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India. Electronic address: research.sr22@gmail.com.
  • Singh J; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India. Electronic address: joginder265@gmail.com.
  • Ray SS; Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India. Electronic address: shubhra@isical.ac.in.
Comput Biol Med ; 170: 107981, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38262204
ABSTRACT
A framework is developed for gene expression analysis by introducing fuzzy Jaccard similarity (FJS) and combining it with Lukasiewicz implication through weights in a hybrid ensemble framework for gene selection in cancer. The method is called weighted combination of Lukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). While the fuzziness in Jaccard similarity is incorporated using the existing Gödel fuzzy logic, the weights are obtained by maximizing the average F-score of the selected genes in classifying cancer patients. The patients are first divided into different clusters, based on the number of patient groups, using average linkage agglomerative clustering and a new score called WCLFJ (weighted combination of Lukasiewicz implication and fuzzy Jaccard similarity). Genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created by taking the union of the features selected for all the clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified using Naïve Bayes (NB) and Support Vector Machine (SVM) separately with the selected genes, and the related F-scores are calculated. The weights in WCLFJ are then updated iteratively to maximize the average F-score obtained from the results of the classifiers. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average values of accuracy, F-score, recall, precision and MCC over all the datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values, and this information is further used to rank them.
The relevance of the selected gene set is biologically validated using the KEGG Pathway, Gene Ontology (GO), and existing literature. It is seen that the genes selected by WCLFJHEF are candidates for genomic alterations in various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.
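The abstract does not give the formula for the WCLFJ score, so the following is only a minimal sketch of how such a score between two patients' expression profiles might be computed. It assumes expression values are already scaled to [0, 1]. The Lukasiewicz implication min(1, 1 - a + b) and the Gödel operators (min for intersection, max for union) are standard definitions. The two-sided averaging in `lukasiewicz_similarity`, the single weight `w`, and all function names are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def lukasiewicz_implication(a: float, b: float) -> float:
    """Standard Lukasiewicz implication I(a, b) = min(1, 1 - a + b) on [0, 1]."""
    return min(1.0, 1.0 - a + b)


def lukasiewicz_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """ASSUMPTION: mean over genes of the two-sided implication
    min(I(a, b), I(b, a)); the paper may aggregate differently."""
    total = sum(
        min(lukasiewicz_implication(a, b), lukasiewicz_implication(b, a))
        for a, b in zip(x, y)
    )
    return total / len(x)


def fuzzy_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Fuzzy Jaccard similarity with Goedel operators:
    sum of element-wise minima divided by sum of element-wise maxima."""
    union = sum(max(a, b) for a, b in zip(x, y))
    if union == 0.0:
        return 1.0  # ASSUMPTION: two all-zero profiles count as identical
    return sum(min(a, b) for a, b in zip(x, y)) / union


def wclfj(x: Sequence[float], y: Sequence[float], w: float) -> float:
    """ASSUMPTION: convex combination with one weight w in [0, 1]."""
    return w * lukasiewicz_similarity(x, y) + (1.0 - w) * fuzzy_jaccard(x, y)
```

In the framework described above, a score of this kind would serve as the similarity for average linkage clustering of patients, and the weight would be tuned iteratively against the average F-score of the downstream classifiers. That outer loop is not shown here.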

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Gene Expression Profiling / Neoplasms Limit: Humans Language: En Journal: Comput Biol Med Year: 2024 Document type: Article
