Search | BVS - Ministry of Health

Robustness evaluations of pathway activity inference methods on gene expression data.

Hui, Tay Xin; Kasim, Shahreen; Aziz, Izzatdin Abdul; Fudzee, Mohd Farhan Md; Haron, Nazleeni Samiha; Sutikno, Tole; Hassan, Rohayanti; Mahdin, Hairulnizam; Sen, Seah Choon.

BMC Bioinformatics ; 25(1): 23, 2024 Jan 12.

Article in English | MEDLINE | ID: mdl-38216898

ABSTRACT

BACKGROUND: With the exponential growth of high-throughput technologies, multiple pathway analysis methods have been proposed to estimate pathway activities from gene expression profiles. These pathway activity inference methods can be divided into two main categories: non-Topology-Based (non-TB) and Pathway Topology-Based (PTB) methods. Although some review and survey articles discussed the topic from different aspects, there is a lack of systematic assessment and comparisons on the robustness of these approaches. RESULTS: Thus, this study presents comprehensive robustness evaluations of seven widely used pathway activity inference methods using six cancer datasets based on two assessments. The first assessment seeks to investigate the robustness of pathway activity in pathway activity inference methods, while the second assessment aims to assess the robustness of risk-active pathways and genes predicted by these methods. The mean reproducibility power and total number of identified informative pathways and genes were evaluated. Based on the first assessment, the mean reproducibility power of pathway activity inference methods generally decreased as the number of pathway selections increased. Entropy-based Directed Random Walk (e-DRW) distinctly outperformed other methods in exhibiting the greatest reproducibility power across all cancer datasets. On the other hand, the second assessment shows that no methods provide satisfactory results across datasets. CONCLUSION: However, PTB methods generally appear to perform better in producing greater reproducibility power and identifying potential cancer markers compared to non-TB methods.

Subjects

Neoplasms; Humans; Reproducibility of Results; Neoplasms/genetics; Entropy; Gene Expression

An Entropy-Based Directed Random Walk for Cancer Classification Using Gene Expression Data Based on Bi-Random Walk on Two Separated Networks.

Tay, Xin Hui; Kasim, Shahreen; Sutikno, Tole; Fudzee, Mohd Farhan Md; Hassan, Rohayanti; Patah Akhir, Emelia Akashah; Aziz, Norshakirah; Seah, Choon Sen.

Genes (Basel) ; 14(3), 2023 Feb 24.

Article in English | MEDLINE | ID: mdl-36980844

ABSTRACT

The integration of microarray technologies and machine learning methods has become popular in predicting the pathological condition of diseases and discovering risk genes. Traditional microarray analysis considers pathways as a simple gene set, treating all genes in the pathway identically while ignoring the pathway network's structure information. This study proposed an entropy-based directed random walk (e-DRW) method to infer pathway activities. Two enhancements over the conventional DRW were made: (1) increasing the coverage of human pathway information by constructing two input networks for pathway activity inference, and (2) enhancing the gene-weighting method in DRW by incorporating correlation coefficient values and t-test statistic scores. To test the objectives, gene expression datasets were used as input datasets, while the pathway datasets were used as reference datasets to build two directed graphs. The within-dataset experiments indicated that the e-DRW method demonstrated superior performance in terms of classification accuracy and robustness of the predicted risk-active pathways compared to the other methods. In conclusion, the results revealed that e-DRW not only improved the prediction performance, but also effectively extracted topologically important pathways and genes that were specifically related to the corresponding cancer types.
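To make the random-walk idea in this abstract concrete, the following is a minimal sketch of a generic random walk with restart on a directed gene graph. It is not the e-DRW method itself: the abstract does not give the update rule, so the iteration `W <- (1 - r) * M^T W + r * W0`, the restart value `0.7`, and the function name `random_walk_with_restart` are all assumptions chosen for illustration.

```python
def random_walk_with_restart(edges, initial_weights, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a directed graph (illustrative only).

    edges: list of (source, target) directed edges between genes.
    initial_weights: dict gene -> non-negative starting weight (W0),
        e.g. a score derived from a t-test (assumed, not from the abstract).
    Returns a dict gene -> topological weight after convergence.
    """
    nodes = sorted(set(initial_weights) | {n for edge in edges for n in edge})
    out_edges = {n: [] for n in nodes}
    for src, dst in edges:
        out_edges[src].append(dst)

    total = sum(initial_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("initial weights must sum to a positive value")
    # Normalise W0 so that it is a probability distribution over genes.
    w0 = {n: initial_weights.get(n, 0.0) / total for n in nodes}
    w = dict(w0)

    for _ in range(max_iter):
        # Restart term: jump back to the initial distribution with prob. r.
        nxt = {n: restart * w0[n] for n in nodes}
        for src in nodes:
            targets = out_edges[src]
            if not targets:
                continue  # dangling node: its non-restart mass is dropped
            share = (1.0 - restart) * w[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        change = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if change < tol:
            break
    return w


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("B", "C")]  # hypothetical three-gene chain
    print(random_walk_with_restart(toy_edges, {"A": 1.0, "B": 0.0, "C": 0.0}))
```

In this toy chain the weight decays along the direction of the edges, which is the intuition behind using the walk to rank genes by their position in a pathway network.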

Subjects

Neoplasms; Humans; Entropy; Neoplasms/genetics; Neoplasms/metabolism; Genetic Techniques; Gene Expression

Topologically significant directed random walk with applied walker network in cancer environment.

Seah, Choon Sen; Kasim, Shahreen; Saedudin, Rd Rohmat; Md Fudzee, Mohd Farhan; Mohamad, Mohd Saberi; Hassan, Rohayanti; Ismail, Mohd Arfian.

Pak J Pharm Sci ; 32(3 Special): 1395-1408, 2019 May.

Article in English | MEDLINE | ID: mdl-31551221

ABSTRACT

Numerous cancer studies have combined different datasets for the prognosis of patients. This study incorporated four networks for significant directed random walk (sDRW) to predict cancerous genes and risk pathways. The study investigated the feasibility of cancer prediction via different networks. In this study, multiple microarray datasets were analysed and used in the experiment. Six gene expression datasets were applied in four networks to study the effectiveness of the networks in sDRW in terms of cancer prediction. The experimental results showed that one of the proposed networks outperformed the other networks. This network is then proposed to be implemented in sDRW as a walker network. This study provides a foundation for further studies and research on other networks. We hope these findings will improve the prognostic methods for cancer patients.

Subjects

Computational Biology/methods; Gene Expression Regulation, Neoplastic; Neoplasms/genetics; Algorithms; Biomarkers, Tumor/genetics; Databases, Genetic; Humans; Microarray Analysis; Protein Interaction Maps/genetics; Random Allocation; Reproducibility of Results; Transcriptome

An enhanced topologically significant directed random walk in cancer classification using gene expression datasets.

Seah, Choon Sen; Kasim, Shahreen; Fudzee, Mohd Farhan Md; Law Tze Ping, Jeffrey Mark; Mohamad, Mohd Saberi; Saedudin, Rd Rohmat; Ismail, Mohd Arfian.

Saudi J Biol Sci ; 24(8): 1828-1841, 2017 Dec.

Article in English | MEDLINE | ID: mdl-29551932

ABSTRACT

Microarray technology has become one of the elementary tools for researchers to study the genome of organisms. As the complexity and heterogeneity of cancer are increasingly appreciated through genomic analysis, cancer classification is an important emerging trend. Significant directed random walk is proposed as a cancer classification approach with higher sensitivity of risk gene prediction and higher accuracy of cancer classification. In this paper, the methodology and materials used for the experiment are presented. A tuning parameter selection method and weight as a parameter are applied in the proposed approach. Gene expression datasets are used as the input datasets, while a pathway dataset is used as the reference dataset to build a directed graph that completes the bias process in the random walk approach. In addition, we demonstrate that our approach can improve sensitive predictions with higher accuracy and biologically meaningful classification results. Significant directed random walk is compared with directed random walk to show the improvement in terms of sensitivity of prediction and accuracy of cancer classification.

Identification of informative genes and pathways using an improved penalized support vector machine with a weighting scheme.

Chan, Weng Howe; Mohamad, Mohd Saberi; Deris, Safaai; Zaki, Nazar; Kasim, Shahreen; Omatu, Sigeru; Corchado, Juan Manuel; Al Ashwal, Hany.

Comput Biol Med ; 77: 102-15, 2016 Oct 01.

Article in English | MEDLINE | ID: mdl-27522238

ABSTRACT

Incorporation of pathway knowledge into microarray analysis has brought better biological interpretation of the analysis outcome. However, most pathway data are manually curated without specific biological context. Non-informative genes could be included when the pathway data are used for analysis of context-specific data such as cancer microarray data. Therefore, efficient identification of informative genes is essential. Embedded methods such as penalized classifiers have been used for microarray analysis due to their embedded gene selection. This paper proposes an improved penalized support vector machine with an absolute t-test weighting scheme to identify informative genes and pathways. Experiments are done on four microarray data sets. The results are compared with previous methods using 10-fold cross-validation in terms of accuracy, sensitivity, specificity and F-score. Our method shows consistent improvement over the previous methods, and biological validation has been done to elucidate the relation of the selected genes and pathways with the phenotype under study.
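As an illustration of what an absolute t-test weight per gene can look like, the sketch below scores each gene by the absolute value of a two-sample t-statistic between two phenotype classes. The abstract does not specify the variant used, so the unequal-variance (Welch) form, the function name `abs_t_weights`, and the input layout are assumptions; how the weights enter the penalized support vector machine is not shown.

```python
from math import sqrt
from statistics import mean, variance


def abs_t_weights(expr_a, expr_b):
    """Return |t| per gene for two classes of samples (illustrative only).

    expr_a, expr_b: dict gene -> list of expression values, one list per
        class, each with at least two samples (needed for the variance).
    Uses the Welch (unequal-variance) t-statistic; this choice is an
    assumption, not taken from the abstract.
    """
    weights = {}
    for gene in expr_a:
        a, b = expr_a[gene], expr_b[gene]
        # Standard error of the difference between the two class means.
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        weights[gene] = abs(mean(a) - mean(b)) / se if se > 0 else 0.0
    return weights


if __name__ == "__main__":
    # Hypothetical toy data: g1 differs between classes, g2 does not.
    class_a = {"g1": [1.0, 2.0, 3.0], "g2": [1.0, 2.0, 3.0]}
    class_b = {"g1": [4.0, 5.0, 6.0], "g2": [1.0, 2.0, 3.0]}
    print(abs_t_weights(class_a, class_b))
```

Genes with a larger weight separate the two classes more clearly, so a weighting scheme of this kind lets a classifier penalize non-informative genes more heavily.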

Subjects

Computational Biology/methods; Gene Regulatory Networks/genetics; Support Vector Machine; Transcriptome/genetics; Animals; Apoptosis/genetics; Cell Cycle/genetics; Gene Expression Profiling; Humans; Mice; Microarray Analysis; Neoplasms/genetics; Neoplasms/metabolism

Multi-stage filtering for improving confidence level and determining dominant clusters in clustering algorithms of gene expression data.

Kasim, Shahreen; Deris, Safaai; Othman, Razib M.

Comput Biol Med ; 43(9): 1120-33, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23930805

ABSTRACT

A drastic improvement in the analysis of gene expression has led to new discoveries in bioinformatics research. In order to analyse the gene expression data, fuzzy clustering algorithms are widely used. However, the resulting analyses from these specific types of algorithms may lead to confusion in hypotheses with regard to the suggestion of dominant function for genes of interest. Besides that, the current fuzzy clustering algorithms do not conduct a thorough analysis of genes with low membership values. Therefore, we present a novel computational framework called the "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA) for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), achieving dominant cluster (msf-CluFA-1), improving confidence level (msf-CluFA-2) and combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and Apriori algorithms in msf-CluFA-2, our new framework is capable of determining the dominant clusters and improving the confidence level of genes with lower membership values, by means of which the unknown genes can be predicted.
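The membership values mentioned in this abstract come from fuzzy c-means clustering. The sketch below computes them for fixed cluster centers using the standard fuzzy c-means membership formula, where the membership of point j in cluster i is 1 / sum over k of (d_ij / d_kj)^(2/(m-1)). It covers only that one step, not the msf-CluFA framework; the function name `fcm_memberships` and the fuzzifier default `m=2.0` are illustrative assumptions.

```python
from math import dist


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership matrix for fixed centers (illustrative only).

    points, centers: sequences of equal-length coordinate tuples.
    m: fuzzifier (> 1); 2.0 is a common default, assumed here.
    Returns one row per point; each row sums to 1 across clusters.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if any(x == 0.0 for x in d):
            # The point sits exactly on one or more centers: split full
            # membership among those centers and give the rest zero.
            hits = [x == 0.0 for x in d]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((di / dk) ** exponent for dk in d) for di in d]
        memberships.append(row)
    return memberships


if __name__ == "__main__":
    # Hypothetical 2-D expression profiles and two cluster centers.
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    ctrs = [(0.0, 0.0), (4.0, 0.0)]
    for point, row in zip(pts, fcm_memberships(pts, ctrs)):
        print(point, [round(u, 3) for u in row])
```

A point midway between two centers receives a membership of 0.5 in each cluster, which is the kind of low, ambiguous value the abstract says needs further analysis.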

Subjects

Algorithms; Gene Expression Profiling/methods; Gene Expression Regulation, Fungal/physiology; Genes, Fungal/physiology; Saccharomyces cerevisiae/metabolism; Software

Utilizing shared interacting domain patterns and Gene Ontology information to improve protein-protein interaction prediction.

Roslan, Rosfuzah; Othman, Razib M; Shah, Zuraini A; Kasim, Shahreen; Asmuni, Hishammuddin; Taliba, Jumail; Hassan, Rohayanti; Zakaria, Zalmiyah.

Comput Biol Med ; 40(6): 555-64, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20417930

ABSTRACT

Protein-protein interactions (PPIs) play a significant role in many crucial cellular operations such as metabolism, signaling and regulation. The computational methods for predicting PPIs have shown tremendous growth in recent years, but problems such as high false positive rates have contributed to the lack of solid PPI information. We aimed at enhancing the overlap between computational predictions and experimental results in an effort to partially remove falsely predicted PPIs. A protein function predictor named PFP(), which is based on shared interacting domain patterns, is introduced in this study with the purpose of aiding the Gene Ontology Annotations (GOA). We used GOA and PFP() as agents in a filtering process to reduce false positive pairs in the computationally predicted PPI datasets. The functions predicted by PFP() were extracted from cross-species PPI data in order to assign novel functional annotations to the uncharacterized proteins and also additional functions to those that are already characterized by the GO (Gene Ontology). The implementation of PFP() increased the chances of finding a matching function annotation for the first rule in the filtration process by as much as 20%. To assess the capability of the proposed framework in filtering false PPIs, we applied it to the available S. cerevisiae PPIs and measured the performance in two aspects: the improvement made, indicated as Signal-to-Noise Ratio (SNR), and the strength of improvement. The proposed filtering framework achieved significantly better performance in both metrics than predictions without filtering.
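The filtering idea can be sketched as keeping only predicted pairs whose two proteins share at least one functional annotation. This is a simplified reading of the "matching function annotation" rule mentioned in the abstract, not the published framework: the function name `filter_ppis`, the data layout, and the single shared-term rule are assumptions, and PFP() itself is not reproduced here.

```python
def filter_ppis(pairs, annotations):
    """Keep predicted PPIs whose partners share an annotation (illustrative).

    pairs: iterable of (protein_a, protein_b) predicted interactions.
    annotations: dict protein -> set of function terms, e.g. terms taken
        from GOA and extended by a function predictor (assumed layout).
    Proteins with no annotation never match, so their pairs are dropped.
    """
    kept = []
    for a, b in pairs:
        shared = annotations.get(a, set()) & annotations.get(b, set())
        if shared:
            kept.append((a, b))
    return kept


if __name__ == "__main__":
    # Hypothetical proteins and placeholder terms, for demonstration only.
    predicted = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
    terms = {"P1": {"T1", "T2"}, "P2": {"T2"}, "P3": {"T3"}}
    print(filter_ppis(predicted, terms))
```

Adding predicted functions to the annotation sets gives more proteins a chance to match, which is consistent with the abstract's report that PFP() raised the match rate for the first filtering rule.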

Subjects

Computational Biology/methods; Models, Statistical; Protein Interaction Domains and Motifs; Protein Interaction Mapping/methods; Proteins/chemistry; Proteins/physiology; Algorithms; Animals; Caenorhabditis elegans Proteins; Cluster Analysis; Databases, Genetic; Drosophila Proteins; Humans; Saccharomyces cerevisiae Proteins; Terminology as Topic
