Search | VHL Infectious and Parasitic Diseases

MOGAT: A Multi-Omics Integration Framework Using Graph Attention Networks for Cancer Subtype Prediction.

Tanvir, Raihanul Bari; Islam, Md Mezbahul; Sobhan, Masrur; Luo, Dongsheng; Mondal, Ananda Mohan.

Int J Mol Sci ; 25(5)2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38474033

ABSTRACT

Accurate cancer subtype prediction is crucial for personalized medicine. Integrating multi-omics data is a viable approach to understanding the intricate pathophysiology of complex diseases such as cancer. Conventional machine learning techniques are not ideal for analyzing the complex interrelationships among different categories of omics data. Numerous graph-based learning models have been proposed to uncover hidden representations and network structures unique to distinct types of omics data, with the aim of improving cancer prediction and patient profiling, among other applications in disease management. The existing state-of-the-art graph-based multi-omics integration approaches for cancer subtype prediction, MOGONET and SUPREME, use a graph convolutional network (GCN), which does not account for how important each neighboring node is to a given node. To address this gap, we hypothesize that paying attention to each neighbor, or weighting neighbors according to their importance, might improve cancer subtype prediction. The natural choice for determining the importance of each neighbor of a node in a graph is the graph attention network (GAT). Here, we propose MOGAT, a novel multi-omics integration approach that leverages GAT models to combine graph-based learning with an attention mechanism. MOGAT uses a multi-head attention mechanism to extract appropriate information for a specific sample by assigning unique attention coefficients to neighboring samples. To our knowledge, our group is the first to explore GAT in multi-omics integration for cancer subtype prediction. To evaluate the performance of MOGAT in predicting cancer subtypes, we explored two sets of breast cancer data, from TCGA and METABRIC. Our proposed approach, MOGAT, outperforms MOGONET by 32% to 46% and SUPREME by 2% to 16% in cancer subtype prediction in different scenarios, supporting our hypothesis. Our results also showed that GAT embeddings provide a better prognosis in differentiating the high-risk group from the low-risk group than raw features.
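The attention idea described in this abstract can be illustrated with a minimal single-head sketch of standard GAT attention coefficients. This is not the MOGAT implementation: the toy sample features `h`, the neighbor lists, the projection `W`, and the attention vector `a` are all made-up values for illustration, and a real model would learn `W` and `a` and use several heads.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def attention_coefficients(h, neighbors, i, W, a):
    """Return {j: alpha_ij} for sample i over its neighbors (single head)."""
    z = {j: matvec(W, h[j]) for j in set(neighbors[i]) | {i}}
    scores = {}
    for j in neighbors[i]:
        concat = z[i] + z[j]  # [W h_i || W h_j]
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    # Softmax over the neighborhood (shifted by the max for stability).
    m = max(scores.values())
    exp = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exp.values())
    return {j: e / total for j, e in exp.items()}

def aggregate(h, neighbors, i, W, a):
    """Attention-weighted sum of projected neighbor features for sample i."""
    alpha = attention_coefficients(h, neighbors, i, W, a)
    out = [0.0] * len(W)
    for j, weight in alpha.items():
        for k, value in enumerate(matvec(W, h[j])):
            out[k] += weight * value
    return out

# Hypothetical toy data: three samples with two features each.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2]}            # sample 0 attends to itself and two others
W = [[1.0, 0.0], [0.0, 1.0]]          # identity projection, for readability
a = [0.5, 0.5, 1.0, -1.0]             # made-up attention vector

alpha = attention_coefficients(h, neighbors, 0, W, a)
print(alpha)
print(aggregate(h, neighbors, 0, W, a))
```

With an all-zero attention vector every neighbor receives the same weight, which is roughly the situation a GCN-style uniform aggregation corresponds to; a non-zero `a` lets the weights differ by neighbor.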

Subjects

Biomedical Research; Breast Neoplasms; Humans; Female; Multiomics; Disease Management; Machine Learning

Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers.

Al Mamun, Abdullah; Tanvir, Raihanul Bari; Sobhan, Masrur; Mathee, Kalai; Narasimhan, Giri; Holt, Gregory E; Mondal, Ananda Mohan.

Int J Mol Sci ; 22(21)2021 Nov 03.

Article in English | MEDLINE | ID: mdl-34769351

ABSTRACT

BACKGROUND: Long non-coding RNA (lncRNA) plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. METHOD: To discover the critical lncRNAs that can identify the origin of different cancers, we propose the use of the state-of-the-art deep learning algorithm concrete autoencoder (CAE) in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, because of its stochastic nature, CAE does not identify reproducible features across runs. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers. RESULTS: Our results showed that mrCAE performs better in feature selection than single-run CAE, standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. CONCLUSION: The proposed mrCAE, which selects actual features, outperformed the AE, which selects latent (pseudo-) features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
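The multi-run idea in this abstract (keep features that recur across stochastic runs) can be sketched as a simple frequency filter. This is only an illustration under stated assumptions: the per-run selections below are hypothetical placeholder names rather than output of a concrete autoencoder, and the `min_runs` threshold is an assumed parameter that the abstract does not specify.

```python
from collections import Counter

def stable_features(runs, min_runs=2):
    """Return features selected in at least `min_runs` runs,
    ordered by how often they were selected (ties broken by name)."""
    counts = Counter(f for selected in runs for f in set(selected))
    kept = [f for f, c in counts.items() if c >= min_runs]
    return sorted(kept, key=lambda f: (-counts[f], f))

# Hypothetical selections from three runs of a stochastic feature selector.
runs = [
    ["lnc_A", "lnc_B"],
    ["lnc_A", "lnc_C"],
    ["lnc_B", "lnc_A"],
]

print(stable_features(runs, min_runs=2))
```

Raising `min_runs` trades a shorter list for higher reproducibility; the right value would depend on the number of runs and the data.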

Subjects

Algorithms; Biomarkers, Tumor/genetics; Deep Learning; Gene Expression Regulation, Neoplastic; Neoplasms/pathology; Neural Networks, Computer; RNA, Long Noncoding/genetics; Humans; Neoplasms/classification; Neoplasms/genetics; Precision Medicine; Prognosis; Survival Rate

Computational identification of biomarker genes for lung cancer considering treatment and non-treatment studies.

Maharjan, Mona; Tanvir, Raihanul Bari; Chowdhury, Kamal; Duan, Wenrui; Mondal, Ananda Mohan.

BMC Bioinformatics ; 21(Suppl 9): 218, 2020 Dec 03.

Article in English | MEDLINE | ID: mdl-33272232

ABSTRACT

BACKGROUND: Lung cancer is the number one cancer killer in the world, with more than 142,670 deaths estimated in the United States alone in 2019. Consequently, there is an overarching need to identify the key biomarkers for lung cancer. The aim of this study is to computationally identify biomarker genes for lung cancer that can aid in its diagnosis and treatment. The gene expression profiles of two different types of studies, namely non-treatment and treatment, are considered for discovering biomarker genes. In non-treatment studies, healthy samples are the controls and cancer samples are the cases, whereas in treatment studies, the controls are cancer cell lines without treatment and the cases are cancer cell lines with treatment. RESULTS: The differentially expressed genes (DEGs) for lung cancer were isolated from the Gene Expression Omnibus (GEO) database using the R software tool GEO2R. A total of 407 DEGs (254 upregulated and 153 downregulated) from non-treatment studies and 547 DEGs (133 upregulated and 414 downregulated) from treatment studies were isolated. Two Cytoscape apps, CytoHubba and MCODE, were used to identify biomarker genes from functional networks built from the DEGs. This study discovered two distinct sets of biomarker genes, one from non-treatment studies and the other from treatment studies, each containing 16 genes. Survival analysis shows that most non-treatment biomarker genes have prognostic capability, with low-expression groups having a higher chance of survival than high-expression groups, whereas most treatment biomarkers have prognostic capability with high-expression groups having a higher chance of survival than low-expression groups. CONCLUSION: A computational framework was developed to identify biomarker genes for lung cancer using gene expression profiles. Two different types of studies, non-treatment and treatment, were considered. Most of the biomarker genes from non-treatment studies are involved in mitosis and play a vital role in DNA repair and cell-cycle regulation, whereas most of the biomarker genes from treatment studies are associated with ubiquitination and cellular response to stress. This study discovered a list of biomarkers that could help experimental scientists design laboratory experiments to further explore the detailed dynamics of lung cancer development.
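The split of DEGs into upregulated and downregulated sets can be sketched as a filter over a differential-expression table of the kind GEO2R exports (gene, log2 fold change, adjusted p-value). The cutoffs below (|log2FC| >= 1, adjusted p < 0.05) are common defaults assumed for illustration; the abstract does not state which thresholds were used, and the gene names and values are placeholders.

```python
def split_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split (gene, log2_fold_change, adjusted_p) rows into
    upregulated and downregulated gene lists."""
    up, down = [], []
    for gene, log_fc, padj in rows:
        if padj >= padj_cutoff:
            continue  # not statistically significant under the assumed cutoff
        if log_fc >= lfc_cutoff:
            up.append(gene)
        elif log_fc <= -lfc_cutoff:
            down.append(gene)
    return up, down

# Hypothetical rows; real input would come from a GEO2R result table.
rows = [
    ("GENE_A", 2.1, 0.001),   # significant, up
    ("GENE_B", -1.7, 0.010),  # significant, down
    ("GENE_C", 0.3, 0.001),   # significant but small effect
    ("GENE_D", 3.0, 0.200),   # large effect but not significant
]

up, down = split_degs(rows)
print(up, down)
```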

Subjects

Biomarkers, Tumor/genetics; Computational Biology/methods; Lung Neoplasms/genetics; Biomarkers, Tumor/metabolism; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Ontology; Gene Regulatory Networks; Humans; Prognosis; Protein Interaction Maps/genetics; Signal Transduction/genetics; Survival Analysis

Minimalist ensemble algorithms for genome-wide protein localization prediction.

Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun.

BMC Bioinformatics ; 13: 157, 2012 Jul 03.

Article in English | MEDLINE | ID: mdl-22759391

ABSTRACT

BACKGROUND: Computational prediction of a protein's subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve prediction performance; these usually include 10 or more individual localization algorithms. However, their performance is still limited by running complexity and by redundancy among the individual prediction algorithms. RESULTS: This paper proposes a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm combines a feature-selection-based filter with a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets, Yeast and Human, and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors used by current ensemble algorithms, which greatly reduces computational complexity and running time. High-performance ensemble algorithms were found to be usually composed of predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. CONCLUSIONS: We proposed a method for the rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of the individual predictors used by other ensemble algorithms. The results also suggest that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
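The notion of predictor complementarity and the AUC comparison in this abstract can be illustrated with a small sketch: two individual predictors are combined through a logistic model, and each is scored by AUC. This is not the authors' LR ensemble; the predictor scores, labels, and the hand-set `weights` are hypothetical (in practice the weights would be fitted on training data and the predictors chosen by feature selection).

```python
import math

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of
    positive and negative scores (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def lr_ensemble(predictor_scores, weights, bias=0.0):
    """Combine per-predictor scores with a logistic model.
    `predictor_scores` holds one score list per individual predictor."""
    combined = []
    for row in zip(*predictor_scores):
        z = bias + sum(w * s for w, s in zip(weights, row))
        combined.append(1.0 / (1.0 + math.exp(-z)))
    return combined

# Hypothetical data: four proteins, two individual predictors.
labels = [1, 1, 0, 0]
predictor_a = [0.9, 0.4, 0.6, 0.1]
predictor_b = [0.3, 0.8, 0.2, 0.5]

ensemble = lr_ensemble([predictor_a, predictor_b], weights=[1.0, 1.0])
print(auc(labels, predictor_a), auc(labels, predictor_b), auc(labels, ensemble))
```

In this toy case each predictor misranks a different pair of proteins, so the combination ranks all positives above all negatives. A predictor that repeated the same mistakes as `predictor_a` would add nothing, which is the redundancy the abstract describes.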

Subjects

Algorithms; Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/genetics; Area Under Curve; Genome, Fungal; Genome, Human; Humans; Internet; Logistic Models; Saccharomyces cerevisiae/genetics; Software

Potential Autoimmunity Resulting from Molecular Mimicry between SARS-CoV-2 Spike and Human Proteins.

Nunez-Castilla, Janelle; Stebliankin, Vitalii; Baral, Prabin; Balbin, Christian A; Sobhan, Masrur; Cickovski, Trevor; Mondal, Ananda Mohan; Narasimhan, Giri; Chapagain, Prem; Mathee, Kalai; Siltberg-Liberles, Jessica.

Viruses ; 14(7)2022 Jun 28.

Article in English | MEDLINE | ID: mdl-35891400

ABSTRACT

Molecular mimicry between viral antigens and host proteins can produce cross-reacting antibodies leading to autoimmunity. The coronavirus SARS-CoV-2 causes COVID-19, a disease curiously resulting in varied symptoms and outcomes, ranging from asymptomatic to fatal. Autoimmunity due to cross-reacting antibodies resulting from molecular mimicry between viral antigens and host proteins may provide an explanation. Thus, we computationally investigated molecular mimicry between SARS-CoV-2 Spike and known epitopes. We discovered molecular mimicry hotspots in Spike and highlight two examples with tentative high autoimmune potential and implications for understanding COVID-19 complications. We show that a TQLPP motif in Spike and thrombopoietin shares similar antibody binding properties. Antibodies cross-reacting with thrombopoietin may induce thrombocytopenia, a condition observed in COVID-19 patients. Another motif, ELDKY, is shared in multiple human proteins, such as PRKG1 involved in platelet activation and calcium regulation, and tropomyosin, which is linked to cardiac disease. Antibodies cross-reacting with PRKG1 and tropomyosin may cause known COVID-19 complications such as blood-clotting disorders and cardiac disease, respectively. Our findings illuminate COVID-19 pathogenesis and highlight the importance of considering autoimmune potential when developing therapeutic interventions to reduce adverse reactions.
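A first-pass screen for shared short motifs such as the pentapeptides mentioned in this abstract can be sketched as an exact k-mer match between two sequences. Exact matching is an assumption made for illustration and is not a description of the authors' pipeline; the two sequences below are made-up strings that merely contain the TQLPP motif, not real Spike or thrombopoietin sequences.

```python
def shared_kmers(seq_a, seq_b, k=5):
    """Return {kmer: [start positions in seq_a]} for every length-k
    substring of seq_a that also occurs somewhere in seq_b."""
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    hits = {}
    for i in range(len(seq_a) - k + 1):
        kmer = seq_a[i:i + k]
        if kmer in kmers_b:
            hits.setdefault(kmer, []).append(i)
    return hits

# Hypothetical toy sequences (not real protein sequences).
toy_viral = "AAGTQLPPAY"
toy_human = "MSTQLPPVLC"

print(shared_kmers(toy_viral, toy_human, k=5))
```

A shared k-mer is only a candidate: whether it is an epitope and whether antibodies would cross-react are separate questions that sequence identity alone does not answer.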

Subjects

COVID-19; Heart Diseases; Antibodies, Viral; Antigens, Viral; Autoimmunity; Humans; Molecular Mimicry; SARS-CoV-2; Spike Glycoprotein, Coronavirus/genetics; Thrombopoietin; Tropomyosin/metabolism
