Deep-GenMut: Automated genetic mutation classification in oncology: A deep learning comparative study.

Elsamahy, Emad A; Ahmed, Asmaa E; Shoala, Tahseen; Maghraby, Fahima A

Affiliation

Elsamahy EA; College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.
Ahmed AE; College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.
Shoala T; Environmental Biotechnology Department, College of Biotechnology, Misr University for Science and Technology, Giza, 12563, Egypt.
Maghraby FA; College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Cairo, Egypt.

Heliyon ; 10(11): e32279, 2024 Jun 15.

Article in English | MEDLINE | ID: mdl-38912449

ABSTRACT

Early cancer detection and treatment depend on the discovery of specific genes that cause cancer. The classification of genetic mutations was initially done manually, a process that relies on pathologists and can be time-consuming. To improve the precision of clinical interpretation, researchers have therefore developed computational algorithms that leverage next-generation sequencing technologies for automated mutation analysis. This paper applied four deep learning classification models trained on collections of biomedical texts. The first is bidirectional encoder representations from transformers for biomedical text mining (BioBERT), a specialized language model built for biological contexts. Strong results in multiple tasks, including text classification, language inference, and question answering, can be obtained by simply adding an extra layer to the BioBERT model. The other three models, bidirectional encoder representations from transformers (BERT), long short-term memory (LSTM), and bidirectional LSTM (BiLSTM), were also used to categorize genetic mutations based on textual evidence. The dataset used in this work was created by Memorial Sloan Kettering Cancer Center (MSKCC) and contains several mutations. This dataset also poses a major classification challenge in the Kaggle research prediction competitions. In carrying out the work, three challenges were identified: enormous text length, biased representation of the data, and repeated data instances. Based on commonly used evaluation metrics, the experimental results show that the BioBERT model outperforms the other models with an F1 score of 0.87 and a Matthews correlation coefficient (MCC) of 0.850, an improvement over similar results in the literature, where an F1 score of 0.70 was achieved with the BERT model.
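The two metrics reported above can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the abstract does not say which F1 averaging was used, so macro averaging is an assumption here, and the function names `macro_f1` and `multiclass_mcc` are hypothetical. The MCC follows the standard multiclass generalization computed from per-class true and predicted counts.

```python
from collections import Counter
from math import sqrt


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from class counts."""
    s = len(y_true)                                    # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                         # true occurrences per class
    p_counts = Counter(y_pred)                         # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Return 0 when the score is undefined (e.g. a single predicted class).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 1, 2, 2, 3, 3]
    y_pred = [1, 1, 2, 3, 3, 3]
    print(round(macro_f1(y_true, y_pred), 3))
    print(round(multiclass_mcc(y_true, y_pred), 3))
```

Unlike accuracy, both metrics account for per-class errors, which is relevant given the biased class representation noted in the abstract.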

Keywords

BERT; BiLSTM; BioBERT; Cancer detection; Deep learning; Genetic mutation; LSTM; Text classification

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Language: English | Journal: Heliyon | Publication year: 2024 | Document type: Article | Country of affiliation: Egypt
