Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences.

Erten, Mehmet; Acharya, Madhav R; Kamath, Aditya P; Sampathila, Niranjana; Bairy, G Muralidhar; Aydemir, Emrah; Barua, Prabal Datta; Baygin, Mehmet; Tuncer, Ilknur; Dogan, Sengul; Tuncer, Turker

Affiliation

Erten M; Laboratory of Medical Biochemistry, Malatya Training and Research Hospital, 44000 Malatya, Turkey.
Acharya MR; Department of Biomedical Engineering, Manipal Academy of Higher Education, Manipal 04478, India.
Kamath AP; Center for Biomedical Engineering, Brown University, Providence, RI 02912, USA.
Sampathila N; Department of Biomedical Engineering, Manipal Academy of Higher Education, Manipal 04478, India.
Bairy GM; Department of Biomedical Engineering, Manipal Academy of Higher Education, Manipal 04478, India.
Aydemir E; Department of Management Information, College of Management, Sakarya University, 54050 Sakarya, Turkey.
Barua PD; School of Management & Enterprise, University of Southern Queensland, Toowoomba, QLD 4350, Australia.
Baygin M; Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia.
Tuncer I; Department of Computer Engineering, Faculty of Engineering, Ardahan University, 75000 Ardahan, Turkey.
Dogan S; Elazig Governorship, Interior Ministry, 23119 Elazig, Turkey.
Tuncer T; Department of Digital Forensics Engineering, Technology Faculty, Firat University, 23119 Elazig, Turkey.

Diagnostics (Basel); 12(12), 2022 Dec 15.

Article in English | MEDLINE | ID: mdl-36553188

ABSTRACT

SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-based classification model for discriminating between the two conditions using protein sequences of varying lengths. We downloaded viral protein sequences of SARS-CoV-2 and Influenza-A with varying lengths (all 100 or greater) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare's Hamlet, and a signum function to extract local binary pattern-like bits from overlapping fixed-length (27) blocks of the protein sequences. The bits were converted to decimal map signals from which histograms were extracted and concatenated to form a final feature vector of length 1280. The iterative Chi-square function selected the 340 most discriminative features to feed to an SVM with a Gaussian kernel for classification. The model attained 99.92% and 99.87% classification accuracy rates using hold-out (75:25 split ratio) and five-fold cross-validations, respectively. The excellent performance of the lightweight, handcrafted HamletPat-based classification model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections.
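The feature extraction step described in the abstract (signum-derived bits taken from overlapping blocks of length 27, converted to decimal values and summarized as a histogram) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the actual HamletPat index pairs, so `PLACEHOLDER_PAIRS` below is a hypothetical stand-in, and the character encoding, the stride of 1, and the `a >= b` bit convention are likewise assumptions. The sketch produces a single 256-bin histogram and does not reproduce the 1280-length feature vector, the Chi-square selection, or the SVM classifier.

```python
from typing import List, Sequence, Tuple

BLOCK_LEN = 27  # block length stated in the abstract

# Hypothetical index pairs inside a block. The real HamletPat pairs are
# generated from the text of Hamlet and are not given in the abstract.
PLACEHOLDER_PAIRS: List[Tuple[int, int]] = [
    (0, 13), (1, 14), (2, 15), (3, 16),
    (4, 17), (5, 18), (6, 19), (7, 20),
]


def signum_bit(a: int, b: int) -> int:
    """Return 1 if a >= b, else 0 (an assumed LBP-style convention)."""
    return 1 if a - b >= 0 else 0


def encode(sequence: str) -> List[int]:
    """Map each residue letter to an integer (assumption: its code point)."""
    return [ord(ch) for ch in sequence.upper()]


def block_histogram(
    sequence: str,
    pairs: Sequence[Tuple[int, int]] = PLACEHOLDER_PAIRS,
    block_len: int = BLOCK_LEN,
) -> List[int]:
    """Histogram of pattern codes over all overlapping blocks (stride 1)."""
    values = encode(sequence)
    hist = [0] * (2 ** len(pairs))
    for start in range(len(values) - block_len + 1):
        block = values[start:start + block_len]
        code = 0
        for i, j in pairs:
            # Append one bit per compared pair, most significant bit first.
            code = (code << 1) | signum_bit(block[i], block[j])
        hist[code] += 1
    return hist
```

Calling `block_histogram` on a protein sequence returns one count per possible 8-bit code. Sequences shorter than 27 characters yield an all-zero histogram, which is consistent with the dataset's minimum sequence length of 100.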

Keywords

Hamlet Pattern; SARS-CoV-2; bioinformatics; protein sequence classification

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Diagnostic_studies Language: En Journal: Diagnostics (Basel) Year: 2022 Document type: Article Affiliation country: Turkey