Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition.

Agrawal, Vanita; Jagtap, Jayant; Patil, Shruti; Kotecha, Ketan

Affiliation

Agrawal V; Department of Computer Science and Information Technology, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India.
Jagtap J; NIMS Institute of Computing, Artificial Intelligence and Machine Learning, NIMS University Rajasthan, Jaipur, India.
Patil S; Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India.
Kotecha K; Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India.

MethodsX ; 12: 102554, 2024 Jun.

Article in En | MEDLINE | ID: mdl-38292314

ABSTRACT

Digitization has created a demand for highly efficient handwritten document recognition systems. A handwritten document consists of digits, text, symbols, diagrams, etc., and digits are an essential element. Accurate recognition of handwritten digits is vital for effective communication and data analysis. Various researchers have attempted to address this problem with modern convolutional neural network (CNN) techniques. Although CNNs achieve high identification accuracy, their filter weights remain fixed after training, so the model cannot flexibly adapt to changes in the input. Hence, computer vision researchers have recently become interested in Vision Transformers (ViTs) and Multilayer Perceptrons (MLPs). The shortcomings of CNNs gave rise to a wave of hybrid models that combine the best elements of the two fields. This paper analyzes how the hybrid convolutional ViT model affects the ability to recognize handwritten digits. Because real-time data contains noise, distortions, and varying writing styles, both cleaned and uncleaned handwritten digit images are used for evaluation. The accuracy of the proposed method is compared with state-of-the-art techniques, and the results show that the proposed model achieves the highest recognition accuracy among them. Probable solutions for recognizing other aspects of handwritten documents are also discussed.

• Analyzed the effect of the convolutional vision transformer on cleaned and real-time handwritten digit images.
• The model's performance improved with the application of cross-validation and hyper-parameter tuning.
• The results show that the proposed model is robust, feasible, and effective on cleaned and uncleaned handwritten digits.
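
The abstract's contrast between fixed CNN filters and input-adaptive transformers can be illustrated with a minimal sketch. It is not the authors' model: the function names, the toy inputs, and the use of identity query/key/value projections (a real ViT learns these) are assumptions made only for illustration. The point is that a trained convolution applies the same kernel to every input, while self-attention recomputes its mixing weights from each input.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: the kernel is fixed, whatever the input."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens):
    """Scaled dot-product attention weights, one row per query token.

    Assumption: queries and keys are the tokens themselves (identity
    projections) so the example stays dependency-free.
    """
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]


def self_attention(tokens):
    """Mix the tokens using weights that were derived from the tokens."""
    weights = attention_weights(tokens)
    dim = len(tokens[0])
    return [
        [sum(w * t[c] for w, t in zip(row, tokens)) for c in range(dim)]
        for row in weights
    ]


if __name__ == "__main__":
    kernel = [0.25, 0.5, 0.25]           # hypothetical "trained" filter
    print(conv1d([0.0, 1.0, 0.0, 0.0], kernel))
    print(conv1d([0.0, 0.0, 1.0, 1.0], kernel))  # same kernel, new input

    tokens_a = [[1.0, 0.0], [0.0, 1.0]]  # two dissimilar patches
    tokens_b = [[1.0, 0.0], [1.0, 0.0]]  # two identical patches
    print(attention_weights(tokens_a))   # weights favor the matching patch
    print(attention_weights(tokens_b))   # weights become uniform
```

A hybrid convolutional ViT, in broad terms, uses convolutional layers to extract local features and transformer layers to mix them with input-dependent weights of this kind; the exact layer arrangement used in the paper is not given in this record.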

Key words

Computer Vision; Convolutional Neural Network; Convolutional vision transformer; Handwritten Digit Recognition; Machine Learning; Vision Transformer

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: MethodsX Year: 2024 Type: Article Affiliation country: India