RESUMO
Currently, three-dimensional convolutional neural networks (3DCNNs) are a popular approach in the field of human activity recognition. However, due to the variety of methods used for human activity recognition, we propose a new deep-learning model in this paper. The main objective of our work is to optimize the traditional 3DCNN and propose a new model that combines 3DCNN with Convolutional Long Short-Term Memory (ConvLSTM) layers. Our experimental results, which were obtained using the LoDVP Abnormal Activities dataset, UCF50 dataset, and MOD20 dataset, demonstrate the superiority of the 3DCNN + ConvLSTM combination for recognizing human activities. Furthermore, our proposed model is well-suited for real-time human activity recognition applications and can be further enhanced by incorporating additional sensor data. To provide a comprehensive comparison of our proposed 3DCNN + ConvLSTM architecture, we compared our experimental results on these datasets. We achieved a precision of 89.12% when using the LoDVP Abnormal Activities dataset. Meanwhile, the precision we obtained using the modified UCF50 dataset (UCF50mini) and MOD20 dataset was 83.89% and 87.76%, respectively. Overall, our work demonstrates that the combination of 3DCNN and ConvLSTM layers can improve the accuracy of human activity recognition tasks, and our proposed model shows promise for real-time applications.
Assuntos
Aprendizado Profundo , Humanos , Redes Neurais de Computação , Atividades Humanas , Memória de Longo Prazo , Reconhecimento PsicológicoRESUMO
Recognizing various abnormal human activities from video is very challenging. This problem is also greatly influenced by the lack of datasets containing various abnormal human activities. The available datasets contain various human activities, but only a few of them contain non-standard human behavior such as theft, harassment, etc. There are datasets such as KTH that focus on abnormal activities such as sudden behavioral changes, as well as on various changes in interpersonal interactions. The UCF-crime dataset contains categories such as fighting, abuse, explosions, robberies, etc. However, this dataset is very time consuming. The events in the videos occur in a few seconds. This may affect the overall results of the neural networks that are used to detect the incident. In this article, we create a dataset that deals with abnormal activities, containing categories such as Begging, Drunkenness, Fight, Harassment, Hijack, Knife Hazard, Normal Videos, Pollution, Property Damage, Robbery, and Terrorism. We use the created dataset for the training and testing of the ConvLSTM (convolutional long short-term memory) neural network, which we designed. However, we also test the created dataset using other architectures. We use ConvLSTM architectures and 3D Resnet50, 3D Resnet101, and 3D Resnet152. With the created dataset and the architecture we designed, we obtained an accuracy of classification of 96.19% and a precision of 96.50%.