ABSTRACT
License Plate Recognition (LPR) is essential for the Internet of Vehicles (IoV), since license plates are the distinguishing characteristic used to identify vehicles for traffic management. As the number of vehicles on the road continues to grow, managing and controlling traffic has become increasingly complex; large cities in particular face significant challenges, including privacy concerns and resource consumption. To address these issues, automatic LPR within the IoV has emerged as a critical area of research: by detecting and recognizing license plates on roadways, LPR can significantly enhance the management and control of the transportation system. However, implementing LPR within automated transportation systems requires careful attention to privacy and trust, particularly in the collection and use of sensitive data. This paper proposes a blockchain-based privacy protection system for the IoV that uses LPR. When a license plate is captured by the LPR system, the image is sent to the gateway responsible for managing all communications; when a user registers a license plate, however, the registration is handled by a system connected directly to the blockchain, bypassing the gateway. In the traditional IoV system, a central authority has full control over binding vehicle identities to public keys, and as the number of vehicles grows this central server may crash, which motivates the decentralized design. Key revocation is the process by which the blockchain system analyses vehicle behaviour to identify malicious users and revoke their public keys.
ABSTRACT
Intrusion detection systems (IDSs) are widely regarded as one of the most essential components of an organization's network security: they serve as the first line of defense against cyberattacks and are responsible for accurately detecting potential network intrusions. Many IDS implementations detect potential threats through flow-based network traffic analysis. Traditional IDSs, however, frequently struggle to provide accurate real-time intrusion detection while keeping up with the changing threat landscape, so innovative methods for improving IDS performance in network traffic analysis are urgently needed. In this study, we introduce the deep neural decision forest (DNDF), a model that enhances classification trees with the power of deep networks to learn data representations. We primarily used the CICIDS 2017 dataset for network traffic analysis and extended our experiments to evaluate the DNDF model on two additional datasets: CICIDS 2018 and a custom network traffic dataset. Our findings show that DNDF, a combination of deep neural networks and decision forests, outperformed reference approaches with a remarkable precision of 99.96% on the CICIDS 2017 dataset while creating latent representations in its deep layers. This success can be attributed to improved feature representation, model optimization, and resilience to noisy and unbalanced input data, underscoring DNDF's potential in intrusion detection and network security solutions.
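The core DNDF idea, decision trees whose routing decisions are produced by a deep network, can be illustrated with a minimal NumPy sketch of a single soft tree. The routing logits below are stand-in values; in the actual model they come from the network's final layer, and the leaf distributions are learned jointly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_tree_predict(routing_logits, leaf_dists):
    """Route one sample through a soft decision tree.

    routing_logits: (n_inner,) logits for each inner node (from a deep net).
    leaf_dists:     (n_leaves, n_classes) class distributions at the leaves.
    Returns the mixture of leaf distributions weighted by path probability.
    """
    d = sigmoid(routing_logits)          # P(go left) at each inner node
    n_leaves = len(leaf_dists)
    depth = int(np.log2(n_leaves))
    probs = np.zeros(n_leaves)
    for leaf in range(n_leaves):
        p, node = 1.0, 0
        for bit in format(leaf, f"0{depth}b"):
            p *= d[node] if bit == "0" else 1.0 - d[node]
            node = 2 * node + (1 if bit == "0" else 2)
        probs[leaf] = p
    return probs @ leaf_dists            # (n_classes,) prediction

# Depth-2 tree: 3 inner nodes, 4 leaves, 2 classes.
logits = np.array([0.0, 2.0, -2.0])      # stand-ins for deep-net outputs
leaves = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])
pred = soft_tree_predict(logits, leaves)
print(pred.shape, round(float(pred.sum()), 6))   # (2,) 1.0
```

Because routing is differentiable, both the logits and the leaf distributions can be trained end to end by backpropagating through this mixture.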
ABSTRACT
Organizations and individuals worldwide are becoming increasingly vulnerable to cyberattacks as phishing grows and the number of phishing websites rises; improved cyber defense therefore requires more effective phishing detection (PD). In this paper, we introduce a deep learning (DL) method for detecting phishing sites with high accuracy: a Convolutional Neural Network (CNN)-based model that distinguishes legitimate websites from phishing websites. We evaluate the model on the widely used PhishTank dataset, detecting phishing websites from Uniform Resource Locator (URL) features alone. We built a real dataset by crawling 10,000 phishing URLs from PhishTank and 10,000 legitimate websites, and ran experiments using standard evaluation metrics. Under the same training setup, with binary cross-entropy loss and the Adam optimizer, the reference models from previous publications, k-nearest neighbors (KNN), Natural Language Processing (NLP), Recurrent Neural Network (RNN), and Random Forest (RF), achieve accuracies of 87%, 97.98%, 97.4%, and 94.26%, respectively. Our model outperforms these earlier works thanks to several factors, including the use of more layers, larger training sizes, and additional features extracted from the PhishTank dataset.
Specifically, the proposed model comprises seven layers, from the input layer through convolutional and pooling layers to fully connected (linear) layers, with the final linear layer serving as the output. These design choices contribute to the model's high accuracy of 98.77% and its low false-positive rate, outperforming previous state-of-the-art models.
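As an illustration of the kind of input a character-level URL classifier consumes, here is a preprocessing sketch. The vocabulary and the maximum sequence length are our own assumptions for the example, not values taken from the paper.

```python
import numpy as np

# Hypothetical character vocabulary; real work would build it from the corpus.
VOCAB = {c: i + 1 for i, c in enumerate(
    "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#@!$&'()*+,;=%")}

def encode_url(url, max_len=100):
    """Map a URL to a fixed-length integer sequence (0 = padding),
    the usual input format for a character-level CNN classifier."""
    ids = [VOCAB.get(c, 0) for c in url.lower()[:max_len]]
    return np.array(ids + [0] * (max_len - len(ids)), dtype=np.int64)

x = encode_url("http://example.com/login")
print(x.shape)        # (100,)
print(int(x[0]))      # 8, the vocabulary index of 'h'
```

An embedding layer would then turn each integer into a dense vector before the convolutional layers scan the sequence.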
ABSTRACT
Dementia affects the patient's memory and leads to language impairment. Research has demonstrated that speech and language deterioration is often a clear indication of dementia and plays a crucial role in the recognition process. Although earlier studies have used speech features to recognize subjects suffering from dementia, these features are usually combined with linguistic features obtained from transcriptions. This study explores significant standalone speech features for recognizing dementia. The primary contribution of this work is to identify a compact set of speech features that aid the dementia recognition process; the secondary contribution is to leverage machine learning (ML) and deep learning (DL) models for the recognition task. Speech samples from the Pitt corpus in DementiaBank are used. A critical speech feature set of prosodic, voice quality, and cepstral features is proposed for the task. The experimental results demonstrate the superiority of machine learning (87.6 percent) over deep learning (85 percent) models for recognizing dementia using the compact speech feature combination, along with lower time and memory consumption. The results obtained with the proposed approach are promising compared with existing work on dementia recognition from speech.
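Cepstral features such as MFCCs are derived from the log spectrum of short speech frames. The sketch below computes a plain real cepstrum in NumPy to illustrate the idea; the frame length and signal are illustrative, and the study's exact feature pipeline is not reproduced here.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one speech frame: inverse FFT of the log magnitude
    spectrum. MFCCs add mel filtering, but the cepstral idea is the same."""
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-10   # avoid log(0)
    return np.fft.irfft(np.log(spectrum))

sr = 16000
t = np.arange(sr // 100) / sr                       # one 10 ms frame
frame = np.sin(2 * np.pi * 200 * t)                 # 200 Hz "voiced" tone
c = real_cepstrum(frame)
print(c.shape)                                      # (160,)
```

In practice one would window each frame, keep only the first dozen or so cepstral coefficients, and stack them per frame as the model input.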
Subjects
Deep Learning, Dementia, Humans, Speech, Machine Learning, Linguistics, Dementia/diagnosis
ABSTRACT
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area: using depth information in addition to color (RGB) image data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images must integrate these two modalities. Single-channel depth images are often converted to three-channel HHA images, encoding horizontal disparity, height above ground, and the angle of each pixel's local surface normal, so that transfer learning can be applied with networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for real-time scene prediction, although this matters less during training. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network, and we show that it performs as well as or better than HHA in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized, pretrained VGG16 network. We address the class imbalance in the image dataset using a feature-level method based on the synthetic minority oversampling technique (SMOTE). With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
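SMOTE-style oversampling at the feature level can be sketched in a few lines of NumPy. This is a minimal illustration of the interpolation idea, not the authors' implementation; the neighbor count and random seed are arbitrary choices.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                 # skip the point itself
        j = rng.choice(nn)
        lam = rng.random()                          # interpolation factor
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Four minority-class feature vectors at the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = smote_oversample(X_min, n_new=6)
print(synth.shape)    # (6, 2)
```

Applied at the feature level, X_min would be the deep features of minority-class images rather than raw pixels.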
Subjects
Machine Learning, Neural Networks (Computer), Data Collection
ABSTRACT
Cancer was initially considered a genetic disease. However, recent studies have revealed a connection between bacterial infections and the growth of different types of cancer. The enteroinvasive strain of Mycoplasma hominis alters the normal behavior of host cells, which may result in the growth of prostate cancer. The role of M. hominis in the growth and development of prostate cancer remains unclear; the infection may regulate several factors that influence prostate cancer growth in susceptible individuals. The aim of this study was to predict M. hominis proteins targeted to the endoplasmic reticulum (ER) of the host cell and their potential role in the induction of prostate cancer. From the whole proteome of M. hominis, 19 proteins were predicted to be targeted to the ER of host cells, where they may alter the normal pattern of protein folding and modify normal host cell function. Thus, intracellular infection of host cells by M. hominis may serve as a potential factor in prostate cancer etiology.
Subjects
Endoplasmic Reticulum/metabolism, Host-Pathogen Interactions, Mycoplasma Infections/complications, Mycoplasma hominis/physiology, Prostatic Neoplasms/etiology, Prostatic Neoplasms/metabolism, Bacterial Proteins/metabolism, Cell Transformation (Neoplastic), High-Throughput Nucleotide Sequencing, Humans, Male, Mycoplasma Infections/microbiology, Prostatic Neoplasms/pathology, Protein Binding, Protein Transport, Proteome, Proteomics/methods, Systems Biology
ABSTRACT
Cancer has long been assumed to be a genetic disease. However, recent evidence supports an enigmatic connection between bacterial infection and the growth and development of various types of cancer. The cause and mechanism by which Mycoplasma hominis contributes to the growth and development of prostate cancer remain unclear. Prostate cancer cells are infected and colonized by enteroinvasive M. hominis, which controls several factors that can affect prostate cancer growth in susceptible persons. We investigated M. hominis proteins targeting the nucleus of host cells and their implications for prostate cancer etiology. Many vital processes are controlled in the nucleus, where nucleus-targeting M. hominis proteins may have wide-ranging effects. A total of 29 of 563 M. hominis proteins were predicted to target the nucleus of host cells, including numerous proteins capable of altering normal growth activities. In conclusion, our results suggest that various M. hominis proteins target the nucleus of host cells and may be involved in prostate cancer etiology through different mechanisms and strategies.
Subjects
Bacterial Proteins/physiology, Computational Biology, Mycoplasma Infections/microbiology, Mycoplasma hominis/metabolism, Nuclear Localization Signals, Prostatic Neoplasms/etiology, Prostatitis/microbiology, Apoptosis, Bacterial Proteins/chemistry, Cell Cycle Checkpoints, Cell Division, Cell Nucleus/metabolism, Cocarcinogenesis, DNA Damage, Decision Trees, Epithelial Cells/microbiology, Host-Pathogen Interactions, Humans, Male, Molecular Weight, Mycoplasma hominis/pathogenicity, Prostatic Neoplasms/metabolism, Prostatic Neoplasms/microbiology, Proteome, Support Vector Machine
ABSTRACT
Brain tumors are one of the leading causes of cancer death; early screening is the best strategy for diagnosing and treating them. Magnetic Resonance Imaging (MRI) is extensively used for brain tumor diagnosis; nevertheless, achieving improved accuracy and performance, a critical challenge in most previously reported automated medical diagnostics, remains a complex problem. This study introduces the Dual Vision Transformer-DSUNET model, which incorporates feature fusion techniques to differentiate precisely and efficiently between brain tumors and other brain regions by leveraging multi-modal MRI data. The impetus for this study is the need to automate brain tumor segmentation in medical imaging, a critical component of diagnosis and therapy planning. The BRATS 2020 dataset, widely used for brain tumor segmentation, is employed; it comprises multi-modal MRI images, including T1-weighted, T2-weighted, T1Gd (contrast-enhanced), and FLAIR modalities. The proposed model uses the dual-vision idea to comprehensively capture the heterogeneous properties of brain tumors across imaging modalities, and feature fusion to improve the integration of data from these modalities, enhancing the accuracy and reliability of tumor segmentation. Performance is evaluated using the Dice Coefficient, a prevalent metric for quantifying segmentation accuracy. The experimental results are strong, with Dice Coefficient values of 91.47% for enhancing tumor, 92.38% for tumor core, and 90.88% for edema, and a cumulative Dice score of 91.29% across all classes. In addition, the model attains a high accuracy of roughly 99.93%, underscoring its robustness and efficacy in segmenting brain tumors. These findings demonstrate the soundness of the suggested architecture and its potential to improve detection accuracy for a range of brain diseases.
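The Dice Coefficient quoted above is straightforward to compute for binary masks; a minimal NumPy sketch (the toy masks are illustrative):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks:
    twice the overlap divided by the sum of the mask sizes."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1    # 4-pixel square
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1    # overlapping 6-pixel region
print(round(float(dice(a, b)), 3))                  # 2*4 / (4+6) = 0.8
```

For multi-class segmentation, the per-class scores reported in the abstract are obtained by computing this quantity once per tumor label.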
ABSTRACT
Detecting plant leaf diseases accurately and promptly is essential for reducing economic losses and maximizing crop yield. However, farmers' dependence on conventional manual techniques makes it difficult to pinpoint particular diseases accurately. This research investigates the YOLOv4 algorithm for detecting and identifying plant leaf diseases. The study uses the comprehensive Plant Village dataset, which includes over fifty thousand photos of healthy and diseased plant leaves from fourteen species, to develop advanced disease prediction systems for agriculture. Data augmentation techniques, including histogram equalization and horizontal flipping, were used to enrich the dataset and strengthen the model's resilience. A comprehensive assessment of YOLOv4 was conducted, comparing its performance with established target identification methods including DenseNet, AlexNet, and other neural networks. Applied to the Plant Village dataset, YOLOv4 achieved an impressive accuracy of 99.99%, and the evaluation criteria of accuracy, precision, recall, and F1-score consistently reached 0.99, confirming the effectiveness of the proposed methodology. These results demonstrate substantial advances in plant disease detection and underscore YOLOv4's capabilities as a sophisticated tool for accurate disease prediction, offering researchers and farmers improved capacities for disease control and crop protection.
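The two augmentations mentioned above, histogram equalization and horizontal flipping, can be sketched in NumPy. A toy low-contrast image stands in for a leaf photo; this is our own minimal implementation, not the paper's pipeline.

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization for an 8-bit grayscale image: remap
    intensities through the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size
    return (cdf[img] * 255).astype(np.uint8)

def hflip(img):
    """Horizontal flip, the other augmentation mentioned above."""
    return img[:, ::-1]

img = np.tile(np.arange(100, 140, dtype=np.uint8), (8, 1))  # low-contrast strip
eq = hist_equalize(img)
print(int(img.max()) - int(img.min()), int(eq.max()) - int(eq.min()))  # 39 249
```

Equalization stretches the narrow intensity range across the full 0 to 255 scale, while flipping doubles the effective dataset without changing labels.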
ABSTRACT
Modern technology frequently uses wearable sensors to monitor many aspects of human behavior. Since continuous records of heart rate and activity levels are typically gathered, the data generated by these devices hold a lot of promise beyond counting daily steps or calories expended. Because patients often cannot objectively provide the information needed to understand their condition and detect illnesses such as depression, current methods for evaluating mental disorders, such as the Montgomery-Asberg depression rating scale (MADRS) and clinical observation, require significant effort from specialists. In this study, a novel dataset of sensor data gathered from depressed patients was provided, comprising motor activity recordings from 32 healthy controls and 23 unipolar and bipolar depressive patients. Along with several days of continuous measurements for each patient, some demographic information was also included. In the first experiment, fewer than 70 of the model's 100 training epochs were completed, and the Cohen kappa score did not pass 0.1 on the validation set due to an imbalance in the class distribution. In the second experiment, most scores peaked within about 20 epochs, but because training continued through each epoch, the loss took much longer to fall below 0.1; the model soon reached an accuracy of 0.991, as expected given the outcome of the UMAP dimensionality reduction. In the last experiment, UMAP and neural networks combined to produce the best outcomes. A variety of machine learning classifiers were used, including nearest neighbors, linear-kernel SVM, Gaussian process, and random forest.
This paper also used UMAP unsupervised dimensionality reduction without the neural network, which showed a slightly lower score (with QDA). By considering ratings of the patients' depressive symptoms completed by medical specialists, it is possible to better understand the relationship between depression and motor activity.
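Cohen's kappa, the metric that stayed below 0.1 in the first experiment, corrects raw accuracy for chance agreement. The sketch below shows why a majority-class predictor on imbalanced data scores near zero even though its plain accuracy looks decent:

```python
import numpy as np

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance. Near 0 means the
    classifier is barely better than guessing the majority class."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    po = np.mean(y_true == y_pred)                        # observed agreement
    pe = sum(np.mean(y_true == l) * np.mean(y_pred == l)  # chance agreement
             for l in labels)
    return (po - pe) / (1 - pe)

y_true = np.array([0, 0, 0, 0, 1, 1])
always_majority = np.zeros(6, dtype=int)                  # ignores minority class
print(round(cohen_kappa(y_true, always_majority), 3))     # 0.0
```

The majority-class predictor has 67% accuracy here but kappa 0, which mirrors the class-imbalance failure described in the first experiment.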
ABSTRACT
Brain tumor (BT) diagnosis is a lengthy process that demands great skill and expertise from radiologists. As the number of patients has grown, so has the amount of data to be processed, making previous techniques both costly and ineffective. Many researchers have examined a range of reliable and fast techniques for identifying and categorizing BTs. Recently, deep learning (DL) methods have gained popularity for building computer algorithms that can quickly and reliably diagnose or segment BTs; DL allows a pre-trained convolutional neural network (CNN) model to identify BTs in medical images. The brain tumor segmentation (BraTS) dataset, created as a benchmark for developing and evaluating BT segmentation and diagnosis algorithms, supplies the magnetic resonance imaging (MRI) images used here; the collection contains 335 annotated MRI images. A deep CNN was used to build the segmentation model on the BraTS dataset, trained with a categorical cross-entropy loss function and the Adam optimizer. The resulting model successfully identified and segmented BTs in the dataset, attaining a validation accuracy of 98%.
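A single Adam update, the optimizer used to train the model above, can be written out in NumPy to show the moving averages involved. A toy quadratic stands in for the cross-entropy loss; hyperparameters follow the common defaults.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected running means of the gradient (m)
    and its square (v) scale each parameter's step individually."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -1.0]); m = np.zeros(2); v = np.zeros(2)
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(float(np.linalg.norm(w)) < 1.0)    # True: moved toward the minimum at 0
```

The per-parameter scaling by the square-root of v is what makes Adam robust to the widely varying gradient magnitudes typical of deep segmentation networks.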
ABSTRACT
The growth of biomedical engineering has made depression diagnosis via electroencephalography (EEG) a topic of growing interest. The two significant challenges for this application are the complexity and non-stationarity of EEG signals; in addition, the effects of individual variance may hamper the generalization of detection systems. Given the association between EEG signals and demographics such as gender and age, and the influence of these characteristics on the incidence of depression, it is preferable to include demographic factors in EEG modeling and depression detection. The main objective of this work is to develop an algorithm that can recognize depression patterns from EEG data. Following a multiband analysis of the signals, machine learning and deep learning techniques were used to detect depression patients automatically. The EEG data come from MODMA, a multi-modal open dataset for studying mental diseases, which contains recordings from a traditional 128-electrode elastic cap and from a cutting-edge wearable 3-electrode EEG collector for widespread applications; this project considers resting-state EEG recordings from the 128 channels. A CNN trained for 25 epochs achieved a 97% accuracy rate. Patient status is divided into two basic categories, major depressive disorder (MDD) and healthy control; MDD is further subdivided into six classes of mental illness: obsessive-compulsive disorders, addiction disorders, trauma- and stress-related conditions, mood disorders, schizophrenia, and anxiety disorders. The study suggests that a natural combination of EEG signals and demographic data is promising for the diagnosis of depression.
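A multiband EEG analysis ultimately reduces to per-band spectral power features; a minimal NumPy sketch follows. The band edges use the conventional alpha and delta definitions, since the paper's exact bands are not specified here, and the pure sine wave is a stand-in for a real recording.

```python
import numpy as np

def band_power(signal, fs, lo, hi):
    """Average spectral power of `signal` inside the [lo, hi] Hz band,
    the kind of per-band feature a multiband EEG analysis computes."""
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= lo) & (freqs <= hi)
    return power[mask].mean()

fs = 256
t = np.arange(fs * 2) / fs                # two seconds of samples
eeg = np.sin(2 * np.pi * 10 * t)          # pure 10 Hz (alpha-band) tone
alpha = band_power(eeg, fs, 8, 13)        # classical alpha band
delta = band_power(eeg, fs, 0.5, 4)       # delta band: almost no energy here
print(alpha > delta)                      # True
```

Computing such features per channel and per band yields the feature matrix that the ML and DL models then classify.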
ABSTRACT
Because underlying cognitive and neuromuscular activities regulate speech signals, biomarkers in the human voice can provide insight into neurological illnesses. Neurologic voice disorders have multiple motor and nonmotor aspects arising from an underlying condition such as Parkinson's disease, multiple sclerosis, myasthenia gravis, or ALS; voice problems can be caused by disorders affecting the corticospinal system, cerebellum, basal ganglia, and upper or lower motoneurons. Recent work shows that voice pathology detection technologies can successfully aid the assessment of voice irregularities and enable early diagnosis of voice pathology. In this paper, we offer two deep-learning-based computational models, a 1-dimensional convolutional neural network (1D CNN) and a 2-dimensional convolutional neural network (2D CNN), that detect voice pathologies caused by neurological illnesses or other causes. From the German corpus Saarbruecken Voice Database (SVD), we used voice recordings of the sustained vowel /a/ produced at normal pitch. The collected voice signals are padded and segmented to maintain homogeneity and increase the number of samples. In this project, convolutional layers are applied to the raw data for the 1D CNN, while MFCC features are extracted for the 2D CNN. Although the 1D CNN reached the highest test accuracy, 93.11%, its training showed overfitting; the 2D CNN generalized better, with lower training and validation loss, despite a test accuracy of 84.17%. The 2D CNN also outperforms state-of-the-art studies in the field, suggesting that a model trained on handcrafted features is better for speech processing than a model that extracts features directly from raw data.
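The padding-and-segmentation step described above can be sketched as follows; the segment length and sampling rate are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def segment_signal(x, seg_len):
    """Zero-pad `x` to a multiple of `seg_len`, then split it into
    equal-length segments, homogenizing recording lengths and
    multiplying the number of training samples."""
    pad = (-len(x)) % seg_len
    x = np.pad(x, (0, pad))
    return x.reshape(-1, seg_len)

voice = np.random.default_rng(0).standard_normal(44100)  # ~1 s of audio
segs = segment_signal(voice, 16000)
print(segs.shape)    # (3, 16000): the last segment is zero-padded
```

Each row then becomes one training sample, fed raw to the 1D CNN or converted to an MFCC image for the 2D CNN.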
Subjects
Deep Learning, Voice Disorders, Voice, Humans, Neural Networks (Computer), Speech, Voice Disorders/diagnosis
ABSTRACT
Skin cancer is one of the most severe forms of the disease and can spread to other parts of the body if not detected early; diagnosing and treating patients at an early stage is therefore crucial. Manual skin cancer diagnosis is time-consuming and expensive, and the high similarity between the various skin cancers often leads to incorrect diagnoses. Improved classification of multiclass skin cancers requires automated diagnostic systems. Herein, we propose a fully automatic method for classifying several skin cancers by fine-tuning the deep learning models VGG16, ResNet50, and ResNet101. Before model creation, the training dataset is augmented using traditional image transformation techniques and Generative Adversarial Networks (GANs) to prevent class imbalance issues that may lead to model overfitting. In this study, we investigate the feasibility of creating realistic-looking dermoscopic images using Conditional Generative Adversarial Network (CGAN) techniques; traditional augmentation methods are then used to enlarge the training set and improve the performance of the pre-trained deep models on the skin cancer classification task. This improved performance is compared with models developed on the unbalanced dataset. In addition, we formed an ensemble of the fine-tuned transfer learning models, trained on both balanced and unbalanced datasets, and used it for prediction. With appropriate data augmentation, the proposed models attained accuracies of 92% for VGG16, 92% for ResNet50, and 92.25% for ResNet101; the ensemble of these models increased the accuracy to 93.5%. A comprehensive discussion of the models' performance concludes that this method can enhance skin cancer categorization relative to past efforts.
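One common way to combine fine-tuned classifiers into an ensemble is soft voting over predicted probabilities. The combination rule below is our assumption for illustration, since the text does not spell out how the ensemble was formed.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Soft-voting ensemble: average each model's class probabilities,
    then take the argmax per sample."""
    avg = np.mean(prob_list, axis=0)
    return avg.argmax(axis=1)

# Three hypothetical models scoring two samples over three classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
p3 = np.array([[0.4, 0.5, 0.1], [0.3, 0.4, 0.3]])
print(ensemble_predict([p1, p2, p3]).tolist())    # [0, 1]
```

Averaging probabilities lets a confident correct model outvote two uncertain ones, which is typically why the ensemble edges past its best individual member.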
ABSTRACT
COVID-19 has remained a threat to lives worldwide despite a recent reduction in cases, and there is still a possibility that the virus will evolve and become more contagious; if that happens and we act irresponsibly, the resulting calamity could be worse than before. COVID-19 must be widely screened and recognized early to avert a global epidemic, and positive individuals should be quarantined immediately, as this is the only effective way to prevent a repeat of past global tragedies; no positive case should go unrecognized. However, current COVID-19 detection procedures require significant time for human examination based on genetic and imaging techniques. Apart from RT-PCR and antigen-based tests, CXR and CT imaging aid in the rapid and cost-effective identification of COVID-19, but discriminating between diseased and normal X-rays is a time-consuming and challenging task requiring expert skill. An automatic diagnosis strategy for identifying COVID-19 cases from chest X-ray images is therefore needed. This article uses a deep convolutional neural network, ResNet, which has been demonstrated to be highly effective for image classification. The model is initialized with ResNet weights pretrained on ImageNet; ResNet34, ResNet50, and ResNet101 were implemented and validated against the dataset. Accuracy appeared to improve with deeper networks; nonetheless, our objective was to balance accuracy and training time on a larger dataset. Comparing the prediction outcomes of the three models, we conclude that ResNet34 is the stronger candidate for COVID-19 detection from chest X-rays. The highest accuracy reached 98.34%, higher than the accuracy of other state-of-the-art approaches examined in earlier studies. Subsequent analysis indicated that the incorrect predictions were made with nearly 100% confidence.
This exposes a serious weakness of CNNs, particularly in the medical domain, where critical decisions are made. It could be addressed in future work by modifying the model to incorporate uncertainty into its predictions, allowing medical personnel to manually review the incorrect predictions.
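The near-100% confidence of the wrong predictions is a property of the softmax output layer: large logit gaps saturate the probabilities regardless of whether the argmax is correct. A small sketch with illustrative logits:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# A logit gap of ~11 yields near-certain confidence, even when the
# predicted class is wrong: this is the overconfidence failure mode.
logits = np.array([12.0, 1.0])            # model strongly prefers class 0
conf = softmax(logits)
print(round(float(conf[0]), 4))           # ≈ 1.0
```

This is why calibration or explicit uncertainty estimates (for example, Monte Carlo dropout or deep ensembles) are suggested as follow-up work before deploying such a model clinically.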
Subjects
COVID-19, Deep Learning, Humans, Neural Networks (Computer), SARS-CoV-2, X-Rays
ABSTRACT
Diseases of internal organs other than the vocal folds can also affect a person's voice. As a result, voice problems are on the rise, even though they are frequently overlooked. According to a recent study, voice pathology detection systems can successfully help assess voice abnormalities and enable early diagnosis of voice pathology. Automatic systems for distinguishing healthy from diseased voices have accordingly received much attention for the early identification and diagnosis of voice problems, and artificial-intelligence-assisted voice analysis opens up new possibilities in healthcare. This work assesses the utility of several automatic speech signal analysis methods for diagnosing voice disorders and proposes a strategy for classifying healthy and diseased voices. The proposed framework integrates three voice characteristics: chroma, mel spectrogram, and mel-frequency cepstral coefficients (MFCC). We also designed a deep neural network (DNN) that learns from the retrieved features and produces a highly accurate voice-based disease prediction model. The paper describes a series of experiments using the Saarbruecken Voice Database (SVD) to detect abnormal voices. The model was developed and tested using the vowels /a/, /i/, and /u/ pronounced at high, low, and average pitch. We also retained the "continuous sentence" audio files from SVD to assess how well the developed model generalizes to completely new data. The highest accuracy achieved was 77.49%, superior to prior attempts in the same domain; integrating speaker gender information raises the accuracy to 88.01%, and a model trained on selected diseases reaches a maximum accuracy of 96.77% (cordectomy vs. healthy). The suggested framework is thus a good fit for the healthcare industry.
Subjects
Speech-Language Pathology, Voice Disorders, Voice, Artificial Intelligence, Humans, Neural Networks (Computer), Voice Disorders/diagnosis
ABSTRACT
Automated segmentation of the liver and hepatic lesions is a significant step for studying biomarker characteristics in experimental analysis and in computer-aided diagnosis support schemes. Lesions vary from patient to patient depending on their size, the imaging equipment (such as the contrast setting), and the timing of imaging. With conventional approaches, it is difficult to determine the stage of liver cancer from segmented lesion patterns, and existing algorithms face a number of obstacles in some domains in terms of training accuracy. The proposed work is a system for automatically detecting liver tumours and lesions in abdominal magnetic resonance images using 3D affine-invariant and shape parameterization approaches. This point-to-point parameterization addresses the frequent issues associated with concave surfaces by establishing a standard model level for the organ's surface throughout the modelling process. Initially, the geodesic active contour method is used to separate the liver region from the rest of the body; the segmented tumour area is then fed into Cascaded Fully Convolutional Neural Networks (CFCNs), where the prior liver segmentation helps to minimise the error rate during training. Findings are obtained and validated through stage analysis of the datasets, which comprise training and testing images. The CFCN achieves an accuracy of 94.21 percent for liver tumour analysis, with a computation time of less than 90 seconds per volume.
The trials show an overall training and testing accuracy of 93.85 percent across the various volumes of the 3DIRCAD dataset.
Subjects
Early Detection of Cancer, Liver Neoplasms, Abdomen, Humans, Image Processing (Computer-Assisted)/methods, Liver Neoplasms/diagnostic imaging, Magnetic Resonance Imaging, Neural Networks (Computer)
ABSTRACT
Sign language is the native language of deaf people, which they use in their daily lives and which facilitates communication between them. Sign language refers to the use of the arms and hands to communicate, particularly among those who are deaf, and it varies from person to person and from region to region. As a result, there is no single standardized sign language; American, British, Chinese, and Arabic sign languages, for example, are all distinct. In this study, we trained a model to classify Arabic sign language, which consists of 32 Arabic alphabet sign classes; in images, the sign is detected through the pose of the hand. We propose a framework consisting of two CNN models, each trained individually on the training set, whose final predictions are ensembled to achieve higher accuracy. The dataset used in this study, ArSL2018, was released in 2019 by Prince Mohammad Bin Fahd University, Al Khobar, Saudi Arabia. The main contributions of this study are resizing the images to 64 × 64 pixels, converting the grayscale images to three-channel images, and then applying a median filter, which acts as low-pass filtering to smooth the images and reduce noise, making the model more robust and less prone to overfitting. Each preprocessed image is then fed into two different models, ResNet50 and MobileNetV2, implemented together as an ensemble. On the test set for the whole data, we achieved an accuracy of about 97% after applying these preprocessing techniques, different hyperparameters for each model, and different data augmentation techniques.
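The preprocessing pipeline described above (resize to 64 × 64, expand grayscale to three channels, median-filter) can be sketched in a few lines. Nearest-neighbour resizing and a 3 × 3 median window are illustrative choices, since the abstract does not fix the exact interpolation method or kernel size, and the final averaging of the two models' softmax outputs is one common way to realise the ensemble:

```python
import numpy as np

def preprocess(gray_img, size=64):
    """Resize an 8-bit grayscale image to size x size (nearest neighbour),
    apply a 3x3 median filter, and stack the result to three channels
    so RGB-pretrained backbones such as ResNet50 can consume it."""
    h, w = gray_img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = gray_img[rows[:, None], cols]
    # 3x3 median filter via nine shifted views (edges padded by reflection)
    padded = np.pad(resized, 1, mode="reflect")
    windows = np.stack(
        [padded[r:r + size, c:c + size] for r in range(3) for c in range(3)],
        axis=-1,
    )
    smoothed = np.median(windows, axis=-1).astype(np.uint8)
    return np.repeat(smoothed[..., None], 3, axis=-1)   # (size, size, 3)

def ensemble_predict(probs_a, probs_b):
    """Average the two models' class probabilities and take the argmax."""
    return int(np.argmax((probs_a + probs_b) / 2.0))
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one, which is typically why soft ensembling of the two CNNs edges out either model alone.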
Subjects
Communication Aids for Disabled , Gestures , Computers , Humans , Language , Sign Language , United States
ABSTRACT
Sign language is essential for deaf and mute people to communicate with hearing people and with each other, yet ordinary people tend to ignore its importance as the primary means of communication for the deaf and mute communities. These disabilities cause significant hardships, including unemployment, severe depression, and several other difficulties. One service these communities rely on is sign language interpreters, but hiring interpreters is very costly, so a cheaper solution is needed. Therefore, a system has been developed that uses a visual hand dataset based on Arabic Sign Language and interprets this visual data as textual information. The dataset consists of 54,049 images of Arabic sign language alphabets, with 1,500 images per class, where each class represents a different meaning through its hand gesture or sign. Various preprocessing and data augmentation techniques were applied to the images, and experiments were performed with various pretrained models on this dataset. Most of them performed reasonably, and in the final stage the EfficientNetB4 model was considered the best fit. Given the complexity of the dataset, models other than EfficientNetB4 did not perform well because of their lightweight architectures; EfficientNetB4 is a heavier architecture with comparatively greater capacity. The best model achieved a training accuracy of 98 percent and a testing accuracy of 95 percent.
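The abstract does not specify which augmentations were applied, so the sketch below shows two common label-preserving choices for hand-sign images: a small horizontal shift and brightness jitter. Horizontal flipping is deliberately omitted, since mirroring a hand can change which sign is shown; the shift and jitter ranges are assumptions, not the study's settings:

```python
import numpy as np

def augment(img, rng):
    """Label-preserving augmentations for 8-bit hand-sign images:
    a random horizontal shift of up to 10% of the width and a
    brightness jitter in [-20, 20]. Flips are avoided because
    mirroring a hand can change the sign's meaning."""
    max_shift = max(img.shape[1] // 10, 1)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(img, shift, axis=1)             # horizontal shift
    delta = int(rng.integers(-20, 21))                # brightness jitter
    return np.clip(shifted.astype(np.int16) + delta, 0, 255).astype(np.uint8)
```

Computing in int16 before clipping avoids uint8 wrap-around when the jitter pushes a pixel past 0 or 255.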
Subjects
Deafness , Sign Language , Gestures , Humans , Language , Machine Learning
ABSTRACT
Emotions play an essential role in human relationships, and many real-time applications rely on interpreting the speaker's emotion from their speech. Speech emotion recognition (SER) modules aid human-computer interface (HCI) applications, but they are challenging to implement because of the lack of balanced training data and of clarity about which features are sufficient for categorization. This research discusses the impact of the classification approach, the most appropriate combination of features, and data augmentation on speech emotion detection accuracy. Selecting the correct combination of handcrafted features for the classifier plays an integral part in reducing computational complexity. The suggested classification model, a 1D convolutional neural network (1D CNN), outperforms traditional machine learning approaches in classification. Unlike most earlier studies, which examined emotions primarily through the lens of a single language, our analysis covers data sets in multiple languages. With the most discriminating features and data augmentation, our technique achieves 97.09%, 96.44%, and 83.33% accuracy on the BAVED, ANAD, and SAVEE data sets, respectively.
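The 1D-CNN classifier can be illustrated with a minimal forward pass over a frame-by-feature matrix (for example, MFCC frames extracted from an utterance). The layer sizes, random weights, and seven-class output below are placeholders for the sketch, not the paper's trained configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels, bias):
    """Valid 1-D convolution with ReLU: x is (T, C_in),
    kernels is (K, C_in, C_out), output is (T-K+1, C_out)."""
    K = kernels.shape[0]
    windows = np.stack([x[t:t + K] for t in range(x.shape[0] - K + 1)])
    return np.maximum(np.tensordot(windows, kernels, axes=([1, 2], [0, 1])) + bias, 0.0)

def classify(features, n_classes=7):
    """Tiny 1D-CNN forward pass over (frames, n_features):
    conv -> ReLU -> global max pooling -> linear -> softmax."""
    k1 = rng.normal(0, 0.1, (5, features.shape[1], 16))
    b1 = np.zeros(16)
    w = rng.normal(0, 0.1, (16, n_classes))
    b = np.zeros(n_classes)
    h = conv1d(features, k1, b1).max(axis=0)   # global max pooling over time
    logits = h @ w + b
    p = np.exp(logits - logits.max())          # numerically stable softmax
    return p / p.sum()
```

Global max pooling over the time axis is what makes the classifier indifferent to utterance length: whatever number of frames comes in, the dense layer always sees a fixed-size vector.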