ABSTRACT
Accurate and fast recognition of vehicle license plates from natural scene images is a crucial and challenging task. Existing methods can recognize license plates in simple scenarios, but their performance degrades significantly in complex environments. A novel license plate detection and recognition model, YOLOv5-PDLPR, is proposed: it employs the YOLOv5 object detection algorithm for license plate detection and the PDLPR algorithm proposed in this paper for license plate recognition. The PDLPR algorithm is designed as follows: (1) a Multi-Head Attention mechanism is used to accurately recognize individual characters; (2) a global feature extractor network is designed to improve the completeness of feature extraction; (3) the latest parallel decoder architecture is adopted to improve inference efficiency. The experimental results show that the proposed algorithm has better accuracy and speed than the comparison algorithms, can achieve real-time recognition, and has high efficiency and robustness in complex scenes.
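As a rough illustration of the parallel-decoder idea, the sketch below builds a cross-attention decoder in PyTorch in which one learned query per character position attends to the plate feature map, so all characters are predicted in parallel; the dimensions, vocabulary size, and layer layout are assumptions for the example, not the published PDLPR architecture:

# Minimal sketch (assumptions: flattened CNN features of the plate crop, fixed maximum plate
# length, toy character vocabulary); not the authors' PDLPR implementation.
import torch
import torch.nn as nn

class ParallelPlateDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=8, max_chars=8, vocab_size=68):
        super().__init__()
        # One learned query per character position -> all positions decoded in parallel.
        self.queries = nn.Parameter(torch.randn(max_chars, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, feats):                      # feats: (B, N, d_model) flattened plate features
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        ctx, _ = self.attn(q, feats, feats)        # cross-attention: queries attend to the feature map
        return self.classifier(ctx)                # (B, max_chars, vocab_size) character logits

feats = torch.randn(2, 64, 256)                    # e.g., an 8x8 feature grid from the detected crop
logits = ParallelPlateDecoder()(feats)
print(logits.shape)                                # torch.Size([2, 8, 68])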
ABSTRACT
BACKGROUND: GPR151 is a protein belonging to the G protein-coupled receptor family that is closely associated with a variety of physiological and pathological processes. Several studies have demonstrated the potential of GPR151 as a therapeutic target for the management of metabolic disorders, highlighting the need to explore its activators further. Activity prediction serves as a vital preliminary step in drug discovery, which is both costly and time-consuming; the development of reliable activity classification models has therefore become essential for improving the efficiency of virtual screening. RESULTS: We propose a learning-based method, built on a feature extractor and a deep neural network, to predict the activity of GPR151 activators. We first introduce a new molecular feature extraction algorithm that uses the bag-of-words idea from natural language processing to densify the sparse fingerprint vector. The Mol2vec method is also used to extract diverse features. We then construct three classical feature selection algorithms and three types of deep learning model to enhance the molecular representations, and predict activity labels with five different classifiers. Experiments on our own dataset of GPR151 activators demonstrate high classification accuracy and stability, with the optimal model, Mol2vec-CNN, significantly improving performance across multiple classifiers. The SVM classifier achieves the best accuracy of 0.92 and an F1 score of 0.76, which indicates promising applications of our method in the field of activity prediction. CONCLUSION: The results suggest that the experimental design of this study is appropriate and well-conceived. The deep learning-based feature extraction algorithm established in this study outperforms traditional feature selection algorithms for activity prediction, and the resulting model can be effectively used in the pre-screening stage of virtual drug screening.
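The following toy sketch illustrates one way a sparse fingerprint could be densified into a shorter count vector before classification with an SVM; the folding scheme, the random data, and the hyperparameters are illustrative assumptions, not the paper's bag-of-words algorithm or its Mol2vec pipeline:

# Minimal sketch (assumed data: binary fingerprint matrix X_fp and activity labels y);
# the densification below simply folds consecutive groups of bits into counts.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_fp = (rng.random((200, 2048)) > 0.95).astype(float)   # sparse 2048-bit fingerprints (toy)
y = rng.integers(0, 2, 200)                              # activity labels (toy)

# Densify: 2048 bits -> 128 buckets, each counting the active bits in a group of 16.
n_buckets = 128
X_dense = X_fp.reshape(len(X_fp), n_buckets, -1).sum(axis=2)

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, X_dense, y, cv=5).mean())     # toy accuracy; real data needed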
Subjects
Deep Learning, Neural Networks (Computer), Algorithms, Drug Discovery, Drug Evaluation, Preclinical
ABSTRACT
Tissue phenotyping is a fundamental step in computational pathology for analyzing the tumor micro-environment in whole slide images (WSIs). Automatic tissue phenotyping in WSIs of colorectal cancer (CRC) assists pathologists in better cancer grading and prognostication. In this paper, we propose a novel algorithm for identifying distinct tissue components in colon cancer histology images by blending a comprehensive learning system with deep feature extraction. First, we extract features from a pre-trained VGG19 network, which are then transformed into a mapped feature space for the generation of enhancement nodes. Using both the mapped features and the enhancement nodes, the proposed algorithm classifies seven distinct tissue components: stroma, tumor, complex stroma, necrotic tissue, normal benign tissue, lymphocytes, and smooth muscle. To validate the proposed model, experiments are performed on two publicly available colorectal cancer histology datasets. Our approach achieves a remarkable performance boost, surpassing existing state-of-the-art methods by (1.3% AvTP, 2% F1) and (7% AvTP, 6% F1) on CRCD-1 and CRCD-2, respectively.
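A minimal sketch of the overall idea, assuming a torchvision VGG19 backbone and randomly projected mapped/enhancement nodes in a broad-learning style; the node sizes and the final linear head are placeholders, not the authors' configuration:

# Minimal sketch; the random mapped/enhancement nodes only illustrate the idea, not the paper's code.
import torch
import torch.nn as nn
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
backbone = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())  # 512-d deep features

x = torch.randn(4, 3, 224, 224)             # a batch of tissue patches (toy input)
with torch.no_grad():
    deep = backbone(x)                      # (4, 512)

W_map = torch.randn(512, 256)               # mapped-feature nodes (random projection + tanh)
mapped = torch.tanh(deep @ W_map)
W_enh = torch.randn(256, 128)               # enhancement nodes built on the mapped features
enhanced = torch.tanh(mapped @ W_enh)

head = nn.Linear(256 + 128, 7)              # 7 tissue components
logits = head(torch.cat([mapped, enhanced], dim=1))
print(logits.shape)                         # torch.Size([4, 7])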
Subjects
Algorithms, Colorectal Neoplasms, Humans, Learning, Pathologists, Colorectal Neoplasms/diagnostic imaging, Tumor Microenvironment
ABSTRACT
With the increasing popularity of electric vehicles, cable-driven serial manipulators have been applied to auto-charging processes for electric vehicles. To ensure the safety of physical vehicle-robot interaction in this scenario, this paper presents a model-independent collision localization and classification method for cable-driven serial manipulators. First, based on the dynamic characteristics of the manipulator, data sets of terminal collisions are constructed. In contrast to approaches that rely on torque-sensor signals, our data sets comprise the vibration signals of a specific compensator. The collected data sets are then used to construct and train our collision localization and classification model, which consists of a double-layer CNN and an SVM. Compared to previous works, the proposed method can extract features without manual intervention and can deal with collisions on irregular contact surfaces. Furthermore, it generates the location and the classification of the collision at the same time. Simulation results show the validity of the proposed collision localization and classification method, with promising prediction accuracy.
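A compact sketch of the feature-extractor-plus-classifier pairing, assuming fixed-length vibration windows from the compensator; the two convolutional layers and the four output classes are illustrative, not the paper's exact model:

# Minimal sketch: a two-layer 1D CNN produces features that an SVM then classifies.
import torch
import torch.nn as nn
from sklearn.svm import SVC

cnn = nn.Sequential(                        # two convolutional layers as the feature extractor
    nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)

signals = torch.randn(100, 1, 512)          # 100 toy vibration windows
labels = torch.randint(0, 4, (100,))        # e.g., 4 collision location/type classes (toy)

with torch.no_grad():
    feats = cnn(signals).numpy()            # (100, 32) learned features

svm = SVC(kernel="rbf").fit(feats, labels.numpy())
print(svm.score(feats, labels.numpy()))     # training accuracy on toy data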
Subjects
Support Vector Machine
ABSTRACT
The absence of labeled samples limits the development of speech emotion recognition (SER). Data augmentation is an effective way to address sample sparsity, but there is a lack of research on data augmentation algorithms in the field of SER. In this paper, the effectiveness of classical acoustic data augmentation methods for SER is analyzed, and on this basis a strongly generalized speech emotion recognition model built on effective data augmentation is proposed. The model uses a multi-channel feature extractor consisting of multiple sub-networks to extract emotional representations. Different kinds of augmented data that effectively improve SER performance are fed into the sub-networks, and the emotional representations are obtained by a weighted fusion of the output feature maps of each sub-network. To make the model robust to unseen speakers, we employ adversarial training to generalize the emotion representations: a discriminator estimates the Wasserstein distance between the feature distributions of different speakers and forces the feature extractor to learn speaker-invariant emotional representations. Simulation results on the IEMOCAP corpus show that the proposed method is 2-9% ahead of related SER algorithms, which proves its effectiveness.
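A minimal sketch of the adversarial step, assuming a simple critic that estimates the Wasserstein distance between the feature distributions of two speakers (weight clipping or a gradient penalty, needed for a proper Wasserstein critic, is omitted); all layer sizes are placeholders, not the published model:

# Minimal sketch: critic maximizes the estimated distance, extractor minimizes it.
import torch
import torch.nn as nn

extractor = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 32))   # emotion features
critic = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))       # speaker critic

opt_f = torch.optim.Adam(extractor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)

x_spk_a = torch.randn(64, 40)    # toy acoustic features from speaker A
x_spk_b = torch.randn(64, 40)    # toy acoustic features from speaker B

# Critic step: maximize the estimated Wasserstein distance between the two speakers.
f_a, f_b = extractor(x_spk_a).detach(), extractor(x_spk_b).detach()
w_dist = critic(f_a).mean() - critic(f_b).mean()
opt_c.zero_grad(); (-w_dist).backward(); opt_c.step()

# Extractor step: minimize the same distance so the features become speaker-invariant.
w_dist = critic(extractor(x_spk_a)).mean() - critic(extractor(x_spk_b)).mean()
opt_f.zero_grad(); w_dist.backward(); opt_f.step()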
ABSTRACT
Video surveillance-based intrusion detection has been widely used in modern railway systems. Objects inside the alarm region, i.e., the track area, can be detected by image processing algorithms. With the increasing number of surveillance cameras, manually labeling alarm regions for each camera has become time-consuming and is sometimes not feasible at all, especially for pan-tilt-zoom (PTZ) cameras, which may change their monitoring area at any time. To automatically label the track area for all cameras, the video surveillance system requires an accurate track segmentation algorithm with a small memory footprint and a short inference delay. In this paper, we propose an adaptive segmentation algorithm that delineates the boundary of the track area with a very light computational burden. The proposed algorithm includes three steps. First, the image is segmented into fragmented regions; to reduce redundant calculation in evaluating the boundary weights used to generate these regions, an optimal set of Gaussian kernels with adaptive directions for each specific scene is computed using the Hough transform. Second, the fragmented regions are combined into local areas using a new clustering rule based on each region's boundary weight and size. Finally, a classification network is used to recognize the track area among all local areas. To achieve fast and accurate classification, a simplified CNN is designed using pre-trained convolution kernels and a loss function that enhances the diversity of the feature maps. Experimental results show that the proposed method finds an effective balance between segmentation precision, calculation time, and the hardware cost of the system.
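The toy sketch below illustrates the first step under simple assumptions: OpenCV's Hough transform estimates the dominant line directions in a frame, and anisotropic Gaussian kernels are then oriented along those directions; it is not the authors' kernel-selection procedure:

# Minimal sketch (assumptions: a grayscale frame; a synthetic line stands in for a rail edge).
import cv2
import numpy as np

img = np.zeros((240, 320), dtype=np.uint8)
cv2.line(img, (40, 220), (280, 20), 255, 3)                  # toy "rail" edge

edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=80)   # (rho, theta) pairs
thetas = [float(l[0][1]) for l in lines] if lines is not None else [0.0]

def oriented_gaussian(theta, size=15, sigma_u=4.0, sigma_v=1.0):
    # Anisotropic Gaussian elongated along direction theta.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    u = xx * np.cos(theta) + yy * np.sin(theta)
    v = -xx * np.sin(theta) + yy * np.cos(theta)
    k = np.exp(-(u**2 / (2 * sigma_u**2) + v**2 / (2 * sigma_v**2)))
    return k / k.sum()

kernels = [oriented_gaussian(t) for t in set(np.round(thetas, 2))]
print(len(kernels), kernels[0].shape)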
ABSTRACT
Because of the intricate topological structure and connectivity of the human brain, extracting deep spatial features from electroencephalograph (EEG) signals is a challenging and time-consuming task. The extraction of topological spatial information plays a crucial role in EEG classification, and the architecture of the spatial convolution greatly affects the performance and complexity of convolutional neural network (CNN) based EEG classification models. In this study, a progressive convolution CNN architecture named EEGProgress is proposed, aiming to efficiently extract the topological spatial information of EEG signals at multiple scales (electrode, brain region, hemisphere, global) with superior speed. To achieve this, the raw EEG data is permuted using an empirical topological permutation rule, integrating the EEG data with numerous topological properties. Subsequently, the spatial features are extracted by a progressive feature extractor comprising prior, electrode, region, and hemisphere convolution blocks, which progressively extract deep spatial features with fewer parameters and higher speed. Finally, comparison and ablation experiments under both cross-subject and within-subject scenarios are conducted on a public dataset to verify the performance of the proposed EEGProgress and the effectiveness of the topological permutation. The results demonstrate the superior feature extraction ability of EEGProgress, with an average increase of 4.02% over other CNN-based EEG classification models under both cross-subject and within-subject scenarios. Furthermore, in terms of average testing time, FLOPs, and parameters, EEGProgress also outperforms the comparison models in model complexity.
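A minimal sketch of a progressive convolution stack, assuming the electrodes have already been permuted so that contiguous rows belong to the same region and hemisphere; the block sizes (8 electrodes per region, 4 regions per hemisphere, 2 hemispheres) and class count are placeholders, not the EEGProgress configuration:

# Minimal sketch: convolutions merge electrodes -> regions -> hemispheres -> global.
import torch
import torch.nn as nn

B, n_elec, T = 8, 64, 256
x = torch.randn(B, 1, n_elec, T)            # (batch, 1, electrodes, time), topologically permuted

progress = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(1, 7), padding=(0, 3)),     # electrode-level temporal filtering
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=(8, 1), stride=(8, 1)),      # region level: merge 8 electrodes
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=(4, 1), stride=(4, 1)),     # hemisphere level: merge 4 regions
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=(2, 1)),                    # global level: merge 2 hemispheres
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 4),  # 4 toy classes
)
print(progress(x).shape)                    # torch.Size([8, 4])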
Subjects
Brain, Neural Networks (Computer), Humans, Electrodes, Electroencephalography
ABSTRACT
Introduction: Nowadays, deep learning and convolutional neural networks (CNNs) have become widespread tools in many biomedical engineering studies. A CNN is an end-to-end tool that makes the processing procedure integrated, but in some situations it needs to be fused with machine learning methods to become more accurate. Methods: In this paper, a hybrid approach based on deep features extracted from the weighted layers of wavelet CNNs (WCNNs) and a multiclass support vector machine (MSVM) was proposed to improve the recognition of emotional states from electroencephalogram (EEG) signals. First, EEG signals were preprocessed and converted to a time-frequency (T-F) color representation, or scalogram, using the continuous wavelet transform (CWT). Then, the scalograms were fed into four popular pre-trained CNNs (AlexNet, ResNet-18, VGG-19, and Inception-v3) to fine-tune them, and the best feature layer from each was used as input to the MSVM to classify the four quadrants of the valence-arousal model. Finally, the subject-independent leave-one-subject-out criterion was used to evaluate the proposed method on the DEAP and MAHNOB-HCI databases. Results: Extracting deep features from an early convolutional layer of ResNet-18 (Res2a) and classifying them with the MSVM increased the average accuracy, precision, and recall by about 20% and 12% for the MAHNOB-HCI and DEAP databases, respectively. Also, combining scalograms from the four regions of pre-frontal, frontal, parietal, and parietal-occipital and from the two regions of frontal and parietal achieved the highest average accuracies of 77.47% and 87.45% for the MAHNOB-HCI and DEAP databases, respectively. Conclusion: Combining a CNN and an MSVM improved emotion recognition from EEG signals, and the results are comparable to state-of-the-art studies.
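A rough sketch of the pipeline, assuming PyWavelets for the CWT scalogram, a torchvision ResNet-18 as the pre-trained feature extractor, and a one-vs-rest SVC standing in for the MSVM; the signal, scales, and layer choice are illustrative, not the study's exact settings:

# Minimal sketch: scalogram -> pre-trained CNN features -> SVM classification.
import numpy as np
import pywt
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

sig = np.random.randn(1024)                               # one toy EEG channel
coeffs, _ = pywt.cwt(sig, scales=np.arange(1, 65), wavelet="morl")
scalogram = np.abs(coeffs)                                 # (64 scales, 1024 samples)

img = torch.tensor(scalogram, dtype=torch.float32)
img = img.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)        # pseudo-RGB: (1, 3, 64, 1024)

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = nn.Identity()                                  # use the 512-d penultimate features
with torch.no_grad():
    feat = resnet(img).numpy()                             # (1, 512)

svm = SVC(decision_function_shape="ovr")                   # would be fit on many such features
print(feat.shape)                                          # with valence-arousal quadrant labels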
ABSTRACT
Skin cancer is among the most common cancer types worldwide. Automatic identification of skin cancer is complicated because of the poor contrast and apparent resemblance between skin and lesions. The rate of human death can be significantly reduced if melanoma skin cancer is detected quickly from dermoscopy images. This research applies an anisotropic diffusion filtering method to dermoscopy images to remove multiplicative speckle noise, and the fast bounding box (FBB) method is then applied to segment the skin cancer region. We also employ two feature extractors to represent the images. The first is the Hybrid Feature Extractor (HFE), and the second is a VGG19-based convolutional neural network (CNN). The HFE combines three feature extraction approaches, namely Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), and Speeded-Up Robust Features (SURF), into a single fused feature vector. The CNN is also used to extract additional features from the test and training datasets. The two feature vectors are then fused to build the classification model. The proposed method is evaluated on two datasets, ISIC 2017 and the Academic Torrents dataset, and achieves 99.85%, 91.65%, and 95.70% in terms of accuracy, sensitivity, and specificity, respectively, making it more successful than previously proposed machine learning algorithms.
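The sketch below illustrates hand-crafted feature fusion with scikit-image, combining a HOG descriptor and an LBP histogram into one vector; SURF is omitted because it is not available in scikit-image, and the CNN branch is left out for brevity, so this is only an approximation of the HFE idea:

# Minimal sketch: fuse HOG and LBP-histogram features from one lesion patch.
import numpy as np
from skimage.feature import hog, local_binary_pattern

patch = (np.random.rand(128, 128) * 255).astype(np.uint8)   # toy grayscale lesion patch

hog_vec = hog(patch, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

fused = np.concatenate([hog_vec, lbp_hist])                  # single fused feature vector
print(fused.shape)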
ABSTRACT
Owing to its superior performance, the Transformer model, based on the 'Encoder-Decoder' paradigm, has become the mainstream model in natural language processing. Meanwhile, bioinformatics has embraced machine learning, leading to remarkable progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are a type of permeable protein that acts as a convenient 'postman' in drug penetration tasks. However, only a few CPPs have been discovered, limiting their practical applications in drug permeability. CPPs have led to a new approach that enables the uptake of only the macromolecule into cells (i.e., without other potentially harmful materials found in the drug). Most previous studies have used trivial machine learning techniques and hand-crafted features to construct simple classifiers. CPPFormer was constructed by implementing the attention structure of the Transformer, rebuilding the network around the short length of CPPs, and using an automatic feature extractor together with a few manually engineered features to co-direct the predicted results. Compared with all previous methods and other classic text classification models, the empirical results show that our proposed deep model-based method achieves the best performance, with an accuracy of 92.16% on the CPP924 dataset, and passes various index tests.
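A minimal sketch of an attention-based encoder classifier for short peptides in PyTorch; the vocabulary, model size, pooling, and two-class head are assumptions in the spirit of CPPFormer, not the published network:

# Minimal sketch (assumptions: 20 amino-acid tokens plus padding, short fixed maximum length).
import torch
import torch.nn as nn

class PeptideClassifier(nn.Module):
    def __init__(self, vocab=21, d_model=64, n_heads=4, n_layers=2, max_len=60):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model, padding_idx=0)
        self.pos = nn.Parameter(torch.randn(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)           # CPP vs. non-CPP

    def forward(self, tokens):                      # tokens: (B, L) integer-encoded residues
        h = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        h = self.encoder(h, src_key_padding_mask=(tokens == 0))
        return self.head(h.mean(dim=1))             # mean-pool over residues

toy = torch.randint(1, 21, (4, 30))                 # 4 toy peptides of length 30
print(PeptideClassifier()(toy).shape)               # torch.Size([4, 2])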
Subjects
Cell-Penetrating Peptides, Biological Transport, Cell-Penetrating Peptides/chemistry, Computational Biology/methods, Drug Design, Machine Learning
ABSTRACT
BACKGROUND AND OBJECTIVE: Cerebral vascular accident (CVA), also known as stroke, is an important health problem that affects 16 million people worldwide every year. About 30% of those who have a stroke die, and 40% are left with serious physical limitations. However, recovery of the damaged region is possible if treatment is performed immediately. In the case of a stroke, Computed Tomography (CT) is the most appropriate technique to confirm the occurrence and to investigate its extent and severity. Stroke is an emergency in which early identification and intervention are difficult; computer-aided diagnosis (CAD) can therefore play an important role by providing information imperceptible to the human eye. Thus, this work proposes a new method for extracting features based on radiological density patterns of the brain, called Analysis of Brain Tissue Density (ABTD). METHODS: The proposed method is a specific approach applied to CT images to identify and classify the occurrence of stroke. The results of the ABTD extractor proposed in this paper were compared with those of extractors already established in the literature, such as features from the Gray-Level Co-Occurrence Matrix (GLCM), Local Binary Patterns (LBP), Central Moments (CM), Statistical Moments (SM), Hu's Moments (HM), and Zernike's Moments (ZM). Using a database of 420 CT images of the skull, each extractor was applied with MLP, SVM, kNN, OPF, and Bayesian classifiers to classify whether a CT image represented a healthy brain or one with an ischemic or hemorrhagic stroke. RESULTS: ABTD had the shortest extraction time and the highest average accuracy (99.30%) when combined with OPF using the Euclidean distance. Moreover, the average accuracy values for all classifiers were higher than 95%. CONCLUSIONS: The relevance of the results demonstrates that the ABTD method is a useful algorithm for extracting features and can potentially be integrated with CAD systems to assist in stroke diagnosis.
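As a toy illustration of density-based features, the sketch below bins CT intensities into a handful of density ranges and classifies the resulting histograms; the bin edges, the synthetic data, and the use of kNN in place of OPF are all assumptions, not the published ABTD definition:

# Minimal sketch: density-range histograms as features, classified with kNN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def density_histogram(ct_slice, bins=(-20, 0, 20, 40, 60, 80, 100)):
    # Fraction of pixels falling in each density range (toy feature, Hounsfield-like units).
    hist, _ = np.histogram(ct_slice, bins=bins)
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)
X = np.array([density_histogram(rng.normal(35, 15, (64, 64))) for _ in range(120)])
y = rng.integers(0, 3, 120)                         # healthy / ischemic / hemorrhagic (toy labels)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.score(X, y))                              # training accuracy on toy data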
Subjects
Brain/diagnostic imaging, Skull/diagnostic imaging, Stroke/diagnostic imaging, Tomography, X-Ray Computed, Algorithms, Bayes Theorem, Humans
ABSTRACT
The rapid development of many open-source and commercial image editing tools makes the authenticity of digital images questionable. Copy-move forgery is one of the most widely used tampering techniques to create desirable objects or conceal undesirable objects in a scene. Existing techniques reported in the literature to detect such tampering aim to improve robustness against JPEG compression, blurring, noise, and other post-processing operations, which are frequently used to conceal tampering and reduce tampering clues. A robust method based on color moments and five other image descriptors is proposed in this paper. The method divides the image into fixed-size overlapping blocks, and a clustering operation divides the entire search space into smaller pieces with similar color distributions. Blocks from the tampered regions will reside within the same cluster, since both the copied and the moved regions have similar color distributions. Five image descriptors are used to extract block features, which makes the method more robust to post-processing operations. An ensemble of deep compositional pattern-producing neural networks is trained on these extracted features, and similarity among feature vectors within clusters indicates possible forged regions. Experimental results show that the proposed method can detect copy-move forgery even if an image has been distorted by gamma correction, additive white Gaussian noise, JPEG compression, or blurring.
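A small sketch of the block-features-plus-clustering stage, using only the first three color moments and k-means; the block size, step, and cluster count are placeholders, and the five additional descriptors and the neural ensemble are omitted:

# Minimal sketch: overlapping blocks -> color moments -> clusters of similar color distribution.
import numpy as np
from scipy.stats import skew
from sklearn.cluster import KMeans

img = np.random.rand(64, 64, 3)                     # toy RGB image
block, step = 16, 8
feats, coords = [], []
for r in range(0, img.shape[0] - block + 1, step):
    for c in range(0, img.shape[1] - block + 1, step):
        patch = img[r:r + block, c:c + block].reshape(-1, 3)
        m = [patch.mean(axis=0), patch.std(axis=0), skew(patch, axis=0)]
        feats.append(np.concatenate(m))              # 9 color moments per block
        coords.append((r, c))

labels = KMeans(n_clusters=8, n_init=10).fit_predict(np.array(feats))
# Blocks sharing a cluster have similar color distributions; matching within each cluster
# (rather than across the whole image) narrows the search for copied-and-moved regions.
print(len(coords), labels[:10])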
ABSTRACT
Sensory stimuli are usually composed of different features (the what) appearing at irregular times (the when). Neural responses often use spike patterns to represent sensory information. The what is hypothesized to be encoded in the identity of the elicited patterns (the pattern categories), and the when, in the time positions of patterns (the pattern timing). However, this standard view is oversimplified. In the real world, the what and the when might not be separable concepts, for instance, if they are correlated in the stimulus. In addition, neuronal dynamics can condition the pattern timing to be correlated with the pattern categories. Hence, timing and categories of patterns may not constitute independent channels of information. In this paper, we assess the role of spike patterns in the neural code, irrespective of the nature of the patterns. We first define information-theoretical quantities that allow us to quantify the information encoded by different aspects of the neural response. We also introduce the notion of synergy/redundancy between time positions and categories of patterns. We subsequently establish the relation between the what and the when in the stimulus with the timing and the categories of patterns. To that aim, we quantify the mutual information between different aspects of the stimulus and different aspects of the response. This formal framework allows us to determine the precise conditions under which the standard view holds, as well as the departures from this simple case. Finally, we study the capability of different response aspects to represent the what and the when in the neural response.
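As a toy illustration of the kind of quantities involved, the sketch below computes the mutual information between a binary stimulus feature and the category, the timing, and the joint (category, timing) description of the response, together with a simple synergy/redundancy index as their difference; these are standard discrete estimates on synthetic data, not the paper's formal framework:

# Minimal sketch: discrete mutual information between stimulus "what" and response aspects.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
what = rng.integers(0, 2, 5000)                        # stimulus identity ("what")
category = (what + rng.integers(0, 2, 5000)) % 2       # pattern category, correlated with "what"
timing = rng.integers(0, 4, 5000)                      # pattern timing bin (independent here)

i_cat = mutual_info_score(what, category)              # information carried by pattern categories
i_time = mutual_info_score(what, timing)               # information carried by pattern timing
joint = category * 4 + timing                          # joint (category, timing) response label
i_joint = mutual_info_score(what, joint)

synergy = i_joint - i_cat - i_time                     # >0 suggests synergy, <0 redundancy
print(f"I(cat)={i_cat:.3f}  I(time)={i_time:.3f}  I(joint)={i_joint:.3f}  syn={synergy:.3f}")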