ABSTRACT
In recent years, embedded system technologies and products for sensor networks and wearable devices that monitor people's activities and health have become a focus of the global IT industry. To enhance the speech recognition capabilities of wearable devices, this article discusses the implementation of audio positioning and enhancement on embedded systems, using embedded algorithms for direction detection and mixed-source separation. The two algorithms were implemented on different embedded platforms: direction detection on a TI TMS320C6713 DSK and mixed-source separation on a Raspberry Pi 2. For mixed-source separation, in the first experiment, the average signal-to-interference ratio (SIR) at 1 m and 2 m distances was 16.72 and 15.76, respectively. In the second experiment, evaluated using speech recognition, the algorithm improved speech recognition accuracy to 95%.
Subject(s)
Algorithms , Wearable Electronic Devices , Humans , Signal Processing, Computer-Assisted , Sound Localization
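The abstract does not name the direction-detection algorithm, so the sketch below shows only a common baseline: two-microphone time-difference-of-arrival (TDOA) estimation with GCC-PHAT, the kind of building block typically ported to a DSP such as the TMS320C6713. All names and parameters here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative baseline for two-microphone direction detection:
# GCC-PHAT estimates the time delay between channels, from which an
# arrival angle follows. This is a standard method, assumed here for
# illustration only.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref`."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Direction from delay, for a two-mic array with spacing d (meters):
# angle = arcsin(c * tau / d), with speed of sound c ~ 343 m/s.
```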
ABSTRACT
BACKGROUND: There is little published research on the impact of the first wave of the COVID-19 pandemic in Taiwan. We investigated mortality risk factors among critically ill patients with COVID-19 in Taiwan during the initial wave and aimed to develop a novel AI mortality prediction model using chest X-ray (CXR) images alone. METHOD: We retrospectively reviewed the medical records of patients with COVID-19 at Taipei Tzu Chi Hospital from May 15 to July 15, 2021, enrolling adult patients who received invasive mechanical ventilation. The CXR images of each enrolled patient were divided into four categories (1st, pre-ETT, ETT, and WORST). To establish a prediction model, we used the MobileNetV3-Small model with ImageNet pretrained weights, followed by high-dropout regularization layers, and trained it with five-fold cross-validation to evaluate model performance. RESULT: A total of 64 patients were enrolled. The overall mortality rate was 45%. The median time from symptom onset to intubation was 8 days. Vasopressor use and a higher BRIXIA score on the WORST CXR were associated with an increased risk of mortality. The areas under the curve for the 1st, pre-ETT, ETT, and WORST CXRs by the AI model were 0.87, 0.92, 0.96, and 0.93, respectively. CONCLUSION: The mortality rate of COVID-19 patients who received invasive mechanical ventilation was high. Septic shock and a high BRIXIA score were clinical predictors of mortality. The novel AI mortality prediction model using CXR alone exhibited high performance.
Subject(s)
COVID-19 , Adult , Humans , Pandemics , Prognosis , Retrospective Studies , X-Rays , Artificial Intelligence
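A minimal sketch of the described model family: MobileNetV3-Small with ImageNet pretrained weights followed by a high-dropout classification head, written with the Keras API. The input size, dropout rate, and head layout are assumptions; the abstract does not give the exact configuration.

```python
# Sketch of a CXR mortality classifier in the spirit of the abstract:
# ImageNet-pretrained MobileNetV3-Small backbone plus a heavy-dropout
# binary head. Hyperparameters are illustrative assumptions.
import tensorflow as tf

def build_model(input_shape=(224, 224, 3), dropout=0.5):
    base = tf.keras.applications.MobileNetV3Small(
        weights="imagenet", include_top=False, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(dropout)(x)   # high-dropout regularization
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(mortality)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model
```

For five-fold cross-validation, build_model() would be called once per fold, e.g. inside a sklearn.model_selection.KFold loop over the patients.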
ABSTRACT
An electrocardiogram (ECG) is a basic, quick test for evaluating cardiac disorders and is crucial for remote patient monitoring equipment. Accurate ECG signal classification is critical for real-time measurement, analysis, archiving, and transmission of clinical data. Numerous studies have focused on accurate heartbeat classification, and deep neural networks have been suggested for better accuracy and simplicity. We investigated a new model for ECG heartbeat classification and found that it surpasses state-of-the-art models, achieving accuracy scores of 98.5% on the PhysioNet MIT-BIH dataset and 98.28% on the PTB database. Furthermore, our model achieves an F1-score of approximately 86.71%, outperforming other models, such as MINA, CRNN, and EXpertRF, on the PhysioNet Challenge 2017 dataset.
Subject(s)
Arrhythmias, Cardiac , Myocardial Infarction , Electrocardiography , Heart Rate , Arrhythmias, Cardiac/physiopathology , Myocardial Infarction/physiopathology , Humans , Machine Learning
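The abstract does not describe the architecture, so the following is only a generic sketch of a 1D-CNN heartbeat classifier of the kind commonly trained on MIT-BIH-style beat segments; the layer sizes and the five-class output are illustrative assumptions, not the paper's model.

```python
# Generic 1D-CNN heartbeat classifier sketch (assumed architecture).
# Input: fixed-length single-lead beat segments, shape (batch, 1, n).
import torch
import torch.nn as nn

class BeatCNN(nn.Module):
    def __init__(self, n_classes=5):          # e.g., AAMI beat classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),           # length-independent pooling
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, 1, n_samples)
        return self.classifier(self.features(x).squeeze(-1))
```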
ABSTRACT
Recently, self-supervised learning methods have proven powerful and efficient at yielding robust representations by maximizing the similarity between different augmented views in an embedding vector space. The main challenge, however, is that with random cropping the semantic content may differ across views, making a naive similarity-maximization objective inappropriate. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on a joint embedding architecture in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. HARL adopts prior visual object-level attention by generating a heuristic mask proposal for each training image and maximizes similarity between object-level embeddings rather than whole-image representations as in previous work. As a result, HARL extracts high-quality semantic representations from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural image datasets. HARL achieves a +1.3% gain on the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP50 on the COCO object detection task over the previous state-of-the-art method, BYOL. Our code implementation is available for both TensorFlow and PyTorch frameworks.
Subject(s)
Heuristics , Supervised Machine Learning , Neural Networks, Computer , Semantics
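A hedged sketch of the object-level similarity idea HARL describes: pool backbone features inside a heuristic foreground mask and apply a BYOL-style negative-cosine loss to the pooled embeddings. Tensor shapes and the loss form are assumptions drawn from the abstract's description, not the authors' code.

```python
# Masked, object-level embedding similarity (assumed formulation).
import torch
import torch.nn.functional as F

def masked_pool(feat, mask):
    """feat: (B, C, H, W) backbone features; mask: (B, 1, H, W) in {0,1},
    resized to the feature resolution. Returns (B, C) object embeddings."""
    w = mask / (mask.sum(dim=(2, 3), keepdim=True) + 1e-6)
    return (feat * w).sum(dim=(2, 3))

def similarity_loss(p, z):
    """BYOL-style negative cosine similarity; z is the stop-gradient
    target from the momentum branch."""
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
```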
ABSTRACT
Accurately segmented nuclei are important not only for cancer classification but also for predicting treatment effectiveness and other biomedical applications. However, the diversity of cell types, various external factors, and illumination conditions make nucleus segmentation a challenging task. In this work, we present a new deep learning-based method for cell nucleus segmentation. The proposed convolutional blur attention (CBA) network consists of downsampling and upsampling procedures. A blur attention module and a blur pooling operation are used to retain feature salience and avoid noise generation during downsampling. A pyramid blur pooling (PBP) module is proposed to capture multi-scale information during upsampling. The proposed method is compared with several prior segmentation models, namely U-Net, ENet, SegNet, LinkNet, and Mask RCNN, on the 2018 Data Science Bowl (DSB) challenge dataset and the multi-organ nucleus segmentation (MoNuSeg) dataset from MICCAI 2018. The Dice similarity coefficient and several evaluation metrics, such as F1 score, recall, precision, and average Jaccard index (AJI), were used to evaluate the segmentation efficiency of these models. Overall, the proposed method achieves the best performance, with AJI scores of 0.8429 on the DSB dataset and 0.7985 on MoNuSeg.
Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Cell Nucleus , Image Processing, Computer-Assisted/methods
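Blur pooling is named explicitly, so the sketch below follows the standard anti-aliased downsampling formulation: low-pass filtering with a fixed binomial kernel before subsampling. The 3x3 kernel and stride 2 are common defaults, not necessarily the paper's settings.

```python
# Standard blur-pooling layer: depthwise low-pass filter (binomial
# kernel), then strided subsampling, to reduce aliasing on downsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    def __init__(self, channels, stride=2):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        k = torch.outer(k, k)                       # 3x3 binomial kernel
        k = (k / k.sum()).view(1, 1, 3, 3).repeat(channels, 1, 1, 1)
        self.register_buffer("kernel", k)
        self.stride, self.channels = stride, channels

    def forward(self, x):                           # x: (B, C, H, W)
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=self.stride,
                        groups=self.channels)       # depthwise filtering
```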
ABSTRACT
Iris segmentation plays a pivotal role in iris recognition systems. Deep learning techniques developed in recent years have gradually been applied to iris recognition. Applying deep learning, however, requires large datasets with high-quality manual labels: the more data, the better the algorithm performs. In this paper, we propose a self-supervised framework utilizing the pix2pix conditional adversarial network to generate unlimited, diversified iris images. The generated iris images are then used to train the iris segmentation network to achieve state-of-the-art performance. We also propose an algorithm that generates iris masks from 11 tunable parameters, which can be drawn randomly. Such a framework can generate an unlimited amount of photo-realistic training data for downstream tasks. Experimental results demonstrate that the proposed framework achieves promising results on all commonly used metrics, and it can be easily generalized to any object segmentation task with simple fine-tuning of the mask generation algorithm.
Subject(s)
Algorithms , Iris , Iris/diagnostic imaging , Supervised Machine Learning
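A simplified sketch of parametric iris-mask generation: an annular mask defined by randomly drawn pupil and iris circles. The paper uses 11 tunable parameters, presumably also covering effects such as eyelid occlusion; this reduced 6-parameter version is illustrative only.

```python
# Simplified random iris-mask generator (assumed, reduced parameter set).
import numpy as np

def random_iris_mask(h=240, w=320, rng=None):
    rng = rng or np.random.default_rng()
    cx = rng.uniform(0.4 * w, 0.6 * w)       # iris center x
    cy = rng.uniform(0.4 * h, 0.6 * h)       # iris center y
    r_iris = rng.uniform(0.25 * h, 0.40 * h) # iris radius
    r_pupil = rng.uniform(0.2, 0.5) * r_iris # pupil radius
    dx, dy = rng.uniform(-3, 3, size=2)      # pupil offset from iris center
    yy, xx = np.mgrid[0:h, 0:w]
    iris = (xx - cx) ** 2 + (yy - cy) ** 2 <= r_iris ** 2
    pupil = (xx - cx - dx) ** 2 + (yy - cy - dy) ** 2 <= r_pupil ** 2
    return (iris & ~pupil).astype(np.uint8)  # 1 = iris annulus
```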
ABSTRACT
For music emotion detection, this paper presents a music emotion verification system based on hierarchical sparse kernel machines, with which we verify whether a music clip conveys the emotion of happiness. The hierarchy has two levels. In the first level, a set of acoustical features is extracted, and principal component analysis (PCA) is applied to reduce the dimensionality. The acoustical features are used to generate the first-level decision vector, whose elements are the significance values of individual emotions; eight main emotional classes are considered in this paper. To calculate the significance value of an emotion, we construct a two-class SVM with that emotion on the target side and the calm emotion on the global (non-target) side. The probability distributions of the adopted acoustical features are computed, and the probability product kernel is applied in the first-level SVMs to obtain the first-level decision vector. In the second level, we construct a two-class relevance vector machine (RVM) with happiness as the target side and the other emotions as the background side, using the first-level decision vector as the feature with a conventional radial basis function kernel. The happiness verification threshold is set on the probability output. In the experiments, the detection error tradeoff (DET) curve shows that the proposed system performs well in verifying whether a music clip conveys happiness.
Subject(s)
Algorithms , Auditory Perception/physiology , Emotions/physiology , Music , Pattern Recognition, Automated/methods , Sound Spectrography/methods , Support Vector Machine , Biomimetics/methods , Humans
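A sketch of the first level under stated substitutions: eight emotion-versus-calm SVMs whose probability outputs form the 8-dimensional decision vector. scikit-learn provides neither an RVM nor a probability product kernel, so a default probabilistic RBF SVM stands in for both here; this is not the authors' implementation.

```python
# First-level decision vector via one-vs-calm SVMs (RBF kernel stands in
# for the paper's probability product kernel).
import numpy as np
from sklearn.svm import SVC

def train_first_level(X, y, emotions, calm_label="calm"):
    models = {}
    for emo in emotions:                 # 8 emotion-vs-calm classifiers
        keep = (y == emo) | (y == calm_label)
        models[emo] = SVC(probability=True).fit(X[keep], y[keep] == emo)
    return models

def decision_vector(models, X):
    # Significance value of each emotion = P(emotion | x) from its SVM;
    # the stacked values feed the second-level happiness verifier.
    return np.column_stack([m.predict_proba(X)[:, 1]
                            for m in models.values()])
```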
ABSTRACT
Investigations of emotional speech identification can be divided into two main parts: features and classifiers. This paper addresses how to extract an effective speech feature set for emotional speech identification. Our feature set uses not only statistical analysis of frame-based acoustical features but also approximated speech feature contours, obtained by extracting the extremely low-frequency components of the feature contours. Furthermore, principal component analysis (PCA) is applied to the approximated contours to derive an efficient representation. The proposed feature set is fed into support vector machines (SVMs) to perform multiclass emotion identification. The experimental results demonstrate the performance of the proposed system, with an identification rate of 82.26%.
Subject(s)
Emotions , Speech , Algorithms , Female , Humans , Male , Principal Component Analysis , Support Vector Machine
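A sketch of the contour-approximation step: keep only the lowest DFT coefficients of each frame-based feature contour, then compress the approximations with PCA before the SVM. The number of retained coefficients and PCA components are assumptions, not the paper's values.

```python
# Low-frequency contour approximation + PCA + SVM pipeline (assumed
# parameter choices).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def lowpass_contour(contour, n_coeffs=8):
    """Keep the lowest-frequency DFT coefficients of a feature contour."""
    spec = np.fft.rfft(contour)
    spec[n_coeffs:] = 0                  # discard high-frequency detail
    return np.fft.irfft(spec, n=len(contour))

# features: (n_utterances, contour_len) approximated contours
# pca = PCA(n_components=16).fit(features)
# clf = SVC().fit(pca.transform(features), labels)
```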
ABSTRACT
The metaverse, anticipated to be the future of the internet, is a 3D virtual world in which users interact via highly customizable computer avatars. It holds considerable promise for several industries, including gaming, education, and business. However, it still has drawbacks, particularly regarding privacy and identity threats. When a person joins the metaverse via virtual reality (VR) human-robot equipment, their avatar, digital assets, and private information may be compromised by cybercriminals. This paper introduces a finger vein recognition approach for the VR human-robot equipment of the metaverse to prevent others from misappropriating it. The finger vein is a biometric feature hidden beneath the skin. Because it is difficult to imitate, it is considerably more secure for person verification than other hand-based biometric characteristics such as fingerprints and palm prints. Most conventional finger vein recognition systems that use hand-crafted features are ineffective, especially for images with low quality, low contrast, scale variation, translation, and rotation. Deep learning methods have been demonstrated to be more successful than traditional methods in computer vision. This paper develops a finger vein recognition system based on a convolutional neural network and an anti-aliasing technique, employing a contrast image enhancement algorithm in the preprocessing step to improve system performance. The proposed approach is evaluated on three publicly available finger vein datasets. Experimental results show that our method outperforms current state-of-the-art methods, achieving 97.66% accuracy on the FVUSM dataset, 99.94% on the SDUMLA dataset, and 88.19% on the THUFV2 dataset.
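The abstract mentions a contrast image enhancement algorithm in preprocessing without naming it; CLAHE is a common choice for vein imagery and is used below purely as an assumed stand-in (requires opencv-python).

```python
# Assumed preprocessing sketch: CLAHE contrast enhancement of a
# grayscale finger-vein image before the CNN. The actual enhancement
# algorithm in the paper is not specified in the abstract.
import cv2

def enhance(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)              # contrast-enhanced image
```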
ABSTRACT
The need for a lightweight and reliable segmentation algorithm is critical in various biomedical image-prediction applications. However, the limited quantity of data presents a significant challenge for image segmentation, and low image quality further reduces segmentation efficiency; moreover, previous deep learning models for image segmentation require large parameter counts and hundreds of millions of computations, resulting in high costs and long processing times. In this study, we introduce a new lightweight segmentation model, the mobile anti-aliasing attention U-Net (MAAU), which features both encoder and decoder paths. The encoder incorporates an anti-aliasing layer and convolutional blocks to reduce the spatial resolution of input images while preserving shift equivariance. The decoder uses an attention block and a decoder module to capture prominent features in each channel. To address data-related problems, we applied data augmentation methods such as flips, rotation, shear, translation, and color distortion, which improved segmentation efficiency on the International Skin Imaging Collaboration (ISIC) 2018 and PH2 datasets. Our experimental results demonstrate that our approach uses fewer parameters, only 4.2 million, while outperforming various state-of-the-art segmentation methods.
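A sketch of the listed augmentation recipe (flips, rotation, shear, translation, color distortion) using torchvision; the magnitudes are assumptions, not the paper's settings. For segmentation, the geometric transforms would have to be applied identically to the image and its mask, with color jitter applied to the image only.

```python
# Assumed augmentation pipeline matching the operations named in the
# abstract; magnitudes are illustrative.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomAffine(degrees=30, translate=(0.1, 0.1), shear=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),   # color jitter must be skipped for the mask
])
```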
ABSTRACT
Music information retrieval is of great interest in audio signal processing. However, relatively little attention has been paid to the playing techniques of musical instruments. This work proposes an automatic system for classifying guitar playing techniques (GPTs), a task that is challenging because some playing techniques differ only slightly from others. The framework uses a new feature extraction method based on spectral-temporal receptive fields (STRFs) to extract features from guitar sounds and applies a supervised deep learning approach to classify GPTs. Specifically, a new deep learning model, the hierarchical cascade deep belief network (HCDBN), is proposed to perform automatic GPT classification. Several simulations were performed to compare performance on three setups: 1) signal onsets only; 2) complete audio signals; and 3) audio signals recorded in a real-world environment. The proposed system improves the F-score by approximately 11.47% in setup 1) and yields an F-score of 96.82% in setup 2), and the results in setup 3) demonstrate that it also works well in a real-world environment. These results show that the proposed system is robust and highly accurate in automatic GPT classification.
Subject(s)
Music , Neural Networks, Computer , Signal Processing, Computer-Assisted
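Full STRF feature extraction is involved; the sketch below only gestures at the idea by filtering a log-mel spectrogram with a single 2D Gabor kernel tuned to one rate-scale pair, where a real STRF bank uses many such kernels. All constants are illustrative assumptions (requires librosa and scipy).

```python
# Single-filter approximation of a spectral-temporal receptive field:
# a 2D Gabor kernel applied to a log-mel spectrogram.
import librosa
import numpy as np
from scipy.signal import fftconvolve

def strf_like_feature(y, sr, rate=4.0, scale=0.5):
    S = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    t = np.arange(-8, 9) / 100.0          # temporal support (~170 ms)
    f = np.arange(-8, 9)                  # mel-channel support
    T, Fq = np.meshgrid(t, f)
    gabor = (np.cos(2 * np.pi * (rate * T + scale * Fq))
             * np.exp(-(T * 20) ** 2 - (Fq / 4) ** 2))
    return fftconvolve(S, gabor, mode="same")   # modulation-tuned map
```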
ABSTRACT
BACKGROUND: Classification of the type of calcaneal fracture on CT images is essential in driving treatment. However, human-based classification can be challenging due to anatomical complexities and CT image constraints, and the use of computer-aided classification systems in standard practice is further hindered by the limited availability of training images. The aims of this study are to 1) propose a deep learning network combined with a data augmentation technique to classify calcaneal fractures on CT images into the Sanders system, and 2) assess the efficiency of this approach under different training methods. METHODS: The principal component analysis (PCA) network was selected as the deep learning architecture for its superior performance. CT calcaneal images were processed through PCA filters, binary hashing, and a block-wise histogram. An Augmentor pipeline including rotation, distortion, and flips was applied to generate artificial calcaneal fracture images. Two training approaches and five data sample sizes were investigated to evaluate the performance of the proposed system with and without data augmentation. RESULTS: Compared with the original performance, using augmented images during training improved classification accuracy by almost twofold across all dataset sizes, and a fivefold increase in the number of augmented training images improved classification accuracy by 35%. The proposed model achieved 72% accuracy in classifying CT calcaneal images into the four Sanders categories when trained with sufficient augmented artificial images. CONCLUSION: The proposed deep learning algorithm coupled with data augmentation provides a feasible and efficient computer-aided approach to assisting physicians in evaluating calcaneal fracture types.
Subject(s)
Ankle Injuries , Calcaneus , Deep Learning , Fractures, Bone , Calcaneus/diagnostic imaging , Fractures, Bone/diagnostic imaging , Humans , Tomography, X-Ray Computed
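The Augmentor package and its rotation, distortion, and flip operations are named in the abstract; the probabilities, magnitudes, and folder path below are illustrative, not the study's values.

```python
# Sketch of the described Augmentor pipeline for generating artificial
# calcaneal fracture images. "ct_calcaneus/train" is a hypothetical
# image folder; all parameters are illustrative.
import Augmentor

p = Augmentor.Pipeline("ct_calcaneus/train")
p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
p.random_distortion(probability=0.5, grid_width=4, grid_height=4,
                    magnitude=4)
p.flip_left_right(probability=0.5)
p.sample(1000)                        # write 1000 augmented samples to disk
```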
ABSTRACT
BACKGROUND AND OBJECTIVES: The calcaneus is the most fracture-prone tarsal bone, and injuries to the surrounding tissue are among the most difficult to treat. There is currently a lack of consensus on treatment and on the interpretation of computed tomography (CT) images for calcaneus fractures. This study proposes a novel computer-assisted method for automated classification and detection of fracture locations in calcaneus CT images using a deep learning algorithm. METHODS: Two convolutional neural network (CNN) architectures with different network depths, a residual network (ResNet) and a Visual Geometry Group network (VGG), were evaluated and compared for classifying CT scans into fracture and non-fracture categories based on coronal, sagittal, and transverse views. The bone fracture detection algorithm incorporated fracture area matching using the speeded-up robust features (SURF) method, Canny edge detection, and contour tracing. RESULTS: ResNet was comparable in accuracy (98%) to the VGG network for bone fracture classification but achieved better performance owing to its deeper network architecture. The ResNet classification results were used as the input for detecting the location and type of bone fracture with the SURF algorithm. CONCLUSIONS: Results from real patient fracture datasets demonstrate the feasibility of using a deep CNN and SURF for computer-aided classification and detection of the location of calcaneus fractures in CT images.
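A sketch of SURF keypoint matching between a reference and a query CT slice, as in the described fracture-area matching step. SURF ships in opencv-contrib-python (cv2.xfeatures2d) and may require a build with non-free modules enabled; the Hessian threshold and ratio-test value are common defaults, not necessarily the study's.

```python
# SURF keypoint matching sketch for fracture-area matching between two
# grayscale CT slices.
import cv2

def match_surf(ref_img, query_img, ratio=0.75):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    k1, d1 = surf.detectAndCompute(ref_img, None)
    k2, d2 = surf.detectAndCompute(query_img, None)
    matches = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    # Lowe's ratio test keeps only distinctive correspondences.
    return [m for m, n in matches if m.distance < ratio * n.distance]
```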