Results 1 - 20 of 21
1.
Article in English | MEDLINE | ID: mdl-38082607

ABSTRACT

Recently, deep-learning-driven studies have been introduced for bioacoustic signal classification. Most of them, however, are limited in that the input of the classifier must match one of the trained labels, a setting known as closed set recognition (CSR). Consequently, a classifier trained under CSR cannot handle real streaming tasks, in which the input varies widely. To address such real-world tasks, open set recognition (OSR) has been developed. In OSR, arbitrarily collected inputs are fed to the classifier, which predicts either one of the target classes or an Unknown class. However, OSR has mainly been studied in the computer vision and speech domains, while the bioacoustic signal domain remains less developed. In particular, to the best of our knowledge, OSR for animal sound classification has not been studied. This paper proposes a novel method for open set bioacoustic signal classification based on the Class Anchored Clustering (CAC) loss together with closed set unknown bioacoustic signals. To use the closed set unknown signals for training, a total of n + 1 classes are formed by adding one Unknown class to the n target classes, and an (n + 1)-class cross-entropy loss is added to the CAC loss. To evaluate the proposed method, we build an animal sound dataset covering 101 species and compare its performance with baseline methods. In the experiments, the proposed method outperforms the baselines in the area under the receiver operating characteristic curve for distinguishing target classes from the unknown class, in the classification accuracy of open set signals, and in the classification accuracy for target classes. As a result, closed set class samples are classified well while open set unknown samples are simultaneously recognized with high accuracy.
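The (n + 1)-class training scheme described in the abstract can be sketched as follows; this is a minimal NumPy illustration of the cross-entropy term only, not the authors' implementation (the CAC anchor-clustering term is omitted, and all names and values are hypothetical):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def open_set_cross_entropy(logits, labels, n_targets):
    """Cross-entropy over n_targets + 1 classes, where index
    n_targets is the catch-all 'Unknown' class fed with closed
    set unknown samples during training."""
    probs = softmax(logits)  # shape: (batch, n_targets + 1)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

n = 3  # hypothetical number of target classes
logits = np.array([[4.0, 0.1, 0.2, 0.1],   # confidently class 0
                   [0.2, 3.5, 0.1, 0.3],   # confidently class 1
                   [0.1, 0.2, 0.1, 4.2]])  # confidently Unknown
labels = np.array([0, 1, n])               # label index n marks 'Unknown'
loss = open_set_cross_entropy(logits, labels, n)
```

In the paper this term is added to the CAC loss; only the cross-entropy part is shown here.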


Subject(s)
Acoustics , Sound , Animals
2.
J Magn Reson ; 352: 107477, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37263100

ABSTRACT

Super-resolution (SR) is a computer vision task that involves recovering high-resolution (HR) images from low-resolution (LR) ones. While SR is applied in various disciplines, it is particularly important in the medical field, which requires accurate diagnoses. L1 and L2 loss-based SR methods produce high values for the peak signal-to-noise ratio and structural similarity index measure but lack perceptual quality because such methods are trained toward the average of plausible HR predictions. In addition, SR is an ill-posed problem because a single LR image can be mapped to many different HR images. This matters because poorly generated HR images can lead to misdiagnosis. In this paper, we propose MRIFlow, a novel method based on normalizing flow that transforms LR magnetic resonance (MR) images into HR MR images. MRIFlow contains frequency affine injectors that incorporate frequency information. Each frequency affine injector receives the output of a pre-trained LR encoder as input and obtains frequency information from a wavelet transform based on ScatterNet; with this design, the inverse operation remains possible. MRIFlow has two versions depending on the type of ScatterNet employed. In this paper, MRIFlow is compared with normalizing-flow-based SR methods on various MR image datasets, including the IXI, NYU fastMRI, and LGG datasets, and is shown to produce better quantitative and qualitative results.
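The claim that an affine injector remains invertible can be illustrated with a minimal sketch; the conditioning network, wavelet features, and flow layers of MRIFlow are replaced here by plain arrays, so this shows only the invertibility property, not the paper's architecture (all names are hypothetical):

```python
import numpy as np

def affine_inject(x, log_scale, shift):
    """Inject conditioning information through an elementwise affine
    map; exp(log_scale) keeps every scale strictly positive, so the
    map is exactly invertible."""
    return np.exp(log_scale) * x + shift

def affine_inject_inverse(y, log_scale, shift):
    return (y - shift) / np.exp(log_scale)

rng = np.random.default_rng(0)
x = rng.random(8)                     # stand-in for a latent tensor
log_scale = 0.1 * rng.normal(size=8)  # stand-in for encoder/wavelet features
shift = rng.normal(size=8)
y = affine_inject(x, log_scale, shift)
x_rec = affine_inject_inverse(y, log_scale, shift)
```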


Subject(s)
Magnetic Resonance Imaging , Wavelet Analysis , Magnetic Resonance Imaging/methods , Signal-To-Noise Ratio
3.
Biomed Signal Process Control ; 79: 104250, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36188130

ABSTRACT

Automatic segmentation of infected regions in computed tomography (CT) images is necessary for the initial diagnosis of COVID-19. Deep-learning-based methods have the potential to automate this task but require a large amount of data with pixel-level annotations. Training a deep network with annotated lung cancer CT images, which are easier to obtain, can alleviate this problem to some extent. However, this approach may suffer a reduction in performance when applied to unseen COVID-19 images during the testing phase, caused by differences in image intensity and object region distribution between the training and test sets. In this paper, we propose a novel unsupervised method for COVID-19 infection segmentation that learns domain-invariant features from lung cancer and COVID-19 images to improve the generalization ability of the segmentation network to COVID-19 CT images. First, to address the intensity difference, we propose a novel data augmentation module based on the Fourier transform, which transfers the annotated lung cancer data into the style of COVID-19 images. Second, to reduce the distribution difference, we design a teacher-student network to learn rotation-invariant features for segmentation. The experiments demonstrate that, even without access to annotations of COVID-19 CT images during training, the proposed network achieves state-of-the-art segmentation performance on COVID-19 infection.
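A common way to realize Fourier-transform-based cross-domain style transfer is to swap the low-frequency amplitude spectrum between images while keeping the source phase. The sketch below assumes that formulation, which may differ in detail from the paper's module (all names and the `beta` parameter are hypothetical):

```python
import numpy as np

def fourier_style_transfer(source, target, beta=0.1):
    """Replace the low-frequency amplitude of `source` with that of
    `target`, keeping the source phase -- transferring intensity
    'style' while preserving the source's structural content."""
    fs = np.fft.fftshift(np.fft.fft2(source))
    ft = np.fft.fftshift(np.fft.fft2(target))
    amp_s, pha_s = np.abs(fs), np.angle(fs)
    amp_t = np.abs(ft)
    h, w = source.shape
    b = int(min(h, w) * beta)          # half-width of the swapped band
    ch, cw = h // 2, w // 2
    amp_s[ch - b:ch + b, cw - b:cw + b] = amp_t[ch - b:ch + b, cw - b:cw + b]
    mixed = amp_s * np.exp(1j * pha_s)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

rng = np.random.default_rng(0)
src = rng.random((64, 64))            # stands in for a lung cancer CT slice
tgt = rng.random((64, 64)) + 0.5      # stands in for a COVID-19 CT slice
out = fourier_style_transfer(src, tgt)
```

Because the DC component is swapped, the output inherits the target's mean intensity while keeping the source's phase structure.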

4.
Sensors (Basel) ; 24(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38202920

ABSTRACT

Weakly supervised video anomaly detection (WSVAD) assesses anomaly levels in individual frames based on video-level labels. Anomaly scores are computed by evaluating the deviation of distances derived from frames in an unbiased state. WSVAD faces the formidable challenge of false alarms, a major cause of which is that frame labels are inadequately reflected during learning. Multiple instance learning (MIL) has been a pivotal solution to this issue in previous studies, as it requires identifying features that discriminate between abnormal and normal segments. At the same time, it is important to identify shared biases within the feature space and to train a representative model. In this study, we introduce a novel MIL framework anchored on a memory unit, which augments features based on memory and effectively bridges the gap between normal and abnormal instances. This augmentation is achieved by integrating a multi-head attention feature augmentation module with a loss function combining KL divergence and a Gaussian-distribution-estimation-based approach. The method identifies distinguishable features and secures the inter-instance distance, strengthening the distribution-approximated distance metrics between abnormal and normal instances. The contributions of this work are a novel MIL-based framework for WSVAD and an efficient integration strategy for the augmentation process. Extensive experiments on the XD-Violence and UCF-Crime benchmark datasets substantiate the effectiveness of the proposed model.

5.
Appl Intell (Dordr) ; 52(6): 6340-6353, 2022.
Article in English | MEDLINE | ID: mdl-34764618

ABSTRACT

Automatic segmentation of infection areas in computed tomography (CT) images has proven to be an effective diagnostic approach for COVID-19. However, due to the limited number of medical images with pixel-level annotations, accurate segmentation remains a major challenge. In this paper, we propose an unsupervised domain adaptation-based segmentation network to improve the segmentation of infection areas in COVID-19 CT images. In particular, we propose to jointly train the segmentation network on synthetic data and a limited number of unlabeled real COVID-19 CT images. Furthermore, we develop a novel domain adaptation module, which aligns the two domains and effectively improves the segmentation network's generalization to the real domain. In addition, we propose an unsupervised adversarial training scheme that encourages the segmentation network to learn domain-invariant features, so that these robust features can be used for segmentation. Experimental results demonstrate that our method achieves state-of-the-art segmentation performance on COVID-19 CT images.

6.
Sensors (Basel) ; 21(24)2021 Dec 15.
Article in English | MEDLINE | ID: mdl-34960475

ABSTRACT

Weakly labeled sound event detection (WSED) is an important task, as it can reduce the data collection effort required before constructing a strongly labeled sound event dataset. Recent high-performing deep-learning-based WSED methods exploit a segmentation mask for detecting the target feature map. However, accurate detection in real streaming audio has been limited for the following reasons. First, the convolutional neural networks (CNNs) employed in segmentation mask extraction do not appropriately weight feature importance, since features are extracted without pooling operations, and their small kernels keep the receptive field narrow, making it difficult to learn diverse patterns. Second, as feature maps are obtained in an end-to-end fashion, the WSED model is vulnerable to unknown content in the wild. These limitations can produce undesired feature maps, such as noise, in unseen environments. This paper addresses these issues by constructing a more efficient model that employs a gated linear unit (GLU) and dilated convolution to mitigate the de-emphasized feature importance and the limited receptive field. In addition, this paper proposes pseudo-label-based learning for classifying target content and unknown content by adding a 'noise label' and a 'noise loss', so that unknown content can be separated as far as possible through the noise label. Experiments were performed by mixing DCASE 2018 Task 1 acoustic scene data and Task 2 sound event data. The results show that the proposed SED model achieves the best F1 performance with 59.7% at 0 dB SNR, 64.5% at 10 dB SNR, and 65.9% at 20 dB SNR. These results represent improvements of 17.7%, 16.9%, and 16.5%, respectively, over the baseline.


Subject(s)
Neural Networks, Computer , Noise , Acoustics , Hearing , Sound
7.
IEEE J Biomed Health Inform ; 25(2): 441-452, 2021 02.
Article in English | MEDLINE | ID: mdl-33275588

ABSTRACT

Coronavirus disease 2019 (COVID-19) is an ongoing global pandemic that has spread rapidly since December 2019. Real-time reverse transcription polymerase chain reaction (rRT-PCR) and chest computed tomography (CT) imaging both play an important role in COVID-19 diagnosis. Chest CT imaging offers the benefits of quick reporting, low cost, and high sensitivity for the detection of pulmonary infection. Recently, deep-learning-based computer vision methods have demonstrated great promise for use in medical imaging applications, including X-rays, magnetic resonance imaging, and CT imaging. However, training a deep-learning model requires large volumes of data, and medical staff face a high risk when collecting COVID-19 CT data due to the high infectivity of the disease. Another issue is the lack of experts available for data labeling. To meet the data requirements for COVID-19 CT imaging, we propose a CT image synthesis approach based on a conditional generative adversarial network that can effectively generate high-quality, realistic COVID-19 CT images for use in deep-learning-based medical imaging tasks. Experimental results show that the proposed method outperforms other state-of-the-art image synthesis methods on the generated COVID-19 CT images and shows promise for various machine learning applications, including semantic segmentation and classification.


Subject(s)
COVID-19/diagnostic imaging , Deep Learning , Tomography, X-Ray Computed , Humans , Lung/diagnostic imaging , Radiography, Thoracic , SARS-CoV-2
8.
Sensors (Basel) ; 20(23)2020 Nov 30.
Article in English | MEDLINE | ID: mdl-33266072

ABSTRACT

Realistic synthetic data can be useful for data augmentation when training deep learning models, improving seismological detection and classification performance. In recent years, various deep learning techniques have been successfully applied in modern seismology. Because the performance of deep learning depends on a sufficient volume of data, data augmentation is widely utilized as a data-space solution. In this paper, we propose a Generative Adversarial Network (GAN)-based model that uses conditional knowledge to generate high-quality seismic waveforms. Unlike existing methods that generate samples directly from noise, the proposed method generates synthetic samples based on the statistical characteristics of real seismic waveforms in an embedding space. Moreover, a content loss relating high-level features extracted by a pre-trained model to the objective function is added to enhance the quality of the synthetic data. Classification accuracy increases from 96.84% to 97.92% after mixing in a certain amount of synthetic seismic waveforms, and an analysis of the seismic characteristics of the generated data shows that the proposed model provides an effective structure for generating high-quality synthetic seismic waveforms. Thus, the proposed model is experimentally validated as a promising approach to realistic, high-quality seismic waveform data augmentation.

9.
Sensors (Basel) ; 20(22)2020 Nov 23.
Article in English | MEDLINE | ID: mdl-33238396

ABSTRACT

Speech emotion recognition predicts the emotional state of a speaker from the person's speech. It brings an additional element for creating more natural human-computer interactions. Earlier studies on emotion recognition were primarily based on handcrafted features and manual labels. With the advent of deep learning, there have been efforts to apply deep-network-based approaches to the problem of emotion recognition. As deep learning automatically extracts salient features correlated to speaker emotion, it offers certain advantages over handcrafted-feature-based methods. There are, however, challenges in applying it to emotion recognition, because the data required to properly train deep networks are often lacking. Therefore, there is a need for a new deep-learning-based approach that exploits the available information in a given speech signal to the maximum extent possible. Our proposed method, called "Fusion-ConvBERT", is a parallel fusion model consisting of bidirectional encoder representations from transformers and convolutional neural networks. Extensive experiments were conducted with the proposed model on the EMO-DB and Interactive Emotional Dyadic Motion Capture (IEMOCAP) emotion corpora, and the proposed method outperformed state-of-the-art techniques in most of the test configurations.


Subject(s)
Emotions , Neural Networks, Computer , Speech , Humans
10.
Sensors (Basel) ; 20(15)2020 Jul 22.
Article in English | MEDLINE | ID: mdl-32707900

ABSTRACT

Visual object tracking is an important component of surveillance systems and many high-performance methods have been developed. However, these tracking methods tend to be optimized for the Red/Green/Blue (RGB) domain and are thus not suitable for use with the infrared (IR) domain. To overcome this disadvantage, many researchers have constructed datasets for IR analysis, including those developed for The Thermal Infrared Visual Object Tracking (VOT-TIR) challenges. As a consequence, many state-of-the-art trackers for the IR domain have been proposed, but there remains a need for reliable IR-based trackers for anti-air surveillance systems, including the construction of a new IR dataset for this purpose. In this paper, we collect various anti-air thermal-wave IR (TIR) images from an electro-optical surveillance system to create a new dataset. We also present a framework based on an end-to-end convolutional neural network that learns object tracking in the IR domain for anti-air targets such as unmanned aerial vehicles (UAVs) and drones. More specifically, we adopt a Siamese network for feature extraction and three region proposal networks for the classification and regression branches. In the inference phase, the proposed network is formulated as a detection-by-tracking method, and kernel filters for the template branch that are continuously updated for every frame are introduced. The proposed network is able to learn robust structural information for the targets during offline training, and the kernel filters can robustly track the targets, demonstrating enhanced performance. Experimental results from the new IR dataset reveal that the proposed method achieves outstanding performance, with a real-time processing speed of 40 frames per second.

11.
Article in English | MEDLINE | ID: mdl-32142439

ABSTRACT

In this paper, we propose a novel image dehazing method. Typical deep learning models for dehazing are trained on paired synthetic indoor datasets. Therefore, these models may be effective for indoor image dehazing but less so for outdoor images. We propose a heterogeneous Generative Adversarial Network (GAN)-based method composed of a cycle-consistent GAN (CycleGAN) for producing haze-clear images and a conditional GAN (cGAN) for preserving textural details. We introduce a novel loss function in the training of the fused network to minimize GAN-generated artifacts, recover fine details, and preserve color components. The two networks are fused via a convolutional neural network (CNN) to generate the dehazed image. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods on both synthetic and real-world hazy images.

12.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 376-379, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440414

ABSTRACT

Pattern classification based on deep networks outperforms conventional methods in many tasks. However, if the training database's internal representation lacks sufficient discernibility between classes, the network essentially fails to learn. Such failure is evident when accuracy drops sharply in classification experiments where the animal sounds are similar to each other. To remedy this learning problem, this paper proposes a novel approach that combines multiple CNNs, each separately pre-trained to generate mid-level features for its class, which are then merged into a combined CNN unit with an SVM for overall classification. For the experiments, an animal sound database including 3 classes with 102 species was first established. Experimental results on this database show that the proposed method outperforms prominent conventional methods.


Subject(s)
Neural Networks, Computer , Sound , Support Vector Machine , Animals , Databases, Factual
13.
Sensors (Basel) ; 18(5)2018 May 08.
Article in English | MEDLINE | ID: mdl-29738520

ABSTRACT

This paper focuses on underwater target tracking with a multi-static sonar network composed of passive sonobuoys and an active ping. In a multi-static sonar network, the location of the target can be estimated using time difference of arrival (TDOA) measurements. However, since the sensor network may obtain insufficient and inaccurate TDOA measurements due to ambient noise and other harsh underwater conditions, tracking performance can be significantly degraded. We propose a robust target tracking algorithm designed to operate in such a scenario. First, track management with track splitting is applied to reduce the performance degradation caused by insufficient measurements. Second, the target location is estimated by fusing multiple TDOA measurements using a Gaussian mixture model (GMM). In addition, the target trajectory is refined by a stack-based data association method over multiple-frame measurements to estimate the trajectory more accurately. The effectiveness of the proposed method is verified through simulations.
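TDOA-based localization of the kind described above can be sketched with a coarse grid search that finds the point whose range differences to the sensors best match the measured time differences; the paper's GMM fusion and track management are omitted, and all names and parameter values (sound speed, grid size) are assumptions:

```python
import numpy as np

def tdoa_locate(sensors, tdoas, c=1500.0, grid=200, span=1000.0):
    """Grid-search TDOA localization in 2D. `tdoas[i]` is the
    arrival-time difference between sensor i+1 and sensor 0;
    c is an assumed underwater sound speed in m/s."""
    xs = np.linspace(-span, span, grid)
    ys = np.linspace(-span, span, grid)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d = np.linalg.norm(pts[:, None, :] - sensors[None, :, :], axis=2)
    pred = (d[:, 1:] - d[:, :1]) / c          # predicted TDOAs per point
    err = ((pred - tdoas) ** 2).sum(axis=1)   # least-squares residual
    return pts[err.argmin()]

sensors = np.array([[0.0, 0.0], [800.0, 0.0], [0.0, 800.0], [800.0, 800.0]])
true_pos = np.array([250.0, -400.0])
dist = np.linalg.norm(true_pos - sensors, axis=1)
tdoas = (dist[1:] - dist[0]) / 1500.0         # noise-free measurements
est = tdoa_locate(sensors, tdoas)
```

With noise-free measurements the estimate lands on the grid point nearest the true position; in practice noisy TDOAs motivate the fusion and track-management machinery the paper proposes.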

14.
J Opt Soc Am A Opt Image Sci Vis ; 34(2): 280-293, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28157856

ABSTRACT

This paper addresses the problem of multi-object tracking in complex scenes by a single, static, uncalibrated camera. Tracking-by-detection is a widely used approach for multi-object tracking. Challenges still remain in complex scenes, however, when this approach has to deal with occlusions, unreliable detections (e.g., inaccurate position/size, false positives, or false negatives), and sudden object motion/appearance changes, among other issues. To handle these problems, this paper presents a novel online multi-object tracking method, which can be fully applied to real-time applications. First, an object tracking process based on frame-by-frame association with a novel affinity model and an appearance update that does not rely on online learning is proposed to effectively and rapidly assign detections to tracks. Second, a two-stage drift handling method with novel track confidence is proposed to correct drifting tracks caused by the abrupt motion change of objects under occlusion and prolonged inaccurate detections. In addition, a fragmentation handling method based on a track-to-track association is proposed to solve the problem in which an object trajectory is broken into several tracks due to long-term occlusions. Based on experimental results derived from challenging public data sets, the proposed method delivers an impressive performance compared with other state-of-the-art methods. Furthermore, additional performance analysis demonstrates the effect and usefulness of each component of the proposed method.
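A frame-by-frame detection-to-track association step like the one described above is often implemented as a gated minimum-cost assignment. The sketch below uses plain Euclidean distance as a stand-in for the paper's affinity model, with the gate threshold chosen arbitrarily (all names are hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, gate=50.0):
    """Assign detections to tracks by minimizing total distance
    (stand-in for an affinity model), rejecting pairs farther
    apart than `gate` pixels so poor matches spawn no updates."""
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]

tracks = np.array([[100.0, 100.0], [400.0, 250.0]])   # predicted track positions
dets = np.array([[402.0, 255.0], [110.0, 95.0], [700.0, 700.0]])
matches = associate(tracks, dets)   # third detection stays unmatched
```

Unmatched detections can then initialize new tracks, and unmatched tracks feed the drift- and fragmentation-handling stages the abstract describes.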

16.
Med Biol Eng Comput ; 54(6): 915-26, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26753778

ABSTRACT

A novel approach for assisting bidirectional communication between people with normal hearing and people with hearing impairments is presented. While existing assistive devices such as hearing aids and cochlear implants are vulnerable to extreme noise conditions or post-surgery side effects, the proposed concept is an alternative in which spoken dialogue is achieved by means of a robust speech recognition technique that accounts for noisy environmental factors without any attachment to the human body. The proposed system is a portable device with an acoustic beamformer for directional noise reduction, capable of speech-to-text transcription using a keyword spotting method. It is also equipped with a user interface optimized for hearing-impaired people, rendering device usage intuitive and natural across diverse domain contexts. The experimental results confirm that the proposed interface design is feasible for realizing an effective and efficient intelligent agent for the hearing-impaired.


Subject(s)
Hearing Aids , Hearing Loss/therapy , Speech , Algorithms , Humans , User-Computer Interface
17.
ScientificWorldJournal ; 2014: 146040, 2014.
Article in English | MEDLINE | ID: mdl-25170520

ABSTRACT

A new voice activity detector (VAD) for noisy environments is proposed. In conventional algorithms, the endpoint of speech is found by applying an edge detection filter that locates abrupt changes in a feature domain. However, since the frame energy feature is unstable in noisy environments, it is difficult to find the endpoint of speech accurately. Therefore, a novel feature extraction algorithm based on the double-combined Fourier transform and envelope line fitting is proposed and combined with an edge detection filter for effective endpoint detection. The effectiveness of the proposed algorithm is evaluated and compared with other VAD algorithms on two databases, AURORA 2.0 and SITEC. Experimental results show that the proposed algorithm performs well under a variety of noisy conditions.
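The idea of finding speech endpoints by running an edge detection filter over a feature contour can be sketched as follows; the paper's double-combined Fourier transform and envelope line fitting are replaced by plain frame energy for illustration, and the filter shape and frame length are assumptions:

```python
import numpy as np

def frame_energy(signal, frame_len=160):
    """Mean squared amplitude per frame (10 ms at 16 kHz)."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    return (frames ** 2).mean(axis=1)

def detect_edges(energy, edge=np.array([-1.0, -1.0, 0.0, 1.0, 1.0])):
    """Correlate the energy contour with a step-edge filter: large
    positive responses mark onsets, large negative ones offsets."""
    return np.convolve(energy, edge[::-1], mode='same')

rng = np.random.default_rng(1)
sig = rng.normal(0, 0.01, 16000)              # 1 s of low-level noise
sig[4000:12000] += rng.normal(0, 0.5, 8000)   # 'speech' burst in the middle
resp = detect_edges(frame_energy(sig))
onset = int(resp.argmax())                    # near frame 25 (sample 4000)
offset = int(resp.argmin())                   # near frame 75 (sample 12000)
```

On clean frame energy this works well; the paper's contribution is a feature that keeps such edges detectable when noise makes frame energy unstable.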


Subject(s)
Algorithms , Models, Theoretical , Speech Acoustics , Noise
18.
ScientificWorldJournal ; 2013: 153465, 2013.
Article in English | MEDLINE | ID: mdl-24381510

ABSTRACT

A reinforced AdaBoost learning algorithm is proposed for object detection with local pattern representations. In implementing AdaBoost learning, the proposed algorithm employs an exponential criterion as the cost function and Newton's method for its optimization. In particular, we introduce an optimal selection of weak classifiers that minimizes the cost function and derive reinforced predictions based on a judicial confidence estimate to determine the classification results. The weak classifiers of the proposed method produce real-valued predictions, whereas those of the conventional AdaBoost method produce integer-valued predictions of +1 or -1. Hence, in the conventional learning algorithm, all sample weights are updated at the same rate. In contrast, the proposed learning algorithm allows the sample weights to be updated individually depending on the confidence of each weak classifier's prediction, thereby reducing the number of weak classifier iterations required for convergence. Experimental classification results on human face and license plate images confirm that the proposed method requires fewer weak classifiers than the conventional learning algorithm, resulting in higher learning and faster classification rates. An object detector implemented with the proposed learning algorithm also performs better in field tests, with a higher detection rate and fewer false positives than the conventional learning algorithm.
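The confidence-dependent weight update described above can be sketched as follows, assuming a Real-AdaBoost-style rule in which each sample is reweighted by exp(-y h(x)); this illustrates the principle, not the authors' exact formulation:

```python
import numpy as np

def reweight(weights, margins):
    """Scale each sample weight by exp(-margin), where margin = y * h(x)
    for label y in {-1, +1} and real-valued prediction h(x). Confidently
    correct samples are down-weighted, confidently wrong ones up-weighted,
    each at its own rate -- unlike the fixed-rate +/-1 update."""
    w = weights * np.exp(-margins)
    return w / w.sum()   # renormalize to a distribution

w = np.full(4, 0.25)
# margins: confident correct, weak correct, weak wrong, confident wrong
m = np.array([2.0, 0.3, -0.3, -2.0])
w = reweight(w, m)
```

After the update, the confidently misclassified sample carries most of the weight, so the next weak classifier focuses on it.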


Subject(s)
Algorithms , Biometric Identification , Image Interpretation, Computer-Assisted/methods , Artificial Intelligence , Face , Humans , Image Enhancement/methods , Learning , Reproducibility of Results , Software
19.
DNA Res ; 15(5): 267-76, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18799480

ABSTRACT

Various methods have been developed to detect horizontal gene transfer in bacteria, based on anomalous nucleotide composition, assuming that compositional features undergo amelioration in the host genome. Evolutionary theory predicts the inevitability of false positives when essential sequences are strongly conserved. Foreign genes could become more detectable on the basis of their higher order compositions if such features ameliorate more rapidly and uniformly than lower order features. This possibility is tested by comparing the heterogeneities of bacterial genomes with respect to strand-independent first- and second-order features, (i) G + C content and (ii) dinucleotide relative abundance, in 1 kb segments. Although statistical analysis confirms that (ii) is less inhomogeneous than (i) in all 12 species examined, extreme anomalies with respect to (ii) in the Escherichia coli K12 genome are typically co-located with essential genes.
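The first- and second-order features compared above, G + C content and dinucleotide relative abundance, can be computed directly; this is a minimal sketch on a toy sequence rather than a 1 kb genomic segment (names are hypothetical):

```python
from collections import Counter

def gc_content(seq):
    """First-order feature: fraction of G and C bases."""
    return (seq.count('G') + seq.count('C')) / len(seq)

def dinucleotide_relative_abundance(seq):
    """Second-order feature: rho(XY) = f(XY) / (f(X) * f(Y)),
    the observed dinucleotide frequency over the product of the
    mononucleotide frequencies. Values near 1 indicate no over-
    or under-representation."""
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n, m = len(seq), len(seq) - 1
    return {d: (c / m) / ((mono[d[0]] / n) * (mono[d[1]] / n))
            for d, c in di.items()}

seq = "ATGCGC" * 200          # toy 1.2 kb sequence
gc = gc_content(seq)          # 2/3 for this repeat
rho = dinucleotide_relative_abundance(seq)
```

Computing both features per 1 kb segment and comparing their within-genome heterogeneity is the test the paper performs across 12 species.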


Subject(s)
Bacteria/genetics , Gene Transfer, Horizontal , Genome, Bacterial , Base Composition , Escherichia coli K12/genetics
20.
Int J Neural Syst ; 18(6): 481-9, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19145664

ABSTRACT

We review a new form of self-organizing map based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM). But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts. We show visualisation and clustering results on a data set of video recordings of lips uttering 5 Korean vowels. Finally, we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of the mean squared error between the prototypes and the data. This leads us to suggest a new algorithm that incorporates local and global information in the clustering. Both of the new algorithms achieve better results than the standard Self-Organizing Map.


Subject(s)
Artificial Intelligence , Cluster Analysis , Databases, Factual , Nonlinear Dynamics , Algorithms , Humans , Lip , Video Recording , Voice