Results 1 - 10 of 10
1.
Sensors (Basel) ; 23(7)2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37050458

ABSTRACT

This study proposes a sound event localization and detection (SELD) method that handles imbalanced real and synthetic data via a multi-generator. The proposed method is based on a residual convolutional neural network (RCNN) and a transformer encoder for real spatial sound scenes. SELD aims to classify a sound event, detect its onset and offset, and estimate its direction of arrival. In Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Task 3, SELD is performed with few real spatial sound scene recordings and a relatively large amount of synthetic data. A model trained on such imbalanced data tends to focus only on the more numerous data. To prevent this, a multi-generator that samples real and synthetic data at a fixed rate within each batch is proposed. We also applied SpecAugment-style time-frequency masking to the dataset as data augmentation. Furthermore, we propose a neural network architecture that combines the RCNN and transformer encoder. Several models were trained with various structures and hyperparameters, and ensemble models were obtained by "cherry-picking" specific models. In the experiments, both the single model of the proposed method and the ensemble exhibited improved performance compared with the baseline model.
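The multi-generator described above can be sketched as a batch sampler that draws from the real and synthetic pools at a fixed rate per batch; the function name and the 25% real fraction below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def mixed_batches(real, synthetic, batch_size, real_fraction):
    """Yield batches drawing a fixed fraction from the (scarce) real set.

    `real_fraction` of each batch comes from `real`; the rest comes from
    `synthetic`, so the scarce real data is never drowned out.
    """
    n_real = max(1, int(batch_size * real_fraction))
    n_syn = batch_size - n_real
    while True:
        batch = random.sample(real, n_real) + random.sample(synthetic, n_syn)
        random.shuffle(batch)
        yield batch

# Toy setting: 100 real vs 1000 synthetic clips, 25% real per batch of 16.
gen = mixed_batches(list(range(100)), list(range(100, 1100)), 16, 0.25)
batch = next(gen)
```

Because the generator fixes the per-batch ratio rather than the epoch-level ratio, every gradient step sees real data regardless of how badly the pools are imbalanced.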

2.
Sensors (Basel) ; 23(23)2023 Dec 03.
Article in English | MEDLINE | ID: mdl-38067965

ABSTRACT

Speech synthesis is a technology that converts text into speech waveforms. With the development of deep learning, neural network-based speech synthesis has been researched in various fields, and the quality of synthesized speech has improved significantly. In particular, Grad-TTS, a speech synthesis model based on the denoising diffusion probabilistic model (DDPM), exhibits high performance in various domains, generates high-quality speech, and supports multi-speaker synthesis. However, it cannot synthesize speech for speakers unseen during training. This study therefore proposes an effective zero-shot multi-speaker speech synthesis model that improves on the Grad-TTS structure. The proposed method extracts speaker information from a reference utterance using a pre-trained speaker recognition model. In addition, by transforming the speaker information via information perturbation, the model can learn varieties of speaker information beyond those in the dataset. To evaluate the proposed method, we measured two performance indicators: speaker encoder cosine similarity (SECS) and mean opinion score (MOS). To evaluate synthesis performance in both the seen-speaker and unseen-speaker scenarios, Grad-TTS, SC-GlowTTS, and YourTTS were compared. The results demonstrated excellent synthesis performance for seen speakers and performance comparable to existing zero-shot multi-speaker speech synthesis models for unseen speakers.
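The SECS metric mentioned above is simply the cosine similarity between two speaker embeddings; a minimal sketch (plain-list embeddings, hypothetical function name) is:

```python
import math

def secs(emb_a, emb_b):
    """Speaker Encoder Cosine Similarity between two speaker embeddings.

    Embeddings are plain lists of floats; values closer to 1.0 mean the
    synthesized voice is closer to the reference speaker.
    """
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)

# Identical embeddings score 1.0; orthogonal embeddings score 0.0.
same = secs([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])
ortho = secs([1.0, 0.0], [0.0, 1.0])
```

In practice the embeddings would come from the pre-trained speaker recognition model applied to the reference and synthesized utterances.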

3.
Sensors (Basel) ; 23(16)2023 Aug 20.
Article in English | MEDLINE | ID: mdl-37631815

ABSTRACT

Voice spoofing attempts to break into an automatic speaker verification (ASV) system by forging the user's voice, using methods such as text-to-speech (TTS), voice conversion (VC), and replay attacks. Recently, deep learning-based voice spoofing countermeasures have been developed. However, replay detection suffers from the difficulty of constructing large datasets, because replay requires a physical recording process. To overcome this problem, this study proposes a pre-training framework based on multi-order acoustic simulation for replay voice spoofing detection. Multi-order acoustic simulation uses existing clean-signal and room impulse response (RIR) datasets to generate audio that simulates the various acoustic configurations of original and replayed recordings. The acoustic configuration refers to factors such as the microphone type, reverberation, time delay, and noise that may arise between a speaker and a microphone during recording. We assume that a deep learning model trained on audio simulating these various acoustic configurations can distinguish the acoustic configurations of original and replayed audio. To validate this, we performed pre-training to classify the audio generated by the multi-order acoustic simulation into three classes: the clean signal, audio simulating the acoustic configuration of the original recording, and audio simulating the acoustic configuration of the replayed recording. We then used the pre-trained weights to initialize a replay voice spoofing detection model and fine-tuned it on an existing replay voice spoofing dataset. To validate the effectiveness of the proposed method, we evaluated the conventional method without pre-training and the proposed method using objective metrics, namely accuracy and F1-score. The conventional method achieved an accuracy of 92.94% and an F1-score of 86.92%, whereas the proposed method achieved an accuracy of 98.16% and an F1-score of 95.08%.
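The core of multi-order simulation is convolving a clean signal with one RIR (original-recording configuration) and then with a second RIR (replay configuration). A minimal sketch with a direct-form convolution and hypothetical function names:

```python
def convolve(signal, rir):
    """Direct-form convolution of a signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

def simulate_replay(clean, rir_record, rir_replay):
    """Second-order simulation: the clean signal passes through the
    original recording room, then through the replay loudspeaker/room."""
    first_order = convolve(clean, rir_record)   # original-audio configuration
    return convolve(first_order, rir_replay)    # replayed-audio configuration

# Tiny toy signal and two-tap RIRs (a direct path plus one reflection).
clean = [1.0, 0.5, 0.25]
replayed = simulate_replay(clean, [1.0, 0.3], [1.0, 0.1])
```

Real RIRs are thousands of taps long and the convolution would be done with an FFT, but the chaining of two room responses is the same idea.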

4.
Sensors (Basel) ; 21(2)2021 Jan 14.
Article in English | MEDLINE | ID: mdl-33466847

ABSTRACT

Road surfaces should be maintained in excellent condition to ensure the safety of motorists. Various road-surface monitoring systems exist for this purpose, each with specific advantages and disadvantages. In this study, a smartphone-based dual-acquisition system, which captures images of road-surface anomalies and measures the vehicle's acceleration upon their detection, was developed to explore the complementary benefits of the two methods. A road test was conducted in which 1896 road-surface images and the corresponding three-axis acceleration data were acquired. All images were classified by the presence and type of anomaly, and histograms of the maximum variation in acceleration along the gravitational direction were comparatively analyzed. When the types of anomalies were not considered, their effects were difficult to identify from the histograms. The differences among the histograms became evident once it was considered whether the vehicle wheels passed over the anomalies, and once longitudinal anomalies that caused only minor changes in acceleration were excluded. Although the image-based monitoring system used in this research performed poorly on its own, the severity of road-surface anomalies was accurately inferred from the specific range of the maximum variation of acceleration in the gravitational direction.
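The "maximum variation of acceleration in the gravitational direction" can be sketched as a peak-to-peak measure over a sliding window; the window length and sample values below are illustrative assumptions, not the study's parameters.

```python
def max_gravity_variation(accel_z, window=5):
    """Maximum peak-to-peak variation of gravitational-axis acceleration
    within a sliding window, used to grade road-surface anomaly severity."""
    best = 0.0
    for i in range(len(accel_z) - window + 1):
        chunk = accel_z[i:i + window]
        best = max(best, max(chunk) - min(chunk))
    return best

# A pothole shows up as a sharp dip-and-rebound around 1 g (~9.8 m/s^2);
# a smooth road stays within sensor noise.
flat = [9.8, 9.81, 9.79, 9.8, 9.8, 9.81]
pothole = [9.8, 9.8, 7.2, 12.1, 9.9, 9.8]
```

Thresholding this value per image would be one way to associate an acceleration severity with each classified anomaly.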

5.
Sensors (Basel) ; 20(13)2020 Jul 05.
Article in English | MEDLINE | ID: mdl-32635619

ABSTRACT

Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required, generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using an azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation instead of time-frequency features as its input. The model was evaluated with microphone configurations different from the one on which it was originally trained. For evaluation, a single sound source was simulated using the image method. The evaluations confirmed that the localization performance was superior to that of the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods.
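As background geometry for azimuth-based representations, a time difference of arrival (TDOA) between two microphones maps to an azimuth under a far-field assumption. This standalone sketch (function name and spacing assumed) is not the paper's CNN pipeline:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees Celsius

def delay_to_azimuth(tdoa, mic_spacing):
    """Convert a two-microphone time difference of arrival into a
    broadside azimuth angle (far-field, plane-wave assumption)."""
    s = tdoa * SPEED_OF_SOUND / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp floating-point overshoot
    return math.degrees(math.asin(s))

# 10 cm spacing: zero delay means broadside (0 degrees); the maximum
# delay of spacing / c means endfire (90 degrees).
broadside = delay_to_azimuth(0.0, 0.10)
endfire = delay_to_azimuth(0.10 / SPEED_OF_SOUND, 0.10)
```

An azimuth-frequency representation evaluates a spatial response like this per frequency bin, which is what makes the input size independent of the microphone count.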

6.
Sensors (Basel) ; 20(19)2020 Sep 28.
Article in English | MEDLINE | ID: mdl-32998389

ABSTRACT

Road markings constitute one of the most important elements of the road. They are managed according to specific standards, including a criterion for luminous contrast known as retroreflection, which quantifies the reflection properties of road markings and other road facilities. Managing retroreflection is essential for improving road safety and sustainability. In this study, we propose a dynamic retroreflection estimation method for longitudinal road markings that employs a luminance camera and convolutional neural networks (CNNs). Images captured by the luminance camera were input into a classification-and-regression CNN model to determine whether a longitudinal road marking was accurately acquired. A segmentation model was also developed and implemented to accurately delineate the longitudinal road marking and the reference plate whenever a longitudinal road marking was determined to exist in the captured image. The retroreflection was measured dynamically while a driver drove along an actual road, demonstrating the effectiveness of the proposed method.
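One plausible way to use the segmented reference plate is ratio calibration: scale the plate's known retroreflection coefficient by the marking-to-plate luminance ratio from the same frame. This is a sketch of that assumption, with hypothetical names and values, not necessarily the paper's estimation formula:

```python
def estimate_retroreflection(marking_luminance, reference_luminance,
                             reference_coefficient):
    """Estimate the retroreflection coefficient (mcd/m^2/lx) of a road
    marking from its luminance relative to a reference plate of known
    retroreflection captured in the same luminance image."""
    return reference_coefficient * marking_luminance / reference_luminance

# Reference plate rated at 300 mcd/m^2/lx; the marking appears half as
# bright as the plate in the luminance image.
r_marking = estimate_retroreflection(60.0, 120.0, 300.0)
```

Using an in-frame reference cancels out the unknown illumination, which is what makes a moving-vehicle measurement feasible.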

7.
Sensors (Basel) ; 19(24)2019 Dec 12.
Article in English | MEDLINE | ID: mdl-31842513

ABSTRACT

The various defects that occur on asphalt pavement are a direct cause of car accidents, and countermeasures are required because they create significantly dangerous situations. In this paper, we propose fully convolutional neural network (CNN)-based road-surface damage detection with semi-supervised learning. First, the training database is collected through a camera installed in a vehicle while driving on the road. The CNN model is then trained for semantic segmentation using a deep convolutional autoencoder. We augmented the training dataset by varying brightness, finally generating a total of 40,536 training images. Furthermore, the CNN model is updated with pseudo-labeled images produced by the semi-supervised learning method to improve the performance of the road-surface damage detection technique. To demonstrate the effectiveness of the proposed method, 450 evaluation images were created to verify the performance of the proposed road-surface damage detection, and four experts evaluated each image. The results confirm that the proposed method properly segments road-surface damage.
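The pseudo-labeling step can be sketched as a confidence filter over model predictions on unlabeled images; the callable interface, threshold, and toy model below are illustrative assumptions.

```python
def pseudo_label(model, unlabeled, threshold=0.9):
    """Select confident model predictions on unlabeled samples as
    pseudo-labels for the next round of training.

    `model` is any callable returning (label, confidence); only
    predictions at or above `threshold` are kept.
    """
    selected = []
    for sample in unlabeled:
        label, confidence = model(sample)
        if confidence >= threshold:
            selected.append((sample, label))
    return selected

# Toy stand-in model: "damage" with high confidence for extreme values.
def toy_model(x):
    return ("damage", 0.95) if x > 0.8 else ("intact", 0.5)

kept = pseudo_label(toy_model, [0.9, 0.2, 0.85])
```

The kept pairs would be merged with the labeled set and the segmentation model retrained, which is the update loop the abstract describes.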

8.
Sensors (Basel) ; 19(20)2019 Oct 12.
Article in English | MEDLINE | ID: mdl-31614801

ABSTRACT

In networking systems such as cloud radio access networks (C-RAN), where users receive connectivity and data service from short-range, lightweight base stations (BSs), user mobility has a significant impact on the association between users and BSs. Although communicating with the closest BS may yield the most desirable channel conditions, such a strategy can leave certain BSs over-populated while others remain under-utilized. In addition, mobile users may encounter frequent handovers, which impose a non-negligible burden on both BSs and users. To reduce the handover overhead while balancing traffic loads across BSs, we propose an optimal user association strategy for a large-scale mobile Internet of Things (IoT) network operating on C-RAN. We begin by formulating an optimal user association scheme focused solely on load balancing. We then revise the formulation so that the number of handovers is minimized while the BSs remain well balanced in terms of traffic load. To evaluate the performance of the proposed scheme, we implemented a discrete-time network simulator. The evaluation results show that the proposed strategy significantly reduces the number of handovers while outperforming conventional association schemes in terms of load balancing.
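The load-balancing idea can be contrasted with nearest-BS association in a few lines: each user joins the least-loaded BS among its candidates. This greedy sketch (names and data shapes assumed) is a simplification of the paper's optimization formulation:

```python
def balanced_association(users, base_stations):
    """Greedy load-balancing association: each user joins the candidate
    base station with the lowest current load, not the closest one.

    `users` maps user id -> list of in-range BS ids, ordered by distance.
    """
    load = {bs: 0 for bs in base_stations}
    assignment = {}
    for user, candidates in users.items():
        bs = min(candidates, key=lambda b: load[b])
        assignment[user] = bs
        load[bs] += 1
    return assignment, load

# Three users all in range of BSs "a" and "b"; nearest-BS association
# would pile everyone onto "a", while the balanced rule spreads them.
users = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "b"]}
assignment, load = balanced_association(users, ["a", "b"])
```

The paper's revised formulation additionally penalizes reassignments between time steps, which this one-shot sketch does not model.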

9.
Bioengineering (Basel) ; 10(5)2023 May 19.
Article in English | MEDLINE | ID: mdl-37237685

ABSTRACT

The goal of clinical practice education is to develop the ability to apply theoretical knowledge in a clinical setting and to foster growth as a professional healthcare provider. One effective method is the use of Standardized Patients (SPs) in education, which familiarizes students with real patient interviews and allows educators to assess their clinical performance skills. However, SP education faces challenges such as the cost of hiring actors and the shortage of professional educators to train them. In this paper, we aim to alleviate these issues by using deep learning models to replace the actors. We employ the Conformer model to implement the AI patient and develop a Korean SP scenario data generator to collect training data for responses to diagnostic questions. The generator produces SP scenarios from the provided patient information using pre-prepared questions and answers. The AI patient is trained on two types of data: common data, used to develop natural general conversation skills, and personalized data from the SP scenario, used to learn the clinical information specific to a patient's role. On these data, the learning efficiency of the Conformer structure was compared with that of the Transformer, using the BLEU score and word error rate (WER) as evaluation metrics. Experimental results showed that the Conformer-based model achieved improvements of 3.92% in BLEU and 6.74% in WER over the Transformer-based model. The dental AI patient for SP simulation presented in this paper could be applied to other medical and nursing fields, provided additional data collection is conducted.
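WER, one of the two metrics above, is the word-level Levenshtein distance divided by the reference length. A minimal standalone implementation (the example sentence is invented):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance (substitutions +
    insertions + deletions) divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

# One dropped word out of five reference words -> WER of 0.2.
score = wer("the patient has a toothache", "the patient has toothache")
```

BLEU would be computed analogously over n-gram precision; libraries such as NLTK or SacreBLEU are typically used for it in practice.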

10.
J Forensic Sci ; 68(1): 139-153, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36273272

ABSTRACT

The number of smartwatch users has increased rapidly in recent years. A smartwatch is a wearable device that collects various types of data using sensors and provides basic functions such as healthcare-related measurements and audio recording. In this study, we propose a forensic authentication method for audio recordings made with the Voice Recorder application in the Samsung Galaxy Watch4 series. First, a total of 240 audio recordings from each of the four models, paired with four different smartphones for synchronization via Bluetooth, were collected and verified. To analyze the characteristics of smartwatch audio recordings, we examined the audio latency, writable audio bandwidth, timestamps, and file structure of recordings generated on the smartwatches and of recordings edited with the Voice Recorder application on the paired smartphones. In addition, the devices holding the recordings were examined via the Android Debug Bridge (ADB) tool and compared against the timestamps stored in the file system. The experimental results showed that the audio latency, writable audio bandwidth, and file structure of recordings generated by the smartwatches differed from those generated by the smartphones. Furthermore, by analyzing the file structure, recordings can be classified as unmanipulated, manipulation-attempted, or manipulated. Finally, recordings generated by the Voice Recorder application in the Samsung Galaxy Watch4 series can be forensically authenticated by accessing the smartwatch and analyzing the timestamps related to the recordings in the file system.
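A core timestamp check of this kind can be sketched as comparing a recording's internal metadata timestamp against the file system modification time; the function name, tolerance, and epoch values below are illustrative assumptions, not the paper's procedure.

```python
def timestamps_consistent(internal_ts, filesystem_mtime, tolerance_s=2.0):
    """Flag a recording whose internal metadata timestamp disagrees with
    the file system modification time beyond a small tolerance.

    Both arguments are Unix epoch seconds; a large gap suggests the file
    was written again (e.g., edited) after the recording ended.
    """
    return abs(internal_ts - filesystem_mtime) <= tolerance_s

# Untouched recording: both clocks agree within a second.
ok = timestamps_consistent(1_700_000_000.0, 1_700_000_000.8)
# Edited recording: the file was rewritten minutes after recording.
suspect = timestamps_consistent(1_700_000_000.0, 1_700_000_300.0)
```

In the study, the file-system timestamps were read over ADB, and this kind of comparison was combined with latency, bandwidth, and file-structure evidence rather than used alone.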


Subject(s)
Sound Recordings , Wearable Electronic Devices , Smartphone , Forensic Medicine