Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Más filtros

Banco de datos
Asunto principal
Tipo del documento
Publication year range
1.
Sensors (Basel) ; 24(19)2024 Sep 25.
Artículo en Inglés | MEDLINE | ID: mdl-39409253

RESUMEN

Speaker recognition is a technology that identifies the speaker in an input utterance by extracting speaker-distinguishable features from the speech signal. Speaker recognition is used for system security and authentication; therefore, it is crucial to extract unique features of the speaker to achieve high recognition rates. Representative methods for extracting these features include a classification approach, or utilizing contrastive learning to learn the speaker relationship between representations and then using embeddings extracted from a specific layer of the model. This paper introduces a framework for developing robust speaker recognition models through contrastive learning. This approach aims to minimize the similarity to hard negative samples-those that are genuine negatives, but have extremely similar features to the positives, leading to potential mistaken. Specifically, our proposed method trains the model by estimating hard negative samples within a mini-batch during contrastive learning, and then utilizes a cross-attention mechanism to determine speaker agreement for pairs of utterances. To demonstrate the effectiveness of our proposed method, we compared the performance of a deep learning model trained with a conventional loss function utilized in speaker recognition with that of a deep learning model trained using our proposed method, as measured by the equal error rate (EER), an objective performance metric. Our results indicate that when trained with the voxceleb2 dataset, the proposed method achieved an EER of 0.98% on the voxceleb1-E dataset and 1.84% on the voxceleb1-H dataset.


Asunto(s)
Habla , Humanos , Habla/fisiología , Algoritmos , Aprendizaje Profundo , Reconocimiento de Normas Patrones Automatizadas/métodos , Software de Reconocimiento del Habla
2.
Sensors (Basel) ; 23(16)2023 Aug 20.
Artículo en Inglés | MEDLINE | ID: mdl-37631815

RESUMEN

Voice spoofing attempts to break into a specific automatic speaker verification (ASV) system by forging the user's voice and can be used through methods such as text-to-speech (TTS), voice conversion (VC), and replay attacks. Recently, deep learning-based voice spoofing countermeasures have been developed. However, the problem with replay is that it is difficult to construct a large number of datasets because it requires a physical recording process. To overcome these problems, this study proposes a pre-training framework based on multi-order acoustic simulation for replay voice spoofing detection. Multi-order acoustic simulation utilizes existing clean signal and room impulse response (RIR) datasets to generate audios, which simulate the various acoustic configurations of the original and replayed audios. The acoustic configuration refers to factors such as the microphone type, reverberation, time delay, and noise that may occur between a speaker and microphone during the recording process. We assume that a deep learning model trained on an audio that simulates the various acoustic configurations of the original and replayed audios can classify the acoustic configurations of the original and replay audios well. To validate this, we performed pre-training to classify the audio generated by the multi-order acoustic simulation into three classes: clean signal, audio simulating the acoustic configuration of the original audio, and audio simulating the acoustic configuration of the replay audio. We also set the weights of the pre-training model to the initial weights of the replay voice spoofing detection model using the existing replay voice spoofing dataset and then performed fine-tuning. To validate the effectiveness of the proposed method, we evaluated the performance of the conventional method without pre-training and proposed method using an objective metric, i.e., the accuracy and F1-score. As a result, the conventional method achieved an accuracy of 92.94%, F1-score of 86.92% and the proposed method achieved an accuracy of 98.16%, F1-score of 95.08%.

SELECCIÓN DE REFERENCIAS
Detalles de la búsqueda