Results 1 - 12 of 12
1.
J Acoust Soc Am ; 155(1): 757-768, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38284823

ABSTRACT

Sound zone methods aim to control the sound field produced by an array of loudspeakers so as to render given audio content in specific areas while making it almost inaudible in others. At low frequencies, the control filters are computed from information about the electro-acoustic paths between the loudspeakers and the listening areas, contained in the room impulse responses (RIRs). This information can be acquired wirelessly through ubiquitous networks of microphones. In that case, and for real-time applications in general, short acquisition and processing times are critical. In addition, limiting the amount of data to be retrieved and processed can reduce computational demands, and such a framework would enable fast adaptation of the control filters in changing acoustic environments. This work explores reducing the amount of time and information required to compute control filters when rendering and updating low-frequency sound zones. Using real RIR measurements, it is demonstrated that in some standard acoustic rooms, acquisition times on the order of a few hundred milliseconds are sufficient for accurately rendering sound zones. Moreover, additional information can be removed from the acquired RIRs without degrading performance.
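
As context for how such control filters are typically obtained, the following is a minimal single-frequency pressure-matching sketch in Python; the array shapes, the regularization value, and the 300 ms truncation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def pressure_matching_filters(H_bright, H_dark, target, reg=1e-3):
    """Regularized least-squares pressure matching at one frequency.

    H_bright: (Mb, L) loudspeaker-to-microphone transfer functions, bright zone
    H_dark:   (Md, L) transfer functions, dark zone
    target:   (Mb,)   desired pressures in the bright zone
    Returns (L,) complex loudspeaker weights.
    """
    H = np.vstack([H_bright, H_dark])
    d = np.concatenate([target, np.zeros(H_dark.shape[0])])  # silence dark zone
    A = H.conj().T @ H + reg * np.eye(H.shape[1])
    return np.linalg.solve(A, H.conj().T @ d)

# Truncating the measured RIRs shortens acquisition and processing time;
# the paper reports that a few hundred milliseconds can suffice.
fs = 2000                               # low-frequency control band (assumed)
rir = np.random.randn(16, 8, 2 * fs)    # (mics, loudspeakers, samples), placeholder
rir_short = rir[:, :, : int(0.3 * fs)]  # keep only the first 300 ms
H = np.fft.rfft(rir_short, axis=-1)     # per-bin transfer functions
w = pressure_matching_filters(H[:8, :, 10], H[8:, :, 10], np.ones(8))
```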

2.
Article in English | MEDLINE | ID: mdl-37124321

ABSTRACT

In the development of acoustic signal processing algorithms, their evaluation in various acoustic environments is of utmost importance. In order to advance evaluation in realistic and reproducible scenarios, several high-quality acoustic databases have been developed over the years. In this paper, we present another complementary database of acoustic recordings, referred to as the Multi-arraY Room Acoustic Database (MYRiAD). The MYRiAD database is unique in its diversity of microphone configurations suiting a wide range of enhancement and reproduction applications (such as assistive hearing, teleconferencing, or sound zoning), the acoustics of the two recording spaces, and the variety of contained signals including 1214 room impulse responses (RIRs), reproduced speech, music, and stationary noise, as well as recordings of live cocktail parties held in both rooms. The microphone configurations comprise a dummy head (DH) with in-ear omnidirectional microphones, two behind-the-ear (BTE) pieces equipped with two omnidirectional microphones each, five external omnidirectional microphones (XMs), and two concentric circular microphone arrays (CMAs) consisting of 12 omnidirectional microphones in total. The two recording spaces, namely the SONORA Audio Laboratory (SAL) and the Alamire Interactive Laboratory (AIL), have reverberation times of 2.1 s and 0.5 s, respectively. Audio signals were reproduced using 10 movable loudspeakers in the SAL and a built-in array of 24 loudspeakers in the AIL. MATLAB and Python scripts are included for accessing the signals as well as microphone and loudspeaker coordinates. The database is publicly available (https://zenodo.org/record/7389996).
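
The reported reverberation times of the two rooms can be estimated from any of the included RIRs with standard Schroeder backward integration; the sketch below is a generic implementation of that textbook method, not code shipped with the database.

```python
import numpy as np

def schroeder_t60(rir, fs):
    """Estimate T60 from a room impulse response via Schroeder backward
    integration, fitting the -5 to -25 dB decay and extrapolating to -60 dB."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]    # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(rir)) / fs
    fit = (edc_db <= -5) & (edc_db >= -25)   # linear decay region
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)
    return -60.0 / slope

# Synthetic check: an exponentially decaying noise tail with a 1.0 s T60.
fs = 8000
t = np.arange(2 * fs) / fs
rir = np.exp(-3 * np.log(10) * t) * np.random.randn(len(t))
print(schroeder_t60(rir, fs))   # approximately 1.0
```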

3.
Front Neuroinform ; 16: 942978, 2022.
Article in English | MEDLINE | ID: mdl-36465690

ABSTRACT

Recent deep neural network based methods provide accurate binaural source localization performance. These data-driven models map measured binaural cues directly to source locations, so their performance depends strongly on the training data distribution. In this paper, we propose a parametric embedding that maps the binaural cues to a low-dimensional space in which localization can be performed with a nearest-neighbor regression. We implement the embedding using a neural network, optimized so that points close to each other in the latent space (the space of source azimuths or elevations) are mapped to nearby points in the embedding space. The Euclidean distances between embeddings thus reflect the proximity of their sources, and the embeddings form a manifold, which makes them interpretable. We show that the proposed embedding generalizes well in various acoustic conditions (with reverberation) different from those encountered during training, and that it outperforms unsupervised embeddings previously used for binaural localization. In addition, the proposed method performs as well as or better than a feed-forward neural network model that directly estimates the source locations from the binaural cues, and it outperforms the feed-forward model when only a small amount of training data is used. Moreover, we compare the proposed embedding under supervised and weakly supervised learning, and show that the resulting embeddings perform similarly well in both conditions, but that the weakly supervised embedding allows source azimuth and elevation to be estimated simultaneously.
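
Once cues are mapped into the embedding space, the localization stage described here reduces to standard nearest-neighbor regression; a minimal sketch with placeholder embeddings follows (the trained embedding network itself is not reproduced, and k = 5 is an arbitrary choice).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(500, 3))      # placeholder embedding vectors
train_azimuth = rng.uniform(-90, 90, 500)  # placeholder azimuth labels (deg)

# Distance-weighted k-nearest-neighbor regression in the embedding space.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(train_emb, train_azimuth)

test_emb = rng.normal(size=(10, 3))        # embeddings of unseen binaural cues
est_azimuth = knn.predict(test_emb)
```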

4.
J Acoust Soc Am ; 152(5): 2735, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36456297

ABSTRACT

This paper proposes an experimental setup for measuring the sound radiation of a quadrotor drone using a hemispherical microphone array. The measured sound field is decomposed into spherical harmonics, which enables evaluation of the radiation pattern at non-probed positions. Additionally, the measurement setup allows the assessment of noise emission and psychoacoustic metrics over a wide range of angles. The directivity patterns obtained using a third-order spherical harmonic decomposition (SHD) exhibit low distortion with respect to the original measurements, validating the SHD as an adequate representation strategy. Furthermore, the noise emissions are evaluated, with the highest emission observed in the 90° azimuth direction. An exterior spherical acoustic holography description is employed to evaluate psychoacoustic metrics at arbitrary far-field positions and is validated against a reference microphone. The estimated psychoacoustic metrics closely match the target metrics, which allows sound quality analysis at any point exterior to the drone.
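
A third-order decomposition of this kind can be obtained with a least-squares fit of spherical harmonics to the microphone pressures; the sketch below is a generic version of that step (placeholder data, no holographic propagation), not the paper's full pipeline.

```python
import numpy as np
from scipy.special import sph_harm

def shd_coefficients(pressure, azim, polar, order=3):
    """Least-squares spherical harmonic fit to pressures measured at
    directions (azim in [0, 2pi], polar in [0, pi]). Order 3 yields 16
    coefficients, so at least 16 microphones are required; hemispherical
    sampling may additionally call for regularization."""
    basis = [sph_harm(m, n, azim, polar)   # scipy: (m, n, azimuthal, polar)
             for n in range(order + 1) for m in range(-n, n + 1)]
    Y = np.stack(basis, axis=1)            # (num_mics, (order+1)**2)
    coeffs, *_ = np.linalg.lstsq(Y, pressure, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
azim = rng.uniform(0, 2 * np.pi, 25)       # 25 hemispherical directions
polar = rng.uniform(0, np.pi / 2, 25)
p = rng.normal(size=25) + 1j * rng.normal(size=25)   # placeholder pressures
c = shd_coefficients(p, azim, polar)
```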

5.
JMIR Ment Health ; 9(2): e31724, 2022 Feb 11.
Article in English | MEDLINE | ID: mdl-35147507

ABSTRACT

BACKGROUND: Emotions and mood are important for overall well-being. Therefore, the search for continuous, effortless emotion prediction methods is an important field of study. Mobile sensing provides a promising tool and can capture one of the most telling signs of emotion: language. OBJECTIVE: The aim of this study was to examine the separate and combined predictive value of mobile-sensed language data sources for detecting both momentary emotional experience and global individual differences in emotional traits and depression. METHODS: In a 2-week experience sampling method study, we collected self-reported emotion ratings and voice recordings 10 times a day, continuous keyboard activity, and trait depression severity. We correlated state and trait emotions and depression with the language measures, distinguishing between speech content (spoken words), speech form (voice acoustics), writing content (written words), and writing form (typing dynamics). We also investigated how well these features predicted state and trait emotions, using cross-validation to select features and a hold-out set for validation. RESULTS: Overall, the reported emotions and mobile-sensed language demonstrated weak correlations. The strongest correlations were found between speech content and state emotions and between speech form and state emotions, reaching up to 0.25. Speech content provided the best predictions for state emotions. None of the trait emotion-language correlations remained significant after correction. Among the emotions studied, valence and happiness displayed the strongest correlations and the highest predictive performance. CONCLUSIONS: Although using mobile-sensed language as an emotion marker shows some promise, correlations and predictive R2 values are low.
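
The prediction protocol described (cross-validation on a training split, final check on a hold-out set) can be sketched with scikit-learn as follows; the ridge regressor and placeholder features are assumptions for illustration, not the study's exact models.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))   # placeholder language features per moment
y = rng.normal(size=400)         # placeholder state-emotion ratings

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0)
cv_r2 = cross_val_score(model, X_tr, y_tr, cv=5, scoring="r2").mean()  # selection
model.fit(X_tr, y_tr)
holdout_r2 = model.score(X_te, y_te)    # final R^2 on the hold-out set
```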

6.
Article in English | MEDLINE | ID: mdl-34721556

ABSTRACT

Among the various characteristics of a speech signal, the expression of emotion exhibits some of the slowest temporal dynamics. Hence, a performant speech emotion recognition (SER) system requires a predictive model capable of learning sufficiently long temporal dependencies in the analysed speech signal. In this work, we therefore propose a novel end-to-end neural network architecture based on the concept of dilated causal convolution with context stacking. Firstly, the proposed model consists only of parallelisable layers and is hence suitable for parallel processing, avoiding the inherent lack of parallelisability of recurrent neural network (RNN) layers. Secondly, the design of a dedicated dilated causal convolution block allows the model to have a receptive field as large as the input sequence length, while maintaining a reasonably low computational cost. Thirdly, by introducing a context stacking structure, the proposed model can exploit long-term temporal dependencies, providing an alternative to the use of RNN layers. We evaluate the proposed model on SER regression and classification tasks and compare it with a state-of-the-art end-to-end SER model. Experimental results indicate that the proposed model requires only one third of the number of model parameters used in the state-of-the-art model, while also significantly improving SER performance. Further experiments examine the impact of different input representations (i.e., raw audio samples vs. log mel-spectrograms) and illustrate the benefits of an end-to-end approach over the use of hand-crafted audio features. Moreover, we show that the proposed model can efficiently learn intermediate embeddings that preserve speech emotion information.
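
The core building block named here, a dilated causal convolution, can be sketched in a few lines of PyTorch; this illustrates only the receptive-field mechanism, not the paper's full context-stacking architecture, and all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class DilatedCausalConv(nn.Module):
    """One dilated causal convolution layer: left-padding ensures output t
    depends only on inputs up to t; stacking layers with dilations
    1, 2, 4, ... grows the receptive field exponentially with depth."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))    # pad the past only
        return torch.relu(self.conv(x))

stack = nn.Sequential(*[DilatedCausalConv(16, 2, 2 ** i) for i in range(6)])
y = stack(torch.randn(1, 16, 1000))    # receptive field of 64 samples
```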

7.
Article in English | MEDLINE | ID: mdl-34721557

ABSTRACT

If music is the language of the universe, musical note onsets may be its syllables. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteristics also carry rich information about the identity of the instrument producing the notes. Note onset detection (NOD) is a basic component of many music information retrieval tasks and has attracted significant interest in audio signal processing research. In this paper, we propose an NOD method based on a novel feature coined Normalized Identification of Note Onset based on Spectral Sparsity (NINOS2). The NINOS2 feature can be thought of as a spectral sparsity measure, exploiting the difference in spectral sparsity between the different parts of a musical note. This spectral structure is revealed when focusing on low-magnitude spectral components that are traditionally filtered out when computing note onset features. We present an extensive set of NOD simulation results covering a wide range of instruments, playing styles, and mixing options. The proposed algorithm consistently outperforms the baseline Logarithmic Spectral Flux (LSF) feature for the most difficult group of instruments, the sustained-string instruments. It also performs better in challenging scenarios including polyphonic music and vibrato performances.
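
A feature in this spirit, though not the exact published NINOS2 definition, can be computed by discarding the strongest spectral bins and measuring how spread out the remaining energy is; at an onset the low-magnitude spectrum becomes less sparse, so the l1/l2 norm ratio used below rises.

```python
import numpy as np
from scipy.signal import stft

def sparsity_onset_feature(x, fs, keep=0.9):
    """Per-frame (non-)sparsity of the low-magnitude spectral bins: an
    illustrative stand-in for the NINOS2 feature, not its published form."""
    _, _, Z = stft(x, fs=fs, nperseg=1024, noverlap=768)
    mag = np.sort(np.abs(Z), axis=0)            # ascending per frame
    low = mag[: int(keep * mag.shape[0]), :]    # drop the strongest bins
    l1 = low.sum(axis=0)
    l2 = np.sqrt((low ** 2).sum(axis=0)) + 1e-12
    return l1 / l2    # high when energy spreads across bins (onset-like frames)

# Onsets can then be picked as thresholded local maxima of this curve.
```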

8.
IEEE J Biomed Health Inform ; 25(12): 4300-4307, 2021 12.
Article in English | MEDLINE | ID: mdl-34314365

ABSTRACT

One of the current gaps in teleaudiology is the lack of adult hearing screening methods viable for use with individuals of unknown language and in varying environments. We have developed a novel automated speech-in-noise test that uses stimuli suitable for non-native listeners. The test's reliability has been demonstrated in laboratory settings and in uncontrolled environmental noise in previous studies. The aims of this study were: (i) to evaluate the ability of the test to identify hearing loss using multivariate logistic regression classifiers in a population of 148 unscreened adults, and (ii) to evaluate the ear-level sound pressure levels generated by different earphones and headphones as a function of the test volume. The multivariate classifiers achieved a sensitivity of 0.79 and a specificity of 0.79 using both the full set of features extracted from the test and a subset of three features (speech recognition threshold, age, and number of correct responses). The analysis of ear-level sound pressure levels showed substantial variability across transducer types and models, with earphone levels up to 22 dB lower than those of headphones. Overall, these results suggest that the proposed approach may be viable for hearing screening in varying environments if an option to self-adjust the test volume is included and if headphones are used. Future research is needed to assess the viability of the test for screening at a distance, for example by addressing the influence of user interface, device, and settings on a large sample of subjects with varying degrees of hearing loss.
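
The three-feature classifier described can be sketched with scikit-learn as below; the data are placeholders and the preprocessing (standardization) is an assumption, not a detail reported in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder columns: speech recognition threshold (dB SNR), age (years),
# number of correct responses.
X = np.column_stack([rng.normal(-10, 3, 148),
                     rng.uniform(20, 80, 148),
                     rng.integers(40, 90, 148)])
y = rng.integers(0, 2, 148)              # 1 = hearing loss (placeholder labels)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
p_loss = clf.predict_proba(X)[:, 1]      # screening probability per subject
```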


Subjects
Noise, Speech, Adult, Hearing, Humans, Reproducibility of Results, Transducers
9.
Am J Audiol ; 29(3S): 564-576, 2020 Sep 18.
Article in English | MEDLINE | ID: mdl-32946249

ABSTRACT

Purpose: The aim of this study was to develop and evaluate a novel, automated speech-in-noise test viable for widespread in situ and remote screening. Method: Vowel-consonant-vowel sounds were used in a multiple-choice consonant discrimination task, with recordings from a professional male native English speaker. A novel adaptive staircase procedure was developed, based on the estimated intelligibility of the stimuli rather than on theoretical binomial models. Test performance was assessed in a population of 26 young adults (YAs) with normal hearing and in 72 unscreened adults (UAs), including native and nonnative English listeners. Results: The proposed test provided accurate estimates of the speech recognition threshold (SRT) compared to a conventional adaptive procedure. Consistent outcomes were observed for YAs in test/retest and in controlled/uncontrolled conditions, and for UAs in native and nonnative listeners. The SRT increased with increasing age, hearing loss, and self-reported hearing handicap in UAs. Test duration was similar in YAs and UAs irrespective of age and hearing loss. The test-retest repeatability of SRTs was high (Pearson correlation coefficient = .84), and the pass/fail outcomes of the test were reliable in repeated measures (Cohen's κ = .8). The test was accurate in identifying ears with pure-tone thresholds > 25 dB HL (accuracy = 0.82). Conclusion: This study demonstrated the viability of the proposed test for subjects with varying language backgrounds in terms of accuracy, reliability, and short test time. Further research is needed to validate the test in a larger population across a wider range of languages and degrees of hearing loss, and to identify optimal classification criteria for screening purposes.
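
For orientation, the sketch below shows a plain fixed-step up-down staircase for SRT estimation; the paper's novel procedure instead adapts using the estimated intelligibility of each stimulus, so this is a simplified illustration only.

```python
def simple_staircase(respond, start_snr=0.0, step=2.0, n_trials=30):
    """Fixed-step 1-up/1-down staircase: respond(snr) returns True when the
    listener answers correctly at that SNR. Returns an SRT estimate as the
    mean SNR over the last reversals."""
    snr, reversals, last = start_snr, [], None
    for _ in range(n_trials):
        correct = respond(snr)
        if last is not None and correct != last:
            reversals.append(snr)              # direction change
        last = correct
        snr += -step if correct else step      # harder after a correct answer
    tail = reversals[-6:]
    return sum(tail) / max(len(tail), 1)

# Simulated listener with a logistic psychometric function and a true SRT
# of -8 dB SNR (all values here are arbitrary):
import random
est = simple_staircase(lambda s: random.random() < 1 / (1 + 10 ** (-(s + 8) / 2)))
```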


Subjects
Hearing Loss/diagnosis, Noise, Speech Perception, Speech Reception Threshold Test/methods, Telemedicine/methods, Adult, Aged, Aged, 80 and over, Automation, Female, Hearing Loss/physiopathology, Humans, Male, Middle Aged, Reproducibility of Results, Young Adult
10.
J Acoust Soc Am ; 146(5): 3562, 2019 11.
Article in English | MEDLINE | ID: mdl-31795724

ABSTRACT

An experiment was conducted to identify the perceptual effects of the acoustical properties of domestic listening environments in a stereophonic reproduction scenario. Nine sound fields, originating from four rooms, were captured and spatially reproduced over a three-dimensional loudspeaker array. A panel of ten expert assessors identified and quantified the perceived differences between those sound fields using their own perceptual attributes. A multivariate analysis revealed two principal dimensions that summarize the sound fields of this investigation. Four perceptual constructs appear to characterize the sensory properties of these dimensions, relating to Reverberance, Width & Envelopment, Proximity, and Bass. Overall, the results signify the importance of reverberation in residential listening environments for the perceived sensory experience and, as a consequence, for the assessors' preferences towards certain decay times.
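
The two-dimensional summary described can be illustrated with a PCA over the attribute ratings; PCA is used here as a generic stand-in for the study's multivariate analysis, with placeholder data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
ratings = rng.normal(size=(9, 40))     # 9 sound fields x pooled attributes

pca = PCA(n_components=2)              # the study found two main dimensions
scores = pca.fit_transform(ratings)    # sound-field coordinates in that plane
print(pca.explained_variance_ratio_)   # variance captured per dimension
```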

11.
J Acoust Soc Am ; 141(3): 1459, 2017 03.
Article in English | MEDLINE | ID: mdl-28372066

ABSTRACT

An experiment was conducted to determine the perceptual effects of car cabin acoustics on the reproduced sound field. In-car measurements were conducted whilst the cabin's interior was physically modified. The captured sound fields were recreated in the laboratory using a three-dimensional loudspeaker array. A panel of expert assessors followed a rapid sensory analysis protocol, the flash profile, to perceptually characterize and evaluate 12 acoustical conditions of the car cabin using individually elicited attributes. A multivariate analysis revealed the panel's consensus and the identified perceptual constructs. Six perceptual constructs characterize the differences between the acoustical conditions of the cabin, related to bass, ambience, transparency, width and envelopment, brightness, and image focus. The current results indicate the importance of several acoustical properties of a car's interior for the perceived sound qualities. Moreover, they demonstrate the capacity of the applied methodology to assess spectral and spatial properties of automotive environments in laboratory settings using a time-efficient and flexible protocol.

12.
J Acoust Soc Am ; 140(1): EL101, 2016 07.
Article in English | MEDLINE | ID: mdl-27475197

ABSTRACT

Subjective audio quality evaluation experiments have been conducted to assess the performance of embedded-optimization-based precompensation algorithms for mitigating perceptible linear and nonlinear distortion in audio signals. It is concluded with statistical significance that the perceived audio quality is improved by applying an embedded-optimization-based precompensation algorithm, both in the case of (i) nonlinear distortion alone and (ii) a combination of linear and nonlinear distortion. Moreover, a significant positive correlation is reported between the collected subjective quality scores and the objective PEAQ audio quality scores, supporting the validity of using PEAQ to predict the impact of linear and nonlinear distortion on perceived audio quality.
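
The reported subjective-objective agreement amounts to a Pearson correlation between the two sets of scores; a minimal sketch with invented numbers follows (PEAQ outputs an objective difference grade, ODG, where higher is better).

```python
import numpy as np
from scipy.stats import pearsonr

# Invented per-stimulus scores for illustration only.
subjective = np.array([78, 64, 55, 81, 49, 70, 60, 73])   # mean listener scores
peaq_odg = np.array([-0.8, -1.6, -2.4, -0.6, -2.9, -1.2, -2.1, -1.0])

r, p = pearsonr(subjective, peaq_odg)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")   # positive r supports PEAQ's validity
```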
