1.
J Acoust Soc Am ; 155(4): 2659-2669, 2024 04 01.
Article in English | MEDLINE | ID: mdl-38634661

ABSTRACT

Within the realm of voice classification, singers could be sub-categorized by the weight of their repertoire, the so-called "singer's Fach." However, the opposite pole terms "lyric" and "dramatic" singing are not yet well defined by their acoustic and articulatory characteristics. Nine professional singers of different singers' Fach were asked to sing a diatonic scale on the vowel /a/, first in what the singers considered lyric and second in what they considered dramatic. Image recording was performed using real-time magnetic resonance imaging (MRI) at 25 frames/s, and the audio signal was recorded via an optical microphone system. Analysis was performed with regard to sound pressure level (SPL), vibrato amplitude and frequency, and resonance frequencies, as well as articulatory settings of the vocal tract. The analysis revealed three primary differences between dramatic and lyric singing: dramatic singing was associated with greater SPL, greater vibrato amplitude and frequency, and lower resonance frequencies. The higher SPL is an indication of voice source changes, and the lower resonance frequencies are probably caused by the lower larynx position. However, all these strategies showed considerable individual variability. The singers' Fach might contribute to perceptual differences even for the same singer with regard to the respective repertoire.


Subjects
Music; Singing; Voice Quality; Acoustics
2.
J Acoust Soc Am ; 153(6): 3281, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37307363

ABSTRACT

This study investigated how the bandwidths of resonances simulated by transmission-line models of the vocal tract compare to bandwidths measured from physical three-dimensional printed vowel resonators. Three types of physical resonators were examined: models with realistic vocal tract shapes based on Magnetic Resonance Imaging (MRI) data, straight axisymmetric tubes with varying cross-sectional areas, and two-tube approximations of the vocal tract with notched lips. All physical models had hard walls and closed glottis so the main loss mechanisms contributing to the bandwidths were sound radiation, viscosity, and heat conduction. These losses were accordingly included in the simulations, in two variants: A coarse approximation of the losses with frequency-independent lumped elements, and a detailed, theoretically more precise loss model. Across the examined frequency range from 0 to 5 kHz, the resonance bandwidths increased systematically from the simulations with the coarse loss model to the simulations with the detailed loss model, to the tube-shaped physical resonators, and to the MRI-based resonators. This indicates that the simulated losses, especially the commonly used approximations, underestimate the real losses in physical resonators. Hence, more realistic acoustic simulations of the vocal tract require improved models for viscous and radiation losses.


Subjects
Acoustics; Glottis; Vibration; Viscosity
3.
J Acoust Soc Am ; 151(1): 45, 2022 01.
Article in English | MEDLINE | ID: mdl-35105025

ABSTRACT

Periodic repetitions of laryngeal adduction and abduction gestures were produced by 16 subjects. The movement of the cuneiform tubercles was tracked over time in the laryngoscopic recordings of these utterances. The adduction velocity and abduction velocity were determined objectively by means of a piecewise linear model fitted to the cuneiform tubercle trajectories. The abduction was found to be significantly faster than the adduction. This was interpreted in terms of the biomechanics and active control by the nervous system. The biomechanical properties could be responsible for a velocity of abduction that is up to 51% higher than the velocity of adduction. Additionally, the adduction velocity may be actively limited to prevent an overshoot of the intended adduction degree when the vocal folds are approximated to initiate phonation.


Subjects
Gestures; Larynx; Humans; Larynx/diagnostic imaging; Movement; Phonation/physiology; Vocal Cords/physiology
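
The piecewise linear velocity estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes a trajectory with a single closing phase followed by a single opening phase and fits two independent least-squares lines around an exhaustively searched breakpoint. The function names, the two-segment restriction, and the synthetic data are assumptions made for illustration and are not taken from the study.

```python
def fit_line(ts, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / sxx
    intercept = mean_y - slope * mean_t
    sse = sum((y - (slope * t + intercept)) ** 2 for t, y in zip(ts, ys))
    return slope, intercept, sse


def two_segment_velocities(ts, ys):
    """Try every interior breakpoint; keep the split with the lowest total error."""
    best = None
    for b in range(1, len(ts) - 1):
        s1, _, e1 = fit_line(ts[: b + 1], ys[: b + 1])  # segment before the breakpoint
        s2, _, e2 = fit_line(ts[b:], ys[b:])            # segment after the breakpoint
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, s1, s2, ts[b])
    _, v_first, v_second, t_break = best
    return v_first, v_second, t_break


# Hypothetical trajectory in arbitrary units: slower closing, then faster opening.
frames = list(range(21))
positions = [t if t <= 10 else 10 - 1.5 * (t - 10) for t in frames]
v_adduction, v_abduction, turn_frame = two_segment_velocities(frames, positions)
speed_ratio = abs(v_abduction) / abs(v_adduction)
```

The slopes of the two fitted segments play the role of the adduction and abduction velocities, and their ratio expresses how much faster one gesture is than the other.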
4.
J Acoust Soc Am ; 149(1): 466, 2021 01.
Article in English | MEDLINE | ID: mdl-33514162

ABSTRACT

The influence of non-smooth trachea walls on phonation onset and offset pressures and on the fundamental frequency of oscillation was experimentally investigated for three different synthetic vocal fold models. Three models of the trachea were compared: a cylindrical tube (smooth walls) and wavy-walled tubes with ripple depths of 1 and 2 mm. Threshold pressures for the onset and offset of phonation were measured at the lower and upper ends of each trachea tube. All measurements were performed both with and without a supraglottal resonator. While the fundamental frequency was not affected by non-smooth trachea walls, the phonation onset and offset pressures measured right below the glottis decreased with an increasing ripple depth of the trachea walls (up to 20% for 2 mm ripples). This effect was independent of the type of glottis model and the presence of a supraglottal resonator. The pressures at the lower end of the trachea and the average volume velocities showed a tendency to decrease with an increasing ripple depth of the trachea walls but to a much smaller extent. These results indicate that the subglottal geometry and the flow conditions in the trachea can substantially affect the oscillation of synthetic vocal folds.

5.
J Acoust Soc Am ; 150(6): 4191, 2021 12.
Article in English | MEDLINE | ID: mdl-34972262

ABSTRACT

Resonance-strategies with respect to vocal registers, i.e., frequency-ranges of uniform, demarcated voice quality, for the highest part of the female voice are still not completely understood. The first and second vocal tract resonances usually determine vowels. If the fundamental frequency exceeds the vowel-shaping resonance frequencies of speech, vocal tract resonances are tuned to voice source partials. It has not yet been clarified if such tuning is applicable for the entire voice-range, particularly for the top pitches. We investigated professional sopranos who regularly sing pitches above C6 (1047 Hz). Dynamic three-dimensional (3D) magnetic resonance imaging was used to calculate resonances for pitches from C5 (523 Hz) to C7 (2093 Hz) with different vowel configurations ([a:], [i:], [u:]), and different contexts (scales or octave jumps). A spectral analysis and an acoustic analysis of 3D-printed vocal tract models were conducted. The results suggest that there is no exclusive register-defining resonance-strategy. The intersection of fundamental frequency and first vocal tract resonance was not found to necessarily indicate a register shift. The articulators and the vocal tract resonances were either kept without significant adjustments, or the fR1:fo-tuning, wherein the first vocal tract resonance enhances the fundamental frequency, was applied until F6 (1396 Hz). An fR2:fo-tuning was not observed.


Subjects
Singing; Acoustics; Female; Humans; Magnetic Resonance Imaging; Phonation; Voice Quality
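
The pitch names and frequencies quoted in this abstract follow from twelve-tone equal temperament. The sketch below assumes a reference of A4 = 440 Hz, which the abstract does not state explicitly; `note_to_hz` is a hypothetical helper name. Under that assumption the computed values agree with the quoted figures to within about 1 Hz.

```python
# Semitone offsets of the natural notes within one octave, counted from C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name, a4_hz=440.0):
    """Convert a note name such as 'C6' or 'Bb5' to its equal-tempered frequency."""
    letter = name[0].upper()
    accidental = name[1:-1]          # '', '#', or 'b'
    octave = int(name[-1])
    semitone = NOTE_OFFSETS[letter] + accidental.count("#") - accidental.count("b")
    midi_number = 12 * (octave + 1) + semitone   # A4 corresponds to 69
    return a4_hz * 2.0 ** ((midi_number - 69) / 12.0)


# Pitches named in the abstract, with the frequencies it quotes in Hz.
quoted = {"C5": 523, "C6": 1047, "F6": 1396, "C7": 2093}
computed = {name: note_to_hz(name) for name in quoted}
```

Because each octave doubles the frequency, the range from C5 to C7 spans a factor of four, which is why the fundamental frequency can rise above the typical speech values of the first vocal tract resonance.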
6.
J Acoust Soc Am ; 150(2): 1209, 2021 08.
Article in English | MEDLINE | ID: mdl-34470273

ABSTRACT

When pitch is explicitly modelled for parametric speech synthesis, microprosodic variations of the fundamental frequency f0 are usually disregarded by current intonation models. While there are numerous studies dealing with the nature and the origin of microprosody, little research has been done on its audibility and its effect on the naturalness of synthetic speech. In this work, the influence of obstruent-related microprosodic variations on the perceived naturalness of articulatory speech synthesis was studied. A small corpus of 20 German words and sentences was re-synthesized using the state-of-the-art articulatory synthesizer VocalTractLab. The pitch contours of the real utterances were extracted and fitted with the Target-Approximation-Model. After the real microprosodic variations were removed from the obtained pitch contours, synthetic variations were applied based on a microprosody model. Subsequently, multiple stimuli with different microprosody amplitudes were synthesized and evaluated in a listening experiment. The results indicate that microprosodic variations are barely audible, but can lead to a greater perceived naturalness of the synthesized speech in certain cases.


Subjects
Speech Perception; Language; Speech; Speech Acoustics
7.
J Acoust Soc Am ; 148(1): EL112, 2020 07.
Article in English | MEDLINE | ID: mdl-32752753

ABSTRACT

This study analyzed the durational and spectral differences and their interaction in the production of seven German tense-lax vowel pairs between 30 German native speakers and 30 Mandarin learners of German. The results showed that Mandarin speakers differed significantly from the German speakers in producing the German tense-lax contrast. The general pattern was that Mandarin learners employed temporal features more strongly than spectral features to indicate the tense-lax contrast as compared to German speakers. The phonetic influences of the Mandarin language on the production of German tense and lax vowels are discussed.


Subjects
Language; Speech Perception; Acoustics; China; Humans; Phonetics; Speech Acoustics
8.
J Acoust Soc Am ; 146(1): 223, 2019 07.
Article in English | MEDLINE | ID: mdl-31370636

ABSTRACT

The estimation of formant frequencies from acoustic speech signals is mostly based on Linear Predictive Coding (LPC) algorithms. Since LPC is based on the source-filter model of speech production, the formant frequencies obtained are often implicitly regarded as those for an infinite glottal impedance, i.e., a closed glottis. However, previous studies have indicated that LPC-based formant estimates of vowels generated with a realistically varying glottal area may substantially differ from the resonances of the vocal tract with a closed glottis. In the present study, the deviation between closed-glottis resonances and LPC-estimated formants during phonation with different peak glottal areas has been systematically examined both using physical vocal tract models excited with a self-oscillating rubber model of the vocal folds, and by computer simulations of interacting source and filter models. Ten vocal tract resonators representing different vowels have been analyzed. The results showed that F1 increased with the peak area of the time-varying glottis, while F2 and F3 were not systematically affected. The effect of the peak glottal area on F1 was strongest for close-mid to close vowels, and more moderate for mid to open vowels.
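
As background to the LPC-based formant estimation discussed in this abstract, here is a minimal sketch of the generic autocorrelation method. It synthesizes the impulse response of an ideal all-pole filter with known resonances, which corresponds loosely to the closed-glottis idealization, and then recovers those resonances as peaks of the LPC envelope. The function names, the test resonances (500 Hz and 1500 Hz), the sampling rate, and the peak-picking step are assumptions for illustration; the study's own analysis settings are not given in the abstract.

```python
import cmath
import math


def resonator_impulse_response(formants_hz, bandwidths_hz, fs, n_samples):
    """Impulse response of cascaded two-pole resonators (an all-pole filter)."""
    a = [1.0]
    for f, b in zip(formants_hz, bandwidths_hz):
        r = math.exp(-math.pi * b / fs)
        section = [1.0, -2.0 * r * math.cos(2.0 * math.pi * f / fs), r * r]
        product = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):              # polynomial multiplication
            for j, sj in enumerate(section):
                product[i + j] += ai * sj
        a = product
    y = []
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0            # unit impulse input
        for k in range(1, len(a)):
            if n >= k:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        k_i = -acc / err                        # reflection coefficient
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + k_i * prev[i - k]
        a[i] = k_i
        err *= 1.0 - k_i * k_i
    return a


def envelope_peaks(a, fs, step_hz=5.0):
    """Frequencies at which the LPC envelope 1/|A(e^{jw})| has a local maximum."""
    n_bins = int(fs / (2.0 * step_hz))
    freqs = [i * step_hz for i in range(n_bins + 1)]
    mags = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        mags.append(1.0 / abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))))
    return [freqs[i] for i in range(1, len(freqs) - 1) if mags[i - 1] < mags[i] >= mags[i + 1]]


fs = 10000
signal = resonator_impulse_response([500.0, 1500.0], [60.0, 90.0], fs, 1000)
lpc = levinson_durbin(autocorrelation(signal, 4), 4)
estimated_formants = envelope_peaks(lpc, fs)
```

In this idealized case the estimates land close to the true resonances. The abstract's point is that with a realistically varying glottal area the all-pole assumption no longer holds exactly, so the estimated F1 can deviate from the closed-glottis resonance.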

9.
J Acoust Soc Am ; 137(3): 1503-12, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25786961

ABSTRACT

Vocal emotions are signaled by specific patterns of prosodic parameters, most notably pitch, phone duration, intensity, and phonation type. Phonation type was so far the least accessible parameter in emotion research, because it was difficult to extract from speech signals and difficult to manipulate in natural or synthetic speech. The present study built on recent advances in articulatory speech synthesis to exclusively control phonation type in re-synthesized German sentences spoken with seven different emotions. The goal was to find out to what extent the sole change of phonation type affects the perception of these emotions. Therefore, portrayed emotional utterances were re-synthesized with their original phonation type, as well as with each purely breathy, modal, and pressed phonation, and then rated by listeners with respect to the perceived emotions. Highly significant effects of phonation type on the recognition rates of the original emotions were found, except for disgust. While fear, anger, and the neutral emotion require specific phonation types for correct perception, sadness, happiness, boredom, and disgust primarily rely on other prosodic parameters. These results can help to improve the expression of emotions in synthesized speech and facilitate the robust automatic recognition of vocal emotions.


Subjects
Emotions; Loudness Perception; Phonation; Pitch Perception; Speech Acoustics; Speech Perception; Voice Quality; Acoustics; Adult; Female; Humans; Male; Pattern Recognition, Automated; Phonetics; Signal Processing, Computer-Assisted; Sound Spectrography; Speech Production Measurement; Time Factors; Young Adult
10.
J Acoust Soc Am ; 137(5): 2586-95, 2015 May.
Article in English | MEDLINE | ID: mdl-25994691

ABSTRACT

The role of the vocal tract for phonation at very high soprano fundamental frequencies (F0s) is not yet understood in detail. In this investigation, two experiments were carried out with a single professional high soprano subject. First, using two dimensional (2D) dynamic real-time magnetic resonance imaging (MRI) (24 fps) midsagittal and coronal vocal tract shapes were analyzed while the subject sang a scale from Bb5 (932 Hz) to G6 (1568 Hz). In a second experiment, volumetric vocal tract MRI data were recorded from sustained phonations (13 s) for the pitches C6 (1047 Hz) and G6 (1568 Hz). Formant frequencies were measured in physical models created by 3D printing, and calculated from area functions obtained from the 3D vocal tract shapes. The data showed that there were only minor modifications of the vocal tract shape. These changes involved a decrease of the piriform sinus as well as small changes of tongue position. Formant frequencies did not exhibit major differences between C6 and G6 for F1 and F3, respectively. Only F2 was slightly raised for G6. For G6, however, F2 is not excited by any voice source partial. Therefore, this investigation was not able to confirm that the analyzed professional soprano subject adjusted formants to voice source partials for the analyzed F0s.


Subjects
Acoustics; Larynx/physiology; Phonation; Singing; Voice Quality; Biomechanical Phenomena; Female; Humans; Imaging, Three-Dimensional; Larynx/anatomy & histology; Magnetic Resonance Imaging; Models, Anatomic; Printing, Three-Dimensional; Signal Processing, Computer-Assisted; Sound Spectrography
11.
J Vis Exp ; (203)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38251763

ABSTRACT

This study aims to develop super-soft, non-sticky vocal fold models for voice research. The conventional manufacturing process of silicone-based vocal fold models results in models with undesirable properties, such as stickiness and reproducibility issues. Those vocal fold models are prone to rapid aging, leading to poor comparability across different measurements. In this study, we propose a modification to the manufacturing process by changing the order of layering the silicone material, which leads to the production of non-sticky and highly consistent vocal fold models. We also compare a model produced using this method with a conventionally manufactured vocal fold model that is adversely affected by its sticky surface. We detail the manufacturing process and characterize the properties of the models for potential applications. The outcomes of the study demonstrate the efficacy of the modified fabrication method, highlighting the superior qualities of our non-sticky vocal fold models. The findings contribute to the development of realistic and reliable vocal fold models for research and clinical applications.


Subjects
Data Accuracy; Vocal Cords; Reproducibility of Results; Silicones
12.
Neuroimage ; 79: 275-87, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-23660030

ABSTRACT

The basis for different neural activations in response to male and female voices as well as the question, whether men and women perceive male and female voices differently, has not been thoroughly investigated. Therefore, the aim of the present study was to examine the behavioral and neural correlates of gender-related voice perception in healthy male and female volunteers. fMRI data were collected while 39 participants (19 female) were asked to indicate the gender of 240 voice stimuli. These stimuli included recordings of 3-syllable nouns as well as the same recordings pitch-shifted in 2, 4 and 6 semitone steps in the direction of the other gender. Data analysis revealed a) equal voice discrimination sensitivity in men and women but better performance in the categorization of opposite-sex stimuli at least in men, b) increased responses to increasing gender ambiguity in the mid cingulate cortex and bilateral inferior frontal gyri, and c) stronger activation in a fronto-temporal neural network in response to voices of the opposite sex. Our results indicate a gender specific processing for male and female voices on a behavioral and neuronal level. We suggest that our results reflect higher sensitivity probably due to the evolutionary relevance of voice perception in mate selection.


Subjects
Brain Mapping; Cerebral Cortex/physiology; Evoked Potentials, Auditory/physiology; Nerve Net/physiology; Sex Determination Analysis/methods; Speech Perception/physiology; Adult; Female; Humans; Male; Sex Characteristics
13.
J Voice ; 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36966126

ABSTRACT

In this study, silicone vocal fold models with different geometries were manufactured using the common silicone brand EcoFlex 00-30 with typical oil mixing ratios. However, the proportions of oil typically used are higher than the manufacturer's recommended limit, in order to attain the softness of human vocal folds. This additional oil causes direct effects on the silicone, such as shrinkage, stickiness, evaporation, embrittlement, and uneven vulcanization. This study investigated the impact of these effects on the oscillation characteristics of the silicone vocal fold models and how they change over time. The goal was to examine the comparability of produced silicone vocal fold models and the results obtained from experiments performed with these models. For the manufactured models, the phonation onset pressure, offset pressure, mean volume velocity, pulmonary power, fundamental frequency, and measures of the glottal area waveform were collected over a period of up to 8 weeks. The results showed that the data for the models were highly scattered. Furthermore, over time, the phonation onset/offset pressures increased, leading to failure to oscillate for some models, and the glottal area waveform also changed. In conclusion, when working with over-thinned silicone vocal fold models, their characteristics depend strongly on the time of measurement. Therefore, it is recommended to carefully consider the effects of oil-oversaturation and timing of measurements when using silicone vocal fold models in experiments.

14.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7648-7659, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35120012

ABSTRACT

Echo state networks (ESNs) are a special type of recurrent neural network (RNN), in which the input and recurrent connections are traditionally generated randomly, and only the output weights are trained. Despite the recent success of ESNs in various tasks of audio, image, and radar recognition, we postulate that a purely random initialization is not the ideal way of initializing ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the K-means algorithm on the training data. We show that for a large variety of datasets, this initialization performs as well as or better than a randomly initialized ESN while needing significantly fewer reservoir neurons. Furthermore, we discuss that this approach provides the opportunity to estimate a suitable size of the reservoir based on prior knowledge about the data.
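
One plausible reading of the initialization described in this abstract is sketched below: run K-means (Lloyd's algorithm) on the training input vectors and use each centroid as the input weight vector of one reservoir neuron. The abstract does not give the scaling, the number of clusters, or the seeding strategy, so the deterministic seeding, the toy data, and the names `kmeans` and `w_in` are assumptions made for illustration.

```python
def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm with deterministic, evenly spaced seeding."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical two-dimensional input features forming two groups.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

# Assumption: one reservoir neuron per centroid, with the centroid as its input weights.
w_in = kmeans(samples, k=2)
```

With this kind of initialization the number of clusters fixes the number of input-driven neurons, which is in line with the abstract's remark that prior knowledge about the data can guide the choice of reservoir size.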

15.
PLoS One ; 18(2): e0281877, 2023.
Article in English | MEDLINE | ID: mdl-36795744

ABSTRACT

In this study, 23 subjects produced cyclic transitions between rounded vowels and unrounded vowels as in /o-i-o-i-o-…/ at two specific speaking rates. Rounded vowels are typically produced with a lower larynx position than unrounded vowels. This contrast in vertical larynx position was further amplified by producing the unrounded vowels with a higher pitch than the rounded vowels. The vertical larynx movements of each subject were measured by means of object tracking in laryngeal ultrasound videos. The results indicate that larynx lowering was on average 26% faster than larynx raising, and that this velocity difference was more pronounced in women than in men. Possible reasons for this are discussed with a focus on specific biomechanical properties. The results can help to interpret vertical larynx movements with regard to underlying neural control and aerodynamic conditions, and to improve movement models for articulatory speech synthesis.


Subjects
Larynx; Speech; Male; Female; Humans; Phonetics; Larynx/diagnostic imaging; Movement; Videotape Recording
16.
J Speech Lang Hear Res ; : 1-15, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37971432

ABSTRACT

PURPOSE: Breathing is ubiquitous in speech production, crucial for structuring speech, and a potential diagnostic indicator for respiratory diseases. However, the acoustic characteristics of speech breathing remain underresearched. This work aims to characterize the spectral properties of human inhalation noises in a large speaker sample and explore their potential similarities with speech sounds. Speech sounds are mostly realized with egressive airflow. To account for this, we investigated the effect of airflow direction (inhalation vs. exhalation) on acoustic properties of certain vocal tract (VT) configurations. METHOD: To characterize human inhalation, we describe spectra of breath noises produced by human speakers from two data sets comprising 34 female and 100 male participants. To investigate the effect of airflow direction, three-dimensional-printed VT models of a male and a female speaker with static VT configurations of four vowels and four fricatives were used. An airstream was directed through these VT configurations in both directions, and their spectral consequences were analyzed. RESULTS: For human inhalations, we found spectra with a decreasing slope and several weak peaks below 3 kHz. These peaks show moderate (female) to strong (male) overlap with resonances found for participants inhaling with a VT configuration of a central vowel. Results for the VT models suggest that airflow direction is crucial for spectral properties of sibilants, /ç/, and /i:/, but not the other sounds we investigated. Inhalation noise is most similar to /ə/ where airflow direction does not play a role. CONCLUSIONS: Inhalation is realized on ingressive airflow, and inhalation noises have specific resonance properties that are most similar to /ə/ but occur without phonation. Airflow direction does not play a role in this specific VT configuration, but subglottal resonances may do. For future work, we suggest investigating the articulation of speech breathing and linking it to current work on pause postures. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.24520585.

17.
Sci Rep ; 12(1): 4192, 2022 03 09.
Article in English | MEDLINE | ID: mdl-35273225

ABSTRACT

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built a custom stepped frequency continuous wave radar hardware to measure the changes in the transmission spectra during speech between three antennas, located on both cheeks and the chin, with a measurement update rate of 100 Hz. We then recorded a command word corpus of 40 phonetically balanced, two-syllable German words and the German digits zero to nine for two individual speakers and evaluated both the speaker-dependent multi-session and inter-session recognition accuracies on this 50-word corpus using a bidirectional long short-term memory network. We obtained recognition accuracies of 99.17% and 88.87% for the speaker-dependent multi-session and inter-session accuracy, respectively. These results show that the transmission spectra are very well suited to discriminate individual words from one another, even across different sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.


Subjects
Speech Perception; Speech; Language; Radar; Recognition, Psychology
18.
IEEE Trans Biomed Eng ; 69(1): 356-365, 2022 01.
Article in English | MEDLINE | ID: mdl-34214033

ABSTRACT

OBJECTIVE: Stroke survivors commonly suffer from dysphagia, originating from oro-facial impairments which affect swallowing function. Functional therapy often employs tongue exercises that require the patient to perform short motion sequences. Evaluating the patient's performance on those exercises is difficult, because there is no reliable form of visual feedback. METHODS: We propose an optopalatographic device that does not require a personalized dental retainer and is capable of measuring tongue movement trajectories intraorally. The device features nine optical proximity sensors at 100 Hz and is fixated against the hard palate with a specifically developed palatal adhesive. The sensing capabilities of the device were evaluated on a tongue gesture corpus recorded from nine healthy individuals, containing eight different tongue exercises commonly used in functional dysphagia therapy. RESULTS: The measured tongue trajectories contained temporally and spatially resolved information about the tongue movement and location during each exercise. Furthermore, a simple DTW-kNN classifier was able to distinguish the exercises from one another with an average classification accuracy of 97.9% and 61.4% (cross-validation and inter-speaker test accuracy, respectively). CONCLUSION: The device can provide real-time feedback for tongue motion, and we obtained promising gesture recognition results with relatively few sensors, even in the absence of a personalized dental retainer. SIGNIFICANCE: Non-personalized optopalatography is readily available and could aid in improving functional dysphagia therapy by providing visual feedback to both the physician and patient.


Subjects
Deglutition Disorders; Deglutition; Deglutition Disorders/diagnosis; Deglutition Disorders/etiology; Deglutition Disorders/therapy; Humans; Pressure; Prospective Studies; Tongue
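
The DTW-kNN classifier named in this abstract combines dynamic time warping (DTW) as a distance between sequences with k-nearest-neighbor voting. The sketch below is a simplified illustration on one-dimensional sequences with k = 1. The actual device records nine sensor channels, and the abstract does not state the value of k or the local distance used, so those choices, the gesture labels, and the toy data are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(query, training_set, k=1):
    """Majority vote among the k training sequences closest to the query under DTW."""
    ranked = sorted(training_set, key=lambda item: dtw_distance(query, item[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical single-channel templates (arbitrary units) with made-up labels.
templates = [
    ([0, 1, 2, 3, 2, 1, 0], "raise"),
    ([0, 0, 0, 0, 0], "rest"),
    ([3, 2, 1, 0, 1, 2, 3], "lower"),
]
slow_raise = [0, 1, 1, 2, 3, 3, 2, 1, 0]   # same shape as "raise", performed more slowly
predicted = knn_classify(slow_raise, templates)
```

DTW tolerates differences in execution speed, which suits exercises that patients perform at their own pace. It does not compensate for differences between speakers, which may be one factor behind the gap between the cross-validation and inter-speaker accuracies reported in the abstract.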
19.
JASA Express Lett ; 1(7): 075203, 2021 07.
Article in English | MEDLINE | ID: mdl-36154640

ABSTRACT

This study compared the f0 of 14 German vowels in monosyllabic words (/dVt/) embedded in carrier sentences produced by 30 native speakers and 30 Mandarin Chinese learners. Appropriate techniques were employed to robustly measure f0 values and reliably analyze f0 profiles. The results showed that Mandarin learners produced the vowels bearing sentence stress with significantly larger f0 ranges and steeper f0 slopes but comparable f0 mean and maximum in comparison to German natives. Moreover, lax vowels produced by both groups demonstrated narrower ranges with faster f0 changes than tense vowels, which was stronger for Mandarin learners.


Subjects
Language; China; Time Factors
20.
Sci Adv ; 7(34)2021 Aug.
Article in English | MEDLINE | ID: mdl-34407948

ABSTRACT

Early detection of malign patterns in patients' biological signals can save millions of lives. Despite the steady improvement of artificial intelligence-based techniques, the practical clinical application of these methods is mostly constrained to an offline evaluation of the patients' data. Previous studies have identified organic electrochemical devices as ideal candidates for biosignal monitoring. However, their use for pattern recognition in real time was never demonstrated. Here, we produce and characterize brain-inspired networks composed of organic electrochemical transistors and use them for time-series predictions and classification tasks using the reservoir computing approach. To show their potential use for biofluid monitoring and biosignal analysis, we classify four classes of arrhythmic heartbeats with an accuracy of 88%. The results of this study introduce a previously unexplored paradigm for biocompatible computational platforms and may enable development of ultralow-power consumption hardware-based artificial neural networks capable of interacting with body fluids and biological tissues.
