Results 1 - 10 of 10
1.
J Acoust Soc Am ; 155(2): 1253-1263, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341748

ABSTRACT

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.


Subjects
Voice , Child , Humans , Acoustics , Speech Acoustics , Vibration , Sound Spectrography
2.
J Acoust Soc Am ; 152(5): R9, 2022 11.
Article in English | MEDLINE | ID: mdl-36456254

ABSTRACT

The Reflections series takes a look back on historical articles from The Journal of the Acoustical Society of America that have had a significant impact on the science and practice of acoustics.


Subjects
Acoustics
3.
J Acoust Soc Am ; 152(2): 933, 2022 08.
Article in English | MEDLINE | ID: mdl-36050157

ABSTRACT

Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.


Subjects
Acoustics , Speech Acoustics , Algorithms , Canada , Humans , Language
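
Recommendation (4) in the abstract above could be approached, for example, by checking how far each F1 estimate lies from the nearest harmonic of F0. The Python sketch below is an illustration only, not the article's procedure: the function names are hypothetical, and the 0.25 baseline assumes that unbiased estimates would fall roughly uniformly between harmonics.

```python
def nearest_harmonic_offset(f0_hz, formant_hz):
    """Signed offset of a formant estimate from the nearest harmonic of F0.

    The result is expressed in units of F0 and lies between -0.5 and 0.5.
    """
    ratio = formant_hz / f0_hz
    return ratio - round(ratio)


def mean_abs_offset(pairs):
    """Mean absolute harmonic offset over (f0_hz, formant_hz) pairs.

    Assumption: if estimates were unrelated to harmonic positions, offsets
    would be roughly uniform on [-0.5, 0.5], giving a mean near 0.25.
    Values well below 0.25 would be consistent with harmonic attraction.
    """
    offsets = [abs(nearest_harmonic_offset(f0, fm)) for f0, fm in pairs]
    return sum(offsets) / len(offsets)
```

A corpus whose mean absolute offset is far below 0.25 would merit a closer look at the analysis settings; the statistic alone does not prove bias.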
4.
J Acoust Soc Am ; 149(6): 4190, 2021 06.
Article in English | MEDLINE | ID: mdl-34241453

ABSTRACT

Many claims about the prevalence of phonetic voicing in English obstruents have been made in the literature over the decades, particularly concerning the stops and affricate [b, d, ɡ, ʤ]. An examination of this literature reveals that many of these claims are based on a paucity of speech data and measurements. For the present study, voiced consonants in the Buckeye corpus of American English (39 speakers) have been measured to determine the percentage of their duration that shows vocal cord vibrations. The prevalence of voicing in the 53 690 voiced stop and affricate tokens has been examined in all contexts, including the initial, intervocalic, and final positions. The results generally contradict the common notion that the nominally "voiced" stops of English are phonetically unvoiced in all positions but intervocalic. Here, they are found to be mostly voiced in final position as well as intervocalically, but usually less than 50% voiced in initial position. A significant proportion of these stops, however, were found to be nearly 100% voiced in the initial position, and this could not be explained by interspeaker variation.


Subjects
Aesculus , Voice , Phonetics , Speech , Speech Acoustics , Vocal Cords
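
The measure described in the abstract above, the percentage of a consonant's duration that shows vocal fold vibration, can be illustrated with a short Python sketch. The function name and the interval representation are assumptions made for illustration; the abstract does not say how voiced stretches were detected or stored. The sketch also assumes the voiced intervals do not overlap one another.

```python
def percent_voiced(token_start, token_end, voiced_intervals):
    """Percentage of a token's duration covered by voiced intervals.

    token_start, token_end: token boundaries in seconds.
    voiced_intervals: (start, end) pairs in seconds, assumed non-overlapping.
    Portions of an interval outside the token are ignored.
    """
    duration = token_end - token_start
    if duration <= 0:
        raise ValueError("token_end must be greater than token_start")
    voiced = 0.0
    for start, end in voiced_intervals:
        # Clip each voiced interval to the token boundaries.
        overlap = min(end, token_end) - max(start, token_start)
        if overlap > 0:
            voiced += overlap
    return 100.0 * voiced / duration
```

For a 100 ms token with voicing during the first 30 ms and the last 20 ms, the function returns approximately 50.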
5.
J Acoust Soc Am ; 127(4): 2114-7, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20369992

ABSTRACT

This brief report describes a small study which was undertaken with nine synthetic vowel tokens, in an effort to demonstrate the validity of the reassigned spectrogram as a formant measurement tool. The reassigned spectrogram's performance is also compared with that of a typical pitch-asynchronous linear predictive analysis and is found to be superior. In this study, reassigned spectrograms were further processed to highlight the formants and then were used to measure these synthetic vowel formants generally to within 0.5% of their known true values, far surpassing the accuracy of a typical linear predictive analysis procedure which was inaccurate by as much as 17%. The overall accuracy of reassigned spectrographic formant measurement is thus demonstrated in these cases.


Subjects
Linear Models , Theoretical Models , Phonetics , Computer-Assisted Signal Processing , Speech Acoustics , Speech Recognition Software , Algorithms , Fourier Analysis , Humans , Reproducibility of Results , Sound Spectrography , Time Factors
6.
Phonetica ; 64(4): 237-62, 2007.
Article in English | MEDLINE | ID: mdl-18421245

ABSTRACT

A reassigned or time-corrected instantaneous frequency spectrogram has been developed in the work of a number of practitioners. Here we present a general description of this imaging technique and explore its manifold applications to acoustic phonetics. The TCIF spectrogram shows the locations of signal components with unrivalled precision, eliminating the blurring and smearing of components that hamper the readability of the conventional spectrogram. Formants of vowels and other resonants are shown with great accuracy by observing glottal pulsations at very short time scales with a wideband analysis. A further post-processing technique is also described, by which signal components such as formants, as well as impulsive events, can be effectively isolated to the exclusion of other signal information. When the phonation process is examined this closely, a variety of evidence surfaces which supports recent developments in the theory and computational simulation of aeroacoustic phenomena in speech. Narrowband analysis is also demonstrated to permit pitch tracking with relative ease.


Subjects
Phonetics , Sound Spectrography , Speech/physiology , Glottis/physiology , Humans , Biological Models , Movement/physiology , Speech Acoustics , Speech Production Measurement/methods , Time Factors , Vocal Cords/physiology
7.
Wiley Interdiscip Rev Cogn Sci ; 4(3): 299-306, 2013 May.
Article in English | MEDLINE | ID: mdl-26304207

ABSTRACT

Learnability theory is a body of mathematical and computational results concerning questions such as: when is learning possible? What prior information is required to support learning? What computational or other resources are required for learning to be possible? It is therefore complementary both to the computational project of building machine learning systems and to the scientific project of understanding learning in people and animals through observation and experiment. Learnability theory includes work within a variety of theoretical frameworks, including, for example, identification in the limit, and Bayesian learning, which idealize learning in different ways. Learnability theory addresses one of the foundational questions in cognitive science: to what extent can knowledge be derived from experience? WIREs Cogn Sci 2013, 4:299-306. doi: 10.1002/wcs.1228 For further resources related to this article, please visit the WIREs website.

8.
J Acoust Soc Am ; 121(3): 1510-8, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17407888

ABSTRACT

Two computational methods for pruning a reassigned spectrogram to show only quasisinusoidal components, or only impulses, or both, are presented mathematically and provided with step-by-step algorithms. Both methods compute the second-order mixed partial derivative of the short-time Fourier transform phase, and rely on the conditions that components and impulses are each well-represented by reassigned spectrographic points possessing particular values of this derivative. This use of the mixed second-order derivative was introduced by Nelson [J. Acoust. Soc. Am. 110, 2575-2592 (2001)] but here our goals are to completely describe the computation of this derivative in a way that highlights the relations to the two most influential methods of computing a reassigned spectrogram, and also to demonstrate the utility of this technique for plotting spectrograms showing line components or impulses while excluding most other points. When applied to speech signals, vocal tract resonances (formants) or glottal pulsations can be effectively isolated in expanded views of the phonation process.


Subjects
Biological Models , Phonation/physiology , Speech/physiology , Vocal Cords/physiology , Humans , Sound Spectrography/methods
9.
J Acoust Soc Am ; 119(1): 360-71, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16454291

ABSTRACT

A modification of the spectrogram (log magnitude of the short-time Fourier transform) to more accurately show the instantaneous frequencies of signal components was first proposed in 1976 [Kodera et al., Phys. Earth Planet. Inter. 12, 142-150 (1976)], and has been considered or reinvented a few times since but never widely adopted. This paper presents a unified theoretical picture of this time-frequency analysis method, the time-corrected instantaneous frequency spectrogram, together with detailed implementable algorithms comparing three published techniques for its computation. The new representation is evaluated against the conventional spectrogram for its superior ability to track signal components. The lack of a uniform framework for either mathematics or implementation details which has characterized the disparate literature on the schemes has been remedied here. Fruitful application of the method is shown in the realms of speech phonation analysis, whale song pitch tracking, and additive sound modeling.


Subjects
Algorithms , Sound Spectrography/methods , Speech/physiology , Vocal Cords/physiology , Animal Vocalization/physiology , Whales/physiology , Animals , Fourier Analysis , Humans , Phonetics , Pitch Perception/physiology , Computer-Assisted Signal Processing , Speech Production Measurement/instrumentation , Speech Production Measurement/methods , Time Factors
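
To give a concrete sense of what correcting a spectrogram toward instantaneous frequency involves, the Python sketch below reassigns the frequency of each bin in a single analysis frame. It uses one widely used formulation, in which a second transform taken with the derivative of the window supplies the correction; it is not claimed to match any of the three techniques the paper compares. The function name, the Hann window, and the restriction to frequency (no time correction) are assumptions made for brevity.

```python
import numpy as np


def reassigned_frequencies(frame, fs):
    """Frequency reassignment for one frame of a real signal.

    Returns (bin_freqs_hz, reassigned_freqs_hz, magnitudes).
    """
    n = len(frame)
    t = np.arange(n)
    # Periodic Hann window and its analytic derivative (per sample).
    h = 0.5 - 0.5 * np.cos(2 * np.pi * t / n)
    dh = (np.pi / n) * np.sin(2 * np.pi * t / n)
    # Two transforms of the same frame: one per window.
    X_h = np.fft.rfft(frame * h)
    X_dh = np.fft.rfft(frame * dh)
    bin_freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(X_h) ** 2
    # Guard against division by zero in bins with no energy.
    safe_power = np.where(power > 0, power, 1.0)
    # Correction in radians per sample, then converted to Hz.
    correction = np.imag(X_dh * np.conj(X_h)) / safe_power
    reassigned = bin_freqs - correction * fs / (2 * np.pi)
    return bin_freqs, reassigned, np.sqrt(power)
```

For a steady sinusoid that falls between bin centers, the bins near the spectral peak are reassigned to values close to the sinusoid's actual frequency, which is the sharpening effect the abstract describes.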
10.
Phonetica ; 60(4): 231-60, 2003.
Article in English | MEDLINE | ID: mdl-15004493

ABSTRACT

Yeyi has the largest known inventory of click sounds in the Bantu language family. It is now entering a moribund state, and this paper documents a variety of acoustic and distributional details of the clicks found in the speech of 13 Yeyi speakers by presenting sound inventories, spectrograms, palatograms, and related acoustic data. The durations of the closure and release phases of the clicks were measured, and an analysis demonstrates that the two duration measures together are statistically able to distinguish the dental, alveolar, palatal, and lateral clicks from one another. A second quantitative study examines the discriminability of the four click places using solely the anterior burst power spectra, as parametrized using the first four spectral moments. The places of articulation are found to be moderately well classified by this means. The patterns of interspeaker variation affecting the clicks are also documented, and these are found to accord rather well with the classification errors made by the optimal classifier using the anterior burst spectra.


Subjects
Language , Phonation/physiology , Phonetics , Speech Acoustics , Botswana , Discriminant Analysis , Female , Humans , Male , Sound Spectrography , Speech Production Measurement , Tape Recording , Verbal Behavior
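
The abstract above mentions parametrizing burst power spectra by their first four spectral moments. A minimal Python sketch of one common definition follows, treating the normalized power spectrum as a probability distribution over frequency. The function name is hypothetical, and the choices shown (variance rather than standard deviation as the second moment, and excess kurtosis with 3 subtracted) are assumptions; the paper's exact definitions are not given in the abstract.

```python
import numpy as np


def spectral_moments(freqs_hz, power):
    """First four spectral moments of a power spectrum.

    Returns (mean_hz, variance_hz2, skewness, excess_kurtosis).
    Assumes the spectrum has nonzero total power and nonzero spread.
    """
    f = np.asarray(freqs_hz, dtype=float)
    p = np.asarray(power, dtype=float)
    p = p / p.sum()  # normalize so the weights sum to 1
    mean = np.sum(f * p)
    variance = np.sum((f - mean) ** 2 * p)
    sd = np.sqrt(variance)
    skewness = np.sum((f - mean) ** 3 * p) / sd ** 3
    # Excess kurtosis: 0 for a Gaussian-shaped spectrum (an assumption
    # about convention, not taken from the paper).
    kurtosis = np.sum((f - mean) ** 4 * p) / sd ** 4 - 3.0
    return mean, variance, skewness, kurtosis
```

The resulting four numbers per burst could then serve as features for a classifier of click place, which is the kind of use the abstract reports.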