Results 1 - 12 of 12
1.
J Acoust Soc Am ; 155(2): 1253-1263, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341748

ABSTRACT

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.


Subjects
Voice, Child, Humans, Acoustics, Speech Acoustics, Vibration, Sound Spectrography
2.
J Acoust Soc Am ; 153(2): 1412, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36859163

ABSTRACT

Means of characterizing acoustic signals of fricatives with a few parameters have long been sought. When Forrest, Weismer, Milenkovic, and Dougall [(1988) J. Acoust. Soc. Am. 84, 115-123] described their system of treating spectra as probability density functions and computing the first four spectral moments, others quickly adopted their clearly described method, although it did not distinguish /f/ and /θ/. Various problems with their method are described, including the lack of spectral averaging, the necessity of normalizing the amplitude, and correlation between pairs of moments. Even when these issues are rectified by alternative methods, the fact remains that moments are not ideal descriptors because they can only describe departures from the shape of a normal Gaussian distribution. Fricative spectra, particularly of non-sibilants, are often quite dissimilar in shape from Gaussians. Furthermore, shape descriptors do not lend themselves to direct inferences about the production variables that caused the acoustic effects. Here, alternative parameters are defined, it is shown how to adapt them to specific experimental conditions, and tests of efficacy are proposed. These parameters are strongly linked to the articulatory and aerodynamic variables that underlie fricative production.
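The moments approach criticized in the abstract above can be sketched as follows. This is a minimal illustration, not the procedure of Forrest et al. or of the present authors: the function name `spectral_moments`, the choice of a power spectrum as input, and the use of excess kurtosis are assumptions made here. The spectrum is normalized to sum to 1 so that it can be treated as a probability distribution over frequency.

```python
import math


def spectral_moments(freqs_hz, power):
    """Treat a power spectrum as a probability distribution over frequency.

    Returns (mean, variance, skewness, excess kurtosis).
    Hypothetical helper for illustration only.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    # Amplitude normalization: the weights sum to 1.
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    if variance == 0:
        raise ValueError("all energy is in one bin; higher moments undefined")
    sd = math.sqrt(variance)
    skewness = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    # Subtracting 3 makes a Gaussian-shaped spectrum score 0.
    kurtosis = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights)) / sd ** 4 - 3.0
    return mean, variance, skewness, kurtosis
```

Because skewness and kurtosis are defined relative to a Gaussian shape, a flat or multi-peaked non-sibilant spectrum can yield values that say little about where its energy actually lies, which is the limitation the abstract points out.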

3.
J Acoust Soc Am ; 154(3): 1932-1944, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37768114

ABSTRACT

Fricatives have noise sources that are filtered by the vocal tract and that typically possess energy over a much broader range of frequencies than observed for vowels and sonorant consonants. This paper introduces and refines fricative measurements that were designed to reflect underlying articulatory and aerodynamic conditions. These show differences in the pattern of high-frequency energy for sibilants vs non-sibilants, voiced vs voiceless fricatives, and non-sibilants differing in place of articulation. The results confirm the utility of a spectral peak measure (FM) and low-mid frequency amplitude difference (AmpD) for sibilants. Using a higher-frequency range for defining FM for female voices for alveolars is justified; a still higher range was considered and rejected. High-frequency maximum amplitude (Fh) and amplitude difference between low- and higher-frequency regions (AmpRange) capture /f-θ/ differences in English and the dynamic amplitude range over the entire spectrum. For this dataset, with spectral information up to 15 kHz, a new measure, HighLevelD, was more effective than previously used LevelD and Slope in showing changes over time within the frication. Finally, isolated words and connected speech differ. This work contributes improved measures of fricative spectra and demonstrates the necessity of including high-frequency energy in those measures.


Subjects
Language, Speech, Female, Humans
4.
J Acoust Soc Am ; 152(2): 933, 2022 08.
Article in English | MEDLINE | ID: mdl-36050157

ABSTRACT

Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.


Subjects
Acoustics, Speech Acoustics, Algorithms, Canada, Humans, Language
5.
J Acoust Soc Am ; 145(5): EL360, 2019 05.
Article in English | MEDLINE | ID: mdl-31153348

ABSTRACT

Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470 000 vowels were synthesized, mimicking statistics reported in four developmental studies, to estimate the proportion of formant variability that can be attributed to F0 bias, as well as other formant measurement errors. Results showed that the F0-induced formant measurement errors are large and systematic, and cannot be eliminated by a large sample size.


Subjects
Acoustics, Bias, Speech Acoustics, Speech Perception/physiology, Humans, Phonetics, Sound Spectrography/methods
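To make the idea of bias toward the nearest harmonic concrete, the sketch below computes how far a resonance lies from the closest multiple of F0. It is an illustration only and not the simulation used in the study: the function names are hypothetical, harmonic amplitudes are ignored (the abstracts speak of the nearest strong harmonic), and the offset is simply the displacement an estimate would show if it landed exactly on that harmonic.

```python
def nearest_harmonic(resonance_hz, f0_hz):
    """Frequency of the harmonic of f0_hz closest to resonance_hz.

    Hypothetical helper; harmonic strength is not modeled.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    # Harmonic numbers start at 1 (the fundamental itself).
    # Note: Python's round() sends exact .5 ties to the even integer.
    k = max(1, round(resonance_hz / f0_hz))
    return k * f0_hz


def harmonic_offset(resonance_hz, f0_hz):
    """Signed distance from the resonance to its nearest harmonic, in Hz."""
    return nearest_harmonic(resonance_hz, f0_hz) - resonance_hz
```

With a 500 Hz resonance, an F0 of 100 Hz places a harmonic exactly on the resonance, whereas an F0 of 300 Hz leaves the nearest harmonic 100 Hz away. Harmonics are spaced F0 apart, so the possible offset grows with F0, consistent with the abstract's remark that the bias is worst at high F0s.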
6.
J Acoust Soc Am ; 139(2): 713-27, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26936555

ABSTRACT

The measurement of formant frequencies of vowels is among the most common measurements in speech studies, but measurements are known to be biased by the particular fundamental frequency (F0) exciting the formants. Approaches to reducing the errors were assessed in two experiments. In the first, synthetic vowels were constructed with five different first formant (F1) values and nine different F0 values; formant bandwidths, and higher formant frequencies, were constant. Input formant values were compared to manual measurements and automatic measures using the linear prediction coding-Burg algorithm, linear prediction closed-phase covariance, the weighted linear prediction-attenuated main excitation (WLP-AME) algorithm [Alku, Pohjalainen, Vainio, Laukkanen, and Story (2013). J. Acoust. Soc. Am. 134(2), 1295-1313], spectra smoothed cepstrally and by averaging repeated discrete Fourier transforms. Formants were also measured manually from pruned reassigned spectrograms (RSs) [Fulop (2011). Speech Spectrum Analysis (Springer, Berlin)]. All but WLP-AME and RS had large errors in the direction of the strongest harmonic; the smallest errors occur with WLP-AME and RS. In the second experiment, these methods were used on vowels in isolated words spoken by four speakers. Results for the natural speech show that F0 bias affects all automatic methods, including WLP-AME; only the formants measured manually from RS appeared to be accurate. In addition, RS coped better with weaker formants and glottal fry.


Subjects
Computer-Assisted Signal Processing, Speech Acoustics, Speech Production Measurement/methods, Voice Quality, Acoustics, Adult, Algorithms, Female, Fourier Analysis, Humans, Linear Models, Male, Middle Aged, Reproducibility of Results, Sound Spectrography, Young Adult
7.
J Acoust Soc Am ; 134(2): 1271-82, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23927125

ABSTRACT

Coarticulation and invariance are two topics at the center of theorizing about speech production and speech perception. In this paper, a quantitative scale is proposed that places coarticulation and invariance at the two ends of the scale. This scale is based on physical information flow in the articulatory signal, and uses Information Theory, especially the concept of mutual information, to quantify these central concepts of speech research. Mutual Information measures the amount of physical information shared across phonological units. In the proposed quantitative scale, coarticulation corresponds to greater and invariance to lesser information sharing. The measurement scale is tested by data from three languages: German, Catalan, and English. The relation between the proposed scale and several existing theories of coarticulation is discussed, and implications for existing theories of speech production and perception are presented.


Subjects
Motor Skills, Phonation, Phonetics, Speech Acoustics, Speech Intelligibility, Speech Perception, Stomatognathic System/innervation, Voice Quality, Biomechanical Phenomena, Electromagnetic Phenomena, Female, Humans, Information Theory, Linear Models, Male, Speech Production Measurement
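The mutual information quantity that the abstract above builds its scale on can be sketched for the simplest case, a table of joint counts over two discrete variables. This is an assumption made for illustration: the paper works with articulatory signals, and how those are quantized or modeled is not stated in the abstract. The function name `mutual_information` is hypothetical.

```python
import math


def mutual_information(joint_counts):
    """Mutual information in bits from a 2-D table of joint counts.

    joint_counts[i][j] is how often value i of X co-occurred with
    value j of Y. Discrete-variable sketch for illustration only.
    """
    total = sum(sum(row) for row in joint_counts)
    if total <= 0:
        raise ValueError("table is empty")
    row_totals = [sum(row) for row in joint_counts]
    col_totals = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n > 0:
                p_xy = n / total
                # p(x,y) / (p(x) p(y)) simplifies to n * total / (row * col).
                mi += p_xy * math.log2(n * total / (row_totals[i] * col_totals[j]))
    return mi
```

Two variables that always co-vary share the maximum information (1 bit for two equiprobable values), and independent variables share none. On the proposed scale, the former end corresponds to coarticulation and the latter to invariance.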
8.
J Acoust Soc Am ; 129(2): 944-54, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21361451

ABSTRACT

Due to its aerodynamic, articulatory, and acoustic complexities, the fricative /s/ is known to require high precision in its control, and to be highly resistant to coarticulation. This study documents in detail how jaw, tongue front, tongue back, lips, and the first spectral moment covary during the production of /s/, to establish how coarticulation affects this segment. Data were obtained from 24 speakers in the Wisconsin x-ray microbeam database producing /s/ in prevocalic and pre-obstruent sequences. Analysis of the data showed that certain aspects of jaw and tongue motion had specific kinematic trajectories, regardless of context, and the first spectral moment trajectory corresponded to these in some aspects. In particular contexts, variability due to jaw motion is compensated for by tongue-tip motion and bracing against the palate, to maintain an invariant articulatory-aerodynamic goal, constriction degree. The change in the first spectral moment, which rises to a peak at the midpoint of the fricative, primarily reflects the motion of the jaw. Implications of the results for theories of speech motor control and acoustic-articulatory relations are discussed.


Subjects
Jaw/physiology, Language, Mouth/physiology, Phonetics, Speech Acoustics, Biomechanical Phenomena, Databases as Topic, Female, Friction, Humans, Jaw/diagnostic imaging, Lip/physiology, Male, Mouth/diagnostic imaging, Radiography, Sound Spectrography, Speech Production Measurement, Tongue/physiology, Young Adult
9.
J Acoust Soc Am ; 127(3): 1507-18, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20329851

ABSTRACT

A structural magnetic resonance imaging study has revealed that pharyngeal articulation varies considerably with voicing during the production of English fricatives. In a study of four speakers of American English, pharyngeal volume was generally found to be greater during the production of sustained voiced fricatives, compared to voiceless equivalents. Though pharyngeal expansion is expected for voiced stops, it is more surprising for voiced fricatives. For three speakers, all four voiced oral fricatives were produced with a larger pharynx than that used during the production of the voiceless fricative at the same place of articulation. For one speaker, pharyngeal volume during the production of voiceless labial fricatives was found to be greater, and sibilant pharyngeal volume varied with vocalic context as well as voicing. Pharyngeal expansion was primarily achieved through forward displacement of the anterior and lateral walls of the upper pharynx, but some displacement of the rear pharyngeal wall was also observed. These results suggest that the production of voiced fricatives involves the complex interaction of articulatory constraints from three separate goals: the formation of the appropriate oral constriction, the control of airflow through the constriction so as to achieve frication, and the maintenance of glottal oscillation by attending to transglottal pressure.


Subjects
Magnetic Resonance Imaging, Pharynx/anatomy & histology, Pharynx/physiology, Speech/physiology, Voice/physiology, Adult, Female, Glottis/anatomy & histology, Glottis/physiology, Humans, Larynx/anatomy & histology, Larynx/physiology, Male, Biological Models, Phonetics, Young Adult
10.
Am J Speech Lang Pathol ; 29(4): 2012-2022, 2020 11 12.
Article in English | MEDLINE | ID: mdl-32870708

ABSTRACT

Purpose: The purpose of this study was to report the variability of electrolarynx (EL) users' speech intelligibility in quiet and in multitalker babble. Method: Ten EL users (five Servox® Digital, five TruTone™) who were at least 2 years postlaryngectomy provided recordings of five sentences from the 1965 Revised List of Phonetically Balanced Sentences. Recordings were judged by two groups of naïve listeners in quiet and in the presence of multitalker babble. Fifteen listeners orthographically transcribed a total of 750 sentences containing 3,750 key words in quiet, and another 15 listeners orthographically transcribed the same sentences mixed with multitalker babble. Results: Significant differences in speech intelligibility were observed between listening conditions; 17.9% more key words were correctly identified in quiet compared to multitalker babble. Significant differences in fundamental frequency (F0) standard deviation and range but not speech intelligibility were observed between EL device types. A positive correlation of moderate significance was observed between F0 standard deviation and intelligibility for TruTone users in multitalker babble. Conclusions: Findings suggest that listeners are able to identify a significantly higher percentage of EL users' speech in quiet compared to multitalker babble, but a large variability in EL users' speech intelligibility exists. Continued investigation involving a larger number of EL users is necessary to confirm this study's findings. Future research should explore the relationships among F0 measures, speaker characteristics (e.g., rate of speech, articulatory precision), and speech intelligibility, in addition to improving alaryngeal rehabilitation training protocols for EL users.


Subjects
Speech Intelligibility, Speech Perception, Auditory Perception, Humans, Language, Noise, Speech Disorders
11.
PLoS One ; 13(9): e0202180, 2018.
Article in English | MEDLINE | ID: mdl-30192767

ABSTRACT

Speech motor actions are performed quickly, while simultaneously maintaining a high degree of accuracy. Are speed and accuracy in conflict during speech production? Speed-accuracy tradeoffs have been shown in many domains of human motor action, but have not been directly examined in the domain of speech production. The present work seeks evidence for Fitts' law, a rigorous formulation of this fundamental tradeoff, in speech articulation kinematics by analyzing USC-TIMIT, a real-time magnetic resonance imaging data set of speech production. A theoretical framework for considering Fitts' law with respect to models of speech motor control is elucidated. Methodological challenges in seeking relationships consistent with Fitts' law are addressed, including the operational definitions and measurement of key variables in real-time MRI data. Results suggest the presence of speed-accuracy tradeoffs for certain types of speech production actions, with wide variability across syllable position, and substantial variability also across subjects. Coda consonant targets immediately following the syllabic nucleus show the strongest evidence of this tradeoff, with correlations as high as 0.72 between speed and accuracy. A discussion is provided concerning the potentially limited applicability of Fitts' law in the context of speech production, as well as the theoretical context for interpreting the results.


Subjects
Motor Cortex/physiology, Psychomotor Performance/physiology, Reaction Time/physiology, Speech/physiology, Algorithms, Biomechanical Phenomena, Humans, Larynx/diagnostic imaging, Larynx/physiology, Magnetic Resonance Imaging, Biological Models, Vocal Cords/diagnostic imaging, Vocal Cords/physiology
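For readers unfamiliar with Fitts' law, the classic formulation relates movement time MT to an index of difficulty, MT = a + b * log2(2D / W), where D is the distance to the target and W is the target width. The sketch below computes that index and fits a and b by ordinary least squares. It is a generic illustration: the abstract does not say how distance and width were operationalized in the real-time MRI data, so the helper names and the inputs are hypothetical.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts' index of difficulty in bits: log2(2D / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs.

    Returns (intercept, slope); under Fitts' law these play the
    roles of the constants a and b.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

A positive fitted slope means that harder targets (farther away or narrower) take longer to reach, which is the speed-accuracy tradeoff the study looks for.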
12.
J Speech Lang Hear Res ; 56(4): 1175-89, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23785194

ABSTRACT

PURPOSE: This article introduces theoretically driven acoustic measures of /s/ that reflect aerodynamic and articulatory conditions. The measures were evaluated by assessing whether they revealed expected changes over time and labiality effects, along with possible gender differences suggested by past work. METHOD: Productions of /s/ were extracted from various speaking tasks from typically speaking adolescents (6 boys, 6 girls). Measures were made of relative spectral energies in low- (550-3000 Hz), mid- (3000-7000 Hz), and high-frequency regions (7000-11025 Hz); the mid-frequency amplitude peak; and temporal changes in these parameters. Spectral moments were also obtained to permit comparison with existing work. RESULTS: Spectral balance measures in low-mid and mid-high frequency bands varied over the time course of /s/, capturing the development of sibilance at mid-fricative along with showing some effects of gender and labiality. The mid-frequency spectral peak was significantly higher in nonlabial contexts, and in girls. Temporal variation in the mid-frequency peak differentiated ±labial contexts while normalizing over gender. CONCLUSIONS: The measures showed expected patterns, supporting their validity. Comparison of these data with studies of adults suggests some developmental patterns that call for further study. The measures may also serve to differentiate some cases of typical and misarticulated /s/.


Subjects
Phonetics, Sex Characteristics, Speech Acoustics, Speech, Verbal Behavior, Adolescent, Female, Humans, Lip, Male, Reference Values
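The band-based measures described in the abstract above can be sketched as level differences between the three frequency regions it names. Only the band edges come from the abstract. Everything else is an assumption for illustration: the input is taken to be a power spectrum, band level is the summed power in dB, and the differences are computed as lower band minus higher band. The names `BANDS_HZ`, `band_level_db`, and `spectral_balance` are hypothetical.

```python
import math

# Band edges in Hz as given in the abstract.
BANDS_HZ = {
    "low": (550.0, 3000.0),
    "mid": (3000.0, 7000.0),
    "high": (7000.0, 11025.0),
}


def band_level_db(freqs_hz, power, lo_hz, hi_hz):
    """Summed power in [lo_hz, hi_hz), expressed in dB (arbitrary reference)."""
    energy = sum(p for f, p in zip(freqs_hz, power) if lo_hz <= f < hi_hz)
    if energy <= 0:
        return float("-inf")  # no energy in this band
    return 10.0 * math.log10(energy)


def spectral_balance(freqs_hz, power):
    """Level differences between adjacent bands, in dB.

    The sign convention (lower band minus higher band) is an
    assumption, not taken from the paper. A difference is NaN when
    both bands involved are empty.
    """
    levels = {
        name: band_level_db(freqs_hz, power, lo, hi)
        for name, (lo, hi) in BANDS_HZ.items()
    }
    return {
        "low_minus_mid": levels["low"] - levels["mid"],
        "mid_minus_high": levels["mid"] - levels["high"],
    }
```

Computing these differences frame by frame across the fricative would give the kind of time course the abstract describes, with mid-band energy rising relative to the low band as sibilance develops.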