ABSTRACT
The current project undertakes a kinematic examination of vertical larynx actions and intergestural timing stability within multi-gesture complex segments, such as ejectives and implosives, that may possess specific temporal goals critical to their articulatory realization. Using real-time MRI (rtMRI) speech production data from Hausa non-pulmonic and pulmonic consonants, this study illuminates speech timing between oral constriction and vertical larynx actions within segments and the role this intergestural timing plays in realizing phonological contrasts and processes in varying prosodic contexts. Results suggest that vertical larynx actions have greater magnitude in the production of ejectives than in that of their pulmonic counterparts, whereas implosives and pulmonic consonants are differentiated not by vertical larynx magnitude but by the intergestural timing patterns between their oral and vertical larynx gestures. Moreover, the stability (or variability) of intergestural timing between oral and non-oral (vertical larynx) actions differs among ejectives, implosives, and pulmonic consonants: ejectives have the most stable temporal lags, followed by implosives and then pulmonic consonants. Lastly, the findings show how contrastive linguistic 'molecules' (here, segment-sized phonological complexes with multiple gestures) interact with phrasal context in speech: phrasal context variably shapes the temporal organization of the participating gestures, while the relative timing among the gestures comprising a segment remains stable.
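Intergestural timing stability of the kind compared above is often quantified as the variability of the lag between landmarks of two gestures across repeated tokens. The sketch below uses onset times (ms) for an oral constriction gesture and a vertical larynx gesture. It reports the mean lag and its standard deviation for each consonant class, where a smaller SD means a more stable lag. The token values are invented for illustration. The choice of onsets as landmarks and of SD as the stability metric is an assumption, not necessarily the study's procedure.

```python
from statistics import mean, stdev

# Hypothetical per-token (oral onset, larynx onset) times in ms; invented values.
tokens = {
    "ejective": [(100, 112), (205, 216), (310, 323), (402, 414)],
    "implosive": [(100, 90), (200, 195), (300, 284), (400, 392)],
    "pulmonic": [(100, 120), (200, 190), (300, 330), (400, 385)],
}

def lag_stats(pairs):
    """Return (mean lag, sample SD of lag), with lag = larynx onset - oral onset."""
    lags = [larynx - oral for oral, larynx in pairs]
    return mean(lags), stdev(lags)

for category, pairs in tokens.items():
    m, sd = lag_stats(pairs)
    print(f"{category:>9}: mean lag {m:6.1f} ms, SD {sd:5.1f} ms")
```

SD is used rather than a coefficient of variation because a lag can be near zero or negative, which makes a ratio to the mean unstable.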
ABSTRACT
Extensive research has found that the duration of a pause is influenced by the length of an upcoming utterance, suggesting that speakers plan the upcoming utterance during this time. Research has more recently begun to examine articulation during pauses. A specific configuration of the vocal tract during acoustic pauses, termed pause posture (PP), has been identified in Greek and American English. However, the cognitive function giving rise to PPs is not well understood. The present study examines whether PPs are related to speech planning processes, such that they contribute additional planning time for an upcoming utterance. In an articulatory magnetometer study, the hypothesis is tested that an increase in upcoming utterance length leads to more frequent PP occurrence and that PPs are longer in pauses that precede longer phrases. The results indicate that PPs are associated with planning time for longer utterances but that they are associated with a relatively fixed scope of planning for upcoming speech. To further examine the relationship between articulation and speech planning, an additional hypothesis examines whether the first part of the pause predominantly serves to mark prosodic boundaries while the second part serves speech planning purposes. This hypothesis is not supported by the results.
Subjects
Language, Speech, Speech Acoustics, Speech Articulation Tests, Speech Production Measurement
ABSTRACT
PURPOSE: To provide 3D real-time MRI of speech production with improved spatio-temporal sharpness using randomized, variable-density, stack-of-spiral sampling combined with a 3D spatio-temporally constrained reconstruction. METHODS: We evaluated five candidate (k, t) sampling strategies using a previously proposed gradient-echo stack-of-spiral sequence and a 3D constrained reconstruction with spatial and temporal penalties. Regularization parameters were chosen by expert readers based on qualitative assessment. We experimentally determined the effect of spiral angle increment and kz temporal order. The strategy yielding the highest image quality was chosen as the proposed method. We evaluated the proposed and original 3D real-time MRI methods in 2 healthy subjects performing speech production tasks that invoke rapid movements of articulators seen in multiple planes, using interleaved 2D real-time MRI as the reference. We quantitatively evaluated tongue boundary sharpness in three locations at two speech rates. RESULTS: The proposed data-sampling scheme uses a golden-angle spiral increment in the kx-ky plane and variable-density, randomized encoding along kz. It provided a statistically significant improvement in tongue boundary sharpness score (P < .001) in the blade, body, and root of the tongue during normal and 1.5-times speeded speech. Qualitative improvements were substantial during natural speech tasks alternating high and low tongue postures during vowels. The proposed method was also able to capture complex tongue shapes during fast alveolar consonant segments. Furthermore, the proposed scheme allows flexible retrospective selection of temporal resolution. CONCLUSION: We have demonstrated improved 3D real-time MRI of speech production using randomized, variable-density, stack-of-spiral sampling with a 3D spatio-temporally constrained reconstruction.
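The two ingredients of the sampling scheme described above can be sketched as follows. The first is a golden-angle rotation of successive spiral arms in the kx-ky plane. The second is a randomized, variable-density choice of kz partition that favors the k-space center. This is a minimal sketch under stated assumptions, not the published scheme. The 360°·(2 − φ) ≈ 137.508° increment is assumed for spirals spanning the full circle. The Gaussian kz weighting and the names `random_kz_order` and `width` are hypothetical.

```python
import math

import numpy as np

PHI = (1 + math.sqrt(5)) / 2
# Golden-angle increment for spiral arms spanning 360 degrees (assumed variant).
GOLDEN_ANGLE_DEG = 360.0 * (2 - PHI)  # about 137.508 degrees

def spiral_rotations(n_arms):
    """In-plane (kx-ky) rotation angle, in degrees, of each successive spiral arm."""
    return (np.arange(n_arms) * GOLDEN_ANGLE_DEG) % 360.0

def random_kz_order(n_partitions, n_samples, width=0.25, seed=0):
    """Draw kz partition indices, favoring the k-space center.

    Hypothetical Gaussian weighting with relative width `width`, used only to
    illustrate variable-density randomized encoding along kz.
    """
    rng = np.random.default_rng(seed)
    kz = np.arange(n_partitions) - (n_partitions - 1) / 2  # centered kz coordinate
    weights = np.exp(-0.5 * (kz / (width * n_partitions)) ** 2)
    return rng.choice(n_partitions, size=n_samples, p=weights / weights.sum())

angles = spiral_rotations(8)
kz_order = random_kz_order(n_partitions=12, n_samples=96)
print(np.round(angles, 2))
print(kz_order[:16])
```

A usual motivation for golden-angle ordering is that any run of consecutive arms covers kx-ky nearly uniformly. Together with randomized kz, this is what makes a retrospective choice of temporal window length workable.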
Subjects
Computer-Assisted Image Processing, Speech, Humans, Three-Dimensional Imaging, Magnetic Resonance Imaging, Retrospective Studies, Tongue/diagnostic imaging
ABSTRACT
PURPOSE: To mitigate a common artifact in spiral real-time MRI caused by aliasing of signal from outside the desired FOV. This artifact frequently occurs in midsagittal speech real-time MRI. METHODS: Simulations were performed to determine the likely origin of the artifact. Two methods to mitigate the artifact are proposed. The first approach, denoted as "large FOV" (LF), keeps an FOV that is large enough to include the artifact signal source during reconstruction. The second approach, denoted as "estimation-subtraction" (ES), estimates the artifact signal source before subtracting a synthetic signal representing that source in multicoil k-space raw data. Twenty-five midsagittal speech-production real-time MRI data sets were used to evaluate both of the proposed methods. Reconstructions without and with corrections were evaluated by two expert readers using a 5-level Likert scale assessing artifact severity. Reconstruction time was also compared. RESULTS: The origin of the artifact was found to be a combination of gradient nonlinearity and imperfect anti-aliasing in spiral sampling. The LF and ES methods both substantially reduced the artifact, with average qualitative score improvements of 1.25 and 1.35 Likert levels for LF correction and ES correction, respectively. Average reconstruction times without correction, with LF correction, and with ES correction were 160.69 ± 1.56, 526.43 ± 5.17, and 171.47 ± 1.71 ms/frame, respectively. CONCLUSION: Both proposed methods were able to reduce the spiral aliasing artifacts, with the ES method being more effective and more time efficient.
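The estimation-subtraction idea can be illustrated on a toy problem. The sketch first estimates the out-of-FOV source, then synthesizes its k-space contribution and subtracts that from the raw data, so a reconstruction at the desired FOV no longer sees it. The sketch makes several simplifying assumptions. It is 1-D and single-coil, with a fully sampled, noiseless Cartesian DFT in place of multicoil spiral encoding, and the source is estimated by reconstructing on a doubled FOV. Under these conditions the estimate is exact, whereas in practice it is only approximate. The grid sizes and the source location at x = 0.8 are hypothetical.

```python
import numpy as np

# Positions in units of the desired FOV: desired FOV is [-0.5, 0.5),
# the extended grid covers [-1, 1). Toy 1-D, single-coil model (assumption).
n = 128
x = -1.0 + np.arange(n) * (2.0 / n)          # extended image grid
k = (np.arange(n) - n // 2) * 0.5            # k-space samples (cycles per desired FOV)
E = np.exp(-2j * np.pi * np.outer(k, x))     # encoding matrix: data = E @ image

rng = np.random.default_rng(1)
inside = np.abs(x) < 0.5
obj_inside = np.where(inside, rng.normal(size=n), 0.0)
obj_source = np.where(np.abs(x - 0.8) < 0.05, 5.0, 0.0)  # hypothetical out-of-FOV source
data = E @ (obj_inside + obj_source)

# 1) Estimate: reconstruct on the extended grid, keep only the out-of-FOV part.
est, *_ = np.linalg.lstsq(E, data, rcond=None)
source_est = np.where(inside, 0.0, est)

# 2) Subtract: synthesize the source's k-space signal and remove it from the data.
data_corrected = data - E @ source_est

residual = np.linalg.norm(data_corrected - E @ obj_inside) / np.linalg.norm(data)
print(f"relative residual after subtraction: {residual:.2e}")
```
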
Subjects
Artifacts, Computer-Assisted Image Processing, Magnetic Resonance Imaging, Speech
ABSTRACT
OBJECTIVES: To evaluate a novel method for real-time tagged MRI with increased tag persistence using phase sensitive tagging (REALTAG), demonstrated for speech imaging. METHODS: Tagging is applied as a brief interruption to a continuous real-time spiral acquisition. REALTAG is implemented using a total tagging flip angle of 180° and a novel frame-by-frame phase sensitive reconstruction to remove smooth background phase while preserving the sign of the tag lines. Tag contrast-to-noise ratio of REALTAG and conventional tagging (total flip angle of 90°) is simulated and evaluated in vivo. The ability to extend tag persistence is tested during the production of vowel-to-vowel transitions by American English speakers. RESULTS: REALTAG resulted in a doubling of contrast-to-noise ratio at each time point and increased tag persistence by more than 1.9-fold. The tag persistence was 1150 ms with contrast-to-noise ratio >6 at 1.5T, providing 2 mm in-plane resolution, 179 frames/s, with 72.6 ms temporal window width, and phase sensitive reconstruction. The new imaging window is able to capture internal tongue deformation over word-to-word transitions in natural speech production. CONCLUSION: Tag persistence is substantially increased in intermittently tagged real-time MRI by using the improved REALTAG method. This makes it possible to capture longer motion patterns in the tongue, such as cross-word vowel-to-vowel transitions, and provides a powerful new window to study tongue biomechanics.
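A simple Bloch-rotation model suggests why a 180° total tagging flip can double tag contrast relative to a 90° one, provided the reconstruction keeps the sign of the magnetization. The model assumes two equal hard pulses about the same axis, gradient dephasing θ between them, and no relaxation during the pulses. Immediately after tagging it gives Mz = M0(cos²a − sin²a·cos θ), where a is half the total flip. For a 90° total, Mz spans [0, M0]. For a 180° total, Mz = −M0·cos θ spans [−M0, M0]. A magnitude image would fold this to |cos θ|, which is why a phase-sensitive reconstruction matters. T1 recovery scales the modulation by exp(−t/T1) in both cases, so the factor of two holds at every time point. This is a sketch under these assumptions, not the REALTAG pulse design. The T1 value is hypothetical.

```python
import numpy as np

def spamm_mz(total_flip_deg, theta, m0=1.0):
    """Mz right after a two-pulse (1-1) SPAMM train; simplified, no relaxation."""
    a = np.deg2rad(total_flip_deg) / 2.0
    return m0 * (np.cos(a) ** 2 - np.sin(a) ** 2 * np.cos(theta))

def relax(mz0, t_ms, t1_ms, m0=1.0):
    """T1 recovery of longitudinal magnetization (imaging pulses ignored)."""
    e = np.exp(-t_ms / t1_ms)
    return mz0 * e + m0 * (1.0 - e)

theta = np.linspace(0.0, 2.0 * np.pi, 721)  # one tag period
t1_ms = 1000.0  # hypothetical T1
for t in (0.0, 500.0, 1000.0):
    conventional = relax(spamm_mz(90.0, theta), t, t1_ms)
    signed_180 = relax(spamm_mz(180.0, theta), t, t1_ms)
    ptp_90 = conventional.max() - conventional.min()
    ptp_180 = signed_180.max() - signed_180.min()
    print(f"t={t:6.0f} ms  peak-to-peak: 90deg={ptp_90:.3f}  180deg={ptp_180:.3f}")
```

The same range argument holds for binomial trains with the same total flip. Only the shape of the tag profile changes.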
Subjects
Language, Magnetic Resonance Imaging, Biomechanical Phenomena, Speech, Tongue/diagnostic imaging
ABSTRACT
It has been previously observed [McMicken, Salles, Berg, Vento-Wilson, Rogers, Toutios, and Narayanan. (2017). J. Commun. Disorders, Deaf Stud. Hear. Aids 5(2), 1-6] using real-time magnetic resonance imaging that a speaker with severe congenital tongue hypoplasia (aglossia) had developed a compensatory articulatory strategy where she, in the absence of a functional tongue tip, produced a plosive consonant perceptually similar to /d/ using a bilabial constriction. The present paper provides an updated account of this strategy. It is suggested that the previously observed compensatory bilabial closing that occurs during this speaker's /d/ production is consistent with vocal tract shaping resulting from hyoid raising created with mylohyoid action, which may also be involved in typical /d/ production. Simulating this strategy in a dynamic articulatory synthesis experiment leads to the generation of /d/-like formant transitions.
Subjects
Tongue, Voice, Female, Humans, Phonetics, Speech, Tongue/diagnostic imaging
ABSTRACT
PURPOSE: To demonstrate a tagging method compatible with RT-MRI for the study of speech production. METHODS: Tagging is applied as a brief interruption to a continuous real-time spiral acquisition. Tagging can be initiated manually by the operator, cued to the speech stimulus, or applied automatically at a fixed frequency. We use a standard 2D 1-3-3-1 binomial SPAtial Modulation of Magnetization (SPAMM) sequence with 1 cm spacing in both in-plane directions. Tag persistence in tongue muscle is simulated and validated in vivo. The ability to capture internal tongue deformations is tested during speech production of American English diphthongs in native speakers. RESULTS: We achieved an imaging window of 650-800 ms at 1.5T, with imaging signal-to-noise ratio ≥ 17 and tag contrast-to-noise ratio ≥ 5 in human tongue, providing 36 frames/s temporal resolution and 2 mm in-plane spatial resolution with real-time interactive acquisition and view-sharing reconstruction. The proposed method was able to capture tongue motion patterns and their relative timing with adequate spatiotemporal resolution during the production of American English diphthongs and consonants. CONCLUSION: Intermittent tagging during real-time MRI of speech production is able to reveal the internal deformations of the tongue. This capability will allow new investigations of the biomechanics of the lingual subsystems during speech, providing valuable spatiotemporal information without relying on binning of repeated speech utterances.
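The tag profile produced by a 1-3-3-1 binomial SPAMM train can be checked with a small rotation-matrix simulation. The sketch assumes four hard pulses about x with flip angles in the ratio 1:3:3:1 summing to 90°. Between pulses, the tagging gradient dephases by the same angle θ = 2πx/(tag spacing). Relaxation is ignored and M0 = 1. Only one in-plane direction is simulated, whereas the 2D grid described above applies such a train along both directions. Tag lines (Mz ≈ 0) fall where θ is a multiple of 2π, and midway between them the pulses cancel and Mz returns to M0. The function names are hypothetical.

```python
import numpy as np

def rot_x(angle):
    """Rotation matrix for a hard RF pulse about the x axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(angle):
    """Rotation matrix for gradient-induced dephasing (precession about z)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def binomial_spamm_mz(theta, weights=(1, 3, 3, 1), total_flip_deg=90.0):
    """Mz after a binomial SPAMM train, for per-interval dephasing theta (M0 = 1)."""
    flips = np.deg2rad(total_flip_deg) * np.asarray(weights, float) / sum(weights)
    m = np.array([0.0, 0.0, 1.0])
    for i, flip in enumerate(flips):
        if i > 0:
            m = rot_z(theta) @ m   # dephasing by the tagging gradient between pulses
        m = rot_x(flip) @ m        # RF pulse
    return m[2]

thetas = np.linspace(0.0, 2.0 * np.pi, 9)
print(np.round([binomial_spamm_mz(t) for t in thetas], 3))
```
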
Subjects
Computer-Assisted Image Processing/methods, Magnetic Resonance Imaging/methods, Speech Production Measurement/methods, Speech/physiology, Tongue, Adult, Female, Humans, Male, Movement/physiology, Tongue/diagnostic imaging, Tongue/physiology
ABSTRACT
PURPOSE: To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech. METHODS: We demonstrate 2.4 × 2.4 × 5.8 mm3 spatial resolution, 61-ms temporal resolution, and a 200 × 200 × 70 mm3 FOV. The proposed method uses 3D gradient-echo imaging with a custom upper-airway coil, a minimum-phase slab excitation, stack-of-spirals readout, pseudo golden-angle view order in kx-ky, linear Cartesian order along kz, and spatiotemporal finite difference constrained reconstruction, with 13-fold acceleration. This technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects acquired on a 1.5T scanner (1 of them with synchronized audio) performing 2 natural speech production tasks, and via comparison with interleaved multislice 2D dynamic MRI. RESULTS: This technique captured known dynamics of vocal tract articulators during natural speech tasks, including tongue gestures during the production of the consonants "s" and "l" and of consonant-vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume-of-interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels. CONCLUSION: We demonstrate the feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant-vowel syllables, without requiring multiple repetitions.
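A spatiotemporal finite-difference constrained reconstruction is commonly posed as a regularized least-squares problem. The form below is a generic sketch, not necessarily the exact objective used here, because the abstract does not give the norms or weights. It assumes ℓ1 penalties on temporal and spatial finite differences. Here m is the dynamic 3D image series, d the acquired multicoil stack-of-spirals k-space data, S the coil sensitivity maps, F_Ω the non-uniform Fourier transform onto the sampled locations Ω, ∇_t and ∇_s the temporal and spatial finite-difference operators, and λ_t, λ_s the regularization weights.

```latex
\hat{\mathbf{m}} = \arg\min_{\mathbf{m}}
  \left\| \mathbf{F}_{\Omega}\,\mathbf{S}\,\mathbf{m} - \mathbf{d} \right\|_2^2
  + \lambda_t \left\| \nabla_t \mathbf{m} \right\|_1
  + \lambda_s \left\| \nabla_s \mathbf{m} \right\|_1
```

With 13-fold acceleration, each frame's Ω holds roughly 1/13 of the samples a fully sampled frame would need, so the data term alone is underdetermined and the finite-difference penalties supply the missing constraints. At 61 ms per frame, the reconstructed series runs at about 16 frames per second.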
Subjects
Three-Dimensional Imaging/methods, Larynx/diagnostic imaging, Magnetic Resonance Imaging, Computer-Assisted Signal Processing, Speech Production Measurement/methods, Speech/physiology, Adult, Female, Humans, Computer-Assisted Image Processing, Language, Male, Movement, Reproducibility of Results, Tongue, Video Recording
ABSTRACT
This paper reports on the concurrent use of electroglottography (EGG) and electromagnetic articulography (EMA) in the acquisition of EMA trajectory data for running speech. Static and dynamic intersensor distances, standard deviations, and coefficients of variation associated with inter-sample distances were compared in two conditions: with and without EGG present. Results indicate that measurement discrepancies between the two conditions are within the EMA system's measurement uncertainty. Therefore, potential electromagnetic interference from EGG does not seem to cause differences of practical importance in EMA trajectories, suggesting that simultaneous EMA and EGG data acquisition is a viable laboratory procedure for speech research.
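The static comparison described above can be sketched for two sensors at a fixed separation. For each condition, compute the intersensor distance at every sample, summarize it by its mean, standard deviation, and coefficient of variation, and check whether the between-condition difference falls within the system's measurement uncertainty. The recordings below are synthetic: a 30 mm separation with 0.1 mm position jitter. The uncertainty threshold is likewise hypothetical and is not the specification of any particular EMA system.

```python
import numpy as np

def intersensor_distance_stats(sensor_a, sensor_b):
    """Mean, sample SD and coefficient of variation of the distance between two sensors.

    sensor_a, sensor_b: arrays of shape (n_samples, 3) with x, y, z positions in mm.
    """
    d = np.linalg.norm(np.asarray(sensor_a, float) - np.asarray(sensor_b, float), axis=1)
    mean = d.mean()
    sd = d.std(ddof=1)
    return mean, sd, sd / mean

rng = np.random.default_rng(0)
n = 1000
base_a = np.zeros((n, 3))
base_b = np.tile([30.0, 0.0, 0.0], (n, 1))  # hypothetical 30 mm separation
without_egg = intersensor_distance_stats(base_a + rng.normal(0, 0.1, (n, 3)),
                                         base_b + rng.normal(0, 0.1, (n, 3)))
with_egg = intersensor_distance_stats(base_a + rng.normal(0, 0.1, (n, 3)),
                                      base_b + rng.normal(0, 0.1, (n, 3)))

uncertainty_mm = 0.5  # hypothetical measurement uncertainty
for label, (m, sd, cv) in (("without EGG", without_egg), ("with EGG", with_egg)):
    print(f"{label:>11}: mean {m:.3f} mm, SD {sd:.3f} mm, CV {cv:.4f}")
print("difference within uncertainty:", abs(with_egg[0] - without_egg[0]) < uncertainty_mm)
```
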
Subjects
Electromagnetic Phenomena, Glottis/physiology, Speech Production Measurement/instrumentation, Speech/physiology, Female, Glottis/anatomy & histology, Humans, Larynx/anatomy & histology, Larynx/physiology, Male, Mouth/anatomy & histology, Mouth/physiology
ABSTRACT
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community.
Subjects
Acoustics, Biomedical Research, Factual Databases, Electromagnetic Phenomena, Magnetic Resonance Imaging, Pharynx/physiology, Speech Acoustics, Speech Production Measurement, Voice Quality, Acoustics/instrumentation, Adult, Biomechanical Phenomena, Female, Humans, Male, Middle Aged, Pharynx/anatomy & histology, Computer-Assisted Signal Processing, Software, Speech Production Measurement/instrumentation, Time Factors, Transducers
ABSTRACT
This paper presents an automatic procedure to analyze articulatory setting in speech production using real-time magnetic resonance imaging of the moving human vocal tract. The procedure extracts frames corresponding to inter-speech pauses, speech-ready intervals and absolute rest intervals from magnetic resonance imaging sequences of read and spontaneous speech elicited from five healthy speakers of American English and uses automatically extracted image features to quantify vocal tract posture during these intervals. Statistical analyses show significant differences between vocal tract postures adopted during inter-speech pauses and those adopted at absolute rest before speech; the latter also exhibit greater variability. In addition, the articulatory settings adopted during inter-speech pauses in read and spontaneous speech are distinct. The results suggest that adopted vocal tract postures differ on average during rest positions, ready positions and inter-speech pauses, and might, in that order, involve an increasing degree of active control by the cognitive speech planning mechanism.
Subjects
Epiglottis/physiology, Glottis/physiology, Computer-Assisted Image Interpretation/methods, Lip/physiology, Magnetic Resonance Imaging/methods, Soft Palate/physiology, Pharynx/physiology, Phonation/physiology, Phonetics, Speech/physiology, Tongue/physiology, Algorithms, Female, Humans, Muscle Contraction/physiology, Pulmonary Ventilation/physiology, Supine Position/physiology
ABSTRACT
Real-time magnetic resonance imaging (rtMRI) was used to examine mechanisms of sound production by an American male beatbox artist. rtMRI was found to be a useful modality with which to study this form of sound production, providing a global dynamic view of the midsagittal vocal tract at frame rates sufficient to observe the movement and coordination of critical articulators. The subject's repertoire included percussion elements generated using a wide range of articulatory and airstream mechanisms. Many of the same mechanisms observed in human speech production were exploited for musical effect, including patterns of articulation that do not occur in the phonologies of the artist's native languages: ejectives and clicks. The data offer insights into the paralinguistic use of phonetic primitives and the ways in which they are coordinated in this style of musical performance. A unified formalism for describing both musical and phonetic dimensions of human vocal percussion performance is proposed. Audio and video data illustrating production and orchestration of beatboxing sound effects are provided in a companion annotated corpus.
Subjects
Larynx/physiology, Magnetic Resonance Imaging, Music, Phonation, Phonetics, Voice Quality, Adult, Humans, Computer-Assisted Image Processing, Male, Automated Pattern Recognition, Speech Acoustics, Time Factors, Video Recording, Vocal Cords/physiology
ABSTRACT
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. However, easy access to RT-MRI is limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.
Subjects
Larynx/physiology, Magnetic Resonance Imaging/methods, Speech, Adolescent, Adult, Computer Systems, Female, Humans, Male, Middle Aged, Time Factors, Video Recording, Young Adult
ABSTRACT
It is hypothesized that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. Real-time magnetic resonance imaging is used to analyze articulation at and around grammatical and ungrammatical pauses in spontaneous speech. Measures quantifying the speed of articulators were developed and applied during these pauses as well as in their immediate neighborhoods. Grammatical pauses were found to have an appreciable drop in speed at the pause itself as compared to ungrammatical pauses, which is consistent with our hypothesis that grammatical pauses are indeed choreographed by a central cognitive planner.
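One simple way to compare articulator speed at a pause with speed in its neighborhood is sketched below. It assumes a landmark (for example, a tongue-tip point) has already been tracked frame by frame in the rtMRI video. Speed is taken as the magnitude of the central-difference velocity, averaged inside the pause and in a fixed number of frames on either side. This is not necessarily the measure developed in the study, which may be image-based. The frame rate, window size, and trajectory are hypothetical.

```python
import numpy as np

def articulator_speed(positions, fps):
    """Speed (units/s) of a tracked point from per-frame (x, y) positions."""
    p = np.asarray(positions, dtype=float)
    velocity = np.gradient(p, 1.0 / fps, axis=0)  # central differences over time
    return np.linalg.norm(velocity, axis=1)

def pause_vs_neighborhood(speed, pause_start, pause_end, context):
    """Mean speed inside [pause_start, pause_end) and in `context` frames on each side."""
    inside = speed[pause_start:pause_end].mean()
    before = speed[max(0, pause_start - context):pause_start]
    after = speed[pause_end:pause_end + context]
    return inside, np.concatenate([before, after]).mean()

fps = 83.0  # hypothetical frame rate
frames = np.arange(60)
# Hypothetical trajectory: moving, held still during frames 20-39, moving again.
x = np.where(frames < 20, 0.5 * frames,
             np.where(frames < 40, 10.0, 10.0 + 0.5 * (frames - 40)))
positions = np.column_stack([x, np.zeros_like(x)])
speed = articulator_speed(positions, fps)
inside, around = pause_vs_neighborhood(speed, 20, 40, context=10)
print(f"mean speed in pause: {inside:.2f}, around pause: {around:.2f}")
```
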
Subjects
Auditory Cortex/physiology, Magnetic Resonance Imaging/methods, Speech Articulation Tests, Speech/physiology, Vocal Cords/physiology, Humans, Phonetics
ABSTRACT
In producing linguistic prominence, certain linguistic elements are highlighted relative to others in a given domain; focus is an instance of prominence in which speakers highlight new or important information. This study investigates prominence modulation at the sub-syllable level using a corrective focus task, examining acoustic duration and pitch with particular attention to the gestural composition of Korean tense and lax consonants. The results indicate that focus effects are manifested as systematic variations depending on the gestural structures (i.e., consonants) active during the domain of a focus gesture, and that the patterns of focus modulation do not differ as a function of elicited focus position within the syllable. The findings generally support the premise that the scope of the focus gesture is not (much) smaller than the interval of a (CVC) syllable. Lastly, there is also some support for an interaction among prosodic gestures (focus gestures and pitch-accentual gestures) at the phrase level. Overall, the current findings support the hypothesis that focus, implemented as a prosodic prominence gesture, modulates the temporal characteristics of gestures, and possibly of other prosodic gestures, that are co-active in its domain.
ABSTRACT
This study evaluates the effects of phrase boundaries on the intra- and intergestural kinematic characteristics of blended gestures, i.e., overlapping gestures produced with a single articulator. The sequences examined are the juncture geminate [d(#)d], the sequence [d(#)z], and, for comparison, the singleton tongue tip gesture in [d(#)b]. This allows the investigation of the process of gestural aggregation [Munhall, K. G., and Löfqvist, A. (1992). "Gestural aggregation in speech: laryngeal gestures," J. Phonetics 20, 93-110] and the manner in which it is affected by prosodic structure. Juncture geminates are predicted to be affected by prosodic boundaries in the same way as other gestures; that is, they should display prosodic lengthening and less overlap across a boundary. Articulatory prosodic lengthening is also investigated using a signal alignment method of the functional data analysis framework [Ramsay, J. O., and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. (Springer-Verlag, New York)]. This provides the ability to examine a time-warping function that characterizes the relative timing difference (i.e., lagging or advancing) of a test signal with respect to a given reference, thus offering a way of illuminating local nonlinear deformations at work in prosodic lengthening. These findings are discussed in light of the π-gesture framework of Byrd and Saltzman [(2003) "The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening," J. Phonetics 31, 149-180].
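The study obtains its time-warping function from functional data analysis registration. As a simpler stand-in, classic dynamic time warping (DTW) also produces a monotone mapping between the time axes of a test and a reference signal. This is a different technique, not the FDA method cited above. The sketch computes a DTW path and turns it into a discrete warping map that gives, for each test sample, the mean aligned reference index. Regions where the map rises slowly indicate that the test signal is locally stretched relative to the reference. The trajectories are synthetic and hypothetical.

```python
import numpy as np

def dtw_path(reference, test):
    """Classic DTW; returns the optimal monotone path as (i_ref, j_test) pairs."""
    r = np.asarray(reference, float)
    s = np.asarray(test, float)
    n, m = len(r), len(s)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(r[i - 1] - s[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = {(i - 1, j - 1): cost[i - 1, j - 1],
                      (i - 1, j): cost[i - 1, j],
                      (i, j - 1): cost[i, j - 1]}
        i, j = min(candidates, key=candidates.get)
        path.append((i - 1, j - 1))
    return path[::-1]

def warp_function(path, n_test):
    """Mean reference index aligned with each test index (discrete warping map)."""
    aligned = [[] for _ in range(n_test)]
    for i, j in path:
        aligned[j].append(i)
    return np.array([np.mean(v) for v in aligned])

# Hypothetical constriction trajectories: the test token is longer and
# nonlinearly stretched relative to the reference.
reference = np.sin(np.pi * np.linspace(0.0, 1.0, 50)) ** 2
test = np.sin(np.pi * np.linspace(0.0, 1.0, 70) ** 1.5) ** 2
path = dtw_path(reference, test)
warp = warp_function(path, len(test))
print(np.round(warp[::10], 1))
```
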
Subjects
Biomechanical Phenomena, Phonation, Speech Articulation Tests, Speech/physiology, Tongue/physiology, Gestures, Humans, Language, Motor Activity, Movement, Speech Production Measurement
ABSTRACT
Temporal lengthening of gestures and segments located in a boundary-adjacent syllable has been found in both pre- and postboundary contexts. However, the temporal extent or scope of this lengthening, particularly in the articulatory domain, is not well described. We address the question of the scope of prosodic lengthening by considering specifically whether prominence interacts with boundary-related articulatory lengthening in such a way that prominent elements not immediately at a phrase edge are lengthened relative to the same prominent elements phrase-medially (i.e., at a considerable distance from a boundary). Articulatory kinematic data were collected for three subjects to analyze consonant constrictions of prominent syllables located (1) either immediately before or after a boundary and (2) two and three syllables away from that boundary. The results indicate that, as expected, gestures undergo prosodic lengthening when immediately local to the phrase boundary. However, some subjects did display prosodic lengthening at a small remove from the boundary for a prominent syllable. This effect was strongest in the postboundary condition. These results suggest that a consideration of prominence may be relevant in understanding the temporal patterning of boundary-related articulatory lengthening.
ABSTRACT
This study uses a maze navigation task in conjunction with a quasi-scripted, prosodically controlled speech task to examine acoustic and articulatory accommodation in pairs of interacting speakers. The experiment uses a dual electromagnetic articulography set-up to collect synchronized acoustic and articulatory kinematic data from two facing speakers simultaneously. We measure the members of a dyad individually before they interact, while they are interacting in a cooperative task, and again individually after they interact. The design is ideally suited to measure speech convergence, divergence, and persistence effects during and after speaker interaction. This study specifically examines how convergence and divergence effects during a dyadic interaction may be related to prosodically salient positions, such as preceding a phrase boundary. The findings of accommodation in fine-grained prosodic measures illuminate our understanding of how the realization of linguistic phrasal structure is coordinated across interacting speakers. Our findings on individual speaker variability and the time course of accommodation provide novel evidence for accommodation at the level of cognitively specified motor control of individual articulatory gestures. Taken together, these results have implications for understanding the cognitive control of interactional behavior in spoken language communication.
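Convergence, divergence, and persistence of the kind examined above can be quantified in several ways. One common measure in the accommodation literature is the difference in distance (DID). It is the absolute difference between the two speakers' means at baseline minus the same difference in a later phase. Positive values indicate convergence and negative values divergence, and a positive value for the post-interaction phase suggests persistence. The abstract does not state whether this study used DID. The phrase-final durations below are invented for illustration.

```python
from statistics import mean

def difference_in_distance(a_base, b_base, a_test, b_test):
    """Baseline between-speaker distance minus test-phase distance.

    Positive: speakers are closer in the test phase (convergence);
    negative: they are farther apart (divergence).
    """
    return abs(mean(a_base) - mean(b_base)) - abs(mean(a_test) - mean(b_test))

# Hypothetical phrase-final syllable durations (ms) for speakers A and B.
a_pre, b_pre = [210, 220, 215], [260, 250, 255]
a_during, b_during = [225, 235, 230], [245, 240, 250]
a_post, b_post = [220, 228, 224], [250, 248, 252]

print("during vs. before:", difference_in_distance(a_pre, b_pre, a_during, b_during))
print("after vs. before: ", difference_in_distance(a_pre, b_pre, a_post, b_post))
```
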
Subjects
Cognition/physiology, Cooperative Behavior, Interpersonal Relations, Speech/physiology, Adult, Electromagnetic Phenomena, Female, Humans, Male, Speech Production Measurement/instrumentation, Speech Production Measurement/methods, Young Adult
ABSTRACT
In the past, the nature of the compositional units proposed for spoken language has largely diverged from the types of control units pursued in the domains of other skilled motor tasks. A classic source of evidence as to the units structuring speech has been the patterns observed in speech errors, or "slips of the tongue". The present study reports, for the first time, on kinematic data from tongue and lip movements during speech errors elicited in the laboratory using a repetition task. Our data are consistent with the hypothesis that speech production results from the assembly of dynamically defined action units (gestures) in a linguistically structured environment. The experimental results support both the presence of gestural units and the dynamical properties of these units and their coordination. This study of speech articulation shows that it is possible to develop a principled account of spoken language within a more general theory of action.
Subjects
Speech, Verbal Behavior, Humans, Language, Psychological Theory
ABSTRACT
Much evidence has been found for pervasive links between the manual and speech motor systems, including evidence from infant development, deictic pointing, and repetitive tapping and speaking tasks. We expand on the last of these paradigms to look at intra- and cross-modal effects of emphatic stress, as well as the effects of coordination in the absence of explicit rhythm. In this study, subjects repeatedly tapped their finger and synchronously repeated a single spoken syllable. On each trial, subjects placed an emphatic stress on one finger tap or one spoken syllable. Results show that both movement duration and magnitude are affected by emphatic stress regardless of whether that stress is in the same domain (e.g., effects on the oral articulators when a spoken repetition is stressed) or across domains (e.g., effects on the oral articulators when a tap is stressed). Though the size of the effects differs between intra- and cross-domain emphases, the implementation of stress affects both motor domains, indicating a tight connection. This close coupling is seen even in the absence of stress, though it is highlighted under stress. The results of this study support the idea that the implementation of prosody is not domain-specific but relies on general aspects of the motor system.