Results 1 - 20 of 84
1.
Magn Reson Med ; 86(2): 725-737, 2021 08.
Article in English | MEDLINE | ID: mdl-33665929

ABSTRACT

PURPOSE: To develop an image-based motion-robust diffusion MRI (dMRI) acquisition framework that is able to minimize motion artifacts caused by rigid and nonrigid motion, applicable to both brain and tongue dMRI. METHODS: We developed a novel prospective motion-correction technique in dMRI using a phase image-based real-time motion-detection method (PITA-MDD) with re-acquisition of motion-corrupted images. The prospective PITA-MDD acquisition technique was tested in the brains and tongues of volunteers. The subjects were instructed to move their heads or swallow, to induce motion. Motion-detection efficacy was validated against visual inspection as the gold standard. The effect of the PITA-MDD technique on diffusion-parameter estimates was evaluated by comparing reconstructed fiber tracts using tractography with and without re-acquisition. RESULTS: The prospective PITA-MDD technique was able to effectively and accurately detect motion-corrupted data as compared with visual inspection. Tractography results demonstrated that PITA-MDD motion detection followed by re-acquisition helps in recovering lost and misshaped fiber tracts in the brain and tongue that would otherwise be corrupted by motion and yield erroneous estimates of the diffusion tensor. CONCLUSION: A prospective PITA-MDD technique was developed for dMRI acquisition, providing improved dMRI image quality and motion-robust diffusion estimation of the brain and tongue.
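The detection step above can be caricatured in a few lines: a shot is flagged for re-acquisition when its phase image deviates too far from a reference. This is a toy numpy sketch, not the authors' PITA-MDD criterion; the threshold value and the bulk-phase-shift motion model are assumptions made for illustration.

```python
import numpy as np

def flag_motion(phase_ref, phase_cur, threshold=0.3):
    """Flag a shot as motion-corrupted when the mean absolute wrapped
    phase difference to the reference exceeds a threshold. (Toy stand-in
    for an image-phase-based detector; not the actual PITA-MDD rule.)"""
    diff = np.angle(np.exp(1j * (phase_cur - phase_ref)))  # wrap to [-pi, pi]
    return float(np.mean(np.abs(diff))) > threshold

rng = np.random.default_rng(0)
phase_ref = rng.uniform(-np.pi, np.pi, size=(32, 32))
still = phase_ref + rng.normal(0, 0.05, size=(32, 32))  # small noise only
moved = phase_ref + 1.0  # bulk phase shift, as rigid motion might cause

print(flag_motion(phase_ref, still))  # not flagged
print(flag_motion(phase_ref, moved))  # flagged -> would trigger re-acquisition
```

A flagged shot would then be queued for re-acquisition rather than passed to the diffusion fit.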


Subjects
Brain; Diffusion Magnetic Resonance Imaging; Algorithms; Artifacts; Brain/diagnostic imaging; Humans; Image Processing, Computer-Assisted; Motion (Physics); Prospective Studies; Tongue/diagnostic imaging
2.
J Acoust Soc Am ; 145(5): EL423, 2019 05.
Article in English | MEDLINE | ID: mdl-31153323

ABSTRACT

The ability to differentiate post-cancer from healthy tongue muscle coordination patterns is necessary for the advancement of speech motor control theories and for the development of therapeutic and rehabilitative strategies. A deep learning approach is presented to classify the two groups using muscle coordination patterns from magnetic resonance imaging (MRI). The proposed method uses tagged-MRI to track the tongue's internal tissue points and atlas-driven non-negative matrix factorization to reduce the dimensionality of the deformation fields. A convolutional neural network applied to the classification task yields an accuracy of 96.90%, offering potential to inform the development of therapeutic and rehabilitative strategies for speech-related disorders.
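The dimensionality-reduction step can be sketched with plain non-negative matrix factorization via Lee-Seung multiplicative updates; the atlas-driven variant and the CNN classifier of the abstract are not reproduced here, and the matrix sizes are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates: V ~ W @ H with
    W, H >= 0. A generic stand-in for the atlas-driven variant."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "motion feature" matrix: 50 tissue points x 20 time frames, built
# from 3 non-negative sources so a rank-3 factorization fits well.
rng = np.random.default_rng(1)
V = rng.random((50, 3)) @ rng.random((3, 20))
W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)  # small residual
```

The columns of W play the role of building blocks; the rows of H are the low-dimensional features a classifier would consume.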


Subjects
Deep Learning; Movement/physiology; Speech/physiology; Tongue/physiology; Facial Muscles/physiology; Humans; Magnetic Resonance Imaging/methods; Neoplasms/physiopathology; Neural Networks, Computer
3.
J Acoust Soc Am ; 141(4): 2579, 2017 04.
Article in English | MEDLINE | ID: mdl-28464688

ABSTRACT

Biomechanical models of the oropharynx facilitate the study of speech function by providing information that cannot be directly derived from imaging data, such as internal muscle forces and muscle activation patterns. Such models, when constructed and simulated based on anatomy and motion captured from individual speakers, enable the exploration of inter-subject variability of speech biomechanics. These models also allow one to answer questions, such as whether speakers produce similar sounds using essentially the same motor patterns with subtle differences, or vastly different motor-equivalent patterns. Following this direction, this study uses speaker-specific modeling tools to investigate the muscle activation variability in two simple speech tasks that move the tongue forward (/ə-ɡis/) vs. backward (/ə-suk/). Three-dimensional tagged magnetic resonance imaging data were used to inversely drive the biomechanical models in four English speakers. Results show that the genioglossus is the workhorse muscle of the tongue, with activity levels of 10% in different subdivisions at different times. Jaw and hyoid positioners (inferior pterygoid and digastric) also show high activation during specific phonemes. Other muscles may be more involved in fine-tuning the shapes. For example, slightly more activation of the anterior portion of the transverse is found during apical than laminal /s/, which would protrude the tongue tip to a greater extent for the apical /s/.


Subjects
Motor Activity; Muscle, Skeletal/physiology; Speech; Tongue/physiology; Voice; Adult; Biomechanical Phenomena; Female; Humans; Magnetic Resonance Imaging, Cine; Male; Muscle, Skeletal/diagnostic imaging; Phonation; Pterygoid Muscles/diagnostic imaging; Pterygoid Muscles/physiology; Tongue/diagnostic imaging; Young Adult
4.
Clin Linguist Phon ; 30(3-5): 313-27, 2016.
Article in English | MEDLINE | ID: mdl-26786063

ABSTRACT

A new contour-tracking algorithm is presented for ultrasound tongue image sequences, which can follow the motion of tongue contours over long durations with good robustness. To cope with missing segments caused by noise, or by the tongue midsagittal surface being parallel to the direction of ultrasound wave propagation, active contours with a contour-similarity constraint are introduced, which can be used to provide 'prior' shape information. Also, to address the accumulation of tracking errors over long sequences, we present an automatic re-initialization technique based on the complex wavelet image similarity index. Experiments on synthetic data and on real data at 60 frames per second (fps) from different subjects demonstrate that the proposed method gives good contour tracking for ultrasound image sequences even over durations of minutes, which can be useful in applications such as speech recognition where very long sequences must be analyzed in their entirety.
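The re-initialization idea can be sketched as follows, with plain normalized cross-correlation standing in for the complex wavelet image similarity index (a deliberate simplification; the similarity threshold is an assumption):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; a simple stand-in here
    for the complex wavelet image similarity index."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def track_with_reinit(frames, ref_idx=0, sim_threshold=0.7):
    """For each frame, continue tracking from the previous result if it
    is still similar to the reference; otherwise re-initialize."""
    ref = frames[ref_idx]
    return ["track" if ncc(ref, f) >= sim_threshold else "reinit"
            for f in frames]

rng = np.random.default_rng(0)
base = rng.random((24, 24))
frames = [base + rng.normal(0, 0.05, base.shape) for _ in range(3)]
frames.append(rng.random((24, 24)))  # drifted frame, dissimilar to reference
decisions = track_with_reinit(frames)
print(decisions)
```

Dropping back to the reference contour when similarity collapses is what keeps errors from accumulating over minutes-long sequences.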


Subjects
Algorithms; Tongue/physiology; Ultrasonography; Female; Humans; Male; Models, Biological; Tongue/diagnostic imaging
5.
J Acoust Soc Am ; 133(6): EL439-45, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23742437

ABSTRACT

Magnetic resonance imaging has been widely used in speech production research. Often only one image stack (sagittal, axial, or coronal) is used for vocal tract modeling. As a result, complementary information from other available stacks is not utilized. To overcome this, a recently developed super-resolution technique was applied to integrate three orthogonal low-resolution stacks into one isotropic volume. The results on vowels show that the super-resolution volume produces better vocal tract visualization than any of the low-resolution stacks. Its derived area functions generally produce formant predictions closer to the ground truth, particularly for those formants sensitive to area perturbations at constrictions.
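As a rough illustration of the data layout only, the following naive sketch upsamples three orthogonal thick-slice stacks onto one isotropic grid and averages them; the cited work uses a proper super-resolution reconstruction, not plain averaging, and the volume sizes here are hypothetical.

```python
import numpy as np

def upsample_along(stack, axis, factor):
    """Nearest-neighbour upsampling along the thick-slice axis."""
    return np.repeat(stack, factor, axis=axis)

def naive_fuse(sag, axi, cor, factor):
    """Average three orthogonal thick-slice stacks on one isotropic grid.
    (A naive stand-in for the model-based super-resolution method.)"""
    vols = [upsample_along(sag, 0, factor),
            upsample_along(axi, 1, factor),
            upsample_along(cor, 2, factor)]
    return sum(vols) / 3.0

# Hypothetical 'truth' volume sampled with thick slices along each axis.
truth = np.linspace(0, 1, 8 * 8 * 8).reshape(8, 8, 8)
factor = 4
sag = truth[::factor, :, :]  # thick slices along axis 0
axi = truth[:, ::factor, :]
cor = truth[:, :, ::factor]
fused = naive_fuse(sag, axi, cor, factor)
print(fused.shape)  # (8, 8, 8): one isotropic grid
```

The point of the layout is that each stack is high-resolution in two directions, so the three together constrain all three axes of the fused volume.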


Subjects
Computer Simulation; Epiglottis/anatomy & histology; Image Enhancement/methods; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Larynx/anatomy & histology; Lip/anatomy & histology; Magnetic Resonance Imaging/methods; Pharynx/anatomy & histology; Phonation/physiology; Phonetics; Algorithms; Artifacts; Epiglottis/physiology; Humans; Larynx/physiology; Lip/physiology; Pharynx/physiology; Sensitivity and Specificity; Software; Sound Spectrography; Speech Acoustics
6.
IEEE Comput Graph Appl ; 43(3): 88-93, 2023.
Article in English | MEDLINE | ID: mdl-37195830

ABSTRACT

Some 15 years ago, Visualization Viewpoints published an influential article titled "Rainbow Color Map (Still) Considered Harmful" (Borland and Taylor, 2007). The paper argued that the "rainbow colormap's characteristics of confusing the viewer, obscuring the data and actively misleading interpretation make it a poor choice for visualization." Subsequent articles often repeat and extend these arguments, so much so that avoiding rainbow colormaps, along with their derivatives, has become dogma in the visualization community. Despite this loud and persistent recommendation, scientists continue to use rainbow colormaps. Have we failed to communicate our message, or do rainbow colormaps offer advantages that have not been fully appreciated? We argue that rainbow colormaps have properties that are underappreciated by existing design conventions. We explore key critiques of the rainbow in the context of recent research to understand where and how rainbows might be misunderstood. Choosing a colormap is a complex task, and rainbow colormaps can be useful for selected applications.

7.
Article in English | MEDLINE | ID: mdl-38009135

ABSTRACT

Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to further improve synthesis quality on our generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.

8.
Interspeech ; 2023: 4189-4193, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38107509

ABSTRACT

Finite element models (FEMs) of the tongue have facilitated speech studies through analysis of internal muscle forces indirectly derived from imaging data. In this work, we build a uniform hexahedral FEM of a tongue atlas constructed from magnetic resonance imaging data of a healthy population. The FEM is driven by internal tongue tissue kinematics of speakers, temporally aligned and deformed into the same atlas space while performing the speech task "a souk", allowing inverse prediction of muscle activations. This work aims to investigate the commonalities in tongue motor strategies in the articulation of "a souk" predicted by the inverse tongue atlas model. Our findings show variability among five speakers in estimated muscle activations, quantified with a similarity index based on a dynamic time warping function. Relative to a reference speaker, two speakers show a similarity index > 0.9 and two others < 0.7 for most tongue muscles. The relative motion-tracking error of the model is less than 2%, which is promising for speech study applications.

9.
J Speech Lang Hear Res ; 66(2): 513-526, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36716389

ABSTRACT

PURPOSE: Muscle groups within the tongue in healthy and diseased populations show different behaviors during speech. Visualizing and quantifying strain patterns of these muscle groups during tongue motion can provide insights into tongue motor control and adaptive behaviors of a patient. METHOD: We present a pipeline to estimate the strain along the muscle fiber directions in the deforming tongue during speech production. A deep convolutional network estimates the crossing muscle fiber directions in the tongue using diffusion-weighted magnetic resonance imaging (MRI) data acquired at rest. A phase-based registration algorithm is used to estimate motion of the tongue muscles from tagged MRI acquired during speech. After transforming both muscle fiber directions and motion fields into a common atlas space, strain tensors are computed and projected onto the muscle fiber directions, forming so-called strains in the line of action (SLAs) throughout the tongue. SLAs are then averaged over individual muscles that have been manually labeled in the atlas space using high-resolution T2-weighted MRI. Data were acquired, and this pipeline was run on a cohort of eight healthy controls and two glossectomy patients. RESULTS: The crossing muscle fibers reconstructed by the deep network show orthogonal patterns. The strain analysis results demonstrate consistency of muscle behaviors among some healthy controls during speech production. The patients show irregular muscle patterns, and their tongue muscles tend to show more extension than the healthy controls. CONCLUSIONS: The study showed visual evidence of correlation between two muscle groups during speech production. Patients tend to have different strain patterns compared to the controls. Analysis of variations in muscle strains can potentially help develop treatment strategies in oral diseases. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21957011.
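The projection at the heart of this pipeline, computing strain in the line of action for one fiber direction, can be written directly from the Green-Lagrange strain tensor. In this sketch the deformation gradient F is assumed to have been estimated already from the tagged-MRI motion field; the numbers are illustrative.

```python
import numpy as np

def strain_along_fiber(F, fiber):
    """Green-Lagrange strain projected onto a unit fiber direction:
    SLA = f^T E f with E = (F^T F - I) / 2."""
    f = np.asarray(fiber, dtype=float)
    f = f / np.linalg.norm(f)
    E = 0.5 * (F.T @ F - np.eye(3))
    return float(f @ E @ f)

# Uniaxial stretch of 10% along x: F = diag(1.1, 1, 1).
F = np.diag([1.1, 1.0, 1.0])
print(strain_along_fiber(F, [1, 0, 0]))  # (1.1**2 - 1) / 2 = 0.105
print(strain_along_fiber(F, [0, 1, 0]))  # 0.0: no strain across the fiber
```

Positive SLA values indicate stretching along the fiber, negative values compression, which is what the per-muscle averages in the study summarize.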


Subjects
Magnetic Resonance Imaging; Speech; Humans; Speech/physiology; Magnetic Resonance Imaging/methods; Tongue/diagnostic imaging; Tongue/physiology; Glossectomy; Muscle Fibers, Skeletal
10.
ArXiv ; 2023 May 23.
Article in English | MEDLINE | ID: mdl-37292465

ABSTRACT

Self-training is an important class of unsupervised domain adaptation (UDA) approaches that are used to mitigate the problem of domain shift, when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.

11.
Med Image Anal ; 88: 102851, 2023 08.
Article in English | MEDLINE | ID: mdl-37329854

ABSTRACT

Self-training is an important class of unsupervised domain adaptation (UDA) approaches that are used to mitigate the problem of domain shift, when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
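The pseudo-label filtering idea can be sketched by thresholding Monte-Carlo predictive variance: regions where repeated stochastic forward passes disagree are excluded from target-domain supervision. This is a simplified stand-in for the variational-Bayes uncertainty quantification in the abstract; the threshold and the synthetic "predictions" are assumptions.

```python
import numpy as np

def reliable_mask(mc_predictions, var_threshold):
    """Keep pixels whose Monte-Carlo predictive variance is low.
    mc_predictions: (n_samples, H, W) stochastic forward passes.
    A simplified stand-in for variational-Bayes uncertainty filtering."""
    var = mc_predictions.var(axis=0)
    return var < var_threshold

rng = np.random.default_rng(0)
# Left half: passes agree (reliable); right half: passes disagree.
clean = np.zeros((10, 16, 16)) + rng.normal(0, 0.01, (10, 16, 16))
noisy = rng.normal(0, 1.0, (10, 16, 16))
preds = np.concatenate([clean, noisy], axis=2)  # (10, 16, 32)
mask = reliable_mask(preds, var_threshold=0.1)
print(mask[:, :16].mean(), mask[:, 16:].mean())
```

Training then alternates between updating the translator on masked pseudo-labels and refreshing the uncertainty estimates.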


Subjects
Image Processing, Computer-Assisted; Learning; Humans; Bayes Theorem; Reproducibility of Results; Anisotropy; Uncertainty
12.
Med Image Comput Comput Assist Interv ; 14226: 435-445, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38651032

ABSTRACT

The tongue's intricate 3D structure, comprising localized functional units, plays a crucial role in the production of speech. When measured using tagged MRI, these functional units exhibit cohesive displacements and derived quantities that facilitate the complex process of speech production. Non-negative matrix factorization-based approaches have been shown to estimate the functional units through motion features, yielding a set of building blocks and a corresponding weighting map. Investigating the link between weighting maps and speech acoustics can offer significant insights into the intricate process of speech production. To this end, in this work, we utilize two-dimensional spectrograms as a proxy representation, and develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms. Our proposed plastic light transformer (PLT) framework is based on directional product relative position bias and single-level spatial pyramid pooling, thus enabling flexible mapping of variable-size weighting maps to fixed-size spectrograms without input information loss or dimension expansion. Additionally, our PLT framework efficiently models the global correlation of wide matrix input. To improve the realism of our generated spectrograms with relatively limited training samples, we apply pair-wise utterance consistency with a Maximum Mean Discrepancy constraint and adversarial training. Experimental results on a dataset of 29 subjects speaking two utterances demonstrated that our framework is able to synthesize speech audio waveforms from weighting maps, outperforming conventional convolution and transformer models.
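The fixed-size-output trick can be illustrated with single-level spatial pyramid pooling over a 2D map: whatever the input extent, the output grid has a constant shape. The bin counts and the use of max-pooling here are assumptions for illustration; the directional product relative position bias of the PLT is not reproduced.

```python
import numpy as np

def spp(feature_map, out_bins=(4, 4)):
    """Single-level spatial pyramid pooling: max-pool a variable-size 2D
    feature map into a fixed (out_bins) grid, so downstream layers see a
    constant input size regardless of the weighting-map extent."""
    H, W = feature_map.shape
    bh, bw = out_bins
    rows = np.array_split(np.arange(H), bh)
    cols = np.array_split(np.arange(W), bw)
    out = np.empty((bh, bw))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[i, j] = feature_map[np.ix_(r, c)].max()
    return out

# Two weighting maps of different sizes map to the same fixed shape.
a = np.random.default_rng(0).random((13, 21))
b = np.random.default_rng(1).random((30, 9))
print(spp(a).shape, spp(b).shape)  # (4, 4) (4, 4)
```

Because the bins tile the whole map, no input region is discarded, which is the "no input information loss" property the abstract highlights.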

13.
Article in English | MEDLINE | ID: mdl-26157329

ABSTRACT

American English can be produced with two types of /s/: apical or laminal. These productions differ in that the apical gesture requires independent tongue tip elevation, and the laminal does not. Postglossectomy speakers, who have lost a unilateral portion of the tongue body along the outer edge, lose innervation to the tongue tip. We hypothesize that postglossectomy patients, even those with a preserved tongue tip, will be more likely to use laminal tongue shapes because of reduced control of the tongue tip. This study examines /s/ type, palate height, and related parameters in 24 control participants and 13 patients with lateral resections using cine-MRI and dental casts. Results of this dataset show that palate height affects choice of /s/ in control participants, but not in patients. Patients tend to use laminal /s/.

14.
Med Image Comput Comput Assist Interv ; 13436: 376-386, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36820764

ABSTRACT

Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities, i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform, is not straightforward. Instead, we resort to two-dimensional spectrograms as an intermediate representation, which contains both pitch and resonance, from which to develop an end-to-end deep learning framework to translate from a sequence of tagged-MRI to its corresponding audio waveform with limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator with guidance of a self residual attention strategy to specifically exploit the moving muscular structures during speech. In addition, we leverage a pairwise correlation of the samples with the same utterances with a latent space representation disentanglement strategy. Furthermore, we incorporate an adversarial training approach with generative adversarial networks to offer improved realism on our generated spectrograms. Our experimental results, carried out with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods. Thus, our framework offers great potential to help better understand the relationship between the two modalities.

15.
Article in English | MEDLINE | ID: mdl-36212702

ABSTRACT

Multimodal representation learning using visual movements from cine magnetic resonance imaging (MRI) and their acoustics has shown great potential to learn shared representation and to predict one modality from another. Here, we propose a new synthesis framework to translate from cine MRI sequences to spectrograms with a limited dataset size. Our framework hinges on a novel fully convolutional heterogeneous translator, with a 3D CNN encoder for efficient sequence encoding and a 2D transpose convolution decoder. In addition, a pairwise correlation of the samples with the same speech word is utilized with a latent space representation disentanglement scheme. Furthermore, an adversarial training approach with generative adversarial networks is incorporated to provide enhanced realism on our generated spectrograms. Our experimental results, carried out with a total of 63 cine MRI sequences alongside speech acoustics, show that our framework improves synthesis accuracy, compared with competing methods. Our framework thereby has shown the potential to aid in better understanding the relationship between the two modalities.
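The spectrogram target itself is standard signal processing; a minimal numpy short-time FFT shows the kind of 2D time-frequency representation these translation frameworks predict. The frame length, hop size, sampling rate, and synthetic tone are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def spectrogram(signal, n_fft=128, hop=64):
    """Magnitude spectrogram via a windowed short-time FFT,
    returned as (freq_bins, time_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Hypothetical 1 s waveform at 8 kHz containing a 440 Hz tone.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = spec.mean(axis=1).argmax()
print(spec.shape, peak_bin * sr / 128)  # energy peak near 440 Hz
```

Predicting this 2D image instead of the raw 1D waveform is what makes the task tractable for convolutional decoders; a vocoder or phase-recovery step then turns spectrograms back into audio.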

16.
Article in English | MEDLINE | ID: mdl-36203947

ABSTRACT

Cycle-reconstruction-regularized adversarial training (e.g., CycleGAN, DiscoGAN, and DualGAN) has been widely used for image style transfer with unpaired training data. Several recent works, however, have shown that local distortions are frequent, and structural consistency cannot be guaranteed. Targeting this issue, prior works usually relied on additional segmentation or consistent feature extraction steps that are task-specific. To counter this, this work aims to learn a general add-on structural feature extractor, by explicitly enforcing the structural alignment between an input and its synthesized image. Specifically, we propose a novel input-output image patches self-training scheme to achieve a disentanglement of underlying anatomical structures and imaging modalities. The translator and structure encoder are updated, following an alternating training protocol. In addition, the information w.r.t. imaging modality can be eliminated with an asymmetric adversarial game. We train, validate, and test our network on 1,768, 416, and 1,560 unpaired subject-independent slices of tagged and cine magnetic resonance imaging from a total of twenty healthy subjects, respectively, demonstrating superior performance over competing methods.

17.
Article in English | MEDLINE | ID: mdl-36777787

ABSTRACT

Accurate strain measurement in a deforming organ has been essential in motion analysis using medical images. In recent years, in vivo motion and strain of internal tissue have mostly been computed from dynamic magnetic resonance (MR) imaging. However, such data lack information on the tissue's intrinsic fiber directions, preventing computed strain tensors from being projected onto a direction of interest. Although diffusion-weighted MR imaging excels at providing fiber tractography, it yields static images unmatched with dynamic MR data. This work reports an algorithm workflow that estimates strain values in the diffusion MR space by matching corresponding tagged dynamic MR images. We focus on processing a dataset of various human tongue deformations in speech. The geometry of tongue muscle fibers is provided by diffusion tractography, while spatiotemporal motion fields are provided by tagged MR analysis. The tongue's deforming shapes are determined by segmenting a synthetic cine dynamic MR sequence generated from tagged data using a deep neural network. Estimated motion fields are transformed into the diffusion MR space using diffeomorphic registration, eventually leading to strain values computed in the direction of muscle fibers. The method was tested on 78 time volumes acquired during three sets of specific tongue deformations including both speech and protrusion motion. Strain in the line of action of seven internal tongue muscles was extracted and compared both intra- and inter-subject. The resulting compression and stretching patterns revealed the unique behavior of individual muscles and their potential activation patterns.

18.
J Speech Lang Hear Res ; 65(10): 3661-3673, 2022 10 17.
Article in English | MEDLINE | ID: mdl-36054846

ABSTRACT

PURPOSE: The goal of this study is to validate the muscle architecture derived from both ex vivo and in vivo diffusion-weighted magnetic resonance imaging (dMRI) of the human tongue with histology of an ex vivo tongue. METHOD: dMRI was acquired with a 200-direction high angular resolution diffusion imaging (HARDI) scheme for both a postmortem head (imaged within 48 hr after death) and a healthy volunteer. After MRI, the postmortem head was fixed and the tongue excised for hematoxylin and eosin (H&E) staining and histology imaging. Structure tensor images were generated from the stained images to better demonstrate muscle fiber orientations. The tongue muscle fiber orientations, estimated from dMRI, were visualized using the tractogram, a novel representation of crossing fiber orientations, and compared against the histology images of the ex vivo tongue. RESULTS: Muscle fibers identified in the tractograms showed good correspondence with those appearing in the histology images. We further demonstrated tongue muscle architecture in in vivo tractograms for the entire tongue. CONCLUSION: The study demonstrates that dMRI can accurately reveal the complex muscle architecture of the human tongue and may potentially benefit planning and evaluation of oral surgery and research on speech and swallowing.


Subjects
Diffusion Magnetic Resonance Imaging; Muscle Fibers, Skeletal; Brain; Diffusion Magnetic Resonance Imaging/methods; Eosine Yellowish-(YS)/analysis; Hematoxylin/analysis; Humans; Magnetic Resonance Imaging/methods; Tongue/diagnostic imaging
19.
Article in English | MEDLINE | ID: mdl-34012189

ABSTRACT

To advance our understanding of speech motor control, it is essential to image and assess dynamic functional patterns of internal structures caused by the complex muscle anatomy inside the human tongue. Speech pathologists are investigating new tools that help assess the cooperative mechanics of internal tongue muscles on top of their anatomical differences. Previous studies using dynamic magnetic resonance imaging (MRI) of the tongue revealed that tongue muscles tend to function in different groups during speech, especially the floor-of-the-mouth (FOM) muscles. In this work, we developed a method to analyze the unique functional pattern of the FOM muscles in speech. First, four-dimensional motion fields of the whole tongue were computed using tagged MRI. Meanwhile, a statistical atlas of the tongue was constructed to form a common space for subject comparison, while a manually delineated mask of internal tongue muscles was used to separate individual muscles' motion. Then we computed four-dimensional motion correlation between each muscle and the FOM muscle group. Finally, dynamic correlation of different muscle groups was compared and evaluated. We used data from a study group of nineteen subjects including both healthy controls and oral cancer patients. Results revealed that most internal tongue muscles coordinated in a similar pattern in speech, while the FOM muscles followed a unique pattern that helped support the tongue body and pivot its rotation. The proposed method can help provide further interpretation of clinical observations and speech motor control from an imaging point of view.
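The correlation step can be caricatured as Pearson correlation between muscle-averaged motion time courses. The muscle names and signals below are synthetic stand-ins, not the study's data, and averaging a muscle's voxels into one time course is a simplification of the full 4D correlation analysis.

```python
import numpy as np

def motion_correlation(motion_a, motion_b):
    """Pearson correlation between two muscles' motion time courses.
    motion_a/b: (T, n_voxels) motion quantities per time frame, averaged
    over each muscle's voxels before correlating."""
    ts_a = motion_a.reshape(motion_a.shape[0], -1).mean(axis=1)
    ts_b = motion_b.reshape(motion_b.shape[0], -1).mean(axis=1)
    return float(np.corrcoef(ts_a, ts_b)[0, 1])

# Two muscles moving in near lock-step vs. one moving independently.
t = np.linspace(0, 2 * np.pi, 26)
rng = np.random.default_rng(0)
genioglossus = np.sin(t)[:, None] + rng.normal(0, 0.05, (26, 40))
fom_like = np.sin(t)[:, None] + rng.normal(0, 0.05, (26, 40))
independent = np.cos(3 * t)[:, None] + rng.normal(0, 0.05, (26, 40))
print(motion_correlation(genioglossus, fom_like))     # high
print(motion_correlation(genioglossus, independent))  # near zero
```

High correlation marks muscles that coordinate as a group; a persistently distinct time course, as reported for the FOM muscles, would show low correlation to the rest.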

20.
Med Image Anal ; 72: 102131, 2021 08.
Article in English | MEDLINE | ID: mdl-34174748

ABSTRACT

Intelligible speech is produced by creating varying internal local muscle groupings, i.e., functional units, that are generated in a systematic and coordinated manner. There are two major challenges in characterizing and analyzing functional units. First, due to the complex and convoluted nature of tongue structure and function, it is of great importance to develop a method that can accurately decode complex muscle coordination patterns during speech. Second, it is challenging to keep identified functional units across subjects comparable due to their substantial variability. In this work, to address these challenges, we develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech. Our framework hinges on joint deep graph-regularized sparse non-negative matrix factorization (NMF) using motion quantities derived from displacements by tagged Magnetic Resonance Imaging. More specifically, we transform NMF with sparse and graph regularizations into modular architectures akin to deep neural networks by means of unfolding the Iterative Shrinkage-Thresholding Algorithm to learn interpretable building blocks and associated weighting maps. We then apply spectral clustering to common and subject-specific weighting maps, from which we jointly determine the common and subject-specific functional units. Experiments carried out with simulated datasets show that the proposed method achieved on par or better clustering performance over the comparison methods. Experiments carried out with in vivo tongue motion data show that the proposed method can determine the common and subject-specific functional units with increased interpretability and decreased size variability.
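The unfolding idea rests on the Iterative Shrinkage-Thresholding Algorithm (ISTA), whose soft-thresholding proximal step becomes the layer nonlinearity of the unrolled network. Below is a schematic ISTA layer for a sparse non-negative coding subproblem; the graph regularization, the joint multi-subject factorization, and the learned per-layer parameters of the paper are omitted, and the problem sizes are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm, the core nonlinearity obtained
    by unfolding ISTA into network layers."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_step(H, W, V, step, lam):
    """One unfolded ISTA layer for the sparse coding subproblem
    min_H 0.5*||V - W H||_F^2 + lam*||H||_1, with H kept non-negative
    as in sparse NMF."""
    grad = W.T @ (W @ H - V)
    return np.maximum(soft_threshold(H - step * grad, step * lam), 0.0)

rng = np.random.default_rng(0)
W = rng.random((30, 5))                               # fixed building blocks
H_true = np.maximum(rng.random((5, 12)) - 0.5, 0.0)   # sparse non-negative codes
V = W @ H_true
step = 1.0 / np.linalg.norm(W.T @ W, 2)               # 1 / Lipschitz constant
H = np.zeros((5, 12))
for _ in range(300):                                  # 300 "layers"
    H = ista_step(H, W, V, step, lam=1e-3)
rel = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel)  # small residual
```

In the learned-unfolding setting, each iteration becomes a network layer whose step size and threshold are trained rather than fixed as here.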


Subjects
Algorithms; Speech; Humans; Magnetic Resonance Imaging; Neural Networks, Computer; Tongue/diagnostic imaging