Results 1 - 20 of 88
1.
Magn Reson Med; 91(1): 61-74, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37677043

ABSTRACT

PURPOSE: To improve the spatiotemporal quality of dynamic speech MRI through an improved data sampling and image reconstruction approach. METHODS: For data acquisition, we used a Poisson-disc random undersampling scheme that reduced undersampling coherence. For image reconstruction, we proposed a novel locally higher-rank partial separability model. This reconstruction model represented the oral and static regions using separate low-rank subspaces, thereby preserving their distinct temporal signal characteristics. A regionally optimized temporal basis was determined using a region-optimized virtual coil approach. Overall, we achieved better spatiotemporal image reconstruction quality with the potential to reduce total acquisition time by 50%. RESULTS: The proposed method was demonstrated through several 2-mm isotropic, 64-mm total-thickness dynamic acquisitions at 40 frames per second and compared to the previous approach using a global subspace model, along with other k-space sampling patterns. Individual time-frame images and temporal profiles of speech samples illustrate the ability of the Poisson-disc undersampling pattern to reduce total acquisition time. Temporal information in the sagittal and coronal directions also illustrates the effectiveness of the locally higher-rank operator and the regionally optimized temporal basis. To compare reconstruction quality across regions, voxel-wise temporal SNR analyses were performed. CONCLUSION: Poisson-disc sampling combined with a locally higher-rank model and a regionally optimized temporal basis can drastically improve spatiotemporal image quality and provide a 50% reduction in overall acquisition time.
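The abstract does not give sampling parameters, but the core idea of a Poisson-disc undersampling mask (candidate points kept only if they lie at least a minimum distance from every accepted point, plus a fully sampled calibration center) can be sketched with simple dart throwing. The grid size, minimum distance, and calibration width below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def poisson_disc_mask(ny, nz, min_dist=2.5, n_attempts=20000, calib=12, seed=0):
    """Dart-throwing Poisson-disc undersampling mask for a ky-kz grid.

    Points are accepted only if they are at least `min_dist` away from every
    previously accepted point, which suppresses coherent aliasing compared
    with uniform random sampling. A small fully sampled calibration region
    is kept at the k-space center.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((ny, nz), dtype=bool)
    accepted = []
    for _ in range(n_attempts):
        y, z = rng.integers(0, ny), rng.integers(0, nz)
        if accepted:
            d = np.hypot(np.array([p[0] for p in accepted]) - y,
                         np.array([p[1] for p in accepted]) - z)
            if d.min() < min_dist:
                continue
        accepted.append((y, z))
        mask[y, z] = True
    # fully sampled calibration center
    cy, cz = ny // 2, nz // 2
    mask[cy - calib // 2:cy + calib // 2, cz - calib // 2:cz + calib // 2] = True
    return mask

mask = poisson_disc_mask(64, 32)
print(f"sampled fraction: {mask.mean():.2f}")
```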


Subjects
Magnetic Resonance Imaging, Speech, Magnetic Resonance Imaging/methods, Image Processing, Computer-Assisted/methods, Algorithms
2.
Magn Reson Med; 89(2): 652-664, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36289572

ABSTRACT

PURPOSE: To enable a more comprehensive view of articulations during speech through near-isotropic 3D dynamic MRI with high spatiotemporal resolution and large vocal-tract coverage. METHODS: Using a partial separability model-based low-rank reconstruction coupled with sparse acquisition for both the spatial and temporal models, we achieve near-isotropic 3D imaging at a high frame rate. The total acquisition time is shortened by a sparse temporal sampling scheme that interleaves one temporal navigator with four randomized phase- and slice-encoded imaging samples. Memory use and computation time are reduced by compressing coils based on the region of interest for the low-rank constrained reconstruction with an edge-preserving spatial penalty. RESULTS: The proposed method was evaluated through experiments on several speech samples, including a standard reading passage. A near-isotropic 1.875 × 1.875 × 2 mm3 spatial resolution, 64-mm through-plane coverage, and a 35.6-fps temporal resolution were achieved. Investigation and analysis of specific speech samples support novel insights into nonsymmetric tongue movement, velum raising, and coarticulation events, with adequate visualization of rapid articulatory movements. CONCLUSION: Three-dimensional dynamic imaging of the vocal tract during speech with high spatiotemporal resolution and axial coverage can enhance linguistic research by enabling visualization of soft-tissue motion that is not possible with other modalities.
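As a rough illustration of the partial separability (low-rank) idea underlying this class of reconstructions, and not the authors' implementation, the space-time (Casorati) matrix can be factored into a small temporal subspace and matching spatial coefficients. In practice the temporal basis is estimated from navigator data and the spatial coefficients are fit to the sparsely sampled imaging data; the sizes and rank below are arbitrary.

```python
import numpy as np

# Toy dynamic image series arranged as an n_voxels x n_frames Casorati matrix.
rng = np.random.default_rng(1)
n_voxels, n_frames, rank = 5000, 200, 8
C = rng.standard_normal((n_voxels, rank)) @ rng.standard_normal((rank, n_frames))
C += 0.05 * rng.standard_normal(C.shape)        # measurement noise

# Partial separability: C ~= U_s @ V_t, with V_t a small temporal subspace.
# Here V_t is taken from the SVD; in an actual acquisition it would come
# from navigator data, and U_s would be fit to the undersampled k-space.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
V_t = Vt[:rank]                                  # temporal basis
U_s = C @ V_t.conj().T                           # spatial coefficients by projection
C_lowrank = U_s @ V_t

err = np.linalg.norm(C - C_lowrank) / np.linalg.norm(C)
print(f"relative low-rank approximation error: {err:.3f}")
```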


Subjects
Magnetic Resonance Imaging, Speech, Magnetic Resonance Imaging/methods, Imaging, Three-Dimensional/methods, Language, Linguistics
3.
Cleft Palate Craniofac J; 10556656231183385, 2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37335134

ABSTRACT

OBJECTIVE: To introduce a highly innovative imaging method for studying the complex velopharyngeal (VP) system and to introduce potential future clinical applications of a VP atlas in cleft care. DESIGN: Four healthy adults participated in a 20-min dynamic magnetic resonance imaging session that included a high-resolution T2-weighted turbo-spin-echo 3D structural scan and five custom dynamic speech imaging scans. Subjects repeated a variety of phrases in the scanner while real-time audio was captured. SETTING: Multisite institutional and clinical setting. PARTICIPANTS: Four adult subjects with normal anatomy were recruited for this study. MAIN OUTCOME: Establishment of a 4D atlas constructed from dynamic VP MRI data. RESULTS: Three-dimensional dynamic magnetic resonance imaging was successfully used to obtain high-quality dynamic speech scans in an adult population. Scans could be re-sliced in various imaging planes. Subject-specific MR data were then reconstructed and time-aligned to create a velopharyngeal atlas representing the averaged physiological movements across the four subjects. CONCLUSIONS: This preliminary study examined the feasibility of developing a VP atlas for potential clinical applications in cleft care. Our results indicate excellent potential for the development and use of a VP atlas for assessing VP physiology during speech.

4.
J Acoust Soc Am; 150(5): 3500, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34852570

ABSTRACT

Magnetic resonance (MR) imaging is becoming an established tool for capturing articulatory and physiological motion of the structures and muscles throughout the vocal tract, enabling visual and quantitative assessment of real-time speech activity. Although motion-capture speed has steadily improved with continual developments in high-speed MR technology, quantitative analysis of multi-subject group data remains challenging due to variations in speaking rate and imaging time among subjects. In this paper, a workflow of post-processing methods that matches different MR image datasets within a study group is proposed. Each subject's audio waveform recorded during speech is used to extract temporal-domain information and to generate temporal alignment mappings from the matching pattern. The corresponding image data are resampled by deformable registration and interpolation of the deformation fields, achieving inter-subject temporal alignment between image sequences. A four-dimensional dynamic MR speech atlas is constructed using aligned volumes from four human subjects. Similarity tests between subject and target domains using squared-error, cross-correlation, and mutual-information measures all show an overall score increase after spatiotemporal alignment. The amount of image variability in atlas construction is reduced, indicating a quality increase in the multi-subject data for groupwise quantitative analysis.
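The three similarity measures named above can be computed along the following lines for a pair of spatially aligned volumes; this is a generic sketch (array shapes and histogram bin count are arbitrary), not the study's evaluation code.

```python
import numpy as np

def similarity_scores(a, b, bins=64):
    """Squared error, normalized cross-correlation, and mutual information
    between two aligned image volumes (flattened to 1D)."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    sse = np.mean((a - b) ** 2)
    ncc = np.corrcoef(a, b)[0, 1]
    hist, _, _ = np.histogram2d(a, b, bins=bins)
    p = hist / hist.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))
    return sse, ncc, mi

vol_a = np.random.rand(32, 32, 16)
vol_b = 0.8 * vol_a + 0.2 * np.random.rand(32, 32, 16)
print(similarity_scores(vol_a, vol_b))
```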


Subjects
Algorithms, Speech, Humans, Magnetic Resonance Imaging, Motion (Physics), Movement
5.
Clin Linguist Phon; 35(11): 1091-1112, 2021 Nov 2.
Article in English | MEDLINE | ID: mdl-33427505

ABSTRACT

The purpose of this study was to identify aspects of impaired tongue motor performance that limit the ability to produce distinct speech sounds and contribute to reduced speech intelligibility in individuals with dysarthria secondary to amyotrophic lateral sclerosis (ALS). We analyzed simultaneously recorded tongue kinematic and acoustic data from 22 subjects during three target words (cat, dog, and took). The subjects included 11 participants with ALS and 11 healthy controls from the X-ray microbeam dysarthria database (Westbury, 1994). Novel measures were derived based on the range and speed of relative movement between two quasi-independent regions of the tongue (blade and dorsum) to characterize the global pattern of tongue dynamics. These "whole tongue" measures, along with the range and speed of single tongue regions, were compared across words, groups (ALS vs. control), and measure types (whole tongue vs. tongue blade vs. tongue dorsum). Reduced range and speed of both global and regional tongue movements were found in participants with ALS relative to healthy controls, reflecting impaired tongue motor performance in ALS. The extent of impairment, however, varied across words and measure types. Compared with the regional tongue measures, the whole tongue measures showed more consistent disease-related changes across the target words and were more robust predictors of speech intelligibility. Furthermore, these whole tongue measures were correlated with various word-specific acoustic features associated with intelligibility decline in ALS, suggesting that impaired tongue movement likely contributes to the reduced phonetic distinctiveness of both vowels and consonants that underlies speech intelligibility decline in ALS.
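A hypothetical sketch of the "whole tongue" idea, assuming 2D pellet trajectories for the blade and dorsum sampled at a known rate; the exact definitions of range and speed used in the study may differ, and the sampling rate and synthetic trajectories below are assumptions.

```python
import numpy as np

def whole_tongue_measures(blade_xy, dorsum_xy, fs=145.0):
    """Range and peak speed of the relative movement between tongue blade
    and dorsum trajectories (n_samples x 2 arrays, in mm), a rough analogue
    of the 'whole tongue' measures described above. fs is the sampling rate."""
    rel = blade_xy - dorsum_xy                      # relative position over time
    disp = np.linalg.norm(rel - rel[0], axis=1)     # displacement from initial posture
    movement_range = disp.max() - disp.min()
    speed = np.linalg.norm(np.gradient(rel, 1.0 / fs, axis=0), axis=1)
    return movement_range, speed.max()

# Synthetic demo trajectories (one second of motion).
t = np.linspace(0, 1, 146)[:, None]
blade = np.hstack([10 * np.sin(2 * np.pi * t), 5 * np.cos(2 * np.pi * t)])
dorsum = np.zeros_like(blade)
print(whole_tongue_measures(blade, dorsum))
```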


Subjects
Amyotrophic Lateral Sclerosis, Speech Intelligibility, Acoustics, Amyotrophic Lateral Sclerosis/complications, Dysarthria/etiology, Humans, Movement, Speech Acoustics, Speech Production Measurement, Tongue
6.
J Acoust Soc Am; 145(5): EL423, 2019 May.
Article in English | MEDLINE | ID: mdl-31153323

ABSTRACT

The ability to differentiate post-cancer from healthy tongue muscle coordination patterns is necessary for the advancement of speech motor control theories and for the development of therapeutic and rehabilitative strategies. A deep learning approach is presented to classify the two groups using muscle coordination patterns from magnetic resonance imaging (MRI). The proposed method uses tagged MRI to track the tongue's internal tissue points and atlas-driven non-negative matrix factorization to reduce the dimensionality of the deformation fields. A convolutional neural network applied to the classification task yields an accuracy of 96.90%, offering potential for the development of therapeutic or rehabilitative strategies for speech-related disorders.
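A minimal sketch of the dimensionality-reduction step, using scikit-learn's NMF on hypothetical vectorized deformation maps, with a linear classifier standing in for the paper's convolutional network; the array shapes, labels, and component count are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

# Hypothetical data: each row is a non-negative, vectorized deformation
# magnitude map for one subject; labels mark post-cancer vs. healthy.
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((40, 5000)))
y = rng.integers(0, 2, size=40)

# Reduce dimensionality with non-negative matrix factorization, then classify.
# The paper feeds the reduced representation to a CNN; a logistic regression
# stands in here purely for illustration.
W = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0).fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(W, y)
print("training accuracy:", clf.score(W, y))
```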


Subjects
Deep Learning, Movement/physiology, Speech/physiology, Tongue/physiology, Facial Muscles/physiology, Humans, Magnetic Resonance Imaging/methods, Neoplasms/physiopathology, Neural Networks, Computer
7.
J Acoust Soc Am; 143(4): EL248, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29716267

ABSTRACT

Amyotrophic Lateral Sclerosis (ALS) is a neurological disorder that impairs tongue function for speech and swallowing. A widely used Diffusion Tensor Imaging (DTI) analysis pipeline is employed to quantify differences in tongue fiber myoarchitecture between controls and ALS patients. This pipeline uses both high-resolution magnetic resonance imaging (hMRI) and DTI: hMRI is used to delineate the tongue muscles, while DTI provides indices that reveal fiber connectivity within and between muscles. Preliminary results from five controls and two patients show quantitative differences between the groups. This work has the potential to provide insights into the detrimental effects of ALS on speech and swallowing.
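One of the standard DTI indices referred to above, fractional anisotropy, can be computed from the diffusion-tensor eigenvalues as follows; this is a generic formula sketch, not the pipeline's code, and the example eigenvalues are illustrative.

```python
import numpy as np

def fractional_anisotropy(evals):
    """Fractional anisotropy from the three diffusion-tensor eigenvalues:
    FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda||."""
    evals = np.asarray(evals, dtype=float)
    md = evals.mean(axis=-1, keepdims=True)                 # mean diffusivity
    num = np.sqrt(((evals - md) ** 2).sum(axis=-1))
    den = np.sqrt((evals ** 2).sum(axis=-1))
    return np.sqrt(1.5) * np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Elongated (fiber-like) tensor -> high FA.
print(fractional_anisotropy([1.7e-3, 0.3e-3, 0.3e-3]))
```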


Subjects
Amyotrophic Lateral Sclerosis/pathology, Tongue Diseases/pathology, Adult, Aged, Amyotrophic Lateral Sclerosis/complications, Case-Control Studies, Diffusion Tensor Imaging, Female, Humans, Male, Middle Aged, Tongue Diseases/etiology
8.
J Acoust Soc Am; 141(4): 2579, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28464688

ABSTRACT

Biomechanical models of the oropharynx facilitate the study of speech function by providing information that cannot be directly derived from imaging data, such as internal muscle forces and muscle activation patterns. Such models, when constructed and simulated from the anatomy and motion captured from individual speakers, enable exploration of inter-subject variability in speech biomechanics. They also allow one to answer questions such as whether speakers produce similar sounds using essentially the same motor patterns with subtle differences, or using vastly different, motor-equivalent patterns. Following this direction, this study uses speaker-specific modeling tools to investigate muscle activation variability in two simple speech tasks that move the tongue forward (/ə-ɡis/) vs. backward (/ə-suk/). Three-dimensional tagged magnetic resonance imaging data were used to inversely drive the biomechanical models for four English speakers. Results show that the genioglossus is the workhorse muscle of the tongue, with activity levels of 10% in different subdivisions at different times. Jaw and hyoid positioners (inferior pterygoid and digastric) also show high activation during specific phonemes. Other muscles may be more involved in fine-tuning the shapes. For example, slightly more activation of the anterior portion of the transverse is found during the apical than the laminal /s/, which would protrude the tongue tip to a greater extent for the apical /s/.


Subjects
Motor Activity, Muscle, Skeletal/physiology, Speech, Tongue/physiology, Voice, Adult, Biomechanical Phenomena, Female, Humans, Magnetic Resonance Imaging, Cine, Male, Muscle, Skeletal/diagnostic imaging, Phonation, Pterygoid Muscles/diagnostic imaging, Pterygoid Muscles/physiology, Tongue/diagnostic imaging, Young Adult
9.
ArXiv; 2024 Aug 1.
Article in English | MEDLINE | ID: mdl-39130200

ABSTRACT

Delineating lesions and anatomical structures is important for image-guided interventions. Point-supervised medical image segmentation (PSS) has great potential to alleviate costly expert delineation labeling. However, due to the lack of precise size and boundary guidance, the effectiveness of PSS often falls short of expectations. Although recent vision foundation models, such as the medical segment anything model (MedSAM), have made significant advances in bounding-box-prompted segmentation, they cannot straightforwardly utilize point annotations and are prone to semantic ambiguity. In this preliminary study, we introduce an iterative framework to facilitate semantic-aware point-supervised MedSAM. Specifically, a semantic box-prompt generator (SBPG) module converts the point input into potential pseudo bounding box suggestions, which are explicitly refined by prototype-based semantic similarity. This is followed by a prompt-guided spatial refinement (PGSR) module that harnesses the exceptional generalizability of MedSAM to infer the segmentation mask, which also updates the box proposal seed in SBPG. Performance can be progressively improved with adequate iterations. We conducted an evaluation on BraTS2018 for the segmentation of whole brain tumors and demonstrated performance superior to traditional PSS methods and on par with box-supervised methods.

10.
Radiother Oncol; 194: 110186, 2024 May.
Article in English | MEDLINE | ID: mdl-38412906

ABSTRACT

BACKGROUND: Accurate gross tumor volume (GTV) delineation is a critical step in radiation therapy treatment planning. However, it is reader dependent and thus susceptible to intra- and inter-reader variability. GTV delineation of soft tissue sarcoma (STS) often relies on CT and MR images. PURPOSE: This study investigates the potential role of 18F-FDG PET in reducing intra- and inter-reader variability, thereby improving the reproducibility of GTV delineation in STS, without incurring additional costs or radiation exposure. MATERIALS AND METHODS: Three readers performed independent GTV delineation for 61 patients with STS, first using CT and MR images and then using CT, MR, and 18F-FDG PET images. Each reader performed a total of six delineation trials, three per imaging modality group. The Dice similarity coefficient (DSC) and Hausdorff distance (HD) were used to assess both intra- and inter-reader variability, using simultaneous truth and performance level estimation (STAPLE) GTVs as ground truth. Statistical analysis was performed using a Wilcoxon signed-rank test. RESULTS: There was a statistically significant decrease in both intra- and inter-reader variability in GTV delineation using CT, MR, and 18F-FDG PET images vs. CT and MR images. This was reflected in an increase in the DSC score and a decrease in the HD for GTVs drawn from CT, MR, and 18F-FDG PET images vs. GTVs drawn from CT and MR images, for all readers and across all three trials. CONCLUSION: Incorporating 18F-FDG PET alongside CT and MR images decreased intra- and inter-reader variability and thereby increased the reproducibility of GTV delineation in STS.
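The two agreement metrics used in the study can be implemented generically as below; the voxel spacing and the example masks are illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_and_hausdorff(mask_a, mask_b, spacing=(1.0, 1.0, 1.0)):
    """Dice similarity coefficient and symmetric Hausdorff distance between
    two binary GTV masks. Voxel coordinates are scaled by `spacing` (mm)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    dsc = 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
    pts_a = np.argwhere(a) * np.asarray(spacing)
    pts_b = np.argwhere(b) * np.asarray(spacing)
    hd = max(directed_hausdorff(pts_a, pts_b)[0],
             directed_hausdorff(pts_b, pts_a)[0])
    return dsc, hd

# Two overlapping toy delineations on an anisotropic grid.
a = np.zeros((40, 40, 20), dtype=bool); a[10:30, 10:30, 5:15] = True
b = np.zeros_like(a); b[12:32, 10:30, 5:15] = True
print(dice_and_hausdorff(a, b, spacing=(1.0, 1.0, 3.0)))
```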


Subjects
Fluorodeoxyglucose F18, Magnetic Resonance Imaging, Positron-Emission Tomography, Sarcoma, Tumor Burden, Humans, Sarcoma/diagnostic imaging, Sarcoma/pathology, Sarcoma/radiotherapy, Positron-Emission Tomography/methods, Female, Male, Magnetic Resonance Imaging/methods, Middle Aged, Radiopharmaceuticals, Observer Variation, Adult, Aged, Reproducibility of Results, Tomography, X-Ray Computed/methods, Soft Tissue Neoplasms/diagnostic imaging, Soft Tissue Neoplasms/pathology, Soft Tissue Neoplasms/radiotherapy, Radiotherapy Planning, Computer-Assisted/methods
11.
Biotechnol Bioeng; 110(10): 2697-705, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23568761

ABSTRACT

In this article, we investigate the application of contactless high-frequency ultrasound microbeam stimulation (HFUMS) for determining the invasion potential of breast cancer cells. In breast cancer patients, the finding of tumor metastasis significantly worsens the clinical prognosis, so early determination of a tumor's potential for invasion and metastasis would significantly impact decisions about the aggressiveness of cancer treatment. Recent work suggests that invasive breast cancer cells (MDA-MB-231), but not weakly invasive breast cancer cells (MCF-7, SKBR3, and BT-474), display a number of neuronal characteristics, including expression of voltage-gated sodium channels. Because sodium channels are often co-expressed with calcium channels, we tested whether single-cell stimulation by a highly focused ultrasound microbeam would trigger Ca(2+) elevation, especially in highly invasive breast cancer cells. To calibrate the diameter of the microbeam produced by a 200-MHz single-element LiNbO3 transducer, we focused the beam on a wire target and performed a pulse-echo test. The width of the beam was ∼17 µm, appropriate for single-cell stimulation. Membrane-permeant fluorescent Ca(2+) indicators were used to monitor Ca(2+) changes in the cells during HFUMS. The cell response index (CRI), a composite parameter reflecting both the Ca(2+) elevation and the fraction of responding cells elicited by HFUMS, was much greater in highly invasive breast cancer cells than in weakly invasive breast cancer cells. The CRI of MDA-MB-231 cells depended on the peak-to-peak amplitude of the voltage driving the transducer. These results suggest that HFUMS may serve as a novel tool to determine the invasion potential of breast cancer cells and, with further refinement, may offer a rapid test for the invasiveness of tumor biopsies in situ.


Subjects
Breast Neoplasms, Intracellular Space, Neoplasm Invasiveness, Optical Imaging/methods, Sound, Antineoplastic Agents/pharmacology, Breast Neoplasms/chemistry, Breast Neoplasms/metabolism, Calcium/analysis, Calcium/metabolism, Cell Line, Tumor, Cell Survival/drug effects, Female, Humans, Intracellular Space/chemistry, Intracellular Space/drug effects, Intracellular Space/metabolism, Intracellular Space/radiation effects, Paclitaxel/pharmacology
12.
J Acoust Soc Am; 133(6): EL439-45, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23742437

ABSTRACT

Magnetic resonance imaging has been widely used in speech production research. Often only one image stack (sagittal, axial, or coronal) is used for vocal tract modeling. As a result, complementary information from other available stacks is not utilized. To overcome this, a recently developed super-resolution technique was applied to integrate three orthogonal low-resolution stacks into one isotropic volume. The results on vowels show that the super-resolution volume produces better vocal tract visualization than any of the low-resolution stacks. Its derived area functions generally produce formant predictions closer to the ground truth, particularly for those formants sensitive to area perturbations at constrictions.
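For orientation only: the simplest way to combine three orthogonal low-resolution stacks is to resample each onto a common isotropic grid and average them, as sketched below. The cited super-resolution technique is more sophisticated than this naive baseline, so the sketch should not be read as the paper's method; the spacing values and cropping strategy are assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def naive_orthogonal_fusion(sag, cor, axi, in_spacings, out_spacing=1.0):
    """Resample three orthogonal low-resolution stacks (assumed already in the
    same patient orientation) onto a common isotropic grid and average them.
    This is only an illustrative baseline, not a super-resolution reconstruction."""
    vols = []
    for vol, sp in zip((sag, cor, axi), in_spacings):
        factors = [s / out_spacing for s in sp]      # voxel spacing (mm) per axis
        vols.append(zoom(vol.astype(float), factors, order=1))
    shape = np.min([v.shape for v in vols], axis=0)  # crop to the common extent
    vols = [v[:shape[0], :shape[1], :shape[2]] for v in vols]
    return np.mean(vols, axis=0)
```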


Subjects
Computer Simulation, Epiglottis/anatomy & histology, Image Enhancement/methods, Image Processing, Computer-Assisted/methods, Imaging, Three-Dimensional/methods, Larynx/anatomy & histology, Lip/anatomy & histology, Magnetic Resonance Imaging/methods, Pharynx/anatomy & histology, Phonation/physiology, Phonetics, Algorithms, Artifacts, Epiglottis/physiology, Humans, Larynx/physiology, Lip/physiology, Pharynx/physiology, Sensitivity and Specificity, Software, Sound Spectrography, Speech Acoustics
13.
Med Image Anal; 83: 102641, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36265264

ABSTRACT

Unsupervised domain adaptation (UDA) has been a vital protocol for transferring information learned from a labeled source domain to an unlabeled, heterogeneous target domain. Although UDA is typically trained jointly on data from both domains, access to the labeled source domain data is often restricted due to concerns over patient data privacy or intellectual property. To sidestep this, we propose "off-the-shelf (OS)" UDA (OSUDA) for image segmentation, which adapts an OS segmentor trained in a source domain to a target domain without any source domain data during adaptation. Toward this goal, we develop a novel batch-wise normalization (BN) statistics adaptation framework. In particular, we gradually adapt the domain-specific low-order BN statistics, e.g., mean and variance, through an exponential momentum decay strategy, while explicitly enforcing the consistency of the domain-shareable high-order BN statistics, e.g., scaling and shifting factors, via our optimization objective. We also adaptively quantify channel-wise transferability to gauge the importance of each channel, via both the low-order statistics divergence and a scaling factor. Furthermore, we incorporate unsupervised self-entropy minimization into our framework to boost performance, alongside a novel queued, memory-consistent self-training strategy that utilizes reliable pseudo-labels for stable and efficient unsupervised adaptation. We evaluated our OSUDA-based framework on both cross-modality and cross-subtype brain tumor segmentation and on a cardiac MR-to-CT segmentation task. Our experimental results showed that memory-consistent OSUDA performs better than existing source-relaxed UDA methods and yields performance similar to UDA methods that use source data.
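A minimal PyTorch sketch of the low-order BN-statistics adaptation idea, assuming a 2D segmentation network and a target-domain loader that yields (image, ...) batches; the decay schedule, initial momentum, and restriction to BatchNorm2d layers are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_statistics(model, target_loader, decay=0.94, init_momentum=0.1):
    """Update only the running mean/variance of BatchNorm layers on unlabeled
    target-domain batches, with exponentially decaying momentum. The affine
    scale/shift parameters (high-order statistics) are left frozen, loosely
    following the source-free adaptation idea described above."""
    model.train()                                    # running stats update in train mode
    for p in model.parameters():
        p.requires_grad_(False)                      # no gradient-based updates
    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    for step, (x, *_) in enumerate(target_loader):   # loader yields (image, ...) tuples
        momentum = init_momentum * (decay ** step)   # exponential momentum decay
        for bn in bn_layers:
            bn.momentum = momentum
        model(x)                                     # forward pass updates running stats
    model.eval()
    return model
```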


Subjects
Brain Neoplasms, Learning, Humans, Brain Neoplasms/diagnostic imaging, Entropy, Heart, Motion (Physics)
14.
IEEE Trans Biomed Eng; 70(4): 1252-1263, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36227815

ABSTRACT

Deep learning (DL)-based automatic sleep staging approaches have attracted much attention recently, due in part to their outstanding accuracy. At the testing stage, however, their performance is likely to degrade when they are applied in different testing environments because of domain shift: while a pre-trained model is typically trained on noise-free electroencephalogram (EEG) signals acquired from accurate medical equipment, deployment is carried out on consumer-level devices with undesirable noise. To alleviate this challenge, we propose an efficient training approach that is robust against unseen, arbitrary noise. In particular, we generate worst-case input perturbations by means of an adversarial transformation in an auxiliary model, so that the target model learns a wide range of input perturbations and thereby becomes more reliable. Our approach is based on two separate models during training: (i) an auxiliary model that generates adversarial noise and (ii) a target network that incorporates the noise signal to enhance robustness. Furthermore, we exploit novel class-wise robustness during the training of the target network to represent the different robustness patterns of each sleep stage. Our experimental results demonstrate that our approach improved sleep staging performance on healthy controls in the presence of moderate to severe noise levels, compared with competing methods. Our approach was able to effectively train and deploy a DL model to handle different types of noise, including adversarial, Gaussian, and shot noise.
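A common single-step way to generate such worst-case perturbations is the fast gradient sign method, shown here as a generic stand-in for the adversarial transformation described above; the epsilon value and the cross-entropy loss are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def worst_case_perturbation(model, eeg, labels, epsilon=0.05):
    """Single-step adversarial perturbation of an EEG batch (FGSM-style):
    move each input in the direction that most increases the staging loss.
    A generic stand-in for the paper's adversarial transformation."""
    eeg = eeg.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(eeg), labels)
    loss.backward()
    noisy = eeg + epsilon * eeg.grad.sign()
    return noisy.detach()

# During training, the target network would then be fed both the clean batch
# and worst_case_perturbation(aux_model, eeg, labels).
```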


Subjects
Electroencephalography, Sleep Stages, Reproducibility of Results, Normal Distribution
15.
Med Phys; 50(3): 1539-1548, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36331429

ABSTRACT

BACKGROUND: In medical imaging, images are usually treated as deterministic, while their uncertainties remain largely underexplored. PURPOSE: This work aims to use deep learning to efficiently estimate posterior distributions of imaging parameters, which in turn can be used to derive the most probable parameters as well as their uncertainties. METHODS: Our approaches are based on a variational Bayesian inference framework, implemented using two different deep neural networks built on the conditional variational auto-encoder (CVAE): CVAE-dual-encoder and CVAE-dual-decoder. The conventional CVAE framework, that is, CVAE-vanilla, can be regarded as a simplified case of these two networks. We applied these approaches to a simulation study of dynamic brain PET imaging using a reference region-based kinetic model. RESULTS: In the simulation study, we estimated posterior distributions of PET kinetic parameters given a measurement of the time-activity curve. The proposed CVAE-dual-encoder and CVAE-dual-decoder yield results in good agreement with the asymptotically unbiased posterior distributions sampled by Markov chain Monte Carlo (MCMC). CVAE-vanilla can also be used to estimate posterior distributions, although its performance is inferior to both CVAE-dual-encoder and CVAE-dual-decoder. CONCLUSIONS: We have evaluated the performance of our deep learning approaches for estimating posterior distributions in dynamic brain PET. They yield posterior distributions in good agreement with the unbiased distributions estimated by MCMC. The networks have different characteristics and can be chosen by the user for specific applications. The proposed methods are general and can be adapted to other problems.
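A compact sketch of the CVAE-vanilla baseline only (not the dual-encoder/dual-decoder variants), with a simple fully connected architecture and arbitrary layer sizes assumed; the time-activity-curve length and number of kinetic parameters are placeholders.

```python
import torch
import torch.nn as nn

class CVAEVanilla(nn.Module):
    """Minimal conditional VAE sketch: condition on a measured time-activity
    curve (TAC) and model kinetic parameters theta. Corresponds only to the
    'CVAE-vanilla' baseline mentioned above, with assumed layer sizes."""

    def __init__(self, tac_len=32, n_params=3, latent=8, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(tac_len + n_params, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent))           # outputs (mu, log_var)
        self.decoder = nn.Sequential(
            nn.Linear(latent + tac_len, hidden), nn.ReLU(),
            nn.Linear(hidden, n_params))
        self.latent = latent

    def forward(self, theta, tac):
        mu, log_var = self.encoder(torch.cat([theta, tac], dim=1)).chunk(2, dim=1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterization
        theta_hat = self.decoder(torch.cat([z, tac], dim=1))
        kld = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()
        return theta_hat, kld

    @torch.no_grad()
    def sample_posterior(self, tac, n_samples=1000):
        """Draw approximate posterior samples of the parameters for one TAC."""
        z = torch.randn(n_samples, self.latent)
        return self.decoder(torch.cat([z, tac.expand(n_samples, -1)], dim=1))
```

During training, the loss would combine a reconstruction term on theta_hat (e.g., mean squared error against the simulated kinetic parameters) with the returned KL divergence.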


Subjects
Deep Learning, Bayes Theorem, Positron-Emission Tomography/methods, Computer Simulation, Neural Networks, Computer
16.
Article in English | MEDLINE | ID: mdl-38031559

ABSTRACT

Cardiac cine magnetic resonance imaging (MRI) has been used to characterize cardiovascular diseases (CVD), often providing a noninvasive phenotyping tool. While recently developed deep learning-based approaches using cine MRI yield accurate characterization results, their performance is often degraded by small training samples. In addition, many deep learning models are deemed a "black box": how they arrive at a prediction, and how reliable that prediction is, remain largely elusive. To alleviate this, this work proposes a lightweight successive subspace learning (SSL) framework for CVD classification, based on an interpretable feedforward design in conjunction with a cardiac atlas. Specifically, our hierarchical SSL model is based on (i) neighborhood voxel expansion, (ii) unsupervised subspace approximation, (iii) supervised regression, and (iv) multi-level feature integration. In addition, using two-phase 3D deformation fields, including the end-diastolic and end-systolic phases, derived between the atlas and individual subjects as input offers an objective means of assessing CVD, even with small training samples. We evaluate our framework on the ACDC2017 database, comprising one healthy group and four disease groups. Compared with 3D CNN-based approaches, our framework achieves superior classification performance with 140× fewer parameters, which supports its potential value in clinical use.

17.
Article in English | MEDLINE | ID: mdl-38009135

ABSTRACT

Investigating the relationship between internal tissue-point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI, on the one hand, and intelligible speech, on the other, can aid in advancing speech motor control theories and developing novel treatments for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate for the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolutional decoder to generate spectrograms that correspond to the 4D motion fields. Furthermore, we incorporate generative adversarial training into our framework to further improve the quality of the synthesized spectrograms. We experiment on 63 paired motion-field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and to inform the development of treatments for speech disorders.

18.
Article in English | MEDLINE | ID: mdl-37621417

ABSTRACT

New developments in dynamic magnetic resonance imaging (MRI) facilitate high-quality data acquisition of human velopharyngeal deformation during real-time speech. With recently established speech motion atlases, group analysis is made possible via spatially and temporally aligned datasets in the atlas space from a population of interest. In practice, when analyzing motion characteristics from various subjects performing a designated speech task, it is observed that different subjects' velopharyngeal deformation patterns can vary during the pronunciation of the same utterance, regardless of the spatial and temporal alignment of their MRI. Because such variation can be subtle, identifying and extracting unique patterns from these high-dimensional datasets is challenging. In this work, we present a method that computes and visualizes subtle deformation-variation patterns as principal components of a subject group's dynamic motion fields in the atlas space. Coupled with the real-time speech audio recordings acquired during imaging, the key time frames that contain maximum speech variation are identified from the principal components of the temporally aligned audio waveforms, which in turn indicate the temporal location of the maximum spatial deformation variation. The motion fields between the key frames and the reference frame for each subject are then computed and warped into the common atlas space, enabling direct extraction of motion-variation patterns via quantitative analysis. The method was evaluated on a dataset of twelve healthy subjects, and subtle velopharyngeal motion differences were visualized quantitatively, revealing pronunciation-specific patterns among subjects.
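A sketch of the principal-component step on atlas-space motion fields, using scikit-learn PCA on hypothetical per-subject vectorized fields; the array shapes, subject count, and number of components are assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: one vectorized atlas-space motion field per subject
# (n_subjects x (3 * n_voxels)), each taken at that subject's key time frame.
rng = np.random.default_rng(0)
motion_fields = rng.standard_normal((12, 3 * 20000))

pca = PCA(n_components=3)
scores = pca.fit_transform(motion_fields)     # per-subject scores on each variation mode
modes = pca.components_                       # each row is one deformation-variation pattern
print("explained variance ratios:", pca.explained_variance_ratio_)

# A mode can be visualized by reshaping it back to an (n_voxels, 3) field and
# warping the atlas by +/- a few standard deviations along that mode.
```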

19.
ArXiv; 2023 May 30.
Article in English | MEDLINE | ID: mdl-37396599

ABSTRACT

Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model trained in a single source domain. Yet a static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect well-trained static models to be updated, following continually evolving target domain data (e.g., additional lesions or structures of interest) collected from different sites, without catastrophic forgetting. This, however, poses challenges due to distribution shifts, additional structures not seen during the initial model training, and the absence of training data from the source domain. To address these challenges, we seek to progressively evolve an "off-the-shelf" trained segmentation model to diverse datasets with additional anatomical categories in a unified manner. Specifically, we first propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks, guided by continuous batch renormalization. Then, a complementary pseudo-label training scheme with self-entropy-regularized momentum MixUp decay is developed for adaptive network optimization. We evaluated our framework on a brain tumor segmentation task with continually changing target domains, i.e., new MRI scanners/modalities with incremental structures. Our framework was able to retain the discriminability of previously learned structures, enabling realistic life-long extension of the segmentation model alongside the widespread accumulation of big medical data.

20.
ArXiv; 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36994161

ABSTRACT

Background: In medical imaging, images are usually treated as deterministic, while their uncertainties remain largely underexplored. Purpose: This work aims to use deep learning to efficiently estimate posterior distributions of imaging parameters, which in turn can be used to derive the most probable parameters as well as their uncertainties. Methods: Our approaches are based on a variational Bayesian inference framework, implemented using two different deep neural networks built on the conditional variational auto-encoder (CVAE): CVAE-dual-encoder and CVAE-dual-decoder. The conventional CVAE framework, i.e., CVAE-vanilla, can be regarded as a simplified case of these two networks. We applied these approaches to a simulation study of dynamic brain PET imaging using a reference region-based kinetic model. Results: In the simulation study, we estimated posterior distributions of PET kinetic parameters given a measurement of the time-activity curve. The proposed CVAE-dual-encoder and CVAE-dual-decoder yield results in good agreement with the asymptotically unbiased posterior distributions sampled by Markov chain Monte Carlo (MCMC). CVAE-vanilla can also be used to estimate posterior distributions, although its performance is inferior to both CVAE-dual-encoder and CVAE-dual-decoder. Conclusions: We have evaluated the performance of our deep learning approaches for estimating posterior distributions in dynamic brain PET. They yield posterior distributions in good agreement with the unbiased distributions estimated by MCMC. The networks have different characteristics and can be chosen by the user for specific applications. The proposed methods are general and can be adapted to other problems.
