Results 1 - 20 of 67
1.
Cost Eff Resour Alloc ; 22(1): 44, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38773527

ABSTRACT

BACKGROUND: Deep learning (DL) is a new technology that can assist prenatal ultrasound (US) in the detection of congenital heart disease (CHD) at the prenatal stage. Hence, an economic-epidemiologic evaluation (also known as a cost-utility analysis) is required to assist policymakers in deciding whether to adopt the new technology. METHODS: The incremental cost-utility ratio (CUR) of adding DL-assisted ultrasound (DL-US) to the current provision of US plus pulse oximetry (POX) was calculated by building a spreadsheet model that integrated demographic, economic, epidemiological, health service utilization, screening performance, survival and lifetime quality-of-life data, based on the standard formula: CUR = (Increase in Intervention Costs − Decrease in Treatment Costs) / (Averted QALY losses of adding DL to US & POX). US screening data were based on real-world operational routine reports (as opposed to research studies). The DL screening cost of 145 USD was based on Israeli US costs plus 20.54 USD for reading and recording screens. RESULTS: The addition of DL-assisted US, which is associated with increased sensitivity (95% vs 58.1%), resulted in far fewer undiagnosed infants (16 vs 102 [or 2.9% vs 15.4%] of the 560 and 659 births, respectively). Adoption of DL-US will add 1,204 QALYs, with increased screening costs (22.5 million USD) largely offset by decreased treatment costs (20.4 million USD). Therefore, the new DL-US technology is considered "very cost-effective", costing only 1,720 USD per QALY. For most performance combinations (sensitivity > 80%, specificity > 90%), the adoption of DL-US is either cost-effective or very cost-effective. For specificities greater than 98% (with sensitivities above 94%), DL-US (& POX) is said to "dominate" US (& POX) by providing more QALYs at a lower cost. CONCLUSION: Our exploratory CUA calculations indicate the feasibility of DL-US as being at least cost-effective.
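To make the headline figure concrete, here is a back-of-the-envelope sketch of the incremental cost-utility arithmetic using only the aggregate numbers quoted in the abstract; the published 1,720 USD/QALY value comes from the full spreadsheet model, so the rounded inputs below reproduce it only approximately.

```python
# Back-of-the-envelope incremental cost-utility ratio (CUR), using only the
# aggregate figures quoted in the abstract (illustrative, not the full model).
increase_in_screening_costs_usd = 22.5e6   # added DL-US screening costs
decrease_in_treatment_costs_usd = 20.4e6   # treatment costs averted by earlier diagnosis
averted_qaly_losses = 1204                 # QALYs gained by adding DL to US & POX

cur = (increase_in_screening_costs_usd - decrease_in_treatment_costs_usd) / averted_qaly_losses
print(f"Incremental cost per QALY: {cur:,.0f} USD")
# ~1,744 USD/QALY from these rounded inputs; the abstract reports 1,720 USD/QALY.
```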

2.
Ultrasound Med Biol ; 50(6): 805-816, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38467521

ABSTRACT

OBJECTIVE: Automated medical image analysis solutions should closely mimic complete human actions to be useful in clinical practice. However, an automated image analysis solution often represents only part of a human task, which restricts its practical utility. In the case of ultrasound-based fetal biometry, an automated solution should ideally recognize key fetal structures in freehand video guidance, select a standard plane from a video stream and perform biometry. A complete automated solution should automate all three subactions. METHODS: In this article, we consider how to automate the complete human action of first-trimester biometry measurement from real-world freehand ultrasound. In the proposed hybrid convolutional neural network (CNN) architecture design, a classification regression-based guidance model detects and tracks fetal anatomical structures (using visual cues) in the ultrasound video. Several high-quality standard planes that contain the mid-sagittal view of the fetus are sampled at multiple time stamps (using a custom-designed confident-frame detector) based on the estimated probability values associated with predicted anatomical structures that define the biometry plane. Automated semantic segmentation is performed on the selected frames to extract fetal anatomical landmarks. A crown-rump length (CRL) estimate is calculated as the mean CRL from these multiple frames. RESULTS: Our fully automated method has a high correlation with clinical expert CRL measurement (Pearson's ρ = 0.92, R-squared [R²] = 0.84) and a low mean absolute error of 0.834 weeks for fetal age estimation on a test data set of 42 videos. CONCLUSION: A novel algorithm for standard plane detection employs a quality detection mechanism defined by clinical standards, ensuring precise biometric measurements.
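As a rough illustration of the final aggregation step described above (not the authors' implementation), the sketch below averages CRL values taken from frames that a confidence detector has accepted; the threshold and function names are assumptions.

```python
import numpy as np

def mean_crl(frame_confidences, frame_crl_mm, confidence_threshold=0.9):
    """Average CRL over frames whose standard-plane confidence exceeds a threshold.

    frame_confidences: per-frame probability that the frame is a valid CRL plane.
    frame_crl_mm: per-frame CRL measurement (mm) obtained from segmentation.
    """
    conf = np.asarray(frame_confidences, dtype=float)
    crl = np.asarray(frame_crl_mm, dtype=float)
    keep = conf >= confidence_threshold
    if not keep.any():            # fall back to the single most confident frame
        keep = conf == conf.max()
    return float(crl[keep].mean())

print(mean_crl([0.55, 0.93, 0.97, 0.91], [61.0, 62.4, 62.0, 61.8]))  # ~62.07 mm
```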


Subjects
Biometry; Pregnancy Trimester, First; Ultrasonography, Prenatal; Humans; Ultrasonography, Prenatal/methods; Female; Pregnancy; Biometry/methods; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Fetus/diagnostic imaging; Fetus/anatomy & histology
3.
Med Image Anal ; 90: 102977, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37778101

ABSTRACT

In obstetric sonography, the quality of acquisition of ultrasound scan video is crucial for accurate (manual or automated) biometric measurement and fetal health assessment. However, the nature of fetal ultrasound involves free-hand probe manipulation, and this can make it challenging to capture high-quality videos for fetal biometry, especially for the less-experienced sonographer. Manually checking the quality of acquired videos would be time-consuming and subjective, and requires a comprehensive understanding of fetal anatomy. Thus, it would be advantageous to develop an automatic quality assessment method to support video standardization and improve the diagnostic accuracy of video-based analysis. In this paper, we propose a general and purely data-driven video-based quality assessment framework which directly learns a distinguishable feature representation from high-quality ultrasound videos alone, without anatomical annotations. Our solution effectively utilizes both spatial and temporal information of ultrasound videos. The spatio-temporal representation is learned by a bi-directional reconstruction between the video space and the feature space, enhanced by a key-query memory module proposed in the feature space. To further improve performance, two additional modalities are introduced in training: the sonographer gaze and optical flow derived from the video. Two different clinical quality assessment tasks in fetal ultrasound are considered in our experiments, i.e., measurement of the fetal head circumference and cerebellar diameter; in both of these, low-quality videos are detected by the large reconstruction error in the feature space. Extensive experimental evaluation demonstrates the merits of our approach.
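The core decision rule described above (flag a clip as low quality when its feature-space reconstruction error is large) can be sketched as follows; the encoder/decoder callables and the mean-plus-k-standard-deviations threshold are placeholders for illustration, not the paper's architecture or calibration.

```python
import numpy as np

def reconstruction_errors(videos, encode, decode):
    """Per-clip reconstruction error in feature space (encode/decode are placeholder callables)."""
    errors = []
    for clip in videos:
        z = encode(clip)               # spatio-temporal feature of the clip
        z_hat = encode(decode(z))      # re-encode the reconstruction
        errors.append(float(np.mean((z - z_hat) ** 2)))
    return np.array(errors)

def flag_low_quality(train_errors, test_errors, k=3.0):
    """Flag clips whose error exceeds mean + k*std of errors on known high-quality training clips."""
    threshold = train_errors.mean() + k * train_errors.std()
    return test_errors > threshold     # True -> likely low-quality clip
```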


Subjects
Fetus; Ultrasonography, Prenatal; Pregnancy; Female; Humans; Ultrasonography, Prenatal/methods; Fetus/diagnostic imaging; Ultrasonography
4.
Med Image Anal ; 90: 102981, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37863638

ABSTRACT

In this work, we exploit multi-task learning to jointly predict the two decision-making processes of gaze movement and probe manipulation that an experienced sonographer would perform in routine obstetric scanning. A multimodal guidance framework, Multimodal-GuideNet, is proposed to detect the causal relationship between a real-world ultrasound video signal, synchronized gaze, and probe motion. The association between the multi-modality inputs is learned and shared through a modality-aware spatial graph that leverages useful cross-modal dependencies. By estimating the probability distribution of probe and gaze movements in real scans, the predicted guidance signals also allow inter- and intra-sonographer variation and avoid a fixed scanning path. We validate the new multi-modality approach on three types of obstetric scanning examinations, and the results consistently outperform single-task learning under various guidance policies. To simulate a sonographer's attention on multi-structure images, we also explore multi-step estimation in gaze guidance, and its visual results show that the prediction allows multiple gaze centers that are substantially aligned with underlying anatomical structures.


Subjects
Attention; Learning; Female; Pregnancy; Humans; Ultrasonography, Prenatal; Ultrasonography
5.
Proc Mach Learn Res ; 210: 184-198, 2023.
Article in English | MEDLINE | ID: mdl-37252341

ABSTRACT

We present a method for classifying human skill at fetal ultrasound scanning from eye-tracking and pupillary data of sonographers. Human skill characterization for this clinical task typically groups clinicians into skill levels, such as expert and beginner, based on the number of years of professional experience; experts typically have more than 10 years and beginners between 0 and 5 years. In some cases, these groupings also include trainees who are not yet fully qualified professionals. Prior work has relied on eye-movement analysis, which necessitates separating eye-tracking data into events such as fixations and saccades. Our method makes no prior assumptions about the relationship between years of experience and skill, and does not require the separation of eye-tracking data. Our best-performing skill classification model achieves F1 scores of 98% and 70% for the expert and trainee classes, respectively. We also show that years of experience, used as a direct measure of skill, is significantly correlated with the expertise of a sonographer.

6.
IEEE Trans Med Imaging ; 42(5): 1301-1313, 2023 May.
Article in English | MEDLINE | ID: mdl-36455084

ABSTRACT

Obstetric ultrasound assessment of fetal anatomy in the first trimester of pregnancy is one of the less explored fields in obstetric sonography because of the paucity of guidelines on anatomical screening and the limited availability of data. This paper, for the first time, examines imaging proficiency and practices of first-trimester ultrasound scanning through analysis of full-length ultrasound video scans. Findings from this study provide insights to inform the development of more effective user-machine interfaces and targeted assistive technologies, as well as improvements in workflow protocols for first-trimester scanning. Specifically, this paper presents an automated framework to model operator clinical workflow from full-length routine first-trimester fetal ultrasound scan videos. The 2D+t convolutional neural network-based architecture proposed for video annotation incorporates transfer learning and spatio-temporal (2D+t) modelling to automatically partition an ultrasound video into semantically meaningful temporal segments based on the fetal anatomy detected in the video. The model achieves a cross-validation accuracy of 96.10%, F1 = 0.95, precision = 0.94 and recall = 0.95. Automated semantic partitioning of unlabelled video scans (n = 250) achieves a high correlation with expert annotations (ρ = 0.95, p = 0.06). Clinical workflow patterns, operator skill and its variability can be derived from the resulting representation using the detected anatomy labels, their order, and their distribution. It is shown that the nuchal translucency (NT) plane is the most difficult standard plane to acquire, with most operators struggling to localize high-quality frames. Furthermore, newly qualified operators are found to spend 25.56% more time on key biometry tasks than experienced operators.
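A simplified version of the partitioning step, grouping consecutive frames that share the same predicted anatomy label into temporal segments, is sketched below; the label names and frame rate are illustrative.

```python
from itertools import groupby

def partition_video(frame_labels, fps=30):
    """Collapse per-frame anatomy predictions into (label, start_s, end_s) segments."""
    segments, idx = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((label, idx / fps, (idx + n) / fps))
        idx += n
    return segments

labels = ["background"] * 60 + ["NT plane"] * 90 + ["background"] * 30 + ["CRL plane"] * 120
print(partition_video(labels))
# [('background', 0.0, 2.0), ('NT plane', 2.0, 5.0), ('background', 5.0, 6.0), ('CRL plane', 6.0, 10.0)]
```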


Subjects
Nuchal Translucency Measurement; Ultrasonography, Prenatal; Pregnancy; Female; Humans; Pregnancy Trimester, First; Workflow; Ultrasonography, Prenatal/methods; Nuchal Translucency Measurement/methods; Machine Learning
7.
Med Image Anal ; 82: 102630, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36223683

ABSTRACT

In this work, we present a novel gaze-assisted natural language processing (NLP)-based video captioning model to describe routine second-trimester fetal ultrasound scan videos in a vocabulary of spoken sonography. The primary novelty of our multi-modal approach is that the learned video captioning model is built using a combination of ultrasound video, tracked gaze and textual transcriptions from speech recordings. The textual captions that describe the spatio-temporal scan video content are learnt from sonographer speech recordings. The generation of captions is assisted by sonographer gaze-tracking information reflecting their visual attention while performing live-imaging and interpreting a frozen image. To evaluate the effect of adding, or withholding, different forms of gaze on the video model, we compare spatio-temporal deep networks trained using three multi-modal configurations, namely: (1) a gaze-less neural network with only text and video as input, (2) a neural network additionally using real sonographer gaze in the form of attention maps, and (3) a neural network using automatically-predicted gaze in the form of saliency maps instead. We assess algorithm performance through established general text-based metrics (BLEU, ROUGE-L, F1 score), a domain-specific metric (ARS), and metrics that consider the richness and efficiency of the generated captions with respect to the scan video. Results show that the proposed gaze-assisted models can generate richer and more diverse captions for clinical fetal ultrasound scan videos than those without gaze at the expense of the perceived sentence structure. The results also show that the generated captions are similar to sonographer speech in terms of discussing the visual content and the scanning actions performed.


Subjects
Algorithms; Neural Networks, Computer; Humans; Pregnancy; Female; Ultrasonography, Prenatal
8.
Int J Comput Assist Radiol Surg ; 17(8): 1437-1444, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35556206

ABSTRACT

PURPOSE: For highly operator-dependent ultrasound scanning, skill assessment approaches evaluate operator competence given available data, such as acquired images and tracked probe movement. Operator skill level can be quantified by the completeness, speed, and precision of performing a clinical task, such as biometry. Such clinical tasks are increasingly becoming assisted or even replaced by automated machine learning models. In addition to measurement, operators need to be competent at the upstream task of acquiring images of sufficient quality. To provide computer assistance for this task requires a new definition of skill. METHODS: This paper focuses on the task of selecting ultrasound frames for biometry, for which operator skill is assessed by quantifying how well the tasks are performed with neural network-based frame classifiers. We first develop a frame classification model for each biometry task, using a novel label-efficient training strategy. Once these task models are trained, we propose a second task model-specific network to predict two skill assessment scores, based on the probability of identifying positive frames and accuracy of model classification. RESULTS: We present comprehensive results to demonstrate the efficacy of both the frame-classification and skill-assessment networks, using clinically acquired data from two biometry tasks for a total of 139 subjects, and compare the proposed skill assessment with metrics of operator experience. CONCLUSION: Task model-specific skill assessment is feasible and can be predicted by the proposed neural networks, which provide objective assessment that is a stronger indicator of task model performance, compared to existing skill assessment methods.
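As a purely illustrative sketch of how two such scores might be derived from a trained frame classifier's outputs (the paper's exact score definitions may differ), consider:

```python
import numpy as np

def skill_scores(pred_probs, pred_labels, true_labels):
    """Two illustrative skill proxies derived from a task model's frame predictions.

    pred_probs: model probability that each acquired frame is a positive (usable) biometry frame.
    pred_labels / true_labels: binary frame classifications versus reference labels.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    positive_score = float(pred_probs.mean())                    # how confidently usable frames were acquired
    model_accuracy = float((pred_labels == true_labels).mean())  # how well the task model handles this operator's frames
    return positive_score, model_accuracy

print(skill_scores([0.9, 0.8, 0.3, 0.95], [1, 1, 0, 1], [1, 1, 0, 1]))  # (0.7375, 1.0)
```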


Subjects
Machine Learning; Neural Networks, Computer; Female; Humans; Pregnancy; Task Performance and Analysis; Ultrasonography, Prenatal/methods
9.
Ultrasound Med Biol ; 48(6): 1157-1162, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35300877

ABSTRACT

SlowflowHD is a new ultrasound Doppler imaging technology that allows visualization of flow within small blood vessels. In this mode, a proprietary algorithm differentiates between low-speed flow and signals attributed to tissue motion so that microvessel vasculature can be examined. Our objectives were to describe the low-velocity Doppler mode principles, to assess the bone thermal index (TIb) safety parameter in obstetric ultrasound scans and to evaluate adherence to professional guidelines. To achieve the latter goals, we retrospectively reviewed prospectively collected ultrasound images and video clips from pregnancy ultrasound scans at >10 weeks of gestation over 4 months. We used custom-built optical character recognition-based software to automatically identify all images and video clips using this technology and to extract the TIb. Overall, a total of 185 ultrasound scans performed by three fetal medicine physicians were included, of which 60, 54 and 71 were first-, second- and third-trimester scans, respectively. The mean (highest recorded) TIb values were 0.32 (0.70), 0.23 (0.70) and 0.32 (0.60) in the first, second and third trimesters, respectively. Thermal index values were within the recommended limits set by the World Federation for Ultrasound in Medicine and Biology, the American Institute of Ultrasound in Medicine and the British Medical Ultrasound Society in all scans.
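The reported pipeline (OCR of the on-screen annotation, then extraction and summarisation of TIb values) can be approximated with a regular expression over the OCR text; the overlay format below is an assumed example, not the scanner's actual layout.

```python
import re
import statistics

# Hypothetical OCR output lines from exported ultrasound frames/clips.
ocr_lines = [
    "SlowflowHD  TIb 0.32  MI 0.9",
    "SlowflowHD  TIb 0.70  MI 1.0",
    "SlowflowHD  TIb 0.23  MI 0.8",
]

tib_pattern = re.compile(r"TIb\s*([0-9]+(?:\.[0-9]+)?)")
tib_values = [float(m.group(1)) for line in ocr_lines for m in tib_pattern.finditer(line)]

print(f"mean TIb = {statistics.mean(tib_values):.2f}, max TIb = {max(tib_values):.2f}")
# mean TIb = 0.42, max TIb = 0.70
```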


Subjects
Obstetrics; Female; Humans; Pregnancy; Pregnancy Trimester, Third; Retrospective Studies; Ultrasonography, Doppler; Ultrasonography, Prenatal/methods; United States
10.
Med Image Comput Comput Assist Interv ; 2022: 104-114, 2022 Sep 17.
Article in English | MEDLINE | ID: mdl-37223131

ABSTRACT

Ultrasound (US) probe motion estimation is a fundamental problem in automated standard plane localization during obstetric US diagnosis. Most existing recent works employ a deep neural network (DNN) to regress the probe motion. However, these deep regression-based methods use the DNN to overfit on the specific training data, which naturally lacks generalization ability for clinical application. In this paper, we return to generalized US feature learning rather than deep parameter regression. We propose a self-supervised learned local detector and descriptor, named USPoint, for US-probe motion estimation during the fine-adjustment phase of fetal plane acquisition. Specifically, a hybrid neural architecture is designed to simultaneously extract a local feature and further estimate the probe motion. By embedding a differentiable USPoint-based motion estimation inside the proposed network architecture, USPoint learns the keypoint detector, scores and descriptors from motion error alone, without requiring expensive human annotation of local features. The two tasks, local feature learning and motion estimation, are jointly learned in a unified framework to enable collaborative learning with the aim of mutual benefit. To the best of our knowledge, this is the first learned local detector and descriptor tailored to US images. Experimental evaluation on real clinical data demonstrates the resultant performance improvement in feature matching and motion estimation for potential clinical value. A video demo can be found online: https://youtu.be/JGzHuTQVlBs.
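For intuition, a classical, non-learned analogue of the matching-and-fitting idea (ORB keypoints plus a robust partial-affine fit in OpenCV) is sketched below; USPoint replaces the hand-crafted detector and descriptor with learned ones, so this is only an illustration of the general approach, not the paper's method.

```python
import cv2
import numpy as np

def estimate_inplane_motion(frame_a, frame_b):
    """Estimate 2D in-plane motion between two grayscale US frames via keypoint matching."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    if len(matches) < 4:
        return None
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    # Robustly fit rotation + translation + scale; RANSAC rejects bad matches.
    M, _inliers = cv2.estimateAffinePartial2D(pts_a, pts_b, method=cv2.RANSAC)
    return M  # 2x3 matrix approximating the probe-induced in-plane image motion
```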

11.
Comput Vis ECCV ; 2022: 422-436, 2022 Oct.
Article in English | MEDLINE | ID: mdl-37250853

ABSTRACT

Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics. In this paper, we propose to improve visual representations of medical images via anatomy-aware contrastive learning (AWCL), which incorporates anatomy information to augment the positive/negative pair sampling in a contrastive learning manner. The proposed approach is demonstrated for automated fetal ultrasound imaging tasks, enabling positive pairs from the same or different ultrasound scans that are anatomically similar to be pulled together, thus improving the representation learning. We empirically investigate the effect of including anatomy information at coarse and fine granularity for contrastive learning, and find that learning with fine-grained anatomy information, which preserves intra-class differences, is more effective than its counterpart. We also analyze the impact of the anatomy ratio on our AWCL framework and find that using more distinct but anatomically similar samples to compose positive pairs results in better-quality representations. Extensive experiments on a large-scale fetal ultrasound dataset demonstrate that our approach is effective for learning representations that transfer well to three clinical downstream tasks, and achieves superior performance compared to ImageNet-supervised and current state-of-the-art contrastive learning methods. In particular, AWCL outperforms the ImageNet-supervised method by 13.8% and the state-of-the-art contrastive-based method by 7.1% on a cross-domain segmentation task. The code is available at https://github.com/JianboJiao/AWCL.
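A minimal sketch of the positive-pair idea, treating images that share an anatomy label as positives in a supervised-contrastive (InfoNCE-style) loss, is shown below in PyTorch; the batch construction, temperature and loss details are assumptions rather than the AWCL implementation.

```python
import torch
import torch.nn.functional as F

def anatomy_aware_info_nce(embeddings, anatomy_labels, temperature=0.1):
    """Supervised-contrastive-style loss: samples sharing an anatomy label act as positives.

    embeddings: (N, D) image embeddings from an encoder; anatomy_labels: (N,) integer labels.
    """
    z = F.normalize(embeddings, dim=1)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (anatomy_labels.unsqueeze(0) == anatomy_labels.unsqueeze(1)) & ~self_mask

    logits = (z @ z.t() / temperature).masked_fill(self_mask, float("-inf"))  # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos[pos_mask.any(dim=1)].mean()  # skip samples with no in-batch positive

loss = anatomy_aware_info_nce(torch.randn(8, 128), torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
```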

12.
Article in English | MEDLINE | ID: mdl-36812105

ABSTRACT

We present a method for skill characterisation of sonographer gaze patterns while performing routine second trimester fetal anatomy ultrasound scans. The position and scale of fetal anatomical planes during each scan differ because of fetal position, movements and sonographer skill. A standardised reference is required to compare recorded eye-tracking data for skill characterisation. We propose using an affine transformer network to localise the anatomy circumference in video frames, for normalisation of eye-tracking data. We use an event-based data visualisation, time curves, to characterise sonographer scanning patterns. We chose brain and heart anatomical planes because they vary in levels of gaze complexity. Our results show that when sonographers search for the same anatomical plane, even though the landmarks visited are similar, their time curves display different visual patterns. Brain planes also, on average, have more events or landmarks occurring than the heart, which highlights anatomy-specific differences in searching approaches.

13.
Med Image Underst Anal (2022) ; 13413: 187-198, 2022 Jul.
Article in English | MEDLINE | ID: mdl-36848308

ABSTRACT

Medical image captioning models generate text to describe the semantic contents of an image, aiding non-experts in understanding and interpretation. We propose a weakly-supervised approach to improve the performance of image captioning models on small image-text datasets by leveraging a large anatomically-labelled image classification dataset. Our method generates pseudo-captions (weak labels) for caption-less but anatomically-labelled (class-labelled) images using an encoder-decoder sequence-to-sequence model. The augmented dataset is used to train an image-captioning model in a weakly supervised learning manner. For fetal ultrasound, we demonstrate that the proposed augmentation approach outperforms the baseline on semantics- and syntax-based metrics, with nearly twice the improvement on BLEU-1 and ROUGE-L. Moreover, we observe that superior models are trained with the proposed data augmentation when compared with existing regularization techniques. This work allows seamless automatic annotation of images that lack human-prepared descriptive captions for training image-captioning models. Using pseudo-captions in the training data is particularly useful for medical image captioning, where obtaining real image captions demands significant time and effort from medical experts.

14.
Article in English | MEDLINE | ID: mdl-36643818

ABSTRACT

In this paper we develop a multi-modal video analysis algorithm to predict where a sonographer should look next. Our approach uses video and expert knowledge, defined by gaze-tracking data, which is acquired during routine first-trimester fetal ultrasound scanning. Specifically, we propose a spatio-temporal convolutional LSTM U-Net neural network (cLSTMU-Net) for video saliency prediction with stochastic augmentation. The architecture design consists of a U-Net based encoder-decoder network and a cLSTM to take temporal information into account. We compare the performance of the cLSTMU-Net against spatial-only architectures for the task of predicting gaze in first-trimester ultrasound videos. Our study dataset consists of 115 clinically acquired first-trimester US videos and a total of 45,666 video frames. We adopt a Random Augmentation (RA) strategy from a stochastic augmentation policy search to improve model performance and reduce over-fitting. The proposed cLSTMU-Net using a video clip of 6 frames outperforms the baseline approach on all saliency metrics: KLD, SIM, NSS and CC (2.08, 0.28, 4.53 and 0.42 versus 2.16, 0.27, 4.34 and 0.39).
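For reference, two of the reported saliency metrics (KL divergence and Pearson's CC between predicted and ground-truth saliency maps) can be computed as below; these are the generic definitions, not the authors' exact evaluation code.

```python
import numpy as np

def kld(pred, gt, eps=1e-8):
    """KL divergence between ground-truth and predicted saliency maps (lower is better)."""
    p = gt / (gt.sum() + eps)      # normalise maps to probability distributions
    q = pred / (pred.sum() + eps)
    return float(np.sum(p * np.log(p / (q + eps) + eps)))

def cc(pred, gt):
    """Pearson correlation coefficient between saliency maps (higher is better)."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

pred = np.random.rand(224, 288)    # placeholder predicted saliency map
gt = np.random.rand(224, 288)      # placeholder ground-truth (gaze-derived) map
print(kld(pred, gt), cc(pred, gt))
```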

15.
Article in English | MEDLINE | ID: mdl-36643819

ABSTRACT

This study presents a novel approach to automatic detection and segmentation of the crown-rump length (CRL) and nuchal translucency (NT), two essential measurements in the first-trimester US scan. The proposed method automatically localises a standard plane within a video clip, as defined by the UK Fetal Abnormality Screening Programme. A Nested Hourglass (NHG) based network performs semantic pixel-wise segmentation to extract the NT and CRL structures. Our results show that the NHG network is faster (19.52% fewer GFLOPs than FCN32) and offers high pixel agreement (mean IoU = 80.74) with expert manual annotations.
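The pixel-agreement figure quoted above is the standard intersection-over-union between predicted and expert segmentation masks; a minimal definition:

```python
import numpy as np

def iou(pred_mask, gt_mask, eps=1e-8):
    """Intersection over union between two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection / (union + eps))

a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1   # toy predicted structure mask
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1   # toy expert annotation mask
print(iou(a, b))  # 4 / 6 = 0.666...
```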

16.
Article in English | MEDLINE | ID: mdl-36649381

ABSTRACT

Visualising patterns in clinicians' eye movements while interpreting fetal ultrasound imaging videos is challenging. Across and within videos, there are differences in the size and position of Areas-of-Interest (AOIs) due to fetal position, movement and sonographer skill. Currently, AOIs are manually labelled or identified using eye-tracker manufacturer specifications, which are not study-specific. We propose using unsupervised clustering to identify meaningful AOIs and bi-contour plots to visualise spatio-temporal gaze characteristics. We use Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to identify the AOIs, and use their corresponding images to capture granular changes within each AOI. We then visualise transitions within and between AOIs as read by the sonographer. We compare our method to a standardised eye-tracking manufacturer algorithm. Our method captures granular changes in gaze characteristics which are otherwise not shown. Our method is suitable for exploratory data analysis of eye-tracking data involving multiple participants and AOIs.
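A minimal sketch of the AOI-identification step, clustering pooled gaze coordinates with HDBSCAN and treating unclustered samples as noise, is shown below; the scikit-learn HDBSCAN call and the synthetic gaze data are stand-in assumptions for the paper's actual setup.

```python
import numpy as np
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3; the standalone 'hdbscan' package also works

# Hypothetical gaze samples: (x, y) screen coordinates pooled over a scan video.
rng = np.random.default_rng(0)
gaze_xy = np.vstack([
    rng.normal([300, 200], 15, size=(200, 2)),      # dense gaze around one anatomical region
    rng.normal([600, 350], 15, size=(200, 2)),      # dense gaze around a second region
    rng.uniform([0, 0], [800, 600], size=(50, 2)),  # scattered samples -> noise
])

labels = HDBSCAN(min_cluster_size=30).fit_predict(gaze_xy)
n_aois = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_aois} data-driven AOIs found; {np.sum(labels == -1)} gaze samples labelled as noise")
```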

17.
Med Image Comput Comput Assist Interv ; 13437: 94-103, 2022 Sep 17.
Article in English | MEDLINE | ID: mdl-36649382

ABSTRACT

Eye trackers can provide visual guidance to sonographers during ultrasound (US) scanning. Such guidance is potentially valuable for less experienced operators, helping them improve their skill in manipulating the probe to achieve the desired plane. In this paper, a multimodal guidance approach (Multimodal-GuideNet) is proposed to capture the stepwise dependency between a real-world US video signal, synchronized gaze, and probe motion within a unified framework. To understand the causal relationship between gaze movement and probe motion, our model exploits multitask learning to jointly learn two related tasks: predicting the gaze movements and probe signals that an experienced sonographer would perform in routine obstetric scanning. The two tasks are associated by a modality-aware spatial graph to detect the co-occurrence among the multi-modality inputs and share useful cross-modal information. Instead of a deterministic scanning path, Multimodal-GuideNet allows for scanning diversity by estimating the probability distribution of real scans. Experiments performed with three typical obstetric scanning examinations show that the new approach outperforms single-task learning for both probe motion guidance and gaze movement prediction. Multimodal-GuideNet also provides a visual guidance signal with an error rate of less than 10 pixels for a 224 × 288 US image.

18.
Med Image Comput Comput Assist Interv ; 13434: 228-237, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36649384

ABSTRACT

Video quality assurance is an important topic in obstetric ultrasound imaging to ensure that captured videos are suitable for biometry and fetal health assessment. Previously, one successful objective approach to automated ultrasound image quality assurance has considered it as a supervised learning task of detecting anatomical structures defined by a clinical protocol. In this paper, we propose an alternative and purely data-driven approach that makes effective use of both spatial and temporal information and the model learns from high-quality videos without any anatomy-specific annotations. This makes it attractive for potentially scalable generalisation. In the proposed model, a 3D encoder and decoder pair bi-directionally learns a spatio-temporal representation between the video space and the feature space. A zoom-in module is introduced to encourage the model to focus on the main object in a frame. A further design novelty is the introduction of two additional modalities in model training (sonographer gaze and optical flow derived from the video). Finally, our approach is applied to identify high-quality videos for fetal head circumference measurement in freehand second-trimester ultrasound scans. Extensive experiments are conducted, and the results demonstrate the effectiveness of our approach with an AUC of 0.911.

19.
Med Image Underst Anal (2021) ; 2021: 361-374, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34476423

ABSTRACT

While performing an ultrasound (US) scan, sonographers direct their gaze at regions of interest to verify that the correct plane is acquired and to interpret the acquisition frame. Predicting sonographer gaze on US videos is useful for identifying spatio-temporal patterns that are important for US scanning. This paper investigates utilizing sonographer gaze, in the form of gaze-tracking data, in a multimodal imaging deep learning framework to assist the analysis of the first-trimester fetal ultrasound scan. Specifically, we propose an encoder-decoder convolutional neural network with skip connections to predict the visual gaze for each frame using 115 first-trimester ultrasound videos; 29,250 video frames for training, 7,290 for validation and 9,126 for testing. We find that a dataset of our size benefits from automated data augmentation, which, in turn, alleviates model overfitting and reduces the structural-variation imbalance of US anatomical views between the training and test datasets. Specifically, we employ a stochastic augmentation policy search method to improve prediction performance. Using the learnt policies, our models outperform the baseline: KLD, SIM, NSS and CC (2.16, 0.27, 4.34 and 0.39 versus 3.17, 0.21, 2.92 and 0.28).

20.
Proc IEEE Int Symp Biomed Imaging ; 2021: 716-720, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34413932

ABSTRACT

We propose a curriculum learning captioning method to caption fetal ultrasound images by training a model to dynamically transition between two different modalities (image and text) as training progresses. Specifically, we propose a course-focused dual curriculum method, where a course is training with a curriculum based on only one of the two modalities involved in image captioning. We compare two configurations of the course-focused dual curriculum: an image-first curriculum, which orders the early training batches primarily by the complexity of the image information before gradually introducing batches ordered by the complexity of the text information, and a text-first curriculum, which operates in reverse. The evaluation results show that dynamically transitioning between text and images over training epochs improves results compared to the scenario where both modalities are considered in equal measure in every epoch.
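A minimal sketch of the image-first dual-curriculum scheduling idea, ordering early batches by an image-complexity score and later batches increasingly by a text-complexity score, is given below; the complexity measures and the linear hand-off schedule are illustrative assumptions, not the authors' method.

```python
def dual_curriculum_order(samples, image_complexity, text_complexity, epoch, total_epochs, image_first=True):
    """Order training samples for one epoch, blending two single-modality curricula.

    Early epochs sort by the first modality's complexity; later epochs progressively
    weight the second modality's complexity (a linear hand-off between the two courses).
    """
    t = epoch / max(total_epochs - 1, 1)           # 0 -> first course, 1 -> second course
    w_img, w_txt = (1 - t, t) if image_first else (t, 1 - t)

    def score(i):
        return w_img * image_complexity[i] + w_txt * text_complexity[i]

    return sorted(range(len(samples)), key=score)  # easy-to-hard sample order for this epoch

samples = ["scan_a", "scan_b", "scan_c"]
img_c = [0.2, 0.9, 0.5]     # e.g. image entropy (assumed complexity measure)
txt_c = [0.8, 0.1, 0.4]     # e.g. caption length / word rarity (assumed complexity measure)
for epoch in range(3):
    print(epoch, dual_curriculum_order(samples, img_c, txt_c, epoch, total_epochs=3))
```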
