Results 1 - 15 of 15
1.
IEEE Trans Image Process; 33: 2018-2031, 2024.
Article in English | MEDLINE | ID: mdl-38470593

ABSTRACT

Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. With multi-camera inputs, the structural, textural, and geometrical consistency across views can be leveraged to achieve fine-grained object segmentation. To make better use of this information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework that segments objects in each view via a 3D surface representation derived from multi-view images of a scene. To model high-quality geometric surfaces in complex scenes, we design a novel scene representation scheme that decomposes the scene into two complementary neural representation modules, each with a Signed Distance Function (SDF). Moreover, Surface-SOS can refine single-view segmentation with multi-view unlabeled images by taking coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to break the dependence on large amounts of annotated data and on strong constraints, which typically require observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks including LLFF, CO3D, BlendedMVS, TUM, and several real-world scenes show that Surface-SOS consistently yields finer object masks than its NeRF-based counterparts and surpasses supervised single-view baselines remarkably. Code is available at: https://github.com/zhengxyun/Surface-SOS.
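
For intuition only, the following is a minimal, hypothetical sketch (not the authors' architecture) of the two ingredients named in the abstract: a small MLP representing a surface as a Signed Distance Function, and sphere tracing of camera rays against that SDF to obtain a crude per-view foreground mask. The names SDFMLP and sphere_trace_mask are illustrative assumptions.

import torch
import torch.nn as nn

class SDFMLP(nn.Module):
    # Tiny MLP mapping 3D points (N, 3) to signed distance values (N, 1).
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts):
        return self.net(pts)

def sphere_trace_mask(sdf, origins, dirs, n_steps=64, eps=1e-3):
    # March each camera ray forward by the current signed distance; rays that
    # reach the zero level set are marked as hits, giving a rough foreground
    # mask for the corresponding pixels of one view.
    t = torch.zeros(origins.shape[0], 1)
    hit = torch.zeros(origins.shape[0], dtype=torch.bool)
    for _ in range(n_steps):
        pts = origins + t * dirs
        d = sdf(pts)
        hit |= d.abs().squeeze(-1) < eps
        t = t + d.clamp(min=0.0)
    return hit

In the paper's setting, two such complementary SDF modules would represent object and background; the per-view mask then follows from which surface a ray reaches first.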

2.
Comput Vis ECCV; 2022: 422-436, 2022 Oct.
Article in English | MEDLINE | ID: mdl-37250853

ABSTRACT

Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics. In this paper, we propose to improve visual representations of medical images via anatomy-aware contrastive learning (AWCL), which incorporates anatomy information to augment positive/negative pair sampling in a contrastive learning manner. The proposed approach is demonstrated on automated fetal ultrasound imaging tasks, where anatomically similar positive pairs from the same or different ultrasound scans are pulled together, improving the representation learning. We empirically investigate the effect of including anatomy information at coarse and fine granularity for contrastive learning, and find that learning with fine-grained anatomy information, which preserves intra-class differences, is more effective than its coarse-grained counterpart. We also analyze the impact of the anatomy ratio on our AWCL framework and find that using more distinct but anatomically similar samples to compose positive pairs results in better-quality representations. Extensive experiments on a large-scale fetal ultrasound dataset demonstrate that our approach is effective for learning representations that transfer well to three clinical downstream tasks and achieves superior performance compared to ImageNet-supervised and current state-of-the-art contrastive learning methods. In particular, AWCL outperforms the ImageNet-supervised method by 13.8% and the state-of-the-art contrastive-based method by 7.1% on a cross-domain segmentation task. The code is available at https://github.com/JianboJiao/AWCL.
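
As a rough illustration of anatomy-aware positive sampling, the hypothetical loss below, a sketch in the spirit of supervised contrastive losses rather than the paper's exact AWCL formulation, treats any two embeddings that share an anatomy label, even from different scans, as a positive pair.

import torch
import torch.nn.functional as F

def anatomy_aware_contrastive_loss(features, anatomy_labels, temperature=0.1):
    # features: (N, D) embeddings; anatomy_labels: (N,) integer anatomy categories.
    # Samples sharing an anatomy label are positives (even across scans);
    # everything else in the batch acts as a negative.
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / temperature
    pos_mask = anatomy_labels.unsqueeze(0) == anatomy_labels.unsqueeze(1)
    pos_mask.fill_diagonal_(False)                      # a sample is not its own positive
    sim = sim - torch.eye(len(features), device=sim.device) * 1e9   # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_counts).mean()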

3.
IEEE Trans Pattern Anal Mach Intell; 44(7): 3791-3806, 2022 07.
Article in English | MEDLINE | ID: mdl-33566757

ABSTRACT

This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, or the spatial location and dominant color of the largest color diversity along the temporal axis. A neural network is then built and trained to predict these statistical summaries given the video frames as input. To alleviate the learning difficulty, we employ several spatial partitioning patterns to encode rough spatial locations instead of exact spatial Cartesian coordinates. Our approach is inspired by the observation that the human visual system is sensitive to rapidly changing contents in the visual field and needs only impressions of rough spatial locations to understand visual content. To validate the effectiveness of the proposed approach, we conduct extensive experiments with four 3D backbone networks, i.e., C3D, 3D-ResNet, R(2+1)D, and S3D-G. The results show that our approach outperforms existing approaches across these backbone networks on four downstream video analysis tasks: action recognition, video retrieval, dynamic scene recognition, and action similarity labeling. The source code is publicly available at: https://github.com/laura-wang/video_repres_sts.
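
As an illustration of how one such summary could be computed, the hypothetical helper below (a toy version on a coarse grid, not the paper's exact statistics) labels a clip with the grid cell containing the largest temporal motion; a network would then be trained to predict this label from the raw frames.

import numpy as np

def largest_motion_block(clip, grid=(4, 4)):
    # clip: (T, H, W) grayscale video as a float array.
    # Motion is measured as the mean absolute frame difference over time,
    # pooled over a coarse spatial grid; the winning cell index is the label.
    diff = np.abs(np.diff(clip, axis=0)).mean(axis=0)        # (H, W) motion map
    gh, gw = grid
    H, W = diff.shape
    cells = diff[: H - H % gh, : W - W % gw].reshape(gh, H // gh, gw, W // gw)
    cell_energy = cells.mean(axis=(1, 3))                    # (gh, gw) motion per block
    return int(cell_energy.argmax())                         # pretext label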


Subject(s)
Algorithms, Neural Networks (Computer), Humans, Motion (Physics), Software
4.
IEEE Trans Pattern Anal Mach Intell; 44(11): 8082-8096, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34033532

ABSTRACT

Weakly supervised semantic segmentation is receiving great attention due to its low human annotation cost. In this paper, we aim to tackle bounding-box-supervised semantic segmentation, i.e., training accurate semantic segmentation models using bounding box annotations as supervision. To this end, we propose the Affinity Attention Graph Neural Network (A2GNN). Following previous practice, we first generate pseudo semantic-aware seeds, which are then formed into semantic graphs based on our newly proposed affinity Convolutional Neural Network (CNN). The built graphs are then input to our A2GNN, in which an affinity attention layer is designed to acquire short- and long-distance information from soft graph edges in order to accurately propagate semantic labels from the confident seeds to the unlabeled pixels. However, to guarantee the precision of the seeds, we adopt only a limited number of confident pixel seed labels for A2GNN, which may lead to insufficient supervision during training. To alleviate this issue, we further introduce a new loss function and a consistency-checking mechanism to leverage the bounding box constraint, so that more reliable guidance can be included in the model optimization. Experiments show that our approach achieves new state-of-the-art performance on the Pascal VOC 2012 dataset (val: 76.5%, test: 75.2%). More importantly, our approach can be readily applied to the bounding-box-supervised instance segmentation task and other weakly supervised semantic segmentation tasks, with state-of-the-art or comparable performance on almost all weakly supervised tasks on the PASCAL VOC and COCO datasets. Our source code will be available at https://github.com/zbf1991/A2GNN.
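
The sketch below is a hypothetical, minimal reading of an "affinity attention" style layer, not the released A2GNN code: soft edge affinities are converted into attention weights that propagate node features from confident seed nodes to the rest of the graph.

import torch
import torch.nn as nn

class AffinityAttentionLayer(nn.Module):
    # node_feats: (N, D) node features; affinity: (N, N) soft edge weights
    # in [0, 1], where 0 means "no edge".
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, node_feats, affinity):
        # Absent edges get a large negative logit so they receive ~zero attention;
        # features are then aggregated from neighbours with a residual update.
        attn = torch.softmax(affinity.masked_fill(affinity == 0, -1e9), dim=1)
        return torch.relu(self.proj(attn @ node_feats) + node_feats)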


Subject(s)
Supervised Machine Learning, Volatile Organic Compounds, Algorithms, Attention, Humans, Image Processing (Computer-Assisted), Neural Networks (Computer), Semantics
5.
IEEE J Biomed Health Inform; 26(4): 1591-1601, 2022 04.
Article in English | MEDLINE | ID: mdl-34495853

ABSTRACT

Fetal alcohol syndrome (FAS), caused by prenatal alcohol exposure, can result in a series of cranio-facial anomalies as well as behavioral and neurocognitive problems. Current diagnosis of FAS is typically done by identifying a set of facial characteristics, which are often obtained by manual examination. Anatomical landmark detection, which provides rich geometric information, is important for detecting the presence of FAS-associated facial anomalies. This imaging application is characterized by large variations in data appearance and limited availability of labeled data. Current deep learning-based heatmap regression methods designed for facial landmark detection in natural images assume the availability of large datasets and are therefore not well suited for this application. To address this restriction, we develop a new regularized transfer learning approach that exploits the knowledge of a network learned on large facial recognition datasets. In contrast to standard transfer learning, which focuses on adjusting the pre-trained weights, the proposed approach regularizes the model behavior: it explicitly reuses the rich visual semantics of a domain-similar source model on the target task data as an additional supervisory signal for regularizing landmark detection optimization. Specifically, we develop four regularization constraints for the proposed transfer learning, including constraining the feature outputs from classification and intermediate layers, as well as matching activation attention maps at both the spatial and channel levels. Experimental evaluation on a collected clinical imaging dataset demonstrates that the proposed approach can effectively improve model generalizability under limited training samples and compares favorably with other approaches in the literature.
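
One of the named constraints, matching spatial activation attention maps between the frozen source model and the fine-tuned model, can be sketched with a common attention-transfer-style regularizer; this is given as an assumption, not the paper's exact loss.

import torch
import torch.nn.functional as F

def spatial_attention(feat):
    # Channel-pooled spatial attention map from (B, C, H, W) activations,
    # L2-normalised per sample.
    attn = feat.pow(2).mean(dim=1).flatten(1)          # (B, H*W)
    return F.normalize(attn, dim=1)

def attention_matching_loss(student_feat, teacher_feat):
    # Keeps the fine-tuned model's spatial attention close to that of the
    # frozen source (face-recognition) model on the target-task images.
    diff = spatial_attention(student_feat) - spatial_attention(teacher_feat)
    return diff.pow(2).sum(dim=1).mean()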


Subject(s)
Fetal Alcohol Spectrum Disorders, Prenatal Exposure Delayed Effects, Face/diagnostic imaging, Female, Fetal Alcohol Spectrum Disorders/diagnostic imaging, Humans, Machine Learning, Pregnancy, Semantics
6.
IEEE Trans Image Process; 30: 6700, 2021.
Article in English | MEDLINE | ID: mdl-34339368

ABSTRACT

In the above article [1], Fig. 5 was unfortunately not displayed correctly and contained many empty images. The correct version is provided here.

7.
IEEE Trans Image Process; 30: 6024-6035, 2021.
Article in English | MEDLINE | ID: mdl-34181543

ABSTRACT

Existing GAN-based multi-view face synthesis methods rely heavily on "creating" faces, and thus they struggle to reproduce faithful facial texture and fail to preserve identity under large-angle rotations. In this paper, we address this problem by dividing the challenging large-angle face synthesis into a series of easy small-angle rotations, each guided by a face flow to maintain faithful facial details. In particular, we propose a Face Flow-guided Generative Adversarial Network (FFlowGAN) that is specifically trained for small-angle synthesis. The proposed network consists of two modules: a face flow module that computes a dense correspondence between the input and target faces, and a face synthesis module that uses this correspondence as strong guidance for emphasizing salient facial texture. We apply FFlowGAN multiple times to progressively synthesize different views, so facial features can be propagated to the target view from the very beginning. All these executions are cascaded and trained end-to-end with a unified back-propagation, ensuring that each intermediate step contributes to the final result. Extensive experiments demonstrate that the proposed divide-and-conquer strategy is effective, and our method outperforms the state of the art on four benchmark datasets both qualitatively and quantitatively.
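
A dense face flow of this kind is typically used to warp the source face toward the target view. A minimal, assumed warping helper (illustrative only, not FFlowGAN itself) could look like this:

import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    # image: (B, C, H, W); flow: (B, 2, H, W) per-pixel (dx, dy) displacements,
    # the kind of dense correspondence a face-flow module could predict.
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(image)  # (1, 2, H, W)
    coords = base + flow
    # Normalise absolute coordinates to [-1, 1] as expected by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                    # (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)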

8.
IEEE/ACM Trans Comput Biol Bioinform; 18(5): 1914-1923, 2021.
Article in English | MEDLINE | ID: mdl-31841420

ABSTRACT

Tumor metastasis detection is of great importance for the treatment of breast cancer patients. Various CNN (convolutional neural network) based methods achieve excellent performance in object detection/segmentation. However, detecting metastases in hematoxylin and eosin (H&E) stained whole-slide images (WSI) remains challenging, mainly for two reasons: (1) the resolution of the images is very large, and (2) labeled training data are scarce. Whole-slide images are generally stored in a multi-resolution structure with multiple downsampled tiles, and it is difficult to fit the whole image into memory without compression. Moreover, labeling images is time-consuming and expensive for pathologists. In this paper, we study the problem of detecting breast cancer metastases in pathology images at the patch level. To address the above challenges, we propose a few-shot learning method to classify whether an image patch contains tumor cells. Specifically, we propose a patch-level unsupervised cell ranking approach that relies only on images with limited labels. The main idea is that when a patch A is cropped from the WSI and a sub-patch B is further cropped from A, the cell count of A is always at least as large as that of B. Based on this observation, we use the unlabeled images to learn the ranking of cell counts and thereby extract abstract features. Experimental results show that our method effectively improves patch-level classification accuracy compared to the traditional supervised method. The source code is publicly available at https://github.com/fewshot-camelyon.
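
The ranking signal described above can be expressed as a standard margin ranking objective. A hypothetical sketch, assuming a score network score_net that predicts a per-patch cell-count score and accepts variable input sizes (e.g., via global pooling):

import torch
import torch.nn as nn

def crop_subpatch(patch, frac=0.5):
    # Crop a centred sub-patch B from patch A (B, C, H, W); by construction A
    # contains at least as many cells as B, which supplies a free ranking label.
    _, _, H, W = patch.shape
    h, w = int(H * frac), int(W * frac)
    top, left = (H - h) // 2, (W - w) // 2
    return patch[:, :, top:top + h, left:left + w]

def ranking_loss(score_net, patch_a, margin=1.0):
    # Unsupervised ranking objective: the predicted score of A should exceed
    # that of its own sub-patch B by at least `margin`.
    patch_b = crop_subpatch(patch_a)
    s_a, s_b = score_net(patch_a), score_net(patch_b)
    target = torch.ones_like(s_a)                      # rank A above B
    return nn.functional.margin_ranking_loss(s_a, s_b, target, margin=margin)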


Subject(s)
Breast Neoplasms, Cell Count/methods, Image Interpretation (Computer-Assisted)/methods, Neoplasm Metastasis, Unsupervised Machine Learning, Algorithms, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/pathology, Female, Histocytochemistry, Humans, Lymph Nodes/diagnostic imaging, Lymph Nodes/pathology, Neoplasm Metastasis/diagnostic imaging, Neoplasm Metastasis/pathology, Neural Networks (Computer)
9.
Med Image Comput Comput Assist Interv; 12263: 534-543, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33103162

ABSTRACT

In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access, making conventional deep learning-based models difficult to scale. It would therefore be beneficial if useful representations could be derived from raw data without manual annotation. In this paper, we propose to address the problem of self-supervised representation learning with multi-modal ultrasound video-speech raw data. We assume that there is a high correlation between the ultrasound video and the corresponding narrative speech audio of the sonographer. To learn meaningful representations, the model needs to identify this correlation and at the same time understand the underlying anatomical features. We design a framework to model the correspondence between video and audio without any kind of human annotation. Within this framework, we introduce cross-modal contrastive learning and an affinity-aware self-paced learning scheme to enhance correlation modelling. Experimental evaluations on multi-modal fetal ultrasound video and audio show that the proposed approach is able to learn strong representations and transfers well to the downstream tasks of standard plane detection and eye-gaze prediction.
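
The cross-modal contrastive component can be illustrated with a symmetric InfoNCE-style loss in which co-recorded video and speech segments form positive pairs; this is a generic sketch under that assumption, not the paper's exact objective.

import torch
import torch.nn.functional as F

def cross_modal_nce(video_emb, audio_emb, temperature=0.07):
    # video_emb, audio_emb: (N, D) embeddings where row i of each modality comes
    # from the same recording; other pairings in the batch act as negatives.
    v = F.normalize(video_emb, dim=1)
    a = F.normalize(audio_emb, dim=1)
    logits = v @ a.t() / temperature
    targets = torch.arange(len(v), device=v.device)
    # Symmetric loss: video-to-audio and audio-to-video retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))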

10.
Article in English | MEDLINE | ID: mdl-33103166

ABSTRACT

In this paper, we consider differentiating operator skill during fetal ultrasound scanning using probe motion tracking. We present a novel convolutional neural network-based deep learning framework that models ultrasound probe motion in order to classify operator skill levels in a manner that is invariant to operators' personal scanning styles. In this study, probe motion data during routine second-trimester fetal ultrasound scanning were acquired from operators of known experience levels (2 newly qualified operators and 10 expert operators). The results demonstrate that the proposed model can successfully learn underlying probe motion features that distinguish operator skill levels during routine fetal ultrasound with 95% accuracy.

11.
IEEE Trans Med Imaging; 39(12): 4413-4424, 2020 12.
Article in English | MEDLINE | ID: mdl-32833630

ABSTRACT

Fetal brain magnetic resonance imaging (MRI) offers exquisite images of the developing brain but is not suitable for second-trimester anomaly screening, for which ultrasound (US) is employed. Although expert sonographers are adept at reading US images, MR images, which closely resemble anatomical images, are much easier for non-experts to interpret. Thus, in this article we propose to generate MR-like images directly from clinical US images. Such a capability is also potentially useful in medical image analysis, for instance for automatic US-MRI registration and fusion. The proposed model is end-to-end trainable and self-supervised, without any external annotations. Specifically, based on the assumption that the US and MRI data share a similar anatomical latent space, we first utilise a network to extract the shared latent features, which are then used for MRI synthesis. Since paired data is unavailable for our study (and rare in practice), pixel-level constraints are infeasible to apply. We instead propose to enforce the distributions to be statistically indistinguishable, by adversarial learning in both the image domain and the feature space. To regularise the anatomical structures between US and MRI during synthesis, we further propose an adversarial structural constraint. A new cross-modal attention technique is proposed to utilise non-local spatial information, by encouraging multi-modal knowledge fusion and propagation. We extend the approach to the case where 3D auxiliary information (e.g., 3D neighbours and a 3D location index) from volumetric data is also available, and show that this improves image synthesis. The proposed approach is evaluated quantitatively and qualitatively against real fetal MR images and other synthesis approaches, demonstrating the feasibility of synthesising realistic MR images.
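
The cross-modal attention idea, queries from one modality's feature map attending over keys and values from the other, can be sketched with a standard non-local attention block; the module below is an assumed illustration, not the authors' implementation.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    # Queries from one modality (e.g. US features), keys/values from the other
    # (e.g. MRI features), fused with a residual connection.
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)

    def forward(self, feat_q, feat_kv):
        B, C, H, W = feat_q.shape
        q = self.q(feat_q).flatten(2).transpose(1, 2)     # (B, HW, C)
        k = self.k(feat_kv).flatten(2)                    # (B, C, HW)
        v = self.v(feat_kv).flatten(2).transpose(1, 2)    # (B, HW, C)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)    # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(B, C, H, W)
        return out + feat_q                               # residual fusion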


Subject(s)
Magnetic Resonance Imaging, Neuroimaging, Brain/diagnostic imaging, Fetus/diagnostic imaging, Image Processing (Computer-Assisted), Ultrasonography
12.
Proc IEEE Int Symp Biomed Imaging; 2020: 1847-1850, 2020 Apr 03.
Article in English | MEDLINE | ID: mdl-32489519

ABSTRACT

Recent advances in deep learning have achieved promising performance for medical image analysis, but in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. There is therefore significant interest in learning representations from unlabelled raw data. In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation. We assume that, in order to learn such a representation, the model should identify anatomical structures in the unlabelled data. We therefore force the model to address anatomy-aware tasks with free supervision from the data itself. Specifically, the model is designed to correct the order of a reshuffled video clip and, at the same time, predict the geometric transformation applied to the clip. Experiments on fetal ultrasound video show that the proposed approach can effectively learn meaningful and strong representations, which transfer well to downstream tasks such as standard plane detection and saliency prediction.
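
A toy label generator for such a pretext task, shuffling sub-clips and applying a geometric transform with both choices serving as prediction targets, might look as follows; the sub-clip count and the transform set are illustrative assumptions.

import itertools
import random
import torch

ORDERS = list(itertools.permutations(range(4)))        # all orderings of 4 sub-clips
TRANSFORMS = ["identity", "hflip", "rot90", "rot270"]   # assumed transform set

def make_pretext_sample(clip):
    # clip: (T, C, H, W) tensor, T assumed divisible by 4.
    # Returns the transformed, reshuffled clip plus the two pretext labels
    # (ordering index and transform index) the network must predict.
    chunks = list(torch.chunk(clip, 4, dim=0))
    order_label = random.randrange(len(ORDERS))
    shuffled = torch.cat([chunks[i] for i in ORDERS[order_label]], dim=0)
    tfm_label = random.randrange(len(TRANSFORMS))
    if TRANSFORMS[tfm_label] == "hflip":
        shuffled = torch.flip(shuffled, dims=[-1])
    elif TRANSFORMS[tfm_label] == "rot90":
        shuffled = torch.rot90(shuffled, k=1, dims=(-2, -1))
    elif TRANSFORMS[tfm_label] == "rot270":
        shuffled = torch.rot90(shuffled, k=3, dims=(-2, -1))
    return shuffled, order_label, tfm_label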

13.
Article in English | MEDLINE | ID: mdl-32365031

ABSTRACT

In this paper, we propose a deep CNN that tackles the image restoration problem by learning formatted information. Previous deep learning-based methods directly learn the mapping from corrupted images to clean images and may suffer from the gradient exploding/vanishing problems of deep neural networks. We propose to address the image restoration problem by learning the structured details and recovering the latent clean image together, from the information shared between the corrupted image and the latent image. In addition, instead of learning the pure difference (the corruption), we propose to add a residual formatting layer and an adversarial block that format the information into a structured form, which allows the network to converge faster and boosts performance. Furthermore, we propose a cross-level loss network to ensure both pixel-level accuracy and semantic-level visual quality. Evaluations on public datasets show that the proposed method performs favorably against existing approaches both quantitatively and qualitatively.
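
The cross-level idea, combining a pixel-level term with a semantic-level term, is commonly realised by adding a perceptual loss on frozen classifier features; the sketch below is such a generic combination given as an assumption, not the paper's specific loss network.

import torch
import torch.nn as nn
import torchvision.models as models

class CrossLevelLoss(nn.Module):
    # Pixel-level L1 term plus a semantic (perceptual) term computed on frozen
    # ImageNet-pretrained VGG features (downloaded on first use).
    def __init__(self, pixel_weight=1.0, feat_weight=0.1):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.pixel_weight = pixel_weight
        self.feat_weight = feat_weight

    def forward(self, restored, clean):
        # restored, clean: (B, 3, H, W) images in the network's working range.
        pixel = nn.functional.l1_loss(restored, clean)
        feat = nn.functional.mse_loss(self.vgg(restored), self.vgg(clean))
        return self.pixel_weight * pixel + self.feat_weight * feat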

14.
Article in English | MEDLINE | ID: mdl-31944972

ABSTRACT

Image denoising and high-level vision tasks are usually handled independently in conventional computer vision practice, and their connection is fragile. In this paper, we address the two jointly and explore the mutual influence between them, focusing on two questions: (1) how image denoising can help improve high-level vision tasks, and (2) how the semantic information from high-level vision tasks can be used to guide image denoising. First, for image denoising, we propose a convolutional neural network in which convolutions are conducted at various spatial resolutions via downsampling and upsampling operations, in order to fuse and exploit contextual information at different scales. Second, we propose a deep neural network solution that cascades two modules, one for image denoising and one for various high-level tasks, and uses the joint loss to update only the denoising network via backpropagation. We show experimentally that, on the one hand, the proposed denoiser is general enough to overcome the performance degradation of different high-level vision tasks, and on the other hand, with the guidance of high-level vision information, the denoising network produces more visually appealing results. Extensive experiments demonstrate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online: https://github.com/Ding-Liu/DeepDenoising.
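
The cascaded training scheme, back-propagating a joint reconstruction-plus-task loss while updating only the denoiser, can be sketched as below; the optimizer is assumed to be built over the denoiser's parameters only, and the weighting alpha is illustrative.

import torch
import torch.nn as nn

def joint_training_step(denoiser, task_net, noisy, clean, labels,
                        optimizer, alpha=0.1):
    # One step of the cascade: the denoiser receives a reconstruction loss plus
    # the downstream task loss back-propagated through a frozen high-level
    # network (e.g. a classifier). Only the denoiser's parameters are updated.
    for p in task_net.parameters():
        p.requires_grad_(False)
    denoised = denoiser(noisy)
    recon_loss = nn.functional.mse_loss(denoised, clean)
    task_loss = nn.functional.cross_entropy(task_net(denoised), labels)
    loss = recon_loss + alpha * task_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()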

15.
Article in English | MEDLINE | ID: mdl-31603784

ABSTRACT

In this paper, we propose a novel method to generate stereoscopic images from light-field images with an intended depth range while simultaneously performing image super-resolution. Owing to the small baseline between neighboring sub-aperture views and the low spatial resolution of light-field images captured with compact commercial light-field cameras, the disparity range between any two sub-aperture views is usually very small. We propose a method to control the disparity range of the target stereoscopic images with linear or nonlinear disparity scaling, and we properly resolve the disocclusion problem with the aid of a smoothness energy term previously used for texture synthesis. The left and right views of the target stereoscopic image are generated simultaneously by a unified optimization framework, which preserves content coherence between the left and right views through a coherence energy term. The disparity range of the target stereoscopic image can be larger than that of the input light-field image. This benefits many light-field-image-based applications, e.g., displaying light-field images on various stereo display devices and generating stereoscopic panoramic images from a light-field image montage. An extensive experimental evaluation demonstrates the effectiveness of our method.
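
Disparity scaling of the kind mentioned above maps the small source disparity range onto the desired output range; a minimal sketch follows, where the particular nonlinear curve is an illustrative assumption rather than the paper's mapping.

import numpy as np

def scale_disparity(disparity, target_range, nonlinear=False):
    # disparity: array of per-pixel disparities from the light-field views.
    # target_range: (lo, hi) disparity range desired for the output stereo pair.
    d_min, d_max = disparity.min(), disparity.max()
    t = (disparity - d_min) / max(d_max - d_min, 1e-8)   # normalise to [0, 1]
    if nonlinear:
        t = t ** 0.75                                     # hypothetical nonlinear remapping
    lo, hi = target_range
    return lo + t * (hi - lo)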
