Results 1 - 8 of 8

1.
Article in English | MEDLINE | ID: mdl-38743544

ABSTRACT

Early action prediction, which aims to recognize the class of an action before it is fully conveyed, is a very challenging task owing to the insufficient discriminative information caused by the domain gaps among different temporally observed domains. Most existing approaches use fully observed temporal domains to "guide" the partially observed domains while ignoring the discrepancies between the harder low-observed temporal domains and the easier highly observed temporal domains. Recognition models therefore tend to learn the easier samples from the highly observed temporal domains, which may lead to significant performance drops on low-observed temporal domains. In this article, we propose a novel temporally observed domain contrastive network, TODO-Net, that explicitly mines discriminative information from the hard action samples in the low-observed temporal domains by mitigating the domain gaps among the various temporally observed domains for 3-D early action prediction. More specifically, TODO-Net mines the relationship between the low-observed sequences and all the highly observed sequences belonging to the same action category to boost recognition performance on hard samples with fewer observed frames. We also introduce a temporal domain conditioned supervised contrastive (TD-conditioned SupCon) learning scheme that enables TODO-Net to minimize the gaps between temporal domains within the same action category while pushing apart temporal domains belonging to different action classes. We conduct extensive experiments on two public 3-D skeleton-based activity datasets, and the results show the efficacy of the proposed TODO-Net.
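
The abstract above does not spell out the TD-conditioned SupCon objective, so the following is only a minimal sketch of a generic supervised contrastive loss, not the authors' formulation. It assumes that embeddings of the same action class (regardless of how much of the sequence was observed) are treated as positives; the function names and the temperature value are illustrative assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not TD-conditioned).

    For each anchor, samples with the same label are positives and all other
    samples in the batch form the contrast set.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In this sketch a low-observed and a highly observed sequence of the same action would simply share a label, so minimizing the loss pulls them together and pushes other classes away.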

2.
IEEE Trans Image Process ; 32: 3254-3265, 2023.
Article in English | MEDLINE | ID: mdl-37256800

ABSTRACT

Early activity prediction/recognition aims to recognize action categories before they are fully conveyed. Compared to full-length action sequences, partial video sequences provide insufficient discriminative information, which makes predicting the class labels of some similar activities challenging, especially when only very few frames can be observed. To address this challenge, in this paper, we propose a novel meta negative network, Magi-Net, that uses a contrastive learning scheme to alleviate the insufficiency of discriminative information. In our Magi-Net model, the positive samples are generated by augmenting an input anchor conditioned on all observation ratios, while the negative samples are selected from a trainable negative look-up memory (LUM) table, which stores the training samples and the corresponding misleading categories. Furthermore, a meta negative sample optimization strategy (MetaSOS) is proposed to boost the training of Magi-Net by encouraging the model to learn from the most informative negative samples via a meta learning scheme. Extensive experiments are conducted on several public skeleton-based activity datasets, and the results show the efficacy of the proposed Magi-Net model.
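
To make the notion of an observation ratio concrete, here is a small sketch that cuts a full sequence into partially observed prefixes. It is not the Magi-Net augmentation itself, which the abstract does not detail; the function name and the default ratios (given as integer percentages) are assumptions for illustration.

```python
def partial_views(sequence, ratios_percent=(20, 40, 60, 80, 100)):
    """Return the observed prefix of a sequence for each observation ratio.

    Ratios are integer percentages; each prefix length is rounded up and
    is at least one frame.
    """
    views = {}
    for pct in ratios_percent:
        # integer ceiling division avoids floating-point rounding surprises
        length = max(1, (len(sequence) * pct + 99) // 100)
        views[pct] = sequence[:length]
    return views


# Usage: ten frames, each represented here by its index
frames = list(range(10))
views = partial_views(frames)
```

Each prefix could then serve as one partially observed view of the same anchor sequence.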

3.
Sensors (Basel) ; 19(12), 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31200511

ABSTRACT

The You Only Look Once (YOLO) deep network detects objects quickly with high precision and has been successfully applied to many detection problems. Its main shortcoming is that it usually cannot achieve high precision when detecting small objects in high-resolution images. To overcome this problem, we propose an effective region proposal extraction method for the YOLO network, which together form a complete detection structure named ACF-PR-YOLO, and we use the cyclist detection problem to demonstrate our method. Instead of directly using the generated region proposals for classification or regression, as most region proposal methods do, we generate large potential regions containing objects for the subsequent deep network. The proposed ACF-PR-YOLO structure includes three main parts. First, a region proposal extraction method based on aggregated channel features (ACF) is proposed, called the ACF-based region proposal (ACF-PR) method. In ACF-PR, ACF is first used to quickly extract candidates, and then a bounding-box merging and extending method merges the bounding boxes into correct region proposals for the subsequent YOLO net. Second, we design a suitable YOLO net for fine detection in the region proposals generated by ACF-PR. Last, we design a post-processing step in which the results of the YOLO net are mapped back into the original image to output the detection and localization results. Experiments on the Tsinghua-Daimler Cyclist Benchmark, which contains high-resolution images and complex scenes, show that the proposed method outperforms the other tested representative detection methods in average precision: it outperforms YOLOv3 by 13.69% and SSD by 25.27% in average precision.
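
The merge, extend, and map-back steps described above can be sketched as follows. The abstract does not give the actual merging rule or margin, so this sketch assumes that overlapping candidate boxes are replaced by their union, that regions are grown by a fixed pixel margin, and that boxes are `(x1, y1, x2, y2)` tuples; all function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_boxes(boxes):
    """Repeatedly replace overlapping boxes by their union (assumed rule)."""
    merged = []
    for box in boxes:
        box = tuple(box)
        changed = True
        while changed:
            changed = False
            for other in merged:
                if overlaps(box, other):
                    merged.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
        merged.append(box)
    return merged


def extend_box(box, margin, width, height):
    """Grow a box by a margin on every side, clipped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def to_image_coords(detection, region):
    """Shift a detection found inside a region back to full-image coordinates."""
    dx, dy = region[0], region[1]
    return (detection[0] + dx, detection[1] + dy,
            detection[2] + dx, detection[3] + dy)
```

A fine detector would run on each extended region, and `to_image_coords` would place its output back in the original high-resolution frame.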

4.
Healthc Technol Lett ; 6(6): 275-279, 2019 Dec.
Article in English | MEDLINE | ID: mdl-32038871

ABSTRACT

Surgical instrument detection in robot-assisted surgery videos is an important vision component for these systems. Most of the current deep learning methods focus on single-tool detection and suffer from low detection speed. To address this, the authors propose a novel frame-by-frame detection method using a cascading convolutional neural network (CNN) which consists of two different CNNs for real-time multi-tool detection. An hourglass network and a modified visual geometry group (VGG) network are applied to jointly predict the localisation. The former CNN outputs detection heatmaps representing the location of tool tip areas, and the latter performs bounding-box regression for tool tip areas on these heatmaps stacked with input RGB image frames. The authors' method is tested on the publicly available EndoVis Challenge dataset and the ATLAS Dione dataset. The experimental results show that their method achieves better performance than mainstream detection methods in terms of detection accuracy and speed.
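
As a rough illustration of how a detection heatmap can indicate several tool tip areas at once, the sketch below picks strict local maxima above a threshold. This is an assumed, simplified read-out for illustration only; in the method described above the second CNN performs bounding-box regression rather than peak picking, and the function name and threshold are hypothetical.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col, value) for cells that exceed the threshold and are
    strictly larger than all of their 8-connected neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
                if (i, j) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks
```

Each returned peak would correspond to one candidate tool tip area in the frame.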

5.
Sensors (Basel) ; 18(7), 2018 Jul 22.
Article in English | MEDLINE | ID: mdl-30037154

ABSTRACT

With its fast computation and relatively high accuracy, the AdaBoost-based detection framework has been successfully applied in some real applications of machine vision-based intelligent systems. Its main shortcoming is that a detector trained off-line cannot be retrained through transfer learning to adapt to unknown application scenes. In this paper, a new transfer learning structure based on two novel methods, supplemental boosting and a cascaded ConvNet, is proposed to address this shortcoming. The supplemental boosting method supplementally retrains an AdaBoost-based detector so that the detector can be transferred to unknown application scenes. The cascaded ConvNet is attached to the end of the AdaBoost-based detector to improve the detection rate and to collect supplemental training samples. With the supplemental training samples provided by the cascaded ConvNet, the AdaBoost-based detector can be retrained with the supplemental boosting method. The detector that combines the retrained boosted detector with the cascaded ConvNet detector can achieve high accuracy and a short detection time. As a representative object detection problem in intelligent transportation systems, traffic sign detection is chosen to demonstrate our method. Through experiments on public datasets from different countries, we show that the proposed framework can quickly detect objects in unknown application scenes.
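
The two-stage arrangement described above, a fast boosted detector followed by a ConvNet, can be sketched as a simple cascade. The abstract does not state how supplemental samples are chosen, so this sketch assumes that every window passed by the boosted stage is recorded together with the ConvNet's decision; the scorer callables, thresholds, and function name are all hypothetical.

```python
def cascade_detect(windows, boosted_score, convnet_score,
                   boosted_threshold=0.0, convnet_threshold=0.5):
    """Run a cheap boosted stage first and a ConvNet stage on its survivors.

    Returns the final detections and, as an assumed collection rule, the
    (window, convnet_decision) pairs that could serve as supplemental
    training samples for retraining the boosted stage.
    """
    detections, candidates = [], []
    for window in windows:
        if boosted_score(window) < boosted_threshold:
            continue  # rejected cheaply; the ConvNet never sees this window
        is_object = convnet_score(window) >= convnet_threshold
        candidates.append((window, is_object))
        if is_object:
            detections.append(window)
    return detections, candidates
```

Because most windows are rejected by the first stage, the slower ConvNet only runs on a small fraction of the image, which is consistent with the short detection time the abstract reports.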

6.
Comput Assist Surg (Abingdon) ; 22(sup1): 26-35, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28937281

ABSTRACT

BACKGROUND: The worldwide spread of minimally invasive surgery (MIS) is hindered by its drawback of indirect observation and manipulation, and monitoring the surgical instruments moving inside the operated body, as surgeons require, is a challenging problem. Tracking surgical instruments with vision-based methods is quite attractive because it can be implemented flexibly through software-based control, with no need to modify the instruments or the surgical workflow. METHODS: A MIS instrument is conventionally split into shaft and end-effector portions. A 2D/3D tracking-by-detection framework is proposed that performs shaft tracking followed by end-effector tracking. The shaft is described by line features obtained via the RANSAC scheme, while the end-effector is depicted by special image features learned by a well-trained convolutional neural network. RESULTS: The method is verified in its 2D and 3D formulations through experiments on ex-vivo video sequences, and qualitative validation is obtained on in-vivo video sequences. CONCLUSION: The proposed method provides robust and accurate tracking, which is confirmed by the experimental results: its 3D performance on ex-vivo video sequences exceeds that of the available state-of-the-art methods. Moreover, the experiments on in-vivo sequences demonstrate that the proposed method can handle the difficult condition of tracking with unknown camera parameters. Further refinements of the method will address occlusion and multi-instrument MIS applications.
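
The shaft description via RANSAC mentioned above can be illustrated with a basic 2D line fit. This is a textbook RANSAC sketch rather than the authors' pipeline; it assumes the input is a list of distinct `(x, y)` edge points, and the iteration count, tolerance, and seed are arbitrary illustrative values.

```python
import math
import random


def ransac_line(points, iterations=200, tolerance=1.0, seed=0):
    """Fit a line a*x + b*y + c = 0 (with a^2 + b^2 = 1) by basic RANSAC.

    Returns the best line and the points within `tolerance` of it.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # the two samples coincide, so no line is defined
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # perpendicular distance of each point to the candidate line
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers
```

Points far from the dominant line, such as edges from tissue or other structures, end up outside the inlier set and do not bias the estimated shaft direction.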


Subject(s)
Deep Learning ; Imaging, Three-Dimensional ; Minimally Invasive Surgical Procedures/instrumentation ; Neural Networks, Computer ; Surgical Instruments ; Algorithms ; Endoscopes ; Humans ; Laparoscopes ; Minimally Invasive Surgical Procedures/methods
7.
J Opt Soc Am A Opt Image Sci Vis ; 33(8): 1430-41, 2016 Aug 01.
Article in English | MEDLINE | ID: mdl-27505640

ABSTRACT

The nonsubsampled contourlet transform (NSCT) has the properties of multiresolution, localization, directionality, and anisotropy. The directionality property permits it to resolve intrinsic directional features that characterize the analyzed image. In this paper, we present a bottom-up salient object detection approach based on NSCT that fuses global and local information. Images are first decomposed by applying NSCT. The coefficients of the bandpass subbands are then categorized and optimized to obtain a better representation, and feature maps are obtained by performing the inverse NSCT on these optimized coefficients. The global and local saliency maps are generated from these feature maps. Global saliency is obtained from the likelihood of features, and local saliency is measured by calculating the local self-information. Finally, the saliency map is computed by fusing the global and local saliency maps. Experimental results on MSRA 10K demonstrate the effectiveness and promising performance of our proposed method.
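
Local self-information, as used above for local saliency, is the quantity -log p(f), where p(f) is how probable a feature value is within its neighbourhood. The sketch below computes it on a small quantized feature map; the square window, its radius, and the histogram-based probability estimate are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def local_self_information(feature_map, radius=1):
    """Per-cell self-information -log p, where p is the relative frequency of
    the cell's value inside a square window clipped to the map borders."""
    rows, cols = len(feature_map), len(feature_map[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                feature_map[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            probability = Counter(window)[feature_map[r][c]] / len(window)
            saliency[r][c] = -math.log(probability)
    return saliency
```

A value that is rare in its surroundings gets a high score, while a value shared by the whole neighbourhood scores zero.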

8.
Biomed Res Int ; 2014: 923260, 2014.
Article in English | MEDLINE | ID: mdl-24967415

ABSTRACT

Changes in arterial pressure waveform characteristics have been accepted as risk indicators of cardiovascular diseases. Waveform modelling using Gaussian functions has been used to decompose arterial pressure pulses into different numbers of subwaves and hence quantify waveform characteristics. However, the fitting accuracy and computational efficiency of current modelling approaches need to be improved. This study aimed to develop a novel two-stage particle swarm optimizer (TSPSO) to determine the optimal parameters of the Gaussian functions. The evaluation was performed on carotid and radial artery pressure waveforms (CAPW and RAPW), which were recorded simultaneously from twenty normal volunteers. The fitting accuracy and computational efficiency of our TSPSO were compared with three published optimization methods: the Nelder-Mead method, the modified PSO (MPSO), and the dynamic multiswarm particle swarm optimizer (DMS-PSO). The results showed that TSPSO achieved the best fitting accuracy, with a mean absolute error (MAE) of 1.1% for CAPW and 1.0% for RAPW, in comparison with 4.2% and 4.1% for Nelder-Mead, 2.0% and 1.9% for MPSO, and 1.2% and 1.1% for DMS-PSO. In addition, to achieve a target MAE of 2.0%, the computation time of TSPSO was only 1.5 s, which was only 20% and 30% of that for MPSO and DMS-PSO, respectively.
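
The fitting problem described above can be made concrete with a sketch of the model and the error an optimizer would minimize. It assumes the common parameterization of each subwave as amplitude, centre, and width, and it computes a plain MAE; the abstract does not state the exact Gaussian form or how the MAE is normalized to a percentage, and the optimizer itself (TSPSO) is not reproduced here.

```python
import math


def gaussian_sum(t, params):
    """Evaluate a sum of Gaussian subwaves at time t.

    `params` is a list of (amplitude, centre, width) triples (assumed form).
    """
    return sum(
        amplitude * math.exp(-((t - centre) ** 2) / (2 * width ** 2))
        for amplitude, centre, width in params
    )


def mean_absolute_error(samples, params):
    """Mean absolute difference between recorded (t, y) samples and the model."""
    return sum(abs(y - gaussian_sum(t, params)) for t, y in samples) / len(samples)


# Usage: a synthetic two-subwave pulse sampled at 50 points
true_params = [(1.0, 0.2, 0.05), (0.5, 0.5, 0.1)]
samples = [(i / 50, gaussian_sum(i / 50, true_params)) for i in range(50)]
error = mean_absolute_error(samples, [(0.9, 0.2, 0.05), (0.5, 0.55, 0.1)])
```

Any optimizer, including a particle swarm, would search over the parameter triples to drive this error toward zero.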


Subject(s)
Arterial Pressure/physiology ; Models, Cardiovascular ; Adult ; Female ; Humans ; Male ; Middle Aged ; Normal Distribution
...