Results 1 - 10 of 10
1.
Article in English | MEDLINE | ID: mdl-38683713

ABSTRACT

Crowd localization aims to predict the positions of humans in images of crowded scenes. While existing methods have made significant progress, two primary challenges remain: (i) a fixed number of evenly distributed anchors can cause excessive or insufficient predictions across regions of an image with varying crowd densities, and (ii) ranking inconsistency of predictions between the training and testing phases leaves the model sub-optimal at inference. To address these issues, we propose a Consistency-Aware Anchor Pyramid Network (CAAPN) comprising two key components: an Adaptive Anchor Generator (AAG) and a Localizer with Augmented Matching (LAM). The AAG module adaptively generates anchors based on the estimated crowd density in local regions to alleviate anchor deficiency or excess. It also exploits the spatial distribution prior of human heads for better performance. The LAM module augments the predictions used to optimize the network during training by introducing an extra set of target candidates and correctly matching them to the ground truth. The proposed method achieves favorable performance against state-of-the-art approaches on five challenging datasets: ShanghaiTech A and B, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd. The source code and trained models will be released at https://github.com/ucasyan/CAAPN.
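The density-proportional idea behind adaptive anchor generation can be sketched in a few lines. This is an illustrative sketch only, not the paper's AAG module: the function name `allocate_anchors`, the region-wise density input, and the simple proportional rounding rule are all assumptions made here for illustration.

```python
def allocate_anchors(density, total):
    """Split a `total` anchor budget across image regions in proportion
    to their estimated crowd density (assumes at least one region has
    non-zero density)."""
    s = sum(density) or 1.0
    raw = [total * d / s for d in density]
    counts = [int(r) for r in raw]
    # distribute the rounding remainder to regions with the largest
    # fractional part so the budget is used exactly
    rem = total - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:rem]:
        counts[i] += 1
    return counts
```

A region with twice the estimated density receives roughly twice the anchors, avoiding both anchor excess in sparse regions and anchor deficiency in dense ones.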

2.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1049-1064, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37878438

ABSTRACT

Video captioning aims to generate natural language descriptions for a given video clip. Existing methods mainly focus on end-to-end representation learning via word-by-word comparison between predicted captions and ground-truth texts. Although significant progress has been made, such supervised approaches neglect semantic alignment between visual and linguistic entities, which may negatively affect the generated captions. In this work, we propose a hierarchical modular network that bridges video representations and linguistic semantics at four granularities before generating captions: entity, verb, predicate, and sentence. Each level is implemented by one module that embeds the corresponding semantics into video representations. Additionally, we present a reinforcement learning module based on the scene graph of captions to better measure sentence similarity. Extensive experimental results show that the proposed method performs favorably against state-of-the-art models on three widely used benchmark datasets: the Microsoft Research Video Description Corpus (MSVD), MSR-Video to Text (MSR-VTT), and VATEX.

3.
IEEE Trans Image Process ; 33: 1726-1739, 2024.
Article in English | MEDLINE | ID: mdl-37463088

ABSTRACT

Visual attention advances object detection by focusing neural networks on object representations. While existing methods incorporate empirical modules to empower network attention, in this work we rethink attentive object detection from the network learning perspective. We propose a NEural Attention Learning approach (NEAL) consisting of two parts. During back-propagation in each training iteration, we first calculate the partial derivatives (a.k.a. the accumulated gradients) of the classification output with respect to the input features. We refine these partial derivatives to obtain attention response maps whose elements reflect the contributions of the features to the final network predictions. Then, we formulate the attention response maps as extra objective functions, which are combined with the original detection loss to train detectors end-to-end. In this way, we learn an attentive CNN model without introducing additional network structures. We apply NEAL to two-stage object detection frameworks, which are usually composed of a CNN feature backbone, a region proposal network (RPN), and a classifier. We show that NEAL not only helps the RPN attend to objects but also enables the classifier to pay more attention to the premier positive samples. As a result, localization (proposal generation) and classification mutually benefit from each other in our method. Extensive experiments on large-scale benchmark datasets, including MS COCO 2017 and Pascal VOC 2012, demonstrate that NEAL advances the two-stage object detector over state-of-the-art approaches.
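For a linear classification score the gradient-derived attention idea reduces to weighting each feature by its partial derivative. The toy sketch below illustrates only that reduction; the function `attention_map` and the linear classifier are hypothetical stand-ins, not NEAL's actual CNN formulation or refinement step.

```python
def attention_map(features, weights):
    """For a linear score s = sum(w_i * f_i), the partial derivative
    ds/df_i is w_i, so feature i contributes w_i * f_i to the output.
    Normalizing those contributions gives a crude attention map."""
    contrib = [w * f for w, f in zip(weights, features)]
    m = max(abs(c) for c in contrib) or 1.0  # avoid division by zero
    return [c / m for c in contrib]  # scaled to [-1, 1]
```

In a CNN the same quantity is obtained by back-propagating the class score to an intermediate feature map; here the elements with the largest (gradient x activation) products mark the regions that drive the prediction.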

4.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8524-8537, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018268

ABSTRACT

Recent works attempt to employ pre-training in Vision-and-Language Navigation (VLN). However, these methods neglect the importance of historical context or ignore predicting future actions during pre-training, limiting the learning of visual-textual correspondence and the capability of decision-making. To address these problems, we present a history-enhanced and order-aware pre-training paradigm with complementary fine-tuning (HOP+) for VLN. Specifically, besides the common Masked Language Modeling (MLM) and Trajectory-Instruction Matching (TIM) tasks, we design three novel VLN-specific proxy tasks: Action Prediction with History (APH), Trajectory Order Modeling (TOM), and Group Order Modeling (GOM). The APH task takes the visual perception trajectory into account to enhance the learning of historical knowledge as well as action prediction. The two temporal visual-textual alignment tasks, TOM and GOM, further improve the agent's ability to reason about order. Moreover, we design a memory network to address the representation inconsistency of history context between the pre-training and fine-tuning stages. The memory network effectively selects and summarizes historical information for action prediction during fine-tuning, without incurring significant extra computation for downstream VLN tasks. HOP+ achieves new state-of-the-art performance on four downstream VLN tasks (R2R, REVERIE, RxR, and NDH), which demonstrates the effectiveness of the proposed method.

5.
Article in English | MEDLINE | ID: mdl-32941139

ABSTRACT

Convolutional neural networks (CNNs) have achieved great success in several face-related tasks, such as face detection, alignment, and recognition. As a fundamental problem in computer vision, face tracking plays a crucial role in various applications, such as video surveillance, human emotion detection, and human-computer interaction. However, few CNN-based approaches have been proposed for face (bounding box) tracking. In this paper, we propose a face tracking method based on Siamese CNNs, which takes advantage of the powerful representations of hierarchical CNN features learned from massive face images. The proposed method captures discriminative face information at both local and global levels. At the local level, representations for attribute patches (i.e., eyes, nose, and mouth) are learned to distinguish one face from another; these are robust to pose changes and occlusions. At the global level, representations for each whole face are learned, which take into account the spatial relationships among local patches and facial characteristics, such as skin color and nevi. In addition, we build a new large-scale, challenging face tracking dataset to evaluate face tracking methods and to advance research in this field. Extensive experiments on the collected dataset demonstrate the effectiveness of our method in comparison to several state-of-the-art visual tracking methods.

6.
Sensors (Basel) ; 20(9)2020 May 09.
Article in English | MEDLINE | ID: mdl-32397421

ABSTRACT

The dynamic time warping (DTW) algorithm is widely used in pattern matching and sequence alignment tasks, including speech recognition and time series clustering. However, DTW algorithms perform poorly when aligning sequences of uneven sampling frequencies. This makes it difficult to apply DTW to practical problems, such as aligning signals that are recorded simultaneously by sensors with different, uneven, and dynamic sampling frequencies. As multi-modal sensing technologies become increasingly popular, it is necessary to develop methods for high-quality alignment of such signals. Here we propose a DTW algorithm called EventDTW, which uses information propagated from defined events as the basis for path matching and hence sequence alignment. We have developed two metrics, the error rate (ER) and the singularity score (SS), to define and evaluate alignment quality and to enable comparison of performance across DTW algorithms. We demonstrate the utility of these metrics on 84 publicly available signals in addition to our own multi-modal biomedical signals. EventDTW outperformed existing DTW algorithms for optimal alignment of signals with different sampling frequencies in 37% of artificial signal alignment tasks and 76% of real-world signal alignment tasks.
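For context, the classic DTW recurrence that EventDTW builds on can be written directly. This is textbook DTW only; the event-propagation step that defines EventDTW is not reproduced here.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping: minimum cumulative cost of
    aligning sequence a to sequence b under pairwise distance `dist`."""
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = best cost of aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            # step options: insertion, deletion, or match
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because each element may match multiple elements of the other sequence, identical signals sampled at different rates can still align with zero cost; the failure mode EventDTW targets arises when sampling rates vary within a sequence.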


Subject(s)
Algorithms, Biomedical Technology, Time
7.
J Clin Transl Sci ; 5(1): e19, 2020 Jul 14.
Article in English | MEDLINE | ID: mdl-33948242

ABSTRACT

INTRODUCTION: Digital health is rapidly expanding due to surging healthcare costs, deteriorating health outcomes, and the growing prevalence and accessibility of mobile health (mHealth) and wearable technology. Data from Biometric Monitoring Technologies (BioMeTs), including mHealth and wearables, can be transformed into digital biomarkers that act as indicators of health outcomes and can be used to diagnose and monitor a number of chronic diseases and conditions. There are many challenges faced by digital biomarker development, including a lack of regulatory oversight, limited funding opportunities, general mistrust of sharing personal data, and a shortage of open-source data and code. Further, the process of transforming data into digital biomarkers is computationally expensive, and standards and validation methods in digital biomarker research are lacking. METHODS: In order to provide a collaborative, standardized space for digital biomarker research and validation, we present the first comprehensive, open-source software platform for end-to-end digital biomarker development: The Digital Biomarker Discovery Pipeline (DBDP). RESULTS: Here, we detail the general DBDP framework as well as three robust modules within the DBDP that have been developed for specific digital biomarker discovery use cases. CONCLUSIONS: The clear need for such a platform will accelerate the DBDP's adoption as the industry standard for digital biomarker development and will support its role as the epicenter of digital biomarker collaboration and exploration.

8.
IEEE Trans Pattern Anal Mach Intell ; 41(5): 1116-1130, 2019 05.
Article in English | MEDLINE | ID: mdl-29993908

ABSTRACT

Convolutional Neural Networks (CNNs) have been applied to visual tracking with demonstrated success in recent years. Most CNN-based trackers utilize hierarchical features extracted from a certain layer to represent the target. However, features from a single layer are not always effective for distinguishing the target object from the background, especially in the presence of complicated interfering factors (e.g., heavy occlusion, background clutter, illumination variation, and shape deformation). In this work, we propose a CNN-based tracking algorithm that hedges deep features from different CNN layers to better distinguish target objects from background clutter. Correlation filters are applied to the feature maps of each CNN layer to construct a weak tracker, and all weak trackers are hedged into a strong one. For robust visual tracking, we propose a hedge method that adaptively determines the weights of the weak trackers by considering both the difference between the historical and instantaneous performance of each weak tracker and the differences among all weak trackers over time. In addition, we design a Siamese network to define the loss of each weak tracker for the proposed hedge method. Extensive experiments on large benchmark datasets demonstrate the effectiveness of the proposed algorithm against state-of-the-art tracking methods.
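The hedging of weak trackers echoes the classic Hedge (multiplicative-weights) rule, in which each expert's weight decays exponentially with its cumulative loss. The sketch below shows only that standard rule; the paper's adaptive variant with historical/instantaneous performance differences and the Siamese loss is not reproduced, and the function names are assumptions.

```python
import math

def hedge_weights(cum_losses, eta=1.0):
    """Standard Hedge: weight of expert i is proportional to
    exp(-eta * cumulative_loss_i), normalized to sum to 1."""
    m = min(cum_losses)  # subtract the minimum for numerical stability
    w = [math.exp(-eta * (l - m)) for l in cum_losses]
    s = sum(w)
    return [x / s for x in w]

def hedge_combine(expert_preds, weights):
    """Combine per-expert predictions (e.g., response-map scores from
    per-layer weak trackers) as a weighted average."""
    return sum(w * p for w, p in zip(weights, expert_preds))
```

Experts (here, per-layer weak trackers) that accumulate less loss dominate the combined prediction, while consistently poor experts are exponentially down-weighted.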


Subject(s)
Image Processing, Computer-Assisted/methods, Neural Networks, Computer, Algorithms, Humans, Video Recording
9.
Sci Rep ; 8(1): 14098, 2018 09 20.
Article in English | MEDLINE | ID: mdl-30237527

ABSTRACT

Epithelial-mesenchymal transition (EMT) is one of the most important mechanisms in the initiation and promotion of cancer cell metastasis. The phosphoinositide 3-kinase (PI3K) signaling pathway has been demonstrated to be involved in TGF-β-induced EMT, but the complicated TGF-β signaling network makes it challenging to dissect the role of PI3K in regulating the EMT process. Here, we applied an optogenetically controlled PI3K module (named 'Opto-PI3K'), based on CRY2 and the N-terminus of CIB1 (CIBN), to rapidly and reversibly control endogenous PI3K activity in cancer cells with light. By precisely modulating the kinetics of PI3K activation, we found that E-cadherin is an important downstream target of PI3K signaling. Compared with TGF-β treatment, Opto-PI3K had a more potent effect in down-regulating E-cadherin expression, which was shown to be regulated in a light dose-dependent manner. Surprisingly, sustained PI3K activation induced a partial EMT state in A549 cells that is highly reversible. Furthermore, we demonstrated that Opto-PI3K only partially mimicked the effects of TGF-β on promotion of cell migration in vitro. These results reveal the importance of PI3K signaling in TGF-β-induced EMT and suggest that other TGF-β-regulated signaling pathways are necessary for the full and irreversible promotion of EMT in cancer cells. In addition, our study highlights the great promise of optogenetics in cancer research for mapping input-output relationships in oncogenic pathways.


Subject(s)
Epithelial-Mesenchymal Transition/physiology, Phosphatidylinositol 3-Kinases/metabolism, Cell Movement/drug effects, Epithelial-Mesenchymal Transition/drug effects, HeLa Cells, Humans, Optogenetics, Phosphorylation, Signal Transduction/drug effects, Signal Transduction/physiology, Transforming Growth Factor beta/pharmacology
10.
IEEE Trans Image Process ; 27(8): 3857-3869, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29727271

ABSTRACT

Sparse coding has been applied to visual tracking and related vision problems with demonstrated success in recent years. Existing tracking methods based on local sparse coding sample patches from a target candidate and sparsely encode them using a dictionary of patches sampled from target template images. The discriminative strength of these methods is limited because spatial structure constraints among the template patches are not exploited. To address this problem, we propose a structure-aware local sparse coding algorithm, which encodes a target candidate using templates with both global and local sparsity constraints. For robust tracking, we show that the local regions of a candidate should be encoded only with the corresponding local regions of the target templates that are most similar from the global view. Thus, a more precise and discriminative sparse representation is obtained to account for appearance changes. To alleviate tracking drift, we design an effective template update scheme. Extensive experiments on challenging image sequences demonstrate the effectiveness of the proposed algorithm against numerous state-of-the-art methods.
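The local sparse-coding step at the heart of such trackers is typically an l1-regularized least-squares problem. A minimal sketch using ISTA (iterative shrinkage-thresholding) is given below; this is generic sparse coding over a patch dictionary, not the paper's structure-aware formulation with global and local constraints, and the parameter choices are illustrative.

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=200):
    """Solve min_a 0.5 * ||D a - x||^2 + lam * ||a||_1 via ISTA.
    D: (d x k) dictionary of template patch features (columns).
    x: (d,) candidate patch feature to encode."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ a - x)                      # gradient step
        a = a - g / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a
```

The resulting coefficient vector is sparse: a candidate patch is explained by only a few template patches, and the reconstruction error from those coefficients can serve as a tracking confidence score.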
