1.
IEEE Trans Pattern Anal Mach Intell ; 42(3): 568-579, 2020 03.
Article in English | MEDLINE | ID: mdl-30561340

ABSTRACT

Deep neural networks enable state-of-the-art accuracy on visual recognition tasks such as image classification and object detection. However, modern networks contain millions of learned connections, and the current trend is towards deeper and more densely connected architectures. This poses a challenge to the deployment of state-of-the-art networks on resource-constrained systems, such as smartphones or mobile robots. In general, a more efficient utilization of computation resources would assist in deployment scenarios from embedded platforms to computing clusters running ensembles of networks. In this paper, we propose a deep network compression algorithm that performs weight pruning and quantization jointly, and in parallel with fine-tuning. Our approach takes advantage of the complementary nature of pruning and quantization and recovers from premature pruning errors, which is not possible with two-stage approaches. In experiments on ImageNet, CLIP-Q (Compression Learning by In-Parallel Pruning-Quantization) improves the state-of-the-art in network compression on AlexNet, VGGNet, GoogLeNet, and ResNet. We additionally demonstrate that CLIP-Q is complementary to efficient network architecture design by compressing MobileNet and ShuffleNet, and that CLIP-Q generalizes beyond convolutional networks by compressing a memory network for visual question answering.
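
As a rough illustration of the two operations this abstract combines, the sketch below prunes small-magnitude weights and then snaps the surviving weights to a small set of shared values. It is a one-shot toy, not the CLIP-Q algorithm: the function name `prune_and_quantize`, the fixed magnitude threshold, and the uniform quantization grid are all assumptions, and the fine-tuning that the paper runs in parallel is omitted.

```python
def prune_and_quantize(weights, prune_threshold, num_bits):
    """Zero out small weights, then snap the rest to a uniform grid.

    Hypothetical helper for illustration; the threshold rule and the
    uniform grid are assumptions, not details taken from the paper.
    """
    # Pruning: weights whose magnitude is below the threshold become zero.
    kept = [w for w in weights if abs(w) >= prune_threshold]
    if not kept:
        return [0.0] * len(weights)

    # Quantization: 2**num_bits evenly spaced levels spanning the kept weights.
    lo, hi = min(kept), max(kept)
    levels = 2 ** num_bits
    step = (hi - lo) / (levels - 1) if hi > lo and levels > 1 else 0.0

    def quantize(w):
        if step == 0.0:
            return lo
        return lo + round((w - lo) / step) * step

    return [quantize(w) if abs(w) >= prune_threshold else 0.0 for w in weights]


compressed = prune_and_quantize([0.02, -0.8, 0.45, -0.01, 0.9], 0.05, 2)
```

With 2 bits, the three surviving weights can take at most four distinct values, so each can be stored as a short index into a shared table.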

2.
IEEE Trans Pattern Anal Mach Intell ; 42(5): 1257-1271, 2020 05.
Article in English | MEDLINE | ID: mdl-30668494

ABSTRACT

Visual data such as images and videos contain a rich source of structured semantic labels as well as a wide range of interacting components. Visual content could be assigned with fine-grained labels describing major components, coarse-grained labels depicting high level abstractions, or a set of labels revealing attributes. Such categorization over different, interacting layers of labels evinces the potential for a graph-based encoding of label information. In this paper, we exploit this rich structure for performing graph-based inference in label space for a number of tasks: multi-label image and video classification and action detection in untrimmed videos. We consider the use of the Bidirectional Inference Neural Network (BINN) and Structured Inference Neural Network (SINN) for performing graph-based inference in label space and propose a Long Short-Term Memory (LSTM) based extension for exploiting activity progression on untrimmed videos. The methods were evaluated on (i) the Animal with Attributes (AwA), Scene Understanding (SUN) and NUS-WIDE datasets for multi-label image classification, (ii) the first two releases of the YouTube-8M large scale dataset for multi-label video classification, and (iii) the THUMOS'14 and MultiTHUMOS video datasets for action detection. Our results demonstrate the effectiveness of structured label inference in these challenging tasks, achieving significant improvements against baselines.

3.
PLoS One ; 12(7): e0180318, 2017.
Article in English | MEDLINE | ID: mdl-28678808

ABSTRACT

Falls are a major cause of injuries and deaths in older adults. Even when no injury occurs, about half of all older adults who fall are unable to get up without assistance. The extended period of lying on the floor often leads to medical complications, including muscle damage, dehydration, anxiety and fear of falling. Wearable sensor systems incorporating accelerometers and/or gyroscopes are designed to prevent long lies by automatically detecting and alerting care providers to the occurrence of a fall. Research groups have reported up to 100% accuracy in detecting falls in experimental settings. However, there is a lack of studies examining accuracy in the real-world setting. In this study, we examined the accuracy of a fall detection system based on real-world fall and non-fall data sets. Five young adults and 19 older adults went about their daily activities while wearing tri-axial accelerometers. Older adults experienced 10 unanticipated falls during the data collection. Approximately 400 hours of activities of daily living were recorded. We employed a machine learning algorithm, Support Vector Machine (SVM) classifier, to identify falls and non-fall events. We found that our system was able to detect 8 out of the 10 falls in older adults using signals from a single accelerometer (waist or sternum). Furthermore, our system did not report any false alarm during approximately 28.5 hours of recorded data from young adults. However, with older adults, the false positive rate among individuals ranged from 0 to 0.3 false alarms per hour. While our system showed higher fall detection and substantially lower false positive rate than the existing fall detection systems, there is a need for continuous efforts to collect real-world data within the target population to perform fall validation studies for fall detection systems on bigger real-world fall and non-fall datasets.
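
The two headline figures in this abstract are simple ratios, and the sketch below shows how they could be computed. The function names are illustrative assumptions; the input numbers are the ones reported in the abstract.

```python
def sensitivity(detected_falls, total_falls):
    """Fraction of real falls that the detector flagged."""
    return detected_falls / total_falls


def false_alarms_per_hour(false_alarms, hours_recorded):
    """Average number of false alarms per hour of monitoring."""
    return false_alarms / hours_recorded


# Figures reported in the abstract: 8 of 10 falls detected in older adults,
# and no false alarms in roughly 28.5 hours of young-adult data.
older_adult_sensitivity = sensitivity(8, 10)
young_adult_rate = false_alarms_per_hour(0, 28.5)
```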


Subjects
Accelerometry/methods , Accidental Falls/prevention & control , Activities of Daily Living , Support Vector Machine , Accelerometry/instrumentation , Accelerometry/statistics & numerical data , Adult , Aged , Aged, 80 and over , Algorithms , Health Services for the Aged/statistics & numerical data , Humans , Monitoring, Ambulatory/instrumentation , Monitoring, Ambulatory/methods , Monitoring, Ambulatory/statistics & numerical data , Reproducibility of Results
4.
IEEE Trans Pattern Anal Mach Intell ; 39(9): 1839-1852, 2017 09.
Article in English | MEDLINE | ID: mdl-28114057

ABSTRACT

We propose a probabilistic graphical framework for multi-instance learning (MIL) based on Markov networks. This framework can deal with different levels of labeling ambiguity (i.e., the portion of positive instances in a bag) in weakly supervised data by parameterizing cardinality potential functions. Consequently, it can be used to encode different cardinality-based multi-instance assumptions, ranging from the standard MIL assumption to more general assumptions. In addition, this framework can be efficiently used for both binary and multiclass classification. To this end, an efficient inference algorithm and a discriminative latent max-margin learning algorithm are introduced to train and test the proposed multi-instance Markov network models. We evaluate the performance of the proposed framework on binary and multi-class MIL benchmark datasets as well as two challenging computer vision tasks: cyclist helmet recognition and human group activity recognition. Experimental results verify that encoding the degree of ambiguity in data can improve classification performance.
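
The idea of a cardinality-based multi-instance assumption can be illustrated with a hard-threshold rule, sketched below. This is only a simplification: the paper encodes such assumptions as cardinality potentials in a Markov network with latent instance labels, whereas here the instance labels are given and the hypothetical parameter `min_positive_fraction` stands in for the degree of ambiguity.

```python
def bag_label(instance_labels, min_positive_fraction=0.0):
    """Label a bag from binary instance labels under a cardinality assumption.

    With min_positive_fraction=0.0 this reduces to the standard MIL
    assumption: a bag is positive if at least one instance is positive.
    """
    positives = sum(1 for label in instance_labels if label == 1)
    if positives == 0:
        return 0
    fraction = positives / len(instance_labels)
    return 1 if fraction >= min_positive_fraction else 0
```

For example, `bag_label([0, 0, 1, 0])` is positive under the standard assumption, but the same bag is negative when at least half of the instances are required to be positive.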

5.
Med Biol Eng Comput ; 55(1): 45-55, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27106749

ABSTRACT

Falls are the leading cause of injury-related morbidity and mortality among older adults. Over 90% of hip and wrist fractures and 60% of traumatic brain injuries in older adults are due to falls. Another serious consequence of falls among older adults is the 'long lie' experienced by individuals who are unable to get up and remain on the ground for an extended period of time after a fall. Considerable research has been conducted over the past decade on the design of wearable sensor systems that can automatically detect falls and send an alert to care providers to reduce the frequency and severity of long lies. While most systems described to date incorporate threshold-based algorithms, machine learning algorithms may offer increased accuracy in detecting falls. In the current study, we compared the accuracy of these two approaches in detecting falls by conducting a comprehensive set of falling experiments with 10 young participants. Participants wore waist-mounted tri-axial accelerometers and simulated the most common causes of falls observed in older adults, along with near-falls and activities of daily living. The overall performance of five machine learning algorithms was greater than the performance of five threshold-based algorithms described in the literature, with support vector machines providing the highest combination of sensitivity and specificity.
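
For readers unfamiliar with the threshold-based family mentioned in this abstract, the sketch below shows its simplest form: flag a fall when the acceleration magnitude exceeds a fixed value. It is not one of the five algorithms evaluated in the study; the function names and the 2.5 g default are placeholder assumptions.

```python
import math


def acceleration_magnitude(sample):
    """Euclidean norm of one tri-axial sample (ax, ay, az), in g."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def threshold_fall_detector(samples, impact_threshold_g=2.5):
    """Flag a fall if any sample's magnitude exceeds the threshold.

    The 2.5 g default is a placeholder assumption, not a value from the study.
    """
    return any(acceleration_magnitude(s) > impact_threshold_g for s in samples)


# Toy windows: quiet standing (about 1 g) versus a window with an impact spike.
standing = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
impact = [(0.0, 0.0, 1.0), (2.0, 1.5, 2.0)]
```

A single fixed threshold cannot separate a hard sit-down from a soft fall, which is one reason a learned classifier over several features may do better.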


Subjects
Accelerometry , Accidental Falls , Algorithms , Machine Learning , Signal Processing, Computer-Assisted , Activities of Daily Living , Adult , Humans , Sensitivity and Specificity , Young Adult
6.
Gait Posture ; 39(1): 506-12, 2014.
Article in English | MEDLINE | ID: mdl-24148648

ABSTRACT

Falls are the number one cause of injury in older adults. Lack of objective evidence on the cause and circumstances of falls is often a barrier to effective prevention strategies. Previous studies have established the ability of wearable miniature inertial sensors (accelerometers and gyroscopes) to automatically detect falls, for the purpose of delivering medical assistance. In the current study, we extend the applications of this technology, by developing and evaluating the accuracy of wearable sensor systems for determining the cause of falls. Twelve young adults participated in experimental trials involving falls due to seven causes: slips, trips, fainting, and incorrect shifting/transfer of body weight while sitting down, standing up from sitting, reaching and turning. Features (means and variances) of acceleration data acquired from four tri-axial accelerometers during the falling trials were input to a linear discriminant analysis technique. Data from an array of three sensors (left ankle+right ankle+sternum) provided at least 83% sensitivity and 89% specificity in classifying falls due to slips, trips, and incorrect shift of body weight during sitting, reaching and turning. Classification of falls due to fainting and incorrect shift during rising was less successful across all sensor combinations. Furthermore, similar classification accuracy was observed with data from wearable sensors and a video-based motion analysis system. These results establish a basis for the development of sensor-based fall monitoring systems that provide information on the cause and circumstances of falls, to direct fall prevention strategies at a patient or population level.
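
The feature step described in this abstract (means and variances of acceleration data) could look roughly like the sketch below for a single tri-axial sensor. The function name `window_features` and the use of population variance are assumptions; the abstract does not say which variance estimator or window length was used, and the discriminant analysis step is not shown.

```python
from statistics import mean, pvariance


def window_features(samples):
    """Per-axis mean and variance for a window of (ax, ay, az) samples.

    Returns [mean_x, mean_y, mean_z, var_x, var_y, var_z].
    Population variance is an assumption made for this sketch.
    """
    axes = list(zip(*samples))  # regroup the samples by axis
    return [mean(a) for a in axes] + [pvariance(a) for a in axes]


features = window_features([(0.0, 1.0, 2.0), (2.0, 1.0, 4.0)])
```

Feature vectors from several sensors would then be concatenated before being passed to a classifier.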


Subjects
Accelerometry/methods , Accidental Falls/prevention & control , Algorithms , Posture/physiology , Accelerometry/instrumentation , Adult , Female , Humans , Male , Young Adult
7.
IEEE Trans Pattern Anal Mach Intell ; 35(4): 911-24, 2013 Apr.
Article in English | MEDLINE | ID: mdl-22868650

ABSTRACT

We develop an algorithm for structured prediction with nondecomposable performance measures. The algorithm learns parameters of Markov Random Fields (MRFs) and can be applied to multivariate performance measures. Examples include performance measures such as Fβ score (natural language processing), intersection over union (object category segmentation), Precision/Recall at k (search engines), and ROC area (binary classifiers). We attack this optimization problem by approximating the loss function with a piecewise linear function. The loss augmented inference forms a Quadratic Program (QP), which we solve using LP relaxation. We apply this approach to two tasks: object class-specific segmentation and human action retrieval from videos. We show significant improvement over baseline approaches that either use simple loss functions or simple scoring functions on the PASCAL VOC and H3D Segmentation datasets, and a nursing home action recognition dataset.
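
Two of the measures named in this abstract can be written directly in terms of aggregate counts, which is what makes them nondecomposable: they depend on totals over the whole prediction rather than on a sum of per-item losses. The sketch below uses the standard definitions; the function names are illustrative, and the learning algorithm itself is not shown.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive and false-negative counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def intersection_over_union(tp, fp, fn):
    """Intersection over union (Jaccard index) for one class from the same counts."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```

Changing one pixel's label changes `tp`, `fp`, or `fn`, and the effect on either score depends on the values of the other counts.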


Subjects
Activities of Daily Living/classification , Algorithms , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Artificial Intelligence , Humans , Models, Theoretical , Movement , Nursing Homes , Video Recording
8.
Article in English | MEDLINE | ID: mdl-23367256

ABSTRACT

Falls are the number one cause of injury in older adults. An individual's risk for falls depends on his or her frequency of imbalance episodes, and ability to recover balance following these events. However, there is little direct evidence on the frequency and circumstances of imbalance episodes (near falls) in older adults. Currently, there is rapid growth in the development of wearable fall monitoring systems based on inertial sensors. The utility of these systems would be enhanced by the ability to detect near-falls. In the current study, we conducted laboratory experiments to determine how the number and location of wearable inertial sensors influences the accuracy of a machine learning algorithm in distinguishing near-falls from activities of daily living (ADLs).


Subjects
Accelerometry/instrumentation , Support Vector Machine , Adult , Algorithms , Humans , Young Adult
9.
IEEE Trans Pattern Anal Mach Intell ; 34(8): 1549-62, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22144516

ABSTRACT

In this paper, we go beyond recognizing the actions of individuals and focus on group activities. This is motivated from the observation that human actions are rarely performed in isolation; the contextual information of what other people in the scene are doing provides a useful cue for understanding high-level activities. We propose a novel framework for recognizing group activities which jointly captures the group activity, the individual person actions, and the interactions among them. Two types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. In particular, we propose three different approaches to model the person-person interaction. One approach is to explore the structures of person-person interaction. Differently from most of the previous latent structured models, which assume a predefined structure for the hidden layer, e.g., a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. The second approach explores person-person interaction in the feature level. We introduce a new feature representation called the action context (AC) descriptor. The AC descriptor encodes information about not only the action of an individual person in the video, but also the behavior of other people nearby. The third approach combines the above two. Our experimental results demonstrate the benefit of using contextual information for disambiguating group activities.


Subjects
Activities of Daily Living/classification , Image Processing, Computer-Assisted/methods , Models, Theoretical , Social Behavior , Spatial Behavior , Accidental Falls , Algorithms , Artificial Intelligence , Discriminant Analysis , Humans , Interpersonal Relations , Nursing Homes , ROC Curve , Video Recording
10.
Int J Biomed Imaging ; 2011: 846312, 2011.
Article in English | MEDLINE | ID: mdl-22046177

ABSTRACT

Many subproblems in automated skin lesion diagnosis (ASLD) can be unified under a single generalization of assigning a label, from a predefined set, to each pixel in an image. We first formalize this generalization and then present two probabilistic models capable of solving it. The first model is based on independent pixel labeling using maximum a-posteriori (MAP) estimation. The second model is based on conditional random fields (CRFs), where dependencies between pixels are defined using a graph structure. Furthermore, we demonstrate how supervised learning and an appropriate training set can be used to automatically determine all model parameters. We evaluate both models' ability to segment a challenging dataset consisting of 116 images and compare our results to 5 previously published methods.
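
The first model in this abstract, independent per-pixel MAP labeling, amounts to picking for each pixel the label that maximizes prior times likelihood. The sketch below illustrates that rule only; the dictionary-based data layout and the function names are assumptions, and how the paper estimates the likelihoods is not shown.

```python
def map_label(pixel_likelihoods, priors):
    """Return the label maximizing prior * likelihood for one pixel.

    pixel_likelihoods: dict mapping label -> p(observation | label)
    priors: dict mapping label -> p(label)
    """
    return max(priors, key=lambda label: priors[label] * pixel_likelihoods[label])


def label_image(likelihood_image, priors):
    """Apply the MAP rule to every pixel independently (no neighbor terms)."""
    return [[map_label(px, priors) for px in row] for row in likelihood_image]
```

Because each pixel is decided on its own, this rule ignores the dependencies between neighboring pixels that the second, CRF-based model is designed to capture.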

11.
IEEE Trans Pattern Anal Mach Intell ; 33(7): 1310-23, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21135448

ABSTRACT

We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (HCRF) for object recognition. Similarly to HCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Differently from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying HCRF on local patches alone. We also propose an alternative for learning the parameters of an HCRF model in a max-margin framework. We call this method the max-margin hidden conditional random field (MMHCRF). We demonstrate that MMHCRF outperforms HCRF in human action recognition. In addition, MMHCRF can handle a much broader range of complex hidden structures arising in various problems in computer vision.


Subjects
Actigraphy/methods , Models, Statistical , Movement/physiology , Pattern Recognition, Automated/methods , Video Recording/methods , Whole Body Imaging/methods , Algorithms , Computer Simulation , Humans , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Machine Learning , Markov Chains , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique
12.
IEEE Trans Pattern Anal Mach Intell ; 31(10): 1762-74, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19696448

ABSTRACT

We propose two new models for human action recognition from video sequences using topic models. Video sequences are represented by a novel "bag-of-words" representation, where each frame corresponds to a "word." Our models differ from previous latent topic models for visual recognition in two major aspects: first of all, the latent topics in our models directly correspond to class labels; second, some of the latent variables in previous topic models become observed in our case. Our models have several advantages over other latent topic models used in visual recognition. First of all, the training is much easier due to the decoupling of the model parameters. Second, it alleviates the issue of how to choose the appropriate number of latent topics. Third, it achieves much better performance by utilizing the information provided by the class labels in the training set. We present action classification results on five different data sets. Our results are either comparable to, or significantly better than previously published results on these data sets.


Subjects
Human Activities/classification , Locomotion/physiology , Models, Biological , Movement/physiology , Pattern Recognition, Automated/methods , Algorithms , Cluster Analysis , Humans , Motion
13.
IEEE Trans Vis Comput Graph ; 14(4): 885-99, 2008.
Article in English | MEDLINE | ID: mdl-18467762

ABSTRACT

One challenge in video processing is to detect actions and events, known or unknown, in video streams dynamically. This paper proposes a visualization solution, where a video stream is depicted as a series of snapshots at a relatively sparse interval, and detected actions are highlighted with continuous abstract illustrations. The combined imagery and illustrative visualization conveys multi-field information in a manner similar to electrocardiograms (ECG) and seismographs. We thus name this type of video visualization as VideoPerpetuoGram (VPG). In this paper, we describe a system that handles the raw and processed information of the video stream in a multi-field visualization pipeline. As examples, we consider the needs for highlighting several types of processed information, including detected actions in video streams, and estimated relationship between recognized objects. We examine the effective means for depicting multi-field information in VPG, and support our choice of visual mappings through a survey. Our GPU implementation facilitates the VPG-specific viewing specification through a sheared object space, as well as volume bricking and combinational rendering of volume data and glyphs.


Subjects
Algorithms , Computer Graphics , Image Interpretation, Computer-Assisted/methods , Numerical Analysis, Computer-Assisted , User-Computer Interface , Video Recording/methods , Motion
14.
IEEE Trans Pattern Anal Mach Intell ; 28(7): 1052-62, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16792095

ABSTRACT

The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently; tracking just becomes repeated recognition. We present results on a variety of data sets.


Subjects
Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Joints/anatomy & histology , Joints/physiology , Pattern Recognition, Automated/methods , Posture/physiology , Whole Body Imaging/methods , Algorithms , Artificial Intelligence , Cluster Analysis , Computer Simulation , Humans , Image Enhancement/methods , Information Storage and Retrieval/methods , Models, Biological , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique
15.
IEEE Trans Pattern Anal Mach Intell ; 27(11): 1832-7, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16285381

ABSTRACT

We demonstrate that shape contexts can be used to quickly prune a search for similar shapes. We present two algorithms for rapid shape retrieval: representative shape contexts, performing comparisons based on a small number of shape contexts, and shapemes, using vector quantization in the space of shape contexts to obtain prototypical shape pieces.
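
The vector quantization step behind shapemes can be illustrated as nearest-prototype assignment followed by counting, as in the sketch below. This is a generic sketch under assumptions: the codebook is taken as given (for example, from a clustering step not shown here), the descriptors in the usage example are toy 2-D vectors rather than real shape context histograms, and the function names are hypothetical.

```python
def nearest_prototype(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def prototype_histogram(descriptors, codebook):
    """Count how many descriptors are assigned to each prototype."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_prototype(d, codebook)] += 1
    return counts


# Toy usage: two prototypes, three descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (0.8, 0.7)]
histogram = prototype_histogram(descriptors, codebook)
```

Comparing two shapes then reduces to comparing two short histograms, which is far cheaper than matching every pair of descriptors.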


Subjects
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Computer Graphics , Image Enhancement/methods , Numerical Analysis, Computer-Assisted , Signal Processing, Computer-Assisted , User-Computer Interface