Búsqueda | BVS Bolivia

1.

Neural network dose prediction for cervical brachytherapy: Overcoming data scarcity for applicator-specific models.

Moore, Lance C; Ahern, Fritz; Li, Lingyi; Kallis, Karoline; Kisling, Kelly; Cortes, Katherina G; Nwachukwu, Chika; Rash, Dominique; Yashar, Catheryn M; Mayadev, Jyoti; Zou, Jingjing; Vasconcelos, Nuno; Meyers, Sandra M.

Med Phys ; 2024 May 30.

Artículo en Inglés | MEDLINE | ID: mdl-38814165

RESUMEN

BACKGROUND: 3D neural network dose predictions are useful for automating brachytherapy (BT) treatment planning for cervical cancer. Cervical BT can be delivered with numerous applicators, which necessitates developing models that generalize to multiple applicator types. The variability and scarcity of data for any given applicator type poses challenges for deep learning. PURPOSE: The goal of this work was to compare three methods of neural network training-a single model trained on all applicator data, fine-tuning the combined model to each applicator, and individual (IDV) applicator models-to determine the optimal method for dose prediction. METHODS: Models were produced for four applicator types-tandem-and-ovoid (T&O), T&O with 1-7 needles (T&ON), tandem-and-ring (T&R) and T&R with 1-4 needles (T&RN). First, the combined model was trained on 859 treatment plans from 266 cervical cancer patients treated from 2010 onwards. The train/validation/test split was 70%/16%/14%, with approximately 49%/10%/19%/22% T&O/T&ON/T&R/T&RN in each dataset. Inputs included four channels for anatomical masks (high-risk clinical target volume [HRCTV], bladder, rectum, and sigmoid), a mask indicating dwell position locations, and applicator channels for each applicator component. Applicator channels were created by mapping the 3D dose for a single dwell position to each dwell position and summing over each applicator component with uniform dwell time weighting. A 3D Cascade U-Net, which consists of two U-Nets in sequence, and mean squared error loss function were used. The combined model was then fine-tuned to produce four applicator-specific models by freezing the first U-Net and encoding layers of the second and resuming training on applicator-specific data. Finally, four IDV models were trained using only data from each applicator type. Performance of these three model types was compared using the following metrics for the test set: mean error (ME, representing model bias) and mean absolute error (MAE) over all dose voxels and ME of clinical metrics (HRCTV D90% and D2cc of bladder, rectum, and sigmoid), averaged over all patients. A positive ME indicates the clinical dose was higher than predicted. 3D global gamma analysis with the prescription dose as reference value was performed. Dice similarity coefficients (DSC) were computed for each isodose volume. RESULTS: Fine-tuned and combined models showed better performance than IDV applicator training. Fine-tuning resulted in modest improvements in about half the metrics, compared to the combined model, while the remainder were mostly unchanged. Fine-tuned MAE = 3.98%/2.69%/5.36%/3.80% for T&O/T&R/T&ON/T&RN, and ME over all voxels = -0.08%/-0.89%/-0.59%/1.42%. ME D2cc were bladder = -0.77%/1.00%/-0.66%/-1.53%, rectum = 1.11%/-0.22%/-0.29%/-3.37%, sigmoid = -0.47%/-0.06%/-2.37%/-1.40%, and ME D90 = 2.6%/-4.4%/4.8%/0.0%. Gamma pass rates (3%/3 mm) were 86%/91%/83%/89%. Mean DSCs were 0.92%/0.92%/0.88%/0.91% for isodoses ≤ 150% of prescription. CONCLUSIONS: 3D BT dose was accurately predicted for all applicator types, as indicated by the low MAE and MEs, high gamma scores and high DSCs. Training on all treatment data overcomes challenges with data scarcity in each applicator type, resulting in superior performance than can be achieved by training on IDV applicators alone. This could presumably be explained by the fact that the larger, more diverse dataset allows the neural network to learn underlying trends and characteristics in dose that are common to all treatment applicators. Accurate, applicator-specific dose predictions could enable automated, knowledge-based planning for any cervical brachytherapy treatment.

2.

A fully integrated wearable ultrasound system to monitor deep tissues in moving subjects.

Lin, Muyang; Zhang, Ziyang; Gao, Xiaoxiang; Bian, Yizhou; Wu, Ray S; Park, Geonho; Lou, Zhiyuan; Zhang, Zhuorui; Xu, Xiangchen; Chen, Xiangjun; Kang, Andrea; Yang, Xinyi; Yue, Wentong; Yin, Lu; Wang, Chonghe; Qi, Baiyan; Zhou, Sai; Hu, Hongjie; Huang, Hao; Li, Mohan; Gu, Yue; Mu, Jing; Yang, Albert; Yaghi, Amer; Chen, Yimu; Lei, Yusheng; Lu, Chengchangfeng; Wang, Ruotao; Wang, Joseph; Xiang, Shu; Kistler, Erik B; Vasconcelos, Nuno; Xu, Sheng.

Nat Biotechnol ; 42(3): 448-457, 2024 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-37217752

RESUMEN

Recent advances in wearable ultrasound technologies have demonstrated the potential for hands-free data acquisition, but technical barriers remain as these probes require wire connections, can lose track of moving targets and create data-interpretation challenges. Here we report a fully integrated autonomous wearable ultrasonic-system-on-patch (USoP). A miniaturized flexible control circuit is designed to interface with an ultrasound transducer array for signal pre-conditioning and wireless data communication. Machine learning is used to track moving tissue targets and assist the data interpretation. We demonstrate that the USoP allows continuous tracking of physiological signals from tissues as deep as 164 mm. On mobile subjects, the USoP can continuously monitor physiological signals, including central blood pressure, heart rate and cardiac output, for as long as 12 h. This result enables continuous autonomous surveillance of deep tissue signals toward the internet-of-medical-things.

Asunto(s)

Dispositivos Electrónicos Vestibles , Humanos , Signos Vitales

3.

Deep Learning Estimation of 10-2 Visual Field Map Based on Macular Optical Coherence Tomography Angiography Measurements.

Mahmoudinezhad, Golnoush; Moghimi, Sasan; Cheng, Jiacheng; Ru, Liyang; Yang, Dongchen; Agrawal, Kushagra; Dixit, Rajeev; Beheshtaein, Siavash; Du, Kelvin H; Latif, Kareem; Gunasegaran, Gopikasree; Micheletti, Eleonora; Nishida, Takashi; Kamalipour, Alireza; Walker, Evan; Christopher, Mark; Zangwill, Linda; Vasconcelos, Nuno; Weinreb, Robert N.

Am J Ophthalmol ; 257: 187-200, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-37734638

RESUMEN

PURPOSE: To develop deep learning (DL) models estimating the central visual field (VF) from optical coherence tomography angiography (OCTA) vessel density (VD) measurements. DESIGN: Development and validation of a deep learning model. METHODS: A total of 1051 10-2 VF OCTA pairs from healthy, glaucoma suspects, and glaucoma eyes were included. DL models were trained on en face macula VD images from OCTA to estimate 10-2 mean deviation (MD), pattern standard deviation (PSD), 68 total deviation (TD) and pattern deviation (PD) values and compared with a linear regression (LR) model with the same input. Accuracy of the models was evaluated by calculating the average mean absolute error (MAE) and the R2 (squared Pearson correlation coefficients) of the estimated and actual VF values. RESULTS: DL models predicting 10-2 MD achieved R2 of 0.85 (95% confidence interval [CI], 74-0.92) for 10-2 MD and MAEs of 1.76 dB (95% CI, 1.39-2.17 dB) for MD. This was significantly better than mean linear estimates for 10-2 MD. The DL model outperformed the LR model for the estimation of pointwise TD values with an average MAE of 2.48 dB (95% CI, 1.99-3.02) and R2 of 0.69 (95% CI, 0.57-0.76) over all test points. The DL model outperformed the LR model for the estimation of all sectors. CONCLUSIONS: DL models enable the estimation of VF loss from OCTA images with high accuracy. Applying DL to the OCTA images may enhance clinical decision making. It also may improve individualized patient care and risk stratification of patients who are at risk for central VF damage.

Asunto(s)

Aprendizaje Profundo , Glaucoma , Humanos , Campos Visuales , Tomografía de Coherencia Óptica/métodos , Células Ganglionares de la Retina , Glaucoma/diagnóstico , Pruebas del Campo Visual , Angiografía , Presión Intraocular

4.

A Generalized Explanation Framework for Visualization of Deep Learning Model Predictions.

Wang, Pei; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 45(8): 9265-9283, 2023 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-37022375

RESUMEN

Attribution-based explanations are popular in computer vision but of limited use for fine-grained classification problems typical of expert domains, where classes differ by subtle details. In these domains, users also seek understanding of "why" a class was chosen and "why not" an alternative class. A new GenerAlized expLanatiOn fRamEwork (GALORE) is proposed to satisfy all these requirements, by unifying attributive explanations with explanations of two other types. The first is a new class of explanations, denoted deliberative, proposed to address the "why" question, by exposing the network insecurities about a prediction. The second is the class of counterfactual explanations, which have been shown to address the "why not" question but are now more efficiently computed. GALORE unifies these explanations by defining them as combinations of attribution maps with respect to various classifier predictions and a confidence score. An evaluation protocol that leverages object recognition (CUB200) and scene classification (ADE20 K) datasets combining part and attribute annotations is also proposed. Experiments show that confidence scores can improve explanation accuracy, deliberative explanations provide insight into the network deliberation process, the latter correlates with that performed by humans, and counterfactual explanations enhance the performance of human students in machine teaching experiments.

Asunto(s)

Aprendizaje Profundo , Humanos , Redes Neurales de la Computación , Algoritmos

5.

Cascade R-CNN: High Quality Object Detection and Instance Segmentation.

Cai, Zhaowei; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 43(5): 1483-1498, 2021 05.

Artículo en Inglés | MEDLINE | ID: mdl-31794388

RESUMEN

In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN.

6.

Automated High-Frequency Observations of Physical Activity Using Computer Vision.

Carlson, Jordan A; Liu, B O; Sallis, James F; Hipp, J Aaron; Staggs, Vincent S; Kerr, Jacqueline; Papa, Amy; Dean, Kelsey; Vasconcelos, Nuno M.

Med Sci Sports Exerc ; 52(9): 2029-2036, 2020 09.

Artículo en Inglés | MEDLINE | ID: mdl-32175976

RESUMEN

PURPOSE: To test the validity of the Ecological Video Identification of Physical Activity (EVIP) computer vision algorithms for automated video-based ecological assessment of physical activity in settings such as parks and schoolyards. METHODS: Twenty-seven hours of video were collected from stationary overhead video cameras across 22 visits in nine sites capturing organized activities. Each person in the setting wore an accelerometer, and each second was classified as moderate-to-vigorous physical activity or sedentary/light activity. Data with 57,987 s were used to train and test computer vision algorithms for estimating the total number of people in the video and number of people active (in moderate-to-vigorous physical activity) each second. In the testing data set (38,658 s), video-based System for Observing Play and Recreation in Communities (SOPARC) observations were conducted every 5 min (130 observations). Concordance correlation coefficients (CCC) and mean absolute errors (MAE) assessed agreement between (1) EVIP and ground truth (people counts+accelerometry) and (2) SOPARC observation and ground truth. Site and scene-level correlates of error were investigated. RESULTS: Agreement between EVIP and ground truth was high for number of people in the scene (CCC = 0.88; MAE = 2.70) and moderate for number of people active (CCC = 0.55; MAE = 2.57). The EVIP error was uncorrelated with camera placement, presence of obstructions or shadows, and setting type. For both number in scene and number active, EVIP outperformed SOPARC observations in estimating ground truth values (CCC were larger by 0.11-0.12 and MAE smaller by 41%-48%). CONCLUSIONS: Computer vision algorithms are promising for automated assessment of setting-based physical activity. Such tools would require less manpower than human observation, produce more and potentially more accurate data, and allow for ongoing monitoring and feedback to inform interventions.

Asunto(s)

Algoritmos , Computadores , Ejercicio Físico , Grabación en Video , Acelerometría , Entorno Construido , Humanos , Observación/métodos , Parques Recreativos , Instituciones Académicas

7.

Learning Complexity-Aware Cascades for Pedestrian Detection.

Cai, Zhaowei; Saberian, Mohammad; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 42(9): 2195-2211, 2020 09.

Artículo en Inglés | MEDLINE | ID: mdl-30990173

RESUMEN

The problem of pedestrian detection is considered. The design of complexity-aware cascaded pedestrian detectors, combining features of very different complexities, is investigated. A new cascade design procedure is introduced, by formulating cascade learning as the Lagrangian optimization of a risk that accounts for both accuracy and complexity. A boosting algorithm, denoted as complexity aware cascade training (CompACT), is then derived to solve this optimization. CompACT cascades are shown to seek an optimal trade-off between accuracy and complexity by pushing features of higher complexity to the later cascade stages, where only a few difficult candidate patches remain to be classified. This enables the use of features of vastly different complexities in a single detector. In result, the feature pool can be expanded to features previously impractical for cascade design, such as the responses of a deep convolutional neural network (CNN). This is demonstrated through the design of pedestrian detectors with a pool of features whose complexities span orders of magnitude. The resulting cascade generalizes the combination of a CNN with an object proposal mechanism: rather than a pre-processing stage, CompACT cascades seamlessly integrate CNNs in their stages. This enables accurate detection at fairly fast speeds.

8.

Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes.

Dixit, Mandar; Li, Yunsheng; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 42(12): 3102-3118, 2020 12.

Artículo en Inglés | MEDLINE | ID: mdl-31180842

RESUMEN

The transfer of a neural network (CNN) trained to recognize objects to the task of scene classification is considered. A Bag-of-Semantics (BoS) representation is first induced, by feeding scene image patches to the object CNN, and representing the scene image by the ensuing bag of posterior class probability vectors (semantic posteriors). The encoding of the BoS with a Fisher vector (FV) is then studied. A link is established between the FV of any probabilistic model and the Q-function of the expectation-maximization (EM) algorithm used to estimate its parameters by maximum likelihood. This enables 1) immediate derivation of FVs for any model for which an EM algorithm exists, and 2) leveraging efficient implementations from the EM literature for the computation of FVs. It is then shown that standard FVs, such as those derived from Gaussian or even Dirichlet mixtures, are unsuccessful for the transfer of semantic posteriors, due to the highly non-linear nature of the probability simplex. The analysis of these FVs shows that significant benefits can ensue by 1) designing FVs in the natural parameter space of the multinomial distribution, and 2) adopting sophisticated probabilistic models of semantic feature covariance. The combination of these two insights leads to the encoding of the BoS in the natural parameter space of the multinomial, using a vector of Fisher scores derived from a mixture of factor analyzers (MFA). A network implementation of the MFA Fisher Score (MFA-FS), denoted as the MFAFSNet, is finally proposed to enable end-to-end training. Experiments with various object CNNs and datasets show that the approach has state-of-the-art transfer performance. Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification, using a large scene dataset (Places). This suggests that holistic analysis is insufficient for scene classification. The modeling of local object semantics appears to be at least equally important. The two approaches are also shown to be strongly complementary, leading to very large scene classification gains when combined, and outperforming all previous scene classification approaches by a sizable margin.

9.

Super Diffusion for Salient Object Detection.

Jiang, Peng; Pan, Zhiyi; Tu, Changhe; Vasconcelos, Nuno; Chen, Baoquan; Peng, Jingliang.

IEEE Trans Image Process ; 2019 Nov 25.

Artículo en Inglés | MEDLINE | ID: mdl-31765314

RESUMEN

One major branch of saliency object detection methods are diffusion-based which construct a graph model on a given image and diffuse seed saliency values to the whole graph by a diffusion matrix. While their performance is sensitive to specific feature spaces and scales used for the diffusion matrix definition, little work has been published to systematically promote the robustness and accuracy of salient object detection under the generic mechanism of diffusion. In this work, we firstly present a novel view of the working mechanism of the diffusion process based on mathematical analysis, which reveals that the diffusion process is actually computing the similarity of nodes with respect to the seeds based on diffusion maps. Following this analysis, we propose super diffusion, a novel inclusive learning-based framework for salient object detection, which makes the optimum and robust performance by integrating a large pool of feature spaces, scales and even features originally computed for non-diffusion-based salient object detection. A closed-form solution of the optimal parameters for the integration is determined through supervised learning. At the local level, we propose to promote each individual diffusion before the integration. Our mathematical analysis reveals the close relationship between saliency diffusion and spectral clustering. Based on this, we propose to re-synthesize each individual diffusion matrix from the most discriminative eigenvectors and the constant eigenvector (for saliency normalization). The proposed framework is implemented and experimented on prevalently used benchmark datasets, consistently leading to state-of-the-art performance.

10.

Deficient Endoplasmic Reticulum-Mitochondrial Phosphatidylserine Transfer Causes Liver Disease.

Hernández-Alvarez, María Isabel; Sebastián, David; Vives, Sara; Ivanova, Saska; Bartoccioni, Paola; Kakimoto, Pamela; Plana, Natalia; Veiga, Sónia R; Hernández, Vanessa; Vasconcelos, Nuno; Peddinti, Gopal; Adrover, Anna; Jové, Mariona; Pamplona, Reinald; Gordaliza-Alaguero, Isabel; Calvo, Enrique; Cabré, Noemí; Castro, Rui; Kuzmanic, Antonija; Boutant, Marie; Sala, David; Hyotylainen, Tuulia; Oresic, Matej; Fort, Joana; Errasti-Murugarren, Ekaitz; Rodrígues, Cecilia M P; Orozco, Modesto; Joven, Jorge; Cantó, Carles; Palacin, Manuel; Fernández-Veledo, Sonia; Vendrell, Joan; Zorzano, Antonio.

Cell ; 177(4): 881-895.e17, 2019 05 02.

Artículo en Inglés | MEDLINE | ID: mdl-31051106

RESUMEN

Non-alcoholic fatty liver is the most common liver disease worldwide. Here, we show that the mitochondrial protein mitofusin 2 (Mfn2) protects against liver disease. Reduced Mfn2 expression was detected in liver biopsies from patients with non-alcoholic steatohepatitis (NASH). Moreover, reduced Mfn2 levels were detected in mouse models of steatosis or NASH, and its re-expression in a NASH mouse model ameliorated the disease. Liver-specific ablation of Mfn2 in mice provoked inflammation, triglyceride accumulation, fibrosis, and liver cancer. We demonstrate that Mfn2 binds phosphatidylserine (PS) and can specifically extract PS into membrane domains, favoring PS transfer to mitochondria and mitochondrial phosphatidylethanolamine (PE) synthesis. Consequently, hepatic Mfn2 deficiency reduces PS transfer and phospholipid synthesis, leading to endoplasmic reticulum (ER) stress and the development of a NASH-like phenotype and liver cancer. Ablation of Mfn2 in liver reveals that disruption of ER-mitochondrial PS transfer is a new mechanism involved in the development of liver disease.

Asunto(s)

GTP Fosfohidrolasas/metabolismo , Proteínas Mitocondriales/metabolismo , Enfermedad del Hígado Graso no Alcohólico/metabolismo , Fosfatidilserinas/metabolismo , Animales , Modelos Animales de Enfermedad , Retículo Endoplásmico/metabolismo , Estrés del Retículo Endoplásmico/fisiología , Hepatocitos/metabolismo , Hepatocitos/patología , Humanos , Inflamación/metabolismo , Hígado/patología , Hepatopatías/etiología , Hepatopatías/metabolismo , Masculino , Ratones , Ratones Endogámicos C57BL , Mitocondrias/metabolismo , Cultivo Primario de Células , Transporte de Proteínas/fisiología , Transducción de Señal , Triglicéridos/metabolismo

11.

Automated Ecological Assessment of Physical Activity: Advancing Direct Observation.

Carlson, Jordan A; Liu, Bo; Sallis, James F; Kerr, Jacqueline; Hipp, J Aaron; Staggs, Vincent S; Papa, Amy; Dean, Kelsey; Vasconcelos, Nuno M.

Int J Environ Res Public Health ; 14(12)2017 12 01.

Artículo en Inglés | MEDLINE | ID: mdl-29194358

RESUMEN

Technological advances provide opportunities for automating direct observations of physical activity, which allow for continuous monitoring and feedback. This pilot study evaluated the initial validity of computer vision algorithms for ecological assessment of physical activity. The sample comprised 6630 seconds per camera (three cameras in total) of video capturing up to nine participants engaged in sitting, standing, walking, and jogging in an open outdoor space while wearing accelerometers. Computer vision algorithms were developed to assess the number and proportion of people in sedentary, light, moderate, and vigorous activity, and group-based metabolic equivalents of tasks (MET)-minutes. Means and standard deviations (SD) of bias/difference values, and intraclass correlation coefficients (ICC) assessed the criterion validity compared to accelerometry separately for each camera. The number and proportion of participants sedentary and in moderate-to-vigorous physical activity (MVPA) had small biases (within 20% of the criterion mean) and the ICCs were excellent (0.82-0.98). Total MET-minutes were slightly underestimated by 9.3-17.1% and the ICCs were good (0.68-0.79). The standard deviations of the bias estimates were moderate-to-large relative to the means. The computer vision algorithms appeared to have acceptable sample-level validity (i.e., across a sample of time intervals) and are promising for automated ecological assessment of activity in open outdoor settings, but further development and testing is needed before such tools can be used in a diverse range of settings.

Asunto(s)

Algoritmos , Ejercicio Físico , Acelerometría , Adulto , Femenino , Humanos , Masculino , Persona de Mediana Edad , Proyectos Piloto , Postura , Conducta Sedentaria , Adulto Joven

12.

Parametric Regression on the Grassmannian.

Hong, Yi; Kwitt, Roland; Singh, Nikhil; Vasconcelos, Nuno; Niethammer, Marc.

IEEE Trans Pattern Anal Mach Intell ; 38(11): 2284-2297, 2016 11.

Artículo en Inglés | MEDLINE | ID: mdl-26766216

RESUMEN

We address the problem of fitting parametric curves on the Grassmann manifold for the purpose of intrinsic parametric regression. We start from the energy minimization formulation of linear least-squares in Euclidean space and generalize this concept to general nonflat Riemannian manifolds, following an optimal-control point of view. We then specialize this idea to the Grassmann manifold and demonstrate that it yields a simple, extensible and easy-to-implement solution to the parametric regression problem. In fact, it allows us to extend the basic geodesic model to (1) a "time-warped" variant and (2) cubic splines. We demonstrate the utility of the proposed solution on different vision problems, such as shape regression as a function of age, traffic-speed estimation and crowd-counting from surveillance video clips. Most notably, these problems can be conveniently solved within the same framework without any specifically-tailored steps along the processing pipeline.

13.

Robust deformable and occluded object tracking with dynamic graph.

Cai, Zhaowei; Wen, Longyin; Lei, Zhen; Vasconcelos, Nuno; Li, Stan Z.

IEEE Trans Image Process ; 23(12): 5497-509, 2014 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-25350927

RESUMEN

While some efforts have been paid to handle deformation and occlusion in visual tracking, they are still great challenges. In this paper, a dynamic graph-based tracker (DGT) is proposed to address these two challenges in a unified framework. In the dynamic target graph, nodes are the target local parts encoding appearance information, and edges are the interactions between nodes encoding inner geometric structure information. This graph representation provides much more information for tracking in the presence of deformation and occlusion. The target tracking is then formulated as tracking this dynamic undirected graph, which is also a matching problem between the target graph and the candidate graph. The local parts within the candidate graph are separated from the background with Markov random field, and spectral clustering is used to solve the graph matching. The final target state is determined through a weighted voting procedure according to the reliability of part correspondence, and refined with recourse to a foreground/background segmentation. An effective online updating mechanism is proposed to update the model, allowing DGT to robustly adapt to variations of target structure. Experimental results show improved performance over several state-of-the-art trackers, in various challenging scenarios.

14.

Object recognition with hierarchical discriminant saliency networks.

Han, Sunhyoung; Vasconcelos, Nuno.

Front Comput Neurosci ; 8: 109, 2014.

Artículo en Inglés | MEDLINE | ID: mdl-25249971

RESUMEN

The benefits of integrating attention and object recognition are investigated. While attention is frequently modeled as a pre-processor for recognition, we investigate the hypothesis that attention is an intrinsic component of recognition and vice-versa. This hypothesis is tested with a recognition model, the hierarchical discriminant saliency network (HDSN), whose layers are top-down saliency detectors, tuned for a visual class according to the principles of discriminant saliency. As a model of neural computation, the HDSN has two possible implementations. In a biologically plausible implementation, all layers comply with the standard neurophysiological model of visual cortex, with sub-layers of simple and complex units that implement a combination of filtering, divisive normalization, pooling, and non-linearities. In a convolutional neural network implementation, all layers are convolutional and implement a combination of filtering, rectification, and pooling. The rectification is performed with a parametric extension of the now popular rectified linear units (ReLUs), whose parameters can be tuned for the detection of target object classes. This enables a number of functional enhancements over neural network models that lack a connection to saliency, including optimal feature denoising mechanisms for recognition, modulation of saliency responses by the discriminant power of the underlying features, and the ability to detect both feature presence and absence. In either implementation, each layer has a precise statistical interpretation, and all parameters are tuned by statistical learning. Each saliency detection layer learns more discriminant saliency templates than its predecessors and higher layers have larger pooling fields. This enables the HDSN to simultaneously achieve high selectivity to target object classes and invariance. The performance of the network in saliency and object recognition tasks is compared to those of models from the biological and computer vision literatures. This demonstrates benefits for all the functional enhancements of the HDSN, the class tuning inherent to discriminant saliency, and saliency layers based on templates of increasing target selectivity and invariance. Altogether, these experiments suggest that there are non-trivial benefits in integrating attention and recognition.

15.

On the role of correlation and abstraction in cross-modal multimedia retrieval.

Costa Pereira, Jose; Coviello, Emanuele; Doyle, Gabriel; Rasiwasia, Nikhil; Lanckriet, Gert R G; Levy, Roger; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 36(3): 521-35, 2014 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-24457508

RESUMEN

The problem of cross-modal retrieval from multimedia repositories is considered. This problem addresses the design of retrieval systems that support queries across content modalities, for example, using an image to search for texts. A mathematical formulation is proposed, equating the design of cross-modal retrieval systems to that of isomorphic feature spaces for different content modalities. Two hypotheses are then investigated regarding the fundamental attributes of these spaces. The first is that low-level cross-modal correlations should be accounted for. The second is that the space should enable semantic abstraction. Three new solutions to the cross-modal retrieval problem are then derived from these hypotheses: correlation matching (CM), an unsupervised method which models cross-modal correlations, semantic matching (SM), a supervised technique that relies on semantic representation, and semantic correlation matching (SCM), which combines both. An extensive evaluation of retrieval performance is conducted to test the validity of the hypotheses. All approaches are shown successful for text retrieval in response to image queries and vice versa. It is concluded that both hypotheses hold, in a complementary form, although evidence in favor of the abstraction hypothesis is stronger than that for correlation.

16.

Anomaly detection and localization in crowded scenes.

Li, Weixin; Mahadevan, Vijay; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 36(1): 18-32, 2014 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-24231863

RESUMEN

The detection and localization of anomalous behaviors in crowded scenes is considered, and a joint detector of temporal and spatial anomalies is proposed. The proposed detector is based on a video representation that accounts for both appearance and dynamics, using a set of mixture of dynamic textures models. These models are used to implement 1) a center-surround discriminant saliency detector that produces spatial saliency scores, and 2) a model of normal behavior that is learned from training data and produces temporal saliency scores. Spatial and temporal anomaly maps are then defined at multiple spatial scales, by considering the scores of these operators at progressively larger regions of support. The multiscale scores act as potentials of a conditional random field that guarantees global consistency of the anomaly judgments. A data set of densely crowded pedestrian walkways is introduced and used to evaluate the proposed anomaly detector. Experiments on this and other data sets show that the latter achieves state-of-the-art anomaly detection results.

Asunto(s)

Aglomeración , Procesamiento de Imagen Asistido por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Grabación en Video/métodos , Algoritmos , Humanos

17.

Latent Dirichlet allocation models for image classification.

Rasiwasia, Nikhil; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 35(11): 2665-79, 2013 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-24051727

RESUMEN

Two new extensions of latent Dirichlet allocation (LDA), denoted topic-supervised LDA (ts-LDA) and class-specific-simplex LDA (css-LDA), are proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, ts-LDA models are introduced which replace the automated topic discovery of LDA with specified topics, identical to the classes of interest for classification. While this results in improvements in classification accuracy over existing LDA models, it compromises the ability of LDA to discover unanticipated structure of interest. This limitation is addressed by the introduction of css-LDA, an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e., a single set of topics shared across classes is replaced by multiple class-specific topic sets. The css-LDA model is shown to combine the labeling strength of topic-supervision with the flexibility of topic-discovery. Its effectiveness is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform existing LDA-based image classification approaches.

Asunto(s)

Algoritmos , Inteligencia Artificial , Aumento de la Imagen/métodos , Interpretación de Imagen Asistida por Computador/métodos , Modelos Estadísticos , Reconocimiento de Normas Patrones Automatizadas/métodos , Simulación por Computador , Humanos

18.

Glial cell activation in the spinal cord and dorsal root ganglia induced by surgery in mice.

Romero, Asunción; Romero-Alejo, Elizabeth; Vasconcelos, Nuno; Puig, Margarita M.

Eur J Pharmacol ; 702(1-3): 126-34, 2013 Feb 28.

Artículo en Inglés | MEDLINE | ID: mdl-23396227

RESUMEN

In rodents, surgery and/or remifentanil induce postoperative pain hypersensitivity together with glial cell activation. The same stimulus also produces long-lasting adaptative changes resulting in latent pain sensitization, substantiated after naloxone administration. Glial contribution to postoperative latent sensitization is unknown. In the incisional pain model in mice, surgery was performed under sevoflurane+remifentanil anesthesia and 21 days later, 1 mg/kg of (-) or (+) naloxone was administered subcutaneously. Mechanical thresholds (Von Frey) and glial activation were repeatedly assessed from 30 min to 21 days. We used ionized calcium binding adaptor molecule 1 (Iba1) and glial fibrillary acidic protein (GFAP) to identify glial cells in the spinal cord and dorsal root ganglia by immunohistochemistry. Postoperative hypersensitivity was present up to 10 days, but the administration of (-) but not (+) naloxone at 21 days, induced again hyperalgesia. A transient microglia/macrophage and astrocyte activation was present between 30 min and 2 days postoperatively, while increased immunoreactivity in satellite glial cells lasted 21 days. At this time point, (-) naloxone, but not (+) naloxone, increased GFAP in satellite glial cells; conversely, both naloxone steroisomers similarly increased GFAP in the spinal cord. The report shows for the first time that surgery induces long-lasting morphological changes in astrocytes and satellite cells, involving opioid and toll-like receptors, that could contribute to the development of latent pain sensitization in mice.

Asunto(s)

Astrocitos/fisiología , Ganglios Espinales/fisiopatología , Hiperalgesia/fisiopatología , Dolor Postoperatorio/fisiopatología , Médula Espinal/fisiopatología , Anestésicos Intravenosos , Animales , Conducta Animal , Ganglios Espinales/citología , Hiperalgesia/etiología , Masculino , Ratones , Piperidinas , Remifentanilo , Médula Espinal/citología

19.

Biologically Inspired Object Tracking Using Center-Surround Saliency Mechanisms.

Mahadevan, Vijay; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 35(3): 541-54, 2013 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-22529325

RESUMEN

A biologically inspired discriminant object tracker is proposed. It is argued that discriminant tracking is a consequence of top-down tuning of the saliency mechanisms that guide the deployment of visual attention. The principle of discriminant saliency is then used to derive a tracker that implements a combination of center-surround saliency, a spatial spotlight of attention, and feature-based attention. In this framework, the tracking problem is formulated as one of continuous target-background classification, implemented in two stages. The first, or learning stage, combines a focus of attention (FoA) mechanism, and bottom-up saliency to identify a maximally discriminant set of features for target detection. The second, or detection stage, uses a feature-based attention mechanism and a target-tuned top-down discriminant saliency detector to detect the target. Overall, the tracker iterates between learning discriminant features from the target location in a video frame and detecting the location of the target in the next. The statistics of natural images are exploited to derive an implementation which is conceptually simple and computationally efficient. The saliency formulation is also shown to establish a unified framework for classifier design, target detection, automatic tracker initialization, and scale adaptation. Experimental results show that the proposed discriminant saliency tracker outperforms a number of state-of-the-art trackers in the literature.

20.

Learning optimal embedded cascades.

Saberian, Mohammad Javad; Vasconcelos, Nuno.

IEEE Trans Pattern Anal Mach Intell ; 34(10): 2005-18, 2012 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-22213762

RESUMEN

The problem of automatic and optimal design of embedded object detector cascades is considered. Two main challenges are identified: optimization of the cascade configuration and optimization of individual cascade stages, so as to achieve the best tradeoff between classification accuracy and speed, under a detection rate constraint. Two novel boosting algorithms are proposed to address these problems. The first, RCBoost, formulates boosting as a constrained optimization problem which is solved with a barrier penalty method. The constraint is the target detection rate, which is met at all iterations of the boosting process. This enables the design of embedded cascades of known configuration without extensive cross validation or heuristics. The second, ECBoost, searches over cascade configurations to achieve the optimal tradeoff between classification risk and speed. The two algorithms are combined into an overall boosting procedure, RCECBoost, which optimizes both the cascade configuration and its stages under a detection rate constraint, in a fully automated manner. Extensive experiments in face, car, pedestrian, and panda detection show that the resulting detectors achieve an accuracy versus speed tradeoff superior to those of previous methods.

Asunto(s)

Algoritmos , Inteligencia Artificial , Procesamiento de Imagen Asistido por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Animales , Automóviles , Cara/anatomía & histología , Humanos , Ursidae

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA