ABSTRACT
Agricultural pest identification is a prerequisite for increasing crop production and meeting global food demands. Numerous phenotypic and genotypic features are widely utilized for species-level pest identification, but these approaches are time-consuming and require expert knowledge in the relevant fields. Numerous image-based machine learning (ML) models also exist to identify insect pests in agricultural fields; however, these models rely heavily on large, manually curated datasets and are closed-set in nature. Our study aims to develop an open-set pest identification approach by adding the capability of rejecting irrelevant inputs. Tephritid fruit flies (Diptera: Tephritidae) are considered as an example since they are among the most economically important agricultural pests worldwide. Images of the fruit flies were collected from a publicly available database and filtered to exclude uninformative images using a deep learning model (Inception-V3) and an unsupervised k-means clustering method. For the closed-set identification task, our EfficientNet-B2 model classified four major genera of notorious tephritid flies, namely Anastrepha, Ceratitis, Rhagoletis, and Bactrocera, with an accuracy of 89.65%. We further extend the proposed model to open-set recognition so that identification generalizes beyond the training data. The open-set model achieved an overall accuracy of 86.48% and a macro F1-score of 94.44% on the four genera and an unknown class. Our proposed model can be a practical and effective identification tool for harmful fruit flies. In addition, the model is easy to integrate with existing agricultural pest control systems in an open-world scenario.
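The abstract does not spell out the rejection mechanism, so the following is only a minimal sketch of one common open-set strategy: thresholding the maximum softmax probability of a closed-set classifier over the four genera. The threshold value and the random logits standing in for EfficientNet-B2 outputs are illustrative assumptions, not details from the paper.

```python
# Minimal open-set rejection sketch (assumption: the paper's actual
# mechanism may differ). A closed-set classifier over the four genera is
# extended with a confidence threshold: inputs whose maximum softmax
# probability falls below the threshold are rejected as "unknown".
import torch
import torch.nn.functional as F

GENERA = ["Anastrepha", "Ceratitis", "Rhagoletis", "Bactrocera"]
REJECT_THRESHOLD = 0.5  # illustrative value, not from the paper

def open_set_predict(logits: torch.Tensor) -> list[str]:
    """Map raw classifier logits (N x 4) to genus names or 'unknown'."""
    probs = F.softmax(logits, dim=1)
    conf, idx = probs.max(dim=1)
    return [GENERA[i] if c >= REJECT_THRESHOLD else "unknown"
            for c, i in zip(conf.tolist(), idx.tolist())]

# Example with random logits standing in for EfficientNet-B2 outputs:
logits = torch.randn(3, len(GENERA))
print(open_set_predict(logits))
```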
Subject(s)
Tephritidae; Animals; Insects; Databases, Factual; Genotype
ABSTRACT
Deep learning techniques have recently demonstrated remarkable success in numerous domains. Typically, the success of these deep learning models is measured in terms of performance metrics such as accuracy and mean average precision (mAP). A model's high performance is highly valued, but it frequently comes at the expense of substantial energy costs and carbon emissions during the model-building step. Massive CO2 emissions have a deleterious impact on life on Earth in general and are a serious ethical concern that is largely ignored in deep learning research. In this article, we focus on the environmental costs and the means of mitigating the carbon footprint of deep learning models, with a particular focus on models created using knowledge distillation (KD). Deep learning models typically contain a large number of parameters, resulting in a 'heavy' model. A heavy model scores high on performance metrics but is incompatible with mobile and edge computing devices. Model compression techniques such as knowledge distillation enable the creation of lightweight, deployable models for these low-resource devices. KD generates lighter models that typically perform with slightly less accuracy than the heavier teacher model (teacher accuracy on CIFAR-10, CIFAR-100, and TinyImageNet is 95.04%, 76.03%, and 63.39%; the corresponding KD student accuracy is 91.78%, 69.7%, and 60.49%). Although the distillation process makes models deployable on low-resource devices, they were found to consume an exorbitant amount of energy and to have a substantial carbon footprint (15.8, 17.9, and 13.5 times more carbon than the corresponding teacher model). This enormous environmental cost is primarily attributable to tuning of the temperature hyperparameter (τ). In this article, we propose measuring the environmental costs of deep learning work (in terms of GFLOPS in millions, energy consumption in kWh, and CO2 equivalent in grams). To create lightweight models with low environmental costs, we propose a straightforward yet effective method that selects the hyperparameter τ stochastically for each training batch fed into the models. We applied knowledge distillation (including its data-free variant) to image classification and object detection problems. To evaluate the robustness of our method, we ran experiments on various datasets (CIFAR-10, CIFAR-100, TinyImageNet, and PASCAL VOC) and models (ResNet18, MobileNetV2, WRN-40-2). Our novel approach reduces the environmental costs by a large margin by eliminating the requirement of expensive hyperparameter tuning without sacrificing performance. Empirical results on the CIFAR-10 dataset show that the stochastic technique achieves an accuracy of 91.67%, whereas tuning achieves an accuracy of 91.78%; however, the stochastic approach reduces the energy consumption and CO2 equivalent each by a factor of 19. Similar results were obtained on the CIFAR-100 and TinyImageNet datasets. The same pattern is observed for object detection on the PASCAL VOC dataset, where the tuning technique performs similarly to the stochastic technique, with a difference of 0.03% mAP favoring the stochastic technique, while the latter reduces the energy consumption and CO2 emissions each by a factor of 18.5.
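A sketch of the core idea described above: a standard Hinton-style distillation loss, with the temperature τ drawn at random for each batch instead of being fixed by a grid search. The sampling range, the mixing weight alpha, and the uniform distribution are assumptions for illustration; the abstract does not specify the paper's exact sampling scheme.

```python
# Sketch of knowledge distillation with a stochastically sampled
# temperature per training batch, as opposed to grid-searching a fixed τ.
import random
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets,
            tau_range=(1.0, 10.0), alpha=0.9):
    tau = random.uniform(*tau_range)       # fresh τ for this batch (assumed range)
    soft_targets = F.softmax(teacher_logits / tau, dim=1)
    log_student = F.log_softmax(student_logits / tau, dim=1)
    # Soft loss scaled by τ² to keep gradient magnitudes comparable
    # across temperatures (standard Hinton et al. scaling).
    soft = F.kl_div(log_student, soft_targets, reduction="batchmean") * tau ** 2
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

# Example usage with dummy logits for a 10-class problem (e.g., CIFAR-10):
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(kd_loss(s, t, y).item())
```

Sampling a fresh τ per batch removes the hyperparameter search entirely, which is where the reported factor-of-19 energy savings come from.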
Subject(s)
Carbon Dioxide; Deep Learning; Carbon Footprint; Physical Phenomena; Benchmarking
ABSTRACT
Conventional object detection models require large amounts of training data. In comparison, humans can recognize previously unseen objects by merely knowing their semantic description. To mimic similar behavior, zero-shot object detection (ZSD) aims to recognize and localize "unseen" object instances by using only their semantic information. The model is first trained to learn the relationships between visual and semantic domains for seen objects, later transferring the acquired knowledge to totally unseen objects. This setting gives rise to the need for correct alignment between visual and semantic concepts so that the unseen objects can be identified using only their semantic attributes. In this article, we propose a novel loss function called "polarity loss" that promotes correct visual-semantic alignment for an improved ZSD. On the one hand, it refines the noisy semantic embeddings via metric learning on a "semantic vocabulary" of related concepts to establish a better synergy between visual and semantic domains. On the other hand, it explicitly maximizes the gap between positive and negative predictions to achieve better discrimination between seen, unseen, and background objects. Our approach is inspired by embodiment theories in cognitive science that claim human semantic understanding to be grounded in past experiences (seen objects), related linguistic concepts (word vocabulary), and visual perception (seen/unseen object images). We conduct extensive evaluations on the Microsoft Common Objects in Context (MS-COCO) and Pascal Visual Object Classes (VOC) datasets, showing significant improvements over the state of the art. Our code and evaluation protocols are available at: https://github.com/salman-h-khan/PL-ZSD_Release.
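To make the "gap maximization" idea concrete, here is a simplified, hedged sketch: alongside a standard classification term, a hinge penalty discourages a small margin between the positive-class score and the strongest negative-class score. This is an illustration of the principle only, not the paper's exact polarity loss formulation (which couples a focal-style term with a penalty on prediction polarity); the margin value is an assumption.

```python
# Hedged sketch of margin-based positive/negative gap maximization.
import torch
import torch.nn.functional as F

def polarity_style_loss(scores: torch.Tensor, labels: torch.Tensor,
                        margin: float = 0.5) -> torch.Tensor:
    """scores: (N, C) sigmoid scores; labels: (N,) ground-truth class ids."""
    n = scores.size(0)
    pos = scores[torch.arange(n), labels]       # positive-class score
    neg = scores.clone()
    neg[torch.arange(n), labels] = -1.0         # mask out the positive class
    hardest_neg = neg.max(dim=1).values         # strongest negative score
    # Hinge on the positive-negative gap pushes predictions apart.
    gap_penalty = F.relu(margin - (pos - hardest_neg)).mean()
    cls = F.binary_cross_entropy(
        scores, F.one_hot(labels, scores.size(1)).float())
    return cls + gap_penalty

scores = torch.sigmoid(torch.randn(4, 6))   # dummy detector class scores
labels = torch.randint(0, 6, (4,))
print(polarity_style_loss(scores, labels).item())
```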
ABSTRACT
Health professionals often prescribe specific exercises for the rehabilitation of several conditions (e.g., stroke, Parkinson's disease, back pain). When patients perform those exercises in the absence of an expert (e.g., a physician or therapist), they cannot assess the correctness of their performance. Automatic assessment of physical rehabilitation exercises aims to assign a quality score given an RGBD video of the body movement as input. Recent deep learning approaches address this problem by extracting CNN features from coordinate grids of skeleton data (body joints) obtained from videos. However, they cannot extract rich spatio-temporal features from variable-length inputs. To address this issue, we investigate Graph Convolutional Networks (GCNs) for this task. We adapt a spatio-temporal GCN to predict continuous assessment scores instead of discrete class labels. Our model can process variable-length inputs, so users can perform any number of repetitions of the prescribed exercise. Moreover, our novel design provides self-attention over body joints, indicating their role in predicting assessment scores. This guides the user toward a better score in future trials by matching the attention weights of expert users. Our model outperforms existing exercise assessment methods on the KIMORE and UI-PRMD datasets.
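A minimal sketch of the two adaptations the abstract describes: regressing a continuous score rather than class probabilities, and tolerating variable-length inputs by pooling over the time axis. The encoder below is a simple per-frame stand-in for the paper's spatio-temporal GCN blocks and joint self-attention; layer sizes and the 25-joint skeleton are assumptions.

```python
# Sketch: skeleton-based continuous-score regressor for variable-length clips.
import torch
import torch.nn as nn

class ScoreRegressor(nn.Module):
    def __init__(self, in_channels=3, num_joints=25, hidden=64):
        super().__init__()
        # Stand-in for spatio-temporal GCN blocks: per-frame joint encoder.
        self.encoder = nn.Sequential(
            nn.Linear(in_channels * num_joints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)   # continuous score, no softmax

    def forward(self, x):                  # x: (batch, frames, joints, channels)
        b, t, j, c = x.shape
        feats = self.encoder(x.reshape(b, t, j * c))  # (b, t, hidden)
        pooled = feats.mean(dim=1)         # pool over variable-length time axis
        return self.head(pooled).squeeze(-1)

model = ScoreRegressor()
clip_a = torch.randn(1, 120, 25, 3)        # 120-frame performance
clip_b = torch.randn(1, 300, 25, 3)        # 300-frame performance
print(model(clip_a).shape, model(clip_b).shape)  # both -> torch.Size([1])
```

The temporal mean-pooling is what decouples the output from the clip length, so any number of exercise repetitions maps to a single score.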
Subject(s)
Exercise Therapy; Neural Networks, Computer; Exercise; Exercise Therapy/methods; Humans; Movement
ABSTRACT
Introduction: The evaluation of retinal vessels and retinal blood flow is important in ocular diseases. We introduce a spectral domain optical coherence tomography (SD-OCT) based method for facilitating retinal blood vessel analysis using the scattering properties of retinal vessels. The intensity of the distal shadow of vessels caused by the scattered signal is measured, correlated with the pulsatile ocular blood flow (POBF), and its repeatability is analyzed. Methods: Twenty eyes of 20 healthy, young participants (mean age 23.15 years, standard deviation 2.3 years) were included in the analysis. Participants underwent ophthalmic diagnostics including three repeated SD-OCT examinations and measurement of POBF. The vessel shadow intensity analysis is based on peripapillary SD-OCT scans and automatically analyzes the intensity of the distal vessel shadow compared to its surroundings. Results: The distal shadow of arteries in SD-OCT scans correlated with the POBF (r = 0.647, p = 0.002). Furthermore, the shadow intensity correlated with the established morphological arterio-venous ratio. Repeatability was evaluated using the intraclass correlation coefficient (ICC), showing good repeatability for individual vessels (ICC = 0.825) and arteries (ICC = 0.820). Conclusions: In summary, the scattering properties of retinal vessels in SD-OCT images might correlate with vessel morphology and, for retinal arteries, with retinal blood flow volume as well. Further studies are needed to establish this method's sensitivity and specificity in participants with retinal and cardiovascular diseases.
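An illustrative sketch of the core measurement: comparing the mean OCT signal inside a vessel's distal shadow column to the mean of neighbouring columns. The column indices, window width, and synthetic scan are hypothetical; the paper's pipeline locates shadows automatically in peripapillary scans.

```python
# Sketch: relative intensity drop of a vessel shadow vs. its surroundings.
import numpy as np

def shadow_intensity(bscan: np.ndarray, shadow_cols: slice,
                     margin: int = 10) -> float:
    """Relative intensity drop (%) of a shadow region vs. its surroundings.

    bscan: 2D array (depth x width) of OCT signal intensities.
    shadow_cols: columns covered by the vessel shadow.
    """
    shadow = bscan[:, shadow_cols].mean()
    left = bscan[:, max(shadow_cols.start - margin, 0):shadow_cols.start]
    right = bscan[:, shadow_cols.stop:shadow_cols.stop + margin]
    surround = np.concatenate([left, right], axis=1).mean()
    return 100.0 * (surround - shadow) / surround

rng = np.random.default_rng(0)
scan = rng.uniform(0.5, 1.0, size=(256, 512))    # synthetic B-scan
scan[:, 200:210] *= 0.8                          # synthetic vessel shadow
print(f"{shadow_intensity(scan, slice(200, 210)):.1f} %")
```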
Subject(s)
Retina/diagnostic imaging; Retinal Artery; Retinal Vein; Adult; Female; Humans; Male; Tomography, Optical Coherence; Young Adult
ABSTRACT
Prevalent techniques in zero-shot learning do not generalize well to other related problem scenarios. Here, we present a unified approach for conventional zero-shot, generalized zero-shot, and few-shot learning problems. Our approach is based on a novel Class Adapting Principal Directions (CAPD) concept that allows multiple embeddings of image features into a semantic space. Given an image, our method produces one principal direction for each seen class. It then learns how to combine these directions to obtain the principal direction for each unseen class such that the CAPD of the test image is aligned with the semantic embedding of the true class and opposite to the other classes. This allows efficient and class-adaptive information transfer from seen to unseen classes. In addition, we propose an automatic process for selecting the most useful seen classes for each unseen class to achieve robustness in zero-shot learning. Our method can update the unseen CAPDs by taking advantage of a few unseen-class images, enabling a few-shot learning scenario. Furthermore, our method can generalize the seen CAPDs by estimating seen-unseen diversity, which significantly improves the performance of generalized zero-shot learning. Our extensive evaluations demonstrate that the proposed approach consistently achieves superior performance on zero-shot, generalized zero-shot, and few/one-shot learning problems.
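A schematic sketch of the CAPD idea: each seen class contributes one principal direction for a given image, and an unseen class's direction is formed as a combination of seen-class directions. The weighting scheme here (a softmax over attribute-space similarities) is a deliberate simplification of the paper's learned combination; all dimensions and matrices are synthetic.

```python
# Schematic CAPD sketch: combine per-seen-class directions into an
# unseen-class direction, weighted by semantic similarity.
import numpy as np

rng = np.random.default_rng(1)
d_feat, d_sem, n_seen = 64, 16, 5

W_seen = rng.normal(size=(n_seen, d_sem, d_feat))  # per-class embeddings
sem_seen = rng.normal(size=(n_seen, d_sem))        # seen class semantics
sem_unseen = rng.normal(size=(d_sem,))             # one unseen class

x = rng.normal(size=(d_feat,))                     # image feature
capd_seen = W_seen @ x                             # (n_seen, d_sem) directions

# Combine seen CAPDs into the unseen CAPD via semantic similarity weights
# (simplified stand-in for the paper's learned combination).
sim = sem_seen @ sem_unseen
weights = np.exp(sim) / np.exp(sim).sum()          # softmax over seen classes
capd_unseen = weights @ capd_seen                  # (d_sem,)

# Classification: align the CAPD with the unseen class's semantic vector.
score = capd_unseen @ sem_unseen
print(weights.round(3), float(score))
```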
ABSTRACT
PURPOSE: Clinical trials have demonstrated that retinal blood flow deficiencies are present in patients with open-angle glaucoma (OAG). We introduce a method for facilitating retinal vessel analysis: the intensity of the distal shadow of vessels in optical coherence tomography (OCT), caused by the scattered signal, is analyzed, compared between healthy subjects and OAG patients, and correlated with OCT angiography (OCT-A) flow density. PATIENTS AND METHODS: We recruited 80 patients with diagnosed OAG (mean age 63.4 ± 13.2 years) and 80 healthy age-matched control subjects for comparison, and 20 patients for the correlation with OCT-A flow density. Patients underwent perimetry and peripapillary OCT measurements; selected patients additionally received OCT-A of the papillary area. The vessel shadow intensity (VSI) is based on peripapillary OCT scans: the intensity of the distal vessel shadow was automatically compared to its surroundings, separately for arteries and veins. Flow density of the OCT-A scan was calculated by binarization and quantification of the pixel density. RESULTS: The VSI for arteries was significantly lower in OAG patients (7.52 ± 2.62%) than in healthy subjects (9.03 ± 3.38%, p = 0.0029). The VSI for veins was likewise significantly lower in OAG patients (14.9 ± 3.59%) than in healthy subjects (17.46 ± 4.45%, p < 0.0001). Furthermore, in OAG patients the mean deviation of the visual field results correlated significantly with the venous VSI (p = 0.0006; r = -0.454). There was no significant correlation of the scattering properties with OCT-A flow density (p > 0.05). CONCLUSIONS: We conclude that the OCT-based analysis of the scattering properties of retinal vessels differs significantly between patients with OAG and healthy subjects. Furthermore, changes in the scattering properties of veins correlated with the stage of the disease in terms of visual field deficits. These properties might complement existing measurements of ocular blood flow, including OCT-A flow density.
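A minimal sketch of the flow-density computation named in the methods: binarize an OCT-A en-face image and report the fraction of flow pixels. The global mean threshold is an illustrative choice, since the abstract does not specify the binarization rule beyond thresholding and pixel counting.

```python
# Sketch: OCT-A flow density by binarization and pixel counting.
import numpy as np

def flow_density(octa: np.ndarray) -> float:
    """Percentage of pixels classified as perfused in an OCT-A image."""
    threshold = octa.mean()                 # assumed binarization rule
    flow_pixels = (octa > threshold).sum()
    return 100.0 * flow_pixels / octa.size

rng = np.random.default_rng(2)
image = rng.uniform(0.0, 1.0, size=(304, 304))   # synthetic OCT-A scan
print(f"flow density: {flow_density(image):.1f} %")
```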
Subject(s)
Glaucoma, Open-Angle/diagnosis; Light; Optic Disk/blood supply; Retinal Vessels/diagnostic imaging; Tomography, Optical Coherence/methods; Aged; Female; Humans; Male; Middle Aged; Prospective Studies; Scattering, Radiation
ABSTRACT
Saliency maps produced by different algorithms are often evaluated by comparing their output to fixated image locations in human eye tracking data. Such evaluation is challenging, however, because properties of eye movement patterns that are independent of image content, including spatial bias in fixation data, may limit the validity of the results. To address this problem, we present modeling and evaluation results for data derived from different perceptual tasks related to the concept of saliency. We also present a novel approach to benchmarking that deals with some of the challenges posed by spatial bias. The results establish the value of alternatives to fixation data for driving the improvement and development of models. We also demonstrate an approach to approximating the output of alternative perceptual tasks based on computational saliency and/or eye gaze data. As a whole, this work presents novel benchmarking results and methods, establishes a new performance baseline for perceptual tasks that provide an alternative window into visual saliency, and demonstrates the capacity for saliency to serve in approximating human behaviour on one visual task given data from another.
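One standard way to blunt the effect of spatial bias when scoring a saliency map against fixations is the shuffled AUC, where negative samples are drawn from fixation locations on other images rather than uniformly at random. The sketch below is a generic illustration of that metric, not the paper's exact benchmarking protocol; all data here are synthetic.

```python
# Sketch: shuffled AUC for saliency evaluation under spatial bias.
import numpy as np
from sklearn.metrics import roc_auc_score

def shuffled_auc(sal_map, fixations, other_fixations):
    """sal_map: 2D map; fixations / other_fixations: (N, 2) row, col indices."""
    pos = sal_map[fixations[:, 0], fixations[:, 1]]          # true fixations
    neg = sal_map[other_fixations[:, 0], other_fixations[:, 1]]  # other-image fixations
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return roc_auc_score(labels, np.concatenate([pos, neg]))

rng = np.random.default_rng(3)
sal = rng.random((240, 320))
fix = np.column_stack([rng.integers(0, 240, 50), rng.integers(0, 320, 50)])
other = np.column_stack([rng.integers(0, 240, 50), rng.integers(0, 320, 50)])
print(f"sAUC = {shuffled_auc(sal, fix, other):.3f}")
```

Because the negatives inherit the same center bias as real fixations, a map that merely predicts "look at the center" no longer scores well.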
Subject(s)
Databases, Factual; Eye Movements; Models, Biological; Visual Perception; Humans
ABSTRACT
In the past decade, a large number of computational models of visual saliency have been proposed. Recently, a number of comprehensive benchmark studies have been presented with the goal of assessing the performance landscape of saliency models under varying conditions. This has been accomplished by considering fixation data, annotated image regions, and stimulus patterns inspired by psychophysics. In this paper, we present a high-level examination of challenges in the computational modeling of visual saliency, with a heavy emphasis on human vision and neural computation. This includes a careful assessment of different metrics for the performance of visual saliency models and identification of remaining difficulties in assessing model performance. We also consider the importance of a number of issues relevant to all saliency models, including scale-space, the impact of border effects, and spatial or central bias. Additionally, we consider the biological plausibility of models in stepping away from exemplar input patterns toward a set of more general theoretical principles consistent with behavioral experiments. As a whole, this presentation establishes important obstacles that remain in visual saliency modeling and identifies a number of important avenues for further investigation.