Abstract
Culex pipiens, an important vector of many vector-borne diseases, is a species capable of feeding on a wide variety of hosts and adapting to different environments. To predict the potential distribution of Cx. pipiens in central Italy, this study integrated presence/absence data from a four-year entomological survey (2019-2022) carried out in the Abruzzo and Molise regions with a datacube of spectral bands acquired by Sentinel-2 satellites, extracted as 224 × 224-pixel patches at 20-meter spatial resolution around each site for each satellite revisit. We investigated three scenarios: the baseline model, which considers the environmental conditions at the time of collection; the multitemporal model, which focuses on conditions in the two months preceding the collection; and the MultiAdjacency Graph Attention Network (MAGAT) model, which accounts for similarities in temperature and proximity among sites using a graph architecture. For the baseline scenario, a deep convolutional neural network (DCNN) analyzed a single multi-band Sentinel-2 image. The DCNN in the multitemporal model extracted temporal patterns from a sequence of 10 multispectral images, and the MAGAT model incorporated spatial and climatic relationships among sites through a graph neural network aggregation method. For all models, we also evaluated temporal lags of 0 to 50 days between the acquisition date of the multi-band Earth Observation datacube and the mosquito collection. The study encompassed a total of 2,555 entomological collections and 108,064 images (patches) at 20-meter spatial resolution. The baseline model achieved an F1 score higher than 75.8% for every temporal lag, which increased up to 81.4% with the multitemporal model. The MAGAT model recorded the highest F1 score of 80.9%. The study confirms the widespread presence of Cx. pipiens throughout the majority of the surveyed area.
Utilizing only Sentinel-2 spectral bands, the models effectively capture, well in advance, the temporal patterns of the mosquito population, offering valuable insights for directing surveillance activities during the vector season. The methodology developed in this study can be scaled up to the national territory and extended to other vectors, in order to support the Ministry of Health in surveillance and control strategies for vectors and the diseases they transmit.
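As a toy illustration of the lag analysis described above (the function names and data here are hypothetical, not the study's code), each candidate lag's presence/absence predictions can be scored with the F1 metric and the best-scoring lag selected:

```python
def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall for binary presence/absence labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_lag(collections, predictions_by_lag):
    # predictions_by_lag maps a lag in days (0..50) to the model's
    # presence/absence predictions for each entomological collection.
    scores = {lag: f1_score(collections, preds)
              for lag, preds in predictions_by_lag.items()}
    return max(scores, key=scores.get), scores
```

Sweeping the lag from 0 to 50 days in this fashion reveals how far ahead of the collection date the imagery remains predictive.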
Abstract
The use of Multiple Instance Learning (MIL) for classifying Whole Slide Images (WSIs) has recently increased. Due to their gigapixel size, pixel-level annotation of such data is extremely expensive and time-consuming, and in practice unfeasible. For this reason, multiple automatic approaches have been proposed in recent years to support clinical practice and diagnosis. Unfortunately, most state-of-the-art proposals apply attention mechanisms without considering the spatial correlation among instances and usually work at a single resolution. To leverage the full potential of the pyramidal structure of WSIs, we propose a graph-based multi-scale MIL approach, DAS-MIL. Our model comprises three modules: i) a self-supervised feature extractor; ii) a graph-based architecture that precedes the MIL mechanism and aims at creating a more contextualized representation of the WSI structure by considering the mutual (spatial) correlation among instances, both inter- and intra-scale; and iii) a (self-)distillation loss between resolutions, introduced to compensate for their informative gap and significantly improve the final prediction. The effectiveness of the proposed framework is demonstrated on two well-known datasets, where we outperform the state of the art on WSI classification, gaining +2.7% AUC and +3.7% accuracy on the popular Camelyon16 benchmark.
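Although the abstract does not give DAS-MIL's implementation, the attention-based MIL pooling it builds on can be sketched in a few lines of NumPy (all names, weights, and dimensions below are illustrative): each instance embedding receives an attention score, and the bag embedding is the softmax-weighted sum of the instances.

```python
import numpy as np

def attention_mil_pool(instances, w, v):
    # Attention pooling over a bag: each instance embedding h_i gets a
    # score a_i = w . tanh(V h_i); a softmax over the bag turns the
    # scores into weights, and the bag embedding is the weighted sum.
    scores = np.tanh(instances @ v.T) @ w          # one score per instance
    weights = np.exp(scores - scores.max())        # stable softmax
    weights /= weights.sum()
    return weights @ instances, weights

rng = np.random.default_rng(0)
bag = rng.normal(size=(8, 16))   # 8 instance (patch) embeddings of size 16
v = rng.normal(size=(4, 16))     # hidden projection matrix
w = rng.normal(size=(4,))        # attention vector
embedding, weights = attention_mil_pool(bag, w, v)
```

The graph modules in DAS-MIL would refine `bag` with spatial context before this pooling step; a slide-level classifier then operates on `embedding`.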
Abstract
A staple of human intelligence is the capability to acquire knowledge continuously. In stark contrast, Deep Networks forget catastrophically and, for this reason, the sub-field of Class-Incremental Continual Learning fosters methods that learn a sequence of tasks incrementally, blending sequentially gained knowledge into a comprehensive prediction. This work aims at assessing and overcoming the pitfalls of our previous proposal, Dark Experience Replay (DER), a simple and effective approach that combines rehearsal and Knowledge Distillation. Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the abilities to i) revise its replay memory to welcome novel information regarding past data and ii) pave the way for learning yet-unseen classes. We show that the application of these strategies leads to remarkable improvements; indeed, the resulting method, termed eXtended-DER (X-DER), outperforms the state of the art on both standard benchmarks (such as CIFAR-100 and miniImageNet) and a novel one introduced here. To gain a better understanding, we further provide extensive ablation studies that corroborate and extend the findings of our previous research (e.g., the value of Knowledge Distillation and flatter minima in continual learning setups). We make our results fully reproducible; the codebase is available at https://github.com/aimagelab/mammoth.
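The core DER objective that X-DER extends can be sketched as follows (a simplified NumPy illustration with a hypothetical trade-off weight `alpha`, not the authors' code): the loss on the current task is augmented with a term matching the network's current logits on buffered examples to the logits stored when those examples were first seen, i.e. logit-level knowledge distillation from the network's own past.

```python
import numpy as np

def der_loss(ce_current, buffer_logits, stored_logits, alpha=0.5):
    # ce_current: cross-entropy on the current task's mini-batch.
    # buffer_logits: logits the network produces now on replayed examples.
    # stored_logits: logits saved in the buffer when those examples were seen.
    # The MSE term anchors the network's responses to its past self.
    distill = np.mean((buffer_logits - stored_logits) ** 2)
    return ce_current + alpha * distill
```

X-DER additionally updates the stored logits when later tasks reveal new information about buffered examples, and prepares logit heads for classes not yet seen.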
Abstract
In this article we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle from a single monocular image. Unlike parametric (i.e., entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep-learning-based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. In particular, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel-viewpoint synthesis process. An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance. This careful blend of parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, iii) handle truly arbitrary 3D roto-translations of the input, and iv) perform shape transfer to completely different 3D models. Finally, we show that our approach can be easily complemented with synthetic data and extended to other rigid objects with completely different topology, even in the presence of concave structures and holes (e.g., chairs). A comprehensive experimental analysis against state-of-the-art competitors shows the efficacy of our method from both a quantitative and a perceptual point of view. Supplementary material, animated results, code, and data are available at: https://github.com/ndrplz/semiparametric.
Abstract
The slaughterhouse can act as a valid checkpoint to estimate the prevalence and the economic impact of diseases in farm animals. At present, scoring lesions is a challenging and time-consuming activity, which is carried out by veterinarians serving the slaughter chain. Over recent years, artificial intelligence (AI) has gained traction in many fields of research, including livestock production. In particular, AI-based methods appear able to solve highly repetitive tasks and to consistently analyze large amounts of data, such as those collected by veterinarians during postmortem inspection in high-throughput slaughterhouses. The present study aims to develop an AI-based method capable of recognizing and quantifying enzootic pneumonia-like lesions on digital images captured from slaughtered pigs under routine abattoir conditions. Overall, the data indicate that the AI-based method proposed herein could properly identify and score enzootic pneumonia-like lesions without interfering with the slaughter chain routine. According to European legislation, the application of such a method avoids the handling of carcasses and organs, decreasing the risk of microbial contamination, and could provide further alternatives in the field of food hygiene.
Abstract
Diseases of the respiratory system are known to negatively impact the profitability of the pig industry worldwide. Considering the relatively short lifespan of pigs, lesions can still be evident at slaughter, where they can be usefully recorded and scored. Therefore, the slaughterhouse represents a key checkpoint to assess the health status of pigs, providing unique and valuable feedback to the farm, as well as an important source of data for epidemiological studies. Although relevant, scoring lesions in slaughtered pigs is a very time-consuming and costly activity, making their systematic recording difficult. The present study was carried out to train a convolutional neural network-based system to automatically score pleurisy in slaughtered pigs. The automation of such a process would be extremely helpful to enable a systematic examination of all slaughtered livestock. Overall, our data indicate that the proposed system is well able to differentiate half carcasses affected with pleurisy from healthy ones, with an overall accuracy of 85.5%. The system was better able to recognize severely affected half carcasses than those showing less severe lesions. The training of convolutional neural networks to identify and score pneumonia, on the one hand, and the execution of trials in large-capacity slaughterhouses, on the other, represent the natural continuation of the present study. As a result, convolutional neural network-based technologies could provide a fast and cheap tool to systematically record lesions in slaughtered pigs, thus supplying an enormous amount of useful data to all stakeholders in the pig industry.
Subjects
Neural Networks, Computer; Pleurisy/veterinary; Swine Diseases/pathology; Abattoirs; Animals; Pleurisy/pathology; Pneumonia/pathology; Pneumonia/veterinary; Sus scrofa; Swine
Abstract
Depth cameras enable reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions render common RGB sensors unusable. We therefore propose a complete framework for estimating head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that receives three types of images as input and provides the 3D angles of the pose as output. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model is able to hallucinate a face from the corresponding depth image, and we empirically demonstrate that this positively impacts system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method outperforms several recent state-of-the-art works based on both intensity and depth input data, running in real time at more than 30 frames per second.
Subjects
Face; Head; Imaging, Three-Dimensional/methods; Neural Networks, Computer; Posture/physiology; Algorithms; Automated Facial Recognition; Databases, Factual; Face/anatomy & histology; Face/diagnostic imaging; Female; Head/anatomy & histology; Head/diagnostic imaging; Humans; Male; Pattern Recognition, Automated; Shoulder/anatomy & histology; Shoulder/diagnostic imaging
Abstract
Diplegia is a specific subcategory of the wide spectrum of motion disorders gathered under the name of cerebral palsy. Recent works have proposed using gait analysis for diplegia classification, paving the way for automated analysis. A clinically established gait-based classification system divides diplegic patients into 4 main forms, each associated with a distinctive walking pattern. In this work, we apply two different deep learning techniques, namely multilayer perceptrons and recurrent neural networks, to automatically classify children into the 4 clinical forms. For the analysis, we used a dataset comprising gait data of 174 patients collected by means of an optoelectronic system. The measurements describing walking patterns were processed to extract 27 angular parameters, which were then used to train both kinds of neural networks. Classification results are comparable with those provided by experts in 3 out of 4 forms.
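The first of the two techniques can be illustrated with a minimal NumPy sketch (random weights here, not the trained model): a multilayer perceptron maps the 27 angular gait parameters to a probability distribution over the 4 clinical forms.

```python
import numpy as np

def mlp_classify(gait_params, w1, b1, w2, b2):
    # One hidden layer with ReLU activation, then a softmax over
    # the 4 clinical forms of diplegia.
    hidden = np.maximum(0.0, gait_params @ w1 + b1)
    logits = hidden @ w2 + b2
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(1)
x = rng.normal(size=(27,))               # 27 angular gait parameters
w1 = rng.normal(size=(27, 32)); b1 = np.zeros(32)
w2 = rng.normal(size=(32, 4));  b2 = np.zeros(4)
probs = mlp_classify(x, w1, b1, w2, b2)  # probability per clinical form
```

The recurrent-network variant would instead consume the gait parameters frame by frame as a time series rather than as a single flat vector.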
Subjects
Cerebral Palsy; Deep Learning; Gait Disorders, Neurologic; Gait/physiology; Cerebral Palsy/classification; Cerebral Palsy/diagnosis; Cerebral Palsy/physiopathology; Databases, Factual; Gait Disorders, Neurologic/classification; Gait Disorders, Neurologic/diagnosis; Gait Disorders, Neurologic/physiopathology; Humans
Abstract
In this work we aim to predict the driver's focus of attention. The goal is to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. To this end, we propose a new computer vision model based on a multi-branch deep architecture that integrates three sources of information: raw video, motion and scene semantics. We also introduce DR(eye)VE, the largest dataset of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames, matching ego-centric views (from glasses worn by drivers) and car-centric views (from a roof-mounted camera), further enriched by measurements from other sensors. Results highlight that several attention patterns are shared across drivers and can be reproduced to some extent. The indication of which elements in the scene are likely to capture the driver's attention may benefit several applications in the context of human-vehicle interaction and driver attention analysis.
Abstract
Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how an animal is coping with its environment, and behavioural indicators are thus among the preferred parameters for assessing welfare. However, behavioural recording (usually from video) can be very time-consuming, and the accuracy and reliability of the output rely on the experience and background of the observers. The emergence of new video technology and computer image processing provides the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data through structured machine learning frameworks. Depth information acquired through 3D features, body-part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can later be labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour over time or between individuals can be assessed. The software's accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent of human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as save time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing.
The computer vision technique applied to this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.
Subjects
Animal Welfare; Animals, Domestic; Behavior, Animal/physiology; Image Processing, Computer-Assisted/methods; Movement/physiology; Quality of Life; Software; Animals; Dogs; Environment; Pilot Projects; Reproducibility of Results
Abstract
Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and Heat-maps. To adhere to sociological observations, we introduce a loss function (G-MITRE) able to deal with the complexity of evaluating group detection performances. We show our algorithm achieves state-of-the-art results when relying on both ground truth trajectories and tracklets previously extracted by available detector/tracker systems.
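The Correlation Clustering step can be illustrated with a generic greedy pivot scheme (a sketch, assuming a learned pairwise affinity function where positive values mean "same group"; the paper learns these affinities with a Structural SVM, and its exact clustering procedure may differ):

```python
def correlation_clustering(n, affinity):
    # Greedy pivot algorithm: repeatedly pick an unassigned member and
    # group it with every other unassigned member whose pairwise
    # affinity with the pivot is positive.
    unassigned = set(range(n))
    groups = []
    while unassigned:
        pivot = min(unassigned)
        group = {pivot} | {j for j in unassigned
                           if j != pivot and affinity(pivot, j) > 0}
        groups.append(sorted(group))
        unassigned -= group
    return groups
```

With trajectory-based features (proxemics, Granger causality, DTW), `affinity(i, j)` would score how likely pedestrians `i` and `j` are to walk together.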
Abstract
A large variety of trackers has been proposed in the literature over the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem, and it therefore remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that, in evaluation practice, the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.
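The survival-curve evaluation can be sketched as follows (a minimal illustration assuming no censoring, in which case the Kaplan-Meier estimate reduces to the empirical survival function over per-video failure frames):

```python
def kaplan_meier(failure_frames):
    # Each entry is the frame index at which a tracker failed on one
    # video fragment. Returns S(t), the estimated probability that a
    # tracker survives past frame t, for each observed failure time.
    times = sorted(set(failure_frames))
    n_at_risk = len(failure_frames)
    survival, s = {}, 1.0
    for t in times:
        d = failure_frames.count(t)     # failures exactly at frame t
        s *= 1.0 - d / n_at_risk        # product-limit update
        survival[t] = s
        n_at_risk -= d
    return survival
```

Comparing the resulting curves across trackers shows which ones keep locking onto the target for longer under the same set of video fragments.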
Abstract
This paper presents a novel and robust approach to consistent labeling for people surveillance in multi-camera systems. A general framework scalable to any number of cameras with overlapped views is devised. An off-line training process automatically computes ground-plane homography and recovers epipolar geometry. When a new object is detected in any one camera, hypotheses for potential matching objects in the other cameras are established. Each of the hypotheses is evaluated using a prior and likelihood value. The prior accounts for the positions of the potential matching objects, while the likelihood is computed by warping the vertical axis of the new object on the field of view of the other cameras and measuring the amount of match. In the likelihood, two contributions (forward and backward) are considered so as to correctly handle the case of groups of people merged into single objects. Finally, a maximum-a-posteriori approach estimates the best label assignment for the new object. Comparisons with other methods based on homography and extensive outdoor experiments demonstrate that the proposed approach is accurate and robust in coping with segmentation errors and in disambiguating groups.
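The ground-plane warping used to establish matching hypotheses can be sketched as a standard projective transform (a minimal NumPy illustration; `H` stands in for the 3x3 homography recovered in the off-line training step, not the paper's actual matrices):

```python
import numpy as np

def warp_ground_point(H, point):
    # Apply a 3x3 ground-plane homography H to a pixel coordinate in
    # one camera, returning the corresponding pixel in the other view.
    x, y = point
    p = H @ np.array([x, y, 1.0])   # lift to homogeneous coordinates
    return p[0] / p[2], p[1] / p[2] # project back to the image plane
```

Warping the foot point (and the vertical axis) of a newly detected person into every overlapping view in this way yields the candidate positions against which the prior and likelihood are evaluated.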