1.
Article in English | MEDLINE | ID: mdl-38656856

ABSTRACT

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment and performance have grown steadily with their size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost increases quadratically with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications due to many hardware and environmental restrictions, such as processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies to ensure sub-optimal estimation performances. In more detail, four efficient categories are analyzed: compact architecture, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called Efficient Error Rate is introduced to normalize and compare models' features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. In summary, this paper first mathematically defines the strategies used to make Vision Transformers efficient, describes and discusses state-of-the-art methodologies, and analyzes their performance over different application scenarios. Toward the end of this paper, we also discuss open challenges and promising research directions.
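
As a rough illustration of the quadratic cost mentioned in the abstract, the sketch below counts the entries of a single self-attention matrix for a ViT that splits the image into non-overlapping square patches. The function name `attention_matrix_entries` and the 16-pixel patch size are assumptions made for this example, not values taken from the survey. Class tokens and multiple attention heads are ignored.

```python
def attention_matrix_entries(height: int, width: int, patch: int = 16) -> int:
    """Entries in one N x N self-attention matrix, where N is the patch count."""
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    # Each non-overlapping patch becomes one token.
    num_tokens = (height // patch) * (width // patch)
    # Every token attends to every token, so the matrix has N * N entries.
    return num_tokens * num_tokens


base = attention_matrix_entries(224, 224)
doubled = attention_matrix_entries(448, 448)
print(base, doubled, doubled // base)  # prints "38416 614656 16"
```

Doubling each image side quadruples the number of tokens, so the attention matrix grows by a factor of 16 under these assumptions.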

2.
IEEE Comput Graph Appl ; 44(2): 23-36, 2024.
Article in English | MEDLINE | ID: mdl-38319778

ABSTRACT

The increasing demand for edge devices requires recent technologies to be adaptable to nonspecialized hardware. In particular, in the context of augmented reality, virtual reality, and computer graphics, the 3-D object reconstruction task from a sparse point cloud is highly computationally demanding and, for this reason, difficult to accomplish on embedded devices. In addition, the majority of earlier works have focused on mesh quality at the expense of speeding up the creation process. In order to find the best balance between mesh generation time and mesh quality, we aim to tackle the object reconstruction process by developing a lightweight implicit representation. To achieve this goal, we leverage convolutional occupancy networks. We show the effectiveness of the proposed approach through extensive experiments on the ShapeNet dataset using systems with different resources such as GPU, CPU, and an embedded device.

3.
Sensors (Basel) ; 23(4), 2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36850825

ABSTRACT

Knowledge of environmental depth is essential in multiple robotics and computer vision tasks for both terrestrial and underwater scenarios. Moreover, the hardware on which this technology runs, generally IoT and embedded devices, is limited in terms of power consumption, and therefore models with a low-energy footprint are required. Recent works aim at enabling depth perception using single RGB images on deep architectures, such as convolutional neural networks and vision transformers, which are generally unsuitable for real-time inference on low-power embedded hardware. Moreover, such architectures are trained to estimate depth maps mainly on terrestrial scenarios due to the scarcity of underwater depth data. To this end, we present two lightweight architectures based on optimized MobileNetV3 encoders and a specifically designed decoder to achieve fast inference and accurate estimations on embedded devices, a feasibility study to predict depth maps in underwater scenarios, and an energy assessment to understand the effective energy consumption during inference. Specifically, we propose the MobileNetV3S75 configuration to infer on the 32-bit ARM CPU and the MobileNetV3LMin for the 8-bit Edge TPU hardware. In underwater settings, the proposed design achieves estimations comparable to state-of-the-art methods with fast inference performance. Moreover, we statistically proved that the architecture of the models has an impact on the energy footprint in terms of Watts required by the device during inference. The proposed architectures can therefore be considered a promising approach for real-time monocular depth estimation, offering the best trade-off between inference performance, estimation error, and energy consumption, with the aim of improving environment perception for underwater drones, lightweight robots, and the Internet of Things.

4.
J Imaging ; 7(11), 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34821873

ABSTRACT

Nowadays, images and videos have become the main modalities of information being exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security more and more [...].

5.
J Imaging ; 7(8), 2021 Aug 08.
Article in English | MEDLINE | ID: mdl-34460776

ABSTRACT

Videos have become a powerful tool for spreading illegal content such as military propaganda, revenge porn, or bullying through social networks. To counter these illegal activities, it has become essential to try new methods to verify the origin of videos from these platforms. However, collecting datasets large enough to train neural networks for this task has become difficult because of the privacy regulations that have been enacted in recent years. To mitigate this limitation, in this work we propose two different solutions based on transfer learning and multitask learning to determine whether a video has been uploaded from or downloaded to a specific social platform through the use of shared features with images trained on the same task. By transferring features from the shallowest to the deepest levels of the network from the image task to videos, we measure the amount of information shared between these two tasks. Then, we introduce a model based on multitask learning, which learns from both tasks simultaneously. The promising experimental results show, in particular, the effectiveness of the multitask approach. To our knowledge, this is the first work that addresses the problem of social media platform identification of videos through the use of shared features.

6.
Entropy (Basel) ; 22(11), 2020 Oct 30.
Article in English | MEDLINE | ID: mdl-33287003

ABSTRACT

Research findings have shown that microphones can be uniquely identified by audio recordings since physical features of the microphone components leave repeatable and distinguishable traces on the audio stream. This property can be exploited in security applications to identify a mobile phone through its built-in microphone. The problem is to determine an accurate but also efficient representation of the physical characteristics, which is not known a priori. Usually there is a trade-off between the identification accuracy and the time required to perform the classification. Various approaches have been used in the literature to deal with it, ranging from the application of handcrafted statistical features to the recent application of deep learning techniques. This paper evaluates the application of different entropy measures (Shannon Entropy, Permutation Entropy, Dispersion Entropy, Approximate Entropy, Sample Entropy, and Fuzzy Entropy) and their suitability for microphone classification. The analysis is validated against an experimental dataset of built-in microphones of 34 mobile phones, stimulated by three different audio signals. The findings show that selected entropy measures can provide a very high identification accuracy in comparison to other statistical features and that they can be robust against the presence of noise. This paper performs an extensive analysis based on filter feature selection methods to identify the most discriminating entropy measures and the related hyper-parameters (e.g., embedding dimension). Results on the trade-off between accuracy and classification time are also presented.
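
To make two of the listed measures concrete, the following is a minimal sketch of Shannon entropy and normalized permutation entropy using their standard textbook definitions. It is not the authors' implementation. The function names, the `embedding_dim` and `delay` parameters, and their default values are assumptions chosen for illustration.

```python
from collections import Counter
from math import factorial, log2


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as non-negative counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def permutation_entropy(signal, embedding_dim=3, delay=1):
    """Normalized permutation entropy in [0, 1] of a 1-D sequence."""
    if embedding_dim < 2:
        raise ValueError("embedding_dim must be at least 2")
    span = (embedding_dim - 1) * delay
    if len(signal) <= span:
        raise ValueError("signal too short for this embedding")
    patterns = Counter()
    for start in range(len(signal) - span):
        window = [signal[start + k * delay] for k in range(embedding_dim)]
        # Ordinal pattern: window indices sorted by sample value (ties keep index order).
        patterns[tuple(sorted(range(embedding_dim), key=window.__getitem__))] += 1
    # Divide by the maximum entropy, log2(embedding_dim!), to normalize.
    return shannon_entropy(patterns.values()) / log2(factorial(embedding_dim))
```

A strictly increasing sequence produces a single ordinal pattern and therefore a permutation entropy of zero, while a sequence whose patterns are all equally frequent reaches the maximum of one.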

7.
Forensic Sci Int ; 251: e9-e14, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25851695

ABSTRACT

Photographic documents, in both digital and printed format, play a fundamental role in crime scene analysis. Photos are crucial to reconstruct what happened and also to freeze the scene with all the objects and evidence present. Consequently, it is easy to see the paramount importance of assessing the authenticity of such images, so that a possible malicious counterfeit does not lead to a wrong evaluation of the circumstances. In this paper, a case study is presented in which some printed photos, brought as documentary evidence of a family murder, had been fraudulently modified to bias the final judgement. In particular, the use of the CADET image forensic tool to verify the integrity of printed photos is introduced and discussed.
