Pesquisa | Portal de Pesquisa da BVS Enfermagem

1.

Low-latency automotive vision with event cameras.

Gehrig, Daniel; Scaramuzza, Davide.

Nature ; 629(8014): 1034-1040, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38811712

RESUMO

The computer vision algorithms used currently in advanced driver assistance systems rely on image-based RGB cameras, leading to a critical bandwidth-latency trade-off for delivering safe driving experiences. To address this, event cameras have emerged as alternative vision sensors. Event cameras measure the changes in intensity asynchronously, offering high temporal resolution and sparsity, markedly reducing bandwidth and latency requirements1. Despite these advantages, event-camera-based algorithms are either highly efficient but lag behind image-based ones in terms of accuracy or sacrifice the sparsity and efficiency of events to achieve comparable results. To overcome this, here we propose a hybrid event- and frame-based object detector that preserves the advantages of each modality and thus does not suffer from this trade-off. Our method exploits the high temporal resolution and sparsity of events and the rich but low temporal resolution information in standard images to generate efficient, high-rate object detections, reducing perceptual and computational latency. We show that the use of a 20 frames per second (fps) RGB camera plus an event camera can achieve the same latency as a 5,000-fps camera with the bandwidth of a 45-fps camera without compromising accuracy. Our approach paves the way for efficient and robust perception in edge-case scenarios by uncovering the potential of event cameras2.

2.

Champion-level drone racing using deep reinforcement learning.

Kaufmann, Elia; Bauersfeld, Leonard; Loquercio, Antonio; Müller, Matthias; Koltun, Vladlen; Scaramuzza, Davide.

Nature ; 620(7976): 982-987, 2023 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-37648758

RESUMO

First-person view (FPV) drone racing is a televised sport in which professional competitors pilot high-speed aircraft through a 3D circuit. Each pilot sees the environment from the perspective of their drone by means of video streamed from an onboard camera. Reaching the level of professional pilots with an autonomous drone is challenging because the robot needs to fly at its physical limits while estimating its speed and location in the circuit exclusively from onboard sensors1. Here we introduce Swift, an autonomous system that can race physical vehicles at the level of the human world champions. The system combines deep reinforcement learning (RL) in simulation with data collected in the physical world. Swift competed against three human champions, including the world champions of two international leagues, in real-world head-to-head races. Swift won several races against each of the human champions and demonstrated the fastest recorded race time. This work represents a milestone for mobile robotics and machine intelligence2, which may inspire the deployment of hybrid learning-based solutions in other physical systems.

3.

A model selection framework to quantify microvascular liver function in gadoxetate-enhanced MRI: Application to healthy liver, diseased tissue, and hepatocellular carcinoma.

Berks, Michael; Little, Ross A; Watson, Yvonne; Cheung, Sue; Datta, Anubhav; O'Connor, James P B; Scaramuzza, Davide; Parker, Geoff J M.

Magn Reson Med ; 86(4): 1829-1844, 2021 10.

Artigo em Inglês | MEDLINE | ID: mdl-33973674

RESUMO

PURPOSE: We introduce a novel, generalized tracer kinetic model selection framework to quantify microvascular characteristics of liver and tumor tissue in gadoxetate-enhanced dynamic contrast-enhanced MRI (DCE-MRI). METHODS: Our framework includes a hierarchy of nested models, from which physiological parameters are derived in 2 regimes, corresponding to the active transport and free diffusion of gadoxetate. We use simulations to show the sensitivity of model selection and parameter estimation to temporal resolution, time-series duration, and noise. We apply the framework in 8 healthy volunteers (time-series duration up to 24 minutes) and 10 patients with hepatocellular carcinoma (6 minutes). RESULTS: The active transport regime is preferred in 98.6% of voxels in volunteers, 82.1% of patients' non-tumorous liver, and 32.2% of tumor voxels. Interpatient variations correspond to known co-morbidities. Simulations suggest both datasets have sufficient temporal resolution and signal-to-noise ratio, while patient data would be improved by using a time-series duration of at least 12 minutes. CONCLUSIONS: In patient data, gadoxetate exhibits different kinetics: (a) between liver and tumor regions and (b) within regions due to liver disease and/or tumor heterogeneity. Our generalized framework selects a physiological interpretation at each voxel, without preselecting a model for each region or duplicating time-consuming optimizations for models with identical functional forms.

Assuntos

Carcinoma Hepatocelular , Neoplasias Hepáticas , Carcinoma Hepatocelular/diagnóstico por imagem , Meios de Contraste , Gadolínio DTPA , Humanos , Fígado/diagnóstico por imagem , Neoplasias Hepáticas/diagnóstico por imagem , Imageamento por Ressonância Magnética

4.

Reference Pose Generation for Long-term Visual Localization via Learned Features and View Synthesis.

Zhang, Zichao; Sattler, Torsten; Scaramuzza, Davide.

Int J Comput Vis ; 129(4): 821-844, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34720404

RESUMO

Visual Localization is one of the key enabling technologies for autonomous driving and augmented reality. High quality datasets with accurate 6 Degree-of-Freedom (DoF) reference poses are the foundation for benchmarking and improving existing methods. Traditionally, reference poses have been obtained via Structure-from-Motion (SfM). However, SfM itself relies on local features which are prone to fail when images were taken under different conditions, e.g., day/night changes. At the same time, manually annotating feature correspondences is not scalable and potentially inaccurate. In this work, we propose a semi-automated approach to generate reference poses based on feature matching between renderings of a 3D model and real images via learned features. Given an initial pose estimate, our approach iteratively refines the pose based on feature matches against a rendering of the model from the current pose estimate. We significantly improve the nighttime reference poses of the popular Aachen Day-Night dataset, showing that state-of-the-art visual localization methods perform better (up to 47%) than predicted by the original reference poses. We extend the dataset with new nighttime test images, provide uncertainty estimates for our new reference poses, and introduce a new evaluation criterion. We will make our reference poses and our framework publicly available upon publication.

5.

A proposal of an updated classification for pelvic relapses of rectal cancer to guide surgical decision-making.

Belli, Filiberto; Sorrentino, Luca; Gallino, Gianfrancesco; Gronchi, Alessandro; Scaramuzza, Davide; Valvo, Francesca; Cattaneo, Laura; Cosimelli, Maurizio.

J Surg Oncol ; 122(2): 350-359, 2020 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-32424824

RESUMO

BACKGROUND AND OBJECTIVES: Selection of patients affected by pelvic recurrence of rectal cancer (PRRC) who are likely to achieve a R0 resection is mandatory. The aim of this study was to propose a classification for PRRC to predict both radical surgery and disease-free survival (DFS). METHODS: PRRC patients treated at the National Cancer Institute of Milan (Italy) were included in the study. PRRC were classified as S1, if located centrally (S1a-S1b) or anteriorly (S1c) within the pelvis; S2, in case of sacral involvement below (S2a) or above (S2b) the second sacral vertebra; S3, in case of lateral pelvic involvement. RESULTS: Of 280 reviewed PRRC patients, 152 (54.3%) were evaluated for curative surgery. The strongest predictor of R+ resection was the S3 category (OR, 6.37; P = .011). Abdominosacral resection (P = .012), anterior exenteration (P = .012) and extended rectal re-excision (P = .003) were predictive of R0 resection. S3 category was highly predictive of poor DFS (HR 2.53; P = .038). DFS was significantly improved after R0 surgery for S1 (P < .0001) and S2 (P = .015) patients but not for S3 cases (P = .525). CONCLUSIONS: The proposed classification allows selection of subjects candidates to curative surgery, emphasizing that lateral pelvic involvement is the main predictor of R+ resection and independently affects the DFS.

Assuntos

Tomada de Decisões , Recidiva Local de Neoplasia/classificação , Recidiva Local de Neoplasia/cirurgia , Neoplasias Pélvicas/classificação , Neoplasias Pélvicas/cirurgia , Neoplasias Retais/classificação , Neoplasias Retais/cirurgia , Análise de Variância , Quimioterapia Adjuvante , Intervalo Livre de Doença , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Recidiva Local de Neoplasia/patologia , Neoplasias Pélvicas/patologia , Modelos de Riscos Proporcionais , Radioterapia Adjuvante , Neoplasias Retais/patologia , Neoplasias Retais/terapia , Taxa de Sobrevida

6.

Different pixel pitch and maximum luminance of medical grade displays may result in different evaluations of digital radiography images.

Laffranchi, Alberto; Cicero, Calogero; Lualdi, Manuela; Ciniselli, Chiara M; Calareso, Giuseppina; Canestrini, Stefano; Greco, Francesca G; Alberioli, Enrico; Cavatorta, Claudia; Guarise, Alessandro; Pignoli, Emanuele; Plebani, Maddalena; Scaramuzza, Davide; Siciliano, Claudio; Verderio, Paolo; Marchianò, Alfonso.

Radiol Med ; 123(8): 586-592, 2018 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-29671208

RESUMO

AIM: To evaluate the effects of display pixel pitch and maximum luminance on intra- and inter-observer reproducibility and observer performance when evaluating chest lesions and bone fractures. MATERIALS AND METHODS: This was a multi-institutional study for a retrospective interpretation of selected digital radiography images. Overall, 82 images were selected by senior radiologists, including 50 cases of chest lesions and 32 cases of bone fractures. These images were displayed at two pixel pitches (0.212 and 0.165 mm size pixels) and two maximum luminance values (250 and 500 cd/m2) and reviewed twice by senior and junior radiologists. All the observers had to indicate the likelihood of the presence of the lesions and to rate the relative confidence of their assessment. Cohen Kappa statistic was computed to estimate the reproducibility in correctly identifying lesions; for multi-reader-multi-case (MRMC) analysis, weighted Jackknife Alternative Free-response Receiver Operating Characteristic (wJAFROC) statistical tools was applied. RESULTS: The intra-radiologist and inter-observer reproducibility values were the highest for the 0.165 mm size pixel at 500 cd/m2 display, for both chest lesions and bone fractures evaluations. As regards chest lesions, observer performances were significantly greater with 0.165 mm size pixel display at 500 cd/m2 than with lower maximum luminance and/or larger pixel pitch displays. Concerning bone fractures, the performance obtained with 0.212 mm size pixel display at 250 cd/m2 was statistically lower than that obtained with 0.165 mm sixe pixel display at 500 cd/m2. CONCLUSION: Our results indicate that an increased maximum luminance level and a decreased pixel pitch of medical-grade display improve the accuracy of detecting both chest lesions and bone fractures.

Assuntos

Fraturas Ósseas/diagnóstico por imagem , Intensificação de Imagem Radiográfica/métodos , Radiografia Torácica/métodos , Doenças Torácicas/diagnóstico por imagem , Humanos , Variações Dependentes do Observador , Reprodutibilidade dos Testes

7.

A smartphone application to determine body length for body weight estimation in children: a prospective clinical trial.

Wetzel, Oliver; Schmidt, Alexander R; Seiler, Michelle; Scaramuzza, Davide; Seifert, Burkhardt; Spahn, Donat R; Stein, Philipp.

J Clin Monit Comput ; 32(3): 571-578, 2018 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-28660564

RESUMO

The aim of this study was to test the feasibility and accuracy of a smartphone application to measure the body length of children using the integrated camera and to evaluate the subsequent weight estimates. A prospective clinical trial of children aged 0-<13 years admitted to the emergency department of the University Children's Hospital Zurich. The primary outcome was to validate the length measurement by the smartphone application «Optisizer¼. The secondary outcome was to correlate the virtually calculated ordinal categories based on the length measured by the app to the categories based on the real length. The third and independent outcome was the comparison of the different weight estimations by physicians, nurses, parents and the app. For all 627 children, the Bland Altman analysis showed a bias of -0.1% (95% CI -0.3-0.2%) comparing real length and length measured by the app. Ordinal categories of real length were in excellent agreement with categories virtually calculated based upon app length (kappa = 0.83, 95% CI 0.79-0.86). Children's real weight was underestimated by physicians (-3.3, 95% CI -4.4 to -2.2%, p < 0.001), nurses (-2.6, 95% CI -3.8 to -1.5%, p < 0.001) and parents (-1.3, 95% CI -1.9 to -0.6%, p < 0.001) but overestimated by categories based upon app length (1.6, 95% CI 0.3-2.8%, p = 0.02) and categories based upon real length (2.3, 95% CI 1.1-3.5%, p < 0.001). Absolute weight differences were lowest, if estimated by the parents (5.4, 95% CI 4.9-5.9%, p < 0.001). This study showed the accuracy of length measurement of children by a smartphone application: body length determined by the smartphone application is in good agreement with the real patient length. Ordinal length categories derived from app-measured length are in excellent agreement with the ordinal length categories based upon the real patient length. The body weight estimations based upon length corresponded to known data and limitations. Precision of body weight estimations by paediatric physicians and nurses were comparable and not different to length based estimations. In this non-emergency setting, parental weight estimation was significantly better than all other means of estimation (paediatric physicians and nurses, length based estimations) in terms of precision and absolute difference.

Assuntos

Estatura , Peso Corporal , Aplicativos Móveis , Monitorização Fisiológica/instrumentação , Smartphone , Algoritmos , Criança , Pré-Escolar , Serviço Hospitalar de Emergência , Feminino , Hospitais Pediátricos , Humanos , Lactente , Recém-Nascido , Masculino , Monitorização Fisiológica/métodos , Variações Dependentes do Observador , Pediatria , Estudos Prospectivos , Reprodutibilidade dos Testes

8.

Dense Continuous-Time Optical Flow From Event Cameras.

Gehrig, Mathias; Muglikar, Manasi; Scaramuzza, Davide.

IEEE Trans Pattern Anal Mach Intell ; 46(7): 4736-4746, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38306259

RESUMO

We present a method for estimating dense continuous-time optical flow from event data. Traditional dense optical flow methods compute the pixel displacement between two images. Due to missing information, these approaches cannot recover the pixel trajectories in the blind time between two images. In this work, we show that it is possible to compute per-pixel, continuous-time optical flow using events from an event camera. Events provide temporally fine-grained information about movement in pixel space due to their asynchronous nature and microsecond response time. We leverage these benefits to predict pixel trajectories densely in continuous time via parameterized Bézier curves. To achieve this, we build a neural network with strong inductive biases for this task: First, we build multiple sequential correlation volumes in time using event data. Second, we use Bézier curves to index these correlation volumes at multiple timestamps along the trajectory. Third, we use the retrieved correlation to update the Bézier curve representations iteratively. Our method can optionally include image pairs to boost performance further. To the best of our knowledge, our model is the first method that can regress dense pixel trajectories from event data. To train and evaluate our model, we introduce a synthetic dataset (MultiFlow) that features moving objects and ground truth trajectories for every pixel. Our quantitative experiments not only suggest that our method successfully predicts pixel trajectories in continuous time but also that it is competitive in the traditional two-view pixel displacement metric on MultiFlow and DSEC-Flow. Open source code and datasets are released to the public.

9.

Wearable robots for the real world need vision.

Gionfrida, Letizia; Kim, Daekyum; Scaramuzza, Davide; Farina, Dario; Howe, Robert D.

Sci Robot ; 9(90): eadj8812, 2024 May 22.

Artigo em Inglês | MEDLINE | ID: mdl-38776377

RESUMO

To enhance wearable robots, understanding user intent and environmental perception with novel vision approaches is needed.

Assuntos

Robótica , Dispositivos Eletrônicos Vestíveis , Robótica/instrumentação , Robótica/tendências , Robótica/estatística & dados numéricos , Humanos , Desenho de Equipamento , Inteligência Artificial , Intenção

10.

E-Calib: A Fast, Robust, and Accurate Calibration Toolbox for Event Cameras.

Salah, Mohammed; Ayyad, Abdulla; Humais, Muhammad; Gehrig, Daniel; Abusafieh, Abdelqader; Seneviratne, Lakmal; Scaramuzza, Davide; Zweiri, Yahya.

IEEE Trans Image Process ; 33: 3977-3990, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38869999

RESUMO

Event cameras triggered a paradigm shift in the computer vision community delineated by their asynchronous nature, low latency, and high dynamic range. Calibration of event cameras is always essential to account for the sensor intrinsic parameters and for 3D perception. However, conventional image-based calibration techniques are not applicable due to the asynchronous, binary output of the sensor. The current standard for calibrating event cameras relies on either blinking patterns or event-based image reconstruction algorithms. These approaches are difficult to deploy in factory settings and are affected by noise and artifacts degrading the calibration performance. To bridge these limitations, we present E-Calib, a novel, fast, robust, and accurate calibration toolbox for event cameras utilizing the asymmetric circle grid, for its robustness to out-of-focus scenes. E-Calib introduces an efficient reweighted least squares (eRWLS) method for feature extraction of the calibration pattern circles with sub-pixel accuracy and robustness to noise. In addition, a modified hierarchical clustering algorithm is devised to detect the calibration grid apart from the background clutter. The proposed method is tested in a variety of rigorous experiments for different event camera models, on circle grids with different geometric properties, on varying calibration trajectories and speeds, and under challenging illumination conditions. The results show that our approach outperforms the state-of-the-art in detection success rate, reprojection error, and pose estimation accuracy.

11.

Radiofrequency ablation of liver tumors: quantitative assessment of tumor coverage through CT image processing.

Passera, Katia; Selvaggi, Sabrina; Scaramuzza, Davide; Garbagnati, Francesco; Vergnaghi, Daniele; Mainardi, Luca.

BMC Med Imaging ; 13: 3, 2013 Jan 16.

Artigo em Inglês | MEDLINE | ID: mdl-23324557

RESUMO

BACKGROUND: Radiofrequency ablation (RFA) is one of the most promising non-surgical treatments for hepatic tumors. The assessment of the therapeutic efficacy of RFA is usually obtained by visual comparison of pre- and post-treatment CT images, but no numerical quantification is performed. METHODS: In this work, a novel method aiming at providing a more objective tool for the evaluation of RFA coverage is described. Image registration and segmentation techniques were applied to enable the visualization of the tumor and the corresponding post-RFA necrosis in the same framework. In addition, a set of numerical indexes describing tumor/necrosis overlap and their mutual position were computed. RESULTS: After validation of segmentation step, the method was applied on a dataset composed by 10 tumors, suspected not to be completed treated. Numerical indexes showed that only two tumors were totally treated and the percentage of a residual tumor was in the range of 5.12%-35.92%. CONCLUSIONS: This work represents a first attempt to obtain a quantitative tool aimed to assess the accuracy of RFA treatment. The possibility to visualize the tumor and the correspondent post-RFA necrosis in the same framework and the definition of some synthetic numerical indexes could help clinicians in ameliorating RFA treatment.

Assuntos

Ablação por Cateter/métodos , Neoplasias Hepáticas/diagnóstico por imagem , Neoplasias Hepáticas/cirurgia , Reconhecimento Automatizado de Padrão/métodos , Interpretação de Imagem Radiográfica Assistida por Computador/métodos , Cirurgia Assistida por Computador/métodos , Tomografia Computadorizada por Raios X/métodos , Adulto , Idoso , Idoso de 80 Anos ou mais , Algoritmos , Humanos , Imageamento Tridimensional/métodos , Masculino , Pessoa de Meia-Idade , Intensificação de Imagem Radiográfica/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , Resultado do Tratamento

12.

Cracking double-blind review: Authorship attribution with deep learning.

Bauersfeld, Leonard; Romero, Angel; Muglikar, Manasi; Scaramuzza, Davide.

PLoS One ; 18(6): e0287611, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-37390072

RESUMO

Double-blind peer review is considered a pillar of academic research because it is perceived to ensure a fair, unbiased, and fact-centered scientific discussion. Yet, experienced researchers can often correctly guess from which research group an anonymous submission originates, biasing the peer-review process. In this work, we present a transformer-based, neural-network architecture that only uses the text content and the author names in the bibliography to attribute an anonymous manuscript to an author. To train and evaluate our method, we created the largest authorship-identification dataset to date. It leverages all research papers publicly available on arXiv amounting to over 2 million manuscripts. In arXiv-subsets with up to 2,000 different authors, our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly. We present a scaling analysis to highlight the applicability of the proposed method to even larger datasets when sufficient compute capabilities are more widely available to the academic community. Furthermore, we analyze the attribution accuracy in settings where the goal is to identify all authors of an anonymous manuscript. Thanks to our method, we are not only able to predict the author of an anonymous work but we also provide empirical evidence of the key aspects that make a paper attributable. We have open-sourced the necessary tools to reproduce our experiments.

Assuntos

Autoria , Aprendizado Profundo , Método Duplo-Cego , Fontes de Energia Elétrica , Redes Neurais de Computação

13.

Microgravity induces overconfidence in perceptual decision-making.

Loued-Khenissi, Leyla; Pfeiffer, Christian; Saxena, Rupal; Adarsh, Shivam; Scaramuzza, Davide.

Sci Rep ; 13(1): 9727, 2023 06 15.

Artigo em Inglês | MEDLINE | ID: mdl-37322248

RESUMO

Does gravity affect decision-making? This question comes into sharp focus as plans for interplanetary human space missions solidify. In the framework of Bayesian brain theories, gravity encapsulates a strong prior, anchoring agents to a reference frame via the vestibular system, informing their decisions and possibly their integration of uncertainty. What happens when such a strong prior is altered? We address this question using a self-motion estimation task in a space analog environment under conditions of altered gravity. Two participants were cast as remote drone operators orbiting Mars in a virtual reality environment on board a parabolic flight, where both hyper- and microgravity conditions were induced. From a first-person perspective, participants viewed a drone exiting a cave and had to first predict a collision and then provide a confidence estimate of their response. We evoked uncertainty in the task by manipulating the motion's trajectory angle. Post-decision subjective confidence reports were negatively predicted by stimulus uncertainty, as expected. Uncertainty alone did not impact overt behavioral responses (performance, choice) differentially across gravity conditions. However microgravity predicted higher subjective confidence, especially in interaction with stimulus uncertainty. These results suggest that variables relating to uncertainty affect decision-making distinctly in microgravity, highlighting the possible need for automatized, compensatory mechanisms when considering human factors in space research.

Assuntos

Gravidade Alterada , Voo Espacial , Ausência de Peso , Humanos , Teorema de Bayes , Incerteza , Encéfalo

14.

Reaching the limit in autonomous racing: Optimal control versus reinforcement learning.

Song, Yunlong; Romero, Angel; Müller, Matthias; Koltun, Vladlen; Scaramuzza, Davide.

Sci Robot ; 8(82): eadg1462, 2023 Sep 27.

Artigo em Inglês | MEDLINE | ID: mdl-37703383

RESUMO

A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.

15.

Visual attention prediction improves performance of autonomous drone racing agents.

Pfeiffer, Christian; Wengeler, Simon; Loquercio, Antonio; Scaramuzza, Davide.

PLoS One ; 17(3): e0264471, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35231038

RESUMO

Humans race drones faster than neural networks trained for end-to-end autonomous flight. This may be related to the ability of human pilots to select task-relevant visual information effectively. This work investigates whether neural networks capable of imitating human eye gaze behavior and attention can improve neural networks' performance for the challenging task of vision-based autonomous drone racing. We hypothesize that gaze-based attention prediction can be an efficient mechanism for visual information selection and decision making in a simulator-based drone racing task. We test this hypothesis using eye gaze and flight trajectory data from 18 human drone pilots to train a visual attention prediction model. We then use this visual attention prediction model to train an end-to-end controller for vision-based autonomous drone racing using imitation learning. We compare the drone racing performance of the attention-prediction controller to those using raw image inputs and image-based abstractions (i.e., feature tracks). Comparing success rates for completing a challenging race track by autonomous flight, our results show that the attention-prediction based controller (88% success rate) outperforms the RGB-image (61% success rate) and feature-tracks (55% success rate) controller baselines. Furthermore, visual attention-prediction and feature-track based models showed better generalization performance than image-based models when evaluated on hold-out reference trajectories. Our results demonstrate that human visual attention prediction improves the performance of autonomous vision-based drone racing agents and provides an essential step towards vision-based, fast, and agile autonomous flight that eventually can reach and even exceed human performances.

Assuntos

Redes Neurais de Computação , Dispositivos Aéreos não Tripulados , Fixação Ocular , Humanos , Visão Ocular

16.

AlphaPilot: autonomous drone racing.

Foehn, Philipp; Brescianini, Dario; Kaufmann, Elia; Cieslewski, Titus; Gehrig, Mathias; Muglikar, Manasi; Scaramuzza, Davide.

Auton Robots ; 46(1): 307-320, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35221535

RESUMO

This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system has successfully been deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. Contrary to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible and further enable to plan a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses reaching speeds up to 8 m / s and ranked second at the 2019 AlphaPilot Challenge.

17.

Agilicious: Open-source and open-hardware agile quadrotor for vision-based flight.

Foehn, Philipp; Kaufmann, Elia; Romero, Angel; Penicka, Robert; Sun, Sihao; Bauersfeld, Leonard; Laengle, Thomas; Cioffi, Giovanni; Song, Yunlong; Loquercio, Antonio; Scaramuzza, Davide.

Sci Robot ; 7(67): eabl6259, 2022 06 22.

Artigo em Inglês | MEDLINE | ID: mdl-35731886

RESUMO

Autonomous, agile quadrotor flight raises fundamental challenges for robotics research in terms of perception, planning, learning, and control. A versatile and standardized platform is needed to accelerate research and let practitioners focus on the core problems. To this end, we present Agilicious, a codesigned hardware and software framework tailored to autonomous, agile quadrotor flight. It is completely open source and open hardware and supports both model-based and neural network-based controllers. Also, it provides high thrust-to-weight and torque-to-inertia ratios for agility, onboard vision sensors, graphics processing unit (GPU)-accelerated compute hardware for real-time perception and neural network inference, a real-time flight controller, and a versatile software stack. In contrast to existing frameworks, Agilicious offers a unique combination of flexible software stack and high-performance hardware. We compare Agilicious with prior works and demonstrate it on different agile tasks, using both model-based and neural network-based controllers. Our demonstrators include trajectory tracking at up to 5g and 70 kilometers per hour in a motion capture system, and vision-based acrobatic flight and obstacle avoidance in both structured and unstructured environments using solely onboard perception. Last, we demonstrate its use for hardware-in-the-loop simulation in virtual reality environments. Because of its versatility, we believe that Agilicious supports the next generation of scientific and industrial quadrotor research.

Assuntos

Robótica , Simulação por Computador , Redes Neurais de Computação , Software , Visão Ocular

18.

Event-Based Vision: A Survey.

Gallego, Guillermo; Delbruck, Tobi; Orchard, Garrick; Bartolozzi, Chiara; Taba, Brian; Censi, Andrea; Leutenegger, Stefan; Davison, Andrew J; Conradt, Jorg; Daniilidis, Kostas; Scaramuzza, Davide.

IEEE Trans Pattern Anal Mach Intell ; 44(1): 154-180, 2022 01.

Artigo em Inglês | MEDLINE | ID: mdl-32750812

RESUMO

Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of µs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.

Assuntos

Algoritmos , Robótica , Redes Neurais de Computação

19.

High Speed and High Dynamic Range Video with an Event Camera.

Rebecq, Henri; Ranftl, Rene; Koltun, Vladlen; Scaramuzza, Davide.

IEEE Trans Pattern Anal Mach Intell ; 43(6): 1964-1980, 2021 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-31902754

RESUMO

Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality ( ), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos ( frames per second) of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.

20.

Time-optimal planning for quadrotor waypoint flight.

Foehn, Philipp; Romero, Angel; Scaramuzza, Davide.

Sci Robot ; 6(56)2021 07 21.

Artigo em Inglês | MEDLINE | ID: mdl-34290102

RESUMO

Quadrotors are among the most agile flying robots. However, planning time-optimal trajectories at the actuation limit through multiple waypoints remains an open problem. This is crucial for applications such as inspection, delivery, search and rescue, and drone racing. Early works used polynomial trajectory formulations, which do not exploit the full actuator potential because of their inherent smoothness. Recent works resorted to numerical optimization but require waypoints to be allocated as costs or constraints at specific discrete times. However, this time allocation is a priori unknown and renders previous works incapable of producing truly time-optimal trajectories. To generate truly time-optimal trajectories, we propose a solution to the time allocation problem while exploiting the full quadrotor's actuator potential. We achieve this by introducing a formulation of progress along the trajectory, which enables the simultaneous optimization of the time allocation and the trajectory itself. We compare our method against related approaches and validate it in real-world flights in one of the world's largest motion-capture systems, where we outperform human expert drone pilots in a drone-racing task.

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA