Results 1 - 20 of 27
1.
Opt Express ; 31(4): 6827-6848, 2023 Feb 13.
Article in English | MEDLINE | ID: mdl-36823931

ABSTRACT

Detecting and avoiding obstacles while navigating can pose a challenge for people with low vision, but augmented reality (AR) has the potential to assist by enhancing obstacle visibility. Perceptual and user experience research is needed to understand how to craft effective AR visuals for this purpose. We developed a prototype AR application capable of displaying multiple kinds of visual cues for obstacles on an optical see-through head-mounted display. We assessed the usability of these cues via a study in which participants with low vision navigated an obstacle course. The results suggest that 3D world-locked AR cues were superior to directional heads-up cues for most participants during this activity.


Subjects
Augmented Reality, Smart Glasses, Low Vision, Humans, Cues, User-Computer Interface
2.
Sensors (Basel) ; 21(12)2021 Jun 11.
Article in English | MEDLINE | ID: mdl-34208112

ABSTRACT

Pedestrian tracking systems implemented in regular smartphones may provide a convenient mechanism for wayfinding and backtracking for people who are blind. However, virtually all existing studies only considered sighted participants, whose gait pattern may be different from that of blind walkers using a long cane or a dog guide. In this contribution, we present a comparative assessment of several algorithms using inertial sensors for pedestrian tracking, as applied to data from WeAllWalk, the only published inertial sensor dataset collected indoors from blind walkers. We consider two situations of interest. In the first situation, a map of the building is not available, in which case we assume that users walk in a network of corridors intersecting at 45° or 90°. We propose a new two-stage turn detector that, combined with an LSTM-based step counter, can robustly reconstruct the path traversed. We compare this with RoNIN, a state-of-the-art algorithm based on deep learning. In the second situation, a map is available, which provides a strong prior on the possible trajectories. For this situation, we experiment with particle filtering, with an additional clustering stage based on mean shift. Our results highlight the importance of training and testing inertial odometry systems for assisted navigation with data from blind walkers.


Subjects
Pedestrians, Smartphone, Algorithms, Animals, Dogs, Gait, Humans
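
The two-stage turn detector described above lends itself to a compact illustration. Below is a minimal sketch, not the paper's exact algorithm: stage one flags candidate turn windows from the smoothed gyroscope yaw rate, and stage two accepts a candidate only if its integrated heading change snaps to a multiple of 45°. All thresholds are assumed values.

```python
import numpy as np

def detect_turns(yaw_rate, fs=50.0, rate_thresh=0.3, snap_tol=15.0):
    """yaw_rate: 1-D array of rad/s samples; returns (start, end, degrees)."""
    smoothed = np.convolve(yaw_rate, np.ones(25) / 25, mode="same")
    active = np.abs(smoothed) > rate_thresh              # stage 1: candidates
    turns, i = [], 0
    while i < len(active):
        if active[i]:
            j = i
            while j < len(active) and active[j]:
                j += 1
            deg = np.degrees(np.sum(yaw_rate[i:j]) / fs)  # integrated heading
            snapped = 45.0 * round(deg / 45.0)            # stage 2: snap test
            if snapped != 0 and abs(deg - snapped) < snap_tol:
                turns.append((i, j, snapped))
            i = j
        else:
            i += 1
    return turns
```

Combined with a step counter (an LSTM in the paper), the detected turns suffice to reconstruct a corridor-network path.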
3.
Proc Mach Learn Res ; 210: 37-49, 2023.
Article in English | MEDLINE | ID: mdl-37323294

ABSTRACT

Self-supervised learning (SSL) has become prevalent for learning representations in computer vision. Notably, SSL exploits contrastive learning to encourage visual representations to be invariant under various image transformations. The task of gaze estimation, on the other hand, demands not only invariance to varying appearance but also equivariance to geometric transformations. In this work, we propose a simple contrastive representation learning framework for gaze estimation, named Gaze Contrastive Learning (GazeCLR). GazeCLR exploits multi-view data to promote equivariance and relies on selected data augmentation techniques that do not alter gaze directions for invariance learning. Our experiments demonstrate the effectiveness of GazeCLR for several settings of the gaze estimation task. In particular, our results show that GazeCLR improves the performance of cross-domain gaze estimation, yielding up to a 17.2% relative improvement. Moreover, the GazeCLR framework is competitive with state-of-the-art representation learning methods for few-shot evaluation. The code and pre-trained models are available at https://github.com/jswati31/gazeclr.
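
For readers unfamiliar with contrastive learning, the following is a generic InfoNCE-style loss between embeddings of two views of the same batch, sketched in PyTorch; GazeCLR's actual objective, projection heads, and multi-view pairing differ in detail (see the linked repository for the authors' code).

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views of the same N samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    # Each sample's positive is its counterpart in the other view;
    # all other samples in the batch act as negatives.
    return F.cross_entropy(logits, targets)
```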

4.
Article in English | MEDLINE | ID: mdl-38152683

ABSTRACT

Pedestrian dead reckoning (PDR) relies on estimating the length of each step taken by the walker along a path from inertial data (e.g., as recorded by a smartphone). Existing algorithms either estimate step lengths directly, or predict walking speed, which can then be integrated over a step period to obtain step length. We present an analysis, using a common architecture formed by an LSTM followed by four fully connected layers, of the quality of reconstruction when predicting step length vs. walking speed. Our experiments, conducted on a data set collected from twelve participants, strongly suggest that step length can be predicted more reliably than average walking speed over each step.
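
The architecture named above maps directly onto a short PyTorch module. The sketch below is a plausible rendering under assumed input features and layer widths; these are not the paper's hyperparameters.

```python
import torch.nn as nn

class StepLengthNet(nn.Module):
    def __init__(self, in_dim=6, hidden=128):    # 6 = 3-axis accel + gyro
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Sequential(               # four fully connected layers
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 1),                    # predicted step length (m)
        )

    def forward(self, x):                        # x: (batch, time, in_dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])             # regress from last state
```

Switching the regression target from step length to average walking speed, the comparison the paper investigates, requires no architectural change.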

5.
ACM Trans Access Comput ; 16(2): 1-26, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37427355

ABSTRACT

In this article, we introduce Semantic Interior Mapology (SIM), a web app that allows anyone to quickly trace the floor plan of a building, generating a vectorized representation that can be automatically converted into a tactile map at the desired scale. The design of SIM is informed by a focus group with seven blind participants. Maps generated by SIM at two different scales were tested in a user study with 10 participants, who were asked to perform a number of tasks designed to ascertain the spatial knowledge acquired through map exploration. These tasks included cross-map pointing, path finding, and determining turn direction and walker orientation during imagined path traversal. By and large, participants were able to successfully complete the tasks, suggesting that these types of maps could be useful for pre-journey spatial learning.

6.
ASSETS ; 2023, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38045532

ABSTRACT

RouteNav is an iOS app designed to support wayfinding for blind travelers in an indoor/outdoor transit hub. It does not rely on external infrastructure (such as BLE beacons); instead, localization is obtained by fusing spatial information from inertial dead reckoning and GPS (when available) via particle filtering. Routes are expressed as sequences of "tiles", where each tile may contain relevant points of interest. Redundant modalities are used to guide users to successive goalposts within tiles. In this paper, we describe the different components of RouteNav, and report on a user study with seven blind participants, who traversed three challenging routes in a transit hub while receiving guidance from the app.
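
A generic particle filter for fusing step-based dead reckoning with occasional GPS fixes can be sketched as follows; RouteNav's filter additionally exploits route and tile constraints, and every noise parameter here is an illustrative assumption.

```python
import numpy as np

class PdrGpsFilter:
    """Toy particle filter: dead-reckoned steps drive prediction,
    GPS fixes (when available) drive reweighting and resampling."""

    def __init__(self, n=500, seed=0):
        self.rng = np.random.default_rng(seed)
        self.p = self.rng.normal(0.0, 2.0, size=(n, 2))   # (x, y) in meters
        self.n = n

    def step(self, step_len, heading, len_sigma=0.3, hdg_sigma=0.1):
        """Propagate every particle by one noisy dead-reckoned step."""
        L = step_len + self.rng.normal(0, len_sigma, self.n)
        h = heading + self.rng.normal(0, hdg_sigma, self.n)
        self.p += np.column_stack([L * np.cos(h), L * np.sin(h)])

    def gps_fix(self, xy, gps_sigma=5.0):
        """Reweight by a Gaussian GPS likelihood, then resample."""
        w = np.exp(-0.5 * ((self.p - xy) ** 2).sum(1) / gps_sigma**2)
        w /= w.sum()
        self.p = self.p[self.rng.choice(self.n, self.n, p=w)]

    def estimate(self):
        return self.p.mean(axis=0)    # posterior mean position
```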

7.
ASSETS ; 2023, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38463538

ABSTRACT

We present a study with 20 participants with low vision who operated two types of screen magnification (lens and full) on a laptop computer to read two types of documents (text and web page). Our purposes were to comparatively assess the two magnification modalities, and to obtain some insight into how people with low vision use the mouse to control the center of magnification. These observations may inform the design of systems for the automatic control of the center of magnification. Our results show that there were no significant differences in reading performance or in subjective preferences between the two magnification modes. However, when using the lens mode, our participants adopted more consistent and uniform mouse motion patterns, while longer and more frequent pauses and shorter overall path lengths were measured in the full mode. Analysis of the distribution of gaze points (as measured by a gaze tracker) in the full mode shows that, when reading a text document, most participants preferred to move the area of interest to a specific region of the screen.

8.
Commun ACM ; 55(1): 96-104, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22815563

ABSTRACT

Computer vision holds great promise for helping persons with blindness or visual impairments (VI) to interpret and explore the visual world. To this end, it is worthwhile to assess the situation critically by understanding the actual needs of the VI population and which of these needs might be addressed by computer vision. This article reviews the assistive technology application areas that have already been developed for VI, and the possible roles that computer vision can play in facilitating these applications. We discuss how appropriate user interfaces are designed to translate the output of computer vision algorithms into information that the user can quickly and safely act upon, and how system-level characteristics affect the overall usability of an assistive technology. Finally, we conclude by highlighting a few novel and intriguing areas of application of computer vision to assistive technology.

9.
Article in English | MEDLINE | ID: mdl-35673555

ABSTRACT

Modern appearance-based gaze tracking algorithms require vast amounts of training data, with images of a viewer annotated with "ground truth" gaze direction. The standard approach to obtaining gaze annotations is to ask subjects to fixate on specific known locations, then use a head model to determine the location of the "origin of gaze". We propose using an IR gaze tracker to generate gaze annotations in natural settings that do not require fixation on target points. This requires prior geometric calibration of the IR gaze tracker with the camera, such that the data produced by the IR tracker can be expressed in the camera's reference frame. This contribution introduces a simple tracker/camera calibration procedure based on the PnP algorithm and demonstrates its use to obtain a full characterization of gaze direction that can be used for ground-truth annotation.
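
A minimal version of such a registration step, using OpenCV's solvePnP on 3D points expressed in the tracker's frame and their detected image projections, might look as follows; all point coordinates and camera intrinsics below are placeholder values.

```python
import cv2
import numpy as np

# 3D reference points expressed in the IR tracker's coordinate frame (mm).
obj_pts = np.array([[0, 0, 0], [100, 0, 0], [0, 100, 0], [100, 100, 0]],
                   dtype=np.float64)
# Their detected pixel locations in the camera image.
img_pts = np.array([[320, 240], [420, 242], [318, 340], [421, 338]],
                   dtype=np.float64)
# Assumed pinhole intrinsics (focal length and principal point, pixels).
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, distCoeffs=None)
R, _ = cv2.Rodrigues(rvec)     # rotation: tracker frame -> camera frame
# A gaze point p_tracker maps into the camera frame as R @ p_tracker + tvec.
```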

10.
Article in English | MEDLINE | ID: mdl-35754936

ABSTRACT

Over the past several years, a number of data-driven gaze tracking algorithms have been proposed, which have been shown to outperform classic model-based methods in terms of gaze direction accuracy. These algorithms leverage the recent development of sophisticated CNN architectures, as well as the availability of large gaze datasets captured under various conditions. One shortcoming of black-box, end-to-end methods, though, is that any unexpected behaviors are difficult to explain. In addition, there is always the risk that a system trained with a certain dataset may not perform well when tested on data from a different source (the "domain gap" problem). In this work, we propose a novel method to embed eye geometry information in an end-to-end gaze estimation network by means of a "geometric layer". Our experimental results show that our system outperforms other state-of-the-art methods in cross-dataset evaluation, while producing competitive performance in within-dataset tests. In addition, the proposed system is able to extrapolate gaze angles outside the range of those considered in the training data.
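
The abstract does not specify the geometric layer, but one common geometric building block in gaze networks is a differentiable mapping from predicted pitch/yaw angles to a 3D unit gaze vector, sketched below under the usual camera-facing coordinate convention.

```python
import torch

def angles_to_vector(pitchyaw):
    """pitchyaw: (N, 2) tensor of (pitch, yaw) in radians -> (N, 3) units."""
    pitch, yaw = pitchyaw[:, 0], pitchyaw[:, 1]
    x = -torch.cos(pitch) * torch.sin(yaw)
    y = -torch.sin(pitch)
    z = -torch.cos(pitch) * torch.cos(yaw)
    # Unit norm by construction, so the mapping is differentiable end to end.
    return torch.stack([x, y, z], dim=1)
```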

11.
IEEE Winter Conf Appl Comput Vis ; 2021: 11-20, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33851070

ABSTRACT

We propose a method for synthesizing eye images from segmentation masks with a desired style. The style encompasses attributes such as skin color, texture, iris color, and personal identity. Our approach generates an eye image that is consistent with a given segmentation mask and has the attributes of the input style image. We apply our method to data augmentation as well as to gaze redirection. Previous techniques for synthesizing realistic eye images from synthetic ones for data augmentation lacked control over the generated attributes. We demonstrate the effectiveness of the proposed method in synthesizing realistic eye images with given characteristics corresponding to the synthetic labels for data augmentation, which is further useful for various tasks such as gaze estimation, eye image segmentation, and pupil detection. We also show how our approach can be applied to gaze redirection using only synthetic gaze labels, improving on previous state-of-the-art results. The main contributions of our paper are (i) a novel approach for style-based eye image generation from a segmentation mask, and (ii) the use of this approach for gaze redirection without the need for gaze-annotated real eye images.

12.
Article in English | MEDLINE | ID: mdl-34308095

ABSTRACT

We present a comparative analysis of inertial-based odometry algorithms for the purpose of assisted return. An assisted return system facilitates backtracking of a path previously taken, and can be particularly useful for blind pedestrians. We present a new algorithm for path matching, and test it in simulated assisted return tasks with data from WeAllWalk, the only existing data set with inertial data recorded from blind walkers. We consider two odometry systems, one based on deep learning (RoNIN), and the other based on robust turn detection and step counting. Our results show that the best path matching is obtained with the turns/steps odometry system.
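
The path-matching algorithm itself is not detailed in this abstract. One standard way to score the similarity of two turns/steps paths, shown here purely as an assumed illustration, is dynamic time warping over per-segment descriptors.

```python
import numpy as np

def segment_cost(a, b):
    """a, b: (step_count, turn_degrees) segment descriptors."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) / 45.0

def dtw_distance(path_a, path_b):
    """Classic dynamic time warping over two segment sequences."""
    n, m = len(path_a), len(path_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = segment_cost(path_a[i - 1], path_b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Example: an outbound path vs. its reversed, turn-flipped return path.
outbound = [(12, 0), (3, 90), (20, 0), (5, -90)]
inbound = [(5, 90), (20, 0), (3, -90), (12, 0)]
print(dtw_distance(outbound, inbound))
```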

13.
Comput Help People Spec Needs ; 12376: 459-466, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33236002

ABSTRACT

We introduce a multi-scale embossed map authoring tool (M-EMAT) that produces tactile maps of indoor environments on demand from a building's structural layout and its 3D-scanned interiors. Our tool renders indoor tactile maps at different spatial scales, representing a building's structure, a zoomed-in view of a specific area, or the interior of a room. M-EMAT is very easy to use and produces accurate results even in the case of complex building layouts.

14.
Int J Artif Intell Tools ; 18(3): 379-397, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-19960101

ABSTRACT

We describe a wayfinding system for blind and visually impaired persons that uses a camera phone to determine the user's location with respect to color markers posted at locations of interest (such as offices); the markers are automatically detected by the phone. The color marker signs are specially designed to be detected in real time in cluttered environments using computer vision software running on the phone; a novel segmentation algorithm quickly locates the borders of the color marker in each image, which allows the system to calculate how far the marker is from the phone. We present a model of how the user's scanning strategy (i.e., how he/she pans the phone left and right to find color markers) affects the system's ability to detect color markers given the limitations imposed by motion blur, which is always a possibility whenever a camera is in motion. Finally, we describe experiments with our system tested by blind and visually impaired volunteers, demonstrating their ability to reliably use the system to find locations designated by color markers in a variety of indoor and outdoor environments, and elucidating which search strategies were most effective for users.
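
The range computation mentioned above follows from the pinhole camera model: a marker of known physical width subtends a pixel width inversely proportional to its distance. A minimal sketch, with marker size and focal length as assumed values:

```python
def marker_distance(pixel_width, marker_width_m=0.15, focal_px=1400.0):
    """Distance (m) to a marker of known physical width, pinhole model."""
    return focal_px * marker_width_m / pixel_width

# e.g., a 15 cm marker spanning 70 px lies roughly 3 m from the camera:
print(marker_distance(70))   # -> 3.0
```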

15.
IUI ; 2019: 197-207, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31681911

ABSTRACT

We present a study with seven blind participants using three different mobile OCR apps to find text posted in various indoor environments. The first app considered was Microsoft Seeing AI in its Short Text mode, which reads any text in sight with a minimalistic interface. The second app was Spot+OCR, a custom application that separates the task of text detection from OCR proper. Upon detection of text in the image, Spot+OCR generates a short vibration; as soon as the user stabilizes the phone, a high-resolution snapshot is taken and OCR-processed. The third app, Guided OCR, was designed to guide the user in taking several pictures in a 360° span at the maximum resolution available from the camera, with minimum overlap between pictures. Quantitative results (in terms of true positive ratios and traversal speed) were recorded. Along with qualitative observations and outcomes from an exit survey, these results allow us to identify and assess the different strategies used by our participants, as well as the challenges of operating these systems without sight.

16.
Proc Int Conf Doc Anal Recognit ; 2017: 1275-1282, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29563857

ABSTRACT

We introduce an algorithm for word-level text spotting that is able to accurately and reliably determine the bounding regions of individual words of text "in the wild". Our system is formed by the cascade of two convolutional neural networks. The first network is fully convolutional and is in charge of detecting areas containing text. This results in a very reliable but possibly inaccurate segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage, and predicts oriented rectangular regions containing individual words. No post-processing (e.g., text line grouping) is necessary. With an execution time of 450 ms for a 1000 × 560 image on a Titan X GPU, our system achieves good performance on the ICDAR 2013 and 2015 benchmarks [2], [1].
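
The cascade's control flow can be sketched as below. The callables coarse_fcn and word_regressor are hypothetical stand-ins for the paper's fully convolutional detector and YOLO-inspired second stage.

```python
import numpy as np
from scipy import ndimage

def spot_words(image, coarse_fcn, word_regressor, prob_thresh=0.5):
    # Stage 1: per-pixel text probability from a fully convolutional net.
    text_prob = coarse_fcn(image)                       # (H, W) in [0, 1]
    labels, _ = ndimage.label(text_prob > prob_thresh)  # coarse segments
    words = []
    for sl in ndimage.find_objects(labels):
        # Stage 2: refine each coarse segment into oriented word boxes.
        words.extend(word_regressor(image[sl]))
    return words
```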

17.
ACM Trans Access Comput ; 10(4)2017 Oct.
Article in English | MEDLINE | ID: mdl-29270243

ABSTRACT

Mobile optical character recognition (OCR) apps have come of age. Many blind individuals use them on a daily basis. The usability of such tools, however, is limited by the requirement that a good picture of the text to be read must be taken, something that is difficult to do without sight. Some mobile OCR apps already implement auto-shot and guidance mechanisms to facilitate this task. In this paper, we describe two experiments with blind participants, who tested these two interactive mechanisms on a customized iPhone implementation. These experiments bring to light a number of interesting aspects of accessing a printed document without sight, and enable a comparative analysis of the available interaction modalities.

18.
Article in English | MEDLINE | ID: mdl-28757907

ABSTRACT

For blind travelers, finding crosswalks and remaining within their borders while traversing them is a crucial part of any trip involving street crossings. While standard Orientation & Mobility (O&M) techniques allow blind travelers to safely negotiate street crossings, additional information about crosswalks and other important features at intersections would be helpful in many situations, resulting in greater safety and/or comfort during independent travel. For instance, in planning a trip a blind pedestrian may wish to be informed of the presence of all marked crossings near a desired route. We have conducted a survey of several O&M experts from the United States and Italy to determine the role that crosswalks play in travel by blind pedestrians. The results show stark differences between survey respondents from the U.S. and those from Italy: the former group emphasized the importance of following standard O&M techniques at all legal crossings (marked or unmarked), while the latter group strongly recommended crossing at marked crossings whenever possible. These contrasting opinions reflect differences in the traffic regulations of the two countries and highlight the diversity of needs that travelers in different regions may have. To address the challenges faced by blind pedestrians in negotiating street crossings, we devised a computer vision-based technique that mines existing spatial image databases for discovery of zebra crosswalks in urban settings. Our algorithm first searches for zebra crosswalks in satellite images; all candidates thus found are validated against spatially registered Google Street View images. This cascaded approach enables fast and reliable discovery and localization of zebra crosswalks in large image datasets. While fully automatic, our algorithm can be improved by a final crowdsourcing validation. To this end, we developed a Pedestrian Crossing Human Validation (PCHV) web service, which supports crowdsourcing to rule out false positives and identify false negatives.
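
As a toy illustration of why zebra crossings lend themselves to detection in overhead imagery (this is not the paper's detector), painted stripes produce a strong periodic component in an intensity profile sampled across the crossing:

```python
import numpy as np

def stripe_score(profile, min_period=4, max_period=40):
    """Strength of periodicity in a 1-D intensity profile (larger = stripier)."""
    x = profile - profile.mean()
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))
    band = (freqs > 1.0 / max_period) & (freqs < 1.0 / min_period)
    return spectrum[band].max() / (spectrum[1:].sum() + 1e-9)

# A synthetic striped profile scores far higher than random asphalt noise.
stripes = (np.arange(200) // 10 % 2) * 200.0
print(stripe_score(stripes), stripe_score(np.random.rand(200)))
```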

19.
IEEE Trans Pattern Anal Mach Intell ; 28(11): 1713-23, 2006 Nov.
Article in English | MEDLINE | ID: mdl-17063678

ABSTRACT

We present an algorithm for color classification with explicit illuminant estimation and compensation. A Gaussian classifier is trained with color samples from just one training image. Then, using a simple diagonal illumination model, the illuminants in a new scene that contains some of the surface classes seen in the training image are estimated in a maximum likelihood framework using the Expectation Maximization algorithm. We also show how to impose priors on the illuminants, effectively computing a maximum a posteriori estimation. Experimental results are provided to demonstrate the performance of our classification algorithm in the case of outdoor images.


Subjects
Algorithms, Artificial Intelligence, Color, Colorimetry/methods, Image Enhancement/methods, Image Interpretation, Computer-Assisted/methods, Pattern Recognition, Automated/methods, Cluster Analysis, Information Storage and Retrieval/methods
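
A stripped-down version of the EM iteration can be sketched as follows, assuming isotropic class variances, uniform class priors, and a single illuminant; the paper's model, with full covariances and MAP priors on the illuminants, is richer.

```python
import numpy as np

def em_illuminant(x, mus, sigma=0.05, n_iter=20):
    """x: (N, 3) observed RGB; mus: (K, 3) class means under the canonical
    illuminant. Returns estimated per-channel illuminant gains g."""
    g = np.ones(3)
    for _ in range(n_iter):
        # E-step: responsibility of class k for pixel i under current g.
        d2 = ((x[:, None, :] - g * mus[None, :, :]) ** 2).sum(-1)
        d2 -= d2.min(axis=1, keepdims=True)        # numerical stability
        r = np.exp(-0.5 * d2 / sigma**2)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form per-channel gain maximizing the likelihood.
        num = (r[:, :, None] * x[:, None, :] * mus[None, :, :]).sum((0, 1))
        den = (r[:, :, None] * mus[None, :, :] ** 2).sum((0, 1))
        g = num / den
    return g
```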
20.
Article in English | MEDLINE | ID: mdl-27616942

ABSTRACT

We introduce an algorithm for text detection and localization ("spotting") that is computationally efficient and produces state-of-the-art results. Our system uses multi-channel MSERs to detect a large number of promising regions, then subsamples these regions using a clustering approach. Representatives of region clusters are binarized and then passed on to a deep network. A final line grouping stage forms word-level segments. On the ICDAR 2011 and 2015 benchmarks, our algorithm obtains an F-score of 82% and 83%, respectively, at a computational cost of 1.2 seconds per frame. We also introduce a version that is three times as fast, with only a slight reduction in performance.
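
The MSER candidate stage maps naturally onto OpenCV's implementation. The sketch below pools regions detected independently in each color channel plus grayscale; the clustering, binarization, deep-network, and line-grouping stages are omitted.

```python
import cv2

def mser_candidates(bgr_image):
    """Collect (x, y, w, h) candidate text regions from multiple channels."""
    mser = cv2.MSER_create()
    channels = list(cv2.split(bgr_image))
    channels.append(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY))
    boxes = []
    for ch in channels:
        _, bboxes = mser.detectRegions(ch)    # per-channel stable regions
        boxes.extend((x, y, w, h) for x, y, w, h in bboxes)
    return boxes
```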
