Results 1 - 20 of 67
1.
IEEE Trans Image Process ; 33: 2689-2702, 2024.
Article in English | MEDLINE | ID: mdl-38536682

ABSTRACT

Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the tasks of bounding-box-level localization and pixel-level segmentation. This task is promising due to its ability to discover objects in a generic manner. We roughly categorize existing techniques into two main directions: generative solutions based on image resynthesis, and clustering methods based on self-supervised models. We have observed that the former heavily relies on the quality of image reconstruction, while the latter shows limitations in effectively modeling semantic correlations. To directly target object discovery, we focus on the latter approach and propose a novel solution that incorporates weakly-supervised contrastive learning (WCL) to enhance semantic information exploration. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images, which is achieved by fine-tuning the feature encoder of a self-supervised model, namely DINO, via WCL. Subsequently, we introduce Principal Component Analysis (PCA) to localize object regions. The principal projection direction, corresponding to the maximal eigenvalue, serves as an indicator of the object region(s). Extensive experiments on benchmark unsupervised object discovery datasets demonstrate the effectiveness of our proposed solution. The source code and experimental results are publicly available via our project page at https://github.com/npucvr/WSCUOD.git.
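
As a rough, illustrative sketch (not the authors' released code), the snippet below localizes objects by projecting per-patch embeddings onto the principal PCA direction and thresholding the projection; the DINO-style feature extraction is stubbed out, and `features` is an assumed stand-in for the fine-tuned encoder's output.

```python
import numpy as np

def pca_object_map(features: np.ndarray, h: int, w: int) -> np.ndarray:
    """features: (h*w, d) patch embeddings; returns an (h, w) boolean object map."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Eigen-decomposition of the feature covariance; the eigenvector with
    # the largest eigenvalue gives the principal projection direction.
    cov = centered.T @ centered / (centered.shape[0] - 1)
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    principal = eigvecs[:, -1]              # direction of maximal eigenvalue
    projection = centered @ principal       # per-patch projection score
    # Thresholding the projection separates object patches from background.
    return (projection > projection.mean()).reshape(h, w)
```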

2.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12635-12649, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37310842

ABSTRACT

Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Linear attention, which reorders the self-attention mechanism to mitigate a similar issue, was introduced in natural language processing (NLP), but directly applying existing linear attention to vision may not lead to satisfactory results. We investigate this problem and point out that existing linear attention methods ignore an inductive bias in vision tasks, i.e., 2D locality. In this article, we propose Vicinity Attention, a type of linear attention that integrates 2D locality. Specifically, for each image patch, we adjust its attention weight based on its 2D Manhattan distance from its neighbouring patches. In this way, we achieve 2D locality at linear complexity, where neighbouring image patches receive stronger attention than faraway patches. In addition, we propose a novel Vicinity Attention Block, comprising Feature Reduction Attention (FRA) and Feature Preserving Connection (FPC), to address the computational bottleneck of linear attention approaches, including our Vicinity Attention, whose complexity grows quadratically with respect to the feature dimension. The Vicinity Attention Block computes attention in a compressed feature space, with an extra skip connection to retrieve the original feature distribution. We experimentally validate that the block further reduces computation without degrading accuracy. Finally, to validate the proposed methods, we build a linear vision transformer backbone named Vicinity Vision Transformer (VVT). Targeting general vision tasks, we build VVT in a pyramid structure with progressively reduced sequence length. We perform extensive experiments on the CIFAR-100, ImageNet-1K, and ADE20K datasets to validate the effectiveness of our method. Our method has a slower growth rate of computational overhead than previous transformer-based and convolution-based networks as the input resolution increases. In particular, our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous approaches.
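
A schematic sketch of the 2D locality idea follows: attention between patches is re-weighted by their 2D Manhattan distance on the patch grid. This quadratic-form illustration only shows the weighting; the paper's contribution is realizing such locality at linear complexity, which this sketch does not reproduce, and the `decay` parameter is an assumption.

```python
import torch

def vicinity_weights(h: int, w: int, decay: float = 0.1) -> torch.Tensor:
    """(h*w, h*w) locality weights over an h-by-w patch grid."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()
    # Pairwise 2D Manhattan distance between patch positions.
    manhattan = (coords[:, None, :] - coords[None, :, :]).abs().sum(-1)
    # Nearby patches receive stronger weights than faraway ones.
    return 1.0 / (1.0 + decay * manhattan)

def locality_biased_attention(q, k, v, h, w, decay=0.1):
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    scores = scores * vicinity_weights(h, w, decay)   # inject the 2D locality bias
    scores = scores / scores.sum(-1, keepdim=True)    # renormalize rows
    return scores @ v
```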

3.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13100-13116, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37384466

ABSTRACT

We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution. The energy-based prior model is defined on the latent space of a saliency generator network that produces the saliency map from continuous latent variables and an observed image. The parameters of the saliency generator and the energy-based prior are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation, in which sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics. With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction. Unlike existing generative models, which define the prior distribution of the latent variables as a simple isotropic Gaussian, our model uses an energy-based informative prior that can be more expressive in capturing the latent space of the data. With the informative energy-based prior, we relax the Gaussian assumption of generative models to achieve a more representative distribution of the latent space, leading to more reliable uncertainty estimation. We apply the proposed framework to both RGB and RGB-D salient object detection tasks, with both transformer and convolutional neural network backbones. We further propose an adversarial learning algorithm and a variational inference algorithm as alternative ways to train the proposed generative framework. Experimental results show that our generative saliency model with an energy-based prior achieves not only accurate saliency predictions but also reliable uncertainty maps that are consistent with human perception.
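
As a minimal sketch of the sampling machinery, the snippet below implements a generic Langevin dynamics update for a latent variable z given an unnormalized log-density; `log_density` is an assumed stand-in for the paper's energy-based prior or posterior terms.

```python
import torch

def langevin_sample(log_density, z0: torch.Tensor, steps: int = 60,
                    step_size: float = 0.1) -> torch.Tensor:
    """Draw a sample by running Langevin dynamics from initialization z0."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(log_density(z).sum(), z)[0]
        # Langevin update: half-step gradient ascent on the log-density
        # plus Gaussian noise scaled by the step size.
        z = (z + 0.5 * step_size ** 2 * grad
             + step_size * torch.randn_like(z)).detach().requires_grad_(True)
    return z.detach()
```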

4.
Transl Vis Sci Technol ; 12(3): 20, 2023 03 01.
Article in English | MEDLINE | ID: mdl-36943168

ABSTRACT

Purpose: Accurate mapping of phosphene locations from visual prostheses is vital to encode spatial information. This process may involve the subject pointing to evoked phosphene locations with their finger. Here, we demonstrate phosphene mapping for a retinal implant using eye movements and compare it with retinotopic electrode positions and previous results using conventional finger-based mapping. Methods: Three suprachoroidal retinal implant recipients (NCT03406416) indicated the spatial position of phosphenes. Electrodes were stimulated individually, and the subjects moved their finger (finger based) or their eyes (gaze based) to the perceived phosphene location. The distortion of the measured phosphene locations from the expected locations (retinotopic electrode locations) was characterized with Procrustes analysis. Results: The finger-based phosphene locations were compressed spatially relative to the expected locations in all three subjects, but preserved the general retinotopic arrangement (scale factors ranged from 0.37 to 0.83). In two subjects, the gaze-based phosphene locations were similar to the expected locations (scale factors of 0.72 and 0.99). For the third subject, there was no apparent relationship between gaze-based phosphene locations and electrode locations (scale factor of 0.07). Conclusions: Gaze-based phosphene mapping was achievable in two of the three tested retinal prosthesis subjects, and their derived phosphene maps correlated well with the retinotopic electrode layout. The third subject could not produce a coherent gaze-based phosphene map, but this may reveal that their phosphenes were spatially indistinct. Translational Relevance: Gaze-based phosphene mapping is a viable alternative to conventional finger-based mapping, but may not be suitable for all subjects.
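
As a hedged illustration of the analysis (not the study's actual pipeline), the snippet below computes a Procrustes scale factor between measured phosphene locations and the expected retinotopic electrode locations; values well below 1 indicate spatial compression of the phosphene map.

```python
import numpy as np

def procrustes_scale(expected: np.ndarray, measured: np.ndarray) -> float:
    """expected, measured: (n_electrodes, 2) arrays of 2D locations."""
    X = expected - expected.mean(axis=0)
    Y = measured - measured.mean(axis=0)
    # The optimal rotation comes from the SVD of the cross-covariance;
    # the optimal scale mapping X onto Y is trace(S) / ||X||^2.
    _, S, _ = np.linalg.svd(X.T @ Y)
    return S.sum() / (X ** 2).sum()
```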


Subject(s)
Eye Movements, Visual Prostheses, Humans, Phosphenes, Vision Disorders, Retina/surgery
5.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 1312-1319, 2023 Jan.
Article in English | MEDLINE | ID: mdl-34941501

ABSTRACT

With the help of the deep learning paradigm, many point cloud networks have been invented for visual analysis. However, these networks still have great potential for improvement, since the information contained in point cloud data has not been fully exploited. To improve the effectiveness of existing networks in analyzing point cloud data, we propose a plug-and-play module, PnP-3D, that refines the fundamental point cloud feature representations by incorporating more local context and global bilinear response from the explicit 3D space and the implicit feature space. To thoroughly evaluate our approach, we conduct experiments on three standard point cloud analysis tasks (classification, semantic segmentation, and object detection), selecting three state-of-the-art networks from each task for evaluation. Serving as a plug-and-play module, PnP-3D can significantly boost the performance of established networks. In addition to achieving state-of-the-art results on four widely used point cloud benchmarks, we present comprehensive ablation studies and visualizations to demonstrate our approach's advantages. The code will be available at https://github.com/ShiQiu0419/pnp-3d.

6.
Article in English | MEDLINE | ID: mdl-35771782

ABSTRACT

Conventional object detection models require large amounts of training data. In comparison, humans can recognize previously unseen objects merely by knowing their semantic description. To mimic similar behavior, zero-shot object detection (ZSD) aims to recognize and localize "unseen" object instances using only their semantic information. The model is first trained to learn the relationships between visual and semantic domains for seen objects, later transferring the acquired knowledge to totally unseen objects. This setting gives rise to the need for correct alignment between visual and semantic concepts, so that unseen objects can be identified using only their semantic attributes. In this article, we propose a novel loss function called "polarity loss" that promotes correct visual-semantic alignment for improved ZSD. On the one hand, it refines the noisy semantic embeddings via metric learning on a "semantic vocabulary" of related concepts to establish a better synergy between the visual and semantic domains. On the other hand, it explicitly maximizes the gap between positive and negative predictions to achieve better discrimination between seen, unseen, and background objects. Our approach is inspired by embodiment theories in cognitive science, which hold that human semantic understanding is grounded in past experiences (seen objects), related linguistic concepts (word vocabulary), and visual perception (seen/unseen object images). We conduct extensive evaluations on the Microsoft Common Objects in Context (MS-COCO) and Pascal Visual Object Classes (VOC) datasets, showing significant improvements over the state of the art. Our code and evaluation protocols are available at: https://github.com/salman-h-khan/PL-ZSD_Release.
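
The gap-maximization idea can be sketched as follows: a standard classification loss is reweighted by a penalty that grows when the positive-class score fails to dominate the best negative score. This illustrates the principle only and is not the paper's exact polarity loss formulation; `beta` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def gap_penalized_loss(logits: torch.Tensor, target: torch.Tensor,
                       beta: float = 5.0) -> torch.Tensor:
    """logits: (batch, n_classes); target: (batch,) ground-truth class indices."""
    ce = F.cross_entropy(logits, target, reduction="none")
    pos = logits.gather(1, target[:, None]).squeeze(1)       # positive-class score
    neg_mask = F.one_hot(target, logits.shape[1]).bool()
    neg = logits.masked_fill(neg_mask, float("-inf")).max(dim=1).values
    gap = pos - neg
    # A small (or negative) gap inflates the loss; a wide gap leaves it small.
    return (2.0 * torch.sigmoid(-beta * gap) * ce).mean()
```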

7.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1192-1204, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32877331

ABSTRACT

Super-resolution convolutional neural networks have recently demonstrated high-quality restoration for single images. However, existing algorithms often require very deep architectures and long training times. Furthermore, current convolutional neural networks for super-resolution either weigh features from multiple scales equally or exploit them at only a static scale, limiting their learning capability. In this work, we present a compact and accurate super-resolution algorithm, the Densely Residual Laplacian Network (DRLN). The proposed network employs a cascading residual-on-the-residual structure that allows low-frequency information to flow through skip connections so the network can focus on learning high- and mid-level features. In addition, deep supervision is achieved via densely concatenated residual blocks, which also help in learning from high-level complex features. Moreover, we propose Laplacian attention to model the crucial features and learn the inter- and intra-level dependencies between feature maps. Comprehensive quantitative and qualitative evaluations on low-resolution, noisy low-resolution, and real historical image benchmark datasets illustrate that our DRLN algorithm performs favorably against state-of-the-art methods, both visually and quantitatively.
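
A compact sketch of the residual-on-the-residual structure follows: residual blocks are nested inside a group whose own long skip connection lets low-frequency information bypass the stack. Layer sizes and depths are illustrative assumptions, not the DRLN configuration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)           # inner residual connection

class ResidualOnResidual(nn.Module):
    def __init__(self, ch: int, n_blocks: int = 3):
        super().__init__()
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, x):
        # Outer (long) skip: low-frequency content flows around the whole
        # group, so the stacked blocks focus on high- and mid-level features.
        return x + self.blocks(x)
```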

8.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5761-5779, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33856982

ABSTRACT

We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem, predicting a single saliency map through a deterministic learning pipeline. We argue, however, that the deterministic solution is relatively ill-posed. Inspired by the saliency data labeling process, we propose a generative architecture for probabilistic RGB-D saliency detection that utilizes a latent variable to model the labeling variations. Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to a stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which directly samples the latent variable from the true posterior distribution. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps. The source code is publicly available via our project page: https://github.com/JingZhang617/UCNet.
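
The CVAE solution (i) can be sketched with the two standard latent-variable ingredients below: a reparameterized sample from the encoder-predicted Gaussian, and a KL term keeping the approximate posterior close to the prior. Network definitions are omitted; only the latent mechanics are shown, under the usual Gaussian assumptions.

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # z = mu + sigma * eps with eps ~ N(0, I), keeping sampling differentiable.
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over batch.
    return 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
```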

9.
Article in English | MEDLINE | ID: mdl-34898442

ABSTRACT

Deep convolutional neural networks perform well on images containing spatially invariant degradations, also known as synthetic degradations; however, their performance on real degraded photographs is limited and typically requires multi-stage network modeling. To advance the practicability of restoration algorithms, this article proposes a novel single-stage blind real image restoration network (R²Net) employing a modular architecture. We use a residual-on-the-residual structure to ease the flow of low-frequency information and apply feature attention to exploit channel dependencies. Furthermore, evaluation in terms of quantitative metrics and visual quality on four restoration tasks (denoising, super-resolution, raindrop removal, and JPEG compression) across 11 real degraded datasets, against more than 30 state-of-the-art algorithms, demonstrates the superiority of our R²Net. We also present comparisons on three synthetically generated degraded datasets for denoising to showcase our method's capability on synthetic denoising. The codes, trained models, and results are available at https://github.com/saeed-anwar/R2Net.
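
A hedged sketch of channel-wise feature attention in the spirit described above: global average pooling squeezes each channel to a descriptor, a small bottleneck predicts per-channel weights, and the input is rescaled. This follows the common squeeze-and-excitation pattern; the paper's exact module may differ.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                # squeeze: (B, C, 1, 1) descriptor
            nn.Conv2d(ch, ch // reduction, 1),      # bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),      # restore channel count
            nn.Sigmoid(),                           # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(x)                       # rescale channels by learned weights
```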

10.
Transl Vis Sci Technol ; 10(10): 12, 2021 08 12.
Article in English | MEDLINE | ID: mdl-34581770

ABSTRACT

Purpose: To report the initial safety and efficacy results of a second-generation (44-channel) suprachoroidal retinal prosthesis at 56 weeks after device activation. Methods: Four subjects, with advanced retinitis pigmentosa and bare-light perception only, enrolled in a phase II trial (NCT03406416). A 44-channel electrode array was implanted in a suprachoroidal pocket. Device stability, efficacy, and adverse events were investigated at 12-week intervals. Results: All four subjects were implanted successfully and there were no device-related serious adverse events. Color fundus photography indicated a mild postoperative subretinal hemorrhage in two recipients, which cleared spontaneously within 2 weeks. Optical coherence tomography confirmed device stability and position under the macula. Screen-based localization accuracy was significantly better for all subjects with device on versus device off. Two subjects were significantly better with the device on in a motion discrimination task at 7, 15, and 30°/s and in a spatial discrimination task at 0.033 cycles per degree. All subjects were more accurate with the device on than device off at walking toward a target on a modified door task, localizing and touching tabletop objects, and detecting obstacles in an obstacle avoidance task. A positive effect of the implant on subjects' daily lives was confirmed by an orientation and mobility assessor and subject self-report. Conclusions: These interim study data demonstrate that the suprachoroidal prosthesis is safe and provides significant improvements in functional vision, activities of daily living, and observer-rated quality of life. Translational Relevance: A suprachoroidal prosthesis can provide clinically useful artificial vision while maintaining a safe surgical profile.


Subject(s)
Retinitis Pigmentosa, Visual Prostheses, Activities of Daily Living, Humans, Quality of Life, Ocular Vision
11.
Transl Vis Sci Technol ; 10(10): 7, 2021 08 12.
Article in English | MEDLINE | ID: mdl-34383875

ABSTRACT

Purpose: In a clinical trial (NCT03406416) of a second-generation (44-channel) suprachoroidal retinal prosthesis implanted in subjects with late-stage retinitis pigmentosa (RP), we assessed performance in real-world functional visual tasks and emotional well-being. Methods: The Functional Low-Vision Observer Rated Assessment (FLORA) and Impact of Vision Impairment-Very Low Vision (IVI-VLV) instruments were administered to four subjects before implantation and after device fitting. The FLORA contains 13 self-reported and 35 observer-reported items, each ranked for ease of conducting the task (from impossible to easy; central tendency reported as the mode). The IVI-VLV instrument quantified the impact of low vision on daily activities and emotional well-being. Results: Three subjects completed the FLORA for two years after device fitting; the fourth subject ceased participation in the FLORA after fitting for reasons unrelated to the device. For all subjects at each post-fitting visit, the mode ease of task with the device ON was better than or equal to that with the device OFF. Ease of task improved over the first six months with the device ON, then remained stable. Subjects reported improvements in mobility, functional vision, and quality of life with the device ON. The IVI-VLV suggested that self-assessed vision-related quality of life was not impacted by device implantation or usage. Conclusions: Subjects demonstrated sustained improvements in ease-of-task scores with the device ON compared to OFF, indicating the device has a positive impact in the real-world setting. Translational Relevance: Our suprachoroidal retinal prosthesis shows potential utility in everyday life by enabling increased environmental awareness and improving access to sensory information for people with end-stage RP.


Subject(s)
Retinitis Pigmentosa, Low Vision, Visual Prostheses, Humans, Quality of Life, Retinitis Pigmentosa/surgery, Ocular Vision
12.
IEEE Trans Pattern Anal Mach Intell ; 43(8): 2866-2873, 2021 08.
Article in English | MEDLINE | ID: mdl-33351750

ABSTRACT

The advances made in predicting visual saliency using deep neural networks come at the expense of collecting large-scale annotated data. However, pixel-wise annotation is labor-intensive and overwhelming. In this paper, we propose to learn saliency prediction from a single noisy labelling, which is easy to obtain (e.g., from imperfect human annotation or from unsupervised saliency prediction methods). With this goal, we address a natural question: can we learn saliency prediction while identifying clean labels in a unified framework? To answer this question, we call on the theory of robust model fitting, formulate deep saliency prediction from a single noisy labelling as robust network learning, and exploit model consistency across iterations to identify inliers and outliers (i.e., noisy labels). Extensive experiments on different benchmark datasets demonstrate the superiority of our proposed framework, which learns saliency predictions comparable to those of state-of-the-art fully supervised saliency methods. Furthermore, we show that simply by treating ground-truth annotations as noisy labellings, our framework achieves tangible improvements over state-of-the-art methods.
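
A hedged sketch of the inlier/outlier idea follows: pixels whose predictions remain consistent across training iterations are trusted as clean, and only those pixels are supervised. This illustrates the consistency principle, not the paper's exact robust-fitting procedure; `tol` is an assumed threshold.

```python
import torch
import torch.nn.functional as F

def consistency_inlier_mask(pred_prev: torch.Tensor, pred_curr: torch.Tensor,
                            tol: float = 0.1) -> torch.Tensor:
    """Per-pixel saliency predictions in [0, 1]; True where predictions agree."""
    return (pred_prev - pred_curr).abs() < tol

def masked_bce(pred: torch.Tensor, noisy_label: torch.Tensor,
               inlier: torch.Tensor) -> torch.Tensor:
    # Supervise only at pixels currently judged to be inliers (clean labels).
    bce = F.binary_cross_entropy(pred, noisy_label, reduction="none")
    return (bce * inlier.float()).sum() / inlier.float().sum().clamp(min=1.0)
```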

13.
Transl Vis Sci Technol ; 9(13): 31, 2020 12.
Article in English | MEDLINE | ID: mdl-33384885

ABSTRACT

Purpose: To investigate oculomotor behavior in response to dynamic stimuli in retinal implant recipients. Methods: Three suprachoroidal retinal implant recipients performed a four-alternative forced-choice motion discrimination task over six sessions longitudinally. Stimuli were a single white bar ("moving bar") or a series of white bars ("moving grating") sweeping left, right, up, or down across a 42″ monitor. Performance was compared between normal video processing and scrambled video processing (randomized image-to-electrode mapping to disrupt spatiotemporal structure). Eye and head movement was monitored throughout the task. Results: Two subjects showed diminished performance with scrambling, suggesting they used retinotopic discrimination in the normal condition, and made smooth pursuit eye movements congruent with the moving bar's direction. These two subjects also made stimulus-related eye movements resembling the optokinetic reflex (OKR) for moving grating stimuli, but the movement was incongruent with stimulus direction. The third subject was less adept at the task, appeared to rely primarily on head position cues (head movements were congruent with stimulus direction), and did not exhibit retinotopic discrimination or associated eye movements. Conclusions: Our observation of smooth pursuit indicates residual functionality of cortical direction-selective circuits and implies a more naturalistic perception of motion than expected. A distorted OKR implies improper functionality of retinal direction-selective circuits, possibly due to retinal remodeling or the non-selective nature of the electrical stimulation. Translational Relevance: Retinal implant users can make naturalistic eye movements in response to moving stimuli, highlighting the potential for eye tracker feedback to improve perceptual localization and image stabilization in camera-based visual prostheses.


Subject(s)
Visual Prostheses, Eye Movements, Head Movements, Humans, Photic Stimulation, Smooth Pursuit
14.
Clin Neurophysiol ; 131(6): 1383-1398, 2020 06.
Article in English | MEDLINE | ID: mdl-31866339

ABSTRACT

Retinal prostheses are designed to restore a basic sense of sight to people with profound vision loss. They require a relatively intact posterior visual pathway (optic nerve, lateral geniculate nucleus, and visual cortex). Retinal implants are options for people with severe stages of retinal degenerative diseases such as retinitis pigmentosa and age-related macular degeneration. Three retinal prostheses have now received regulatory approval, and over five hundred patients have been implanted globally over the past 15 years. Devices generally provide an improved ability to localize high-contrast objects, navigate, and perform basic orientation tasks. Adverse events have included conjunctival erosion, retinal detachment, loss of light perception, and the need for revision surgery, but these are rare. There are also device-specific risks, including overstimulation (which could damage the retina) or delamination of implanted components, but these are very unlikely. Current challenges include how to improve visual acuity, enlarge the field of view, and reduce a complex visual scene to its most salient components through image processing. This review encompasses the work of over 40 individual research groups who have built devices, developed stimulation strategies, or investigated the basic physiology underpinning retinal prostheses. Current technologies are summarized, along with the future challenges that face the field.


Asunto(s)
Retinitis Pigmentosa/cirugía , Trastornos de la Visión/cirugía , Prótesis Visuales , Humanos , Resultado del Tratamiento
15.
J Vis ; 19(6): 18, 2019 06 03.
Article in English | MEDLINE | ID: mdl-31215978

ABSTRACT

Previous studies of age-related macular degeneration (AMD) report impaired facial expression recognition even with enlarged face images. Here, we test the potential benefits of caricaturing (exaggerating how the expression's shape differs from neutral) as an image enhancement procedure targeted at mid- to high-level cortical vision. Experiment 1 provides proof of concept using normal-vision observers shown blurred images as a partial simulation of AMD. Caricaturing significantly improved expression recognition (happy, sad, anger, disgust, fear, surprise) by ∼4%-5% across young adults and older adults (mean age 73 years); two different severities of blur; high, medium, and low intensities of the original expression; and all intermediate accuracy levels (impaired but still above chance). Experiment 2 tested AMD patients, running 19 eyes monocularly (from 12 patients, aged 67-94 years) covering a wide range of vision loss (acuities 6/7.5 to poorer than 6/360). With faces pre-enlarged, recognition approached ceiling and was only slightly worse than in matched controls for high- and medium-intensity expressions. For low-intensity expressions, recognition of veridical expressions remained impaired and was significantly improved with caricaturing, by 5.8%, across all levels of vision loss. Overall, caricaturing benefits emerged when improvement was most needed, that is, when initial recognition of uncaricatured expressions was impaired.


Asunto(s)
Emociones/fisiología , Reconocimiento Facial/fisiología , Degeneración Macular/fisiopatología , Reconocimiento Visual de Modelos/fisiología , Adulto , Anciano , Anciano de 80 o más Años , Expresión Facial , Femenino , Humanos , Masculino , Persona de Mediana Edad , Adulto Joven
16.
J Exp Psychol Appl ; 25(2): 256-279, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30321022

ABSTRACT

There are multiple well-established situations in which humans' face recognition performance is poor, including for low-resolution images, other-race faces, and in older adult observers. Here we show that caricaturing faces, that is, exaggerating their appearance away from an average face, can provide a useful applied method for improving face recognition across all these circumstances. We employ a face-name learning task offering a number of methodological advantages (e.g., valid comparison of the size of the caricature improvement across conditions differing in overall accuracy). Across six experiments, we (a) extend previous evidence that caricaturing can improve recognition of low-resolution (blurred) faces; (b) show for the first time that caricaturing improves recognition and perception of other-race faces; and (c) show for the first time that caricaturing improves recognition in observers across the whole adult life span (testing older adults, mean age 71 years). Caricature benefits were at least as large in situations where natural face recognition is poor (other-race faces, low resolution, older adults) as in the naturally best situation (own-race, high-resolution faces in young adults). We discuss the potential for practical application to improving face recognition in low-vision patients (age-related macular degeneration, bionic eye), security settings (police, passport control), eyewitness testimony, and prosopagnosia.


Asunto(s)
Cara/fisiología , Reconocimiento Facial/fisiología , Grupos Raciales , Agudeza Visual/fisiología , Anciano , Femenino , Humanos , Masculino
17.
Healthc Technol Lett ; 6(6): 187-190, 2019 Dec.
Article in English | MEDLINE | ID: mdl-32038855

ABSTRACT

Optical colonoscopy is the gold-standard screening method for detecting and removing cancerous polyps. During this procedure, some polyps may go undetected because of their position, because they are not covered by the camera, or because they are missed by the surgeon. In this Letter, the authors introduce a novel convolutional neural network (ConvNet) algorithm to map the internal colon surface to a 2D map (visibility map), which can be used to increase clinicians' awareness of areas they might have missed. This was achieved by leveraging a colonoscopy simulator to generate a dataset consisting of colonoscopy video frames and their corresponding colon centreline (CCL) points in 3D camera coordinates. A pair of video frames was used as input to a ConvNet, whose output was a point on the CCL and its direction vector. By knowing the CCL for each frame and roughly modelling the colon as a cylinder, frames could be unrolled to build a visibility map. The authors validated their results using both simulated and real colonoscopy frames, showing that a model trained to learn the CCL from consecutive simulated frames generalises to real colonoscopy video frames for generating a visibility map.
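
As a rough sketch of the unrolling step (an illustrative assumption, not the Letter's implementation), each surface point seen in a frame can be assigned cylinder coordinates relative to the CCL, an axial distance along the centreline and an angle around it, and the corresponding cell of the 2D visibility map marked as covered.

```python
import numpy as np

def cylinder_coords(points: np.ndarray, ccl_point: np.ndarray,
                    ccl_dir: np.ndarray):
    """points: (n, 3) surface points; returns (axial distance, angle in degrees)."""
    d = ccl_dir / np.linalg.norm(ccl_dir)
    rel = points - ccl_point
    axial = rel @ d                             # distance along the centreline
    # Reference axes perpendicular to the centreline (the degenerate case of
    # d parallel to z is ignored for brevity).
    u = np.cross(d, np.array([0.0, 0.0, 1.0]))
    u = u / np.linalg.norm(u)
    v = np.cross(d, u)
    angle = np.degrees(np.arctan2(rel @ v, rel @ u)) % 360.0
    return axial, angle

# Marking vis_map[int(axial / bin_size), int(angle)] = 1 for each observed
# point then accumulates the unrolled visibility map frame by frame.
```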

18.
Sci Rep ; 8(1): 15205, 2018 10 12.
Article in English | MEDLINE | ID: mdl-30315188

ABSTRACT

Patients with age-related macular degeneration (AMD) have difficulty recognising people's faces. We tested whether this could be improved using caricaturing, an image enhancement procedure derived from cortical coding in a perceptual 'face-space'. Caricaturing exaggerates the distinctive ways in which an individual's face shape differs from the average. We tested 19 AMD-affected eyes (from 12 patients; ages 66-93 years) monocularly, selected to cover the full range of vision loss. Patients rated how different in identity people's faces appeared when compared in pairs (e.g., two young men, both Caucasian), at four caricature strengths (0, 20, 40, 60% exaggeration). This task gives data reliable enough to analyse statistically at the individual-eye level. All 9 eyes with mild vision loss (acuity ≥ 6/18) showed significant improvement in identity discrimination (higher dissimilarity ratings) with caricaturing. The size of the improvement matched that in normal-vision young adults. The caricature benefit became less stable as visual acuity decreased further, but caricaturing was still effective in half the eyes with moderate and severe vision loss (significant improvement in 5 of 10 eyes, at acuities from 6/24 to poorer than 6/360). We conclude caricaturing has the potential to help many AMD patients recognise faces.
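
The core caricaturing operation is simple enough to sketch: each landmark is pushed away from the average face by the chosen strength (0.6 corresponds to the 60% exaggeration level above). The image-warping step that applies the new landmarks to pixels is omitted; this is an illustration, not the study's software.

```python
import numpy as np

def caricature_landmarks(landmarks: np.ndarray, average: np.ndarray,
                         strength: float = 0.6) -> np.ndarray:
    """landmarks, average: (n_points, 2) arrays of face landmark coordinates."""
    # Exaggerate the distinctive deviation of this face from the average:
    # strength 0.0 returns the veridical shape; 0.6 is a 60% caricature.
    return average + (1.0 + strength) * (landmarks - average)
```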


Asunto(s)
Cara/fisiología , Reconocimiento Facial/fisiología , Degeneración Macular/fisiopatología , Anciano , Anciano de 80 o más Años , Femenino , Humanos , Masculino , Estimulación Luminosa/métodos , Trastornos de la Visión/fisiopatología , Agudeza Visual/fisiología
19.
PLoS One ; 13(10): e0204361, 2018.
Article in English | MEDLINE | ID: mdl-30286112

ABSTRACT

PURPOSE: Previous behavioural studies demonstrate that face caricaturing can provide an effective image enhancement method for improving poor face identity perception in low-vision simulations (e.g., age-related macular degeneration, bionic eye). To translate caricaturing usefully to patients, assignment of the multiple face landmark points needed to produce the caricatures must be fully automatised. Recent developments in computer science allow automatic detection of 68 face landmark points in real time and across multiple viewpoints. However, previous demonstrations of the behavioural effectiveness of caricaturing have used higher-precision caricatures with 147 landmark points per face, assigned by hand. Here, we test the effectiveness of the auto-assigned 68-point caricatures and compare them to the hand-assigned 147-point caricatures. METHOD: We assessed human perception of how different in identity pairs of faces appear when veridical (uncaricatured), caricatured with 68 points, and caricatured with 147 points. Across two experiments, we tested two types of low-vision images: a simulation of blur, as experienced in macular degeneration (testing two blur levels); and a simulation of the phosphenised images seen in prosthetic vision (at three resolutions). RESULTS: The 68-point caricatures produced significant improvements in identity discrimination relative to veridical images. They were approximately 50% as effective as the 147-point caricatures. CONCLUSION: Realistic translation to patients (e.g., via real-time caricaturing with the enhanced signal sent to smart glasses or a visual prosthetic) is approaching feasibility. For maximum effectiveness, software needs to be able to assign landmark points tracing out all details of feature and face shape, to produce high-precision caricatures.


Asunto(s)
Reconocimiento Facial , Procesamiento de Imagen Asistido por Computador/métodos , Adolescente , Adulto , Femenino , Humanos , Degeneración Macular/rehabilitación , Masculino , Prótesis Neurales , Estimulación Luminosa/métodos , Programas Informáticos , Adulto Joven
20.
IEEE Trans Image Process ; 27(3): 1271-1281, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29990192

ABSTRACT

The purpose of this paper is to recover dense correspondence between non-rigid shapes of anatomical objects, which is a key element of disease diagnosis and analysis. We propose a shape matching framework based on Markov random fields (MRFs) to obtain non-rigid correspondence. We construct an energy function by summing two terms, a unary term and a binary term; with this formulation, shape matching becomes an energy minimisation problem. Loopy belief propagation (LBP) is then used to minimise the energy function. We adopt a new sparse update technique for the LBP update to increase computational efficiency, and we also propose a novel clamping technique, an expectation-maximisation (EM)-like approach, to enhance matching accuracy. Experiments with hippocampal data from OASIS and PATH showed that the sparse update was 160 times faster than standard LBP. By iteratively running the EM-like clamping procedure, we obtained high-quality non-rigid correspondences, achieving a 97% matching rate between two hippocampi. Our shape-matching-based approach overcomes the flip problem of the first-order ellipsoid method and, unlike iterative closest point, does not assume pre-alignment.
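
The energy being minimised can be sketched generically as a unary term over per-node label assignments plus a binary term over neighbouring pairs; the cost callables below are assumptions standing in for the paper's matching costs, and loopy belief propagation would be used to minimise this energy.

```python
from typing import Callable, Dict, List, Tuple

def mrf_energy(assignment: Dict[int, int],
               unary: Callable[[int, int], float],
               binary: Callable[[int, int, int, int], float],
               edges: List[Tuple[int, int]]) -> float:
    """Sum of unary costs per node-label pair and binary costs per edge."""
    e = sum(unary(node, label) for node, label in assignment.items())
    e += sum(binary(i, assignment[i], j, assignment[j]) for i, j in edges)
    return e
```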
