Results 1 - 20 of 24
1.
Article in English | MEDLINE | ID: mdl-38215319

ABSTRACT

Graph convolutional networks (GCNs) have emerged as a powerful tool for action recognition, leveraging skeletal graphs to encapsulate human motion. Despite their efficacy, a significant challenge remains: the dependency on huge labeled datasets. Acquiring such datasets is often prohibitive, and the frequent occurrence of incomplete skeleton data, typified by absent joints and frames, complicates the testing phase. To tackle these issues, we present graph representation alignment (GRA), a novel approach with two main contributions: 1) a self-training (ST) paradigm that substantially reduces the need for labeled data by generating high-quality pseudo-labels, ensuring model stability even with minimal labeled inputs, and 2) a representation alignment (RA) technique that utilizes consistency regularization to effectively reduce the impact of missing data components. Our extensive evaluations on the NTU RGB+D and Northwestern-UCLA (N-UCLA) benchmarks demonstrate that GRA not only improves GCN performance in data-constrained environments but also retains impressive performance in the face of data incompleteness.
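
The self-training idea in this abstract can be illustrated with a minimal confidence-threshold pseudo-labeling sketch (the 0.9 threshold and the selection rule are illustrative assumptions, not the paper's actual GRA procedure):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only unlabeled samples whose maximum class probability
    exceeds a confidence threshold (hypothetical value)."""
    probs = np.asarray(probs)
    conf = probs.max(axis=1)          # per-sample confidence
    labels = probs.argmax(axis=1)     # candidate pseudo-labels
    mask = conf >= threshold          # high-confidence subset
    return labels[mask], mask

# Three unlabeled samples; only the confident ones receive pseudo-labels.
probs = [[0.95, 0.03, 0.02],   # confident -> class 0
         [0.40, 0.35, 0.25],   # ambiguous -> dropped
         [0.05, 0.92, 0.03]]   # confident -> class 1
labels, mask = select_pseudo_labels(probs)
```

The retained samples would then be added to the labeled pool for the next training round.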

2.
Bioengineering (Basel) ; 10(8)2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37627793

ABSTRACT

Radiotherapy (RT) is an important modality for laryngeal cancer treatment to preserve laryngeal function. During beam delivery, laryngeal motion remains uncontrollable and may compromise tumor-targeting efficacy. We aimed to examine real-time laryngeal motion by developing a surface depth-sensing (SDS) technique, with preliminary testing during RT-based treatment of patients with laryngeal cancer. An SDS camera was set up and integrated into RT simulation procedures. By recording the natural swallowing of patients, SDS calculation was performed using a pose-estimation model built on a deep neural network. Seven male patients with laryngeal cancer were enrolled in this prospective study. The calculated motion distances of the laryngeal prominence (mean ± standard deviation) were 1.6 ± 0.8 mm, 21.4 ± 5.1 mm, and 6.4 ± 3.3 mm in the left-right, cranio-caudal, and anterior-posterior directions, respectively, and 22.7 ± 4.9 mm for the spatial displacement. The calculated differences in the 3D margins for generating the planning tumor volume by senior physicians with and without SDS data were -0.7 ± 1.0 mm (-18%), 11.3 ± 6.8 mm (235%), and 1.8 ± 2.6 mm (45%) in the left-right, cranio-caudal, and anterior-posterior directions, respectively. The SDS technique developed for detecting laryngeal motion during swallowing may serve as a practical guide for individualized RT design in the treatment of laryngeal cancer.

3.
Neural Netw ; 161: 83-91, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36736002

ABSTRACT

Existing deep-learning-based face anti-spoofing (FAS) or deepfake detection approaches usually rely on large-scale datasets and powerful networks with a significant number of parameters to achieve satisfactory performance. However, this makes them resource-heavy and unsuitable for handheld devices. Moreover, they are limited by the types of spoofing attacks in the dataset they were trained on and require considerable training time. To produce a robust FAS model, they need large datasets covering the widest possible variety of predefined presentation attacks. Testing on new or unseen attacks or environments generally results in poor performance. Ideally, the FAS model should learn discriminative features that generalize well even to unseen spoof types. In this paper, we propose a fast learning approach called Domain Effective Fast Adaptive nEt-worK (DEFAEK), a face anti-spoofing approach based on the optimization-based meta-learning paradigm that effectively and quickly adapts to new tasks. DEFAEK treats differences in environment as domains and simulates multiple domain shifts during training. To further improve the effectiveness and efficiency of meta-learning, we adopt metric learning in the inner-loop update with careful sample selection. In extensive experiments on the challenging CelebA-Spoof and FaceForensics++ datasets, the evaluation results show that DEFAEK can learn cues independent of the environment with good generalization capability. The resulting model is also lightweight, following the design principles of modern lightweight network architectures, and still generalizes well to unseen classes. Finally, we demonstrate our model's capabilities by comparing the number of parameters, FLOPS, and model performance with other state-of-the-art methods.


Subjects
Cues (Psychology), Psychological Generalization
4.
Sensors (Basel) ; 22(2)2022 Jan 17.
Article in English | MEDLINE | ID: mdl-35062647

ABSTRACT

Accidents continue to be reported for autonomous driving vehicles, including those with advanced sensors installed. Some of these accidents are caused by bad weather, poor lighting conditions, and non-line-of-sight obstacles. Cellular Vehicle-to-Everything (C-V2X) radio technology can significantly improve those weak spots for autonomous driving. This paper describes one C-V2X system solution: Vulnerable Road User Collision Warning (VRUCW) for autonomous driving. The paper presents the system architecture, design logic, network topology, message flow, artificial intelligence (AI), and network security features. As a reference, it also includes a commercial project with its test results.


Subjects
Traffic Accidents, Automobile Driving, Traffic Accidents/prevention & control, Artificial Intelligence, Technology, Weather
5.
IEEE J Biomed Health Inform ; 26(4): 1453-1463, 2022 04.
Article in English | MEDLINE | ID: mdl-34033550

ABSTRACT

Alzheimer's disease (AD) is one of the deadliest neurodegenerative diseases affecting the elderly population worldwide. An ensemble of deep learning (DL) models can learn highly complicated patterns from MRI scans for the detection of AD by utilizing diverse solutions. In this work, we propose a computationally efficient, DL-architecture-agnostic ensemble of deep neural networks, named 'Deep Transfer Ensemble (DTE)', trained using transfer learning for the classification of AD. DTE leverages the complementary feature views and diversity introduced by the many different locally optimal solutions reached by individual networks through the randomization of hyper-parameters. DTE achieves accuracies of 99.05% and 85.27% on two independent splits of the large dataset for the cognitively normal (NC) vs. AD classification task. For the task of mild cognitive impairment (MCI) vs. AD classification, DTE achieves 98.71% and 83.11%, respectively, on the two independent splits. It also performs reasonably well on a small dataset consisting of only 50 samples per class, achieving a maximum accuracy of 85% for NC vs. AD. It also outperforms snapshot ensembles and several other existing deep models from comparable prior work.
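
The ensemble idea can be sketched as generic soft voting over the member networks' class probabilities (a minimal stand-in; DTE additionally uses transfer learning and randomized hyper-parameters per member, which this sketch omits):

```python
import numpy as np

def ensemble_predict(member_probs):
    """Average the class-probability outputs of several independently
    trained networks and take the argmax (soft voting)."""
    stacked = np.stack(member_probs)   # (n_members, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)  # soft vote across members
    return mean_probs.argmax(axis=1)

# Two members disagree on the second sample; the more confident one wins.
m1 = [[0.9, 0.1], [0.4, 0.6]]
m2 = [[0.8, 0.2], [0.7, 0.3]]
pred = ensemble_predict([m1, m2])
```

Diversity among members is what makes the averaged prediction more robust than any single network.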


Subjects
Alzheimer Disease, Cognitive Dysfunction, Aged, Alzheimer Disease/diagnostic imaging, Cognitive Dysfunction/diagnostic imaging, Humans, Machine Learning, Magnetic Resonance Imaging, Neural Networks (Computer)
6.
IEEE J Biomed Health Inform ; 26(5): 1987-1996, 2022 05.
Article in English | MEDLINE | ID: mdl-34432642

ABSTRACT

Online healthcare applications have grown more popular over the years. For instance, telehealth is an online healthcare application that allows patients and doctors to schedule consultations, prescribe medication, share medical documents, and monitor health conditions conveniently. Apart from this, telehealth can also be used to store a patient's personal and medical information. With its rise in usage due to COVID-19, and given the amount of sensitive data it stores, security measures are necessary. A simple way of making these applications more secure is user authentication. One of the most commonly used authentication methods is face recognition, which is convenient and easy to use. However, face recognition systems are not foolproof: they are prone to malicious attacks like printed photos, paper cutouts, replayed videos, and 3D masks. The goal of face anti-spoofing is to differentiate real users (live) from attackers (spoof). Although effective in terms of performance, existing methods use a significant number of parameters, making them resource-heavy and unsuitable for handheld devices. Apart from this, they fail to generalize well to new environments, such as changes in lighting or background. This paper proposes a lightweight face anti-spoofing framework that does not compromise on performance. Our proposed method achieves good performance with the help of an ArcFace Classifier (AC), which encourages differentiation between spoof and live samples by drawing clear boundaries between them. With clear boundaries, classification becomes more accurate. We further demonstrate our model's capabilities by comparing the number of parameters, FLOPS, and performance with other state-of-the-art methods.


Subjects
COVID-19, Telemedicine, Computer Security, Face, Humans
7.
IEEE Trans Cybern ; 52(6): 4825-4836, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34043518

ABSTRACT

Modifying facial attributes without a paired dataset proves to be a challenging task. Previous approaches either required supervision from a ground-truth transformed image or required training a separate model for mapping every pair of attributes. These limit the scalability of the models to larger sets of attributes, since the number of models that must be trained grows exponentially. Another major drawback of previous approaches is the unintended change to the person's identity as they transform the facial attributes. We propose a method that allows controllable and identity-aware transformations across multiple facial attributes using only a single model. Our approach is to train a generative adversarial network (GAN) with a multitask conditional discriminator that recognizes the identity of the face, distinguishes real images from fake, and identifies the facial attributes present in an image. This guides the generator into producing an output that is realistic while preserving the person's identity and facial attributes. Through this framework, our model also learns meaningful image representations in a lower-dimensional latent space and semantically associates separate parts of the encoded vector with both the person's identity and facial attributes. This opens up the possibility of generating new faces and other transformations, such as making the face thinner or chubbier. Furthermore, our model encodes the image only once and allows multiple transformations using the encoded vector; this makes transformations faster, since the entire image need not be reprocessed for every transformation. We show the effectiveness of our proposed method through both qualitative and quantitative evaluations, such as ablation studies, visual inspection, and face verification. Competitive results are achieved compared to the main baseline (CycleGAN), with considerable gains in space and extensibility from using a single model.

8.
Sensors (Basel) ; 21(18)2021 Sep 09.
Article in English | MEDLINE | ID: mdl-34577244

ABSTRACT

A desirable photographic reproduction method should be able to compress high-dynamic-range images for low-dynamic-range displays while faithfully preserving all visual information. However, during the compression process, most reproduction methods struggle to strike a balance between maintaining global contrast and retaining the majority of local details in a real-world scene. To address this problem, this study proposes a new photographic reproduction method that can smoothly take both global and local features into account. First, a highlight/shadow region detection scheme is used to obtain prior information and generate a weight map. Second, a mutually hybrid histogram analysis is performed to extract global/local features in parallel. Third, we propose a feature fusion scheme to construct a virtual combined histogram, achieved by adaptively fusing global/local features through Gaussian mixtures according to the weight map. Finally, the virtual combined histogram is used to formulate the pixel-wise mapping function. As both global and local features are considered simultaneously, the output image has a natural and visually pleasing appearance. The experimental results demonstrate the effectiveness of the proposed method and its superiority over seven other state-of-the-art methods.
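
The "virtual combined histogram" step can be illustrated as a convex combination of normalized global and local histograms (here a single scalar weight stands in for the paper's per-region weight map and Gaussian-mixture fusion, so this is only a schematic sketch):

```python
import numpy as np

def combined_histogram(global_hist, local_hist, w):
    """Blend a global and a local histogram with a weight w in [0, 1].
    Both inputs are normalized so the result is again a distribution."""
    g = np.asarray(global_hist, dtype=float)
    l = np.asarray(local_hist, dtype=float)
    g, l = g / g.sum(), l / l.sum()   # normalize to probability mass
    return w * g + (1.0 - w) * l      # convex combination

# Equal weighting of a "global" and a "local" 4-bin histogram.
h = combined_histogram([2, 2, 0, 0], [0, 0, 1, 1], w=0.5)
```

The blended histogram can then drive a tone-mapping curve exactly as a single histogram would.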


Subjects
Data Compression, Image Enhancement, Algorithms, Photography, Reproduction
9.
Sensors (Basel) ; 21(12)2021 Jun 16.
Article in English | MEDLINE | ID: mdl-34208602

ABSTRACT

Photographic reproduction and enhancement is challenging because it requires the preservation of all visual information during the compression of the dynamic range of the input image. This paper presents a cascaded-architecture-type reproduction method that can simultaneously enhance local details and retain the naturalness of the original global contrast. In the pre-processing stage, in addition to using a multiscale detail-injection scheme to enhance local details, the Stevens effect is considered to adapt to different luminance levels while compressing the global feature. We propose a modified histogram equalization method in the reproduction stage, where individual histogram bin widths are first adjusted according to the overall image content. In addition, the human visual system (HVS) is considered so that a luminance-aware threshold can be used to control the maximum permissible width of each bin. Then, the global tone is modified by performing histogram equalization on the modified histogram. Experimental results indicate that the proposed method outperforms five state-of-the-art methods in terms of visual comparisons and several objective image quality evaluations.
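
Capping each bin before equalization resembles contrast-limited histogram equalization; below is a minimal sketch with an illustrative fixed clip value (the paper's threshold is HVS-derived and content-adaptive, which this sketch does not reproduce):

```python
import numpy as np

def clipped_equalization(image, n_bins=256, clip=0.01):
    """Histogram equalization with a cap on each bin's mass.
    Excess mass above the clip is redistributed uniformly, then
    the cumulative distribution defines the tone-mapping LUT."""
    img = np.asarray(image)
    hist, _ = np.histogram(img, bins=n_bins, range=(0, n_bins))
    hist = hist / hist.sum()
    excess = np.maximum(hist - clip, 0).sum()
    hist = np.minimum(hist, clip) + excess / n_bins  # redistribute excess
    cdf = np.cumsum(hist)
    lut = np.round(cdf * (n_bins - 1)).astype(int)   # tone-mapping LUT
    return lut[img]

img = np.array([[0, 0, 128], [128, 255, 255]])
out = clipped_equalization(img)
```

Clipping limits how steep the mapping can become in any one region, which is what tames over-enhancement.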


Subjects
Data Compression, Image Enhancement, Algorithms, Humans, Photography, Reproduction
10.
Nutrients ; 13(1)2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33430147

ABSTRACT

The use of image-based dietary assessments (IBDAs) has rapidly increased; however, there is no formalized training program to enhance the digital viewing skills of dieticians. An IBDA was integrated into a nutritional practicum course in the School of Nutrition and Health Sciences, Taipei Medical University, Taiwan. An online IBDA platform was created as an off-campus remedial teaching tool to reinforce the conceptualization of food portion sizes. Dietetic students' receptiveness and response to the IBDA, and their performance in food identification and quantification, were compared between the IBDA and real-food visual estimations (RFVEs). No differences were found between the IBDA and RFVE in terms of food identification (67% vs. 71%) or quantification (±10% of estimated calories: 23% vs. 24%). A Spearman correlation analysis showed a moderate to high correlation for calorie estimates between the IBDA and RFVE (r = 0.33 to 0.75, all p < 0.0001). Repeated IBDA training significantly improved students' image-viewing skills (food identification: first semester: 67%; pretest: 77%; second semester: 84%) and quantification (±10%: first semester: 23%; pretest: 28%; second semester: 32%; ±20%: first semester: 38%; pretest: 48%; second semester: 59%), and reduced absolute estimation errors from 27% (first semester) to 16% (second semester). Training also greatly improved the identification of omitted foods (e.g., condiments, sugar, cooking oil, and batter coatings) and the accuracy of food portion size estimates. The integration of an IBDA into dietetic courses has the potential to help students develop knowledge and skills related to "e-dietetics".


Subjects
Dietetics/education, Nutrition Assessment, Nutritionists/education, Photography, Portion Size, Curriculum, Humans, Internet
11.
Sensors (Basel) ; 21(3)2021 Jan 27.
Article in English | MEDLINE | ID: mdl-33513998

ABSTRACT

C-V2X (Cellular Vehicle-to-Everything) is a state-of-the-art wireless technology used in autonomous driving and intelligent transportation systems (ITS). This technology has extended the coverage and blind-spot detection of autonomous driving vehicles. Economically, C-V2X is much more cost-effective than the traditional sensors commonly used by autonomous driving vehicles, which makes it more practical for large-scale deployment. PC5-based C-V2X uses RF (Radio Frequency) sidelink direct communication for low-latency, mission-critical vehicle sensor connectivity. Over C-V2X radio communications, an autonomous driving vehicle's sensing ability can be extended as far as the network covers. In 2020, 5G was commercialized worldwide, with Taiwan at the forefront. Operators and governments are keen to see the implications for people's daily lives brought by its low latency, high reliability, and high throughput. Autonomous driving at level L3 (Conditional Automation) or L4 (High Automation) is a good example of 5G's advanced applications, and such applications perfectly demonstrate mobile networks with URLLC (Ultra-Reliable Low-Latency Communication). Therefore, C-V2X evolution and 5G NR (New Radio) deployment coincide and form a new ecosystem, which will change how people drive and how transportation is managed in the future. This paper covers the following topics: first, the benefits of C-V2X communication technology; second, the standards of C-V2X and C-V2X applications for automotive road-safety systems, including V2P/V2I/V2V/V2N and artificial intelligence for VRU (Vulnerable Road User) detection, object recognition, and movement prediction for collision warning and prevention; third, the PC5-based C-V2X deployment status globally, especially in Taiwan; and lastly, the current challenges and conclusions of C-V2X development.

12.
Sensors (Basel) ; 19(23)2019 Nov 29.
Article in English | MEDLINE | ID: mdl-31795519

ABSTRACT

Urban swarming transportation (UST) is a type of road transportation where multiple types of vehicles such as cars, buses, trucks, motorcycles, and bicycles, as well as pedestrians, are allowed and mixed together on the roads. Predicting traffic jam speed under UST is very different from, and more difficult than, the single-road-network traffic prediction commonly studied in intelligent traffic system (ITS) research. In this research, road-network-wide (RNW) traffic prediction, which predicts the traffic jam speeds of multiple roads at once by utilizing citizens' mobile GPS sensor records, is proposed to better predict traffic jams under UST. To conduct RNW traffic prediction, specific data preprocessing is needed to convert traffic data into an image representing the spatial-temporal relationships within the RNW. In addition, a revised capsule network (CapsNet), named OCapsNet, is proposed, which utilizes nonlinearity functions in the first two convolution layers and modified dynamic routing to optimize the performance of CapsNet. Experiments were conducted using real-world urban road traffic data from Jakarta to evaluate the performance. The results show that OCapsNet outperforms a convolutional neural network (CNN) and the original CapsNet in both accuracy and precision.
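
CapsNet-style dynamic routing, which OCapsNet modifies, is built around the standard "squash" nonlinearity that maps a capsule vector's length into [0, 1) while preserving its direction; a minimal sketch of that standard function (not the paper's modified routing):

```python
import numpy as np

def squash(v, eps=1e-8):
    """CapsNet squash: s = (|v|^2 / (1 + |v|^2)) * (v / |v|).
    Long vectors approach unit length; short ones shrink toward zero."""
    norm_sq = np.sum(v ** 2, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)          # eps guards against div-by-zero
    return (norm_sq / (1.0 + norm_sq)) * (v / norm)

v = np.array([3.0, 4.0])   # length 5, direction (0.6, 0.8)
s = squash(v)
```

The squashed length acts as the capsule's "existence probability" during routing.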

13.
Sensors (Basel) ; 19(21)2019 Oct 31.
Article in English | MEDLINE | ID: mdl-31683704

ABSTRACT

High dynamic range (HDR) has wide applications in intelligent vision sensing, including enhanced electronic imaging, smart surveillance, self-driving cars, and intelligent medical diagnosis. Exposure fusion is an essential HDR technique that fuses different exposures of the same scene into an HDR-like image. However, determining the appropriate fusion weights is difficult because each differently exposed image contains only a subset of the scene's details. When blending, the problem of local color inconsistency is more challenging; thus, it often requires manual tuning to avoid image artifacts. To address this problem, we present an adaptive coarse-to-fine searching approach to find the optimal fusion weights. In the coarse-tuning stage, fuzzy logic is used to efficiently decide the initial weights. In the fine-tuning stage, a multivariate normal conditional random field model is used to adjust the fuzzy-based initial weights, which allows us to consider both intra- and inter-image information in the data. Moreover, a multiscale enhanced fusion scheme is proposed to blend input images while maintaining the details at each scale level. The proposed fuzzy-based MNCRF (Multivariate Normal Conditional Random Fields) fusion method provides a smoother blending result and a more natural look, while the details in highlight and dark regions are preserved simultaneously. The experimental results demonstrate that our work outperforms state-of-the-art methods not only in several objective quality measures but also in a user-study analysis.

14.
Onco Targets Ther ; 12: 6439-6451, 2019.
Article in English | MEDLINE | ID: mdl-31496743

ABSTRACT

PURPOSE: This study integrated clinical outcomes and radiomics of patients with advanced thoracic esophageal squamous cell carcinoma receiving neoadjuvant concurrent chemoradiotherapy (NACCRT) to establish a novel constraint model for predicting radiation pneumonitis (RP). PATIENTS AND METHODS: We conducted a retrospective review of patients with advanced thoracic esophageal cancer who received NACCRT. From 2013 to 2018, 89 patients were eligible for review. Staging workup and response evaluation included positron emission tomography/computed tomography (PET/CT) scans and endoscopic ultrasound. Patients received RT with 48 Gy to the gross tumor and 43.2 Gy to the elective nodal area using the simultaneous integrated boost method in 24 fractions. Weekly platinum-based chemotherapy was administered concurrently. Side effects were evaluated using CTCAE v4. Images of 2-fluoro-2-deoxyglucose PET/CT before and after NACCRT were registered to planning CT images to create regions of interest for dosimetry parameters that spatially matched RP-related regions, including V10, V20, V50%, V27, and V30. The correlation between bio-physical parameters and toxicity was used to establish a constraint model for avoiding RP. RESULTS: In the investigated cohort, the clinical downstaging, complete pathological response, and 5-year overall survival rates were 59.6%, 40%, and 34.4%, respectively. Multivariate logistic regression analysis demonstrated that the standardized uptake value ratios (SUVRs) of each individual image set, whether pre- or post-NACCRT, were not predictive. Interestingly, cutoff increments of 6.2% and 8.9% in SUVRs (delta-SUVR) in the registered V20 and V27 regions were powerful predictors of acute and chronic RP, respectively. CONCLUSION: Spatial registration of metabolic and planning CT images with delta-radiomics analysis using pre- and post-treatment image sets can establish a unique bio-physical prediction model for avoiding RP in esophageal cancer patients receiving NACCRT.

15.
Sensors (Basel) ; 19(7)2019 Apr 10.
Article in English | MEDLINE | ID: mdl-30974774

ABSTRACT

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating per-pixel depth from a single image. Inspired by recent work on generative neural network models, we formulate the task of depth estimation as a generative task in which we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network that has an encoder-decoder type generator with residual transposed convolution blocks, trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several existing depth estimation methods.

16.
Sensors (Basel) ; 19(7)2019 Apr 02.
Article in English | MEDLINE | ID: mdl-30986925

ABSTRACT

Autonomous robots for smart homes and smart cities mostly require depth perception in order to interact with their environments. However, depth maps are usually captured at a lower resolution than RGB color images due to the inherent limitations of the sensors. Naively increasing the resolution often leads to loss of sharpness and incorrect estimates, especially in regions with depth discontinuities or depth boundaries. In this paper, we propose a novel Generative Adversarial Network (GAN)-based framework for depth map super-resolution that is able to preserve the smooth areas as well as the sharp edges at the boundaries of the depth map. Our proposed model is trained on two different modalities, namely color images and depth maps; however, at test time, our model requires only the depth map to produce a higher-resolution version. We evaluated our model both quantitatively and qualitatively, and our experiments show that our method performs better than existing state-of-the-art models.

17.
Sensors (Basel) ; 19(5)2019 Mar 09.
Article in English | MEDLINE | ID: mdl-30857334

ABSTRACT

The JPEG XR encoding process utilizes two types of transform operations: the Photo Overlap Transform (POT) and the Photo Core Transform (PCT). Using the Device Porting Kit (DPK) provided by Microsoft, we performed encoding and decoding processes on JPEG XR images. We discovered that when the quantization parameter is greater than 1 (lossy compression conditions), the resulting image displays chequerboard block artefacts, border artefacts, and corner artefacts. These artefacts are due to the nonlinearity of the transforms used by JPEG XR. Typically they are not very visible; however, they can cause problems in copying and scanning applications, where the nonlinear transforms show up because the source and the target of the image have different configurations. Hence, it is important for document image processing pipelines to take such artefacts into account. Additionally, these artefacts are most problematic for high-quality settings and appear more visible at high compression ratios. In this paper, we analyse the cause of these artefacts and find that the main problem lies in the POT and quantization steps. To solve this problem, the use of a "uniform matrix" is proposed: after the POT (encoding) and before the inverse POT (decoding), an extra step is added to multiply by this uniform matrix. Results suggest that this is an easy and effective way to decrease chequerboard, border, and corner artefacts, thereby improving the image quality of lossy JPEG XR encoding over the original DPK program, with no increase in computational complexity or file size.
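
The general idea can be illustrated with a toy model: scale transform coefficients elementwise by a "uniform matrix" before quantization and divide it back out after dequantization, which bounds the per-coefficient reconstruction error. The matrix values and the block below are purely illustrative, not the actual JPEG XR POT coefficients or the paper's matrix:

```python
import numpy as np

def quantize(block, qp):
    """Uniform scalar quantization with step size qp."""
    return np.round(block / qp)

def dequantize(coeffs, qp):
    return coeffs * qp

# Hypothetical per-coefficient scaling matrix: equalizes the gain the
# transform gives different positions so quantization error does not
# cluster at block borders and corners. Values are illustrative only.
U = np.array([[1.0, 1.1],
              [1.1, 1.2]])

block = np.array([[10.0, 7.3],
                  [4.9, 2.2]])
qp = 2.0

# Encode: scale by U after the overlap transform, then quantize.
coeffs = quantize(block * U, qp)
# Decode: dequantize, then divide by U before the inverse transform.
recon = dequantize(coeffs, qp) / U
```

With this round trip, each coefficient's error is bounded by qp / (2 * U), i.e. larger scale factors yield tighter error bounds at those positions.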

18.
Sensors (Basel) ; 19(6)2019 Mar 22.
Article in English | MEDLINE | ID: mdl-30909503

ABSTRACT

In this paper, a preliminary baseball player behavior classification system is proposed. Using multiple IoT sensors and cameras, the proposed method accurately recognizes many baseball player behaviors by analyzing signals from heterogeneous sensors. The contribution of this paper is threefold: (i) signals from a depth camera and from multiple inertial sensors are obtained and segmented, (ii) the time-variant skeleton vector projection from the depth camera and the statistical features extracted from the inertial sensors are used as features, and (iii) a deep learning-based scheme is proposed for training the behavior classifiers. The experimental results demonstrate that the proposed deep learning behavior system achieves an accuracy greater than 95% on the proposed dataset.


Subjects
Accelerometry/methods, Behavior/physiology, Deep Learning, Accelerometry/instrumentation, Baseball, Humans, Joints/physiology, Long-Term Memory, Short-Term Memory, Photography, Wearable Electronic Devices
19.
Article in English | MEDLINE | ID: mdl-29994510

ABSTRACT

Convolutional neural networks are currently the state-of-the-art solution for a wide range of image processing tasks. Their deep architecture extracts low- and high-level features from images, thus improving the model's performance. In this paper, we propose a method for image demosaicking based on deep convolutional neural networks. Demosaicking is the task of reconstructing full-color images from the incomplete images formed by the overlaid color filter arrays on the image sensors found in digital cameras. Instead of producing the output image directly, the proposed method divides the demosaicking task into an initial demosaicking step and a refinement step. The initial step produces a rough demosaicked image containing unwanted color artifacts. The refinement step then reduces these color artifacts using deep residual estimation and multi-model fusion, producing a higher-quality image. Experimental results show that the proposed method outperforms several existing and state-of-the-art methods in terms of both subjective and objective evaluations.
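
The "initial demosaicking step" can be illustrated with a simple bilinear fill of the green channel on an RGGB mosaic. This hand-coded average is a stand-in for the paper's initial network (whose refinement stage would then correct the residual), and it assumes each missing pixel has at least one green 4-neighbor:

```python
import numpy as np

def interpolate_green(bayer, green_mask):
    """Fill each missing green sample with the mean of its available
    4-neighbors; pixels where green_mask is True already hold green."""
    h, w = bayer.shape
    green = np.where(green_mask, bayer, 0.0).astype(float)
    out = green.copy()
    for y in range(h):
        for x in range(w):
            if green_mask[y, x]:
                continue  # measured green sample, keep as-is
            vals = [green[ny, nx]
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                    if 0 <= ny < h and 0 <= nx < w and green_mask[ny, nx]]
            out[y, x] = sum(vals) / len(vals)
    return out

# One RGGB tile: green samples sit on the anti-diagonal.
bayer = np.array([[9.0, 4.0],
                  [6.0, 9.0]])
mask = np.array([[False, True],
                 [True, False]])
filled = interpolate_green(bayer, mask)
```

Red and blue channels would be filled analogously, yielding the rough image the refinement step starts from.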

20.
IEEE Trans Cybern ; 48(1): 423-435, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28026799

ABSTRACT

It is important to extract a clear background for computer vision and augmented reality. Generally, background extraction assumes the existence of a clean background shot somewhere in the input sequence, but realistic situations, such as highway traffic videos, may violate this assumption. Therefore, our probabilistic model-based method formulates the fusion of candidate background patches of the input sequence as a random walk problem and seeks a globally optimal solution based on their temporal and spatial relationships. Furthermore, we design two quality measures that consider spatial and temporal coherence and contrast distinctness among pixels as the basis for background selection. A static background should have high temporal coherence among frames; thus, we improve fusion precision with a temporal contrast filter and an optical-flow-based motionless patch extractor. Experiments demonstrate that our algorithm successfully extracts artifact-free background images at low computational cost compared to state-of-the-art algorithms.
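
A per-pixel temporal median is the classic baseline that such patch-fusion methods improve on, and it illustrates the goal of combining frames into an artifact-free background (the paper's actual method fuses candidate patches via a random-walk formulation rather than a simple median):

```python
import numpy as np

def median_background(frames):
    """Per-pixel temporal median over a stack of frames: transient
    foreground values (e.g. passing cars) are outliers in time and
    are suppressed, leaving the static background."""
    return np.median(np.stack(frames), axis=0)

# A static background of 10s with one transient "car" pixel per frame.
f1 = np.array([[10.0, 10.0], [99.0, 10.0]])
f2 = np.array([[10.0, 99.0], [10.0, 10.0]])
f3 = np.array([[10.0, 10.0], [10.0, 10.0]])
bg = median_background([f1, f2, f3])
```

The median fails when the foreground occupies a pixel for most of the sequence, which is exactly the heavy-traffic case the abstract targets.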
