Búsqueda | Portal Regional de la BVS

1.

GRA: Graph Representation Alignment for Semi-Supervised Action Recognition.

Huang, Kuan-Hung; Huang, Yao-Bang; Lin, Yong-Xiang; Hua, Kai-Lung; Tanveer, M; Lu, Xuequan; Razzak, Imran.

IEEE Trans Neural Netw Learn Syst ; PP2024 Jan 12.

Artículo en Inglés | MEDLINE | ID: mdl-38215319

RESUMEN

Graph convolutional networks (GCNs) have emerged as a powerful tool for action recognition, leveraging skeletal graphs to encapsulate human motion. Despite their efficacy, a significant challenge remains the dependency on huge labeled datasets. Acquiring such datasets is often prohibitive, and the frequent occurrence of incomplete skeleton data, typified by absent joints and frames, complicates the testing phase. To tackle these issues, we present graph representation alignment (GRA), a novel approach with two main contributions: 1) a self-training (ST) paradigm that substantially reduces the need for labeled data by generating high-quality pseudo-labels, ensuring model stability even with minimal labeled inputs and 2) a representation alignment (RA) technique that utilizes consistency regularization to effectively reduce the impact of missing data components. Our extensive evaluations on the NTU RGB+D and Northwestern-UCLA (N-UCLA) benchmarks demonstrate that GRA not only improves GCN performance in data-constrained environments but also retains impressive performance in the face of data incompleteness.

2.

Real-Time Tracking of Laryngeal Motion via the Surface Depth-Sensing Technique for Radiotherapy in Laryngeal Cancer Patients.

Lee, Wan-Ju; Leu, Yi-Shing; Chen, Jing-Sheng; Dai, Kun-Yao; Hou, Tien-Chi; Chang, Chung-Ting; Li, Chi-Jung; Hua, Kai-Lung; Chen, Yu-Jen.

Bioengineering (Basel) ; 10(8)2023 Jul 31.

Artículo en Inglés | MEDLINE | ID: mdl-37627793

RESUMEN

Radiotherapy (RT) is an important modality for laryngeal cancer treatment to preserve laryngeal function. During beam delivery, laryngeal motion remains uncontrollable and may compromise tumor-targeting efficacy. We aimed to examine real-time laryngeal motion by developing a surface depth-sensing technique with preliminary testing during RT-based treatment of patients with laryngeal cancer. A surface depth-sensing (SDS) camera was set up and integrated into RT simulation procedures. By recording the natural swallowing of patients, SDS calculation was performed using the Pose Estimation Model and deep neural network technique. Seven male patients with laryngeal cancer were enrolled in this prospective study. The calculated motion distances of the laryngeal prominence (mean ± standard deviation) were 1.6 ± 0.8 mm, 21.4 ± 5.1 mm, 6.4 ± 3.3 mm, and 22.7 ± 4.9 mm in the left-right, cranio-caudal, and anterior-posterior directions and for the spatial displacement, respectively. The calculated differences in the 3D margins for generating the planning tumor volume by senior physicians with and without SDS data were -0.7 ± 1.0 mm (-18%), 11.3 ± 6.8 mm (235%), and 1.8 ± 2.6 mm (45%) in the left-right, cranio-caudal, and anterior-posterior directions, respectively. The SDS technique developed for detecting laryngeal motion during swallowing may be a practical guide for individualized RT design in the treatment of laryngeal cancer.

3.

DEFAEK: Domain Effective Fast Adaptive Network for Face Anti-Spoofing.

Lin, Jiun-Da; Han, Yue-Hua; Huang, Po-Han; Tan, Julianne; Chen, Jun-Cheng; Tanveer, M; Hua, Kai-Lung.

Neural Netw ; 161: 83-91, 2023 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-36736002

RESUMEN

Existing deep learning based face anti-spoofing (FAS) or deepfake detection approaches usually rely on large-scale datasets and powerful networks with significant amount of parameters to achieve satisfactory performance. However, these make them resource-heavy and unsuitable for handheld devices. Moreover, they are limited by the types of spoof in the dataset they train on and require considerable training time. To produce a robust FAS model, they need large datasets covering the widest variety of predefined presentation attacks possible. Testing on new or unseen attacks or environments generally results in poor performance. Ideally, the FAS model should learn discriminative features that can generalize well even on unseen spoof types. In this paper, we propose a fast learning approach called Domain Effective Fast Adaptive nEt-worK (DEFAEK), a face anti-spoofing approach based on the optimization-based meta-learning paradigm that effectively and quickly adapts to new tasks. DEFAEK treats differences in an environment as domains and simulates multiple domain shifts during training. To further improve the effectiveness and efficiency of meta-learning, we adopt the metric learning in the inner loop update with careful sample selection. With extensive experiments on the challenging CelebA-Spoof and FaceForensics++ datasets, the evaluation results show that DEFAEK can learn cues independent of the environment with good generalization capability. In addition, the resulting model is lightweight following the design principle of modern lightweight network architecture and still generalizes well on unseen classes. In addition, we also demonstrate our model's capabilities by comparing the numbers of parameters, FLOPS, and model performance with other state-of-the-art methods.

Asunto(s)

Señales (Psicología) , Generalización Psicológica

4.

How Does C-V2X Help Autonomous Driving to Avoid Accidents?

Miao, Lili; Chen, Shang-Fu; Hsu, Yu-Ling; Hua, Kai-Lung.

Sensors (Basel) ; 22(2)2022 Jan 17.

Artículo en Inglés | MEDLINE | ID: mdl-35062647

RESUMEN

Accidents are continuously reported for autonomous driving vehicles including those with advanced sensors installed. Some of accidents are usually caused by bad weather, poor lighting conditions and non-line-of-sight obstacles. Cellular Vehicle-to-Everything (C-V2X) radio technology can significantly improve those weak spots for autonomous driving. This paper describes one of the C-V2X system solutions: Vulnerable Road User Collision Warning (VRUCW) for autonomous driving. The paper provides the system architecture, design logic, network topology, message flow, artificial intelligence (AI) and network security feature. As a reference it also includes a commercial project with its test results.

Asunto(s)

Accidentes de Tránsito , Conducción de Automóvil , Accidentes de Tránsito/prevención & control , Inteligencia Artificial , Tecnología , Tiempo (Meteorología)

5.

Classification of Alzheimer's Disease Using Ensemble of Deep Neural Networks Trained Through Transfer Learning.

Tanveer, M; Rashid, A H; Ganaie, M A; Reza, M; Razzak, Imran; Hua, Kai-Lung.

IEEE J Biomed Health Inform ; 26(4): 1453-1463, 2022 04.

Artículo en Inglés | MEDLINE | ID: mdl-34033550

RESUMEN

Alzheimer's disease (AD) is one of the deadliest neurodegenerative diseases ailing the elderly population all over the world. An ensemble of Deep learning (DL) models can learn highly complicated patterns from MRI scans for the detection of AD by utilizing diverse solutions. In this work, we propose a computationally efficient, DL-architecture agnostic, ensemble of deep neural networks, named 'Deep Transfer Ensemble (DTE)' trained using transfer learning for the classification of AD. DTE leverages the complementary feature views and diversity introduced by many different locally optimum solutions reached by individual networks through the randomization of hyper-parameters. DTE achieves an accuracy of 99.05% and 85.27% on two independent splits of the large dataset for cognitively normal (NC) vs AD classification task. For the task of mild cognitive impairment (MCI) vs AD classification, DTE achieves 98.71% and 83.11% respectively on the two independent splits. It also performs reasonable on a small dataset consisting of only 50 samples per class. It achieved a maximum accuracy of 85% for NC vs AD on the small dataset. It also outperformed snapshot ensembles along with several other existing deep models from similar kind of previous works by other researchers.

Asunto(s)

Enfermedad de Alzheimer , Disfunción Cognitiva , Anciano , Enfermedad de Alzheimer/diagnóstico por imagen , Disfunción Cognitiva/diagnóstico por imagen , Humanos , Aprendizaje Automático , Imagen por Resonancia Magnética , Redes Neurales de la Computación

6.

Controllable and Identity-Aware Facial Attribute Transformation.

Tan, Daniel Stanley; Soeseno, Jonathan Hans; Hua, Kai-Lung.

IEEE Trans Cybern ; 52(6): 4825-4836, 2022 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-34043518

RESUMEN

Modifying facial attributes without the paired dataset proves to be a challenging task. Previously, approaches either required supervision from a ground-truth transformed image or required training a separate model for mapping every pair of attributes. These limit the scalability of the models to accommodate a larger set of attributes since the number of models that we need to train grows exponentially large. Another major drawback of the previous approaches is the unintentional gain of the identity of the person as they transform the facial attributes. We propose a method that allows for controllable and identity-aware transformations across multiple facial attributes using only a single model. Our approach is to train a generative adversarial network (GAN) with a multitask conditional discriminator that recognizes the identity of the face, distinguishes real images from fake, as well as identifies facial attributes present in an image. This guides the generator into producing an output that is realistic while preserving the person's identity and facial attributes. Through this framework, our model also learns meaningful image representations in a lower dimensional latent space and semantically associate separate parts of the encoded vector with both the person's identity and facial attributes. This opens up the possibility of generating new faces and other transformations such as making the face thinner or chubbier. Furthermore, our model only encodes the image once and allows for multiple transformations using the encoded vector. This allows for faster transformations since it does not need to reprocess the entire image for every transformation. We show the effectiveness of our proposed method through both qualitative and quantitative evaluations, such as ablative studies, visual inspection, and face verification. Competitive results are achieved compared to the main competition (CycleGAN), however, at great space and extensibility gain by using a single model.

7.

Lightweight Face Anti-Spoofing Network for Telehealth Applications.

Lin, Jiun-Da; Lin, Hung-Hsiang; Dy, Jilyan; Chen, Jun-Cheng; Tanveer, M; Razzak, Imran; Hua, Kai-Lung.

IEEE J Biomed Health Inform ; 26(5): 1987-1996, 2022 05.

Artículo en Inglés | MEDLINE | ID: mdl-34432642

RESUMEN

Online healthcare applications have grown more popular over the years. For instance, telehealth is an online healthcare application that allows patients and doctors to schedule consultations, prescribe medication, share medical documents, and monitor health conditions conveniently. Apart from this, telehealth can also be used to store a patient's personal and medical information. With its rise in usage due to COVID-19, given the amount of sensitive data it stores, security measures are necessary. A simple way of making these applications more secure is through user authentication. One of the most common and often used authentications is face recognition. It is convenient and easy to use. However, face recognition systems are not foolproof. They are prone to malicious attacks like printed photos, paper cutouts, replayed videos, and 3D masks. The goal of face anti-spoofing is to differentiate real users (live) from attackers (spoof). Although effective in terms of performance, existing methods use a significant amount of parameters, making them resource-heavy and unsuitable for handheld devices. Apart from this, they fail to generalize well to new environments like changes in lighting or background. This paper proposes a lightweight face anti-spoofing framework that does not compromise on performance. Our proposed method achieves good performance with the help of an ArcFace Classifier (AC). The AC encourages differentiation between spoof and live samples by making clear boundaries between them. With clear boundaries, classification becomes more accurate. We further demonstrate our model's capabilities by comparing the number of parameters, FLOPS, and performance with other state-of-the-art methods.

Asunto(s)

COVID-19 , Telemedicina , Seguridad Computacional , Cara , Humanos

8.

A New Photographic Reproduction Method Based on Feature Fusion and Virtual Combined Histogram Equalization.

Lin, Yu-Hsiu; Hua, Kai-Lung; Chen, Yung-Yao; Chen, I-Ying; Tsai, Yun-Chen.

Sensors (Basel) ; 21(18)2021 Sep 09.

Artículo en Inglés | MEDLINE | ID: mdl-34577244

RESUMEN

A desirable photographic reproduction method should have the ability to compress high-dynamic-range images to low-dynamic-range displays that faithfully preserve all visual information. However, during the compression process, most reproduction methods face challenges in striking a balance between maintaining global contrast and retaining majority of local details in a real-world scene. To address this problem, this study proposes a new photographic reproduction method that can smoothly take global and local features into account. First, a highlight/shadow region detection scheme is used to obtain prior information to generate a weight map. Second, a mutually hybrid histogram analysis is performed to extract global/local features in parallel. Third, we propose a feature fusion scheme to construct the virtual combined histogram, which is achieved by adaptively fusing global/local features through the use of Gaussian mixtures according to the weight map. Finally, the virtual combined histogram is used to formulate the pixel-wise mapping function. As both global and local features are simultaneously considered, the output image has a natural and visually pleasing appearance. The experimental results demonstrated the effectiveness of the proposed method and the superiority over other seven state-of-the-art methods.

Asunto(s)

Compresión de Datos , Aumento de la Imagen , Algoritmos , Fotograbar , Reproducción

9.

Photographic Reproduction and Enhancement Using HVS-Based Modified Histogram Equalization.

Chen, Yung-Yao; Hua, Kai-Lung; Tsai, Yun-Chen; Wu, Jun-Hua.

Sensors (Basel) ; 21(12)2021 Jun 16.

Artículo en Inglés | MEDLINE | ID: mdl-34208602

RESUMEN

Photographic reproduction and enhancement is challenging because it requires the preservation of all the visual information during the compression of the dynamic range of the input image. This paper presents a cascaded-architecture-type reproduction method that can simultaneously enhance local details and retain the naturalness of original global contrast. In the pre-processing stage, in addition to using a multiscale detail injection scheme to enhance the local details, the Stevens effect is considered for adapting different luminance levels and normally compressing the global feature. We propose a modified histogram equalization method in the reproduction stage, where individual histogram bin widths are first adjusted according to the property of overall image content. In addition, the human visual system (HVS) is considered so that a luminance-aware threshold can be used to control the maximum permissible width of each bin. Then, the global tone is modified by performing histogram equalization on the output modified histogram. Experimental results indicate that the proposed method can outperform the five state-of-the-art methods in terms of visual comparisons and several objective image quality evaluations.

Asunto(s)

Compresión de Datos , Aumento de la Imagen , Algoritmos , Humanos , Fotograbar , Reproducción

10.

Integration of an Image-Based Dietary Assessment Paradigm into Dietetic Training Improves Food Portion Estimates by Future Dietitians.

Ho, Dang Khanh Ngan; Chiu, Wan-Chun; Lee, Yu-Chieh; Su, Hsiu-Yueh; Chang, Chun-Chao; Yao, Chih-Yuan; Hua, Kai-Lung; Chu, Hung-Kuo; Hsu, Chien-Yeh; Chang, Jung-Su.

Nutrients ; 13(1)2021 Jan 08.

Artículo en Inglés | MEDLINE | ID: mdl-33430147

RESUMEN

The use of image-based dietary assessments (IBDAs) has rapidly increased; however, there is no formalized training program to enhance the digital viewing skills of dieticians. An IBDA was integrated into a nutritional practicum course in the School of Nutrition and Health Sciences, Taipei Medical University Taiwan. An online IBDA platform was created as an off-campus remedial teaching tool to reinforce the conceptualization of food portion sizes. Dietetic students' receptiveness and response to the IBDA, and their performance in food identification and quantification, were compared between the IBDA and real food visual estimations (RFVEs). No differences were found between the IBDA and RFVE in terms of food identification (67% vs. 71%) or quantification (±10% of estimated calories: 23% vs. 24%). A Spearman correlation analysis showed a moderate to high correlation for calorie estimates between the IBDA and RFVE (r ≥ 0.33~0.75, all p < 0.0001). Repeated IBDA training significantly improved students' image-viewing skills [food identification: first semester: 67%; pretest: 77%; second semester: 84%) and quantification [±10%: first semester: 23%; pretest: 28%; second semester: 32%; and ±20%: first semester: 38%; pretest: 48%; second semester: 59%] and reduced absolute estimated errors from 27% (first semester) to 16% (second semester). Training also greatly improved the identification of omitted foods (e.g., condiments, sugar, cooking oil, and batter coatings) and the accuracy of food portion size estimates. The integration of an IBDA into dietetic courses has the potential to help students develop knowledge and skills related to "e-dietetics".

Asunto(s)

Dietética/educación , Evaluación Nutricional , Nutricionistas/educación , Fotograbar , Tamaño de la Porción , Curriculum , Humanos , Internet

11.

PC5-Based Cellular-V2X Evolution and Deployment.

Miao, Lili; Virtusio, John Jethro; Hua, Kai-Lung.

Sensors (Basel) ; 21(3)2021 Jan 27.

Artículo en Inglés | MEDLINE | ID: mdl-33513998

RESUMEN

C-V2X (Cellular Vehicle-to-Everything) is a state-of-the-art wireless technology used in autonomous driving and intelligent transportation systems (ITS). This technology has extended the coverage and blind-spot detection of autonomous driving vehicles. Economically, C-V2X is much more cost-effective than the traditional sensors that are commonly used by autonomous driving vehicles. This cost-benefit makes it more practical in a large scale deployment. PC5-based C-V2X uses an RF (Radio Frequency) sidelink direct communication for low latency mission-critical vehicle sensor connectivity. Over the C-V2X radio communications, the autonomous driving vehicle's sensor ability can now be largely enhanced to the distances as far as the network covers. In 2020, 5G is commercialized worldwide, and Taiwan is at the forefront. Operators and governments are keen to see its implications in people's daily life brought by its low latency, high reliability, and high throughput. Autonomous driving class L3 (Conditional Automation) or L4 (Highly Automation) are good examples of 5G's advanced applications. In these applications, the mobile networks with URLLC (Ultra-Reliable Low-Latency Communication) are perfectly demonstrated. Therefore, C-V2X evolution and 5G NR (New Radio) deployment coincide and form a new ecosystem. This ecosystem will change how people will drive and how transportation will be managed in the future. In this paper, the following topics are covered. Firstly, the benefits of C-V2X communication technology. Secondly, the standards of C-V2X and C-V2X applications for automotive road safety system which includes V2P/V2I/V2V/V2N, and artificial intelligence in VRU (Vulnerable Road User) detection, object recognition and movement prediction for collision warning and prevention. Thirdly, PC5-based C-V2X deployment status in global, especially in Taiwan. Lastly, current challenges and conclusions of C-V2X development.

12.

Optimized CapsNet for Traffic Jam Speed Prediction Using Mobile Sensor Data under Urban Swarming Transportation.

Tampubolon, Hendrik; Yang, Chao-Lung; Chan, Arnold Samuel; Sutrisno, Hendri; Hua, Kai-Lung.

Sensors (Basel) ; 19(23)2019 Nov 29.

Artículo en Inglés | MEDLINE | ID: mdl-31795519

RESUMEN

Urban swarming transportation (UST) is a type of road transportation where multiple types of vehicles such as cars, buses, trucks, motorcycles, and bicycles, as well as pedestrians are allowed and mixed together on the roads. Predicting the traffic jam speed under UST is very different and difficult from the single road network traffic prediction which has been commonly studied in the intelligent traffic system (ITS) research. In this research, the road network wide (RNW) traffic prediction which predicts traffic jam speeds of multiple roads at once by utilizing citizens' mobile GPS sensor records is proposed to better predict traffic jam under UST. In order to conduct the RNW traffic prediction, a specific data preprocessing is needed to convert traffic data into an image representing spatial-temporal relationships among RNW. In addition, a revised capsule network (CapsNet), named OCapsNet, which utilizes nonlinearity functions in the first two convolution layers and the modified dynamic routing to optimize the performance of CapsNet, is proposed. The experiments were conducted using real-world urban road traffic data of Jakarta to evaluate the performance. The results show that OCapsNet has better performance than Convolution Neural Network (CNN) and original CapsNet with better accuracy and precision.

13.

An Adaptive Exposure Fusion Method Using fuzzy Logic and Multivariate Normal Conditional Random Fields.

Lin, Yu-Hsiu; Hua, Kai-Lung; Lu, Hsin-Han; Sun, Wei-Lun; Chen, Yung-Yao.

Sensors (Basel) ; 19(21)2019 Oct 31.

Artículo en Inglés | MEDLINE | ID: mdl-31683704

RESUMEN

High dynamic range (HDR) has wide applications involving intelligent vision sensing which includes enhanced electronic imaging, smart surveillance, self-driving cars, intelligent medical diagnosis, etc. Exposure fusion is an essential HDR technique which fuses different exposures of the same scene into an HDR-like image. However, determining the appropriate fusion weights is difficult because each differently exposed image only contains a subset of the scene's details. When blending, the problem of local color inconsistency is more challenging; thus, it often requires manual tuning to avoid image artifacts. To address this problem, we present an adaptive coarse-to-fine searching approach to find the optimal fusion weights. In the coarse-tuning stage, fuzzy logic is used to efficiently decide the initial weights. In the fine-tuning stage, the multivariate normal conditional random field model is used to adjust the fuzzy-based initial weights which allows us to consider both intra- and inter-image information in the data. Moreover, a multiscale enhanced fusion scheme is proposed to blend input images when maintaining the details in each scale-level. The proposed fuzzy-based MNCRF (Multivariate Normal Conditional Random Fields) fusion method provided a smoother blending result and a more natural look. Meanwhile, the details in the highlighted and dark regions were preserved simultaneously. The experimental results demonstrated that our work outperformed the state-of-the-art methods not only in several objective quality measures but also in a user study analysis.

14.

Bio-physic constraint model using spatial registration of delta 18F-fluorodeoxyglucose positron emission tomography/computed tomography images for predicting radiation pneumonitis in esophageal squamous cell carcinoma patients receiving neoadjuvant chemoradiation.

Hou, Tien-Chi; Dai, Kun-Yao; Wu, Ming-Che; Hua, Kai-Lung; Tai, Hung-Chi; Huang, Wen-Chien; Chen, Yu-Jen.

Onco Targets Ther ; 12: 6439-6451, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-31496743

RESUMEN

PURPOSE: This study integrated clinical outcomes and radiomics of advanced thoracic esophageal squamous cell carcinoma patients receiving neoadjuvant concurrent chemoradiotherapy (NACCRT) to establish a novel constraint model for predicting radiation pneumonitis (RP). PATIENTS AND METHODS: We conducted a retrospective review for thoracic advanced esophageal cancer patients who received NACCRT. From 2013 to 2018, 89 patients were eligible for review. Staging workup and response evaluation included positron emission tomography/computed tomography (PET/CT) scans and endoscopic ultrasound. Patients received RT with 48 Gy to gross tumor and 43.2 Gy to elective nodal area in simultaneous integrated boost method divided in 24 fractions. Weekly platinum-based chemotherapy was administered concurrently. Side effects were evaluated using CTCAE v4. Images of 2-fluoro-2-deoxyglucose PET/CT before and after NACCRT were registered to planning CT images to create a region of interest for dosimetry parameters that spatially matched RP-related regions, including V10, V20, V50%, V27, and V30. Correlation between bio-physic parameters and toxicity was used to establish a constraint model for avoiding RP. RESULTS: Among the investigated cohort, clinical downstaging, complete pathological response, and 5-year overall survival rates were 59.6%, 40%, and 34.4%, respectively. Multivariate logistic regression analysis demonstrated that each individual set standardized uptake value ratios (SUVRs), neither pre- nor post-NACCRT, was not predictive. Interestingly, cutoff increments of 6.2% and 8.9% in SUVRs (delta-SUVR) in registered V20 and V27 regions were powerful predictors for acute and chronic RP, respectively. CONCLUSION: Spatial registration of metabolic and planning CT images with delta-radiomics analysis using fore-and-aft image sets can establish a unique bio-physic prediction model for avoiding RP in esophageal cancer patients receiving NACCRT.

15.

Single-Image Depth Inference Using Generative Adversarial Networks.

Tan, Daniel Stanley; Yao, Chih-Yuan; Ruiz, Conrado; Hua, Kai-Lung.

Sensors (Basel) ; 19(7)2019 Apr 10.

Artículo en Inglés | MEDLINE | ID: mdl-30974774

RESUMEN

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential tasks for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating the per-pixel depths from a single image. Inspired by the recent works on generative neural network models, we formulate the task of depth estimation as a generative task where we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network that has an encoder-decoder type generator with residual transposed convolution blocks trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several depth estimation works.

16.

Depth Map Upsampling via Multi-Modal Generative Adversarial Network.

Tan, Daniel Stanley; Lin, Jun-Ming; Lai, Yu-Chi; Ilao, Joel; Hua, Kai-Lung.

Sensors (Basel) ; 19(7)2019 Apr 02.

Artículo en Inglés | MEDLINE | ID: mdl-30986925

RESUMEN

Autonomous robots for smart homes and smart cities mostly require depth perception in order to interact with their environments. However, depth maps are usually captured in a lower resolution as compared to RGB color images due to the inherent limitations of the sensors. Naively increasing its resolution often leads to loss of sharpness and incorrect estimates, especially in the regions with depth discontinuities or depth boundaries. In this paper, we propose a novel Generative Adversarial Network (GAN)-based framework for depth map super-resolution that is able to preserve the smooth areas, as well as the sharp edges at the boundaries of the depth map. Our proposed model is trained on two different modalities, namely color images and depth maps. However, at test time, our model only requires the depth map in order to produce a higher resolution version. We evaluated our model both quantitatively and qualitatively, and our experiments show that our method performs better than existing state-of-the-art models.

17.

Baseball Player Behavior Classification System Using Long Short-Term Memory with Multimodal Features.

Sun, Shih-Wei; Mou, Ting-Chen; Fang, Chih-Chieh; Chang, Pao-Chi; Hua, Kai-Lung; Shih, Huang-Chia.

Sensors (Basel) ; 19(6)2019 Mar 22.

Artículo en Inglés | MEDLINE | ID: mdl-30909503

RESUMEN

In this paper, a preliminary baseball player behavior classification system is proposed. By using multiple IoT sensors and cameras, the proposed method accurately recognizes many of baseball players' behaviors by analyzing signals from heterogeneous sensors. The contribution of this paper is threefold: (i) signals from a depth camera and from multiple inertial sensors are obtained and segmented, (ii) the time-variant skeleton vector projection from the depth camera and the statistical features extracted from the inertial sensors are used as features, and (iii) a deep learning-based scheme is proposed for training behavior classifiers. The experimental results demonstrate that the proposed deep learning behavior system achieves an accuracy of greater than 95% compared to the proposed dataset.

Asunto(s)

Acelerometría/métodos , Conducta/fisiología , Aprendizaje Profundo , Acelerometría/instrumentación , Béisbol , Humanos , Articulaciones/fisiología , Memoria a Largo Plazo , Memoria a Corto Plazo , Fotograbar , Dispositivos Electrónicos Vestibles

18.

Reduction of Artefacts in JPEG-XR Compressed Images.

Hua, Kai-Lung; Trang, Ho Thi; Srinivasan, Kathiravan; Chen, Yung-Yao; Chen, Chun-Hao; Sharma, Vishal; Zomaya, Albert Y.

Sensors (Basel) ; 19(5)2019 Mar 09.

Artículo en Inglés | MEDLINE | ID: mdl-30857334

RESUMEN

The JPEG-XR encoding process utilizes two types of transform operations: Photo Overlap Transform (POT) and Photo Core Transform (PCT). Using the Device Porting Kit (DPK) provided by Microsoft, we performed encoding and decoding processes on JPEG XR images. It was discovered that when the quantization parameter is >1-lossy compression conditions, the resulting image displays chequerboard block artefacts, border artefacts and corner artefacts. These artefacts are due to the nonlinearity of transforms used by JPEG-XR. Typically, it is not so visible; however, it can cause problems while copying and scanning applications, as it shows nonlinear transforms when the source and the target of the image have different configurations. Hence, it is important for document image processing pipelines to take such artefacts into account. Additionally, these artefacts are most problematic for high-quality settings and appear more visible at high compression ratios. In this paper, we analyse the cause of the above artefacts. It was found that the main problem lies in the step of POT and quantization. To solve this problem, the use of a "uniform matrix" is proposed. After POT (encoding) and before inverse POT (decoding), an extra step is added to multiply this uniform matrix. Results suggest that it is an easy and effective way to decrease chequerboard, border and corner artefacts, thereby improving the image quality of lossy encoding JPEG XR than the original DPK program with no increased calculation complexity or file size.

19.

DeepDemosaicking: Adaptive Image Demosaicking via Multiple Deep Fully Convolutional Networks.

Tan, Daniel Stanley; Chen, Wei-Yang; Hua, Kai-Lung.

IEEE Trans Image Process ; 2018 Feb 07.

Artículo en Inglés | MEDLINE | ID: mdl-29994510

RESUMEN

Convolutional neural networks are currently the state-of-the-art solution for a wide range of image processing tasks. Their deep architecture extracts low and high-level features from images, thus, improving the model's performance. In this paper, we propose a method for image demosaicking based on deep convolutional neural networks. Demosaicking is the task of reproducing full color images from incomplete images formed from overlaid color filter arrays on image sensors found in digital cameras. Instead of producing the output image directly, the proposed method divides the demosaicking task into an initial demosaicking step and a refinement step. The initial step produces a rough demosaicked image containing unwanted color artifacts. The refinement step then reduces these color artifacts using deep residual estimation and multi-model fusion producing a higher quality image. Experimental results show that the proposed method outperforms several existing and state-of-the-art methods in terms of both subjective and objective evaluations.

20.

Edge-Preserving Depth Map Upsampling by Joint Trilateral Filter.

Lo, Kai-Han; Wang, Yu-Chiang Frank; Hua, Kai-Lung.

IEEE Trans Cybern ; 48(1): 371-384, 2018 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-28129196

RESUMEN

Compared to the color images, their associated depth images captured by the RGB-D sensors are typically with lower resolution. The task of depth map super-resolution (SR) aims at increasing the resolution of the range data by utilizing the high-resolution (HR) color image, while the details of the depth information are to be properly preserved. In this paper, we present a joint trilateral filtering (JTF) algorithm for depth image SR. The proposed JTF first observes context information from the HR color image. In addition to the extracted spatial and range information of local pixels, our JTF further integrates local gradient information of the depth image, which allows the prediction and refinement of HR depth image outputs without artifacts like textural copies or edge discontinuities. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of our approach over prior depth map upsampling works.

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA