Results 1 - 20 of 310
1.
Article in English | MEDLINE | ID: mdl-38031559

ABSTRACT

Cardiac cine magnetic resonance imaging (MRI) has been used to characterize cardiovascular disease (CVD), often providing a noninvasive phenotyping tool. Although recent deep learning based approaches using cine MRI yield accurate characterization results, their performance is often degraded by small training samples. In addition, many deep learning models are deemed "black boxes": it remains largely elusive how they arrive at a prediction and how reliable they are. To alleviate this, this work proposes a lightweight successive subspace learning (SSL) framework for CVD classification, based on an interpretable feedforward design, in conjunction with a cardiac atlas. Specifically, our hierarchical SSL model is based on (i) neighborhood voxel expansion, (ii) unsupervised subspace approximation, (iii) supervised regression, and (iv) multi-level feature integration. In addition, using two-phase 3D deformation fields between the atlas and individual subjects as input, covering the end-diastolic and end-systolic phases, offers an objective means of assessing CVD, even with small training samples. We evaluate our framework on the ACDC2017 database, which comprises one healthy group and four disease groups. Compared with 3D CNN-based approaches, our framework achieves superior classification performance with 140× fewer parameters, which supports its potential value in clinical use.
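As a rough illustration of step (ii), the sketch below approximates a feature set with a low-dimensional subspace, using plain PCA as a stand-in for the paper's Saab-style transform; the synthetic "deformation-field" features and the energy threshold are assumptions, not the authors' settings.

```python
import numpy as np

def saab_like_subspace(features, energy_threshold=0.99):
    """Approximate a feature set with a low-dimensional subspace (PCA stand-in).

    `features` is an (n_samples, n_dims) array; kept components explain at
    least `energy_threshold` of the total variance."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Eigen-decomposition of the covariance gives the subspace basis.
    cov = centered.T @ centered / max(len(features) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # sort by decreasing energy
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    energy = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(energy, energy_threshold)) + 1
    basis = eigvecs[:, :k]
    return mean, basis, centered @ basis         # projected coefficients

rng = np.random.default_rng(0)
# Fake "deformation-field" features: 100 samples living near a 3D subspace.
latent = rng.normal(size=(100, 3))
X = latent @ rng.normal(size=(3, 16)) + 0.01 * rng.normal(size=(100, 16))
mean, basis, coeffs = saab_like_subspace(X)
print(coeffs.shape[1])  # number of dimensions kept
```

Reconstructing from the kept coefficients (`mean + coeffs @ basis.T`) recovers the data up to the discarded low-energy residual, which is the sense in which the subspace "approximates" the features.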

2.
IEEE Trans Image Process ; 32: 5933-5947, 2023.
Article in English | MEDLINE | ID: mdl-37903048

ABSTRACT

A dynamic point cloud is volumetric visual data representing realistic 3D scenes for virtual reality and augmented reality applications. However, its large data volume has been the bottleneck of data processing, transmission, and storage, which calls for effective compression. In this paper, we propose a Perceptually Weighted Rate-Distortion Optimization (PWRDO) scheme for Video-based Point Cloud Compression (V-PCC), which aims to minimize the perceptual distortion of the reconstructed point cloud at a given bit rate. Firstly, we propose a general framework of perceptually optimized V-PCC to exploit visual redundancies in point clouds. Secondly, a multi-scale Projection-based Point Cloud quality Metric (PPCM) is proposed to measure the perceptual quality of 3D point clouds. The PPCM model comprises 3D-to-2D patch projection, multi-scale structural distortion measurement, and a fusion model. Approximations and simplifications of the proposed PPCM are also presented for both V-PCC integration and low complexity. Thirdly, based on the simplified PPCM model, we propose a PWRDO scheme with Lagrange multiplier adaptation, which is incorporated into V-PCC to enhance coding efficiency. Experimental results show that the proposed PPCM models can be used as standalone quality metrics and achieve higher consistency with human subjective scores than state-of-the-art objective visual quality metrics. Also, compared with the latest V-PCC reference model, the proposed PWRDO-based V-PCC scheme achieves average bit rate reductions of 13.52%, 8.16%, 10.56%, and 9.54% in terms of four objective visual quality metrics for point clouds, significantly outperforming state-of-the-art coding algorithms. The computational overhead of the proposed PWRDO is negligible: 1.71% and 0.05% on average for the V-PCC encoder and decoder, respectively.
The source codes of the PPCM and PWRDO schemes are available at https://github.com/VVCodec/PPCM-PWRDO.
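As background, perceptual weighting in any RDO scheme ultimately enters a Lagrangian mode decision of the form J = w·D + λ·R. A minimal sketch, with made-up mode candidates and a hypothetical perceptual weight standing in for the PPCM-derived term:

```python
def perceptual_rdo(candidates, lam, weight):
    """Pick the coding mode minimizing J = weight * D + lam * R.

    `candidates` maps mode name -> (distortion, rate). `weight` is a
    perceptual importance factor (e.g. from a PPCM-style quality model);
    `lam` is the Lagrange multiplier trading distortion against rate."""
    def cost(item):
        _, (dist, rate) = item
        return weight * dist + lam * rate
    mode, _ = min(candidates.items(), key=cost)
    return mode

# Hypothetical per-block candidates: (distortion, bits).
modes = {"skip": (9.0, 1.0), "intra": (2.0, 6.0), "inter": (3.0, 3.0)}
print(perceptual_rdo(modes, lam=1.0, weight=1.0))   # balanced trade-off -> inter
print(perceptual_rdo(modes, lam=1.0, weight=4.0))   # important block -> intra
```

Raising the weight on a perceptually important block shifts the decision toward lower-distortion modes, which is the qualitative effect a PWRDO-style scheme relies on.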

3.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14856-14871, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37647182

ABSTRACT

An enhanced label propagation (LP) method called GraphHop was proposed recently. It outperforms graph convolutional networks (GCNs) in the semi-supervised node classification task on various networks. Although the performance of GraphHop was explained intuitively in terms of joint smoothing of node attribute and label signals, a rigorous mathematical treatment has been lacking. In this paper, we propose a label efficient regularization and propagation (LERP) framework for graph node classification and present an alternate optimization procedure for its solution. Furthermore, we show that GraphHop only offers an approximate solution to this framework and has two drawbacks. First, it includes all nodes in the classifier training without taking the reliability of pseudo-labeled nodes into account in the label update step. Second, it provides a rough approximation to the optimum of a subproblem in the label aggregation step. Based on the LERP framework, we propose a new method, also named LERP, to address these two shortcomings. LERP determines reliable pseudo-labels adaptively during the alternate optimization and provides a better approximation to the optimum with computational efficiency. Theoretical convergence of LERP is guaranteed. Extensive experiments demonstrate the effectiveness and efficiency of LERP: it consistently outperforms all benchmarking methods, including GraphHop, on five common test datasets, two large-scale networks, and an object recognition task at extremely low label rates (i.e., 1, 2, 4, 8, 16, and 20 labeled samples per class).
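The reliability filtering of pseudo-labeled nodes can be sketched with a plain confidence threshold; LERP's actual criterion is adaptive and changes during the alternate optimization, so the fixed threshold below is only an illustrative stand-in:

```python
import numpy as np

def reliable_pseudo_labels(probs, threshold=0.9):
    """Return indices and labels of nodes whose top predicted class
    probability exceeds `threshold` -- a stand-in for LERP's adaptive
    reliability check on pseudo-labeled nodes."""
    conf = probs.max(axis=1)
    idx = np.flatnonzero(conf >= threshold)
    return idx, probs[idx].argmax(axis=1)

# Toy soft predictions for three unlabeled nodes over two classes.
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.10, 0.90]])
idx, labels = reliable_pseudo_labels(probs)
print(idx.tolist(), labels.tolist())  # node 1 is too uncertain to keep
```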

5.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9287-9301, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35302944

ABSTRACT

A scalable semisupervised node classification method on graph-structured data, called GraphHop, is proposed in this work. The graph contains all nodes' attributes and link connections, but labels for only a subset of nodes. Graph convolutional networks (GCNs) have provided superior performance in node label classification over traditional label propagation (LP) methods for this problem. Nevertheless, current GCN algorithms either require a considerable number of labels for training because of their high model complexity or cannot be easily generalized to large-scale graphs due to the expensive cost of loading the entire graph and node embeddings. Moreover, their nonlinearity makes the optimization process hard to interpret. To this end, GraphHop, an enhanced LP method, is proposed to tackle these problems. GraphHop can be viewed as a smoothing LP algorithm, in which each propagation round alternates between two steps: label aggregation and label update. In the label aggregation step, multihop neighbor embeddings are aggregated at the center node. In the label update step, new embeddings are learned and predicted for each node based on the aggregated results from the previous step. The two-step iteration improves the graph signal smoothing capacity. Furthermore, to encode attributes, links, and labels on graphs effectively under one framework, we adopt a two-stage training process, i.e., an initialization stage and an iteration stage. Thus, the smooth attribute information extracted in the initialization stage is consistently imposed on the propagation process in the iteration stage. Experimental results show that GraphHop outperforms state-of-the-art graph learning methods on a wide range of tasks in graphs of various sizes (e.g., multilabel and multiclass classification on citation networks, social graphs, and commodity consumption graphs).
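The aggregation/update alternation can be sketched on a toy graph as below; the convex-combination update and row normalization are assumptions standing in for GraphHop's learned classifier update, so this is a generic smoothing LP iteration rather than the published algorithm:

```python
import numpy as np

def graphhop_iteration(adj, labels, alpha=0.5, steps=10):
    """Smoothing-style label propagation: alternate neighbor aggregation
    with a convex update toward the aggregated signal. `adj` is a
    row-normalized adjacency matrix; `labels` holds one-hot rows for
    labeled nodes and current soft labels elsewhere."""
    y = labels.copy()
    for _ in range(steps):
        aggregated = adj @ y                      # label aggregation step
        y = alpha * y + (1 - alpha) * aggregated  # label update step
        y /= y.sum(axis=1, keepdims=True)         # keep rows as distributions
    return y

# Tiny 3-node path graph 0-1-2; node 0 labeled class 0, node 2 class 1.
adj = np.array([[0.0, 1.0, 0.0],
                [0.5, 0.0, 0.5],
                [0.0, 1.0, 0.0]])
y0 = np.array([[1.0, 0.0],
               [0.5, 0.5],
               [0.0, 1.0]])
print(graphhop_iteration(adj, y0).round(3))
```

On this symmetric toy graph the middle node stays exactly undecided at [0.5, 0.5], which is the smoothing behavior the two-step iteration is meant to strengthen.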

6.
IEEE Trans Neural Netw Learn Syst ; 34(12): 10711-10723, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35544501

ABSTRACT

Learning low-dimensional representations of bipartite graphs enables e-commerce applications, such as recommendation, classification, and link prediction. A layerwise-trained bipartite graph neural network (L-BGNN) embedding method, which is unsupervised, efficient, and scalable, is proposed in this work. To aggregate the information across and within two partitions of a bipartite graph, a customized interdomain message passing (IDMP) operation and an intradomain alignment (IDA) operation are adopted by the proposed L-BGNN method. Furthermore, we develop a layerwise training algorithm for L-BGNN to capture the multihop relationship of large bipartite networks and improve training efficiency. We conduct extensive experiments on several datasets and downstream tasks of various scales to demonstrate the effectiveness and efficiency of the L-BGNN method as compared with state-of-the-art methods. Our codes are publicly available at https://github.com/TianXieUSC/L-BGNN.
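An IDMP-style round can be sketched as below; the degree normalization and the linear-plus-tanh update are assumptions for illustration, not the published L-BGNN operators:

```python
import numpy as np

def interdomain_message_passing(adj, h_u, h_v, w_u, w_v):
    """One interdomain round on a bipartite graph: each partition's
    embeddings are updated from the other partition's, through the
    (n_u x n_v) biadjacency matrix `adj`, with degree-normalized
    averaging and a learnable linear map per direction."""
    deg_u = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    deg_v = np.maximum(adj.sum(axis=0, keepdims=True), 1).T
    new_u = np.tanh((adj @ h_v) / deg_u @ w_u)    # V -> U messages
    new_v = np.tanh((adj.T @ h_u) / deg_v @ w_v)  # U -> V messages
    return new_u, new_v

rng = np.random.default_rng(4)
adj = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 users x 2 items
h_u, h_v = rng.normal(size=(3, 4)), rng.normal(size=(2, 4))
w_u, w_v = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
nu, nv = interdomain_message_passing(adj, h_u, h_v, w_u, w_v)
print(nu.shape, nv.shape)
```

Stacking such rounds layerwise, with each layer trained before the next, is the flavor of multihop training the abstract describes.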

7.
Article in English | MEDLINE | ID: mdl-35983176

ABSTRACT

Unsupervised domain adaptation (UDA) has been widely used to transfer knowledge from a labeled source domain to an unlabeled target domain, countering the difficulty of labeling in a new domain. The training of conventional solutions usually relies on the existence of both source and target domain data. However, the privacy of the large-scale, well-labeled source-domain data and of the trained model parameters can become a major concern in cross-center/domain collaborations. To address this, we propose a practical solution to UDA for segmentation with a black-box segmentation model trained in the source domain only, rather than the original source data or a white-box source model. Specifically, we resort to a knowledge distillation scheme with exponential mixup decay (EMD) to gradually learn target-specific representations. In addition, unsupervised entropy minimization is applied to regularize the target domain confidence. We evaluated our framework on the BraTS 2018 database, achieving performance on par with white-box source model adaptation approaches.
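The two ingredients, exponential mixup decay of the black-box teacher's pseudo-labels and entropy minimization, can be sketched as follows; the decay constant and mixup form are assumptions, not the paper's exact schedule:

```python
import numpy as np

def emd_mixup(teacher_pred, student_pred, step, total_steps, lam0=1.0):
    """Exponential-mixup-decay pseudo-label: the weight on the (black-box)
    teacher decays exponentially over training, so the student's own
    predictions gradually take over. The decay rate is an assumption."""
    lam = lam0 * np.exp(-5.0 * step / total_steps)
    return lam * teacher_pred + (1 - lam) * student_pred

def entropy(p, eps=1e-12):
    """Mean Shannon entropy of per-sample class distributions; minimizing
    it pushes target-domain predictions toward confident one-hot outputs."""
    return float(-(p * np.log(p + eps)).sum(axis=-1).mean())

t = np.array([[0.9, 0.1]])   # black-box teacher output
s = np.array([[0.6, 0.4]])   # current student output
early = emd_mixup(t, s, step=0, total_steps=100)
late = emd_mixup(t, s, step=100, total_steps=100)
print(early.round(3), late.round(3), round(entropy(s), 3))
```

Early in training the mixed target equals the teacher's prediction; by the end it is almost entirely the student's own, which is what "gradually learn target-specific representations" amounts to here.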

8.
Article in English | MEDLINE | ID: mdl-35895653

ABSTRACT

Unsupervised domain adaptation (UDA) has been successfully applied to transfer knowledge from a labeled source domain to target domains without their labels. Recently introduced transferable prototypical networks (TPNs) further address class-wise conditional alignment. In TPN, while the closeness of class centers between source and target domains is explicitly enforced in a latent space, the underlying fine-grained subtype structure and the cross-domain within-class compactness have not been fully investigated. To counter this, we propose a new approach that adaptively performs fine-grained subtype-aware alignment to improve performance in the target domain without subtype labels in either domain. The insight behind our approach is that samples within an unlabeled subtype retain local proximity to one another, while different subtypes of the same class exhibit disparate characteristics because of different conditional and label shifts. Specifically, we propose to simultaneously enforce subtype-wise compactness and class-wise separation by utilizing intermediate pseudo-labels. In addition, we systematically investigate scenarios with and without prior knowledge of the number of subtypes and propose to exploit the underlying subtype structure. Furthermore, a dynamic queue framework is developed to steadily evolve the subtype cluster centroids with an alternating processing scheme. Experiments carried out on multiview congenital heart disease data, VisDA, and DomainNet show the effectiveness and validity of our subtype-aware UDA compared with state-of-the-art UDA methods.
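The two opposing objectives, subtype-wise compactness and class-wise separation, can be sketched as a pair of loss terms; the hinge form, margin, and toy centroids below are assumptions for illustration only:

```python
import numpy as np

def subtype_loss(features, subtype_centroids, class_of_subtype, labels, margin=1.0):
    """Sketch of the two opposing terms: pull each sample toward its nearest
    same-class subtype centroid (compactness), and push it at least `margin`
    away from centroids of other classes (separation)."""
    compact, separate = 0.0, 0.0
    for x, y in zip(features, labels):
        d = np.linalg.norm(subtype_centroids - x, axis=1)
        same = class_of_subtype == y
        compact += d[same].min() ** 2                      # nearest own subtype
        separate += np.maximum(0.0, margin - d[~same]).sum()  # hinge vs. others
    n = len(features)
    return compact / n, separate / n

centroids = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
cls = np.array([0, 0, 1])          # two subtypes of class 0, one of class 1
x = np.array([[0.1, 0.0], [5.0, 4.9]])
y = np.array([0, 1])
print(subtype_loss(x, centroids, cls, y))
```

Note the compactness term uses the *nearest* same-class centroid, so a sample is never forced toward the wrong subtype of its own class; that is the point of subtype-aware, rather than class-level, alignment.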

9.
Article in English | MEDLINE | ID: mdl-35862331

ABSTRACT

The multilayer perceptron (MLP) neural network is interpreted from a geometrical viewpoint in this work: an MLP partitions an input feature space into multiple nonoverlapping subspaces using a set of hyperplanes, where the great majority of samples in a subspace belong to one object class. Based on this high-level idea, we propose a three-layer feedforward MLP (FF-MLP) architecture for its implementation. In the first layer, the input feature space is split into multiple subspaces by a set of partitioning hyperplanes and rectified linear unit (ReLU) activation, which is implemented by classical two-class linear discriminant analysis (LDA). In the second layer, each neuron activates one of the subspaces formed by the partitioning hyperplanes with specially designed weights. In the third layer, all subspaces of the same class are connected to an output node that represents the object class. The proposed design determines all MLP parameters analytically in a feedforward one-pass fashion, without backpropagation. Experiments are conducted to compare the performance of the traditional backpropagation-based MLP (BP-MLP) and the new FF-MLP. It is observed that the FF-MLP outperforms the BP-MLP in terms of design time, training time, and classification performance on several benchmarking datasets. Our source code is available at https://colab.research.google.com/drive/1Gz0L8A-nT4ijrUchrhEXXsnaacrFdenn?usp=sharing.
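The first-layer construction, a two-class LDA hyperplane whose signed side (after ReLU) indicates a half-space, can be sketched as follows; the ridge regularization and toy data are assumptions:

```python
import numpy as np

def lda_hyperplane(x0, x1):
    """Two-class LDA: w = Sw^-1 (mu1 - mu0), with the threshold placed at
    the midpoint of the class means. Returns (w, b) so that
    sign(x @ w + b) splits the two classes under LDA's assumptions."""
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    sw = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    # Small ridge term keeps the within-class scatter invertible.
    w = np.linalg.solve(sw + 1e-6 * np.eye(sw.shape[0]), mu1 - mu0)
    b = -0.5 * (mu0 + mu1) @ w
    return w, b

rng = np.random.default_rng(1)
x0 = rng.normal(loc=[-2, 0], scale=0.5, size=(50, 2))
x1 = rng.normal(loc=[+2, 0], scale=0.5, size=(50, 2))
w, b = lda_hyperplane(x0, x1)
# ReLU of the signed distance acts as the half-space indicator in layer one.
acc = ((np.vstack([x0, x1]) @ w + b > 0) == np.r_[np.zeros(50), np.ones(50)]).mean()
print(acc)
```

Several such hyperplanes intersected together carve the feature space into the nonoverlapping cells that the second and third layers then route to class outputs.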

10.
Front Neurosci ; 16: 837646, 2022.
Article in English | MEDLINE | ID: mdl-35720708

ABSTRACT

Unsupervised domain adaptation (UDA) is an emerging technique that enables the transfer of domain knowledge learned from a labeled source domain to unlabeled target domains, providing a way of coping with the difficulty of labeling in new domains. The majority of prior work has relied on both source and target domain data for adaptation. However, because of privacy concerns about potential leaks of sensitive information contained in patient data, it is often challenging to share source-domain data and labels, as well as trained model parameters, in cross-center collaborations. To address this issue, we propose a practical framework for UDA with a black-box segmentation model trained in the source domain only, without relying on source data or a white-box source model whose network parameters are accessible. In particular, we propose a knowledge distillation scheme to gradually learn target-specific representations. Additionally, we regularize the confidence of the labels in the target domain via unsupervised entropy minimization, leading to a performance gain over UDA without entropy minimization. We extensively validated our framework on several datasets and deep learning backbones, demonstrating its potential for application in challenging yet realistic clinical settings.

11.
IEEE Trans Image Process ; 31: 2710-2725, 2022.
Article in English | MEDLINE | ID: mdl-35324441

ABSTRACT

Inspired by the recent PointHop classification method, an unsupervised 3D point cloud registration method, called R-PointHop, is proposed in this work. R-PointHop first determines a local reference frame (LRF) for every point using its nearest neighbors and finds local attributes. Next, R-PointHop obtains local-to-global hierarchical features by point downsampling, neighborhood expansion, attribute construction and dimensionality reduction steps. Thus, point correspondences are built in hierarchical feature space using the nearest neighbor rule. Afterwards, a subset of salient points with good correspondence is selected to estimate the 3D transformation. The use of the LRF allows for invariance of the hierarchical features of points with respect to rotation and translation, thus making R-PointHop more robust at building point correspondence, even when the rotation angles are large. Experiments are conducted on the 3DMatch, ModelNet40, and Stanford Bunny datasets, which demonstrate the effectiveness of R-PointHop for 3D point cloud registration. R-PointHop's model size and training time are an order of magnitude smaller than those of deep learning methods, and its registration errors are smaller, making it a green and accurate solution. Our codes are available on GitHub (https://github.com/pranavkdm/R-PointHop).
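Once salient correspondences are selected, the rigid transformation can be estimated in closed form with the standard Kabsch/Procrustes solution, sketched below on synthetic correspondences; this is the generic estimator, not R-PointHop's full feature pipeline:

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Kabsch/Procrustes: least-squares rotation R and translation t with
    dst ~= src @ R.T + t, given matched point correspondences."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    h = (src - cs).T @ (dst - cd)            # cross-covariance of centered sets
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    R = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = cd - R @ cs
    return R, t

rng = np.random.default_rng(2)
src = rng.normal(size=(30, 3))
theta = np.pi / 6                            # 30-degree rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.5, -1.0, 2.0])
R, t = estimate_rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, [0.5, -1.0, 2.0]))
```

With noiseless correspondences the recovery is exact; in practice the quality of the selected correspondences, which is where the hierarchical features matter, determines the registration error.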

12.
IEEE J Biomed Health Inform ; 26(7): 3185-3196, 2022 07.
Article in English | MEDLINE | ID: mdl-35139030

ABSTRACT

Modeling the statistical properties of anatomical structures using magnetic resonance imaging is essential for revealing information common to a target population as well as properties unique to specific subjects. In brain imaging, a statistical brain atlas is often constructed from a number of healthy subjects. When tumors are present, however, it is difficult either to provide a common space for various subjects or to align their imaging data, due to the unpredictable distribution of lesions. Here we propose a deep learning-based image inpainting method to replace the tumor regions with normal tissue intensities using only a patient population. Our framework has three major innovations: 1) incompletely distributed datasets with random tumor locations can be used for training; 2) irregularly shaped tumor regions are properly learned, identified, and corrected; and 3) a symmetry constraint between the two brain hemispheres is applied to regularize inpainted regions. Thereafter, regular atlas construction and image registration methods can be applied to the inpainted data to obtain tissue deformation, thereby achieving group-specific statistical atlases and patient-to-atlas registration. Our framework was tested on the public database from the Multimodal Brain Tumor Segmentation challenge. Results showed increased similarity scores as well as reduced reconstruction errors compared with three existing image inpainting methods. Patient-to-atlas registration also yielded better results, with improved normalized cross-correlation and mutual information and a reduced amount of deformation over the tumor regions.


Subject(s)
Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Brain/diagnostic imaging; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods
13.
J Nutr Health Aging ; 26(1): 6-12, 2022.
Article in English | MEDLINE | ID: mdl-35067697

ABSTRACT

OBJECTIVES: Frailty is a significant public health and clinical issue among the older population. This study aimed to evaluate nutritional status and renal function in relation to frailty among elderly Taiwanese. DESIGN: We administered community-based health surveys to the older population in Chiayi County, Taiwan, from 2017 to 2019. MEASUREMENTS: We measured nutritional status (including serum albumin and total protein levels) and renal function (including serum blood urea nitrogen, creatinine, urine protein, and urine creatinine levels), measured hand grip strength (GS), and calculated appendicular muscle mass (AMM). RESULTS: The study recruited 3739 participants (2139 women). Participants of both sexes with normal GS had higher serum albumin levels and lower urine protein/creatinine ratios (UPCRs). For men with normal and weak GS, serum albumin levels were 4.15 ± 0.2 and 4.10 ± 0.2 g/dL (p < 0.01), and UPCRs were 123.1 ± 219.6 and 188.7 ± 366.2 (p < 0.001), respectively. GS was positively correlated with serum albumin and urine creatinine levels (r = 0.136 and 0.177, both p < 0.001). AMM was also positively correlated with serum albumin and urine creatinine levels (r = 0.078 and 0.091, both p < 0.001). In the multivariate regression model, every 1-g/dL increase in serum albumin level corresponded to a 1.9- and 1.7-kg increase in GS for men and women (p < 0.05 and p < 0.01), respectively. The final models for predicting GS included age, albumin, BUN, and UPCR (urine creatinine for women), and explained 22.1% and 13.8% of the variance, respectively. CONCLUSION: Proper dietary nutritional intake and maintenance of renal function are key elements for preventing frailty among the older population in Taiwan.


Subject(s)
Frailty; Aged; Creatinine; Cross-Sectional Studies; Female; Frailty/epidemiology; Hand Strength; Humans; Independent Living; Kidney/physiology; Male; Nutritional Status
14.
Med Image Comput Comput Assist Interv ; 13435: 725-734, 2022 Sep.
Article in English | MEDLINE | ID: mdl-37093922

ABSTRACT

Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. These models can potentially deal with tasks that involve understanding medical images along with their associated text. However, applying V&L models in the medical domain is challenging due to the expense of data annotation and the domain knowledge required. In this paper, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model built on PixelHop++ and VisualBERT, for better capturing the associations between clinical notes and medical images. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than the state of the art, while being trained on a 9× smaller dataset.

15.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5243-5260, 2022 09.
Article in English | MEDLINE | ID: mdl-33945470

ABSTRACT

Deep learning recognition approaches can potentially perform better if we can extract a discriminative representation that controllably separates nuisance factors. In this paper, we propose a novel approach to explicitly enforce the extracted discriminative representation d, the extracted latent variation l (e.g., background, unlabeled nuisance attributes), and the semantic variation label vector s (e.g., labeled expressions/pose) to be independent and complementary to each other. We cast this problem as an adversarial game in the latent space of an auto-encoder. Specifically, with the to-be-disentangled s, we propose to equip an end-to-end conditional adversarial network with the ability to decompose an input sample into d and l. However, we argue that maximizing the cross-entropy loss of semantic variation prediction from d is not sufficient to remove the impact of s from d, and that uniform-target and entropy regularization are necessary. A collaborative mutual information regularization framework is further proposed to avoid unstable adversarial training; it minimizes the differentiable mutual information between the variables to enforce independence. The proposed discriminative representation inherits the desired tolerance properties guided by prior knowledge of the task. Our framework achieves top performance on diverse recognition tasks, including digit classification, large-scale face recognition on the LFW and IJB-A datasets, and face recognition tolerant to changes in lighting, makeup, disguise, etc.


Subject(s)
Facial Recognition; Pattern Recognition, Automated; Algorithms; Lighting
16.
Med Image Comput Comput Assist Interv ; 13435: 66-76, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36780245

ABSTRACT

Unsupervised domain adaptation (UDA) has been widely explored to alleviate domain shifts between source and target domains, by applying a model trained under the supervision of a labeled source domain to an unlabeled target domain. Recent literature, however, has indicated that performance is still far from satisfactory in the presence of significant domain shifts. Nonetheless, delineating a few target samples is usually manageable and particularly worthwhile, given the substantial performance gain. Inspired by this, we aim to develop semi-supervised domain adaptation (SSDA) for medical image segmentation, which is largely underexplored. We thus propose to exploit both labeled source and target domain data, in addition to unlabeled target data, in a unified manner. Specifically, we present a novel asymmetric co-training (ACT) framework to integrate these subsets and avoid domination by the source domain data. Following a divide-and-conquer strategy, we explicitly decouple the label supervision in SSDA into two asymmetric sub-tasks, semi-supervised learning (SSL) and UDA, and leverage different knowledge from two segmentors to account for the distinction between the source and target label supervision. The knowledge learned in the two modules is then adaptively integrated with ACT, by iteratively teaching each other based on confidence-aware pseudo-labels. In addition, pseudo-label noise is well controlled with an exponential MixUp decay scheme for smooth propagation. Experiments on cross-modality brain tumor MRI segmentation tasks using the BraTS18 database showed that, even with limited labeled target samples, ACT yielded marked improvements over UDA and state-of-the-art SSDA methods and approached an "upper bound" of supervised joint training.
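The confidence-aware cross-teaching at the heart of co-training can be sketched as below; the fixed threshold and the ignore-index convention are assumptions for illustration:

```python
import numpy as np

def cross_teach(probs_a, probs_b, threshold=0.8):
    """Confidence-aware cross pseudo-labeling: each model teaches the other
    only where its own prediction is confident; -1 marks ignored targets."""
    conf_a, lab_a = probs_a.max(axis=-1), probs_a.argmax(axis=-1)
    conf_b, lab_b = probs_b.max(axis=-1), probs_b.argmax(axis=-1)
    targets_for_b = np.where(conf_a >= threshold, lab_a, -1)
    targets_for_a = np.where(conf_b >= threshold, lab_b, -1)
    return targets_for_a, targets_for_b

# Two samples, two classes; each segmentor is confident on a different one.
pa = np.array([[0.95, 0.05], [0.55, 0.45]])
pb = np.array([[0.30, 0.70], [0.85, 0.15]])
ta, tb = cross_teach(pa, pb)
print(ta.tolist(), tb.tolist())
```

The asymmetry in ACT comes from the two segmentors being trained on different label subsets (SSL vs. UDA), so the exchanged pseudo-labels carry complementary knowledge rather than reinforcing a single model's biases.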

17.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 3535-3538, 2021 11.
Article in English | MEDLINE | ID: mdl-34892002

ABSTRACT

Assessment of cardiovascular disease (CVD) with cine magnetic resonance imaging (MRI) has been used to non-invasively evaluate detailed cardiac structure and function. Accurate segmentation of cardiac structures from cine MRI is a crucial step for early diagnosis and prognosis of CVD and has been greatly improved by convolutional neural networks (CNNs). There are, however, a number of limitations identified in CNN models, such as limited interpretability and high complexity, which limit their use in clinical practice. To address these limitations, we propose a lightweight and interpretable machine learning model, successive subspace learning with the subspace approximation with adjusted bias (Saab) transform, for accurate and efficient segmentation from cine MRI. Specifically, our segmentation framework comprises the following steps: (1) sequential expansion of near-to-far neighborhoods at different resolutions; (2) channel-wise subspace approximation using the Saab transform for unsupervised dimension reduction; (3) class-wise entropy-guided feature selection for supervised dimension reduction; (4) concatenation of features and pixel-wise classification with gradient boosting; and (5) a conditional random field for post-processing. Experimental results on the ACDC 2017 segmentation database showed that our framework performed better than state-of-the-art U-Net models with 200× fewer parameters in delineating the left ventricle, right ventricle, and myocardium, showing its potential for use in clinical practice. Clinical relevance: Delineation of the left ventricular cavity, myocardium, and right ventricle from cardiac MR images is a common clinical task to establish diagnosis and prognosis of CVD.
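Step (3), class-wise entropy-guided feature selection, can be sketched as below; the histogram binning scheme and synthetic features are assumptions rather than the paper's exact procedure:

```python
import numpy as np

def entropy_guided_selection(features, labels, num_keep, bins=8):
    """Rank feature dimensions by the weighted entropy of class labels
    within bins of the feature's value: dimensions whose bins are pure in
    class terms score low entropy and are kept (supervised reduction)."""
    scores = []
    for d in range(features.shape[1]):
        edges = np.histogram_bin_edges(features[:, d], bins=bins)
        idx = np.clip(np.digitize(features[:, d], edges[1:-1]), 0, bins - 1)
        ent = 0.0
        for b in range(bins):
            lab = labels[idx == b]
            if lab.size == 0:
                continue
            p = np.bincount(lab) / lab.size
            p = p[p > 0]
            ent += lab.size / labels.size * -(p * np.log(p)).sum()
        scores.append(ent)
    return np.argsort(scores)[:num_keep]

rng = np.random.default_rng(3)
labels = rng.integers(0, 2, size=200)
informative = labels + 0.1 * rng.normal(size=200)   # tracks the class
noise = rng.normal(size=200)                        # ignores the class
features = np.stack([noise, informative], axis=1)
print(entropy_guided_selection(features, labels, num_keep=1).tolist())
```

The class-informative dimension produces nearly pure bins (entropy near zero) and is selected, while the noise dimension's bins stay mixed and are discarded.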


Subject(s)
Image Processing, Computer-Assisted; Magnetic Resonance Imaging, Cine; Heart/diagnostic imaging; Heart Ventricles/diagnostic imaging; Neural Networks, Computer
18.
IEEE Trans Image Process ; 30: 5889-5904, 2021.
Article in English | MEDLINE | ID: mdl-34156942

ABSTRACT

Viewing stereo images under different viewing conditions has escalated the need for effective object-level remapping techniques. In this paper, we propose a new object spatial mapping scheme that adjusts the depth and size of a selected object to match user preference and viewing conditions. Existing warping-based methods often distort the shape of important objects or cannot faithfully adjust the depth/size of the selected object due to improper warping such as local rotations. By explicitly reducing the degrees of freedom of the warping transformation, we propose an optimization model based on axis-aligned warping for object spatial remapping. The proposed axis-aligned warping-based optimization model can simultaneously adjust the depths and sizes of selected objects to their target values without introducing severe shape distortions. Moreover, we propose object consistency constraints to ensure that the sizes/shapes of parts inside a selected object are consistently adjusted. Such constraints improve size/shape adjustment performance while remaining somewhat robust to incomplete object extraction. Experimental results demonstrate that the proposed method achieves high flexibility and effectiveness in adjusting the size and depth of objects compared with existing methods.

19.
Brainlesion ; 12658: 80-91, 2021.
Article in English | MEDLINE | ID: mdl-34013242

ABSTRACT

Deformable registration of magnetic resonance images between patients with brain tumors and healthy subjects has been an important tool for specifying tumor geometry through location alignment and facilitating pathological analysis. Since the tumor region does not match any ordinary brain tissue, it has been difficult to deformably register a patient's brain to a normal one. Many patient images are associated with irregularly distributed lesions, resulting in further distortion of normal tissue structures and complicating the similarity measure used for registration. In this work, we follow a multi-step context-aware image inpainting framework to generate synthetic tissue intensities in the tumor region. Coarse image-to-image translation is applied to make a rough inference of the missing parts. Then, a feature-level patch-match refinement module refines the details by modeling the semantic relevance between patch-wise features. A symmetry constraint reflecting the large degree of anatomical symmetry in the brain is further proposed to achieve better structural understanding. Deformable registration is applied between the inpainted patient images and normal brains, and the resulting deformation field is eventually used to deform the original patient data for the final alignment. The method was applied to the Multimodal Brain Tumor Segmentation (BraTS) 2018 challenge database and compared against three existing inpainting methods. The proposed method yielded results with increased peak signal-to-noise ratio, structural similarity index, and inception score, and reduced L1 error, leading to successful patient-to-normal brain image registration.

20.
IEEE Trans Image Process ; 30: 5109-5121, 2021.
Article in English | MEDLINE | ID: mdl-33989154

ABSTRACT

It has been recognized that videos have to be encoded in a rate-distortion optimized manner for high coding performance. Operational coding methods have therefore been developed for conventional distortion metrics such as the Sum of Squared Error (SSE). Nowadays, with the rapid development of machine learning, the state-of-the-art learning-based metric Video Multimethod Assessment Fusion (VMAF) has been proven to outperform conventional metrics in terms of correlation with human perception, and thus deserves integration into the coding framework. However, unlike conventional metrics, VMAF has no specific computational formula and may be frequently updated with new training data, which invalidates existing coding methods and makes it highly desirable to develop a rate-distortion optimized method for VMAF. Moreover, VMAF is designed to operate at the frame level, which leads to further difficulties in applying it to today's block-based coding. In this paper, we propose a VMAF-oriented perceptual coding method based on piecewise metric coupling. Firstly, we explore the correlation between VMAF and SSE in the neighborhood of a benchmark distortion. Then a rate-distortion optimization model is formulated based on this correlation, and an optimized block-based coding method is presented for VMAF. Experimental results show that 3.61% and 2.67% bit savings on average can be achieved for VMAF under the low_delay_p and random_access_main configurations of HEVC coding, respectively.
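The piecewise coupling idea, fitting a local linear relation between VMAF and SSE around a benchmark distortion so that conventional SSE-based RDO can be re-weighted for VMAF, can be sketched as follows; the measurement pairs are hypothetical:

```python
import numpy as np

def local_metric_slope(sse, vmaf):
    """Fit VMAF ~= a * SSE + b from (SSE, VMAF) pairs sampled near one
    benchmark distortion; the slope `a` couples the two metrics locally,
    so an SSE-based Lagrangian can be rescaled for VMAF in that region."""
    a, b = np.polyfit(sse, vmaf, 1)
    return a, b

# Hypothetical measurements around one operating point of the encoder.
sse = np.array([100.0, 120.0, 140.0, 160.0])
vmaf = np.array([92.0, 90.0, 88.0, 86.0])
a, b = local_metric_slope(sse, vmaf)
print(round(a, 3), round(b, 3))
```

Because the fit is only trusted near the benchmark distortion, the coupling is piecewise: each operating region gets its own slope, sidestepping the lack of a closed-form VMAF expression.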
