1.
J Neural Eng ; 20(4). 2023 Aug 31.
Article in English | MEDLINE | ID: mdl-37607552

ABSTRACT

Objective. Human beings perceive stereoscopic image quality through the cerebral visual cortex, a complex brain activity. Stereoscopic image quality can therefore be evaluated more accurately by replicating in a machine the human perception of image quality captured in electroencephalogram (EEG) signals, unlike previous stereoscopic image quality assessment methods that focus only on extracting image features. Approach. The proposed method is based on a novel image-to-brain (I2B) cross-modality model comprising a spatial-temporal EEG encoder (STEE) and an I2B deep convolutional generative adversarial network (I2B-DCGAN). Specifically, EEG representations are first learned by the STEE and serve as real samples for the I2B-DCGAN, which extracts both quality and semantic features from the stereoscopic images via a semantic-guided image encoder and uses a generator to conditionally create the corresponding EEG features for the images. Finally, the generated EEG features are classified to predict the perceptual quality level of the image. Main results. Extensive experiments on the collected brain-visual multimodal stereoscopic image quality ranking database demonstrate that the proposed I2B cross-modality model better emulates the visual perception mechanism of the human brain and outperforms the other methods, achieving an average accuracy of 95.95%. Significance. The proposed method can convert learned stereoscopic image features into brain representations without requiring EEG signals during testing. Further experiments verify that the method generalizes well to new datasets and has potential for practical applications.
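
As a rough illustration of the pipeline this abstract describes, the sketch below wires a semantic-guided image encoder to a conditional generator that synthesizes EEG-like features, which a classifier then maps to a quality level. This is a minimal PyTorch sketch under assumed layer sizes, module names, and a five-level quality scale; none of these details come from the paper.

```python
# Minimal sketch of the image-to-brain (I2B) idea: an image encoder
# conditions a generator that synthesizes EEG-like features, which a
# classifier maps to a quality level. Sizes/names are assumptions.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Extracts a joint quality/semantic feature vector from an image."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
    def forward(self, img):
        return self.net(img)

class EEGFeatureGenerator(nn.Module):
    """Conditionally generates an EEG-like feature from image features."""
    def __init__(self, feat_dim=128, noise_dim=32, eeg_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, eeg_dim),
        )
    def forward(self, img_feat, noise):
        return self.net(torch.cat([img_feat, noise], dim=1))

# Inference path: no real EEG is needed at test time.
enc, gen = ImageEncoder(), EEGFeatureGenerator()
classifier = nn.Linear(256, 5)        # 5 quality levels (assumed)
img = torch.randn(4, 3, 64, 64)       # dummy image batch
z = torch.randn(4, 32)
quality_logits = classifier(gen(enc(img), z))
print(quality_logits.shape)           # torch.Size([4, 5])
```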


Subject(s)
Brain , Cerebral Cortex , Humans , Databases, Factual , Electroencephalography , Learning
2.
IEEE Trans Image Process ; 32: 3188-3198, 2023.
Article in English | MEDLINE | ID: mdl-37200126

ABSTRACT

In contrast to image compression, the key to video compression is efficiently exploiting the temporal context to reduce inter-frame redundancy. Existing learned video compression methods generally rely on short-term temporal correlations or image-oriented codecs, which limits further improvement of coding performance. This paper proposes a novel temporal context-based video compression network (TCVC-Net) to improve the performance of learned video compression. Specifically, a global temporal reference aggregation (GTRA) module is proposed to obtain an accurate temporal reference for motion-compensated prediction by aggregating long-term temporal context. Furthermore, to efficiently compress the motion vector and residue, a temporal conditional codec (TCC) is proposed to preserve structural and detailed information by exploiting the multi-frequency components in the temporal context. Experimental results show that the proposed TCVC-Net outperforms published state-of-the-art methods in terms of both PSNR and MS-SSIM.
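
The long-term aggregation idea behind GTRA can be pictured with a small sketch: features of past decoded frames are weighted by their similarity to the current frame's features and fused into a single temporal reference. The similarity-softmax weighting below is an assumed stand-in for the module's actual mechanism, not the published design.

```python
# Hedged sketch of long-term temporal reference aggregation: past frame
# features are softmax-weighted by similarity to the current feature.
import torch
import torch.nn.functional as F

def aggregate_temporal_reference(cur_feat, past_feats):
    """cur_feat: (C, H, W); past_feats: (T, C, H, W) long-term context."""
    T = past_feats.shape[0]
    # Per-frame similarity between current and past features (assumed metric).
    scores = torch.stack([
        (cur_feat * past_feats[t]).mean() for t in range(T)
    ])
    weights = F.softmax(scores, dim=0)                  # (T,)
    # Weighted aggregation over the temporal axis.
    ref = (weights.view(T, 1, 1, 1) * past_feats).sum(dim=0)
    return ref                                           # (C, H, W)

cur = torch.randn(64, 32, 32)
past = torch.randn(5, 64, 32, 32)   # five previously decoded frames
print(aggregate_temporal_reference(cur, past).shape)    # torch.Size([64, 32, 32])
```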

3.
IEEE Trans Image Process ; 31: 6707-6718, 2022.
Article in English | MEDLINE | ID: mdl-36260594

ABSTRACT

Generating a 3D model of an object from multiple views has a wide range of applications. With multiple views, different parts of an object are most accurately captured by a particular view or subset of views. In this paper, a novel coarse-to-fine network (C2FNet) is proposed for 3D point cloud generation from multiple views. C2FNet generates the subsets of 3D points that are best captured by individual views, with the support of the other views, in a coarse-to-fine manner, and then fuses these subsets into a whole point cloud. It consists of a coarse generation module, in which coarse point clouds are constructed from multiple views by exploring cross-view spatial relations, and a fine generation module, in which the coarse point cloud features are refined under the guidance of global consistency in appearance and context. Extensive experiments on benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods.
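
A compact sketch of the coarse-to-fine flow: each view predicts a coarse point subset, the subsets are fused, and a refinement network adjusts the fused cloud. The layer sizes, point counts, and the residual-offset refiner below are illustrative assumptions, not C2FNet's actual architecture.

```python
# Illustrative coarse-to-fine sketch: per-view coarse point subsets are
# fused, then refined by predicting per-point offsets. Sizes are assumed.
import torch
import torch.nn as nn

class CoarseGenerator(nn.Module):
    """Predicts a coarse point subset (N x 3) from one view's feature."""
    def __init__(self, feat_dim=128, n_points=256):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, n_points * 3),
        )
    def forward(self, view_feat):                 # (B, feat_dim)
        return self.mlp(view_feat).view(-1, self.n_points, 3)

class FineRefiner(nn.Module):
    """Predicts per-point offsets to refine the fused coarse cloud."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
    def forward(self, pts):                       # (B, N, 3)
        return pts + self.mlp(pts)                # residual refinement

coarse, fine = CoarseGenerator(), FineRefiner()
view_feats = torch.randn(3, 2, 128)               # 3 views, batch of 2
subsets = [coarse(view_feats[v]) for v in range(3)]
fused = torch.cat(subsets, dim=1)                 # fuse view-wise subsets
print(fine(fused).shape)                          # torch.Size([2, 768, 3])
```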

4.
IEEE Trans Image Process ; 31: 4515-4526, 2022.
Article in English | MEDLINE | ID: mdl-35727785

ABSTRACT

Multiview video coding (MVC) aims to compress multiview video by eliminating video redundancies, and the quality of the reference frame directly affects compression efficiency. In this paper, we propose a deep virtual reference frame generation method based on a disparity-aware reference frame generation network (DAG-Net), which transforms the disparity relationship between different viewpoints and generates a more reliable reference frame. The proposed DAG-Net consists of a multi-level receptive field module, a disparity-aware alignment module, and a fusion reconstruction module. First, the multi-level receptive field module is designed to enlarge the receptive field and extract multi-scale deep features of the temporal and inter-view reference frames. Then, the disparity-aware alignment module learns the disparity relationship and performs a disparity shift on the inter-view reference frame to align it with the temporal reference frame. Finally, the fusion reconstruction module fuses the complementary information to generate a more reliable virtual reference frame. Experiments demonstrate that the proposed reference frame generation method achieves superior performance for multiview video coding.
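
One way to picture the disparity-aware alignment step: estimate a horizontal disparity field from the two references and warp the inter-view features toward the temporal ones. The single-scale setup, network sizes, and grid_sample-based warping below are assumptions made for illustration only.

```python
# Sketch of disparity-aware alignment: predict per-pixel horizontal
# disparity, then warp the inter-view feature map with grid_sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityAlign(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # Predict one horizontal offset per pixel from both references.
        self.disp_net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, inter_view, temporal):
        B, C, H, W = inter_view.shape
        disp = self.disp_net(torch.cat([inter_view, temporal], dim=1))
        # Build a sampling grid shifted horizontally by the disparity.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2).clone()
        grid[..., 0] = grid[..., 0] + disp.squeeze(1) * (2.0 / W)
        return F.grid_sample(inter_view, grid, align_corners=True)

align = DisparityAlign()
inter, temp = torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16)
print(align(inter, temp).shape)   # torch.Size([1, 32, 16, 16])
```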

5.
Article in English | MEDLINE | ID: mdl-35533169

ABSTRACT

Stereopsis is the human ability to perceive depth in real scenes. Conventional stereopsis measurement is based on subjective judgment of stereograms, so it is easily affected by personal bias. To alleviate this issue, in this paper, EEG signals evoked by dynamic random dot stereograms (DRDS) are collected for stereogram recognition, which can help ophthalmologists diagnose strabismus patients even without real-time communication. To classify the collected electroencephalography (EEG) signals, a novel multi-scale temporal self-attention and dynamical graph convolution hybrid network (MTS-DGCHN) is proposed, comprising a multi-scale temporal self-attention module, a dynamical graph convolution module, and a classification module. First, the multi-scale temporal self-attention module learns time-continuity information: a temporal self-attention block highlights the global importance of each time segment in an EEG trial, and a multi-scale convolution block further extracts higher-level temporal features over multiple receptive fields. Meanwhile, the dynamical graph convolution module captures spatial functional relationships between different EEG electrodes, with the adjacency matrix of each GCN layer adaptively tuned to explore the optimal intrinsic relationships. Finally, the temporal and spatial features are fed into the classification module to obtain predictions. Extensive experiments are conducted on two collected datasets, SRDA and SRDB, and the results demonstrate that the proposed MTS-DGCHN achieves outstanding classification performance compared with other methods. The datasets are available at https://github.com/YANGeeg/TJU-SRD-datasets and the code at https://github.com/YANGeeg/MTS-DGCHN.
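
The adaptively tuned adjacency matrix is the distinctive ingredient here, and it admits a very short sketch: treat the electrode graph as a learnable parameter updated by backpropagation rather than a fixed structure. The electrode count, feature sizes, and softmax normalization below are illustrative assumptions.

```python
# Minimal dynamical graph-convolution layer: the adjacency over EEG
# electrodes is a learnable parameter, tuned during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphConv(nn.Module):
    def __init__(self, n_electrodes=62, in_dim=32, out_dim=64):
        super().__init__()
        # Learnable adjacency: one weight per electrode pair (assumption).
        self.adj = nn.Parameter(torch.randn(n_electrodes, n_electrodes) * 0.01)
        self.lin = nn.Linear(in_dim, out_dim)
    def forward(self, x):                 # x: (B, n_electrodes, in_dim)
        A = F.softmax(self.adj, dim=-1)   # row-normalize the learned graph
        return F.relu(self.lin(A @ x))    # propagate, then transform

gcn = DynamicGraphConv()
feats = torch.randn(8, 62, 32)            # 8 trials, 62 electrodes
print(gcn(feats).shape)                   # torch.Size([8, 62, 64])
```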


Subject(s)
Attention , Electroencephalography , Electroencephalography/methods , Humans , Recognition, Psychology
6.
IEEE Trans Image Process ; 31: 1613-1627, 2022.
Article in English | MEDLINE | ID: mdl-35081029

ABSTRACT

Guided by the free-energy principle, generative adversarial network (GAN)-based no-reference image quality assessment (NR-IQA) methods have improved image quality prediction accuracy. However, GANs cannot handle the restoration task of free-energy-principle-guided NR-IQA methods well, especially for severely degraded images, so the quality reconstruction relationship between a distorted image and its restored image cannot be accurately built. To address this problem, a visual compensation restoration network (VCRNet)-based NR-IQA method is proposed, which uses a non-adversarial model to efficiently handle the distorted image restoration task. The proposed VCRNet consists of a visual restoration network and a quality estimation network. To accurately build the quality reconstruction relationship between a distorted image and its restored image, a visual compensation module, an optimized asymmetric residual block, and an error-map-based mixed loss function are proposed to increase the restoration capability of the visual restoration network. To further address the NR-IQA problem for severely degraded images, the multi-level restoration features obtained from the visual restoration network are used for image quality estimation. To demonstrate the effectiveness of the proposed VCRNet, seven representative IQA databases are used, and experimental results show that VCRNet achieves state-of-the-art image quality prediction accuracy. The implementation has been released at https://github.com/NUIST-Videocoding/VCRNet.
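
The two-branch structure can be sketched compactly: a non-adversarial restoration network produces multi-level features while restoring the distorted image, and a quality head regresses a score from those features. The encoder/decoder depths and pooling scheme below are assumptions for illustration, not the released VCRNet configuration.

```python
# Sketch: restoration branch exposes multi-level features; a quality
# head pools and regresses them into a score. Sizes are assumed.
import torch
import torch.nn as nn

class RestorationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Conv2d(32, 3, 3, padding=1)
    def forward(self, x):
        f1 = self.enc1(x)
        f2 = self.enc2(f1)
        restored = nn.functional.interpolate(self.dec(f2), scale_factor=2)
        return restored, [f1, f2]          # image + multi-level features

class QualityHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16 + 32, 1)
    def forward(self, feats):
        pooled = [f.mean(dim=(2, 3)) for f in feats]   # global average pool
        return self.fc(torch.cat(pooled, dim=1))       # scalar quality score

restore, head = RestorationNet(), QualityHead()
distorted = torch.randn(2, 3, 64, 64)
_, feats = restore(distorted)
print(head(feats).shape)                   # torch.Size([2, 1])
```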

7.
Article in English | MEDLINE | ID: mdl-32224459

ABSTRACT

Raw video data can be compressed substantially by the latest video coding standard, High Efficiency Video Coding (HEVC). However, the block-based hybrid coding used in HEVC introduces many artifacts into compressed videos, severely degrading video quality. To address this problem, in-loop filtering is used in HEVC to eliminate artifacts. Inspired by the success of deep learning, we propose an efficient in-loop filtering algorithm based on enhanced deep convolutional neural networks (EDCNN) to significantly improve the performance of in-loop filtering in HEVC. First, the problems of traditional convolutional neural network models, including the normalization method, network learning ability, and loss function, are analyzed. Then, based on statistical analyses, the EDCNN is proposed for efficiently eliminating artifacts; it adopts three solutions: a weighted normalization method, a feature information fusion block, and a precise loss function. Finally, PSNR enhancement, PSNR smoothness, RD performance, subjective tests, and computational complexity/GPU memory consumption are employed as evaluation criteria, and experimental results show that, compared with the filter in HM16.9, the proposed in-loop filtering algorithm achieves an average of 6.45% BD-BR reduction and 0.238 dB BD-PSNR gain.
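
A hedged sketch of CNN-based in-loop filtering of the kind described here: the network predicts a residual correction for the reconstructed frame, and a simple fusion block concatenates shallow and deep features before the final convolution. Depths and widths are assumptions, not the EDCNN configuration.

```python
# Residual in-loop filter sketch with a shallow/deep feature-fusion block.
import torch
import torch.nn as nn

class FusionFilterCNN(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.shallow = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.deep = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        # Feature fusion: combine shallow and deep information.
        self.fuse = nn.Conv2d(2 * ch, 1, 3, padding=1)
    def forward(self, rec):                # rec: reconstructed luma frame
        s = self.shallow(rec)
        d = self.deep(s)
        residual = self.fuse(torch.cat([s, d], dim=1))
        return rec + residual              # artifact-compensated frame

filt = FusionFilterCNN()
rec = torch.randn(1, 1, 64, 64)
print(filt(rec).shape)                     # torch.Size([1, 1, 64, 64])
```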

8.
Comput Biol Med ; 99: 161-172, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29933127

ABSTRACT

The present study applies a multivariate statistical analysis approach to the selection of optimal descriptors of nanomaterials, with the objective of robust qualitative modeling of their toxicity. A novel data mining protocol has been developed for selecting an optimal subset of nanomaterial descriptors using the well-known multivariate method of principal component analysis (PCA). The selected subsets of descriptors were validated for qualitative modeling of nanomaterial toxicity in the PC space. The analysis and validation of the proposed schemes were based on five decisive nanomaterial toxicity datasets available in the published literature. Optimal descriptors were selected on the basis of the maximum-loading criterion, using a threshold of cumulative variance ≤90% on the PC directions. With the selected subsets of optimal descriptors, a maximum inter-class separation (B) and a minimum intra-class separation (A) were obtained for toxic vs. nontoxic nanomaterials in the PC space, compared with their other combinations, for each of the datasets.
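
The maximum-loading selection rule lends itself to a direct sketch: run PCA on standardized descriptors, keep the leading components up to a cumulative-variance threshold, and pick the descriptor with the largest absolute loading on each retained component. The 90% threshold follows the abstract; the data and descriptor names below are made up for illustration.

```python
# Max-loading descriptor selection via PCA on synthetic example data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))              # 50 nanomaterials, 10 descriptors
names = [f"desc_{i}" for i in range(10)]   # hypothetical descriptor names

Xs = StandardScaler().fit_transform(X)
pca = PCA().fit(Xs)
cumvar = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.90) + 1)  # components up to 90% variance

selected = set()
for comp in pca.components_[:k]:
    # Keep the descriptor with the maximum absolute loading per component.
    selected.add(names[int(np.argmax(np.abs(comp)))])
print(sorted(selected))
```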


Subject(s)
Algorithms , Data Mining , Nanostructures/toxicity , Animals , Humans , Multivariate Analysis
9.
Sensors (Basel) ; 15(12): 30115-25, 2015 Dec 02.
Article in English | MEDLINE | ID: mdl-26633415

ABSTRACT

Visual sensor networks (VSNs) are widely applicable to security surveillance, environmental monitoring, smart rooms, etc. However, as the number of camera nodes in a VSN increases, the volume of visual data grows significantly, which makes storing, processing, and transmitting the visual data challenging. The state-of-the-art video compression standard, High Efficiency Video Coding (HEVC), can effectively compress the raw visual data, but the higher compression rate comes at the cost of heavy computational complexity. Hence, reducing encoding complexity is vital for the HEVC encoder to be used in VSNs. In this paper, we propose a fast coding unit (CU) depth decision method to reduce the encoding complexity of the HEVC encoder for VSNs. First, the content property of the CU is analyzed. Then, an early CU depth decision method and a low-complexity distortion calculation method are proposed for CUs with homogeneous content. Experimental results show that the proposed method achieves an average encoding time saving of 71.91% for the HEVC encoder in VSNs.
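
The early-termination idea can be illustrated with a toy recursion: if a CU's content is homogeneous (low pixel variance is used here as an assumed proxy for the paper's content analysis), stop splitting early instead of exhaustively testing all depths. The variance threshold is a made-up tuning parameter.

```python
# Toy early CU-depth decision: homogeneous blocks terminate recursion early.
import numpy as np

def decide_cu_depth(cu, depth=0, max_depth=3, var_thresh=40.0):
    """Return the chosen depth(s) for a square CU as a nested structure."""
    if depth == max_depth or np.var(cu) < var_thresh:
        return depth                        # homogeneous: terminate early
    h, w = cu.shape
    halves = [cu[:h//2, :w//2], cu[:h//2, w//2:],
              cu[h//2:, :w//2], cu[h//2:, w//2:]]
    return [decide_cu_depth(q, depth + 1, max_depth, var_thresh) for q in halves]

rng = np.random.default_rng(1)
flat = np.full((64, 64), 128.0)             # homogeneous region
textured = rng.normal(128, 30, size=(64, 64))
print(decide_cu_depth(flat))                # 0  (early termination)
print(decide_cu_depth(textured))            # nested splits down to depth 3
```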

10.
IEEE Trans Image Process ; 24(7): 2225-38, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25826804

ABSTRACT

In this paper, we propose a machine-learning-based fast coding unit (CU) depth decision method for High Efficiency Video Coding (HEVC), which optimizes the complexity allocation at the CU level under given rate-distortion (RD) cost constraints. First, we analyze the quad-tree CU depth decision process in HEVC and model it as a three-level hierarchical binary decision problem. Second, a flexible CU depth decision structure is presented, which allows the performance of each CU depth decision to be smoothly traded off between coding complexity and RD performance. Then, a three-output joint classifier, consisting of multiple binary classifiers with different parameters, is designed to control the risk of false prediction. Finally, a sophisticated RD-complexity model is derived to determine the optimal parameters for the joint classifier, minimizing the complexity at each CU depth under given RD degradation constraints. Comparative experiments over various sequences show that the proposed CU depth decision algorithm reduces the computational complexity by 28.82% to 70.93%, and by 51.45% on average, compared with the original HEVC test model. The Bjøntegaard delta peak signal-to-noise ratio and Bjøntegaard delta bit rate are -0.061 dB and 1.98% on average, which is negligible. The overall performance of the proposed algorithm outperforms that of state-of-the-art schemes.
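
The three-output idea can be sketched as follows: wrap a binary split/non-split classifier so that low-confidence CUs fall into an "uncertain" class and undergo the full RD check, which controls the risk of false prediction. The logistic model, dummy features, and thresholds below are illustrative assumptions, not the paper's trained classifiers.

```python
# Three-output joint decision: split / non-split / uncertain (defer to RD).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))                    # dummy CU features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # 1 = split, 0 = non-split
clf = LogisticRegression().fit(X, y)

def three_output_decision(features, lo=0.3, hi=0.7):
    """Return 'non-split', 'split', or 'uncertain' (full RD search needed)."""
    p_split = clf.predict_proba(features.reshape(1, -1))[0, 1]
    if p_split >= hi:
        return "split"
    if p_split <= lo:
        return "non-split"
    return "uncertain"                           # risk control: defer to RD cost

for x in X[:5]:
    print(three_output_decision(x))
```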

11.
IEEE Trans Image Process ; 22(4): 1598-609, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23269749

ABSTRACT

In a generic decision process, optimal stopping theory aims to achieve a good tradeoff between decision performance and time consumed, with the advantages of theoretical decision-making and predictable decision performance. In this paper, optimal stopping theory is employed to develop an effective hybrid model for the mode decision problem, which aims to theoretically achieve a good tradeoff between the two interrelated measurements of mode decision: computational complexity reduction and rate-distortion degradation. The proposed hybrid model is implemented and examined with a multiview encoder. To support the model and further improve coding performance, multiview coding mode characteristics, including predicted mode probability and estimated coding time, are jointly investigated with inter-view correlations. Exhaustive experimental results over a wide range of video resolutions demonstrate the efficiency and robustness of our method, with high decision accuracy, negligible computational overhead, and almost intact rate-distortion performance compared with the original encoder.
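
The flavor of an optimal-stopping mode decision can be conveyed with a toy sketch: candidate modes are examined in descending order of predicted probability, and the search stops once enough probability mass has been covered that further checking is unlikely to pay off. The priors, RD costs, and the mass-coverage stopping rule below are all made-up assumptions, not the paper's derived rule.

```python
# Toy sequential mode decision with an early-stopping rule.
def mode_decision(modes, rd_cost, stop_ratio=0.9):
    """modes: list of (name, prior_prob) sorted by descending probability."""
    best_mode, best_cost, covered = None, float("inf"), 0.0
    for name, prob in modes:
        cost = rd_cost(name)                 # expensive RD evaluation
        if cost < best_cost:
            best_mode, best_cost = name, cost
        covered += prob
        if covered >= stop_ratio:            # stop: enough probability mass seen
            break
    return best_mode, best_cost

# Toy example with made-up priors and RD costs.
priors = [("SKIP", 0.6), ("INTER_16x16", 0.25), ("INTER_8x8", 0.1),
          ("INTRA", 0.05)]
costs = {"SKIP": 120.0, "INTER_16x16": 95.0, "INTER_8x8": 90.0, "INTRA": 130.0}
print(mode_decision(priors, costs.get))     # ('INTER_8x8', 90.0); INTRA skipped
```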
