ABSTRACT
The fusion of magnetic resonance imaging (MRI) and positron emission tomography (PET) combines anatomical and metabolic information, which is of great significance for clinical diagnosis and lesion localization. In this paper, we propose a novel adaptive linear fusion method for multi-dimensional features of brain magnetic resonance and positron emission tomography images based on a convolutional neural network, termed MdAFuse. First, in the feature extraction stage, three feature extraction modules are constructed to extract coarse, fine, and multi-scale features from the source images. Second, in the fusion stage, an affine mapping function of the multi-dimensional features is established to maintain a constant geometric relationship between the features, which effectively exploits structural information from the feature maps to achieve better reconstruction. Furthermore, MdAFuse includes a key-feature visualization enhancement algorithm designed to observe the dynamic growth of brain lesions, which can facilitate the early diagnosis and treatment of brain tumors. Extensive experimental results demonstrate that our method is superior to existing fusion methods in terms of visual perception and nine objective image fusion metrics. Specifically, for MR-PET fusion, the SSIM (Structural Similarity) and VIF (Visual Information Fidelity) metrics improve by 5.61% and 13.76%, respectively, over the current state-of-the-art algorithm. Our project is publicly available at: https://github.com/22385wjy/MdAFuse.
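As a rough illustration of the two stages described above, the following PyTorch sketch builds three feature-extraction branches (coarse, fine, and multi-scale) and fuses the two modalities' feature maps with a learned per-channel affine map. All module names, channel sizes, and kernel choices here are illustrative assumptions, not the authors' exact MdAFuse architecture.

    # Hypothetical sketch: three feature branches plus a per-channel affine fusion
    # of MR and PET feature maps (not the authors' exact MdAFuse network).
    import torch
    import torch.nn as nn

    class FeatureBranches(nn.Module):
        def __init__(self, ch=16):
            super().__init__()
            self.coarse = nn.Conv2d(1, ch, kernel_size=7, padding=3)  # large receptive field
            self.fine = nn.Conv2d(1, ch, kernel_size=3, padding=1)    # local detail
            self.multi = nn.ModuleList([
                nn.Conv2d(1, ch, kernel_size=k, padding=k // 2) for k in (1, 3, 5)
            ])

        def forward(self, x):
            ms = torch.stack([m(x) for m in self.multi]).sum(0)  # multi-scale aggregate
            return torch.cat([self.coarse(x), self.fine(x), ms], dim=1)

    class AffineFusion(nn.Module):
        """Fuse two feature maps with a per-channel affine map: scale * f_mr + shift + f_pet."""
        def __init__(self, ch):
            super().__init__()
            self.scale = nn.Parameter(torch.ones(1, ch, 1, 1))
            self.shift = nn.Parameter(torch.zeros(1, ch, 1, 1))

        def forward(self, f_mr, f_pet):
            return self.scale * f_mr + self.shift + f_pet

    branches = FeatureBranches()
    fuse = AffineFusion(ch=48)  # 3 branches x 16 channels each
    mr, pet = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    fused = fuse(branches(mr), branches(pet))  # -> (1, 48, 64, 64)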
Subject(s)
Algorithms, Brain Neoplasms, Brain, Magnetic Resonance Imaging, Positron-Emission Tomography, Brain Neoplasms/diagnostic imaging, Humans, Magnetic Resonance Imaging/methods, Positron-Emission Tomography/methods, Brain/diagnostic imaging, Multimodal Imaging/methods, Neural Networks, Computer
ABSTRACT
In this article, we propose a novel wavelet convolution unit for image-oriented neural networks that integrates wavelet analysis with a vanilla convolution operator to extract deep abstract features more efficiently. On the one hand, to acquire non-local receptive fields and avoid information loss, we define a new convolution operation by composing a traditional convolution with the approximation and detail representations obtained from a single-scale wavelet decomposition of the source images. On the other hand, multi-scale wavelet decomposition is introduced to obtain more comprehensive multi-scale feature information. We then fuse all these cross-scale features to alleviate the inaccurate localization of singular points. Given the novel wavelet convolution unit, we further design a network based on it for fine-grained Alzheimer's disease classification (i.e., Alzheimer's disease, normal controls, early mild cognitive impairment, late mild cognitive impairment). To date, only a few methods have studied one or several fine-grained classifications, and even fewer can perform both fine-grained and multi-class classification. We adopt the novel network and diffusion tensor images to achieve fine-grained classification, obtaining state-of-the-art accuracy on all eight fine-grained classification tasks: 97.30%, 95.78%, 95.00%, 94.00%, 97.89%, 95.71%, 95.07%, and 93.79%. To build a reference standard for Alzheimer's disease classification, we implemented all twelve coarse-grained and fine-grained classifications. The results show that the proposed method achieves consistently high accuracy on all of them, substantially exceeding existing Alzheimer's disease classification methods.
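The following sketch shows one plausible realization of such a wavelet convolution unit in PyTorch: a single-scale Haar decomposition produces approximation and detail sub-bands, which are stacked and passed through a vanilla convolution. The Haar choice, the shared convolution, and the bilinear upsampling are assumptions for illustration, not the authors' exact unit.

    # Illustrative wavelet convolution unit: Haar decomposition + vanilla conv.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def haar_dwt(x):
        """Single-scale 2D Haar decomposition: returns (LL, LH, HL, HH), each half-size."""
        a = x[:, :, 0::2, 0::2]
        b = x[:, :, 0::2, 1::2]
        c = x[:, :, 1::2, 0::2]
        d = x[:, :, 1::2, 1::2]
        ll = (a + b + c + d) / 4  # approximation
        lh = (a + b - c - d) / 4  # horizontal detail
        hl = (a - b + c - d) / 4  # vertical detail
        hh = (a - b - c + d) / 4  # diagonal detail
        return ll, lh, hl, hh

    class WaveletConv(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            # one convolution shared over all four sub-bands, stacked along channels
            self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, x):
            bands = torch.cat(haar_dwt(x), dim=1)  # (N, 4*C, H/2, W/2)
            y = F.relu(self.conv(bands))
            # upsample back so the unit can replace a stride-1 convolution
            return F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)

    unit = WaveletConv(1, 8)
    print(unit(torch.rand(2, 1, 64, 64)).shape)  # torch.Size([2, 8, 64, 64])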
Subject(s)
Alzheimer Disease, Cognitive Dysfunction, Humans, Alzheimer Disease/diagnostic imaging, Neural Networks, Computer, Brain, Databases, Factual
ABSTRACT
3D face reconstruction has witnessed considerable progress in recovering 3D face shapes and textures from in-the-wild images. However, due to the lack of texture detail, the shape and texture reconstructed by deep learning cannot be used to re-render a photorealistic facial image when weak supervision comes from the spatial domain alone. In this paper, we propose a spatio-frequency decoupled weak-supervision method for face reconstruction, which applies losses from both the spatial and the frequency domain to learn a reconstruction process that approaches a photorealistic effect based on the output shape and texture. Specifically, the spatial-domain losses cover image-level and perceptual-level supervision. Moreover, frequency-domain information is extracted from the input and rendered images, respectively, and then used to build the frequency-based loss. In particular, we devise a spectrum-wise weighted Wing loss to implement balanced attention across different frequency bands. Through spatio-frequency decoupled weak-supervision, the reconstruction process can be learned harmoniously and generate detailed texture and high-quality shape with only landmark labels. Experiments on several benchmarks show that our method generates high-quality results and outperforms state-of-the-art methods in qualitative and quantitative comparisons.
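A minimal sketch of this kind of frequency-domain supervision, assuming an FFT of the rendered and target images, a radial split into low/mid/high bands, and a Wing loss weighted per band. The band partition, the weights, and the Wing parameters are illustrative assumptions rather than the paper's exact spectrum-wise weighted Wing loss.

    # Hypothetical frequency-domain loss: band-weighted Wing loss on FFT magnitudes.
    import torch

    def wing(x, w=10.0, eps=2.0):
        """Element-wise Wing loss (Feng et al., 2018)."""
        c = w - w * torch.log(torch.tensor(1.0 + w / eps))
        ax = x.abs()
        return torch.where(ax < w, w * torch.log(1.0 + ax / eps), ax - c)

    def frequency_loss(rendered, target, band_weights=(1.0, 2.0, 4.0)):
        """Weighted Wing loss over low/mid/high radial frequency bands (assumed split)."""
        fr = torch.fft.fftshift(torch.fft.fft2(rendered), dim=(-2, -1))
        ft = torch.fft.fftshift(torch.fft.fft2(target), dim=(-2, -1))
        diff = (fr - ft).abs()
        h, w_ = diff.shape[-2:]
        yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w_), indexing="ij")
        r = ((yy - h / 2) ** 2 + (xx - w_ / 2) ** 2).sqrt()  # radial frequency
        edges = torch.linspace(0.0, float(r.max()), len(band_weights) + 1)
        loss = 0.0
        for k, bw in enumerate(band_weights):
            mask = (r >= edges[k]) & (r < edges[k + 1] + 1e-6)
            loss = loss + bw * wing(diff[..., mask]).mean()
        return loss

    print(frequency_loss(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)))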
ABSTRACT
Designing efficient deep learning models for 3D point cloud perception is becoming a major research direction. Point-voxel convolution (PVConv) (Liu et al., 2019) is pioneering work on this topic. However, because it consists of a few layers of simple 3D convolutions and linear point-voxel feature fusion operations, it still leaves considerable room for performance improvement. In this paper, we propose a novel pyramid point-voxel convolution (PyraPVConv) block with two key structural modifications to address these issues. First, PyraPVConv uses a voxel pyramid module to fully extract voxel features in the manner of a feature pyramid, so that sufficient voxel features can be obtained efficiently. Second, a sharable attention module captures compatible features between the multi-scale voxels in the pyramid and the point cloud for aggregation, while reducing complexity via structure sharing. Extensive results on three point cloud perception tasks, i.e., indoor scene segmentation, object part segmentation and 3D object detection, validate that networks constructed by stacking PyraPVConv blocks are efficient in terms of both GPU memory consumption and computational complexity, and are superior to state-of-the-art methods.
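A simplified PyTorch sketch of the block structure described above: point features are averaged into voxel grids at two pyramid resolutions, convolved in 3D, gathered back to the points by nearest-cell lookup (the original PVConv uses trilinear devoxelization), and gated by one attention module shared across scales. Resolutions, channel sizes, and the attention design are assumptions, not the exact PyraPVConv block.

    # Hypothetical point-voxel block with a two-level voxel pyramid and shared attention.
    import torch
    import torch.nn as nn

    def voxelize(points, feats, res):
        """Average point features (N, C) into a res^3 grid; points assumed in [0, 1]^3."""
        idx = (points * (res - 1)).round().long().clamp(0, res - 1)
        flat = idx[:, 0] * res * res + idx[:, 1] * res + idx[:, 2]
        c = feats.shape[1]
        grid = torch.zeros(res ** 3, c).index_add_(0, flat, feats)
        cnt = torch.zeros(res ** 3).index_add_(0, flat, torch.ones(len(flat))).clamp(min=1)
        return (grid / cnt[:, None]).T.reshape(1, c, res, res, res), flat

    class PyraPVBlock(nn.Module):
        def __init__(self, c, resolutions=(8, 16)):
            super().__init__()
            self.resolutions = resolutions
            self.convs = nn.ModuleList([nn.Conv3d(c, c, 3, padding=1) for _ in resolutions])
            self.point_mlp = nn.Linear(c, c)
            self.attn = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())  # shared across scales

        def forward(self, points, feats):
            out = self.point_mlp(feats)  # point-wise branch
            for res, conv in zip(self.resolutions, self.convs):
                grid, flat = voxelize(points, feats, res)
                v = conv(grid).reshape(feats.shape[1], -1).T[flat]  # devoxelize back to points
                out = out + self.attn(v) * v                        # shared attention gate
            return out

    block = PyraPVBlock(c=32)
    pts, f = torch.rand(1024, 3), torch.rand(1024, 32)
    print(block(pts, f).shape)  # torch.Size([1024, 32])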
Subject(s)
Attention, Neural Networks, Computer, Perception
ABSTRACT
A number of methods have been proposed for face reconstruction from single or multiple images. However, reconstruction from a limited number of in-the-wild images remains a challenge, owing to complex and varied imaging conditions, diverse face appearance, and the scarcity of high-quality images. Moreover, most current mesh-based methods cannot generate a high-quality face model because of local mapping deviations in geometric optics and the distortion error introduced by discrete differential operations. In this paper, accurate geometric consistency modeling on the B-spline parameter domain is proposed to reconstruct a high-quality face surface from such images. The modeling is fully consistent with the laws of geometric optics, and the B-spline representation reduces distortion during surface deformation. In our method, 0th- and 1st-order stereo consistency are formulated from low-rank texture structures and local normals, respectively, to approach pinpoint geometric modeling for face reconstruction. A practical solution combining the two consistency terms, together with an iterative algorithm, is proposed to optimize the highly detailed B-spline face effectively. Extensive empirical evaluations on synthetic and unconstrained data demonstrate the effectiveness of our method in challenging scenarios, e.g., a limited number of images with different head poses, illuminations, and expressions.
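A small numerical sketch of the geometric ingredients named above, under simplifying assumptions: a bicubic B-spline surface (modeled here as a height field over the parameter domain via SciPy) yields 0th-order positions and, through its analytic partial derivatives, the 1st-order normals used in consistency terms. The low-rank texture term and the iterative optimizer from the paper are not reproduced.

    # Toy B-spline surface: 0th-order positions and 1st-order normals from derivatives.
    import numpy as np
    from scipy.interpolate import RectBivariateSpline

    # samples of a toy surface on the (u, v) parameter domain
    u = v = np.linspace(0.0, 1.0, 12)
    z = np.sin(np.pi * u)[:, None] * np.cos(np.pi * v)[None, :]
    spl = RectBivariateSpline(u, v, z, kx=3, ky=3)  # bicubic B-spline fit

    def point_and_normal(ui, vi):
        """0th-order position and 1st-order unit normal at parameter (ui, vi)."""
        p = np.array([ui, vi, spl(ui, vi)[0, 0]])
        du = np.array([1.0, 0.0, spl(ui, vi, dx=1)[0, 0]])  # tangent along u
        dv = np.array([0.0, 1.0, spl(ui, vi, dy=1)[0, 0]])  # tangent along v
        n = np.cross(du, dv)
        return p, n / np.linalg.norm(n)

    p, n = point_and_normal(0.3, 0.7)
    print(p, n)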
ABSTRACT
Two-scale image representation of base and detail layers in the spatial domain is a well-known decomposition scheme in image fusion, with lower computational complexity than transform-domain decompositions. Unfortunately, for a pseudo-colour input image, the base and detail images obtained via such spatial-domain decomposition are displayed only in greyscale. In this paper, a two-scale image fusion method with an adaptive threshold obtained by Otsu's method is proposed for pseudo-colour images in the colour-space domain. For greyscale images, the detail and base images are obtained using structural information extracted from the difference image between a global and a local patch size. A local edge-preserving filter, which preserves luminance information, and local energy with the discussed window size are then adopted to combine the base and detail images. Experimental results show that structural and luminance information is better preserved in subjective and objective evaluations for medical image and protein image fusion. In particular, a two-step non-parametric statistical test (the Friedman test followed by the Nemenyi post-hoc test) with p-values is adopted to analyze the statistical significance of the differences between the proposed and compared methods on the objective metrics, computed over 30 co-registered pairs of imaging data.
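A hedged sketch of such a two-scale pipeline for a greyscale pair: a box filter splits each image into base and detail layers, Otsu's method supplies the adaptive threshold on a detail-activity map, and the layers are recombined with a local-energy weight. Window sizes and the exact combination rules are illustrative assumptions, not the paper's full method.

    # Illustrative two-scale fusion with an Otsu-derived adaptive threshold.
    import numpy as np
    from scipy.ndimage import uniform_filter
    from skimage.filters import threshold_otsu

    def fuse_two_scale(img_a, img_b, win=31):
        base_a, base_b = uniform_filter(img_a, win), uniform_filter(img_b, win)
        det_a, det_b = img_a - base_a, img_b - base_b
        # detail layer: pick the source with larger local activity
        act = np.abs(det_a) - np.abs(det_b)
        t = threshold_otsu(np.abs(act))  # adaptive threshold via Otsu's method
        detail = np.where(act > t, det_a, np.where(act < -t, det_b, 0.5 * (det_a + det_b)))
        # base layer: local-energy weighted average
        ea, eb = uniform_filter(det_a ** 2, win), uniform_filter(det_b ** 2, win)
        w = ea / (ea + eb + 1e-8)
        return w * base_a + (1 - w) * base_b + detail

    a, b = np.random.rand(128, 128), np.random.rand(128, 128)
    print(fuse_two_scale(a, b).shape)  # (128, 128)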