Results 1 - 20 of 43
1.
J Imaging Inform Med ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38758420

ABSTRACT

Domain generalization (DG) for medical image segmentation, where privacy preservation often restricts training to a single source domain, aims for good robustness on unseen target domains. To achieve this goal, previous methods mainly use data augmentation to expand the distribution of samples and learn invariant content from them. However, most of these methods perform only global augmentation, which limits the diversity of the augmented samples. In addition, the styles of the augmented images are more scattered than those of the source domain, which may cause the model to overfit the source-domain style. To address these issues, we propose an invariant content representation network (ICRN) that enhances the learning of invariant content and suppresses the learning of variable styles. Specifically, we first design a gamma correction-based local style augmentation (LSA) that expands the sample distribution by augmenting foreground and background styles separately. Then, based on the augmented samples, we introduce invariant content learning (ICL) to learn generalizable invariant content from both augmented and source-domain samples. Finally, we design style adversarial learning (SAL) based on domain-specific batch normalization (DSBN) to suppress the learning of preferences for source-domain styles. Experimental results show that, compared to state-of-the-art DG methods, our proposed method improves the overall Dice coefficient by 8.74% and 11.33% and reduces the overall average surface distance (ASD) by 15.88 mm and 3.87 mm on two publicly available cross-domain datasets, Fundus and Prostate. The code is available at https://github.com/ZMC-IIIM/ICRN-DG .
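To make the local style augmentation step concrete, below is a minimal numpy sketch of gamma-based foreground/background augmentation; the function name, gamma range, and the synthetic circular-mask example are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def local_style_augment(image, fg_mask, gamma_range=(0.5, 2.0), rng=None):
    """Augment foreground and background styles separately via random
    gamma correction -- a sketch of the LSA idea; the paper's exact
    sampling scheme may differ."""
    if rng is None:
        rng = np.random.default_rng()
    img = image.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # normalize to [0, 1]

    g_fg, g_bg = rng.uniform(*gamma_range, size=2)  # independent gammas
    return np.where(fg_mask > 0, img ** g_fg, img ** g_bg)

# Example: a random image with a circular foreground mask
rng = np.random.default_rng(0)
img = rng.random((256, 256))
yy, xx = np.mgrid[:256, :256]
mask = ((yy - 128) ** 2 + (xx - 128) ** 2) < 60 ** 2
aug = local_style_augment(img, mask, rng=rng)
```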

2.
IEEE Trans Image Process ; 33: 3212-3226, 2024.
Article in English | MEDLINE | ID: mdl-38687650

ABSTRACT

Depth images and thermal images contain spatial geometry and surface temperature information, respectively, which can complement the RGB modality. However, the quality of depth and thermal images is often unreliable in some challenging scenarios, which degrades the performance of two-modal salient object detection (SOD). Meanwhile, some researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, attempting to exploit the complementarity of the RGB, depth, and thermal images. However, existing triple-modal SOD methods fail to perceive the quality of depth maps and thermal images, which leads to performance degradation in scenes with low-quality depth and thermal images. Therefore, in this paper, we propose a quality-aware selective fusion network (QSF-Net) for VDT salient object detection, which contains three subnets: an initial feature extraction subnet, a quality-aware region selection subnet, and a region-guided selective fusion subnet. First, in addition to extracting features, the initial feature extraction subnet generates a preliminary prediction map from each modality via a shrinkage pyramid architecture equipped with a multi-scale fusion (MSF) module. Then, we design a weakly supervised quality-aware region selection subnet to generate quality-aware maps. Concretely, we first identify high-quality and low-quality regions from the preliminary predictions; these regions constitute the pseudo labels used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, then fuses the triple-modal features and refines the edge details of the prediction maps through the intra-modality and inter-modality attention (IIA) module and the edge refinement (ER) module, respectively. Extensive experiments on the VDT-2048 dataset show that our saliency model consistently outperforms 13 state-of-the-art methods by a large margin. Our code and results are available at https://github.com/Lx-Bao/QSFNet.
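As a rough illustration of quality-aware selective fusion, the following PyTorch sketch gates the depth and thermal features with per-pixel quality maps before merging them with the RGB stream; the module name, layer sizes, and the simple multiplicative gating are assumptions, and QSF-Net's actual subnets are considerably richer.

```python
import torch
import torch.nn as nn

class QualityGatedFusion(nn.Module):
    """Sketch: per-pixel quality maps (assumed in [0, 1]) gate the depth
    and thermal features before they merge with the RGB stream."""
    def __init__(self, channels=64):
        super().__init__()
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb, f_d, f_t, q_d, q_t):
        # Suppress low-quality regions of the auxiliary modalities.
        f_d = f_d * q_d          # q_d, q_t: (B, 1, H, W) quality maps
        f_t = f_t * q_t
        return self.merge(torch.cat([f_rgb, f_d, f_t], dim=1))

fuse = QualityGatedFusion(64)
B, C, H, W = 2, 64, 32, 32
out = fuse(torch.rand(B, C, H, W), torch.rand(B, C, H, W),
           torch.rand(B, C, H, W), torch.rand(B, 1, H, W), torch.rand(B, 1, H, W))
```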

3.
Entropy (Basel) ; 26(2)2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38392385

ABSTRACT

RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works rely on heavy models, which are not suitable for mobile devices. Additionally, there is still room for improvement in the design of cross-modal and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, a cross-modal information mutual reinforcement (CMIMR) module, and a semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ lightweight modules in both the encoder and the decoder. Furthermore, to fuse the complementary information between the two modalities, we design the CMIMR module, which refines the two-modal features by absorbing previous-level semantic information and inter-modal complementary information. In addition, to fuse cross-level features and detect multiscale salient objects, we design the SIGF module, which suppresses background noise in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves performance competitive with 15 other state-of-the-art methods.

4.
IEEE Trans Cybern ; 53(1): 539-552, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35417369

ABSTRACT

Optical remote sensing images (RSIs) have been widely used in many applications, and salient object detection (SOD) in optical RSIs is one of the problems attracting growing interest. However, due to the diverse object types, scales, and orientations and the cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades substantially. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds while neglecting edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, whose key component is the edge-aware position attention unit (EPAU). First, the encoder produces a good representation of salient objects, i.e., multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shaped architecture, which not only provides accurate salient edge clues but also preserves the integrity of edge information through additional intra-connections. That is, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides position attention over salient objects, where position cues are sharpened by the edge information and used to recurrently calibrate the misaligned decoding process. After that, the final saliency map is obtained by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming state-of-the-art SOD models.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 1135-1149, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35119998

ABSTRACT

Partial point cloud registration aims to transform partial scans into a common coordinate system and is an important preprocessing step for generating complete 3D shapes. Although registration methods have made great progress in recent decades, traditional methods such as Iterative Closest Point (ICP) and its variants depend heavily on sufficient overlap between the two point clouds, because they cannot distinguish outlier correspondences. In practice, the overlap between point clouds can be small, which limits the applicability of these methods. To tackle this problem, we present a StrucTure-based OveRlap Matching (STORM) method for partial point cloud registration. In our method, an overlap prediction module with differentiable sampling is designed to detect points in the overlap region using structural information, facilitating exact partial correspondence generation based on discriminative pointwise feature similarity. The pointwise features, which contain effective structural information, are extracted by graph-based methods. Experimental results and comparisons with state-of-the-art methods demonstrate that STORM achieves better performance. Moreover, while most registration methods perform worse as the overlap ratio decreases, STORM still achieves satisfactory performance when the overlap ratio is small.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7123-7141, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417745

ABSTRACT

Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than relying on pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models stems from 1) implicit language modeling, 2) unidirectional feature representation, and 3) noisy input to the language model. Correspondingly, we propose ABINet++, an autonomous, bidirectional, and iterative network for scene text spotting. First, the autonomous design enforces explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking gradient flow between them. Second, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Third, we propose an iterative correction scheme for the language model that effectively alleviates the impact of noisy input. Additionally, based on an ensemble of the iterative predictions, we develop a self-training method that can learn effectively from unlabeled images. Finally, to improve ABINet++ on long text recognition, we propose aggregating horizontal features by embedding Transformer units inside a U-Net, and we design a position and content attention module that integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, consistently demonstrating the superiority of our method in various environments, especially on low-quality images. Moreover, extensive experiments in both English and Chinese show that a text spotter incorporating our language modeling method significantly improves on commonly used attention-based recognizers in both accuracy and speed. Code is available at https://github.com/FangShancheng/ABINet-PP.
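The iterative correction idea can be sketched as a small loop in which a language model repeatedly refines the fused prediction while gradients into its input are blocked; the TinyLM stand-in, tensor shapes, and fusion by addition below are illustrative assumptions, not ABINet++'s actual architecture.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Illustrative stand-in for a language model over character probabilities."""
    def __init__(self, num_cls=37, dim=64):
        super().__init__()
        self.proj = nn.Linear(num_cls, dim)
        self.out = nn.Linear(dim, num_cls)

    def forward(self, probs):
        return self.out(torch.relu(self.proj(probs)))

vision_logits = torch.randn(2, 25, 37)   # (batch, char positions, classes)
lm = TinyLM()
logits = vision_logits
for _ in range(3):                        # iterative correction
    # Detaching the probabilities blocks gradient flow into the language
    # model's input, mirroring the decoupling described above.
    logits = lm(logits.softmax(dim=-1).detach()) + vision_logits
pred = logits.argmax(dim=-1)              # progressively denoised prediction
```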

7.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36440906

ABSTRACT

MOTIVATION: Light-field microscopy (LFM) is a compact solution for high-speed 3D fluorescence imaging. The captured raw data usually require 3D deconvolution. Although deep neural network methods can accelerate the reconstruction process, a trained model is not universally applicable across all system parameters. Here, we develop AutoDeconJ, a GPU-accelerated ImageJ plugin for 4.4× faster and more accurate deconvolution of LFM data. We further propose an image quality metric for the deconvolution process that helps automatically determine the optimal number of iterations, yielding higher reconstruction accuracy and fewer artifacts. RESULTS: Our proposed method outperforms state-of-the-art light-field deconvolution methods in reconstruction time and in predicting the optimal number of iterations. It generalizes across different light-field point spread function (PSF) parameters better than the deep learning method. The fast, accurate, and general reconstruction performance for different PSF parameters suggests its potential for mass 3D reconstruction of LFM data. AVAILABILITY AND IMPLEMENTATION: The code, documentation, and example data are available open source at: https://github.com/Onetism/AutoDeconJ.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
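As a sketch of automatically choosing the iteration count for an iterative deconvolution, the following snippet runs Richardson-Lucy updates and stops when the estimate stops changing; AutoDeconJ's actual deconvolution scheme and quality metric are not reproduced here, so the relative-change stopping rule below is a stand-in assumption.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy_auto(obs, psf, max_iters=100, tol=1e-4):
    """Richardson-Lucy deconvolution that stops when the relative change
    of the estimate falls below tol -- a stand-in for AutoDeconJ's
    image-quality-based iteration selection."""
    est = np.full_like(obs, obs.mean(), dtype=np.float64)
    psf_flip = psf[::-1, ::-1]
    for it in range(max_iters):
        ratio = obs / (fftconvolve(est, psf, mode="same") + 1e-12)
        new = est * fftconvolve(ratio, psf_flip, mode="same")
        change = np.linalg.norm(new - est) / (np.linalg.norm(est) + 1e-12)
        est = new
        if change < tol:
            break  # further iterations would mostly amplify artifacts
    return est, it + 1

# Toy usage on a synthetically blurred image
rng = np.random.default_rng(0)
h = np.hanning(9)
psf = np.outer(h, h); psf /= psf.sum()
blurred = fftconvolve(rng.random((64, 64)), psf, mode="same")
est, n_iter = richardson_lucy_auto(blurred, psf)
```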


Subject(s)
Image Processing, Computer-Assisted; Imaging, Three-Dimensional; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Microscopy/methods; Neural Networks, Computer
8.
Biology (Basel) ; 11(7)2022 Jun 30.
Article in English | MEDLINE | ID: mdl-36101379

ABSTRACT

Protein-protein interactions (PPIs) play an essential role in many cellular functions. However, identifying PPIs through traditional experimental methods remains tedious and time-consuming, so it is necessary to develop computational methods for predicting PPIs efficiently. This paper explores a novel computational method for detecting PPIs from protein sequences that combines Locality Preserving Projections (LPP) for feature extraction with a Rotation Forest (RF) classifier. Specifically, we first employ the Position-Specific Scoring Matrix (PSSM), which retains evolutionary information, to represent protein sequences efficiently. Then, the LPP descriptor is applied to extract feature vectors from the PSSM, and the feature vectors are fed into the RF to obtain the final results. The proposed method is applied to two datasets, Yeast and H. pylori, and obtains average accuracies of 92.81% and 92.56%, respectively. We also compare it with K-nearest neighbors (KNN) and support vector machine (SVM) classifiers to better evaluate its performance. In summary, all experimental results indicate that the proposed approach is stable and robust for predicting PPIs and promises to be a useful tool for proteomics research.
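A minimal scikit-learn sketch of the sequence-feature pipeline is shown below. Neither LPP nor Rotation Forest ships with scikit-learn, so PCA and a random forest stand in for them here, and the PSSM feature matrix is simulated; all names and sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Simulated data: each row stands in for a flattened, fixed-size PSSM
# descriptor of a protein pair (real features would come from PSI-BLAST).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 400))      # 500 protein pairs, 400 features
y = rng.integers(0, 2, size=500)     # 1 = interacting, 0 = non-interacting

# PCA substitutes for LPP, RandomForest for Rotation Forest.
model = make_pipeline(PCA(n_components=50),
                      RandomForestClassifier(n_estimators=200))
# Chance-level on random data; real PSSM features go here.
print(cross_val_score(model, X, y, cv=5).mean())
```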

9.
J Alzheimers Dis ; 86(4): 1805-1816, 2022.
Article in English | MEDLINE | ID: mdl-35253761

ABSTRACT

BACKGROUND: Mounting evidence shows that neuropathological burdens preferentially affect certain brain regions during the dynamic progression of Alzheimer's disease (AD). Since distinct brain regions are physically wired by white matter fibers, it is reasonable to hypothesize that the differential spreading pattern of neuropathological burdens is underpinned by the wiring topology, which can be characterized using neuroimaging and network science technologies. OBJECTIVE: To study the dynamic spreading patterns of neuropathological events in AD. METHODS: We first examine whether hub nodes with high connectivity in the brain network (the assembly of white matter wirings) are susceptible to a higher level of pathological burden than regions less involved in the network's information exchange. Moreover, we propose a novel linear mixed-effects model to characterize the multi-factorial spreading process of neuropathological burdens from hub nodes to non-hub nodes, with age, sex, and APOE4 status as confounders. We apply our statistical model to longitudinal amyloid-PET and tau-PET neuroimaging data. RESULTS: Our meta-data analysis shows that 1) AD differentially affects hub nodes with a significantly higher level of pathology, and 2) the longitudinal increase of neuropathological burdens on non-hub nodes is strongly correlated with the connectome distance to hub nodes rather than spatial proximity. CONCLUSION: The spreading pathway of AD neuropathological burdens may start from hub regions and propagate through white matter fibers in a prion-like manner.
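A hedged sketch of the confounder-adjusted spreading model, using statsmodels' linear mixed-effects API on simulated data: regional burden change is regressed on connectome distance to the hub set, with age, sex, and APOE4 as fixed effects and a subject-level random intercept. Column names, effect sizes, and data are illustrative assumptions, not the study's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "subject": rng.integers(0, 40, n),       # 40 simulated subjects
    "conn_dist": rng.gamma(2.0, 1.0, n),     # connectome distance to nearest hub
    "age": rng.normal(72, 6, n),
    "sex": rng.integers(0, 2, n),
    "apoe4": rng.integers(0, 2, n),
})
# Simulated outcome: burden increase decays with distance from hubs.
df["burden_change"] = (-0.3 * df["conn_dist"] + 0.02 * df["age"]
                       + rng.normal(0, 1, n))

fit = smf.mixedlm("burden_change ~ conn_dist + age + C(sex) + C(apoe4)",
                  df, groups=df["subject"]).fit()
print(fit.summary())
```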


Subject(s)
Alzheimer Disease; Connectome; Alzheimer Disease/pathology; Brain/pathology; Connectome/methods; Humans; Neuroimaging; Neuropathology
10.
IEEE Trans Image Process ; 31: 3565-3577, 2022.
Article in English | MEDLINE | ID: mdl-35312620

ABSTRACT

TV show captioning aims to generate a linguistic sentence based on a video and its associated subtitle. Compared to purely video-based captioning, the subtitle can provide the captioning model with useful semantic clues such as actors' sentiments and intentions. However, making effective use of the subtitle is also very challenging, because it consists of scrappy pieces of information and has a semantic gap with the visual modality. To organize the scrappy information and yield a powerful omni-representation across all the modalities, an effective captioning model must understand the video contents, the subtitle semantics, and the relations in between. In this paper, we propose an Intra- and Inter-relation Embedding Transformer (I2Transformer), consisting of an Intra-relation Embedding Block (IAE) and an Inter-relation Embedding Block (IEE) under the framework of a Transformer. First, the IAE captures the intra-relation within each modality by constructing learnable graphs. Then, the IEE learns cross attention gates and selects useful information from each modality based on their inter-relations, deriving the omni-representation that serves as the input to the Transformer. Experimental results on the public dataset show that I2Transformer achieves state-of-the-art performance. We also evaluate the effectiveness of the IAE and IEE on two other relevant tasks of video with text inputs, i.e., TV show retrieval and video-guided machine translation. The encouraging performance further validates that the IAE and IEE blocks generalize well. The code is available at https://github.com/tuyunbin/I2Transformer.


Subject(s)
Intention; Semantics
11.
IEEE Trans Cybern ; 52(12): 13862-13873, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35077378

ABSTRACT

Recent advances in 3-D sensors and 3-D modeling have made massive amounts of 3-D data available, but manually labeling large numbers of 3-D objects is too onerous and time-consuming for real applications. In this article, we address this issue by transferring knowledge from existing labeled data (e.g., annotated 2-D images or 3-D objects) to unlabeled 3-D objects. Specifically, we propose a domain-adversarial guided siamese network (DAGSN) for unsupervised cross-domain 3-D object retrieval (CD3DOR). It is composed of three key modules: 1) siamese network-based visual feature learning; 2) mutual information (MI)-based feature enhancement; and 3) conditional domain classifier-based feature adaptation. First, we design a siamese network to encode both 3-D objects and 2-D images from the two domains, owing to its balanced accuracy and efficiency; it also guarantees that the same transformation is applied to both domains, which is crucial for positive domain shift. The core issue for the retrieval task is to improve the capability of feature abstraction, yet previous CD3DOR approaches focus merely on eliminating the domain shift. We solve this problem by maximizing the MI between the input 3-D object or 2-D image data and the high-level feature in the second module. To eliminate the domain shift, we design a conditional domain classifier that exploits multiplicative interactions between the features and predicted labels to enforce joint alignment at both the feature level and the category level. Consequently, the network can generate domain-invariant yet discriminative features for both domains, which is essential for CD3DOR. Extensive experiments on two protocols, the cross-dataset 3-D object retrieval protocol (3-D to 3-D) on PSB/NTU and the cross-modal 3-D object retrieval protocol (2-D to 3-D) on MI3DOR-2, demonstrate that the proposed DAGSN significantly outperforms state-of-the-art CD3DOR methods.
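The conditional domain classifier's multiplicative interaction can be sketched as conditioning on the outer product of features and predicted class probabilities, in the spirit of CDAN-style conditional adversarial adaptation; the PyTorch module below is an illustrative stand-in with assumed dimensions, not DAGSN's exact classifier.

```python
import torch
import torch.nn as nn

class ConditionalDomainClassifier(nn.Module):
    """Domain classifier conditioned on the feature-prediction outer
    product (multiplicative interactions); sizes are illustrative."""
    def __init__(self, feat_dim=128, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim * num_classes, 256), nn.ReLU(),
            nn.Linear(256, 1))  # domain logit: source vs. target

    def forward(self, feat, pred):
        # Outer product couples each feature with each class probability.
        joint = torch.bmm(pred.softmax(dim=1).unsqueeze(2), feat.unsqueeze(1))
        return self.net(joint.flatten(1))

clf = ConditionalDomainClassifier()
logit = clf(torch.randn(8, 128), torch.randn(8, 10))  # (8, 1) domain logits
```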

12.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 7705-7717, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34559636

ABSTRACT

Image demoireing is a multi-faceted image restoration task involving both moire pattern removal and color restoration. In this paper, we present a general degradation model describing an image contaminated by moire patterns and propose a novel multi-scale bandpass convolutional neural network (MBCNN) for single-image demoireing. For moire pattern removal, we propose multi-block-size learnable bandpass filters (M-LBFs), based on a block-wise frequency-domain transform, to learn the frequency-domain priors of moire patterns. We also introduce a new loss function, the Dilated Advanced Sobel loss (D-ASL), to better sense frequency information. For color restoration, we propose a two-step tone mapping strategy that first applies a global tone mapping to correct a global color shift and then fine-tunes the color per pixel. To determine the most appropriate frequency-domain transform, we investigated several transforms, including the DCT, DFT, DWT, a learnable non-linear transform, and a learnable orthogonal transform, and finally adopted the DCT. Our basic model won the AIM2019 demoireing challenge. Experimental results on three public datasets show that our method outperforms state-of-the-art methods by a large margin.
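To illustrate the block-wise frequency-domain filtering that the learnable bandpass filters build on, here is a numpy/scipy sketch that applies a fixed mask to the 2D DCT coefficients of each 8x8 block; in MBCNN the mask would be learned, and the toy mask and image below are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_bandpass(img, mask, block=8):
    """Mask each block's 2D DCT coefficients -- a fixed-mask sketch of
    the learnable bandpass filtering idea."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            coef = dctn(img[i:i+block, j:j+block], norm="ortho")
            out[i:i+block, j:j+block] = idctn(coef * mask, norm="ortho")
    return out

# Toy example: suppress the lowest-frequency coefficients of each block.
mask = np.ones((8, 8))
mask[:2, :2] = 0.0
img = np.random.default_rng(0).random((64, 64))
filtered = blockwise_bandpass(img, mask)
```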

13.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8249-8260, 2022 11.
Article in English | MEDLINE | ID: mdl-34010126

ABSTRACT

The human brain is a complex yet economically organized system in which a small number of critical hub regions support the majority of brain functions. The identification of hub nodes common to a population of networks is often simplified into a voting procedure over the hub nodes identified in individual brain networks, which ignores the intrinsic data geometry and partly accounts for the lack of reproducible findings in neuroscience. Hence, we propose a first-ever group-wise hub identification method to identify hub nodes that are common across a population of individual brain networks. The backbone of our method is learning a common graph embedding that represents the majority of local topological profiles. By requiring orthogonality among the graph embedding vectors, each graph embedding, as a data element, resides on the Grassmannian manifold. We present a novel Grassmannian manifold optimization scheme for finding the common graph embeddings, which not only identifies the most reliable hub nodes in each network but also yields a population-based common hub node map. Results on accuracy and replicability, using both synthetic and real network data, show that the proposed manifold learning approach outperforms all hub identification methods employed in this evaluation.


Subject(s)
Algorithms; Learning; Brain; Humans
14.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6484-6493, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34161244

ABSTRACT

One-shot semantic segmentation poses the challenging task of segmenting object regions from unseen categories with only one annotated example as guidance. Thus, effectively constructing robust feature representations from the guidance image is crucial to the success of one-shot semantic segmentation. To this end, we propose in this article a simple yet effective approach named rich embedding features (REFs). Given a reference image and its annotated mask, REF constructs rich embedding features of the support object from three perspectives: 1) a global embedding to capture general characteristics; 2) a peak embedding to capture the most discriminative information; and 3) an adaptive embedding to capture internal long-range dependencies. By combining these informative features, we can harvest sufficient and rich guidance even from a single reference image. In addition to REF, we further propose a simple depth-priority context module to obtain useful contextual cues from the query image. This raises the performance of one-shot semantic segmentation to a new level. We conduct experiments on the pattern analysis, statistical modeling and computational learning (Pascal) visual object classes (VOC) 2012 dataset and the common objects in context (COCO) dataset to demonstrate the effectiveness of our approach.
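The global embedding is the simplest of the three perspectives and amounts to masked average pooling of the support features over the annotated object; a PyTorch sketch follows (the peak and adaptive embeddings need machinery not shown here, and the tensor shapes are illustrative).

```python
import torch

def masked_average_pooling(feat, mask):
    """Global embedding of the support object: average the feature map
    over the annotated mask.
    feat: (B, C, H, W) support features; mask: (B, 1, H, W) binary mask."""
    masked = feat * mask
    return masked.sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)

emb = masked_average_pooling(torch.randn(2, 256, 64, 64),
                             (torch.rand(2, 1, 64, 64) > 0.5).float())
print(emb.shape)  # torch.Size([2, 256])
```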

15.
IEEE Trans Vis Comput Graph ; 28(12): 4671-4684, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34310310

ABSTRACT

Real-time dense SLAM techniques aim to reconstruct the dense three-dimensional geometry of a scene in real time with an RGB or RGB-D sensor. Indoor scenes are an important type of working environment for these techniques, and a planar prior can be exploited in this scenario to improve reconstruction quality, especially for the large low-texture regions common indoors. This article fully explores the planar prior in a dense SLAM pipeline. First, we propose a novel plane detection and segmentation method that runs at 200 Hz on a modern graphics processing unit. Our algorithm for constructing global plane constraints is very efficient; hence, we apply it to each input frame for camera pose estimation while maintaining real-time performance. Second, we propose a plane-based map representation that greatly reduces the memory footprint of plane regions while keeping the geometric details on planes. The experiments reveal that our system yields superior reconstruction results with planar information while running at more than 30 fps. Aside from speed and storage improvements, our technique also handles the low-texture problem in plane regions.
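For intuition about the plane-detection step, below is a plain-numpy RANSAC plane fit; it is a CPU toy with assumed thresholds, not the paper's 200 Hz GPU method.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.01, rng=None):
    """Fit a dominant plane to 3D points with RANSAC and return the
    inlier mask -- a toy sketch of plane detection."""
    if rng is None:
        rng = np.random.default_rng()
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        if np.linalg.norm(n) < 1e-12:
            continue  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p[0]) @ n)   # point-to-plane distances
        inliers = dist < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

pts = np.random.default_rng(0).random((1000, 3))
pts[:700, 2] = 0.5                 # 700 points lie on the plane z = 0.5
print(ransac_plane(pts).sum())     # roughly 700 inliers expected
```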

16.
Patterns (N Y) ; 2(12): 100390, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34950907

ABSTRACT

The continuous emergence of drug-target interaction data provides an opportunity to construct a biological network for systematically discovering unknown interactions. However, this is challenging due to the complex and heterogeneous correlations between drugs and targets. Here, we describe a heterogeneous hypergraph-based framework for drug-target interaction (HHDTI) prediction that models biological networks through a hypergraph, where each vertex represents a drug or a target and each hyperedge indicates existing similar interactions or associations between the connected vertices. The hypergraph is then trained to generate suitably structured embeddings for discovering unknown interactions. Comprehensive experiments performed on four public datasets demonstrate that HHDTI achieves significant and consistent improvements over state-of-the-art methods. Our analysis indicates that this superior performance stems from the ability to integrate heterogeneous high-order information through hypergraph learning. These results suggest that HHDTI is a scalable and practical tool for uncovering novel drug-target interactions.
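The hypergraph encoding can be made concrete with a small incidence matrix: rows are drug and target vertices, and each column is a hyperedge grouping vertices that share an interaction or association. The toy entities and hyperedges below are illustrative assumptions.

```python
import numpy as np

# Vertices are drugs and targets; hyperedges connect sets of vertices
# that share an interaction or association (toy data for illustration).
drugs, targets = ["d0", "d1", "d2"], ["t0", "t1"]
vertices = drugs + targets
hyperedges = [{"d0", "d1", "t0"},   # e.g., two drugs sharing a target
              {"d2", "t0", "t1"}]   # e.g., a drug associated with two targets

# Incidence matrix H: H[v, e] = 1 if vertex v belongs to hyperedge e.
H = np.zeros((len(vertices), len(hyperedges)))
for j, edge in enumerate(hyperedges):
    for v in edge:
        H[vertices.index(v), j] = 1.0
print(H)
```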

17.
IEEE Trans Image Process ; 30: 9179-9192, 2021.
Article in English | MEDLINE | ID: mdl-34739374

ABSTRACT

RGB-D saliency detection has received increasing attention in recent years, and many efforts have been devoted to this area, most of which try to integrate the multi-modal information, i.e., RGB images and depth maps, via various fusion strategies. However, some of these approaches ignore the inherent difference between the two modalities, which leads to performance degradation in some challenging scenes. Therefore, in this paper, we propose a novel RGB-D saliency model, the Dynamic Selective Network (DSNet), which performs salient object detection (SOD) in RGB-D images by taking full advantage of the complementarity between the two modalities. Specifically, we first deploy a cross-modal global context module (CGCM) to acquire high-level semantic information, which is used to roughly locate salient objects. Then, we design a dynamic selective module (DSM) to dynamically mine the cross-modal complementary information between RGB images and depth maps, and to further optimize the multi-level and multi-scale information by executing gated and pooling-based selection, respectively. Moreover, we conduct boundary refinement to obtain high-quality saliency maps with clear boundary details. Extensive experiments on eight public RGB-D datasets show that the proposed DSNet achieves competitive and often superior performance against 17 current state-of-the-art RGB-D SOD models.
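As a rough sketch of the gated selection inside the DSM, the PyTorch module below learns a per-pixel gate that decides how much depth evidence to trust relative to RGB; the module name, sizes, and the single-gate design are illustrative assumptions, and the full DSM also performs pooling-based multi-scale selection.

```python
import torch
import torch.nn as nn

class GatedSelection(nn.Module):
    """Learned per-pixel gate that blends RGB and depth features."""
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        g = self.gate(torch.cat([f_rgb, f_depth], dim=1))  # (B, 1, H, W)
        return g * f_rgb + (1 - g) * f_depth               # convex blend

sel = GatedSelection(64)
fused = sel(torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32))
```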


Subject(s)
Algorithms; Semantics
18.
Med Image Anal ; 73: 102162, 2021 10.
Article in English | MEDLINE | ID: mdl-34274691

ABSTRACT

Recent developments in neuroimaging allow us to investigate the structural and functional connectivity between brain regions in vivo. Mounting evidence suggests that hub nodes play a central role in brain communication and neural integration. Such high centrality, however, makes hub nodes particularly susceptible to pathological network alterations, and the identification of hub nodes from brain networks has therefore attracted much attention in neuroimaging. Current popular hub identification methods often work in a univariate manner, i.e., selecting hub nodes one after another based on either heuristics of the connectivity profile at each node or predefined settings of network modules. Since the topological information of the entire network (such as network modules) is not fully utilized, current methods have limited power to identify hubs that link multiple modules (connector hubs) and are biased toward hubs having many connections within the same module (provincial hubs). To address this challenge, we propose a novel multivariate hub identification method that identifies connector hubs as those nodes that partition the network into disconnected components when they are removed. Furthermore, we extend our method to find population-based hub nodes from a group of network data. We have compared our hub identification method with existing methods on both simulated and human brain network data. Our proposed method achieves more accurate and replicable discovery of hub nodes and exhibits enhanced statistical power in identifying network alterations related to neurological disorders such as Alzheimer's disease and obsessive-compulsive disorder.
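The connector-hub criterion, nodes whose removal disconnects the network, corresponds to articulation points in graph theory, which networkx computes directly; the toy two-module graph below is an illustration, not the paper's full multivariate procedure.

```python
import networkx as nx

# Two triangle modules bridged by node 6: removing 2, 3, or 6 splits
# the graph into disconnected components.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0),    # module A
                  (3, 4), (4, 5), (5, 3),    # module B
                  (2, 6), (6, 3)])           # node 6 bridges the modules
print(sorted(nx.articulation_points(G)))     # [2, 3, 6]
```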


Subject(s)
Alzheimer Disease; Brain; Alzheimer Disease/diagnostic imaging; Brain/diagnostic imaging; Humans; Magnetic Resonance Imaging; Neural Pathways
19.
IEEE Trans Image Process ; 30: 5327-5338, 2021.
Article in English | MEDLINE | ID: mdl-34043509

ABSTRACT

Effective 3D shape retrieval and recognition are challenging but important tasks in computer vision that have attracted much attention in recent decades. Although recent progress has shown significant improvements from deep learning methods on 3D shape retrieval and recognition performance, how to jointly learn an optimal representation of 3D shapes that accounts for their relationships remains under-investigated. To tackle this issue, we propose a multi-scale representation learning method on hypergraphs for 3D shape retrieval and recognition, called the multi-scale hypergraph neural network (MHGNN). In this method, the correlation among 3D shapes is formulated in a hypergraph, and a hypergraph convolution process is conducted to learn the representations. Multiple representations obtained from different convolution layers yield multi-scale representations of 3D shapes, and a fusion module then combines them for 3D shape retrieval and recognition. The main advantages of our method are that 1) the high-order correlation among 3D shapes can be investigated within the framework, and 2) the joint multi-scale representation is more robust for comparison. Comparisons with state-of-the-art methods on the public ModelNet40 dataset demonstrate remarkable performance improvements of our proposed method on the 3D shape retrieval task. Meanwhile, experiments on recognition tasks also show better results, indicating the superiority of our method in learning better representations for retrieval and recognition.
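A single hypergraph convolution step can be written in the standard HGNN form X' = σ(Dv^{-1/2} H W De^{-1} Hᵀ Dv^{-1/2} X Θ); the numpy sketch below implements that one step (MHGNN stacks several such layers and fuses their outputs, which is not shown), with toy shapes as assumptions.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One hypergraph convolution step (standard HGNN formulation)."""
    w = edge_w if edge_w is not None else np.ones(H.shape[1])
    dv = H @ w                                   # vertex degrees
    de = H.sum(axis=0)                           # hyperedge degrees
    A = (np.diag(dv ** -0.5) @ H @ np.diag(w) @ np.diag(1.0 / de)
         @ H.T @ np.diag(dv ** -0.5))
    return np.maximum(A @ X @ Theta, 0.0)        # ReLU activation

# Toy incidence matrix: 6 shapes grouped by 3 hyperedges.
H = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [1, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                      # shape features
out = hypergraph_conv(X, H, rng.normal(size=(8, 4)))  # (6, 4) embeddings
```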

20.
IEEE Trans Pattern Anal Mach Intell ; 43(4): 1445-1451, 2021 Apr.
Article in English | MEDLINE | ID: mdl-32091992

ABSTRACT

Hashing is an efficient method for nearest neighbor search in large-scale data spaces: it embeds high-dimensional feature descriptors into a similarity-preserving Hamming space of low dimension. However, large-scale high-speed retrieval with binary codes sacrifices some retrieval accuracy compared with traditional retrieval methods. Noting that multi-view methods can preserve the diverse characteristics of data well, we introduce the multi-view deep neural network into the hash learning field and design an efficient and innovative retrieval model that achieves a significant improvement in retrieval performance. In this paper, we propose a supervised multi-view hash model that enhances multi-view information through neural networks, a new hash learning method combining multi-view and deep learning approaches. The proposed method utilizes an effective view-stability evaluation to actively explore the relationships among views, which affects the optimization direction of the entire network. We have also designed several multi-data fusion methods in the Hamming space to preserve the advantages of both convolution and multiple views. To avoid spending excessive computing resources on the enhancement procedure during retrieval, we set up a separate structure, called the memory network, which is trained jointly with the rest of the model. The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE, and MS-COCO datasets, and the results show that it significantly outperforms state-of-the-art single-view and multi-view hashing methods.
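On the retrieval side, the Hamming-space search that such hashing models rely on can be sketched in a few lines of numpy: binarize network outputs with their sign, pack the bits, and rank database items by the popcount of the XOR. The random outputs below stand in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
db_out = rng.normal(size=(10000, 64))   # stand-in for network outputs (64-bit codes)
q_out = rng.normal(size=(1, 64))        # one query

# Binarize by sign and pack 64 bits into 8 bytes per item.
db_bits = np.packbits(db_out > 0, axis=1)
q_bits = np.packbits(q_out > 0, axis=1)

# Hamming distance = popcount of the XOR between packed codes.
dist = np.unpackbits(db_bits ^ q_bits, axis=1).sum(axis=1)
top10 = np.argsort(dist)[:10]           # nearest neighbors in Hamming space
```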
