Results 1 - 20 of 43
1.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36440906

ABSTRACT

MOTIVATION: Light-field microscopy (LFM) is a compact solution for high-speed 3D fluorescence imaging. The captured raw data usually require 3D deconvolution. Although deep neural network methods can accelerate the reconstruction process, a trained model is not universally applicable across system parameters. Here, we develop AutoDeconJ, a GPU-accelerated ImageJ plugin for 4.4× faster and more accurate deconvolution of LFM data. We further propose an image quality metric for the deconvolution process that helps automatically determine the optimal number of iterations, yielding higher reconstruction accuracy and fewer artifacts. RESULTS: Our proposed method outperforms state-of-the-art light-field deconvolution methods in reconstruction time and in predicting the optimal number of iterations. It also generalizes better across light-field point spread function (PSF) parameters than the deep learning method. This fast, accurate and general reconstruction performance for different PSF parameters suggests its potential for mass 3D reconstruction of LFM data. AVAILABILITY AND IMPLEMENTATION: The code, documentation and example data are openly available at: https://github.com/Onetism/AutoDeconJ.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Image Processing, Computer-Assisted; Imaging, Three-Dimensional; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Microscopy/methods; Neural Networks, Computer
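
As a rough illustration of the iteration-selection idea in this entry, the sketch below runs a plain Richardson-Lucy deconvolution (a standard algorithm, not AutoDeconJ's GPU implementation) and tracks a stand-in, no-reference quality score per iteration. AutoDeconJ's actual metric is not given in the abstract, so gradient energy is used purely as a placeholder for the mechanism of returning the best-scoring iterate.

```python
import numpy as np
from scipy.signal import fftconvolve

def deconvolve_auto_iters(image, psf, max_iters=30):
    """Richardson-Lucy deconvolution that records a quality score per
    iteration and returns the best-scoring estimate. The gradient-energy
    score is a placeholder for the paper's image quality metric."""
    psf = psf / psf.sum()
    psf_flip = psf[::-1, ::-1]
    estimate = np.full_like(image, image.mean(), dtype=np.float64)
    best, best_score = estimate.copy(), -np.inf
    for _ in range(max_iters):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = image / np.clip(blurred, 1e-12, None)
        estimate = estimate * fftconvolve(ratio, psf_flip, mode="same")
        gy, gx = np.gradient(estimate)
        score = np.mean(gx ** 2 + gy ** 2)  # sharpness proxy
        if score > best_score:
            best, best_score = estimate.copy(), score
    return best
```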
2.
Entropy (Basel) ; 26(2)2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38392385

ABSTRACT

RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works rely on heavy models, which are not suitable for mobile devices. In addition, there is still room for improvement in the design of cross-modal and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, a cross-modal information mutual reinforcement (CMIMR) module, and a semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ lightweight modules in both the encoder and the decoder. To fuse the complementary information between the two modalities, we design the CMIMR module, which refines the two-modal features by absorbing semantic information from the previous level and inter-modal complementary information. Finally, to fuse cross-level features and detect multiscale salient objects, we design the SIGF module, which suppresses noisy background information in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves competitive performance compared to 15 other state-of-the-art methods.
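
A minimal PyTorch sketch of cross-modal mutual reinforcement in the spirit described above: each modality's features are modulated by a sigmoid attention map computed from the other modality. The module name and layer choices are hypothetical, not the paper's CMIMR definition.

```python
import torch
import torch.nn as nn

class MutualReinforcement(nn.Module):
    """Hypothetical mutual-reinforcement block (not the paper's exact
    CMIMR): each modality is modulated by a sigmoid attention map
    derived from the other, so complementary cues reinforce both."""
    def __init__(self, channels):
        super().__init__()
        self.attn_from_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                           nn.Sigmoid())
        self.attn_from_t = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                         nn.Sigmoid())

    def forward(self, f_rgb, f_t):
        f_rgb_out = f_rgb + f_rgb * self.attn_from_t(f_t)  # thermal -> RGB
        f_t_out = f_t + f_t * self.attn_from_rgb(f_rgb)    # RGB -> thermal
        return f_rgb_out, f_t_out
```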

3.
Opt Lett ; 45(19): 5405-5408, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-33001905

ABSTRACT

Fourier ptychographic microscopy (FPM) is a computational approach geared towards creating high-resolution, large field-of-view images without mechanical scanning. Acquiring color images of histology slides often requires sequential acquisitions under red, green, and blue illumination. The color reconstructions often suffer from coherent artifacts that are not present in regular incoherent microscopy images. As a result, it remains a challenge to employ FPM for digital pathology applications, where resolution and color accuracy are of critical importance. Here we report a deep learning approach for performing unsupervised image-to-image translation of FPM reconstructions. A cycle-consistent adversarial network with a multiscale structure similarity loss is trained to perform virtual brightfield and fluorescence staining of the recovered FPM images. In the training stage, we feed the network two sets of unpaired images: (1) monochromatic FPM recoveries and (2) color or fluorescence images captured using a regular microscope. In the inference stage, the network takes the FPM input and outputs a virtually stained image with reduced coherent artifacts and improved image quality. We test the approach on various samples with different staining protocols; high-quality color and fluorescence reconstructions validate its effectiveness.
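
For context, the cycle-consistency terms at the heart of a cycle-consistent adversarial network look like the PyTorch sketch below. The paper additionally uses adversarial losses and a multiscale structure similarity term, both omitted here, and the weight lam = 10 is the conventional CycleGAN default rather than a value taken from this entry.

```python
import torch
import torch.nn.functional as F

def cycle_losses(G_AB, G_BA, real_a, real_b, lam=10.0):
    """L1 cycle-consistency terms of a CycleGAN-style objective (sketch).
    G_AB and G_BA are the two generators; adversarial and SSIM terms
    used in the paper are not shown."""
    fake_b = G_AB(real_a)
    fake_a = G_BA(real_b)
    loss_cyc_a = F.l1_loss(G_BA(fake_b), real_a)  # A -> B -> A
    loss_cyc_b = F.l1_loss(G_AB(fake_a), real_b)  # B -> A -> B
    return lam * (loss_cyc_a + loss_cyc_b)
```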

4.
Brief Bioinform ; 18(4): 558-576, 2017 07 01.
Article in English | MEDLINE | ID: mdl-27345524

ABSTRACT

Long non-coding RNAs (lncRNAs) have attracted considerable attention from researchers worldwide in recent decades. With rapid advances in both experimental technology and computational prediction algorithms, thousands of lncRNAs have been identified in eukaryotic organisms ranging from nematodes to humans. Mounting evidence indicates that lncRNAs are involved in almost the whole life cycle of cells through different mechanisms and play important roles in many critical biological processes. It is therefore not surprising that mutations and dysregulation of lncRNAs contribute to the development of various complex human diseases. In this review, we first briefly introduce the functions of lncRNAs, five important lncRNA-related diseases, five critical disease-related lncRNAs and some important publicly available lncRNA-related databases covering sequence, expression, function, etc. To date, only a limited number of lncRNAs have been experimentally linked to human diseases. Analyzing available lncRNA-disease associations and predicting potential associations have therefore become important bioinformatics tasks, benefiting the understanding of disease mechanisms at the lncRNA level, disease biomarker detection, and disease diagnosis, treatment, prognosis and prevention. Furthermore, we introduce some state-of-the-art computational models that can be used to identify disease-related lncRNAs on a large scale and to select the most promising candidates for experimental validation. We also analyze the limitations of these models and discuss future directions for developing computational models for lncRNA research.


Subjects
RNA, Long Noncoding/genetics; Algorithms; Computational Biology; Computer Simulation; Humans
5.
Opt Express ; 27(16): 23173-23185, 2019 Aug 05.
Article in English | MEDLINE | ID: mdl-31510600

ABSTRACT

Two-dimensional phase unwrapping algorithms are widely used in optical metrology and measurements. However, the high noise in interference measurements often causes conventional phase unwrapping algorithms to fail. In this paper, we propose a deep convolutional neural network (DCNN) based method to perform rapid and robust two-dimensional phase unwrapping. In our approach, we employ a DCNN architecture, DeepLabV3+, with noise suppression and strong feature representation capabilities. The DCNN first performs semantic segmentation of the wrapped phase map; we then combine the wrapped phase map with the segmentation result to generate the unwrapped phase. We benchmarked our results against well-established methods: the reported approach outperformed the conventional path-dependent and path-independent algorithms. We also tested its robustness using interference measurements from optical metrology setups, where it again clearly outperformed the conventional phase unwrapping algorithms. The reported approach may find applications in optical metrology and microscopy imaging.
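
The final combination step described above reduces to adding 2π times an integer wrap-count map, which the segmentation network predicts per pixel. Below is a minimal sketch with an oracle wrap count standing in for the DeepLabV3+ output.

```python
import numpy as np

def unwrap_from_segmentation(wrapped, wrap_count):
    """Combine a wrapped phase map with a predicted integer wrap-count
    map (e.g., from semantic segmentation) to recover absolute phase:
    phi = phi_wrapped + 2*pi*k. The segmentation network is omitted;
    wrap_count stands in for its per-pixel class prediction."""
    return wrapped + 2.0 * np.pi * wrap_count.astype(np.float64)

# Tiny self-check on a synthetic phase ramp.
true_phase = np.linspace(0, 6 * np.pi, 256).reshape(1, -1).repeat(256, 0)
wrapped = np.angle(np.exp(1j * true_phase))          # wrap into (-pi, pi]
k = np.round((true_phase - wrapped) / (2 * np.pi))   # oracle wrap counts
assert np.allclose(unwrap_from_segmentation(wrapped, k), true_phase)
```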

6.
Brief Bioinform ; 17(4): 696-712, 2016 07.
Article in English | MEDLINE | ID: mdl-26283676

ABSTRACT

Identification of drug-target interactions is an important step in drug discovery. Although high-throughput screening and other biological assays are becoming available, experimental methods for drug-target interaction identification remain extremely costly, time-consuming and challenging. Therefore, various computational models have been developed to predict potential drug-target associations on a large scale. In this review, databases and web servers involved in drug-target identification and drug discovery are summarized. In addition, we introduce some state-of-the-art computational models for drug-target interaction prediction, including network-based and machine learning-based methods. For the machine learning-based methods, particular attention is paid to supervised and semi-supervised models, which differ essentially in their use of negative samples. Although many effective computational models have yielded significant improvements in drug-target interaction prediction, both network-based and machine learning-based methods have their own disadvantages. Furthermore, we discuss future directions for network-based drug discovery and network approaches to personalized drug discovery based on personalized medicine, genome sequencing, tumor clone-based networks and cancer hallmark-based networks. Finally, we discuss new evaluation and validation frameworks and a more realistic regression formulation of the drug-target interaction prediction problem based on quantitative bioactivity data.


Subjects
Databases, Factual; Drug Delivery Systems; Drug Discovery; Humans
7.
J Imaging Inform Med ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38758420

ABSTRACT

Domain generalization (DG) for medical image segmentation, where privacy preservation often restricts training to a single source domain, requires good robustness on unseen target domains. To achieve this goal, previous methods mainly use data augmentation to expand the distribution of samples and learn invariant content from them. However, most of these methods perform only global augmentation, which limits the diversity of augmented samples. In addition, the styles of the augmented images are more scattered than those of the source domain, which may cause the model to overfit the source-domain style. To address these issues, we propose an invariant content representation network (ICRN) that enhances the learning of invariant content and suppresses the learning of style variability. Specifically, we first design a gamma-correction-based local style augmentation (LSA) that expands the sample distribution by augmenting foreground and background styles separately. Then, based on the augmented samples, we introduce invariant content learning (ICL) to learn generalizable invariant content from both augmented and source-domain samples. Finally, we design style adversarial learning (SAL) based on domain-specific batch normalization (DSBN) to suppress the learning of preferences for source-domain styles. Experimental results show that our proposed method improves the overall Dice coefficient by 8.74% and 11.33% and reduces the overall average surface distance (ASD) by 15.88 mm and 3.87 mm on two publicly available cross-domain datasets, Fundus and Prostate, compared to state-of-the-art DG methods. The code is available at https://github.com/ZMC-IIIM/ICRN-DG .
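
The gamma-correction-based local augmentation described above can be sketched in a few lines: foreground and background (given a mask) receive independently sampled gamma curves, altering local style while leaving content intact. The gamma range below is a hypothetical choice, not a value from the paper.

```python
import numpy as np

def local_gamma_augment(image, fg_mask, rng=None, gamma_range=(0.5, 2.0)):
    """Gamma-correction-based local style augmentation (sketch of the
    LSA idea, with an illustrative gamma range): foreground and
    background get independently sampled gamma curves, so local style
    varies while anatomical content is preserved.
    image: float array in [0, 1]; fg_mask: boolean array, same shape."""
    rng = np.random.default_rng() if rng is None else rng
    g_fg, g_bg = rng.uniform(*gamma_range, size=2)
    img = np.clip(image, 0.0, 1.0)
    return np.where(fg_mask, img ** g_fg, img ** g_bg)
```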

8.
IEEE Trans Image Process ; 33: 3212-3226, 2024.
Article in English | MEDLINE | ID: mdl-38687650

ABSTRACT

Depth images and thermal images contain spatial geometry information and surface temperature information, respectively, which can serve as complementary information for the RGB modality. However, the quality of depth and thermal images is often unreliable in challenging scenarios, degrading the performance of two-modal salient object detection (SOD). Meanwhile, some researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, attempting to exploit the complementarity of RGB, depth, and thermal images. However, existing triple-modal SOD methods fail to perceive the quality of depth maps and thermal images, which leads to performance degradation in scenes with low-quality depth and thermal inputs. Therefore, in this paper, we propose a quality-aware selective fusion network (QSF-Net) for VDT salient object detection, which contains three subnets: an initial feature extraction subnet, a quality-aware region selection subnet, and a region-guided selective fusion subnet. First, besides extracting features, the initial feature extraction subnet generates a preliminary prediction map from each modality via a shrinkage pyramid architecture equipped with a multi-scale fusion (MSF) module. Then, we design a weakly supervised quality-aware region selection subnet to generate quality-aware maps: we first identify high-quality and low-quality regions from the preliminary predictions, which form pseudo labels for training this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, then fuses the triple-modal features and refines the edge details of the prediction maps through an intra-modality and inter-modality attention (IIA) module and an edge refinement (ER) module, respectively. Extensive experiments on the VDT-2048 dataset show that our saliency model consistently outperforms 13 state-of-the-art methods by a large margin. Our code and results are available at https://github.com/Lx-Bao/QSFNet.
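
As a simplified stand-in for the region-guided selective fusion step, the sketch below weights each modality's features by its normalized quality-aware map, so unreliable regions of a modality contribute less. QSF-Net's actual subnets are considerably richer than this.

```python
import torch

def quality_weighted_fusion(feats, quality_maps):
    """Fuse per-modality features weighted by quality-aware maps
    (a simplified stand-in for region-guided selective fusion).
    feats: list of (B, C, H, W) tensors; quality_maps: list of
    (B, 1, H, W) tensors, normalized per pixel so weights sum to 1."""
    q = torch.stack(quality_maps)                       # (M, B, 1, H, W)
    w = q / q.sum(dim=0, keepdim=True).clamp_min(1e-6)  # per-pixel weights
    f = torch.stack(feats)                              # (M, B, C, H, W)
    return (w * f).sum(dim=0)                           # (B, C, H, W)
```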

9.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7123-7141, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417745

ABSTRACT

Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than relying on pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose autonomous, bidirectional and iterative ABINet++ for scene text spotting. First, the autonomous design enforces explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking gradient flow between them. Second, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Third, we propose an iterative correction scheme for the language model that effectively alleviates the impact of noisy input. Additionally, based on an ensemble of the iterative predictions, we develop a self-training method that can learn effectively from unlabeled images. Finally, to polish ABINet++ for long text recognition, we aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module that integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, consistently demonstrating the superiority of our method in various environments, especially on low-quality images. Moreover, extensive experiments in English and Chinese prove that a text spotter incorporating our language modeling method can significantly improve both accuracy and speed compared with commonly used attention-based recognizers. Code is available at https://github.com/FangShancheng/ABINet-PP.

10.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 1135-1149, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35119998

ABSTRACT

Partial point cloud registration aims to transform partial scans into a common coordinate system and is an important preprocessing step for generating complete 3D shapes. Although registration methods have made great progress in recent decades, traditional methods such as Iterative Closest Point (ICP) and its variants depend heavily on sufficient overlap between the two point clouds, because they cannot distinguish outlier correspondences. In practice the overlap between point clouds can be small, which limits the applicability of these methods. To tackle this problem, we present a StrucTure-based OveRlap Matching (STORM) method for partial point cloud registration. In our method, an overlap prediction module with differentiable sampling is designed to detect points in the overlap region using structural information, facilitating exact partial correspondence generation based on discriminative pointwise feature similarity. The pointwise features, which contain effective structural information, are extracted by graph-based methods. Experimental results and comparison with state-of-the-art methods demonstrate that STORM achieves better performance. Moreover, most registration methods perform worse as the overlap ratio decreases, while STORM still achieves satisfactory performance when the overlap ratio is small.
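
Correspondence generation from pointwise feature similarity is commonly done with mutual nearest neighbors, as in the baseline sketch below. This is a generic baseline, not STORM's differentiable overlap-sampling module.

```python
import numpy as np

def mutual_nn_correspondences(feat_src, feat_tgt):
    """Putative correspondences from pointwise feature similarity via
    mutual nearest neighbors. feat_src: (N, D), feat_tgt: (M, D),
    assumed L2-normalized so the dot product is cosine similarity."""
    sim = feat_src @ feat_tgt.T                  # (N, M) similarity
    nn_t = sim.argmax(axis=1)                    # best target per source
    nn_s = sim.argmax(axis=0)                    # best source per target
    src_idx = np.arange(len(feat_src))
    mutual = nn_s[nn_t[src_idx]] == src_idx      # keep mutual matches only
    return np.stack([src_idx[mutual], nn_t[mutual]], axis=1)
```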

11.
IEEE Trans Cybern ; 53(1): 539-552, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35417369

ABSTRACT

Optical remote sensing images (RSIs) are widely used in many applications, and salient object detection (SOD) in optical RSIs is an interesting problem. However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades substantially. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds while neglecting edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, whose key component is the edge-aware position attention unit (EPAU). First, the encoder produces a good representation of salient objects, that is, multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shaped architecture, which not only provides accurate salient edge clues but also ensures the integrity of edge information by additionally deploying intraconnections; that is, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module produces position attention for salient objects, where position cues are sharpened by the edge information and used to recurrently calibrate the misaligned decoding process. The final saliency map is obtained by fusing all position attention cues. Extensive experiments on two public optical RSI datasets show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming state-of-the-art SOD models.

12.
IEEE Trans Vis Comput Graph ; 28(12): 4671-4684, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34310310

ABSTRACT

Real-time dense SLAM techniques aim to reconstruct the dense three-dimensional geometry of a scene in real time with an RGB or RGB-D sensor. Indoor scenes are an important working environment for these techniques, and the planar prior can be exploited there to improve reconstruction quality, especially for the large low-texture regions that commonly occur indoors. This article fully explores the planar prior in a dense SLAM pipeline. First, we propose a novel plane detection and segmentation method that runs at 200 Hz on a modern graphics processing unit. Our algorithm for constructing global plane constraints is very efficient; hence, we apply it to each input frame during camera pose estimation while maintaining real-time performance. Second, we propose a plane-based map representation that greatly reduces the memory footprint of plane regions while keeping the geometric details on planes. The experiments reveal that our system yields superior reconstruction results with planar information while running at more than 30 fps. Aside from speed and storage improvements, our technique also handles the low-texture problem in plane regions.
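
The basic operation behind plane detection, fitting a plane to a set of 3D points, can be written compactly with an SVD, as in the sketch below; the 200 Hz detector described above adds segmentation and GPU parallelism on top of many such fits.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit to an (N, 3) point set via SVD: the
    normal is the right singular vector of the centered points with
    the smallest singular value (direction of least variance)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = -normal @ centroid        # plane equation: normal . x + d = 0
    return normal, d
```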

13.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6484-6493, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34161244

ABSTRACT

One-shot semantic segmentation poses the challenging task of segmenting object regions from unseen categories with only one annotated example as guidance. Thus, how to effectively construct robust feature representations from the guidance image is crucial to the success of one-shot semantic segmentation. To this end, we propose in this article a simple yet effective approach named rich embedding features (REFs). Given a reference image accompanied by its annotated mask, our REF constructs rich embedding features of the support object from three perspectives: 1) a global embedding to capture the general characteristics; 2) a peak embedding to capture the most discriminative information; and 3) an adaptive embedding to capture the internal long-range dependencies. By combining these informative features, we can easily harvest sufficient and rich guidance even from a single reference image. In addition to REF, we further propose a simple depth-priority context module to obtain useful contextual cues from the query image. This raises the performance of one-shot semantic segmentation to a new level. We conduct experiments on the pattern analysis, statistical modeling and computational learning (Pascal) visual object classes (VOC) 2012 and common objects in context (COCO) datasets to demonstrate the effectiveness of our approach.
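
Two of the three embeddings, global and peak, correspond to masked average and masked max pooling over the support features; a PyTorch sketch follows (the adaptive embedding, which models long-range dependencies, is omitted here).

```python
import torch

def global_and_peak_embeddings(features, mask):
    """Masked pooling sketch of the global and peak embeddings.
    features: (B, C, H, W); mask: binary {0, 1} float tensor of shape
    (B, 1, H, W) marking the support object."""
    area = mask.sum(dim=(2, 3)).clamp_min(1.0)              # (B, 1)
    global_emb = (features * mask).sum(dim=(2, 3)) / area   # masked average
    masked = features.masked_fill(mask == 0, float("-inf"))
    peak_emb = masked.amax(dim=(2, 3))                      # masked max
    return global_emb, peak_emb
```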

14.
Biology (Basel) ; 11(7)2022 Jun 30.
Article in English | MEDLINE | ID: mdl-36101379

ABSTRACT

Protein-protein interactions (PPIs) play an essential role in many cellular functions. However, identifying PPIs through traditional experimental methods is still tedious and time-consuming. It is therefore imperative to develop computational methods for predicting PPIs efficiently. This paper explores a novel computational method for detecting PPIs from protein sequences, which mainly combines a feature extraction method, Locality Preserving Projections (LPP), with a classifier, Rotation Forest (RF). Specifically, we first employ the Position Specific Scoring Matrix (PSSM), which retains the evolutionary information of the protein, to represent each protein sequence efficiently. Then, the LPP descriptor is applied to extract feature vectors from the PSSM, and these feature vectors are fed into the RF to obtain the final results. The proposed method is applied to two datasets, Yeast and H. pylori, and obtains average accuracies of 92.81% and 92.56%, respectively. We also compare it with K-nearest neighbors (KNN) and support vector machine (SVM) classifiers to better evaluate its performance. In summary, all experimental results indicate that the proposed approach is stable and robust for predicting PPIs and is promising as a useful tool for proteomics research.
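
A compact, self-contained version of the LPP step, assuming rows of X are PSSM-derived protein descriptors: build a k-nearest-neighbor heat-kernel affinity, then solve the generalized eigenproblem X^T L X a = lambda X^T D X a and keep the eigenvectors with the smallest eigenvalues. Rotation Forest is not in scikit-learn, so the projected features would be handed to a separate RF implementation; k and t below are illustrative defaults.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, n_components=2, k=5, t=1.0):
    """Minimal Locality Preserving Projections (sketch)."""
    d2 = cdist(X, X, "sqeuclidean")
    W = np.exp(-d2 / t)                       # heat-kernel affinities
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]  # k nearest neighbors
    M = np.zeros_like(W)
    rows = np.repeat(np.arange(len(X)), k)
    M[rows, idx.ravel()] = W[rows, idx.ravel()]
    W = np.maximum(M, M.T)                    # symmetrized kNN graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                 # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])  # regularize for stability
    vals, vecs = eigh(A, B)                   # ascending eigenvalues
    return X @ vecs[:, :n_components]         # embedded data
```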

15.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8249-8260, 2022 11.
Article in English | MEDLINE | ID: mdl-34010126

ABSTRACT

The human brain is a complex yet economically organized system in which a small number of critical hub regions support the majority of brain functions. The identification of common hub nodes in a population of networks is often simplified to a voting procedure over the hub nodes identified in individual brain networks, which ignores the intrinsic data geometry and partly explains the lack of reproducible findings in neuroscience. Hence, we propose a first-ever group-wise hub identification method to identify hub nodes that are common across a population of individual brain networks. The backbone of our method is learning a common graph embedding that represents the majority of local topological profiles. By requiring orthogonality among the graph embedding vectors, each graph embedding, as a data element, resides on the Grassmannian manifold. We present a novel Grassmannian manifold optimization scheme that finds the common graph embeddings, which not only identify the most reliable hub nodes in each network but also yield a population-based common hub node map. Results on accuracy and replicability for both synthetic and real network data show that the proposed manifold learning approach outperforms all hub identification methods included in this evaluation.


Subjects
Algorithms; Learning; Brain; Humans
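
As a much simpler point of comparison for the group-wise scheme above, one can score hubs at the group level with eigenvector centrality of the population-mean adjacency matrix. This is a crude stand-in for illustration only, not the paper's Grassmannian common-embedding optimization.

```python
import numpy as np

def group_hub_scores(adjacencies):
    """Eigenvector centrality of the population-mean adjacency matrix.
    adjacencies: (S, N, N) array, one symmetric non-negative matrix per
    subject. Higher scores indicate more hub-like nodes."""
    A = np.asarray(adjacencies).mean(axis=0)
    vals, vecs = np.linalg.eigh(A)       # ascending eigenvalues
    return np.abs(vecs[:, -1])           # leading eigenvector
```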
16.
J Alzheimers Dis ; 86(4): 1805-1816, 2022.
Article in English | MEDLINE | ID: mdl-35253761

ABSTRACT

BACKGROUND: Mounting evidence shows that neuropathological burdens preferentially affect certain brain regions during the dynamic progression of Alzheimer's disease (AD). Since distinct brain regions are physically wired by white matter fibers, it is reasonable to hypothesize that the differential spreading of neuropathological burdens is shaped by the underlying wiring topology, which can be characterized using neuroimaging and network science technologies. OBJECTIVE: To study the dynamic spreading patterns of neuropathological events in AD. METHODS: We first examine whether hub nodes with high connectivity in the brain network (the assembly of white matter wirings) are susceptible to higher levels of pathological burden than regions less involved in the exchange of information across the network. Moreover, we propose a novel linear mixed-effects model to characterize the multi-factorial spreading process of neuropathological burdens from hub nodes to non-hub nodes, with age, sex, and APOE4 status as confounders. We apply our statistical model to longitudinal amyloid-PET and tau-PET neuroimaging data. RESULTS: Our meta-analysis shows that 1) AD differentially affects hub nodes with a significantly higher level of pathology, and 2) the longitudinal increase of neuropathological burden on non-hub nodes is strongly correlated with their connectome distance to hub nodes rather than with spatial proximity. CONCLUSION: The spreading of AD neuropathological burden may start from hub regions and propagate along white matter fibers in a prion-like manner.


Subjects
Alzheimer Disease; Connectome; Alzheimer Disease/pathology; Brain/pathology; Connectome/methods; Humans; Neuroimaging; Neuropathology
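
The mixed-effects analysis described in METHODS could be set up along the lines of the statsmodels sketch below. The long-format CSV and all column names (tau_burden, hub_distance, subject, ...) are hypothetical stand-ins, and the paper's exact model specification may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per (subject, region, visit),
# with a tau-PET burden, the region's connectome distance to the
# nearest hub, and the confounders named in the abstract.
df = pd.read_csv("tau_longitudinal.csv")  # placeholder file name

# Random intercept per subject; fixed effects for hub distance plus the
# age, sex, and APOE4 confounders, echoing the model described above.
model = smf.mixedlm("tau_burden ~ hub_distance + age + sex + apoe4",
                    data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```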
17.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 7705-7717, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34559636

ABSTRACT

Image demoireing is a multi-faceted image restoration task involving both moire pattern removal and color restoration. In this paper, we present a general degradation model that describes an image contaminated by moire patterns, and propose a novel multi-scale bandpass convolutional neural network (MBCNN) for single-image demoireing. For moire pattern removal, we propose multi-block-size learnable bandpass filters (M-LBFs), based on a block-wise frequency-domain transform, to learn the frequency-domain priors of moire patterns. We also introduce a new loss function, the dilated advanced Sobel loss (D-ASL), to better capture frequency information. For color restoration, we propose a two-step tone-mapping strategy that first applies a global tone mapping to correct the global color shift and then fine-tunes the color of each pixel locally. To determine the most appropriate frequency-domain transform, we investigated several transforms, including the DCT, DFT, DWT, a learnable non-linear transform and a learnable orthogonal transform, and finally adopted the DCT. Our basic model won the AIM2019 demoireing challenge. Experimental results on three public datasets show that our method outperforms state-of-the-art methods by a large margin.
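
A sketch of the two-step tone-mapping idea: a global affine map matching per-channel statistics to reference values corrects the global color shift, and a per-pixel residual (taken as given here; in the paper it is learned) performs the local fine-tuning. All parameter names are illustrative.

```python
import numpy as np

def two_step_tone_mapping(img, ref_mean, ref_std, local_residual):
    """Two-step color restoration sketch (hypothetical parameterization).
    img: (H, W, 3) in [0, 1]; ref_mean, ref_std: per-channel reference
    statistics, shape (3,); local_residual: (H, W, 3) per-pixel
    correction, e.g. predicted by a small network (omitted)."""
    mean = img.reshape(-1, 3).mean(axis=0)
    std = img.reshape(-1, 3).std(axis=0) + 1e-6
    global_mapped = (img - mean) / std * ref_std + ref_mean   # step 1
    return np.clip(global_mapped + local_residual, 0.0, 1.0)  # step 2
```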

18.
IEEE Trans Image Process ; 31: 3565-3577, 2022.
Article in English | MEDLINE | ID: mdl-35312620

ABSTRACT

TV show captioning aims to generate a linguistic sentence based on a video and its associated subtitle. Compared to purely video-based captioning, the subtitle provides the captioning model with useful semantic clues such as actors' sentiments and intentions. However, using the subtitle effectively is also very challenging, because it consists of scrappy pieces of information and has a semantic gap with the visual modality. To organize this scrappy information and yield a powerful omni-representation across all the modalities, an effective captioning model must understand the video contents, the subtitle semantics, and the relations between them. In this paper, we propose an Intra- and Inter-relation Embedding Transformer (I2Transformer), consisting of an Intra-relation Embedding Block (IAE) and an Inter-relation Embedding Block (IEE) under the framework of a Transformer. First, the IAE captures the intra-relations within each modality by constructing learnable graphs. Then, the IEE learns cross attention gates and selects useful information from each modality based on their inter-relations, deriving the omni-representation that serves as the input to the Transformer. Experimental results on the public dataset show that I2Transformer achieves state-of-the-art performance. We also evaluate the effectiveness of the IAE and IEE on two other relevant tasks of video with text inputs, i.e., TV show retrieval and video-guided machine translation. The encouraging performance further validates that the IAE and IEE blocks have good generalization ability. The code is available at https://github.com/tuyunbin/I2Transformer.


Subjects
Intention; Semantics
19.
IEEE Trans Cybern ; 52(12): 13862-13873, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35077378

ABSTRACT

Recent advances in 3-D sensors and 3-D modeling have made massive amounts of 3-D data available. It is too onerous and time-consuming to manually label large numbers of 3-D objects in real applications. In this article, we address this issue by transferring knowledge from existing labeled data (e.g., annotated 2-D images or 3-D objects) to unlabeled 3-D objects. Specifically, we propose a domain-adversarial guided siamese network (DAGSN) for unsupervised cross-domain 3-D object retrieval (CD3DOR). It is composed of three key modules: 1) siamese network-based visual feature learning; 2) mutual information (MI)-based feature enhancement; and 3) conditional domain classifier-based feature adaptation. First, we design a siamese network to encode both 3-D objects and 2-D images from the two domains because of its balanced accuracy and efficiency; moreover, it guarantees that the same transformation is applied to both domains, which is crucial for a positive domain shift. The core issue for the retrieval task is to improve the capability of feature abstraction, whereas previous CD3DOR approaches merely focus on eliminating the domain shift. We address this in the second module by maximizing the MI between the input 3-D object or 2-D image data and the high-level features. To eliminate the domain shift, we design a conditional domain classifier that exploits multiplicative interactions between features and predicted labels to enforce joint alignment at both the feature level and the category level. Consequently, the network generates domain-invariant yet discriminative features for both domains, which is essential for CD3DOR. Extensive experiments on two protocols, the cross-dataset 3-D object retrieval protocol (3-D to 3-D) on PSB/NTU and the cross-modal 3-D object retrieval protocol (2-D to 3-D) on MI3DOR-2, demonstrate that the proposed DAGSN significantly outperforms state-of-the-art CD3DOR methods.
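
Domain-adversarial feature adaptation of the kind described here is typically built on a gradient reversal layer, sketched below in PyTorch (a generic building block, not DAGSN's exact conditional classifier): the forward pass is the identity, while the backward pass negates gradients so the feature extractor learns to fool the domain classifier.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity forward, negated (and scaled)
    gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    # Insert between the feature extractor and the domain classifier.
    return GradReverse.apply(x, lam)
```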

20.
IEEE Trans Pattern Anal Mach Intell ; 43(4): 1445-1451, 2021 Apr.
Article in English | MEDLINE | ID: mdl-32091992

ABSTRACT

Hashing is an efficient method for nearest neighbor search in large-scale data spaces: high-dimensional feature descriptors are embedded into a similarity-preserving, low-dimensional Hamming space. However, large-scale high-speed retrieval through binary codes loses some retrieval accuracy compared to traditional retrieval methods. Noting that multi-view methods can preserve the diverse characteristics of data well, we introduce multi-view deep neural networks into the hash learning field and design an efficient and innovative retrieval model that achieves a significant improvement in retrieval performance. In this paper, we propose a supervised multi-view hash model that enhances multi-view information through neural networks, a completely new hash learning method combining multi-view and deep learning approaches. The proposed method uses an effective view-stability evaluation to actively explore the relationships among views, which affects the optimization direction of the entire network. We also design a variety of multi-data fusion methods in the Hamming space to preserve the advantages of both convolution and multiple views. To avoid excessive computation on the enhancement procedure during retrieval, we set up a separate structure, called the memory network, which is trained jointly. The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets, and the results show that it significantly outperforms state-of-the-art single-view and multi-view hashing methods.
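
The retrieval primitive that hashing accelerates is nearest-neighbor search under Hamming distance over packed binary codes, as in this small sketch; the code length and database size below are arbitrary.

```python
import numpy as np

def hamming_retrieve(query_code, db_codes, topk=10):
    """Nearest-neighbor retrieval in Hamming space. Codes are packed
    uint8 arrays, e.g. from np.packbits over binarized hash outputs."""
    xor = np.bitwise_xor(db_codes, query_code)       # differing bits
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # Hamming distance
    return np.argsort(dists)[:topk]                  # closest first

# Usage: 64-bit codes for a database of 10000 items and one query.
rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(10000, 8), dtype=np.uint8)
query = rng.integers(0, 256, size=(8,), dtype=np.uint8)
print(hamming_retrieve(query, db))
```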
