1.
Article in English | MEDLINE | ID: mdl-38767994

ABSTRACT

Discovering novel associations between biomedical entities is of great significance: it can facilitate not only the identification of network biomarkers of disease but also the search for putative drug targets. Graph representation learning (GRL) has great potential to efficiently predict interactions in biomedical networks by learning a robust representation for each node. However, current GRL-based methods learn node representations by aggregating the features of neighbors with equal weights, and they cannot identify which features of higher-order neighbors are integrated into the representation of the central node. In this work, we propose a novel graph representation learning framework, a multi-order graph neural network based on reconstructed specific subgraphs (MGRS), for biomedical interaction prediction. In MGRS, a multi-order graph aggregation module (MOGA) learns a wide-view representation by integrating multi-hop neighbor features. In addition, a subgraph selection module (SGSM) reconstructs a specific subgraph with adaptive edge weights for each node; SGSM explicitly models the dependency of the node representation on neighbor features and learns a subgraph-based representation from the reconstructed weighted subgraphs. Extensive experimental results on four public biomedical networks demonstrate that MGRS performs better and is more robust than recent baselines.
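As a rough illustration of the multi-hop aggregation idea, here is a minimal PyTorch sketch; the abstract does not give MOGA's exact formulation, so the symmetric normalization and per-hop collection below are assumptions:

```python
import torch

def multi_order_aggregate(adj: torch.Tensor, x: torch.Tensor, num_hops: int = 3):
    """Collect per-hop neighbor aggregates for each node.

    adj: (N, N) dense adjacency matrix; x: (N, F) node features.
    Returns one representation per hop, which a downstream module
    (e.g., something like the paper's SGSM) could weight and fuse.
    """
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    a_hat = adj + torch.eye(adj.size(0))
    deg = a_hat.sum(dim=1)
    a_norm = torch.diag(deg.pow(-0.5)) @ a_hat @ torch.diag(deg.pow(-0.5))

    reps, h = [], x
    for _ in range(num_hops):
        h = a_norm @ h  # propagate one more hop
        reps.append(h)
    return reps
```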

2.
IEEE Trans Image Process; 33: 2530-2543, 2024.
Article in English | MEDLINE | ID: mdl-38530730

ABSTRACT

Existing human parsing frameworks commonly employ joint learning of semantic edge detection and human parsing to improve localization around boundary regions. Nevertheless, parsing predictions within the interior of a part contour may still be inconsistent due to the inherent ambiguity of fine-grained semantics. In contrast, binary edge detection does not suffer from such fine-grained semantic ambiguity, leading to a typical failure case in which misclassification occurs inside the part contour even though the semantic edge is accurately detected. To address these challenges, we develop a novel diffusion scheme that incorporates guidance from the detected semantic edges, propagating correctly classified semantics into the misclassified regions. Building upon this diffusion scheme, we present an Edge Guided Diffusion Network (EGDNet) for human parsing, which progressively refines the parsing predictions to enhance their accuracy and coherence. Moreover, we design a horizontal-vertical aggregation module that exploits inherent correlations among body parts along both the horizontal and vertical axes to enhance the initial parsing results. Extensive experimental evaluations demonstrate the effectiveness of the proposed EGDNet: it shows strong performance on six benchmark datasets, including four human body parsing datasets (LIP, CIHP, ATR, and PASCAL-Person-Part) and two human face parsing datasets (CelebAMask-HQ and LaPa).


Subjects
Benchmarking; Learning; Humans; Semantics
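A toy sketch of the edge-guided diffusion idea from the abstract above, assuming a simple 4-neighborhood propagation gated by the detected edge probability; EGDNet's actual diffusion scheme is learned and more elaborate than this:

```python
import torch
import torch.nn.functional as F

def edge_guided_diffusion(logits, edge_prob, steps=10):
    """Diffuse parsing logits within parts, damped at detected edges.

    logits: (1, C, H, W) class scores; edge_prob: (1, 1, H, W) in [0, 1].
    Each pixel moves toward its 4-neighborhood average, with the update
    scaled by (1 - edge_prob) so semantics stop flowing across edges.
    """
    c = logits.size(1)
    kernel = torch.tensor([[0.00, 0.25, 0.00],
                           [0.25, 0.00, 0.25],
                           [0.00, 0.25, 0.00]]).view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    for _ in range(steps):
        neigh = F.conv2d(logits, kernel, padding=1, groups=c)
        logits = logits + (1.0 - edge_prob) * (neigh - logits)
    return logits
```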
3.
IEEE Trans Image Process; 33: 1149-1161, 2024.
Article in English | MEDLINE | ID: mdl-38300775

ABSTRACT

The composed query image retrieval task aims to retrieve a target image from a database using a query that composes two modalities: a reference image and a sentence declaring that some details of the reference image should be modified or replaced by new elements. Tackling this task requires learning a multimodal embedding space in which semantically similar queries and targets are close, while dissimilar ones are as far apart as possible. Most existing methods approach the problem from the perspective of model structure, designing interactive modules to promote better fusion and embedding of the two modalities. However, their learning objectives use conventional query-level examples as negatives while neglecting the composed query's multimodal characteristics, leading to inadequate utilization of the training data and a suboptimal metric space. To this end, in this paper we propose to improve the learning objective by constructing and mining hard negative examples from the perspective of multimodal fusion. Specifically, we compose the reference image with logically unpaired sentences rather than paired ones, creating component-level negative examples that make better use of the data and enhance the optimization of the metric space. In addition, we propose a new sentence augmentation method that generates even less distinguishable multimodal negative examples at the element level, helping the model learn a better metric space. Extensive comparison experiments on four real-world datasets confirm the effectiveness of the proposed method.
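A minimal sketch of the component-level negative construction, assuming a generic fusion module and a triplet-style objective; the paper's actual loss and mining strategy may differ:

```python
import torch
import torch.nn.functional as F

def component_negative_loss(img_emb, txt_emb, tgt_emb, fuse, margin=0.2):
    """Triplet-style loss with component-level negatives.

    img_emb, txt_emb, tgt_emb: (B, D) embeddings of reference images,
    paired modification sentences, and target images. `fuse` is any
    differentiable module mapping two (B, D) tensors to a (B, D) query.
    Negative queries compose each reference image with a logically
    unpaired sentence (here simply a batch-rolled one).
    """
    pos_q = fuse(img_emb, txt_emb)                  # paired composition
    neg_q = fuse(img_emb, txt_emb.roll(1, dims=0))  # unpaired composition
    pos_sim = F.cosine_similarity(pos_q, tgt_emb)
    neg_sim = F.cosine_similarity(neg_q, tgt_emb)
    # Paired compositions should sit closer to targets than unpaired ones.
    return F.relu(margin - pos_sim + neg_sim).mean()
```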

4.
IEEE Trans Pattern Anal Mach Intell; 45(7): 8594-8605, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015575

ABSTRACT

This article explores how to harvest precise object segmentation masks while minimizing the human interaction cost. To achieve this, we propose a simple yet effective interaction scheme, named Inside-Outside Guidance (IOG). Concretely, we leverage an inside point clicked near the object center and two outside points at symmetrical corner locations (top-left and bottom-right, or top-right and bottom-left) of an almost-tight bounding box that encloses the target object. The interaction results in a total of one foreground click and four background clicks for segmentation. The advantages of IOG are four-fold: 1) the two outside points help remove distractions from other objects or the background; 2) the inside point helps eliminate unrelated regions inside the bounding box; 3) the inside and outside points are easily identified, reducing the confusion that the state-of-the-art DEXTR (Maninis et al., 2018) raises when labeling some extreme samples; 4) it naturally supports additional click annotations for further correction. Despite its simplicity, IOG not only achieves state-of-the-art performance on several popular benchmarks such as GrabCut (Rother et al., 2004), PASCAL (Everingham et al., 2010), and MS COCO (Russakovsky et al., 2015), but also demonstrates strong generalization across domains such as street scenes (Cityscapes, Cordts et al., 2016), aerial imagery (Rooftop, Sun et al., 2014; Agriculture-Vision, Chiu et al., 2020), and medical images (ssTEM, Gerhard et al., 2013). Code is available at https://github.com/shiyinzhang/Inside-Outside-Guidance.
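One common way to feed such clicks to a segmentation network is to render them as Gaussian heatmap channels alongside the RGB image; the sketch below illustrates this encoding (whether IOG uses exactly this encoding and sigma is an assumption):

```python
import numpy as np

def click_heatmap(points, height, width, sigma=10.0):
    """Render a set of (x, y) clicks as a single Gaussian heatmap channel."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float32)
    for px, py in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g.astype(np.float32))
    return heat

# One inside click near the object center, two outside corner clicks of an
# almost-tight box; the RGB image plus these two channels would form a
# 5-channel network input.
inside = click_heatmap([(128, 120)], 256, 256)
outside = click_heatmap([(20, 15), (240, 230)], 256, 256)
```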

5.
IEEE Trans Image Process; 31: 5976-5988, 2022.
Article in English | MEDLINE | ID: mdl-36094980

ABSTRACT

Composed image retrieval aims to retrieve the desired images given a reference image and a text piece. Handling this task requires modeling two important subprocesses reasonably: one is to erase details of the reference image that are unrelated to the text piece, and the other is to replenish the desired details specified by the text piece. Existing methods neglect to distinguish between the two subprocesses and implicitly entangle them when solving the composed image retrieval task. To model the two subprocesses explicitly and in order, we propose a novel composed image retrieval method containing three key components: a Multi-semantic Dynamic Suppression module (MDS), a Text-semantic Complementary Selection module (TCS), and Semantic Space Alignment constraints (SSA). Concretely, MDS erases unrelated details of the reference image by suppressing its semantic features. TCS selects and enhances the semantic features of the text piece and then replenishes them into the reference image representation. Finally, to facilitate the erasure and replenishment subprocesses, SSA aligns the semantics of the two modalities' features in the final space. Extensive experiments on three benchmark datasets (Shoes, FashionIQ, and Fashion200K) show the superior performance of our approach against state-of-the-art methods.
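A schematic of the erase-then-replenish composition, with generic sigmoid gates standing in for the paper's MDS/TCS designs; the module internals here are assumptions:

```python
import torch
import torch.nn as nn

class EraseReplenish(nn.Module):
    """Toy erase-and-replenish composition in the spirit of MDS/TCS."""

    def __init__(self, dim):
        super().__init__()
        self.erase_gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.select_gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img, txt):  # img, txt: (B, D) modality embeddings
        joint = torch.cat([img, txt], dim=-1)
        erased = img * (1.0 - self.erase_gate(joint))  # suppress unrelated details
        return erased + txt * self.select_gate(joint)  # replenish text semantics
```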

6.
IEEE Trans Image Process; 30: 7499-7510, 2021.
Article in English | MEDLINE | ID: mdl-34460375

ABSTRACT

Garment transfer aims to transfer a desired garment from a model image to a target person, and has attracted a great deal of attention due to its wide range of potential applications. However, because the model and target person are often captured at different views, body shapes, and poses, realistic garment transfer faces the following challenges, which have not been well addressed: 1) deforming the garment; 2) inferring unobserved appearance; 3) preserving fine texture details. To tackle these challenges, we propose a novel SPatial-Aware Texture Transformer (SPATT) model. Different from existing models, SPATT establishes correspondence and infers unobserved clothing appearance by leveraging the spatial prior information of UV space. Specifically, the source image is transformed into a partial UV texture map guided by the extracted dense pose. To better infer unseen appearance from the seen regions, we propose a novel coordinate-prior map that defines the spatial relationship between coordinates in the UV texture map, and design an algorithm to compute it. Based on the coordinate-prior map, we present a novel spatial-aware texture generation network that completes the partial UV texture. In the second stage, we transform the completed UV texture to fit the target person. To polish details and improve realism, we introduce a refinement generative network conditioned on the warped image and the source input. Experiments show that, compared with existing frameworks, the proposed model generates more realistic images with better-preserved texture details. Furthermore, SPATT also handles difficult cases where the two persons exhibit large pose and view differences.
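The first step, warping the source image into a partial UV texture via dense pose, can be sketched generically as follows; the coordinate-prior map and completion network are not reproduced here, and the names and shapes are assumptions:

```python
import numpy as np

def image_to_uv_texture(image, u, v, mask, tex_size=256):
    """Scatter visible body pixels into a partial UV texture map.

    image: (H, W, 3); u, v: (H, W) dense-pose UV coordinates in [0, 1];
    mask: (H, W) bool marking pixels with a valid body correspondence.
    """
    tex = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    ys, xs = np.nonzero(mask)
    ui = np.clip((u[ys, xs] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    vi = np.clip((v[ys, xs] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    tex[vi, ui] = image[ys, xs]  # unseen texels stay empty for completion
    return tex
```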

7.
BMC Med Inform Decis Mak; 20(1): 264, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33059709

ABSTRACT

BACKGROUND: Syndrome differentiation aims to divide patients into several types according to their clinical symptoms and signs, and is essential to traditional Chinese medicine (TCM). Several previous works employed classical algorithms to classify syndromes and achieved promising results. However, ambiguous symptoms substantially disturb the performance of syndrome differentiation; this disturbance stems from the diversity and complexity of patients' symptoms. METHODS: To alleviate this issue, we proposed an algorithm based on a multilayer perceptron model with an attention mechanism (ATT-MLP). In particular, we first introduced an attention mechanism to assign different weights to different symptoms among the symptomatic features. In this manner, symptoms of major significance are highlighted while ambiguous symptoms are restrained. The weighted features are then fed into an MLP to predict the syndrome type of AIDS. RESULTS: Experimental results on a real-world AIDS dataset show that our framework achieves significant and consistent improvements over other methods. Moreover, our model can capture the key symptoms corresponding to each type of syndrome. CONCLUSION: Our proposed method learns the intrinsic correlations between symptoms and types of syndromes. It is able to learn the core cluster of symptoms for each type of syndrome from limited data, assisting medical doctors in diagnosing patients efficiently.


Subjects
Acquired Immunodeficiency Syndrome/diagnosis; Diagnosis, Computer-Assisted/methods; Medicine, Chinese Traditional/methods; Neural Networks, Computer; Algorithms; Attention; Humans
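A minimal sketch of the ATT-MLP idea described in the abstract above: learned attention weights rescale the symptom features before an MLP classifier. The layer sizes and exact attention form are assumptions:

```python
import torch
import torch.nn as nn

class AttMLP(nn.Module):
    """Attention-weighted MLP over symptom features."""

    def __init__(self, num_symptoms, num_syndromes, hidden=64):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(num_symptoms, num_symptoms),
                                  nn.Softmax(dim=-1))
        self.mlp = nn.Sequential(nn.Linear(num_symptoms, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_syndromes))

    def forward(self, x):        # x: (B, num_symptoms) symptom indicators
        weights = self.attn(x)   # highlight key symptoms, damp ambiguous ones
        return self.mlp(x * weights)
```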
8.
IEEE Trans Neural Netw Learn Syst; 31(11): 4600-4609, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31945000

ABSTRACT

The convolutional neural network (CNN) is a primary technique that has greatly promoted the development of computer vision. However, there is little research on how to allocate parameters across convolution layers when designing CNNs. We focus on revealing the relationship between the CNN parameter distribution, i.e., the allocation of parameters across convolution layers, and the discriminative performance of the network. Unlike previous works, we do not add more elements to the network, such as more convolution layers or denser short connections. Instead, we enhance the discriminative performance of a CNN by varying its parameter distribution under a strict size constraint. We propose an energy function to represent the CNN parameter distribution, which establishes the connection between the allocation of parameters and the discriminative performance of the network. Extensive experiments with shallow CNNs on three public image classification datasets demonstrate that a parameter distribution with a higher energy value leads to better performance. Based on this observation, the problem of finding the optimal parameter distribution can be transformed into the optimization problem of maximizing the energy value. We present a simple yet effective guideline that uses a balanced parameter distribution to design CNNs. Extensive experiments on ImageNet with three popular backbones, i.e., AlexNet, ResNet34, and ResNet101, demonstrate that the proposed guideline yields consistent improvements over different baselines under a strict size constraint.
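To make the notion of a parameter distribution concrete, the sketch below computes per-layer parameter fractions and uses Shannon entropy as a stand-in "energy" that peaks for balanced allocations. The paper defines its own energy function, which the abstract does not give, so this entropy form is only an illustrative proxy:

```python
import math
import torch.nn as nn

def param_distribution(model: nn.Module):
    """Fraction of parameters held by each convolution layer."""
    counts = [sum(p.numel() for p in m.parameters())
              for m in model.modules() if isinstance(m, nn.Conv2d)]
    total = sum(counts)
    return [c / total for c in counts]

def balance_energy(dist):
    """Shannon entropy as a stand-in energy: maximal when parameters
    are allocated evenly across layers (illustrative only; not the
    paper's definition)."""
    return -sum(p * math.log(p) for p in dist if p > 0)
```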

9.
Article in English | MEDLINE | ID: mdl-31059441

ABSTRACT

In content-based image retrieval (CBIR), one of the most challenging and ambiguous tasks is to correctly understand the human query intention and measure its semantic relevance to images in the database. Given the impressive capability of visual saliency in predicting human visual attention, which is closely related to the query intention, this paper attempts to explicitly uncover the effect of visual saliency in CBIR via qualitative and quantitative experiments. Toward this end, we first generate fixation density maps for images from a widely used CBIR dataset using an eye-tracking apparatus. These ground-truth saliency maps are then used to measure the influence of visual saliency on the task of CBIR by exploring several probable ways of incorporating such saliency cues into the retrieval process. We find that visual saliency is indeed beneficial to the CBIR task, and that the best scheme for incorporating saliency may differ across image retrieval models. Inspired by these findings, this paper presents two-stream attentive CNNs with embedded saliency for CBIR. The proposed network has two streams that simultaneously handle two tasks. The main stream focuses on extracting discriminative visual features that are tightly related to semantic attributes. Meanwhile, the auxiliary stream facilitates the main stream by redirecting the feature extraction to the salient image content that humans may pay attention to. By fusing these two streams into the Main and Auxiliary CNNs (MAC), image similarity can be computed as a human would, by preserving conspicuous content and suppressing irrelevant regions. Extensive experiments show that the proposed model achieves impressive image retrieval performance on four public datasets.
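A schematic of the two-stream fusion, assuming the auxiliary stream yields a saliency map that reweights the main stream's spatial features before pooling; the backbones and fusion details are assumptions:

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Reweight main-stream features by an auxiliary saliency prediction."""

    def __init__(self, channels):
        super().__init__()
        self.saliency_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, main_feat, aux_feat):  # both: (B, C, H, W)
        sal = torch.sigmoid(self.saliency_head(aux_feat))  # (B, 1, H, W)
        weighted = main_feat * sal        # keep conspicuous content
        return weighted.mean(dim=(2, 3))  # global image descriptor (B, C)
```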

10.
IEEE Trans Image Process; 28(9): 4219-4232, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30932837

ABSTRACT

This paper presents an intelligent system named Magic-wall, which automatically visualizes the effect of room decoration. Concretely, given an image of an indoor scene and a preferred color, Magic-wall automatically locates the wall regions in the image and smoothly replaces their existing color with the required one. The key idea of the proposed Magic-wall is to leverage visual semantics to guide the entire process of color substitution, including wall segmentation and replacement. To strengthen the realism of the visualization, we make the following contributions. First, we propose an edge-aware fully convolutional neural network (Edge-aware-FCN) for indoor semantic scene parsing, in which a novel edge-prior branch is introduced to better identify the boundaries between semantic regions. To further polish the details between the wall and other semantic regions, we use the output of the Edge-aware-FCN as prior knowledge, concatenating it with the image to form a new input for an Enhanced-Net. This enables the Enhanced-Net to capture more semantic-aware information from the input and refine ambiguous regions. Finally, to naturally replace the color of the original walls, we propose a simple yet effective color space conversion method that performs the replacement with brightness preserved. We build a new indoor scene dataset upon ADE20K for training and testing, which includes six semantic labels. Extensive experimental evaluations and visualizations demonstrate that the proposed Magic-wall is effective and can automatically generate visually pleasing results.
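One plausible reading of the brightness-preserving color conversion is to work in CIELAB, keeping the original L channel (which carries shading) and swapping in the target color's chroma; the sketch below assumes scikit-image is available, and whether the paper uses LAB specifically is an assumption:

```python
import numpy as np
from skimage import color

def recolor_wall(image_rgb, wall_mask, target_rgb):
    """Recolor masked wall pixels while preserving brightness.

    image_rgb: (H, W, 3) floats in [0, 1]; wall_mask: (H, W) bool;
    target_rgb: length-3 target color in [0, 1]. Keeps the original
    CIELAB L channel (shading) and swaps in the target's a/b chroma.
    """
    lab = color.rgb2lab(image_rgb)
    target_lab = color.rgb2lab(np.array(target_rgb).reshape(1, 1, 3))[0, 0]
    lab[wall_mask, 1] = target_lab[1]
    lab[wall_mask, 2] = target_lab[2]
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```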
