Results 1 - 10 of 10
1.
Neural Netw ; 179: 106523, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-39053300

ABSTRACT

Community detection in multi-layer networks is a prominent subject in network analysis research. However, most existing community detection techniques face two primary constraints: they are not suited to high-dimensional data in multi-layer networks, and they fail to fully leverage auxiliary information among communities to enhance detection accuracy. To address these limitations, a novel approach named weighted prior tensor training decomposition (WPTTD) is proposed for multi-layer network community detection. Specifically, WPTTD harnesses tensor feature optimization techniques to effectively manage high-dimensional data in multi-layer networks. Additionally, it employs a weighted flattened network to construct prior information for each dimension of the multi-layer network, thereby continuously exploring inter-community connections. To preserve the cohesive structure of communities and to harness the comprehensive information within the multi-layer network for more effective detection, common community manifold learning (CCML) is integrated into the WPTTD framework to enhance performance. Experimental evaluations on both artificial and real-world networks verify that this algorithm outperforms several mainstream multi-layer network community detection algorithms.
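
The abstract gives no implementation details, but the tensor-train family of decompositions it builds on can be sketched compactly. Below is a minimal, generic TT-SVD over a toy node x node x layer adjacency tensor in Python (NumPy); the function name and rank cap are illustrative, and this is the textbook decomposition, not the authors' WPTTD.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """Plain TT-SVD: factor a dense tensor into a chain of 3-way cores
    by sequential truncated SVDs. A generic stand-in for the tensor
    feature optimization step, not the WPTTD algorithm itself."""
    shape = tensor.shape
    cores, rank, mat = [], 1, tensor
    for n in shape[:-1]:
        mat = mat.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank, n, r))   # 3-way TT core
        mat = np.diag(s[:r]) @ vt[:r]                # carry the remainder
        rank = r
    cores.append(mat.reshape(rank, shape[-1], 1))    # last core
    return cores

# Toy multi-layer network: 30 nodes, 3 layers, as an adjacency tensor.
A = np.random.rand(30, 30, 3)
print([c.shape for c in tt_decompose(A, max_rank=5)])
```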

2.
J Am Chem Soc ; 146(12): 8706-8715, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38487838

ABSTRACT

Metal nanoclusters (MNCs) represent a promising class of materials for catalytic carbon dioxide and proton reduction as well as dihydrogen oxidation. Such reactions typically involve multiple proton-coupled electron transfer (PCET) processes, and the current understanding of PCET mechanisms in MNCs has primarily focused on the sequential transfer mode. However, a concerted transfer pathway, i.e., concerted electron-proton transfer (CEPT), still lacks comprehensive elucidation despite its potential for a higher catalytic rate and lower reaction barrier. Herein, we introduce an experimental paradigm to test the feasibility of the CEPT process in MNCs, employing Au18(SR)14 (where SR denotes a thiolate ligand), Au22(SR)18, and Au25(SR)18- as model clusters. Detailed investigations indicate that the photoinduced PCET reactions in the designed system proceed via a CEPT pathway. Furthermore, the rate constants of the gold nanoclusters (AuNCs) are found to correlate with both cluster size and the flexibility of the Au-S framework. This newly identified PCET behavior in AuNCs differs markedly from that observed in semiconductor quantum dots and plasmonic metal nanoparticles. Our findings are of crucial importance for unveiling the catalytic mechanisms of quantum-confined metal nanomaterials and for the future rational design of more efficient catalysts.

3.
Article in English | MEDLINE | ID: mdl-38153822

ABSTRACT

Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video. It requires not only a comprehensive understanding of each object scattered across the whole scene but also a deep dive into their temporal motions and interactions. Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images, which can serve as prior knowledge to facilitate VidSGG model learning and inference. In this work, we propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations. Specifically, we first learn spatial co-occurrence and temporal transition correlations in a statistical manner. Then, we design spatial and temporal knowledge-embedded layers that introduce the multi-head cross-attention mechanism to fully explore the interaction between visual representations and the knowledge, generating spatially and temporally embedded representations, respectively. Finally, we aggregate these representations for each subject-object pair to predict the final semantic labels and relationships. Extensive experiments show that STKET outperforms current competing algorithms by a large margin, e.g., improving mR@50 by 8.1%, 4.7%, and 2.1% under different settings.
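
As a rough illustration of the knowledge-embedded cross-attention described above, here is a minimal PyTorch sketch in which visual relationship features attend to knowledge embeddings. All names, dimensions, and the residual-plus-norm arrangement are assumptions for illustration, not the STKET code.

```python
import torch
import torch.nn as nn

class KnowledgeCrossAttention(nn.Module):
    """Illustrative knowledge-embedded cross-attention: visual features
    attend to knowledge embeddings (e.g., co-occurrence statistics
    projected into feature space). A sketch, not the paper's module."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual, knowledge):
        # visual: (B, N, dim) object-pair features; knowledge: (B, K, dim)
        fused, _ = self.attn(query=visual, key=knowledge, value=knowledge)
        return self.norm(visual + fused)  # residual connection

x = torch.randn(2, 10, 256)   # 10 object-pair features
k = torch.randn(2, 5, 256)    # 5 knowledge embeddings
print(KnowledgeCrossAttention()(x, k).shape)  # torch.Size([2, 10, 256])
```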

4.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15462-15476, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37713216

ABSTRACT

Blind face restoration aims at recovering high-quality face images from those with unknown degradations. Current algorithms mainly introduce priors to complement high-quality details and achieve impressive progress. However, most of these algorithms ignore abundant contextual information in the face and its interplay with the priors, leading to sub-optimal performance. Moreover, they pay less attention to the gap between synthetic and real-world scenarios, limiting their robustness and generalization in real-world applications. In this work, we propose RestoreFormer++, which on the one hand introduces fully-spatial attention mechanisms to model the contextual information and its interplay with the priors, and on the other hand explores an extending degrading model to generate more realistic degraded face images and thereby alleviate the synthetic-to-real-world gap. Compared with current algorithms, RestoreFormer++ has several crucial benefits. First, instead of using a multi-head self-attention mechanism as in the traditional visual transformer, we introduce multi-head cross-attention over multi-scale features to fully explore spatial interactions between corrupted information and high-quality priors. This enables RestoreFormer++ to restore face images with higher realness and fidelity. Second, in contrast to a recognition-oriented dictionary, we learn a reconstruction-oriented dictionary as priors, which contains more diverse high-quality facial details and better accords with the restoration target. Third, we introduce an extending degrading model that covers more realistic degraded scenarios for synthesizing training data, which helps enhance the robustness and generalization of our RestoreFormer++ model. Extensive experiments show that RestoreFormer++ outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
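
One concrete way to read the reconstruction-oriented dictionary prior is as a nearest-neighbour lookup from degraded features into a bank of learned high-quality entries. The PyTorch sketch below uses a random dictionary purely for shape-checking; in the paper the dictionary is learned with a reconstruction objective, and the matching happens inside the attention mechanism rather than as a hard lookup.

```python
import torch

def dictionary_lookup(features, dictionary):
    """Nearest-neighbour lookup into a high-quality facial dictionary:
    a minimal sketch of prior matching, not RestoreFormer++'s code."""
    # features: (N, d) degraded-feature vectors; dictionary: (K, d) entries
    d2 = torch.cdist(features, dictionary)   # pairwise distances, (N, K)
    idx = d2.argmin(dim=1)                   # closest dictionary entry
    return dictionary[idx], idx

feats = torch.randn(16, 256)                 # assumed feature dimension
dic = torch.randn(1024, 256)                 # assumed dictionary size
priors, idx = dictionary_lookup(feats, dic)
print(priors.shape, idx.shape)               # (16, 256), (16,)
```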

5.
IEEE Trans Neural Netw Learn Syst ; 34(7): 3308-3322, 2023 Jul.
Article in English | MEDLINE | ID: mdl-35089863

ABSTRACT

Land remote-sensing analysis is a crucial research area in earth science. In this work, we focus on a challenging task of land analysis, i.e., automatic extraction of traffic roads from remote-sensing data, which has widespread applications in urban development and expansion estimation. Conventional methods either utilize only the limited information of aerial images or naively fuse multimodal information (e.g., vehicle trajectories), and thus cannot recognize unconstrained roads well. To address this problem, we introduce a novel neural network framework termed cross-modal message propagation network (CMMPNet), which fully benefits from complementary data of different modalities (i.e., aerial images and crowdsourced trajectories). Specifically, CMMPNet is composed of two deep autoencoders for modality-specific representation learning and a tailor-designed dual enhancement module for cross-modal representation refinement. In particular, the complementary information of each modality is comprehensively extracted and dynamically propagated to enhance the representation of the other modality. Extensive experiments on three real-world benchmarks demonstrate the effectiveness of CMMPNet for robust road extraction by blending data of different modalities, whether image and trajectory data or image and light detection and ranging (LiDAR) data. The experimental results show that the proposed approach outperforms current state-of-the-art methods by large margins. Our source code is released on the project page http://lingboliu.com/multimodal_road_extraction.html.


Subject(s)
Crowdsourcing , Neural Networks (Computer) , Benchmarking , Gene Regulatory Networks , Learning
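
The dual enhancement module is not specified in the abstract, but a gated cross-modal exchange is one plausible minimal form: each stream is refined by a sigmoid-gated message from the other. The layer layout below is an assumption for illustration, not CMMPNet's actual module.

```python
import torch
import torch.nn as nn

class DualEnhancement(nn.Module):
    """Sketch of cross-modal message propagation: each modality's feature
    map is enhanced by a gated message from the other modality."""
    def __init__(self, channels=64):
        super().__init__()
        self.gate_img = nn.Conv2d(channels, channels, 1)  # gates image msg
        self.gate_trj = nn.Conv2d(channels, channels, 1)  # gates traj msg

    def forward(self, f_img, f_trj):
        # Trajectory features enhance the image stream, and vice versa.
        f_img_new = f_img + torch.sigmoid(self.gate_trj(f_trj)) * f_trj
        f_trj_new = f_trj + torch.sigmoid(self.gate_img(f_img)) * f_img
        return f_img_new, f_trj_new

img = torch.randn(1, 64, 32, 32)   # aerial-image features
trj = torch.randn(1, 64, 32, 32)   # rasterized trajectory features
out_img, out_trj = DualEnhancement()(img, trj)
print(out_img.shape, out_trj.shape)
```
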
6.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9887-9903, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34847019

ABSTRACT

Facial expression recognition (FER) has received significant attention and seen notable progress in the past decade, but data inconsistencies among different FER datasets greatly hinder the generalization of models learned on one dataset to another. Recently, a series of cross-domain FER (CD-FER) algorithms have been developed to address this issue. Although each claims superior performance, comprehensive and fair comparisons are lacking due to inconsistent choices of source/target datasets and feature extractors. In this work, we first construct a unified CD-FER evaluation benchmark, in which we re-implement the well-performing CD-FER and recently published general domain adaptation algorithms and ensure that all of them adopt the same source/target datasets and feature extractors for fair comparison. Based on this analysis, we find that most current state-of-the-art algorithms use adversarial learning mechanisms that aim to learn holistic domain-invariant features to mitigate domain shifts. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. Therefore, we develop a novel adversarial graph representation adaptation (AGRA) framework that integrates graph representation propagation with adversarial learning to realize effective cross-domain holistic-local feature co-adaptation. Specifically, our framework first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolutional networks (GCNs) propagate holistic-local features within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. In this way, the AGRA framework can adaptively learn fine-grained domain-invariant features and thus facilitate cross-domain expression recognition. We conduct extensive and fair comparisons on the unified evaluation benchmark and show that the proposed AGRA framework outperforms previous state-of-the-art methods.


Subject(s)
Algorithms , Facial Recognition , Benchmarking , Learning
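
The stacked GCNs can be illustrated by a single propagation step over a row-normalized adjacency, H' = ReLU(A_hat H W). The sketch below is the textbook operation with arbitrary node counts and connectivity, rather than AGRA's specific two-graph construction.

```python
import torch

def gcn_propagate(x, adj, weight):
    """One graph-convolution step: normalize the adjacency row-wise,
    mix node features along edges, then apply a linear map and ReLU.
    A generic stand-in for the stacked GCNs described above."""
    adj_norm = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return torch.relu(adj_norm @ x @ weight)

nodes = torch.randn(12, 64)    # holistic + local graph node features
adj = torch.rand(12, 12)       # assumed connectivity (learned in AGRA)
w = torch.randn(64, 64)
print(gcn_propagate(nodes, adj, w).shape)  # torch.Size([12, 64])
```
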
7.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1371-1384, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32986543

ABSTRACT

Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works use RNNs/LSTMs to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large amounts of training samples for each category and cannot generalize to novel categories with limited samples. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependence on training samples. Specifically, it first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces label semantics to guide the learning of semantic-specific features to initialize the graph, and it exploits a graph propagation network to explore graph node interactions, enabling the learning of contextualized image feature representations. Moreover, we initialize each graph node with the classifier weights of the corresponding label and apply another propagation network to transfer node messages through the graph. In this way, the framework can exploit the information of correlated labels to help train better classifiers, especially for labels with limited training samples. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods by sizable margins on public benchmarks.


Subject(s)
Algorithms , Neural Networks (Computer) , Benchmarking , Machine Learning , Semantics
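
"Statistical label co-occurrence" plausibly means a conditional probability matrix estimated from training labels, A[i, j] = P(label j | label i), which then serves as the graph adjacency. The NumPy sketch below shows that estimation; the smoothing and normalization choices are assumptions.

```python
import numpy as np

def label_cooccurrence(labels, smooth=1e-6):
    """Estimate A[i, j] = P(label j | label i) from a binary label
    matrix; one plausible reading of the knowledge-graph construction."""
    # labels: (num_images, num_classes) binary indicator matrix
    counts = labels.T @ labels                # joint occurrence counts
    freq = np.diag(counts).astype(float)      # per-class frequencies
    return counts / (freq[:, None] + smooth)  # row-normalize by P(i)

y = (np.random.rand(1000, 20) > 0.8).astype(float)  # toy label matrix
A = label_cooccurrence(y)
print(A.shape, A.diagonal()[:3])  # diagonal is ~1: P(i | i)
```
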
8.
IEEE Trans Image Process ; 27(12): 5827-5839, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30040644

ABSTRACT

To avoid an exhaustive search over locations and scales, current state-of-the-art object detection systems usually involve a crucial component that generates a batch of candidate object proposals from images. In this paper, we present a simple yet effective approach for segmenting object proposals via a deep architecture of recursive neural networks (ReNNs), which hierarchically groups regions to detect object candidates over scales. Unlike traditional methods that mainly adopt fixed similarity measures for merging regions or finding object proposals, our approach adaptively learns the region merging similarity and the objectness measure during the process of hierarchical region grouping. Specifically, guided by a structured loss, the ReNN model jointly optimizes the cross-region similarity metric, the region merging process, and the objectness prediction. During inference, we introduce randomness into the greedy search to cope with the ambiguity of grouping regions. Extensive experiments on standard benchmarks, e.g., PASCAL VOC and ImageNet, suggest that our approach produces object proposals with high recall while preserving object boundaries well, and outperforms existing methods in both accuracy and efficiency.


Subject(s)
Image Processing, Computer-Assisted/methods , Neural Networks (Computer) , Algorithms , Animals , Humans
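
The randomized greedy inference can be sketched independently of the ReNN itself: instead of always merging the single most similar region pair, sample among the top few candidates so one ambiguous grouping does not lock in a bad proposal tree. Everything below (toy regions as pixel-id sets, the similarity function, top-3 sampling) is illustrative.

```python
import random

def stochastic_greedy_merge(regions, similarity):
    """Greedy hierarchical region grouping with injected randomness:
    a schematic of the randomized inference, not the ReNN model."""
    proposals = list(regions)
    while len(regions) > 1:
        pairs = [(similarity(a, b), a, b)
                 for i, a in enumerate(regions) for b in regions[i + 1:]]
        pairs.sort(key=lambda t: -t[0])
        _, a, b = random.choice(pairs[:3])  # sample among best candidates
        regions = [r for r in regions if r not in (a, b)] + [a | b]
        proposals.append(regions[-1])       # each merge yields a proposal
    return proposals

# Toy regions as sets of pixel ids; similarity favours overlap/nearness.
regs = [{1, 2}, {3, 4}, {5, 6}, {2, 3}]
sim = lambda a, b: len(a & b) + 1.0 / (1 + abs(min(a) - min(b)))
print(stochastic_greedy_merge(regs, sim))
```
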
9.
IEEE Trans Image Process ; 26(1): 328-339, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27831874

ABSTRACT

Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement. Although plenty of effort has been devoted to this task, several issues remain unsolved for generating vivid and detail-preserving personal sketch portraits. For example, artifacts may appear when synthesizing hairpins and glasses, and textural details may be lost in regions of hair or mustache. Moreover, the generalization ability of current systems is somewhat limited since they usually require elaborately collecting a dictionary of examples or carefully tuning features/components. In this paper, we present a novel representation learning framework that learns an end-to-end photo-sketch mapping through structure and texture decomposition. In the training stage, we first decompose the input face photo into different components according to their representational content (i.e., structural and textural parts) using a pre-trained convolutional neural network (CNN). Then, we utilize a branched fully convolutional network to learn structural and textural representations, respectively. In addition, we design a sorted matching mean square error metric to measure texture patterns in the loss function. In the sketch rendering stage, our approach automatically generates structural and textural representations for the input photo and produces the final result via a probabilistic fusion scheme. Extensive experiments on several challenging benchmarks suggest that our approach outperforms example-based synthesis algorithms in terms of both perceptual and objective metrics. In addition, the proposed method also generalizes better across datasets without additional training.
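
The sorted matching mean square error metric is only named in the abstract; a minimal reading is to sort pixel intensities within each patch before taking the MSE, so the loss compares texture statistics rather than exact pixel positions. The patch layout in this PyTorch sketch is an assumption.

```python
import torch

def sorted_matching_mse(pred, target):
    """Sort intensities inside each patch, then compare: position-
    invariant texture matching. A minimal reading of the named metric,
    with patch handling and normalization assumed."""
    # pred, target: (B, num_patches, patch_pixels)
    p_sorted, _ = torch.sort(pred, dim=-1)
    t_sorted, _ = torch.sort(target, dim=-1)
    return torch.mean((p_sorted - t_sorted) ** 2)

pred = torch.rand(2, 64, 81)     # 64 patches of 9x9 pixels (assumed)
target = torch.rand(2, 64, 81)
print(sorted_matching_mse(pred, target))
```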

10.
IEEE Trans Neural Netw Learn Syst ; 27(6): 1135-1149, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26742147

ABSTRACT

Salient object detection increasingly receives attention as an important component or step in several pattern recognition and image processing tasks. Although a variety of powerful saliency models have been proposed, they usually involve heavy feature (or model) engineering based on priors (or assumptions) about the properties of objects and backgrounds. Inspired by the effectiveness of recently developed feature learning, we propose a novel deep image saliency computing (DISC) framework for fine-grained image saliency computing. In particular, we model image saliency from both coarse- and fine-level observations and utilize deep convolutional neural networks (CNNs) to learn the saliency representation in a progressive manner. Specifically, our saliency model is built upon two stacked CNNs. The first CNN generates a coarse-level saliency map by taking the overall image as input, roughly identifying salient regions in the global context; we further integrate superpixel-based local context information in the first CNN to refine this coarse-level map. Guided by the coarse saliency map, the second CNN focuses on the local context to produce a fine-grained and accurate saliency map while preserving object details. For a test image, the two CNNs collaboratively conduct the saliency computation in one shot. Our DISC framework is capable of uniformly highlighting the objects of interest against complex backgrounds while preserving object details well. Extensive experiments on several standard benchmarks suggest that DISC outperforms other state-of-the-art methods and generalizes well across datasets without additional training. The executable version of DISC is available online: http://vision.sysu.edu.cn/projects/DISC.
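
The coarse-to-fine pipeline can be sketched with two tiny stand-in CNNs: the first predicts a coarse map from the whole image, and the second takes the image concatenated with that map to produce the refined output. These toy networks only mirror the data flow, not the DISC architecture, and the superpixel refinement is omitted.

```python
import torch
import torch.nn as nn

class CoarseToFineSaliency(nn.Module):
    """Two stacked CNNs: coarse prediction on the full image, then
    refinement guided by the coarse map. A schematic, not DISC itself."""
    def __init__(self):
        super().__init__()
        self.coarse = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(8, 1, 3, padding=1))
        self.fine = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(8, 1, 3, padding=1))

    def forward(self, image):
        coarse_map = torch.sigmoid(self.coarse(image))
        fine_in = torch.cat([image, coarse_map], dim=1)  # guide the 2nd CNN
        return torch.sigmoid(self.fine(fine_in)), coarse_map

img = torch.randn(1, 3, 64, 64)
fine, coarse = CoarseToFineSaliency()(img)
print(fine.shape, coarse.shape)  # both (1, 1, 64, 64)
```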
