ABSTRACT
Attributed networks consist not only of a network structure but also of node attributes. Most existing community detection algorithms focus only on the network structure and ignore node attributes, which are also important. Although some algorithms that use both node attributes and network structure information have been proposed in recent years, the complex hierarchical coupling relationships within and between attributes, nodes, and the network structure have not been considered. Such hierarchical couplings are driving factors in community formation. This paper introduces a novel coupled node similarity (CNS) measure to capture and learn attribute and structure couplings and to compute the similarity within and between nodes with categorical attributes in a network. CNS learns and integrates the frequency-based intra-attribute coupled similarity within an attribute, the co-occurrence-based inter-attribute coupled similarity between attributes, and the coupled attribute-to-structure similarity based on the homophily property. CNS is then used to generate edge weights and transform a plain graph into a weighted graph. Clustering algorithms then detect community structures on the weighted graphs that are topologically well connected and semantically coherent. Extensive experiments on several data sets verify the effectiveness of CNS-based community detection algorithms against state-of-the-art node similarity measures, with and without node attribute information and hierarchical interactions, and across various levels of network structure complexity.
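A minimal sketch of the workflow described above, assuming a simplified, frequency-based stand-in for CNS (the full measure also models inter-attribute co-occurrence and attribute-to-structure couplings): node similarity re-weights the edges of a plain graph, and an off-the-shelf weighted community detection algorithm is then run on the result. The karate-club data set, the structural floor of 0.1 on the edge weights, and the modularity-based clustering are illustrative assumptions, not the authors' setup.

# Illustrative sketch only: a simplified, frequency-based node similarity used
# to re-weight the edges of a plain graph before running an off-the-shelf
# weighted community detection algorithm. Not the actual CNS formulation.
from collections import Counter
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def frequency_similarity(value_a, value_b, counts, n_nodes):
    """Simplified intra-attribute similarity: rarer shared values count more."""
    if value_a != value_b:
        return 0.0
    return 1.0 - counts[value_a] / n_nodes

def attribute_similarity(attrs_u, attrs_v, value_counts, n_nodes):
    """Average the per-attribute similarities over all categorical attributes."""
    sims = [frequency_similarity(attrs_u[a], attrs_v[a], value_counts[a], n_nodes)
            for a in attrs_u]
    return sum(sims) / len(sims)

# Toy attributed network: the karate-club graph, whose categorical node
# attribute "club" plays the role of the node attributes.
G = nx.karate_club_graph()
n = G.number_of_nodes()
value_counts = {"club": Counter(nx.get_node_attributes(G, "club").values())}

# Turn the plain graph into a weighted graph using the node similarity,
# keeping a small structural floor so no edge weight collapses to zero.
for u, v in G.edges():
    sim = attribute_similarity({"club": G.nodes[u]["club"]},
                               {"club": G.nodes[v]["club"]},
                               value_counts, n)
    G[u][v]["weight"] = 0.1 + sim

# Any weighted community detection method can now be applied to the result.
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])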
ABSTRACT
Text classification plays an important role in many practical applications. In the real world, datasets are often extremely small. Most existing methods adopt pretrained neural network models to handle such datasets. However, these methods are either difficult to deploy on mobile devices because of their large model size or unable to fully extract the deep semantic information between phrases and clauses. This paper proposes a multimodel-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small dataset. Our framework comprises five layers: the encoder layer, the word-level LSTM network layer, the sentence-level LSTM network layer, the max-pooling layer, and the softmax layer. The encoder layer uses DistilBERT to obtain context-sensitive dynamic word vectors that are difficult to represent with traditional feature engineering methods. Since the transformer part of this layer is distilled, our framework is compressed. The next two layers then extract deep semantic information: the output of the encoder layer is fed into a bidirectional LSTM network, and the feature matrix is extracted hierarchically through the word-level and sentence-level LSTMs to obtain a fine-grained semantic representation. After that, the max-pooling layer converts the feature matrix into a lower-dimensional matrix, preserving only the most salient features. Finally, the feature matrix is fed into a fully connected softmax layer, which converts the predicted linear vector into a probability for each class. Extensive experiments on two public benchmarks demonstrate the effectiveness of our proposed approach on an extremely small dataset. It matches the performance of state-of-the-art baselines in terms of precision, recall, accuracy, and F1 score, while its smaller model size, shorter training time, and earlier convergence epoch show that our method can be deployed on mobile devices faster and with a lighter footprint.
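A minimal sketch of the five-layer pipeline described above (encoder, word-level BiLSTM, sentence-level BiLSTM, max pooling, softmax), written with PyTorch and Hugging Face Transformers; the hidden sizes, the checkpoint name, and the way the two LSTM levels are stacked are assumptions rather than the authors' exact configuration.

# Sketch of the described architecture under the assumptions stated above.
import torch
import torch.nn as nn
from transformers import DistilBertModel

class SmallTextClassifier(nn.Module):
    def __init__(self, num_classes, lstm_hidden=128):
        super().__init__()
        # Encoder layer: distilled transformer producing contextual word vectors.
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Word-level and sentence-level bidirectional LSTM layers.
        self.word_lstm = nn.LSTM(self.encoder.config.dim, lstm_hidden,
                                 batch_first=True, bidirectional=True)
        self.sent_lstm = nn.LSTM(2 * lstm_hidden, lstm_hidden,
                                 batch_first=True, bidirectional=True)
        # Fully connected softmax layer producing class probabilities.
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, 768) contextual embeddings from DistilBERT.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        word_feats, _ = self.word_lstm(hidden)      # word-level features
        sent_feats, _ = self.sent_lstm(word_feats)  # sentence-level features
        pooled = sent_feats.max(dim=1).values       # max pooling over time
        return torch.softmax(self.classifier(pooled), dim=-1)

# Hypothetical usage with a tokenizer from the same checkpoint:
# tok = transformers.DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
# batch = tok(["a short text"], return_tensors="pt", padding=True)
# probs = SmallTextClassifier(num_classes=4)(batch["input_ids"], batch["attention_mask"])

For training, one would typically return the pre-softmax logits and pair them with a cross-entropy loss; the explicit softmax here mirrors the probability output described in the abstract.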
Subjects
Deep Learning, Benchmarking, Language, Neural Networks (Computer), Semantics
ABSTRACT
Link prediction is an important task in social network analysis and mining because of its wide range of applications. A large number of link prediction methods have been proposed. Among them, deep learning-based embedding methods, which encode each node and edge as an embedding vector and thus integrate easily with traditional machine learning algorithms, exhibit excellent performance. However, some problems remain unsolved for this kind of method, especially in the node embedding and edge embedding steps. First, existing methods either share exactly the same weight among all neighbors or assign a completely different weight to each node when computing the node embedding. Second, they can hardly preserve the symmetry of edge embeddings obtained from node representations by direct concatenation or by other binary operations such as averaging and the Hadamard product. To solve these problems, we propose a weighted symmetric graph embedding approach for link prediction. For node embedding, the proposed approach aggregates neighbors of different orders with different aggregation weights. For edge embedding, it concatenates node pairs bidirectionally, both forward and backward, to guarantee the symmetry of edge representations while preserving local structural information. The experimental results show that our proposed approach predicts network links better than state-of-the-art methods. The appropriate assignment of aggregation weights and the bidirectional concatenation enable us to learn more accurate and symmetric edge representations for link prediction.
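An illustrative NumPy sketch of the two ideas above: node embeddings built by weighting neighborhoods of different orders differently, and an edge prediction made symmetric by scoring both the forward and the backward concatenation of the node pair and averaging. The order weights, the random features, and the logistic stand-in for a trained classifier are assumptions for illustration, not the paper's exact formulation.

# Sketch under the assumptions stated above; not the paper's exact method.
import numpy as np

def node_embeddings(adj, features, order_weights=(0.6, 0.3, 0.1)):
    """Weighted aggregation of the node's own features (0th order) with the
    averaged features of its 1st- and 2nd-order neighborhoods."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    norm_adj = adj / deg                       # row-normalized adjacency
    hops = [features]
    for _ in range(len(order_weights) - 1):
        hops.append(norm_adj @ hops[-1])       # next-order neighborhood average
    return sum(w * h for w, h in zip(order_weights, hops))

def edge_score(z_u, z_v, w):
    """Score both concatenation orders and average, so the prediction is
    symmetric in (u, v) while each order still exposes both node vectors."""
    forward = np.concatenate([z_u, z_v])
    backward = np.concatenate([z_v, z_u])
    logit = (forward @ w + backward @ w) / 2.0
    return 1.0 / (1.0 + np.exp(-logit))        # link probability

# Toy example: a 4-node path graph with random 8-dimensional node features and
# a random weight vector standing in for a trained link classifier.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
Z = node_embeddings(adj, rng.normal(size=(4, 8)))
w = rng.normal(size=2 * Z.shape[1])
assert np.isclose(edge_score(Z[0], Z[1], w), edge_score(Z[1], Z[0], w))

Symmetrizing the prediction over both concatenation orders is only one way to realize the symmetry requirement; the paper's bidirectional concatenation may combine the two orders differently.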