Search | BVS Regional Portal

1.

(p,q)-biclique counting and enumeration for large sparse bipartite graphs.

Yang, Jianye; Peng, Yun; Ouyang, Dian; Zhang, Wenjie; Lin, Xuemin; Zhao, Xiang.

VLDB J ; : 1-25, 2023 Mar 13.

Article in English | MEDLINE | ID: mdl-37362202

ABSTRACT

In this paper, we study the problem of (p, q)-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph G=(U,V,E) and two integer parameters p and q, we aim to efficiently count and enumerate all (p, q)-bicliques in G, where a (p, q)-biclique B(L, R) is a complete subgraph of G with L⊆U, R⊆V, |L|=p, and |R|=q. The problem of (p, q)-biclique counting and enumeration has many applications, such as graph neural network information aggregation, densest subgraph detection, and cohesive subgroup analysis. Despite the wide range of applications, to the best of our knowledge, there is no efficient and scalable solution to this problem in the literature. This problem is computationally challenging, due to the worst-case exponential number of (p, q)-bicliques. In this paper, we propose a competitive branch-and-bound baseline method, namely BCList, which explores the search space in a depth-first manner, together with a variety of pruning techniques. Although BCList offers a useful computation framework for our problem, its worst-case time complexity is exponential in p+q. To alleviate this, we propose an advanced approach, called BCList++. In particular, BCList++ applies a layer-based exploring strategy to enumerate (p, q)-bicliques by anchoring the search on either U or V only, which has a worst-case time complexity exponential in either p or q only. Consequently, a vital task is to choose the layer with the least computation cost. To this end, we develop a cost model, which is built upon an unbiased estimator for the density of the 2-hop graph induced by U or V. To improve computation efficiency, BCList++ exploits pre-allocated arrays and vertex labeling techniques such that the frequent subgraph creating operations can be substituted by array element switching operations.
We conduct extensive experiments on 16 real-life datasets, and the experimental results demonstrate that BCList++ significantly outperforms the baseline methods by up to 3 orders of magnitude. We show via a case study that (p, q)-bicliques optimize the efficiency of graph neural networks. In this paper, we also extend our techniques to count and enumerate (p, q)-bicliques on uncertain bipartite graphs. An efficient method, IUBCList, is developed on top of BCList++, together with a couple of pruning techniques, including common neighbor refinement and search branch early termination, to discard unpromising uncertain (p, q)-bicliques early. The experimental results demonstrate that IUBCList significantly outperforms the baseline method by up to 2 orders of magnitude.
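The idea of anchoring the search on a single layer can be illustrated with a minimal brute-force sketch. This is not BCList or BCList++ and includes none of their pruning, cost model, or array techniques; the function name `count_bicliques`, the adjacency-dictionary input, and the toy graph are assumptions made for illustration. It enumerates p-subsets of U only and, for each, counts the ways to pick q vertices from their common neighbours in V. It assumes p is at least 1.

```python
from itertools import combinations
from math import comb


def count_bicliques(adj_u, p, q):
    """Count (p, q)-bicliques by anchoring on the U layer (requires p >= 1).

    adj_u maps each vertex of U to the set of its neighbours in V.
    """
    total = 0
    for left in combinations(sorted(adj_u), p):
        # Vertices of V adjacent to every vertex in the chosen p-subset.
        common = set.intersection(*(adj_u[u] for u in left))
        # Any q of those common neighbours complete a (p, q)-biclique.
        total += comb(len(common), q)
    return total


# Hypothetical toy graph: three U vertices, three V vertices.
graph = {"u1": {"v1", "v2", "v3"}, "u2": {"v1", "v2"}, "u3": {"v2", "v3"}}
print(count_bicliques(graph, 2, 2))  # prints 2
```

The loop is exponential in p only, since the choice on the V side is reduced to a binomial coefficient; anchoring on V instead would swap the roles of p and q.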

2.

Class-Imbalanced-Aware Distantly Supervised Named Entity Recognition.

Mao, Yuren; Hao, Yu; Liu, Weiwei; Lin, Xuemin; Cao, Xin.

IEEE Trans Neural Netw Learn Syst ; PP, 2023 Apr 26.

Article in English | MEDLINE | ID: mdl-37099461

ABSTRACT

Distantly supervised named entity recognition (NER), which automatically learns NER models without manually labeling data, has gained much attention recently. In distantly supervised NER, positive unlabeled (PU) learning methods have achieved notable success. However, existing PU learning-based NER methods are unable to automatically handle the class imbalance and further depend on the estimation of the unknown class prior; thus, the class imbalance and imperfect estimation of the class prior degrade the NER performance. To address these issues, this article proposes a novel PU learning method for distantly supervised NER. The proposed method can automatically handle the class imbalance and does not need to engage in class prior estimation, which enables it to achieve state-of-the-art performance. Extensive experiments support our theoretical analysis and validate the superiority of our method.

3.

Access to Polysubstituted (Furyl)methylthioethers via a Base-Promoted S-H Insertion Reaction of Conjugated Enynones.

Wu, Wanqing; Chen, Yang; Li, Meng; Hu, Weigao; Lin, Xuemin.

J Org Chem ; 84(22): 14529-14539, 2019 Nov 15.

Article in English | MEDLINE | ID: mdl-31590485

ABSTRACT

A convenient and applicable approach to the construction of diverse functionalized (2-furyl)methylthioether derivatives via base-promoted S-H insertion of conjugated enynones with thiophenols or thiols has been developed. This reaction features readily available starting materials, high atom economy, broad substrate scope, and versatile operation. Moreover, the synthetic utility of this method has been demonstrated by the efficient synthesis of the CNKSPR1 inhibitor precursor and late-stage functionalization of glutathione.

Subjects

Alkadienes/chemistry; Sulfides/chemical synthesis; Molecular Structure; Sulfides/chemistry

4.

Hyperspectral Imagery Classification via Stochastic HHSVMs.

Liu, Weiwei; Shen, Xiaobo; Du, Bo; Tsang, Ivor W; Zhang, Wenjie; Lin, Xuemin.

IEEE Trans Image Process ; 28(2): 577-588, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30222564

ABSTRACT

Hyperspectral imagery (HSI) has shown promising results in real-world applications. However, the technological evolution of optical sensors poses two main challenges in HSI classification: 1) the spectral band is usually redundant and noisy and 2) HSI with millions of pixels has become increasingly common in real-world applications. Motivated by the recent success of hybrid huberized support vector machines (HHSVMs), which inherit the benefits of both lasso and ridge regression, this paper first investigates the advantages of HHSVM for HSI applications. Unfortunately, the existing HHSVM solvers suffer from prohibitive computational costs on large-scale data sets. To solve this problem, this paper proposes simple and effective stochastic HHSVM algorithms for HSI classification. In the stochastic settings, we show that with a probability of at least , our algorithms find an -accurate solution using iterations. Since the convergence rate of our algorithms does not depend on the size of the training set, our algorithms are suitable for handling large-scale problems. We demonstrate the superiority of our algorithms by conducting experiments on large-scale binary and multiclass classification problems, comparing to the state-of-the-art HHSVM solvers. Finally, we apply our algorithms to real HSI classification and achieve promising results.
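The abstract does not spell out the HHSVM objective. As a rough sketch, assuming the formulation in which a huberized hinge loss is combined with an elastic-net penalty (an L1 term as in the lasso plus a squared L2 term as in ridge regression), the objective could be evaluated as below. The function names, the default `delta`, and the parameter names `lam1` and `lam2` are illustrative assumptions, and no stochastic solver from the paper is reproduced here.

```python
def huberized_hinge(margin, delta=2.0):
    """Huberized hinge loss: zero past the margin, quadratic near it, linear far from it."""
    if margin > 1.0:
        return 0.0
    if margin > 1.0 - delta:
        return (1.0 - margin) ** 2 / (2.0 * delta)
    return 1.0 - margin - delta / 2.0


def elastic_net(weights, lam1, lam2):
    """Elastic-net penalty: lam1 * L1 norm + (lam2 / 2) * squared L2 norm."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lam1 * l1 + 0.5 * lam2 * l2_squared


def objective(weights, bias, samples, lam1, lam2, delta=2.0):
    """Average huberized hinge loss over (features, label) pairs plus the penalty.

    Labels are assumed to be +1 or -1.
    """
    loss = 0.0
    for features, label in samples:
        score = bias + sum(w * x for w, x in zip(weights, features))
        loss += huberized_hinge(label * score, delta)
    return loss / len(samples) + elastic_net(weights, lam1, lam2)
```

A stochastic method would evaluate the loss term on a random sample or mini-batch per step instead of the full training set, which is what makes the per-iteration cost independent of the training set size.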

5.

Multiview Spectral Clustering via Structured Low-Rank Matrix Factorization.

Wang, Yang; Wu, Lin; Lin, Xuemin; Gao, Junbin.

IEEE Trans Neural Netw Learn Syst ; 29(10): 4833-4843, 2018 Oct.

Article in English | MEDLINE | ID: mdl-29993958

ABSTRACT

Multiview data clustering attracts more attention than its single-view counterpart because leveraging multiple independent and complementary sources of information from multiview feature spaces outperforms using a single one. Multiview spectral clustering aims at yielding a data partition that agrees over the local manifold structures of the views by seeking eigenvalue-eigenvector decompositions. Among all the methods, low-rank representation (LRR) is effective, exploring the multiview consensus structures beyond the low rankness to boost the clustering performance. However, as we observed, this classical paradigm still suffers from notable limitations for multiview spectral clustering: it overlooks the flexible local manifold structure, because it aggressively enforces the low-rank data correlation agreement among all views, and such a strategy, therefore, cannot achieve a satisfactory between-views agreement; worse still, LRR is not intuitively flexible enough to capture the latent data clustering structures. In this paper, first, we present the structured LRR by factorizing into the latent low-dimensional data-cluster representations, which characterize the data clustering structure for each view. Second, upon such a representation, a Laplacian regularizer is imposed to preserve the flexible local manifold structure for each view. Third, we present an iterative multiview agreement strategy that minimizes the divergence objective among all factorized latent data-cluster representations during each iteration of the optimization process, where the latent representation from each view serves to regulate those from other views, and this intuitive process iteratively coordinates all views to be agreeable. Fourth, we remark that such a data-cluster representation can flexibly encode the data clustering structure from any view with an adaptive input cluster number.
To this end, finally, a novel nonconvex objective function is proposed and solved via an efficient alternating minimization strategy. The complexity analysis is also presented. Extensive experiments conducted on real-world multiview data sets demonstrate the superiority over the state of the art.

6.

Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.

Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie.

IEEE Trans Image Process ; 26(3): 1393-1404, 2017 Mar.

Article in English | MEDLINE | ID: mdl-28103558

ABSTRACT

Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for similarity matching between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may, subsequently, yield very different results. In fact, dealing with landmarks with low-quality shapes caused by the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely, multi-query expansions, to retrieve semantically robust landmarks in two steps. First, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set so as to remedy its possible low-quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by typical collaborative filtering methods, we propose to learn collaborative deep network-based semantic, nonlinear, and high-level features over the latent factors of landmark photos, which serve as the training set and are formed by matrix factorization over the collaborative user-photo matrix for the multi-query set. The learned deep network is further applied to generate the features for all the other photos, while also yielding a compact multi-query set within such a space. Then, the final ranking scores are calculated over the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final ranking list of landmark retrieval.
Extensive experiments are conducted on real-world social media data comprising landmark photos together with their user information to show the superior performance over the existing methods, especially our recently proposed multi-query based mid-level pattern representation method [1].

7.

Unsupervised Metric Fusion Over Multiview Data by Graph Random Walk-Based Cross-View Diffusion.

Wang, Yang; Zhang, Wenjie; Wu, Lin; Lin, Xuemin; Zhao, Xiang.

IEEE Trans Neural Netw Learn Syst ; 28(1): 57-70, 2017 Jan.

Article in English | MEDLINE | ID: mdl-26672050

ABSTRACT

Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations may address this problem from different aspects, as visual data objects described by multiple features can be decomposed into multiple views, which often provide complementary information. In this paper, we propose a cross-view fusion algorithm that leads to a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning a distance measure by exploiting a graph structure of data samples, where an input similarity matrix can be improved through the propagation of a graph random walk. In particular, we construct multiple graphs, each corresponding to an individual view, and a cross-view fusion approach based on graph random walk is presented to derive an optimal distance measure by fusing multiple metrics. Our method is scalable to a large amount of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are incorporated into the graph random walk to balance the views. However, such a strategy may lead to an over-smooth similarity metric where affinities between dissimilar samples may be enlarged by excessively conducting cross-view fusion. Thus, we devise a heuristic approach to controlling the number of iterations in the fusion process in order to avoid over-smoothness. Extensive experiments conducted on real-world data sets validate the effectiveness and efficiency of our approach.
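As a rough illustration of random-walk-based cross-view fusion, the sketch below diffuses the similarity matrix of one view through the other for a fixed number of steps and then averages the two. It is a generic two-view cross-diffusion scheme, not the authors' algorithm: it omits the anchor graph, the learned view-specific coefficients, and the heuristic for choosing the iteration count. All function names and the `steps` parameter are assumptions for illustration, and every row of the input matrices is assumed to have a positive sum.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def row_normalize(m):
    """Scale each row to sum to 1, giving random-walk transition probabilities."""
    return [[x / sum(row) for x in row] for row in m]


def cross_diffuse(w1, w2, steps=2):
    """Fuse two similarity matrices by propagating each view through the other."""
    p1, p2 = row_normalize(w1), row_normalize(w2)
    for _ in range(steps):
        # Each view is updated using the other view's current transition matrix.
        next1 = row_normalize(matmul(matmul(p1, p2), transpose(p1)))
        next2 = row_normalize(matmul(matmul(p2, p1), transpose(p2)))
        p1, p2 = next1, next2
    # Average the diffused views into one fused similarity matrix.
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]
```

Keeping `steps` small reflects the concern raised in the abstract: running the fusion for too many iterations tends to inflate affinities between dissimilar samples.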

8.

Robust Subspace Clustering for Multi-View Data by Exploiting Correlation Consensus.

Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie; Zhang, Qing; Huang, Xiaodi.

IEEE Trans Image Process ; 24(11): 3939-49, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26353354

ABSTRACT

More often than not, multimedia data described by multiple features, such as color and shape features, can be naturally decomposed into multiple views. Since these views provide complementary information to each other, great effort has been dedicated to leveraging multiple views instead of a single view to achieve better clustering performance. To effectively exploit data correlation consensus among views, in this paper, we study subspace clustering for multi-view data while keeping individual views well encapsulated. For characterizing data correlations, we generate a similarity matrix in such a way that high affinity values are assigned to data objects within the same subspace across views, while the correlations among data objects from distinct subspaces are minimized. Before generating this matrix, however, we should consider that multi-view data in practice might be corrupted by noise. The corrupted data will significantly degrade clustering results. We first present a novel objective function coupled with an angular-based regularizer. By minimizing this function, multiple sparse vectors are obtained for each data object as its multiple representations. In fact, these sparse vectors result from reaching data correlation consensus on all views. For tackling noise corruption, we present a sparsity-based approach that refines the angular-based data correlation. Using this approach, a more ideal data similarity matrix is generated for multi-view data. Spectral clustering is then applied to the similarity matrix to obtain the final subspace clustering. Extensive experiments have been conducted to validate the effectiveness of our proposed approach.

9.

Evaluation of analyte stability and method ruggedness in the determination of streptomycin residues in honey by liquid chromatography with post-column derivatization.

Pang, Guo-Fang; Zhang, Jin-Jie; Cao, Yan-Zhong; Fan, Chun-Lin; Lin, Xue-Min; Li, Zeng-Yin; Jia, Guo-Qun.

J AOAC Int ; 87(1): 39-44, 2004.

Article in English | MEDLINE | ID: mdl-15084085

ABSTRACT

This study demonstrated that streptomycin in honey is quite stable, and the results showed no obvious differences for 3 samples containing incurred analyte during continuous testing for 4 months. Fifteen laboratories evaluated method performance at 4 fortification levels ranging from 0.010 to 0.100 mg/kg; the recoveries ranged from 73.7 to 78.5%, the reproducibility relative standard deviations ranged from 5.76 to 15.85%, and the repeatability relative standard deviations ranged from 1.64 to 3.80%. In 1999-2002, the method was used to determine streptomycin residues in 5106 lots of honey samples from >20 provinces all over China. All of the honey samples were found to be in conformity with the requirements of customs clearance for exports to Europe, the United States, and Japan. The continuous 4-year quality analysis also found that C18 solid-phase extraction cartridges should be standardized to ensure that the analytical results are accurate when different lots of cartridges are used.
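The recovery and relative standard deviation (RSD) figures quoted above follow standard definitions, sketched below. The replicate values and the fortification level in the example are hypothetical, not data from the study, and the function names are illustrative. The RSD here is computed over a single set of replicates; a reproducibility RSD across laboratories would come from the design of the collaborative study and is not reproduced.

```python
from statistics import mean, stdev


def recovery_percent(measured, fortified):
    """Recovery: measured concentration as a percentage of the fortified amount."""
    return 100.0 * measured / fortified


def rsd_percent(values):
    """Relative standard deviation: sample standard deviation over the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate results (mg/kg) for a sample fortified at 0.050 mg/kg.
replicates = [0.038, 0.040, 0.036]
print(round(recovery_percent(mean(replicates), 0.050), 1))  # prints 76.0
print(round(rsd_percent(replicates), 2))  # prints 5.26
```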

Subjects

Anti-Bacterial Agents/analysis; Drug Residues/analysis; Honey/analysis; Streptomycin/analysis; Buffers; Cation Exchange Resins; Chromatography, Liquid; Indicators and Reagents; Quality Control; Reference Standards; Reproducibility of Results; Solvents
