Results 1 - 20 of 36
1.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35514205

ABSTRACT

BACKGROUND: Coronavirus disease 2019 (COVID-19) has spurred a boom in efforts to uncover repurposable existing drugs. Drug repurposing is a strategy for identifying new uses for approved or investigational drugs outside the scope of their original medical indication. MOTIVATION: Current drug-repurposing work for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is mostly limited in three ways: it focuses only on chemical medicines, it analyzes a single drug targeting a single SARS-CoV-2 protein, and it follows a one-size-fits-all strategy that uses the same treatment (the same drug) for different infection stages of SARS-CoV-2. To address these issues, we first focused our research on herbal medicines. We then proposed a heterogeneous graph embedding method to identify candidate repurposable herbs for each SARS-CoV-2 protein, and employed a variational graph convolutional network approach to recommend precise herb combinations as potential candidate treatments for specific infection stages. METHOD: We first employed virtual screening to construct the 'Herb-Compound' and 'Compound-Protein' docking graphs based on 480 herbal medicines, 12,735 associated chemical compounds and 24 SARS-CoV-2 proteins. Next, the 'Herb-Compound-Protein' heterogeneous network was constructed using a metapath-based embedding approach. We then proposed a heterogeneous-information-network-based graph embedding method to generate, individually, candidate ranking lists of herbs that target structural, nonstructural and accessory SARS-CoV-2 proteins. To obtain precise and effective combination treatments for the various COVID-19 infection stages, we employed the variational graph convolutional network method to generate candidate herb combinations as recommended therapies.
RESULTS: There were 24 ranking lists, each containing the top-10 herbs, targeting the 24 SARS-CoV-2 proteins correspondingly, and 20 herb combinations were generated as candidate treatments specific to the four infection stages. The code and supplementary materials are freely available at https://github.com/fanyang-AI/TCM-COVID19.


Subjects
COVID-19 Drug Treatment , Drug Combinations , Drug Repositioning/methods , Drugs, Investigational , Humans , SARS-CoV-2
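The 'Herb-Compound-Protein' metapath in the abstract above can be made concrete with a toy example. The sketch below is not the paper's heterogeneous graph embedding; it swaps in plain metapath counting, assuming a Compound-Protein edge exists when a docking score is at or below a cut-off. All herb, compound and protein names, the scores, and the -7.0 threshold are hypothetical.

```python
# Hypothetical toy data (not from the paper).
HERB_COMPOUNDS = {
    "herb_A": {"c1", "c2"},
    "herb_B": {"c2", "c3"},
    "herb_C": {"c4"},
}
# Hypothetical docking scores; more negative means stronger predicted binding.
DOCKING = {
    ("c1", "Mpro"): -8.1,
    ("c2", "Mpro"): -6.5,
    ("c3", "Mpro"): -9.0,
    ("c4", "Spike"): -7.2,
}


def metapath_scores(protein, threshold=-7.0):
    """Count Herb-Compound-Protein path instances ending at `protein`.

    A Compound-Protein edge is assumed to exist when the docking score is
    at or below `threshold`; unscored pairs have no edge.
    """
    scores = {}
    for herb, compounds in HERB_COMPOUNDS.items():
        scores[herb] = sum(
            1
            for c in compounds
            if (c, protein) in DOCKING and DOCKING[(c, protein)] <= threshold
        )
    return scores


def rank_herbs(protein, top_k=10, threshold=-7.0):
    """Return up to top_k herbs with at least one path, best first (ties by name)."""
    scores = metapath_scores(protein, threshold)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [herb for herb, s in ranked if s > 0][:top_k]


print(rank_herbs("Mpro"))   # ['herb_A', 'herb_B']
print(rank_herbs("Spike"))  # ['herb_C']
```

Path counting ignores edge weights and network context that a learned embedding would capture; it only shows how the three node types chain together into one ranking per protein.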
2.
Sensors (Basel) ; 21(6)2021 Mar 17.
Article in English | MEDLINE | ID: mdl-33802708

ABSTRACT

In recent years, electroencephalogram (EEG) signals have been used as a biometric modality, and EEG-based biometric systems have received increasing attention. However, because EEG signals are sensitive, extracting identity information through processing techniques may lose part of that information, which may reduce the distinctiveness between subjects in the system. In this context, we propose a new self-relative evaluation framework for EEG-based biometric systems. The proposed framework aims to select more accurate identity information when the biometric system is open to the enrollment of new subjects. The experiments were conducted on publicly available EEG datasets collected from 108 subjects in a resting state with eyes closed. The results show that the openness condition is useful for selecting more accurate identity information.


Subjects
Biometric Identification , Biometry , Electroencephalography , Humans , Self-Assessment (Psychology)
3.
Entropy (Basel) ; 22(4)2020 Apr 20.
Article in English | MEDLINE | ID: mdl-33286247

ABSTRACT

Cross-domain recommendation is a promising approach in recommender systems that uses relatively rich information from a source domain to improve recommendation accuracy in a target domain. Most existing methods consider users' rating information in different domains, the label information of users and items, and users' reviews of items. However, they do not effectively use latent sentiment information to find an accurate mapping of latent review features between domains. User reviews usually include users' subjective views, which can reflect their preferences and sentiment tendencies toward various item attributes. Therefore, to solve the cold-start problem in the recommendation process, this paper proposes a cross-domain recommendation algorithm based on sentiment analysis and latent feature mapping (CDR-SAFM), which combines the sentiment information implicit in user reviews from different domains. Unlike previous sentiment research, this paper performs sentiment analysis on user reviews and divides sentiment into three categories (positive, negative and neutral) based on the idea of three-way decisions. Furthermore, Latent Dirichlet Allocation (LDA) is used to model users' semantic orientation and generate latent sentiment review features. Moreover, a multilayer perceptron (MLP) is used to learn a non-linear cross-domain mapping function that transfers users' sentiment review features. Finally, the effectiveness of the proposed CDR-SAFM framework is demonstrated by comparing it with existing recommendation algorithms in a cross-domain scenario on the Amazon dataset.
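In three-way decision theory, a pair of thresholds α > β splits objects into an acceptance region, a rejection region and a deferred boundary region. The sketch below shows how that idea could map a review's sentiment score onto the three categories named in the abstract. It assumes each review already has a probability-like positive-sentiment score in [0, 1]; the function name and the default thresholds are hypothetical, and the abstract does not say how CDR-SAFM actually sets them.

```python
def three_way_sentiment(score, alpha=0.6, beta=0.4):
    """Map a positive-sentiment probability in [0, 1] to one of three regions.

    score >= alpha -> positive region (accept)
    score <= beta  -> negative region (reject)
    otherwise      -> boundary region, treated here as neutral (defer)
    """
    if not beta < alpha:
        raise ValueError("three-way decisions require beta < alpha")
    if score >= alpha:
        return "positive"
    if score <= beta:
        return "negative"
    return "neutral"


print([three_way_sentiment(s) for s in (0.9, 0.5, 0.2)])
# ['positive', 'neutral', 'negative']
```

Widening the gap between α and β sends more uncertain reviews to the neutral region instead of forcing a binary label.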

4.
J Environ Manage ; 196: 365-375, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28324852

ABSTRACT

The rapid development of time-series data mining provides an emerging method for water resource management research. In this paper, based on time-series data mining methodology, we propose a novel and general analysis framework for water quality time-series data. It consists of two parts: implementation components, and common time-series data mining tasks applied to water quality data. In the first part, we propose granulating the time series into several two-dimensional normal clouds and calculating similarities at the granulated level. On the basis of the resulting similarity matrix, the similarity search, anomaly detection and pattern discovery tasks on the water quality time-series instance dataset can be easily implemented in the second part. We present a case study of this analysis framework on weekly dissolved oxygen (DO) time-series data collected from five monitoring stations on the upper reaches of the Yangtze River, China. The case study revealed the relationship between water quality in the mainstream and its tributary, as well as the main changing patterns of DO. The experimental results show that the proposed analysis framework is a feasible and efficient method for mining hidden and valuable knowledge from historical water quality time-series data.


Subjects
Data Mining , Environmental Monitoring , Water Quality , China , Rivers , Water , Water Pollutants, Chemical
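A normal cloud summarizes a set of values with three numbers: expectation Ex, entropy En and hyper-entropy He. As a minimal sketch of the granulation step, the code below cuts a series into non-overlapping windows and estimates a one-dimensional cloud per window with the standard backward cloud generator (without certainty degrees). This is a simplification: the paper uses two-dimensional clouds, its similarity measure is not reproduced here, and the window length is a hypothetical parameter.

```python
import math


def backward_cloud(samples):
    """Estimate (Ex, En, He) of a 1-D normal cloud from raw samples.

    Backward cloud generator without certainty degrees:
      Ex = sample mean
      En = sqrt(pi/2) * mean(|x - Ex|)
      He = sqrt(S^2 - En^2), with S^2 the unbiased sample variance
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples per granule")
    ex = sum(samples) / n
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in samples) / n
    s2 = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # Sampling noise can make S^2 < En^2; clamp so He stays real.
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he


def granulate(series, window):
    """Cut a series into non-overlapping windows (a short tail is dropped)
    and describe each window by its cloud parameters (Ex, En, He)."""
    return [
        backward_cloud(series[i:i + window])
        for i in range(0, len(series) - window + 1, window)
    ]


print([round(ex, 2) for ex, _, _ in granulate([1, 2, 3, 4, 5, 6, 7], 3)])
# [2.0, 5.0]
```

With weekly readings, a window of 4 would summarize each roughly four-week span as one granule; pairwise similarities would then be computed between granules rather than between raw values.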
5.
ScientificWorldJournal ; 2014: 317387, 2014.
Article in English | MEDLINE | ID: mdl-25177721

ABSTRACT

An interval set is a special set that describes the uncertainty of an uncertain concept or set $Z$ by two crisp boundaries, called the upper-bound set and the lower-bound set. In this paper, the concept of the similarity degree between two interval sets is first defined, and then the similarity degrees between an interval set and its two approximations (i.e., the upper approximation set $\overline{R}(Z)$ and the lower approximation set $\underline{R}(Z)$) are presented. The disadvantages of using the upper approximation set $\overline{R}(Z)$ or the lower approximation set $\underline{R}(Z)$ as approximation sets of the uncertain set (uncertain concept) $Z$ are analyzed, and a new method for finding a better approximation set of the interval set $Z$ is proposed. The conclusion that the approximation set $R_{0.5}(Z)$ is an optimal approximation set of the interval set $Z$ is drawn and proved. The change rules of $R_{0.5}(Z)$ under different binary relations are analyzed in detail. Finally, a kind of crisp approximation set of the interval set $Z$ is constructed. We hope this research will promote the development of both the interval set model and granular computing theory.


Subjects
Mathematical Concepts
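To make the lower, upper and 0.5-approximations concrete, the sketch below works in the classical rough-set setting that interval-set models build on: a universe partitioned by an equivalence relation R and a target concept X. Treating the uncertain concept as an ordinary subset X is an assumption, the paper's definitions for interval sets may differ, and the Jaccard index is used only as a stand-in for the paper's similarity degree. Here the 0.5-approximation is taken to be the union of equivalence classes at least half inside X.

```python
def approximations(partition, target):
    """Lower, upper and 0.5-approximation of `target` w.r.t. a partition.

    partition: list of disjoint, non-empty sets (equivalence classes of R)
    target:    the concept X, a subset of the universe
    """
    lower, upper, r_half = set(), set(), set()
    for block in partition:
        overlap = len(block & target)
        if overlap == len(block):          # block lies entirely inside X
            lower |= block
        if overlap > 0:                    # block touches X
            upper |= block
        if overlap / len(block) >= 0.5:    # at least half of the block lies in X
            r_half |= block
    return lower, upper, r_half


def similarity(a, b):
    """Jaccard index |A & B| / |A | B|, a stand-in similarity degree."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


partition = [{1, 2}, {3, 4, 5}, {6, 7, 8}]
X = {1, 2, 3, 4, 6}
low, up, half = approximations(partition, X)
for name, s in (("lower", low), ("upper", up), ("R_0.5", half)):
    print(name, round(similarity(s, X), 3))
# lower 0.4
# upper 0.625
# R_0.5 0.667
```

In this toy case the 0.5-approximation is closer to X than either classical approximation; the example illustrates the idea and does not reproduce the paper's proof.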
6.
IEEE Trans Image Process ; 33: 1710-1725, 2024.
Article in English | MEDLINE | ID: mdl-38416622

ABSTRACT

Deep learning has excelled in single-image super-resolution (SISR), yet the lack of interpretability in most deep learning-based SR networks hinders their applicability, especially in fields such as medical imaging that require transparent computation. To address this problem, we present an interpretable frequency division SR network that operates in the image frequency domain. It comprises a frequency division module and a step-wise reconstruction method, which divide the image into different frequencies and reconstruct each one accordingly. We develop a frequency division loss function to ensure that each reconstruction module (ReM) operates at only one image frequency. Together, these methods establish an interpretable framework for SR networks that visualizes the image reconstruction process and reduces the black-box nature of SR networks. Additionally, we revisit the subpixel-layer upsampling process by deriving its inverse process and designing a displacement generation module. This interpretable upsampling process incorporates subpixel information and is similar to pre-upsampling frameworks. Furthermore, we develop a new ReM based on interpretable Hessian attention to enhance network performance. Extensive experiments demonstrate that our network, without the frequency division loss, outperforms state-of-the-art methods qualitatively and quantitatively. Including the frequency division loss enhances the network's interpretability and robustness while decreasing PSNR and SSIM only slightly, by an average of 0.48 dB and 0.0049, respectively.
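The subpixel layer mentioned above rearranges r² low-resolution feature maps into one map r times larger in each direction, and its inverse undoes that rearrangement exactly. The pure-Python sketch below shows both directions for a single output channel; to our understanding it follows the same index convention as PyTorch's `torch.nn.PixelShuffle`, and it does not model the paper's displacement generation module.

```python
def pixel_shuffle(channels, r):
    """Sub-pixel upsampling for one output channel.

    channels: list of r*r feature maps, each an H x W list of lists
    returns:  one (H*r) x (W*r) map with out[y*r+i][x*r+j] = channels[i*r+j][y][x]
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for i in range(r):
        for j in range(r):
            src = channels[i * r + j]
            for y in range(h):
                for x in range(w):
                    out[y * r + i][x * r + j] = src[y][x]
    return out


def pixel_unshuffle(image, r):
    """Exact inverse of pixel_shuffle: split an (H*r) x (W*r) map into r*r maps."""
    H, W = len(image), len(image[0])
    if H % r or W % r:
        raise ValueError("image size must be divisible by r")
    return [
        [[image[y * r + i][x * r + j] for x in range(W // r)] for y in range(H // r)]
        for i in range(r)
        for j in range(r)
    ]


print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[1, 2], [3, 4]]
```

Because every input value lands in exactly one output position, the operation is a permutation, which is why an exact inverse exists.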

7.
Article in English | MEDLINE | ID: mdl-38739513

ABSTRACT

In the real world, data distributions often exhibit multiple granularities. However, most existing neighbor-based machine-learning methods rely on a single, manually set granularity for neighbor relationships. These methods typically handle every data point at that single granularity, which severely affects their accuracy and efficiency. This paper adopts a dual-pronged approach. First, it constructs a multi-granularity representation of the data using the granular-ball computing model, thereby improving the algorithm's time efficiency. Second, it leverages this multi-granularity representation to create tailored, multi-granularity neighborhood relationships for different task scenarios, resulting in improved algorithmic accuracy. The experimental results demonstrate that the proposed multi-granularity neighbor relationship effectively enhances KNN classification and clustering methods. The source code is publicly available on GitHub at https://github.com/xjnine/MGNR.
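The granular-ball idea replaces many points with a few balls, each described by a center, a radius and a label. The sketch below illustrates that coarse-grained neighbor relation with a common simplification: a query takes the label of the ball whose surface is nearest. The helper names, the choice of mean distance as the radius, and the toy data are hypothetical, and this is not the MGNR algorithm itself.

```python
import math


def make_ball(points, label):
    """Describe a group of same-label points as a granular ball.

    Center = coordinate-wise mean; radius = mean distance to the center
    (one common choice; other definitions use the maximum distance).
    """
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    radius = sum(math.dist(p, center) for p in points) / len(points)
    return {"center": center, "radius": radius, "label": label}


def classify(query, balls):
    """1-NN over balls: pick the ball whose surface is closest to the query."""
    best = min(balls, key=lambda b: math.dist(query, b["center"]) - b["radius"])
    return best["label"]


balls = [make_ball([(0, 0), (2, 0)], "A"), make_ball([(10, 0), (10, 4)], "B")]
print(classify((3, 0), balls), classify((8, 2), balls))  # A B
```

Comparing a query against two balls instead of four points is where the time savings come from; larger balls simply represent coarser granularity.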

8.
Article in English | MEDLINE | ID: mdl-38954574

ABSTRACT

Granular-ball support vector machine (GBSVM) is a significant attempt to construct a classifier that takes coarse-to-fine granular balls as input rather than single data points. It is the first classifier whose input contains no points. However, the existing model contains errors, and its dual model has not been derived; as a result, the current algorithm cannot be implemented or applied. To address these problems, we fix the errors in the original GBSVM model and derive its dual model. Furthermore, a particle swarm optimization (PSO) algorithm is designed to solve the dual problem, and a sequential minimal optimization (SMO) algorithm is also carefully designed for the same purpose; the latter is faster and more stable. Experimental results on UCI benchmark datasets demonstrate that GBSVM is more robust and efficient. All code has been released in the open-source library available at: http://www.cquptshuyinxia.com/GBSVM.html or https://github.com/syxiaa/GBSVM.

9.
IEEE/ACM Trans Comput Biol Bioinform ; 21(5): 1579-1590, 2024.
Article in English | MEDLINE | ID: mdl-38805329

ABSTRACT

Owing to the great success of Graph Neural Networks (GNNs) in numerous fields, growing research interest has been devoted to applying GNNs to molecular learning tasks. A molecular structure can naturally be represented as a graph in which atoms and bonds correspond to nodes and edges, respectively. However, atoms are not haphazardly stacked together but are combined into various spatial geometries. Meanwhile, since chemical reactions mainly occur in substructures such as functional groups, substructures play a decisive role in a molecule's properties. Therefore, directly applying GNNs to molecular representation learning could ignore the molecular spatial structure and the substructure properties, which in turn degrades performance on downstream tasks. In this paper, we propose a Knowledge-Driven Self-Supervised Model for Molecular Representation Learning (KSMRL) to address these problems. KSMRL consists of two major pathways: (1) a Spatial Information (SI) based pathway, which preserves the spatial information of the molecular structure, and (2) a Subgraph Constraint (SC) based pathway, which incorporates the properties of substructures into the molecular representation. In this manner, both atomic-level and substructure-level information can be included in modeling. Experimental results on multiple datasets show that KSMRL can generate discriminative molecular representations. In molecular generation tasks, KSMRL combined with Autoregressive Flow (AF) or Discrete Flow (DF) models outperforms the state-of-the-art baselines on all datasets. In addition, we demonstrate the effectiveness of KSMRL with property optimization experiments. To show its ability to predict specified potential drug-target interactions (DTIs), a case study on discriminating the interactions between molecules generated by KSMRL and their targets is also presented.


Subjects
Computational Biology , Neural Networks, Computer , Supervised Machine Learning , Computational Biology/methods , Algorithms
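The statement that atoms and bonds map to nodes and edges can be illustrated with one round of message passing on a tiny molecule. The sketch below uses ethanol's heavy atoms with hypothetical one-hot features and a plain mean aggregation; it has no learned weights or nonlinearity, and it models neither KSMRL's spatial pathway nor its subgraph pathway.

```python
# Ethanol's heavy-atom graph (hydrogens omitted): C0 - C1 - O2
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]
# Hypothetical one-hot node features: [is_carbon, is_oxygen]
FEATURES = [[1.0, 0.0] if a == "C" else [0.0, 1.0] for a in ATOMS]


def adjacency(n_nodes, edges):
    """Neighbor lists for an undirected graph (bonds have no direction)."""
    adj = [[] for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def message_pass(feats, edges):
    """One mean-aggregation step: each atom averages itself and its bonded atoms."""
    adj = adjacency(len(feats), edges)
    out = []
    for v, h in enumerate(feats):
        group = [h] + [feats[u] for u in adj[v]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def readout(feats):
    """Mean-pool atom features into a single molecule-level vector."""
    return [sum(col) / len(feats) for col in zip(*feats)]


h1 = message_pass(FEATURES, BONDS)
print([[round(x, 3) for x in h] for h in h1])  # [[1.0, 0.0], [0.667, 0.333], [0.5, 0.5]]
print([round(x, 3) for x in readout(h1)])       # [0.722, 0.278]
```

After one step each atom's vector already mixes in its bonded neighbors; stacking steps widens that receptive field, but 3-D geometry never enters, which is the gap the abstract's spatial pathway targets.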
10.
Article in English | MEDLINE | ID: mdl-39446541

ABSTRACT

Protein-protein interactions (PPIs) are essential to understanding cellular mechanisms, signaling networks, disease processes and drug development, as they represent the physical contacts and functional associations between proteins. Recent years have seen notable achievements by artificial intelligence (AI) methods for predicting PPIs. However, these approaches often handle the intricate web of relationships and mechanisms among proteins, drugs, diseases, ribonucleic acid (RNA) and protein structures in a fragmented or superficial manner. This is typically due to the limitations of non-end-to-end learning frameworks, which can lead to sub-optimal feature extraction and fusion and thereby compromise prediction accuracy. To address these deficiencies, this paper introduces a novel end-to-end learning model, the Knowledge Graph Fused Graph Neural Network (KGF-GNN). The model comprises three components: (1) Protein Associated Network (PAN) construction: we first construct a PAN that extensively captures the diverse relationships and mechanisms linking proteins with drugs, diseases, RNA and protein structures. (2) Graph Neural Network for feature extraction: a Graph Neural Network (GNN) distills both topological and semantic features from the PAN, while another GNN extracts topological features directly from the observed PPI networks. (3) Multi-layer perceptron for feature fusion: finally, a multi-layer perceptron integrates these features through end-to-end learning, so that feature extraction and fusion are both comprehensive and jointly optimized for PPI prediction. Extensive experiments on real-world PPI datasets validate the effectiveness of the proposed KGF-GNN approach, which achieves high accuracy in predicting PPIs and significantly surpasses existing state-of-the-art models.
This work not only enhances our ability to predict PPIs with higher precision but also contributes to the broader application of AI in bioinformatics, with implications for biological research and therapeutic development.

11.
Neural Netw ; 171: 383-395, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38141474

ABSTRACT

Deep generative models are well suited to modeling complex time series and are widely used in anomaly detection. Nevertheless, existing deep generative approaches mainly concentrate on the models' reconstruction capability rather than on customizing a model for anomaly detection. Meanwhile, VAE-based models suffer from posterior collapse, which can lead to undesirable consequences such as a high false-positive rate. Based on these considerations, this paper proposes a novel self-adversarial variational auto-encoder combined with contrastive learning, ACVAE for short, to address these challenges. ACVAE consists of three parts, 〈T,E,G〉: the transformation network T generates abnormal latent representations similar to the normal latent representations encoded by the encoder E, and the decoder G is used to distinguish the two kinds of representations. In this framework, normal reconstructions are treated as positive samples and abnormal reconstructions as negative samples, and contrastive learning is applied to the encoder E to measure the similarity between inputs and positive samples and the dissimilarity between inputs and negative samples. An improved objective function is thus proposed by integrating two novel regularizers, one based on the adversarial mechanism and the other on contrastive learning; these give the encoder E and decoder G the capability to discriminate, and constrain the decoder G to mitigate posterior collapse. Experiments on five datasets show that ACVAE outperforms state-of-the-art methods.


Subjects
Learning , Time Factors
12.
Article in English | MEDLINE | ID: mdl-38564353

ABSTRACT

Electroencephalographic (EEG) source imaging (ESI) is a powerful method for studying brain function and for supporting the surgical resection of epileptic foci. However, accurately estimating the location and extent of brain sources remains challenging because of noise and background interference in EEG signals. To reconstruct extended brain sources, we propose a new ESI method called Variation Sparse Source Imaging based on Generalized Gaussian Distribution (VSSI-GGD). VSSI-GGD uses the generalized Gaussian prior as a sparse constraint in the spatial variation domain and embeds it into a Bayesian framework for source estimation. Using a variational technique, we approximate the intractable true posterior with a Gaussian density. Through convex analysis, the Bayesian inference problem is transformed entirely into a series of regularized L2p-norm optimization problems, which are efficiently solved with the ADMM algorithm. Imaging results from numerical simulations and from analysis of a human experimental dataset reveal the superior performance of VSSI-GGD, which provides higher spatial resolution with clearer boundaries than benchmark algorithms. VSSI-GGD can potentially serve as an effective and robust spatiotemporal EEG source imaging method. The source code of VSSI-GGD is available at https://github.com/Mashirops/VSSI-GGD.git.


Subjects
Brain , Electroencephalography , Humans , Bayes Theorem , Normal Distribution , Electroencephalography/methods , Brain/diagnostic imaging , Brain Mapping/methods , Algorithms , Magnetoencephalography/methods
13.
Article in English | MEDLINE | ID: mdl-39190517

ABSTRACT

OBJECTIVE: Transformer-based neural networks have been applied to electroencephalography (EEG) decoding of motor imagery (MI). However, most networks focus on applying the self-attention mechanism to extract global temporal information, while cross-frequency coupling features have been neglected. Additionally, effectively integrating different neural networks remains a challenge in designing advanced decoding algorithms. METHODS: This study proposes a novel end-to-end Multi-Scale Vision Transformer Neural Network (MSVTNet) for MI-EEG classification. MSVTNet first extracts local spatio-temporal features at different filtered scales through convolutional neural networks (CNNs). These features are then concatenated along the feature dimension to form local multi-scale spatio-temporal feature tokens. Finally, Transformers are used to capture cross-scale interaction information and global temporal correlations, providing more distinguishable feature embeddings for classification. Moreover, an auxiliary branch loss is used for intermediate supervision to ensure the effective integration of the CNNs and Transformers. RESULTS: The performance of MSVTNet was assessed through subject-dependent (session-dependent and session-independent) and subject-independent experiments on three MI datasets: BCI Competition IV 2a, BCI Competition IV 2b and OpenBMI. The experimental results demonstrate that MSVTNet achieves state-of-the-art performance in all analyses. CONCLUSION: MSVTNet shows superiority and robustness in enhancing MI decoding performance. The source code for MSVTNet is available at https://github.com/SheepTAO/MSVTNet.
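The token-forming step (concatenating per-scale features along the feature dimension) and the attention that mixes tokens can be sketched in a few lines. The code below is a minimal illustration assuming single-head scaled dot-product self-attention with no learned projections, no multiple heads and no feed-forward block; the branch data are hypothetical, and this is not MSVTNet's actual architecture.

```python
import math


def concat_tokens(*scale_features):
    """Join per-time-step features from several scales along the feature axis.

    Each argument is a list of T tokens (lists of floats); all scales share T.
    """
    T = len(scale_features[0])
    if any(len(f) != T for f in scale_features):
        raise ValueError("all scales must have the same number of time steps")
    return [sum((f[t] for f in scale_features), []) for t in range(T)]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
        z = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) / z for i in range(d)])
    return out


# Two hypothetical CNN branches: 3 time steps each, with 2 and 1 features per step.
branch_a = [[0.1, 0.2], [0.0, 0.5], [0.3, 0.1]]
branch_b = [[1.0], [0.2], [0.4]]
tokens = concat_tokens(branch_a, branch_b)  # 3 tokens of 3 features each
mixed = self_attention(tokens)
print(tokens)  # [[0.1, 0.2, 1.0], [0.0, 0.5, 0.2], [0.3, 0.1, 0.4]]
```

Each output token is a softmax-weighted average over all time steps, so information from every scale and every step can reach every token, which is the cross-scale, global interaction the abstract describes.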

14.
Appl Opt ; 52(12): 2670-5, 2013 Apr 20.
Article in English | MEDLINE | ID: mdl-23669675

ABSTRACT

We propose a simple intrinsic Fabry-Perot interferometer (FPI) based on single-mode fiber, in which a thin film formed by arc discharge serves as one mirror of the FPI cavity. The temperature and refractive-index (RI) characteristics of the proposed device are investigated. Experimental results show that the device can provide temperature-independent RI measurement with a fringe-contrast sensitivity of ~72.59 dB/RIU (RIU: refractive-index unit). It can also be used as a temperature sensor with a wavelength sensitivity of ~8 pm/°C. Therefore, simultaneous measurement of RI and temperature could potentially be realized by detecting the variations in fringe contrast and wavelength, respectively.
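Simultaneous two-parameter sensing is usually written as a 2×2 sensitivity matrix relating the measured shifts (fringe contrast ΔC, wavelength Δλ) to the unknowns (ΔRI, ΔT), then inverted. The sketch below uses the two sensitivities quoted in the abstract; the zero cross-sensitivities (contrast versus temperature, which the abstract calls temperature-independent, and wavelength versus RI, which it does not report) and the example readings are assumptions. Cramer's rule keeps it dependency-free.

```python
def solve_2x2(k, measured):
    """Solve [[k11, k12], [k21, k22]] @ [dn, dT] = [dC, dlam] by Cramer's rule."""
    (k11, k12), (k21, k22) = k
    det = k11 * k22 - k12 * k21
    if abs(det) < 1e-12:
        raise ValueError("singular sensitivity matrix: RI and T cannot be separated")
    dc, dlam = measured
    dn = (dc * k22 - k12 * dlam) / det
    dt = (k11 * dlam - k21 * dc) / det
    return dn, dt


K = [
    [72.59, 0.0],  # contrast row: dB/RIU (abstract), dB/°C (assumed 0, temperature-independent)
    [0.0, 8.0],    # wavelength row: pm/RIU (hypothetical, not reported), pm/°C (abstract)
]
dn, dt = solve_2x2(K, (0.7259, 80.0))  # hypothetical readings: +0.7259 dB, +80 pm
print(round(dn, 4), round(dt, 2))       # 0.01 10.0
```

If calibration later reveals a nonzero wavelength-versus-RI term, it only changes the lower-left entry of `K`; the same solver still separates the two quantities as long as the determinant stays away from zero.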

15.
Article in English | MEDLINE | ID: mdl-37566496

ABSTRACT

The density peaks clustering algorithm (DP) has difficulty clustering large-scale data because it requires a distance matrix to compute the density and δ-distance of each object, which has $O(n^2)$ time complexity. A granular ball (GB) is a coarse-grained representation of data, based on the fact that an object and its local neighbors have similar distributions and are highly likely to belong to the same class. GBs have been introduced into supervised learning by Xia et al. to improve the efficiency of methods such as support vector machines, k-nearest neighbor classification and rough sets. Inspired by the GB idea, we introduce it into unsupervised learning for the first time and propose a GB-based DP algorithm, called GB-DP. First, it generates GBs from the original data with an unsupervised partitioning method. Then, without setting any parameters, it defines the density of each GB, instead of the density of objects, according to the GB's center, its radius, and the distances between its members and its center. After that, it takes the distance between the centers of two GBs as the distance between those GBs and defines the δ-distance of GBs. Finally, it uses the GBs' densities and δ-distances to plot the decision graph, applies the DP algorithm to cluster the GBs, and expands the clustering result to the original data. Since there is no need to calculate the distance between every pair of objects, and the number of GBs is far smaller than the number of objects, GB-DP greatly reduces the running time of DP. Compared with k-means, ball k-means, DP, DPC-KNN-PCA, FastDPeak and DLORE-DP, GB-DP obtains similar or even better clustering results in much less running time without setting any parameters. The source code is available at https://github.com/DongdongCheng/GB-DP.
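Once each ball has a center and a density, the DP steps described above run on ball centers instead of raw objects. The sketch below computes the usual δ-distance, picks k peaks by the product ρ·δ, and propagates labels from denser to sparser centers. The ball densities are supplied as hypothetical numbers because GB-DP's own density definition is not reproduced here; the function names are also hypothetical.

```python
import math


def delta_distances(centers, densities):
    """delta_i = distance from center i to the nearest center of higher density.

    Centers with no denser neighbor get the largest distance to any other
    center, following the usual density-peaks convention.
    """
    n = len(centers)
    deltas = []
    for i in range(n):
        higher = [math.dist(centers[i], centers[j])
                  for j in range(n) if densities[j] > densities[i]]
        if higher:
            deltas.append(min(higher))
        else:
            deltas.append(max((math.dist(centers[i], centers[j])
                               for j in range(n) if j != i), default=0.0))
    return deltas


def density_peaks(centers, densities, k):
    """Cluster ball centers: choose k peaks by rho * delta, then propagate labels."""
    n = len(centers)
    deltas = delta_distances(centers, densities)
    gamma = [r * d for r, d in zip(densities, deltas)]
    peaks = sorted(range(n), key=lambda i: -gamma[i])[:k]
    labels = [-1] * n
    for cluster_id, p in enumerate(peaks):
        labels[p] = cluster_id
    # Visit centers from densest to sparsest so every denser center is labeled first.
    for i in sorted(range(n), key=lambda i: -densities[i]):
        if labels[i] != -1:
            continue
        higher = [j for j in range(n) if densities[j] > densities[i]]
        if not higher:
            raise ValueError("the densest center must be chosen as a peak")
        nearest = min(higher, key=lambda j: math.dist(centers[i], centers[j]))
        labels[i] = labels[nearest]
    return labels


centres = [(0, 0), (1, 0), (10, 0), (11, 0)]
rho = [5, 3, 6, 2]  # hypothetical ball densities
print(delta_distances(centres, rho))   # [10.0, 1.0, 10.0, 1.0]
print(density_peaks(centres, rho, 2))  # [1, 1, 0, 0]
```

Every object would then inherit the label of the ball it belongs to, which is the "expand to the original data" step; the quadratic cost now applies only to the number of balls.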

16.
IEEE Trans Biomed Eng ; 70(2): 436-445, 2023 02.
Article in English | MEDLINE | ID: mdl-35867371

ABSTRACT

OBJECTIVE: Motor imagery (MI) is a mental process widely used as the experimental paradigm for brain-computer interfaces (BCIs) across a broad range of basic science and clinical studies. However, decoding intentions from MI remains challenging because of the inherent complexity of brain patterns relative to the small sample sizes available for machine learning. APPROACH: This paper proposes an end-to-end Filter-Bank Multiscale Convolutional Neural Network (FBMSNet) for MI classification. A filter bank is first employed to derive a multiview spectral representation of the EEG data. Mixed depthwise convolution is then applied to extract temporal features at multiple scales, followed by spatial filtering to mitigate volume conduction. Finally, under the joint supervision of cross-entropy and center loss, FBMSNet obtains features that maximize interclass dispersion and intraclass compactness. MAIN RESULTS: We compare FBMSNet with several state-of-the-art EEG decoding methods on two MI datasets: the BCI Competition IV 2a dataset and the OpenBMI dataset. FBMSNet significantly outperforms the benchmark methods, achieving hold-out classification accuracies of 79.17% for four-class and 70.05% for two-class classification. SIGNIFICANCE: These results demonstrate the efficacy of FBMSNet in improving EEG decoding performance toward more robust BCI applications. The FBMSNet source code is available at https://github.com/Want2Vanish/FBMSNet.


Subjects
Brain-Computer Interfaces , Imagination , Neural Networks, Computer , Machine Learning , Brain , Electroencephalography/methods , Algorithms
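Center loss, used above alongside cross-entropy, penalizes the squared distance between each feature vector and a running center for its class, which is what pulls same-class features together (intraclass compactness). The sketch below implements the loss and one center update in the commonly used form, c_j ← c_j − α · Σ(c_j − x_i) / (1 + n_j); the weighting between the two losses (for example, total = cross_entropy + λ · center_loss) and the value of α are hypothetical here, not values from FBMSNet.

```python
def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over a mini-batch."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centers[y]))
    return 0.5 * total


def update_centers(features, labels, centers, alpha=0.5):
    """One update: c_j <- c_j - alpha * sum_i(c_j - x_i) / (1 + n_j),
    summing over the n_j batch samples of class j; absent classes are unchanged."""
    new = {cls: list(c) for cls, c in centers.items()}
    for cls in set(labels):
        members = [x for x, y in zip(features, labels) if y == cls]
        c = centers[cls]
        delta = [sum(cd - x[d] for x in members) / (1 + len(members))
                 for d, cd in enumerate(c)]
        new[cls] = [cd - alpha * dd for cd, dd in zip(c, delta)]
    return new


feats = [[1.0, 0.0], [3.0, 0.0]]
print(center_loss(feats, [0, 0], {0: [0.0, 0.0]}))  # 5.0
```

The 1 + n_j denominator damps the update for classes with few samples in a batch, so centers drift smoothly instead of jumping to each batch mean.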
17.
IEEE Trans Neural Netw Learn Syst ; 34(4): 2144-2155, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34460405

ABSTRACT

This article presents a general sampling method for classification problems, called granular-ball sampling (GBS), which introduces the idea of granular computing. GBS uses adaptively generated hyperballs to cover the data space, and the points on the hyperballs constitute the sampled data. GBS is the first sampling method that not only reduces the data size but also improves data quality in noisy-label classification. In addition, because GBS can describe the class boundary precisely, it obtains almost the same classification accuracy as the original datasets and noticeably higher accuracy than random sampling. Therefore, for data-reduction classification tasks, GBS is a general method that is not restricted to any specific classifier or dataset. Moreover, GBS can be effectively used as an undersampling method for imbalanced classification. Its time complexity is close to $O(N)$, so it can accelerate most classifiers. These advantages make GBS effective for improving classifier performance. All code has been released in the open-source GBS library at http://www.cquptshuyinxia.com/GBS.html.
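The "adaptively generated hyperballs" can be pictured as a recursive split that stops once a ball is pure enough. The sketch below is a minimal illustration under stated assumptions: purity is the majority-label fraction, the radius is the maximum member distance, and splits are seeded by two far-apart points. All of these choices, and the function names, are hypothetical; how GBS then selects the sampled points from each ball is not reproduced.

```python
import math
from collections import Counter


def ball_stats(points, labels):
    """Center (mean), radius (max distance to center), majority label and purity."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    radius = max(math.dist(p, center) for p in points)
    majority, count = Counter(labels).most_common(1)[0]
    return center, radius, majority, count / len(labels)


def split_balls(points, labels, purity_threshold=1.0, min_size=2):
    """Recursively split the data into balls until each is pure enough or tiny.

    Returns a list of (center, radius, majority_label, size) tuples.
    """
    center, radius, majority, purity = ball_stats(points, labels)
    if purity >= purity_threshold or len(points) <= min_size:
        return [(center, radius, majority, len(points))]
    # Heuristic 2-way split seeded by two far-apart points.
    a = max(points, key=lambda p: math.dist(p, center))
    b = max(points, key=lambda p: math.dist(p, a))
    left = [i for i, p in enumerate(points) if math.dist(p, a) <= math.dist(p, b)]
    left_set = set(left)
    right = [i for i in range(len(points)) if i not in left_set]
    if not right:  # all points coincide; cannot split further
        return [(center, radius, majority, len(points))]
    balls = []
    for idx in (left, right):
        balls += split_balls([points[i] for i in idx], [labels[i] for i in idx],
                             purity_threshold, min_size)
    return balls


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
for c, r, lab, size in split_balls(pts, ["a", "a", "b", "b"]):
    print(lab, size, c)
# a 2 [0.0, 0.5]
# b 2 [10.0, 0.5]
```

Lowering `purity_threshold` below 1.0 tolerates a few mislabeled points inside a ball, which is one way a ball-based representation can absorb label noise.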

18.
Front Plant Sci ; 14: 1113059, 2023.
Article in English | MEDLINE | ID: mdl-36760643

ABSTRACT

Understanding the genetic basis of the node of the first fruiting branch (NFFB) can improve early-maturity cotton breeding. Here we report QTL mapping by genotyping by sequencing (GBS) on 200 F2 plants and the derived F2:3 and F2:4 populations. A BC1F2 population was constructed by backcrossing one F2:4 line with the maternal parent JF914 and was used for BSA-seq for further QTL mapping. A total of 1,305,642 SNPs between the parents were identified by GBS, and 2,907,790 SNPs were detected by BSA-seq. A high-density genetic map containing 11,488 SNPs and spanning 4,202.12 cM was constructed. A total of 13 QTL were mapped in the 3 tested populations. JF914 conferred favorable alleles for 11 QTL, and JF173 conferred favorable alleles for the other 2. Two stable QTL, qNFFB-D3-1 and qNFFB-D6-1, were repeatedly mapped in F2:3 and F2:4. Only qNFFB-D3-1 explained more than 10% of the phenotypic variation; this QTL covered about 24.7 Mb (17,130,008-41,839,226 bp) on chromosome D3. BSA-seq identified two regions on D3 (41,779,195-41,836,120 bp and 41,836,768-41,872,287 bp) covering about 92.4 kb. This 92.4 kb region overlapped with the stable QTL qNFFB-D3-1 and contained 8 annotated genes. By qRT-PCR, Ghir_D03G012430 showed lower expression from the 1- to 2-leaf stage and higher expression from the 3- to 6-leaf stage in the buds of JF173 than in those of JF914. Ghir_D03G012390 reached its highest expression at the 3-leaf stage in the buds of JF173 and at the 5-leaf stage in those of JF914. As JF173 has a lower NFFB and earlier maturity than JF914, these two genes might be important for cell division and differentiation during NFFB formation at the seedling stage. The results of this study will facilitate a better understanding of the genetic basis of NFFB and benefit cotton molecular breeding for improved earliness.
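The interval figures quoted above (about 24.7 Mb for qNFFB-D3-1 and about 92.4 kb for the two BSA-seq regions) can be checked directly from the listed coordinates. The sketch assumes closed (inclusive) intervals; a half-open convention changes each length by 1 bp and does not affect the rounded values.

```python
def span_bp(start, end):
    """Length in bp of a closed genomic interval [start, end]."""
    return end - start + 1


def overlaps(a, b):
    """True if closed intervals a = (s1, e1) and b = (s2, e2) share any base."""
    return a[0] <= b[1] and b[0] <= a[1]


qtl = (17_130_008, 41_839_226)  # qNFFB-D3-1 on D3
bsa = [(41_779_195, 41_836_120), (41_836_768, 41_872_287)]

print(f"{span_bp(*qtl) / 1e6:.1f} Mb")                  # 24.7 Mb
print(f"{sum(span_bp(*r) for r in bsa) / 1e3:.1f} kb")  # 92.4 kb
print(all(overlaps(qtl, r) for r in bsa))               # True
```

The check also confirms that both BSA-seq regions touch the QTL interval, the second one only at its first few kilobases.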

19.
Article in English | MEDLINE | ID: mdl-37027748

ABSTRACT

Due to its simplicity, K-means has become a widely used clustering method. However, its clustering result is strongly affected by the initial centers, and its allocation strategy makes it hard to identify manifold clusters. Many improved K-means variants have been proposed to accelerate it and to improve the quality of the initial cluster centers, but few researchers have addressed the shortcoming of K-means in discovering arbitrary-shaped clusters. Using graph distance (GD) to measure the dissimilarity between objects is a good way to solve this problem, but computing the GD is time-consuming. Inspired by the granular-ball idea of representing local data with a ball, we select representatives from local neighborhoods, called natural density peaks (NDPs). On the basis of NDPs, we propose a novel K-means algorithm for identifying arbitrary-shaped clusters, called NDP-Kmeans. It defines a neighbor-based distance between NDPs and uses it to compute the GD between NDPs. Afterward, an improved K-means with high-quality initial centers and the GD is used to cluster the NDPs. Finally, each remaining object is assigned to the cluster of its representative. The experimental results show that our algorithm can recognize not only spherical clusters but also manifold clusters. Therefore, NDP-Kmeans has more advantages in detecting arbitrary-shaped clusters than other excellent algorithms.
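Graph distance replaces the straight-line distance with the length of the shortest path through a neighborhood graph, so points along a curved cluster stay "close" only through their neighbors. The sketch below builds a symmetric k-nearest-neighbor graph and runs Floyd-Warshall. Both are generic stand-ins: NDP-Kmeans defines its own neighbor-based distance between natural density peaks, which is not reproduced here, and the L-shaped point set is hypothetical.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights.

    Returns an n x n weight matrix; math.inf marks missing edges.
    """
    n = len(points)
    w = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            d = math.dist(points[i], points[j])
            w[i][j] = w[j][i] = d  # symmetrize
    return w


def graph_distances(w):
    """All-pairs shortest paths (Floyd-Warshall); math.inf marks disconnected pairs."""
    n = len(w)
    d = [row[:] for row in w]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


pts = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 3)]  # an L-shaped chain
d = graph_distances(knn_graph(pts, k=2))
print(d[0][6], round(math.dist(pts[0], pts[6]), 3))  # 6.0 4.243
```

Floyd-Warshall is cubic in the number of nodes, which is why running it on a small set of representatives rather than on every object matters.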

20.
Article in English | MEDLINE | ID: mdl-37023166

ABSTRACT

Hashing methods have had a major impact on cross-modal retrieval owing to their low storage and computation costs. Benefiting from the rich semantic information of labeled data, supervised hashing methods have shown better performance than unsupervised ones. Nevertheless, annotating training samples is expensive and labor-intensive, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, this article proposes a novel semisupervised hashing method, three-stage semisupervised hashing (TS3H), which handles both labeled and unlabeled data seamlessly. Unlike other semisupervised approaches that learn pseudolabels, hash codes and hash functions simultaneously, the new approach is decomposed into three stages, as its name implies, each conducted individually to make the optimization cost-effective and precise. Specifically, classifiers for the different modalities are first learned from the provided supervised information to predict the labels of the unlabeled data. Then, hash codes are learned with a simple but efficient scheme that unifies the provided and the newly predicted labels. To capture discriminative information and preserve semantic similarities, we leverage pairwise relations to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by mapping the training samples to the generated hash codes. The new approach is compared with state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experimental results verify its efficiency and superiority.
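Once modality-specific hash functions exist, cross-modal retrieval reduces to comparing short binary codes. The sketch below assumes linear hash functions of the form sign(Wx), one projection matrix per modality, and ranks text items against an image query by Hamming distance. The projection matrices and feature vectors are hypothetical and hand-picked rather than learned through TS3H's three stages.

```python
def hash_code(x, W):
    """Binary code b = sign(W x), stored as +1/-1 (zero maps to +1)."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(row, x)) >= 0 else -1 for row in W]


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def rank_by_hamming(query_code, database_codes):
    """Database indices sorted by Hamming distance to the query (ties by index)."""
    return sorted(range(len(database_codes)),
                  key=lambda i: (hamming(query_code, database_codes[i]), i))


# Hypothetical 2-bit hash functions for a 2-D image feature and a 3-D text feature.
W_img = [[1.0, 0.0], [0.0, 1.0]]
W_txt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

query = hash_code([0.5, -2.0], W_img)  # image query -> [1, -1]
texts = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [1.0, 1.0, 5.0]]
db = [hash_code(t, W_txt) for t in texts]
print(rank_by_hamming(query, db))  # [0, 2, 1]
```

Because both modalities land in the same code space, the comparison itself is modality-agnostic; in practice Hamming distance is computed with XOR and popcount on packed bits, which is where the storage and speed advantages come from.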
