1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36702755

ABSTRACT

Due to the high heterogeneity and complexity of cancers, patients with different cancer subtypes often have distinct groups of genomic and clinical characteristics. Therefore, the discovery and identification of cancer subtypes are crucial to cancer diagnosis, prognosis and treatment. Recent technological advances have accelerated the increasing availability of multi-omics data for cancer subtyping. To take advantage of the complementary information from multi-omics data, it is necessary to develop computational models that can represent and integrate different layers of data into a single framework. Here, we propose a decoupled contrastive clustering method (Subtype-DCC) based on multi-omics data integration for clustering to identify cancer subtypes. The idea of contrastive learning is introduced into deep clustering based on deep neural networks to learn clustering-friendly representations. Experimental results demonstrate the superior performance of the proposed Subtype-DCC model in identifying cancer subtypes over the currently available state-of-the-art clustering methods. The strength of Subtype-DCC is also supported by the survival and clinical analysis.


Subjects
Multiomics , Neoplasms , Humans , Algorithms , Genomics/methods , Neoplasms/genetics , Cluster Analysis , DCC Receptor
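As a rough illustration of the contrastive learning idea that Subtype-DCC builds on, the sketch below computes an instance-level InfoNCE-style loss for one anchor sample. It assumes cosine similarity and a temperature of 0.5; the function names and toy vectors are hypothetical and do not come from the Subtype-DCC implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Loss is small when the anchor is closer to its positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy embeddings (hypothetical): two augmented views of one patient should agree.
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimizing such a loss pulls positive pairs together and pushes negatives apart, which is what makes the learned representation easier to cluster.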
2.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37313714

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) measures transcriptome-wide gene expression at single-cell resolution. Clustering analysis of scRNA-seq data enables researchers to characterize cell types and states, shedding new light on cell-to-cell heterogeneity in complex tissues. Recently, self-supervised contrastive learning has become a prominent technique for learning underlying feature representations. However, for noisy, high-dimensional and sparse scRNA-seq data, existing methods still have difficulty capturing the intrinsic patterns and structures of cells and seldom use prior knowledge, which results in clusters that do not match the real situation. To this end, we propose scDECL, a novel deep enhanced constraint clustering algorithm for scRNA-seq data analysis based on contrastive learning and pairwise constraints. Specifically, a pre-training model based on interpolated contrastive learning is trained to learn the feature embedding, and clustering is then performed according to the constructed enhanced pairwise constraints. In the pre-training stage, a mixup data augmentation strategy and an interpolation loss are introduced to improve the diversity of the dataset and the robustness of the model. In the clustering stage, the prior information is converted into enhanced pairwise constraints to guide the clustering. To validate the performance of scDECL, we compare it with six state-of-the-art algorithms on six real scRNA-seq datasets. The experimental results demonstrate that the proposed algorithm outperforms the six competing methods. In addition, ablation studies on each module of the algorithm indicate that the modules are complementary and effective in improving the performance of the proposed algorithm. scDECL is implemented in Python using the PyTorch machine-learning library and is freely available at https://github.com/DBLABDHU/scDECL.


Subjects
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
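The mixup augmentation mentioned in the scDECL abstract can be sketched as follows. This is a minimal illustration that assumes the standard mixup formulation (a convex combination of two samples with a coefficient drawn from a Beta distribution); the function names, the `alpha` hyperparameter, and the toy expression vectors are hypothetical and are not taken from the scDECL code.

```python
import random

def mixup(x_a, x_b, lam):
    """Return the convex combination lam * x_a + (1 - lam) * x_b."""
    if len(x_a) != len(x_b):
        raise ValueError("profiles must have the same number of genes")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is an assumed setting."""
    return rng.betavariate(alpha, alpha)

# Two toy expression profiles over three genes (hypothetical values).
cell_a = [0.0, 2.0, 4.0]
cell_b = [4.0, 0.0, 0.0]
mixed = mixup(cell_a, cell_b, 0.25)
lam = sample_lambda()
```

Interpolated samples of this kind enlarge the training set without requiring labels, which is the stated purpose of the augmentation.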
3.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noises and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results of 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Subjects
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
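The Zero-Inflated Negative Binomial (ZINB) model named in the scDCCA abstract can be illustrated with its negative log-likelihood for a single count. The sketch assumes the usual mean/dispersion parameterization (`mu`, `theta`) plus a dropout probability `pi`; these parameter names and the toy values are assumptions for illustration, not the scDCCA code.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and dispersion theta."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x; pi is the probability of an extra (dropout) zero."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        prob = pi + (1.0 - pi) * nb   # a zero can be a dropout or a genuine NB zero
    else:
        prob = (1.0 - pi) * nb        # non-zero counts can only come from the NB part
    return -math.log(prob)

# Sanity check on toy parameters: the probabilities over all counts should sum to about 1.
total_probability = sum(math.exp(-zinb_nll(x, 2.0, 1.5, 0.3)) for x in range(200))
```

In an autoencoder, the decoder would output `mu`, `theta` and `pi` per gene, and the sum of these terms over all entries would serve as the reconstruction loss.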
4.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36715275

ABSTRACT

A large number of studies have used single-cell RNA sequencing (scRNA-seq) to study the diversity and biological functions of cells at the single-cell level. Clustering identifies unknown cell types, which is essential for downstream analysis of scRNA-seq samples. However, the high dimensionality, high noise and pervasive dropout rate of scRNA-seq samples pose a significant challenge to cluster analysis. Herein, we propose a new adaptive fuzzy clustering model based on a denoising autoencoder and a self-attention mechanism, called scDASFK. It uses comparative learning to integrate cell similarity information into the clustering method and uses a deep denoising network module to denoise the data. scDASFK includes a self-attention mechanism for further denoising, in which an adaptive clustering optimization function for iterative clustering is implemented. To make the denoised latent features better reflect the cell structure, we introduce a new adaptive feedback mechanism that supervises the denoising process through the clustering results. Experiments on 16 real scRNA-seq datasets show that scDASFK performs well in terms of clustering accuracy, scalability and stability. Overall, scDASFK is an effective clustering model with great potential for scRNA-seq sample analysis. The scDASFK code is freely available at https://github.com/LRX2022/scDASFK.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Algorithms
5.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38040491

ABSTRACT

Pancreatic cancer is a globally recognized highly aggressive malignancy, posing a significant threat to human health and characterized by pronounced heterogeneity. In recent years, researchers have uncovered that the development and progression of cancer are often attributed to the accumulation of somatic mutations within cells. However, cancer somatic mutation data exhibit characteristics such as high dimensionality and sparsity, which pose new challenges in utilizing these data effectively. In this study, we propagated the discrete somatic mutation data of pancreatic cancer through a network propagation model based on protein-protein interaction networks. This resulted in smoothed somatic mutation profile data that incorporate protein network information. Based on this smoothed mutation profile data, we obtained the activity levels of different metabolic pathways in pancreatic cancer patients. Subsequently, using the activity levels of various metabolic pathways in cancer patients, we employed a deep clustering algorithm to establish biologically and clinically relevant metabolic subtypes of pancreatic cancer. Our study holds scientific significance in classifying pancreatic cancer based on somatic mutation data and may provide a crucial theoretical basis for the diagnosis and immunotherapy of pancreatic cancer patients.


Subjects
Genomics , Pancreatic Neoplasms , Humans , Prognosis , Genomics/methods , Pancreatic Neoplasms/genetics , Mutation , Cluster Analysis
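The network propagation step described in this abstract can be sketched with a random walk with restart over a small interaction network. The abstract does not give the exact update rule, so the formula used here (F = alpha * F * W + (1 - alpha) * F0 with a row-normalized adjacency matrix W), the value of `alpha`, and the three-gene toy network are all assumptions for illustration.

```python
def normalize_rows(adj):
    """Divide each row of the adjacency matrix by its degree (rows of isolated nodes stay zero)."""
    out = []
    for row in adj:
        degree = sum(row)
        out.append([v / degree if degree else 0.0 for v in row])
    return out

def propagate(f0, adj, alpha=0.5, tol=1e-12, max_iter=1000):
    """Iterate F = alpha * F * W + (1 - alpha) * F0 until the scores stop changing."""
    w = normalize_rows(adj)
    n = len(f0)
    f = list(f0)
    for _ in range(max_iter):
        spread = [sum(f[i] * w[i][j] for i in range(n)) for j in range(n)]
        nxt = [alpha * s + (1.0 - alpha) * z for s, z in zip(spread, f0)]
        if max(abs(a - b) for a, b in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    return f

# Toy chain network gene0 - gene1 - gene2 with a single mutation in gene0 (hypothetical).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
mutation_profile = [1.0, 0.0, 0.0]
smoothed = propagate(mutation_profile, adjacency)
```

The binary profile becomes a dense one in which unmutated genes receive a score that decreases with their network distance from the mutated gene, which is how propagation reduces sparsity.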
6.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37020333

ABSTRACT

Molecular clustering analysis has been developed to facilitate visual inspection in the process of structure-based virtual screening. However, traditional methods based on molecular fingerprints or molecular descriptors limit the accuracy of selecting active hit compounds, which may be attributed to the lack of representations of receptor structural and protein-ligand interaction during the clustering. Here, a novel deep clustering framework named ClusterX is proposed to learn molecular representations of protein-ligand complexes and cluster the ligands. In ClusterX, the graph was used to represent the protein-ligand complex, and the joint optimisation can be used efficiently for learning the cluster-friendly features. Experiments on the KLIFs database show that the model can distinguish well between the binding modes of different kinase inhibitors. To validate the effectiveness of the model, the clustering results on the virtual screening dataset further demonstrated that ClusterX achieved better or more competitive performance against traditional methods, such as SIFt and extended connectivity fingerprints. This framework may provide a unique tool for clustering analysis and prove to assist computational medicinal chemists in visual decision-making.


Subjects
Ligands , Cluster Analysis
7.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34472585

ABSTRACT

Clustering and cell type classification are vital steps in analyzing scRNA-seq data to reveal the complexity of a tissue (e.g. the number of cell types and the transcription characteristics of each cell type). Recently, deep learning-based single-cell clustering algorithms have become popular because they integrate dimensionality reduction with clustering. However, these methods still give unstable clustering results for scRNA-seq datasets with high dropout rates or noise. In this study, a novel single-cell RNA-seq deep embedding clustering method via convolutional autoencoder embedding and soft K-means (scCAEs) is proposed, which learns the feature representation and the clustering simultaneously. It uses a convolutional autoencoder to characterize scRNA-seq data and proposes a regularized soft K-means algorithm to cluster cell populations in the learned latent space. Next, a novel constraint is introduced into the clustering objective function to iteratively optimize the clustering results, and, more importantly, it is theoretically proved that optimizing this objective function ensures convergence. Moreover, the reconstruction loss is added to the objective function, combining dimensionality reduction with clustering to find a more suitable embedding space for clustering. The proposed method is validated on a variety of datasets, in which the number of clusters ranges from 4 to 46 and the number of cells ranges from 90 to 30 302. The experimental results show that scCAEs is superior to other state-of-the-art methods on these datasets and also maintains satisfactory compatibility and robustness. In addition, for single-cell datasets with batch effects, scCAEs can preserve cell separation while removing the batch effects.


Subjects
Algorithms , Single-Cell Analysis , Cluster Analysis , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
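To make the soft K-means idea in the scCAEs abstract concrete, the sketch below performs one soft assignment step and one centroid update. It shows plain soft K-means with an assumed stiffness parameter `beta`; the regularization term and the extra constraint described in the abstract are not reproduced, and all names and data points are hypothetical.

```python
import math

def soft_assign(point, centroids, beta=1.0):
    """Responsibilities proportional to exp(-beta * squared distance), normalized to sum to 1."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    nearest = min(d2)  # subtract the smallest distance for numerical stability
    weights = [math.exp(-beta * (d - nearest)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

def update_centroids(points, resp, k):
    """Each centroid becomes the responsibility-weighted mean of all points."""
    dim = len(points[0])
    new = []
    for j in range(k):
        mass = sum(r[j] for r in resp)
        new.append([sum(r[j] * p[d] for r, p in zip(resp, points)) / mass
                    for d in range(dim)])
    return new

# Toy latent embeddings of four cells and two initial centroids (hypothetical).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
resp = [soft_assign(p, centroids) for p in points]
centroids = update_centroids(points, resp, k=2)
```

Because every assignment is a differentiable function of the embeddings, a loss built on it can be optimized jointly with an autoencoder, unlike hard K-means assignments.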
8.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35172334

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is of central importance for the analysis of scRNA-seq data, as it can be used to identify putative cell types. However, due to noise, high dimensionality and pervasive dropout events, clustering analysis of scRNA-seq data remains a computational challenge. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates structural information into deep clustering of single cells. The proposed scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervised module. To learn the data representation from the sparse and zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture the structural information among cells. By joining the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual-supervised strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experimental results on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. scDSC is implemented in Python using the PyTorch machine-learning library and is freely available at https://github.com/DHUDBlab/scDSC.


Subjects
Neural Networks, Computer , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
9.
BMC Bioinformatics ; 23(Suppl 3): 140, 2022 Apr 19.
Article in English | MEDLINE | ID: mdl-35439945

ABSTRACT

BACKGROUND: Chronic cough affects approximately 10% of adults. The lack of ICD codes for chronic cough makes it challenging to apply supervised learning methods to predict the characteristics of chronic cough patients, thereby requiring the identification of chronic cough patients by other mechanisms. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients based on data from a large cohort of 264,146 patients from the Electronic Medical Records (EMR) system. We constructed features using the diagnosis within the EMR, then built a clustering-oriented loss function directly on embedded features of the deep autoencoder to jointly perform feature refinement and cluster assignment. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients compared to the non-chronic cough patients. RESULTS: The experimental results show that the DCAE model generated three chronic cough clusters and one non-chronic cough patient cluster. We found various diagnoses, medications, and lab tests highly associated with chronic cough patients by comparing the chronic cough cluster with the non-chronic cough cluster. Comparison of chronic cough clusters demonstrated that certain combinations of medications and diagnoses characterize some chronic cough clusters. CONCLUSIONS: To the best of our knowledge, this study is the first to test the potential of unsupervised deep learning methods for chronic cough investigation, which also shows a great advantage over existing algorithms for patient data clustering.


Subjects
Deep Learning , Adult , Algorithms , Cluster Analysis , Cough , Humans
10.
BMC Bioinformatics ; 23(Suppl 4): 132, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428173

ABSTRACT

BACKGROUND: Converting molecules into computer-interpretable features with rich molecular information is a core problem of data-driven machine learning applications in chemical and drug-related tasks. Generally speaking, there are global and local features to represent a given molecule. As most algorithms have been developed based on one type of feature, a remaining bottleneck is to combine both feature sets for advanced molecule-based machine learning analysis. Here, we explored a novel analytical framework to make embeddings of the molecular features and apply them in the clustering of a large number of small molecules. RESULTS: In this novel framework, we first introduced a principal component analysis method encoding the molecule-specific atom and bond information. We then used a variational autoencoder (AE)-based method to make embeddings of the global chemical properties and the local atom and bond features. Next, using the embeddings from the encoded local and global features, we implemented and compared several unsupervised clustering algorithms to group the molecule-specific embeddings. The number of clusters was treated as a hyper-parameter and determined by the Silhouette method. Finally, we evaluated the corresponding results using three internal indices. Applying the analysis framework to a large chemical library of more than 47,000 molecules, we successfully identified 50 molecular clusters using the K-means method with 32 embeddings based on the AE method. We visualized the clustering result via t-SNE for the overall distribution of molecules and the similarity maps for the structural analysis of randomly selected cluster-specific molecules. CONCLUSIONS: This study developed a novel analytical framework that comprises a feature engineering scheme for molecule-specific atomic and bonding features and a deep learning-based embedding strategy for different molecular features. 
By applying the identified embeddings, we show their usefulness for clustering a large molecule dataset. Our novel analytic algorithms can be applied to any virtual library of chemical compounds with diverse molecular structures. Hence, these tools have the potential of optimizing drug discovery, as they can decrease the number of compounds to be screened in any drug screening campaign.


Subjects
Algorithms , Cluster Analysis , Principal Component Analysis
11.
Curr Genomics ; 23(5): 353-368, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36778191

ABSTRACT

Background: One major challenge in binning metagenomics data is the limited availability of reference datasets, as only 1% of the total microbial population has been cultured so far. This has motivated the use of unsupervised methods for binning in the absence of any reference datasets. Objective: To develop a deep clustering-based binning approach for metagenomics data and to evaluate the results with suitable measures. Methods: In this study, a deep learning-based approach is taken for binning metagenomics data. The results are validated on different datasets using features such as tetra-nucleotide frequency (TNF), hexa-nucleotide frequency (HNF) and GC content. A convolutional autoencoder is used for feature extraction, and the K-means clustering method is used for binning. Results: In most cases, evaluation measures such as the Silhouette index and the Rand index are above 0.5 and 0.8, respectively, which indicates that the proposed approach gives satisfactory results. The performance of the developed approach is compared with current methods and tools using benchmarked low-complexity simulated and real metagenomic datasets. It performs better than unsupervised methods and on par with semi-supervised methods. Conclusion: An unsupervised deep learning-based approach for binning has been proposed, and the developed method shows promising results on various datasets. This is a novel approach to the lack of reference data in metagenomic binning.
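The sequence features named in the Methods (tetra-nucleotide frequency, hexa-nucleotide frequency and GC content) can be computed as in the sketch below. It is a minimal illustration with hypothetical function names; it counts overlapping k-mers on the given strand only and skips windows containing ambiguous bases, whereas binning tools often also merge reverse complements, a detail the abstract does not specify.

```python
from itertools import product

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def kmer_frequencies(seq, k=4):
    """Relative frequency of every possible k-mer over A, C, G, T (k=4 gives TNF, k=6 gives HNF)."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with ambiguous bases such as N are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}

# Toy contig (hypothetical): yields a 256-dimensional TNF vector plus one GC value.
contig = "ACGTACGT"
tnf = kmer_frequencies(contig, k=4)
gc = gc_content(contig)
```

Each contig is thereby mapped to a fixed-length numeric vector regardless of its length, which is the form an autoencoder followed by K-means requires.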

12.
Ultrasound Med Biol ; 50(5): 703-711, 2024 05.
Article in English | MEDLINE | ID: mdl-38350787

ABSTRACT

OBJECTIVE: The aim of this study was to address the challenges posed by the manual labeling of fetal ultrasound images by introducing an unsupervised approach, the fetal ultrasound semantic clustering (FUSC) method. The primary objective was to automatically cluster a large volume of ultrasound images into various fetal views, reducing or eliminating the need for labor-intensive manual labeling. METHODS: The FUSC method was developed using a data set of 88,063 images. The methodology involves an unsupervised clustering approach to categorize ultrasound images into diverse fetal views. The method's effectiveness was further evaluated on an additional, unseen data set of 8187 images. The evaluation included assessment of the clustering purity, and the entire process is detailed to provide insights into the method's performance. RESULTS: The FUSC method achieved >92% clustering purity on the evaluation data set of 8187 images. The results indicate that it is feasible to cluster fetal ultrasound images automatically without relying on manual labeling. The study shows the potential of this approach for handling the large volume of ultrasound scans encountered in clinical practice, with implications for improving efficiency and accuracy in fetal ultrasound imaging. CONCLUSION: The findings of this investigation suggest that the FUSC method holds significant promise for the field of fetal ultrasound imaging. By automating the clustering of ultrasound images, this approach has the potential to reduce the manual labeling burden, making the process more efficient. The results pave the way for advanced automated labeling solutions, contributing to the enhancement of clinical practices in fetal ultrasound imaging. Our code is available at https://github.com/BioMedIA-MBZUAI/FUSC.


Subjects
Semantics , Ultrasonography, Prenatal , Pregnancy , Female , Humans , Pregnancy Trimester, Second , Ultrasonography, Prenatal/methods , Supervised Machine Learning , Cluster Analysis
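Clustering purity, the measure reported for FUSC, is computed by assigning each cluster to its most frequent true label and counting the fraction of samples that agree. The sketch below shows the standard definition; the function name, cluster ids and fetal-view labels are hypothetical toy values.

```python
from collections import Counter, defaultdict

def clustering_purity(cluster_ids, true_labels):
    """Fraction of samples whose true label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # For each cluster, count how many members carry its most common label.
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(true_labels)

# Six toy images in two clusters (hypothetical labels).
clusters = [0, 0, 0, 1, 1, 1]
views = ["head", "head", "abdomen", "abdomen", "abdomen", "femur"]
purity = clustering_purity(clusters, views)
```

Purity rises trivially as the number of clusters grows, so it is best read together with the number of clusters used.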
13.
Talanta ; 275: 126076, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38663070

ABSTRACT

Raman spectroscopy serves as a powerful and reliable tool for the characterization of pathogenic bacteria. The integration of Raman spectroscopy with artificial intelligence techniques to rapidly identify pathogenic bacteria has become paramount for expediting disease diagnosis. However, the development of prevailing supervised artificial intelligence algorithms is still constrained by costly and limited well-annotated Raman spectroscopy datasets. Furthermore, tackling various high-dimensional and intricate Raman spectra of pathogenic bacteria in the absence of annotations remains a formidable challenge. In this paper, we propose a concise and efficient deep clustering-based framework (RamanCluster) to achieve accurate and robust unsupervised Raman spectral identification of pathogenic bacteria without the need for any annotated data. RamanCluster is composed of a novel representation learning module and a machine learning-based clustering module, systematically enabling the extraction of robust discriminative representations and unsupervised Raman spectral identification of pathogenic bacteria. The extensive experimental results show that RamanCluster has achieved high accuracy on both Bacteria-4 and Bacteria-6, with ACC values of 77 % and 74.1 %, NMI values of 75 % and 73 %, as well as AMI values of 74.6 % and 72.6 %, respectively. Furthermore, compared with other state-of-the-art methods, RamanCluster exhibits the superior accuracy on handling various complicated pathogenic bacterial Raman spectroscopy datasets, including situations with strong noise and a wide variety of pathogenic bacterial species. Additionally, RamanCluster also demonstrates commendable robustness in these challenging scenarios. In short, RamanCluster has a promising prospect in accelerating the development of low-cost and widely applicable disease diagnosis in clinical medicine.


Subjects
Bacteria , Spectrum Analysis, Raman , Spectrum Analysis, Raman/methods , Bacteria/isolation & purification , Cluster Analysis , Algorithms
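Normalized mutual information (NMI), one of the measures reported for RamanCluster, can be computed from two label vectors as sketched below. The abstract does not say which normalization was used; this sketch assumes normalization by the arithmetic mean of the two entropies, and the function names and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((nij / n) * math.log(nij * n / (count_a[i] * count_b[j]))
               for (i, j), nij in joint.items())

def nmi(a, b):
    """Mutual information divided by the mean of the two entropies (assumed normalization)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings put everything in a single group
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)

# Cluster ids are arbitrary, so a relabeled perfect clustering still scores 1.
perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike accuracy, NMI needs no matching between cluster ids and class names, which is why it is commonly reported alongside ACC for unsupervised methods.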
14.
Neural Netw ; 171: 114-126, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38091755

ABSTRACT

Multi-view clustering has attracted growing attention owing to its powerful capacity of multi-source information integration. Although numerous advanced methods have been proposed in past decades, most of them generally fail to distinguish the unequal importance of multiple views to the clustering task and overlook the scale uniformity of learned latent representation among different views, resulting in blurry physical meaning and suboptimal model performance. To address these issues, in this paper, we propose a joint learning framework, termed Adaptive-weighted deep Multi-view Clustering with Uniform scale representation (AMCU). Specifically, to achieve more reasonable multi-view fusion, we introduce an adaptive weighting strategy, which imposes simplex constraints on heterogeneous views for measuring their varying degrees of contribution to consensus prediction. Such a simple yet effective strategy shows its clear physical meaning for the multi-view clustering task. Furthermore, a novel regularizer is incorporated to learn multiple latent representations sharing approximately the same scale, so that the objective for calculating clustering loss cannot be sensitive to the views and thus the entire model training process can be guaranteed to be more stable as well. Through comprehensive experiments on eight popular real-world datasets, we demonstrate that our proposal performs better than several state-of-the-art single-view and multi-view competitors.


Subjects
Learning , Cluster Analysis , Consensus
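The adaptive weighting with simplex constraints described for AMCU can be illustrated as follows. The abstract does not state how the constraint is enforced; this sketch assumes a softmax over one learnable score per view, which guarantees non-negative weights that sum to 1. All names and numbers are hypothetical.

```python
import math

def simplex_weights(scores):
    """Map unconstrained per-view scores to weights that are >= 0 and sum to 1 (softmax)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def consensus(view_predictions, weights):
    """Weighted average of the per-view cluster probability vectors."""
    k = len(view_predictions[0])
    return [sum(w * p[j] for w, p in zip(weights, view_predictions))
            for j in range(k)]

# Three views predicting over two clusters; view 0 has the highest score (toy values).
weights = simplex_weights([2.0, 0.0, -1.0])
fused = consensus([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]], weights)
```

Because the weights lie on the simplex, the fused prediction remains a valid probability vector and each weight can be read directly as a view's share of the consensus.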
15.
Neural Netw ; 175: 106287, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38593558

ABSTRACT

Deep multi-view clustering, which can obtain complementary information from different views, has received considerable attention in recent years. Although some efforts have been made and achieve decent performances, most of them overlook the structural information and are susceptible to poor quality views, which may seriously restrict the capacity for clustering. To this end, we propose Structural deep Multi-View Clustering with integrated abstraction and detail (SMVC). Specifically, multi-layer perceptrons are used to extract features from specific views, which are then concatenated to form the global features. Besides, a global target distribution is constructed and guides the soft cluster assignments of specific views. In addition to the exploitation of the top-level abstraction, we also design the mining of the underlying details. We construct instance-level contrastive learning using high-order adjacency matrices, which has an equivalent effect to graph attention network and reduces feature redundancy. By integrating the top-level abstraction and underlying detail into a unified framework, our model can jointly optimize the cluster assignments and feature embeddings. Extensive experiments on four benchmark datasets have demonstrated that the proposed SMVC consistently outperforms the state-of-the-art methods.


Subjects
Neural Networks, Computer , Cluster Analysis , Deep Learning , Algorithms , Humans
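A target distribution that guides soft cluster assignments, as mentioned in the SMVC abstract, is often built in the style of Deep Embedded Clustering (DEC). The sketch below assumes that style: a Student's t kernel for the soft assignments, a squared and frequency-normalized target, and a KL divergence between them. Whether SMVC uses exactly these formulas is not stated in the abstract; names and data are hypothetical.

```python
import math

def soft_assignments(embeddings, centers):
    """q[i][j]: Student's t similarity between embedding i and center j, normalized per sample."""
    q = []
    for z in embeddings:
        kernel = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, c)))
                  for c in centers]
        total = sum(kernel)
        q.append([v / total for v in kernel])
    return q

def target_distribution(q):
    """p[i][j] proportional to q[i][j]**2 / (soft size of cluster j), normalized per sample."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        sharpened = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(sharpened)
        p.append([v / total for v in sharpened])
    return p

def kl_divergence(p, q):
    """Clustering loss KL(P || Q) summed over all samples and clusters."""
    return sum(pi * math.log(pi / qi)
               for p_row, q_row in zip(p, q)
               for pi, qi in zip(p_row, q_row) if pi > 0.0)

# Four toy embeddings around two centers (hypothetical values).
embeddings = [[0.0, 0.0], [0.2, 0.0], [3.0, 0.0], [3.2, 0.0]]
centers = [[0.0, 0.0], [3.0, 0.0]]
q = soft_assignments(embeddings, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

The target is a sharpened copy of the current assignments, so minimizing the divergence pushes each sample toward the cluster it already favors.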
16.
Neural Netw ; 180: 106696, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39255633

ABSTRACT

Despite significant advances in the deep clustering research, there remain three critical limitations to most of the existing approaches. First, they often derive the clustering result by associating some distribution-based loss to specific network layers, neglecting the potential benefits of leveraging the contrastive sample-wise relationships. Second, they frequently focus on representation learning at the full-image scale, overlooking the discriminative information latent in partial image regions. Third, although some prior studies perform the learning process at multiple levels, they mostly lack the ability to exploit the interaction between different learning levels. To overcome these limitations, this paper presents a novel deep image clustering approach via Partial Information discrimination and Cross-level Interaction (PICI). Specifically, we utilize a Transformer encoder as the backbone, coupled with two types of augmentations to formulate two parallel views. The augmented samples, integrated with masked patches, are processed through the Transformer encoder to produce the class tokens. Subsequently, three partial information learning modules are jointly enforced, namely, the partial information self-discrimination (PISD) module for masked image reconstruction, the partial information contrastive discrimination (PICD) module for the simultaneous instance- and cluster-level contrastive learning, and the cross-level interaction (CLI) module to ensure the consistency across different learning levels. Through this unified formulation, our PICI approach for the first time, to our knowledge, bridges the gap between the masked image modeling and the deep contrastive clustering, offering a novel pathway for enhanced representation learning and clustering. Experimental results across six image datasets demonstrate the superiority of our PICI approach over the state-of-the-art. 
In particular, our approach achieves an ACC of 0.772 (0.634) on the RSOD (UC-Merced) dataset, which shows an improvement of 29.7% (24.8%) over the best baseline. The source code is available at https://github.com/Regan-Zhang/PICI.

17.
Neural Netw ; 180: 106684, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39243506

ABSTRACT

Image clustering aims to divide a set of unlabeled images into multiple clusters. Recently, clustering methods based on contrastive learning have attracted much attention due to their ability to learn discriminative feature representations. Nevertheless, existing clustering algorithms face challenges in capturing global information and preserving semantic continuity. Additionally, these methods often exhibit relatively singular feature distributions, limiting the full potential of contrastive learning in clustering. These problems can have a negative impact on the performance of image clustering. To address the above problems, we propose a deep clustering framework termed Efficient Contrastive Clustering via Pseudo-Siamese Vision Transformer and Multi-view Augmentation (ECCT). The core idea is to introduce Vision Transformer (ViT) to provide the global view, and improve it with Hilbert Patch Embedding (HPE) module to construct a new ViT branch. Finally, we fuse the features extracted from the two ViT branches to obtain both global view and semantic coherence. In addition, we employ multi-view random aggressive augmentation to broaden the feature distribution, enabling the model to learn more comprehensive and richer contrastive features. Our results on five datasets demonstrate that ECCT outperforms previous clustering methods. In particular, the ARI metric of ECCT on the STL-10 (ImageNet-Dogs) dataset is 0.852 (0.424), which is 10.3% (4.8%) higher than the best baseline.

18.
Med Image Anal ; 95: 103204, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38761438

ABSTRACT

Due to the intra-class diversity of mitotic cells and their morphological overlap with similar-looking impostors, automatic mitosis detection in histopathology slides is still a challenging task. In this paper, we propose a novel weakly supervised mitosis detection model, which consists of a candidate proposal network and a verification network. The candidate proposal network, based on patch learning, aims to separate both mitotic cells and their mimics from the background as candidate objects, which substantially reduces missed detections during candidate screening. The candidates are then fed into the verification network for mitosis refinement. The verification network adopts an RBF-based subcategorization scheme to deal with the high intra-class variability of mitosis and with mimics of similar appearance. We use the RBF centers to define subcategories containing mitotic cells with similar properties, and capture representative RBF center locations through joint training of classification and clustering. Because intra-class variation within a subcategory is lower, the localized feature space at the subcategory level can better characterize a given type of mitotic figure and provide a better similarity measure for distinguishing mitotic cells from nonmitotic cells. Our experiments show that this subcategorization scheme helps improve mitosis detection performance and achieves state-of-the-art results on publicly available mitosis datasets using only weak labels.
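To make the role of RBF centers in this abstract more concrete, the sketch below shows how a feature vector could be scored against a set of centers with a Gaussian radial basis function and assigned to the subcategory with the strongest response. It is an illustration only: the Gaussian form, the `gamma` parameter, the function names, and the toy two-dimensional centers are all assumptions, and the joint training of classification and clustering that the paper uses to learn the centers is not shown.

```python
import math


def rbf_activation(x, center, gamma=1.0):
    """Gaussian RBF response: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-gamma * sq_dist)


def assign_subcategory(x, centers, gamma=1.0):
    """Return the index of the most responsive center and all activations."""
    activations = [rbf_activation(x, c, gamma) for c in centers]
    best = max(range(len(centers)), key=lambda k: activations[k])
    return best, activations


# Hypothetical, hand-picked centers standing in for learned subcategory prototypes.
centers = [(0.0, 0.0), (3.0, 3.0)]
index, acts = assign_subcategory((2.5, 3.2), centers, gamma=0.5)
```

The response equals 1 when a feature coincides with a center and decays with distance, so each center mainly responds to features in its own neighborhood.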


Subjects
Breast Neoplasms , Mitosis , Mitosis/physiology , Humans , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Female , Computer-Assisted Image Interpretation/methods , Algorithms , Deep Learning
19.
Accid Anal Prev ; 208: 107779, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39299180

ABSTRACT

This study highlights the significance of understanding and categorizing driving styles to improve traffic safety and increase fuel efficiency. By analyzing a comprehensive dataset of naturalistic driving records from taxi drivers, it offers insight into driving behaviors in various environments. Utilizing deep clustering methodology, the research develops a novel framework for categorizing driving behaviors into Baseline Driving Characteristics (BDC), encompassing aspects such as turning, cruising, acceleration, and deceleration. These characteristics are instrumental in creating an abnormal driving index that serves as a quantitative measure for evaluating driving styles concerning traffic safety. Furthermore, the study elaborates on the utility of the abnormal driving index and its correlation with headway distances, enabling the formulation of personalized safety guidelines for drivers. This research contributes to the field of traffic safety by using the BDC to offer insight into driving behaviors. It lays the groundwork for future research aimed at enhancing driving behavior analysis through the integration of advanced driver assistance systems and exploration of linkages between the abnormal driving index and actual crash risk. The results of this study advance understanding of driving behaviors and their implications for traffic safety, paving the way for the development of broader and more effective safety measures in transportation.


Subjects
Traffic Accidents , Automobile Driving , Safety , Automobile Driving/psychology , Humans , Traffic Accidents/prevention & control , Male , Adult , Acceleration , Female , Deceleration , Middle Aged , Young Adult , Cluster Analysis
20.
Sci Rep ; 14(1): 13541, 2024 06 12.
Article in English | MEDLINE | ID: mdl-38866896

ABSTRACT

Single-cell ribonucleic acid sequencing (scRNA-seq) is a high-throughput genomic technique used to investigate single-cell transcriptomes. Cluster analysis can effectively reveal the heterogeneity and diversity of cells in scRNA-seq data, but existing clustering algorithms struggle with the inherent high dimensionality, noise, and sparsity of scRNA-seq data. To overcome these limitations, we propose a clustering algorithm: the Dual Correlation Reduction network-based Extreme Learning Machine (DCRELM). First, DCRELM obtains low-dimensional, dense features of the scRNA-seq data in an extreme learning machine (ELM) random mapping space. Second, the ELM graph distortion module is employed to obtain a dual view of the resulting features, effectively enhancing their robustness. Third, the autoencoder fusion module is employed to learn the attributes and structural information of the resulting features, and to merge these two types of information into consistent latent representations. Fourth, the dual information reduction network is used to filter redundant information and noise from the dual consistent latent representations. Last, a triplet self-supervised learning mechanism is used to further improve clustering performance. Extensive experiments show that DCRELM performs well in terms of clustering performance and robustness. The code is available at https://github.com/gaoqingyun-lucky/awesome-DCRELM.
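The first step in this abstract, mapping sparse expression profiles into an ELM random mapping space, can be pictured with a small sketch. In an extreme learning machine, hidden-layer weights are drawn at random and left untrained, so the mapping is a single fixed nonlinear projection. The code below is a hedged illustration under that general definition, not the DCRELM implementation. The function name `elm_random_mapping`, the uniform weight range, the sigmoid activation, and the toy count matrix are all assumptions.

```python
import math
import random


def elm_random_mapping(rows, hidden_units, seed=0):
    """Project each row through a fixed, randomly initialized sigmoid layer."""
    rng = random.Random(seed)  # fixed seed so the random projection is reproducible
    n_features = len(rows[0])
    # Random input weights and biases; in an ELM these are never trained.
    weights = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(n_features)]
    biases = [rng.uniform(-1, 1) for _ in range(hidden_units)]

    mapped = []
    for row in rows:
        hidden = []
        for j in range(hidden_units):
            z = biases[j] + sum(row[i] * weights[i][j] for i in range(n_features))
            hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gives dense outputs in (0, 1)
        mapped.append(hidden)
    return mapped


# Toy stand-in for a sparse cells-by-genes matrix (3 cells, 6 genes).
cells = [
    [0, 0, 5, 0, 1, 0],
    [2, 0, 0, 0, 0, 3],
    [0, 1, 0, 0, 0, 0],
]
dense_features = elm_random_mapping(cells, hidden_units=3)
```

Here six mostly-zero input features per cell become three dense values, which mirrors the "low-dimensional and dense" description in the abstract at toy scale.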


Subjects
Algorithms , Machine Learning , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , RNA-Seq/methods , Humans , RNA Sequence Analysis/methods , Single-Cell Gene Expression Analysis