Results 1 - 20 of 57
1.
Proteins ; 92(6): 776-794, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38258321

ABSTRACT

Three-dimensional (3D) structure information, now available at the proteome scale, may facilitate the detection of remote evolutionary relationships in protein superfamilies. Here, we illustrate this with the identification of a novel family of protein domains related to the ferredoxin-like superfold, by combining (i) transitive sequence similarity searches, (ii) clustering approaches, and (iii) the use of AlphaFold2 3D structure models. Domains of this family were initially identified in connection with the intracellular biomineralization of calcium carbonates by Cyanobacteria. They are part of the large heavy-metal-associated (HMA) superfamily, departing from the latter by specific sequence and structural features. In particular, most of them share conserved basic amino acids (hence their name CoBaHMA for Conserved Basic residues HMA), forming a positively charged surface, which is likely to interact with anionic partners. CoBaHMA domains are found in diverse modular organizations in bacteria, existing in the form of monodomain proteins or as part of larger proteins, some of which are membrane proteins involved in transport or lipid metabolism. This suggests that the CoBaHMA domains may exert a regulatory function, involving interactions with anionic lipids. This hypothesis might have a particular resonance in the context of the compartmentalization observed for cyanobacterial intracellular calcium carbonates.


Subject(s)
Amino Acid Sequence , Bacterial Proteins , Metals, Heavy , Models, Molecular , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Bacterial Proteins/genetics , Metals, Heavy/chemistry , Metals, Heavy/metabolism , Protein Domains , Cyanobacteria/metabolism , Cyanobacteria/chemistry , Cyanobacteria/genetics , Ferredoxins/chemistry , Ferredoxins/metabolism , Protein Folding
2.
Entropy (Basel) ; 25(10)2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37895553

ABSTRACT

Graph clustering is a fundamental and challenging task in unsupervised learning. It has achieved great progress due to contrastive learning. However, we find that there are two problems that need to be addressed: (1) The augmentations in most graph contrastive clustering methods are manual, which can result in semantic drift. (2) Contrastive learning is usually implemented on the feature level, ignoring the structure level, which can lead to sub-optimal performance. In this work, we propose a method termed Graph Clustering with High-Order Contrastive Learning (GCHCL) to solve these problems. First, we construct two views by Laplacian smoothing raw features with different normalizations and design a structure alignment loss to force these two views to be mapped into the same space. Second, we build a contrastive similarity matrix with two structure-based similarity matrices and force it to align with an identity matrix. In this way, our designed contrastive learning encompasses a larger neighborhood, enabling our model to learn clustering-friendly embeddings without the need for an extra clustering module. In addition, our model can be trained on a large dataset. Extensive experiments on five datasets validate the effectiveness of our model. For example, compared to the second-best baselines on four small and medium datasets, our model achieved an average improvement of 3% in accuracy. For the largest dataset, our model achieved an accuracy score of 81.92%, whereas the compared baselines encountered out-of-memory issues.
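
The first step of this abstract (two views built by Laplacian smoothing of raw features under different normalizations) can be illustrated with a minimal sketch. The abstract does not say which normalizations GCHCL uses; the symmetric and random-walk normalizations below, the self-loops, and the function name `laplacian_smooth` are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def laplacian_smooth(adj, feats, norm="sym", steps=1):
    """Smooth node features with a normalized adjacency filter.

    Assumption: self-loops are added and the filter is either
    D^-1/2 (A+I) D^-1/2 ("sym") or D^-1 (A+I) ("rw").
    """
    n = len(adj)
    # Adjacency with self-loops, so every node keeps part of its own feature.
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    if norm == "sym":
        h = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    elif norm == "rw":
        h = [[a[i][j] / deg[i] for j in range(n)] for i in range(n)]
    else:
        raise ValueError("norm must be 'sym' or 'rw'")
    x = [row[:] for row in feats]
    dims = len(x[0])
    for _ in range(steps):
        # One smoothing step: each node takes a weighted sum over its neighborhood.
        x = [[sum(h[i][k] * x[k][d] for k in range(n)) for d in range(dims)]
             for i in range(n)]
    return x


# Hypothetical 3-node path graph with a one-dimensional feature.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [0.0], [0.0]]
view_sym = laplacian_smooth(adj, feats, norm="sym")
view_rw = laplacian_smooth(adj, feats, norm="rw")
```

The two outputs play the role of the two views; an alignment loss of the kind the abstract describes would then be computed between them.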

3.
BMC Bioinformatics ; 23(1): 108, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-35354426

ABSTRACT

BACKGROUND: Biological sequence clustering is a complicated data clustering problem owing to the high computation costs incurred for pairwise sequence distance calculations through sequence alignments, as well as difficulties in determining parameters for deriving robust clusters. While current approaches are successful in reducing the number of sequence alignments performed, the generated clusters are based on a single sequence identity threshold applied to every cluster. A poor choice of this identity threshold would thus lead to low-quality clusters. There is, however, little support provided to users in selecting thresholds that are well matched with the input sequences. RESULTS: We present a novel sequence clustering approach called ALFATClust that exploits rapid pairwise alignment-free sequence distance calculations and graph community detection for cluster generation. Instead of a single threshold applied to every generated cluster, ALFATClust is capable of dynamically determining the cut-off threshold for each individual cluster by considering both cluster separation and intra-cluster sequence similarity. Benchmarking analysis shows that ALFATClust generally outperforms existing approaches by simultaneously maintaining cluster robustness and substantial cluster separation for the benchmark datasets. The software also provides an evaluation report for verifying the quality of the non-singleton clusters obtained. CONCLUSIONS: ALFATClust is able to generate sequence clusters having high intra-cluster sequence similarity and substantial separation between clusters without requiring users to decide precise similarity cut-off thresholds.


Subject(s)
Algorithms , Software , Benchmarking , Cluster Analysis , Sequence Alignment
4.
Sensors (Basel) ; 21(22)2021 Nov 12.
Article in English | MEDLINE | ID: mdl-34833582

ABSTRACT

With the rise of online/mobile transactions, the cost of cash-out has decreased and the cost of detection has increased. In the world of online/mobile payment in IoT, merchant accounts and credit cards can be applied for and approved online and used in the form of a QR code rather than a physical card or Point of Sale equipment, making it easy for these systems to be controlled by a group of fraudsters. In mainland China, where the credit card transaction fee is, on average, lower than a retail loan rate, the credit card cash-out option is attractive to people seeking funds for an investment or business operation, which, after investigation, can be considered unlawful if over a certain amount is used. Because cash-out incurs fees for the merchants while bringing money to the credit cards' owners, it is difficult to confirm, as nobody will declare or admit it. Furthermore, it is more difficult to detect cash-out groups than individuals, because cash-out groups are more hidden, which leads to bigger transaction amounts. We propose a new method for the detection of cash-out groups. First, the seed cards are mined and the seed cards' diffusion is then performed through the local graph clustering algorithm (Approximate PageRank, APR). Second, a merchant association network in IoT is constructed based on the suspicious cards, using the graph embedding algorithm (Node2Vec). Third, we use the clustering algorithm (DBSCAN) to cluster the nodes in the Euclidean space, which divides the merchants into groups. Finally, we design a method to classify the severity of the groups to facilitate the following risk investigation. The proposed method covers 145 of 195 known risky merchants in groups that acquire cash-out from four banks, which shows that this method can identify most (74.4%) cash-out groups. In addition, the proposed method identifies a further 178 cash-out merchants in the group within the same four acquirers, resulting in a total of 30,586 merchants. The results and framework have already been adopted and absorbed into the design for a cash-out group detection system in IoT by the Chinese payment processor.


Subject(s)
Algorithms , Confidentiality , China , Humans
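
The third step of the abstract above (DBSCAN over embedded merchant nodes in Euclidean space) can be sketched as follows. This is a plain textbook DBSCAN under stated assumptions: the 2-D points stand in for Node2Vec embeddings and are invented for illustration, and `eps` and `min_pts` are hypothetical values, not the parameters used in the study.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Brute-force range query; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = noise  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels


# Hypothetical embeddings: two tight groups and one isolated merchant.
embeddings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
groups = dbscan(embeddings, eps=1.5, min_pts=3)
```

Points labeled -1 are treated as noise, which in this setting would correspond to merchants not assigned to any group.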
5.
Entropy (Basel) ; 23(10)2021 Sep 28.
Article in English | MEDLINE | ID: mdl-34681995

ABSTRACT

Functional modules can be predicted using genome-wide protein-protein interactions (PPIs) from a systematic perspective. Various graph clustering algorithms have been applied to PPI networks for this task. In particular, the detection of overlapping clusters is necessary because a protein is involved in multiple functions under different conditions. Graph entropy (GE) is a novel metric to assess the quality of clusters in a large, complex network. In this study, the unweighted and weighted GE algorithms are evaluated to prove the validity of predicting functional modules. To measure clustering accuracy, the clustering results are compared to protein complexes and Gene Ontology (GO) annotations as references. We demonstrate that the GE algorithm is more accurate in overlapping clusters than the other competitive methods. Moreover, we confirm the biological feasibility of the proteins that occur most frequently in the set of identified clusters. Finally, novel proteins for the additional annotation of GO terms are revealed.
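
The abstract does not define graph entropy, so the following is only a sketch under an assumed definition for the unweighted case: each vertex contributes the binary entropy of the fraction of its neighbors that lie inside the candidate cluster, and the graph entropy is the sum over all vertices. The function names and the toy graph are hypothetical.

```python
import math


def vertex_entropy(graph, cluster, v):
    """Binary entropy of the inside/outside split of v's neighbors."""
    nbrs = graph[v]
    if not nbrs:
        return 0.0
    p_in = sum(1 for u in nbrs if u in cluster) / len(nbrs)
    p_out = 1.0 - p_in
    # Terms with probability 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)


def graph_entropy(graph, cluster):
    """Sum of vertex entropies; lower values mean a cleaner cluster boundary."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)


# Hypothetical toy network: a triangle a-b-c with a pendant vertex d on c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
full_triangle = graph_entropy(graph, {"a", "b", "c"})
partial = graph_entropy(graph, {"a", "b"})
```

Under this definition the full triangle scores lower than the partial cluster, so a search that greedily adds or removes vertices to reduce entropy would prefer it.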

6.
BMC Genomics ; 21(Suppl 9): 586, 2020 Sep 09.
Article in English | MEDLINE | ID: mdl-32900369

ABSTRACT

BACKGROUND: Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual's susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve accuracy of reconstruction, it also exacerbates computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. RESULTS: We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet - a novel assembly algorithm for diploid and polyploid haplotypes which allows both biallelic and multi-allelic variants. CONCLUSIONS: Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum tuberosum (potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Diploidy , Haplotypes , Humans , Polyploidy , Sequence Analysis, DNA
7.
Sensors (Basel) ; 21(1)2020 Dec 26.
Article in English | MEDLINE | ID: mdl-33375309

ABSTRACT

This paper introduces a system that can estimate the deformation process of a deformed flat object (folded plane) and generate the input data for a robot with human-like dexterous hands and fingers to reproduce the same deformation on another similar object. The system is based on processing RGB data and depth data with three core techniques: a weighted graph clustering method for non-rigid point matching and clustering; a refined region growing method for plane detection on depth data based on an offset error that we define; and a novel sliding checking model to check the bending line and adjacency relationship between each pair of planes. Through evaluation experiments, we show the improvement of the core techniques over conventional studies. By applying our approach to different deformed papers, the performance of the entire system is confirmed to have around 1.59 degrees of average angular error, which is similar to the smallest angular discrimination of human eyes. As a result, for the deformation of the flat object caused by folding, if our system can get at least one feature point cluster on each plane, it can get spatial information of each bending line and each plane with acceptable accuracy. The subject of this paper is a folded plane, but we will develop it into a robotic reproduction of general object deformation.

8.
BMC Bioinformatics ; 20(Suppl 13): 381, 2019 Jul 24.
Article in English | MEDLINE | ID: mdl-31337329

ABSTRACT

BACKGROUND: How can we obtain fast and high-quality clusters in genome-scale bio-networks? Graph clustering is a powerful tool applied to bio-networks to solve various biological problems such as protein complex detection, disease module detection, and gene function prediction. In particular, MCL (Markov Clustering) has been spotlighted due to its superior performance on bio-networks. MCL, however, is skewed towards finding a large number of very small clusters (size 1-3) and fails to detect many larger clusters (size 10+). To resolve this fragmentation problem, MLR-MCL (Multi-level Regularized MCL) has been developed. MLR-MCL still suffers from fragmentation and, in some cases, generates unrealistically large clusters. RESULTS: In this paper, we propose PS-MCL (Parallel Shotgun Coarsened MCL), a parallel graph clustering method outperforming MLR-MCL in terms of running time and cluster quality. PS-MCL adopts an efficient coarsening scheme, called SC (Shotgun Coarsening), to improve graph coarsening in MLR-MCL. SC allows merging multiple nodes at a time, which leads to improvement in quality, time and space usage. Also, PS-MCL parallelizes the main operations used in MLR-MCL, including matrix multiplication. CONCLUSIONS: Experiments show that PS-MCL dramatically alleviates the fragmentation problem, and outperforms MLR-MCL in quality and running time. We also show that the running time of PS-MCL is effectively reduced with parallelization.


Subject(s)
Algorithms , Proteins/metabolism , Cluster Analysis , Markov Chains , Protein Interaction Maps , Proteins/chemistry
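
For readers unfamiliar with the baseline that the abstract above builds on, here is a minimal sketch of plain MCL (alternating expansion and inflation on a column-stochastic matrix). It is not MLR-MCL or PS-MCL: there is no coarsening, regularization, or parallelism, and the inflation value, iteration cap, and pruning threshold are assumed defaults chosen for illustration.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster a graph given as a dense adjacency matrix; returns sorted node lists."""
    n = len(adj)
    # Self-loops are a standard MCL preprocessing step.
    m = normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: flow spreads along two-step walks
        inflated = normalize_columns(  # inflation: strong flows are boosted
            [[x ** inflation for x in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each nonzero row belongs to an attractor; its nonzero columns form a cluster.
    clusters = {frozenset(j for j, x in enumerate(row) if x > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical network: two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
clusters = mcl(adj)
```

The dense matrix product dominates the cost here, which is the operation the abstract says PS-MCL parallelizes.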
9.
BMC Bioinformatics ; 20(1): 225, 2019 May 02.
Article in English | MEDLINE | ID: mdl-31046665

ABSTRACT

BACKGROUND: Characterizing the modular structure of cellular networks is an important way to identify novel genes for targeted therapeutics. This is made possible by the rise of high-throughput technology. Unfortunately, computational methods to identify functional modules have been limited by the data quality issues of high-throughput techniques. This study aims to integrate knowledge extracted from the literature to further improve the accuracy of functional module identification. RESULTS: Our new model and algorithm were applied to both yeast and human interactomes. Predicted functional modules covered over 90% of the proteins in both organisms, while maintaining a comparable overall accuracy. We found that the combination of both mRNA expression information and biomedical knowledge greatly improved the performance of functional module identification, which is better than approaches using only a protein interaction network weighted with transcriptomic data, literature knowledge, or simply an unweighted protein interaction network. Our new algorithm also achieved better performance when compared with some other well-known methods, especially in terms of the positive predictive value (PPV), which indicates the confidence of novel discovery. CONCLUSION: Higher PPV with the multiplex approach suggests that information from both sources has been effectively integrated to reduce false positives. With protein coverage higher than 90%, our algorithm is able to generate more novel biological hypotheses with higher confidence.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Cluster Analysis , Gene Expression Profiling , Genes, Fungal , Humans , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
10.
BMC Genomics ; 20(1): 637, 2019 Aug 07.
Article in English | MEDLINE | ID: mdl-31390979

ABSTRACT

BACKGROUND: The detection of protein complexes is of great significance for researching mechanisms underlying complex diseases and developing new drugs. Thus, various computational algorithms have been proposed for protein complex detection. However, most of these methods are based on only topological information and are sensitive to the reliability of interactions. As a result, their performance is affected by false-positive interactions in PPINs. Moreover, these methods consider only density and modularity and ignore protein complexes with various densities and modularities. RESULTS: To address these challenges, we propose an algorithm to exploit protein complexes in PPINs by a Seed-Extended algorithm based on Density and Modularity with Topological structure and GO annotations, named SE-DMTG to improve the accuracy of protein complex detection. First, we use common neighbors and GO annotations to construct a weighted PPIN. Second, we define a new seed selection strategy to select seed nodes. Third, we design a new fitness function to detect protein complexes with various densities and modularities. We compare the performance of SE-DMTG with that of thirteen state-of-the-art algorithms on several real datasets. CONCLUSION: The experimental results show that SE-DMTG not only outperforms some classical algorithms in yeast PPINs in terms of the F-measure and Jaccard but also achieves an ideal performance in terms of functional enrichment. Furthermore, we apply SE-DMTG to PPINs of several other species and demonstrate the outstanding accuracy and matching ratio in detecting protein complexes compared with other algorithms.


Subject(s)
Algorithms , Computational Biology/methods , Molecular Sequence Annotation , Protein Interaction Mapping , Animals , Cluster Analysis , False Positive Reactions , Humans , Mice , Supervised Machine Learning
11.
Comput Stat Data Anal ; 132: 46-69, 2019 Apr.
Article in English | MEDLINE | ID: mdl-38774121

ABSTRACT

Clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables are presented. As opposed to standard approaches for graph clustering that assume known graph structures, the edge structure of the unknown graph is first estimated using sparse regression based approaches for sparse graph structure learning. Subsequently, graph clustering on the lower dimensional projections of the graph is performed based on Laplacian embeddings using a penalized k-means approach, motivated by Dirichlet process mixture models in Bayesian nonparametrics. In contrast to standard algorithmic approaches for known graphs, the proposed method allows estimation and inference for both graph structure learning and clustering. More importantly, the arguments for Laplacian embeddings as suitable projections for graph clustering are formalized by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. Fast computational algorithms are proposed to scale the method to a large number of nodes. Extensive simulations are presented to compare the clustering performance with standard methods. The methods are applied to a novel pan-cancer proteomic data set, and protein networks and clusters are evaluated across multiple different cancer types.

12.
J Biomed Inform ; 84: 42-58, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29906584

ABSTRACT

OBJECTIVE: Automatic text summarization offers an efficient solution to access the ever-growing amounts of both scientific and clinical literature in the biomedical domain by summarizing the source documents while maintaining their most informative contents. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. METHODS: Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document and map the document to the concepts. Then, it discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to propose a similarity function, based on which a representative graph is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover various subthemes of the document. Eventually, it generates the final summary by selecting the most informative and relevant sentences from all subthemes within the text. RESULTS: We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. CONCLUSION: This research suggests that the incorporation of domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) can enable the summarizer to target different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain.


Subject(s)
Cluster Analysis , Data Mining/methods , Medical Informatics/methods , Semantics , Algorithms , Electronic Health Records , Pattern Recognition, Automated , Unified Medical Language System
13.
BMC Bioinformatics ; 18(1): 530, 2017 Nov 29.
Article in English | MEDLINE | ID: mdl-29187152

ABSTRACT

BACKGROUND: Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to establish where a TF binds to DNA are well established, these methods provide no information describing how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. METHODS: Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. RESULTS: We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. CONCLUSION: The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.


Subject(s)
Algorithms , Transcription Factors/metabolism , Chromatin Immunoprecipitation , Cluster Analysis , DNA/chemistry , DNA/isolation & purification , DNA/metabolism , Gene Regulatory Networks , Humans , K562 Cells , Markov Chains , Protein Interaction Maps/genetics , Sequence Analysis, DNA , Transcription Factors/genetics
14.
Molecules ; 22(12)2017 Dec 08.
Article in English | MEDLINE | ID: mdl-29292776

ABSTRACT

Most proteins perform their biological functions while interacting as complexes. The detection of protein complexes is an important task not only for understanding the relationship between functions and structures of biological networks, but also for predicting the function of unknown proteins. We present a new nodal metric by integrating its local topological information. The metric reflects its representability in a larger local neighborhood to a cluster of a protein-protein interaction (PPI) network. Based on the metric, we propose a seed-expansion graph clustering algorithm (SEGC) for protein complex detection in PPI networks. A roulette wheel strategy is used in the selection of the seed to enhance the diversity of clustering. For a candidate node u, we define its closeness to a cluster C, denoted as NC(u, C), by combining the density of a cluster C and the connection between a node u and C. In SEGC, a cluster, which initially consists of only a seed node, is extended by adding nodes recursively from its neighbors according to the closeness, until all neighbors fail the process of expansion. We compare the F-measure and accuracy of the proposed SEGC algorithm with other algorithms on Saccharomyces cerevisiae protein interaction networks. The experimental results show that SEGC outperforms other algorithms under full coverage.


Subject(s)
Models, Biological , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/chemistry , Algorithms , Cluster Analysis , Databases, Protein , Protein Interaction Maps , Saccharomyces cerevisiae/chemistry
15.
BMC Bioinformatics ; 17 Suppl 7: 269, 2016 Jul 25.
Article in English | MEDLINE | ID: mdl-27454228

ABSTRACT

BACKGROUND: Protein-protein interaction networks are receiving increased attention due to their importance in understanding life at the cellular level. A major challenge in systems biology is to understand the modular structure of such biological networks. Although clustering techniques have been proposed for clustering protein-protein interaction networks, those techniques suffer from some drawbacks. The application of earlier clustering techniques to protein-protein interaction networks in order to predict protein complexes within the networks does not yield good results due to the small-world and power-law properties of these networks. RESULTS: In this paper, we construct a new clustering algorithm for predicting protein complexes through the use of genetic algorithms. We design an objective function for exclusive clustering and overlapping clustering. We assess the quality of our proposed clustering algorithm using two gold-standard data sets. CONCLUSIONS: Our algorithm can identify protein complexes that are significantly enriched in the gold-standard data sets. Furthermore, our method surpasses three competing methods: MCL, ClusterOne, and MCODE in terms of the quality of the predicted complexes. The source code and accompanying examples are freely available at http://faculty.kfupm.edu.sa/ics/eramadan/GACluster.zip .


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Cluster Analysis , Databases, Protein , Protein Interaction Maps
16.
Int J Mol Sci ; 17(6)2016 Jun 01.
Article in English | MEDLINE | ID: mdl-27258269

ABSTRACT

How can complex relationships among molecular or clinico-pathological entities of neurological disorders be represented and analyzed? Graphs seem to be the current answer to the question no matter the type of information: molecular data, brain images or neural signals. We review a wide spectrum of graph representation and graph analysis methods and their application in the study of both the genomic level and the phenotypic level of the neurological disorder. We find numerous research works that create, process and analyze graphs formed from one or a few data types to gain an understanding of specific aspects of the neurological disorders. Furthermore, with the increasing number of data of various types becoming available for neurological disorders, we find that integrative analysis approaches that combine several types of data are being recognized as a way to gain a global understanding of the diseases. Although there are still not many integrative analyses of graphs due to the complexity in analysis, multi-layer graph analysis is a promising framework that can incorporate various data types. We describe and discuss the benefits of the multi-layer graph framework for studies of neurological disease.


Subject(s)
Cluster Analysis , Models, Biological , Nervous System Diseases/etiology , Nervous System Diseases/metabolism , Animals , Brain/metabolism , Brain/physiopathology , Computer Simulation , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways , Nerve Net , Nervous System Diseases/pathology , Neural Pathways , Protein Interaction Maps , Signal Transduction
17.
Neural Netw ; 170: 405-416, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38029721

ABSTRACT

The multi-layer network consists of the interactions between different layers, where each layer of the network is depicted as a graph, providing a comprehensive way to model the underlying complex systems. The layer-specific modules of multi-layer networks are critical to understanding the structure and function of the system. However, existing methods fail to characterize and balance the connectivity and specificity of layer-specific modules in networks because of the complicated inter- and intra-coupling of various layers. To address the above issues, a joint learning graph clustering algorithm (DRDF) for detecting layer-specific modules in multi-layer networks is proposed, which simultaneously learns the deep representation and discriminative features. Specifically, DRDF learns the deep representation with deep nonnegative matrix factorization, where the high-order topology of the multi-layer network is gradually and precisely characterized. Moreover, it addresses the specificity of modules with discriminative feature learning, where the intra-class compactness and inter-class separation of pseudo-labels of clusters are explored as self-supervised information, thereby providing a more accurate method to explicitly model the specificity of the multi-layer network. Finally, DRDF balances the connectivity and specificity of layer-specific modules with joint learning, where the overall objective of the graph clustering algorithm and optimization rules are derived. The experiments on ten multi-layer networks showed that DRDF not only outperforms eight baselines on graph clustering but also enhances the robustness of algorithms.


Subject(s)
Discrimination Learning , Learning , Algorithms , Cluster Analysis , Information Management
18.
Comput Biol Med ; 169: 107852, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38134750

ABSTRACT

Establishing reference intervals (RIs) for pediatric patients is crucial in clinical decision-making, and there is a critical gap of pediatric RIs in China. However, the direct sampling technique for establishing RIs is resource-intensive and ethically challenging. Indirect estimation methods, such as unsupervised clustering algorithms, have emerged as potential alternatives for predicting reference intervals. This study introduces deep graph clustering methods into indirect estimation of pediatric reference intervals. Specifically, we propose a Density Graph Deep Embedded Clustering (DGDEC) algorithm, which incorporates a density feature extractor to enhance sample representation and provides additional perspectives for distinguishing different levels of health status among populations. Additionally, we construct an adjacency matrix by computing the similarity between samples after feature enhancement. The DGDEC algorithm leverages the adjacency matrix to capture the interrelationships between patients and divides patients into different groups, thereby estimating reference intervals for the potential healthy population. The experimental results demonstrate that when compared to other indirect estimation techniques, our method ensures the predicted pediatric reference intervals in different age and gender groups are closer to the true values while maintaining good generalization performance. Additionally, through ablation experiments, our study confirms that the similarity between patients and the multi-scale density features of samples can effectively describe the potential health status of patients.


Subject(s)
Algorithms , Child , Humans , Cluster Analysis
19.
PNAS Nexus ; 2(6): pgad180, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37287709

ABSTRACT

Graph clustering is a fundamental problem in machine learning with numerous applications in data science. State-of-the-art approaches to the problem, Louvain and Leiden, aim at optimizing the modularity function. However, their greedy nature leads to fast convergence to sub-optimal solutions. Here, we design a new approach to graph clustering, Tel-Aviv University (TAU), that efficiently explores the solution space using a genetic algorithm. We benchmark TAU on synthetic and real data sets and show its superiority over previous methods both in terms of the modularity of the computed solution and its similarity to a ground-truth partition when such exists. TAU is available at https://github.com/GalGilad/TAU.
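
The modularity function that Louvain, Leiden, and TAU optimize can be computed with a few lines. The sketch below uses the standard Newman modularity for an unweighted, undirected graph; it only scores a given partition and does not reproduce the genetic search described in the abstract. The function name and the toy graph are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition.

    edges: list of undirected (u, v) pairs without duplicates.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    intra = {}   # number of edges with both endpoints in community c
    degree = {}  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    # Q = sum over communities of (intra-edge fraction - expected fraction).
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Hypothetical graph: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {node: "A" for node in range(6)}
q_split = modularity(edges, by_triangle)
q_single = modularity(edges, single)
```

Splitting by triangle scores higher than a single community, which is the kind of comparison a modularity optimizer makes when it evaluates candidate partitions.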

20.
Proc Symp Appl Comput ; 2023: 518-527, 2023 Mar.
Article in English | MEDLINE | ID: mdl-37720922

ABSTRACT

Patients with cancer or other chronic diseases often experience different symptoms before or after treatments. The symptoms may be physical, gastrointestinal, psychological, cognitive (memory loss), or of other types. Previous research focuses on understanding the individual symptoms or symptom correlations by collecting data through symptom surveys and using traditional statistical methods to analyze the symptoms, such as principal component analysis or factor analysis. This research proposes a computational system, SymptomGraph, to identify the symptom clusters in the narrative text of written clinical notes in electronic health records (EHR). SymptomGraph uses a set of natural language processing (NLP) and artificial intelligence (AI) methods to first extract the clinician-documented symptoms from clinical notes. Then, a semantic symptom expression clustering method is used to discover a set of typical symptoms. A symptom graph is built based on the co-occurrences of the symptoms. Finally, a graph clustering algorithm is developed to discover the symptom clusters. Although SymptomGraph is applied to narrative clinical notes, it can be adapted to analyze symptom survey data. We applied SymptomGraph to a data set of colorectal cancer patients with and without type 2 diabetes to detect the patient symptom clusters one year after chemotherapy. Our results show that SymptomGraph can identify the typical symptom clusters of colorectal cancer patients post-chemotherapy. The results also show that colorectal cancer patients with diabetes often show more symptoms of peripheral neuropathy, younger patients have mental dysfunctions of alcohol or tobacco abuse, and patients at later cancer stages show more memory loss symptoms. Our system can be generalized to extract and analyze symptom clusters of other chronic diseases or acute diseases like COVID-19.
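
The graph-building step in this abstract (a symptom graph based on co-occurrences) can be sketched as follows. The abstract does not specify how co-occurrence is counted or thresholded, so counting once per note and the `min_count` cut-off are assumptions, and the symptom lists are invented for illustration rather than taken from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(notes, min_count=1):
    """Build a weighted symptom graph from per-note symptom lists.

    Returns a dict mapping an alphabetically ordered symptom pair to the
    number of notes in which both symptoms appear.
    """
    counts = Counter()
    for symptoms in notes:
        # Deduplicate within a note so each pair counts at most once per note.
        for a, b in combinations(sorted(set(symptoms)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical extracted symptoms, one list per clinical note.
notes = [
    ["fatigue", "nausea"],
    ["fatigue", "nausea", "neuropathy"],
    ["neuropathy"],
]
graph = cooccurrence_graph(notes)
strong_edges = cooccurrence_graph(notes, min_count=2)
```

The resulting weighted edge list is the kind of input a graph clustering algorithm would take to group symptoms into clusters.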
