Results 1 - 20 of 148
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38855914

ABSTRACT

Cluster analysis, a pivotal step in single-cell sequencing data analysis, presents substantial opportunities to effectively unveil the molecular mechanisms underlying cellular heterogeneity and intercellular phenotypic variations. However, the inherent imperfections arise as different clustering algorithms yield diverse estimates of cluster numbers and cluster assignments. This study introduces Single Cell Consistent Clustering based on Spectral Matrix Decomposition (SCSMD), a comprehensive clustering approach that integrates the strengths of multiple methods to determine the optimal clustering scheme. Testing the performance of SCSMD across different distances and employing the bespoke evaluation metric, the methodological selection undergoes validation to ensure the optimal efficacy of the SCSMD. A consistent clustering test is conducted on 15 authentic scRNA-seq datasets. The application of SCSMD to human embryonic stem cell scRNA-seq data successfully identifies known cell types and delineates their developmental trajectories. Similarly, when applied to glioblastoma cells, SCSMD accurately detects pre-existing cell types and provides finer sub-division within one of the original clusters. The results affirm the robust performance of our SCSMD method in terms of both the number of clusters and cluster assignments. Moreover, we have broadened the application scope of SCSMD to encompass larger datasets, thereby furnishing additional evidence of its superiority. The findings suggest that SCSMD is poised for application to additional scRNA-seq datasets and for further downstream analyses.


Subject(s)
Algorithms , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Cluster Analysis , Computational Biology/methods , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/metabolism
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36445207

ABSTRACT

Driven by multi-omics data, some multi-view clustering algorithms have been successfully applied to cancer subtype prediction, aiming to identify subtypes with biometric differences in the same cancer, thereby improving the clinical prognosis of patients and supporting the design of personalized treatment plans. Because the number of patients in omics data is much smaller than the number of genes, multi-view spectral clustering based on similarity learning has been widely developed. However, these algorithms still suffer from several problems, such as over-reliance of the clustering results on the quality of pre-defined similarity matrices, an inability to reasonably handle noise and redundant information in high-dimensional omics data, and neglect of complementary information between omics data. This paper proposes a multi-view spectral clustering with latent representation learning (MSCLRL) method to alleviate these problems. First, MSCLRL generates a corresponding low-dimensional latent representation for each omics data type, which effectively retains the unique information of each omics type and improves the robustness and accuracy of the similarity matrix. Second, MSCLRL assigns appropriate weights to the obtained latent representations and performs global similarity learning to generate an integrated similarity matrix. Third, the integrated similarity matrix is fed back to update the low-dimensional representation of each omics type. Finally, the final integrated similarity matrix is used for clustering. In 10 benchmark multi-omics datasets and 2 separate cancer case studies, the experiments confirmed that the proposed method obtained statistically and biologically meaningful cancer subtypes.


Subject(s)
Multiomics , Neoplasms , Humans , Algorithms , Neoplasms/genetics , Cluster Analysis
3.
Cereb Cortex ; 34(2)2024 01 31.
Article in English | MEDLINE | ID: mdl-38300216

ABSTRACT

The dorsolateral prefrontal cortex (DLPFC) assumes a central role in cognitive and behavioral control, emerging as a crucial target region for interventions in autism spectrum disorder neuroregulation. Consequently, we endeavor to unravel the functional subregions within the DLPFC to shed light on the intricate functions of the brain. We introduce a distance-constrained spectral clustering (SC-DW) methodology that leverages functional connectivity to identify distinctive functional subregions within the DLPFC. Furthermore, we verify the relationship between the functional characteristics of these subregions and their clinical implications. Our methodology begins with principal component analysis to extract the salient features. Subsequently, we construct an adjacency matrix, constrained by the spatial properties of the brain, by linearly combining the distance matrix and a similarity matrix. The quality of spectral clustering is further optimized through multiple cluster evaluation coefficients. The results from SC-DW revealed four uniform and contiguous subregions within the bilateral DLPFC. Notably, we observe a substantial positive correlation between the functional characteristics of the third and fourth subregions in the left DLPFC and clinical manifestations. These findings underscore the unique insights offered by our proposed methodology in the realms of brain subregion delineation and therapeutic targeting.
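
As a rough illustration of the adjacency construction described in this abstract, the sketch below blends a functional similarity matrix with a spatial term. The abstract does not give the exact formula, so the mixing weight `alpha`, the length scale `sigma`, and the Gaussian conversion of distances into proximities are all assumptions made for this example, not the authors' SC-DW definition.

```python
import math

def combined_adjacency(similarity, distance, alpha=0.5, sigma=1.0):
    """Blend functional similarity with spatial proximity (illustrative only).

    similarity: n x n symmetric matrix of functional similarities.
    distance:   n x n symmetric matrix of spatial distances.
    alpha:      hypothetical mixing weight in [0, 1].
    sigma:      hypothetical length scale for the distance kernel.
    """
    n = len(similarity)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # keep the diagonal at zero (no self-loops)
            # Assumption: distances enter through a Gaussian kernel so that
            # nearby voxels receive a larger weight than distant ones.
            proximity = math.exp(-(distance[i][j] ** 2) / (2 * sigma ** 2))
            adjacency[i][j] = alpha * similarity[i][j] + (1 - alpha) * proximity
    return adjacency
```

With `alpha` close to 1 the graph is driven by functional similarity alone; lowering it favors spatially contiguous clusters, which is the qualitative effect the abstract attributes to the distance constraint.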


Subject(s)
Autism Spectrum Disorder , Dorsolateral Prefrontal Cortex , Humans , Magnetic Resonance Imaging/methods , Brain Mapping/methods , Autism Spectrum Disorder/diagnostic imaging , Prefrontal Cortex/diagnostic imaging , Prefrontal Cortex/physiology , Cluster Analysis
4.
Neuroimage ; 297: 120739, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39009250

ABSTRACT

Heritability and genetic covariance/correlation quantify the marginal and shared genetic effects across traits. They offer insights on the genetic architecture of complex traits and diseases. To explore how genetic variations contribute to brain function variations, we estimated heritability and genetic correlation across cortical thickness, surface area, and volume of 33 anatomically predefined regions in left and right hemispheres, using summary statistics of genome-wide association analyses of 31,968 participants in the UK Biobank. To characterize the relationships between these regions of interest, we constructed a genetic network for these regions using recursive two-way cut-offs in similarity matrices defined by genetic correlations. The inferred genetic network matches the brain lobe mapping more closely than the network inferred from phenotypic similarities. We further studied the associations between the genetic network for brain regions and 30 complex traits through a novel composite-linkage disequilibrium score regression method. We identified seven significant pairs, which offer insights on the genetic basis for regions of interest mediated by cortical measures.


Subject(s)
Cerebral Cortex , Genome-Wide Association Study , Humans , Cerebral Cortex/diagnostic imaging , Cerebral Cortex/anatomy & histology , Female , Male , Magnetic Resonance Imaging , Middle Aged , Gene Regulatory Networks/genetics , Polymorphism, Single Nucleotide , Aged , Models, Genetic
5.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35419595

ABSTRACT

The limitations of bulk sequencing techniques in analyzing cell heterogeneity and diversity have been pushed back by the development of single-cell RNA-sequencing (scRNA-seq). Detecting clusters of cells is a key step in the analysis of scRNA-seq data. However, the high dimensionality of scRNA-seq data and imbalances in the number of different subcellular types are ubiquitous in real scRNA-seq data sets, which pose a huge challenge to single-cell-type detection. We propose a meta-learning-based model, SiaClust, which combines a Siamese Convolutional Neural Network (CNN) with improved spectral clustering, to achieve scRNA-seq cell type detection. Specifically, with the help of the constrained Sigmoid kernel, the raw high-dimensional data are mapped to a low-dimensional space, and the Siamese CNN learns the differences between the cell types in the low-dimensional feature space. The similarity matrix learned by the Siamese CNN is used in combination with improved spectral clustering and t-distributed Stochastic Neighbor Embedding (t-SNE) for visualization. SiaClust highlights the differences between cell types by comparing the similarity of the samples while blurring the differences within cell types, which makes it better at processing high-dimensional and imbalanced data. In an extensive evaluation on data generated from nine different species and tissues through different scRNA-seq protocols, SiaClust significantly improves clustering accuracy in comparison with state-of-the-art single-cell clustering models. More importantly, SiaClust accurately locates the exact sites of dropout genes and is more flexible with respect to data size and cell type.


Subject(s)
Algorithms , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
6.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35255494

ABSTRACT

Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream technologies in the field of structural biology to determine the three-dimensional (3D) structures of biological macromolecules. Heterogeneous cryo-EM projection image classification is an effective way to discover conformational heterogeneity of biological macromolecules in different functional states. However, due to the low signal-to-noise ratio of the projection images, the classification of heterogeneous cryo-EM projection images is a very challenging task. In this paper, two novel distance measures between projection images integrating the reliability of common lines, pixel intensity and class averages are designed, and then a two-stage spectral clustering algorithm based on the two distance measures is proposed for heterogeneous cryo-EM projection image classification. In the first stage, the novel distance measure integrating common lines and pixel intensities of projection images is used to obtain preliminary classification results through spectral clustering. In the second stage, another novel distance measure integrating the first novel distance measure and class averages generated from each group of projection images is used to obtain the final classification results through spectral clustering. The proposed two-stage spectral clustering algorithm is applied on a simulated and a real cryo-EM dataset for heterogeneous reconstruction. Results show that the two novel distance measures can be used to improve the classification performance of spectral clustering, and using the proposed two-stage spectral clustering algorithm can achieve higher classification and reconstruction accuracy than using RELION and XMIPP.


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Cluster Analysis , Cryoelectron Microscopy/methods , Image Processing, Computer-Assisted/methods , Reproducibility of Results , Signal-To-Noise Ratio
7.
Bull Math Biol ; 86(9): 105, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995438

ABSTRACT

The growing complexity of biological data has spurred the development of innovative computational techniques to extract meaningful information and uncover hidden patterns within vast datasets. Biological networks, such as gene regulatory networks and protein-protein interaction networks, hold critical insights into biological features' connections and functions. Integrating and analyzing high-dimensional data, particularly in gene expression studies, stands prominent among the challenges in deciphering these networks. Clustering methods play a crucial role in addressing these challenges, with spectral clustering emerging as a potent unsupervised technique considering intrinsic geometric structures. However, spectral clustering's user-defined cluster number can lead to inconsistent and sometimes orthogonal clustering regimes. We propose the Multi-layer Bundling (MLB) method to address this limitation, combining multiple prominent clustering regimes to offer a comprehensive data view. We call the outcome clusters "bundles". This approach refines clustering outcomes, unravels hierarchical organization, and identifies bridge elements mediating communication between network components. By layering clustering results, MLB provides a global-to-local view of biological feature clusters enabling insights into intricate biological systems. Furthermore, the method enhances bundle network predictions by integrating the bundle co-cluster matrix with the affinity matrix. The versatility of MLB extends beyond biological networks, making it applicable to various domains where understanding complex relationships and patterns is needed.
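
To make the idea of layering several clustering results more concrete, here is a minimal sketch of a co-cluster matrix built from multiple label vectors. The abstract does not say how the bundle co-cluster matrix is integrated with the affinity matrix, so the element-wise product in `integrate` is purely an assumption for illustration, and both function names are hypothetical.

```python
def co_cluster_matrix(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster.

    labelings: list of label vectors, one per clustering run, all of length n.
    """
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]

def integrate(co_cluster, affinity):
    """Hypothetical integration step: element-wise product of the two matrices."""
    n = len(co_cluster)
    return [[co_cluster[i][j] * affinity[i][j] for j in range(n)] for i in range(n)]
```

Pairs that are grouped together under every choice of cluster number keep their full affinity, while pairs that are only occasionally grouped are down-weighted.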


Subject(s)
Algorithms , Computational Biology , Gene Regulatory Networks , Mathematical Concepts , Protein Interaction Maps , Cluster Analysis , Humans , Models, Biological , Gene Expression Profiling/statistics & numerical data , Gene Expression Profiling/methods
8.
Sensors (Basel) ; 24(4)2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38400499

ABSTRACT

Underwater acoustic technology, an important means of exploring the oceans, is receiving more attention. Denoising underwater acoustic information in complex marine environments has become a hot research topic. To denoise hydrophone signals, this paper proposes a joint denoising method based on improved symplectic geometry modal decomposition (ISGMD) and wavelet threshold (WT). First, the energy contribution (EC) is introduced into the SGMD as an iterative termination condition, which efficiently improves the denoising capability of SGMD and generates a reasonable number of symplectic geometry components (SGCs). Then spectral clustering (SC) is used to accurately aggregate SGCs into information clusters, mixed clusters, and noise clusters. Spectrum entropy (SE) is used to distinguish clusters quickly. Finally, the mixed clusters are denoised by wavelet threshold, and the useful information is reconstructed to denoise the original signal. In the simulation experiment, the denoising effects of different denoising algorithms in the time domain and frequency domain are compared, with SNR and RMSE used as evaluation indexes. The results show that the proposed algorithm has better performance. The denoising ability of the proposed algorithm is also verified in the hydrophone experiment.
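
The abstract does not define its spectrum entropy measure. A common definition, assumed here, is the Shannon entropy of the normalized one-sided power spectrum: a component dominated by a few frequencies scores low, while a broadband, noise-like component scores high. The sketch below uses a plain discrete Fourier transform from the standard library; the function names are hypothetical.

```python
import cmath
import math

def power_spectrum(signal):
    """One-sided power spectrum via a direct discrete Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_entropy(signal):
    """Shannon entropy (natural log) of the normalized power spectrum."""
    power = power_spectrum(signal)
    total = sum(power)
    if total == 0:
        return 0.0  # an all-zero signal carries no spectral information
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under this assumed definition, thresholding the entropy of each cluster's summed components would be one way to separate information-like from noise-like clusters; the threshold itself would have to be chosen empirically.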

9.
J Sci Food Agric ; 104(7): 4309-4319, 2024 May.
Article in English | MEDLINE | ID: mdl-38305465

ABSTRACT

BACKGROUND: Due to the scalability of deep learning technology, researchers have applied it to the non-destructive testing of peach internal quality. In addition, the soluble solids content (SSC) is an important internal quality indicator that determines the quality of peaches. Peaches with high SSC have a sweeter taste and better texture, making them popular in the market. Therefore, SSC is an important indicator for measuring peach internal quality and making harvesting decisions. RESULTS: This article presents the High Order Spatial Interaction Network (HOSINet), which combines the Position Attention Module (PAM) and Channel Attention Module (CAM). Additionally, a feature wavelength selection algorithm similar to the Group-based Clustering Subspace Representation (GCSR-C) is used to establish the Position and Channel Attention Module-High Order Spatial Interaction (PC-HOSI) model for peach SSC prediction. The accuracy of this model is compared with traditional machine learning and traditional deep learning models. Finally, the permutation algorithm is combined with deep learning models to visually evaluate the importance of feature wavelengths. Increasing the order of the PC-HOSI model enhances its ability to learn spatial correlations in the dataset, thus improving its predictive performance. CONCLUSION: The optimal model, PC-HOSI model, performed well with an order of 3 (PC-HOSI-3), with a root mean square error of 0.421 °Brix and a coefficient of determination of 0.864. Compared with traditional machine learning and deep learning algorithms, the coefficient of determination for the prediction set was improved by 0.07 and 0.39, respectively. The permutation algorithm also provided interpretability analysis for the predictions of the deep learning model, offering insights into the importance of spectral bands. These results contribute to the accurate prediction of SSC in peaches and support research on interpretability of neural network models for prediction. 
© 2024 Society of Chemical Industry.


Subject(s)
Prunus persica , Spectroscopy, Near-Infrared/methods , Least-Squares Analysis , Algorithms , Neural Networks, Computer
10.
J Proteome Res ; 22(5): 1501-1509, 2023 05 05.
Article in English | MEDLINE | ID: mdl-36802412

ABSTRACT

Liquid chromatography coupled with tandem mass spectrometry is commonly adopted in large-scale glycoproteomic studies involving hundreds of disease and control samples. The software for glycopeptide identification in such data (e.g., the commercial software Byonic) analyzes the individual data set and does not exploit the redundant spectra of glycopeptides presented in the related data sets. Herein, we present a novel concurrent approach for glycopeptide identification in multiple related glycoproteomic data sets by using spectral clustering and spectral library searching. The evaluation on two large-scale glycoproteomic data sets showed that the concurrent approach can identify 105%-224% more spectra as glycopeptides compared to the glycopeptide identification on individual data sets using Byonic alone. The improvement of glycopeptide identification also enabled the discovery of several potential biomarkers of protein glycosylations in hepatocellular carcinoma patients.


Subject(s)
Liver Neoplasms , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Glycopeptides/analysis , Chromatography, Liquid , Software
11.
J Proteome Res ; 22(6): 1639-1648, 2023 06 02.
Article in English | MEDLINE | ID: mdl-37166120

ABSTRACT

As current shotgun proteomics experiments can produce gigabytes of mass spectrometry data per hour, processing these massive data volumes has become progressively more challenging. Spectral clustering is an effective approach to speed up downstream data processing by merging highly similar spectra to minimize data redundancy. However, because state-of-the-art spectral clustering tools fail to achieve optimal runtimes, this simply moves the processing bottleneck. In this work, we present a fast spectral clustering tool, HyperSpec, based on hyperdimensional computing (HDC). HDC shows promising clustering capability while only requiring lightweight binary operations with high parallelism that can be optimized using low-level hardware architectures, making it possible to run HyperSpec on graphics processing units to achieve extremely efficient spectral clustering performance. Additionally, HyperSpec includes optimized data preprocessing modules to reduce the spectrum preprocessing time, which is a critical bottleneck during spectral clustering. Based on experiments using various mass spectrometry data sets, HyperSpec produces results with comparable clustering quality as state-of-the-art spectral clustering tools while achieving speedups by orders of magnitude, shortening the clustering runtime of over 21 million spectra from 4 h to only 24 min.


Subject(s)
Algorithms , Peptides , Peptides/analysis , Mass Spectrometry/methods , Proteomics/methods , Cluster Analysis
12.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33003206

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) data widely exist in bioinformatics. It is crucial to devise a distance metric for scRNA-seq data. Almost all existing clustering methods based on spectral clustering algorithms work in three separate steps: similarity graph construction; continuous labels learning; discretization of the learned labels by k-means clustering. However, this common practice has potential flaws that may lead to severe information loss and degradation of performance. Furthermore, the performance of a kernel method is largely determined by the selected kernel; a self-weighted multiple kernel learning model can help choose the most suitable kernel for scRNA-seq data. To this end, we propose to automatically learn similarity information from data. We present a new clustering method in the form of a multiple kernel combination that can directly discover groupings in scRNA-seq data. The main proposition is that automatically learned similarity information from scRNA-seq data is used to transform the candidate solution into a new solution that better approximates the discrete one. The proposed model can be efficiently solved by the standard support vector machine (SVM) solvers. Experiments on benchmark scRNA-Seq data validate the superior performance of the proposed model. Spectral clustering with multiple kernels is implemented in Matlab, licensed under Massachusetts Institute of Technology (MIT) and freely available from the Github website, https://github.com/Cuteu/SMSC/.
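
The three separate steps that this abstract criticizes (similarity graph construction, continuous label learning, k-means discretization) can be sketched for the two-cluster case as follows. This is the conventional pipeline, not the multiple kernel method proposed in the paper. The Gaussian kernel, the unnormalized Laplacian, the power-iteration eigen-solver, and all function names are choices made for this example.

```python
import math

def gaussian_similarity(points, sigma=1.0):
    """Step 1: build a fully connected similarity graph from coordinates."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-sq / (2 * sigma ** 2))
    return weights

def fiedler_vector(weights, iterations=500):
    """Step 2: continuous labels, the second eigenvector of L = D - W."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2.0 * max(degree)  # Gershgorin bound on the largest eigenvalue of L
    vec = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(vec) / n
        vec = [v - mean for v in vec]  # project out the constant eigenvector
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        # Power iteration on (shift * I - L), using L v = D v - W v.
        vec = [(shift - degree[i]) * vec[i]
               + sum(weights[i][j] * vec[j] for j in range(n))
               for i in range(n)]
    return vec

def two_means_1d(values, iterations=20):
    """Step 3: discretize the continuous labels with 1-D k-means, k = 2."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, l in zip(values, labels) if l == 0]
        group1 = [v for v, l in zip(values, labels) if l == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return labels

def spectral_bipartition(points, sigma=1.0):
    """Run the three steps in sequence and return a 0/1 label per point."""
    return two_means_1d(fiedler_vector(gaussian_similarity(points, sigma)))
```

Because each step is optimized in isolation, errors in the fixed similarity graph or in the final rounding cannot be corrected by the other steps, which is the information loss the abstract refers to.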


Subject(s)
Algorithms , Databases, Nucleic Acid , RNA-Seq , Single-Cell Analysis , Cluster Analysis
13.
Biometrics ; 79(2): 940-950, 2023 06.
Article in English | MEDLINE | ID: mdl-35338489

ABSTRACT

High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called spectral clustering with feature selection (SC-FS), where we first obtain an initial estimate of labels via spectral clustering, then select a small fraction of features with the largest R-squared with these labels, that is, the proportion of variation explained by group labels, and conduct clustering again using selected features. Under mild conditions, we prove that the proposed method identifies all informative features with high probability and achieves the minimax optimal clustering error rate for the sparse Gaussian mixture model. Applications of SC-FS to four real-world datasets demonstrate its usefulness in clustering high-dimensional data.
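
The feature-screening step described in this abstract can be illustrated with a short sketch. It assumes an initial label vector is already available (in SC-FS it would come from a first round of spectral clustering) and ranks features by R-squared, the share of each feature's variance explained by the group labels. The function names and the `keep` parameter are hypothetical.

```python
def r_squared(feature, labels):
    """Proportion of a feature's variance explained by the group labels."""
    n = len(feature)
    grand_mean = sum(feature) / n
    total = sum((x - grand_mean) ** 2 for x in feature)
    if total == 0:
        return 0.0  # a constant feature explains nothing
    groups = {}
    for x, label in zip(feature, labels):
        groups.setdefault(label, []).append(x)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    return between / total

def select_features(data, labels, keep):
    """Return the indices of the `keep` features with the largest R-squared.

    data: list of samples, each a list of feature values.
    """
    n_features = len(data[0])
    scores = [r_squared([row[j] for row in data], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])
```

Clustering would then be repeated on the selected columns only, which is the second pass the abstract describes.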


Subject(s)
Algorithms , Machine Learning , Cluster Analysis
14.
Molecules ; 28(6)2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36985742

ABSTRACT

Optical spectroscopic analysis of the chemical composition of milk in its natural state is complicated by a complex colloidal structure, represented by differently sized fat and protein particles. The classical techniques of molecular spectroscopy in the visible, near-, and mid-infrared ranges carry only bulk chemical information about a sample, which usually undergoes a destructive preparation stage. The combination of Raman spectroscopy with confocal microscopy provides a unique opportunity to obtain a vibrational spectrum at any single point of the sample volume. In this study, scanning confocal Raman microscopy was applied for the first time to investigate the chemical microstructure of milk using samples of various compositions. The obtained hyperspectral images of selected planes in milk samples are represented by three-dimensional data arrays. Chemometric data analysis, in particular the method of multivariate curve resolution, has been used to extract the chemical information from complex partially overlaid spectral responses. The results obtained show the spatial distribution of the main chemical components, i.e., fat, protein, and lactose, in the milk samples under study using intuitive graphical maps. The proposed experimental and data analysis method can be used in an advanced chemical analysis of natural milk and products on its basis.


Subject(s)
Hyperspectral Imaging , Milk , Animals , Spectrum Analysis, Raman/methods , Microscopy, Confocal , Vibration
15.
Entropy (Basel) ; 25(2)2023 Feb 13.
Article in English | MEDLINE | ID: mdl-36832711

ABSTRACT

Community detection is an important and powerful way to understand the latent structure of complex networks in social network analysis. This paper considers the problem of estimating community memberships of nodes in a directed network, where a node may belong to multiple communities. For such a directed network, existing models either assume that each node belongs solely to one community or ignore variation in node degree. Here, a directed degree corrected mixed membership (DiDCMM) model is proposed by considering degree heterogeneity. An efficient spectral clustering algorithm with a theoretical guarantee of consistent estimation is designed to fit DiDCMM. We apply our algorithm to a small scale of computer-generated directed networks and several real-world directed networks.

16.
Entropy (Basel) ; 25(12)2023 Dec 03.
Article in English | MEDLINE | ID: mdl-38136497

ABSTRACT

To address the problem that traditional spectral clustering algorithms cannot obtain the complete structural information of networks, this paper proposes a spectral clustering community detection algorithm, PMIK-SC, based on the point-wise mutual information (PMI) graph kernel. The kernel is constructed according to the point-wise mutual information between nodes, which is then used as a proximity matrix to reconstruct the network and obtain the symmetric normalized Laplacian matrix. Finally, the network is partitioned by the eigendecomposition and eigenvector clustering of the Laplacian matrix. In addition, to determine the number of clusters during spectral clustering, this paper proposes a fast algorithm, BI-CNE, for estimating the number of communities. For a specific network, the algorithm first reconstructs the original network and then runs Monte Carlo sampling to estimate the number of communities by Bayesian inference. Experimental results show that the detection speed and accuracy of the algorithm are superior to other existing algorithms for estimating the number of communities. On this basis, the spectral clustering community detection algorithm PMIK-SC also has high accuracy and stability compared with other community detection algorithms and spectral clustering algorithms.
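
For readers unfamiliar with the symmetric normalized Laplacian mentioned in this abstract, the sketch below computes the standard form L_sym = I - D^(-1/2) W D^(-1/2) from a proximity matrix W. The proximity matrix is taken as given; how PMIK-SC derives it from point-wise mutual information is not specified in the abstract and is not reproduced here.

```python
import math

def normalized_laplacian(proximity):
    """Symmetric normalized Laplacian of a non-negative symmetric matrix."""
    n = len(proximity)
    degree = [sum(row) for row in proximity]
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = math.sqrt(degree[i] * degree[j])
            # Isolated nodes (zero degree) contribute no off-diagonal weight.
            scaled = proximity[i][j] / scale if scale > 0 else 0.0
            laplacian[i][j] = (1.0 if i == j else 0.0) - scaled
    return laplacian
```

The eigenvectors belonging to the smallest eigenvalues of this matrix are the ones a spectral method would then cluster to obtain communities.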

17.
Entropy (Basel) ; 25(5)2023 May 12.
Article in English | MEDLINE | ID: mdl-37238547

ABSTRACT

The Internet of Vehicles (IoV) enables vehicular data services and applications through vehicle-to-everything (V2X) communications. One of the key services provided by IoV is popular content distribution (PCD), which aims to quickly deliver popular content that most vehicles request. However, it is challenging for vehicles to receive the complete popular content from roadside units (RSUs) due to their mobility and the RSUs' constrained coverage. The collaboration of vehicles via vehicle-to-vehicle (V2V) communications is an effective solution to assist more vehicles to obtain the entire popular content at a lower time cost. To this end, we propose a multi-agent deep reinforcement learning (MADRL)-based popular content distribution scheme in vehicular networks, where each vehicle deploys an MADRL agent that learns to choose the appropriate data transmission policy. To reduce the complexity of the MADRL-based algorithm, a vehicle clustering algorithm based on spectral clustering is provided to divide all vehicles in the V2V phase into groups, so that only vehicles within the same group exchange data. Then the multi-agent proximal policy optimization (MAPPO) algorithm is used to train the agent. We introduce the self-attention mechanism when constructing the neural network for the MADRL to help the agent accurately represent the environment and make decisions. Furthermore, the invalid action masking technique is utilized to prevent the agent from taking invalid actions, accelerating the training process of the agent. Finally, experimental results are shown and a comprehensive comparison is provided, which demonstrates that our MADRL-PCD scheme outperforms both the coalition game-based scheme and the greedy strategy-based scheme, achieving a higher PCD efficiency and a lower transmission delay.

18.
BMC Bioinformatics ; 23(1): 303, 2022 Jul 26.
Article in English | MEDLINE | ID: mdl-35883022

ABSTRACT

BACKGROUND: The discovery of critical biomarkers is significant for clinical diagnosis and for drug research and development. Researchers usually obtain biomarkers from microarray data, which suffer from the curse of dimensionality. Feature selection in machine learning is usually used to solve this problem. However, most methods do not fully consider feature dependence, especially the real pathway relationships among genes. RESULTS: Experimental results show that the proposed method is superior to classical algorithms and advanced methods in feature number and accuracy, and the selected features have more significance. METHOD: This paper proposes a feature selection method based on a graph neural network. The proposed method uses the actual dependencies between features and the Pearson correlation coefficient to construct graph-structured data. Information dissemination and aggregation operations based on the graph neural network are applied to fuse node information on the graph-structured data. The redundant features are clustered by the spectral clustering method. Then, a feature ranking aggregation model using eight feature evaluation methods is applied to each sub-cluster to select features. CONCLUSION: The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy, which makes it suitable for selecting potential biomarkers.


Subject(s)
Algorithms , Neural Networks, Computer , Biomarkers , Cluster Analysis , Machine Learning
19.
Hum Mutat ; 43(9): 1268-1285, 2022 09.
Article in English | MEDLINE | ID: mdl-35475554

ABSTRACT

Von Hippel-Lindau (VHL) disease is a hereditary cancer syndrome where individuals are predisposed to tumor development in the brain, adrenal gland, kidney, and other organs. It is caused by pathogenic variants in the VHL tumor suppressor gene. Standardized disease information has been difficult to collect due to the rarity and diversity of VHL patients. Over 4100 unique articles published until October 2019 were screened for germline genotype-phenotype data. Patient data were translated into standardized descriptions using Human Genome Variation Society gene variant nomenclature and Human Phenotype Ontology terms and were manually curated into an open-access knowledgebase called Clinical Interpretation of Variants in Cancer. In total, 634 unique VHL variants, 2882 patients, and 1991 families from 427 papers were captured. We identified relationship trends between phenotype and genotype data using classic statistical methods and spectral clustering unsupervised learning. Our analyses reveal earlier onset of pheochromocytoma/paraganglioma and retinal angiomas, phenotype co-occurrences, and genotype-phenotype correlations including hotspots. These analyses confirm existing VHL associations and can be used to identify new patterns and associations in VHL disease. Our database serves as an aggregate knowledge translation tool to facilitate sharing information about the pathogenicity of VHL variants.


Subject(s)
Adrenal Gland Neoplasms , von Hippel-Lindau Disease , Adrenal Gland Neoplasms/diagnosis , Adrenal Gland Neoplasms/genetics , Genotype , Humans , Machine Learning , Phenotype , Von Hippel-Lindau Tumor Suppressor Protein/genetics , von Hippel-Lindau Disease/complications , von Hippel-Lindau Disease/diagnosis , von Hippel-Lindau Disease/genetics
20.
Proc Natl Acad Sci U S A ; 116(13): 5995-6000, 2019 03 26.
Article in English | MEDLINE | ID: mdl-30850525

ABSTRACT

Clustering is concerned with coherently grouping observations without any explicit concept of true groupings. Spectral graph clustering (clustering the vertices of a graph based on their spectral embedding) is commonly approached via K-means (or, more generally, Gaussian mixture model) clustering composed with either Laplacian spectral embedding (LSE) or adjacency spectral embedding (ASE). Recent theoretical results provide deeper understanding of the problem and solutions and lead us to a "two-truths" LSE vs. ASE spectral graph clustering phenomenon convincingly illustrated here via a diffusion MRI connectome dataset: the different embedding methods yield different clustering results, with LSE capturing left hemisphere/right hemisphere affinity structure and ASE capturing gray matter/white matter core-periphery structure.
