Results 1 - 20 of 139
1.
Int J Neural Syst ; : 2450040, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38753012

ABSTRACT

Neonatal epilepsy is a common emergency in neonatal intensive care units (NICUs) that requires timely attention, early identification, and treatment. Traditional detection methods mostly rely on supervised learning with large amounts of labeled data. Hence, this study offers a semi-supervised hybrid architecture for detecting seizures, called Fd-CAE, which combines an extracted electroencephalogram (EEG) feature dataset with a convolutional autoencoder. First, various features in the time domain and entropy domain are extracted to characterize the EEG signal, which subsequently helps distinguish epileptic seizures. Then, the unlabeled EEG features are fed into the convolutional autoencoder (CAE) for training, which effectively represents EEG features by optimizing the loss between the input and output features. This unsupervised feature learning process can better combine and optimize EEG features from unlabeled data. After that, the pre-trained encoder part of the model is used for further feature learning on labeled data to obtain a low-dimensional feature representation and perform classification. The model is evaluated on the neonatal EEG dataset collected at the University of Helsinki Hospital and shows high discriminative ability in detecting seizures, with an accuracy of 92.34%, precision of 93.61%, recall of 98.74%, and F1-score of 95.77%. The results show that unsupervised learning by CAE is beneficial to the characterization of EEG signals, and the proposed Fd-CAE method significantly improves classification performance.

2.
Article in English | MEDLINE | ID: mdl-38568771

ABSTRACT

The pathogenesis of Alzheimer's disease (AD) is extremely intricate, which makes AD almost incurable. Recent studies have demonstrated that analyzing multi-modal data can offer a comprehensive perspective on the different stages of AD progression, which is beneficial for early diagnosis of AD. In this paper, we propose a deep self-reconstruction fusion similarity hashing (DS-FSH) method to effectively capture AD-related biomarkers from multi-modal data and leverage them to diagnose AD. Given that most existing methods ignore the topological structure of the data, a deep self-reconstruction model based on random walk graph regularization is designed to reconstruct the multi-modal data, thereby learning the nonlinear relationships between samples. Additionally, a fused similarity hash based on an anchor graph is proposed to generate discriminative binary hash codes for the reconstructed multi-modal data. This allows the fused similarity between samples to be effectively modeled by an anchor-graph-based fusion similarity matrix, while modal correlation can be approximated by Hamming distance. In particular, features extracted from the multi-modal data are classified using a deep sparse autoencoder classifier. Finally, experiments conducted on the AD Neuroimaging Initiative database show that DS-FSH outperforms comparable methods of AD classification. To conclude, DS-FSH identifies multi-modal features closely associated with AD, which are expected to contribute significantly to the understanding of the pathogenesis of AD.
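
The abstract above states that correlation can be approximated by the Hamming distance between binary hash codes. A minimal Python sketch of that distance follows. The sign-based binarization rule, the function names, and the toy values are illustrative assumptions; this is not the DS-FSH hashing procedure itself.

```python
def binarize(features):
    """Map real-valued features to a 0/1 code by sign (assumed rule, for illustration)."""
    return [1 if x > 0 else 0 for x in features]


def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


# Toy feature vectors (made-up values).
code_x = binarize([0.7, -1.2, 0.1, -0.4])
code_y = binarize([0.9, 0.3, -0.5, -0.2])
distance = hamming_distance(code_x, code_y)
```

A smaller distance means the two codes agree in more bit positions, which is why it can stand in for similarity once samples have been hashed.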

3.
IEEE J Biomed Health Inform ; 28(5): 3029-3041, 2024 May.
Article in English | MEDLINE | ID: mdl-38427553

ABSTRACT

The roles of brain region activities and genotypic functions in the pathogenesis of Alzheimer's disease (AD) remain unclear. Meanwhile, current imaging genetics methods struggle to identify potential pathogenetic markers through correlation analysis between brain networks and genetic variation. To discover the disease-related brain connectome at both the level of specific brain structures and a fine-grained level, a functional brain network is first constructed for each subject based on the Automated Anatomical Labeling (AAL) and human Brainnetome atlases. Specifically, the upper-triangle elements of the functional connectivity matrix are extracted as connectivity features. The clustering coefficient and the average weighted node degree are computed to assess the significance of every brain area. Since the constructed brain network and genetic data are characterized by non-linearity, high dimensionality, and few subjects, a deep subspace clustering algorithm is proposed to reconstruct the original data. Our multilayer neural network helps capture the non-linear manifolds, and subspace clustering learns pairwise affinities between samples. Moreover, most approaches in neuroimaging genetics rely on unsupervised learning and neglect the diagnostic information related to diseases. We introduce a label constraint based on diagnostic status to guide the imaging genetics correlation analysis. To this end, a diagnosis-guided deep subspace clustering association (DDSCA) method is developed to discover the brain connectome and risk genetic factors by integrating genotypes with functional network phenotypes. Extensive experiments show that DDSCA achieves superior performance to most association methods and effectively selects disease-relevant genetic markers and brain connectome features at the coarse-grained and fine-grained levels.
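
The feature-extraction step in the abstract above (upper-triangle elements of a connectivity matrix, plus a weighted node degree) can be sketched in plain Python. The function names, the use of absolute edge weights for the degree, and the toy matrix are assumptions made for illustration, not details taken from the paper.

```python
def upper_triangle_features(conn):
    """Flatten the strict upper triangle of a square connectivity matrix."""
    n = len(conn)
    return [conn[i][j] for i in range(n) for j in range(i + 1, n)]


def weighted_node_degree(conn):
    """Per-node sum of absolute edge weights, excluding the diagonal (assumed definition)."""
    n = len(conn)
    return [sum(abs(conn[i][j]) for j in range(n) if j != i) for i in range(n)]


# Toy symmetric connectivity matrix for 3 regions (made-up values).
conn = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
features = upper_triangle_features(conn)
degrees = weighted_node_degree(conn)
average_degree = sum(degrees) / len(degrees)
```

For a symmetric matrix over n regions this yields n(n-1)/2 features, because the lower triangle and the diagonal carry no additional information.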


Subject(s)
Alzheimer Disease, Brain, Magnetic Resonance Imaging, Humans, Alzheimer Disease/genetics, Alzheimer Disease/diagnostic imaging, Cluster Analysis, Brain/diagnostic imaging, Magnetic Resonance Imaging/methods, Connectome/methods, Algorithms, Aged, Biomarkers, Female, Male, Atlases as Topic, Neuroimaging/methods
4.
IEEE J Biomed Health Inform ; 28(5): 3178-3185, 2024 May.
Article in English | MEDLINE | ID: mdl-38408006

ABSTRACT

CircRNA has been proven to play an important role in disease diagnosis and treatment. Because wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict potential associations between circRNAs and diseases. First, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and the disease similarity kernels. Then the vectors are concatenated to form the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and produce the prediction score. The experimental results show that KFDAE achieves the best performance among the compared methods. In addition, the results of case studies demonstrate the effectiveness and practical significance of KFDAE, which means that KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab experiments.


Subject(s)
Algorithms, Computational Biology, Computer Neural Networks, Circular RNA, Humans, Circular RNA/genetics, Computational Biology/methods, Deep Learning
5.
Article in English | MEDLINE | ID: mdl-36912759

ABSTRACT

The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has some defects when used in mining the nonlinear manifold structure of the data and usually suffers from over-density of principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information in the process of mapping high-dimensional data to low-dimensional space, to more accurately obtain sparse principal components and to then identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser, thus increasing their interpretability. Then, we evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.


Subject(s)
Single-Cell Analysis, Single-Cell Gene Expression Analysis, Principal Component Analysis, Single-Cell Analysis/methods, Algorithms, Cluster Analysis
6.
IEEE J Biomed Health Inform ; 28(2): 1110-1121, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38055359

ABSTRACT

Accumulating evidence indicates that microRNAs (miRNAs) can control and coordinate various biological processes. Consequently, abnormal expression of miRNAs has been linked to various complex diseases. Identifying miRNA-disease associations (MDAs) will contribute to the diagnosis and treatment of human diseases. Nevertheless, traditional experimental verification of MDAs is laborious and limited to a small scale. Therefore, it is necessary to develop reliable and effective computational methods to predict novel MDAs. In this work, a multi-kernel graph attention deep autoencoder (MGADAE) method is proposed to predict potential MDAs. In detail, MGADAE first employs the multiple kernel learning (MKL) algorithm to construct an integrated miRNA similarity and an integrated disease similarity, providing more biological information for further feature learning. Second, MGADAE combines the known MDAs, disease similarity, and miRNA similarity into a heterogeneous network, then learns the representations of miRNAs and diseases through graph convolution operations. After that, an attention mechanism is introduced into MGADAE to integrate the representations from multiple graph convolutional network (GCN) layers. Lastly, the integrated representations of miRNAs and diseases are input into a bilinear decoder to obtain the final predicted association scores. Experiments show that the proposed method outperforms existing advanced approaches in MDA prediction. Furthermore, case studies related to two human cancers provide further confirmation of the reliability of MGADAE in practice.


Subject(s)
MicroRNAs, Neoplasms, Humans, MicroRNAs/genetics, Reproducibility of Results, Computational Biology/methods, Neoplasms/genetics, Algorithms
7.
Front Genet ; 14: 1249171, 2023.
Article in English | MEDLINE | ID: mdl-37614816

ABSTRACT

Identification of disease-associated long non-coding RNAs (lncRNAs) is crucial for unveiling the underlying genetic mechanisms of complex diseases. Multiple types of similarity networks of lncRNAs (or diseases) can characterize their similarities in a complementary and comprehensive way. Hence, in this study, we present a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential lncRNA-disease associations (LDAs). Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information on lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then Gaussian interaction profile (GIP) kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are derived from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network, and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on different lncRNA-disease association feature sets to identify potential LDAs. iLncDA-RSN is evaluated by five-fold cross-validation to analyze its prediction performance, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of iLncDA-RSN in identifying potential LDAs.
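
Two building blocks named in the abstract above, the Gaussian interaction profile (GIP) kernel and random walk with restart, can be sketched in plain Python. This uses a commonly cited form of each (bandwidth normalized by the mean squared profile norm; restart probability `restart`); the parameter values, function names, and toy association matrix are assumptions, and the abstract does not say how iLncDA-RSN configures either step.

```python
import math


def gip_kernel(assoc):
    """GIP kernel between the rows of a 0/1 association matrix.

    Assumes at least one known association, so the mean squared norm is non-zero.
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    gamma = 1.0 / mean_sq_norm  # bandwidth normalized by the mean profile norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def random_walk_with_restart(sim, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W^T p + r * p0, where W is the row-normalized similarity."""
    n = len(sim)
    trans = [[v / sum(row) for v in row] for row in sim]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(trans[k][i] * p[k] for k in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy association matrix: 3 lncRNAs (rows) by 3 diseases (columns), made-up values.
assoc = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
kernel = gip_kernel(assoc)
scores = random_walk_with_restart(kernel, seed=0)
```

The walk returns a probability vector over nodes; nodes that are strongly connected to the seed in the similarity network receive higher steady-state scores.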

8.
IEEE J Biomed Health Inform ; 27(10): 5187-5198, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37498764

ABSTRACT

Advances in omics technology have enriched the understanding of the biological mechanisms of diseases, which has provided a new approach for cancer research. Multi-omics data contain different levels of cancer information, and comprehensive analysis of them has attracted wide attention. However, limited by the dimensionality of matrix models, traditional methods cannot fully use the key high-dimensional global structure of multi-omics data. Moreover, besides global information, local features within each omics are also critical. It is necessary to consider the potential local information together with the high-dimensional global information, ensuring that the shared and complementary features of the omics data are comprehensively observed. In view of the above, this article proposes a new tensor integrative framework called the strong complementarity tensor decomposition model (BioSTD) for cancer multi-omics data. It is used to identify cancer subtype specific genes and cluster subtype samples. Different from the matrix framework, BioSTD utilizes multi-view tensors to coordinate each omics to maximize high-dimensional spatial relationships, which jointly considers the different characteristics of different omics data. Meanwhile, we propose the concept of strong complementarity constraint applicable to omics data and introduce it into BioSTD. Strong complementarity is used to explore the potential local information, which can enhance the separability of different subtypes, allowing consistency and complementarity in the omics data to be fully represented. Experimental results on real cancer datasets show that our model outperforms other advanced models, which confirms its validity.


Subject(s)
Neoplasms, Humans, Neoplasms/genetics, Multiomics
9.
BMC Genomics ; 24(1): 426, 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37516822

ABSTRACT

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore how functional and structural information of genes in protein interaction networks can be combined to identify driver genes. Therefore, we propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both functional and structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective on feature selection for identifying novel driver genes.


Subject(s)
Algorithms, Gene Regulatory Networks, Genetic Association Studies, Machine Learning, Protein Interaction Mapping
10.
IEEE J Biomed Health Inform ; 27(10): 5199-5209, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37506010

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) technology has opened up a new perspective for studying disease mechanisms at the single-cell level. Cell clustering reveals the natural grouping of cells and is a vital step in scRNA-seq data analysis. However, the high noise and dropout of single-cell data pose numerous challenges to cell clustering. In this study, we propose a novel matrix factorization method named NLRRC for single-cell type identification. NLRRC combines non-negative low-rank representation (LRR) and random walk graph regularized non-negative matrix factorization (RWNMFC) to accurately reveal the natural grouping of cells. Specifically, we find the lowest-rank representation of single-cell samples by non-negative LRR to reduce the difficulty of analyzing high-dimensional samples and to capture the global information of the samples. Meanwhile, by using random walk graph regularization (RWGR) and NMF, RWNMFC captures manifold structure and cluster information before generating a cluster assignment matrix. The cluster assignment matrix contains cluster labels, which can be used directly to obtain the clustering results. The performance of NLRRC is validated on simulated and real single-cell datasets. The experimental results show that NLRRC has a significant advantage in single-cell type identification.


Subject(s)
Algorithms, Single-Cell Analysis, Humans, Cluster Analysis, Gene Expression Profiling/methods
11.
J Comput Biol ; 30(8): 937-947, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37486669

ABSTRACT

Determining the association between drug and disease is important in drug development. However, existing approaches for predicting drug-disease associations (DDAs) are too homogeneous in terms of feature extraction. Here, a novel graph representation approach based on light gradient boosting machine (GRLGB) is proposed for the prediction of DDAs. After proteins were introduced into a heterogeneous network, node features were extracted from two perspectives: network topology and biological knowledge. Finally, the GRLGB classifier was applied to predict potential DDAs. GRLGB achieved satisfactory results on Bdataset and Fdataset through 10-fold cross-validation. To further demonstrate the reliability of GRLGB, case studies involving anxiety disorders and clozapine were conducted. The results suggest that GRLGB can identify novel DDAs.


Subject(s)
Computational Biology, Proteins, Reproducibility of Results, Computational Biology/methods, Algorithms
12.
Article in English | MEDLINE | ID: mdl-37459265

ABSTRACT

An increasing number of microRNAs (miRNAs) have been confirmed to be inextricably linked to various diseases, and the discovery of their associations has become a routine way of treating diseases. To overcome the time-consuming and laborious nature of traditional experiments for verifying miRNA-disease associations (MDAs), a variety of computational methods have emerged. However, these methods still have many shortcomings in terms of predictive performance and accuracy. In this study, a model based on multiple graph convolutional networks and random forest (MGCNRF) was proposed for the prediction of MDAs. Specifically, MGCNRF first mapped miRNA functional similarity and sequence similarity, disease semantic similarity and target similarity, and the known MDAs into four different two-layer heterogeneous networks. Second, MGCNRF fed the four heterogeneous networks into four different layered attention graph convolutional networks (GCNs), respectively, to extract MDA embeddings. Finally, MGCNRF integrated the embeddings of every MDA into the features of the miRNA-disease pair and predicted potential MDAs through the random forest (RF). Fivefold cross-validation was applied to verify the prediction performance of MGCNRF, which outperformed seven other state-of-the-art methods in terms of area under the curve. Furthermore, the accuracy and the case studies of different diseases further demonstrate the scientific rationale of MGCNRF. In conclusion, MGCNRF can serve as a scientific tool for predicting potential MDAs.

13.
J Comput Biol ; 30(8): 926-936, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37466461

ABSTRACT

Clinical trials indicate that the dysregulation of microRNAs (miRNAs) is closely associated with the development of diseases. Therefore, predicting miRNA-disease associations is significant for studying the pathogenesis of diseases. Since traditional wet-lab methods are resource-intensive, cost-saving computational models can be an effective complementary tool in biological experiments. In this work, a method based on locality-constrained linear coding, ILLCEL, is proposed to predict these associations. ILLCEL adopts miRNA sequence similarity, miRNA functional similarity, disease semantic similarity, and interaction profile similarity obtained by locality-constrained linear coding (LLC) as prior information. Next, features and similarities extracted from multiple perspectives are input to an ensemble learning framework to improve the comprehensiveness of the prediction. Notably, the introduction of hypergraph regularization terms improves the accuracy of prediction by describing complex associations between samples. The results under fivefold cross-validation indicate that ILLCEL achieves superior prediction performance. In case studies, known associations are accurately predicted and novel associations are verified in HMDD v3.2, miRCancer, and the existing literature. It is concluded that ILLCEL can serve as a powerful tool for inferring potential associations.

14.
J Comput Biol ; 30(8): 848-860, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37471220

ABSTRACT

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies is the use of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence for revealing heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose a method called Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of data. To address the limited ability of the LRR method to capture local structure, we introduce adjusted random walk graph regularization into its framework. ARGLRR thus captures both local and global structures in scRNA-seq data. Additionally, imposing similarity constraints on the LRR framework further improves the ability of the proposed model to estimate cell-to-cell similarity and capture global structural relationships between cells. The results show that ARGLRR surpasses other advanced approaches on nine known scRNA-seq data sets. In the clustering experiments on these data sets, ARGLRR outperforms the best-performing comparative method by 6.99% and 5.85% in normalized mutual information and Adjusted Rand Index, respectively. In addition, we visualize the results using Uniform Manifold Approximation and Projection. The visualization shows that ARGLRR enhances the separation of different cell types within the similarity matrix.
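
The Adjusted Rand Index cited in the abstract above is a standard measure of agreement between two clusterings, corrected for chance. A minimal standard-library Python sketch of the usual contingency-table formula follows; the function name and the toy labels are illustrative, and this is not the evaluation code used in the paper. Normalized mutual information is not covered here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same length, at least 2 samples")
    cell_counts = Counter(zip(labels_true, labels_pred))  # contingency table cells
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Cluster names do not matter, only the grouping (toy labels).
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is 1 for identical partitions, close to 0 for chance-level agreement, and can be negative when agreement is worse than chance.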


Subject(s)
Algorithms, RNA, Cluster Analysis, Single-Cell Analysis/methods, RNA Sequence Analysis, Gene Expression Profiling
15.
J Comput Biol ; 30(8): 889-899, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37471239

ABSTRACT

The analysis of cancer data from multi-omics can effectively promote cancer research. The main focus of this article is to cluster cancer samples and identify feature genes to reveal the correlation between cancers and genes, with the primary approach being the analysis of multi-view cancer omics data. Our proposed solution, the Multi-View Enhanced Tensor Nuclear Norm and Local Constraint (MVET-LC) model, aims to utilize the consistency and complementarity of omics data to support biological research. The model is designed to maximize the utilization of multi-view data and incorporates a nuclear norm and local constraint to achieve this goal. The first step involves introducing the concept of enhanced partial sum of tensor nuclear norm, which significantly enhances the flexibility of the tensor nuclear norm. After that, we incorporate total variation regularization into the MVET-LC model to further augment its performance. It enables MVET-LC to make use of the relationship between tensor data structures and sparse data while paying attention to the feature details of the tensor data. To tackle the iterative optimization problem of MVET-LC, the alternating direction method of multipliers is utilized. Through experimental validation, it is demonstrated that our proposed model outperforms other comparison models.


Subject(s)
Algorithms, Neoplasms, Humans, Neoplasms/genetics, Cluster Analysis
16.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2802-2809, 2023.
Article in English | MEDLINE | ID: mdl-37285246

ABSTRACT

Biclustering algorithms are essential for processing gene expression data. However, to process the dataset, most biclustering algorithms require preprocessing the data matrix into a binary matrix. Regrettably, this type of preprocessing may introduce noise or cause information loss in the binary matrix, which would reduce the biclustering algorithm's ability to effectively obtain the optimal biclusters. In this paper, we propose a new preprocessing method named Mean-Standard Deviation (MSD) to resolve the problem. Additionally, we introduce a new biclustering algorithm called Weight Adjacency Difference Matrix Binary Biclustering (W-AMBB) to effectively process datasets containing overlapping biclusters. The basic idea is to create a weighted adjacency difference matrix by applying weights to a binary matrix that is derived from the data matrix. This allows us to identify genes with significant associations in sample data by efficiently identifying similar genes that respond to specific conditions. Furthermore, the performance of the W-AMBB algorithm was tested on both synthetic and real datasets and compared with other classical biclustering methods. The experiment results demonstrate that the W-AMBB algorithm is significantly more robust than the compared biclustering methods on the synthetic dataset. Additionally, the results of the GO enrichment analysis show that the W-AMBB method possesses biological significance on real datasets.


Subject(s)
Algorithms, Gene Expression Profiling, Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Cluster Analysis, Gene Expression
17.
IEEE J Biomed Health Inform ; 27(7): 3686-3694, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37163398

ABSTRACT

Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDA prediction. Most current DDA prediction methods use the original DDA matrix directly. However, the original DDA matrix is sparse, which greatly affects the prediction results. Hence, a prediction method based on a multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDA prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form a new drug similarity and a new disease similarity, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDA matrix is reconstructed by weighted K nearest neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, a graph convolutional autoencoder with an attention mechanism is used to predict DDAs. Compared with existing methods, MSGCA shows superior results on three datasets. Furthermore, case studies demonstrate the reliability of MSGCA.


Subject(s)
Algorithms, Humans, Reproducibility of Results
18.
BMC Genomics ; 24(1): 279, 2023 May 25.
Article in English | MEDLINE | ID: mdl-37226081

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. Identifying potential associations between piRNAs and diseases is of great significance for understanding complex diseases. Because traditional wet-lab experiments are time-consuming and expensive, predicting piRNA-disease associations by computational methods is of great value. METHODS: In this paper, a method based on an embedding transformation graph convolution network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed from the similarity information of piRNAs and diseases together with the known piRNA-disease associations, and a graph convolutional network with an attention mechanism is applied to it to extract low-dimensional embeddings of piRNAs and diseases. Furthermore, an embedding transformation module is developed to address the problem of embedding space inconsistency; it is more lightweight, has stronger learning ability, and achieves higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS: Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than the other five selected computational models. Case studies on head and neck squamous cell carcinoma and Alzheimer's disease further demonstrate the superior performance of ETGPDA. CONCLUSIONS: Hence, ETGPDA is an effective method for predicting hidden piRNA-disease associations.
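
The evaluation protocol mentioned in the abstract above (fivefold cross-validation scored by AUC) can be illustrated with a small standard-library Python sketch. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive pair scores higher than a randomly chosen negative one. The function names, the fold-splitting scheme, and the toy scores are assumptions for illustration and are unrelated to the ETGPDA code or data.

```python
import random


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Toy example: one positive is ranked below one negative, so 3 of 4 pairs are correct.
auc = auc_score([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
folds = k_fold_indices(10, k=5)
```

In a full cross-validation run, each fold in turn would serve as the test set while a model is fitted on the remaining folds, and the per-fold AUC values would be averaged.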


Subject(s)
Alzheimer Disease, Head and Neck Neoplasms, Humans, Piwi-Interacting RNA, Alzheimer Disease/genetics, Learning, Research Design
19.
IEEE J Biomed Health Inform ; 27(5): 2575-2584, 2023 May.
Article in English | MEDLINE | ID: mdl-37027680

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology can provide expression profiles of single cells, which propels biological research into a new chapter. Clustering individual cells based on their transcriptome is a critical objective of scRNA-seq data analysis. However, the high-dimensional, sparse, and noisy nature of scRNA-seq data poses a challenge to single-cell clustering. Therefore, there is an urgent need for a clustering method that targets the characteristics of scRNA-seq data. Due to its powerful subspace learning capability and robustness to noise, the subspace segmentation method based on low-rank representation (LRR) is broadly used in clustering research and achieves satisfactory results. In view of this, we propose a personalized low-rank subspace clustering method, namely PLRLS, to learn more accurate subspace structures from both global and local perspectives. Specifically, we first introduce a local structure constraint to capture the local structure information of the data, which also helps our method obtain better inter-cluster separability and intra-cluster compactness. Then, in order to retain the important similarity information that is ignored by the LRR model, we use the fractional function to extract similarity information between cells and introduce this information as a similarity constraint into the LRR framework. The fractional function is an efficient similarity measure designed for scRNA-seq data, which has theoretical and practical implications. Finally, based on the LRR matrix learned by PLRLS, we perform downstream analyses on real scRNA-seq datasets, including spectral clustering, visualization, and marker gene identification. Comparative experiments show that the proposed method achieves superior clustering accuracy and robustness.


Subject(s)
Algorithms, Single-Cell Gene Expression Analysis, Humans, Transcriptome, Cluster Analysis, Data Analysis, Single-Cell Analysis/methods, Gene Expression Profiling/methods
20.
Comput Biol Chem ; 104: 107862, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37031647

ABSTRACT

Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types. Most clustering algorithms perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroid is updated as the cell dimension decreases to better form new clusters and improve the quality of clustering information. In addition, the low-rank representation model ignores local geometric information, so a graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate the effectiveness of the new method. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.


Subject(s)
Algorithms, Cluster Analysis