Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 178
Filtrar
Más filtros

Bases de datos
Tipo del documento
País de afiliación
Intervalo de año de publicación
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38426327

RESUMEN

Cluster assignment is vital to analyzing single-cell RNA sequencing (scRNA-seq) data to understand high-level biological processes. Deep learning-based clustering methods have recently been widely used in scRNA-seq data analysis. However, existing deep models often overlook the interconnections and interactions among network layers, leading to the loss of structural information within the network layers. Herein, we develop a new self-supervised clustering method based on an adaptive multi-scale autoencoder, called scAMAC. The self-supervised clustering network utilizes the Multi-Scale Attention mechanism to fuse the feature information from the encoder, hidden and decoder layers of the multi-scale autoencoder, which enables the exploration of cellular correlations within the same scale and captures deep features across different scales. The self-supervised clustering network calculates the membership matrix using the fused latent features and optimizes the clustering network based on the membership matrix. scAMAC employs an adaptive feedback mechanism to supervise the parameter updates of the multi-scale autoencoder, obtaining a more effective representation of cell features. scAMAC not only enables cell clustering but also performs data reconstruction through the decoding layer. Through extensive experiments, we demonstrate that scAMAC is superior to several advanced clustering and imputation methods in both data clustering and reconstruction. In addition, scAMAC is beneficial for downstream analysis, such as cell trajectory inference. Our scAMAC model codes are freely available at https://github.com/yancy2024/scAMAC.


Asunto(s)
Análisis de Datos , Análisis de Expresión Génica de una Sola Célula , Análisis por Conglomerados , Análisis de Secuencia de ARN , Perfilación de la Expresión Génica , Algoritmos
2.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38935070

RESUMEN

Inferring gene regulatory network (GRN) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however there remains some challenges especially in real datasets. In this study, we propose Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRN, a directed graph convolutional neural network is conducted which retains the structural information of the directed graph while also making full use of neighbor node features. The local augmentation strategy is adopted in graph neural network to solve the problem of poor prediction accuracy caused by a large number of low-degree nodes in GRN. In addition, for real data such as E.coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than existing models. Furthermore, the case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.


Asunto(s)
Biología Computacional , Redes Reguladoras de Genes , Redes Neurales de la Computación , Humanos , Biología Computacional/métodos , Algoritmos , Neoplasias de la Vejiga Urinaria/genética , Neoplasias de la Vejiga Urinaria/patología , Escherichia coli/genética
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38581416

RESUMEN

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.


Asunto(s)
Redes Reguladoras de Genes , Neoplasias Hepáticas , Humanos , Biología de Sistemas/métodos , Transcriptoma , Algoritmos , Biología Computacional/métodos
4.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36592058

RESUMEN

The progress of single-cell RNA sequencing (scRNA-seq) has led to a large number of scRNA-seq data, which are widely used in biomedical research. The noise in the raw data and tens of thousands of genes pose a challenge to capture the real structure and effective information of scRNA-seq data. Most of the existing single-cell analysis methods assume that the low-dimensional embedding of the raw data belongs to a Gaussian distribution or a low-dimensional nonlinear space without any prior information, which limits the flexibility and controllability of the model to a great extent. In addition, many existing methods need high computational cost, which makes them difficult to be used to deal with large-scale datasets. Here, we design and develop a depth generation model named Gaussian mixture adversarial autoencoders (scGMAAE), assuming that the low-dimensional embedding of different types of cells follows different Gaussian distributions, integrating Bayesian variational inference and adversarial training, as to give the interpretable latent representation of complex data and discover the statistical distribution of different types of cells. The scGMAAE is provided with good controllability, interpretability and scalability. Therefore, it can process large-scale datasets in a short time and give competitive results. scGMAAE outperforms existing methods in several ways, including dimensionality reduction visualization, cell clustering, differential expression analysis and batch effect removal. Importantly, compared with most deep learning methods, scGMAAE requires less iterations to generate the best results.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de Expresión Génica de una Sola Célula , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos , Distribución Normal , Teorema de Bayes , Análisis de la Célula Individual/métodos , Análisis por Conglomerados
5.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36631401

RESUMEN

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noises and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results of 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de Expresión Génica de una Sola Célula , Humanos , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Algoritmos , Análisis por Conglomerados
6.
Brief Bioinform ; 24(2)2023 03 19.
Artículo en Inglés | MEDLINE | ID: mdl-36715275

RESUMEN

A large number of works have presented the single-cell RNA sequencing (scRNA-seq) to study the diversity and biological functions of cells at the single-cell level. Clustering identifies unknown cell types, which is essential for downstream analysis of scRNA-seq samples. However, the high dimensionality, high noise and pervasive dropout rate of scRNA-seq samples have a significant challenge to the cluster analysis of scRNA-seq samples. Herein, we propose a new adaptive fuzzy clustering model based on the denoising autoencoder and self-attention mechanism called the scDASFK. It implements the comparative learning to integrate cell similar information into the clustering method and uses a deep denoising network module to denoise the data. scDASFK consists of a self-attention mechanism for further denoising where an adaptive clustering optimization function for iterative clustering is implemented. In order to make the denoised latent features better reflect the cell structure, we introduce a new adaptive feedback mechanism to supervise the denoising process through the clustering results. Experiments on 16 real scRNA-seq datasets show that scDASFK performs well in terms of clustering accuracy, scalability and stability. Overall, scDASFK is an effective clustering model with great potential for scRNA-seq samples analysis. Our scDASFK model codes are freely available at https://github.com/LRX2022/scDASFK.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de la Célula Individual , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Análisis por Conglomerados , Algoritmos
7.
Brief Bioinform ; 24(6)2023 09 22.
Artículo en Inglés | MEDLINE | ID: mdl-37861174

RESUMEN

Antiviral peptides (AVPs) are widely found in animals and plants, with high specificity and strong sensitivity to drug-resistant viruses. However, due to the great heterogeneity of different viruses, most of the AVPs have specific antiviral activities. Therefore, it is necessary to identify the specific activities of AVPs on virus types. Most existing studies only identify AVPs, with only a few studies identifying subclasses by training multiple binary classifiers. We develop a two-stage prediction tool named FFMAVP that can simultaneously predict AVPs and their subclasses. In the first stage, we identify whether a peptide is AVP or not. In the second stage, we predict the six virus families and eight species specifically targeted by AVPs based on two multiclass tasks. Specifically, the feature extraction module in the two-stage task of FFMAVP adopts the same neural network structure, in which one branch extracts features based on amino acid feature descriptors and the other branch extracts sequence features. Then, the two types of features are fused for the following task. Considering the correlation between the two tasks of the second stage, a multitask learning model is constructed to improve the effectiveness of the two multiclass tasks. In addition, to improve the effectiveness of the second stage, the network parameters trained through the first-stage data are used to initialize the network parameters in the second stage. As a demonstration, the cross-validation results, independent test results and visualization results show that FFMAVP achieves great advantages in both stages.


Asunto(s)
Algoritmos , Péptidos , Péptidos/química , Redes Neurales de la Computación , Aprendizaje Automático , Antivirales/farmacología , Antivirales/química
8.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36611253

RESUMEN

Although previous studies have revealed that synonymous mutations contribute to various human diseases, distinguishing deleterious synonymous mutations from benign ones is still a challenge in medical genomics. Recently, computational tools have been introduced to predict the harmfulness of synonymous mutations. However, most of these computational tools rely on balanced training sets without considering abundant negative samples that could result in deficient performance. In this study, we propose a computational model that uses a selective ensemble to predict deleterious synonymous mutations (seDSM). We construct several candidate base classifiers for the ensemble using balanced training subsets randomly sampled from the imbalanced benchmark training sets. The diversity measures of the base classifiers are calculated by the pairwise diversity metrics, and the classifiers with the highest diversities are selected for integration using soft voting for synonymous mutation prediction. We also design two strategies for filling in missing values in the imbalanced dataset and constructing models using different pairwise diversity metrics. The experimental results show that a selective ensemble based on double fault with the ensemble strategy EKNNI for filling in missing values is the most effective scheme. Finally, using 40-dimensional biology features, we propose a novel model based on a selective ensemble for predicting deleterious synonymous mutations (seDSM). seDSM outperformed other state-of-the-art methods on the independent test sets according to multiple evaluation indicators, indicating that it has an outstanding predictive performance for deleterious synonymous mutations. We hope that seDSM will be useful for studying deleterious synonymous mutations and advancing our understanding of synonymous mutations. The source code of seDSM is freely accessible at https://github.com/xialab-ahu/seDSM.git.


Asunto(s)
Genómica , Mutación Silenciosa , Humanos , Genómica/métodos , Programas Informáticos , Algoritmos
9.
Brief Bioinform ; 25(1)2023 11 22.
Artículo en Inglés | MEDLINE | ID: mdl-38145949

RESUMEN

Prediction of drug-target interactions (DTIs) is essential in medicine field, since it benefits the identification of molecular structures potentially interacting with drugs and facilitates the discovery and reposition of drugs. Recently, much attention has been attracted to network representation learning to learn rich information from heterogeneous data. Although network representation learning algorithms have achieved success in predicting DTI, several manually designed meta-graphs limit the capability of extracting complex semantic information. To address the problem, we introduce an adaptive meta-graph-based method, termed AMGDTI, for DTI prediction. In the proposed AMGDTI, the semantic information is automatically aggregated from a heterogeneous network by training an adaptive meta-graph, thereby achieving efficient information integration without requiring domain knowledge. The effectiveness of the proposed AMGDTI is verified on two benchmark datasets. Experimental results demonstrate that the AMGDTI method overall outperforms eight state-of-the-art methods in predicting DTI and achieves the accurate identification of novel DTIs. It is also verified that the adaptive meta-graph exhibits flexibility and effectively captures complex fine-grained semantic information, enabling the learning of intricate heterogeneous network topology and the inference of potential drug-target relationship.


Asunto(s)
Algoritmos , Medicina , Benchmarking , Sistemas de Liberación de Medicamentos , Semántica
10.
Brief Bioinform ; 23(2)2022 03 10.
Artículo en Inglés | MEDLINE | ID: mdl-35136924

RESUMEN

Rapid development of single-cell RNA sequencing (scRNA-seq) technology has allowed researchers to explore biological phenomena at the cellular scale. Clustering is a crucial and helpful step for researchers to study the heterogeneity of cell. Although many clustering methods have been proposed, massive dropout events and the curse of dimensionality in scRNA-seq data make it still difficult to analysis because they reduce the accuracy of clustering methods, leading to misidentification of cell types. In this work, we propose the scHFC, which is a hybrid fuzzy clustering method optimized by natural computation based on Fuzzy C Mean (FCM) and Gath-Geva (GG) algorithms. Specifically, principal component analysis algorithm is utilized to reduce the dimensions of scRNA-seq data after it is preprocessed. Then, FCM algorithm optimized by simulated annealing algorithm and genetic algorithm is applied to cluster the data to output a membership matrix, which represents the initial clustering result and is taken as the input for GG algorithm to get the final clustering results. We also develop a cluster number estimation method called multi-index comprehensive estimation, which can estimate the cluster numbers well by combining four clustering effectiveness indexes. The performance of the scHFC method is evaluated on 17 scRNA-seq datasets, and compared with six state-of-the-art methods. Experimental results validate the better performance of our scHFC method in terms of clustering accuracy and stability of algorithm. In short, scHFC is an effective method to cluster cells for scRNA-seq data, and it presents great potential for downstream analysis of scRNA-seq data. The source code is available at https://github.com/WJ319/scHFC.


Asunto(s)
Análisis de la Célula Individual , Programas Informáticos , Algoritmos , Análisis por Conglomerados , Perfilación de la Expresión Génica/métodos , RNA-Seq , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos
11.
Brief Bioinform ; 23(5)2022 09 20.
Artículo en Inglés | MEDLINE | ID: mdl-35988921

RESUMEN

Neuropeptides (NPs) are a particular class of informative substances in the immune system and physiological regulation. They play a crucial role in regulating physiological functions in various biological growth and developmental stages. In addition, NPs are crucial for developing new drugs for the treatment of neurological diseases. With the development of molecular biology techniques, some data-driven tools have emerged to predict NPs. However, it is necessary to improve the predictive performance of these tools for NPs. In this study, we developed a deep learning model (NeuroPred-CLQ) based on the temporal convolutional network (TCN) and multi-head attention mechanism to identify NPs effectively and translate the internal relationships of peptide sequences into numerical features by the Word2vec algorithm. The experimental results show that NeuroPred-CLQ learns data information effectively, achieving 93.6% accuracy and 98.8% AUC on the independent test set. The model has better performance in identifying NPs than the state-of-the-art predictors. Visualization of features using t-distribution random neighbor embedding shows that the NeuroPred-CLQ can clearly distinguish the positive NPs from the negative ones. We believe the NeuroPred-CLQ can facilitate drug development and clinical trial studies to treat neurological disorders.


Asunto(s)
Algoritmos , Neuropéptidos , Neuropéptidos/genética , Péptidos/química
12.
Brief Bioinform ; 23(6)2022 11 19.
Artículo en Inglés | MEDLINE | ID: mdl-36125190

RESUMEN

The rapid development of biomedicine has produced a large number of biomedical written materials. These unstructured text data create serious challenges for biomedical researchers to find information. Biomedical named entity recognition (BioNER) and biomedical relation extraction (BioRE) are the two most fundamental tasks of biomedical text mining. Accurately and efficiently identifying entities and extracting relations have become very important. Methods that perform two tasks separately are called pipeline models, and they have shortcomings such as insufficient interaction, low extraction quality and easy redundancy. To overcome the above shortcomings, many deep learning-based joint name entity recognition and relation extraction models have been proposed, and they have achieved advanced performance. This paper comprehensively summarize deep learning models for joint name entity recognition and relation extraction for biomedicine. The joint BioNER and BioRE models are discussed in the light of the challenges existing in the BioNER and BioRE tasks. Five joint BioNER and BioRE models and one pipeline model are selected for comparative experiments on four biomedical public datasets, and the experimental results are analyzed. Finally, we discuss the opportunities for future development of deep learning-based joint BioNER and BioRE models.


Asunto(s)
Aprendizaje Profundo , Minería de Datos/métodos
13.
Brief Bioinform ; 23(6)2022 11 19.
Artículo en Inglés | MEDLINE | ID: mdl-36305457

RESUMEN

With the development of research on the complex aetiology of many diseases, computational drug repositioning methodology has proven to be a shortcut to costly and inefficient traditional methods. Therefore, developing more promising computational methods is indispensable for finding new candidate diseases to treat with existing drugs. In this paper, a model integrating a new variant of message passing neural network and a novel-gated fusion mechanism called GLGMPNN is proposed for drug-disease association prediction. First, a light-gated message passing neural network (LGMPNN), including message passing, aggregation and updating, is proposed to separately extract multiple pieces of information from the similarity networks and the association network. Then, a gated fusion mechanism consisting of a forget gate and an output gate is applied to integrate the multiple pieces of information to extent. The forget gate calculated by the multiple embeddings is built to integrate the association information into the similarity information. Furthermore, the final node representations are controlled by the output gate, which fuses the topology information of the networks and the initial similarity information. Finally, a bilinear decoder is adopted to reconstruct an adjacency matrix for drug-disease associations. Evaluated by 10-fold cross-validations, GLGMPNN achieves excellent performance compared with the current models. The following studies show that our model can effectively discover novel drug-disease associations.


Asunto(s)
Biología Computacional , Redes Neurales de la Computación , Biología Computacional/métodos , Reposicionamiento de Medicamentos/métodos , Algoritmos
14.
PLoS Comput Biol ; 19(8): e1011344, 2023 08.
Artículo en Inglés | MEDLINE | ID: mdl-37651321

RESUMEN

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease association and achieved impressive prediction performance. However, there are two main drawbacks to these methods. The first is these methods underutilize biometric information in the data. Second, the features extracted by these methods are not outstanding to represent association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE outperforms other competing methods significantly. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.


Asunto(s)
Algoritmos , ARN Circular , Humanos , ARN Circular/genética , Semántica
15.
J Chem Inf Model ; 2024 Jun 13.
Artículo en Inglés | MEDLINE | ID: mdl-38870455

RESUMEN

Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.

16.
BMC Bioinformatics ; 24(1): 217, 2023 May 26.
Artículo en Inglés | MEDLINE | ID: mdl-37237310

RESUMEN

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) strives to capture cellular diversity with higher resolution than bulk RNA sequencing. Clustering analysis is critical to transcriptome research as it allows for further identification and discovery of new cell types. Unsupervised clustering cannot integrate prior knowledge where relevant information is widely available. Purely unsupervised clustering algorithms may not yield biologically interpretable clusters when confronted with the high dimensionality of scRNA-seq data and frequent dropout events, which makes identification of cell types more challenging. RESULTS: We propose scSemiAAE, a semi-supervised clustering model for scRNA sequence analysis using deep generative neural networks. Specifically, scSemiAAE carefully designs a ZINB adversarial autoencoder-based architecture that inherently integrates adversarial training and semi-supervised modules in the latent space. In a series of experiments on scRNA-seq datasets spanning thousands to tens of thousands of cells, scSemiAAE can significantly improve clustering performance compared to dozens of unsupervised and semi-supervised algorithms, promoting clustering and interpretability of downstream analyses. CONCLUSION: scSemiAAE is a Python-based algorithm implemented on the VSCode platform that provides efficient visualization, clustering, and cell type assignment for scRNA-seq data. The tool is available from https://github.com/WHang98/scSemiAAE .


Asunto(s)
Perfilación de la Expresión Génica , Análisis de Expresión Génica de una Sola Célula , Análisis de la Célula Individual , Transcriptoma , Análisis de Secuencia de ARN , Algoritmos , Análisis por Conglomerados
17.
BMC Genomics ; 24(1): 448, 2023 Aug 09.
Artículo en Inglés | MEDLINE | ID: mdl-37559017

RESUMEN

BACKGROUND: Previous studies have identified that chromosome structure plays a very important role in gene control. The transcription factor Yin Yang 1 (YY1), a multifunctional DNA binding protein, could form a dimer to mediate chromatin loops and active enhancer-promoter interactions. The deletion of YY1 or point mutations at the YY1 binding sites significantly inhibit the enhancer-promoter interactions and affect gene expression. To date, only a few computational methods are available for identifying YY1-mediated chromatin loops. RESULTS: We proposed a novel model named CapsNetYY1, which was based on capsule network architecture to identify whether a pair of YY1 motifs can form a chromatin loop. Firstly, we encode the DNA sequence using one-hot encoding method. Secondly, multi-scale convolution layer is used to extract local features of the sequence, and bidirectional gated recurrent unit is used to learn the features across time steps. Finally, capsule networks (convolution capsule layer and digital capsule layer) used to extract higher level features and recognize YY1-mediated chromatin loops. Compared with DeepYY1, the only prediction for YY1-mediated chromatin loops, our model CapsNetYY1 achieved the better performance on the independent datasets (AUC [Formula: see text]). CONCLUSION: The results indicate that CapsNetYY1 is an excellent method for identifying YY1-mediated chromatin loops. We believe that the CapsNetYY1 method will be used for predictive classification of other DNA sequences.


Asunto(s)
Secuencias Reguladoras de Ácidos Nucleicos , Factor de Transcripción YY1 , Factor de Transcripción YY1/genética , Factor de Transcripción YY1/metabolismo , Inmunoprecipitación de Cromatina , Regiones Promotoras Genéticas , Cromatina/genética
18.
Brief Bioinform ; 22(5)2021 09 02.
Artículo en Inglés | MEDLINE | ID: mdl-33415333

RESUMEN

Predicting disease-related long non-coding RNAs (lncRNAs) is beneficial to finding of new biomarkers for prevention, diagnosis and treatment of complex human diseases. In this paper, we proposed a machine learning techniques-based classification approach to identify disease-related lncRNAs by graph auto-encoder (GAE) and random forest (RF) (GAERF). First, we combined the relationship of lncRNA, miRNA and disease into a heterogeneous network. Then, low-dimensional representation vectors of nodes were learned from the network by GAE, which reduce the dimension and heterogeneity of biological data. Taking these feature vectors as input, we trained a RF classifier to predict new lncRNA-disease associations (LDAs). Related experiment results show that the proposed method for the representation of lncRNA-disease characterizes them accurately. GAERF achieves superior performance owing to the ensemble learning method, outperforming other methods significantly. Moreover, case studies further demonstrated that GAERF is an effective method to predict LDAs.


Asunto(s)
Neoplasias Pulmonares/genética , Aprendizaje Automático , Redes Neurales de la Computación , Neoplasias de la Próstata/genética , ARN Largo no Codificante/genética , Neoplasias Gástricas/genética , Biomarcadores de Tumor/genética , Biomarcadores de Tumor/metabolismo , Biología Computacional/métodos , Gráficos por Computador/estadística & datos numéricos , Árboles de Decisión , Regulación Neoplásica de la Expresión Génica , Humanos , Neoplasias Pulmonares/diagnóstico , Neoplasias Pulmonares/metabolismo , Neoplasias Pulmonares/patología , Masculino , MicroARNs/clasificación , MicroARNs/genética , MicroARNs/metabolismo , Neoplasias de la Próstata/diagnóstico , Neoplasias de la Próstata/metabolismo , Neoplasias de la Próstata/patología , ARN Largo no Codificante/clasificación , ARN Largo no Codificante/metabolismo , Curva ROC , Factores de Riesgo , Neoplasias Gástricas/diagnóstico , Neoplasias Gástricas/metabolismo , Neoplasias Gástricas/patología
19.
Brief Bioinform ; 22(5)2021 09 02.
Artículo en Inglés | MEDLINE | ID: mdl-33866367

RESUMEN

Although synonymous mutations do not alter the encoded amino acids, they may impact protein function by interfering with the regulation of RNA splicing or altering transcript splicing. New progress on next-generation sequencing technologies has put the exploration of synonymous mutations at the forefront of precision medicine. Several approaches have been proposed for predicting the deleterious synonymous mutations specifically, but their performance is limited by imbalance of the positive and negative samples. In this study, we firstly expanded the number of samples greatly from various data sources and compared six undersampling strategies to solve the problem of the imbalanced datasets. The results suggested that cluster centroid is the most effective scheme. Secondly, we presented a computational model, undersampling scheme based method for deleterious synonymous mutation (usDSM) prediction, using 14-dimensional biology features and random forest classifier to detect the deleterious synonymous mutation. The results on the test datasets indicated that the proposed usDSM model can attain superior performance in comparison with other state-of-the-art machine learning methods. Lastly, we found that the deep learning model did not play a substantial role in deleterious synonymous mutation prediction through a lot of experiments, although it achieves superior results in other fields. In conclusion, we hope our work will contribute to the future development of computational methods for a more accurate prediction of the deleterious effect of human synonymous mutation. The web server of usDSM is freely accessible at http://usdsm.xialab.info/.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Aprendizaje Automático , Modelos Genéticos , Proteínas/genética , Mutación Silenciosa , Humanos , Proteínas/química , Reproducibilidad de los Resultados
20.
Bioinformatics ; 38(15): 3703-3709, 2022 08 02.
Artículo en Inglés | MEDLINE | ID: mdl-35699473

RESUMEN

MOTIVATION: A large number of studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning without the prior exploitation of any domain knowledge, which does not utilize available gold-standard labels. When confronted by the high dimensionality and general dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicate cell type assignment. RESULTS: In this article, we propose a semi-supervised clustering method based on a capsule network named scCNC that integrates domain knowledge into the clustering step. Significantly, we also propose a Semi-supervised Greedy Iterative Training method used to train the whole network. Experiments on some real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses. AVAILABILITY AND IMPLEMENTATION: The source code of scCNC is freely available at https://github.com/WHY-17/scCNC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de la Célula Individual , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Perfilación de la Expresión Génica/métodos , Análisis por Conglomerados , Programas Informáticos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA