1.
Brief Bioinform ; 24(5), 2023 09 20.
Article in English | MEDLINE | ID: mdl-37587836

ABSTRACT

Recent studies have demonstrated the significant role that circRNA plays in the progression of human diseases. Identifying circRNA-disease associations (CDA) efficiently can offer crucial insights into disease diagnosis. While traditional biological experiments can be time-consuming and labor-intensive, computational methods have emerged as a viable alternative in recent years. However, these methods are often limited by data sparsity and their inability to explore high-order information. In this paper, we introduce a novel method named Knowledge Graph Encoder from Transformer for predicting CDA (KGETCDA). Specifically, KGETCDA first integrates more than 10 databases to construct a large heterogeneous non-coding RNA dataset, which contains multiple relationships between circRNA, miRNA, lncRNA and disease. Then, a biological knowledge graph is created from this dataset, and Transformer-based knowledge representation learning and attentive propagation layers are applied to obtain high-quality embeddings that accurately capture high-order interaction information. Finally, a multilayer perceptron is used to predict the matching scores of CDA based on their embeddings. Our empirical results demonstrate that KGETCDA significantly outperforms other state-of-the-art models. To enhance user experience, we have developed an interactive web-based platform named HNRBase that allows users to visualize and download data and to make predictions using KGETCDA with ease. The code and datasets are publicly available at https://github.com/jinyangwu/KGETCDA.


Subjects
Circular RNA , Long Noncoding RNA , Humans , Automated Pattern Recognition , Learning , Factual Databases , Knowledge Bases , Computational Biology
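
The last step named in the abstract above, scoring a circRNA-disease pair with a multilayer perceptron over the two embeddings, can be sketched roughly as follows. This is an illustration only, not the KGETCDA implementation: the embedding size, the concatenation of the two embeddings, the single hidden layer, the sigmoid output and the random weights are all assumptions made for the example.

```python
import math
import random


def mlp_score(circ_emb, dis_emb, w1, b1, w2, b2):
    """Score one circRNA-disease pair with a one-hidden-layer MLP.

    Assumption: the two embeddings are fused by simple concatenation.
    """
    x = circ_emb + dis_emb  # list concatenation, length 2 * dim
    # Hidden layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    # Output layer with a sigmoid, so the score lies in [0, 1].
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
dim, hidden_dim = 4, 8
w1 = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [rng.uniform(-1, 1) for _ in range(hidden_dim)]
b2 = 0.0

circ_embedding = [rng.uniform(-1, 1) for _ in range(dim)]
disease_embedding = [rng.uniform(-1, 1) for _ in range(dim)]
score = mlp_score(circ_embedding, disease_embedding, w1, b1, w2, b2)
```

In a trained model the weights would be learned from known associations, and candidate pairs would be ranked by `score`.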
2.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38449288

ABSTRACT

MOTIVATION: Topologically associating domains (TADs) are fundamental building blocks of the 3D genome. TAD-like domains in single cells are regarded as the underlying genesis of TADs discovered in bulk cells. Understanding the organization of TAD-like domains gives deeper insight into their regulatory functions. Unfortunately, identifying TAD-like domains in single-cell Hi-C data remains a challenge because the data are ultra-sparse. RESULTS: We propose scKTLD, an in silico tool for the identification of TAD-like domains in single-cell Hi-C data. It takes the Hi-C contact matrix as the adjacency matrix of a graph, embeds the graph structure into a low-dimensional space using sparse matrix factorization followed by spectral propagation, and then identifies TAD-like domains with kernel-based changepoint detection in the embedding space. The results show that scKTLD outperforms other methods on sparse contact matrices, including downsampled bulk Hi-C data as well as simulated and experimental single-cell Hi-C data. In addition, we demonstrated the conservation of TAD-like domain boundaries at the single-cell level, apart from heterogeneity within and across cell types, and found that boundaries with higher frequency across single cells are more enriched for architectural proteins and chromatin marks and preferentially occur at TAD boundaries in bulk cells, especially at those with higher hierarchical levels. AVAILABILITY AND IMPLEMENTATION: scKTLD is freely available at https://github.com/lhqxinghun/scKTLD.


Subjects
Chromatin , Chromosomes , Genome
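
The kernel-based changepoint step mentioned in the abstract above can be illustrated with a much-reduced sketch. It is not scKTLD itself: it finds a single boundary only, it uses the raw rows of a toy contact matrix in place of the learned graph embedding, and the RBF kernel and its `gamma` value are assumptions chosen for the example.

```python
import math


def rbf(u, v, gamma=0.5):
    """RBF kernel between two feature vectors (gamma is an assumed value)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def segment_cost(gram, start, end):
    """Kernel within-segment scatter for bins start..end-1."""
    n = end - start
    diag = sum(gram[i][i] for i in range(start, end))
    block = sum(gram[i][j] for i in range(start, end) for j in range(start, end))
    return diag - block / n


def best_single_boundary(points):
    """Return the split index that minimises the summed cost of both segments."""
    n = len(points)
    gram = [[rbf(p, q) for q in points] for p in points]
    return min(
        range(1, n),
        key=lambda t: segment_cost(gram, 0, t) + segment_cost(gram, t, n),
    )


# Toy 6-bin contact matrix with two dense blocks (bins 0-2 and bins 3-5).
contacts = [
    [5, 4, 4, 0, 0, 0],
    [4, 5, 4, 0, 0, 0],
    [4, 4, 5, 0, 0, 0],
    [0, 0, 0, 5, 4, 4],
    [0, 0, 0, 4, 5, 4],
    [0, 0, 0, 4, 4, 5],
]
boundary = best_single_boundary(contacts)  # index of the first bin of the second domain
```

Detecting several boundaries would require a search over multiple split points (for example by dynamic programming), which is omitted here for brevity.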
3.
Ann Rheum Dis ; 83(7): 926-944, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38325908

ABSTRACT

OBJECTIVES: To perform single-cell and spatial transcriptomics analysis of human knee articular cartilage tissue in order to present a comprehensive transcriptome landscape and identify osteoarthritis (OA)-critical cell populations. METHODS: Single-cell RNA sequencing and spatially resolved transcriptomic technology were applied to characterise the cellular heterogeneity of human knee articular cartilage collected from 8 OA donors and 3 non-OA control donors, for a total of 19 samples. The novel chondrocyte population and marker genes of interest were validated by immunohistochemistry staining, quantitative real-time PCR, etc. The OA-critical cell populations were validated through integrative analyses of publicly available bulk RNA sequencing data and large-scale genome-wide association studies. RESULTS: We identified 33 cell population-specific marker genes that define 11 chondrocyte populations, including 9 known populations and 2 new populations, that is, the pre-inflammatory chondrocyte population (preInfC) and the inflammatory chondrocyte population (InfC). The novel findings include: (1) the novel InfC activates the mediator MIF-CD74; (2) the prehypertrophic chondrocyte (preHTC) and hypertrophic chondrocyte (HTC) are potentially OA-critical cell populations; (3) most OA-associated differentially expressed genes reside in the articular surface and superficial zone; (4) the prefibrocartilage chondrocyte (preFC) population is a major contributor to the stratification of patients with OA, resulting in both an inflammatory-related subtype and a non-inflammatory-related subtype. CONCLUSIONS: Our results highlight InfC, preHTC, preFC and HTC as potential cell populations to target for therapy. We also conclude that profiling these cell populations in patients might be used to stratify patient populations when defining cohorts for clinical trials and precision medicine.


Subjects
Articular Cartilage , Chondrocytes , Knee Osteoarthritis , Humans , Chondrocytes/pathology , Chondrocytes/metabolism , Knee Osteoarthritis/pathology , Knee Osteoarthritis/genetics , Articular Cartilage/pathology , Articular Cartilage/metabolism , Middle Aged , Male , Transcriptome , Genome-Wide Association Study , Female , Single-Cell Analysis/methods , Aged , Gene Expression Profiling/methods , Hypertrophy/genetics , Multiomics
4.
Nucleic Acids Res ; 50(3): e14, 2022 02 22.
Article in English | MEDLINE | ID: mdl-34792173

ABSTRACT

For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models and free energy minimization, an approach that imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to that of traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast, with an inference time of about 160 ms per sequence for sequences up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.


Subjects
Deep Learning , RNA , Algorithms , Base Pairing , Humans , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics
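
The abstract above does not spell out how the image-like representation is built. One plausible construction, sketched here as an assumption rather than as the UFold code, takes the outer product of the one-hot encoded sequence with itself, which gives 16 channels of size L x L, one per ordered pair of bases. The function names and the `AUCG` ordering are hypothetical.

```python
BASES = "AUCG"  # assumed channel ordering


def one_hot(seq):
    """One-hot encode an RNA sequence as an L x 4 list of 0/1 values."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def image_like(seq):
    """Build a 16 x L x L tensor as nested lists.

    Channel 4 * a + b is 1 at (i, j) when seq[i] is BASES[a] and seq[j] is BASES[b].
    """
    enc = one_hot(seq)
    length = len(seq)
    return [
        [[enc[i][a] * enc[j][b] for j in range(length)] for i in range(length)]
        for a in range(4)
        for b in range(4)
    ]


image = image_like("GAUC")
# Channel 1 is the (A, U) pair: A sits at position 1 and U at position 2.
au_channel = image[1]
```

Each position pair (i, j) activates exactly one channel, so a convolutional network sees every candidate base pair as a pixel with a pair-type label.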
5.
Nucleic Acids Res ; 50(21): e121, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36130281

ABSTRACT

Multimodal single-cell sequencing technologies provide unprecedented information on cellular heterogeneity from multiple layers of genomic readouts. However, joint analysis of two modalities without properly handling the noise often leads to overfitting of one modality by the other and to worse clustering results than vanilla single-modality analysis. How to efficiently use the extra information from single-cell multi-omics to delineate cell states and identify meaningful signal remains a significant computational challenge. In this work, we propose a deep learning framework, named SAILERX, for efficient, robust, and flexible analysis of multimodal single-cell data. SAILERX consists of a variational autoencoder with invariant representation learning to correct technical noise from the sequencing process, and a multimodal data alignment mechanism to integrate information from different modalities. Instead of performing hard alignment by projecting both modalities to a shared latent space, SAILERX encourages the local structures of the two modalities, measured by pairwise similarities, to be similar. This strategy is more robust against overfitting to noise, which facilitates various downstream analyses such as clustering, imputation, and marker gene detection. Furthermore, the invariant representation learning part enables SAILERX to perform integrative analysis on both multi- and single-modal datasets, making it an applicable and scalable tool for more general scenarios.


Subjects
Genomics , Multiomics , Cluster Analysis , Single-Cell Analysis
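
The soft alignment idea in the abstract above, matching pairwise similarities between cells rather than forcing both modalities into one latent space, can be sketched as a simple penalty. The choice of cosine similarity and a mean squared difference is an assumption made for this example, and the variable names and toy embeddings are hypothetical; the actual SAILERX loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_matrix(embeddings):
    """Cell-by-cell similarity matrix within one modality."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


def alignment_penalty(embs_a, embs_b):
    """Mean squared difference between the two similarity matrices.

    Rows of embs_a and embs_b must describe the same cells in the same order.
    """
    sim_a, sim_b = similarity_matrix(embs_a), similarity_matrix(embs_b)
    n = len(embs_a)
    return sum(
        (sim_a[i][j] - sim_b[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


# Three cells in a toy RNA latent space.
rna = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
# Same neighbourhood structure, but axes swapped and rescaled.
atac_same_structure = [[0.0, 2.0], [0.2, 1.8], [2.0, 0.0]]
# Different structure: cell 1 now resembles cell 2 instead of cell 0.
atac_other_structure = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]

low = alignment_penalty(rna, atac_same_structure)
high = alignment_penalty(rna, atac_other_structure)
```

The first comparison shows why this is a softer constraint than a shared latent space: the two embeddings use different coordinates, yet the penalty is essentially zero because the relationships between cells agree.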
6.
Bioinformatics ; 37(Suppl_1): i317-i326, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252968

ABSTRACT

MOTIVATION: Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) provides new opportunities to dissect epigenomic heterogeneity and elucidate transcriptional regulatory mechanisms. However, computational modeling of scATAC-seq data is challenging due to its high dimension, extreme sparsity, complex dependencies and high sensitivity to confounding factors from various sources. RESULTS: Here, we propose a new deep generative model framework, named SAILER, for analyzing scATAC-seq data. SAILER aims to learn a low-dimensional nonlinear latent representation of each cell that defines its intrinsic chromatin state, invariant to extrinsic confounding factors such as read depth and batch effects. SAILER adopts the conventional encoder-decoder framework to learn the latent representation but imposes additional constraints to ensure that the learned representations are independent of the confounding factors. Experimental results on both simulated and real scATAC-seq datasets demonstrate that SAILER learns better and more biologically meaningful representations of cells than other methods. Its noise-free cell embeddings bring significant benefits in downstream analyses: clustering and imputation based on SAILER result in 6.9% and 18.5% improvements over existing methods, respectively. Moreover, because no matrix factorization is involved, SAILER can easily scale to process millions of cells. We implemented SAILER as a software package, freely available to all for large-scale scATAC-seq data analysis. AVAILABILITY AND IMPLEMENTATION: The software is publicly available at https://github.com/uci-cbcl/SAILER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation Sequencing , Single-Cell Analysis , Epigenomics , RNA Sequence Analysis , Software , Transposases
7.
IEEE J Biomed Health Inform ; 27(11): 5655-5664, 2023 11.
Article in English | MEDLINE | ID: mdl-37669210

ABSTRACT

Non-coding RNAs (ncRNAs) are a class of RNA molecules that lack the ability to encode proteins in human cells but play crucial roles in various biological processes. Understanding the interactions between different ncRNAs and their impact on diseases can significantly contribute to the diagnosis, prevention, and treatment of diseases. However, predicting tertiary interactions between ncRNAs and diseases based on structural information at multiple scales remains a challenging task. To address this challenge, we propose a method called BertNDA, which aims to predict potential relationships between miRNAs, lncRNAs, and diseases. The framework captures local information through connectionless subgraphs, which aggregate the features of neighboring nodes. Global information is extracted using the Laplace transform of graph structures and WL (Weisfeiler-Lehman) absolute role coding. Additionally, an EMLP (Element-wise MLP) structure is designed to fuse pairwise global information. A transformer encoder serves as the backbone of our approach, followed by a prediction layer that outputs the final correlation score. Extensive experiments demonstrate that BertNDA outperforms state-of-the-art methods on the prediction task and exhibits significant potential for various biological applications. Moreover, we develop an online prediction platform that incorporates the prediction model, providing users with an intuitive and interactive experience. Overall, our model offers an efficient, accurate, and comprehensive tool for predicting tertiary associations between ncRNAs and diseases.


Subjects
MicroRNAs , Long Noncoding RNA , Humans , Electric Power Supplies
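
The Weisfeiler-Lehman (WL) role coding mentioned in the abstract above builds on WL colour refinement, in which nodes repeatedly exchange labels with their neighbours until nodes in the same structural position share a label. The sketch below shows plain colour refinement on a tiny association graph; it is not the BertNDA code, and the node names, graph and iteration count are hypothetical.

```python
def wl_roles(adjacency, iterations=2):
    """Weisfeiler-Lehman colour refinement on an undirected graph.

    adjacency maps each node to a list of its neighbours. Nodes that end
    with the same integer label occupy the same structural role.
    """
    labels = {node: 0 for node in adjacency}
    for _ in range(iterations):
        # A node's signature is its own label plus the sorted labels of its neighbours.
        signatures = {
            node: (labels[node], tuple(sorted(labels[nb] for nb in adjacency[node])))
            for node in adjacency
        }
        # Compress each distinct signature to a small integer.
        palette = {sig: idx for idx, sig in enumerate(sorted(set(signatures.values())))}
        labels = {node: palette[signatures[node]] for node in adjacency}
    return labels


# Hypothetical miRNA / lncRNA / disease association graph.
graph = {
    "miR-a": ["dis-1"],
    "miR-b": ["dis-1"],
    "dis-1": ["miR-a", "miR-b", "lnc-x"],
    "lnc-x": ["dis-1", "dis-2"],
    "dis-2": ["lnc-x"],
}
roles = wl_roles(graph)
```

Here `miR-a` and `miR-b` receive the same role because both attach only to the hub disease, while `dis-2`, which also has a single neighbour, gets a different role because that neighbour is the lncRNA. A transformer could then embed these integer roles much like positional encodings.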
8.
Article in English | MEDLINE | ID: mdl-37027676

ABSTRACT

Long non-coding RNAs (lncRNAs) play a vital role in regulating gene expression and other biological processes. Distinguishing lncRNAs from protein-coding transcripts helps researchers investigate the mechanism of lncRNA formation and its downstream regulation in various diseases. Previous approaches to identifying lncRNAs include traditional bio-sequencing and machine learning. Because feature extraction based on biological characteristics is tedious and artifacts are inevitable during bio-sequencing, those lncRNA detection methods are not always satisfactory. Hence, in this work, we present lncDLSM, a deep learning-based framework that differentiates lncRNAs from protein-coding transcripts without depending on prior biological knowledge. lncDLSM is a helpful tool for identifying lncRNAs compared with other biological feature-based machine learning methods, and it can be applied to other species through transfer learning with satisfactory results. Further experiments showed that different species display distinct boundaries among distributions, corresponding to the homology and the specificity among species, respectively. An online web server is provided to the community for easy use and efficient identification of lncRNA, available at http://39.106.16.168/lncDLSM.

9.
IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1721-1728, 2020.
Article in English | MEDLINE | ID: mdl-30951477

ABSTRACT

DNA methylation plays an important role in the regulation of some biological processes. With the development of machine learning, several sequence-based deep learning models have been designed to predict DNA methylation states, and they perform better than traditional methods such as random forest and SVM. However, convolutional network-based deep learning models that take one-hot encoded DNA sequence as input may discover limited information and yield unsatisfactory prediction performance, so more data types and more diverse model structures should be considered. In this work, we propose a hybrid sequence-based deep learning model that uses both MeDIP-seq data and histone information to predict the methylation states of CpG sites (MHCpG). We combined MeDIP-seq data and histone modification data with sequence information and applied a convolutional network to discover sequence patterns. In addition, we used statistics derived from these three inputs and adopted a 3-layer feedforward neural network to extract higher-level features. We compared our method with traditional prediction methods based on random forest and with previous methods such as CpGenie and DeepCpG; the results showed that MHCpG outperformed the other approaches.


Subjects
DNA Methylation/genetics , Deep Learning , High-Throughput Nucleotide Sequencing/methods , Histone Code/genetics , DNA Sequence Analysis/methods , Tumor Cell Line , Computational Biology/methods , DNA/genetics , Humans
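
The one-hot encoded sequence input that the abstract above refers to can be sketched as follows. The idea of cutting a fixed-width window around each CpG site, the `flank` parameter and the all-zero encoding of unknown bases are assumptions for illustration; the abstract does not state how MHCpG prepares its sequence input.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """One-hot encode DNA; bases outside ACGT (such as N) become all zeros."""
    return [[1 if base == b else 0 for b in ALPHABET] for base in seq]


def cpg_windows(seq, flank):
    """Return (position, encoded window) for every CpG whose window fits in seq.

    Each window covers `flank` bases on either side of the CG dinucleotide.
    """
    seq = seq.upper()
    windows = []
    for i in range(flank, len(seq) - flank - 1):
        if seq[i:i + 2] == "CG":
            windows.append((i, one_hot(seq[i - flank:i + 2 + flank])))
    return windows


# Toy sequence with CpG sites at positions 3 and 9.
windows = cpg_windows("TTACGATTNCGAA", flank=2)
positions = [pos for pos, _ in windows]
```

Each encoded window is a (2 * flank + 2) x 4 matrix that a convolutional layer can scan for sequence motifs; the other data types named in the abstract would enter the model as additional inputs.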
10.
Sci Adv ; 6(51), 2020 12.
Article in English | MEDLINE | ID: mdl-33355120

ABSTRACT

Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric "TF activity score" to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.

11.
Sci Rep ; 7(1): 14482, 2017 11 03.
Article in English | MEDLINE | ID: mdl-29101378

ABSTRACT

Cumulative evidence from biological experiments has confirmed that microRNAs (miRNAs) are related to many types of human diseases through different biological processes. It is anticipated that precise miRNA-disease association prediction could not only help infer potential disease-related miRNAs but also improve the diagnosis and prevention of human disease. Considering the limitations of previous computational models, a more effective computational model is needed to predict miRNA-disease associations. In this work, we first constructed a human miRNA-miRNA similarity network using miRNA-miRNA functional similarity data and heterogeneous miRNA Gaussian interaction profile kernel similarities, based on the assumption that miRNAs with similar functions tend to be associated with similar diseases, and vice versa. Then, we constructed disease-disease similarity using disease semantic information and heterogeneous disease-related interaction data. We proposed a deep ensemble model called DeepMDA that extracts high-level features from similarity information using stacked autoencoders and then predicts miRNA-disease associations with a 3-layer neural network. In addition to five-fold cross-validation, we also proposed another cross-validation method to evaluate the performance of the model. The results show that the proposed model is superior to previous methods and highly robust.


Subjects
Disease , MicroRNAs/metabolism , Biological Models , Area Under Curve , Humans , Machine Learning , Computer Neural Networks , ROC Curve
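
The Gaussian interaction profile (GIP) kernel similarity named in the abstract above is commonly defined as exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is the binary vector of known associations for miRNA i and gamma is a bandwidth normalised by the mean squared norm of all profiles. The sketch below follows that common definition; the toy association matrix and the default `gamma_prime = 1.0` are assumptions, and the exact settings used in DeepMDA are not given in the abstract.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows of `profiles`.

    profiles: one binary association vector per miRNA (at least one must be non-zero,
    otherwise the bandwidth normalisation would divide by zero).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        [math.exp(-gamma * sq_dist(p, q)) for q in profiles]
        for p in profiles
    ]


# Toy matrix: 3 miRNAs (rows) x 3 diseases (columns), 1 = known association.
associations = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
similarity = gip_kernel(associations)
```

The first two miRNAs share an identical profile and therefore get similarity 1, while the third, which shares no disease with them, gets a much lower value. The same function applied to the transposed matrix would give a disease-disease similarity.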