Results 1 - 20 of 73
1.
Nucleic Acids Res ; 52(8): 4137-4150, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38572749

ABSTRACT

DNA motifs are crucial patterns in gene regulation. DNA-binding proteins (DBPs), including transcription factors, can bind to specific DNA motifs to regulate gene expression and other cellular activities. Past studies suggest that DNA shape features could be subtly involved in DNA-DBP interactions. Therefore, the shape motif annotations based on intrinsic DNA topology can deepen the understanding of DNA-DBP binding. Nevertheless, high-throughput tools for DNA shape motif discovery that incorporate multiple features altogether remain insufficient. To address it, we propose a series of methods to discover non-redundant DNA shape motifs with the generalization to multiple motifs in multiple shape features. Specifically, an existing Gibbs sampling method is generalized to multiple DNA motif discovery with multiple shape features. Meanwhile, an expectation-maximization (EM) method and a hybrid method coupling EM with Gibbs sampling are proposed and developed with promising performance, convergence capability, and efficiency. The discovered DNA shape motif instances reveal insights into low-signal ChIP-seq peak summits, complementing the existing sequence motif discovery works. Additionally, our modelling captures the potential interplays across multiple DNA shape features. We provide a valuable platform of tools for DNA shape motif discovery. An R package is built for open accessibility and long-lasting impact: https://zenodo.org/doi/10.5281/zenodo.10558980.


Subjects
DNA , Nucleotide Motifs , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Algorithms , Nucleic Acid Conformation , Chromatin Immunoprecipitation Sequencing/methods , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Transcription Factors/chemistry , Humans , Protein Binding
2.
Article in English | MEDLINE | ID: mdl-38578856

ABSTRACT

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is available on GitHub at https://github.com/wangxb96/EODE.

3.
Adv Sci (Weinh) ; 11(16): e2307280, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38380499

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a robust method for studying gene expression at the single-cell level, but accurately quantifying genetic material is often hindered by limited mRNA capture, resulting in many missing expression values. Existing imputation methods rely on strict data assumptions, limiting their broader application, and lack reliable supervision, leading to biased signal recovery. To address these challenges, the authors developed Bis, a distribution-agnostic deep learning model for accurately recovering missing single-cell gene expression from multiple platforms. Bis is an optimal transport-based autoencoder model that can capture the intricate distribution of scRNA-seq data while addressing the characteristic sparsity by regularizing the cellular embedding space. Additionally, they propose a module using bulk RNA-seq data to guide reconstruction and ensure expression consistency. Experimental results show Bis outperforms other models across simulated and real datasets, showcasing superiority in various downstream analyses including batch effect removal, clustering, differential expression analysis, and trajectory inference. Moreover, Bis successfully restores gene expression levels in rare cell subsets in a tumor-matched peripheral blood dataset, revealing developmental characteristics of cytokine-induced natural killer cells within a head and neck squamous cell carcinoma microenvironment.


Subjects
Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
4.
Bone ; 182: 117050, 2024 May.
Article in English | MEDLINE | ID: mdl-38367924

ABSTRACT

Postmenopausal osteoporosis (PMOP) is a common kind of osteoporosis that is associated with excessive osteocyte death and bone loss. Previous studies have shown that TNF-α-induced osteocyte necroptosis might exert a stronger effect on PMOP than apoptosis, and TLR4 can also induce cell necroptosis, as confirmed by recent studies. However, little is known about the relationship between TNF-α-induced osteocyte necroptosis and TLR4. In the present study, we showed that TNF-α increased the expression of TLR4, which promoted osteocyte necroptosis in PMOP. In patients with PMOP, TLR4 was highly expressed at skeletal sites where osteocyte necroptosis was present, and high TLR4 expression was correlated with enhanced TNF-α expression. Osteocytes exhibited robust TLR4 expression upon exposure to necroptotic osteocytes in vivo and in vitro. Western blotting and immunofluorescence analyses demonstrated that TNF-α upregulated TLR4 expression in vitro, which might further promote osteocyte necroptosis. Furthermore, inhibition of TLR4 by TAK-242 in vitro effectively blocked osteocyte necroptosis induced by TNF-α. Collectively, these results suggest a novel TLR4-mediated process of osteocyte necroptosis, which might increase osteocyte death and bone loss in the process of PMOP.


Subjects
Osteocytes , Osteoporosis, Postmenopausal , Toll-Like Receptor 4 , Tumor Necrosis Factor-alpha , Female , Humans , Necroptosis , Osteocytes/metabolism , Osteoporosis, Postmenopausal/metabolism , Toll-Like Receptor 4/metabolism , Tumor Necrosis Factor-alpha/metabolism
5.
Methods ; 223: 65-74, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38280472

ABSTRACT

MicroRNAs (miRNAs) are vital in regulating gene expression through binding to specific target sites on messenger RNAs (mRNAs), a process closely tied to cancer pathogenesis. Identifying miRNA functional targets is essential but challenging, due to incomplete genome annotation and an emphasis on known miRNA-mRNA interactions, restricting predictions of unknown ones. To address those challenges, we have developed a deep learning model based on miRNA functional target identification, named miTDS, to investigate miRNA-mRNA interactions. miTDS first employs a scoring mechanism to eliminate unstable sequence pairs and then utilizes a dynamic word embedding model based on the transformer architecture, enabling a comprehensive analysis of miRNA-mRNA interaction sites by harnessing the global contextual associations of each nucleotide. On this basis, miTDS fuses extended seed alignment representations learned in the multi-scale attention mechanism module with dynamic semantic representations extracted in the RNA-based dual-path module, which can further elucidate and predict miRNA and mRNA functions and interactions. To validate the effectiveness of miTDS, we conducted a thorough comparison with state-of-the-art miRNA-mRNA functional target prediction methods. The evaluation, performed on a dataset cross-referenced with entries from MirTarbase and Diana-TarBase, revealed that miTDS surpasses current methods in accurately predicting functional targets. In addition, our model exhibited proficiency in identifying A-to-I RNA editing sites, which represents an aberrant interaction that yields valuable insights into the suppression of cancerous processes.


Subjects
Deep Learning , MicroRNAs , MicroRNAs/genetics , RNA, Messenger/genetics , Nucleotides , RNA Editing
6.
Sci Rep ; 14(1): 395, 2024 01 03.
Article in English | MEDLINE | ID: mdl-38172255

ABSTRACT

In recent times, a new wave of scientific and technological advancements has significantly reshaped the global economic structure. This shift has redefined the role of regional innovation, particularly in its contribution to developing the Guangdong-Hong Kong-Macao Greater Bay area (GBA) into a renowned center for science, technology, and innovation. This study constructs a comprehensive evaluation system for the Regional Innovation Ecosystem (RIE). By applying the coupling coordination degree model and social network analysis, we have extensively analyzed the spatial structure and network attributes of the coupled and coordinated innovation ecosystem in the GBA from 2010 to 2019. Our findings reveal several key developments: (1) There has been a noticeable rightward shift in the kernel density curve, indicating an ongoing optimization of the overall coupling coordination level. Notably, the center of gravity for coupling coordination has progressively moved southeast. This shift has led to a reduction in the elliptical area each year, while the trend surface consistently shows a convex orientation toward the center. The most significant development is observed along the 'Guangdong-Shenzhen-Hong Kong-Macao Science and Technology Innovation Corridor', where the level of coupling coordination has become increasingly pronounced. (2) The spatial linkages within the GBA have been strengthening. There are significant spatial transaction costs in the regional innovation ecological network. In the context of the 2019 US-China trade war, the cities of Jiangmen and Zhaoqing experienced a notable decrease in connectivity with other cities, raising concerns about their potential marginalization. (3) Guangzhou, Shenzhen, and Hong Kong have emerged as core nodes within the network. The network exhibits a distinctive "core-edge" spatial structure, characterized by both robustness and vulnerability in various aspects.
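
The coupling coordination degree model named in this abstract is often written as D = sqrt(C * T), where C is the coupling degree of the subsystem scores and T is their weighted comprehensive index. The sketch below uses that common form. The abstract does not say which variant, weights, or subsystem indicators the study used, so the equal weights, function names, and sample scores here are assumptions for illustration only.

```python
import math


def coupling_degree(u):
    """Coupling degree C = n * (prod u_i)^(1/n) / sum(u_i), which lies in [0, 1].

    `u` holds normalized subsystem scores (assumed to be in [0, 1]).
    """
    n = len(u)
    if n == 0 or any(x < 0 for x in u):
        raise ValueError("scores must be a non-empty list of non-negative numbers")
    total = sum(u)
    if total == 0:
        return 0.0
    geometric_mean = math.prod(u) ** (1.0 / n)
    return n * geometric_mean / total


def coupling_coordination_degree(u, weights=None):
    """Coupling coordination degree D = sqrt(C * T), with T = sum(w_i * u_i)."""
    n = len(u)
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / n] * n
    c = coupling_degree(u)
    t = sum(w * x for w, x in zip(weights, u))
    return math.sqrt(c * t)


if __name__ == "__main__":
    # Hypothetical scores for two subsystems of one city in one year.
    print(coupling_coordination_degree([0.62, 0.48]))
```

C reaches 1 only when all subsystem scores are equal, and D additionally rewards high overall scores, which is why two balanced but weak subsystems still yield a low D.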


Subjects
Ecosystem , Hong Kong , Macau , China , Cities
7.
Comput Biol Med ; 168: 107753, 2024 01.
Article in English | MEDLINE | ID: mdl-38039889

ABSTRACT

BACKGROUND: Trans-acting factors, a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. The progressive development in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups where trans-acting DNA motif groups can be discovered. The difficulty of the problem lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling for accommodating the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge for model fitting with latent variables. The results show that our proposed approaches demonstrate promising performance with linear time complexities. CONCLUSION: MotifHub is a novel algorithm considering the identification of both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.


Subjects
Algorithms , Software , Nucleotide Motifs/genetics , Reproducibility of Results , Chromatin/genetics
8.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37934154

ABSTRACT

MOTIVATION: Recent frameworks based on deep learning have been developed to identify cancer subtypes from high-throughput gene expression profiles. Unfortunately, the performance of deep learning is highly dependent on its neural network architectures, which are often hand-crafted with expertise in deep neural networks; meanwhile, the optimization and adjustment of the network are usually costly and time consuming. RESULTS: To address such limitations, we proposed a fully automated deep neural architecture search model for diagnosing consensus molecular subtypes from gene expression data (DNAS). The proposed model uses the ant colony algorithm, one of the heuristic swarm intelligence algorithms, to search and optimize neural network architecture, and it can automatically find the optimal deep learning model architecture for cancer diagnosis in its search space. We validated DNAS on eight colorectal cancer datasets, achieving an average accuracy of 95.48%, an average specificity of 98.07%, and an average sensitivity of 96.24%. Without loss of generality, we investigated the general applicability of DNAS further on other cancer types from different platforms including lung cancer and breast cancer, and DNAS achieved an area under the curve of 95% and 96%, respectively. In addition, we conducted gene ontology enrichment and pathological analysis to reveal interesting insights into cancer subtype identification and characterization across multiple cancer types. AVAILABILITY AND IMPLEMENTATION: The source code and data can be downloaded from https://github.com/userd113/DNAS-main. The web server of DNAS is publicly accessible at 119.45.145.120:5001.
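
Accuracy, specificity, and sensitivity as reported in this abstract are standard confusion-matrix quantities. The sketch below computes them for one class treated as "positive" against all others. The abstract does not describe how the per-dataset averages were formed, so the one-vs-rest framing, the function name, and the toy subtype labels are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return accuracy, sensitivity, and specificity for one class vs. the rest."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (no samples of that kind) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Hypothetical subtype labels, not data from the study.
    truth = ["CMS1", "CMS1", "CMS2", "CMS2", "CMS2"]
    predicted = ["CMS1", "CMS2", "CMS2", "CMS2", "CMS1"]
    print(binary_metrics(truth, predicted, positive="CMS1"))
```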


Subjects
Breast Neoplasms , Deep Learning , Humans , Female , Neural Networks, Computer , Algorithms , Software
9.
Adv Sci (Weinh) ; 10(33): e2303502, 2023 11.
Article in English | MEDLINE | ID: mdl-37816141

ABSTRACT

Single-cell Hi-C (scHi-C) has made it possible to analyze chromatin organization at the single-cell level. However, scHi-C experiments generate inherently sparse data, which poses a challenge for loop calling methods. The existing approach performs significance tests across the imputed dense contact maps, leading to substantial computational overhead and loss of information at the single-cell level. To overcome this limitation, a lightweight framework called scGSLoop is proposed, which sets a new paradigm for scHi-C loop calling by adapting the training and inferencing strategies of graph-based deep learning to leverage the sequence features and 1D positional information of genomic loci. With this framework, sparsity is no longer a challenge, but rather an advantage that the model leverages to achieve unprecedented computational efficiency. Compared to existing methods, scGSLoop makes more accurate predictions and is able to identify more loops that have the potential to play regulatory roles in genome functioning. Moreover, scGSLoop preserves single-cell information by identifying a distinct group of loops for each individual cell, which not only enables an understanding of the variability of chromatin looping states between cells, but also allows scGSLoop to be extended for the investigation of multi-connected hubs and their underlying mechanisms.


Subjects
Chromatin , Genomics , Chromatin/genetics , Genome
10.
Nat Commun ; 14(1): 6824, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37884495

ABSTRACT

RNA-binding proteins play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function. However, current computational methods pose challenges to the cross-prediction of RNA-protein binding events across diverse cell lines and tissue contexts. Here, we develop HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events under diverse cellular conditions. Our results demonstrate that HDRNet can accurately and efficiently identify binding sites, particularly for dynamic prediction, outperforming other state-of-the-art models on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with additional tissue data. Moreover, we conduct motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explores the gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.


Subjects
RNA-Binding Proteins , RNA , Humans , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Binding Sites/genetics , Protein Binding , Chromatin Immunoprecipitation Sequencing
11.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37455245

ABSTRACT

The rapid growth of omics-based data has revolutionized biomedical research and precision medicine, allowing machine learning models to be developed for cutting-edge performance. However, despite the wealth of high-throughput data available, the performance of these models is hindered by the lack of sufficient training data, particularly in clinical research (in vivo experiments). As a result, translating this knowledge into clinical practice, such as predicting drug responses, remains a challenging task. Transfer learning is a promising tool that bridges the gap between data domains by transferring knowledge from the source to the target domain. Researchers have proposed transfer learning to predict clinical outcomes by leveraging pre-clinical data (mouse, zebrafish), highlighting its vast potential. In this work, we present a comprehensive literature review of deep transfer learning methods for health informatics and clinical decision-making, focusing on high-throughput molecular data. Previous reviews mostly covered image-based transfer learning works, while we present a more detailed analysis of transfer learning papers. Furthermore, we evaluated original studies based on different evaluation settings across cross-validations, data splits and model architectures. The result shows that those transfer learning methods have great potential; high-throughput sequencing data and state-of-the-art deep learning models lead to significant insights and conclusions. Additionally, we explored various datasets in transfer learning papers with statistics and visualization.


Subjects
Benchmarking , Zebrafish , Animals , Mice , Zebrafish/genetics , Machine Learning , Precision Medicine , Clinical Decision-Making
12.
Adv Sci (Weinh) ; 10(22): e2205442, 2023 08.
Article in English | MEDLINE | ID: mdl-37290050

ABSTRACT

Unsupervised clustering is an essential step in identifying cell types from single-cell RNA sequencing (scRNA-seq) data. However, a common issue with unsupervised clustering models is that the optimization direction of the objective function and the final generated clustering labels in the absence of supervised information may be inconsistent or even arbitrary. To address this challenge, a dynamic ensemble pruning framework (DEPF) is proposed to identify and interpret single-cell molecular heterogeneity. In particular, a silhouette coefficient-based indicator is developed to determine the optimization direction of the bi-objective function. In addition, a hierarchical autoencoder is employed to project the high-dimensional data onto multiple low-dimensional latent space sets, and then a clustering ensemble is produced in the latent space by the basic clustering algorithm. Following that, a bi-objective fruit fly optimization algorithm is designed to dynamically prune the low-quality basic clustering in the ensemble. Multiple experiments are conducted on 28 real scRNA-seq datasets and one large real scRNA-seq dataset from diverse platforms and species to validate the effectiveness of the DEPF. In addition, biological interpretability analysis and transcriptional and post-transcriptional regulatory analyses are conducted to explore biological patterns from the cell types identified, which could provide novel insights into characterizing the mechanisms.
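
The silhouette coefficient on which this abstract's indicator is based has a standard definition: for each point, s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. The sketch below implements only that textbook quantity with Euclidean distance. How DEPF turns it into an optimization-direction indicator is not described in the abstract, so nothing here should be read as the authors' implementation, and the toy points are hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette coefficients using Euclidean distance."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            # Convention: a singleton cluster gets a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator else 0.0)
    return scores


def mean_silhouette(points, labels):
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    cells = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(mean_silhouette(cells, [0, 0, 1, 1]))  # well-separated partition
    print(mean_silhouette(cells, [0, 1, 0, 1]))  # poor partition
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.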


Subjects
Algorithms , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Gene Expression Regulation
13.
Comput Struct Biotechnol J ; 21: 2454-2470, 2023.
Article in English | MEDLINE | ID: mdl-37077177

ABSTRACT

Cancer has received extensive recognition for its high mortality rate, with metastatic cancer being the top cause of cancer-related deaths. Metastatic cancer involves the spread of the primary tumor to other body organs. As much as the early detection of cancer is essential, the timely detection of metastasis, the identification of biomarkers, and treatment choice are valuable for improving the quality of life for metastatic cancer patients. This study reviews the existing studies on classical machine learning (ML) and deep learning (DL) in metastatic cancer research. Since the majority of metastatic cancer research data are collected in the formats of PET/CT and MRI image data, deep learning techniques are heavily involved. However, its black-box nature and expensive computational cost are notable concerns. Furthermore, existing models could be overestimated for their generality due to the non-diverse population in clinical trial datasets. Therefore, research gaps are itemized; follow-up studies should be carried out on metastatic cancer using machine learning and deep learning tools with data in a symmetric manner.

14.
Adv Sci (Weinh) ; 10(11): e2204113, 2023 04.
Article in English | MEDLINE | ID: mdl-36762572

ABSTRACT

The single-cell RNA sequencing (scRNA-seq) quantifies the gene expression of individual cells, while the bulk RNA sequencing (bulk RNA-seq) characterizes the mixed transcriptome of cells. The inference of drug sensitivities for individual cells can provide new insights to understand the mechanism of anti-cancer response heterogeneity and drug resistance at the cellular resolution. However, pharmacogenomic information related to their corresponding scRNA-Seq is often limited. Therefore, a transfer learning model is proposed to infer the drug sensitivities at single-cell level. This framework learns bulk transcriptome profiles and pharmacogenomics information from population cell lines in a large public dataset and transfers the knowledge to infer drug efficacy of individual cells. The results suggest that it is suitable to learn knowledge from pre-clinical cell lines to infer pre-existing cell subpopulations with different drug sensitivities prior to drug exposure. In addition, the model offers a new perspective on drug combinations. It is observed that drug-resistant subpopulation can be sensitive to other drugs (e.g., a subset of JHU006 is Vorinostat-resistant while Gefitinib-sensitive); such finding corroborates the previously reported drug combination (Gefitinib + Vorinostat) strategy in several cancer types. The identified drug sensitivity biomarkers reveal insights into the tumor heterogeneity and treatment at cellular resolution.


Subjects
Transcriptome , RNA-Seq/methods , Gefitinib , Vorinostat , Transcriptome/genetics , Sequence Analysis, RNA/methods
15.
RNA ; 29(5): 517-530, 2023 05.
Article in English | MEDLINE | ID: mdl-36737104

ABSTRACT

In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning such as data clustering has become the central component to identify and characterize novel cell types and gene expression patterns. In this study, we review the existing single-cell RNA-seq data clustering methods with critical insights into the related advantages and limitations. In addition, we also review the upstream single-cell RNA-seq data processing techniques such as quality control, normalization, and dimension reduction. We conduct performance comparison experiments to evaluate several popular single-cell RNA-seq clustering approaches on simulated and multiple single-cell transcriptomic data sets.


Subjects
Algorithms , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
16.
J Vasc Surg Venous Lymphat Disord ; 11(3): 626-633, 2023 05.
Article in English | MEDLINE | ID: mdl-36787860

ABSTRACT

OBJECTIVE: To investigate the safety and effectiveness of venous stenting in patients with chronic iliofemoral venous obstruction and secondary lymphedema from malignancy. METHODS: From July 2012 to December 2020, patients with iliofemoral venous obstruction and secondary lymphedema who underwent venous stenting in our institution were reviewed retrospectively. Clinical characteristics, surgical complications, and symptom relief were assessed. Stent patency was evaluated with duplex ultrasound or computed tomographic venography. Twelve-month outcomes were reported. RESULTS: Fifty-three patients with concurrent secondary lymphedema who had stents placed for iliofemoral venous obstruction were included. There were 42 females, and the mean age was 56.9 years. Nonthrombotic iliac vein lesions were identified in 16 patients (30.1%). Immediate technical success was 100%, with an average of two stents implanted. The median Villalta score and Chronic Venous Disease Quality of Life questionnaire score decreased from 12 (interquartile range [IQR], 10-15) and 58 (IQR, 50-66) at baseline, respectively, to 5 (IQR, 4-6) and 28 (IQR, 22-45) at 12 months after the procedure (P < .05), showing significant improvement in the quality of life. At the end of a median follow-up of 12 months (range, 3-25 months), the cumulative primary, assisted primary, and secondary patency rates were 70.8%, 76.9%, and 90.1%, respectively. CONCLUSIONS: In patients with secondary lymphedema from malignancy, venous stent placement is safe and effective for iliofemoral venous obstruction.


Subjects
Neoplasms , Vascular Diseases , Female , Humans , Middle Aged , Retrospective Studies , Quality of Life , Femoral Vein/diagnostic imaging , Femoral Vein/surgery , Treatment Outcome , Stents , Iliac Vein/diagnostic imaging , Iliac Vein/surgery , Chronic Disease
17.
Bioinformatics ; 39(2)2023 02 14.
Article in English | MEDLINE | ID: mdl-36734596

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data that facilitates accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on a deep autoencoder and the ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders are not sufficient to learn the latent representations of scRNA-seq data, and obtaining consensus partitions from these feature representations remains under-explored. RESULTS: To address this challenge, we propose a single-cell deep clustering model via a dual denoising autoencoder with bipartite graph ensemble clustering called scBGEDA, to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network is proposed to project the data into a compressed low-dimensional space and learn feature representations via explicit modeling of synergistic optimization of the zero-inflated negative binomial reconstruction loss and denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedded space by means of a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets, and also demonstrated its scalability to large-scale scRNA-seq datasets. Moreover, scBGEDA was able to identify cell-type specific marker genes and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into identifying cell types and characterizing the scRNA-seq data from different perspectives. AVAILABILITY AND IMPLEMENTATION: The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
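
The zero-inflated negative binomial (ZINB) reconstruction loss mentioned in this abstract is, in its usual form, the negative log-likelihood of a count under a mixture of a point mass at zero (weight pi) and a negative binomial with mean mu and dispersion theta. The sketch below evaluates that standard likelihood for a single count. It is not taken from the scBGEDA code: the parameterization and function names are assumptions, and the denoising loss that the abstract pairs with it is not shown.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under NB with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under ZINB with dropout weight 0 <= pi < 1."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A non-zero count can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


if __name__ == "__main__":
    # Hypothetical counts for one gene across four cells.
    counts = [0, 0, 3, 7]
    loss = sum(zinb_nll(x, mu=2.0, theta=1.5, pi=0.3) for x in counts) / len(counts)
    print(loss)
```

In an autoencoder, mu, theta, and pi would be per-gene, per-cell decoder outputs and the loss would be averaged over the expression matrix; the scalar version above is only meant to make the mixture structure explicit.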


Subjects
Algorithms , Gene Expression Profiling , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Software , Single-Cell Analysis/methods , Cluster Analysis
18.
Commun Biol ; 6(1): 73, 2023 01 19.
Article in English | MEDLINE | ID: mdl-36653447

ABSTRACT

Protein-protein interactions (PPIs) govern cellular pathways and processes, by significantly influencing the functional expression of proteins. Therefore, accurate identification of protein-protein interaction binding sites has become a key step in the functional analysis of proteins. However, since most computational methods are designed based on biological features, there are no available protein language models to directly encode amino acid sequences into distributed vector representations to model their characteristics for protein-protein binding events. Moreover, the number of experimentally detected protein interaction sites is much smaller than that of protein-protein interactions or protein sites in protein complexes, resulting in unbalanced data sets that leave room for improvement in their performance. To address these problems, we develop an ensemble deep learning model (EDLM)-based protein-protein interaction (PPI) site identification method (EDLMPPI). Evaluation results show that EDLMPPI outperforms state-of-the-art techniques including several PPI site prediction models on three widely-used benchmark datasets including Dset_448, Dset_72, and Dset_164, which demonstrated that EDLMPPI is superior to those PPI site prediction models by nearly 10% in terms of average precision. In addition, the biological and interpretable analyses provide new insights into protein binding site identification and characterization mechanisms from different perspectives. The EDLMPPI webserver is available at http://www.edlmppi.top:5002/ .
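
Average precision, the metric on which this abstract reports a gain of nearly 10%, summarizes a ranked list of per-residue scores: it averages the precision observed at the rank of each true binding site. The sketch below uses that common ranking definition. The abstract does not specify the exact evaluation protocol, so the definition chosen, the function name, and the toy labels are assumptions.

```python
def average_precision(y_true, scores):
    """Average precision for binary labels (1 = binding site) and real-valued scores.

    Residues are ranked by descending score; ties keep their input order.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("average precision is undefined without positive labels")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if y_true[index]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / positives


if __name__ == "__main__":
    # Hypothetical labels and predicted scores for four residues.
    labels = [1, 0, 1, 0]
    predicted = [0.9, 0.8, 0.7, 0.1]
    print(average_precision(labels, predicted))
```

Because binding residues are a small minority of all residues, a precision-based summary such as this one is usually more informative than accuracy on these datasets.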


Subjects
Deep Learning , Proteome , Protein Binding , Algorithms , Binding Sites
19.
Nat Commun ; 14(1): 400, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36697410

ABSTRACT

Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from high dimensionality and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.


Subjects
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Gene Expression Profiling/methods , Pancreatic Neoplasms/genetics , Gene Expression Regulation , Transcriptome , Carcinoma, Pancreatic Ductal/genetics , Single-Cell Analysis/methods
20.
IEEE Trans Cybern ; 53(5): 2753-2766, 2023 May.
Article in English | MEDLINE | ID: mdl-36251897

ABSTRACT

Recently, low-rank tensor recovery methods based on subspace representation have received increased attention in the field of hyperspectral image (HSI) denoising. Unfortunately, those methods usually analyze the prior structural information within different dimensions indiscriminately, ignoring the differences between modes, leaving substantial room for improvement. In this article, we first consider the low-rank properties in the subspace and prove that the structure correlation across the nonlocal self-similarity mode is much stronger than in the spatial sparsity and spectral correlation modes. On that basis, we introduce a new multidirectional low-rank regularization, in which each mode is assigned a different weight to characterize its contribution to estimating the tensor rank. After that, integrating the proposed regularization with the subspace-based tensor recovery framework, an optimization model for HSI mixed noise removal is developed. The proposed model can be addressed efficiently via the alternating minimization algorithm. Extensive experiments implemented with synthetic and real data demonstrate that the proposed method significantly outperforms other state-of-the-art HSI denoising methods, which clearly indicates the effectiveness of the proposed approach in HSI denoising.
