Search | VHL Regional Portal

1.

A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods.

Zhou, Zijun; Tang, Fan; Zhang, Yuxin; Deussen, Oliver; Cao, Juan; Dong, Weiming; Li, Xiangtao; Lee, Tong-Yee.

IEEE Trans Vis Comput Graph ; PP, 2024 Sep 25.

Article in English | MEDLINE | ID: mdl-39320993

ABSTRACT

Despite the remarkable progress in the field of arbitrary image style transfer (AST), inconsistent evaluation continues to plague style transfer research. Existing methods often suffer from limited objective evaluation and inconsistent subjective feedback, hindering reliable comparisons among AST variants. In this study, we propose a multi-granularity assessment system that combines standardized objective and subjective evaluations. We collect a fine-grained dataset considering a range of image contexts such as different scenes, object complexities, and rich parsing information from multiple sources. Objective and subjective studies are conducted using the collected dataset. Specifically, we innovate on traditional subjective studies by developing an online evaluation system utilizing a combination of point-wise, pair-wise, and group-wise questionnaires. Finally, we bridge the gap between objective and subjective evaluations by examining the consistency between the results from the two studies. We experimentally evaluate CNN-based, flow-based, transformer-based, and diffusion-based AST methods with the proposed multi-granularity assessment system, which lays the foundation for a reliable and robust evaluation. Providing standardized measures, objective data, and detailed subjective feedback empowers researchers to make informed comparisons and drive innovation in this rapidly evolving field. The collected dataset and our online evaluation system are available at http://ivc.ia.ac.cn.

2.

Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning.

Yang, Yuning; Li, Gen; Pang, Kuan; Cao, Wuxinhao; Zhang, Zhaolei; Li, Xiangtao.

Adv Sci (Weinh) ; : e2407013, 2024 08 19.

Article in English | MEDLINE | ID: mdl-39159140

ABSTRACT

The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints are similar to grammars and syntaxes in human languages and can be modeled by advanced natural language techniques such as Transformers, which have been very effective in modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model is then fine-tuned for specific downstream tasks such as identifying RBP binding sites and m6A RNA modification sites and predicting RNA sub-cellular localizations. Benchmark results show that 3UTRBERT generally outperformed other contemporary methods in each of these tasks. More importantly, the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationship between sequence elements and effectively identifies regions with important regulatory potential. It is expected that the 3UTRBERT model can serve as a foundational tool for various sequence labeling tasks within the 3'UTR field, thus enhancing the decipherability of post-transcriptional regulatory mechanisms.

3.

Photoredox Catalytic Deracemization Enabled Enantioselective and Modular Access to Axially Chiral N-Arylquinazolinones.

Liu, Yilin; Chu, Mengqi; Li, Xiangtao; Cao, Zheng; Zhao, Xiaowei; Yin, Yanli; Jiang, Zhiyong.

Angew Chem Int Ed Engl ; : e202411236, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39045910

ABSTRACT

Visible light-driven photocatalytic deracemization is highly esteemed as an ideal tool for organic synthesis due to its exceptional atom economy and synthetic efficiency. Consequently, successful instances of deracemization of allenes have been established, where the activated energy of photosensitizer should surpass that of the substrates, representing an intrinsic requirement. Accordingly, this method is not applicable for axially chiral molecules with significantly high triplet energies. In this study, we present a photoredox catalytic deracemization approach that enables the efficient synthesis of valuable yet challenging-to-access axially chiral 2-azaarene-functionalized quinazolinones. The substrate scope is extensive, allowing for both 3-axis and unmet 1-axis assembly through facile oxidation of diverse central chiral 2,3-dihydroquinazolin-4(1H)-ones that can be easily prepared and achieve enantiomer enrichment via deracemization. Mechanistic studies reveal the importance of photosensitizer selection in attaining excellent chemoselectivity and highlight the indispensability of a chiral Brønsted acid in enabling highly enantioselective protonation to accomplish efficient deracemization.

4.

Unraveling Spatial Domain Characterization in Spatially Resolved Transcriptomics with Robust Graph Contrastive Clustering.

Zhang, Yingxi; Yu, Zhuohan; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39012523

ABSTRACT

MOTIVATION: Spatial transcriptomics can quantify gene expression and its spatial distribution in tissues, thus revealing molecular mechanisms of cellular interactions underlying tissue heterogeneity, tissue regeneration, and spatially localized disease mechanisms. However, existing spatial clustering methods often fail to exploit the full potential of spatial information, resulting in inaccurate identification of spatial domains. RESULTS: In this paper, we develop a deep graph contrastive clustering framework, stDGCC, that accurately uncovers underlying spatial domains via explicitly modeling spatial information and gene expression profiles from spatial transcriptomics data. The stDGCC framework proposes a spatially informed graph node embedding model to preserve the topological information of spots and to learn the informative and discriminative characterization of spatial transcriptomics data through self-supervised contrastive learning. By simultaneously optimizing the contrastive learning loss, reconstruction loss, and Kullback-Leibler (KL) divergence loss, stDGCC achieves joint optimization of feature learning and topology structure preservation in an end-to-end manner. We validate the effectiveness of stDGCC on various spatial transcriptomics datasets acquired from different platforms, each with varying spatial resolutions. Our extensive experiments demonstrate the superiority of stDGCC over various state-of-the-art clustering methods in accurately identifying cellular-level biological structures. AVAILABILITY: Code and data are available from https://github.com/TimE9527/stDGCC and https://figshare.com/projects/stDGCC/186525. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.

SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues.

Zhang, Bin; Hou, Zilong; Yang, Yuning; Wong, Ka-Chun; Zhu, Haoran; Li, Xiangtao.

Commun Biol ; 7(1): 679, 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38830995

ABSTRACT

Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic-acid-binding sites in proteins. Here, we implement an ensemble deep learning method for identifying nucleic-acid-binding residues on proteins, called SOFB, which characterizes protein sequences by learning the semantics of biological dynamics contexts, and then develop an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Specifically, the language learning model, which is constructed from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network, consisting of different convolutional layers together with Bi-LSTM, refines various features for optimal performance. Meanwhile, to address the imbalance issue, we adopt ensemble learning to train multiple models and then combine them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic-acid-binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452.

Subject(s)

Deep Learning , Binding Sites , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Binding , Computational Biology/methods

6.

Quadruple primary tumors in a lynch syndrome patient surviving more than 26 years with genetic analysis: a case report and literature review.

Zhu, Bosen; Liu, Ming; Mu, Tianhao; Li, Wentao; Ren, Junqi; Li, Xiangtao; Liang, Yi; Yang, Ziyi; Niu, Yulin; Chen, Shifu; Lin, Junqiong.

Front Oncol ; 14: 1382154, 2024.

Article in English | MEDLINE | ID: mdl-38894864

ABSTRACT

The incidence of multiple primary tumors (MPTs) has been rising in recent years, but patients with four or more primary tumors are still rare. Lynch syndrome (LS) patients have a high risk of developing MPTs. Next-generation sequencing (NGS) can identify the genetic alterations in different tumors to make a definite diagnosis of uncommon cases in clinical practice. Here, we report the case of a 66-year-old female patient who developed four MPTs between the ages of 41 and 66: sigmoid colon cancer, acute non-lymphocytic leukemia, urothelial carcinoma, and ascending colon cancer. She has survived for more than 26 years since her first tumor was discovered. Targeted sequencing indicates that she has a pathogenic germline mutation in exon 13 of MSH2, and her 2020 ureteral cancer sample and 2023 colon cancer sample have completely different mutation profiles. To the best of our knowledge, this is the first case of multiple primary tumors including an acute non-lymphocytic leukemia in an LS patient.

7.

PredGCN: a Pruning-enabled Gene-Cell Net for automatic cell annotation of single cell transcriptome data.

Qi, Qi; Wang, Yunhe; Huang, Yujian; Fan, Yi; Li, Xiangtao.

Bioinformatics ; 40(7), 2024 07 01.

Article in English | MEDLINE | ID: mdl-38924517

ABSTRACT

MOTIVATION: The annotation of cell types from single-cell transcriptomics is essential for understanding the biological identity and functionality of cellular populations. Although manual annotation remains the gold standard, the advent of automatic pipelines has become crucial for scalable, unbiased, and cost-effective annotations. Nonetheless, the effectiveness of these automatic methods, particularly those employing deep learning, significantly depends on the architecture of the classifier and the quality and diversity of the training datasets. RESULTS: To address these limitations, we present a Pruning-enabled Gene-Cell Net (PredGCN) incorporating a Coupled Gene-Cell Net (CGCN) to enable representation learning and information storage. PredGCN integrates a Gene Splicing Net (GSN) and a Cell Stratification Net (CSN), employing a pruning operation (PrO) to dynamically tackle the complexity of heterogeneous cell identification. Among them, GSN leverages multiple statistical and hypothesis-driven feature extraction methods to selectively assemble genes with specificity for scRNA-seq data while CSN unifies elements based on diverse region demarcation principles, exploiting the representations from GSN and precise identification from different regional homogeneity perspectives. Furthermore, we develop a multi-objective Pareto pruning operation (Pareto PrO) to expand the dynamic capabilities of CGCN, optimizing the sub-network structure for accurate cell type annotation. Multiple comparison experiments on real scRNA-seq datasets from various species have demonstrated that PredGCN surpasses existing state-of-the-art methods, including its scalability to cross-species datasets. Moreover, PredGCN can uncover unknown cell types and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into cell type identification and characterizing scRNA-seq data from different perspectives. 
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/IrisQi7/PredGCN and test data is available at https://figshare.com/articles/dataset/PredGCN/25251163.

Subject(s)

Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Transcriptome/genetics , Software , Molecular Sequence Annotation/methods , Animals , Humans , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms

8.

TP-LMMSG: a peptide prediction graph neural network incorporating flexible amino acid property representation.

Chen, Nanjun; Yu, Jixiang; Zhe, Liu; Wang, Fuzhou; Li, Xiangtao; Wong, Ka-Chun.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38920345

ABSTRACT

Bioactive peptide therapeutics has been a long-standing research topic. Notably, antimicrobial peptides (AMPs) have been extensively studied for their therapeutic potential. Meanwhile, the demand for annotating other therapeutic peptides, such as antiviral peptides (AVPs) and anticancer peptides (ACPs), has also increased in recent years. However, we contend that the structure of peptide chains and the intrinsic information between the amino acids are not fully investigated in the existing protocols. Therefore, we develop a new graph deep learning model, namely TP-LMMSG, which is lightweight and easy to deploy while improving annotation performance in a generalizable manner. The results indicate that our model can accurately predict the properties of different peptides. The model surpasses other state-of-the-art models on AMP, AVP, and ACP prediction across multiple experimentally validated datasets. Moreover, TP-LMMSG also addresses the challenge of time-consuming pre-processing in graph neural network frameworks. With its flexibility in integrating heterogeneous peptide features, our model can have a substantial impact on the screening and discovery of therapeutic peptides. The source code is available at https://github.com/NanjunChen37/TP_LMMSG.

Subject(s)

Amino Acids , Neural Networks, Computer , Peptides , Amino Acids/chemistry , Peptides/chemistry , Computational Biology/methods , Deep Learning , Antimicrobial Peptides/chemistry , Algorithms

9.

Discovering DNA shape motifs with multiple DNA shape features: generalization, methods, and validation.

Chen, Nanjun; Yu, Jixiang; Liu, Zhe; Meng, Lingkuan; Li, Xiangtao; Wong, Ka-Chun.

Nucleic Acids Res ; 52(8): 4137-4150, 2024 May 08.

Article in English | MEDLINE | ID: mdl-38572749

ABSTRACT

DNA motifs are crucial patterns in gene regulation. DNA-binding proteins (DBPs), including transcription factors, can bind to specific DNA motifs to regulate gene expression and other cellular activities. Past studies suggest that DNA shape features could be subtly involved in DNA-DBP interactions. Therefore, the shape motif annotations based on intrinsic DNA topology can deepen the understanding of DNA-DBP binding. Nevertheless, high-throughput tools for DNA shape motif discovery that incorporate multiple features altogether remain insufficient. To address it, we propose a series of methods to discover non-redundant DNA shape motifs with the generalization to multiple motifs in multiple shape features. Specifically, an existing Gibbs sampling method is generalized to multiple DNA motif discovery with multiple shape features. Meanwhile, an expectation-maximization (EM) method and a hybrid method coupling EM with Gibbs sampling are proposed and developed with promising performance, convergence capability, and efficiency. The discovered DNA shape motif instances reveal insights into low-signal ChIP-seq peak summits, complementing the existing sequence motif discovery works. Additionally, our modelling captures the potential interplays across multiple DNA shape features. We provide a valuable platform of tools for DNA shape motif discovery. An R package is built for open accessibility and long-lasting impact: https://zenodo.org/doi/10.5281/zenodo.10558980.

Subject(s)

DNA , Nucleotide Motifs , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Algorithms , Nucleic Acid Conformation , Chromatin Immunoprecipitation Sequencing/methods , Binding Sites , Transcription Factors/metabolism , Transcription Factors/genetics , Transcription Factors/chemistry , Humans , Protein Binding

10.

Exhaustive Exploitation of Nature-Inspired Computation for Cancer Screening in an Ensemble Manner.

Wang, Xubin; Wang, Yunhe; Ma, Zhiqiang; Wong, Ka-Chun; Li, Xiangtao.

IEEE/ACM Trans Comput Biol Bioinform ; 21(5): 1366-1379, 2024.

Article in English | MEDLINE | ID: mdl-38578856

ABSTRACT

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers.

Subject(s)

Algorithms , Computational Biology , Early Detection of Cancer , Gene Expression Profiling , Neoplasms , Neoplasms/genetics , Humans , Computational Biology/methods , Early Detection of Cancer/methods , Gene Expression Profiling/methods , Machine Learning , Biomarkers, Tumor/genetics

11.

Distribution-Agnostic Deep Learning Enables Accurate Single-Cell Data Recovery and Transcriptional Regulation Interpretation.

Su, Yanchi; Yu, Zhuohan; Yang, Yuning; Wong, Ka-Chun; Li, Xiangtao.

Adv Sci (Weinh) ; 11(16): e2307280, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38380499

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a robust method for studying gene expression at the single-cell level, but accurately quantifying genetic material is often hindered by limited mRNA capture, resulting in many missing expression values. Existing imputation methods rely on strict data assumptions, limiting their broader application, and lack reliable supervision, leading to biased signal recovery. To address these challenges, the authors developed Bis, a distribution-agnostic deep learning model for accurately recovering missing single-cell gene expression from multiple platforms. Bis is an optimal transport-based autoencoder model that can capture the intricate distribution of scRNA-seq data while addressing the characteristic sparsity by regularizing the cellular embedding space. Additionally, the authors propose a module that uses bulk RNA-seq data to guide reconstruction and ensure expression consistency. Experimental results show that Bis outperforms other models across simulated and real datasets, showcasing superiority in various downstream analyses including batch effect removal, clustering, differential expression analysis, and trajectory inference. Moreover, Bis successfully restores gene expression levels in rare cell subsets in a tumor-matched peripheral blood dataset, revealing developmental characteristics of cytokine-induced natural killer cells within a head and neck squamous cell carcinoma microenvironment.

Subject(s)

Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods

12.

TNF-α promotes osteocyte necroptosis by upregulating TLR4 in postmenopausal osteoporosis.

Cui, Hongwang; Li, Ji; Li, Xiangtao; Su, Tian; Wen, Peng; Wang, Chuanling; Deng, Xiaozhong; Fu, Yonghua; Zhao, Weijie; Li, Changjia; Hua, Pengbing; Zhu, Yongjun; Wan, Wei.

Bone ; 182: 117050, 2024 May.

Article in English | MEDLINE | ID: mdl-38367924

ABSTRACT

Postmenopausal osteoporosis (PMOP) is a common kind of osteoporosis that is associated with excessive osteocyte death and bone loss. Previous studies have shown that TNF-α-induced osteocyte necroptosis might exert a stronger effect on PMOP than apoptosis, and TLR4 can also induce cell necroptosis, as confirmed by recent studies. However, little is known about the relationship between TNF-α-induced osteocyte necroptosis and TLR4. In the present study, we showed that TNF-α increased the expression of TLR4, which promoted osteocyte necroptosis in PMOP. In patients with PMOP, TLR4 was highly expressed at skeletal sites where osteocyte necroptosis occurred, and high TLR4 expression was correlated with enhanced TNF-α expression. Osteocytes exhibited robust TLR4 expression upon exposure to necroptotic osteocytes in vivo and in vitro. Western blotting and immunofluorescence analyses demonstrated that TNF-α upregulated TLR4 expression in vitro, which might further promote osteocyte necroptosis. Furthermore, inhibition of TLR4 by TAK-242 in vitro effectively blocked osteocyte necroptosis induced by TNF-α. Collectively, these results suggest a novel TLR4-mediated process of osteocyte necroptosis, which might increase osteocyte death and bone loss in the process of PMOP.

Subject(s)

Osteocytes , Osteoporosis, Postmenopausal , Toll-Like Receptor 4 , Tumor Necrosis Factor-alpha , Female , Humans , Necroptosis , Osteocytes/metabolism , Osteoporosis, Postmenopausal/metabolism , Toll-Like Receptor 4/metabolism , Tumor Necrosis Factor-alpha/metabolism

13.

Spatial structure and network characteristics of the coupling coordination innovation ecosystems in the Guangdong-Hong Kong-Macao Greater Bay area.

Yang, Zhichen; Li, Xiangtao; Wang, Fangfang; Chen, Rongjian; Ma, Renwen.

Sci Rep ; 14(1): 395, 2024 01 03.

Article in English | MEDLINE | ID: mdl-38172255

ABSTRACT

In recent times, a new wave of scientific and technological advancements has significantly reshaped the global economic structure. This shift has redefined the role of regional innovation, particularly in its contribution to developing the Guangdong-Hong Kong-Macao Greater Bay area (GBA) into a renowned center for science, technology, and innovation. This study constructs a comprehensive evaluation system for the Regional Innovation Ecosystem (RIE). By applying the coupling coordination degree model and social network analysis, we have extensively analyzed the spatial structure and network attributes of the coupled and coordinated innovation ecosystem in the GBA from 2010 to 2019. Our findings reveal several key developments: (1) There has been a noticeable rightward shift in the kernel density curve, indicating an ongoing optimization of the overall coupling coordination level. Notably, the center of gravity for coupling coordination has progressively moved southeast. This shift has led to a reduction in the elliptical area each year, while the trend surface consistently shows a convex orientation toward the center. The most significant development is observed along the 'Guangdong-Shenzhen-Hong Kong-Macao Science and Technology Innovation Corridor', where the level of coupling coordination has become increasingly pronounced. (2) The spatial linkages within the GBA have been strengthening. There are significant spatial transaction costs in the regional innovation ecological network. In the context of the 2019 US-China trade war, the cities of Jiangmen and Zhaoqing experienced a notable decrease in connectivity with other cities, raising concerns about their potential marginalization. (3) Guangzhou, Shenzhen, and Hong Kong have emerged as core nodes within the network. The network exhibits a distinctive "core-edge" spatial structure, characterized by both robustness and vulnerability in various aspects.

Subject(s)

Ecosystem , Hong Kong , Macau , China , Cities
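
The abstract of entry 13 names the coupling coordination degree model without stating it. The sketch below shows the textbook form of that model as it commonly appears in the regional-development literature; it is not taken from the paper, and the function name `coupling_coordination`, the equal default weights, and the requirement that subsystem scores are already normalized to [0, 1] are all assumptions made for illustration.

```python
import math


def coupling_coordination(scores, weights=None):
    """Return (C, T, D) for normalized subsystem scores.

    Textbook form (an assumption, not the paper's exact specification):
      C = n * (u_1 * ... * u_n)^(1/n) / (u_1 + ... + u_n)   coupling degree
      T = sum_i w_i * u_i                                   overall development
      D = sqrt(C * T)                                       coordination degree
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two subsystem scores")
    if any(u < 0 or u > 1 for u in scores):
        raise ValueError("scores must be normalized to [0, 1]")
    if weights is None:
        # Hypothetical default: treat every subsystem as equally important.
        weights = [1.0 / n] * n
    if len(weights) != n or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must match scores and sum to 1")

    total = sum(scores)
    if total == 0:
        # All subsystems at zero: define every index as zero.
        return 0.0, 0.0, 0.0

    coupling = n * math.prod(scores) ** (1.0 / n) / total
    development = sum(w * u for w, u in zip(weights, scores))
    degree = math.sqrt(coupling * development)
    return coupling, development, degree
```

With two balanced subsystems, for example `coupling_coordination([0.5, 0.5])`, the coupling degree C reaches its maximum of 1, while D stays below 1 because the overall development level T is only 0.5. That is the usual reason D rather than C is reported as the headline index.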

14.

miTDS: Uncovering miRNA-mRNA interactions with deep learning for functional target prediction.

Zhang, Jialin; Zhu, Haoran; Liu, Yin; Li, Xiangtao.

Methods ; 223: 65-74, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38280472

ABSTRACT

MicroRNAs (miRNAs) are vital in regulating gene expression through binding to specific target sites on messenger RNAs (mRNAs), a process closely tied to cancer pathogenesis. Identifying miRNA functional targets is essential but challenging, due to incomplete genome annotation and an emphasis on known miRNA-mRNA interactions, restricting predictions of unknown ones. To address those challenges, we have developed a deep learning model based on miRNA functional target identification, named miTDS, to investigate miRNA-mRNA interactions. miTDS first employs a scoring mechanism to eliminate unstable sequence pairs and then utilizes a dynamic word embedding model based on the transformer architecture, enabling a comprehensive analysis of miRNA-mRNA interaction sites by harnessing the global contextual associations of each nucleotide. On this basis, miTDS fuses extended seed alignment representations learned in the multi-scale attention mechanism module with dynamic semantic representations extracted in the RNA-based dual-path module, which can further elucidate and predict miRNA and mRNA functions and interactions. To validate the effectiveness of miTDS, we conducted a thorough comparison with state-of-the-art miRNA-mRNA functional target prediction methods. The evaluation, performed on a dataset cross-referenced with entries from MirTarbase and Diana-TarBase, revealed that miTDS surpasses current methods in accurately predicting functional targets. In addition, our model exhibited proficiency in identifying A-to-I RNA editing sites, which represents an aberrant interaction that yields valuable insights into the suppression of cancerous processes.

Subject(s)

Deep Learning , MicroRNAs , MicroRNAs/genetics , RNA, Messenger/genetics , Nucleotides , RNA Editing

15.

MotifHub: Detection of trans-acting DNA motif group with probabilistic modeling algorithm.

Liu, Zhe; Wong, Hiu-Man; Chen, Xingjian; Lin, Jiecong; Zhang, Shixiong; Yan, Shankai; Wang, Fuzhou; Li, Xiangtao; Wong, Ka-Chun.

Comput Biol Med ; 168: 107753, 2024 01.

Article in English | MEDLINE | ID: mdl-38039889

ABSTRACT

BACKGROUND: Trans-acting factors, a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. The progressive development in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups where trans-acting DNA motif groups can be discovered. The difficulty of the problem lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling that accommodates the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge for model fitting with latent variables. The results show that our proposed approaches achieve promising performance with linear time complexities. CONCLUSION: MotifHub is a novel algorithm that identifies both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.

Subject(s)

Algorithms , Software , Nucleotide Motifs/genetics , Reproducibility of Results , Chromatin/genetics

16.

Automated exploitation of deep learning for cancer patient stratification across multiple types.

Sun, Pingping; Fan, Shijie; Li, Shaochuan; Zhao, Yingwei; Lu, Chang; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 39(11), 2023 11 01.

Article in English | MEDLINE | ID: mdl-37934154

ABSTRACT

MOTIVATION: Recent frameworks based on deep learning have been developed to identify cancer subtypes from high-throughput gene expression profiles. Unfortunately, the performance of deep learning is highly dependent on its neural network architectures, which are often hand-crafted with expertise in deep neural networks; moreover, the optimization and adjustment of the network are usually costly and time-consuming. RESULTS: To address such limitations, we proposed a fully automated deep neural architecture search model for diagnosing consensus molecular subtypes from gene expression data (DNAS). The proposed model uses an ant colony algorithm, one of the heuristic swarm intelligence algorithms, to search and optimize the neural network architecture, and it can automatically find the optimal deep learning model architecture for cancer diagnosis in its search space. We validated DNAS on eight colorectal cancer datasets, achieving an average accuracy of 95.48%, an average specificity of 98.07%, and an average sensitivity of 96.24%. Without loss of generality, we further investigated the general applicability of DNAS on other cancer types from different platforms, including lung cancer and breast cancer, where DNAS achieved an area under the curve of 95% and 96%, respectively. In addition, we conducted gene ontology enrichment and pathological analysis to reveal interesting insights into cancer subtype identification and characterization across multiple cancer types. AVAILABILITY AND IMPLEMENTATION: The source code and data can be downloaded from https://github.com/userd113/DNAS-main. The web server of DNAS is publicly accessible at 119.45.145.120:5001.

Subject(s)

Breast Neoplasms , Deep Learning , Humans , Female , Neural Networks, Computer , Algorithms , Software

17.

A Lightweight Framework For Chromatin Loop Detection at the Single-Cell Level.

Wang, Fuzhou; Alinejad-Rokny, Hamid; Lin, Jiecong; Gao, Tingxiao; Chen, Xingjian; Zheng, Zetian; Meng, Lingkuan; Li, Xiangtao; Wong, Ka-Chun.

Adv Sci (Weinh) ; 10(33): e2303502, 2023 11.

Article in English | MEDLINE | ID: mdl-37816141

ABSTRACT

Single-cell Hi-C (scHi-C) has made it possible to analyze chromatin organization at the single-cell level. However, scHi-C experiments generate inherently sparse data, which poses a challenge for loop calling methods. The existing approach performs significance tests across the imputed dense contact maps, leading to substantial computational overhead and loss of information at the single-cell level. To overcome this limitation, a lightweight framework called scGSLoop is proposed, which sets a new paradigm for scHi-C loop calling by adapting the training and inferencing strategies of graph-based deep learning to leverage the sequence features and 1D positional information of genomic loci. With this framework, sparsity is no longer a challenge, but rather an advantage that the model leverages to achieve unprecedented computational efficiency. Compared to existing methods, scGSLoop makes more accurate predictions and is able to identify more loops that have the potential to play regulatory roles in genome functioning. Moreover, scGSLoop preserves single-cell information by identifying a distinct group of loops for each individual cell, which not only enables an understanding of the variability of chromatin looping states between cells, but also allows scGSLoop to be extended for the investigation of multi-connected hubs and their underlying mechanisms.

Subject(s)

Chromatin , Genomics , Chromatin/genetics , Genome

18.

Dynamic characterization and interpretation for protein-RNA interactions across diverse cellular conditions using HDRNet.

Zhu, Haoran; Yang, Yuning; Wang, Yunhe; Wang, Fuzhou; Huang, Yujian; Chang, Yi; Wong, Ka-Chun; Li, Xiangtao.

Nat Commun ; 14(1): 6824, 2023 10 26.

Article in English | MEDLINE | ID: mdl-37884495

ABSTRACT

RNA-binding proteins play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function. However, current computational methods pose challenges to the cross-prediction of RNA-protein binding events across diverse cell lines and tissue contexts. Here, we develop HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events under diverse cellular conditions. Our results demonstrate that HDRNet can accurately and efficiently identify binding sites, particularly for dynamic prediction, outperforming other state-of-the-art models on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with additional tissue data. Moreover, we conduct motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explores the gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.

Subject(s)

RNA-Binding Proteins , RNA , Humans , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Binding Sites/genetics , Protein Binding , Chromatin Immunoprecipitation Sequencing

19.

Deep transfer learning for clinical decision-making based on high-throughput data: comprehensive survey with benchmark results.

Toseef, Muhammad; Olayemi Petinrin, Olutomilayo; Wang, Fuzhou; Rahaman, Saifur; Liu, Zhe; Li, Xiangtao; Wong, Ka-Chun.

Brief Bioinform ; 24(4), 2023 07 20.

Article in English | MEDLINE | ID: mdl-37455245

ABSTRACT

The rapid growth of omics-based data has revolutionized biomedical research and precision medicine, allowing machine learning models to be developed for cutting-edge performance. However, despite the wealth of high-throughput data available, the performance of these models is hindered by the lack of sufficient training data, particularly in clinical research (in vivo experiments). As a result, translating this knowledge into clinical practice, such as predicting drug responses, remains a challenging task. Transfer learning is a promising tool that bridges the gap between data domains by transferring knowledge from the source to the target domain. Researchers have proposed transfer learning to predict clinical outcomes by leveraging pre-clinical data (mouse, zebrafish), highlighting its vast potential. In this work, we present a comprehensive literature review of deep transfer learning methods for health informatics and clinical decision-making, focusing on high-throughput molecular data. Previous reviews mostly covered image-based transfer learning works, while we present a more detailed analysis of transfer learning papers. Furthermore, we evaluated original studies under different evaluation settings, covering cross-validation schemes, data splits, and model architectures. The results show that these transfer learning methods have great potential; high-throughput sequencing data and state-of-the-art deep learning models lead to significant insights and conclusions. Additionally, we explored various datasets in transfer learning papers with statistics and visualization.

Subject(s)

Benchmarking , Zebrafish , Animals , Mice , Zebrafish/genetics , Machine Learning , Precision Medicine , Clinical Decision-Making

20.

Reliable Identification and Interpretation of Single-Cell Molecular Heterogeneity and Transcriptional Regulation using Dynamic Ensemble Pruning.

Fan, Yi; Wang, Yunhe; Wang, Fuzhou; Huang, Lei; Yang, Yuning; Wong, Ka-Chun; Li, Xiangtao.

Adv Sci (Weinh) ; 10(22): e2205442, 2023 08.

Article in English | MEDLINE | ID: mdl-37290050

ABSTRACT

Unsupervised clustering is an essential step in identifying cell types from single-cell RNA sequencing (scRNA-seq) data. However, a common issue with unsupervised clustering models is that, in the absence of supervised information, the optimization direction of the objective function and the final generated clustering labels may be inconsistent or even arbitrary. To address this challenge, a dynamic ensemble pruning framework (DEPF) is proposed to identify and interpret single-cell molecular heterogeneity. In particular, a silhouette coefficient-based indicator is developed to determine the optimization direction of the bi-objective function. In addition, a hierarchical autoencoder is employed to project the high-dimensional data onto multiple low-dimensional latent space sets, and then a clustering ensemble is produced in the latent space by the basic clustering algorithm. Following that, a bi-objective fruit fly optimization algorithm is designed to dynamically prune the low-quality basic clusterings in the ensemble. Multiple experiments are conducted on 28 real scRNA-seq datasets and one large real scRNA-seq dataset from diverse platforms and species to validate the effectiveness of the DEPF. In addition, biological interpretability and transcriptional and post-transcriptional regulatory analyses are conducted to explore biological patterns from the cell types identified, which could provide novel insights into characterizing the mechanisms.
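For readers unfamiliar with the silhouette coefficient mentioned above, a minimal Python sketch of the standard definition follows. It is a generic illustration, not the DEPF indicator itself: the function names, the Euclidean distance, and the toy points are assumptions made for this example. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the silhouette value is (b - a) / max(a, b).

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette value of every point (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; higher means cleaner clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(POINTS, [0, 0, 1, 1])  # labels match the geometry
bad = mean_silhouette(POINTS, [0, 1, 0, 1])   # labels cut across it
```

A labeling that follows the geometry yields a mean silhouette close to 1, while a labeling that splits each pair yields a negative value, which is why the coefficient can serve as an unsupervised quality signal when comparing candidate clusterings.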

Subject(s)

Algorithms , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Gene Expression Regulation
