Search | VHL Search Portal

1.

Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction.

Chen, Ken; Zhou, Yue; Ding, Maolin; Wang, Yu; Ren, Zhixiang; Yang, Yuedong.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38605640

ABSTRACT

Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models were developed for genomic sequences and were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acids (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.

Subject(s)

RNA Splicing , Vertebrates , Animals , Humans , Base Sequence , Vertebrates/genetics , RNA , Supervised Machine Learning

2.

Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.

Shen, Ao; Yuan, Mingzhi; Ma, Yingfan; Du, Jie; Wang, Manning.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38801702

ABSTRACT

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates efficacy of the multi-modality framework and the masking strategy.

Subject(s)

Supervised Machine Learning , Algorithms , Computational Biology/methods

3.

NMDA-driven dendritic modulation enables multitask representation learning in hierarchical sensory processing pathways.

Wybo, Willem A M; Tsai, Matthias C; Tran, Viet Anh Khoa; Illing, Bernd; Jordan, Jakob; Morrison, Abigail; Senn, Walter.

Proc Natl Acad Sci U S A ; 120(32): e2300558120, 2023 08 08.

Article in English | MEDLINE | ID: mdl-37523562

ABSTRACT

While sensory representations in the brain depend on context, it remains unclear how such modulations are implemented at the biophysical level, and how processing layers further in the hierarchy can extract useful features for each possible contextual state. Here, we demonstrate that dendritic N-Methyl-D-Aspartate spikes can, within physiological constraints, implement contextual modulation of feedforward processing. Such neuron-specific modulations exploit prior knowledge, encoded in stable feedforward weights, to achieve transfer learning across contexts. In a network of biophysically realistic neuron models with context-independent feedforward weights, we show that modulatory inputs to dendritic branches can solve linearly nonseparable learning problems with a Hebbian, error-modulated learning rule. We also demonstrate that local prediction of whether representations originate either from different inputs, or from different contextual modulations of the same input, results in representation learning of hierarchical feedforward weights across processing layers that accommodate a multitude of contexts.

Subject(s)

Models, Neurological , N-Methylaspartate , Learning/physiology , Neurons/physiology , Perception

4.

DIST: spatial transcriptomics enhancement using deep learning.

Zhao, Yanping; Wang, Kui; Hu, Gang.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36653906

ABSTRACT

Spatially resolved transcriptomics technologies enable comprehensive measurement of gene expression patterns in the context of intact tissues. However, existing technologies suffer from either low resolution or shallow sequencing depth. Here, we present DIST, a deep learning-based method that imputes the gene expression profiles on unmeasured locations and enhances the gene expression for both original measured spots and imputed spots by self-supervised learning and transfer learning. We evaluate the performance of DIST for imputation, clustering, differential expression analysis and functional enrichment analysis. The results show that DIST can impute the gene expression accurately, enhance the gene expression for low-quality data, help detect more biological meaningful differentially expressed genes and pathways, therefore allow for deeper insights into the biological processes.

Subject(s)

Deep Learning , Transcriptome , Gene Expression Profiling/methods , Cluster Analysis

5.

scGAD: a new task and end-to-end framework for generalized cell type annotation and discovery.

Zhai, Yuyao; Chen, Liang; Deng, Minghua.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36869836

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified 'unassigned' label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift. Extensive results on massive simulation datasets and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also implement marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the Pytorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.

Subject(s)

Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Computer Simulation , Cluster Analysis , Sequence Analysis, RNA/methods

6.

CasANGCL: pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction.

Zheng, Zixi; Tan, Yanyan; Wang, Hong; Yu, Shengpeng; Liu, Tianyu; Liang, Cheng.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36592051

ABSTRACT

MOTIVATION: Molecular property prediction is a significant requirement in AI-driven drug design and discovery, aiming to predict the molecular property information (e.g. toxicity) based on the mined biomolecular knowledge. Although graph neural networks have been proven powerful in predicting molecular property, unbalanced labeled data and poor generalization capability for new-synthesized molecules are always key issues that hinder further improvement of molecular encoding performance. RESULTS: We propose a novel self-supervised representation learning scheme based on a Cascaded Attention Network and Graph Contrastive Learning (CasANGCL). We design a new graph network variant, designated as cascaded attention network, to encode local-global molecular representations. We construct a two-stage contrast predictor framework to tackle the label imbalance problem of training molecular samples, which is an integrated end-to-end learning scheme. Moreover, we utilize the information-flow scheme for training our network, which explicitly captures the edge information in the node/graph representations and obtains more fine-grained knowledge. Our model achieves an 81.9% ROC-AUC average performance on 661 tasks from seven challenging benchmarks, showing better portability and generalizations. Further visualization studies indicate our model's better representation capacity and provide interpretability.

Subject(s)

Benchmarking , Learning , Drug Design , Neural Networks, Computer

7.

FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction.

Li, Biaoshun; Lin, Mujie; Chen, Tiegen; Wang, Ling.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37930026

ABSTRACT

Artificial intelligence-based molecular property prediction plays a key role in molecular design such as bioactive molecules and functional materials. In this study, we propose a self-supervised pretraining deep learning (DL) framework, called functional group bidirectional encoder representations from transformers (FG-BERT), pertained based on ~1.45 million unlabeled drug-like molecules, to learn meaningful representation of molecules from function groups. The pretrained FG-BERT framework can be fine-tuned to predict molecular properties. Compared to state-of-the-art (SOTA) machine learning and DL methods, we demonstrate the high performance of FG-BERT in evaluating molecular properties in tasks involving physical chemistry, biophysics and physiology across 44 benchmark datasets. In addition, FG-BERT utilizes attention mechanisms to focus on FG features that are critical to the target properties, thereby providing excellent interpretability for downstream training tasks. Collectively, FG-BERT does not require any artificially crafted features as input and has excellent interpretability, providing an out-of-the-box framework for developing SOTA models for a variety of molecule (especially for drug) discovery tasks.

Subject(s)

Algorithms , Artificial Intelligence , Benchmarking , Machine Learning

8.

SMG: self-supervised masked graph learning for cancer gene identification.

Cui, Yan; Wang, Zhikang; Wang, Xiaoyu; Zhang, Yiwen; Zhang, Ying; Pan, Tong; Zhang, Zhe; Li, Shanshan; Guo, Yuming; Akutsu, Tatsuya; Song, Jiangning.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37950905

ABSTRACT

Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterization of molecular-level mechanism in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies has enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because of the limited labelled data, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using the graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, whereas a task-specific layer proceeds with the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering.

Subject(s)

Neoplasms , Oncogenes , Mutation , Benchmarking , Genes, Essential , Genomics , Neoplasms/genetics

9.

BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation.

Wang, Zhen; Feng, Zheng; Li, Yanjun; Li, Bowen; Wang, Yongrui; Sha, Chulin; He, Min; Li, Xiaolin.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38033291

ABSTRACT

Although substantial efforts have been made using graph neural networks (GNNs) for artificial intelligence (AI)-driven drug discovery, effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets , which are time-consuming, computationally expensive and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored complementary and asymmetric graph autoencoders to reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the performance of molecular representation. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular properties prediction, drug-drug interaction and drug-target interaction, on 13 benchmark datasets, demonstrating its great potential and superiority in molecular representation learning.

Subject(s)

Artificial Intelligence , Benchmarking , Drug Delivery Systems , Drug Discovery , Neural Networks, Computer

10.

scGAAC: A graph attention autoencoder for clustering single-cell RNA-sequencing data.

Zhang, Lin; Xiang, Haiping; Wang, Feng; Chen, Zepeng; Shen, Mo; Ma, Jiani; Liu, Hui; Zheng, Hongdang.

Methods ; 229: 115-124, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38950719

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables the investigation of intricate mechanisms governing cell heterogeneity and diversity. Clustering analysis remains a pivotal tool in scRNA-seq for discerning cell types. However, persistent challenges arise from noise, high dimensionality, and dropout in single-cell data. Despite the proliferation of scRNA-seq clustering methods, these often focus on extracting representations from individual cell expression data, neglecting potential intercellular relationships. To overcome this limitation, we introduce scGAAC, a novel clustering method based on an attention-based graph convolutional autoencoder. By leveraging structural information between cells through a graph attention autoencoder, scGAAC uncovers latent relationships while extracting representation information from single-cell gene expression patterns. An attention fusion module amalgamates the learned features of the graph attention autoencoder and the autoencoder through attention weights. Ultimately, a self-supervised learning policy guides model optimization. scGAAC, a hypothesis-free framework, performs better on four real scRNA-seq datasets than most state-of-the-art methods. The scGAAC implementation is publicly available on Github at: https://github.com/labiip/scGAAC.

Subject(s)

Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , Sequence Analysis, RNA/methods , RNA-Seq/methods , Algorithms , Software

11.

PhosBERT: A self-supervised learning model for identifying phosphorylation sites in SARS-CoV-2-infected human cells.

Li, Yong; Gao, Ru; Liu, Shan; Zhang, Hongqi; Lv, Hao; Lai, Hongyan.

Methods ; 230: 140-146, 2024 Aug 22.

Article in English | MEDLINE | ID: mdl-39179191

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus, which mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 19 (COVID-19). Numerous studies have demonstrated that SARS-CoV-2 infection will lead to a significant dysregulation of protein post-translational modification profile in human cells. The accurate recognition of phosphorylation sites in host cells will contribute to a deep understanding of the pathogenic mechanisms of SARS-CoV-2 and also help to screen drugs and compounds with antiviral potential. Therefore, there is a need to develop cost-effective and high-precision computational strategies for specifically identifying SARS-CoV-2-infected phosphorylation sites. In this work, we first implemented a custom neural network model (named PhosBERT) on the basis of a pre-trained protein language model of ProtBert, which was a self-supervised learning approach developed on the Bidirectional Encoder Representation from Transformers (BERT) architecture. PhosBERT was then trained and validated on serine (S) and threonine (T) phosphorylation dataset and tyrosine (Y) phosphorylation dataset with 5-fold cross-validation, respectively. Independent validation results showed that PhosBERT could identify S/T phosphorylation sites with high accuracy and AUC (area under the receiver operating characteristic) value of 81.9% and 0.896. The prediction accuracy and AUC value of Y phosphorylation sites reached up to 87.1% and 0.902. It indicated that the proposed model was of good prediction ability and stability and would provide a new approach for studying SARS-CoV-2 phosphorylation sites.

12.

Slideflow: deep learning for digital histopathology with real-time whole-slide visualization.

Dolezal, James M; Kochanny, Sara; Dyer, Emma; Ramesh, Siddhi; Srisuwananukorn, Andrew; Sacco, Matteo; Howard, Frederick M; Li, Anran; Mohan, Prajval; Pearson, Alexander T.

BMC Bioinformatics ; 25(1): 134, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38539070

ABSTRACT

Deep learning methods have emerged as powerful tools for analyzing histopathological images, but current methods are often specialized for specific domains and software environments, and few open-source options exist for deploying models in an interactive interface. Experimenting with different deep learning approaches typically requires switching software libraries and reprocessing data, reducing the feasibility and practicality of experimenting with new architectures. We developed a flexible deep learning library for histopathology called Slideflow, a package which supports a broad array of deep learning methods for digital pathology and includes a fast whole-slide interface for deploying trained models. Slideflow includes unique tools for whole-slide image data processing, efficient stain normalization and augmentation, weakly-supervised whole-slide classification, uncertainty quantification, feature generation, feature space analysis, and explainability. Whole-slide image processing is highly optimized, enabling whole-slide tile extraction at 40x magnification in 2.5 s per slide. The framework-agnostic data processing pipeline enables rapid experimentation with new methods built with either Tensorflow or PyTorch, and the graphical user interface supports real-time visualization of slides, predictions, heatmaps, and feature space characteristics on a variety of hardware devices, including ARM-based devices such as the Raspberry Pi.

Subject(s)

Deep Learning , Software , Computers , Image Processing, Computer-Assisted/methods

13.

Prediction of mutation-induced protein stability changes based on the geometric representations learned by a self-supervised method.

Li, Shan Shan; Liu, Zhao Ming; Li, Jiao; Ma, Yi Bo; Dong, Ze Yuan; Hou, Jun Wei; Shen, Fu Jie; Wang, Wei Bu; Li, Qi Ming; Su, Ji Guo.

BMC Bioinformatics ; 25(1): 282, 2024 Aug 28.

Article in English | MEDLINE | ID: mdl-39198740

ABSTRACT

BACKGROUND: Thermostability is a fundamental property of proteins to maintain their biological functions. Predicting protein stability changes upon mutation is important for our understanding protein structure-function relationship, and is also of great interest in protein engineering and pharmaceutical design. RESULTS: Here we present mutDDG-SSM, a deep learning-based framework that uses the geometric representations encoded in protein structure to predict the mutation-induced protein stability changes. mutDDG-SSM consists of two parts: a graph attention network-based protein structural feature extractor that is trained with a self-supervised learning scheme using large-scale high-resolution protein structures, and an eXtreme Gradient Boosting model-based stability change predictor with an advantage of alleviating overfitting problem. The performance of mutDDG-SSM was tested on several widely-used independent datasets. Then, myoglobin and p53 were used as case studies to illustrate the effectiveness of the model in predicting protein stability changes upon mutations. Our results show that mutDDG-SSM achieved high performance in estimating the effects of mutations on protein stability. In addition, mutDDG-SSM exhibited good unbiasedness, where the prediction accuracy on the inverse mutations is as well as that on the direct mutations. CONCLUSION: Meaningful features can be extracted from our pre-trained model to build downstream tasks and our model may serve as a valuable tool for protein engineering and drug design.

Subject(s)

Mutation , Protein Stability , Proteins , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Myoglobin/chemistry , Myoglobin/genetics , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/metabolism , Computational Biology/methods , Deep Learning , Supervised Machine Learning , Databases, Protein , Protein Conformation

14.

Deep self-supervised machine learning algorithms with a novel feature elimination and selection approaches for blood test-based multi-dimensional health risks classification.

Tutsoy, Onder; Koç, Gizem Gul.

BMC Bioinformatics ; 25(1): 103, 2024 Mar 08.

Article in English | MEDLINE | ID: mdl-38459463

ABSTRACT

BACKGROUND: Blood test is extensively performed for screening, diagnoses and surveillance purposes. Although it is possible to automatically evaluate the raw blood test data with the advanced deep self-supervised machine learning approaches, it has not been profoundly investigated and implemented yet. RESULTS: This paper proposes deep machine learning algorithms with multi-dimensional adaptive feature elimination, self-feature weighting and novel feature selection approaches. To classify the health risks based on the processed data with the deep layers, four machine learning algorithms having various properties from being utterly model free to gradient driven are modified. CONCLUSIONS: The results show that the proposed deep machine learning algorithms can remove the unnecessary features, assign self-importance weights, selects their most informative ones and classify the health risks automatically from the worst-case low to worst-case high values.

Subject(s)

Algorithms , Machine Learning , Supervised Machine Learning

15.

MSH-DTI: multi-graph convolution with self-supervised embedding and heterogeneous aggregation for drug-target interaction prediction.

Zhang, Beiyi; Niu, Dongjiang; Zhang, Lianwei; Zhang, Qiang; Li, Zhen.

BMC Bioinformatics ; 25(1): 275, 2024 Aug 23.

Article in English | MEDLINE | ID: mdl-39179993

ABSTRACT

BACKGROUND: The rise of network pharmacology has led to the widespread use of network-based computational methods in predicting drug target interaction (DTI). However, existing DTI prediction models typically rely on a limited amount of data to extract drug and target features, potentially affecting the comprehensiveness and robustness of features. In addition, although multiple networks are used for DTI prediction, the integration of heterogeneous information often involves simplistic aggregation and attention mechanisms, which may impose certain limitations. RESULTS: MSH-DTI, a deep learning model for predicting drug-target interactions, is proposed in this paper. The model uses self-supervised learning methods to obtain drug and target structure features. A Heterogeneous Interaction-enhanced Feature Fusion Module is designed for multi-graph construction, and the graph convolutional networks are used to extract node features. With the help of an attention mechanism, the model focuses on the important parts of different features for prediction. Experimental results show that the AUROC and AUPR of MSH-DTI are 0.9620 and 0.9605 respectively, outperforming other models on the DTINet dataset. CONCLUSION: The proposed MSH-DTI is a helpful tool to discover drug-target interactions, which is also validated through case studies in predicting new DTIs.

Subject(s)

Deep Learning , Supervised Machine Learning , Computational Biology/methods , Network Pharmacology/methods

16.

Disentangling brain atrophy heterogeneity in Alzheimer's disease: A deep self-supervised approach with interpretable latent space.

Kang, Sohyun; Kim, Sung-Woo; Seong, Joon-Kyung.

Neuroimage ; 297: 120737, 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-39004409

ABSTRACT

Alzheimer's disease (AD) is heterogeneous, but existing methods for capturing this heterogeneity through dimensionality reduction and unsupervised clustering have limitations when it comes to extracting intricate atrophy patterns. In this study, we propose a deep learning based self-supervised framework that characterizes complex atrophy features using latent space representation. It integrates feature engineering, classification, and clustering to synergistically disentangle heterogeneity in Alzheimer's disease. Through this representation learning, we trained a clustered latent space with distinct atrophy patterns and clinical characteristics in AD, and replicated the findings in prodromal Alzheimer's disease. Moreover, we discovered that these clusters are not solely attributed to subtypes but also reflect disease progression in the latent space, representing the core dimensions of heterogeneity, namely progression and subtypes. Furthermore, longitudinal latent space analysis revealed two distinct disease progression pathways: medial temporal and parietotemporal pathways. The proposed approach enables effective latent representations that can be integrated with individual-level cognitive profiles, thereby facilitating a comprehensive understanding of AD heterogeneity.

Subject(s)

Alzheimer Disease , Atrophy , Brain , Deep Learning , Disease Progression , Humans , Alzheimer Disease/pathology , Alzheimer Disease/diagnostic imaging , Atrophy/pathology , Aged , Female , Male , Brain/pathology , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Aged, 80 and over , Supervised Machine Learning

17.

Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links.

Fedorov, Alex; Geenjaar, Eloy; Wu, Lei; Sylvain, Tristan; DeRamus, Thomas P; Luck, Margaux; Misiura, Maria; Mittapalle, Girish; Hjelm, R Devon; Plis, Sergey M; Calhoun, Vince D.

Neuroimage ; 285: 120485, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38110045

ABSTRACT

In recent years, deep learning approaches have gained significant attention in predicting brain disorders using neuroimaging data. However, conventional methods often rely on single-modality data and supervised models, which provide only a limited perspective of the intricacies of the highly complex brain. Moreover, the scarcity of accurate diagnostic labels in clinical settings hinders the applicability of the supervised models. To address these limitations, we propose a novel self-supervised framework for extracting multiple representations from multimodal neuroimaging data to enhance group inferences and enable analysis without resorting to labeled data during pre-training. Our approach leverages Deep InfoMax (DIM), a self-supervised methodology renowned for its efficacy in learning representations by estimating mutual information without the need for explicit labels. While DIM has shown promise in predicting brain disorders from single-modality MRI data, its potential for multimodal data remains untapped. This work extends DIM to multimodal neuroimaging data, allowing us to identify disorder-relevant brain regions and explore multimodal links. We present compelling evidence of the efficacy of our multimodal DIM analysis in uncovering disorder-relevant brain regions, including the hippocampus, caudate, insula, - and multimodal links with the thalamus, precuneus, and subthalamus hypothalamus. Our self-supervised representations demonstrate promising capabilities in predicting the presence of brain disorders across a spectrum of Alzheimer's phenotypes. Comparative evaluations against state-of-the-art unsupervised methods based on autoencoders, canonical correlation analysis, and supervised models highlight the superiority of our proposed method in achieving improved classification performance, capturing joint information, and interpretability capabilities. The computational efficiency of the decoder-free strategy enhances its practical utility, as it saves compute resources without compromising performance. This work offers a significant step forward in addressing the challenge of understanding multimodal links in complex brain disorders, with potential applications in neuroimaging research and clinical diagnosis.

Subject(s)

Brain Diseases , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Neuroimaging/methods , Brain/diagnostic imaging , Multimodal Imaging/methods

18.

Adaptive node feature extraction in graph-based neural networks for brain diseases diagnosis using self-supervised learning.

Zeng, Youbing; Lin, Jiaying; Li, Zhuoshuo; Xiao, Zehui; Wang, Chen; Ge, Xinting; Wang, Cheng; Huang, Gui; Liu, Mengting.

Neuroimage ; 297: 120750, 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-39059681

ABSTRACT

Electroencephalography (EEG) has demonstrated significant value in diagnosing brain diseases. In particular, brain networks have gained prominence as they offer additional valuable insights by establishing connections between EEG signal channels. While brain connections are typically delineated by channel signal similarity, there lacks a consistent and reliable strategy for ascertaining node characteristics. Conventional node features such as temporal and frequency domain properties of EEG signals prove inadequate for capturing the extensive EEG information. In our investigation, we introduce a novel adaptive method for extracting node features from EEG signals utilizing a distinctive task-induced self-supervised learning technique. By amalgamating these extracted node features with fundamental edge features constructed using Pearson correlation coefficients, we showed that the proposed approach can function as a plug-in module that can be integrated to many common GNN networks (e.g., GCN, GraphSAGE, GAT) as a replacement of node feature selections module. Comprehensive experiments are then conducted to demonstrate the consistently superior performance and high generality of the proposed method over other feature selection methods in various of brain disorder prediction tasks, such as depression, schizophrenia, and Parkinson's disease. Furthermore, compared to other node features, our approach unveils profound spatial patterns through graph pooling and structural learning, shedding light on pivotal brain regions influencing various brain disorder prediction based on derived features.

Subject(s)

Brain Diseases , Electroencephalography , Neural Networks, Computer , Supervised Machine Learning , Humans , Electroencephalography/methods , Brain Diseases/diagnostic imaging , Brain Diseases/physiopathology , Signal Processing, Computer-Assisted , Adult , Brain/diagnostic imaging , Brain/physiopathology , Male , Female

19.

SIFLoc: a self-supervised pre-training method for enhancing the recognition of protein subcellular localization in immunofluorescence microscopic images.

Tu, Yanlun; Lei, Houchao; Shen, Hong-Bin; Yang, Yang.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35152293

ABSTRACT

With the rapid growth of high-resolution microscopy imaging data, revealing the subcellular map of human proteins has become a central task in the spatial proteome. The cell atlas of the Human Protein Atlas (HPA) provides precious resources for recognizing subcellular localization patterns at the cell level, and the large-scale annotated data enable learning via advanced deep neural networks. However, the existing predictors still suffer from the imbalanced class distribution and the lack of labeled data for minor classes. Thus, it is necessary to develop new methods for coping with these issues. We leverage the self-supervised learning protocol to address these problems. Especially, we propose a pre-training scheme to enhance the conventional supervised learning framework called SIFLoc. The pre-training is featured by a hybrid data augmentation method and a modified contrastive loss function, aiming to learn good feature representations from microscopic images. The experiments are performed on a large-scale immunofluorescence microscopic image dataset collected from the HPA database. Using the same deep neural networks as the classifier, the model pre-trained via SIFLoc not only outperforms the model without pre-training by a large margin but also shows advantages over the state-of-the-art self-supervised learning methods. Especially, SIFLoc improves the prediction accuracy for minor organelles significantly.

Subject(s)

Neural Networks, Computer , Fluorescent Antibody Technique , Humans , Proteome , Supervised Machine Learning

20.

Improving quantitative MRI using self-supervised deep learning with model reinforcement: Demonstration for rapid T1 mapping.

Bian, Wanyu; Jang, Albert; Liu, Fang.

Magn Reson Med ; 92(1): 98-111, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38342980

ABSTRACT

PURPOSE: This paper proposes a novel self-supervised learning framework that uses model reinforcement, REference-free LAtent map eXtraction with MOdel REinforcement (RELAX-MORE), for accelerated quantitative MRI (qMRI) reconstruction. The proposed method uses an optimization algorithm to unroll an iterative model-based qMRI reconstruction into a deep learning framework, enabling accelerated MR parameter maps that are highly accurate and robust. METHODS: Unlike conventional deep learning methods which require large amounts of training data, RELAX-MORE is a subject-specific method that can be trained on single-subject data through self-supervised learning, making it accessible and practically applicable to many qMRI studies. Using quantitative T 1 $$ {\mathrm{T}}_1 $$ mapping as an example, the proposed method was applied to the brain, knee and phantom data. RESULTS: The proposed method generates high-quality MR parameter maps that correct for image artifacts, removes noise, and recovers image features in regions of imperfect image conditions. Compared with other state-of-the-art conventional and deep learning methods, RELAX-MORE significantly improves efficiency, accuracy, robustness, and generalizability for rapid MR parameter mapping. CONCLUSION: This work demonstrates the feasibility of a new self-supervised learning method for rapid MR parameter mapping, that is readily adaptable to the clinical translation of qMRI.

Subject(s)

Algorithms , Brain , Deep Learning , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Phantoms, Imaging , Magnetic Resonance Imaging/methods , Humans , Image Processing, Computer-Assisted/methods , Brain/diagnostic imaging , Knee/diagnostic imaging , Artifacts , Supervised Machine Learning

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL