ABSTRACT
Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can also reveal the underlying disease mechanisms at the molecular level. In this study, we developed and validated a deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian-cancer patients using multiple independent multi-omics datasets. Our model achieved significantly better prognosis prediction than the current machine learning and deep learning approaches in various settings. Moreover, an interpretation method was applied to tackle the "black-box" nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that were important to distinguish predicted high- and low-risk patients. The significance of the identified features was partially supported by previous studies.
ABSTRACT
Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can reveal the underlying disease mechanisms at the molecular level. In this study, we developed a novel deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian cancer patients. Our model achieved significantly better prognosis prediction than the conventional Cox Proportional Hazard model and other competitive deep learning approaches in various settings. Moreover, an interpretation approach was applied to tackle the "black-box" nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that made important contributions to distinguishing predicted high- and low-risk patients. The identified associations were partially supported by previous studies.
ABSTRACT
The imaging genetics approach generates a large amount of high-dimensional and multi-modal data, providing complementary information for the comprehensive study of schizophrenia, a complex mental disease. However, at the same time, the variety of these data in structures, resolutions, and formats makes their integrative study a forbidding task. In this paper, we propose a novel model called Joint Sparse Collaborative Regression (JSCoReg), which can extract class-specific features from different health conditions/disease classes. We first evaluate the performance of feature selection in terms of the receiver operating characteristic (ROC) curve and the area under the ROC curve in a simulation experiment. We demonstrate that the JSCoReg model can achieve higher accuracy compared with similar models including Joint Sparse Canonical Correlation Analysis and Sparse Collaborative Regression. We then apply the JSCoReg model to the analysis of a schizophrenia dataset collected from the Mind Clinical Imaging Consortium. JSCoReg enables us to better identify biomarkers associated with schizophrenia, which are verified to be both biologically and statistically significant.
Subject(s)
Schizophrenia , Humans , Schizophrenia/diagnostic imaging , Schizophrenia/genetics , Algorithms , Magnetic Resonance Imaging/methods , Computer Simulation , ROC Curve , Brain
ABSTRACT
In response to the unprecedented global healthcare crisis of the COVID-19 pandemic, the scientific community has joined forces to tackle the challenges and prepare for future pandemics. Multiple modalities of data have been investigated to understand the nature of COVID-19. In this paper, MIDRC investigators present an overview of the state-of-the-art development of multimodal machine learning for COVID-19 and model assessment considerations for future studies. We begin with a discussion of the lessons learned from radiogenomic studies for cancer diagnosis. We then summarize the multi-modality COVID-19 data investigated in the literature including symptoms and other clinical data, laboratory tests, imaging, pathology, physiology, and other omics data. Publicly available multimodal COVID-19 data provided by MIDRC and other sources are summarized. After an overview of machine learning developments using multimodal data for COVID-19, we present our perspectives on the future development of multimodal machine learning models for COVID-19.
ABSTRACT
Recent studies show that multi-modal data fusion techniques combine information from diverse sources for comprehensive diagnosis and prognosis of complex brain disorders, often resulting in improved accuracy compared to single-modality approaches. However, many existing data fusion methods extract features from homogeneous networks, ignoring heterogeneous structural information among multiple modalities. To this end, we propose a Hypergraph-based Multi-modal data Fusion algorithm, namely HMF. Specifically, we first generate a hypergraph similarity matrix to represent the high-order relationships among subjects, and then enforce a regularization term based upon both the inter- and intra-modality relationships of the subjects. Finally, we apply HMF to integrate imaging and genetics datasets. Validation of the proposed method is performed on both synthetic data and real samples from a schizophrenia study. Results show that our algorithm outperforms several competing methods and reveals significant interactions among risk genes, environmental factors and abnormal brain regions.
Subject(s)
Schizophrenia , Algorithms , Brain/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Multimodal Imaging/methods , Schizophrenia/diagnostic imaging , Schizophrenia/genetics
ABSTRACT
OBJECTIVE: Graphical deep learning models provide a desirable way for brain functional connectivity analysis. However, the application of current graph deep learning models to brain network analysis is challenging due to the limited sample size and complex relationships between different brain regions. METHOD: In this work, a graph convolutional network (GCN) based framework is proposed by exploiting the information from both region-to-region connectivities of the brain and subject-subject relationships. We first construct an affinity subject-subject graph followed by GCN analysis. A Laplacian regularization term is introduced in our model to tackle the overfitting problem. We validate the proposed model on the Philadelphia Neurodevelopmental Cohort for the brain cognition study. RESULTS: Experimental analysis shows that our proposed framework outperforms other competing models in classifying groups with low and high Wide Range Achievement Test (WRAT) scores. Moreover, to examine each brain region's contribution to cognitive function, we use the occlusion sensitivity analysis method to identify cognition-related brain functional networks. The results are consistent with previous research yet yield new findings. CONCLUSION AND SIGNIFICANCE: Our study demonstrates that GCN incorporating prior knowledge about brain networks offers a powerful way to detect important brain networks and regions associated with cognitive functions.
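As an illustration of the Laplacian regularization idea mentioned above, the following minimal sketch computes a graph smoothness penalty on a subject-subject affinity graph. The abstract does not give the exact form of the term, so this assumes the common formulation, the sum over pairs of A_ij * ||f_i - f_j||^2, which equals tr(F^T L F) with L = D - A. The function names and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian_penalty(A: Matrix, F: Matrix) -> float:
    """Pairwise form: sum over i < j of A[i][j] * ||F[i] - F[j]||^2.

    A is a symmetric subject-subject affinity matrix (assumed, zero diagonal);
    F holds one embedding row per subject.
    """
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist2 = sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
            total += A[i][j] * dist2
    return total


def laplacian_trace_form(A: Matrix, F: Matrix) -> float:
    """Equivalent trace form tr(F^T L F) with the unnormalized Laplacian L = D - A."""
    n = len(A)
    deg = [sum(row) for row in A]
    L = [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    k = len(F[0])
    return sum(
        F[i][c] * L[i][j] * F[j][c]
        for c in range(k)
        for i in range(n)
        for j in range(n)
    )
```

A penalty of this kind is small when strongly connected subjects receive similar embeddings, which is one way such a term can discourage overfitting on small samples.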
Subject(s)
Deep Learning , Brain/diagnostic imaging , Cognition , Cohort Studies , Humans , Magnetic Resonance Imaging , Sample Size
ABSTRACT
Transcriptome-wide association studies (TWAS) systematically investigate the association of genetically predicted gene expression with disease risk, providing an effective approach to identify novel susceptibility genes. Osteoporosis is the most common metabolic bone disease, associated with reduced bone mineral density (BMD) and increased risk of osteoporotic fractures, and genetic factors explain approximately 70% of the variance in phenotypes associated with bone. BMD is commonly assessed using dual-energy X-ray absorptiometry (DXA) to obtain measurements (g/cm²) of areal BMD. However, quantitative computed tomography (QCT) measured 3D volumetric BMD (vBMD) (g/cm³) has important advantages compared with DXA since it can evaluate cortical and trabecular microstructural features of bone quality, which can be used to directly predict fracture risk. Here, we performed the first TWAS for vBMD by integrating genome-wide association studies (GWAS) data from two independent cohorts, namely the Framingham Heart Study (FHS, n = 3298) and the Osteoporotic Fractures in Men (MrOS, n = 4641), with tissue-specific gene expression data from the Genotype-Tissue Expression (GTEx) project. We first used a stratified linkage disequilibrium (LD) score regression approach to identify 12 vBMD-relevant tissues, for which vBMD heritability is enriched in tissue-specific genes of the given tissue. Focusing on these tissues, we subsequently leveraged GTEx expression reference panels to predict tissue-specific gene expression levels based on the genotype data from FHS and MrOS. The associations between predicted gene expression levels and vBMD variation were then tested by MultiXcan, an innovative TWAS method that integrates information available across multiple tissues. We identified 70 significant genes associated with vBMD, including some previously identified osteoporosis-related genes such as LYRM2 and NME8, as well as some novel loci such as DNAAF2 and SPAG16.
Our findings provide novel insights into the pathophysiological mechanisms of osteoporosis and highlight several novel vBMD-associated genes that warrant further investigation.
Subject(s)
Bone Density , Osteoporotic Fractures , Absorptiometry, Photon , Bone Density/genetics , Genome-Wide Association Study , Humans , Male , Transcriptome/genetics
ABSTRACT
OBJECTIVE: To understand the association between brain networks and behaviors of an individual, most studies build predictive models based on functional connectivity (FC) from a single dataset with linear analysis techniques. Such approaches may fail to capture the nonlinear structure of brain networks and neglect the complementary information contained in FC networks (FCNs) from multiple datasets. To address this challenging issue, we use multiview dimensionality reduction to extract a coherent low-dimensional representation of the FCNs from resting-state and emotion identification task-based functional magnetic resonance imaging (fMRI) datasets. METHODS: We propose a scheme based on multiview diffusion map to extract intrinsic features while preserving the underlying geometric structure of high-dimensional datasets. This method is robust to noise and small variations in the data. RESULTS: After validation on the Philadelphia Neurodevelopmental Cohort data, the predictive model built from both resting-state and emotion identification task-based fMRI datasets outperforms the one using each individual fMRI dataset. In addition, the proposed model achieves better prediction performance than principal component analysis (PCA) and three other competing data fusion methods. CONCLUSION: Our framework for combining multiple FCNs in one predictive model exhibits improved prediction performance. SIGNIFICANCE: To our knowledge, we demonstrate a first application of multiview diffusion map to successfully fuse different types of fMRI data for predicting fluid intelligence (gF).
Subject(s)
Brain , Magnetic Resonance Imaging , Brain/diagnostic imaging , Brain Mapping , Diffusion , Humans , Intelligence , Principal Component Analysis
ABSTRACT
With the development of neuroimaging techniques, a growing amount of multi-modal brain imaging data are collected, facilitating comprehensive study of the brain. In this paper, we jointly analyzed functional magnetic resonance imaging (fMRI) collected under different paradigms in order to understand cognitive behaviors of an individual. To this end, we proposed a novel multi-view learning algorithm called structure-enforced collaborative regression (SCoRe) to extract co-expressed discriminative brain regions under the guidance of anatomical structure of the brain. An advantage of SCoRe over its predecessor collaborative regression (CoRe) lies in its incorporation of group structures in the brain imaging data, which makes the model biologically more meaningful. Results from real data analysis have confirmed that by incorporating prior knowledge of brain structure, SCoRe can deliver better prediction performance and is less sensitive to hyper-parameters than CoRe. After validation with simulation experiments, we applied SCoRe to fMRI data collected from the Philadelphia Neurodevelopmental Cohort and adopted the scores from the wide range achievement test (WRAT) to evaluate an individual's cognitive skills. We located 14 relevant brain regions that can efficiently predict WRAT scores and these brain regions were further confirmed by other independent studies.
Subject(s)
Magnetic Resonance Imaging , Neuroimaging , Algorithms , Brain/diagnostic imaging , Cognition , Humans
ABSTRACT
The combination of multimodal imaging and genomics provides a more comprehensive way for the study of mental illnesses and brain functions. Deep network-based data fusion models have been developed to capture their complex associations, resulting in improved diagnosis of diseases. However, deep learning models are often difficult to interpret, bringing about challenges for uncovering biological mechanisms using these models. In this work, we develop an interpretable multimodal fusion model to perform automated diagnosis and result interpretation simultaneously. We name it Grad-CAM guided convolutional collaborative learning (gCAM-CCL), which is achieved by combining intermediate feature maps with gradient-based weights. The gCAM-CCL model can generate interpretable activation maps to quantify pixel-level contributions of the input features. Moreover, the estimated activation maps are class-specific, which can therefore facilitate the identification of biomarkers underlying different groups. We validate the gCAM-CCL model on a brain imaging-genetic study, and demonstrate its applications to both the classification of cognitive function groups and the discovery of underlying biological mechanisms. Specifically, our analysis results suggest that during task-fMRI scans, several object recognition related regions of interest (ROIs) are activated followed by several downstream encoding ROIs. In addition, the high cognitive group may have stronger neurotransmission signaling while the low cognitive group may have problems in brain/neuron development due to genetic variations.
Subject(s)
Deep Learning , Brain/diagnostic imaging , Cognition , Magnetic Resonance Imaging , Neural Networks, Computer
ABSTRACT
Human osteoblasts are multifunctional bone cells, which play essential roles in bone formation, angiogenesis regulation, as well as maintenance of hematopoiesis. However, the categorization of primary osteoblast subtypes in vivo in humans has not yet been achieved. Here, we used single-cell RNA sequencing (scRNA-seq) to perform a systematic cellular taxonomy dissection of freshly isolated human osteoblasts from one 31-year-old male with osteoarthritis and osteopenia after hip replacement. Based on the gene expression patterns and cell lineage reconstruction, we identified three distinct cell clusters including preosteoblasts, mature osteoblasts, and an undetermined rare osteoblast subpopulation. This novel subtype was found to be the major source of the nuclear receptor subfamily 4 group A member 1 and 2 (NR4A1 and NR4A2) in primary osteoblasts, and the expression of NR4A1 was confirmed by immunofluorescence staining on mouse osteoblasts in vivo. Trajectory inference analysis suggested that the undetermined cluster, together with the preosteoblasts, are involved in the regulation of osteoblastogenesis and also give rise to mature osteoblasts. Investigation of the biological processes and signaling pathways enriched in each subpopulation revealed that in addition to bone formation, preosteoblasts and undetermined osteoblasts may also regulate both angiogenesis and hemopoiesis. Finally, we demonstrated that there are systematic differences between the transcriptional profiles of human and mouse osteoblasts, highlighting the necessity for studying bone physiological processes in humans rather than solely relying on mouse models. Our findings provide novel insights into the cellular heterogeneity and potential biological functions of human primary osteoblasts at the single-cell level.
Subject(s)
Osteoblasts/cytology , Adult , Animals , Cell Differentiation , Cells, Cultured , Humans , Male , Mice , Nuclear Receptor Subfamily 4, Group A, Member 1/genetics , Nuclear Receptor Subfamily 4, Group A, Member 1/metabolism , Nuclear Receptor Subfamily 4, Group A, Member 2/genetics , Nuclear Receptor Subfamily 4, Group A, Member 2/metabolism , Osteoblasts/metabolism , Sequence Analysis, RNA , Single-Cell Analysis
ABSTRACT
With the rapid development of high-throughput technologies, a growing amount of multi-omics data are collected, giving rise to a great demand for combining such data for biomedical discovery. Due to the cost and time to label the data manually, the number of labelled samples is limited. This motivated the need for semi-supervised learning algorithms. In this work, we applied graph-based semi-supervised learning (GSSL) to classify a severe chronic mental disorder, schizophrenia (SZ). An advantage of GSSL is that it can simultaneously analyse more than two types of data, while many existing models focus on pairwise data analysis. In particular, we applied GSSL to the analysis of single nucleotide polymorphism (SNP), functional magnetic resonance imaging (fMRI) and DNA methylation data, which account for genetics, brain imaging (endophenotypes), and environmental factors (epigenomics) respectively. While parameter selection has been an open challenge for most models, another key contribution of this work is that we explored the parameter space to interpret the meaning of each parameter and established practical guidelines. Based on the practical significance of each hyper-parameter, a relatively small range of candidate values can be determined in a data-driven way to both optimize and speed up the parameter tuning process. We validated the model through both synthetic data and a real SZ dataset of 184 subjects from the Mental Illness and Neuroscience Discovery (MIND) Clinical Imaging Consortium. In comparison to several existing approaches, our algorithm achieved better performance in terms of classification accuracy. We also confirmed the significance of several brain regions associated with SZ.
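To make the graph-based semi-supervised idea concrete, here is a minimal sketch of one widely used label-propagation scheme on a single subject similarity graph. It is not necessarily the GSSL variant used in the study, which fuses several data types. The update f <- alpha * S f + (1 - alpha) * y, the symmetric normalization of S, and the parameter names `alpha` and `n_iter` are assumptions made for illustration.

```python
import math
from typing import List


def propagate_labels(A: List[List[float]], y: List[float],
                     alpha: float = 0.9, n_iter: int = 200) -> List[float]:
    """Iterative label propagation on a weighted, symmetric graph.

    A: affinity matrix between subjects (zero diagonal assumed).
    y: +1 / -1 for labelled subjects, 0 for unlabelled ones.
    Returns a soft score per subject; its sign is the predicted class.
    """
    n = len(A)
    deg = [sum(row) for row in A]
    # Symmetrically normalized affinity S = D^{-1/2} A D^{-1/2}.
    S = [
        [A[i][j] / math.sqrt(deg[i] * deg[j]) if A[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    f = [float(v) for v in y]
    for _ in range(n_iter):
        # Each subject mixes its neighbours' scores with its own initial label.
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f
```

In this sketch `alpha` controls how far label information spreads from the labelled subjects: values near 1 trust the graph more, values near 0 keep scores close to the initial labels.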
Subject(s)
Brain , Genomics , Algorithms , Brain/diagnostic imaging , DNA Methylation , Humans , Magnetic Resonance Imaging , Neuroimaging
ABSTRACT
OBJECTIVE: Integration of multiple datasets is a hot topic in many fields. When studying complex mental disorders, great effort has been dedicated to fusing genetic and brain imaging data. However, an increasing number of studies have pointed out the importance of epigenetic factors in the cause of psychiatric diseases. In this study, we endeavor to fill the gap by combining epigenetics (e.g., DNA methylation) with imaging data (e.g., fMRI) to identify biomarkers for schizophrenia (SZ). METHODS: We propose to combine linear regression with canonical correlation analysis (CCA) in a relaxed yet coupled manner to extract discriminative features for SZ that are co-expressed in the fMRI and DNA methylation data. RESULT: After validation through simulations, we applied our method to real imaging epigenetics data of 184 subjects from the Mental Illness and Neuroscience Discovery Clinical Imaging Consortium. After significance testing, we identified 14 brain regions and 44 cytosine-phosphate-guanine (CpG) sites. Average classification accuracy is [Formula: see text]. By linking the CpG sites to genes, we identified the pathways Guanosine ribonucleotides de novo biosynthesis and Guanosine nucleotides de novo biosynthesis, and the GO term Perikaryon. CONCLUSION: This imaging epigenetics study has identified both brain regions and genes that are associated with neuron development and memory processing. These biomarkers contribute to a good understanding of the mechanism underlying SZ but are overlooked by previous imaging genetics studies. SIGNIFICANCE: Our study sheds light on the understanding and diagnosis of SZ with an imaging epigenetics approach, which is demonstrated to be effective in extracting novel biomarkers associated with SZ.
Subject(s)
Magnetic Resonance Imaging , Schizophrenia , Biomarkers , DNA Methylation , Epigenesis, Genetic/genetics , Humans , Schizophrenia/diagnostic imaging , Schizophrenia/genetics
ABSTRACT
Gastric cancer (GC) is one of the leading causes of cancer-associated deaths worldwide. Due to the lack of typical symptoms and effective biomarkers for non-invasive screening, most patients develop advanced-stage GC by the time of diagnosis. Circulating microRNA (miRNA)-based panels have been reported as a promising tool for the screening of certain types of cancers. In this study, we performed differential expression analysis of miRNA profiles of plasma samples obtained from gastric cancer and non-cancer patients using two independent Gene Expression Omnibus (GEO) datasets: GSE113486 and GSE124158. We identified three miRNAs, hsa-miR-320a, hsa-miR-1260b, and hsa-miR-6515-5p, to distinguish gastric cancer cases from non-cancer controls. The three miRNAs showed an area under the curve (AUC) over 0.95 with high specificity (>93.0%) and sensitivity (>85.0%) in both discovery datasets. In addition, we further validated these three miRNAs in two external datasets: GSE106817 [sensitivity: hsa-miR-320a (99.1%), hsa-miR-1260b (97.4%), and hsa-miR-6515-5p (92.2%); specificity: hsa-miR-320a (88.8%), hsa-miR-1260b (89.6%), and hsa-miR-6515-5p (88.7%); and AUC: hsa-miR-320a (96.3%), hsa-miR-1260b (97.4%), and hsa-miR-6515-5p (94.6%)] and GSE112264 [sensitivity: hsa-miR-320a (100.0%), hsa-miR-1260b (98.0%), and hsa-miR-6515-5p (98.0%); specificity: hsa-miR-320a (100.0%), hsa-miR-1260b (100.0%), and hsa-miR-6515-5p (92.7%); and AUC: hsa-miR-320a (1.000), hsa-miR-1260b (1.000), and hsa-miR-6515-5p (0.988)]. On the basis of these findings, the three miRNAs can be used as potential biomarkers for gastric cancer screening, which can provide patients with a higher chance of curative resection and longer survival.
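For readers unfamiliar with the metrics reported above, the following minimal sketch shows how sensitivity, specificity, and AUC can be computed for a single marker. It assumes higher scores indicate cancer and uses the rank-based definition of AUC; the function names are hypothetical and no data from the GEO datasets are used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling score >= threshold positive.

    labels: 1 for cancer cases, 0 for non-cancer controls.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))
```

Sensitivity and specificity depend on the chosen cutoff, whereas AUC summarizes performance over all cutoffs, which is why the two kinds of figures are reported separately.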
ABSTRACT
Recently, a hypergraph constructed from functional magnetic resonance imaging (fMRI) was utilized to explore brain functional connectivity networks (FCNs) for the classification of neurodegenerative diseases. Each edge of a hypergraph (called hyperedge) can connect any number of brain regions-of-interest (ROIs) instead of only two ROIs, and thus characterizes high-order relations among multiple ROIs that cannot be uncovered by a simple graph in the traditional graph based FCN construction methods. Unlike the existing hypergraph based methods where all hyperedges are assumed to have equal weights and only certain topological features are extracted from the hypergraphs, we propose a hypergraph learning based method for FCN construction in this paper. Specifically, we first generate hyperedges from fMRI time series based on sparse representation, then employ hypergraph learning to adaptively learn hyperedge weights, and finally define a hypergraph similarity matrix to represent the FCN. In our proposed method, weighting hyperedges results in better discriminative FCNs across subjects, and the defined hypergraph similarity matrix can better reveal the overall structure of brain network than using those hypergraph topological features. Moreover, we propose a multi-hypergraph learning based method by integrating multi-paradigm fMRI data, where the hyperedge weights associated with each fMRI paradigm are jointly learned and then a unified hypergraph similarity matrix is computed to represent the FCN. We validate the effectiveness of the proposed method on the Philadelphia Neurodevelopmental Cohort dataset for the classification of individuals' learning ability from three paradigms of fMRI data. 
Experimental results demonstrate that our proposed approach outperforms the traditional graph based methods (i.e., Pearson's correlation and partial correlation with the graphical Lasso) and the existing unweighted hypergraph based methods, which sheds light on how to optimize estimation of FCNs for cognitive and behavioral study.
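The hypergraph similarity matrix described above can be illustrated with a minimal sketch. It assumes one common definition, S = H W De^{-1} H^T, where H is the ROI-by-hyperedge incidence matrix, W holds the hyperedge weights, and De holds the hyperedge degrees. The paper's exact definition may differ, and the function name and toy inputs are hypothetical.

```python
from typing import List


def hypergraph_similarity(H: List[List[int]], w: List[float]) -> List[List[float]]:
    """Similarity between ROIs induced by weighted hyperedges.

    H[i][e] is 1 if ROI i belongs to hyperedge e, else 0.
    w[e] is the (learned or fixed) weight of hyperedge e.
    Entry S[i][j] sums, over hyperedges shared by ROIs i and j,
    the hyperedge weight divided by the hyperedge size.
    """
    n, m = len(H), len(w)
    edge_deg = [sum(H[i][e] for i in range(n)) for e in range(m)]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            S[i][j] = sum(
                w[e] * H[i][e] * H[j][e] / edge_deg[e]
                for e in range(m)
                if edge_deg[e] > 0  # skip empty hyperedges
            )
    return S
```

With equal weights this reduces to the unweighted hypergraph case; learning `w` lets some hyperedges contribute more to the resulting network than others.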