ABSTRACT
BACKGROUND: Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that is highly heterogeneous, both phenotypically and genetically. With the accumulation of biological sequencing data, a growing number of studies are shifting to a molecular subtype-first approach, which first identifies molecular subtypes from genetic and molecular data and then links them to clinical manifestations, reducing heterogeneity before phenotypic profiling. RESULTS: In this study, we perform similarity network fusion to integrate gene and gene set expression data from multiple human brain cell types for ASD molecular subtype identification. We then apply subtype-specific differential gene and gene set expression analyses to study the expression patterns specific to each molecular subtype in each cell type. To demonstrate biological and practical significance, we analyze the molecular subtypes, investigate their correlation with ASD clinical phenotypes, and construct ASD molecular subtype prediction models. CONCLUSIONS: The identified molecular subtype-specific gene and gene set expression may be used to differentiate ASD molecular subtypes, facilitating the diagnosis and treatment of ASD. Our method provides an analytical pipeline for identifying molecular subtypes, and even disease subtypes, of complex disorders.
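The abstract above relies on similarity network fusion (SNF) to combine several data views into one sample-by-sample network. The following is a minimal NumPy sketch of the general SNF idea (scaled exponential affinity, a dense normalised kernel, a sparse k-nearest-neighbour kernel, and iterative cross-diffusion). It is an illustration under stated assumptions, not the authors' pipeline: the function names, the parameters `k`, `mu` and `iters`, and the toy data are all hypothetical, and the sketch expects at least two views.

```python
import numpy as np

def affinity(X, k=3, mu=0.5):
    """Scaled exponential similarity between the rows (samples) of X."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise Euclidean distances
    # Mean distance to the k nearest neighbours (sorted column 0 is the sample itself).
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    return np.exp(-d ** 2 / (mu * eps + 1e-12))

def full_kernel(W):
    """Dense kernel P: off-diagonal entries of each row sum to 1/2, diagonal is 1/2."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k=3):
    """Sparse kernel S: keep each row's k strongest neighbours, then row-normalise."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(views, k=3, mu=0.5, iters=10):
    """Fuse two or more data views (each samples x features) into one similarity matrix."""
    Ws = [affinity(X, k, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    for _ in range(iters):
        updated = []
        for v in range(len(Ps)):
            # Diffuse the average of the other views through this view's local structure.
            others = np.mean([Ps[u] for u in range(len(Ps)) if u != v], axis=0)
            P = Ss[v] @ others @ Ss[v].T
            updated.append(full_kernel((P + P.T) / 2.0))  # symmetrise and re-normalise
        Ps = updated
    fused = np.mean(Ps, axis=0)
    return (fused + fused.T) / 2.0

# Toy data (hypothetical): 10 samples in two groups, measured in two views.
rng = np.random.default_rng(0)
labels = np.array([0] * 5 + [1] * 5)
view1 = rng.normal(size=(10, 4)) + 5 * labels[:, None]
view2 = rng.normal(size=(10, 3)) + 5 * labels[:, None]
fused = snf([view1, view2])
```

In this toy setting the fused matrix should show higher similarity within each group than between groups; a clustering step (for example spectral clustering) would normally follow to assign subtypes.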
Subjects
Autism Spectrum Disorder, Autistic Disorder, Humans, Autistic Disorder/genetics, Autism Spectrum Disorder/genetics, Brain/metabolism
ABSTRACT
BACKGROUND: Placental dysfunction, a root cause of common syndromes affecting human pregnancy, such as preeclampsia (PE), fetal growth restriction (FGR), and spontaneous preterm delivery (sPTD), remains poorly defined. These common, yet clinically disparate obstetrical syndromes share similar placental histopathologic patterns, while individuals within each syndrome present distinct molecular changes, challenging our understanding and hindering our ability to prevent and treat these syndromes. METHODS: Using our extensive biobank, we identified women with severe PE (n = 75), FGR (n = 40), FGR with a hypertensive disorder (FGR + HDP; n = 33), sPTD (n = 72), and two uncomplicated control groups, term (n = 113), and preterm without PE, FGR, or sPTD (n = 16). We used placental biopsies for transcriptomics, proteomics, metabolomics data, and histological evaluation. After conventional pairwise comparison, we deployed an unbiased, AI-based similarity network fusion (SNF) to integrate the datatypes and identify omics-defined placental clusters. We used Bayesian model selection to compare the association between the histopathological features and disease conditions vs SNF clusters. RESULTS: Pairwise, disease-based comparisons exhibited relatively few differences, likely reflecting the heterogeneity of the clinical syndromes. Therefore, we deployed the unbiased, omics-based SNF method. Our analysis resulted in four distinct clusters, which were mostly dominated by a specific syndrome. Notably, the cluster dominated by early-onset PE exhibited strong placental dysfunction patterns, with weaker injury patterns in the cluster dominated by sPTD. The SNF-defined clusters exhibited better correlation with the histopathology than the predefined disease groups. 
CONCLUSIONS: Our results demonstrate that integrated omics-based SNF distinctively reclassifies placental dysfunction patterns underlying the common obstetrical syndromes, improves our understanding of the pathological processes, and could promote a search for more personalized interventions.
Subjects
Placenta, Pre-Eclampsia, Pregnancy, Newborn Infant, Female, Humans, Bayes Theorem, Multiomics, Syndrome, Biopsy, Fetal Growth Retardation
ABSTRACT
BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a highly morbid and heterogeneous disease. While COPD is defined by spirometry, many COPD characteristics are seen in cigarette smokers with normal spirometry. The extent to which COPD and COPD heterogeneity are captured in omics of lung tissue is not known. METHODS: We clustered gene expression and methylation data in 78 lung tissue samples from former smokers with normal lung function or severe COPD. We applied two integrative omics clustering methods: (1) Similarity Network Fusion (SNF) and (2) Entropy-Based Consensus Clustering (ECC). RESULTS: SNF clusters did not differ significantly in the percentage of COPD cases (48.8% vs. 68.6%, p = 0.13), though they did differ in median forced expiratory volume in one second (FEV1) % predicted (82 vs. 31, p = 0.017). In contrast, the ECC clusters showed stronger evidence of separation by COPD case status (48.2% vs. 81.8%, p = 0.013) and similar stratification by median FEV1 % predicted (82 vs. 30.5, p = 0.0059). ECC clusters using both gene expression and methylation were identical to the ECC clustering solution generated using methylation data alone. Both methods selected clusters with differentially expressed transcripts enriched for interleukin signaling and immunoregulatory interactions between lymphoid and non-lymphoid cells. CONCLUSIONS: Unsupervised clustering analysis of integrated gene expression and methylation data in lung tissue resulted in clusters with modest concordance with COPD, though the clusters were enriched in pathways potentially contributing to COPD-related pathology and heterogeneity.
Subjects
Chronic Obstructive Pulmonary Disease, Smoking, Humans, Lung, Forced Expiratory Volume, Cluster Analysis
ABSTRACT
BACKGROUND: Drug repositioning is an emerging approach in pharmaceutical research for identifying novel therapeutic potential for approved drugs and discovering therapies for untreated diseases. Due to its time and cost efficiency, drug repositioning plays an instrumental role in optimizing the drug development process compared to the traditional de novo drug discovery process. Advances in genomics, together with the enormous growth of large-scale publicly available data and the availability of high-performance computing capabilities, have further motivated the development of computational drug repositioning approaches. More recently, the rise of machine learning techniques, together with the availability of powerful computers, has made computational drug repositioning an area of intense activity. RESULTS: In this study, a novel framework based on deep learning, SNF-NN, is presented, in which novel drug-disease interactions are predicted using drug-related similarity information, disease-related similarity information, and known drug-disease interactions. Heterogeneous similarity information related to drugs and diseases is fed to the proposed framework in order to predict novel drug-disease interactions. SNF-NN uses similarity selection, similarity network fusion, and a highly tuned novel neural network model to predict new drug-disease interactions. The robustness of SNF-NN is evaluated by comparing its performance with nine baseline machine learning methods. The proposed framework outperforms all baseline methods ([Formula: see text] = 0.867, and [Formula: see text]=0.876) using stratified 10-fold cross-validation. To further demonstrate the reliability and robustness of SNF-NN, two datasets are used to fairly validate the proposed framework's performance against seven recent state-of-the-art methods for drug-disease interaction prediction. 
SNF-NN achieves remarkable performance in stratified 10-fold cross-validation with [Formula: see text] ranging from 0.879 to 0.931 and [Formula: see text] from 0.856 to 0.903. Moreover, the efficiency of SNF-NN is verified by validating predicted unknown drug-disease interactions against clinical trials and published studies. CONCLUSION: In conclusion, computational drug repositioning research can significantly benefit from integrating similarity measures in heterogeneous networks and deep learning models for predicting novel drug-disease interactions. The data and implementation of SNF-NN are available at http://pages.cpsc.ucalgary.ca/ tnjarada/snf-nn.php .
Subjects
Computational Biology, Drug Repositioning, Pharmaceutical Preparations, Algorithms, Drug Therapy, Neural Networks (Computer), Reproducibility of Results
ABSTRACT
BACKGROUND: Viruses are closely related to bacteria and human diseases. Predicting associations between viruses and hosts is of great significance for understanding the dynamics and complex functional networks in microbial communities. With the rapid development of metagenomic sequencing, some methods based on sequence similarity and genomic homology have been used to predict associations between viruses and hosts. However, these methods ignore the known virus-host association network. RESULTS: We propose ILMF-VH, a kernelized logistic matrix factorization method that integrates different information sources to predict potential virus-host associations on a heterogeneous network, which is constructed by connecting a virus network with a host network based on known virus-host associations. The virus network is constructed based on oligonucleotide frequency measurement, and the host network is constructed by integrating oligonucleotide frequency similarity and Gaussian interaction profile kernel similarity through similarity network fusion. The host prediction accuracy of our method is better than that of other methods. In addition, case studies show that the host of crAssphage predicted by ILMF-VH is consistent with the presumed host in previous studies, and another potential host, Escherichia coli, is also predicted. CONCLUSIONS: The proposed model is an effective computational tool for predicting interactions between viruses and hosts, and it has great potential for discovering novel hosts of viruses.
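The Gaussian interaction profile (GIP) kernel mentioned above can be illustrated with a short sketch. Each entity (here, a host) is described by its binary profile of known associations, and the similarity of two entities is a Gaussian function of the squared distance between their profiles, with the bandwidth normalised by the mean squared profile norm. The bandwidth constant of 1 and the toy association matrix below are assumptions for illustration, not values taken from the study.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel.

    profiles[i] is the binary association profile of entity i, e.g. which
    viruses are known to infect host i. gamma_prime is an assumed bandwidth
    constant; it is divided by the mean squared norm of the profiles.
    At least one profile must be non-zero, otherwise the mean norm is zero.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Hypothetical toy data: three hosts, three viruses.
host_profiles = [
    [1, 1, 0],  # host 0 is associated with viruses 0 and 1
    [1, 1, 0],  # host 1 has the same profile as host 0
    [0, 0, 1],  # host 2 shares no virus with the others
]
K = gip_kernel(host_profiles)
```

Hosts with identical profiles receive similarity 1, and the value decays as profiles diverge. A matrix like `K` is the kind of input that could then be fused with a sequence-based similarity matrix.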
Subjects
Algorithms, Viruses/genetics, Area Under Curve, Databases as Topic, Host-Pathogen Interactions, Humans, Logistic Models
ABSTRACT
BACKGROUND: Long non-coding RNA (lncRNA) plays important roles in many biological and pathological processes, including transcriptional regulation and gene regulation. As lncRNA interacts with multiple proteins, predicting lncRNA-protein interactions (lncRPIs) is an important way to study the functions of lncRNA. Up to now, there have been a few works that exploit protein-protein interactions (PPIs) to help the prediction of new lncRPIs. RESULTS: In this paper, we propose to boost the prediction of lncRPIs by fusing multiple protein-protein similarity networks (PPSNs). Concretely, we first construct four PPSNs based on protein sequences, protein domains, protein GO terms and the STRING database respectively, then build a more informative PPSN by fusing these four constructed PPSNs. Finally, we predict new lncRPIs by a random walk method with the fused PPSN and known lncRPIs. Our experimental results show that the new approach outperforms the existing methods. CONCLUSION: Fusing multiple protein-protein similarity networks can effectively boost the performance of predicting lncRPIs.
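The final step described above, scoring candidate proteins by a random walk over the fused protein-protein similarity network, might look roughly like the sketch below. The abstract does not specify the variant, so a random walk with restart is assumed here; the restart probability, the function name and the toy similarity matrix are hypothetical.

```python
import numpy as np

def random_walk_with_restart(sim, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score network nodes by proximity to seed nodes.

    sim   : symmetric, non-negative similarity matrix (e.g. a fused PPSN)
    seeds : vector with 1 for proteins known to interact with the query lncRNA
    """
    W = sim / sim.sum(axis=0, keepdims=True)  # column-stochastic transition matrix
    p0 = seeds / seeds.sum()                  # start distribution on the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:    # stop once the scores have stabilised
            return p_next
        p = p_next
    return p

# Hypothetical fused similarity among four proteins: {0, 1} and {2, 3} form two groups.
fused_ppsn = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
])
known = np.array([1.0, 0.0, 0.0, 0.0])  # protein 0 is a known partner of the lncRNA
scores = random_walk_with_restart(fused_ppsn, known)
```

Proteins that are similar to known partners (protein 1 here) end up with higher scores than dissimilar ones, which is what makes them candidate new interactions.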
Subjects
Proteins/metabolism, Long Noncoding RNA/metabolism, Amino Acid Sequence Homology, Area Under Curve, Humans, Long Noncoding RNA/genetics, ROC Curve
ABSTRACT
Prediction of protein-protein interactions (PPIs) is of great significance. To achieve this, we propose a novel computational method for PPI prediction based on a similarity network fusion (SNF) model that integrates the physical and chemical properties of proteins. Specifically, the physical and chemical properties considered are the amino acid mutation rate of the protein and its hydrophobicity, respectively. The amino acid mutation rate is extracted using the BLOSUM62 matrix, which maps the protein sequence onto a block substitution matrix. The SNF model is used to fuse the physical and chemical features from multiple data sources by iteratively updating each original network. Finally, the complementary features from the fused network are fed into a label propagation algorithm (LPA) for PPI prediction. The experimental results show that the proposed method achieves promising performance and outperforms traditional methods on the public datasets of H. pylori, Human, and Yeast. In addition, our proposed method achieves average accuracies of 76.65%, 81.98%, 84.56%, 84.01% and 84.38% on the E. coli, C. elegans, H. sapiens, H. pylori and M. musculus datasets, respectively. Comparison results demonstrate that the proposed method is very promising and provides a cost-effective alternative for predicting PPIs. The source code and all datasets are available at http://pan.baidu.com/s/1dF7rp7N.
Subjects
Algorithms, Protein Interaction Maps, Amino Acid Sequence, Animals, Protein Databases, Humans, Hydrophobic and Hydrophilic Interactions, Mutation Rate
ABSTRACT
BACKGROUND: Clustering approaches using single omics platforms are increasingly used to characterise molecular phenotypes of eosinophilic and neutrophilic asthma. Effective integration of multi-omics platforms should lead towards greater refinement of asthma endotypes across molecular dimensions and indicate key targets for intervention or biomarker development. OBJECTIVES: To determine whether multi-omics integration of sputum leads to improved granularity of the molecular classification of severe asthma. METHODS: We analyzed six -omics data blocks-microarray transcriptomics, gene set variation analysis of microarray transcriptomics, SomaSCAN proteomics assay, shotgun proteomics, 16S microbiome sequencing, and shotgun metagenomic sequencing-from induced sputum samples of 57 severe asthma patients, 15 mild-moderate asthma patients, and 13 healthy volunteers in the U-BIOPRED European cohort. We used Monti consensus clustering algorithm for aggregation of clustering results and Similarity Network Fusion to integrate the 6 multi-omics datasets of the 72 asthmatics. RESULTS: Five stable omics-associated clusters were identified (OACs). OAC1 had the best lung function with the least number of severe asthmatics with sputum paucigranulocytic inflammation. OAC5 also had fewer severe asthma patients but the highest incidence of atopy and allergic rhinitis, with paucigranulocytic inflammation. OAC3 comprised only severe asthmatics with the highest sputum eosinophilia. OAC2 had the highest sputum neutrophilia followed by OAC4 with both clusters consisting of mostly severe asthma but with more ex/current smokers in OAC4. Compared to OAC4, there was higher incidence of nasal polyps, allergic rhinitis, and eczema in OAC2. OAC2 had microbial dysbiosis with abundant Moraxella catarrhalis and Haemophilus influenzae. OAC4 was associated with pathways linked to IL-22 cytokine activation, with the prediction of therapeutic response to anti-IL22 antibody therapy. 
CONCLUSION: Multi-omics analysis of sputum in asthma has defined with greater granularity the asthma endotypes linked to neutrophilic and eosinophilic inflammation. Modelling diverse types of high-dimensional interactions will contribute to a more comprehensive understanding of complex endotypes. KEY POINTS: Unsupervised clustering on sputum multi-omics of asthma subjects identified 3 out of 5 clusters with predominantly severe asthma. One severe asthma cluster was linked to type 2 inflammation and sputum eosinophilia while the other 2 clusters to sputum neutrophilia. One severe neutrophilic asthma cluster was linked to Moraxella catarrhalis and to a lesser extent Haemophilus influenzae while the second cluster to activation of IL-22.
Subjects
Asthma, Sputum, Humans, Sputum/microbiology, Sputum/metabolism, Asthma/microbiology, Asthma/immunology, Asthma/genetics, Male, Female, Adult, Middle Aged, Neutrophils/metabolism, Neutrophils/immunology, Eosinophils/metabolism, Multiomics
ABSTRACT
Many experiments have proved that long non-coding RNAs (lncRNAs) in humans have been implicated in disease development. The prediction of lncRNA-disease association is essential in promoting disease treatment and drug development. It is time-consuming and laborious to explore the relationship between lncRNA and diseases in the laboratory. The computation-based approach has clear advantages and has become a promising research direction. This paper proposes a new lncRNA disease association prediction algorithm BRWMC. Firstly, BRWMC constructed several lncRNA (disease) similarity networks based on different measurement angles and fused them into an integrated similarity network by similarity network fusion (SNF). In addition, the random walk method is used to preprocess the known lncRNA-disease association matrix and calculate the estimated scores of potential lncRNA-disease associations. Finally, the matrix completion method accurately predicts the potential lncRNA-disease associations. Under the framework of leave-one-out cross-validation and 5-fold cross-validation, the AUC values obtained by BRWMC are 0.9610 and 0.9739, respectively. In addition, case studies of three common diseases show that BRWMC is a reliable method for prediction.
Subjects
Long Noncoding RNA, Humans, Long Noncoding RNA/genetics, Computational Biology/methods, Algorithms
ABSTRACT
With the rapid development of multi-omics technologies and the accumulation of large-scale biological datasets, many studies have sought a more comprehensive understanding of human diseases and drug sensitivity across multiple classes of biomolecules, such as DNA, RNA, proteins and metabolites. With single-omics data it is difficult to analyze complex disease pathology and drug pharmacology systematically and comprehensively. Approaches based on molecularly targeted therapy face challenges such as insufficient target gene labeling ability and the lack of clear targets for non-specific chemotherapeutic drugs. Consequently, the integrated analysis of multi-omics data has become a new direction for exploring the mechanisms of disease and drug action. However, the available drug sensitivity prediction models based on multi-omics data still suffer from overfitting, lack of interpretability, difficulty in integrating heterogeneous data, and limited prediction accuracy. In this paper, we propose a novel drug sensitivity prediction (NDSP) model based on deep learning and similarity network fusion, which extracts drug targets from each omics dataset using an improved sparse principal component analysis (SPCA) method and constructs sample similarity networks from the sparse feature matrices. The fused similarity networks are then fed into a deep neural network for training, which greatly reduces the data dimensionality and the risk of overfitting. We use three types of omics data (RNA sequence, copy number aberration and methylation) and select 35 drugs from Genomics of Drug Sensitivity in Cancer (GDSC) for experiments, including Food and Drug Administration (FDA)-approved targeted drugs, FDA-unapproved targeted drugs and non-specific therapies. 
Compared with some current deep learning methods, our proposed method can extract highly interpretable biological features to achieve highly accurate sensitivity prediction of targeted and non-specific cancer drugs, which is beneficial for the development of precision oncology beyond targeted therapy.
ABSTRACT
BACKGROUND: As new infectious diseases (ID) emerge and others continue to mutate, there remains an imminent threat, especially for vulnerable individuals. Yet no generalizable framework exists to identify the at-risk group prior to infection. Metabolomics has the advantage of capturing the existing physiologic state, unobserved via current clinical measures. Furthermore, metabolomics profiling during acute disease can be influenced by confounding factors such as indications, medical treatments, and lifestyles. METHODS: We employed metabolomic profiling to cluster infection-free individuals and assessed their relationship with COVID severity and influenza incidence/recurrence. FINDINGS: We identified a metabolomic susceptibility endotype that was strongly associated with both severe COVID ($\mathrm{OR}_{\text{ICU admission}} = 6.7$, $p = 1.2 \times 10^{-8}$; $\mathrm{OR}_{\text{mortality}} = 4.7$, $p = 1.6 \times 10^{-4}$) and influenza ($\mathrm{OR}_{\text{incidence}} = 2.9$, $p = 2.2 \times 10^{-4}$; $\beta_{\text{recurrence}} = 1.03$, $p = 5.1 \times 10^{-3}$). We observed similar severity associations when recapitulating this susceptibility endotype using metabolomics from individuals during and after acute COVID infection. We demonstrate the value of using metabolomic endotyping to identify a metabolically susceptible group for two, and potentially more, IDs that are driven by increases in specific amino acids, including microbial-related metabolites such as tryptophan, bile acids, histidine, polyamine, phenylalanine, and tyrosine metabolism, as well as carbohydrates involved in glycolysis. INTERPRETATIONS: These metabolites may be identified prior to infection to enable protective measures for these individuals. FUNDING: The Longitudinal EMR and Omics COVID-19 Cohort (LEOCC) and metabolomic profiling were supported by the National Heart, Lung, and Blood Institute and the Intramural Research Program of the National Center for Advancing Translational Sciences, National Institutes of Health.
Subjects
COVID-19, Communicable Diseases, Human Influenza, Humans, Metabolome, Prospective Studies, Human Influenza/epidemiology, Metabolomics, Communicable Diseases/etiology
ABSTRACT
Various diseases, including Huntington's disease, Alzheimer's disease, and Parkinson's disease, have been reported to be linked to amyloid. Therefore, it is crucial to distinguish amyloid from non-amyloid proteins or peptides. While experimental approaches are typically preferred, they are costly and time-consuming. In this study, we have developed a machine learning framework called iAMY-RECMFF to discriminate amyloidogenic from non-amyloidogenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then used Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were input into a support vector machine for identifying amyloidogenic peptides. Experimental results demonstrate that our proposed method significantly improves the identification of amyloidogenic peptides compared to existing predictors. This suggests that our method may serve as a powerful tool for identifying amyloidogenic peptides. To facilitate academic use, the dataset and code used in the current study are accessible at https://figshare.com/articles/online_resource/iAMY-RECMFF/22816916.
Subjects
Algorithms, Peptides, Peptides/chemistry, Amino Acid Sequence, Machine Learning, Support Vector Machine
ABSTRACT
We propose a computational framework for selecting biologically plausible genes identified by clustering of multi-omics data that reveal patients' similarity, thus giving researchers a more comprehensive view on any given disease. We employ spectral clustering of a similarity network created by fusion of three similarity networks, based on mRNA expression of immune genes, miRNA expression and DNA methylation data, using SNF_v2.1 software. For each cluster, we rank multi-omics features, ensuring the best separation between clusters, and select the top-ranked features that preserve clustering. To find genes targeted by DNA methylation and miRNAs found in the top-ranked features, we use chromosome-conformation capture data and miRNet2.0 software, respectively. To identify informative genes, these combined sets of target genes are analyzed in terms of their enrichment in somatic/germline mutations, GO biological processes/pathways terms and known sets of genes considered to be important in relation to a given disease, as recorded in the Molecular Signature Database from GSEA. The protein-protein interaction (PPI) networks were analyzed to identify genes that are hubs of PPI networks. We used data recorded in The Cancer Genome Atlas for patients with acute myeloid leukemia to demonstrate our approach, and discuss our findings in the context of results in the literature.
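Spectral clustering of a fused similarity network, as used in the framework above, can be sketched for the simplest case of two clusters. This is an assumed simplification, not the SNF_v2.1 implementation: it builds the symmetric normalised graph Laplacian and splits the samples by the sign of the second eigenvector (the Fiedler vector). The toy matrix is hypothetical, and general k-cluster spectral clustering would instead run k-means on several eigenvectors.

```python
import numpy as np

def spectral_bipartition(similarity):
    """Split samples into two clusters using the normalised graph Laplacian."""
    S = (similarity + similarity.T) / 2.0   # enforce symmetry (creates a new array)
    np.fill_diagonal(S, 0.0)                # ignore self-similarity
    degree = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # L_sym = I - D^{-1/2} S D^{-1/2}
    laplacian = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)  # ascending eigenvalues
    # Second-smallest eigenvector, rescaled to the random-walk embedding.
    fiedler = eigenvectors[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Hypothetical fused similarity for six patients: two groups of three,
# strong similarity within groups (0.9) and weak similarity between them (0.05).
block = np.full((3, 3), 0.9)
cross = np.full((3, 3), 0.05)
fused = np.block([[block, cross], [cross, block]])
clusters = spectral_bipartition(fused)
```

The cluster labels themselves (0 or 1) are arbitrary; only the grouping is meaningful. Cluster-specific feature ranking would then be run on the original omics features of each group.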
Subjects
Acute Myeloid Leukemia, MicroRNAs, Humans, Multiomics, Computational Biology/methods, Acute Myeloid Leukemia/genetics, MicroRNAs/genetics, MicroRNAs/metabolism, Software
ABSTRACT
The clinical outcome and disease severity in coronavirus disease 2019 (COVID-19) are heterogeneous, and the progression or fatality of the disease cannot be explained by a single factor like age or comorbidities. In this study, we used system-wide network-based system biology analysis using whole blood RNA sequencing, immunophenotyping by flow cytometry, plasma metabolomics, and single-cell-type metabolomics of monocytes to identify the potential determinants of COVID-19 severity at personalized and group levels. Digital cell quantification and immunophenotyping of the mononuclear phagocytes indicated a substantial role in coordinating the immune cells that mediate COVID-19 severity. Stratum-specific and personalized genome-scale metabolic modeling indicated monocarboxylate transporter family genes (e.g., SLC16A6), nucleoside transporter genes (e.g., SLC29A1), and metabolites such as α-ketoglutarate, succinate, malate, and butyrate could play a crucial role in COVID-19 severity. Metabolic perturbations targeting the central metabolic pathway (TCA cycle) can be an alternate treatment strategy in severe COVID-19.
Subjects
COVID-19, Humans, Metabolic Networks and Pathways, Metabolomics
ABSTRACT
In light of the rapid accumulation of large-scale omics datasets, numerous studies have attempted to characterize the molecular and clinical features of cancers from a multi-omics perspective. However, integrating multi-omics data using machine learning methods for cancer subtype classification remains highly challenging. In this study, MoGCN, a multi-omics integration model based on a graph convolutional network (GCN), was developed for cancer subtype classification and analysis. Genomics, transcriptomics and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from The Cancer Genome Atlas (TCGA). The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively. The vector features and the PSN were then input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. In the analysis of multi-dimensional omics data of the BRCA samples in TCGA, MoGCN achieved the highest accuracy in cancer subtype classification compared with several popular algorithms. Moreover, MoGCN can extract the most significant features of each omics layer and provide candidate functional molecules for further analysis of their biological effects. Network visualization also showed that MoGCN could make clinically intuitive diagnoses. The generality of MoGCN was demonstrated on the TCGA pan-kidney cancer datasets. MoGCN and the datasets are publicly available at https://github.com/Lifoof/MoGCN. Our study shows that MoGCN performs well for heterogeneous data integration and for the interpretability of classification results, which confers great potential for applications in biomarker identification and clinical diagnosis.
ABSTRACT
Accurate identification of drug-target interactions (DTIs) is of great significance for understanding the mechanisms of drug treatment and discovering new drugs. Computational DTI prediction methods that combine multi-source drug and target data can effectively reduce the cost and time of drug development. However, multi-source data processing often does not consider how much each data source contributes to DTI prediction. Making full use of these contributions during fusion is therefore key to improving prediction accuracy. In this paper, we propose EFMSDTI, a DTI prediction approach based on an effective fusion of multi-source drug and target data that accounts for the contribution of each source. EFMSDTI first builds 15 similarity networks from multi-source information networks, classified as topological and semantic graphs of drugs and targets according to their biological characteristics. The networks are then fused by selective and entropy weighting based on similarity network fusion (SNF), according to their contribution to DTI prediction. A deep neural network model learns low-dimensional vector embeddings of drugs and targets. Finally, the LightGBM algorithm, based on Gradient Boosting Decision Tree (GBDT), is used to complete DTI prediction. Experimental results show that EFMSDTI performs better (AUROC and AUPR are 0.982) than several state-of-the-art algorithms. In an analysis of the top 1000 predicted DTIs, 990 were confirmed. Code and data are available at https://github.com/meng-jie/EFMSDTI.
ABSTRACT
MicroRNAs (miRNAs), a class of non-coding RNAs, have been shown to be closely associated with several complex biological processes and human diseases. In this study, we proposed a novel model, Similarity Network Fusion and Inductive Matrix Completion for miRNA-Disease Association Prediction (SNFIMCMDA). We applied the inductive matrix completion (IMC) method to identify possible associations between miRNAs and diseases and to obtain the corresponding correlation scores. IMC was performed based on the verified miRNA-disease connections, miRNA similarity, and disease similarity. In addition, miRNA similarity and disease similarity were calculated by similarity network fusion, which can effectively integrate multiple data types. We integrated miRNA functional similarity and Gaussian interaction profile kernel similarity by similarity network fusion to obtain miRNA similarity. Disease similarity was obtained in the same way. To assess the utility and effectiveness of SNFIMCMDA, we applied both global leave-one-out cross-validation and five-fold cross-validation to validate our model. Furthermore, case studies on three significant human diseases were conducted to confirm the effectiveness of SNFIMCMDA. The results demonstrated that SNFIMCMDA was effective in predicting possible miRNA-disease associations.
ABSTRACT
BACKGROUND: Alzheimer's Disease (AD) is a neurodegenerative brain disease in the elderly. Recent studies have revealed the heterogeneous nature of AD. Mild Cognitive Impairment (MCI) is the prodromal stage of AD. OBJECTIVES: In this study, we identified subtypes of MCI based on genetic polymorphism and gene expression. METHODS: We used two types of omics data, namely genetic polymorphism and gene expression profiling, derived from peripheral blood samples of 125 MCI patients in the ADNI-1 dataset. The similarity network fusion (SNF) algorithm was used to cluster MCI patients into subtypes, and 185 MCI patients in ADNI-2 were used to evaluate the effectiveness of this method. Two MCI subtypes were identified by the SNF algorithm. RESULTS: We used Kaplan-Meier analysis and log-rank testing to compare conversion from MCI to AD between the two subtypes ($p = 4.58 \times 10^{-3}$). In addition, we compared patients in the two MCI subtypes on the following factors: changes in Alzheimer's Disease cognitive scales and MRI images, and significantly enriched pathways based on differentially expressed genes. This study showed that MCI is a heterogeneous condition, since AD development differs significantly between the two MCI subtypes. CONCLUSIONS: MCI patients with different molecular characteristics have different risks of converting to AD. In addition to evaluating statistics, genetic polymorphism and gene expression profiling from MCI patients' peripheral blood are non-invasive and cost-effective markers to identify MCI subtypes for clinical application.
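The log-rank comparison of conversion between two subtypes can be illustrated with a small, self-contained sketch. It implements the standard two-group log-rank test (observed versus expected events in one group, hypergeometric variance, chi-square with one degree of freedom). The follow-up times and event indicators below are made-up toy values, not ADNI data, and the function name is hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each subject (e.g. months until conversion or censoring)
    events_* : 1 if the event (e.g. conversion to AD) was observed, 0 if censored
    Returns (chi-square statistic, p-value). Requires at least one event time at
    which both groups still have subjects at risk, otherwise the variance is zero.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value

# Toy example: subtype A converts early, subtype B converts late or is censored.
chi2, p_value = logrank_test(
    [6, 12, 18, 24, 30], [1, 1, 1, 1, 1],
    [36, 48, 60, 60, 60], [1, 1, 0, 0, 0],
)
```

A small p-value indicates that the conversion curves of the two groups differ more than would be expected by chance; with identical groups the statistic is zero and the p-value is 1.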
Subjects
Alzheimer Disease/genetics, Cognitive Dysfunction/genetics, Gene Expression/genetics, Genetic Polymorphism/genetics, Aged, Aged 80 and over, Algorithms, Alzheimer Disease/mortality, Biomarkers/analysis, Cognitive Dysfunction/mortality, Disease Progression, Female, Humans, Magnetic Resonance Imaging/methods, Male
ABSTRACT
This study aims to identify potential multiomics biomarkers for the early detection of recurrence in prostate cancer (PC) patients. A total of 494 prostate adenocarcinoma (PRAD) patients (including 60 with recurrence) from The Cancer Genome Atlas (TCGA) portal were analyzed using an autoencoder model and similarity network fusion. Multiomics panels were then constructed from the omics biomarkers identified by both models. Six such biomarkers, TELO2, ZMYND19, miR-143, miR-378a, cg00687383 (MED4), and cg02318866 (JMJD6; METTL23), were collected for multiomics panel construction. The difference between the Kaplan-Meier curves of the high and low recurrence-risk groups generated from the multiomics panel achieved $p = 5.33 \times 10^{-9}$, which is better than the former study ($p = 5 \times 10^{-7}$). Additionally, when the selected multiomics biomarkers were evaluated together with clinical information (Gleason score, age, and cancer stage), a high-performance prediction model was generated with C-index = 0.713, $p = 2.97 \times 10^{-15}$, and AUC = 0.789. The risk score generated from the selected multiomics biomarkers worked as an effective indicator for the prediction of PRAD recurrence. This study helps us to understand the etiology and pathways of PRAD and provides patients and physicians with potential prognostic biomarkers for clinical decisions after surgical treatment.
ABSTRACT
SCOPE: Combining different "omics" data types in a single, integrated analysis may better characterize the effects of diet on human health. METHODS AND RESULTS: The performance of two data integration tools, similarity network fusion tool (SNFtool) and Data Integration Analysis for Biomarker discovery using Latent variable approaches for "Omics" (DIABLO; MixOmics), in discriminating responses to diet and metabolic phenotypes is investigated by combining transcriptomics and metabolomics datasets from three human intervention studies: a postprandial crossover study testing dairy foods (n = 7; study 1); a postprandial challenge study comparing obese and non-obese subjects (n = 13; study 2); and an 8-week parallel intervention study that assessed the effect of three diets with variable lipid content on fasting parameters (n = 39; study 3). In study 1, combining datasets using SNF or DIABLO significantly improves sample classification. For studies 2 and 3, the value of SNF integration depends on the dietary groups being compared, while DIABLO discriminates samples well but does not perform better than transcriptomic data alone. CONCLUSION: The integration of associated "omics" datasets can help clarify the subtle signals observed in nutritional interventions. The performance of each integration tool is influenced differently by study design, dataset size, and sample size.