ABSTRACT
The functional study of proteins is a critical task in modern biology, playing a pivotal role in understanding the mechanisms of pathogenesis, developing new drugs, and discovering novel drug targets. However, existing computational models for subcellular localization face significant challenges, such as reliance on known Gene Ontology (GO) annotation databases or overlooking the relationship between GO annotations and subcellular localization. To address these issues, we propose DeepMTC, an end-to-end deep learning-based multi-task collaborative training model. DeepMTC integrates the interrelationship between subcellular localization and the functional annotation of proteins, leveraging multi-task collaborative training to eliminate dependence on known GO databases. This strategy gives DeepMTC a distinct advantage in predicting newly discovered proteins without prior functional annotations. First, DeepMTC leverages a highly accurate pre-trained language model to obtain the 3D structure and sequence features of proteins. Additionally, it employs a graph transformer module to encode protein sequence features, addressing the problem of long-range dependencies in graph neural networks. Finally, DeepMTC uses a functional cross-attention mechanism to efficiently combine upstream learned functional features to perform the subcellular localization task. The experimental results demonstrate that DeepMTC outperforms state-of-the-art models in both protein function prediction and subcellular localization. Moreover, interpretability experiments revealed that DeepMTC can accurately identify the key residues and functional domains of proteins, confirming its superior performance. The code and dataset of DeepMTC are freely available at https://github.com/ghli16/DeepMTC.
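The abstract does not spell out the form of the functional cross-attention, so the following is only a generic sketch of scaled dot-product cross-attention, not DeepMTC's implementation. It assumes (hypothetically) that the localization branch supplies the queries and the upstream function branch supplies the keys and values; the function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query row (assumed: localization features) attends over all
    key/value rows (assumed: upstream function features).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

# Two identical keys receive equal weight, so the output is the mean of the values.
mixed = cross_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]])
```

In a trained model the queries, keys, and values would first pass through learned linear projections; those are omitted here to keep the mechanism visible.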
Subject(s)
Deep Learning , Proteins , Proteins/metabolism , Computational Biology/methods , Databases, Protein , Neural Networks, Computer , Humans , Gene Ontology
ABSTRACT
Pigs are the most suitable model to study various therapeutic strategies and drugs for human beings, although knowledge about cell type-specific transcriptomes and heterogeneity is poorly available. Through single-cell RNA sequencing and flow cytometry analysis of the cell types in the jejunum of pigs, we found that innate lymphoid cells (ILCs) existed in the lamina propria lymphocytes (LPLs) of the jejunum. Then, through flow sorting of live/dead-lineage (Lin)-CD45+ cells and single-cell RNA sequencing, we found that ILCs in the porcine jejunum were mainly ILC3s, with a small number of NK cells, ILC1s, and ILC2s. ILCs coexpressed IL-7Rα, ID2, and other genes and differentially expressed RORC, GATA3, and other genes but did not express the CD3 gene. ILC3s can be divided into four subgroups, and genes such as CXCL8, CXCL2, IL-22, IL-17, and NCR2 are differentially expressed. To further detect and identify ILC3s, we verified the classification of ILCs in the porcine jejunum subgroup and the expression of related hallmark genes at the protein level by flow cytometry. To systematically characterize ILCs in the porcine intestines, we combined our pig ILC dataset with publicly available human and mouse ILC data and found that human and pig ILCs shared more common features in gene signatures and cell states than did mouse ILCs. Our results showed in detail for the first time (to our knowledge) the gene expression of porcine jejunal ILCs, the subtype classification of ILCs, and the markers of various ILCs, which provide a basis for an in-depth exploration of porcine intestinal mucosal immunity.
Subject(s)
Immunity, Innate , Lymphocytes , Humans , Animals , Mice , Swine , Jejunum , Killer Cells, Natural , Mucous Membrane
ABSTRACT
Sorbitol is a critical photosynthate and storage substance in the Rosaceae family. Sorbitol transporters (SOTs) play a vital role in facilitating sorbitol allocation from source to sink organs and sugar accumulation in sink organs. While prior research has addressed gene duplications within the SOT gene family in Rosaceae, the precise origin and evolutionary dynamics of these duplications remain unclear, largely due to the complicated interplay of whole genome duplications and tandem duplications. Here, we investigated the synteny relationships among all identified Polyol/Monosaccharide Transporter (PLT) genes in 61 angiosperm genomes and SOT genes in representative genomes within the Rosaceae family. By integrating phylogenetic analyses, we elucidated the lineage-specific expansion and syntenic conservation of PLTs and SOTs across diverse plant lineages. We found that Rosaceae SOTs, as PLT family members, originated from a pair of tandemly duplicated PLT genes within Class III-A. Furthermore, our investigation highlights the role of lineage-specific and synergistic duplications in Amygdaloideae in contributing to the expansion of SOTs in Rosaceae plants. Collectively, our findings provide insights into the genomic origins, duplication events, and subsequent divergence of SOT gene family members. Such insights lay a crucial foundation for comprehensive functional characterizations in future studies.
Subject(s)
Magnoliopsida , Rosaceae , Rosaceae/genetics , Phylogeny , Magnoliopsida/genetics , Genome, Plant/genetics , Sorbitol , Evolution, Molecular , Gene Duplication
ABSTRACT
MOTIVATION: Recent advances in spatial transcriptomics technologies have enabled gene expression profiling while preserving spatial context. Accurately identifying spatial domains is crucial for downstream analysis and it requires the effective integration of gene expression profiles and spatial information. While a growing number of computational methods have been developed for spatial domain detection, most of them cannot adaptively learn the complex relationship between gene expression and spatial information, leading to sub-optimal performance. RESULTS: To overcome these challenges, we propose a novel deep learning method named Spatial-MGCN for identifying spatial domains, which is a Multi-view Graph Convolutional Network (GCN) with an attention mechanism. We first construct two neighbor graphs using gene expression profiles and spatial information, respectively. Then, a multi-view GCN encoder is designed to extract unique embeddings from both the feature and spatial graphs, as well as their shared embeddings by combining both graphs. Finally, a zero-inflated negative binomial decoder is used to reconstruct the original expression matrix by capturing the global probability distribution of gene expression profiles. Moreover, Spatial-MGCN incorporates a spatial regularization constraint into the feature learning to preserve spatial neighbor information in an end-to-end manner. The experimental results show that Spatial-MGCN outperforms state-of-the-art methods consistently in several tasks, including spatial clustering and trajectory inference.
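The step of constructing two neighbor graphs can be illustrated with a small k-nearest-neighbor sketch. The abstract does not state the distance measure or the value of k, so Euclidean distance, k = 1, and the toy coordinates and expression vectors below are all assumptions chosen for illustration.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph under Euclidean distance.

    Returns one set of neighbor indices per node.
    """
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize: an edge exists if either end selects the other
    return adj

# Hypothetical toy data: four spots with 2-D coordinates and 2-gene expression.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
expr = [(1.0, 0.0), (0.8, 0.2), (0.0, 1.0), (1.0, 0.1)]

spatial_graph = knn_graph(coords, k=1)  # neighbors by physical position
feature_graph = knn_graph(expr, k=1)    # neighbors by expression similarity
```

In this toy data, spot 3 sits next to spot 2 on the tissue but is closest to spot 0 in expression, so the two graphs disagree. That kind of disagreement is what a multi-view encoder has to reconcile.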
Subject(s)
Eye Diseases, Hereditary , Genetic Diseases, X-Linked , Humans , Gene Expression Profiling
ABSTRACT
MOTIVATION: Recent advances in spatial transcriptomics technologies have provided multi-modality data integrating gene expression, spatial context, and histological images. Accurately identifying spatial domains and spatially variable genes (SVGs) is crucial for understanding tissue structures and biological functions. However, effectively combining multi-modality data to identify spatial domains and determining SVGs closely related to these spatial domains remains a challenge. RESULTS: In this study, we propose spatial transcriptomics multi-modality and multi-granularity collaborative learning (spaMMCL). For detecting spatial domains, spaMMCL mitigates the adverse effects of modality bias by masking portions of gene expression data, integrates gene and image features using a shared graph convolutional network, and employs graph self-supervised learning to deal with noise from feature fusion. Simultaneously, based on the identified spatial domains, spaMMCL integrates various strategies to detect potential SVGs at different granularities, enhancing their reliability and biological significance. Experimental results demonstrate that spaMMCL substantially improves the identification of spatial domains and SVGs. AVAILABILITY AND IMPLEMENTATION: The code and data of spaMMCL are available on GitHub: https://github.com/liangxiao-cs/spaMMCL.
Subject(s)
Transcriptome , Humans , Transcriptome/genetics , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms , Software
ABSTRACT
BACKGROUND: Conducting traditional wet experiments to guide drug development is an expensive, time-consuming and risky process. Analyzing drug function and repositioning plays a key role in identifying new therapeutic potential of approved drugs and discovering therapeutic approaches for untreated diseases. Exploring drug-disease associations has far-reaching implications for identifying disease pathogenesis and treatment. However, reliable detection of drug-disease relationships via traditional methods is costly and slow. Therefore, investigations into computational methods for predicting drug-disease associations are currently needed. RESULTS: This paper presents a novel drug-disease association prediction method, RAFGAE. First, RAFGAE integrates known associations between diseases and drugs into a bipartite network. Second, RAFGAE designs the Re_GAT framework, which includes multilayer graph attention networks (GATs) and two residual networks. The multilayer GATs are utilized for learning the node embeddings, which is achieved by aggregating information from multihop neighbors. The two residual networks are used to alleviate the deep network oversmoothing problem, and an attention mechanism is introduced to combine the node embeddings from different attention layers. Third, two graph autoencoders (GAEs) with collaborative training are constructed to simulate label propagation to predict potential associations. On this basis, free multiscale adversarial training (FMAT) is introduced. FMAT enhances node feature quality through small gradient adversarial perturbation iterations, improving the prediction performance. Finally, tenfold cross-validations on two benchmark datasets show that RAFGAE outperforms current methods. In addition, case studies have confirmed that RAFGAE can detect novel drug-disease associations. CONCLUSIONS: The comprehensive experimental results validate the utility and accuracy of RAFGAE. We believe that this method may serve as an excellent predictor for identifying unobserved disease-drug associations.
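The tenfold cross-validation used for evaluation can be sketched as follows. This is a generic k-fold index splitter, not RAFGAE's evaluation code; the function name, the seed, and the sample count of 25 are illustrative assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds;
    each fold serves as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Hypothetical example: 25 known associations split into 10 folds.
splits = list(k_fold_indices(25, 10))
```

Every known association is held out exactly once, and the reported metric is typically averaged over the k held-out folds.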
Subject(s)
Drug Repositioning , Drug Repositioning/methods , Humans , Computational Biology/methods , Algorithms , Neural Networks, Computer
ABSTRACT
BACKGROUND: Long noncoding RNAs (lncRNAs) are integral to a plethora of critical cellular biological processes, including the regulation of gene expression, cell differentiation, and the development of tumors and cancers. Predicting the relationships between lncRNAs and diseases can contribute to a better understanding of the pathogenic mechanisms of disease and provide strong support for the development of advanced treatment methods. RESULTS: Therefore, we present an innovative Node-Adaptive Graph Transformer model for predicting unknown LncRNA-Disease Associations, named NAGTLDA. First, we utilize the node-adaptive feature smoothing (NAFS) method to learn the local feature information of nodes and encode the structural information of the fusion similarity network of diseases and lncRNAs using Structural Deep Network Embedding (SDNE). Next, the Transformer module is used to capture potential association information between the network nodes. Finally, we employ a Transformer module with two multi-headed attention layers for learning global-level embedding fusion. Network structure coding is added as the structural inductive bias of the network to compensate for the missing message-passing mechanism in Transformer. In 5-fold cross-validation, NAGTLDA achieved an average AUC of 0.9531 and an AUPR of 0.9537, significantly higher than those of state-of-the-art methods. We performed case studies on 4 diseases; 55 out of 60 associations between lncRNAs and diseases have been validated in the literature. The results demonstrate the enormous potential of the graph Transformer structure to incorporate graph structural information for uncovering unknown lncRNA-disease associations. CONCLUSIONS: Our proposed NAGTLDA model can serve as a highly efficient computational method for predicting biological information associations.
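For readers unfamiliar with the AUC figure quoted above, the following sketch computes the area under the ROC curve as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counting one half. The labels and scores are hypothetical and are not taken from the NAGTLDA experiments.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions: 1 = known association, 0 = unobserved pair.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 3 of the 4 positive/negative pairs are ranked correctly
```

The pairwise form is quadratic in the number of samples; production code would normally use a sort-based implementation from an established library.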
Subject(s)
Neoplasms , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Computational Biology/methods , Neoplasms/genetics , Algorithms
ABSTRACT
Circular RNAs (circRNAs) are a category of newly discovered competing endogenous non-coding RNAs that have been shown to be implicated in many complex human diseases. A large number of circRNAs have been confirmed to be involved in cancer progression and are expected to become promising biomarkers for tumor diagnosis and targeted therapy. Deciphering the underlying relationships between circRNAs and diseases may provide new insights into the pathogenesis of complex diseases and further characterize the biological functions of circRNAs. As traditional experimental methods are usually time-consuming and laborious, computational models have made significant progress in systematically exploring potential circRNA-disease associations, which not only creates new opportunities for investigating pathogenic mechanisms at the level of circRNAs, but also helps to significantly improve the efficiency of clinical trials. In this review, we first summarize the functions and characteristics of circRNAs and introduce some representative circRNAs related to tumorigenesis. Then, we survey the available databases and tools dedicated to circRNA and disease studies. Next, we present a comprehensive review of computational methods for predicting circRNA-disease associations and classify them into five categories: network propagation-based, path-based, matrix factorization-based, deep learning-based and other machine learning methods. Finally, we discuss the challenges and future research directions in this field.
Subject(s)
Neoplasms , RNA, Circular , Algorithms , Computational Biology/methods , Humans , Machine Learning , Neoplasms/genetics
ABSTRACT
Increasing biological evidence indicates that microRNAs (miRNAs) play a vital role in the pathogenesis of various human diseases (especially tumors). Mining disease-related miRNAs is of great significance for the clinical diagnosis and treatment of diseases. Compared with traditional experimental methods, which are limited by high cost, long cycles and small scale, computational methods have the advantage of being cost-effective. However, although current computational methods can accurately predict the correlation between miRNAs and disease, they cannot predict detailed association information at a fine-grained level. We propose a knowledge-driven approach to the fine-grained prediction of disease-related miRNAs (KDFGMDA). Unlike previous methods, this method can predict the specific type of association between miRNA and disease, such as upregulation, downregulation or dysregulation. Specifically, KDFGMDA extracts triple information from massive experimental data and existing datasets to construct a knowledge graph and then trains a deep graph representation learning model based on the knowledge graph to complete fine-grained prediction tasks. Experimental results show that KDFGMDA can predict the relationship between miRNA and disease accurately, which is of far-reaching significance for medical clinical research and early diagnosis, prevention and treatment of diseases. Additionally, the results of case studies on three types of cancers, Kaplan-Meier survival analysis and expression difference analysis further demonstrate the effectiveness and feasibility of KDFGMDA in detecting potential candidate miRNAs. Availability: Our work can be downloaded from https://github.com/ShengPengYu/KDFGMDA.
Subject(s)
MicroRNAs , Neoplasms , Algorithms , Computational Biology/methods , Down-Regulation , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/diagnosis , Neoplasms/genetics
ABSTRACT
MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity, complex disease mechanisms and cell differentiation processes. Due to high sparsity and complex gene expression patterns, scRNA-seq data present a large number of dropout events, affecting downstream tasks such as cell clustering and pseudo-time analysis. Restoring the expression levels of genes is essential for reducing technical noise and facilitating downstream analysis. However, existing scRNA-seq data imputation methods ignore the topological structure information of scRNA-seq data and cannot comprehensively utilize the relationships between cells. RESULTS: Here, we propose a single-cell Graph Contrastive Learning method for scRNA-seq data imputation, named scGCL, which integrates graph contrastive learning and Zero-inflated Negative Binomial (ZINB) distribution to estimate dropout values. scGCL summarizes global and local semantic information through contrastive learning and selects positive samples to enhance the representation of target nodes. To capture the global probability distribution, scGCL introduces an autoencoder based on the ZINB distribution, which reconstructs the scRNA-seq data based on the prior distribution. Through extensive experiments, we verify that scGCL outperforms existing state-of-the-art imputation methods in clustering performance and gene imputation on 14 scRNA-seq datasets. Further, we find that scGCL can enhance the expression patterns of specific genes in Alzheimer's disease datasets. AVAILABILITY AND IMPLEMENTATION: The code and data of scGCL are available on Github: https://github.com/zehaoxiong123/scGCL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
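The Zero-inflated Negative Binomial (ZINB) distribution mentioned above mixes a point mass at zero (dropout, with probability pi) with a negative binomial count model. The sketch below uses the common mean/inverse-dispersion parameterization (mu, theta); the parameter names and example values are assumptions, and this is not the scGCL loss code.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu
    and inverse dispersion theta."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_pmf(x, mu, theta, pi):
    """ZINB probability: with probability pi the count is a dropout zero,
    otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb

def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of observed counts under shared ZINB parameters."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)

# Hypothetical gene with mean 2.0, inverse dispersion 1.5, 30% dropout.
total_mass = sum(zinb_pmf(x, 2.0, 1.5, 0.3) for x in range(500))
```

In an autoencoder setting the decoder would output mu, theta, and pi per gene and cell, and a loss of this form would be minimized during training.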
Subject(s)
Gene Expression Profiling , Software , Sequence Analysis, RNA , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Cluster Analysis
ABSTRACT
The projection of fringes plays an essential role in many applications, such as fringe projection profilometry and structured illumination microscopy. However, these capabilities are significantly constrained in environments affected by optical scattering. Although recent developments in wavefront shaping have effectively generated high-fidelity focal points and relatively simple structured images amidst scattering, the ability to project fringes that cover half of the projection area has not yet been achieved. To address this limitation, this study presents a fringe projector enabled by a neural network, capable of projecting fringes with variable periodicities and orientation angles through scattering media. We tested this projector on two types of scattering media: ground glass diffusers and multimode fibers. For these scattering media, the average Pearson's correlation coefficients between the projected fringes and their designed configurations are 86.9% and 79.7%, respectively. These results demonstrate the effectiveness of the proposed neural network enabled fringe projector. This advancement is expected to broaden the scope of fringe-based imaging techniques, making it feasible to employ them in conditions previously hindered by scattering effects.
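The fidelity metric quoted above, Pearson's correlation between a projected fringe and its design, can be sketched as follows. The sinusoidal fringe model, the 32 × 32 grid, and the period and angle values are illustrative assumptions, not the experimental settings.

```python
import math

def fringe(width, height, period, angle_deg):
    """Sinusoidal fringe with intensities in [0, 1], flattened row by row.

    period is in pixels; angle_deg rotates the direction of modulation.
    """
    a = math.radians(angle_deg)
    return [0.5 + 0.5 * math.cos(2 * math.pi * (x * math.cos(a) + y * math.sin(a)) / period)
            for y in range(height) for x in range(width)]

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

target = fringe(32, 32, period=8, angle_deg=0)
dimmed = [0.3 * t + 0.1 for t in target]  # same pattern, lower contrast plus offset
inverted = [1.0 - t for t in target]      # contrast-reversed pattern
```

Because the coefficient is invariant to gain and offset, `dimmed` still correlates perfectly with `target`, while `inverted` gives -1. The metric therefore measures pattern shape, not brightness.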
ABSTRACT
OBJECTIVE: To identify the hub miRNAs and mRNAs contributing to the spontaneous recovery of an H2O2-induced zebrafish cataract model. METHODS: Zebrafish were divided into three groups, i.e., Group A, which included normal control fish (day 0), and Groups B and C, where fish were injected with 2.5% hydrogen peroxide into the anterior chamber and reared for 14 and 30 days, respectively. Fish eyes were examined by stereomicroscope photography and optical coherence tomography (OCT). RNA profiles of fish lenses were detected by RNA sequencing. Differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs) were identified among the three groups. The DEGs and DEmiRs that changed in opposite positions between "B vs. A" and "C vs. B" were defined as ODGs (opposite positions changed DEGs) and ODmiRs (opposite positions changed DEmiRs). Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were carried out in R. The protein-protein interaction (PPI) network was constructed using STRING. Potential targets of miRNAs were obtained using miRanda. miRNA-mRNA networks were constructed with Cytoscape. RESULTS: The fish lens opacity formed on day 14 and recovered to transparent on day 30 after injection. Compared to group B, 1366 DEGs and 54 DEmiRs were identified in group C. "C vs. B" DEGs were enriched in gene clusters related to development and oxidative phosphorylation. Target genes of DEmiRs were enriched in clusters such as development and cysteine metabolism. Among the three groups, 786 ODGs and 27 ODmiRs were identified, and 480 ODGs were predicted as targets of ODmiRs. Target ODGs were enriched in pathways related to methionine metabolism, ubiquitin, sensory system development, and structural constituents of the eye lens. In addition, we established an ODmiRs-ODGs regulation network. CONCLUSION: We identified several hub mRNAs and altered miRNAs in the formation and reversal of zebrafish cataracts. These hub miRNAs/mRNAs could be potential targets for the non-surgical treatment of ARC.
Subject(s)
MicroRNAs , Animals , MicroRNAs/genetics , MicroRNAs/metabolism , Zebrafish/genetics , Hydrogen Peroxide , Gene Regulatory Networks , Gene Expression Profiling/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism
ABSTRACT
Escherichia coli O157:H7 (E. coli O157:H7) is a foodborne pathogenic microorganism that is commonly found in the environment and poses a significant threat to human health, public safety, and economic stability worldwide. Thus, early detection is essential for E. coli O157:H7 control. In recent years, a series of E. coli O157:H7 detection methods have been developed, but the sensitivity and portability of the methods still need improvement. Therefore, in this study, a rapid and efficient testing platform based on the CRISPR/Cas12a cleavage reaction was constructed. Through the integration of recombinase polymerase amplification and lateral flow chromatography, we established a dual-interpretation-mode detection platform based on CRISPR/Cas12a-derived fluorescence and lateral flow chromatography for the detection of E. coli O157:H7. For the fluorescence detection method, the limits of detection (LODs) of genomic DNA and E. coli O157:H7 were 1.8 fg/µL and 2.4 CFU/mL, respectively, within 40 min. For the lateral flow detection method, LODs of 1.8 fg/µL and 2.4 × 10² CFU/mL were achieved for genomic DNA and E. coli O157:H7, respectively, within 45 min. This detection strategy offered higher sensitivity and lower equipment requirements than industry standards. In conclusion, the established platform showed excellent specificity and strong universality. Modifying the target gene and its primers can broaden the platform's applicability to detect various other foodborne pathogens.
Subject(s)
CRISPR-Cas Systems , Escherichia coli O157 , Limit of Detection , Escherichia coli O157/genetics , Escherichia coli O157/isolation & purification , DNA, Bacterial/analysis , DNA, Bacterial/genetics , Food Microbiology/methods , CRISPR-Associated Proteins/genetics , Humans , Endodeoxyribonucleases/genetics
ABSTRACT
BACKGROUND: Extended depth of focus (EDOF) and multifocal (Multi) intraocular lenses (IOLs) can provide a fixed distance of near vision, which may result in some discomfort for patients who prefer different near distances. The aim of this study was to compare the vision, comfortable near distance (CND) and visual comfort in patients who underwent implantation of EDOF, Multi, and monofocal (Mono) IOLs. METHODS: A total of 100 eyes were implanted with Tecnis ZXR00, ZMB00 or ZCB00 IOLs. Uncorrected distance, intermediate, and near visual acuity (UDVA, UIVA, and UNVA, respectively), corrected distance visual acuity (CDVA), the fluctuations of CND, the ability to see at comfortable or standard near distance and visual comfort were evaluated at 3 months postoperatively. RESULTS: At 3 months postoperatively, the EDOF and Multi groups showed non-inferiority compared to the Mono group in the UDVA (P > 0.05) and CDVA (P > 0.05) but superiority in the UNVA (P < 0.001). The UIVA was better in the EDOF group, with comparable results for the Multi and Mono groups. There was no difference between preoperative and postoperative CND in the three groups. The CND visual acuity (CNDVA) was lower than the UNVA in the three groups, especially in the EDOF and Multi groups (P < 0.05). The CND effectively improved patients' near visual comfort and visual clarity, except for one patient in the Multi group who complained of severe fatigue and was unable to tolerate the experience at month 3. CONCLUSION: The EDOF and Multi IOLs achieved excellent visual quality and superior UNVA compared to the Mono IOL, but the CNDVA was significantly inferior to the UNVA. Patients' near visual experience can be effectively improved at their CND.
Subject(s)
Depth Perception , Lens Implantation, Intraocular , Multifocal Intraocular Lenses , Visual Acuity , Humans , Visual Acuity/physiology , Female , Male , Middle Aged , Aged , Lens Implantation, Intraocular/methods , Depth Perception/physiology , Lenses, Intraocular , Patient Satisfaction , Pseudophakia/physiopathology , Phacoemulsification/methods , Prosthesis Design , Prospective Studies , Cataract/physiopathology , Cataract/complications , Refraction, Ocular/physiology
ABSTRACT
In response to the challenges of accurate identification and localization of garbage in intricate urban street environments, this paper proposes EcoDetect-YOLO, a garbage exposure detection algorithm based on the YOLOv5s framework, utilizing an intricate environment waste exposure detection dataset constructed in this study. Initially, a convolutional block attention module (CBAM) is integrated between the second-level (P2) and third-level (P3) layers of the feature pyramid network to optimize the extraction of relevant garbage features while mitigating background noise. Subsequently, a P2 small-target detection head enhances the model's efficacy in identifying small garbage targets. Lastly, a bidirectional feature pyramid network (BiFPN) is introduced to strengthen the model's capability for deep feature fusion. Experimental results demonstrate EcoDetect-YOLO's adaptability to urban environments and its superior small-target detection capabilities, effectively recognizing nine types of garbage, such as paper and plastic trash. Compared to the baseline YOLOv5s model, EcoDetect-YOLO achieved a 4.7% increase in mAP0.5, reaching 58.1%, with a compact model size of 15.7 MB and an FPS of 39.36. Notably, even in the presence of strong noise, the model maintained a mAP0.5 exceeding 50%, underscoring its robustness. In summary, EcoDetect-YOLO, as proposed in this paper, boasts high precision, efficiency, and compactness, rendering it suitable for deployment on mobile devices for real-time detection and management of urban garbage exposure, thereby advancing urban automation governance and digital economic development.
ABSTRACT
Two-dimensional (2D) nanosheet arrays with unidirectional orientations are of great significance for synthesizing wafer-scale single crystals. Although great efforts have been devoted, the growth of atomically thin magnetic nanosheet arrays and single crystals is still unaddressed. Here we design an interisland-distance-mediated chemical vapor deposition strategy to synthesize centimeter-scale atomically thin Fe3O4 arrays with unidirectional orientations on mica. The unidirectional alignment of nearly all the Fe3O4 nanosheets is driven by a dual-coupling-guided growth mechanism. The Fe3O4/mica interlayer interaction induces two preferred antiparallel orientations, whereas the interisland interaction of Fe3O4 breaks the energy degeneracy of antiparallel orientations. The room-temperature long-range ferrimagnetic order and thickness-tunable magnetic domain evolution are uncovered in atomically thin Fe3O4. This strategy to tune the orientations of nanosheets through the interisland interaction can guide the synthesis of other 2D transition-metal oxides, thereby laying a solid foundation for future spintronic device applications at the integration level.
ABSTRACT
This study investigated the association between BMI trajectories in late middle age and incident diabetes in later years. A total of 11,441 participants aged 50-60 years from the Health and Retirement Study with at least two self-reported BMI records were included. Individual BMI trajectories representing average BMI changes per year were generated using multilevel modeling. Adjusted risk ratios (ARRs) and 95% confidence intervals (95% CIs) were calculated. Associations between BMI trajectories and diabetes risk in participants with different genetic risks were estimated for 5720 participants of European ancestry. BMI trajectories were significantly associated with diabetes risk in older age (slowly increasing vs. stable: ARR 1.31, 95% CI 1.12-1.54; rapidly increasing vs. stable: ARR 1.5, 95% CI 1.25-1.79). This association was strongest for normal-initial-BMI participants (slowly increasing: ARR 1.34, 95% CI 0.96-1.88; rapidly increasing: ARR 2.06, 95% CI 1.37-3.11). Participants with a higher genetic liability to diabetes and a rapidly increasing BMI trajectory had the highest risk for diabetes (ARR 2.15, 95% CI 1.67-2.76). These findings confirmed that BMI is the leading risk factor for diabetes and that although the normal BMI group has the lowest incidence rate for diabetes, people with normal BMI are most sensitive to changes in BMI.
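To show how a risk ratio and its 95% confidence interval are obtained in the simplest case, the sketch below computes a crude (unadjusted) risk ratio from a 2 × 2 table using the standard log-scale interval. The study itself reports adjusted risk ratios from regression models, which this sketch does not reproduce; the counts are hypothetical.

```python
import math

def risk_ratio_ci(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Crude risk ratio with a Wald confidence interval on the log scale.

    Returns (rr, lower, upper). z=1.96 gives an approximate 95% interval.
    """
    rr = (events_exposed / n_exposed) / (events_ref / n_ref)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_exposed - 1 / n_exposed + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts: 150 of 1000 cases in a rapidly-increasing-BMI group
# versus 100 of 1000 in a stable-BMI reference group.
rr, lower, upper = risk_ratio_ci(150, 1000, 100, 1000)
```

An interval that excludes 1 indicates a statistically significant association at the chosen level. Note that one of the abstract's reported intervals (0.96-1.88) includes 1.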
ABSTRACT
MOTIVATION: In recent years, a growing number of studies have shown that microRNAs (miRNAs) play significant roles in the development of complex human diseases. Discovering the associations between miRNAs and diseases has become an important part of the discovery and treatment of disease. Since uncovering associations via traditional experimental methods is complicated and time-consuming, many computational methods have been proposed to identify the potential associations. However, there are still challenges in accurately determining potential associations between miRNA and disease by using multisource data. RESULTS: In this study, we develop a Multi-view Multichannel Attention Graph Convolutional Network (MMGCN) to predict potential miRNA-disease associations. Unlike simple multisource information integration, MMGCN employs a GCN encoder to obtain the features of miRNAs and diseases in each similarity view. Moreover, MMGCN can enhance the learned latent representations for association prediction by utilizing multichannel attention, which adaptively learns the importance of different features. Empirical results on two datasets demonstrate that the MMGCN model achieves superior performance compared with nine state-of-the-art methods on most of the metrics. Furthermore, we demonstrate the effectiveness of the multichannel attention mechanism and the validity of multisource data in miRNA-disease association prediction. Case studies also indicate the ability of the method to discover new associations.
Subject(s)
Algorithms , Biomarkers , Computational Biology/methods , Disease Susceptibility , MicroRNAs/genetics , Neural Networks, Computer , Databases, Genetic , Humans , ROC Curve , Web Browser
ABSTRACT
MOTIVATION: MicroRNAs (miRNAs) regulate target genes and are responsible for lethal diseases such as cancers. Accurately recognizing and identifying miRNA and gene pairs could be helpful in deciphering the mechanism by which miRNA affects and regulates the development of cancers. Embedding methods and deep learning methods have shown excellent performance in traditional classification tasks in many scenarios. However, few attempts have adapted and merged these two methods for miRNA-gene relationship prediction. Hence, we proposed a novel computational framework. We first generated representational features for miRNAs and genes using both sequence and geometrical information and then leveraged a deep learning method for the association prediction. RESULTS: We used long short-term memory (LSTM) to predict potential relationships and showed that our method outperformed other state-of-the-art methods. Results showed that our framework SG-LSTM achieved an area under the curve of 0.94 and was superior to other methods. In the case study, we predicted the top 10 miRNA-gene relationships and recommended the top 10 potential genes for hsa-miR-335-5p for SG-LSTM-core. We also tested our model using a larger dataset, from which 14 668 698 miRNA-gene pairs were predicted. The top 10 unknown pairs were also listed. AVAILABILITY: Our work can be downloaded from https://github.com/Xshelton/SG_LSTM. CONTACT: luojiawei@hnu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
Subject(s)
Computational Biology/methods , Epistasis, Genetic , MicroRNAs/genetics , Algorithms , Datasets as Topic , Deep Learning , Humans
ABSTRACT
MOTIVATION: Human microbes play a critical role in an extensive range of complex human diseases and have become a new target in precision medicine. In silico methods of identifying microbe-disease associations not only can provide deep insight into the pathogenic mechanisms of complex human diseases but also assist pharmacologists in screening candidate targets for drug development. However, the majority of existing approaches are based on linear models or label propagation, which suffer from limitations in capturing nonlinear associations between microbes and diseases. Besides, it is still a great challenge for most previous methods to make predictions for new diseases (or new microbes) with few or no observed associations. RESULTS: In this work, we construct features for microbes and diseases by fully exploiting multiple sources of biomedical data, and then propose a novel deep learning framework of graph attention networks with inductive matrix completion for human microbe-disease association prediction, named GATMDA. To our knowledge, this is the first attempt to leverage graph attention networks for this important task. In particular, we develop an optimized graph attention network with talking-heads to learn representations for nodes (i.e. microbes and diseases). To focus on more important neighbours and filter out noise, we further design a bi-interaction aggregator to enforce representation aggregation of similar neighbours. In addition, we combine inductive matrix completion to reconstruct microbe-disease associations to capture the complicated associations between diseases and microbes. Comprehensive experiments on two data sets (i.e. HMDAD and Disbiome) demonstrated that our proposed model consistently outperformed baseline methods. Case studies on two diseases, i.e. asthma and inflammatory bowel disease, further confirmed the effectiveness of our proposed model of GATMDA. AVAILABILITY: Python codes and data sets are available at: https://github.com/yahuilong/GATMDA. CONTACT: luojiawei@hnu.edu.cn.
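Inductive matrix completion scores a pair from the feature vectors of its two members, so it can score entities that have no observed associations. The sketch below shows the standard bilinear form, score = xᵀ W Hᵀ y, with small hand-written matrices. The dimensions, values, and function names are hypothetical, and in GATMDA the feature vectors would be learned node representations, not fixed inputs.

```python
def project(mat, vec):
    """Compute mat^T @ vec for a row-major list-of-lists matrix."""
    return [sum(mat[r][c] * vec[r] for r in range(len(vec)))
            for c in range(len(mat[0]))]

def imc_score(x, y, W, H):
    """Bilinear inductive-matrix-completion score x^T W H^T y.

    x: microbe features (length d1), y: disease features (length d2),
    W: d1 x r and H: d2 x r low-rank projection matrices.
    """
    px = project(W, x)  # microbe embedded in the shared rank-r space
    py = project(H, y)  # disease embedded in the same space
    return sum(a * b for a, b in zip(px, py))

# Hypothetical parameters with d1 = 3, d2 = 2, rank r = 2.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[1.0, 0.0], [0.0, 2.0]]
score = imc_score([1.0, 2.0, 3.0], [1.0, 1.0], W, H)  # (4, 5) . (1, 2) = 14
```

A new microbe needs only a feature vector x to be scored against every disease, which is what makes the approach inductive.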