ABSTRACT
BACKGROUND: Many biological studies have shown that lncRNAs regulate the expression of epigenetically related genes. The study of lncRNAs has deepened our understanding of the pathogenesis of complex diseases at the molecular level. Because lncRNAs are numerous and biological experiments are complex and time-consuming, applying computational techniques to predict potential lncRNA-disease associations is very effective. To explore information in complex network structures, existing methods rely mainly on lncRNA and disease information. Metapaths have been applied to network models as an effective method for exploring information in heterogeneous graphs. However, existing methods are dominated by lncRNA or disease nodes and tend to ignore the paths provided by intermediate nodes. METHODS: We propose MMHGAN, a deep learning model based on hierarchical graph attention networks that predicts unknown lncRNA-disease associations by using multiple types of metapaths to extract features. First, the model constructs a lncRNA-disease-miRNA heterogeneous graph based on known associations, together with two homogeneous graphs of lncRNAs and diseases. Second, for the homogeneous graphs, the features of neighboring nodes are aggregated using a multihead attention mechanism. Third, for the heterogeneous graph, metapaths through different intermediate nodes are selected to construct subgraphs, and the importance of the different metapath types is calculated and used to aggregate them into the final embedded features. Finally, the features are reconstructed using a fully connected layer to obtain the prediction results. RESULTS: Under fivefold cross-validation, the model obtained an average AUC of 96.07% and an average AUPR of 93.23%. Ablation experiments demonstrated the role of the homogeneous graphs and of the path weights for different intermediate nodes. In addition, we conducted case studies on lung cancer, esophageal carcinoma, and breast cancer. Of the 15 lncRNAs associated with each of these diseases, 15, 12, and 14, respectively, were validated by the lncRNA Disease Database and the Lnc2Cancer Database. CONCLUSION: We compared the MMHGAN model with six well-performing existing models, and the case studies demonstrated that the model is effective in predicting potential lncRNA-disease associations.
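The metapath-level aggregation described in the METHODS can be illustrated with a minimal sketch. It assumes each metapath type yields one embedding per node plus a scalar importance score, which is normalized with a softmax before a weighted sum. The function names (`softmax`, `aggregate_metapaths`), the scores, and the toy vectors are hypothetical; in an attention model the scores would be learned, and this is not the MMHGAN implementation.

```python
import math

def softmax(scores):
    """Normalize raw importance scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_metapaths(embeddings, scores):
    """Weighted sum of the per-metapath embeddings of a single node.

    embeddings: one vector per metapath type (all of equal length)
    scores:     one raw importance score per metapath type (assumed given)
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * emb[i] for w, emb in zip(weights, embeddings))
            for i in range(dim)]

# Toy example: two metapath types with equal scores get equal weights.
emb = [[1.0, 0.0], [0.0, 1.0]]
fused = aggregate_metapaths(emb, [0.0, 0.0])
```

With equal scores the fused embedding is the plain average of the two inputs; raising one score shifts the result toward that metapath's embedding.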
Subjects
Breast Neoplasms , Lung Neoplasms , MicroRNAs , RNA, Long Noncoding , Humans , Female , RNA, Long Noncoding/genetics , Computational Biology/methods , MicroRNAs/genetics , Algorithms
ABSTRACT
BACKGROUND: A growing body of research indicates that the disrupted expression of long non-coding RNA (lncRNA) is linked to a range of human disorders. Effective prediction of lncRNA-disease associations (LDAs) can therefore not only suggest ways to diagnose a condition but also save significant time and labor costs. METHOD: In this work, we propose a novel LDA prediction algorithm based on a graph convolutional network and a transformer, named GCNFORMER. First, we integrated the intraclass similarities and interclass connections among miRNAs, lncRNAs and diseases, and built a graph adjacency matrix. Second, to fully capture the features between various nodes, we employed a graph convolutional network for feature extraction. Finally, to obtain the global dependencies between inputs and outputs, we used a transformer encoder with a multi-head attention mechanism to predict lncRNA-disease associations. RESULTS: Fivefold cross-validation on the public dataset showed that GCNFORMER achieved an AUC of 0.9739 and an AUPR of 0.9812. We compared GCNFORMER with six advanced LDA prediction models, and the results indicated its superiority over all six. Furthermore, case studies on breast cancer, colon cancer and lung cancer underscore GCNFORMER's effectiveness in predicting potential LDAs. CONCLUSIONS: Combining a graph convolutional network with a transformer can effectively improve the performance of LDA prediction models and promote the in-depth development of this research field.
Subjects
Breast Neoplasms , Colonic Neoplasms , MicroRNAs , RNA, Long Noncoding , Humans , Female , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , MicroRNAs/genetics , Algorithms , Breast Neoplasms/genetics , Computational Biology/methods
ABSTRACT
N-terminal pro-B-type natriuretic peptide (NT-proBNP) is an essential biomarker for the prediction of heart failure (HF), but its prognostic ability across body mass index (BMI) categories needs to be clarified. Our study aimed to explore the association between BMI and NT-proBNP and assess the effect of BMI on the prognostic ability of NT-proBNP in Chinese patients with HF. We retrospectively analyzed clinical data from the FuWai Hospital HF Center in Beijing, China. According to the Chinese adult BMI standard, 1,508 patients with HF were classified into four groups: underweight (BMI < 18.5 kg/m²), normal weight (BMI 18.5-23.9 kg/m², as a reference category), overweight (BMI 24-27.9 kg/m²), and obesity (BMI ≥ 28 kg/m²). NT-proBNP was examined for its prognostic role in adverse events as an endpoint. BMI was independently and negatively associated with NT-proBNP (β = -0.074; P < 0.001), and NT-proBNP levels tended to decrease as BMI increased across the different BMI categories. The results of our study differ from those of other studies of European-American populations. In this study, NT-proBNP was a weak predictor of a 4-year adverse prognosis in underweight patients (BMI < 18.5 kg/m²). In other BMI categories, NT-proBNP was an independent predictor of adverse events in HF. BMI and sex significantly affected the optimal threshold for NT-proBNP to predict the risk of adverse events. There is a negative correlation between BMI and NT-proBNP, and NT-proBNP independently predicts adverse HF events in patients with a BMI of ≥ 18.5 kg/m². The optimal risk prediction cutoffs are lower in patients who are overweight and obese.
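The four BMI groups used for stratification can be expressed as a small classifier. This is a minimal sketch, assuming the printed ranges (18.5-23.9 and 24-27.9) are read as the half-open intervals [18.5, 24) and [24, 28) so that no value falls between categories; the function name `bmi_category` is hypothetical.

```python
def bmi_category(weight_kg, height_m):
    """Assign a BMI group using the cutoffs quoted in the abstract.

    Assumption: the ranges are treated as half-open intervals,
    i.e. [18.5, 24) is normal weight and [24, 28) is overweight.
    """
    bmi = weight_kg / height_m ** 2  # BMI in kg/m^2
    if bmi < 18.5:
        return "underweight"
    if bmi < 24.0:
        return "normal weight"
    if bmi < 28.0:
        return "overweight"
    return "obesity"
```

For example, a patient weighing 75 kg at 1.70 m has a BMI of about 25.95 kg/m² and falls into the overweight group.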
Subjects
Heart Failure , Natriuretic Peptide, Brain , Humans , Prognosis , Body Mass Index , Overweight/complications , Retrospective Studies , Thinness , Obesity/complications , Biomarkers , Peptide Fragments , Heart Failure/complications , Heart Failure/diagnosis
ABSTRACT
Type 2 diabetes mellitus (T2DM) is a complex disease caused by multiple factors and is often accompanied by disordered glucose and lipid metabolism and by vitamin D deficiency. Over the years, researchers have conducted numerous studies into the pathogenesis and prevention strategies of diabetes. In this study, diabetic SD rats were randomly divided into a type 2 diabetes group, a vitamin D intervention group, a 7-dehydrocholesterol reductase (DHCR7) inhibitor intervention group, a simvastatin intervention group, and a naive control group. Before and 12 weeks after intervention, liver tissue was extracted to isolate hepatocytes. Compared with the naive control group, the type 2 diabetic group without intervention showed increased DHCR7 expression, decreased 25(OH)D3 levels, and increased cholesterol levels. In primary cultured naive and type 2 diabetic hepatocytes, the expression of genes related to lipid metabolism and vitamin D metabolism was differentially regulated in each of the 5 treatment groups. Overall, DHCR7 is an indicator of glycolipid metabolism disorder and vitamin D deficiency in type 2 diabetes. Targeting DHCR7 will help with T2DM therapy. A management model of comprehensive health intervention can detect disease problems in diabetes patients and high-risk groups in a timely manner and reduce the incidence of diabetes.
Subjects
Diabetes Mellitus, Type 2 , Hypercholesterolemia , Oxidoreductases Acting on CH-CH Group Donors , Vitamin D Deficiency , Animals , Rats , Diabetes Mellitus, Type 2/prevention & control , Oxidoreductases , Oxidoreductases Acting on CH-CH Group Donors/genetics , Oxidoreductases Acting on CH-CH Group Donors/metabolism , Rats, Sprague-Dawley , Vitamin D/therapeutic use
ABSTRACT
BACKGROUND: Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps in studying the mechanisms of lncRNAs in diseases and in exploring new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most existing models ignore the interference of noisy and redundant information among these data resources. RESULTS: To improve the ability of LDA prediction models, we implemented a random forest and feature selection-based LDA prediction model (RFLDA for short). First, RFLDA integrates the experimentally supported miRNA-disease associations (MDAs) and LDAs, the disease semantic similarity (DSS), the lncRNA functional similarity (LFS) and the lncRNA-miRNA interactions (LMI) as input features. Then, RFLDA chooses the most useful features for training the prediction model through feature selection based on the random forest variable importance score, which takes into account not only the effect of each individual feature on the prediction results but also the joint effects of multiple features. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. With an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, RFLDA performs better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers demonstrate that 43 of the 45 lncRNAs predicted by RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models. CONCLUSIONS: Cross-validation and case studies indicate that RFLDA has an excellent ability to identify potential disease-associated lncRNAs.
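The two evaluation metrics quoted in the RESULTS can be computed from labels and prediction scores as sketched below. This is an illustrative stdlib-only version, not the evaluation code of RFLDA: `auc_roc` uses the pairwise (Mann-Whitney) formulation, and `average_precision` is one common estimator of AUPR that assumes scores are distinct. Both function names and the toy data are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean of the precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / hits

# Toy data: two known associations (1) and two unlabeled pairs (0).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = auc_roc(labels, scores)
ap = average_precision(labels, scores)
```

Precision-recall metrics are generally more sensitive than AUC to class imbalance, which is one plausible reason an AUPR can sit well below the AUC on the same predictions.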
Subjects
Algorithms , Disease/genetics , RNA, Long Noncoding/metabolism , Area Under Curve , Computational Biology/methods , Computer Simulation , Humans , MicroRNAs/metabolism , Neoplasms/genetics , ROC Curve , Regression Analysis , Risk Factors
ABSTRACT
The metabolic syndrome (MS) is a cluster of interrelated risk factors including diabetes mellitus, abdominal obesity, high cholesterol, and hypertension, which can significantly increase mortality and disability. Accumulating evidence suggests that long non-coding RNAs (lncRNAs) are involved in the pathogenesis of human metabolic diseases. However, little is known about the regulatory role of lncRNAs in MS. In this work, we proposed a method for identifying potential MS-associated lncRNAs by constructing an lncRNA-miRNA-mRNA network (LMMN). First, we constructed the LMMN by integrating MS-associated genes, miRNA-mRNA interactions, miRNA-lncRNA interactions and mRNA/miRNA expression profiles in patients with MS. Then, we predicted potential MS-associated lncRNAs based on the topological properties of the LMMN. As a result, we identified XIST as the most important lncRNA in the LMMN. Furthermore, we focused on the XIST/miR-214-3p and miR-181a-5p/PTEN axes and validated their expression in MS using real-time quantitative polymerase chain reaction (RT-qPCR). The RT-qPCR results showed that the expression of XIST and PTEN was significantly decreased (P < 0.05) while the expression of miR-214-3p was significantly increased (P < 0.05) in peripheral blood mononuclear cells (PBMCs) of patients with MS, compared with healthy controls. In addition, correlation analysis showed that XIST was negatively correlated with serum C peptide and PTEN was positively correlated with the BMI of MS patients. Our findings provide new evidence for further exploring the regulatory role of XIST and other lncRNAs in MS.
Subjects
Biomarkers/analysis , Gene Regulatory Networks , Metabolic Syndrome/pathology , MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/metabolism , Gene Expression Profiling , Humans , Metabolic Syndrome/genetics , Metabolic Syndrome/metabolism , PTEN Phosphohydrolase/genetics , PTEN Phosphohydrolase/metabolism , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics
ABSTRACT
BACKGROUND: A large body of evidence shows that miRNAs regulate the expression of their target genes at the post-transcriptional level and that the dysregulation of miRNAs is related to many complex human diseases. Accurately discovering disease-related miRNAs is conducive to exploring the pathogenesis and treatment of diseases. However, because experimental methods are time-consuming and expensive, predicting miRNA-disease associations with computational models has become a more economical and effective means. RESULTS: Inspired by the work of predecessors, we proposed an improved computational model based on random forest (RF) for identifying miRNA-disease associations (IRFMDA). First, the integrated similarity of diseases was calculated by combining their semantic similarity and Gaussian interaction profile kernel (GIPK) similarity, and the integrated similarity of miRNAs was calculated by combining their functional similarity and GIPK similarity. Then, the integrated similarity of diseases and the integrated similarity of miRNAs were combined to represent each miRNA-disease relationship pair. Next, the miRNA-disease relationship pairs contained in the HMDD (v2.0) database were considered positive samples, and randomly constructed miRNA-disease relationship pairs not included in HMDD (v2.0) were considered negative samples. Feature selection based on the RF variable importance score was then performed to choose the more useful features for representing samples, so as to optimize the model's ability to infer miRNA-disease associations. Finally, an RF regression model was trained on the reduced sample space to score the unknown miRNA-disease associations. The AUCs of IRFMDA under local leave-one-out cross-validation (LOOCV), global LOOCV and 5-fold cross-validation were 0.8728, 0.9398 and 0.9363, respectively, which were better than those of several excellent models for predicting miRNA-disease associations. Moreover, case studies on oesophageal cancer, lymphoma and lung cancer showed that 94 (oesophageal cancer), 98 (lymphoma) and 100 (lung cancer) of the top 100 disease-associated miRNAs predicted by IRFMDA were supported by the experimental data in the dbDEMC (v2.0) database. CONCLUSIONS: Cross-validation and case studies demonstrated that IRFMDA is an excellent miRNA-disease association prediction model that can provide guidance for future experimental studies on the regulatory mechanisms of miRNAs in complex human diseases.
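The Gaussian interaction profile kernel (GIPK) similarity mentioned in the RESULTS is commonly defined as K(i, j) = exp(-γ‖IP(i) - IP(j)‖²), where IP(i) is the binary association profile of entity i and γ is a bandwidth divided by the mean squared profile norm. The sketch below follows that common formulation; whether IRFMDA uses exactly these settings (for example a bandwidth of 1) is an assumption, and the function name and toy profiles are hypothetical.

```python
import math

def gipk_similarity(profiles, gamma_prime=1.0):
    """GIPK similarity matrix from binary interaction profiles.

    profiles:    one row per entity (e.g. a disease), one column per
                 partner (e.g. a miRNA); 1 marks a known association
    gamma_prime: bandwidth parameter (assumed 1.0 here)
    Assumes at least one known association, so the mean norm is non-zero.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2
                          for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim

# Toy example: entities 0 and 1 share a profile, entity 2 differs everywhere.
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
sim = gipk_similarity(profiles)
```

Entities with identical profiles get similarity 1, and the value decays toward 0 as the profiles diverge.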
Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Genetic Association Studies , Genetic Predisposition to Disease , MicroRNAs/genetics , Area Under Curve , Humans , MicroRNAs/metabolism , Neoplasms/genetics , Risk Factors
ABSTRACT
Identifying the associations between long noncoding RNAs (lncRNAs) and diseases is critical for disease prevention, diagnosis and treatment. However, conducting wet experiments to discover these associations is time-consuming and costly. Therefore, computational modeling for predicting lncRNA-disease associations (LDAs) has become an important alternative. To enhance the accuracy of LDA prediction and alleviate the issue of node feature oversmoothing when exploring the potential features of nodes using graph neural networks, we introduce DPFELDA, a dual-path feature extraction network that integrates information from multiple sources to predict LDAs. Initially, we establish a dual-view structure of lncRNAs and diseases and a heterogeneous network of lncRNA-disease-microRNA (miRNA) interactions. Subsequently, features are extracted using a dual-path feature extraction network. In particular, we employ a combination of a graph convolutional network, a convolutional block attention module, and a node aggregation layer to perform multilayer topology feature extraction for the dual-view structure of lncRNAs and diseases. Additionally, we utilize a Transformer model to construct the node topology feature residual network for obtaining node-specific features in heterogeneous networks. Finally, XGBoost is employed for LDA prediction. The experimental results demonstrate that DPFELDA outperforms the benchmark model on various benchmark data sets. In the course of model exploration, it becomes evident that DPFELDA successfully alleviates the issue of node feature oversmoothing induced by graph-based learning. Ablation experiments confirm the effectiveness of the innovative module, and a case study substantiates the accuracy of the DPFELDA model in predicting novel LDAs for characteristic diseases.
ABSTRACT
The Internet of Things (IoT) is an extensive system of interrelated devices equipped with sensors to monitor and track real-world objects, spanning many different industries. Interest in the IoT continues to grow along with its value in healthcare, where it can help address the rising burden of chronic disease management and an aging population. To address difficulties associated with IoT-enabled healthcare, we propose a secure routing protocol that hierarchically combines a fuzzy logic system and the Whale Optimization Algorithm (WOA). The proposed method consists of two primary components: a fuzzy trust strategy and a WOA-inspired clustering methodology. The fuzzy trust strategy plays a critical role in determining the trustworthiness of connected IoT equipment. The WOA-based clustering framework uses a fitness function to assess the likelihood of IoT devices acting as cluster heads. This function considers factors such as centrality, range of communication, hop count, remaining energy, and trustworthiness. Compared with other algorithms, the proposed method improved network lifespan, energy usage, and packet delivery ratio by 47%, 58%, and 17.7%, respectively.
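A cluster-head fitness function of the kind described above can be sketched as a weighted sum of the five listed factors. The abstract does not give the formula, so everything here is an assumption: the equal weights, the normalization of each factor to [0, 1], the choice to reward a low hop count, and the names `WEIGHTS`, `fitness`, and `pick_cluster_head`. The Whale Optimization Algorithm search itself is not implemented; the sketch only scores candidate devices.

```python
# Hypothetical equal weights; the paper's actual weighting is not stated.
WEIGHTS = {
    "centrality": 0.2,
    "comm_range": 0.2,
    "hop_count": 0.2,
    "energy": 0.2,
    "trust": 0.2,
}

def fitness(device, weights=WEIGHTS):
    """Score a device as a cluster-head candidate.

    device: dict of factors, each assumed normalized to [0, 1].
    A low hop count is assumed to be desirable, so it enters as
    (1 - hop_count).
    """
    return (weights["centrality"] * device["centrality"]
            + weights["comm_range"] * device["comm_range"]
            + weights["hop_count"] * (1.0 - device["hop_count"])
            + weights["energy"] * device["energy"]
            + weights["trust"] * device["trust"])

def pick_cluster_head(devices):
    """Return the device with the highest fitness score."""
    return max(devices, key=fitness)

# Toy example: a well-placed trusted device versus a weakly trusted one.
strong = {"centrality": 1.0, "comm_range": 1.0, "hop_count": 0.0,
          "energy": 1.0, "trust": 1.0}
weak = {"centrality": 0.5, "comm_range": 0.5, "hop_count": 0.5,
        "energy": 0.5, "trust": 0.1}
head = pick_cluster_head([weak, strong])
```

In a trust-aware protocol, the trust term would come from the fuzzy trust strategy, so a device with low trust is unlikely to be selected even if its energy and position are favorable.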
Subjects
Algorithms , Fuzzy Logic , Internet of Things , Delivery of Health Care , Humans , Cluster Analysis , Computer Communication Networks
ABSTRACT
Increasing evidence indicates that mutations and dysregulation of long non-coding RNA (lncRNA) play a crucial role in the pathogenesis and prognosis of complex human diseases. Computational methods for predicting the association between lncRNAs and diseases have gained increasing attention. However, these methods face two key challenges: obtaining reliable negative samples and incorporating lncRNA-disease association (LDA) information from multiple perspectives. This paper proposes a method called NDMLDA, which combines multi-view feature extraction, unsupervised negative sample denoising, and a stacking ensemble classifier. First, an unsupervised method (K-means) is used to design a negative sample denoising module that alleviates sample imbalance and the impact of potential noise in the negative samples on model performance. Second, graph attention networks are employed to extract multi-view features of both lncRNAs and diseases, thereby enhancing the learning of association information between them. Finally, lncRNA-disease association prediction is implemented through a stacking ensemble classifier. Existing research datasets are integrated to evaluate performance, and 5-fold cross-validation is conducted on the integrated dataset. Experimental results demonstrate that NDMLDA achieves an AUC of 0.9907 and an AUPR of 0.9927, with a 5-fold cross-validation variance of less than 0.1%. These results outperform the baseline methods. Additionally, case studies further illustrate the model's potential in cancer diagnosis and precision medicine implementation.
ABSTRACT
Objective: This research aims to construct a method that alleviates the problem of sample imbalance in classification, especially arrhythmia classification. The approach can improve model performance without using data augmentation. Methods: We developed a new Multi-layer Perceptron (MLP) block and used a Weight Capsule (WCapsule) network with MLP, combined with a sequence-to-sequence (Seq2Seq) network, to classify arrhythmias. Our work is based on the MIT-BIH arrhythmia database; the original electrocardiogram (ECG) data are classified according to the criteria recommended by the Association for the Advancement of Medical Instrumentation (AAMI). Our method's performance is also further evaluated. Results: The proposed model is evaluated using the inter-patient paradigm. Our proposed method shows an accuracy (ACC) of 99.88% under sample imbalance. For Class N, sensitivity (SEN) is 99.79%, positive predictive value (PPV) is 99.90%, and specificity (SPEC) is 99.19%. For Class S, SEN is 97.66%, PPV is 96.14%, and SPEC is 99.85%. For Class V, SEN is 99.97%, PPV is 99.07%, and SPEC is 99.94%. For Class F, SEN is 97.94%, PPV is 98.70%, and SPEC is 99.99%. When using only half of the training samples, the SEN of our method for Class N and Class V is 0.97% and 5.27% higher, respectively, than that of the traditional machine learning algorithm. Conclusion: The proposed method combines MLP and a weight capsule network with a Seq2Seq network, effectively addresses the problem of sample imbalance in arrhythmia classification, and performs well. It also shows promising potential when fewer samples are available.
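The per-class figures reported in the Results (SEN, PPV, SPEC) follow from a multi-class confusion matrix by treating one class as positive and all others as negative. The sketch below shows that one-vs-rest calculation; the function name `per_class_metrics` and the toy matrix are hypothetical and unrelated to the paper's data.

```python
def per_class_metrics(cm, k):
    """One-vs-rest SEN, PPV and SPEC for class k.

    cm: square confusion matrix, rows = true class, columns = predicted.
    Assumes class k occurs in both the true and the predicted labels.
    """
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # true k, predicted as another class
    fp = sum(row[k] for row in cm) - tp  # another class, predicted as k
    tn = sum(map(sum, cm)) - tp - fn - fp
    return {
        "SEN": tp / (tp + fn),   # sensitivity (recall)
        "PPV": tp / (tp + fp),   # positive predictive value (precision)
        "SPEC": tn / (tn + fp),  # specificity
    }

# Toy 3-class confusion matrix.
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 0, 10]]
metrics = per_class_metrics(cm, 0)
```

Under class imbalance, specificity for a rare class is dominated by the large negative pool, which is why sensitivity and PPV are usually the more informative figures for minority classes.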
ABSTRACT
A growing number of studies have shown that microRNAs (miRNAs) play a critical role in gene expression regulation and that the irregular expression of miRNAs tends to be associated with a variety of complex human diseases. Because identifying disease-associated miRNAs through biological experiments is costly and inefficient, researchers have focused on predicting potential disease-associated miRNAs with computational methods. Considering that existing methods are flawed in how they construct the negative sample set, we proposed a clustering-based sampling method for miRNA-disease association prediction (CSMDA). First, we integrated multiple kinds of similarity information for miRNAs and diseases to represent miRNA-disease pairs. Second, we applied a clustering-based sampling method to avoid introducing potential positive samples when constructing the negative sample set. Third, we employed a random forest-based feature selection method to reduce noise and redundant information in the high-dimensional feature space. Finally, we implemented an ensemble learning framework that predicts miRNA-disease associations by soft voting. The Precision, Recall, F1-score, AUROC and AUPR of CSMDA were 0.9676, 0.9545, 0.9610, 0.9928, and 0.9940, respectively, under five-fold cross-validation. In addition, case studies on three cancers showed that the top 20 potentially associated miRNAs predicted by CSMDA were confirmed by the dbDEMC database or the literature. These results demonstrate that CSMDA can predict potential disease-associated miRNAs more accurately.
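Soft voting, the final ensemble step named above, averages the class probabilities produced by several base classifiers instead of counting their hard votes. The sketch below shows only that averaging step under stated assumptions: the base models are taken as given, each supplying one positive-class probability per sample, and the function name `soft_vote` and the toy probabilities are hypothetical rather than CSMDA's configuration.

```python
def soft_vote(prob_lists, weights=None):
    """Combine positive-class probabilities from several models.

    prob_lists: one list per model, each with one probability per sample
    weights:    optional per-model weights (defaults to equal weighting)
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total_weight = sum(weights)
    n_samples = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists))
            / total_weight
            for i in range(n_samples)]

# Toy example: three models scoring two candidate miRNA-disease pairs.
model_probs = [[0.9, 0.2],
               [0.6, 0.4],
               [0.3, 0.3]]
ensemble = soft_vote(model_probs)
```

Here the first pair averages to about 0.6 and the second to about 0.3; ranking candidates by this averaged score is how a top-20 list such as the one in the case studies could be produced.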
ABSTRACT
Increasing evidence has shown that the abnormal expression of long non-coding RNA (lncRNA) is relevant to a variety of human diseases. Therefore, accurate identification of disease-related lncRNAs can help to understand lncRNA expression at the molecular level and to explore more effective treatments for diseases. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations remains a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information was used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features consisting of their respective similarity coefficients were fused into the input feature space. Third, an autoencoder was adopted to project the raw high-dimensional features into a low-dimensional space to learn representations for lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features were fused into the input feature space to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the AUC (area under the receiver operating characteristic curve) is 0.9897 and the AUPR (area under the precision-recall curve) is 0.7040, indicating that our model performs better than several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that our model has a good ability to predict disease-related lncRNAs.
ABSTRACT
AIM: Type 2 diabetes and obesity are diseases related to surplus energy in the body. Abnormal interaction between the hypothalamus and adipose tissues is a key trigger of energy metabolism dysfunction. Extracellular vesicles (EVs) regulate intercellular communication by transporting intracellular cargo to recipient cells, thereby altering the function of those cells. This study aimed to evaluate whether adipocyte-derived EVs can act on hypothalamic neurons to modulate energy intake and to identify the EV-associated non-coding RNAs. METHODS: Confocal imaging was used to trace the uptake of labelled adipocyte-derived exosomes by hypothalamic anorexigenic POMC neurons. The effects of adipocyte-derived EVs on the mammalian target of rapamycin (mTOR) signalling pathway in POMC neurons were evaluated based on mRNA and protein expression in vitro using quantitative real-time PCR and western blotting. In addition, adipocyte-derived EVs were injected into recipient mice, and changes in body weight and daily food intake were monitored. The biological effects of the EV-associated MALAT1 on POMC neurons were explored. RESULTS: Adipocyte-derived EVs were successfully transferred into POMC neurons in vitro. Results showed that adipocytes of obese mice secreted MALAT1-containing EVs, which increased appetite and weight when administered to lean mice. Conversely, adipocyte-derived EVs from lean mice decreased food intake and weight when administered to obese mice. CONCLUSION: Adipocyte-derived EVs play important roles in mediating the interaction between adipocytes and hypothalamic neurons. Adipocyte-derived EVs can regulate POMC expression through hypothalamic mTOR signalling in vivo and in vitro, thereby affecting body energy intake.
Subjects
Adipocytes/metabolism , Appetite/physiology , Body Weight/physiology , Extracellular Vesicles/metabolism , Hypothalamus/metabolism , Obesity/metabolism , TOR Serine-Threonine Kinases/metabolism , Adipocytes/pathology , Animals , Brain/metabolism , Brain/pathology , Cells, Cultured , Diet, High-Fat , Disease Models, Animal , Extracellular Vesicles/pathology , Hypothalamus/pathology , Male , Mice , Mice, Inbred C57BL , Neurons/metabolism , Neurons/pathology , Obesity/pathology , Rats, Wistar , Signal Transduction
ABSTRACT
Metabolic syndrome is a cluster of the most dangerous heart attack risk factors (diabetes and raised fasting plasma glucose, abdominal obesity, high cholesterol and high blood pressure) and has become a major global threat to human health. A number of studies have demonstrated that hundreds of non-coding RNAs, including miRNAs and lncRNAs, are involved in metabolic syndrome-related diseases such as obesity, type 2 diabetes mellitus, and hypertension. However, these research results are scattered across a large body of literature, which hinders their analysis and use. There is an urgent need to integrate the relationship data between metabolic syndrome and non-coding RNA into a specialized database. To address this need, we developed a metabolic syndrome-associated non-coding RNA database (ncRNA2MetS) to curate the associations between metabolic syndrome and non-coding RNA. Currently, ncRNA2MetS contains 1,068 associations between five metabolic syndrome traits and 627 non-coding RNAs (543 miRNAs and 84 lncRNAs) in four species. Each record in the ncRNA2MetS database represents a disease-miRNA (or disease-lncRNA) association and consists of the non-coding RNA category, the miRNA (lncRNA) name, the name of the metabolic syndrome trait, the expression pattern of the non-coding RNA, the validation method, the species involved, a brief introduction to the association, the referenced article, etc. We also developed a user-friendly website so that users can easily access and download all data. In short, ncRNA2MetS is a complete and high-quality data resource for exploring the role of non-coding RNA in the pathogenesis of metabolic syndrome and seeking new treatment options. The website is freely available at http://www.biomed-bigdata.com:50020/index.html.
ABSTRACT
High-dimensional data and a large number of redundant features in bioinformatics research have created an urgent need for feature selection. In this paper, a novel random forest-based feature selection method is proposed that adopts the idea of stratifying the feature space and combines generalised sequential backward search and generalised sequential forward search strategies. A random forest variable importance score is used to rank features, and different classifiers are used as the feature subset evaluation function. The proposed method is examined on five microarray expression datasets (leukaemia, prostate, breast, nervous and DLBCL), on which the average accuracies of the SVM classifier are 100%, 95.24%, 85%, 91.67%, and 91.67%, respectively. The results show that the proposed method not only improves classification accuracy but also greatly reduces the computation time of the feature selection process.