Results 1 - 20 of 100
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38622357

ABSTRACT

Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. The resulting method, PseU-FKeERF, demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selects four RNA feature coding schemes with relatively good performance, combines them, and then feeds them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a reference for disease control and related drug development in the future.


Subject(s)
Pseudouridine , Random Forest , Pseudouridine/genetics , RNA/genetics , Base Sequence
2.
Nucleic Acids Res ; 52(D1): D990-D997, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37831073

ABSTRACT

Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank scale datasets and accurate gene-level rare variant-trait association testing methods have dramatically increased the number of rare variant associations that have been detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address the issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits, which are manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information is provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.


Subject(s)
Databases, Genetic , Genetic Variation , Genome-Wide Association Study , Genome-Wide Association Study/methods , Multifactorial Inheritance , Phenotype
3.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930024

ABSTRACT

Development of robust and effective strategies for synthesizing new compounds, drug targeting and constructing GEnome-scale Metabolic models (GEMs) requires a deep understanding of the underlying biological processes. A critical step in achieving this goal is accurately identifying the categories of pathways in which a compound participates. However, current machine learning-based methods often overlook the multifaceted nature of compounds, resulting in inaccurate pathway predictions. Therefore, we present a novel framework on Multi-View Multi-Label Learning for Metabolic Pathway Inference, hereby named MVML-MPI. First, MVML-MPI learns the distinct compound representations in parallel with corresponding compound encoders to fully extract features. Subsequently, we propose an attention-based mechanism that offers a fusion module to complement these multi-view representations. As a result, MVML-MPI accurately represents and effectively captures the complex relationship between compounds and metabolic pathways and distinguishes itself from current machine learning-based methods. In experiments conducted on the Kyoto Encyclopedia of Genes and Genomes pathways dataset, MVML-MPI outperformed state-of-the-art methods, demonstrating its superiority and its potential for use in the field of metabolic pathway design, where it can aid in optimizing drug-like compounds and facilitating the development of GEMs. The code and data underlying this article are freely available at https://github.com/guofei-tju/MVML-MPI. Contact: jtang@cse.sc.edu, guofei@csu.edu.com or wuxi_dyj@csj.uestc.edu.cn.


Subject(s)
Machine Learning , Metabolic Networks and Pathways
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36502371

ABSTRACT

Deoxyribonucleic acid (DNA) N6-methyladenine plays a vital role in various biological processes, and the accurate identification of its sites can provide a more comprehensive understanding of its biological effects. There are several methods for 6mA site prediction. With the continuous development of technology, traditional techniques with high costs and low efficiency are gradually being replaced by computational methods. The widely used computational methods can be divided into two categories: traditional machine learning and deep learning methods. We first list some existing experimental methods for predicting the 6mA site, then analyze the general process from sequence input to results in computational methods and review existing model architectures. Finally, we summarize and compare the results to help subsequent researchers choose the most suitable method for their work.


Subject(s)
DNA Methylation , Machine Learning , Research Design , DNA/genetics
5.
PLoS Comput Biol ; 20(6): e1012229, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38924082

ABSTRACT

De novo drug design is crucial in advancing drug discovery, which aims to generate new drugs with specific pharmacological properties. Recently, deep generative models have achieved inspiring progress in generating drug-like compounds. However, the models prioritize single-target drug generation for pharmacological intervention, neglecting the complicated inherent mechanisms of diseases, which are influenced by multiple factors. Consequently, developing novel multi-target drugs that simultaneously target specific targets can enhance anti-tumor efficacy and address issues related to resistance mechanisms. To address this issue and inspired by Generative Pre-trained Transformers (GPT) models, we propose an upgraded GPT model with generative adversarial imitation learning for multi-target molecular generation called MTMol-GPT. The multi-target molecular generator employs a dual discriminator model using the Inverse Reinforcement Learning (IRL) method for concurrent multi-target molecular generation. Extensive results show that MTMol-GPT generates various valid, novel, and effective multi-target molecules for various complex diseases, demonstrating robustness and generalization capability. In addition, molecular docking and pharmacophore mapping experiments demonstrate the drug-likeness and effectiveness of the generated molecules, which could potentially improve neuropsychiatric interventions. Furthermore, our model's generalizability is exemplified by a case study focusing on multi-target drug design for breast cancer. As a broadly applicable solution for multiple targets, MTMol-GPT provides new insight into future directions to enhance potential complex disease therapeutics by generating high-quality multi-target molecules in drug discovery.


Subject(s)
Computational Biology , Drug Discovery , Molecular Docking Simulation , Humans , Computational Biology/methods , Drug Discovery/methods , Drug Design , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Algorithms , Deep Learning , Machine Learning
6.
Methods ; 223: 75-82, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38286333

ABSTRACT

The accurate identification of drug-protein interactions (DPIs) is crucial in drug development, especially concerning G protein-coupled receptors (GPCRs), which are vital targets in drug discovery. However, experimental validation of GPCR-drug pairings is costly, prompting the need for accurate predictive methods. To address this, we propose MFD-GDrug, a multimodal deep learning model. Leveraging the ESM pretrained model, we extract protein features and employ a CNN for protein feature representation. For drugs, we integrated multimodal features of drug molecular structures, including three-dimensional features derived from Mol2vec and the topological information of drug graph structures extracted through Graph Convolutional Neural Networks (GCN). By combining structural characterizations and pretrained embeddings, our model effectively captures GPCR-drug interactions. Our tests on leading GPCR-drug interaction datasets show that MFD-GDrug outperforms other methods, demonstrating superior predictive accuracy.


Subject(s)
Deep Learning , Drug Interactions , Drug Development , Drug Discovery , Neural Networks, Computer
7.
Methods ; 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39097179

ABSTRACT

DNA N6-methyladenine (6mA) plays an important role in many biological processes, and accurately identifying its sites helps one to understand its biological effects more comprehensively. Previous traditional experimental methods are very labor-intensive, and traditional machine learning methods also appear insufficient as the database of 6mA methylation groups grows progressively larger. We therefore propose a deep learning-based method, a multi-scale convolutional model based on global response normalization (CG6mA), to address the 6mA site prediction problem. This method is tested alongside other methods on three different kinds of benchmark datasets, and the results show that our model achieves better prediction results.

8.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35134117

ABSTRACT

Targeted drugs have been applied to the treatment of cancer on a large scale, and some patients have shown a degree of therapeutic response. Detecting drug-target interactions (DTIs) through biochemical experiments is a time-consuming task. At present, machine learning (ML) has been widely applied in large-scale drug screening. However, there are few methods for multiple information fusion. We propose a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. The multiple kernel matrices (containing chemical, biological and clinical information) are integrated via a multi-kernel learning (MKL) algorithm. The original adjacency matrix of DTIs is decomposed into three matrices: the latent feature matrix of the drug space, the latent feature matrix of the target space and the bi-projection matrix (used to join the two feature spaces). To obtain better prediction performance, the MKL algorithm regulates the weight of each kernel matrix according to the prediction error. The weights of drug side-effects and target sequence are the highest. Compared with other computational methods, our model has better performance on four test data sets.


Subject(s)
Algorithms , Drug-Related Side Effects and Adverse Reactions , Drug Interactions , Humans , Machine Learning
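The three-matrix decomposition described in the abstract above can be pictured with a small sketch. It only shows how a drug latent matrix, a bi-projection matrix and a target latent matrix combine into a score matrix; the abstract does not give the objective function or update rules, so none are attempted here. The names `A`, `Theta`, `B` and all helper functions are hypothetical, and the toy values are made up for illustration.

```python
# Illustrative only: shapes and reconstruction for a three-matrix factorization
# Y ~ A * Theta * B^T. Symbol names and values are assumptions, not the paper's.

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
            for row in left]

def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]

def reconstruct(A, Theta, B):
    """Score matrix (drugs x targets) from drug factors A (drugs x k),
    bi-projection Theta (k x k) and target factors B (targets x k)."""
    return matmul(matmul(A, Theta), transpose(B))

def squared_error(Y, Y_hat):
    """Sum of squared differences between observed and reconstructed entries."""
    return sum((y - h) ** 2 for ry, rh in zip(Y, Y_hat) for y, h in zip(ry, rh))

# Toy example: 2 drugs, 3 targets, latent dimension 2.
A = [[1, 0], [0, 1]]
Theta = [[1, 0], [0, 1]]
B = [[1, 0], [0, 1], [1, 1]]
Y = [[1, 0, 1], [0, 1, 1]]          # hypothetical known interactions
Y_hat = reconstruct(A, Theta, B)
print(Y_hat, squared_error(Y, Y_hat))
```

In a real model the factors would be learned by minimizing a regularized version of this error; high entries of the reconstructed matrix at positions with no known interaction would then be read as candidate DTIs.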
9.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36259601

ABSTRACT

Across the entire life cycle of drug development, side effects are one of the major causes of failure. Severe side effects of drugs that go undetected until the post-marketing stage lead to around two million patient morbidities every year in the United States. Therefore, there is an urgent need for a method to predict side effects of approved drugs and new drugs. Following this need, we present a new predictor for finding side effects of drugs. Firstly, multiple similarity matrices are constructed based on the association profile feature and drug chemical structure information. Secondly, these similarity matrices are integrated by a Centered Kernel Alignment-based Multiple Kernel Learning algorithm. Then, Weighted K nearest known neighbors is utilized to complement the adjacency matrix. Next, we construct Restricted Boltzmann machines (RBM) in drug space and side effect space, respectively, and apply a penalized maximum likelihood approach to train the model. Finally, the average decision rule is adopted to integrate predictions from the RBMs. Comparison results and case studies on four benchmark datasets demonstrate that our method gives more accurate and reliable predictions.


Subject(s)
Algorithms , Drug-Related Side Effects and Adverse Reactions , Humans , Likelihood Functions , Cluster Analysis
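Several abstracts in this list rely on centered kernel alignment (CKA) to compare or weight kernel matrices. As a rough illustration, the sketch below computes the standard alignment score, the Frobenius inner product of two double-centered kernels divided by the product of their Frobenius norms. It is a minimal sketch of the general quantity, not the weighting procedure of any listed paper; the function names and the toy data are assumptions.

```python
import math

def center_kernel(K):
    """Double-center a square kernel matrix: Kc = H K H, with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def frobenius_inner(A, B):
    """Frobenius inner product: sum of elementwise products."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def centered_kernel_alignment(K1, K2):
    """Alignment of the two centered kernels (1.0 means perfectly aligned)."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    denom = math.sqrt(frobenius_inner(K1c, K1c) * frobenius_inner(K2c, K2c))
    if denom == 0.0:
        raise ValueError("a centered kernel is all zeros; alignment is undefined")
    return frobenius_inner(K1c, K2c) / denom

def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

# Toy data: one feature per sample, and a label-derived "ideal" kernel y y^T.
features = [[0.0], [1.0], [2.0]]
labels = [[-1.0], [-1.0], [1.0]]
score = centered_kernel_alignment(linear_kernel(features), linear_kernel(labels))
print(round(score, 3))  # 0.75
```

A multiple kernel learning scheme built on this idea would typically give larger weights to kernels whose alignment with a target kernel is higher; the exact optimization used in the papers is not stated in their abstracts.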
10.
Methods ; 219: 73-81, 2023 11.
Article in English | MEDLINE | ID: mdl-37783242

ABSTRACT

Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. The monitoring of drug side effects is an important support for post-marketing safety supervision of drugs, and an important basis for revising drug instructions. Its purpose is to detect and control drug safety risks in a timely manner. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning-based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to predict drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Neoplasms , Humans , Machine Learning , Neoplasms/genetics , Benchmarking , Algorithms
11.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33443536

ABSTRACT

Accurately identifying associations between non-coding RNAs (ncRNAs) and diseases could be of great help in human biomedical research and disease treatment. However, the traditional technology is only applied to one type of non-coding RNA or a specific disease, and the experimental method is time-consuming and expensive. More computational tools have been proposed to detect new associations based on known ncRNA and disease information. Because ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, it is critical to develop effective computational predictors for ncRNA-disease association prediction. In this paper, we propose a new computational method of three-matrix factorization with hypergraph regularization terms (HGRTMF) based on central kernel alignment (CKA), for identifying general ncRNA-disease associations. In the process of constructing the similarity matrix, various types of similarity matrices are applicable to circRNAs, miRNAs and lncRNAs. Our method achieves excellent performance on five datasets, involving three types of ncRNAs. In the test, we obtain best area under the curve scores of 0.9832, 0.9775, 0.9023, 0.8809 and 0.9185 via 5-fold cross-validation and 0.9832, 0.9836, 0.9198, 0.9459 and 0.9275 via leave-one-out cross-validation on five datasets. Furthermore, our novel method (CKA-HGRTMF) is also able to discover new associations between ncRNAs and diseases accurately. Availability: Codes and data are available: https://github.com/hzwh6910/ncRNA2Disease.git. Contact: fguo@tju.edu.cn.


Subject(s)
Algorithms , Computational Biology , Disease/genetics , Models, Genetic , RNA, Untranslated , Humans , RNA, Untranslated/genetics , RNA, Untranslated/metabolism
12.
Methods ; 208: 1-8, 2022 12.
Article in English | MEDLINE | ID: mdl-36220606

ABSTRACT

An enhancer is a short DNA sequence containing many binding sites of transcription factors that plays a crucial role in the gene expression of major eukaryotes. It is difficult to avoid the time consumption and high cost of experimental methods. Therefore, with the continuous development of genomics, it is an urgent task to identify enhancers and their intensities by computational methods. In this paper, we propose a two-layer model called iEnhancer-MRBF, wherein the first layer is used to identify enhancers, and the identified enhancers are divided into strong enhancers and weak enhancers according to their strength in the second layer. In iEnhancer-MRBF, a new classifier, the multiple Laplacian-regularized radial basis function network (MLR-RBFN), is proposed, and three feature representation methods, namely, kmer, nucleotide binary profiles (NBP) and accumulated nucleotide frequency (ANF), as well as feature selection, are used to process DNA sequences. The experimental results show that the model is significantly better than the previous prediction models, and the test accuracies of the first and second layers on independent datasets are 79.75% and 83.50%, respectively.


Subject(s)
Enhancer Elements, Genetic , Genomics , Genomics/methods , Nucleotides , Transcription Factors/metabolism , Base Sequence
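The three sequence encodings named in the abstract above (kmer, NBP and ANF) are simple enough to sketch. The code below follows the commonly used definitions: normalized k-mer counts, a one-hot vector per position, and, for each position, the fraction of the prefix occupied by the nucleotide found there. This is an assumption about the general form of these encodings, not the paper's exact implementation; function names and the choice k=2 are illustrative.

```python
from itertools import product

ALPHABET = "ACGT"

def kmer_frequencies(seq, k=2):
    """Normalized counts of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    if windows <= 0:
        return [0.0] * len(kmers)
    for i in range(windows):
        word = seq[i:i + k]
        if word in counts:          # skip windows with ambiguous bases such as N
            counts[word] += 1
    return [counts[m] / windows for m in kmers]

def nucleotide_binary_profile(seq):
    """One-hot encoding: four 0/1 values (A, C, G, T) per position."""
    return [1 if base == letter else 0 for base in seq for letter in ALPHABET]

def accumulated_nucleotide_frequency(seq):
    """For position i (1-based), the share of seq[:i] equal to the base at i."""
    seen = {}
    densities = []
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        densities.append(seen[base] / i)
    return densities

example = "AACGT"
print(kmer_frequencies(example, k=2))
print(nucleotide_binary_profile(example))
print(accumulated_nucleotide_frequency(example))
```

Concatenating the three outputs gives one fixed-length feature vector per sequence when all sequences share the same length, which is the usual setup for enhancer benchmark data.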
13.
Methods ; 207: 29-37, 2022 11.
Article in English | MEDLINE | ID: mdl-36087888

ABSTRACT

DNA-binding proteins actively participate in life activities such as DNA replication, recombination, gene expression and regulation and play a prominent role in these processes. As the number of known DNA-binding proteins continues to grow, it is imperative to design an efficient and accurate identification tool. Considering the time-consuming and expensive traditional experimental technology and the insufficient number of samples in the biological computing method based on structural information, we proposed a machine learning algorithm based on sequence information to identify DNA-binding proteins, named multi-view Least Squares Support Vector Machine via Hilbert-Schmidt Independence Criterion (multi-view LSSVM via HSIC). This method took 6 feature sets as multi-view input and trained a single view through the LSSVM algorithm. Then, we integrated HSIC into LSSVM as a regularization term to reduce the dependence between views and explored the complementary information of multiple views. Subsequently, we trained and coordinated the submodels and finally combined the submodels in the form of weights to obtain the final prediction model. On training set PDB1075, the prediction results of our model were better than those of most existing methods. Independent tests were conducted on the datasets PDB186 and PDB2272. The accuracy of the prediction results was 85.5% and 79.36%, respectively. This result exceeded the current state-of-the-art methods, which showed that the multi-view LSSVM via HSIC can be used as an efficient predictor.


Subject(s)
DNA-Binding Proteins , Support Vector Machine , DNA-Binding Proteins/chemistry , Least-Squares Analysis , Machine Learning , Algorithms
14.
Int J Mol Sci ; 24(12)2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37373163

ABSTRACT

High-fat diet (HFD)-induced insulin resistance (IR) in skeletal muscle is often accompanied by mitochondrial dysfunction and oxidative stress. Boosting nicotinamide adenine dinucleotide (NAD) using nicotinamide riboside (NR) can effectively decrease oxidative stress and increase mitochondrial function. However, whether NR can ameliorate IR in skeletal muscle is still inconclusive. We fed male C57BL/6J mice with an HFD (60% fat) ± 400 mg/kg·bw NR for 24 weeks. C2C12 myotube cells were treated with 0.25 mM palmitic acid (PA) ± 0.5 mM NR for 24 h. Indicators for IR and mitochondrial dysfunction were analyzed. NR treatment alleviated IR in HFD-fed mice, as shown by improved glucose tolerance and a remarkable decrease in the levels of fasting blood glucose, fasting insulin and HOMA-IR index. NR-treated HFD-fed mice also showed improved metabolic status, with a significant reduction in body weight and lipid contents in serum and the liver. NR activated AMPK in the skeletal muscle of HFD-fed mice and PA-treated C2C12 myotube cells and upregulated the expression of mitochondria-related transcriptional factors and coactivators, thereby improving mitochondrial function and alleviating oxidative stress. Upon inhibiting AMPK using Compound C, NR lost its ability to enhance mitochondrial function and protect against IR induced by PA. In summary, improving mitochondrial function through the activation of the AMPK pathway in skeletal muscle may play an important role in the amelioration of IR using NR.


Subject(s)
Insulin Resistance , Male , Mice , Animals , Insulin Resistance/physiology , AMP-Activated Protein Kinases/metabolism , Mice, Inbred C57BL , Mitochondria , Muscle, Skeletal/metabolism , Insulin/metabolism , Palmitic Acid/pharmacology , Palmitic Acid/metabolism , Diet, High-Fat/adverse effects
15.
Brief Bioinform ; 21(5): 1628-1640, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31697319

ABSTRACT

Human protein subcellular localization has important research value for understanding biological processes, as well as for elucidating protein functions and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. The purpose of this paper is to summarize the progress of research on the subcellular localization of human proteins in recent years, including commonly used data sets proposed by the predecessors and the performance of all selected prediction tools against the same benchmark data set. We carry out a systematic evaluation of several publicly available subcellular localization prediction methods on various benchmark data sets. Among them, we find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in performance, as measured by the value of accuracy, relative to the other methods. Meanwhile, we build a new data set using the latest version of the UniProt database and construct a new GO-based prediction method, HumLoc-LBCI, in this paper. Then, we test all selected prediction tools on the new data set. Finally, we discuss the possible development directions of human protein subcellular localization. Availability: The codes and data are available from http://www.lbci.cn/syn/.


Subject(s)
Internet , Proteins/metabolism , Subcellular Fractions/metabolism , Benchmarking , Datasets as Topic , Humans
16.
Int J Mol Sci ; 23(6)2022 Mar 11.
Article in English | MEDLINE | ID: mdl-35328461

ABSTRACT

Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming. In addition, there are few computational tools for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, data were partitioned for training and testing. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained through a combination of random forest and nucleotide chemical property modeling. The optimized model presented high sensitivity and specificity values of 0.9688 and 0.9706 in independent tests, respectively. Our proposed model surpassed published tools in independent tests. Furthermore, a series of validations across several aspects was conducted in order to demonstrate the robustness and reliability of our model.


Subject(s)
Algorithms , Nucleotides , Computational Biology/methods , RNA, Transfer , Reproducibility of Results
17.
BMC Bioinformatics ; 22(Suppl 3): 291, 2021 May 31.
Article in English | MEDLINE | ID: mdl-34058979

ABSTRACT

BACKGROUND: DNA-Binding Proteins (DBP) play a pivotal role in biological systems. A mounting number of researchers are studying the mechanism and detection methods. To detect DBP, the traditional experimental method is time-consuming and resource-consuming. In recent years, Machine Learning methods have been used to detect DBP. However, it is difficult to adequately describe the information of proteins in predicting DNA-binding proteins. In this study, we extract six features from the protein sequence and use Multiple Kernel Learning based on Centered Kernel Alignment to integrate these features. The integrated feature is fed into a Support Vector Machine to build a predictive model and detect new DBP. RESULTS: In our work, the data sets PDB1075 and PDB186 are employed to test our method. From the results, our model obtains better results (accuracy) than other existing methods on PDB1075 ([Formula: see text]) and PDB186 ([Formula: see text]), respectively. CONCLUSION: Multiple kernel learning could fuse the complementary information between different features. Compared with existing methods, our method achieves comparable and best results on benchmark data sets.


Subject(s)
DNA-Binding Proteins , Support Vector Machine , Machine Learning
18.
BMC Bioinformatics ; 22(Suppl 3): 431, 2021 Sep 08.
Article in English | MEDLINE | ID: mdl-34496763

ABSTRACT

BACKGROUND: RNA secondary structure prediction is an important research topic in the field of bioinformatics. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Traditional machine learning methods cannot effectively apply protein sequence information with different sequence lengths to the prediction process due to the constraints of the model itself when predicting the RNA secondary structure. In addition, there is a large difference between the number of paired bases and the number of unpaired bases in the RNA sequences, which means that the imbalance between positive and negative samples can easily cause the model to fall into a local optimum. To solve the above problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. The model can accept sequences with different lengths through the introduction of a flag vector. The model can also make full use of the base information before and after the predicted base and can avoid losing part of the information due to truncation. Introducing a weight vector that dynamically adjusts the loss function of each base when training on the RNA set solves the problem of sample imbalance. RESULTS: The algorithm proposed in this paper is compared with the existing algorithms on five representative subsets of the data set RNA STRAND. The experimental results show that the accuracy and Matthews correlation coefficient of the method are improved by 4.7% and 11.4%, respectively. CONCLUSIONS: The flag vector introduced allows the model to effectively use the information before and after the protein sequence; the introduced weight vector solves the problem of sample imbalance. Compared with other algorithms, the VLDB GRU algorithm proposed in this paper has the best detection results.


Subject(s)
Neural Networks, Computer , RNA , Algorithms , Nucleic Acid Conformation , Protein Structure, Secondary , RNA/genetics
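The abstract above reports both accuracy and the Matthews correlation coefficient (MCC), and notes that paired and unpaired bases are imbalanced. The sketch below shows why the two metrics can diverge on imbalanced labels. It uses the standard MCC formula on a made-up toy labelling (1 = paired, 0 = unpaired, an encoding assumed here for illustration); it is not taken from the paper's evaluation code.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)); 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical imbalanced example: 2 paired bases among 8 positions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts))            # 0.75
print(round(matthews_corrcoef(*counts), 4))
```

Here accuracy looks respectable because the majority class dominates, while the MCC stays low, which is the usual argument for reporting MCC alongside accuracy on imbalanced data.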
19.
BMC Genomics ; 22(1): 605, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-34372777

ABSTRACT

BACKGROUND: Identifying potential associations between genes and diseases via biomedical experiments is time-consuming and expensive. Computational technologies based on machine learning models have been widely utilized to explore genetic information related to complex diseases. Importantly, gene-disease association detection can be defined as a link prediction problem in a bipartite network. However, many existing methods do not utilize multiple sources of biological information. Additionally, they do not extract higher-order relationships among genes and diseases. RESULTS: In this study, we propose a novel method called Dual Hypergraph Regularized Least Squares (DHRLS) with Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL), in order to detect all potential gene-disease associations. First, we construct multiple kernels based on various biological data sources in gene and disease spaces respectively. After that, we use CKA-MKL to obtain the optimal kernels in the two spaces respectively. Specifically, hypergraphs are employed to establish higher-order relationships. Finally, our DHRLS model is solved by the Alternating Least squares algorithm (ALSA), for predicting gene-disease associations. CONCLUSION: Compared with many outstanding prediction tools, DHRLS achieves the best performance on the gene-disease association network under two types of cross validation. In robustness tests, our proposed approach also shows excellent prediction performance on six real-world networks. Our research work can effectively discover potential disease-associated genes and provide guidance for the follow-up verification methods of complex diseases.


Subject(s)
Algorithms , Gene Regulatory Networks , Humans , Least-Squares Analysis , Machine Learning
20.
BMC Genomics ; 22(1): 56, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33451286

ABSTRACT

BACKGROUND: Biological functions of biomolecules rely on the cellular compartments where they are located in cells. Importantly, RNAs are assigned to specific locations of a cell, enabling the cell to carry out diverse biochemical processes concurrently. However, many existing RNA subcellular localization classifiers only solve the problem of single-label classification. It is of great practical significance to extend RNA subcellular localization to a multi-label classification problem. RESULTS: In this study, we extract multi-label classification datasets about RNA-associated subcellular localizations on various types of RNAs, and then construct subcellular localization datasets on four RNA categories. In order to study Homo sapiens, we further establish human RNA subcellular localization datasets. Furthermore, we utilize different nucleotide property composition models to extract effective features to adequately represent the important information of nucleotide sequences. In the most critical part, we address a major challenge: fusing the multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion. The optimal combined kernel can be put into an integration support vector machine model for identifying multi-label RNA subcellular localizations. Our method obtained excellent average precision values of 0.703, 0.757, 0.787 and 0.800, respectively, on four RNA data sets. CONCLUSION: Our novel method outperforms other prediction tools on novel benchmark datasets. Moreover, we establish a user-friendly web server implementing our method.


Subject(s)
Computational Biology , Proteins , Databases, Protein , Humans , RNA/genetics , Support Vector Machine