Search | BVS España Search Portal

1.

Analysis of Expression Pattern of snoRNAs in Different Cancer Types with Machine Learning Algorithms.

Pan, Xiaoyong; Chen, Lei; Feng, Kai-Yan; Hu, Xiao-Hua; Zhang, Yu-Hang; Kong, Xiang-Yin; Huang, Tao; Cai, Yu-Dong.

Int J Mol Sci ; 20(9)2019 May 02.

Article in English | MEDLINE | ID: mdl-31052553

ABSTRACT

Small nucleolar RNAs (snoRNAs) are a new type of functional small RNAs involved in the chemical modifications of rRNAs, tRNAs, and small nuclear RNAs. It is reported that they play important roles in tumorigenesis via various regulatory modes. snoRNAs can both participate in the regulation of methylation and pseudouridylation and regulate the expression pattern of their host genes. This research investigated the expression pattern of snoRNAs in eight major cancer types in TCGA via several machine learning algorithms. The expression levels of snoRNAs were first analyzed by a powerful feature selection method, Monte Carlo feature selection (MCFS). A feature list and some informative features were obtained. Then, incremental feature selection (IFS) was applied to the feature list to extract the optimal features/snoRNAs, with which the support vector machine (SVM) yields the best performance. The discriminative snoRNAs included HBII-52-14, HBII-336, SNORD123, HBII-85-29, HBII-420, U3, HBI-43, SNORD116, SNORA73B, SCARNA4, HBII-85-20, etc., on which the SVM can provide a Matthews correlation coefficient (MCC) of 0.881 for predicting these eight cancer types. On the other hand, the informative features were fed into the Johnson reducer and repeated incremental pruning to produce error reduction (RIPPER) algorithms to generate classification rules, which can clearly show different snoRNA expression patterns in different cancer types. The analysis results indicated that the extracted discriminative snoRNAs can be important for identifying cancer samples of different types and that the expression pattern of snoRNAs in different cancer types can be partly uncovered by quantitative recognition rules.

Subject(s)

Gene Expression Regulation, Neoplastic , Machine Learning , Neoplasms/genetics , RNA, Small Nucleolar/genetics , Algorithms , Humans , Monte Carlo Method , Support Vector Machine
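
The abstract above reports a Matthews correlation coefficient (MCC) of 0.881 over eight cancer types but does not state which multiclass form was used. As a minimal sketch, assuming the common multiclass generalization computed from label counts (the function name and the toy labels are illustrative and not taken from the paper), the coefficient could be computed like this:

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC from label counts (assumed generalization, not confirmed by the paper)."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    t_counts = Counter(y_true)                        # how often each class truly occurs
    p_counts = Counter(y_pred)                        # how often each class is predicted
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # Return 0.0 when the coefficient is undefined (e.g. only one class present).
    return cov_tp / denom if denom else 0.0


# Toy labels (hypothetical): three cancer types, one sample misclassified.
toy_true = ["BRCA", "BRCA", "LUAD", "LUAD", "KIRC", "KIRC"]
toy_pred = ["BRCA", "BRCA", "LUAD", "KIRC", "KIRC", "KIRC"]
print(multiclass_mcc(toy_true, toy_pred))
```

For two classes this form reduces to the familiar binary MCC, so the same helper can be reused for binary predictors.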

2.

Prediction of drug target groups based on chemical-chemical similarities and chemical-chemical/protein connections.

Chen, Lei; Lu, Jing; Luo, Xiaomin; Feng, Kai-Yan.

Biochim Biophys Acta ; 1844(1 Pt B): 207-13, 2014 Jan.

Article in English | MEDLINE | ID: mdl-23732562

ABSTRACT

Drug-target interaction is a key research topic in drug discovery since correct identification of target proteins of drug candidates can help screen out those with unacceptable toxicities, thereby saving expense. In this study, we developed a novel computational approach to predict drug target groups that may reduce the number of candidate target proteins associated with a query drug. A benchmark dataset, consisting of 3028 drugs assigned within nine categories, was constructed by collecting data from KEGG. The nine categories are (1) G protein-coupled receptors, (2) cytokine receptors, (3) nuclear receptors, (4) ion channels, (5) transporters, (6) enzymes, (7) protein kinases, (8) cellular antigens and (9) pathogens. The proposed method combines the data gleaned from chemical-chemical similarities, chemical-chemical connections and chemical-protein connections to allocate drugs to each of the nine target groups. A jackknife test applied to the training dataset that was constructed from the benchmark dataset, provided an overall correct prediction rate of 87.45%, as compared to 87.79% for the test dataset that was constructed by randomly selecting 10% of samples from the benchmark dataset. These prediction rates are much higher than the 11.11% achieved by random guesswork. These promising results suggest that the proposed method can become a useful tool in identifying drug target groups. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.

Subject(s)

Databases, Protein , Drug Design , Proteins/chemistry , Receptors, G-Protein-Coupled/chemistry , Algorithms , Drug Interactions , Humans , Ion Channels/chemistry , Molecular Targeted Therapy , Receptors, Cytoplasmic and Nuclear/chemistry

3.

Predicting DNA-binding sites of proteins based on sequential and 3D structural information.

Li, Bi-Qing; Feng, Kai-Yan; Ding, Juan; Cai, Yu-Dong.

Mol Genet Genomics ; 289(3): 489-99, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24448651

ABSTRACT

Protein-DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein-DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein-DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. We incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data to predict the protein-DNA interaction sites. Feature analysis showed that 3D structural features indeed contributed to the prediction of DNA-binding site and it was demonstrated that the prediction performance was better with 3D structural features than without them. It was also shown via analysis of features from each site that the features of DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying the DNA-binding sites and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein-DNA interaction.

Subject(s)

Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Support Vector Machine , Algorithms , DNA/metabolism , DNA-Binding Proteins/metabolism , Molecular Conformation , Protein Binding , Reproducibility of Results

4.

Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity.

Ma, Qing-Lan; Huang, Fei-Ming; Guo, Wei; Feng, Kai-Yan; Huang, Tao; Cai, Yu-Dong.

Life (Basel) ; 13(6)2023 May 31.

Article in English | MEDLINE | ID: mdl-37374086

ABSTRACT

Vaccines trigger an immunological response that includes B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected data on blood antibody levels in a cohort of healthcare workers vaccinated for COVID-19 and obtained 73 antigens in samples from four groups according to the duration after vaccination, including 104 unvaccinated healthcare workers, 534 healthcare workers within 60 days after vaccination, 594 healthcare workers between 60 and 180 days after vaccination, and 141 healthcare workers over 180 days after vaccination. Our work was a reanalysis of the data originally collected at Irvine University. This data was obtained in Orange County, California, USA, with the collection process commencing in December 2020. British variant (B.1.1.7), South African variant (B.1.351), and Brazilian/Japanese variant (P.1) were the most prevalent strains during the sampling period. An efficient machine learning based framework containing four feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and maximum relevance minimum redundancy) and four classification algorithms (decision tree, k-nearest neighbor, random forest, and support vector machine) was designed to select essential antibodies against specific antigens. Several efficient classifiers with a weighted F1 value around 0.75 were constructed. The antigen microarray used for identifying antibody levels in the coronavirus features ten distinct SARS-CoV-2 antigens, comprising various segments of both nucleocapsid protein (NP) and spike protein (S). 
This study revealed that S1 + S2, S1.mFcTag, S1.HisTag, S1, S2, Spike.RBD.His.Bac, Spike.RBD.rFc, and S1.RBD.mFc were most highly ranked among all features, where S1 and S2 are the subunits of Spike, and the suffixes represent the tagging information of different recombinant proteins. Meanwhile, the classification rules were obtained from the optimal decision tree to explain quantitatively the roles of antigens in the classification. This study identified antibodies associated with decreased clinical immunity based on populations with different time spans after vaccination. These antibodies have important implications for maintaining long-term immunity to SARS-CoV-2.

5.

Patterns of Gene Expression Profiles Associated with Colorectal Cancer in Colorectal Mucosa by Using Machine Learning Methods.

Ren, Jing Xin; Chen, Lei; Guo, Wei; Feng, Kai Yan; Huang, Tao; Cai, Yu-Dong.

Comb Chem High Throughput Screen ; 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37957897

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) has a very high incidence and lethality rate and is one of the most dangerous cancer types. Timely diagnosis can effectively reduce the incidence of colorectal cancer. Changes in para-cancerous tissues may serve as an early signal for tumorigenesis. Comparison of the differences in gene expression between para-cancerous and normal mucosa can help in the diagnosis of CRC and understanding the mechanisms of development. OBJECTIVES: This study aimed to identify specific genes at the level of gene expression, which are expressed in normal mucosa and may be predictive of CRC risk. METHODS: A machine learning approach was used to analyze transcriptomic data in 459 samples of normal colonic mucosal tissue from 322 CRC cases and 137 non-CRC, in which each sample contained 28,706 gene expression levels. The genes were ranked using four ranking methods based on importance estimation (LASSO, LightGBM, MCFS, mRMR, and RF) and four classification algorithms (decision tree [DT], K-nearest neighbor [KNN], random forest [RF], and support vector machine [SVM]) were combined with incremental feature selection [IFS] methods to construct a prediction model with excellent performance. RESULT: The top-ranked genes, namely, HOXD12, CDH1, and S100A12, were associated with tumorigenesis based on previous studies. CONCLUSION: This study summarized four sets of quantitative classification rules based on the DT algorithm, providing clues for understanding the microenvironmental changes caused by CRC. According to the rules, the effect of CRC on normal mucosa can be determined.

6.

Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes.

Ren, Jing-Xin; Gao, Qian; Zhou, Xiao-Chao; Chen, Lei; Guo, Wei; Feng, Kai-Yan; Lu, Lin; Huang, Tao; Cai, Yu-Dong.

Biology (Basel) ; 12(7)2023 Jul 02.

Article in English | MEDLINE | ID: mdl-37508378

ABSTRACT

As COVID-19 develops, dynamic changes occur in the patient's immune system. Changes in molecular levels in different immune cells can reflect the course of COVID-19. This study aims to uncover the molecular characteristics of different immune cell subpopulations at different stages of COVID-19. We designed a machine learning workflow to analyze scRNA-seq data of three immune cell types (B, T, and myeloid cells) in four levels of COVID-19 severity/outcome. The datasets for three cell types included 403,700 B-cell, 634,595 T-cell, and 346,547 myeloid cell samples. Each cell subtype was divided into four groups, control, convalescence, progression mild/moderate, and progression severe/critical, and each immune cell contained 27,943 gene features. A feature analysis procedure was applied to the data of each cell type. Irrelevant features were first excluded according to their relevance to the target variable measured by mutual information. Then, four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and max-relevance and min-redundancy) were adopted to analyze the remaining features, resulting in four feature lists. These lists were fed into the incremental feature selection, incorporating three classification algorithms (decision tree, k-nearest neighbor, and random forest) to extract key gene features and construct classifiers with superior performance. The results confirmed that genes such as PFN1, RPS26, and FTH1 played important roles in SARS-CoV-2 infection. These findings provide a useful reference for the understanding of the ongoing effect of COVID-19 development on the immune system.

7.

Predicting the network of substrate-enzyme-product triads by combining compound similarity and functional domain composition.

Chen, Lei; Feng, Kai-Yan; Cai, Yu-Dong; Chou, Kuo-Chen; Li, Hai-Peng.

BMC Bioinformatics ; 11: 293, 2010 May 31.

Article in English | MEDLINE | ID: mdl-20513238

ABSTRACT

BACKGROUND: Metabolic pathway is a highly regulated network consisting of many metabolic reactions involving substrates, enzymes, and products, where substrates can be transformed into products with particular catalytic enzymes. Since experimental determination of the network of substrate-enzyme-product triad (whether the substrate can be transformed into the product with a given enzyme) is both time-consuming and expensive, it would be very useful to develop a computational approach for predicting the network of substrate-enzyme-product triads. RESULTS: A mathematical model for predicting the network of substrate-enzyme-product triads was developed. Meanwhile, a benchmark dataset was constructed that contains 744,192 substrate-enzyme-product triads, of which 14,592 are networking triads, and 729,600 are non-networking triads; i.e., the number of the negative triads was about 50 times the number of the positive triads. The molecular graph was introduced to calculate the similarity between the substrate compounds and between the product compounds, while the functional domain composition was introduced to calculate the similarity between enzyme molecules. The nearest neighbour algorithm was utilized as a prediction engine, in which a novel metric was introduced to measure the "nearness" between triads. To train and test the prediction engine, one tenth of the positive triads and one tenth of the negative triads were randomly picked from the benchmark dataset as the testing samples, while the remaining were used to train the prediction model. It was observed that the overall success rate in predicting the network for the testing samples was 98.71%, with 95.41% success rate for the 1,460 testing networking triads and 98.77% for the 72,960 testing non-networking triads. 
CONCLUSIONS: It is quite promising and encouraged to use the molecular graph to calculate the similarity between compounds and use the functional domain composition to calculate the similarity between enzymes for studying the substrate-enzyme-product network system. The software is available upon request.

Subject(s)

Algorithms , Enzymes/metabolism , Binding Sites , Kinetics , Metabolic Networks and Pathways , Structure-Activity Relationship , Substrate Specificity
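
The dataset figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch using the numbers as printed; the variable names are illustrative, and the overall rate is recomputed here as a sample-weighted average, which is an assumption about how the authors aggregated it.

```python
# Counts quoted in the abstract.
positive_triads = 14_592      # networking triads
negative_triads = 729_600     # non-networking triads
total_triads = positive_triads + negative_triads
neg_to_pos_ratio = negative_triads / positive_triads

# Held-out test samples and the per-class success rates quoted in the abstract.
test_pos, test_neg = 1_460, 72_960
rate_pos, rate_neg = 0.9541, 0.9877

# Assumed aggregation: success rate weighted by the number of test samples per class.
overall_rate = (rate_pos * test_pos + rate_neg * test_neg) / (test_pos + test_neg)

print(total_triads, neg_to_pos_ratio, round(overall_rate, 4))
```

The recomputed total and the 50-to-1 ratio match the abstract, and the weighted rate comes out near 98.7%, in line with the reported 98.71%; any small gap may simply reflect rounding of the per-class rates.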

8.

HIV-1 protease cleavage site prediction based on amino acid property.

Niu, Bing; Lu, Lin; Liu, Liang; Gu, Tian Hong; Feng, Kai-Yan; Lu, Wen-Cong; Cai, Yu-Dong.

J Comput Chem ; 30(1): 33-9, 2009 Jan 15.

Article in English | MEDLINE | ID: mdl-18496789

ABSTRACT

Knowledge of the polyprotein cleavage sites by HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. Recently, several works have approached the HIV-1 protease specificity problem by applying a number of classifier creation and combination methods. The pace in searching for the proper inhibitors of HIV protease will be greatly expedited if one can find an accurate, robust, and rapid method for predicting the cleavage sites in proteins by HIV protease. In this article, we selected HIV-1 protease as the subject of the study. 299 oligopeptides were chosen for the training set, while the other 63 oligopeptides were taken as a test set. The peptides are represented by features constructed by AAIndex (Kawashima et al., Nucleic Acids Res 1999, 27, 368; Kawashima and Kanehisa, Nucleic Acids Res 2000, 28, 374). The mRMR method (Maximum Relevance, Minimum Redundancy; Ding and Peng, Proc Second IEEE Comput Syst Bioinformatics Conf 2003, 523; Peng et al., IEEE Trans Pattern Anal Mach Intell 2005, 27, 1226), combined with incremental feature selection (IFS) and feature forward search (FFS), is applied to find the two important cleavage sites and to select 364 important biochemistry features by jackknife test. Using KNN (K-nearest neighbors) to combine the selected features, the prediction model achieves a high accuracy of 91.3% for the jackknife cross-validation test and 87.3% for the independent-set test. It is expected that our feature selection scheme can serve as a useful assistant technique for finding effective inhibitors of HIV protease, especially for the scientists in this field.

Subject(s)

Amino Acids/chemistry , HIV Protease/chemistry , HIV-1/enzymology , Oligopeptides/chemistry , Algorithms , Binding Sites , Computational Biology , HIV Protease/metabolism , Models, Chemical , Oligopeptides/metabolism , Structure-Activity Relationship , Substrate Specificity
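
Several abstracts in this list, including the one above, pair a ranked feature list with incremental feature selection (IFS) and a jackknife (leave-one-out) test. The sketch below illustrates that general idea only. It assumes the ranking is already given (mRMR itself is not implemented), uses a 1-nearest-neighbour classifier with Euclidean distance, and runs on hypothetical toy data; none of the names come from the papers.

```python
from math import dist


def loo_accuracy(X, y, feature_idx):
    """Jackknife (leave-one-out) accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        query = [xi[f] for f in feature_idx]
        best_label, best_d = None, float("inf")
        for j, (xj, yj) in enumerate(zip(X, y)):
            if i == j:
                continue  # leave the query sample out of the training set
            d = dist(query, [xj[f] for f in feature_idx])
            if d < best_d:
                best_d, best_label = d, yj
        correct += best_label == yi
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-1, top-2, ... feature subsets and keep the best one."""
    best_k, best_acc, curve = 0, -1.0, []
    for k in range(1, len(ranked_features) + 1):
        acc = loo_accuracy(X, y, ranked_features[:k])
        curve.append(acc)
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc, curve


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.0], [0.9, -3.5]]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy, ifs_curve = incremental_feature_selection(X, y, [0, 1])
print(selected, accuracy, ifs_curve)
```

On the toy data the noisy second feature lowers the jackknife accuracy, so IFS stops at the first feature; real studies apply the same loop to hundreds of ranked features.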

9.

Predicting subcellular localization with AdaBoost Learner.

Jin, Yu-Huan; Niu, Bing; Feng, Kai-Yan; Lu, Wen-Cong; Cai, Yu-Dong; Li, Guo-Zheng.

Protein Pept Lett ; 15(3): 286-9, 2008.

Article in English | MEDLINE | ID: mdl-18336359

ABSTRACT

Protein subcellular localization, which tells where a protein resides in a cell, is an important characteristic of a protein, and relates closely to the function of proteins. The prediction of their subcellular localization plays an important role in the prediction of protein function, genome annotation and drug design. Therefore, it is an important and challenging task to predict subcellular localization using a bioinformatics approach. In this paper, a robust predictor, AdaBoost Learner, is introduced to predict protein subcellular localization based on its amino acid composition. Jackknife cross-validation and an independent dataset test were used to demonstrate that AdaBoost is a robust and efficient model in predicting protein subcellular localization. As a result, the correct prediction rates were 74.98% and 80.12% for the jackknife test and the independent dataset test respectively, which are higher than those of other existing predictors. An online server for predicting subcellular localization of proteins based on the AdaBoost classifier was made available at http://chemdata.shu.edu.cn/sl12.

Subject(s)

Algorithms , Proteins/analysis , Computer Simulation , Databases, Protein , Proteins/chemistry , Proteins/metabolism , Software , Subcellular Fractions/chemistry , Subcellular Fractions/metabolism

10.

Predicting membrane protein types with bagging learner.

Niu, Bing; Jin, Yu-Huan; Feng, Kai-Yan; Liu, Liang; Lu, Wen-Cong; Cai, Yu-Dong; Li, Guo-Zheng.

Protein Pept Lett ; 15(6): 590-4, 2008.

Article in English | MEDLINE | ID: mdl-18680454

ABSTRACT

The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains therein. Many investigators have devoted their efforts to the prediction of membrane protein type. Here, we propose a new approach, the bootstrap aggregating method or bagging learner, to address this problem based on the protein amino acid composition. As a demonstration, the benchmark dataset constructed by K.C. Chou and D.W. Elrod was used to test the new method. The overall success rate thus obtained by jackknife cross-validation was over 84%, indicating that the bagging learner as presented in this paper holds quite high potential in predicting the attributes of proteins, or at least can play a complementary role to many existing algorithms in this area. It is anticipated that the prediction quality can be further enhanced if the pseudo amino acid composition can be effectively incorporated into the current predictor. An online membrane protein type prediction web server developed in our lab is available at http://chemdata.shu.edu.cn/protein/protein.jsp.

Subject(s)

Algorithms , Membrane Proteins/chemistry , Sequence Analysis, Protein/methods , Databases, Protein , Internet , Protein Conformation

11.

Support vector machine for predicting alpha-turn types.

Cai, Yu-Dong; Feng, Kai-Yan; Li, Yi-Xue; Chou, Kuo-Chen.

Peptides ; 24(4): 629-30, 2003 Apr.

Article in English | MEDLINE | ID: mdl-12860209

ABSTRACT

Tight turns play an important role in globular proteins from both the structural and functional points of view. Of tight turns, beta-turns and gamma-turns have been extensively studied, but alpha-turns have been little investigated. Recently, a systematic search for alpha-turns classified alpha-turns into nine different types according to their backbone trajectory features. In this paper, the support vector machine (SVM), a new machine learning method, is proposed for predicting the alpha-turn types in proteins. The high rates of correct prediction imply that the formation of different alpha-turn types is evidently correlated with the sequence of a pentapeptide, and hence can be approximately predicted based on the sequence information of the pentapeptide alone, although the incorporation of its interaction with the other part of a protein, the so-called "long distance interaction", will further improve the prediction quality.

Subject(s)

Computational Biology/methods , Protein Conformation , Protein Structure, Secondary , Algorithms , Amino Acid Sequence , Genetic Vectors , Molecular Sequence Data , Peptides/chemistry , Software

12.

Prediction of cancer drugs by chemical-chemical interactions.

Lu, Jing; Huang, Guohua; Li, Hai-Peng; Feng, Kai-Yan; Chen, Lei; Zheng, Ming-Yue; Cai, Yu-Dong.

PLoS One ; 9(2): e87791, 2014.

Article in English | MEDLINE | ID: mdl-24498372

ABSTRACT

Cancer, which is a leading cause of death worldwide, places a heavy burden on health-care systems. In this study, an order-prediction model was built to predict a series of cancer drug indications based on chemical-chemical interactions. According to the confidence scores of their interactions, the order from the most likely cancer to the least likely one was obtained for each query drug. The first-order prediction accuracy on the training dataset was 55.93%, evaluated by the jackknife test, while it was 55.56% and 59.09% on a validation test dataset and an independent test dataset, respectively. The proposed method outperformed a popular method based on molecular descriptors. Moreover, it was verified that some drugs were effective for the 'wrong' predicted indications, indicating that some 'wrong' drug indications were actually correct indications. Given these promising results, the method may become a useful tool for the prediction of drug indications.

Subject(s)

Antineoplastic Agents/pharmacology , Drug Interactions , Informatics/methods , Models, Theoretical , Neoplasms/drug therapy , Humans
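
The order-prediction idea in the abstract above (ranking candidate indications by interaction confidence scores) can be sketched as follows. The abstract does not say how scores are aggregated per indication, so taking the maximum score over labelled drugs is an assumption here, and the drug names, indications, and scores are all hypothetical.

```python
def rank_indications(query, labelled_drugs, interaction_score):
    """Order indications from most to least likely for a query drug.

    Assumption: an indication's score is the highest interaction confidence
    between the query and any labelled drug carrying that indication.
    """
    best = {}
    for drug, indications in labelled_drugs.items():
        score = interaction_score.get(frozenset((query, drug)), 0.0)
        for indication in indications:
            best[indication] = max(best.get(indication, 0.0), score)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(best, key=lambda ind: (-best[ind], ind))


# Hypothetical training drugs with known indications.
labelled_drugs = {
    "drugA": {"breast"},
    "drugB": {"lung"},
    "drugC": {"breast", "colorectal"},
}
# Hypothetical chemical-chemical interaction confidence scores.
interaction_score = {
    frozenset(("query", "drugA")): 0.9,
    frozenset(("query", "drugB")): 0.4,
    frozenset(("query", "drugC")): 0.7,
}
ranking = rank_indications("query", labelled_drugs, interaction_score)
print(ranking)
```

Under this reading, the first-order prediction is simply the first element of the returned list, and first-order accuracy is the fraction of query drugs for which that element is a true indication.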

13.

Using WPNNA classifier in ubiquitination site prediction based on hybrid features.

Feng, Kai-Yan; Huang, Tao; Feng, Kai-Rui; Liu, Xiao-Jun.

Protein Pept Lett ; 20(3): 318-23, 2013 Mar.

Article in English | MEDLINE | ID: mdl-22591471

ABSTRACT

Ubiquitination, a reversible protein post-translational modification (PTM), occurs when an amide bond is formed between ubiquitin (a small protein) and the targeted protein. It is involved in a wide variety of cellular processes and is associated with various diseases such as Alzheimer's disease. In order to understand ubiquitination at the molecular level, it is important to identify the ubiquitination site to which ubiquitin binds. Since experimental methods to determine ubiquitination sites are both expensive and time-consuming, it is necessary to develop in-silico methods to predict ubiquitination sites based merely on the sequence information of the target protein. In this paper, we apply a new classifier called the weighted passive nearest neighbor algorithm (WPNNA) to predict ubiquitination sites. WPNNA was demonstrated to be insensitive to the varied datum densities between different classes. A hybrid of features, including PSSM conservation scores, amino acid factors and disorder scores, is employed to code the protein fragments centered on the possible ubiquitination sites. The Matthews correlation coefficient (MCC) of our predictor on a training dataset is 0.169 with sensitivity of 31.6% and specificity of 82.9%, and on an independent test dataset is 0.403 with sensitivity of 64.3% and specificity of 75.7%. We compare our predictor with that of a recently published paper which also made predictions on the same datasets. Our predictor achieves much better sensitivities on both datasets than the paper and achieves a much better MCC than the paper on the independent test dataset, indicating that the predictor based on WPNNA is at least a good complement to the current state of the art in ubiquitination site prediction.

Subject(s)

Amino Acids/chemistry , Proteins , Ubiquitin , Ubiquitination , Algorithms , Animals , Binding Sites , Computational Biology/methods , Humans , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein , Ubiquitin/chemistry , Ubiquitin/metabolism

14.

Prediction of protein-protein interactions based on feature selection and data balancing.

Liu, Liang; Lu, Wen-Cong; Cai, Yu-Dong; Feng, Kai-Yan; Peng, Chunrong; Zhu, Yubei.

Protein Pept Lett ; 20(3): 336-45, 2013 Mar.

Article in English | MEDLINE | ID: mdl-22591478

ABSTRACT

Computational approaches are able to analyze protein-protein interactions (PPIs) from a different angle of view by complementing the experimental ones. And they are very efficient in determining whether two proteins can interact with each other. In this paper, KNNs (K-nearest neighbors) is applied to predict the PPIs by coding each protein with the physical and chemical properties of its residues, predicted secondary structures and amino acid compositions. mRMR (minimum-redundancy maximum-relevance) feature selection is adopted to select a compact feature set, features of which are considered to be important for the determination of PPI-nesses. Because the size of the negative dataset (containing non-interactive protein pairs) is much larger than that of the positive dataset (containing interactive protein pairs), the negative dataset is divided into 5 portions and each portion is combined with the positive dataset for one prediction. Thus 5 predictions are performed and the final results are obtained through voting. As a result, the prediction achieves an overall accuracy of 0.8369 with sensitivity of 0.7356. The predictor, developed by this research for the prediction of the fruit fly PPI-nesses, is available for public use at http://chemdata.shu.edu.cn/ppip.

Subject(s)

Amino Acids/chemistry , Computational Biology/methods , Protein Binding , Proteins/chemistry , Algorithms , Protein Interaction Maps
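
The data-balancing scheme described in the abstract above (negative set split into 5 portions, each combined with the positive set, final result by voting) can be illustrated with a small sketch. It assumes a 1-nearest-neighbour base classifier, an interleaved split of the negatives, and made-up two-dimensional points; the paper's actual features, K value, and split procedure are not given in the abstract.

```python
from collections import Counter
from math import dist


def split_into_portions(items, n):
    """Split a list into n interleaved portions of near-equal size."""
    return [items[i::n] for i in range(n)]


def nearest_label(train, query):
    """Label of the training point closest to the query (1-nearest neighbour)."""
    return min(train, key=lambda pair: dist(pair[0], query))[1]


def vote_predict(positives, negatives, query, n_portions=5):
    """Train one balanced model per negative portion and take a majority vote."""
    votes = []
    for portion in split_into_portions(negatives, n_portions):
        train = [(x, 1) for x in positives] + [(x, 0) for x in portion]
        votes.append(nearest_label(train, query))
    return Counter(votes).most_common(1)[0][0], votes


# Hypothetical encoded protein pairs: few positives, many negatives.
positives = [[1.0, 1.0], [1.2, 0.9]]
negatives = [[0.01 * i, 0.0] for i in range(10)]

label_a, votes_a = vote_predict(positives, negatives, [1.1, 1.0])
label_b, votes_b = vote_predict(positives, negatives, [0.0, 0.1])
print(label_a, votes_a, label_b, votes_b)
```

Using an odd number of portions with two classes means the vote can never tie, which is one practical reason to prefer 5 sub-models over 4 or 6.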

15.

Inter- and intra-chain disulfide bond prediction based on optimal feature selection.

Niu, Shen; Huang, Tao; Feng, Kai-Yan; He, Zhisong; Cui, Weiren; Gu, Lei; Li, Haipeng; Cai, Yu-Dong; Li, Yixue.

Protein Pept Lett ; 20(3): 324-35, 2013 Mar.

Article in English | MEDLINE | ID: mdl-22591475

ABSTRACT

Protein disulfide bonds are formed during post-translational modification and have been implicated in various physiological and pathological processes. Proper localization of disulfide bonds also facilitates the prediction of protein three-dimensional (3D) structure. However, it is both time-consuming and labor-intensive to determine disulfide bonds using conventional experimental approaches, especially for large-scale data sets. Since there are also some limitations for disulfide bond prediction based on 3D structure features, developing sequence-based, convenient and fast computational methods for both inter- and intra-chain disulfide bond prediction is necessary. In this study, we developed a computational method for both types of disulfide bond prediction based on the maximum relevance and minimum redundancy (mRMR) method followed by incremental feature selection (IFS), with the nearest neighbor algorithm as its prediction model. Features of sequence conservation, residual disorder, and amino acid factor are used for inter-chain disulfide bond prediction. In addition to these features, the sequential distance between a pair of cysteines is also used for intra-chain disulfide bond prediction. Our approach achieves a prediction accuracy of 0.8702 for inter-chain disulfide bond prediction using 128 features and 0.9219 for intra-chain disulfide bond prediction using 261 features. Analysis of the optimal feature set indicated key features and key sites for disulfide bond formation. Interestingly, comparison of top features between inter- and intra-chain disulfide bonds revealed the similarities and differences of the mechanisms of forming these two types of disulfide bonds, which might help in understanding more of the mechanisms and provide clues for further experimental studies in this research field.

Subject(s)

Amino Acids/chemistry , Cysteine/chemistry , Disulfides/chemistry , Proteins/chemistry , Algorithms , Computational Biology , Molecular Conformation , Protein Folding , Protein Processing, Post-Translational

16.

Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection.

Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang.

Mol Biosyst ; 9(1): 61-9, 2013 Jan 27.

Article in English | MEDLINE | ID: mdl-23117653

ABSTRACT

Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.

Subject(s)

Computational Biology/methods , Enzymes/chemistry , Enzymes/metabolism , Models, Chemical , Catalytic Domain , Chemical Phenomena , Conserved Sequence , Databases, Protein , Decision Trees , Protein Structure, Secondary , Sequence Analysis, Protein , Structure-Activity Relationship , Support Vector Machine

17.

Prediction of effective drug combinations by chemical interaction, protein interaction and target enrichment of KEGG pathways.

Chen, Lei; Li, Bi-Qing; Zheng, Ming-Yue; Zhang, Jian; Feng, Kai-Yan; Cai, Yu-Dong.

Biomed Res Int ; 2013: 723780, 2013.

Article in English | MEDLINE | ID: mdl-24083237

ABSTRACT

Drug combination therapy can be more effective than single agents in treating some complex diseases, owing to better efficacy and reduced side effects. Although some drug combinations are already in use, their underlying molecular mechanisms are still poorly understood. It is therefore of great interest to deduce novel drug combinations from their molecular mechanisms in a robust and rigorous way. This paper attempts to predict effective drug combinations by jointly considering: (1) chemical interactions between drugs, (2) protein interactions between the drugs' targets, and (3) target enrichment of KEGG pathways. A benchmark dataset was constructed, consisting of 121 confirmed effective combinations and 605 random combinations. Each drug combination was represented by 465 features derived from these three properties. Feature selection techniques, including Minimum Redundancy Maximum Relevance and Incremental Feature Selection, were adopted to extract the key features. A random forest model was built, and its performance was evaluated by 5-fold cross-validation. As a result, 55 key features providing the best prediction result were selected. These important features may help to gain insights into the mechanisms of drug combinations, and the proposed prediction model could become a useful tool for screening possible drug combinations.
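To make the evaluation setup more concrete, the following sketch splits a dataset of the size given in the abstract (121 positive and 605 negative combinations) into 5 cross-validation folds. Stratifying the folds by class is an assumption made here to cope with the 1:5 class imbalance; the abstract does not say how the folds were drawn, and the function name and seed are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so that each class is spread
    as evenly as possible across the folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# 1 = confirmed effective combination, 0 = random combination.
LABELS = [1] * 121 + [0] * 605

if __name__ == "__main__":
    for i, fold in enumerate(stratified_folds(LABELS)):
        positives = sum(LABELS[j] for j in fold)
        print(f"fold {i}: {len(fold)} samples, {positives} positives")
```

In each round of cross-validation, one fold would serve as the test set and the remaining four as the training set.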

Subject(s)

Computational Biology/methods; Drug Combinations; Drug Interactions; Pharmaceutical Preparations/metabolism; Proteins/metabolism; Signal Transduction; Algorithms; ROC Curve

18.

Predicting drugs side effects based on chemical-chemical interactions and protein-chemical interactions.

Chen, Lei; Huang, Tao; Zhang, Jian; Zheng, Ming-Yue; Feng, Kai-Yan; Cai, Yu-Dong; Chou, Kuo-Chen.

Biomed Res Int ; 2013: 485034, 2013.

Article in English | MEDLINE | ID: mdl-24078917

ABSTRACT

A drug side effect is an undesirable effect that occurs in addition to the intended therapeutic effect of the drug. Unexpected side effects that many patients suffer from are the major cause of large-scale drug withdrawal. To address this problem, the pharmaceutical industry has a strong demand for computational methods that predict the side effects of drugs. In this study, a novel computational method was developed to predict the side effects of drug compounds by hybridizing chemical-chemical and protein-chemical interactions. Unlike most previous work, our method can rank the potential side effects of any query drug according to their predicted level of risk. To evaluate the method, a training dataset and test datasets were constructed from a benchmark dataset containing 835 drug compounds. In a jackknife test on the training dataset, the first-order prediction accuracy was 86.30%, while it was 89.16% on the test dataset. The new method may become a useful tool for drug design, and the findings obtained by hybridizing various interactions in a network system may also provide useful insights for in-depth pharmacological research, particularly at the level of systems biomedicine.
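The jackknife (leave-one-out) evaluation and the notion of first-order accuracy can be sketched as follows. The scoring rule used here, which gives each side effect the highest interaction confidence between the query drug and any other drug known to cause it, is an assumption for illustration and is not confirmed by the abstract. The drug names, side effects, and confidence values are all invented.

```python
def rank_side_effects(query, drugs, interaction):
    """Rank candidate side effects for `query`, using every other drug.

    Each side effect is scored by the highest interaction confidence
    between the query and a drug annotated with that side effect.
    """
    scores = {}
    for other, effects in drugs.items():
        if other == query:
            continue  # jackknife: the query itself is left out
        conf = interaction.get(frozenset((query, other)), 0.0)
        for effect in effects:
            scores[effect] = max(scores.get(effect, 0.0), conf)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda e: (-scores[e], e))


def jackknife_first_order_accuracy(drugs, interaction):
    """Fraction of drugs whose top-ranked prediction is a true side effect."""
    hits = 0
    for query, true_effects in drugs.items():
        ranking = rank_side_effects(query, drugs, interaction)
        if ranking and ranking[0] in true_effects:
            hits += 1
    return hits / len(drugs)


# Hypothetical toy data: drug -> known side effects.
DRUGS = {
    "D1": {"nausea"},
    "D2": {"nausea"},
    "D3": {"nausea", "rash"},
    "D4": {"headache"},
}

# Hypothetical interaction confidences for unordered drug pairs.
INTERACTION = {
    frozenset(("D1", "D2")): 0.9,
    frozenset(("D2", "D3")): 0.8,
    frozenset(("D1", "D3")): 0.2,
    frozenset(("D3", "D4")): 0.3,
    frozenset(("D1", "D4")): 0.1,
}

if __name__ == "__main__":
    print(rank_side_effects("D2", DRUGS, INTERACTION))
    print(jackknife_first_order_accuracy(DRUGS, INTERACTION))
```

In this toy setting D4 is misclassified because no other drug shares its side effect, which shows how a jackknife test penalizes labels that appear only once.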

Subject(s)

Drug Interactions; Drug-Related Side Effects and Adverse Reactions/metabolism; Pharmaceutical Preparations/metabolism; Proteins/metabolism; Databases as Topic; Humans

19.

Prediction of drugs target groups based on ChEBI ontology.

Gao, Yu-Fei; Chen, Lei; Huang, Guo-Hua; Zhang, Tao; Feng, Kai-Yan; Li, Hai-Peng; Jiang, Yang.

Biomed Res Int ; 2013: 132724, 2013.

Article in English | MEDLINE | ID: mdl-24350241

ABSTRACT

Most drugs have beneficial as well as adverse effects, and they exert their biological functions by adjusting and altering the functions of their target proteins. Knowledge of drug target proteins is therefore essential for improving therapeutic effects and mitigating undesirable side effects. In this study, we proposed a novel prediction method based on drug/compound ontology information extracted from ChEBI to identify drug target groups, from which the kinds of functions of a drug may be deduced. Using data collected from KEGG, a benchmark dataset of 876 drugs, categorized into four target groups, was constructed. To evaluate the method more thoroughly, the benchmark dataset was divided into a training dataset and an independent test dataset. In a jackknife test, the overall prediction accuracy on the training dataset was 83.12%, while it was 87.50% on the test dataset, indicating that the predictor generalizes well. The good performance of the method indicates that the ontology information of drugs contains rich information about their target groups, and the study may inspire solutions to problems of this sort and help bridge the gap between ChEBI ontology and drug target groups.

Subject(s)

Drug Delivery Systems/methods; Biological Ontologies; Databases, Factual; Proteins/metabolism

20.

Identification of age-related macular degeneration related genes by applying shortest path algorithm in protein-protein interaction network.

Zhang, Jian; Jiang, Min; Yuan, Fei; Feng, Kai-Yan; Cai, Yu-Dong; Xu, Xun; Chen, Lei.

Biomed Res Int ; 2013: 523415, 2013.

Article in English | MEDLINE | ID: mdl-24455700

ABSTRACT

This study attempted to find novel age-related macular degeneration (AMD) related genes based on 36 known AMD genes. The well-known shortest path algorithm, Dijkstra's algorithm, was applied to find the shortest path connecting each pair of known AMD-related genes in a protein-protein interaction (PPI) network. The genes occurring in any shortest path were considered candidate AMD-related genes. As a result, 125 novel AMD genes were predicted. Further analysis based on betweenness and a permutation test indicated that 10 of these genes are involved in the formation or development of AMD and may, with high probability, be actual AMD-related genes. We hope that this contribution will promote the study of age-related macular degeneration and the discovery of novel effective treatments.
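The shortest-path step described in the abstract can be sketched on a toy network as follows. The gene names, edges, and edge weights are invented; how the actual PPI network was weighted is not stated in the abstract. Note also that this sketch keeps only one shortest path per gene pair, so when several paths tie it will report fewer candidates than an approach that collects genes from every shortest path.

```python
import heapq
from itertools import combinations


def dijkstra_path(graph, source, target):
    """Return one shortest path from source to target as a list of
    nodes, or an empty list if the target is unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return []
    # Walk back from the target to the source via the predecessor map.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, known_genes):
    """Collect interior nodes of the shortest path between each pair of
    known genes, excluding the known genes themselves."""
    candidates = set()
    for a, b in combinations(sorted(known_genes), 2):
        path = dijkstra_path(graph, a, b)
        candidates.update(path[1:-1])
    return candidates - set(known_genes)


# Hypothetical undirected toy network: node -> {neighbor: edge weight}.
PPI = {
    "K1": {"X": 1.0, "Y": 5.0},
    "K2": {"X": 1.0, "Y": 5.0, "Z": 1.0},
    "K3": {"Z": 1.0},
    "X": {"K1": 1.0, "K2": 1.0},
    "Y": {"K1": 5.0, "K2": 5.0},
    "Z": {"K2": 1.0, "K3": 1.0},
}
KNOWN = {"K1", "K2", "K3"}  # stand-ins for known disease genes

if __name__ == "__main__":
    print(sorted(candidate_genes(PPI, KNOWN)))
```

Here X and Z are reported as candidates, while Y is not, because the path through Y is more costly than the path through X.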

Subject(s)

Computational Biology/methods; Genetic Predisposition to Disease; Macular Degeneration/genetics; Protein Interaction Maps/genetics; Age Factors; Algorithms; Humans; Macular Degeneration/pathology; Models, Theoretical
