1.
J Org Chem ; 85(14): 9367-9374, 2020 Jul 17.
Article in English | MEDLINE | ID: mdl-32578986

ABSTRACT

The dearomatizing spirocyclization of phenolic biarylic ketones using PhI(OCOCF3)2 as oxidant is presented. The reaction affords various cyclohexadienones through C-C bond cleavage under mild conditions. Mechanistic investigations reveal that an exocyclic enol ether acts as the key intermediate in the transformation.

2.
Mol Genet Genomics ; 289(3): 489-99, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24448651

ABSTRACT

Protein-DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein-DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein-DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. We incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data to predict the protein-DNA interaction sites. Feature analysis showed that 3D structural features indeed contributed to the prediction of DNA-binding site and it was demonstrated that the prediction performance was better with 3D structural features than without them. It was also shown via analysis of features from each site that the features of DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying the DNA-binding sites and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein-DNA interaction.
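The feature ranking step named in this abstract (maximum relevance minimum redundancy, mRMR) can be illustrated with a minimal sketch. It assumes discrete feature values and uses the mutual-information difference form (relevance minus mean redundancy); the abstract does not say which mRMR variant or implementation the authors used, and the feature names and toy data below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR: repeatedly pick the feature whose relevance to the labels,
    minus its mean redundancy with already selected features, is largest."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "a_copy" duplicates "a", so it is fully redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 0],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 0],
    "c":      [0, 0, 1, 0, 1, 1, 0, 1],
}
ranking = mrmr_rank(features, labels)
```

In this toy case the duplicated feature is ranked last even though it is as relevant as the first pick, which is the behaviour that distinguishes mRMR from ranking by relevance alone.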


Subjects
Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Support Vector Machine , Algorithms , DNA/metabolism , DNA-Binding Proteins/metabolism , Molecular Conformation , Protein Binding , Reproducibility of Results
3.
J Biomol Struct Dyn ; 33(11): 2479-90, 2015.
Article in English | MEDLINE | ID: mdl-25616595

ABSTRACT

Lysine acetylation and ubiquitination are two primary post-translational modifications (PTMs) in most eukaryotic proteins. Lysine residues are targets for both types of PTMs, resulting in different cellular roles. With the increasing availability of protein sequences and PTM data, it is challenging to distinguish the two types of PTMs on lysine residues. Experimental approaches are often laborious and time consuming. There is an urgent need for computational tools to distinguish between lysine acetylation and ubiquitination. In this study, we developed a novel method, called DAUFSA (distinguish between lysine acetylation and lysine ubiquitination with feature selection and analysis), to discriminate ubiquitinated and acetylated lysine residues. The method incorporated several types of features: PSSM (position-specific scoring matrix) conservation scores, amino acid factors, secondary structures, solvent accessibilities, and disorder scores. By using the mRMR (maximum relevance minimum redundancy) method and the IFS (incremental feature selection) method, an optimal feature set containing 290 features was selected from all incorporated features. A dagging-based classifier constructed by the optimal features achieved a classification accuracy of 69.53%, with an MCC of .3853. An optimal feature set analysis showed that the PSSM conservation score features and the amino acid factor features were the most important attributes, suggesting differences between acetylation and ubiquitination. Our study results also supported previous findings that different motifs were employed by acetylation and ubiquitination. The feature differences between the two modifications revealed in this study are worthy of experimental validation and further investigation.
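Several records in this listing report accuracy together with the Matthews correlation coefficient (MCC). As a reminder of how the two relate, here is a minimal sketch computing both from a binary confusion matrix. The counts are hypothetical and are not taken from any of the papers; they only show that an accuracy near 70% can correspond to an MCC near 0.39.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for a balanced two-class problem.
example_acc = accuracy(tp=70, tn=69, fp=31, fn=30)
example_mcc = mcc(tp=70, tn=69, fp=31, fn=30)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced site-prediction datasets.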


Subjects
Lysine/chemistry , Lysine/metabolism , Acetylation , Amino Acid Sequence , Computational Biology/methods , Conserved Sequence , Databases, Genetic , Position-Specific Scoring Matrices , Protein Conformation , Protein Processing, Post-Translational , Ubiquitination
4.
PLoS One ; 9(2): e88300, 2014.
Article in English | MEDLINE | ID: mdl-24505469

ABSTRACT

Lung cancer is one of the leading causes of cancer mortality worldwide, and non-small cell lung cancer (NSCLC) accounts for most cases. NSCLC can be further divided into adenocarcinoma (ACA) and squamous cell carcinoma (SCC). It is of great clinical value to distinguish these two subgroups. In this study, we compared the genome-wide copy number alteration (CNA) patterns of 208 early stage ACA and 93 early stage SCC tumor samples. As a result, 266 CNA probes stood out for better discrimination of ACA and SCC. The genes corresponding to these 266 probes were enriched in lung cancer related pathways and in the chromosome regions where CNAs usually occur in lung cancer. This study sheds light on the CNA study of NSCLC and provides some insights into the epigenetics of NSCLC.


Subjects
Adenocarcinoma/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Squamous Cell/genetics , DNA Copy Number Variations , Lung Neoplasms/genetics , Adenocarcinoma/classification , Adenocarcinoma/pathology , Carcinoma, Non-Small-Cell Lung/classification , Carcinoma, Non-Small-Cell Lung/pathology , Carcinoma, Squamous Cell/classification , Carcinoma, Squamous Cell/pathology , Gene Dosage , Humans , Lung/metabolism , Lung/pathology , Lung Neoplasms/classification , Lung Neoplasms/pathology
5.
PLoS One ; 9(9): e107464, 2014.
Article in English | MEDLINE | ID: mdl-25222670

ABSTRACT

Post-translational modifications (PTMs) are crucial steps in protein synthesis and are important factors contributing to protein diversity. PTMs play important roles in the regulation of gene expression, protein stability and metabolism. Lysine residues in protein sequences have been found to be targeted for both types of PTMs: sumoylations and acetylations; however, each PTM has a different cellular role. As experimental approaches are often laborious and time consuming, it is challenging to distinguish the two types of PTMs on lysine residues using computational methods. In this study, we developed a method to discriminate between sumoylated lysine residues and acetylated residues. The method incorporated several features: PSSM conservation scores, amino acid factors, secondary structures, solvent accessibilities and disorder scores. By using the mRMR (Maximum Relevance Minimum Redundancy) method and the IFS (Incremental Feature Selection) method, an optimal feature set was selected from all of the incorporated features, with which the classifier achieved 92.14% accuracy with an MCC value of 0.7322. Analysis of the optimal feature set revealed some differences between acetylation and sumoylation. The results from our study also supported the previous finding that there exist different consensus motifs for the two types of PTMs. The results could suggest possible dominant factors governing the acetylation and sumoylation of lysine residues, shedding some light on the modification dynamics and molecular mechanisms of the two types of PTMs, and provide guidelines for experimental validations.


Subjects
Lysine/metabolism , Acetylation , Sumoylation
6.
PLoS One ; 9(1): e86729, 2014.
Article in English | MEDLINE | ID: mdl-24466214

ABSTRACT

Aptamers are oligonucleic acid or peptide molecules that bind to specific target molecules. As a novel and powerful class of ligands, aptamers are thought to have excellent potential for applications in the fields of biosensing, diagnostics and therapeutics. In this study, a new method for predicting aptamer-target interacting pairs was proposed by integrating features derived from both aptamers and their targets. Features of nucleotide composition and traditional amino acid composition as well as pseudo amino acid were utilized to represent aptamers and targets, respectively. The predictor was constructed based on Random Forest, and the optimal features were selected by using the maximum relevance minimum redundancy (mRMR) method and the incremental feature selection (IFS) method. As a result, 81.34% accuracy and 0.4612 MCC were obtained for the training dataset, and 77.41% accuracy and 0.3717 MCC were achieved for the testing dataset. An optimal set of 220 features was selected; these were considered the features that contributed significantly to the prediction of interacting aptamer-target pairs. Analysis of the optimal feature set indicated several important factors in determining aptamer-target interactions. It is anticipated that our prediction method may become a useful tool for identifying aptamer-target pairs and that the features selected and analyzed in this study may provide useful insights into the mechanism of interactions between aptamers and targets.
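Incremental feature selection (IFS), as used in this and several other records, takes a ranked feature list and evaluates a classifier on the top 1, top 2, ..., top N features, keeping the prefix that scores best. The sketch below illustrates the idea under stated assumptions: a leave-one-out 1-nearest-neighbour classifier stands in for the Random Forest named in the abstract, accuracy stands in for whatever criterion the authors optimised, and the data are hypothetical.

```python
def loo_1nn_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the feature columns in `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in cols),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked_cols):
    """Evaluate every prefix of the ranked feature list; return the best
    prefix and the full accuracy curve (one value per prefix length)."""
    best_k, best_acc, curve = 0, -1.0, []
    for k in range(1, len(ranked_cols) + 1):
        acc = loo_1nn_accuracy(rows, labels, ranked_cols[:k])
        curve.append(acc)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked_cols[:best_k], curve


# Hypothetical toy data: column 0 separates the classes, column 1 is noise
# on a much larger scale and misleads the distance-based classifier.
rows = [[0.0, 5.0], [0.1, 0.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.1], [1.2, 9.1]]
labels = [0, 0, 0, 1, 1, 1]
selected, curve = incremental_feature_selection(rows, labels, ranked_cols=[0, 1])
```

Here the curve peaks at one feature, so IFS keeps only column 0. In the papers listed, the same loop would run over the mRMR-ranked list with hundreds of candidate features.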


Subjects
Aptamers, Nucleotide/chemistry , Aptamers, Peptide/chemistry , Computational Biology/methods , Models, Genetic , Algorithms , Amino Acids/analysis , Artificial Intelligence , Base Composition , Ligands , Structure-Activity Relationship
7.
Biomed Res Int ; 2014: 438341, 2014.
Article in English | MEDLINE | ID: mdl-25184139

ABSTRACT

Protein S-nitrosylation plays a very important role in a wide variety of cellular biological activities. Accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we present a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and the minimum Redundancy Maximum Relevance algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for numerical representation of proteins. A total of 529 protein sequences collected from open-access databases and the published literature are used to train and test our predictor. Computational results show that our predictor achieves Matthews' correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor algorithm, the random forest algorithm, and the sparse representation classification algorithm. The experimental results also indicate that 134 optimal features can better represent the peptides of protein S-nitrosylation than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. Experimental results showed that our predictor also yielded good performance on the independent testing set, with a Matthews' correlation coefficient of 0.2239.


Subjects
Algorithms , Computational Biology , Protein Processing, Post-Translational , Proteins/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Amino Acids/genetics , Databases, Protein , Protein Structure, Tertiary , Proteins/genetics , Proteins/metabolism , Software
8.
Protein Pept Lett ; 20(3): 352-63, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22591477

ABSTRACT

Colorectal cancer (CRC) is one of the most malignant cancers. A growing number of studies have shown that both genetics and epigenetics play important roles in the etiology of CRC. Both microRNA (miRNA) and DNA methylation fall within the scope of epigenetics, and there are complex regulatory mechanisms within miRNA and DNA methylation. We compiled 71 CRC related genes and 134 CRC related miRNAs. Then we identified 417 feed forward loops (FFLs) and 37 feedback loops (FBLs) among these genes, miRNAs and transcription factors (TFs). We constructed a network of miRNA and TF mediation for CRC utilizing these FFLs and FBLs. Statistical tests showed that these FFLs were significantly enriched in CRC compared with esophageal cancer, breast cancer and randomly selected CRC miRNA-gene pairs. Analysis of the network singled out 3 core genes, 2 core miRNAs and 5 core TFs. The KEGG enrichment and GO enrichment for the target genes of the 2 core miRNAs indicated that they were significantly enriched in CRC related pathways (e.g., MARK pathway, TGFβ pathway and cell cycle). Through the investigation of methylation, we found that most of the CRC related genes and miRNAs were prone to be regulated by methylation. This study sheds light on the regulatory mechanisms in CRC and provides some insights into the epigenetics of CRC.


Subjects
Colorectal Neoplasms/genetics , DNA Methylation/genetics , Gene Regulatory Networks , MicroRNAs/genetics , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Humans , Metabolic Networks and Pathways , MicroRNAs/metabolism , Transcription Factors/genetics
9.
Biomed Res Int ; 2013: 304029, 2013.
Article in English | MEDLINE | ID: mdl-23998122

ABSTRACT

One of the most important and challenging problems in biomedicine is how to predict cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and increase the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). A total of 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1,061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer related genes as well.


Subjects
Databases, Genetic , Gene Ontology , Genes, Neoplasm/genetics , Genetic Markers/genetics , Models, Genetic , Neoplasm Proteins/genetics , Retinoblastoma/genetics , Computer Simulation , Data Mining/methods , Humans
10.
Mol Biosyst ; 9(11): 2729-40, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24056952

ABSTRACT

Protein carbamylation is one of the important post-translational modifications, which plays a pivotal role in a number of biological conditions, such as diseases, chronic renal failure and atherosclerosis. Therefore, recognition and identification of protein carbamylated sites are essential for disease treatment and prevention. Yet the mechanism of action of carbamylated lysine sites is still not realized. Thus it remains a largely unsolved challenge to uncover it, whether experimentally or theoretically. To address this problem, we have presented a computational framework for theoretically predicting and analyzing carbamylated lysine sites based on both the one-class k-nearest neighbor method and two-stage feature selection. The one-class k-nearest neighbor method requires no negative samples in training. Experimental results showed that by using 280 optimal features the presented method achieved promising performances of SN=82.50% for the jackknife test on the training set, and SN=66.67%, SP=100.00% and MCC=0.8097 for the independent test on the testing set, respectively. Further analysis of the optimal features provided insights into the mechanism of action of carbamylated lysine sites. It is anticipated that our method could be a potentially useful and essential tool for biologists to theoretically investigate carbamylated lysine sites.
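The abstract notes that a one-class k-nearest neighbor method needs no negative samples for training. One common way to build such a classifier is sketched below: a query is accepted as positive when its distance to its k-th nearest training positive does not exceed a threshold learned from the training positives themselves. This is an assumption about the general technique, not the authors' exact formulation; the class name, the quantile-based threshold rule and the toy points are all hypothetical.

```python
import math


def kth_nn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbour in `others`."""
    return sorted(math.dist(point, o) for o in others)[k - 1]


class OneClassKNN:
    """One-class k-NN trained on positive samples only (illustrative sketch)."""

    def __init__(self, k=1, quantile=1.0):
        self.k = k
        self.quantile = quantile  # fraction of training positives to accept

    def fit(self, positives):
        self.train = list(positives)
        # k-th neighbour distance of each training point among the other points
        own = sorted(
            kth_nn_distance(p, self.train[:i] + self.train[i + 1:], self.k)
            for i, p in enumerate(self.train)
        )
        idx = min(len(own) - 1, max(0, math.ceil(self.quantile * len(own)) - 1))
        self.threshold = own[idx]
        return self

    def predict(self, point):
        """True if `point` looks like the positive class."""
        return kth_nn_distance(point, self.train, self.k) <= self.threshold


# Hypothetical 2-D positives on the corners of a unit square.
model = OneClassKNN(k=1).fit([(0, 0), (0, 1), (1, 0), (1, 1)])
inside = model.predict((0.5, 0.5))
outside = model.predict((5, 5))
```

Lowering `quantile` tightens the decision boundary, trading sensitivity for specificity without ever seeing a negative example.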


Subjects
Computational Biology/methods , Lysine/metabolism , Protein Processing, Post-Translational , Proteins/metabolism , Acetylation , Algorithms , Databases, Protein , Position-Specific Scoring Matrices , Proteins/chemistry , ROC Curve , Reproducibility of Results , Sensitivity and Specificity , Sumoylation , Ubiquitination
11.
Mol Biosyst ; 9(1): 61-9, 2013 Jan 27.
Article in English | MEDLINE | ID: mdl-23117653

ABSTRACT

Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.


Subjects
Computational Biology/methods , Enzymes/chemistry , Enzymes/metabolism , Models, Chemical , Catalytic Domain , Chemical Phenomena , Conserved Sequence , Databases, Protein , Decision Trees , Protein Structure, Secondary , Sequence Analysis, Protein , Structure-Activity Relationship , Support Vector Machine
12.
PLoS One ; 8(5): e63494, 2013.
Article in English | MEDLINE | ID: mdl-23658834

ABSTRACT

Colorectal cancer can be grouped into Dukes A, B, C, and D stages based on its development. Generally speaking, patients at more advanced stages have a poorer prognosis. To integrate progression stage prediction systems with recurrence prediction systems, we proposed an ensemble prognostic model for colorectal cancer. In this model, each patient was assigned a most probable stage and a most probable recurrence status. If a patient was predicted to be a recurrence patient in an advanced stage, he or she was classified into the high risk group. The ensemble model considered both progression stages and recurrence status. High risk patients and low risk patients predicted by the ensemble model had significantly different disease free survival (log-rank test p-value, 0.0016) and disease specific survival (log-rank test p-value, 0.0041). The ensemble model can better distinguish high risk and low risk patients than the stage prediction model or the recurrence prediction model alone. This method could be applied to studies of other diseases, and it could significantly improve prediction performance by ensembling heterogeneous information.
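The decision rule described in this abstract combines two independent predictions per patient. A minimal sketch of that rule follows; the abstract does not state which Dukes stages count as advanced, so treating C and D as advanced is an assumption made here for illustration, and the function name is hypothetical.

```python
# Assumption: Dukes C and D are treated as "advanced"; the abstract does not say.
ADVANCED_STAGES = {"C", "D"}


def risk_group(predicted_stage, predicted_recurrence):
    """High risk only when both sub-models agree on a poor outlook:
    predicted recurrence AND a predicted advanced stage."""
    if predicted_recurrence and predicted_stage in ADVANCED_STAGES:
        return "high"
    return "low"


groups = [
    risk_group("D", True),   # advanced stage, recurrence predicted
    risk_group("A", True),   # early stage, recurrence predicted
    risk_group("C", False),  # advanced stage, no recurrence predicted
]
```

Requiring agreement between the two sub-models makes the high risk group smaller but more homogeneous, which is consistent with the survival separation the abstract reports.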


Subjects
Colorectal Neoplasms/diagnosis , Models, Statistical , Colorectal Neoplasms/pathology , Disease Progression , Disease-Free Survival , Humans , Neoplasm Staging , Recurrence , Risk Assessment , Survival Rate
13.
Biomed Res Int ; 2013: 723780, 2013.
Article in English | MEDLINE | ID: mdl-24083237

ABSTRACT

Drug combinatorial therapy could be more effective in treating some complex diseases than single agents due to better efficacy and reduced side effects. Although some drug combinations are being used, their underlying molecular mechanisms are still poorly understood. Therefore, it is of great interest to deduce a novel drug combination by their molecular mechanisms in a robust and rigorous way. This paper attempts to predict effective drug combinations by a combined consideration of: (1) chemical interaction between drugs, (2) protein interactions between drugs' targets, and (3) target enrichment of KEGG pathways. A benchmark dataset was constructed, consisting of 121 confirmed effective combinations and 605 random combinations. Each drug combination was represented by 465 features derived from the aforementioned three properties. Some feature selection techniques, including Minimum Redundancy Maximum Relevance and Incremental Feature Selection, were adopted to extract the key features. Random forest model was built with its performance evaluated by 5-fold cross-validation. As a result, 55 key features providing the best prediction result were selected. These important features may help to gain insights into the mechanisms of drug combinations, and the proposed prediction model could become a useful tool for screening possible drug combinations.


Subjects
Computational Biology/methods , Drug Combinations , Drug Interactions , Pharmaceutical Preparations/metabolism , Proteins/metabolism , Signal Transduction , Algorithms , ROC Curve
14.
Biomed Res Int ; 2013: 414327, 2013.
Article in English | MEDLINE | ID: mdl-23710446

ABSTRACT

With a large number of disordered proteins and their important functions discovered, it is highly desired to develop effective methods to computationally predict protein disordered regions. In this study, based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS), we developed a new method to predict disordered regions in proteins. The mRMR criterion was used to rank the importance of all candidate features. Finally, top 128 features were selected from the ranked feature list to build the optimal model, including 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. As a result, Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. On the basis of predicting results for each query sequence by using the method, we used the scanning and modification strategy to improve the performance. The accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively, compared with other three popular predictors: DISOPRED, DISOclust, and OnD-CRF. The selected features may shed some light on the understanding of the formation mechanism of disordered structures, providing guidelines for experimental validation.


Subjects
Algorithms , Computational Biology , Proteins/chemistry , Sequence Analysis, Protein , Position-Specific Scoring Matrices , Protein Structure, Secondary , Protein Structure, Tertiary
15.
Biomed Res Int ; 2013: 267375, 2013.
Article in English | MEDLINE | ID: mdl-23762832

ABSTRACT

Lung cancer is one of the leading causes of cancer mortality worldwide. The main types of lung cancer are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). In this work, a computational method was proposed for identifying lung-cancer-related genes with a shortest path approach in a protein-protein interaction (PPI) network. Based on the PPI data from STRING, a weighted PPI network was constructed. In total, 54 NSCLC- and 84 SCLC-related genes were retrieved from associated KEGG pathways. Then the shortest paths between each pair of these 54 NSCLC genes and 84 SCLC genes were obtained with Dijkstra's algorithm. Finally, all the genes on the shortest paths were extracted, and 25 and 38 shortest path genes with a permutation P value less than 0.05 for NSCLC and SCLC, respectively, were selected for further analysis. Some of the shortest path genes have been reported to be related to lung cancer. Intriguingly, the candidate genes we identified from the PPI network contained more cancer genes than those identified from the gene expression profiles. Furthermore, these genes possessed more functional similarity with the known cancer genes than those identified from the gene expression profiles. This study demonstrated the efficiency of the proposed method and showed promising results.
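The shortest path approach in this abstract can be sketched as follows: run Dijkstra's algorithm between every pair of known disease genes in a weighted network and collect the intermediate nodes as candidate genes. The sketch assumes edge weights are already given as non-negative costs (the abstract does not say how STRING scores were converted to weights) and omits the permutation test. Node names and weights are hypothetical.

```python
import heapq


def add_edge(graph, u, v, weight):
    """Add an undirected weighted edge to an adjacency-dict graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def dijkstra_path(graph, source, target):
    """Return the node list of a shortest path, or None if unreachable."""
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def shortest_path_genes(graph, seeds):
    """Genes lying on shortest paths between seed pairs, excluding the seeds."""
    seeds = sorted(seeds)
    found = set()
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            path = dijkstra_path(graph, a, b)
            if path:
                found.update(path[1:-1])
    return found - set(seeds)


# Hypothetical network: A, B, C are known disease genes; X and Y are not.
ppi = {}
for u, v, w in [("A", "X", 1), ("X", "B", 1), ("A", "B", 5),
                ("B", "Y", 1), ("Y", "C", 1), ("A", "C", 10)]:
    add_edge(ppi, u, v, w)
candidates = shortest_path_genes(ppi, {"A", "B", "C"})
```

In the study, each candidate found this way would additionally be filtered by a permutation P value before further analysis.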


Subjects
Gene Expression Profiling/methods , Genes, Neoplasm/genetics , Lung Neoplasms/genetics , Protein Interaction Maps/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Gene Expression Regulation, Neoplastic , Genetic Association Studies , Humans , Small Cell Lung Carcinoma/genetics
16.
PLoS One ; 8(6): e66678, 2013.
Article in English | MEDLINE | ID: mdl-23805260

ABSTRACT

Most pyruvoyl-dependent proteins observed in prokaryotes and eukaryotes are critical regulatory enzymes, which are primary targets of inhibitors for anti-cancer and anti-parasitic therapy. These proteins undergo an autocatalytic, intramolecular self-cleavage reaction in which a covalently bound pyruvoyl group is generated on a conserved serine residue. Traditionally, the modified serine sites are detected by experimental approaches, which are often labor-intensive and time-consuming. In this study, we made an initial attempt at the computational prediction of such serine sites with feature selection based on a Random Forest. Since only a small number of experimentally verified pyruvoyl-modified proteins are collected in the current version of the protein database, we used only a small dataset in this study. After removing proteins with sequence identities >60%, a non-redundant dataset was generated, which contained only 46 proteins, with one pyruvoyl serine site for each protein. Several types of features were considered in our method, including PSSM conservation scores, disorders, secondary structures, solvent accessibilities, amino acid factors and amino acid occurrence frequencies. As a result, good performance was achieved on our dataset: an accuracy of 100.00% and an MCC of 1.0000 were obtained on the training dataset, and an accuracy of 93.75% and an MCC of 0.8441 on the testing dataset. The optimal feature set contained 9 features. Analysis of the optimal feature set indicated the important roles of some specific features in determining the pyruvoyl-group-serine sites, which were consistent with several results of earlier experimental studies. These selected features may shed some light on the in-depth understanding of the mechanism of the post-translational self-maturation process, providing guidelines for experimental validation. Future work should be carried out as more pyruvoyl-modified proteins are found, and the method should be evaluated on larger datasets. The prediction software can be downloaded from http://www.nkbiox.com/sub/pyrupred/index.html.


Subjects
Computational Biology/methods , Proteins/metabolism , Serine/metabolism , Algorithms , Area Under Curve , Databases, Protein , ROC Curve
17.
PLoS One ; 8(6): e65207, 2013.
Article in English | MEDLINE | ID: mdl-23762317

ABSTRACT

Acquired immune deficiency syndrome (AIDS) is a severe infectious disease that causes a large number of deaths every year. Traditional anti-AIDS drugs directly targeting the HIV-1 encoded enzymes, including reverse transcriptase (RT), protease (PR) and integrase (IN), usually suffer from drug resistance after a period of treatment and from serious side effects. In recent years, the emergence of a large amount of useful information on protein-protein interactions (PPI) in the HIV life cycle and related inhibitors has made PPI a new avenue for antiviral drug intervention. In this study, we identified 26 core human proteins involved in PPI between HIV-1 and its host that have great potential for HIV therapy. In addition, 280 chemicals that interact with three HIV drugs targeting human proteins can also interact with these 26 core proteins. All these findings indicate that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying novel anti-HIV drugs.


Subjects
Algorithms , Anti-HIV Agents/chemistry , HIV Infections/drug therapy , HIV-1/drug effects , Protein Interaction Mapping , Protein Interaction Maps , 1-Deoxynojirimycin/analogs & derivatives , 1-Deoxynojirimycin/chemistry , 1-Deoxynojirimycin/pharmacology , Anti-HIV Agents/pharmacology , CCR5 Receptor Antagonists , Computer Simulation , Cyclohexanes/chemistry , Cyclohexanes/pharmacology , Databases, Chemical , Didanosine/chemistry , Didanosine/pharmacology , Drug Design , Drug Discovery , HIV Infections/virology , HIV-1/genetics , HIV-1/metabolism , Host-Pathogen Interactions , Humans , Maraviroc , Models, Molecular , Receptors, CCR5/chemistry , Receptors, CCR5/metabolism , Triazoles/chemistry , Triazoles/pharmacology
18.
PLoS One ; 7(8): e43927, 2012.
Article in English | MEDLINE | ID: mdl-22937126

ABSTRACT

Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.


Subjects
Computational Biology/methods , Proteins/metabolism , Algorithms , Protein Conformation
19.
PLoS One ; 7(9): e45854, 2012.
Article in English | MEDLINE | ID: mdl-23029276

ABSTRACT

Proteinases play critical roles in both intra and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific as part of degradation during protein catabolism or highly specific as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on Random Forest algorithm (RF) using maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). The features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were utilized to represent the peptides concerned. Here, we compared existing prediction tools which are available for predicting possible cleavage sites in candidate substrates with ours. It is shown that our method makes much more reliable predictions in terms of the overall prediction accuracy. In addition, this predictor allows the use of a wide range of proteinases.


Subjects
Models, Molecular , Proteolysis , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Decision Trees , Molecular Sequence Data , Peptide Hydrolases/chemistry , Proteasome Endopeptidase Complex/chemistry , Sequence Analysis, Protein
20.
J Proteomics ; 75(5): 1654-65, 2012 Feb 16.
Article in English | MEDLINE | ID: mdl-22178444

ABSTRACT

S-nitrosylation (SNO) is one of the most important and universal post-translational modifications (PTMs), regulating various cellular functions and signaling events. Identification of the exact S-nitrosylation sites in proteins may facilitate the understanding of the molecular mechanisms and biological function of S-nitrosylation. Unfortunately, traditional experimental approaches used for detecting S-nitrosylation sites are often laborious and time-consuming. Computational methods could overcome this drawback. In this work, we developed a novel predictor based on the nearest neighbor algorithm (NNA) with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). The features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were utilized to represent the peptides concerned. Feature analysis showed that all the features except residual disorder affected identification of the S-nitrosylation sites. It was also shown via the site-specific feature analysis that the features of sites away from the central cysteine might contribute to the S-nitrosylation site determination in a subtle manner. It is anticipated that our prediction method may become a useful tool for identifying protein S-nitrosylation sites and that the feature analysis described in this paper may provide useful insights for in-depth investigation into the mechanism of S-nitrosylation.


Subjects
Algorithms , Protein Processing, Post-Translational , Proteins/chemistry , Sequence Analysis, Protein/methods , Animals , Humans , Protein Structure, Secondary , Proteins/genetics , Proteins/metabolism