Results 1 - 20 of 25
1.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34415016

ABSTRACT

Accurate prediction of immunogenic peptides recognized by the T cell receptor (TCR) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. Most antigen peptides predicted in silico fail to elicit immune responses in vivo because the predictions do not consider the TCR as a key factor. This inevitably leads to costly and time-consuming experimental validation of predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptides recognized by the TCR. Here, we describe DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and peptide presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 data and IEDB data were constructed for independent evaluation. The DLpTCR model exhibited high predictive power, with an area under the curve of up to 0.91 on COVID-19 data when predicting the interaction between peptide and single TCR chain. Additionally, the DLpTCR model achieved an overall accuracy of 81.03% on IEDB data when predicting the interaction between peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCR. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/. Additionally, a stand-alone software package can be downloaded from https://github.com/jiangBiolab/DLpTCR.


Subject(s)
COVID-19 Drug Treatment , Epitopes/immunology , Peptides/immunology , Receptors, Antigen, T-Cell/immunology , SARS-CoV-2/immunology , Amino Acid Sequence/genetics , COVID-19/genetics , COVID-19/immunology , COVID-19/virology , Computer Simulation , Deep Learning , Epitopes/genetics , Humans , Peptides/genetics , Peptides/therapeutic use , Protein Binding/genetics , Receptors, Antigen, T-Cell/genetics , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Software
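
The DLpTCR abstract reports performance as an area under the (ROC) curve. As a minimal sketch of how such a value can be computed from binary labels and predicted interaction scores, the pairwise definition below counts how often a positive example outscores a negative one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = TCR binds peptide) and model scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is fine for small evaluation sets; a rank-based formulation would be preferable for large ones.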
2.
Genomics ; 114(6): 110486, 2022 11.
Article in English | MEDLINE | ID: mdl-36126833

ABSTRACT

DNA methylation is an important epigenetic modification that occurs in the early stages of tumor formation, so finding the relationship between DNA methylation and cancer is of great significance. This paper proposes a novel model, iCancer-Pred, to identify cancer and further classify its type. DNA methylation datasets for 7 cancer types were collected from The Cancer Genome Atlas (TCGA). The coefficient of variation is first used to reduce the number of features, and then the elastic net is applied to select important features. Finally, a fully connected neural network is constructed with these selected features. In predicting seven types of cancers, iCancer-Pred achieved an overall accuracy of over 97% with 5-fold cross-validation. For convenience, a user-friendly web server is available at http://bioinfo.jcu.edu.cn/cancer or http://121.36.221.79/cancer/. The source code is freely available for download at https://github.com/Huerhu/iCancer-Pred.


Subject(s)
DNA Methylation , Neoplasms , Humans , Epigenomics , Neoplasms/genetics
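
The first feature-reduction step named in the iCancer-Pred abstract is a coefficient-of-variation filter. A minimal sketch of that idea follows, assuming that features whose coefficient of variation (standard deviation divided by the absolute mean) falls at or below a cutoff are discarded. The threshold, the function names, and the toy beta values are hypothetical; the abstract does not state the cutoff used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return std / |mean| for one feature column; 0.0 when the mean is zero (a choice made here)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / abs(m)


def select_by_cv(samples, threshold):
    """Return the column indices whose coefficient of variation exceeds the threshold."""
    n_features = len(samples[0])
    kept = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        if coefficient_of_variation(column) > threshold:
            kept.append(j)
    return kept


# Toy methylation beta values: rows are samples, columns are CpG sites.
beta = [
    [0.50, 0.10, 0.90],
    [0.50, 0.40, 0.88],
    [0.50, 0.80, 0.92],
]
kept_sites = select_by_cv(beta, threshold=0.1)
```

In this toy matrix only the middle column varies much relative to its mean, so it is the one that survives the filter; the elastic net and neural network stages described in the abstract would operate on the surviving columns.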
3.
Int J Mol Sci ; 24(5)2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36901929

ABSTRACT

A norm in modern medicine is to prescribe polypharmacy to treat disease. The core concern with the co-administration of drugs is that it may produce adverse drug-drug interaction (DDI), which can cause unexpected bodily injury. Therefore, it is essential to identify potential DDI. Most existing methods in silico only judge whether two drugs interact, ignoring the importance of interaction events to study the mechanism implied in combination drugs. In this work, we propose a deep learning framework named MSEDDI that comprehensively considers multi-scale embedding representations of the drug for predicting drug-drug interaction events. In MSEDDI, we design three-channel networks to process biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding, respectively. Finally, we fuse three heterogeneous features from channel outputs through a self-attention mechanism and feed them to the linear layer predictor. In the experimental section, we evaluate the performance of all methods on two different prediction tasks on two datasets. The results show that MSEDDI outperforms other state-of-the-art baselines. Moreover, we also reveal the stable performance of our model in a broader sample set via case studies.


Subject(s)
Knowledge Bases , Polypharmacy , Humans , Drug Interactions
4.
BMC Bioinformatics ; 23(1): 126, 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35413800

ABSTRACT

BACKGROUND: In research on new drug discovery, the traditional wet experiment has a long period. Predicting drug-target interaction (DTI) in silico can greatly narrow the scope of search of candidate medications. Excellent algorithm model may be more effective in revealing the potential connection between drug and target in the bioinformatics network composed of drugs, proteins and other related data. RESULTS: In this work, we have developed a heterogeneous graph neural network model, named as HGDTI, which includes a learning phase of network node embedding and a training phase of DTI classification. This method first obtains the molecular fingerprint information of drugs and the pseudo amino acid composition information of proteins, then extracts the initial features of nodes through Bi-LSTM, and uses the attention mechanism to aggregate heterogeneous neighbors. In several comparative experiments, the overall performance of HGDTI significantly outperforms other state-of-the-art DTI prediction models, and the negative sampling technology is employed to further optimize the prediction power of model. In addition, we have proved the robustness of HGDTI through heterogeneous network content reduction tests, and proved the rationality of HGDTI through other comparative experiments. These results indicate that HGDTI can utilize heterogeneous information to capture the embedding of drugs and targets, and provide assistance for drug development. CONCLUSIONS: The HGDTI based on heterogeneous graph neural network model, can utilize heterogeneous information to capture the embedding of drugs and targets, and provide assistance for drug development. For the convenience of related researchers, a user-friendly web-server has been established at http://bioinfo.jcu.edu.cn/hgdti .


Subject(s)
Computational Biology , Neural Networks, Computer , Algorithms , Drug Development/methods , Drug Interactions , Proteins/metabolism
5.
J Biomed Inform ; 131: 104098, 2022 07.
Article in English | MEDLINE | ID: mdl-35636720

ABSTRACT

In drug development, unexpected side effects are the main reason for the failure of candidate drug trials. Discovering potential side effects of drugs in silico can improve the success rate of drug screening. However, most previous works extracted and utilized an effective representation of drugs from a single perspective. These methods merely considered the topological information of the drug in the biological entity network, or combined the association information (e.g. knowledge graph, KG) between the drug and other biomarkers, or only used the chemical structure or sequence information of the drug. Consequently, to jointly learn drug features from both the macroscopic biological network and the microscopic drug molecules, we propose a hybrid embedding graph neural network model named idse-HE, which integrates a graph embedding module and a node embedding module. idse-HE can fuse the drug chemical structure information, the drug substructure sequence information and the drug network topology information. Our model deems the final representation of drugs and side effects as two implicit factors to reconstruct the original matrix and predicts the potential side effects of drugs. In the robustness experiment, idse-HE shows stable performance in all indicators. We reproduce the baselines under the same conditions, and the experimental results indicate that idse-HE is superior to other advanced methods. Finally, we also collect evidence to confirm several real drug side effect pairs in the predicted results, which were previously regarded as negative samples. For more detailed information, researchers can access the user-friendly web-server of idse-HE at http://bioinfo.jcu.edu.cn/idse-HE. On this server, users can obtain the original data and source code, and will be guided to reproduce the model results.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Neural Networks, Computer , Drug Development , Humans , Knowledge , Software
6.
Bioinformatics ; 35(3): 398-406, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30010789

ABSTRACT

Motivation: A cell contains numerous protein molecules. One of the fundamental goals in cell biology is to determine their subcellular locations, which can provide useful clues about their functions. Knowledge of protein subcellular localization is also indispensable for prioritizing and selecting the right targets for drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called 'pLoc-mAnimal' was developed for identifying the subcellular localization of animal proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with the multi-label systems in which some proteins, called 'multiplex proteins', may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mAnimal was trained by an extremely skewed dataset in which some subset (subcellular location) was about 128 times the size of the other subsets. Accordingly, such an uneven training dataset will inevitably cause a biased consequence. Results: To alleviate such biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mAnimal by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mAnimal, the existing state-of-the-art predictor, in identifying the subcellular localization of animal proteins. 
Availability and implementation: To maximize the convenience for the vast majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Databases, Protein , Proteins , Amino Acid Sequence , Animals , Protein Transport , Subcellular Fractions
7.
Bioinformatics ; 33(22): 3524-3531, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-29036535

ABSTRACT

MOTIVATION: Cells are deemed the basic unit of life. However, many important functions of cells as well as their growth and reproduction are performed via the protein molecules located at their different organelles or locations. Facing explosive growth of protein sequences, we are challenged to develop fast and effective method to annotate their subcellular localization. However, this is by no means an easy task. Particularly, mounting evidences have indicated proteins have multi-label feature meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Unfortunately, most of the existing computational methods can only be used to deal with the single-label proteins. Although the 'iLoc-Animal' predictor developed recently is quite powerful that can be used to deal with the animal proteins with multiple locations as well, its prediction quality needs to be improved, particularly in enhancing the absolute true rate and reducing the absolute false rate. RESULTS: Here we propose a new predictor called 'pLoc-mAnimal', which is superior to iLoc-Animal as shown by the compelling facts. When tested by the most rigorous cross-validation on the same high-quality benchmark dataset, the absolute true success rate achieved by the new predictor is 37% higher and the absolute false rate is four times lower in comparison with the state-of-the-art predictor. AVAILABILITY AND IMPLEMENTATION: To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. CONTACT: xxiao@gordonlifescience.org or kcchou@gordonlifescience.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Intracellular Space/metabolism , Proteins/metabolism , Software , Amino Acid Sequence , Animals , Protein Transport , Proteins/chemistry , Reproducibility of Results , Software/standards
8.
Bioinformatics ; 32(24): 3745-3752, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27565585

ABSTRACT

MOTIVATION: With the rapid increase of infection resistance to antibiotics, it is urgent to find novel infection therapeutics. In recent years, antimicrobial peptides (AMPs) have been utilized as potential alternatives for infection therapeutics. AMPs are key components of the innate immune system and can protect the host from various pathogenic bacteria. Identifying AMPs and their functional types has led to many studies, and various predictors using machine learning have been developed. However, there is room for improvement; in particular, no predictor takes into account the lack of balance among different functional AMPs. RESULTS: In this paper, a new synthetic minority over-sampling technique on imbalanced and multi-label datasets, referred to as ML-SMOTE, was designed for processing and identifying AMPs' functional families. A novel multi-label classifier, MLAMP, was also developed using ML-SMOTE and grey pseudo amino acid composition. The classifier obtained 0.4846 subset accuracy and 0.16 hamming loss. AVAILABILITY AND IMPLEMENTATION: A user-friendly web-server for MLAMP was established at http://www.jci-bioinfo.cn/MLAMP. CONTACTS: linweizhong@jci.edu.cn or xudong@missouri.edu.


Subject(s)
Amino Acids/chemistry , Antimicrobial Cationic Peptides/chemistry , Computational Biology/methods , Algorithms , Internet , Models, Theoretical
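
The MLAMP abstract reports two multi-label metrics, subset accuracy and Hamming loss. A minimal sketch of their usual definitions is given below, assuming each peptide's functional types are represented as a set of integer label indices. The function names and the toy label sets are illustrative only and are not taken from the paper.

```python
def subset_accuracy(true_sets, pred_sets):
    """Fraction of samples whose predicted label set matches the true set exactly."""
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return exact / len(true_sets)


def hamming_loss(true_sets, pred_sets, n_labels):
    """Fraction of individual label decisions (sample x label) that are wrong."""
    wrong = sum(len(set(t) ^ set(p)) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


# Four hypothetical peptides, five possible functional types (indices 0-4).
true_labels = [{0}, {0, 2}, {1}, {3, 4}]
pred_labels = [{0}, {0}, {1, 2}, {3, 4}]

acc = subset_accuracy(true_labels, pred_labels)
loss = hamming_loss(true_labels, pred_labels, n_labels=5)
```

Subset accuracy is the stricter of the two: a prediction that misses one of several labels counts as fully wrong, while Hamming loss penalizes only the individual label that was missed.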
9.
J Membr Biol ; 248(4): 745-52, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25796484

ABSTRACT

Predicting membrane protein type is a challenging problem, particularly when the query proteins may simultaneously have two or more different types. Most of the existing methods can only be used to deal with single-label proteins. Actually, multiple-label proteins should not be ignored because they usually bear some special functions worthy of in-depth studies. By introducing "multi-labeled learning" and hybridizing evolution information through Grey-PSSM, a novel predictor called iMem-Seq is developed that can be used to deal with systems containing both single and multiple types of membrane proteins. As a demonstration, the jackknife cross-validation was performed with iMem-Seq on a benchmark dataset of membrane proteins classified into eight types, where some proteins belong to two or three types, but none has ≥25% pairwise sequence identity to any other in the same subset. It was demonstrated via the rigorous cross-validations that the new predictor remarkably outperformed all its counterparts. As a user-friendly web-server, iMem-Seq is freely accessible to the public at the website http://www.jci-bioinfo.cn/iMem-Seq .


Subject(s)
Membrane Proteins/classification , Membrane Proteins/genetics , Sequence Analysis, Protein/methods , Software , Membrane Proteins/chemistry
10.
Curr Pharm Des ; 30(6): 468-476, 2024.
Article in English | MEDLINE | ID: mdl-38323613

ABSTRACT

INTRODUCTION: Drug development is a challenging and costly process, yet it plays a crucial role in improving healthcare outcomes. Drug development requires extensive research and testing to meet the demands for economic efficiency, cures, and pain relief. METHODS: Drug development is a vital research area that necessitates innovation and collaboration to achieve significant breakthroughs. Computer-aided drug design provides a promising avenue for drug discovery and development by reducing costs and improving the efficiency of drug design and testing. RESULTS: In this study, a novel model, namely LSTM-SAGDTA, capable of accurately predicting drug-target binding affinity, was developed. We employed SeqVec for characterizing the protein and utilized graph neural networks to capture information on drug molecules. By introducing self-attentive graph pooling, the model achieved greater accuracy and efficiency in predicting drug-target binding affinity. CONCLUSION: Moreover, LSTM-SAGDTA obtained superior accuracy over current state-of-the-art methods while using less training time. The results of experiments suggest that this method represents a high-precision solution for DTA prediction.


Subject(s)
Neural Networks, Computer , Humans , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry , Drug Development , Drug Design , Proteins/metabolism , Proteins/chemistry
11.
Proteins ; 81(1): 140-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22933332

ABSTRACT

Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, the long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal features were selected by combining forward feature selection with sequential backward selection. Using the jackknife cross-validation test, the method was demonstrated on a large dataset. The predictor was built on multitudinous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates of proteins. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.


Subject(s)
Amino Acids/chemistry , Models, Chemical , Protein Folding , Proteins/chemistry , Amino Acid Sequence , Amino Acids/metabolism , Databases, Protein , Proteins/metabolism , Reproducibility of Results , Software , Structure-Activity Relationship , Support Vector Machine
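
The folding-rate abstract evaluates its regression model with a jackknife (leave-one-out) test and reports a correlation coefficient between predicted and observed rates. The sketch below illustrates that evaluation loop only. A one-feature least-squares line stands in for the paper's nonlinear SVM regression, and the feature values and rates are invented toy numbers; all function names are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in for the SVM regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def jackknife_predict(xs, ys):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Toy data: one hypothetical sequence-derived feature and a perfectly linear "folding rate".
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
rate = [2.0 * x + 1.0 for x in feature]
loo_predictions = jackknife_predict(feature, rate)
r = pearson_r(rate, loo_predictions)
```

Because every sample is held out exactly once, the jackknife gives a single deterministic result for a given dataset, which is one reason it is often preferred over random subsampling when datasets are small.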
12.
Anal Biochem ; 436(2): 168-77, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23395824

ABSTRACT

Antimicrobial peptides (AMPs), also called host defense peptides, are an evolutionarily conserved component of the innate immune response and are found among all classes of life. According to their special functions, AMPs are generally classified into ten categories: Antibacterial Peptides, Anticancer/tumor Peptides, Antifungal Peptides, Anti-HIV Peptides, Antiviral Peptides, Antiparasital Peptides, Anti-protist Peptides, AMPs with Chemotactic Activity, Insecticidal Peptides, and Spermicidal Peptides. Given a query peptide, how can we identify whether it is an AMP or non-AMP? If it is, can we identify which functional type or types it belongs to? Particularly, how can we deal with the multi-type problem since an AMP may belong to two or more functional types? To address these problems, which are obviously very important to both basic research and drug development, a multi-label classifier was developed based on the pseudo amino acid composition (PseAAC) and fuzzy K-nearest neighbor (FKNN) algorithm, where the components of PseAAC were featured by incorporating five physicochemical properties. The novel classifier is called iAMP-2L, where "2L" means that it is a 2-level predictor. The 1st level answers the 1st question above, while the 2nd level answers the 2nd and 3rd questions, which are beyond the reach of any existing methods in this area. For the convenience of users, a user-friendly web-server for iAMP-2L was established at http://www.jci-bioinfo.cn/iAMP-2L.


Subject(s)
Algorithms , Antimicrobial Cationic Peptides/classification , Antimicrobial Cationic Peptides/pharmacology , Amino Acids/analysis , Anti-Infective Agents/chemistry , Anti-Infective Agents/pharmacology , Antimicrobial Cationic Peptides/chemistry , Databases, Protein , Peptides/chemistry , Peptides/classification , Peptides/pharmacology , User-Computer Interface
13.
Front Genet ; 13: 926927, 2022.
Article in English | MEDLINE | ID: mdl-35846148

ABSTRACT

The early symptoms of lung adenocarcinoma patients are inapparent, and the clinical diagnosis of lung adenocarcinoma is primarily through X-ray examination and pathological section examination, whereas the discovery of biomarkers points out another direction for the diagnosis of lung adenocarcinoma with the development of bioinformatics technology. However, it is not accurate and trustworthy to diagnose lung adenocarcinoma due to omics data with high-dimension and low-sample size (HDLSS) features or biomarkers produced by utilizing only single omics data. To address the above problems, the feature selection methods of biological analysis are used to reduce the dimension of gene expression data (GSE19188) and DNA methylation data (GSE139032, GSE49996). In addition, the Cartesian product method is used to expand the sample set and integrate gene expression data and DNA methylation data. The classification is built by using a deep neural network and is evaluated on K-fold cross validation. Moreover, gene ontology analysis and literature retrieving are used to analyze the biological relevance of selected genes, TCGA database is used for survival analysis of these potential genes through Kaplan-Meier estimates to discover the detailed molecular mechanism of lung adenocarcinoma. Survival analysis shows that COL5A2 and SERPINB5 are significant for identifying lung adenocarcinoma and are considered biomarkers of lung adenocarcinoma.
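
The lung adenocarcinoma abstract describes using a Cartesian product to expand the sample set while integrating gene expression and DNA methylation data. A minimal sketch of one way to read that step is shown below: every expression vector is concatenated with every methylation vector. It is an assumption here that pairing is done only between samples of the same class; the abstract does not spell this out. The function name and toy vectors are hypothetical.

```python
from itertools import product


def cartesian_integrate(expression_samples, methylation_samples):
    """Concatenate every expression vector with every methylation vector of the same class."""
    return [
        list(expr) + list(meth)
        for expr, meth in product(expression_samples, methylation_samples)
    ]


# Toy tumor-class samples: 2 expression profiles (2 genes) and 3 methylation profiles (1 site).
tumor_expression = [[1.0, 2.0], [3.0, 4.0]]
tumor_methylation = [[0.1], [0.2], [0.3]]

tumor_combined = cartesian_integrate(tumor_expression, tumor_methylation)
```

With m expression samples and n methylation samples the expanded set has m * n rows, which is how a small cohort can be enlarged; the combined rows are not statistically independent, so cross-validation folds need care.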

14.
Biomed Opt Express ; 12(5): 2888-2901, 2021 May 01.
Article in English | MEDLINE | ID: mdl-34168906

ABSTRACT

We have demonstrated widely tunable Yb:fiber-based laser sources, aiming to replace Ti:sapphire lasers for the nJ-level ultrafast applications, especially for the uses of nonlinear light microscopy. We investigated the influence of different input parameters to obtain an expansive spectral broadening, enabled by self-phase modulation and further reshaped by self-steepening, in the normal dispersion regime before the fiber damage. We also discussed the compressibility and intensity fluctuations of the demonstrated pulses, to reach the transform-limited duration with a very low intensity noise. Most importantly, we have demonstrated clear two-photon fluorescence images from UV-absorbing fluorophores to deep red dye stains.

15.
Curr Pharm Des ; 26(26): 3105-3114, 2020.
Article in English | MEDLINE | ID: mdl-32552636

ABSTRACT

The catalytic efficiency of the enzyme is thousands of times higher than that of ordinary catalysts. Thus, they are widely used in industrial and medical fields. However, enzymes with protein structure can be destroyed and inactivated in high temperature, over acid or over alkali environment. It is well known that most of enzymes work well in an environment with pH of 6-8, while some special enzymes remain active only in an alkaline environment with pH > 8 or an acidic environment with pH < 6. Therefore, the identification of acidic and alkaline enzymes has become a key task for industrial production. Because of the wide varieties of enzymes, it is hard work to determine the acidity and alkalinity of the enzyme by experimental methods, and even this task cannot be achieved. Converting protein sequences into digital features and building computational models can efficiently and accurately identify the acidity and alkalinity of enzymes. This review summarized the progress of the digital features to express proteins and computational methods to identify acidic and alkaline enzymes. We hope that this paper will provide more convenience, ideas, and guides for computationally classifying acid and alkaline enzymes.


Subject(s)
Computational Biology , Enzymes , Amino Acid Sequence , Enzymes/metabolism , Humans , Hydrogen-Ion Concentration
16.
Amino Acids ; 37(4): 741-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19037711

ABSTRACT

Many proteins are composed of two or more subunits, each associated with different polypeptide chains. The number and arrangement of subunits forming a protein are referred to as quaternary structure. It has long been known that the functions of proteins are closely related to their quaternary structure. In this paper, the grey incidence degree is introduced; it calculates the numerical relation between various components, expressing the degree of similarity or difference between them. We have demonstrated that introduction of the grey incidence degree can remarkably enhance the success rates in predicting the protein quaternary structural class. It is anticipated that the grey incidence degree can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, protease type, among many others.


Subject(s)
Protein Structure, Quaternary , Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Databases, Protein
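
The quaternary-structure abstract introduces the grey incidence degree as a measure of similarity between components but does not give its formula. The sketch below assumes the commonly used Deng form of the grey relational (incidence) degree with distinguishing coefficient rho = 0.5; the paper may use a different variant. The function name and the toy vectors are hypothetical.

```python
def grey_incidence_degrees(reference, candidates, rho=0.5):
    """Deng-style grey incidence degree of each candidate sequence to the reference.

    Values lie in (0, 1]; larger means the candidate tracks the reference more closely.
    """
    # Absolute differences between the reference and each candidate, point by point.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate is identical to the reference.
        return [1.0 for _ in candidates]
    degrees = []
    for row in deltas:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coefficients) / len(coefficients))
    return degrees


# Toy feature vectors: a query protein and two class representatives.
query = [1.0, 2.0, 3.0]
class_vectors = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0]]
degrees = grey_incidence_degrees(query, class_vectors)
```

A simple classifier built on this measure would assign the query to the class whose representative has the highest incidence degree.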
17.
J Comput Chem ; 29(12): 2018-24, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18381630

ABSTRACT

Using the pseudo amino acid (PseAA) composition to represent the sample of a protein can incorporate a considerable amount of sequence pattern information so as to improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article the grey modeling approach is introduced that is particularly efficient in coping with complicated systems such as the one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted for its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition that can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introduction of the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can be also used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, protease type, among many others.


Subject(s)
Amino Acids/chemistry , Models, Molecular , Proteins/chemistry , Amino Acid Sequence , Proteins/physiology
18.
J Biomol Struct Dyn ; 33(8): 1731-42, 2015.
Article in English | MEDLINE | ID: mdl-25248923

ABSTRACT

As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating varieties of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named "lysine ubiquitination" because it occurs when an ubiquitin is covalently attached to lysine (K) residues of targeting proteins. Given an uncharacterized protein sequence that contains many lysine residues, which one of them is the ubiquitination site, and which one is of non-ubiquitination site? With the avalanche of protein sequences generated in the postgenomic age, it is highly desired for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called "iUbiq-Lys" was developed based on the evolutionary information, gray system model, as well as the general form of pseudo-amino acid composition. It was demonstrated via the rigorous cross-validations that the new predictor remarkably outperformed all its counterparts. As a web-server, iUbiq-Lys is accessible to the public at http://www.jci-bioinfo.cn/iUbiq-Lys . For the convenience of most experimental scientists, we have further provided a protocol of step-by-step guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process.


Subject(s)
Biological Evolution , Lysine/chemistry , Models, Theoretical , Proteins/chemistry , Software , Ubiquitination , Algorithms , Lysine/metabolism , Proteins/metabolism , Reproducibility of Results , Web Browser
19.
J Biomol Struct Dyn ; 33(10): 2221-33, 2015.
Article in English | MEDLINE | ID: mdl-25513722

ABSTRACT

Information about the interactions of drug compounds with proteins in cellular networking is very important for drug development. Unfortunately, all the existing predictors for identifying drug-protein interactions were trained by a skewed benchmark data-set where the number of non-interactive drug-protein pairs is overwhelmingly larger than that of the interactive ones. Using this kind of highly unbalanced benchmark data-set to train predictors would lead to the outcome that many interactive drug-protein pairs might be mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule and synthetic minority over-sampling technique to treat the skewed benchmark datasets and balance the positive and negative subsets. The new benchmark datasets thus obtained are called the optimized benchmark datasets, based on which a new predictor called iDrug-Target was developed that contains four sub-predictors: iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NR (nuclear receptors), respectively. Rigorous cross-validations on a set of experiment-confirmed datasets have indicated that these new predictors remarkably outperformed the existing ones for the same purpose. To maximize users' convenience, a public accessible Web server for iDrug-Target has been established at http://www.jci-bioinfo.cn/iDrug-Target/ , by which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.


Subject(s)
Drugs, Investigational/chemistry , Enzymes/chemistry , Ion Channels/chemistry , Receptors, Cytoplasmic and Nuclear/chemistry , Receptors, G-Protein-Coupled/chemistry , Software , Benchmarking , Databases, Chemical , Datasets as Topic , Drug Design , Drug Discovery , Drugs, Investigational/chemical synthesis , Enzymes/metabolism , Humans , Internet , Ion Channels/metabolism , Molecular Targeted Therapy/methods , Protein Binding , ROC Curve , Receptors, Cytoplasmic and Nuclear/metabolism , Receptors, G-Protein-Coupled/metabolism
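
The iDrug-Target abstract balances its benchmark datasets with the synthetic minority over-sampling technique (SMOTE), alongside a neighborhood cleaning rule that is not shown here. The core SMOTE step can be sketched as follows: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point on the line segment between them. The function name, the value of k, and the toy points are assumptions for illustration.

```python
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point between a random minority sample and one of its k nearest minority neighbours."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # Sort the remaining minority points by squared Euclidean distance to the base point.
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Toy minority class: four interactive drug-protein pairs in a 2-D feature space.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
synthetic = [smote_sample(minority_points, k=2, rng=rng) for _ in range(5)]
```

Because each synthetic point is an interpolation, it always lies within the region already spanned by the minority class, which is what distinguishes SMOTE from simply duplicating minority samples.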
20.
Biomed Res Int ; 2014: 947416, 2014.
Article in English | MEDLINE | ID: mdl-24977164

ABSTRACT

Before becoming the native proteins during the biosynthesis, their polypeptide chains created by ribosome's translating mRNA will undergo a series of "product-forming" steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is the Arg- or Lys-methylation that occurs on arginine or lysine, respectively. Given a protein, which site of its Arg (or Lys) can be methylated, and which site cannot? This is the first important problem for understanding the methylation mechanism and drug development in depth. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we proposed a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample was formulated by a 346-dimensional vector, formed by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. It was observed by the rigorous jackknife test and independent dataset test that iMethyl-PseAAC was superior to any of the existing predictors in this area.


Subject(s)
Algorithms , Pattern Recognition, Automated/methods , Protein Processing, Post-Translational/physiology , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Methylation , Molecular Sequence Data , Protein Binding , Software