1.
Front Genet ; 13: 926927, 2022.
Article in English | MEDLINE | ID: mdl-35846148

ABSTRACT

The early symptoms of lung adenocarcinoma are inapparent, and clinical diagnosis relies primarily on X-ray examination and pathological section examination; with the development of bioinformatics technology, the discovery of biomarkers offers another direction for diagnosis. However, diagnosis is not accurate or trustworthy when it relies on omics data with high-dimension, low-sample-size (HDLSS) features, or on biomarkers derived from only a single type of omics data. To address these problems, feature selection methods of biological analysis are used to reduce the dimension of gene expression data (GSE19188) and DNA methylation data (GSE139032, GSE49996). In addition, the Cartesian product method is used to expand the sample set and integrate gene expression data and DNA methylation data. The classifier is built using a deep neural network and is evaluated by K-fold cross-validation. Moreover, gene ontology analysis and literature retrieval are used to analyze the biological relevance of the selected genes, and the TCGA database is used for survival analysis of these potential genes through Kaplan-Meier estimates to explore the detailed molecular mechanism of lung adenocarcinoma. Survival analysis shows that COL5A2 and SERPINB5 are significant for identifying lung adenocarcinoma and are considered biomarkers of lung adenocarcinoma.
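
The abstract does not spell out how the Cartesian product is applied, so the following is only a minimal sketch under stated assumptions: every gene expression sample is paired with every DNA methylation sample that carries the same class label, and the two feature vectors are concatenated. The function name `cartesian_integrate`, the same-label pairing rule, and the toy values are all hypothetical.

```python
from itertools import product

def cartesian_integrate(expr_samples, meth_samples):
    """Pair each expression sample with each methylation sample of the
    same class and concatenate their feature vectors."""
    combined = []
    for (x_feat, x_label), (m_feat, m_label) in product(expr_samples, meth_samples):
        if x_label == m_label:  # assumption: only same-class pairs are kept
            combined.append((x_feat + m_feat, x_label))
    return combined

# Toy data as (feature list, label); the numbers are illustrative only.
expr = [([0.1, 0.2], "tumor"), ([0.9, 0.8], "normal")]
meth = [([0.5], "tumor"), ([0.4], "tumor"), ([0.7], "normal")]
pairs = cartesian_integrate(expr, meth)
```

With 1 tumor and 1 normal expression sample against 2 tumor and 1 normal methylation samples, the sketch yields 3 integrated samples, which illustrates how pairing can enlarge a small sample set.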

2.
Biomed Opt Express ; 12(5): 2888-2901, 2021 May 01.
Article in English | MEDLINE | ID: mdl-34168906

ABSTRACT

We have demonstrated widely tunable Yb:fiber-based laser sources, aiming to replace Ti:sapphire lasers for the nJ-level ultrafast applications, especially for the uses of nonlinear light microscopy. We investigated the influence of different input parameters to obtain an expansive spectral broadening, enabled by self-phase modulation and further reshaped by self-steepening, in the normal dispersion regime before the fiber damage. We also discussed the compressibility and intensity fluctuations of the demonstrated pulses, to reach the transform-limited duration with a very low intensity noise. Most importantly, we have demonstrated clear two-photon fluorescence images from UV-absorbing fluorophores to deep red dye stains.

3.
Bioinformatics ; 35(3): 398-406, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30010789

ABSTRACT

Motivation: A cell contains numerous protein molecules. One of the fundamental goals in cell biology is to determine their subcellular locations, which can provide useful clues about their functions. Knowledge of protein subcellular localization is also indispensable for prioritizing and selecting the right targets for drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called 'pLoc-mAnimal' was developed for identifying the subcellular localization of animal proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with the multi-label systems in which some proteins, called 'multiplex proteins', may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mAnimal was trained by an extremely skewed dataset in which some subset (subcellular location) was about 128 times the size of the other subsets. Accordingly, such an uneven training dataset will inevitably cause a biased consequence. Results: To alleviate such biased consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mAnimal by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mAnimal, the existing state-of-the-art predictor, in identifying the subcellular localization of animal proteins. 
Availability and implementation: To maximize the convenience for the vast majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Protein Databases , Proteins , Amino Acid Sequence , Animals , Protein Transport , Subcellular Fractions
4.
Bioinformatics ; 33(22): 3524-3531, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-29036535

ABSTRACT

MOTIVATION: Cells are deemed the basic unit of life. However, many important functions of cells, as well as their growth and reproduction, are performed via the protein molecules located at their different organelles or locations. Facing the explosive growth of protein sequences, we are challenged to develop fast and effective methods to annotate their subcellular localization. However, this is by no means an easy task. In particular, mounting evidence has indicated that proteins have a multi-label feature, meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Unfortunately, most of the existing computational methods can only be used to deal with single-label proteins. Although the recently developed 'iLoc-Animal' predictor is quite powerful and can also deal with animal proteins with multiple locations, its prediction quality needs to be improved, particularly in enhancing the absolute true rate and reducing the absolute false rate. RESULTS: Here we propose a new predictor called 'pLoc-mAnimal', which is superior to iLoc-Animal, as shown by the following results. When tested by the most rigorous cross-validation on the same high-quality benchmark dataset, the absolute true success rate achieved by the new predictor is 37% higher and the absolute false rate is four times lower in comparison with the state-of-the-art predictor. AVAILABILITY AND IMPLEMENTATION: To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mAnimal/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. CONTACT: xxiao@gordonlifescience.org or kcchou@gordonlifescience.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Intracellular Space/metabolism , Proteins/metabolism , Software , Amino Acid Sequence , Animals , Protein Transport , Proteins/chemistry , Reproducibility of Results , Software/standards
5.
J Membr Biol ; 248(4): 745-52, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25796484

ABSTRACT

Predicting membrane protein type is a challenging problem, particularly when the query proteins may simultaneously have two or more different types. Most of the existing methods can only be used to deal with single-label proteins. Multiple-label proteins should not be ignored, however, because they usually bear some special functions worthy of in-depth study. By introducing "multi-labeled learning" and hybridizing evolution information through Grey-PSSM, a novel predictor called iMem-Seq is developed that can be used to deal with systems containing both single and multiple types of membrane proteins. As a demonstration, the jackknife cross-validation was performed with iMem-Seq on a benchmark dataset of membrane proteins classified into eight types, where some proteins belong to two or three types, but none has ≥25% pairwise sequence identity to any other in the same subset. It was demonstrated via the rigorous cross-validations that the new predictor remarkably outperformed all its counterparts. As a user-friendly web-server, iMem-Seq is freely accessible to the public at http://www.jci-bioinfo.cn/iMem-Seq .


Subjects
Membrane Proteins/classification , Membrane Proteins/genetics , Protein Sequence Analysis/methods , Software , Membrane Proteins/chemistry
6.
J Biomol Struct Dyn ; 33(10): 2221-33, 2015.
Article in English | MEDLINE | ID: mdl-25513722

ABSTRACT

Information about the interactions of drug compounds with proteins in cellular networking is very important for drug development. Unfortunately, all the existing predictors for identifying drug-protein interactions were trained on a skewed benchmark dataset in which the number of non-interactive drug-protein pairs is overwhelmingly larger than that of the interactive ones. Using this kind of highly unbalanced benchmark dataset to train predictors would lead to many interactive drug-protein pairs being mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule and the synthetic minority over-sampling technique to treat the skewed benchmark datasets and balance the positive and negative subsets. The new benchmark datasets thus obtained are called the optimized benchmark datasets, based on which a new predictor called iDrug-Target was developed that contains four sub-predictors: iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NRs (nuclear receptors), respectively. Rigorous cross-validations on a set of experiment-confirmed datasets have indicated that these new predictors remarkably outperformed the existing ones for the same purpose. To maximize users' convenience, a publicly accessible web server for iDrug-Target has been established at http://www.jci-bioinfo.cn/iDrug-Target/ , by which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.
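
To make the over-sampling idea concrete, here is a minimal sketch of the interpolation step commonly associated with the synthetic minority over-sampling technique: a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation; the function name `smote`, the parameter defaults, and the toy data are assumptions, and the neighborhood cleaning rule is not shown.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

# Toy minority class (hypothetical feature vectors).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote(minority, n_new=4)
```

Because each synthetic point lies between two real minority samples, it stays inside the region the minority class already occupies, which is why this kind of balancing is usually preferred over simply duplicating samples.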


Subjects
Investigational Drugs/chemistry , Enzymes/chemistry , Ion Channels/chemistry , Cytoplasmic and Nuclear Receptors/chemistry , G-Protein-Coupled Receptors/chemistry , Software , Benchmarking , Chemical Databases , Datasets as Topic , Drug Design , Drug Discovery , Investigational Drugs/chemical synthesis , Enzymes/metabolism , Humans , Internet , Ion Channels/metabolism , Molecular Targeted Therapy/methods , Protein Binding , ROC Curve , Cytoplasmic and Nuclear Receptors/metabolism , G-Protein-Coupled Receptors/metabolism
7.
J Biomol Struct Dyn ; 33(8): 1731-42, 2015.
Article in English | MEDLINE | ID: mdl-25248923

ABSTRACT

As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating a variety of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named "lysine ubiquitination" because it occurs when a ubiquitin is covalently attached to lysine (K) residues of target proteins. Given an uncharacterized protein sequence that contains many lysine residues, which of them are ubiquitination sites and which are not? With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called "iUbiq-Lys" was developed based on evolutionary information, the gray system model, and the general form of pseudo-amino acid composition. It was demonstrated via rigorous cross-validations that the new predictor remarkably outperformed all its counterparts. As a web-server, iUbiq-Lys is accessible to the public at http://www.jci-bioinfo.cn/iUbiq-Lys . For the convenience of most experimental scientists, we have further provided a step-by-step guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process.


Subjects
Biological Evolution , Lysine/chemistry , Theoretical Models , Proteins/chemistry , Software , Ubiquitination , Algorithms , Lysine/metabolism , Proteins/metabolism , Reproducibility of Results , Web Browser
8.
Biomed Res Int ; 2014: 947416, 2014.
Article in English | MEDLINE | ID: mdl-24977164

ABSTRACT

Before becoming native proteins during biosynthesis, the polypeptide chains created by the ribosome's translation of mRNA undergo a series of "product-forming" steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is the Arg- or Lys-methylation that occurs on arginine or lysine, respectively. Given a protein, which of its Arg (or Lys) sites can be methylated, and which cannot? This is the first important problem for understanding the methylation mechanism and for drug development in depth. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we proposed a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample was formulated as a 346-dimensional vector, formed by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. It was observed by the rigorous jackknife test and independent dataset test that iMethyl-PseAAC was superior to any of the existing predictors in this area.


Subjects
Algorithms , Automated Pattern Recognition/methods , Post-Translational Protein Processing/physiology , Proteins/chemistry , Proteins/metabolism , Protein Sequence Analysis/methods , Amino Acid Sequence , Binding Sites , Methylation , Molecular Sequence Data , Protein Binding , Software
9.
Curr Top Med Chem ; 13(14): 1622-35, 2013.
Article in English | MEDLINE | ID: mdl-23889055

ABSTRACT

With the explosion of protein sequences generated in the postgenomic era, the gap between the number of attribute-known proteins and that of uncharacterized ones has become increasingly large. Knowing the key attributes of proteins is a shortcut for prioritizing drug targets and developing novel drugs. Unfortunately, it is both time-consuming and costly to acquire this kind of information purely by conducting biological experiments. Therefore, it is highly desirable to develop computational tools for quickly and effectively classifying proteins according to their sequence information alone. The process of developing these high-throughput tools generally involves the following procedures: (1) constructing benchmark datasets; (2) representing a protein sequence with a discrete numerical model; (3) developing or introducing a powerful algorithm or machine learning operator to conduct the prediction; (4) estimating the anticipated accuracy with a proper and objective test method; and (5) establishing a user-friendly web-server accessible to the public. This minireview focuses on recent progress in identifying the types of G-protein coupled receptors (GPCRs), the subcellular localization of proteins, and DNA-binding proteins and their binding sites. All these identification tools may provide very useful information for in-depth study of drug metabolism.


Subjects
Drug Design , Proteins/antagonists & inhibitors , Proteins/classification , Algorithms , Animals , Binding Sites/drug effects , Humans , Molecular Models , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Proteins/chemistry , Proteins/metabolism
10.
Mol Biosyst ; 9(4): 634-44, 2013 Apr 05.
Article in English | MEDLINE | ID: mdl-23370050

ABSTRACT

Predicting protein subcellular localization is a challenging problem, particularly when query proteins have multi-label features, meaning that they may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing methods can only be used to deal with single-label proteins. Multi-label proteins should not be ignored, however, because they usually bear some special function worthy of in-depth study. By introducing the "multi-label learning" approach, a new predictor, called iLoc-Animal, has been developed that can be used to deal with systems containing both single- and multi-label animal (metazoan except human) proteins. Meanwhile, to measure the prediction quality of a multi-label system in a rigorous way, five indices were introduced; they are "Absolute-True", "Absolute-False" (or "Hamming-Loss"), "Accuracy", "Precision", and "Recall". As a demonstration, the jackknife cross-validation was performed with iLoc-Animal on a benchmark dataset of animal proteins classified into the following 20 location sites: (1) acrosome, (2) cell membrane, (3) centriole, (4) centrosome, (5) cell cortex, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracellular, (11) Golgi apparatus, (12) lysosome, (13) mitochondrion, (14) melanosome, (15) microsome, (16) nucleus, (17) peroxisome, (18) plasma membrane, (19) spindle, and (20) synapse, where many proteins belong to two or more locations. For such a complicated system, the outcomes achieved by iLoc-Animal for all five indices were quite encouraging, indicating that the predictor may become a useful tool in this area. It has not escaped our notice that the multi-label approach and the rigorous measurement metrics can also be used to investigate many other multi-label problems in molecular biology. As a user-friendly web-server, iLoc-Animal is freely accessible to the public at the web-site .
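
The abstract names the five indices but does not give their formulas. The sketch below uses a common set-based formulation of these multi-label measures, which is assumed here rather than taken from the source: per-sample overlaps between the true and predicted label sets are averaged over all samples, and Absolute-False divides the number of wrong labels by the total number of possible labels. The function name `multilabel_metrics` and the toy data are hypothetical.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Average five set-based multi-label indices over all samples.
    Assumed definitions; the source abstract does not state them."""
    n = len(true_sets)
    precision = recall = accuracy = abs_true = abs_false = 0.0
    for truth, pred in zip(true_sets, pred_sets):
        inter = len(truth & pred)   # labels predicted correctly
        union = len(truth | pred)   # labels true or predicted
        precision += inter / len(pred) if pred else 0.0
        recall += inter / len(truth) if truth else 0.0
        accuracy += inter / union if union else 1.0
        abs_true += 1.0 if truth == pred else 0.0
        abs_false += (union - inter) / n_labels
    return {
        "precision": precision / n,
        "recall": recall / n,
        "accuracy": accuracy / n,
        "absolute_true": abs_true / n,
        "absolute_false": abs_false / n,
    }

# Two toy proteins scored against 20 possible location sites.
true_sets = [{"nucleus"}, {"cytoplasm", "nucleus"}]
pred_sets = [{"nucleus"}, {"cytoplasm"}]
scores = multilabel_metrics(true_sets, pred_sets, n_labels=20)
```

In this toy case the second protein is only half recovered, so Absolute-True drops to 0.5 while Precision stays at 1.0, which shows why Absolute-True is the strictest of the five indices.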


Subjects
Proteins/chemistry , Proteins/metabolism , Software , Algorithms , Animals , Computational Biology/methods , Internet , Intracellular Space/metabolism , Protein Transport , Staining and Labeling
11.
Anal Biochem ; 436(2): 168-77, 2013 May 15.
Article in English | MEDLINE | ID: mdl-23395824

ABSTRACT

Antimicrobial peptides (AMPs), also called host defense peptides, are an evolutionarily conserved component of the innate immune response and are found among all classes of life. According to their special functions, AMPs are generally classified into ten categories: Antibacterial Peptides, Anticancer/tumor Peptides, Antifungal Peptides, Anti-HIV Peptides, Antiviral Peptides, Antiparasital Peptides, Anti-protist Peptides, AMPs with Chemotactic Activity, Insecticidal Peptides, and Spermicidal Peptides. Given a query peptide, how can we identify whether it is an AMP or non-AMP? If it is, can we identify which functional type or types it belongs to? In particular, how can we deal with the multi-type problem, since an AMP may belong to two or more functional types? To address these problems, which are very important to both basic research and drug development, a multi-label classifier was developed based on the pseudo amino acid composition (PseAAC) and the fuzzy K-nearest neighbor (FKNN) algorithm, where the components of PseAAC were formulated by incorporating five physicochemical properties. The novel classifier is called iAMP-2L, where "2L" means that it is a 2-level predictor. The 1st level answers the 1st question above, while the 2nd level answers the 2nd and 3rd questions, which are beyond the reach of any existing methods in this area. For the convenience of users, a user-friendly web-server for iAMP-2L was established at http://www.jci-bioinfo.cn/iAMP-2L.
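
As a rough illustration of how a fuzzy K-nearest neighbor rule can return graded memberships for several functional types at once, the sketch below weights each of the k nearest training samples by inverse distance and sums the weights per label. This is a generic, simplified variant and not the iAMP-2L implementation; the function name `fuzzy_knn`, the fuzzifier `m`, and the one-dimensional toy features are assumptions.

```python
def fuzzy_knn(train, query, k=3, m=2.0):
    """Return label memberships for `query` from its k nearest neighbours,
    each weighted by inverse distance raised to 2/(m-1)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Clamp tiny distances so an exact match does not divide by zero.
    weights = [1.0 / max(dist(f, query), 1e-12) ** (2.0 / (m - 1.0))
               for f, _ in nearest]
    total = sum(weights)
    memberships = {}
    for (_, labels), w in zip(nearest, weights):
        for label in labels:  # a neighbour may carry several labels
            memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

# Toy training set: (feature vector, set of functional types).
train = [
    ([0.0], {"antibacterial"}),
    ([1.0], {"antibacterial", "antifungal"}),
    ([3.0], {"antiviral"}),
]
memberships = fuzzy_knn(train, [0.5], k=2)
```

A multi-label decision can then be made by reporting every type whose membership exceeds a chosen threshold; the threshold itself would be another assumption to tune.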


Subjects
Algorithms , Antimicrobial Cationic Peptides/classification , Antimicrobial Cationic Peptides/pharmacology , Amino Acids/analysis , Anti-Infective Agents/chemistry , Anti-Infective Agents/pharmacology , Antimicrobial Cationic Peptides/chemistry , Protein Databases , Peptides/chemistry , Peptides/classification , Peptides/pharmacology , User-Computer Interface
12.
Proteins ; 81(1): 140-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22933332

ABSTRACT

Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, the long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature selection procedure combines forward feature selection and sequential backward selection. Using the jackknife cross-validation test, the method was demonstrated on a large dataset. The predictor was built on multitudinous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates of proteins. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.
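
The two reported figures are agreement measures between predicted and observed folding rates. The sketch below shows how a Pearson correlation coefficient and a root-mean-square deviation could be computed; treating the abstract's "standard error" as a root-mean-square deviation is an assumption, since the abstract does not define it, and the function names and toy numbers are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rms_error(predicted, observed):
    """Root-mean-square deviation (assumed stand-in for 'standard error')."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )

# Toy folding rates; the values are illustrative only.
predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.0, 5.0]
r = pearson_r(predicted, observed)
err = rms_error(predicted, observed)
```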


Subjects
Amino Acids/chemistry , Chemical Models , Protein Folding , Proteins/chemistry , Amino Acid Sequence , Amino Acids/metabolism , Protein Databases , Proteins/metabolism , Reproducibility of Results , Software , Structure-Activity Relationship , Support Vector Machine
13.
PLoS One ; 7(11): e49040, 2012.
Article in English | MEDLINE | ID: mdl-23189138

ABSTRACT

Malaria has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of the malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to identify in a timely manner the proteins secreted by the malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of the malaria parasite by experiments alone. To expedite the process of developing effective drugs against malaria, a computational predictor called "iSMP-Grey" was developed that can be used to identify the secretory proteins of the malaria parasite based on protein sequence information alone. During the prediction process, a protein sample was formulated as a 60D (dimensional) feature vector formed by incorporating sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that lack sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those of the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.


Subjects
Amino Acids/chemistry , Computational Biology/methods , Molecular Evolution , Plasmodium/chemistry , Protozoan Proteins/chemistry , Protein Databases , Humans , Internet , Plasmodium/metabolism
14.
PLoS One ; 6(9): e24756, 2011.
Article in English | MEDLINE | ID: mdl-21935457

ABSTRACT

DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the "grey model" into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA-binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, the computational time for iDNA-Prot is remarkably shorter in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results.
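
The jackknife test cited here is a leave-one-out procedure: each sample is held out in turn, the predictor is trained on the remaining samples, and the held-out sample is then classified. The sketch below shows that loop with a 1-nearest-neighbour rule standing in for the random forest used in the paper; the stand-in classifier, the function names, and the toy feature vectors are all assumptions.

```python
def nearest_label(train, query):
    """Label of the training point closest to `query` (1-nearest neighbour)."""
    best = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return best[1]

def jackknife_success_rate(dataset):
    """Hold out each sample once, predict it from the rest, and return
    the fraction of held-out samples predicted correctly."""
    hits = 0
    for i, (features, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]  # everything except sample i
        if nearest_label(rest, features) == label:
            hits += 1
    return hits / len(dataset)

# Toy dataset of (feature vector, class); values are illustrative only.
data = [
    ([0.0, 0.1], "binding"),
    ([0.1, 0.0], "binding"),
    ([1.0, 0.9], "non-binding"),
    ([0.9, 1.0], "non-binding"),
]
rate = jackknife_success_rate(data)
```

Because every sample is tested exactly once and the outcome does not depend on a random split, the jackknife gives a single, reproducible success rate for a given benchmark dataset.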


Subjects
Computational Biology/methods , DNA-Binding Proteins/analysis , Algorithms , Protein Databases , Internet
15.
Protein Eng Des Sel ; 22(11): 699-705, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19776029

ABSTRACT

G-protein-coupled receptors (GPCRs) play fundamental roles in regulating various physiological processes as well as the activity of virtually all cells. Different GPCR families are responsible for different functions. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop an automated method to address two problems: given the sequence of a query protein, can we identify whether it is a GPCR? If it is, what family class does it belong to? Here, a two-layer ensemble classifier called GPCR-GIA was proposed by introducing a novel scale called 'grey incident degree'. The overall success rate of GPCR-GIA in identifying GPCR and non-GPCR was about 95%, and that in identifying the GPCRs among their nine family classes was about 80%. These rates were obtained by jackknife cross-validation tests on stringent benchmark datasets where none of the proteins has ≥50% pairwise sequence identity to any other in the same class. Moreover, a user-friendly web-server was established at http://218.65.61.89:8080/bioinfo/GPCR-GIA. For users' convenience, a step-by-step guide on how to use the GPCR-GIA web server is provided. Generally speaking, one can get the desired two-level results in around 10 s for a query protein sequence of 300-400 amino acids; the longer the sequence is, the more time is needed.


Subjects
Algorithms , Computational Biology/methods , Internet , G-Protein-Coupled Receptors/analysis , G-Protein-Coupled Receptors/classification , Protein Databases , G-Protein-Coupled Receptors/chemistry , User-Computer Interface
16.
Amino Acids ; 37(4): 741-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19037711

ABSTRACT

Many proteins are composed of two or more subunits, each associated with different polypeptide chains. The number and arrangement of subunits forming a protein are referred to as quaternary structure. It has long been known that the functions of proteins are closely related to their quaternary structure. In this paper, the grey incidence degree is introduced, which quantifies the numerical relation between various components and expresses their degree of similarity or difference. We have demonstrated that introducing the grey incidence degree can remarkably enhance the success rates in predicting the protein quaternary structural class. It is anticipated that the grey incidence degree can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type, among many others.
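
The abstract does not give a formula for the grey incidence degree. The sketch below uses the grey relational formulation usually attributed to Deng, which is assumed here to correspond to the quantity the abstract describes: pointwise absolute differences from a reference sequence are turned into coefficients between 0 and 1 and then averaged. The function name, the distinguishing coefficient `rho=0.5`, and the toy sequences are assumptions.

```python
def grey_incidence_degrees(reference, candidates, rho=0.5):
    """Degree of similarity of each candidate sequence to `reference`
    under the assumed grey relational formulation (1.0 = identical)."""
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    d_min = min(min(row) for row in deltas)  # smallest difference overall
    d_max = max(max(row) for row in deltas)  # largest difference overall
    if d_max == 0:
        return [1.0 for _ in candidates]     # every candidate equals the reference
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]

# Toy feature sequences; the values are illustrative only.
reference = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0]]
degrees = grey_incidence_degrees(reference, candidates)
```

In a classifier built on this measure, a query would typically be assigned to the class whose representative sequence yields the highest degree.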


Subjects
Quaternary Protein Structure , Proteins/chemistry , Protein Sequence Analysis/methods , Algorithms , Protein Databases
17.
J Comput Chem ; 29(12): 2018-24, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18381630

ABSTRACT

Using the pseudo amino acid (PseAA) composition to represent the sample of a protein can incorporate a considerable amount of sequence pattern information so as to improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article the grey modeling approach is introduced that is particularly efficient in coping with complicated systems such as the one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted for its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition that can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introduction of the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can be also used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, protease type, among many others.


Subjects
Amino Acids/chemistry , Molecular Models , Proteins/chemistry , Amino Acid Sequence , Proteins/physiology