1.
Sci Rep ; 13(1): 23075, 2023 12 27.
Article in English | MEDLINE | ID: mdl-38155251

ABSTRACT

Unconjugated bilirubin (UB) levels during the first week after birth are related to outcomes in neonatal hypoxic-ischemic encephalopathy (HIE). In 82 HIE patients, UB was correlated with clinical Sarnat staging of HIE, brain magnetic resonance imaging (MRI) findings, hearing outcomes, and neurodevelopmental outcomes at ≥ 1 year. The initial UB level was significantly correlated with lactic acid levels. Peak UB was higher (p < 0.001) in stage I (10.13 ± 4.03 mg/dL, n = 34) than in stages II and III (6.11 ± 2.88 mg/dL, n = 48). Among the 48 patients receiving hypothermia treatment, a higher peak UB was significantly (p < 0.001) correlated with unremarkable brain MRI scans and unremarkable neurodevelopmental outcomes at age ≥ 1 year. Peak UB was higher (p = 0.015) in patients who remained free of seizures until 1 year of age (6.63 ± 2.91 mg/dL) than in patients with seizures (4.17 ± 1.77 mg/dL). Regarding hearing outcomes, there were no significant differences in UB between patients with and without hearing loss. The UB level in the first week after birth is an important biomarker for clinical staging, MRI findings, seizures after discharge before 1 year of age, and neurodevelopmental outcomes at ≥ 1 year of age.


Subjects
Induced Hypothermia, Brain Hypoxia-Ischemia, Newborn Infant, Humans, Brain Hypoxia-Ischemia/diagnostic imaging, Brain Hypoxia-Ischemia/therapy, Induced Hypothermia/methods, Magnetic Resonance Imaging/methods, Seizures/therapy, Bilirubin
2.
Bioinformatics ; 38(18): 4428-4429, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35904542

ABSTRACT

MOTIVATION: MIB2 (metal ion-binding) aims to overcome a limitation of structure-based prediction approaches, namely that many proteins lack a solved structure. MIB2 also offers more accurate prediction performance and supports more metal ion types. RESULTS: MIB2 uses both the (PS)2 method and the AlphaFold Protein Structure Database to obtain predicted structures, on which it performs metal ion docking and predicts binding residues. MIB2 offers marked improvements over MIB by collecting more MIB residue templates and using a metal ion type-specific scoring function. It supports binding site prediction for a total of 18 types of metal ions. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://bioinfo.cmu.edu.tw/MIB2/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computers, Proteins, Protein Databases, Proteins/chemistry, Binding Sites, Protein Domains, Metals, Software
3.
Int J Mol Sci ; 23(8)2022 Apr 09.
Article in English | MEDLINE | ID: mdl-35456975

ABSTRACT

Glioblastoma (GBM) is one of the most common malignant brain tumors and remains incurable. Identifying a gene signature for GBM may aid its diagnosis, treatment and prognosis prediction, and even the development of new treatments. In this study, we used the GSE108474 database to perform GSEA and machine learning analysis, and identified a 33-gene signature of GBM by examining differential gene expression relative to astrocytoma and other non-GBM gliomas. The 33 signature genes included the overexpressed genes COL6A2, ABCC3, COL8A1, FAM20A, ADM, CTHRC1, PDPN, IBSP, MIR210HG, GPX8, MYL9 and PDLIM4, as well as the underexpressed genes CHST9, CSDC2, ENHO, FERMT1, IGFN1, LINC00836, MGAT4C, SHANK2 and VIPR2. Protein functional analysis by CELLO2GO implied that these signature genes might be involved in regulating various aspects of biological function, including anatomical structure development, cell proliferation and adhesion, and signal transduction, and many of the genes were annotated as involved in the response to stress. Of these 33 signature genes, 23 have previously been reported to be functionally correlated with GBM; the roles of the remaining 10 genes in glioma development remain unknown. Our results are the first to reveal that GBM exhibits overexpression of GPX8 and underexpression of signature genes including CHST9, CSDC2, ENHO, FERMT1, IGFN1, LINC00836, MGAT4C and SHANK2, which might play crucial roles in the tumorigenesis of different gliomas.


Subjects
Brain Neoplasms, Glioblastoma, Glioma, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Brain Neoplasms/pathology, DNA-Binding Proteins/metabolism, Extracellular Matrix Proteins, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Glioblastoma/metabolism, Glioma/metabolism, Humans, Intercellular Signaling Peptides and Proteins, LIM Domain Proteins/genetics, Membrane Proteins/metabolism, Neoplasm Proteins/metabolism, Peroxidases, Sulfotransferases/metabolism
4.
Pharmaceuticals (Basel) ; 15(2)2022 Jan 24.
Article in English | MEDLINE | ID: mdl-35215249

ABSTRACT

Cancer drug resistance presents a challenge for precision medicine, and new drug-resistance mutations continually emerge. In this study, we explored the relationship between drug-resistance mutations and drug resistance from the perspective of protein structure. By combining data on previously identified drug-resistance mutations with information on protein structure and function, we built machine learning models to predict cancer drug-resistance mutations. Our combined model achieved an accuracy of 86%, a Matthews correlation coefficient (MCC) of 0.57, and an F1 score of 0.66. We have constructed a fast, reliable method for predicting and investigating cancer drug resistance from protein structure. Nonetheless, more information about drug resistance is needed; in particular, the relationships between drugs and drug-resistance mutations in proteins need clarification. Highly accurate predictions of drug-resistance mutations can help in developing new strategies for personalized cancer treatment. Our novel approach, which incorporates protein structure information, has the potential to elucidate the physiological mechanisms of cancer drug resistance.
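The accuracy, MCC and F1 values reported above are all derived from a confusion matrix of predicted versus known resistance mutations. As a minimal sketch (not the authors' code), the standard formulas can be written in Python as follows; the counts in the usage line are hypothetical.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, Matthews correlation coefficient (MCC) and F1 from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC uses all four cells, so it stays informative when one class is rare.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # F1 is the harmonic mean of precision and recall for the positive class.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "mcc": mcc, "f1": f1}


# Hypothetical counts, not data from the study.
print(confusion_metrics(tp=50, fp=10, tn=30, fn=10))
```

Accuracy alone can look high when most mutations are non-resistant; MCC and F1 penalize a model that simply predicts the majority class.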

5.
J Clin Med ; 10(17)2021 Sep 05.
Article in English | MEDLINE | ID: mdl-34501458

ABSTRACT

Troponin I is a biomarker of cardiac injury in children, and its role in neonatal hypoxic-ischemic encephalopathy (HIE) may have valuable clinical implications. Troponin I levels were measured within 6 h of birth to determine their relationship to HIE stage, short-term cardiac functional outcomes, and neurodevelopmental outcomes at 1 year. Seventy-three patients were divided into two groups: mild HIE and moderate to severe HIE. Troponin I levels within 6 h of birth were obtained in 61 patients and were significantly higher in patients with moderate to severe HIE than in patients with mild HIE (Mann-Whitney U test, U = 146, p = 0.001). A troponin I cut-off level of ≥60 pg/mL predicted moderate to severe HIE with a specificity of 81.1% and a negative prediction rate of 76.9%. A troponin I cut-off level of ≥180 pg/mL was significantly associated with hypotension during the first admission (χ2 (1, n = 61) = 33.1, p = 0.001, odds ratio 96.8) and with abnormal neurodevelopmental outcomes at 1 year (χ2 (1, n = 61) = 5.3, p = 0.021, odds ratio 4.53). Early troponin I levels may be a useful biomarker for predicting moderate to severe HIE and for the initiation of hypothermia therapy.
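The specificity, negative predictive value and odds ratios above all come from a 2x2 table formed by applying a cut-off to troponin I. The following Python sketch shows that tabulation for a rule of the form `value >= cutoff`; the patient values are hypothetical, and the study's own statistical software is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TwoByTwo:
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)

    def odds_ratio(self) -> float:
        # Cross-product ratio; raises ZeroDivisionError if fp or fn is zero.
        return (self.tp * self.tn) / (self.fp * self.fn)


def tabulate(values: Sequence[float], has_condition: Sequence[bool], cutoff: float) -> TwoByTwo:
    """2x2 table for the rule 'value >= cutoff predicts the condition'."""
    tp = fp = fn = tn = 0
    for value, condition in zip(values, has_condition):
        if value >= cutoff:
            if condition:
                tp += 1
            else:
                fp += 1
        elif condition:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical troponin I values (pg/mL) and moderate-to-severe HIE labels.
table = tabulate([10, 70, 200, 50, 90, 30], [False, True, True, False, False, True], cutoff=60)
print(table.specificity(), table.npv(), table.odds_ratio())
```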

6.
Sci Rep ; 11(1): 13599, 2021 06 30.
Article in English | MEDLINE | ID: mdl-34193921

ABSTRACT

Single amino acid variation (SAV) is an amino acid substitution in a protein sequence that can potentially influence the entire protein structure or function, as well as its binding affinity. Protein destabilization is related to diseases, including several cancers, but clarifying the relationship between SAVs and cancer with traditional experiments takes considerable time and resources. Some SAV prediction methods use computational approaches, most of which predict SAV-induced changes in protein stability. In this investigation, all SAV characteristics derived from protein sequences, structures and the microenvironment were converted into feature vectors and fed into an integrated prediction system using a support vector machine and a genetic algorithm. Critical features were used to estimate the relationship between their properties and cancers caused by SAVs. We describe how we developed a prediction system based on protein sequence and structure that can distinguish whether or not an SAV is related to cancer. In five-fold cross-validation, our system achieved an accuracy of 89.73%, a Matthews correlation coefficient of 0.74, and an F1 score of 0.81. We have built an online prediction server, CanSavPre (http://bioinfo.cmu.edu.tw/CanSavPre/), which is expected to become a useful, practical tool for cancer research and precision medicine.


Subjects
Biological Models, Neoplasms, Support Vector Machine, Amino Acid Substitution, Humans, Neoplasm Proteins/genetics, Neoplasm Proteins/metabolism, Neoplasms/genetics, Neoplasms/metabolism
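The five-fold cross-validation figures in the CanSavPre abstract are averages over held-out folds. A minimal Python sketch of the splitting and averaging follows; `train_and_score` is a hypothetical stand-in for the SVM and genetic-algorithm pipeline, and this version does not stratify folds by class, which the abstract does not specify either way.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and return a (train, test) index pair for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding over the shuffled list keeps fold sizes within one of each other.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def cross_validate(features: Sequence, labels: Sequence,
                   train_and_score: Callable[..., float], k: int = 5, seed: int = 0) -> float:
    """Mean of train_and_score(train_x, train_y, test_x, test_y) over the k folds."""
    scores = []
    for train, test in k_fold_indices(len(labels), k, seed):
        scores.append(train_and_score([features[i] for i in train], [labels[i] for i in train],
                                      [features[i] for i in test], [labels[i] for i in test]))
    return sum(scores) / len(scores)
```

Every sample is tested exactly once, so the averaged score reflects performance on data the model did not see during training.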
7.
Diagnostics (Basel) ; 11(5)2021 May 18.
Article in English | MEDLINE | ID: mdl-34070031

ABSTRACT

BACKGROUND: Identifying an effective method for the early diagnosis of neonatal hypoxic-ischemic encephalopathy (HIE) would help in delivering effective therapies. METHODS: We studied blood biomarkers obtained within 6 h after birth and correlated them with the degree of neonatal HIE. A total of 80 patients were divided into group 1 (mild HIE) and group 2 (moderate or severe HIE). Then, 42 patients from group 2 received hypothermia therapy and were further divided into group 3 (unremarkable or mild MRI results) and group 4 (severe MRI results). RESULTS: Lactate, creatinine, white blood cells, and lactate dehydrogenase (LDH) differed significantly between groups 1 and 2. Lactate, prothrombin time, and albumin differed significantly between groups 3 and 4. For Sarnat staging, we observed that lactate above 45 mg/dL combined with LDH above 1000 U/L yielded the highest positive predictive value (PPV) (95.7%; odds ratio, 22.00) but a low negative predictive value (NPV) for moderate or severe HIE. Lactate above 45 mg/dL alone yielded the highest NPV (71.4%) for moderate or severe HIE. CONCLUSIONS: Lactate combined with LDH within 6 h after birth yielded a high PPV. Using combined biomarkers to exclude mild HIE, include moderate or severe HIE, and initiate hypothermia therapy is feasible.

8.
J Chem Inf Model ; 56(12): 2287-2291, 2016 12 27.
Article in English | MEDLINE | ID: mdl-27976886

ABSTRACT

The structure of a protein determines its biological function(s) and its interactions with other factors; binding regions tend to be conserved in sequence and structure, and the interacting residues involved are usually close together in 3D space. The Protein Data Bank currently contains more than 110 000 protein structures, approximately one-third of which contain metal ions. Identifying and characterizing metal ion-binding sites is thus essential for investigating a protein's function(s) and interactions. However, experimental approaches are time-consuming and costly. The web server reported here was built to predict metal ion-binding residues and to generate the predicted metal ion-bound 3D structure. Binding templates were constructed from the binding residues of 12 types of metal ions. The templates include residues within 3.5 Å of the metal ion, and the fragment transformation method was used for structural comparison between query proteins and templates without any data training. The scoring functions, which are based on the similarity of structure and binding residues, were adjusted. Binding-residue prediction is supported for twelve metal ions (Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, Zn2+, Cd2+, Fe2+, Ni2+, Hg2+, Co2+, and Cu+). After prediction, MIB also docks the metal ions. The MIB server is available at http://bioinfo.cmu.edu.tw/MIB/.


Subjects
Metals/metabolism, Molecular Docking Simulation, Proteins/metabolism, Binding Sites, Cations/metabolism, Protein Databases, Internet, Protein Conformation, Proteins/chemistry, Software
9.
Int J Mol Sci ; 16(7): 15136-49, 2015 Jul 03.
Article in English | MEDLINE | ID: mdl-26151847

ABSTRACT

Protein structure prediction (PSP), the prediction of protein tertiary structure from primary structure, is a challenging computational problem. After decades of research, numerous optimisation methods based on energy models have been proposed. However, further investigation and improvement are still needed to increase the accuracy of predicted structures and their similarity to native structures. This study presents a novel backbone angle preference factor, one of the factors that induce protein folding. The proposed multiobjective optimisation approach simultaneously considers energy models and backbone angle preferences to solve ab initio PSP. To demonstrate the effectiveness of this approach, 75 amino acid sequences with lengths ranging from 22 to 88 amino acids were selected from the CB513 data set as benchmarks. These sequences are highly dissimilar to one another, which makes them a meaningful benchmark. The experimental results showed that the root-mean-square deviation (RMSD) achieved by the multiobjective optimisation approach based on energy models and backbone angle preferences was superior to that of typical energy models, indicating that the proposed approach can facilitate ab initio PSP.


Subjects
Algorithms, Molecular Dynamics Simulation, Protein Conformation, Thermodynamics
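A multiobjective approach of this kind compares candidate structures by Pareto dominance rather than by a single summed score. The sketch below assumes two minimised objectives, an energy value and a hypothetical backbone-angle-preference penalty; it shows only the dominance test and the non-dominated set, not the search algorithm used in the study.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one.
    All objectives are minimised (e.g. energy and an angle-preference penalty)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Hypothetical (energy, angle_penalty) scores for five candidate conformations.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
print(pareto_front(candidates))  # [0, 1, 2]
```

Keeping the whole non-dominated set, rather than one weighted optimum, lets the search retain conformations that trade a slightly higher energy for better agreement with backbone angle preferences.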
10.
Biomed Res Int ; 2015: 402536, 2015.
Article in English | MEDLINE | ID: mdl-26000290

ABSTRACT

We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank the structures of proteins that bind at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential was calculated for each residue. At a false positive rate of 5%, our method predicted NAD- and FAD-binding sites with true positive rates of 67.1% and 68.4%, respectively. Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins; most importantly, it shows that the required conservation of residue types and local structures in FAD- and NAD-binding sites can be verified.


Subjects
Biochemistry/methods, Flavin-Adenine Dinucleotide/metabolism, NAD/metabolism, Proteins/chemistry, Amino Acids, Binding Sites, Molecular Models, Support Vector Machine
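Reporting true positive rates "at a false positive rate of 5%", as in the NAD/FAD abstract, means fixing a score threshold on the non-binding residues first. A minimal Python sketch, assuming each residue has a hypothetical ligand-binding potential score and a known binding label:

```python
from typing import Sequence, Tuple


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[bool],
               max_fpr: float = 0.05) -> Tuple[float, float]:
    """Lowest threshold whose false positive rate stays <= max_fpr, and the TPR there.
    A residue is called 'binding' when its score >= threshold."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best = (float("inf"), 0.0)  # no residue predicted positive
    # Lowering the threshold can only add false positives, so stop at the first violation.
    for t in sorted(set(scores), reverse=True):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        if fp / n_neg > max_fpr:
            break
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        best = (t, tp / n_pos)
    return best
```

With 20 non-binding residues, a 5% FPR allows exactly one false positive, so the threshold stops just above the second-highest-scoring non-binding residue.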
11.
Biomed Res Int ; 2014: 807839, 2014.
Article in English | MEDLINE | ID: mdl-25295274

ABSTRACT

We propose a method (EXIA2) for catalytic residue prediction based on protein structure that does not require homology information. The method is based on the distinctive side-chain orientation of catalytic residues: we found that the side chains of catalytic residues usually point toward the center of the catalytic site. This orientation is usually observed in catalytic residues but not in noncatalytic residues, whose side chains are usually randomly oriented. When combined with PSI-BLAST sequence conservation, the method is the most accurate catalytic residue prediction method currently available. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets range from 0.934 to 0.968.


Subjects
Catalysis, Catalytic Domain, Proteins/chemistry, Software, Computational Biology, Protein Databases, Internet, Protein Conformation, Proteins/metabolism
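The orientation criterion in the EXIA2 abstract can be expressed as an angle: a side chain "points to the center of the catalytic site" when the vector from its Cα atom toward the side chain is nearly parallel to the vector from Cα toward the site center. This Python sketch assumes the side-chain direction is approximated by the side-chain centroid and that the site center is already known; EXIA2's exact geometric definitions are not given in the abstract.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def orientation_angle(ca: Vec, side_chain_centroid: Vec, site_center: Vec) -> float:
    """Angle in degrees between the CA->side-chain vector and the CA->site-center vector.
    Small angles mean the side chain points toward the (assumed) site center."""
    u = _sub(side_chain_centroid, ca)
    v = _sub(site_center, ca)
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    cos_theta = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp floating-point rounding
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates in angstroms.
print(orientation_angle((0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (6.0, 2.0, 0.0)))
```

Under this sketch, catalytic residues would be expected to cluster at small angles, while noncatalytic residues would spread over the whole 0 to 180 degree range.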
12.
PLoS One ; 9(6): e99368, 2014.
Article in English | MEDLINE | ID: mdl-24911789

ABSTRACT

CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain brief or detailed Gene Ontology (GO) categories, including subcellular localization(s), for queried proteins by combining the CELLO localization-prediction and BLAST homology-search approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO-annotated in an in-house database derived from the UniProt Knowledgebase. At the same time, CELLO attempts to predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs of the query sequence have been identified, the number of terms found in each of their GO categories, i.e., cellular component, molecular function, and biological process, is summed and presented as pie charts representing possible functional annotations for the queried protein. Even when the experimental subcellular localization of a protein is not known, and thus not annotated, CELLO can confidently suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated, so that user-specific questions can be readily addressed.


Subjects
Genetic Databases, Proteins/metabolism, Software, Bacterial Proteins/metabolism, Computational Biology/methods, Datasets as Topic, Internet, Molecular Sequence Annotation, Protein Transport, Proteins/genetics, User-Computer Interface
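The summation step described in the CELLO2GO abstract (counting GO terms across BLAST homologs per category, then drawing pie charts) can be sketched as follows. This assumes the homolog hits and their GO annotations have already been retrieved; the annotations in the example are hypothetical, and terms are written as names rather than GO identifiers for readability.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Each annotation is (GO category, GO term); categories follow the three GO aspects.
Annotation = Tuple[str, str]


def summarize_go(homolog_annotations: Iterable[Iterable[Annotation]]) -> Dict[str, Counter]:
    """Sum how often each GO term appears among the homologs, separately per category."""
    summary: Dict[str, Counter] = {}
    for annotations in homolog_annotations:
        for category, term in annotations:
            summary.setdefault(category, Counter())[term] += 1
    return summary


def category_fractions(counts: Counter) -> Dict[str, float]:
    """Convert one category's counts into fractions, e.g. for a pie chart."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


# Hypothetical annotations of three BLAST homologs.
hits = [
    [("cellular_component", "nucleus"), ("molecular_function", "DNA binding")],
    [("cellular_component", "nucleus")],
    [("cellular_component", "cytoplasm")],
]
print(category_fractions(summarize_go(hits)["cellular_component"]))
```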
13.
Biomed Res Int ; 2013: 185679, 2013.
Article in English | MEDLINE | ID: mdl-24288665

ABSTRACT

Recent progress in high-throughput instrumentation has led to astonishing growth in both the volume and the complexity of biomedical data collected from various sources. Data on this massive scale pose serious challenges to storage and computing technologies. Cloud computing offers a way to address this problem because it supports both storage and high-performance computing on large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical researchers make the vast amount of diverse data meaningful and usable.


Subjects
Biomedical Research, Computational Biology, Genomics, Software, Translational Biomedical Research
14.
PLoS One ; 7(6): e39252, 2012.
Article in English | MEDLINE | ID: mdl-22723976

ABSTRACT

The structure of a protein determines its function and its interactions with other factors. Regions of proteins that interact with ligands, substrates, and/or other proteins tend to be conserved in both sequence and structure, and the residues involved are usually in close spatial proximity. More than 70,000 protein structures are currently found in the Protein Data Bank, and approximately one-third contain metal ions essential for function. Identifying and characterizing metal ion-binding sites experimentally is time-consuming and costly. Many computational methods have been developed to identify metal ion-binding sites, and most use only sequence information. For the work reported herein, we developed a method that uses sequence and structural information to predict the residues in metal ion-binding sites. Six types of metal ion-binding templates (for Ca2+, Cu2+, Fe3+, Mg2+, Mn2+, and Zn2+) were constructed using the residues within 3.5 Å of the center of the metal ion. Using the fragment transformation method, we then compared known metal ion-binding sites with the templates to assess the accuracy of our method. Our method achieved an overall accuracy of 94.6% with a true positive rate of 60.5% at a 5% false positive rate and therefore constitutes a significant improvement in metal-binding site prediction.


Subjects
Ions/chemistry, Metalloproteins/chemistry, Metals/chemistry, Molecular Models, Amino Acids/chemistry, Binding Sites, Peptides/chemistry, Protein Binding, Protein Conformation, ROC Curve
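The template definition in the metal ion-binding abstract, residues within 3.5 Å of the metal ion center, amounts to a distance filter over atomic coordinates. A minimal Python sketch, assuming a residue counts if any of its atoms lies within the cut-off (the abstract does not say which atoms are used) and that coordinates come from a PDB parser not shown here:

```python
import math
from typing import Iterable, List, Tuple

# (residue name, residue number, atom name, xyz coordinates in angstroms)
Atom = Tuple[str, int, str, Tuple[float, float, float]]


def residues_near_ion(atoms: Iterable[Atom], ion_xyz: Tuple[float, float, float],
                      cutoff: float = 3.5) -> List[Tuple[str, int]]:
    """Residues with at least one atom within `cutoff` angstroms of the ion center."""
    hits = set()
    for res_name, res_num, _atom_name, xyz in atoms:
        if math.dist(xyz, ion_xyz) <= cutoff:  # math.dist needs Python 3.8+
            hits.add((res_name, res_num))
    return sorted(hits, key=lambda r: r[1])


# Hypothetical atoms around a zinc ion at the origin.
atoms = [
    ("CYS", 10, "SG", (2.3, 0.0, 0.0)),
    ("CYS", 10, "CA", (4.0, 0.0, 0.0)),
    ("GLY", 11, "CA", (5.0, 0.0, 0.0)),
    ("HIS", 12, "NE2", (0.0, 2.1, 0.0)),
]
print(residues_near_ion(atoms, (0.0, 0.0, 0.0)))  # [('CYS', 10), ('HIS', 12)]
```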
15.
PLoS One ; 6(5): e20445, 2011.
Article in English | MEDLINE | ID: mdl-21655262

ABSTRACT

For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.


Subjects
Algorithms, Antifreeze Proteins/chemistry, Antifreeze Proteins/metabolism, Animals, Computational Biology, Fish Proteins/chemistry, Fish Proteins/metabolism, Insect Proteins/chemistry, Insect Proteins/metabolism, Molecular Models
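An n-peptide composition turns a sequence of any length into a fixed-length vector (20^n entries), which is what makes it usable as input to a support vector machine. The Python sketch below computes normalized overlapping n-peptide frequencies; the values of n and the normalization used in the study are not stated in the abstract, so these are assumptions.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(seq: str, n: int = 2) -> Dict[str, float]:
    """Frequency of every possible n-peptide (20**n of them) among overlapping windows."""
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    windows = 0
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if word in counts:  # skip windows containing non-standard residues
            counts[word] += 1
            windows += 1
    return {k: (c / windows if windows else 0.0) for k, c in counts.items()}


# Hypothetical short sequence; "AC" occurs in 2 of its 3 dipeptide windows.
comp = n_peptide_composition("ACAC", n=2)
print(len(comp), comp["AC"], comp["CA"])
```

Several such vectors (for n = 1, 2, ...) can be concatenated or fed to separate classifiers, which is one way to read "multiple sets of n-peptide compositions".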
16.
Proteins ; 67(2): 262-70, 2007 May 01.
Article in English | MEDLINE | ID: mdl-17285623

ABSTRACT

Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. Therefore, the ability to infer disulfide connectivity from protein sequences will be valuable in structural modeling and functional analysis. However, predicting disulfide connectivity directly from sequence is challenging for computational biologists because of the nonlocal nature of disulfide bonds: the close spatial proximity of the cysteine pair that forms a disulfide bond does not necessarily imply short sequence separation between the cysteine residues. Recently, Chen and Hwang (Proteins 2005;61:507-512) treated this problem as a multiclass classification by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns. Their results compare favorably with those in the literature for a benchmark dataset sharing less than 30% sequence identity. However, since the number of disulfide patterns grows rapidly as the number of disulfide bonds increases, their method performs unsatisfactorily for proteins with many disulfide bonds. In this work, we propose a novel method that represents disulfide connectivity in terms of cysteine pairs instead of disulfide patterns. Since the number of bonding states of a cysteine pair is independent of the number of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using support vector machines together with genetic algorithm optimization for feature selection. The complete disulfide patterns are then determined from connectivity matrices constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms current approaches in the literature.


Subjects
Disulfides/chemistry, Molecular Models, Proteins/chemistry, Algorithms, Amino Acid Sequence, Computational Biology/methods, Cysteine/chemistry, Disulfides/classification
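Once bonding scores have been predicted for every cysteine pair, a complete disulfide pattern must still be chosen so that each cysteine is used exactly once. The abstract does not detail this decoding step; as a swapped-in illustration, this Python sketch picks the pairing with the highest total score by exhaustive search, assuming an even number of bonded cysteines and hypothetical pair scores (for example, SVM outputs) that play the role of the connectivity matrix.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_pattern(cysteines: List[int], pair_score: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Exhaustively choose disjoint cysteine pairs with maximal total score.
    Assumes an even number of bonded cysteines; keys of pair_score are (i, j) with i < j."""
    def solve(remaining: Tuple[int, ...]) -> Tuple[float, List[Pair]]:
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best: Tuple[float, List[Pair]] = (float("-inf"), [])
        # Pair the first cysteine with each candidate partner, then solve the rest.
        for k, partner in enumerate(rest):
            sub_score, sub_pairs = solve(rest[:k] + rest[k + 1:])
            score = pair_score[(first, partner)] + sub_score
            if score > best[0]:
                best = (score, [(first, partner)] + sub_pairs)
        return best

    return solve(tuple(sorted(cysteines)))


# Hypothetical bonding scores for four cysteines (sequence positions 3, 10, 25, 40).
scores = {(3, 10): 0.9, (3, 25): 0.1, (3, 40): 0.2,
          (10, 25): 0.3, (10, 40): 0.1, (25, 40): 0.8}
print(best_pattern([3, 10, 25, 40], scores))
```

Exhaustive search examines (2m-1)!! pairings for 2m cysteines, which is only practical for small m; a maximum-weight matching algorithm would be the usual choice at scale.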
17.
Proteins ; 64(3): 643-51, 2006 Aug 15.
Article in English | MEDLINE | ID: mdl-16752418

ABSTRACT

Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surge of interest in developing novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) carried out an extensive analysis of the relationship between sequence similarity and identity of subcellular localization, and found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used to assess prediction accuracy contain highly homologous sequences; some comprise sequences with up to 80-90% sequence identity. Using these benchmark data will inevitably lead to overestimating the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as a jury machine that generates the probability distribution of decisions over possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches on two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set with high homology levels will undoubtedly lead to a biased assessment of the performance of predictive approaches, especially those relying on homology search or sequence annotations. Our two-level SVM classification system does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also developed a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.


Subjects
Algorithms, Computational Biology/methods, Proteins/metabolism, Protein Databases, Protein Conformation, Protein Transport, Proteins/chemistry, Proteins/genetics, Reproducibility of Results, Sequence Alignment, Software
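The homology concern in the subcellular-localization abstract is usually handled by removing sequences that are too similar to ones already kept before benchmarking. A minimal Python sketch, assuming pairwise alignments are computed elsewhere (the aligner is not specified in the abstract) and defining identity over gap-free aligned columns, which is only one of several common definitions; the 30% threshold echoes the abstract.

```python
from typing import Callable, List


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over aligned columns where neither row has a gap ('-').
    Assumes both strings are rows of one pairwise alignment (same length)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def greedy_nonredundant(ids: List[str], identity: Callable[[str, str], float],
                        threshold: float = 30.0) -> List[str]:
    """Keep a sequence only if its identity to every already-kept sequence is below threshold."""
    kept: List[str] = []
    for s in ids:
        if all(identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


print(percent_identity("AC-DE", "ACFDQ"))  # 75.0
```

The greedy result depends on input order; sorting sequences by length or annotation quality first is a common, though here assumed, convention.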
18.
Proteins ; 63(3): 636-43, 2006 May 15.
Article in English | MEDLINE | ID: mdl-16470805

ABSTRACT

Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years because of the progress of structural genomics initiatives. Although certain structural patterns, such as the Asp-His-Ser catalytic triad, are easy to detect because of their conserved residues and stringently constrained geometry, more general structural motifs, for example the ββα-metal binding motif, are usually harder to detect because their conformation and sequence are much more variable. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm that combines structural and sequence information to identify local structural motifs. We applied our method to two examples: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as nucleic acid binding and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs automatically. Our method may provide a useful means for automatic functional annotation by detecting structural motifs associated with particular functions.


Subjects
Computational Biology/methods, Protein Databases, Peptide Fragments/chemistry, Amino Acid Motifs/genetics, Amino Acid Sequence, Molecular Sequence Data, Peptide Fragments/genetics, Protein Secondary Structure/genetics
19.
Protein Sci ; 13(5): 1402-6, 2004 May.
Article in English | MEDLINE | ID: mdl-15096640

ABSTRACT

Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase in sequenced genomic data, an automated and accurate tool to predict subcellular localization has become increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can easily be extended to other organisms and should be a useful tool for high-throughput, large-scale analysis of proteomic and genomic data.


Subjects
Artificial Intelligence, Bacterial Proteins/analysis, Gram-Negative Bacteria/chemistry, Intracellular Space/chemistry, Statistical Data Interpretation, Peptides/chemistry
20.
Proteins ; 50(4): 531-6, 2003 Mar 01.
Article in English | MEDLINE | ID: mdl-12577258

ABSTRACT

In the coarse-grained fold assignment of major protein classes, such as all-α, all-β, α+β and α/β proteins, high prediction accuracy can easily be achieved from primary amino acid sequences. However, the fine-grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, is challenging because of the larger number of folds. A recent study yielded a reasonable prediction accuracy of 56.0% on an independent set of the 27 most populated folds. In this communication, we apply the support vector machine (SVM) method, using a combination of protein descriptors based on n-peptide composition properties and jury voting, to fine-grained fold prediction, and achieve an overall prediction accuracy of 69.6% on the same independent set, significantly higher than previous results. With 10-fold cross-validation, we obtained a prediction accuracy of 65.3%. Our results show that SVM coupled with suitable global sequence-coding schemes can significantly improve fine-grained fold prediction. Our approach should be useful in structure prediction and modeling.


Subjects
Protein Secondary Structure, Proteins/chemistry, Protein Sequence Analysis/methods, Animals, Hydrophobic and Hydrophilic Interactions, Peptides/chemistry, Protein Folding, Reproducibility of Results
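Jury voting in the fold-prediction abstract combines the predictions of several classifiers, each trained on a different descriptor set. The Python sketch below uses simple majority voting with ties broken by input order; the fold labels are hypothetical, and the study's exact voting rule is not given in the abstract.

```python
from collections import Counter
from typing import Sequence


def jury_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by most classifiers.
    Ties are broken in favour of the tied label that appears first in the input."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    return next(label for label in predictions if counts[label] == top)


# Hypothetical outputs of three classifiers trained on different descriptor sets.
print(jury_vote(["TIM barrel", "Rossmann fold", "TIM barrel"]))  # TIM barrel
```

Majority voting lets descriptors that are individually weak still contribute, since an error by one classifier is outvoted when the others agree.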