Results 1 - 13 of 13
1.
Int J Bioinform Res Appl ; 10(6): 647-52, 2014.
Article in English | MEDLINE | ID: mdl-25335568

ABSTRACT

An interactome is defined as a network of protein-protein interactions built from experimentally verified interactions. Both basic science and application-oriented research on potential new drugs can benefit from including proteins that are only predicted in interactomes. The disadvantage of doing so is the risk of devaluing the definition of an interactome: once predicted proteins are added, an interactome can no longer be classified as experimentally verified, and its integrity is compromised. Therefore, we propose the term 'hypothome' (a collection of hypothetical interactions of predicted proteins). The purpose of this term is to extend the interactome concept so that interactions of predicted proteins can be included without devaluing the integrity of the interactome. We define a rule set for a hypothome and have integrated the predicted protein interaction partners of a hypothetical protein; EAW74251 serves as an example of the use of a hypothome.


Subject(s)
Metabolome/physiology; Protein Interaction Mapping/classification; Proteins/classification; Proteins/metabolism; Proteome/classification; Proteome/metabolism; Terminology as Topic; Protein Interaction Mapping/methods; Systems Integration
2.
BMC Bioinformatics ; 14: 347, 2013 Dec 03.
Article in English | MEDLINE | ID: mdl-24299017

ABSTRACT

BACKGROUND: Protein complexes are basic cellular entities that carry out the functions of their components. In databases of yeast protein complexes such as CYC2008, the major type of known protein complex is the heterodimeric complex. Although a number of methods have been proposed for predicting sets of proteins that form arbitrary types of protein complexes simultaneously, they often fail to predict heterodimeric complexes. RESULTS: In this paper, we have designed several features characterizing heterodimeric protein complexes based on genomic data sets, and proposed a supervised-learning method for the prediction of heterodimeric protein complexes. This method learns the parameters of the features, which are embedded in a naïve Bayes classifier. The log-likelihood ratio derived from the naïve Bayes classifier, with parameter values obtained by maximum likelihood estimation, gives the score used to predict whether a given pair of proteins is a heterodimeric complex. Five-fold cross-validation shows good performance on yeast. The trained classifiers also show higher predictability than various existing algorithms on yeast data sets with approximate and exact matching criteria. CONCLUSIONS: Heterodimeric protein complex prediction is a harder problem than heteromeric protein complex prediction because a heterodimeric protein complex is topologically simpler. However, by designing features specialized for heterodimeric protein complexes, their predictability can be improved. Thus, the design of more sophisticated features for heterodimeric protein complexes, as well as the accumulation of more accurate and useful genome-wide data sets, will lead to higher predictability of heterodimeric protein complexes. Our tool can be downloaded from http://imi.kyushu-u.ac.jp/~om/.
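The scoring idea in this abstract (a naïve Bayes log-likelihood ratio over per-pair features) can be illustrated with a minimal sketch. The binary features, the function names and the add-alpha smoothing are assumptions made for illustration; the abstract specifies maximum likelihood estimation but not the feature encoding, and this is not the authors' tool.

```python
import math

def fit_bernoulli(feature_rows, alpha=1.0):
    """Estimate P(feature = 1) for each binary feature.

    alpha is an add-alpha smoothing term (an assumption, used here to
    avoid log(0)); alpha = 0 would give the plain maximum likelihood estimate.
    """
    n = len(feature_rows)
    k = len(feature_rows[0])
    return [(sum(row[j] for row in feature_rows) + alpha) / (n + 2 * alpha)
            for j in range(k)]

def log_likelihood_ratio(x, p_pos, p_neg):
    """Sum of per-feature log ratios; positive values favour 'heterodimer'."""
    score = 0.0
    for xj, pp, pn in zip(x, p_pos, p_neg):
        score += math.log(pp if xj else 1 - pp) - math.log(pn if xj else 1 - pn)
    return score

# Hypothetical training data: rows are protein pairs, columns are binary features.
positives = [[1, 1], [1, 0], [1, 1]]
negatives = [[0, 0], [0, 1], [0, 0]]
p_pos = fit_bernoulli(positives)
p_neg = fit_bernoulli(negatives)
score = log_likelihood_ratio([1, 1], p_pos, p_neg)
```

A pair would be predicted as a heterodimeric complex when its score exceeds a chosen threshold; the threshold itself is a tuning decision not described in the abstract.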


Subject(s)
Protein Interaction Mapping/classification; Protein Interaction Mapping/methods; Protein Multimerization; Algorithms; Bayes Theorem; Databases, Factual; Forecasting; Likelihood Functions; Predictive Value of Tests; Protein Interaction Maps; Proteins; Research Design; Saccharomyces cerevisiae/enzymology
3.
Adv Exp Med Biol ; 696: 263-70, 2011.
Article in English | MEDLINE | ID: mdl-21431566

ABSTRACT

Protein-protein interaction data have proven to be valuable biological knowledge and a starting point for understanding how the cell works internally. In this chapter, we introduce a novel approach termed STRIKE, which uses a string kernel to predict protein-protein interactions. STRIKE classifies protein pairs into "interacting" and "non-interacting" sets based solely on amino acid sequence information. The classification is performed by applying the string kernel approach, which has been shown to achieve good performance on text categorization and protein sequence classification. Two proteins are classified as "interacting" if they contain similar substrings of amino acids. String similarity allows one to infer homology, which could imply a very similar structural relationship. To evaluate the performance of STRIKE, we apply it to classify protein pairs as "interacting" or "non-interacting". The dataset of protein pairs is generated from the yeast protein interaction literature and is supported by different lines of experimental evidence. STRIKE achieved a reasonable improvement over existing protein-protein interaction prediction methods.
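To make the notion of "similar substrings" concrete, here is a minimal sketch of one common string kernel, the k-mer spectrum kernel. The abstract does not say which string kernel STRIKE uses, so the choice of kernel, the value of k and the function names are assumptions for illustration only.

```python
from collections import Counter
import math

def kmer_counts(seq, k=3):
    """Count every contiguous substring of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Inner product of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[m] * cb[m] for m in ca if m in cb)

def normalized_kernel(a, b, k=3):
    """Cosine-style normalization so that identical sequences score 1.0."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0
```

A kernel of this kind can be passed to a kernel-based classifier such as a support vector machine; how STRIKE combines the two sequences of a protein pair is not stated in the abstract.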


Subject(s)
Protein Interaction Mapping/classification; Protein Interaction Mapping/statistics & numerical data; Algorithms; Amino Acid Sequence; Computational Biology; Data Mining; Databases, Protein; Pattern Recognition, Automated; Protein Interaction Domains and Motifs/genetics; Saccharomyces cerevisiae Proteins/classification; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Software
4.
Article in English | MEDLINE | ID: mdl-20704011

ABSTRACT

We present the results of the BioCreative II.5 evaluation in association with the FEBS Letters experiment, where authors created Structured Digital Abstracts to capture information about protein-protein interactions. The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams based on a gold standard created by reconciling annotations from curators, authors, and automated systems. The tasks were to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the positive articles (61); and to identify interacting protein pairs. There were 595 full-text articles in the evaluation test set, including those both with and without curatable protein interactions. The principal evaluation metrics were the interpolated area under the precision/recall curve (AUC iP/R), and (balanced) F-measure. For article classification, the best AUC iP/R was 0.70; for interacting proteins, the best system achieved good macroaveraged recall (0.73) and interpolated area under the precision/recall curve (0.58), after filtering incorrect species and mapping homonymous orthologs; for interacting protein pairs, the top (filtered, mapped) recall was 0.42 and AUC iP/R was 0.29. Ensemble systems improved performance for the interacting protein task.
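The two principal metrics named above can be sketched as follows. This uses a common formulation of interpolated precision (the precision at a recall level is the maximum precision at any equal or higher recall); it is an assumption that this matches the challenge's official scoring, and the function names are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 is the balanced F-measure."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def interpolated_auc_pr(ranked_labels):
    """Area under the interpolated precision/recall curve.

    ranked_labels: 1 for a relevant item and 0 otherwise, in ranked order.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0
    points, tp = [], 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            tp += 1
            points.append((tp / total_pos, tp / rank))  # (recall, precision)
    # Interpolate from the highest recall downwards.
    best, interpolated = 0.0, []
    for recall, precision in reversed(points):
        best = max(best, precision)
        interpolated.append((recall, best))
    interpolated.reverse()
    area, previous_recall = 0.0, 0.0
    for recall, precision in interpolated:
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area
```

For example, a ranking with relevant items at positions 1, 3 and 4 gives interpolated precisions of 1.0, 0.75 and 0.75 at the three recall levels.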


Subject(s)
Abstracting and Indexing; Computational Biology/methods; Data Mining/methods; Information Management/methods; Protein Interaction Mapping/classification; Data Collection/methods; Database Management Systems; Databases, Factual; Natural Language Processing
5.
Article in English | MEDLINE | ID: mdl-20671313

ABSTRACT

We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
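Two of the quantities mentioned here are easy to state in code. The sketch below computes the Matthews Correlation Coefficient from confusion-matrix counts and a plain rank product across several measures. Treating the rank product as a simple product of per-measure ranks is an assumption; the abstract does not give the exact aggregation.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def rank_product(ranks):
    """Product of a submission's ranks over several measures (lower is better)."""
    return math.prod(ranks)
```

With hypothetical ranks of 1, 3, 2 and 2 on four measures, a submission's rank product would be 12.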


Subject(s)
Abstracting and Indexing/classification; Computational Biology/methods; Data Mining/methods; Protein Interaction Mapping/classification; Algorithms; Databases, Bibliographic; Neural Networks, Computer; Periodicals as Topic
6.
J Theor Biol ; 254(2): 301-7, 2008 Sep 21.
Article in English | MEDLINE | ID: mdl-18621060

ABSTRACT

Various sources of protein data, such as knowledgebases and scientific literature, are currently available, as are numerous tools for their analysis. The matter becomes one of choosing the tools that are most appropriate for the specific task and for the specific proteins. A combination of standard and alternative tools may lead to biologically significant results. Here, a computational classification of proteins is made using standard multiple sequence alignment in combination with an alternative method for analysis of hydropathy distribution in proteins. Both of these methods are applied to the Na+/Cl−-dependent neurotransmitter symporters (NSSs), resulting in two alternative classifications. The classifications are validated and interpreted biologically by literature and knowledgebase annotation mining, producing a consensus classification. The classification leads to the identification and functional characterization of three families of largely structurally and functionally uncharacterized orphan NSSs. The literature and knowledgebase annotations are mined to functionally characterize the NSSs in these families. The presented work also demonstrates that, in specific cases, the analysis of the hydropathy distribution in proteins is capable of revealing functional properties of proteins.
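As a rough illustration of what a hydropathy distribution is, the sketch below computes a sliding-window hydropathy profile with the Kyte-Doolittle scale. The abstract does not name the scale or the analysis used by the authors, so both the scale and the window approach are assumptions, not a description of their method.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy over each window of consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Stretches of high positive values in such a profile are typically read as candidate membrane-spanning segments, which is why profiles of this kind are often applied to transporters.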


Subject(s)
Computational Biology/methods; Plasma Membrane Neurotransmitter Transport Proteins/classification; Animals; Databases, Protein; Hydrophobic and Hydrophilic Interactions; Knowledge Bases; Plasma Membrane Neurotransmitter Transport Proteins/metabolism; Protein Interaction Mapping/classification; Sequence Alignment; Sequence Analysis, Protein/methods
7.
BMC Bioinformatics ; 9: 35, 2008 Jan 23.
Article in English | MEDLINE | ID: mdl-18215279

ABSTRACT

BACKGROUND: It has repeatedly been shown that interacting protein families tend to have similar phylogenetic trees. These similarities can be used to predict the mapping between two families of interacting proteins (i.e. which proteins from one family interact with which members of the other). The correct mapping will be the one that maximizes the similarity between the trees. The two families may comprise both orthologs and paralogs if members of the two families are present in more than one organism. This fact can be exploited to restrict the possible mappings, simply by disallowing links between proteins of different organisms. We present here an algorithm to predict the mapping between families of interacting proteins which is able to incorporate information regarding orthologs, or any other assignment of proteins to "classes" that may restrict possible mappings. RESULTS: For the first time in methods for predicting mappings, we have tested this new approach on a large number of interacting protein domains in order to statistically assess its performance. The method reaches an accuracy of around 80% in the most favourable cases. We also analysed in detail the results of the method for a well-defined case of interacting families, the sensor and kinase components of the Ntr-type two-component system, for which up to 98% of the pairings predicted by the method were correct. CONCLUSION: Based on the well-established relationship between tree similarity and interactions, we developed a method for predicting the mapping between two interacting families using genomic information alone. The program is available through a web interface.
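The core idea (choose the mapping that maximizes tree similarity, while only allowing links within the same class) can be shown on a toy scale. The sketch below measures similarity as the Pearson correlation between pairwise distance matrices and searches all permutations by brute force. Both choices are assumptions for illustration; exhaustive search is only feasible for a handful of proteins and is not the authors' algorithm.

```python
from itertools import permutations
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def best_mapping(dist_a, dist_b, class_a, class_b):
    """Return (mapping, correlation); mapping[i] is the partner in B of protein i in A.

    Only mappings that pair proteins of the same class (e.g. organism) are tried.
    """
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [dist_a[i][j] for i, j in pairs]
    best, best_r = None, float("-inf")
    for perm in permutations(range(n)):
        if any(class_a[i] != class_b[perm[i]] for i in range(n)):
            continue  # link between different classes is not allowed
        ys = [dist_b[perm[i]][perm[j]] for i, j in pairs]
        r = pearson(xs, ys)
        if r > best_r:
            best, best_r = perm, r
    return best, best_r

# Hypothetical distance matrices: B is A with its proteins reordered.
dist_a = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
dist_b = [[0, 3, 1], [3, 0, 4], [1, 4, 0]]
mapping, r = best_mapping(dist_a, dist_b, ["x"] * 3, ["x"] * 3)
```

The class restriction shrinks the search space considerably: with two organisms of equal size, only permutations within each organism need to be considered.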


Subject(s)
Databases, Protein/classification; Information Systems; Proteins/classification; Proteins/genetics; Forecasting; Information Systems/trends; Protein Binding/physiology; Protein Interaction Mapping/classification; Protein Interaction Mapping/methods; Proteins/metabolism; Sequence Alignment/methods; Yeasts/genetics; Yeasts/metabolism
8.
BMC Bioinformatics ; 8: 442, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-18005402

ABSTRACT

BACKGROUND: Classification procedures are widely used in phylogenetic inference, the analysis of expression profiles, the study of biological networks, etc. Many algorithms have been proposed to establish the similarity between two different classifications of the same elements. However, methods to determine significant coincidences between hierarchical and non-hierarchical partitions are still poorly developed, in spite of the fact that the search for such coincidences is implicit in many analyses of massive data. RESULTS: We describe a novel strategy to compare a hierarchical and a dichotomic non-hierarchical classification of elements, in order to find clusters in a hierarchical tree in which elements of a given "flat" partition are overrepresented. The key improvement of our strategy with respect to previous methods is the use of permutation analyses of ranked clusters to determine whether regions of the dendrograms present a significant enrichment. We show that this method is more sensitive than previously developed strategies and how it can be applied to several real cases, including microarray and interactome data. In particular, we use it to compare a hierarchical representation of the yeast mitochondrial interactome and a catalogue of known mitochondrial protein complexes, demonstrating a high level of congruence between those two classifications. We also discuss extensions of this method to other conceptually related cases. CONCLUSION: Our method is highly sensitive and outperforms previously described strategies. A PERL script that implements it is available at http://www.uv.es/~genomica/treetracker.
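A simplified version of a permutation-based enrichment test for a single cluster is sketched below. It estimates how often a random relabelling of the elements places at least as many labelled elements in the cluster as observed. This is a generic permutation test written for illustration; the ranking of clusters described in the abstract is not reproduced, and the function and parameter names are assumptions.

```python
import random

def enrichment_p_value(cluster, labelled, universe, n_perm=1000, seed=0):
    """Empirical p-value for over-representation of labelled elements in a cluster.

    cluster:  elements under one node of the hierarchical tree
    labelled: elements belonging to the "flat" partition of interest
              (assumed to be a subset of universe)
    universe: all elements in the tree
    """
    cluster = set(cluster)
    labelled = set(labelled)
    universe = list(universe)
    observed = len(cluster & labelled)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Draw a random label assignment of the same size.
        shuffled = rng.sample(universe, len(labelled))
        if len(cluster.intersection(shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

When many clusters of one dendrogram are tested this way, the resulting p-values would still need a multiple-testing correction.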


Subject(s)
Protein Interaction Mapping/classification; Artificial Intelligence; Classification/methods; Cluster Analysis; Databases, Protein; Decision Trees; Mitochondrial Proteins/metabolism; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated/methods; Protein Interaction Mapping/methods; Reproducibility of Results; Saccharomyces cerevisiae Proteins/metabolism; Sequence Analysis, Protein; User-Computer Interface
9.
BMC Bioinformatics ; 8: 414, 2007 Oct 26.
Article in English | MEDLINE | ID: mdl-17963500

ABSTRACT

BACKGROUND: Although many genomic features have been used in the prediction of protein-protein interactions (PPIs), frequently only one is used in a computational method. After realizing the limited power in the prediction using only one genomic feature, investigators are now moving toward integration. So far, there have been few integration studies for PPI prediction; one failed to yield appreciable improvement of prediction and the others did not conduct performance comparison. It remains unclear whether an integration of multiple genomic features can improve the PPI prediction and, if it can, how to integrate these features. RESULTS: In this study, we first performed a systematic evaluation on the PPI prediction in Escherichia coli (E. coli) by four genomic context based methods: the phylogenetic profile method, the gene cluster method, the gene fusion method, and the gene neighbor method. The number of predicted PPIs and the average degree in the predicted PPI networks varied greatly among the four methods. Further, no method outperformed the others when we tested using three well-defined positive datasets from the KEGG, EcoCyc, and DIP databases. Based on these comparisons, we developed a novel integrated method, named InPrePPI. InPrePPI first normalizes the AC value (an integrated value of the accuracy and coverage) of each method using three positive datasets, then calculates a weight for each method, and finally uses the weight to calculate an integrated score for each protein pair predicted by the four genomic context based methods. We demonstrate that InPrePPI outperforms each of the four individual methods and, in general, the other two existing integrated methods: the joint observation method and the integrated prediction method in STRING. These four methods and InPrePPI are implemented in a user-friendly web interface. CONCLUSION: This study evaluated the PPI prediction by four genomic context based methods, and presents an integrated evaluation method that shows better performance in E. coli.


Subject(s)
Protein Binding/genetics; Protein Interaction Mapping/methods; Systems Integration; User-Computer Interface; Base Sequence; Cluster Analysis; Data Interpretation, Statistical; Databases, Genetic; Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Genome, Bacterial/physiology; Genomics; Pattern Recognition, Automated/methods; Phylogeny; Predictive Value of Tests; Protein Interaction Mapping/classification; Reference Standards; Sequence Alignment
10.
Proteins ; 63(3): 490-500, 2006 May 15.
Article in English | MEDLINE | ID: mdl-16450363

ABSTRACT

Protein-protein interactions play a key role in many biological systems. High-throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false-positive and false-negative rates. Recently, many different research groups independently suggested using supervised learning methods to integrate direct and indirect biological data sources for the protein interaction prediction task. However, the data sources, approaches, and implementations varied. Furthermore, the protein interaction prediction task itself can be subdivided into prediction of (1) physical interaction, (2) co-complex relationship, and (3) pathway co-membership. To investigate systematically the utility of different data sources and the way the data is encoded as features for predicting each of these types of protein interactions, we assembled a large set of biological features and varied their encoding for use in each of the three prediction tasks. Six different classifiers were used to assess the accuracy in predicting interactions: Random Forest (RF), RF similarity-based k-Nearest-Neighbor, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine. For all classifiers, the three prediction tasks had different success rates, and co-complex prediction appears to be an easier task than the other two. Independently of prediction task, however, the RF classifier consistently ranked as one of the top two classifiers for all combinations of feature sets. Therefore, we used this classifier to study the importance of different biological datasets. First, we used the splitting function of the RF tree structure, the Gini index, to estimate feature importance. Second, we determined classification accuracy when only the top-ranking features were used as an input in the classifier. We find that the importance of different features depends on the specific prediction task and the way they are encoded. Strikingly, gene expression is consistently the most important feature for all three prediction tasks, while the protein interactions identified using the yeast-2-hybrid system were not among the top-ranking features under any condition.
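The Gini index used as the splitting function can be written in a few lines. The sketch below computes the Gini impurity of a set of class labels and the impurity decrease produced by one split; summing such decreases over all splits that use a feature is a common way to derive feature importance in a random forest. The function names are illustrative, and the exact importance calculation used in the study is not given in the abstract.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0.0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted
```

For a node holding two interacting and two non-interacting pairs, a split that separates them perfectly lowers the impurity from 0.5 to 0.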


Subject(s)
Computational Biology/classification; Computational Biology/methods; Databases, Protein/classification; Protein Interaction Mapping/classification; Protein Interaction Mapping/methods; Forecasting
11.
BMC Bioinformatics ; 5: 75, 2004 Jun 09.
Article in English | MEDLINE | ID: mdl-15189571

ABSTRACT

BACKGROUND: The increasing number of protein sequences and 3D structures obtained from genomic initiatives is leading many of us to focus on proteomics, and to dedicate our experimental and computational efforts to the creation and analysis of information derived from 3D structure. In particular, the high-throughput generation of protein-protein interaction data from a few organisms makes such an approach very important for understanding the molecular recognition that makes up the entire protein-protein interaction network. Since the generation of sequences and experimental protein-protein interactions increases faster than the 3D structure determination of protein complexes, there is tremendous interest in developing in silico methods that generate such structures for prediction and classification purposes. In this study we focused on classifying protein family members based on their protein-protein interaction distinctiveness. Structure-based classification of protein-protein interfaces was described initially by Ponstingl et al. [1] and more recently by Valdar et al. [2] and Mintseris et al. [3], from complex structures that have been solved experimentally. However, little has been done on protein classification based on the prediction of protein-protein complexes obtained from homology modeling and docking simulation. RESULTS: We have developed an in silico classification system entitled HODOCO (Homology modeling, Docking and Classification Oracle), in which protein Residue Potential Interaction Profiles (RPIPS) are used to summarize protein-protein interaction characteristics. This system, applied to a dataset of 64 proteins of the death domain superfamily, was used to classify each member into its proper subfamily. Two classification methods were attempted: heuristic and support vector machine learning. Both methods were tested with a 5-fold cross-validation. The heuristic approach yielded a 61% average accuracy, while the machine learning approach yielded an 89% average accuracy. CONCLUSION: We have confirmed the reliability and potential value of classifying proteins via their predicted interactions. Our results are in the same range of accuracy as other studies that classify protein-protein interactions from 3D complex structures obtained experimentally. While our classification scheme does not directly take sequence information into account, our results are in agreement with functional and sequence-based classification of death domain family members.
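For readers unfamiliar with the evaluation protocol, here is a minimal sketch of k-fold cross-validation that reports the average accuracy over folds. The `fit` and `predict` callables are hypothetical placeholders for any classifier; nothing here reproduces HODOCO itself, and the sketch assumes there are at least k samples.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validated_accuracy(samples, labels, fit, predict, k=5, seed=0):
    """Average per-fold accuracy of a classifier given as fit/predict callables."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

With 64 samples and k = 5, four folds hold 13 test samples and one holds 12, so the reported average weights the folds almost equally.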


Subject(s)
Protein Interaction Mapping/classification; Proteins/chemistry; Proteins/classification; Humans; Peptides/classification; Peptides/physiology; Predictive Value of Tests; Protein Structure, Quaternary; Proteomics/methods; Software
12.
Genome Res ; 13(6A): 1231-43, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12799355

ABSTRACT

Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein-protein, protein-gene, and protein-compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/.


Subject(s)
Databases, Genetic/classification; Natural Language Processing; Phosphotransferases/classification; Phosphotransferases/metabolism; Protein Interaction Mapping/classification; Protein Kinases/classification; Protein Kinases/metabolism; Animals; Caenorhabditis elegans/classification; Caenorhabditis elegans/enzymology; Caenorhabditis elegans Proteins/classification; Drosophila Proteins/classification; Drosophila melanogaster/classification; Drosophila melanogaster/enzymology; Humans; Internet; Mice; Rats; Saccharomyces cerevisiae/classification; Saccharomyces cerevisiae/enzymology; Saccharomyces cerevisiae Proteins/classification
13.
BMC Bioinformatics ; 4: 11, 2003 Mar 27.
Article in English | MEDLINE | ID: mdl-12689350

ABSTRACT

BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information to be 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70%, thus saving 176 days. CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
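The three reported measures follow directly from confusion-matrix counts, and the reported saving allows a rough back-of-the-envelope check of the task size. In the sketch below the counts are hypothetical, and the baseline estimate assumes that the 176 days correspond exactly to the 70% reduction, which the abstract implies but does not state in those terms.

```python
def precision_recall_accuracy(tp, fp, tn, fn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy

def implied_baseline_days(days_saved, fraction_saved):
    """Task duration without the system, if days_saved equals fraction_saved of it."""
    return days_saved / fraction_saved

# Under the stated assumption: 176 days saved at a 70% reduction.
baseline = implied_baseline_days(176, 0.70)   # roughly 251 days
remaining = baseline - 176                    # roughly 75 days
```

Under that assumption the curation task would have taken about 251 days unaided and about 75 days with the system.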


Subject(s)
Artificial Intelligence; Information Storage and Retrieval/trends; Protein Interaction Mapping/methods; Algorithms; Computational Biology/methods; Computational Biology/statistics & numerical data; Databases, Factual/trends; Databases, Protein/trends; Genome, Fungal; Protein Interaction Mapping/classification; Protein Interaction Mapping/statistics & numerical data; PubMed/classification; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/chemistry