Overview of the protein-protein interaction annotation extraction task of BioCreative II.
Krallinger, Martin; Leitner, Florian; Rodriguez-Penagos, Carlos; Valencia, Alfonso.
Affiliation
  • Krallinger M; Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
Genome Biol; 9 Suppl 2: S4, 2008.
Article in English | MEDLINE | ID: mdl-18834495
BACKGROUND: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques, as well as with manual curation, was missing.
RESULTS: We designed a community challenge, the BioCreative II protein-protein interaction (PPI) task, based on the main steps of a manual protein interaction annotation workflow. It was structured into four distinct subtasks related to: (a) detection of protein interaction-relevant articles; (b) extraction and normalization of protein interaction pairs; (c) retrieval of the interaction detection methods used; and (d) retrieval of actual text passages that provide evidence for protein interactions. A total of 26 teams submitted runs for at least one of the proposed subtasks. In the interaction article detection subtask, the top-scoring team reached an F-score of 0.78. In the interaction pair extraction and mapping to SwissProt, a precision of 0.37 (with a recall of 0.33) was obtained. For associating articles with an experimental interaction detection method, an F-score of 0.65 was achieved. As for the retrieval of the PPI passages best summarizing a given protein interaction in full-text articles, 19% of the submissions returned by one of the runs corresponded to curator-selected sentences. Curators extracted only the passages that best summarized a given interaction, implying that many of the automatically extracted ones could contain interaction information but did not correspond to the most informative sentences.
CONCLUSION: The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific to each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in full-text format conversion of articles to difficulties in detecting interactor protein pairs and then linking them to their database records. Some limitations were also encountered when using a single (and possibly incomplete) reference database for protein normalization, or when limiting the search for interactor proteins to co-occurrence within a single sentence, when a mention might span neighboring sentences. Finally, distinguishing between novel, experimentally verified interactions (annotation relevant) and previously known interactions adds further complexity to these tasks.
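The abstract reports F-scores for two subtasks but only precision and recall for the interaction pair subtask. As a rough illustration of how these metrics relate, the sketch below computes an F-score from precision and recall. It assumes the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not say which F-measure variant was used, and the function name `f_score` is hypothetical. The value it prints is plain arithmetic on the two reported figures, not a result stated in the abstract.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1.0 gives the balanced F1 measure.

    Assumption: the abstract's "F-score" is the balanced F1. This is not
    confirmed by the abstract itself.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Precision and recall reported for interaction pair extraction and
# mapping to SwissProt.
pair_f1 = f_score(precision=0.37, recall=0.33)
print(round(pair_f1, 2))  # prints 0.35
```

Under this assumption, the pair extraction subtask would sit well below the F-scores of 0.78 and 0.65 reported for article detection and interaction method retrieval.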
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Societies, Scientific / Computational Biology / Protein Interaction Mapping Type of study: Prognostic_studies Limits: Animals / Humans Language: En Journal: Genome Biol Journal subject: MOLECULAR BIOLOGY / GENETICS Year: 2008 Document type: Article Affiliation country: Spain
