Deep neural networks and distant supervision for geographic location mention extraction.
Magge, Arjun; Weissenbacher, Davy; Sarker, Abeed; Scotch, Matthew; Gonzalez-Hernandez, Graciela.
Affiliation
  • Magge A; Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA.
  • Weissenbacher D; Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, Tempe, AZ, USA.
  • Sarker A; Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Scotch M; Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Gonzalez-Hernandez G; Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA.
Bioinformatics ; 34(13): i565-i573, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29950020
ABSTRACT
Motivation:

Virus phylogeographers model virus spread using viral DNA sequences and the locations of infected hosts, both drawn from public sequence databases such as GenBank. However, the locations in GenBank records are often given only at the country or state level, so phylogeographers may need to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network that determines whether a given token is a toponym. To overcome the limited human-annotated data available for training, we use distant supervision techniques to generate additional training samples.

Results:

Our NER achieves an F1-score of 0.910 and significantly outperforms the previous state-of-the-art system. Adding the data generated through distant supervision raises the F1-score further, to 0.927. Our experiments also demonstrate the NER's capability to embed external features to further boost the system's performance. We believe that the same methodology can be applied to recognizing similar biomedical entities in scientific literature.
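
The abstract does not say how the distantly supervised samples were generated. As a rough illustration of the general idea only, the sketch below labels tokens in unannotated text by matching them against a list of known place names, which is one common way to produce noisy training labels for a toponym classifier. The function `distant_labels`, the `GAZETTEER` set, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def distant_labels(
    tokens: List[str],
    gazetteer: Set[Tuple[str, ...]],
    max_len: int = 3,
) -> List[int]:
    """Return 1 for tokens inside a gazetteer match and 0 otherwise.

    Assumption: place names are stored as lowercase token tuples, and the
    longest match starting at a position wins.
    """
    lowered = [t.lower() for t in tokens]
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in gazetteer:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched
        else:
            i += 1
    return labels


# Hypothetical gazetteer and sentence, for illustration only.
GAZETTEER = {("arizona",), ("hong", "kong"), ("new", "south", "wales")}
tokens = "Samples were collected in Hong Kong and New South Wales".split()
labels = distant_labels(tokens, GAZETTEER)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Labels produced this way are noisy (ambiguous names, places missing from the list), which is why they are typically used to supplement human-annotated data rather than replace it.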
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Viruses / Information Storage and Retrieval / Phylogeography / Deep Learning Type of study: Prognostic_studies Limits: Humans Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2018 Document type: Article Affiliation country: United States