A Guide to Dictionary-Based Text Mining.

Cook, Helen V; Jensen, Lars Juhl

Affiliation

Cook HV; School of Clinical Medicine, University of Cambridge, Cambridge, UK.
Jensen LJ; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Methods Mol Biol ; 1939: 73-89, 2019.

Article in English | MEDLINE | ID: mdl-30848457

ABSTRACT

PubMed contains more than 27 million documents, and this number is growing at an estimated 4% per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross reference, and mine the data stored therein. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.
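
The tokenization, named entity recognition, and normalization steps named above can be illustrated with a minimal sketch of dictionary-based tagging. This is not the JensenLab tagger's implementation: the dictionary entries, the `GENE:0001`-style identifiers, and the names `tokenize`, `tag`, and `Match` are all hypothetical and chosen only for illustration. The sketch assumes case-insensitive lookup and prefers the longest dictionary name that starts at each token.

```python
import re
from typing import NamedTuple

# Hypothetical toy dictionary: lowercase name or synonym -> normalized identifier.
# The identifiers are placeholders, not taken from any real resource.
DICTIONARY = {
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "tumor protein p53": "GENE:0001",
    "brca1": "GENE:0002",
}

# Longest dictionary name, measured in tokens; bounds the lookup window.
MAX_NAME_TOKENS = max(len(name.split()) for name in DICTIONARY)


class Match(NamedTuple):
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    text: str        # the mention exactly as written in the input
    identifier: str  # the normalized identifier from the dictionary


def tokenize(text):
    """Tokenization: split into (token, start, end) on runs of word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def tag(text):
    """Recognize dictionary names in text and normalize them to identifiers."""
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        matched_length = 0
        # Try the longest candidate first so that "tumor protein p53"
        # wins over the shorter synonym "p53" contained in it.
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if candidate in DICTIONARY:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append(Match(start, end, text[start:end], DICTIONARY[candidate]))
                matched_length = n
                break
        # Skip past a match, or advance one token if nothing matched here.
        i += matched_length if matched_length else 1
    return matches


if __name__ == "__main__":
    for match in tag("Tumor protein p53 interacts with BRCA1."):
        print(match)
```

Because synonyms map to the same identifier, mentions written as "p53" or "Tumor protein p53" normalize to one entity, which is what makes the extracted information structured and cross-referenceable. A production tagger would also need to handle spelling variants, ambiguous names, and blocklists, none of which this sketch attempts.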

Subjects

Computational Biology/methods; Data Mining/methods; Algorithms; Animals; Humans; PubMed; Software

Keywords

Automated text processing; Dictionary-based approach; Named entity recognition; PubMed; Structured information; Text mining; Text normalization

Full text: 1 | Collections: 01-international | Database: MEDLINE | Limit: Animals / Humans | Language: En | Year of publication: 2019 | Document type: Article
