Text mining for contexts and relationships in cancer genomics literature.

Collins, Charlotte; Baker, Simon; Brown, Jason; Zheng, Huiyuan; Chan, Adelyne; Stenius, Ulla; Narita, Masashi; Korhonen, Anna

Affiliation

Collins C; Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom.
Baker S; Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom.
Brown J; Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom.
Zheng H; Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden.
Chan A; Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, United Kingdom.
Stenius U; Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden.
Narita M; Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, United Kingdom.
Korhonen A; Language Technology Laboratory, Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge CB3 9DA, United Kingdom.

Bioinformatics; 40(1), 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38258418

ABSTRACT

MOTIVATION:

Scientific advances build on the findings of existing research. The 2001 publication of the human genome has led to the production of huge volumes of literature exploring the context-specific functions and interactions of genes. Technology is needed to perform large-scale text mining of research papers to extract the reported actions of genes in specific experimental contexts and cell states, such as cancer, thereby facilitating the design of new therapeutic strategies.

RESULTS:

We present a new corpus and text mining methodology that can accurately identify and extract the most important details of cancer genomics experiments from biomedical texts. We build a Named Entity Recognition model that accurately extracts relevant experiment details from PubMed abstract text, and a second model that identifies the relationships between them. This system outperforms earlier models and enables the analysis of gene function in diverse and dynamically evolving experimental contexts.

AVAILABILITY AND IMPLEMENTATION:

Code and data are available at https://github.com/cambridgeltl/functional-genomics-ie.
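The two-stage design described in the results (entity extraction followed by relation identification) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon lookup stands in for a trained Named Entity Recognition model, the pairing rule stands in for a trained relation model, and the labels `GENE` and `CONTEXT`, the lexicon entries, and the example sentence are all hypothetical and not taken from the paper's corpus.

```python
import re
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

# Hypothetical toy lexicon; a trained NER model would replace this lookup.
LEXICON = {
    "TP53": "GENE",
    "KRAS": "GENE",
    "senescence": "CONTEXT",
    "lung cancer": "CONTEXT",
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def extract_entities(sentence: str) -> List[Entity]:
    """Stage 1 stand-in: tag lexicon terms by case-insensitive whole-word match."""
    found = []
    for term, label in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append(Entity(m.group(0), label, m.start(), m.end()))
    # Return entities in order of appearance in the sentence.
    return sorted(found, key=lambda e: e.start)


def candidate_relations(entities: List[Entity]) -> List[Tuple[str, str]]:
    """Stage 2 stand-in: pair each GENE with each CONTEXT in the same sentence.

    A trained relation model would score or filter these candidate pairs.
    """
    pairs = []
    for a, b in combinations(entities, 2):
        if {a.label, b.label} == {"GENE", "CONTEXT"}:
            gene, ctx = (a, b) if a.label == "GENE" else (b, a)
            pairs.append((gene.text, ctx.text))
    return pairs


# Invented example sentence, for illustration only.
sentence = "Loss of TP53 bypasses senescence in lung cancer cells."
entities = extract_entities(sentence)
relations = candidate_relations(entities)
print(relations)
```

Keeping the two stages as separate functions mirrors the separation between the two models in the abstract: the relation step only consumes entity spans, so either stage could be swapped for a learned model without changing the other.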

Subjects

Genomics; Neoplasms; Humans; Neoplasms/genetics; Data Mining/methods; PubMed; Phenotype

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Genomics / Neoplasms | Type of study: Prognostic_studies | Limit: Humans | Language: En | Journal: Bioinformatics / Bioinformatics (Oxford. Online) | Journal subject: MEDICAL INFORMATICS | Year of publication: 2024 | Document type: Article | Country of affiliation: United Kingdom
