Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance.

Deng, Zhengyi; Yin, Kanhua; Bao, Yujia; Armengol, Victor Diego; Wang, Cathy; Tiwari, Ankur; Barzilay, Regina; Parmigiani, Giovanni; Braun, Danielle; Hughes, Kevin S

Affiliations

Deng Z; Massachusetts General Hospital, Boston, MA.
Yin K; Massachusetts General Hospital, Boston, MA.
Bao Y; Massachusetts Institute of Technology, Boston, MA.
Armengol VD; Massachusetts General Hospital, Boston, MA.
Wang C; Harvard TH Chan School of Public Health, Boston, MA.
Tiwari A; Dana-Farber Cancer Institute, Boston, MA.
Barzilay R; Massachusetts General Hospital, Boston, MA.
Parmigiani G; Massachusetts Institute of Technology, Boston, MA.
Braun D; Harvard TH Chan School of Public Health, Boston, MA.
Hughes KS; Dana-Farber Cancer Institute, Boston, MA.

JCO Clin Cancer Inform; 3: 1-9, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31419182

ABSTRACT

PURPOSE:

Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier that classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure.

MATERIALS AND METHODS:

We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining, followed by human review of identified studies, with the traditional procedure that requires human review of all studies. Ten high-quality gene-cancer penetrance meta-analyses spanning 16 gene-cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage).
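The two evaluation measures described above can be sketched in a few lines. This is only an illustration, assuming each abstract is identified by an ID; the function names and the toy IDs are hypothetical and do not come from the study.

```python
def workload(flagged_ids):
    """Number of distinct abstracts that a human reviewer must read."""
    return len(set(flagged_ids))


def coverage(flagged_ids, gold_ids):
    """Fraction of gold-standard studies present among the flagged abstracts."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(gold & set(flagged_ids)) / len(gold)


# Toy data (hypothetical): abstracts passed on for human review,
# and studies included in a published meta-analysis.
flagged = ["a1", "a2", "a3", "a5"]
gold = ["a2", "a3", "a4"]

print(workload(flagged))        # abstracts needing review
print(coverage(flagged, gold))  # share of gold-standard studies found
```

In the traditional procedure the flagged set is simply every retrieved abstract, so the comparison reduces to how much smaller the flagged set becomes and how many gold-standard studies remain in it.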

RESULTS:

Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts vs 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (132 of 142 studies identified) before reviewing the references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies).
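The reported percentages follow directly from the counts given above. The short check below uses only those counts; the variable names are illustrative.

```python
# Counts taken from the RESULTS paragraph.
traditional_abstracts = 16941    # abstracts reviewed in the traditional procedure
semiautomated_abstracts = 2774   # abstracts reviewed in the NLP-based procedure
gold_studies = 142               # studies included across the meta-analyses
found_before_refs = 132          # identified before reviewing references
found_after_refs = 141           # identified after reviewing references

workload_reduction = 1 - semiautomated_abstracts / traditional_abstracts
coverage_before = found_before_refs / gold_studies
coverage_after = found_after_refs / gold_studies

print(f"workload reduction: {workload_reduction:.0%}")
print(f"coverage before references: {coverage_before:.0%}")
print(f"coverage after references: {coverage_after:.0%}")
```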

CONCLUSION:

We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms show promise for reducing human effort in the literature review process.

Subjects

Biomarkers, Tumor; Genetic Predisposition to Disease; Natural Language Processing; Neoplasms/genetics; Penetrance; Algorithms; Computational Biology/methods; Humans; Reproducibility of Results; Workflow

Full text: 1 | Database: MEDLINE | Main subject: Natural Language Processing / Biomarkers, Tumor / Penetrance / Genetic Predisposition to Disease / Neoplasms | Language: English | Year of publication: 2019 | Document type: Article
