CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules.

Hengartner, Nicolas; Cuellar, Leticia; Wu, Xiao-Cheng; Tourassi, Georgia; Qiu, John; Christian, Blair; Bhattacharya, Tanmoy

Affiliation

Hengartner N; Los Alamos National Laboratory, PO Box 1663, Los Alamos, 87545, NM, USA. nickh@lanl.gov.
Cuellar L; Los Alamos National Laboratory, PO Box 1663, Los Alamos, 87545, NM, USA.
Wu XC; Louisiana State University, 2020 Gravier Street, 3rd Floor, New Orleans, 70112, LA, USA.
Tourassi G; Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, 37831, TN, USA.
Qiu J; Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, 37831, TN, USA.
Christian B; Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, 37831, TN, USA.
Bhattacharya T; Los Alamos National Laboratory, PO Box 1663, Los Alamos, 87545, NM, USA.

BMC Bioinformatics; 19(Suppl 18): 485, 2018 Dec 21.

Article in En | MEDLINE | ID: mdl-30577756

ABSTRACT

BACKGROUND:

Manual extraction of information from electronic pathology (epath) reports to populate the Surveillance, Epidemiology, and End Results (SEER) database is labor-intensive. Automating the data extraction with machine learning (ML) and natural language processing (NLP) is desirable, both to reduce the human labor required to populate the SEER database and to improve the timeliness of the data. Automation would also let registries scale up their efficiency and collect new data elements. To ensure the integrity, quality, and continuity of the SEER data, the misclassification error of ML and NLP algorithms needs to be negligible. Current algorithms do not reach the precision of human experts, who can draw on additional information in their assessments. Differences in registry formats and the desire to develop a common information extraction platform further complicate the ML/NLP tasks. The purpose of our study is to develop triage rules that partially automate the registry workflow while improving the precision of the automatically extracted information.

RESULTS:

This paper presents a mathematical framework for improving the precision of a classifier beyond that of the Bayes classifier by classifying only the items whose predicted labels are most likely to be correct. The result is a triage rule that classifies only a subset of the items and leaves the remaining items for human review. We characterize the optimal triage rule and demonstrate its usefulness for classifying cancer site from electronic pathology reports at a desired precision.
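Why classifying a confident subset can beat the Bayes accuracy can be seen from a short derivation. The sketch below assumes one standard formalization (a Chow-style reject option applied to the Bayes classifier, with known class posteriors); it is not necessarily the paper's exact formulation, and reading ε as the fraction of refused items is likewise an assumption.

```latex
Let $p_k(x) = P(Y = k \mid X = x)$ and let the Bayes classifier be
$\hat g(x) = \arg\max_k p_k(x)$, so that
$P(\hat g(X) = Y \mid X = x) = M(x) := \max_k p_k(x)$.
A threshold triage rule with level $t$ classifies $x$ only if $M(x) \ge t$
and refers it to a human otherwise. Provided $P(M(X) \ge t) > 0$, its
precision on the classified items is
\[
  \operatorname{Prec}(t)
  = P\bigl(\hat g(X) = Y \mid M(X) \ge t\bigr)
  = \frac{E\bigl[M(X)\,\mathbf{1}\{M(X) \ge t\}\bigr]}{P\bigl(M(X) \ge t\bigr)}
  = E\bigl[M(X) \mid M(X) \ge t\bigr],
\]
which is non-decreasing in $t$, equals the Bayes accuracy $E[M(X)]$ at
$t = 0$, and is never below $t$. If $M(X)$ has a continuous distribution,
choosing $t$ with $P(M(X) < t) = \varepsilon$ refuses a fraction
$\varepsilon$ of the items.
```

The price of the higher precision is coverage: every increase in $t$ sends more items to human review.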

CONCLUSIONS:

From this mathematical formalism, we propose a heuristic estimate of the triage rule based on post-processing the soft-max output of standard machine learning algorithms. We show, in test cases, that the triage rule significantly improves the classification accuracy.
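A soft-max post-processing step of this kind might look like the minimal sketch below. It assumes a classifier that outputs soft-max probabilities and a labeled held-out set, and it thresholds the top soft-max score, picking the smallest threshold whose accepted items reach a target precision. This is an assumed reading of the heuristic, not the authors' code; `fit_triage_threshold`, `triage`, and the toy data are hypothetical.

```python
import numpy as np


def fit_triage_threshold(probs, labels, target_precision):
    """Hypothetical helper: smallest top-score threshold whose accepted
    items reach target_precision on a labeled held-out set."""
    conf = probs.max(axis=1)                  # top soft-max score per item
    correct = probs.argmax(axis=1) == labels  # is the top class right?
    order = np.argsort(-conf, kind="stable")  # most confident first
    conf_sorted = conf[order]
    # Precision of the n most confident items, for n = 1..N.
    precision = np.cumsum(correct[order]) / np.arange(1, len(conf) + 1)
    # Only cut where the score strictly drops, so that "conf >= t"
    # selects exactly the top n items (tied scores cannot be split).
    boundary = np.append(conf_sorted[:-1] > conf_sorted[1:], True)
    ok = np.nonzero((precision >= target_precision) & boundary)[0]
    if ok.size == 0:
        return np.inf                         # refuse every item
    n = ok.max() + 1                          # largest qualifying subset
    return conf_sorted[n - 1]


def triage(probs, threshold):
    """Auto-classify items whose top score is >= threshold; -1 marks an
    item referred to a human reviewer."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)


# Toy held-out set: 4 items, 2 classes (made-up numbers for illustration).
probs = np.array([[0.90, 0.10],
                  [0.80, 0.20],
                  [0.60, 0.40],
                  [0.55, 0.45]])
labels = np.array([0, 0, 1, 0])

t = fit_triage_threshold(probs, labels, target_precision=1.0)
print(t)                  # -> 0.8
print(triage(probs, t))   # -> [ 0  0 -1 -1]
```

The threshold is fit on held-out data rather than training data, since training-set scores tend to overstate precision; with few held-out items the empirical precision is itself noisy, so the achieved precision on new reports may fall short of the target.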

Key words

Classification; Machine learning

Database: MEDLINE | Main subject: Computers / Triage / Databases, Factual | Type of study: Etiology_studies / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: BMC Bioinformatics | Journal subject: Medical Informatics | Year: 2018 | Document type: Article