Automatic ICD-10 classification of cancers from free-text death certificates.

Koopman, Bevan; Zuccon, Guido; Nguyen, Anthony; Bergheim, Anton; Grayson, Narelle

Affiliation

Koopman B; The Australian e-Health Research Centre, CSIRO, Brisbane, Australia. Electronic address: bevan.koopman@csiro.au.
Zuccon G; Queensland University of Technology, Brisbane, Australia. Electronic address: g.zuccon@qut.edu.au.
Nguyen A; The Australian e-Health Research Centre, CSIRO, Brisbane, Australia. Electronic address: Anthony.Nguyen@csiro.au.
Bergheim A; Cancer Institute NSW, Sydney, Australia. Electronic address: anton.bergheim@cancerinstitute.org.au.
Grayson N; Cancer Institute NSW, Sydney, Australia. Electronic address: Narelle.Grayson@cancerinstitute.org.au.

Int J Med Inform; 84(11): 956-65, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26323193

ABSTRACT

OBJECTIVE: Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and the variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates.

METHODS: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model.

RESULTS: The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable.

CONCLUSION: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
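
The cascaded architecture and the per-class evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained Support Vector Machine classifiers are replaced here by toy keyword matchers, the second level simply returns the first matching type, and the certificates, keywords, gold labels and the `UNSPECIFIED` fallback are all hypothetical values chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a trained classifier: text in, yes/no out.
Classifier = Callable[[str], bool]


def keyword_classifier(keywords: List[str]) -> Classifier:
    """Toy classifier that fires when any keyword occurs in the text."""
    lowered = [k.lower() for k in keywords]
    return lambda text: any(k in text.lower() for k in lowered)


def cascade(text: str, is_cancer: Classifier,
            type_classifiers: Dict[str, Classifier]) -> Optional[str]:
    """Level 1: cancer vs. no cancer. Level 2: ICD-10 cancer type."""
    if not is_cancer(text):
        return None  # no cancer detected, so level 2 is never consulted
    for code, clf in type_classifiers.items():
        if clf(text):
            return code  # simplification: first matching type wins
    return "UNSPECIFIED"  # assumed placeholder when no type classifier fires


def precision_recall_f(gold: List[Optional[str]], pred: List[Optional[str]],
                       label: str) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (harmonic mean) for one label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented certificate texts and gold labels (None = cancer is not the cause).
certificates = [
    "metastatic carcinoma of the lung",
    "breast cancer with liver metastases",
    "acute myocardial infarction",
    "pneumonia; history of lung carcinoma",
]
gold = ["C34", "C50", None, None]

is_cancer = keyword_classifier(["cancer", "carcinoma", "neoplasm"])
type_classifiers = {
    "C34": keyword_classifier(["lung", "bronchus"]),  # bronchus and lung
    "C50": keyword_classifier(["breast"]),            # breast
}

predictions = [cascade(c, is_cancer, type_classifiers) for c in certificates]
p, r, f = precision_recall_f(gold, predictions, "C34")
print(predictions)
print(f"C34 precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this toy data the fourth certificate mentions a past lung carcinoma that is not the cause of death, so the keyword matcher produces a false positive for C34. This kind of error lowers precision while leaving recall untouched, which is why the study reports both alongside the F-measure.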

Subjects

Death Certificates; Machine Learning; Natural Language Processing; Neoplasms/classification; Neoplasms/mortality; Cause of Death; Humans; International Classification of Diseases; Machine Learning/standards; New South Wales/epidemiology; Program Evaluation; Registries

Keywords

Cancer classification; Death certificates; Machine learning; Natural language processing

Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Natural Language Processing / Death Certificates / Machine Learning / Neoplasms
Type of study: Evaluation studies / Prognostic studies
Limits: Humans
Country/Region as subject: Oceania
Language: English
Journal: Int J Med Inform
Journal subject: Medical Informatics
Year of publication: 2015
Document type: Article
