Development of a medical-text parsing algorithm based on character adjacent probability distribution for Japanese radiology reports.

Nishimoto, N; Terae, S; Uesugi, M; Ogasawara, K; Sakurai, T

Affiliation

Nishimoto N; Department of Medical Informatics, Graduate School of Medicine, Hokkaido University, Sapporo, Hokkaido, Japan. nishimot@med.hokudai.ac.jp

Methods Inf Med; 47(6): 513-21, 2008.

Article in English | MEDLINE | ID: mdl-19057808

ABSTRACT

OBJECTIVES:

The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts.
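To make the idea of a boundary probability distribution between characters concrete, the following Python sketch counts, for each adjacent character pair in already-segmented text, how often a term boundary falls between the two characters. It is an illustration under stated assumptions, not the authors' implementation: the input format (each sentence given as a list of term strings, for example after manual correction) and the function name `boundary_probabilities` are hypothetical, and the abstract does not say how boundaries were counted.

```python
from collections import Counter


def boundary_probabilities(segmented_sentences):
    """Estimate P(boundary | left char, right char) from pre-segmented text.

    segmented_sentences: list of sentences, each a list of term strings
    (hypothetical input format used only for this illustration).
    """
    pair_counts = Counter()      # occurrences of each adjacent character pair
    boundary_counts = Counter()  # occurrences where a term boundary splits the pair
    for terms in segmented_sentences:
        text = "".join(terms)
        # Character offsets at which one term ends and the next begins.
        cuts = set()
        offset = 0
        for term in terms[:-1]:
            offset += len(term)
            cuts.add(offset)
        # Gap i lies between text[i - 1] and text[i].
        for i in range(1, len(text)):
            pair = (text[i - 1], text[i])
            pair_counts[pair] += 1
            if i in cuts:
                boundary_counts[pair] += 1
    return {pair: boundary_counts[pair] / n for pair, n in pair_counts.items()}
```

With the toy input `[["肺門", "に", "腫瘤"], ["肺門", "部"]]`, the pair (肺, 門) gets 0.0 because it always occurs inside a term, while (門, 部) gets 1.0 because a boundary always separates it. Pairs near 0 suggest term-internal positions and pairs near 1 suggest term boundaries.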

METHODS:

Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the Unified Medical Language System (UMLS, 2005AA) Metathesaurus, were added to ChaSen as a medical dictionary. A radiographer corrected the parsing results for 300 CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences.
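When two radiologists check the same parsed sentences, their agreement can be quantified per gap between adjacent characters (boundary or no boundary). The abstract reports modified inter-annotator agreement scores without defining the modification, so the Python sketch below computes only the standard, unmodified Cohen's kappa for reference. Treating each character gap as one labeled item, and the helper names `boundaries_to_labels` and `cohens_kappa`, are assumptions made for this illustration.

```python
def boundaries_to_labels(terms):
    """Label each gap between adjacent characters: 1 if a term boundary falls there, else 0.

    Assumes every term is a non-empty string.
    """
    labels = []
    for term in terms:
        labels.extend([0] * (len(term) - 1))  # gaps inside the term
        labels.append(1)                      # gap after the term
    return labels[:-1]  # the final character has no following gap


def cohens_kappa(a, b):
    """Standard Cohen's kappa for two binary label sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("annotations must cover the same character gaps")
    n = len(a)
    if n == 0:
        raise ValueError("no character gaps to compare")
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # share of gaps annotator A marks as boundaries
    p_b = sum(b) / n  # share of gaps annotator B marks as boundaries
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1:
        return 1.0  # both annotators used one identical constant label
    return (observed - expected) / (1 - expected)
```

For example, if one annotator segments 肺門に腫瘤 as `["肺門", "に", "腫瘤"]` and the other as `["肺", "門", "に", "腫瘤"]`, the label sequences are `[0, 1, 1, 0]` and `[1, 1, 1, 0]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.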

RESULTS:

We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We calculated transitional probabilities as conditional probabilities over character uni-grams, bi-grams, and tri-grams. The highest tri-gram transitional probability, $P(C_i \mid C_{i-2} C_{i-1})$, was 1.00. As an example of an anatomical location, the term "pulmonary hilum" was parsed as a tri-gram.
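Transitional probabilities of this kind can be estimated by maximum likelihood from character n-gram counts, for example P(C_i | C_{i-2} C_{i-1}) = count(C_{i-2} C_{i-1} C_i) / count(C_{i-2} C_{i-1} followed by any character). The Python sketch below does this for n = 1, 2, or 3. Unsmoothed maximum-likelihood estimation and the function name `transitional_probabilities` are assumptions for illustration; the abstract does not state which estimator the authors used.

```python
from collections import Counter


def transitional_probabilities(texts, n):
    """Maximum-likelihood P(C_i | C_{i-n+1} ... C_{i-1}) over character n-grams.

    n=1 gives unigram probabilities P(C_i); n=2 and n=3 give bi-gram and
    tri-gram transitional probabilities. Returns {ngram_string: probability}.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ngram_counts = Counter()
    context_counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            ngram_counts[gram] += 1
            # The context is every character but the last; for n=1 it is ""
            # so its count is the total number of characters.
            context_counts[gram[:-1]] += 1
    return {g: c / context_counts[g[:-1]] for g, c in ngram_counts.items()}
```

With the toy corpus `["右肺門部", "左肺門部", "肺野"]`, the bi-gram 肺門 gets 2/3 (肺 is followed by 門 twice and by 野 once), while the tri-gram 肺門部 gets 1.0 because 肺門 is always followed by 部. Under this estimator, a tri-gram probability of 1.00 means the two-character context was always followed by the same character in the data.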

CONCLUSIONS:

Retrieval of transitional probability will improve the accuracy of parsing compound medical terms.
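One simple way transitional probabilities could be applied to compound-term parsing is to cut the text wherever the probability of the next character given the previous one drops below a threshold. This Python sketch is hypothetical and is not the parsing algorithm developed in the paper, whose decision rule the abstract does not describe: the bi-gram model, the threshold of 0.5, the treatment of unseen pairs as probability 0, and the function names are all assumptions.

```python
from collections import Counter


def bigram_model(texts):
    """Maximum-likelihood P(C_i | C_{i-1}) keyed by the two-character string."""
    pair_counts, context_counts = Counter(), Counter()
    for text in texts:
        for prev, ch in zip(text, text[1:]):
            pair_counts[prev + ch] += 1
            context_counts[prev] += 1  # counts prev only when a character follows it
    return {p: c / context_counts[p[0]] for p, c in pair_counts.items()}


def segment(text, model, threshold=0.5):
    """Split text wherever P(C_i | C_{i-1}) is below threshold (unseen pairs count as 0)."""
    if not text:
        return []
    terms, current = [], text[0]
    for prev, ch in zip(text, text[1:]):
        if model.get(prev + ch, 0.0) < threshold:
            terms.append(current)  # low transition probability: close the current term
            current = ch
        else:
            current += ch          # high transition probability: extend the current term
    terms.append(current)
    return terms
```

For example, a model trained on `["右肺門部", "左肺門部", "肺野", "腫瘤"]` segments 右肺門部に腫瘤 into `["右肺門部", "に", "腫瘤"]`: the transitions inside 右肺門部 and 腫瘤 all reach at least 2/3, while 部に and に腫 were never seen. A fixed threshold is the simplest possible rule; it also shows why a dictionary remains useful, since any pair absent from the training text is always split.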

Subjects

Algorithms; Medical Informatics/organization & administration; Natural Language Processing; Probability; Radiology/methods; Terminology as Topic; Tomography, X-Ray Computed; Unified Medical Language System; Access to Information; Humans; Japan; Markov Chains; Medical Informatics/methods; Models, Statistical; Models, Theoretical; Radiology/organization & administration

Collections: 01-internacional | Database: MEDLINE | Main subject: Radiology / Algorithms / Medical Informatics / Natural Language Processing / Tomography, X-Ray Computed / Probability / Unified Medical Language System / Terminology as Topic | Study type: Health_economic_evaluation / Prognostic_studies / Risk_factors_studies | Limit: Humans | Country/Region as subject: Asia | Language: English | Journal: Methods Inf Med | Publication year: 2008 | Document type: Article | Affiliation country: Japan