Development of a medical-text parsing algorithm based on character adjacent probability distribution for Japanese radiology reports.
Methods Inf Med; 47(6): 513-21, 2008.
Article
in En | MEDLINE | ID: mdl-19057808
ABSTRACT
OBJECTIVES:
The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts.
METHODS:
Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the metathesaurus in the Unified Medical Language System (UMLS, 2005AA), were added as a medical dictionary for ChaSen. A radiographer corrected the set of results containing 300 parsed CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences.
RESULTS:
We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We retrieved the transitional probability as the conditional probability of a uni-gram, bi-gram, and tri-gram. The highest transitional probability P(Ci | Ci-2(*)Ci-1) was 1.00. For an example of anatomical location, the term "pulmonary hilum" was parsed as a tri-gram.
CONCLUSIONS:
Retrieval of transitional probability will improve the accuracy of parsing compound medical terms.
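The tri-gram conditional probability mentioned in the results can be illustrated with a minimal sketch. The abstract does not describe the authors' estimation procedure, so this is only an assumption about the general idea: it counts raw character tri-grams and estimates P(Ci | Ci-2 Ci-1) by relative frequency, without modelling term boundaries. The function name `trigram_conditional_probs` and the toy sentences are hypothetical and are not taken from the study's data.

```python
from collections import Counter, defaultdict


def trigram_conditional_probs(texts):
    """Estimate P(c_i | c_{i-2} c_{i-1}) by relative frequency over characters."""
    context_counts = Counter()   # counts of each two-character context
    trigram_counts = Counter()   # counts of (context, next character) pairs
    for text in texts:
        for i in range(2, len(text)):
            context = text[i - 2:i]
            context_counts[context] += 1
            trigram_counts[(context, text[i])] += 1

    probs = defaultdict(dict)
    for (context, ch), n in trigram_counts.items():
        probs[context][ch] = n / context_counts[context]
    return dict(probs)


# Hypothetical toy sentences (illustrative only, not from the CT reports).
toy_reports = ["肺門部に腫瘤", "肺門部リンパ節", "肺門に異常なし"]
probs = trigram_conditional_probs(toy_reports)

# Distribution of the character that follows the context "肺門".
print(probs["肺門"])
```

A context whose next character is always the same gets a probability of 1.00, which is the kind of value the abstract reports as the highest transitional probability.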
Collections: 01-international
Database: MEDLINE
Main subject: Radiology / Algorithms / Medical Informatics / Natural Language Processing / Tomography, X-Ray Computed / Probability / Unified Medical Language System / Terminology as Topic
Type of study: Health_economic_evaluation / Prognostic_studies / Risk_factors_studies
Limits: Humans
Country/Region as subject: Asia
Language: En
Journal: Methods Inf Med
Publication year: 2008
Document type: Article
Affiliation country: Japan