ABSTRACT
We carried out a multi-year project to collect discharge summaries from multiple hospitals, assemble them into a large text database, build a common document vector space, and develop various applications. We extracted 243,907 discharge summaries from seven hospitals. Term structure and the number of terms differed between hospitals; however, the differences by disease were similar. We built the vector space using the TF-IDF method. We performed a cross-match analysis of DPC (Diagnosis Procedure Combination) selection among the seven hospitals. About 80% of cases were correctly matched. Using model data from other hospitals reduced the selection rate to around 10%; however, integrated model data from all hospitals restored the selection rate.
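As a rough illustration of the TF-IDF vector space and the cross-match idea described above, the following minimal sketch builds TF-IDF vectors from one hospital's labelled summaries and selects a code for another hospital's summaries. Everything in it is an assumption made for illustration: the toy texts and labels are invented, the function names are hypothetical, whitespace tokenization stands in for whatever term extraction the project used, and nearest-neighbour selection by cosine similarity is only one plausible selection rule, not necessarily the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; real clinical text would need
    # a proper term extractor (e.g. morphological analysis for Japanese).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency over the training documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(text, idf):
    # L2-normalised sparse TF-IDF vector; out-of-vocabulary terms are dropped.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_model(labelled):
    # "Model data" here is simply the labelled TF-IDF vectors of one source.
    idf = fit_idf([text for text, _ in labelled])
    return [(tfidf_vector(text, idf), code) for text, code in labelled], idf


def select_code(summary, model, idf):
    # Pick the code of the most similar training summary.
    query = tfidf_vector(summary, idf)
    return max(model, key=lambda item: cosine(query, item[0]))[1]


def selection_rate(train, test):
    # Fraction of test summaries whose selected code matches the label.
    model, idf = build_model(train)
    hits = sum(select_code(text, model, idf) == code for text, code in test)
    return hits / len(test)


# Invented toy data standing in for two hospitals' discharge summaries.
hospital_a = [
    ("chest pain st elevation myocardial infarction stent", "MI"),
    ("fever cough infiltrate pneumonia antibiotics", "PNEUMONIA"),
]
hospital_b = [
    ("acute myocardial infarction coronary stent placed", "MI"),
    ("pneumonia cough fever treated with antibiotics", "PNEUMONIA"),
]

cross_rate = selection_rate(hospital_a, hospital_b)                    # other hospital's model
integrated_rate = selection_rate(hospital_a + hospital_b, hospital_b)  # pooled model
print(cross_rate, integrated_rate)
```

In this sketch, a cross-match corresponds to calling `selection_rate` with training data from one hospital and test data from another, while the integrated model corresponds to pooling all hospitals' summaries as training data.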
Subject(s)
Data Mining/methods , Databases, Factual , Electronic Health Records/organization & administration , Medical Record Linkage/methods , Patient Discharge Summaries/statistics & numerical data , Vocabulary, Controlled , Data Accuracy , Database Management Systems , Japan , Multicenter Studies as Topic , Natural Language Processing , Systems Integration
ABSTRACT
We started a multi-year project to collect discharge summaries from multiple hospitals and create a large text database, with the aim of building a common document vector space and developing applications such as automatic disease selection. As a first step, we extracted discharge summaries from two hospitals. Using a text mining method, we carried out DPC selection. Term structure and the number of terms differed between the discharge summaries of the two hospitals. Nevertheless, the disease selection rates of the two hospitals closely resembled each other.