The prediction of hospital length of stay using unstructured data.

Chrusciel, Jan; Girardon, François; Roquette, Lucien; Laplanche, David; Duclos, Antoine; Sanchez, Stéphane

Affiliation

Chrusciel J; Pôle Territorial Santé Publique et Performance, Centre Hospitalier de Troyes, 101 Avenue Anatole France CS 10718, 10003, Troyes Cedex, France.
Girardon F; Research and Consulting, CODOC SAS, 75008, Paris, France.
Roquette L; Research and Consulting, CODOC SAS, 75008, Paris, France.
Laplanche D; Pôle Territorial Santé Publique et Performance, Centre Hospitalier de Troyes, 101 Avenue Anatole France CS 10718, 10003, Troyes Cedex, France.
Duclos A; Research on Healthcare Performance Lab, INSERM U1290 RESHAPE, Université Claude Bernard Lyon 1, Villeurbanne, France.
Sanchez S; Health Data Department, Hospices Civils de Lyon, Lyon, France.

BMC Med Inform Decis Mak; 21(1): 351, 2021 Dec 18.

Article in English | MEDLINE | ID: mdl-34922532

ABSTRACT

OBJECTIVE: This study aimed to assess whether machine learning-based predictions of hospital length of stay (LOS) improve when clinical signs recorded as free text are taken into account, compared with the traditional approach of considering only structured information such as age, gender and major ICD diagnosis.

METHODS: This observational retrospective cohort study analyzed patient stays with admission between 1 January and 24 September 2019. In each included stay, the patient was admitted through the Emergency Department (ED) and stayed for more than two days in the subsequent service. LOS was predicted using two random forest models. The first included unstructured text extracted from electronic health records (EHRs). A word-embedding algorithm based on UMLS terminology, with exact matching restricted to patient-centric affirmation sentences, was used to assess the EHR data. The second model was primarily based on structured data in the form of diagnoses coded from the International Classification of Diseases, 10th Revision (ICD-10) and triage codes (CCMU/GEMSA classifications). Variables common to both models were: age, gender, zip/postal code, LOS in the ED, recent visit flag, assigned patient ward after the ED stay and short-term ED activity. Models were trained on 80% of the data, and performance was evaluated by accuracy on the remaining 20% (test data).

RESULTS: The model using unstructured data had an accuracy of 75.0%, compared with 74.1% for the model using structured data. The two models produced a similar prediction in 86.6% of cases. In a secondary analysis restricted to intensive care patients, the accuracy of both models was also similar (76.3% vs 75.0%).

CONCLUSIONS: LOS prediction using unstructured data had similar accuracy to prediction using structured data and can be considered useful for modeling LOS accurately.
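
The evaluation described in the abstract (an 80/20 split, accuracy on the held-out 20%, and the share of cases in which the two models give the same prediction) can be illustrated with a minimal sketch. This is not the authors' code: the helper names, the toy labels and the binary LOS classes are all assumptions made for illustration, the abstract does not state how LOS was discretised, and the random forest models themselves are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(rows: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle the rows and return (train, test) with an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of test cases whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def agreement(pred_a: Sequence, pred_b: Sequence) -> float:
    """Fraction of cases in which two models output the same class."""
    if len(pred_a) != len(pred_b) or len(pred_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Hypothetical toy labels: 0 = shorter stay, 1 = longer stay (assumed classes).
y_true = [0, 1, 1, 0, 1]
pred_text_model = [0, 1, 0, 0, 1]        # stand-in for the unstructured-data model
pred_structured_model = [0, 1, 1, 1, 1]  # stand-in for the structured-data model

acc_text = accuracy(y_true, pred_text_model)
acc_structured = accuracy(y_true, pred_structured_model)
same_prediction = agreement(pred_text_model, pred_structured_model)
```

In this toy data both stand-in models reach the same accuracy while agreeing on only part of the cases, which shows why the abstract reports model agreement separately from accuracy.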

Subjects

Emergency Service, Hospital; Hospitalization; Hospitals; Humans; Length of Stay; Retrospective Studies

Keywords

Data mining; Emergency department; Health services research; Length of stay

Full text: 1 Database: MEDLINE Main subject: Emergency Service, Hospital / Hospitalization Study type: Observational studies / Prognostic studies / Risk factors studies Limits: Humans Language: English Publication year: 2021 Document type: Article
