Validation of a machine learning approach to estimate Systemic Lupus Erythematosus Disease Activity Index score categories and application in a real-world dataset.

Alves, Pedro; Bandaria, Jigar; Leavy, Michelle B; Gliklich, Benjamin; Boussios, Costas; Su, Zhaohui; Curhan, Gary


Affiliation

Alves P; Data Science, OM1 Inc, Boston, Massachusetts, USA.
Bandaria J; Data Science, OM1 Inc, Boston, Massachusetts, USA.
Leavy MB; Research, OM1 Inc, Boston, Massachusetts, USA mleavy@om1.com.
Gliklich B; Research, Noble and Greenough School, Dedham, Massachusetts, USA.
Boussios C; Data Science, OM1 Inc, Boston, Massachusetts, USA.
Su Z; Biostatistics, OM1 Inc, Boston, Massachusetts, USA.
Curhan G; Research, OM1 Inc, Boston, Massachusetts, USA.

RMD Open; 7(2), 2021 May.

Article in English | MEDLINE | ID: mdl-34016712

ABSTRACT


OBJECTIVE:

Use of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) in routine clinical practice is inconsistent, and availability of clinician-recorded SLEDAI scores in real-world datasets is limited. This study aimed to validate a machine learning model to estimate SLEDAI score categories using clinical notes and to apply the model to a large, real-world dataset to generate estimated score categories for use in future research studies.

METHODS:

A machine learning model was developed to estimate an individual patient's SLEDAI score category (no activity, mild activity, moderate activity or high/very high activity) for a specific encounter date using clinical notes. A training cohort of 3504 encounters and a separate validation cohort of 1576 encounters were created from the OM1 SLE Registry. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calculated using a binarised version of the outcome that sets the positive class to be those records with clinician-recorded SLEDAI scores >5 and the negative class to be records with scores ≤5. Performance was also evaluated by categorising the scores into the four disease activity categories and calculating Spearman's R and Pearson's R.
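The evaluation metrics described in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy data, the variable names, the integer coding of the four categories (0 to 3) and the helper functions are all assumptions made for illustration. Only the binarisation rule (clinician-recorded SLEDAI >5 as the positive class) comes from the abstract. The AUC is computed through the rank-based Mann-Whitney formulation, and Spearman's R is computed as Pearson's R on average ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def binary_auc(labels, scores):
    """AUC as the probability that a positive record outranks a negative one."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data, not taken from the study.
recorded_sledai = [0, 2, 4, 5, 6, 8, 12, 20]  # clinician-recorded scores
model_score = [0.05, 0.10, 0.30, 0.45, 0.40, 0.80, 0.90, 0.95]  # model output

# Binarise as in the abstract: positive class is a recorded score > 5.
labels = [1 if s > 5 else 0 for s in recorded_sledai]
auc = binary_auc(labels, model_score)

# Assumed ordinal coding of the four categories:
# 0 = none, 1 = mild, 2 = moderate, 3 = high/very high.
recorded_category = [0, 1, 1, 1, 2, 2, 3, 3]
estimated_category = [0, 1, 1, 2, 1, 2, 3, 3]
rho = spearman_r(recorded_category, estimated_category)
r = pearson_r(recorded_category, estimated_category)

print(f"AUC={auc:.4f}  Spearman={rho:.3f}  Pearson={r:.3f}")
```

In practice, library routines such as those in scikit-learn or SciPy would normally replace these hand-written helpers. They are written out here only to keep the sketch dependency-free.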

RESULTS:

The AUC for the two categories was 0.93 for the development cohort and 0.91 for the validation cohort. The model had a Spearman's R value of 0.7 and a Pearson's R value of 0.7 when calculated using the four disease activity categories.

CONCLUSION:

The model performs well when estimating SLEDAI score categories using unstructured clinical notes.

Subjects

Lupus Erythematosus, Systemic; Cohort Studies; Humans; Lupus Erythematosus, Systemic/diagnosis; Lupus Erythematosus, Systemic/epidemiology; Machine Learning; ROC Curve; Severity of Illness Index

Keywords

epidemiology; healthcare; lupus erythematosus; outcome assessment; systemic

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Lupus Erythematosus, Systemic | Type of study: Diagnostic_studies / Etiology_studies / Incidence_studies / Observational_studies / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Year of publication: 2021 | Document type: Article
