Data-Centric AI for Healthcare Fraud Detection.

Johnson, Justin M; Khoshgoftaar, Taghi M

Affiliation

Johnson JM; Florida Atlantic University, Boca Raton, FL USA.
Khoshgoftaar TM; Florida Atlantic University, Boca Raton, FL USA.

SN Comput Sci ; 4(4): 389, 2023.

Article in English | MEDLINE | ID: mdl-37200563

ABSTRACT

Automated methods for detecting fraudulent healthcare providers have the potential to save billions of dollars in healthcare costs and improve the overall quality of patient care. This study presents a data-centric approach to improve healthcare fraud classification performance and reliability using Medicare claims data. Publicly available data from the Centers for Medicare & Medicaid Services (CMS) are used to construct nine large-scale labeled data sets for supervised learning. First, we leverage CMS data to curate the 2013-2019 Part B, Part D, and Durable Medical Equipment, Prosthetics, Orthotics, and Supplies (DMEPOS) Medicare fraud classification data sets. We provide a review of each data set and data preparation techniques to create Medicare data sets for supervised learning and we propose an improved data labeling process. Next, we enrich the original Medicare fraud data sets with up to 58 new provider summary features. Finally, we address a common model evaluation pitfall and propose an adjusted cross-validation technique that mitigates target leakage to provide reliable evaluation results. Each data set is evaluated on the Medicare fraud classification task using extreme gradient boosting and random forest learners, multiple complementary performance metrics, and 95% confidence intervals. Results show that the new enriched data sets consistently outperform the original Medicare data sets that are currently used in related works. Our results encourage the data-centric machine learning workflow and provide a strong foundation for data understanding and preparation techniques for machine learning applications in healthcare fraud.
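The abstract does not describe how the adjusted cross-validation technique works. One common source of target leakage in multi-year provider data is that rows for the same provider end up in both the training and the test fold. The following is a minimal sketch under that assumption only: it assigns whole providers to folds so that no provider spans both sides of a split. The function name `grouped_k_fold` and the provider identifiers are hypothetical and are not taken from the paper.

```python
import random


def grouped_k_fold(group_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    group_ids: one group label per row (here, a hypothetical provider ID).
    """
    unique_groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Assign whole groups to folds round-robin after shuffling.
    fold_of_group = {g: i % k for i, g in enumerate(unique_groups)}
    for fold in range(k):
        test_idx = [i for i, g in enumerate(group_ids) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(group_ids) if fold_of_group[g] != fold]
        yield train_idx, test_idx


# Hypothetical data: each provider appears in several yearly rows.
provider_ids = ["A", "A", "B", "B", "B", "C", "D", "D", "E", "F", "F", "G"]
splits = list(grouped_k_fold(provider_ids, k=3, seed=42))

for train_idx, test_idx in splits:
    train_groups = {provider_ids[i] for i in train_idx}
    test_groups = {provider_ids[i] for i in test_idx}
    # No provider contributes rows to both training and test data.
    assert train_groups.isdisjoint(test_groups)
```

Splitting by group rather than by row keeps a provider's label from being seen during training and then scored again at test time. Whether this matches the authors' adjustment cannot be determined from the abstract alone.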

Keywords

Big data; Data labeling; Data preparation; Data quality; Fraud detection; Healthcare

Full text: 1 | Collection: 01-international | Database: MEDLINE | Study type: Diagnostic_studies | Language: En | Journal: SN Comput Sci | Year: 2023 | Document type: Article | Country of publication: Singapore
