ABSTRACT
BACKGROUND: Chronic lymphocytic leukemia (CLL) is one of the most common types of leukemia in the Western world and mainly affects the elderly. The course of the disease is highly heterogeneous in terms of both the need for treatment and life expectancy. The current prognostic scoring system for patients with CLL, CLL-IPI, predicts the general course of the disease but is neither a measure of nor a decision aid for the need for treatment. Given the heterogeneous behavior of CLL, it is important to develop tools that identify whether and when patients will require treatment. Recently, machine learning (ML) has spread to many public health fields, including the diagnosis and prognosis of diseases. OBJECTIVE: Existing ML methods for CLL treatment prediction rely on expensive tests, such as genetic tests, which makes them unusable in peripheral or low-resource clinics such as those in developing countries. We aim to develop an ML model, based only on demographic data and routine laboratory tests, that predicts whether a patient will need treatment for CLL within two years of diagnosis. METHOD: We conducted a single-center study of adult patients (over 18 years of age) who were diagnosed with CLL according to the IWCLL criteria and were under observation at the hematology unit of the Bnai-Zion Medical Center between 2009 and 2019. Patient data included demographic, clinical and laboratory measures that were extracted anonymously from the patients' medical records. All laboratory results from the observation period were extracted for the entire cohort. We evaluated multiple ML approaches for classifying whether a patient will require treatment within a predetermined period of two years. Model performance was measured using repeated cross-validation. We evaluated the use of SHapley Additive exPlanations (SHAP) for explaining what influences the model's decisions.
Additionally, we employ a method for extracting a single decision tree from the ML model, which enables the physician to understand the main logic governing the model's predictions. RESULTS: The study included 109 patients, of whom 67 (61%) were male. Patients were under observation for a median of 44 months, and the median age was 65 years (range: 45-87). During follow-up, 64% of the cohort received therapy. A Gradient Boosting Model (GBM) using all of the extracted variables to identify the need for treatment within the coming two years among patients with CLL achieved an AUPRC (area under the precision-recall curve) of 0.78 (±0.08). An identical GBM without genetic/FISH and flow cytometry (FACS) data, so that it can be used in peripheral clinics, scored an AUPRC of 0.7686 (±0.0837). A Generalized Linear Model (GLM) using the same features scored an AUPRC of 0.7535 (±0.0995). All of the models described above surpassed the performance of CLL-IPI, which was evaluated using the CLL-TIM model. According to the SHAP results, red blood cell (RBC) count was the most predictive variable for the need for treatment, with a high value associated with a low probability of requiring treatment in the coming two years. Additionally, the SHAP method was used to estimate the personal risk of a randomly selected patient and gave sensible results. A simple decision tree classifier showed that patients with a hemoglobin level below 13 g/dL and a neutrophil-to-lymphocyte ratio (NLR) below 0.063, who constituted 34% of the patients in our study, had a high probability (76%) of requiring treatment. CONCLUSIONS: The ML algorithms evaluated in this work for predicting the need for treatment in patients with CLL achieved reasonable accuracy, surpassing that of CLL-IPI, which was evaluated using the CLL-TIM model.
Furthermore, we found that an ML model trained exclusively on inexpensive features incurred only a modest decrease in performance compared with the model trained on all of the features. Because of the small number of patients in this study, the results need to be validated on a larger population.
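
The two-condition decision tree leaf reported in the RESULTS (hemoglobin below 13 g/dL and NLR below 0.063) can be illustrated with a short Python sketch. This is only a reading aid under stated assumptions, not the authors' implementation: the names LabSnapshot, neutrophil_to_lymphocyte_ratio and in_high_risk_leaf are hypothetical, the NLR is assumed to be the absolute neutrophil count divided by the absolute lymphocyte count in the same unit, and the example values are invented for illustration. Only the two thresholds come from the abstract.

```python
from dataclasses import dataclass

# Thresholds as reported in the RESULTS of the abstract.
HEMOGLOBIN_CUTOFF_G_DL = 13.0
NLR_CUTOFF = 0.063


@dataclass
class LabSnapshot:
    """Hypothetical container for routine blood-count values."""
    hemoglobin_g_dl: float
    neutrophils: float  # absolute count, same unit as lymphocytes (assumed)
    lymphocytes: float  # absolute count, must be positive


def neutrophil_to_lymphocyte_ratio(labs: LabSnapshot) -> float:
    """NLR, assumed here to be neutrophils divided by lymphocytes."""
    if labs.lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return labs.neutrophils / labs.lymphocytes


def in_high_risk_leaf(labs: LabSnapshot) -> bool:
    """True when both conditions of the reported tree leaf hold."""
    return (
        labs.hemoglobin_g_dl < HEMOGLOBIN_CUTOFF_G_DL
        and neutrophil_to_lymphocyte_ratio(labs) < NLR_CUTOFF
    )


# Invented example values, for illustration only.
low_hb_low_nlr = LabSnapshot(hemoglobin_g_dl=11.8, neutrophils=2.0, lymphocytes=60.0)
normal_hb = LabSnapshot(hemoglobin_g_dl=14.2, neutrophils=2.0, lymphocytes=60.0)
low_hb_high_nlr = LabSnapshot(hemoglobin_g_dl=12.0, neutrophils=4.0, lymphocytes=20.0)

print(in_high_risk_leaf(low_hb_low_nlr))   # both conditions met
print(in_high_risk_leaf(normal_hb))        # hemoglobin not below the cutoff
print(in_high_risk_leaf(low_hb_high_nlr))  # NLR not below the cutoff
```

The 76% figure in the RESULTS describes the proportion of patients in this leaf of the study cohort who required treatment. The function therefore flags membership in that subgroup and does not return an individual risk estimate.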