ABSTRACT
Inadequate diagnostics compromise cancer care across low- and middle-income countries (LMICs). We hypothesized that an inexpensive gene expression assay using paraffin-embedded biopsy specimens from LMICs could distinguish lymphoma subtypes without pathologist input. We reviewed all biopsy specimens obtained for suspected lymphoma at the Instituto de Cancerología y Hospital Dr. Bernardo Del Valle in Guatemala City between 2006 and 2018. Diagnoses were established according to the World Health Organization classification and then binned into 9 categories: nonmalignant, aggressive B-cell, diffuse large B-cell, follicular, Hodgkin, mantle cell, marginal zone, natural killer/T-cell, or mature T-cell lymphoma. We established a chemical ligation probe-based assay (CLPA) that quantifies expression of 37 genes by capillary electrophoresis at a reagent/consumable cost of approximately $10/sample. To assign bins based on gene expression, 13 models were evaluated as candidate base learners, and the class probabilities from each model were then used as predictors in an extreme gradient boosting super learner. Cases with call probabilities <60% were classified as indeterminate. Four (2%) of 194 biopsy specimens in storage <3 years experienced assay failure. Diagnostic samples were divided into 70% (n = 397) training and 30% (n = 163) validation cohorts. Overall accuracy for the validation cohort was 86% (95% confidence interval [CI]: 80%-91%). After excluding 28 (17%) indeterminate calls, accuracy increased to 94% (95% CI: 89%-97%). Concordance was 97% for a set of high-probability calls (n = 37) assayed by CLPA in both the United States and Guatemala. Accuracy for a cohort of relapsed/refractory biopsy specimens (n = 39) was 79% overall and 88% after excluding indeterminate cases. Machine-learning analysis of gene expression accurately classifies paraffin-embedded lymphoma biopsy specimens and could transform diagnosis in LMICs.
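
The 60% indeterminate rule and the two accuracy figures reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the super learner returns one probability per diagnostic bin for each case, the function names (`call_with_threshold`, `summarize`) and the toy probabilities are hypothetical, and it assumes that overall accuracy scores every case by its top-probability bin, a convention the abstract does not state.

```python
from typing import Dict, List, Optional, Tuple

# From the abstract: calls with probability below 60% are indeterminate.
INDETERMINATE_THRESHOLD = 0.60


def call_with_threshold(
    probs: Dict[str, float], threshold: float = INDETERMINATE_THRESHOLD
) -> Tuple[str, float, bool]:
    """Return (top_bin, top_probability, is_indeterminate) for one case."""
    top_bin = max(probs, key=probs.get)
    top_prob = probs[top_bin]
    return top_bin, top_prob, top_prob < threshold


def summarize(
    cases: List[Tuple[Dict[str, float], str]],
    threshold: float = INDETERMINATE_THRESHOLD,
) -> Dict[str, Optional[float]]:
    """Compute accuracy over all cases and over confident calls only.

    Assumption: overall accuracy scores each case by its top-probability
    bin, whether or not the call is indeterminate.
    """
    if not cases:
        raise ValueError("at least one case is required")
    correct_all = 0
    correct_confident = 0
    n_confident = 0
    for probs, truth in cases:
        top_bin, _, indeterminate = call_with_threshold(probs, threshold)
        is_correct = top_bin == truth
        correct_all += is_correct
        if not indeterminate:
            n_confident += 1
            correct_confident += is_correct
    return {
        "overall_accuracy": correct_all / len(cases),
        # None when every call is indeterminate (nothing left to score).
        "confident_accuracy": (
            correct_confident / n_confident if n_confident else None
        ),
        "n_indeterminate": len(cases) - n_confident,
    }


# Hypothetical toy data: (bin probabilities, reference diagnosis).
toy_cases = [
    ({"diffuse large B-cell": 0.90, "follicular": 0.10}, "diffuse large B-cell"),
    ({"diffuse large B-cell": 0.55, "follicular": 0.45}, "follicular"),
    ({"Hodgkin": 0.70, "nonmalignant": 0.30}, "Hodgkin"),
    ({"mantle cell": 0.58, "marginal zone": 0.42}, "mantle cell"),
    ({"follicular": 0.80, "diffuse large B-cell": 0.20}, "diffuse large B-cell"),
]

summary = summarize(toy_cases)
print(summary)
```

In this toy set, two of the five cases fall below the threshold and are set aside, so the confident-call accuracy is computed over the remaining three. Raising the threshold trades a larger indeterminate fraction for higher accuracy among the calls that are reported.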