RESUMEN
OBJECTIVES: Natural language processing (NLP) algorithms can interpret unstructured text for commonly used terms and phrases. Pancreatic pathologies are diverse and include benign and malignant entities with associated histologic features. Creating a pancreas NLP algorithm can aid in electronic health record coding as well as large database creation and curation. METHODS: Text-based pancreatic anatomic and cytopathologic reports for pancreatic cancer, pancreatic ductal adenocarcinoma, neuroendocrine tumor, intraductal papillary neoplasm, tumor dysplasia, and suspicious findings were collected. This dataset was split 80/20 for model training and development. A separate set was held out for testing purposes. We trained using convolutional neural network to predict each heading. RESULTS: Over 14,000 reports were obtained from the Mass General Brigham Healthcare System electronic record. Of these, 1252 reports were used for algorithm development. Final accuracy and F1 scores relative to the test set ranged from 95% and 98% for each queried pathology. To understand the dependence of our results to training set size, we also generated learning curves. Scoring metrics improved as more reports were submitted for training; however, some queries had high index performance. CONCLUSIONS: Natural language processing algorithms can be used for pancreatic pathologies. Increased training volume, nonoverlapping terminology, and conserved text structure improve NLP algorithm performance.