RESUMO
Objective: To address thyroid cancer overdiagnosis, we aim to develop a natural language processing (NLP) algorithm to determine the appropriateness of thyroid ultrasounds (TUS). Patients and Methods: Between 2017 and 2021, we identified 18,000 TUS patients at Mayo Clinic and selected 628 for chart review to create a ground truth dataset based on consensus. We developed a rule-based NLP pipeline to identify TUS as appropriate TUS (aTUS) or inappropriate TUS (iTUS) using patients' clinical notes and additional meta information. In addition, we designed an abbreviated NLP pipeline (aNLP) solely focusing on labels from TUS order requisitions to facilitate deployment at other health care systems. Our dataset was split into a training set of 468 (75%) and a test set of 160 (25%), using the former for rule development and the latter for performance evaluation. Results: There were 449 (95.9%) patients identified as aTUS and 19 (4.06%) as iTUS in the training set; there are 155 (96.88%) patients identified as aTUS and 5 (3.12%) were iTUS in the test set. In the training set, the pipeline achieved a sensitivity of 0.99, specificity of 0.95, and positive predictive value of 1.0 for detecting aTUS. The testing cohort revealed a sensitivity of 0.96, specificity of 0.80, and positive predictive value of 0.99. Similar performance metrics were observed in the aNLP pipeline. Conclusion: The NLP models can accurately identify the appropriateness of a thyroid ultrasound from clinical documentation and order requisition information, a critical initial step toward evaluating the drivers and outcomes of TUS use and subsequent thyroid cancer overdiagnosis.
RESUMO
This study aimed to review the application of natural language processing (NLP) in thyroid-related conditions and to summarize current challenges and potential future directions. We performed a systematic search of databases for studies describing NLP applications in thyroid conditions published in English between January 1, 2012 and November 4, 2022. In addition, we used a snowballing technique to identify studies missed in the initial search or published after our search timeline until April 1, 2023. For included studies, we extracted the NLP method (eg, rule-based, machine learning, deep learning, or hybrid), NLP application (eg, identification, classification, and automation), thyroid condition (eg, thyroid cancer, thyroid nodule, and functional or autoimmune disease), data source (eg, electronic health records, health forums, medical literature databases, or genomic databases), performance metrics, and stages of development. We identified 24 eligible NLP studies focusing on thyroid-related conditions. Deep learning-based methods were the most common (38%), followed by rule-based (21%), and traditional machine learning (21%) methods. Thyroid nodules (54%) and thyroid cancer (29%) were the primary conditions under investigation. Electronic health records were the dominant data source (17/24, 71%), with imaging reports being the most frequently used (15/17, 88%). There is increasing interest in NLP applications for thyroid-related studies, mostly addressing thyroid nodules and using deep learning-based methodologies with limited external validation. However, none of the reviewed NLP applications have reached clinical practice. Several limitations, including inconsistent clinical documentation and model portability, need to be addressed to promote the evaluation and implementation of NLP applications to support patient care in thyroidology.