ABSTRACT
OBJECTIVES: This study addresses the lack of publicly accessible oral cavity image datasets for developing machine learning (ML) and artificial intelligence (AI) technologies for the diagnosis and prognosis of oral cancer (OCA) and oral potentially malignant disorders (OPMD), with a particular focus on Asia, where prevalence is high and diagnosis is delayed. MATERIALS AND METHODS: Following ethical approval and informed written consent, images of the oral cavity were captured with mobile phone cameras, and clinical data were extracted from the hospital records of patients attending the Dental Teaching Hospital, Peradeniya, Sri Lanka. After data management and hosting, clinicians categorized and annotated the images using a custom software tool developed by the research team. RESULTS: A dataset comprising 3000 high-quality, anonymized images obtained from 714 patients was classified into four categories: healthy, benign, OPMD, and OCA. Images were annotated with polygon-shaped oral cavity and lesion boundaries. Each image is accompanied by patient metadata, including age, sex, diagnosis, and risk factor profiles such as smoking, alcohol, and betel chewing habits. CONCLUSION: Researchers can use the annotated images in COCO format, along with the patients' metadata, to enhance ML and AI algorithm development.