Search | BVS - Ministry of Health

Cleaning and Harmonizing Medical Image Data for Reliable AI: Lessons Learned from Longitudinal Oral Cancer Natural History Study Data.

Xue, Zhiyun; Oguguo, Tochi; Yu, Kelly J; Chen, Tseng-Cheng; Hua, Chun-Hung; Kang, Chung Jan; Chien, Chih-Yen; Tsai, Ming-Hsui; Wang, Cheng-Ping; Chaturvedi, Anil K; Antani, Sameer.

Proc SPIE Int Soc Opt Eng; 12931, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38774479

ABSTRACT

For deep learning-based machine learning, large and sufficiently diverse data are crucial, and data quality is equally important. In real-world applications, however, raw source data commonly contain incorrect, noisy, inconsistent, improperly formatted, and sometimes missing elements, particularly when the datasets are large and sourced from many sites. In this paper, we present our work on preparing image data for the development of AI-driven approaches to studying various aspects of the natural history of oral cancer. Specifically, we focus on two aspects: 1) cleaning the image data; and 2) extracting the annotation information. Data cleaning includes removing duplicates, identifying missing data, correcting errors, standardizing data sets, and removing sensitive personal information, with the goal of combining data sourced from different study sites. These steps are often collectively referred to as data harmonization. Annotation information extraction includes identifying crucial or valuable text that clinical providers entered manually in relation to the image paths/names, and standardizing the label text. Both are important for successful deep learning algorithm development and data analysis. We provide details on the data under consideration, describe the challenges and issues that motivated our work, and present the specific approaches and methods we used to clean and standardize the image data and extract labeling information. Further, we discuss ways to increase the efficiency of the process and the lessons learned. Research ideas on automating the process with ML-driven techniques are also presented and discussed. Our intent in reporting and discussing this work in detail is to provide insights into automating or, at a minimum, increasing the efficiency of these critical yet often under-reported processes.
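Two of the cleaning steps named in the abstract, removing duplicates and standardizing label text, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how duplicates were detected, so exact-content hashing is an assumption here, and the function names, the in-memory `files` mapping, and the `SYNONYMS` table are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical synonym table; a real study would define its own vocabulary.
SYNONYMS = {
    "lt buccal mucosa": "left buccal mucosa",
    "rt buccal mucosa": "right buccal mucosa",
}


def find_duplicates(files):
    """Group file names whose byte content is identical.

    `files` maps a file name to its raw bytes (assumed to be loaded already).
    Returns only the groups that contain more than one name.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def normalize_label(text):
    """Lower-case a free-text label, replace runs of punctuation and
    whitespace with single spaces, then map known synonyms."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return SYNONYMS.get(cleaned, cleaned)
```

Exact hashing only catches byte-identical copies; re-encoded or resized copies of the same image would need a perceptual comparison instead.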

Open World Active Learning for Echocardiography View Classification.

Zamzmi, Ghada; Oguguo, Tochi; Rajaraman, Sivaramakrishnan; Antani, Sameer.

Proc SPIE Int Soc Opt Eng; 12033, 2022.

Article in English | MEDLINE | ID: mdl-36860349

ABSTRACT

Existing approaches to automated echocardiography view classification assume that the views in the testing set belong to a limited number of views that have appeared in the training set. Such a design is called closed world classification. This assumption may be too strict for real-world environments, which are open and often contain unseen examples, and it drastically weakens the robustness of classical view classification approaches. In this work, we developed an open world active learning approach for echocardiography view classification, in which the network classifies images of known views into their respective classes and identifies images of unknown views. A clustering approach is then used to group the unknown views into clusters to be labeled by echocardiologists. Finally, the newly labeled samples are added to the initial set of known views and used to update the classification network. This process of actively labeling unknown clusters and integrating them into the classification model significantly increases the efficiency of data labeling and the robustness of the classifier. Our results on an echocardiography dataset containing known and unknown views showed the superiority of the proposed approach over closed world view classification approaches.
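The loop described in the abstract (classify known views, flag unknown ones, cluster the unknowns for expert labeling) can be sketched roughly as follows. The abstract does not state how unknown views are detected or which clustering method is used, so this sketch swaps in two simple stand-ins: a threshold on the maximum softmax probability and a plain k-means. The function names, the `threshold` value, and the feature vectors are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_open_world(logits, threshold=0.8):
    """Return the index of the predicted known view, or -1 for 'unknown'
    when the top softmax probability falls below `threshold`."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1


def kmeans(points, k, iters=20):
    """Plain k-means on feature vectors of samples flagged as unknown.

    Initial centroids are the first k points, so the result is deterministic.
    Returns the cluster index assigned to each point.
    """
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

In such a loop, each resulting cluster would be shown to an echocardiologist, and the labeled samples added to the training set before the classifier is updated.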
