Semi-supervised training using cooperative labeling of weakly annotated data for nodule detection in chest CT.

Maynord, Michael; Farhangi, M Mehdi; Fermüller, Cornelia; Aloimonos, Yiannis; Levine, Gary; Petrick, Nicholas; Sahiner, Berkman; Pezeshk, Aria

Affiliation

Maynord M; University of Maryland, Computer Science Department, Iribe Center for Computer Science and Engineering, College Park, Maryland, USA.
Farhangi MM; Division of Imaging, Diagnostics, and Software Reliability (DIDSR), OSEL, CDRH, FDA, Silver Spring, Maryland, USA.
Fermüller C; Division of Imaging, Diagnostics, and Software Reliability (DIDSR), OSEL, CDRH, FDA, Silver Spring, Maryland, USA.
Aloimonos Y; University of Maryland, Institute for Advanced Computer Studies, Iribe Center for Computer Science and Engineering, College Park, Maryland, USA.
Levine G; University of Maryland, Computer Science Department, Iribe Center for Computer Science and Engineering, College Park, Maryland, USA.
Petrick N; Division of Radiological Imaging Devices and Electronic Products, CDRH, FDA, Silver Spring, Maryland, USA.
Sahiner B; Division of Imaging, Diagnostics, and Software Reliability (DIDSR), OSEL, CDRH, FDA, Silver Spring, Maryland, USA.
Pezeshk A; Division of Imaging, Diagnostics, and Software Reliability (DIDSR), OSEL, CDRH, FDA, Silver Spring, Maryland, USA.

Med Phys; 50(7): 4255-4268, 2023 Jul.

Article in En | MEDLINE | ID: mdl-36630691

ABSTRACT

PURPOSE:

Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time-consuming and expensive. In this study, we propose a cooperative labeling method that allows us to make use of weakly annotated medical imaging data for the training of a machine learning algorithm. Most clinically produced data are weakly annotated: they are produced for use by humans rather than machines and lack information that machine learning depends upon. This approach therefore allows us to incorporate a wider range of clinical data and thereby increase the training set size.

METHODS:

Our pseudo-labeling method consists of three stages. In the first stage, a previously established network is trained using a limited number of samples with high-quality expert-produced annotations. This network is then used to generate annotations for a separate, larger dataset that contains only weakly annotated scans. In the second stage, we cross-check the network-generated annotations against the weak annotations to obtain higher-fidelity annotations. In the third stage, we extract training data from the weakly annotated scans and combine it with the fully annotated data, producing a larger training dataset. We use this larger dataset to develop a computer-aided detection (CADe) system for nodule detection in chest CT.
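The cross-checking and merging stages can be illustrated with a minimal sketch. The abstract does not specify what the weak annotations contain or how agreement is decided, so everything below is an assumption made for illustration: weak annotations are taken to be approximate 3D nodule locations, agreement is a hypothetical distance threshold combined with a hypothetical score cutoff, and the names `Detection`, `cross_check`, and `build_training_set` are not from the paper.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in mm; assumed coordinate convention


@dataclass
class Detection:
    """A candidate produced by the stage-1 network (hypothetical structure)."""
    center: Point
    score: float  # network confidence in [0, 1]


def cross_check(detections: List[Detection], weak_points: List[Point],
                max_dist_mm: float = 10.0, min_score: float = 0.5) -> List[Detection]:
    """Stage 2 (assumed rule): for each weak annotation, keep the highest-scoring
    detection that lies within max_dist_mm and has score >= min_score."""
    pseudo_labels = []
    for point in weak_points:
        nearby = [d for d in detections
                  if d.score >= min_score and dist(d.center, point) <= max_dist_mm]
        if nearby:
            pseudo_labels.append(max(nearby, key=lambda d: d.score))
    return pseudo_labels


def build_training_set(expert_labels: list, pseudo_labels: list) -> list:
    """Stage 3: merge expert-annotated samples with the cross-checked pseudo-labels."""
    return list(expert_labels) + list(pseudo_labels)


# Illustrative values only; these are not data from the study.
detections = [
    Detection((10.0, 10.0, 10.0), 0.90),  # close to the first weak point, confident
    Detection((12.0, 10.0, 10.0), 0.60),  # also close, but lower score
    Detection((80.0, 80.0, 80.0), 0.95),  # confident, but no weak annotation nearby
    Detection((40.0, 40.0, 40.0), 0.20),  # close to the second weak point, low score
]
weak_points = [(11.0, 10.0, 10.0), (41.0, 40.0, 40.0)]

pseudo = cross_check(detections, weak_points)
merged = build_training_set(["expert_sample_1", "expert_sample_2"], pseudo)
```

In this toy case only the first detection survives: the third has no supporting weak annotation, and the fourth is supported but falls below the score cutoff. Requiring agreement from both sources is one plausible reading of "cooperative labeling"; the actual matching criteria are given in the full text, not in this abstract.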

RESULTS:

We evaluated the proposed approach by training the network with different numbers of expert-annotated scans and then testing the CADe system on an independent expert-annotated dataset. We demonstrate that when the availability of expert annotations is severely limited, the inclusion of weakly labeled data leads to a 5% improvement in the competitive performance metric (CPM), defined as the average of sensitivities at different false-positive rates.
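As a rough illustration of the CPM definition, the sketch below averages sensitivity over a set of false-positive rates read from a free-response ROC (FROC) curve. The abstract does not list the rates used; the seven operating points here (1/8 to 8 false positives per scan) are the ones commonly used in the LUNA16 challenge and are an assumption. The function names, the linear interpolation, and the clamping at the curve's endpoints are likewise illustrative choices, and the curve values are made up.

```python
from typing import List, Sequence, Tuple

# Assumed operating points (false positives per scan), as in the LUNA16 convention.
FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def sensitivity_at(froc: List[Tuple[float, float]], fp_rate: float) -> float:
    """Linearly interpolate sensitivity at fp_rate from (fp_per_scan, sensitivity)
    points. Outside the measured range the nearest endpoint value is used."""
    pts = sorted(froc)
    if fp_rate <= pts[0][0]:
        return pts[0][1]
    if fp_rate >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= fp_rate <= x1:
            if x1 == x0:  # duplicate x values: avoid division by zero
                return max(y0, y1)
            return y0 + (y1 - y0) * (fp_rate - x0) / (x1 - x0)
    return pts[-1][1]  # not reached for sorted input; kept as a safe fallback


def cpm(froc: List[Tuple[float, float]], fp_rates: Sequence[float] = FP_RATES) -> float:
    """CPM as the mean sensitivity over the chosen false-positive rates."""
    return sum(sensitivity_at(froc, r) for r in fp_rates) / len(fp_rates)


# Illustrative FROC points only; these are not results from the study.
example_froc = [(0.125, 0.60), (0.25, 0.70), (0.5, 0.78), (1, 0.84),
                (2, 0.88), (4, 0.90), (8, 0.92)]
example_cpm = cpm(example_froc)
```

Because CPM is a plain average, a gain in sensitivity at any single operating point raises it by one seventh of that gain under these assumed rates.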

CONCLUSIONS:

Our proposed approach can effectively merge a weakly annotated dataset with a small, well-annotated dataset for algorithm training. This approach can help enlarge limited training data by leveraging the large amount of weakly labeled data typically generated in clinical image interpretation.

Subject(s)

Algorithms; Tomography, X-Ray Computed; Humans; Machine Learning; Supervised Machine Learning; Image Processing, Computer-Assisted/methods

Key words

computer aided detection; pulmonary nodules; semi-supervised learning

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Algorithms / Tomography, X-Ray Computed Type of study: Diagnostic_studies Limits: Humans Language: En Journal: Med Phys Year: 2023 Type: Article Affiliation country: United States
