ABSTRACT
Purpose: The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). Approach: The Jensen-Shannon distance (JSD), a measure of similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the combination of race and ethnicity. Results: Representativeness of the MIDRC data by ethnicity and the combination of race and ethnicity was impacted by the percentage of CDC case counts for which this information was not reported. The distributions by sex and race have retained their level of representativeness over time. Conclusion: The representativeness of the open medical imaging datasets in the curated public data commons at MIDRC has evolved over time as the number of contributing institutions and overall number of subjects have grown. The use of metrics such as the JSD to support measurement of representativeness is one step needed for fair and generalizable AI algorithm development.
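As a rough illustration of the metric named in this abstract, the Jensen-Shannon distance between two categorical distributions can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function name, the base-2 logarithm (which bounds the distance between 0 and 1), and the example counts are assumptions made here for illustration.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two categorical distributions.

    Inputs may be raw counts; they are normalized to probabilities first.
    Base-2 logarithms are an assumption of this sketch: with them the
    distance is 0 for identical distributions and 1 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same categories")
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # The distance is the square root of the divergence
    return math.sqrt(max(divergence, 0.0))


# Hypothetical counts per category, invented for illustration only
dataset_counts = [480, 520]
reference_counts = [492, 508]
print(jensen_shannon_distance(dataset_counts, reference_counts))
```

A value near 0 would indicate that the dataset's distribution over the chosen categories closely matches the reference distribution.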
ABSTRACT
Purpose: Recognizing and addressing the various sources of bias is essential for algorithmic fairness and trustworthiness and contributes to a just and equitable deployment of AI in medical imaging. There is an increasing interest in developing medical imaging-based machine learning methods, also known as medical imaging artificial intelligence (AI), for the detection, diagnosis, prognosis, and risk assessment of disease with the goal of clinical implementation. These tools are intended to help improve traditional human decision-making in medical imaging. However, biases introduced in the steps toward clinical deployment may impede their intended function, potentially exacerbating inequities. Specifically, medical imaging AI can propagate or amplify biases introduced in the many steps from model inception to deployment, resulting in a systematic difference in the treatment of different groups. Approach: Our multi-institutional team included medical physicists, medical imaging artificial intelligence/machine learning (AI/ML) researchers, experts in AI/ML bias, statisticians, physicians, and scientists from regulatory bodies. We identified sources of bias in AI/ML and mitigation strategies for these biases, and we developed recommendations for best practices in medical imaging AI/ML development. Results: Five main steps along the roadmap of medical imaging AI/ML were identified: (1) data collection, (2) data preparation and annotation, (3) model development, (4) model evaluation, and (5) model deployment. Within these steps, or bias categories, we identified 29 sources of potential bias, many of which can impact multiple steps, as well as mitigation strategies. Conclusions: Our findings provide a valuable resource to researchers, clinicians, and the public at large.
ABSTRACT
RATIONALE AND OBJECTIVES: Participation in clinical research can be both highly rewarding and logistically demanding. As highlighted by recent Food and Drug Administration guidance, imaging has become an integral part of this research. The unique technical and administrative aspects of clinical trial imaging may differ substantially from those of standard-of-care imaging and thus burden the established clinical infrastructure at investigational sites. Failure to comply with requirements can lead to unusable data, repeat imaging, or the removal of patients from the trial. It is therefore imperative that all stakeholders address these challenges to engage in clinical research successfully. MATERIALS AND METHODS: The authors' experiences in managing clinical trial imaging requirements at their institution were used to identify common challenges. The impact of these challenges was assessed from an operational perspective. RESULTS: Although contract research organizations attempt to minimize these challenges, their efforts are necessarily limited and insufficient, and there is a lack of infrastructure available at investigational sites to address these issues. As such, recommendations are proposed for addressing these challenges at institutional and industry levels. CONCLUSION: The challenges associated with clinical trial imaging require an investment of resources from all stakeholders. Investigational sites must confront these challenges to satisfy trial requirements effectively, maintain a superior level of patient care, and guarantee trial integrity. Similarly, sponsors must acknowledge the burden of clinical trial imaging and support the development of the necessary local infrastructure. The implementation of the recommendations described here will improve the conduct of clinical trial imaging.
Subject(s)
Clinical Trials as Topic , Diagnostic Imaging , Diagnostic Imaging/standards , Health Resources , Humans , United States , United States Food and Drug Administration
ABSTRACT
RATIONALE AND OBJECTIVES: Managing and supervising the complex imaging examinations performed for clinical research in an academic medical center can be a daunting task. Coordinating with both radiology and research staff to ensure that the necessary imaging is performed, analyzed, and delivered in accordance with the research protocol is nontrivial. The purpose of this communication is to report on the establishment of a new Human Imaging Research Office (HIRO) at our institution that provides a dedicated infrastructure to assist with these issues and improve collaborations between radiology and research staff. MATERIALS AND METHODS: The HIRO was created with three primary responsibilities: 1) coordinate the acquisition of images for clinical research per the study protocol, 2) facilitate reliable and consistent assessment of disease response for clinical research, and 3) manage and distribute clinical research images in a compliant manner. RESULTS: The HIRO currently provides assistance for 191 clinical research studies from 14 sections and departments within our medical center and performs quality assessment of image-based measurements for six clinical research studies. The HIRO has fulfilled 1806 requests for medical images, delivering 81,712 imaging examinations (more than 44.1 million images) and related reports to investigators for research purposes. CONCLUSIONS: The ultimate goal of the HIRO is to increase the level of satisfaction and interaction among investigators, research subjects, radiologists, and other imaging professionals. Clinical research studies that use the HIRO benefit from a more efficient and accurate imaging process. The HIRO model could be adopted by other academic medical centers to support their clinical research activities; the details of implementation may differ among institutions, but the need to support imaging in clinical research through a dedicated, centralized initiative should apply to most academic medical centers.
Subject(s)
Academic Medical Centers/organization & administration , Biomedical Research/organization & administration , Diagnostic Imaging , Radiology/organization & administration , Chicago
ABSTRACT
PURPOSE: To evaluate the robustness of a breast ultrasonographic (US) computer-aided diagnosis (CAD) system in terms of its performance across different patient populations. MATERIALS AND METHODS: Three US databases were analyzed for this study: one South Korean and two United States databases. All three databases were utilized in an institutional review board-approved and HIPAA-compliant manner. Round-robin analysis and independent testing were performed to evaluate the performance of a computerized breast cancer classification scheme across the databases. Receiver operating characteristic (ROC) analysis was used to evaluate performance differences. RESULTS: The round-robin analyses of each database demonstrated similar results, with areas under the ROC curve ranging from 0.88 (95% confidence interval [CI]: 0.820, 0.918) to 0.91 (95% CI: 0.86, 0.95). The independent testing of each database, however, indicated that although the performances were similar, the range in areas under the ROC curve (from 0.79 [95% CI: 0.730, 0.842] to 0.87 [95% CI: 0.794, 0.923]) was wider than that with the round-robin tests. However, the only instances in which statistically significant differences in performance were demonstrated occurred when the Korean database was used in a testing capacity in independent testing. CONCLUSION: The few observed statistically significant differences in performance indicated that while the US features used by the system were useful across the databases, their relative importance differed. In practice, this means that a CAD system may need to be adjusted when applied to a different population.
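For readers unfamiliar with the figure of merit reported above, the area under the ROC curve can be estimated nonparametrically as the fraction of malignant/benign pairs that the classifier ranks correctly. The sketch below is an assumption-laden illustration: the abstract does not state how the authors estimated the area or its confidence intervals, and the function name and scores are hypothetical.

```python
def auc_mann_whitney(malignant_scores, benign_scores):
    """Nonparametric AUC estimate (Mann-Whitney statistic).

    Counts the malignant/benign pairs in which the malignant lesion
    received the higher score; ties count as half a correct ranking.
    """
    if not malignant_scores or not benign_scores:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                correct += 1.0
            elif m == b:
                correct += 0.5
    return correct / (len(malignant_scores) * len(benign_scores))


# Hypothetical classifier outputs, invented for illustration only
malignant = [0.9, 0.3]
benign = [0.5, 0.1]
print(auc_mann_whitney(malignant, benign))  # 3 of 4 pairs ranked correctly
```

In independent testing, the scores would come from a classifier trained on one database and applied to another, whereas in round-robin analysis each case is scored by a classifier trained on the remaining cases of the same database.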
Subject(s)
Breast Neoplasms/diagnostic imaging , Diagnosis, Computer-Assisted , Ultrasonography, Mammary , Bayes Theorem , Breast Neoplasms/epidemiology , Female , Humans , ROC Curve , Republic of Korea/epidemiology , Statistics, Nonparametric , United States/epidemiology , Urban Population
ABSTRACT
RATIONALE AND OBJECTIVES: The automated classification of sonographic breast lesions is generally accomplished by extracting and quantifying various features from the lesions. The selection of images to be analyzed, however, is usually left to the radiologist. Here we present an analysis of the effect that image selection can have on the performance of a breast ultrasound computer-aided diagnosis system. MATERIALS AND METHODS: A database of 344 different sonographic lesions was analyzed for this study (219 cysts/benign processes, 125 malignant lesions). The database was collected in an institutional review board-approved, Health Insurance Portability and Accountability Act-compliant manner. Three different image selection protocols were used in the automated classification of each lesion: all images, first image only, and randomly selected images. After image selection, two different protocols were used to classify the lesions: (a) the average feature values were input to the classifier or (b) the classifier outputs were averaged together. Both protocols generated an estimated probability of malignancy. Round-robin analysis was performed using a Bayesian neural network-based classifier. Receiver-operating characteristic analysis was used to evaluate the performance of each protocol. Significance testing of the performance differences was performed via 95% confidence intervals and noninferiority tests. RESULTS: The differences in the area under the receiver-operating characteristic curves were never more than 0.02 for the primary protocols. Noninferiority was demonstrated between these protocols with respect to standard input techniques (all images selected and feature averaging). CONCLUSION: We have shown that our automated lesion classification scheme is robust and can perform well when subjected to variations in user input.
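The two classification protocols described above, (a) averaging feature values before classification and (b) averaging classifier outputs, can be contrasted in a short sketch. Everything here is hypothetical: `toy_classifier` is a logistic function with made-up weights standing in for the Bayesian neural network of the study, and the feature values are invented.

```python
import math


def toy_classifier(features):
    """Hypothetical stand-in for a trained classifier.

    A logistic model with made-up weights; it only serves to produce an
    estimated probability of malignancy from a feature vector.
    """
    weights = (1.5, -2.0)
    bias = 0.25
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def average_features_then_classify(image_features, classifier):
    """Protocol (a): average each feature over the images, classify once."""
    n_images = len(image_features)
    mean_features = [sum(column) / n_images for column in zip(*image_features)]
    return classifier(mean_features)


def classify_then_average_outputs(image_features, classifier):
    """Protocol (b): classify every image, then average the outputs."""
    outputs = [classifier(features) for features in image_features]
    return sum(outputs) / len(outputs)


# One lesion seen in two images, two hypothetical features per image
lesion = [[1.0, 0.2], [0.2, 1.0]]
print(average_features_then_classify(lesion, toy_classifier))
print(classify_then_average_outputs(lesion, toy_classifier))
```

Because the classifier is nonlinear, the two protocols generally give slightly different probabilities for the same lesion; they coincide when only one image is selected.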
Subject(s)
Breast Neoplasms/diagnostic imaging , Image Interpretation, Computer-Assisted/methods , Ultrasonography, Mammary/methods , Female , Humans , Observer Variation , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
PURPOSE: To evaluate the performance of a computer-aided diagnosis (CAD) workstation in classifying cancer in a realistic data set representative of a clinical diagnostic breast ultrasonography (US) practice. MATERIALS AND METHODS: The database consisted of consecutive diagnostic breast US scans collected with informed consent with a protocol approved by the institutional review board and compliant with HIPAA. Images from 508 patients with a total of 1046 distinct abnormalities were used. One hundred one patients had breast cancer. Results both for patients in whom the lesion abnormality was proved with either biopsy or aspiration (n = 183) and for all patients irrespective of biopsy status (n = 508) are presented. The ability of the CAD workstation to help differentiate malignancies from benign lesions was evaluated with a leave-one-out-by-case analysis. The clinical specificity of the radiologists for this dataset was determined according to the biopsy rate and outcome. RESULTS: In the task of differentiating cancer from all other lesions sent to biopsy, the CAD workstation obtained an area under the receiver operating characteristic curve (AUC) value of 0.88, with 100% sensitivity at 26% specificity (157 cancers and 362 lesions total). The radiologists' specificity at 100% sensitivity for this set was zero. When analyzing all lesions irrespective of biopsy status, which is more representative of actual clinical practice, the CAD scheme obtained an AUC of 0.90 and 100% sensitivity at 30% specificity (157 cancers and 1046 lesions total). The radiologists' specificity at 100% sensitivity for this set was 77%. CONCLUSION: Current levels of computer performance warrant a clinical evaluation of the potential of US CAD to aid radiologists in lesion work-up recommendations.
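The operating point quoted above, specificity at 100% sensitivity, can be read off a set of classifier outputs by placing the decision threshold at the lowest score assigned to any cancer. The sketch below assumes the convention that a lesion is called positive when its score is at or above the threshold; the function name and the scores are hypothetical and are not taken from the study.

```python
def specificity_at_full_sensitivity(malignant_scores, benign_scores):
    """Specificity at the highest threshold that still detects every cancer.

    Assumed convention: a lesion is called positive when its score is at
    or above the threshold, so the threshold is the lowest malignant score.
    """
    if not malignant_scores or not benign_scores:
        raise ValueError("both classes need at least one score")
    threshold = min(malignant_scores)
    # Benign lesions scoring below the threshold are correctly called negative
    true_negatives = sum(1 for score in benign_scores if score < threshold)
    return true_negatives / len(benign_scores)


# Hypothetical estimated probabilities of malignancy, for illustration only
malignant = [0.4, 0.9]
benign = [0.1, 0.2, 0.5, 0.6]
print(specificity_at_full_sensitivity(malignant, benign))  # 2 of 4 benign below 0.4
```

Under this convention a single low-scoring cancer pulls the threshold down and can sharply reduce the attainable specificity, which is why this operating point is a demanding one.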