Pesquisa | Biblioteca Virtual em Saúde

A comparison of Bayesian and score methods for interval estimates of positive/negative likelihood ratios in support of diagnostic device performance evaluation.

Hu, Tingting; Sahiner, Berkman; Petrick, Nicholas; Cha, Kenny; Wen, Si; Pennello, Gene.

J Biopharm Stat ; : 1-19, 2024 Jun 18.

Artigo em Inglês | MEDLINE | ID: mdl-38889012

RESUMO

BACKGROUND: Positive and negative likelihood ratios (PLR and NLR) are important metrics of accuracy for diagnostic devices with a binary output. However, the properties of Bayesian and frequentist interval estimators of PLR/NLR have not been extensively studied and compared. In this study, we explore the potential use of the Bayesian method for interval estimation of PLR/NLR, and, more broadly, for interval estimation of the ratio of two independent proportions. METHODS: We develop a Bayesian-based approach for interval estimation of PLR/NLR for use as a part of a diagnostic device performance evaluation. Our approach is applicable to a broader setting for interval estimation of any ratio of two independent proportions. We compare score and Bayesian interval estimators for the ratio of two proportions in terms of the coverage probability (CP) and expected interval width (EW) via extensive experiments and applications to two case studies. A supplementary experiment was also conducted to assess the performance of the proposed exact Bayesian method under different priors. RESULTS: Our experimental results show that the overall mean CP for Bayesian interval estimation is consistent with that for the score method (0.950 vs. 0.952), and the overall mean EW for Bayesian is shorter than that for score method (15.929 vs. 19.724). Application to two case studies showed that the intervals estimated using the Bayesian and frequentist approaches are very similar. DISCUSSION: Our numerical results indicate that the proposed Bayesian approach has a comparable CP performance with the score method while yielding higher precision (i.e. a shorter EW).

Decision region analysis for generalizability of artificial intelligence models: estimating model generalizability in the case of cross-reactivity and population shift.

Burgon, Alexis; Sahiner, Berkman; Petrick, Nicholas; Pennello, Gene; Cha, Kenny H; Samala, Ravi K.

J Med Imaging (Bellingham) ; 11(1): 014501, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-38283653

RESUMO

Purpose: Understanding an artificial intelligence (AI) model's ability to generalize to its target population is critical to ensuring the safe and effective usage of AI in medical devices. A traditional generalizability assessment relies on the availability of large, diverse datasets, which are difficult to obtain in many medical imaging applications. We present an approach for enhanced generalizability assessment by examining the decision space beyond the available testing data distribution. Approach: Vicinal distributions of virtual samples are generated by interpolating between triplets of test images. The generated virtual samples leverage the characteristics already in the test set, increasing the sample diversity while remaining close to the AI model's data manifold. We demonstrate the generalizability assessment approach on the non-clinical tasks of classifying patient sex, race, COVID status, and age group from chest x-rays. Results: Decision region composition analysis for generalizability indicated that a disproportionately large portion of the decision space belonged to a single "preferred" class for each task, despite comparable performance on the evaluation dataset. Evaluation using cross-reactivity and population shift strategies indicated a tendency to overpredict samples as belonging to the preferred class (e.g., COVID negative) for patients whose subgroup was not represented in the model development data. Conclusions: An analysis of an AI model's decision space has the potential to provide insight into model generalizability. Our approach uses the analysis of composition of the decision space to obtain an improved assessment of model generalizability in the case of limited test data.

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA