Hypothesis testing procedure for binary and multi-class F₁-scores in the paired design.

Takahashi, Kanae; Yamamoto, Kouji; Kuchiba, Aya; Shintani, Ayumi; Koyama, Tatsuki

Affiliations

Takahashi K; Department of Biostatistics, Hyogo Medical University, Hyogo, Japan.
Yamamoto K; Department of Biostatistics, School of Medicine, Yokohama City University, Kanagawa, Japan.
Kuchiba A; Graduate School of Health Innovation, Kanagawa University of Human Services, Kanagawa, Japan.
Shintani A; Department of Medical Statistics, Osaka Metropolitan University Graduate School of Medicine, Osaka, Japan.
Koyama T; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Stat Med; 42(23): 4177-4192, 2023 Oct 15.

Article in English | MEDLINE | ID: mdl-37527903

ABSTRACT

In modern medicine, medical tests are used for various purposes, including diagnosis, disease screening, prognosis, and risk prediction. To quantify the performance of a binary medical test, we often use sensitivity, specificity, and negative and positive predictive values. Additionally, the F₁-score, defined as the harmonic mean of precision (positive predictive value) and recall (sensitivity), has come into use in the medical field because of its favorable characteristics. The F₁-score has been extended to multi-class classification, for which two types have been proposed: the micro-averaged F₁-score and the macro-averaged F₁-score. The micro-averaged F₁-score pools per-sample classifications across classes and then calculates an overall F₁-score, whereas the macro-averaged F₁-score computes the arithmetic mean of the per-class F₁-scores. Sokolova and Lapalme¹ gave an alternative definition of the macro-averaged F₁-score as the harmonic mean of the arithmetic means of precision and recall over classes. Although some statistical methods of inference for binary and multi-class F₁-scores have been proposed, hypothesis testing procedures for them have not yet been fully developed. Therefore, we aim to develop a hypothesis testing procedure for comparing two F₁-scores in the paired study design based on the large-sample multivariate central limit theorem.
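
The abstract defines the binary F₁-score, the micro-averaged F₁-score, and two macro-averaged variants, and it frames the paired comparison in terms of the multivariate central limit theorem and the delta method. The Python sketch below illustrates those definitions together with a generic delta-method Wald test for the difference of two binary F₁-scores measured on the same subjects. It is a minimal sketch under stated assumptions (0/1 labels for the binary case, every class both observed and predicted in the multi-class case, a plug-in covariance estimate) and is not the authors' published estimator; the function names `binary_f1`, `multiclass_f1`, and `paired_f1_test` are hypothetical.

```python
import math

import numpy as np


def binary_f1(y_true, y_pred):
    """Binary F1 (harmonic mean of precision and recall) for 0/1 labels,
    written in count form as 2*TP / (2*TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn)


def multiclass_f1(y_true, y_pred, classes):
    """Return (micro F1, macro F1 as mean of per-class F1,
    macro F1 as harmonic mean of mean precision and mean recall).
    Assumes every class is observed and predicted at least once."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.array([np.sum((y_true == k) & (y_pred == k)) for k in classes])
    fp = np.array([np.sum((y_true != k) & (y_pred == k)) for k in classes])
    fn = np.array([np.sum((y_true == k) & (y_pred != k)) for k in classes])
    # Micro: pool TP/FP/FN over classes, then compute a single F1.
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # Macro: arithmetic mean of the per-class F1-scores.
    macro = np.mean(2 * tp / (2 * tp + fp + fn))
    # Sokolova and Lapalme variant: harmonic mean of the class-averaged
    # precision and the class-averaged recall.
    prec = np.mean(tp / (tp + fp))
    rec = np.mean(tp / (tp + fn))
    macro_sl = 2 * prec * rec / (prec + rec)
    return micro, macro, macro_sl


def paired_f1_test(y_true, pred_a, pred_b):
    """Generic Wald-type test of H0: F1_A = F1_B for two binary tests applied
    to the same subjects, via the delta method on per-subject indicator means."""
    y, a, b = np.asarray(y_true), np.asarray(pred_a), np.asarray(pred_b)
    n = len(y)
    # Per-subject indicators [TP_A, FP_A, FN_A, TP_B, FP_B, FN_B].
    x = np.column_stack([
        (y == 1) & (a == 1), (y == 0) & (a == 1), (y == 1) & (a == 0),
        (y == 1) & (b == 1), (y == 0) & (b == 1), (y == 1) & (b == 0),
    ]).astype(float)
    m = x.mean(axis=0)
    # Plug-in covariance of one subject's indicator vector (assumption).
    cov = np.cov(x, rowvar=False, bias=True)

    def f1_and_grad(tp, fp, fn):
        d = 2 * tp + fp + fn
        # Partial derivatives of 2*tp / (2*tp + fp + fn) w.r.t. (tp, fp, fn).
        grad = np.array([2 * (fp + fn), -2 * tp, -2 * tp]) / d**2
        return 2 * tp / d, grad

    f_a, g_a = f1_and_grad(*m[:3])
    f_b, g_b = f1_and_grad(*m[3:])
    grad = np.concatenate([g_a, -g_b])  # gradient of F1_A - F1_B
    se = math.sqrt(max(grad @ cov @ grad / n, 0.0))
    z = (f_a - f_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return f_a - f_b, se, z, p
```

For example, with `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`, `multiclass_f1` gives a micro-averaged F₁ of 2/3, a mean-of-classes macro F₁ of 59/90 (about 0.656), and a Sokolova and Lapalme macro F₁ of 52/75 (about 0.693), so the two macro definitions generally differ. For single-label data the pooled FP and FN counts are equal, which makes the micro-averaged F₁ coincide with overall accuracy.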

Subjects

Diagnostic Techniques and Procedures; Prognosis; Humans; Diagnostic Techniques and Procedures/statistics & numerical data

Keywords

F₁ measures; delta-method; multi-class classification; precision; recall

Database: MEDLINE | Main subject: Prognosis / Diagnostic Techniques and Procedures | Study type: Diagnostic_studies / Prognostic_studies | Limit: Humans | Language: English | Year of publication: 2023 | Document type: Article