Search | BVS Regional Portal

1.

Rethink reporting of evaluation results in AI.

Burnell, Ryan; Schellaert, Wout; Burden, John; Ullman, Tomer D; Martinez-Plumed, Fernando; Tenenbaum, Joshua B; Rutar, Danaja; Cheke, Lucy G; Sohl-Dickstein, Jascha; Mitchell, Melanie; Kiela, Douwe; Shanahan, Murray; Voorhees, Ellen M; Cohn, Anthony G; Leibo, Joel Z; Hernandez-Orallo, Jose.

Science ; 380(6641): 136-138, 2023 04 14.

Article in English | MEDLINE | ID: mdl-37053341

ABSTRACT

Aggregate metrics and lack of access to results limit understanding.

2.

Overview of the TREC 2020 Precision Medicine Track.

Roberts, Kirk; Demner-Fushman, Dina; Voorhees, Ellen M; Bedrick, Steven; Hersh, William R.

Text Retr Conf ; 1266, 2020 Nov.

Article in English | MEDLINE | ID: mdl-34849513

3.

Overview of the TREC 2019 Precision Medicine Track.

Roberts, Kirk; Demner-Fushman, Dina; Voorhees, Ellen M; Hersh, William R; Bedrick, Steven; Lazar, Alexander J; Pant, Shubham; Meric-Bernstam, Funda.

Text Retr Conf ; 1250, 2019 Nov.

Article in English | MEDLINE | ID: mdl-34849512

4.

Using Replicates in Information Retrieval Evaluation.

Voorhees, Ellen M; Samarov, Daniel; Soboroff, Ian.

ACM Trans Inf Syst ; 36(2), 2017 Sep.

Article in English | MEDLINE | ID: mdl-29905334

ABSTRACT

This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions (something not possible without replicates), yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially narrows the confidence intervals around the system effect and increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and against the choice of measure used to quantify system effectiveness.
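The role of replicates described in this abstract can be illustrated with a minimal sketch. It is not the article's bootstrap ANOVA: it uses a classical two-way decomposition on synthetic scores, and every name and number in it (simulate_scores, decompose, error_mean_squares, the effect sizes) is a hypothetical choice made for illustration. It only shows why several scores per system-topic cell let the interaction be separated from the within-cell noise.

```python
import random
from statistics import mean


def simulate_scores(n_systems, n_topics, n_replicates, seed=0):
    """Synthetic scores[s][t][r]; all effect sizes are made-up assumptions."""
    rng = random.Random(seed)
    sys_eff = [0.05 * s for s in range(n_systems)]
    top_eff = [rng.gauss(0, 0.10) for _ in range(n_topics)]
    inter = [[rng.gauss(0, 0.05) for _ in range(n_topics)]
             for _ in range(n_systems)]
    return [[[0.3 + sys_eff[s] + top_eff[t] + inter[s][t] + rng.gauss(0, 0.02)
              for _ in range(n_replicates)]
             for t in range(n_topics)]
            for s in range(n_systems)]


def decompose(scores):
    """Split scores into grand mean, system, topic, interaction, residual."""
    S, T, R = len(scores), len(scores[0]), len(scores[0][0])
    # Mean over replicates for each (system, topic) cell.
    cell = [[mean(scores[s][t]) for t in range(T)] for s in range(S)]
    grand = mean(cell[s][t] for s in range(S) for t in range(T))
    sys_eff = [mean(cell[s]) - grand for s in range(S)]
    top_eff = [mean(cell[s][t] for s in range(S)) - grand for t in range(T)]
    # What remains of each cell mean after removing both main effects.
    inter = [[cell[s][t] - grand - sys_eff[s] - top_eff[t] for t in range(T)]
             for s in range(S)]
    # Replicate-level deviation from the cell mean (needs R > 1).
    resid = [[[scores[s][t][r] - cell[s][t] for r in range(R)]
              for t in range(T)]
             for s in range(S)]
    return grand, sys_eff, top_eff, inter, resid


def error_mean_squares(scores):
    """Return (interaction mean square of cell means, within-cell mean square)."""
    S, T, R = len(scores), len(scores[0]), len(scores[0][0])
    _, _, _, inter, resid = decompose(scores)
    ms_inter = sum(x * x for row in inter for x in row) / ((S - 1) * (T - 1))
    ms_within = (sum(e * e for plane in resid for row in plane for e in row)
                 / (S * T * (R - 1)))
    return ms_inter, ms_within


if __name__ == "__main__":
    scores = simulate_scores(n_systems=4, n_topics=25, n_replicates=3, seed=1)
    ms_inter, ms_within = error_mean_squares(scores)
    print("interaction mean square:", ms_inter)
    print("within-cell mean square:", ms_within)
```

With a single score per cell, the interaction term cannot be told apart from noise and has to serve as the error term for system comparisons. With replicates, the within-cell mean square is available as a separate error estimate, and in this synthetic setup it is the smaller of the two.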

5.

Overview of the TREC 2017 Precision Medicine Track.

Roberts, Kirk; Demner-Fushman, Dina; Voorhees, Ellen M; Hersh, William R; Bedrick, Steven; Lazar, Alexander J; Pant, Shubham.

Text Retr Conf ; 26, 2017 Nov.

Article in English | MEDLINE | ID: mdl-32776021
