Implicit data crimes: Machine learning bias arising from misuse of public data.

Shimron, Efrat; Tamir, Jonathan I; Wang, Ke; Lustig, Michael

Affiliation

Shimron E; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.
Tamir JI; Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712.
Wang K; Department of Diagnostic Medicine, Dell Medical School, The University of Texas at Austin, Austin, TX 78712.
Lustig M; Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX 78712.

Proc Natl Acad Sci U S A; 119(13): e2117203119, 2022-03-29.

Article in English | MEDLINE | ID: mdl-35312366

ABSTRACT

Significance: Public databases are an important resource for machine learning research, but their growing availability sometimes leads to "off-label" usage, where data published for one task are used for another. This work reveals that such off-label usage could lead to biased, overly optimistic results of machine-learning algorithms. The underlying cause is that public data are often processed with hidden pipelines that alter the data features. Here we study three well-known algorithms developed for image reconstruction from magnetic resonance imaging measurements and show that they could produce biased results, with up to 48% artificial improvement, when applied to public databases. We refer to the publication of such results as implicit "data crimes" to raise community awareness of this growing big data problem.
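The mechanism described above can be illustrated with a toy sketch. It is not the experiment reported in the article: the hidden pipeline is assumed here to be zero-padding in k-space, the reconstruction is a plain zero-filled inverse FFT rather than any of the three algorithms studied, and all function names (`hidden_pipeline`, `center_mask`, and so on) are hypothetical. The sketch only shows why retrospective subsampling of already-processed data can look far easier than subsampling of raw data.

```python
import numpy as np

def fft2c(img):
    """Centered 2D FFT: image -> k-space."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img), norm="ortho"))

def ifft2c(ksp):
    """Centered 2D inverse FFT: k-space -> image."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(ksp), norm="ortho"))

def center_mask(n, keep):
    """Boolean mask that keeps the central `keep` rows of an n x n k-space."""
    mask = np.zeros((n, n), dtype=bool)
    start = (n - keep) // 2
    mask[start:start + keep, :] = True
    return mask

def hidden_pipeline(img, keep):
    """Assumed preprocessing: retain only the central keep x keep block of
    k-space and leave zeros elsewhere (zero-padding)."""
    rows = center_mask(img.shape[0], keep)
    block = rows & rows.T  # central square block
    return ifft2c(fft2c(img) * block)

def subsample_and_reconstruct(img, mask):
    """Retrospective subsampling followed by zero-filled reconstruction."""
    return ifft2c(fft2c(img) * mask)

def nrmse(ref, est):
    """Normalized root-mean-square error of `est` against `ref`."""
    return np.linalg.norm(est - ref) / np.linalg.norm(ref)

rng = np.random.default_rng(0)
n = 64
raw = rng.standard_normal((n, n))              # stand-in for raw data
processed = hidden_pipeline(raw, keep=n // 2)  # stand-in for public data

mask = center_mask(n, keep=n // 2)             # keep half of the k-space rows
err_raw = nrmse(raw, subsample_and_reconstruct(raw, mask))
err_processed = nrmse(processed, subsample_and_reconstruct(processed, mask))

print(f"NRMSE on raw data:       {err_raw:.3f}")
print(f"NRMSE on processed data: {err_processed:.3e}")
```

In this toy setting the processed data are reconstructed almost perfectly because the assumed pipeline had already emptied the k-space region that the subsampling mask discards, whereas the raw data lose a large share of their signal energy. The gap is an artifact of the preprocessing, not a property of the reconstruction method.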

Subjects

Algorithms; Machine Learning; Bias; Crime; Image Processing, Computer-Assisted

Keywords

MRI; bias; big data; data crimes; inverse problem

Full text: 1 Database: MEDLINE Main subject: Algorithms / Machine Learning Language: En Publication year: 2022 Document type: Article
