Implicit data crimes: Machine learning bias arising from misuse of public data.
Proc Natl Acad Sci U S A; 119(13): e2117203119, 2022 03 29.
Article in English | MEDLINE | ID: mdl-35312366
Significance: Public databases are an important resource for machine learning research, but their growing availability sometimes leads to "off-label" usage, where data published for one task are used for another. This work reveals that such off-label usage could lead to biased, overly optimistic results of machine-learning algorithms. The underlying cause is that public data are processed with hidden processing pipelines that alter the data features. Here we study three well-known algorithms developed for image reconstruction from magnetic resonance imaging measurements and show they could produce biased results with up to 48% artificial improvement when applied to public databases. We refer to the publication of such results as implicit "data crimes" to raise community awareness of this growing big data problem.
Full text: 1
Database: MEDLINE
Main subject: Algorithms / Machine Learning
Language: English
Publication year: 2022
Document type: Article