Semi-automatic data annotation based on feature-space projection and local quality metrics: An application to cerebral emboli characterization.

Vindas, Yamil; Guépié, Blaise Kévin; Almar, Marilys; Roux, Emmanuel; Delachartre, Philippe

Affiliation

Vindas Y; Univ Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR 5220, U1294, LYON, F-69100, France. Electronic address: yamil.vindas@creatis.insa-lyon.fr.
Guépié BK; Université de Technologie de Troyes / Laboratoire Informatique et Société Numérique, 10004 Troyes, France.
Almar M; Atys Medical, 17 Parc Arbora, Soucieu-en-Jarrest 69510, France.
Roux E; Univ Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR 5220, U1294, LYON, F-69100, France.
Delachartre P; Univ Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR 5220, U1294, LYON, F-69100, France.

Med Image Anal; 79: 102437, 2022 Jul.

Article in En | MEDLINE | ID: mdl-35427898

ABSTRACT

We propose a semi-supervised learning approach to annotate a dataset with reduced requirements for manual annotation and with controlled annotation error. The method is based on feature-space projection and label propagation using local quality metrics. First, an auto-encoder extracts the features of the samples in an unsupervised manner. Then, the extracted features are projected into a two-dimensional (2D) space by a t-distributed stochastic neighbor embedding algorithm. The best 2D projection is selected based on the silhouette score. The expert annotator uses the resulting 2D representation to manually label samples. Finally, the labels of the labeled samples are propagated to the unlabeled samples using a K-nearest neighbor strategy and local quality metrics. We compare our method against semi-supervised optimum-path forest and K-nearest neighbor label propagation (without considering local quality metrics). Our method achieves state-of-the-art results on three different datasets by labeling more than 96% of the samples with an annotation error from 7% to 17%. Additionally, our method allows control of the trade-off between annotation error and number of labeled samples. Moreover, we combine our method with robust loss functions to compensate for the label noise introduced by automatic label propagation. With this combination, our method achieves classification performance similar to, and in some cases better than, that obtained using a fully manually labeled dataset, by up to 6% in classification accuracy.
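
The propagation step and the trade-off it exposes can be illustrated with a minimal sketch. The abstract does not define the local quality metrics, so the sketch below substitutes a simple neighbor-agreement threshold as a stand-in; it is not the authors' implementation. The function name `knn_propagate`, the parameters `k` and `min_agreement`, and the toy 2D points are all hypothetical, and the points merely stand in for a t-SNE projection.

```python
import math
from collections import Counter


def knn_propagate(points, labels, k=3, min_agreement=0.6):
    """Single-pass K-nearest neighbor label propagation in a 2D space.

    points: list of (x, y) tuples (assumed to come from a 2D projection).
    labels: list of labels, with None marking unlabeled samples.
    An unlabeled sample receives the majority label of its k nearest
    manually labeled neighbors only if the share of those neighbors
    voting for that label is at least min_agreement. This agreement
    test is an assumed stand-in for the paper's local quality metrics.
    """
    labeled = [(p, y) for p, y in zip(points, labels) if y is not None]
    result = list(labels)
    for i, (p, y) in enumerate(zip(points, labels)):
        if y is not None or not labeled:
            continue  # keep manual labels; nothing to propagate from
        nearest = sorted(labeled, key=lambda item: math.dist(p, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        label, count = votes.most_common(1)[0]
        if count / len(nearest) >= min_agreement:
            result[i] = label  # confident enough: accept the label
    return result


# Toy data: two labeled clusters, two clear samples, one ambiguous sample.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5),
          (0.5, 0.5), (5.5, 5.5), (2.5, 3)]
labels = ["a", "a", "a", "b", "b", "b", None, None, None]

strict = knn_propagate(points, labels, k=3, min_agreement=0.9)
loose = knn_propagate(points, labels, k=3, min_agreement=0.6)
print(strict)
print(loose)
```

Under these assumptions, raising `min_agreement` leaves the ambiguous sample between the two clusters unlabeled, while lowering it labels more samples at a higher risk of error, which mirrors the trade-off between annotation error and number of labeled samples mentioned in the abstract.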

Subjects

Data Curation; Intracranial Embolism; Algorithms; Benchmarking; Humans; Supervised Machine Learning

Keywords

Data annotation; Emboli characterization; Noisy labels; Semi-supervised learning; Stroke

Full text: 1 | Database: MEDLINE | Main subject: Intracranial Embolism / Data Curation | Limits: Humans | Language: En | Year of publication: 2022 | Document type: Article
