Your browser doesn't support javascript.
loading
Fair Max-Min Diversity Maximization in Streaming and Sliding-Window Models.
Wang, Yanhao; Fabbri, Francesco; Mathioudakis, Michael; Li, Jia.
Afiliación
  • Wang Y; School of Data Science and Engineering, East China Normal University, Shanghai 200062, China.
  • Fabbri F; Spotify, 08000 Barcelona, Spain.
  • Mathioudakis M; Department of Computer Science, University of Helsinki, 00560 Helsinki, Finland.
  • Li J; School of Data Science and Engineering, East China Normal University, Shanghai 200062, China.
Entropy (Basel) ; 25(7)2023 Jul 14.
Article en En | MEDLINE | ID: mdl-37510013
Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of k≪n elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we study diversity maximization with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max-min diversity maximization problem, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by a specific sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains ki elements from each group i∈[m]. Although diversity maximization has been extensively studied, existing algorithms for fair max-min diversity maximization are inefficient for data streams. To address the problem, we first design efficient approximation algorithms for this problem in the (insert-only) streaming model, where data arrive one element at a time, and a solution should be computed based on the elements observed in one pass. Furthermore, we propose approximation algorithms for this problem in the sliding-window model, where only the latest w elements in the stream are considered for computation to capture the recency of the data. Experimental results on real-world and synthetic datasets show that our algorithms provide solutions of comparable quality to the state-of-the-art offline algorithms while running several orders of magnitude faster in the streaming and sliding-window settings.
Palabras clave

Texto completo: 1 Base de datos: MEDLINE Idioma: En Revista: Entropy (Basel) Año: 2023 Tipo del documento: Article País de afiliación: China

Texto completo: 1 Base de datos: MEDLINE Idioma: En Revista: Entropy (Basel) Año: 2023 Tipo del documento: Article País de afiliación: China