Confidence-ranked reconstruction of census microdata from published statistics.

Dick, Travis; Dwork, Cynthia; Kearns, Michael; Liu, Terrance; Roth, Aaron; Vietri, Giuseppe; Wu, Zhiwei Steven

Affiliation

Dick T; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
Dwork C; School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134.
Kearns M; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
Liu T; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
Roth A; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.
Vietri G; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455.
Wu ZS; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.

Proc Natl Acad Sci U S A ; 120(8): e2218605120, 2023 Feb 21.

Article in English | MEDLINE | ID: mdl-36800385

ABSTRACT

A reconstruction attack on a private dataset D takes as input some publicly accessible information about the dataset and produces a list of candidate elements of D. We introduce a class of data reconstruction attacks based on randomized methods for nonconvex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of D from aggregate query statistics Q(D) ∈ ℝ^m but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset D was sampled, demonstrating that they are exploiting information in the aggregate statistics Q(D) and not simply the overall structure of the distribution. In other words, the queries Q(D) are permitting reconstruction of elements of this dataset, not the distribution from which D was drawn. These findings are established both on 2010 US decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset and provide further motivation for the careful application of provably private techniques such as differential privacy.
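
The idea in the abstract can be illustrated with a toy sketch. This is not the authors' attack: it assumes a tiny binary dataset, uses pairwise conjunction counts as a stand-in for Q(D), substitutes a plain random-restart local search for the randomized nonconvex optimization, and scores each candidate row by the fraction of independent runs in which it appears. All function names and parameters here are hypothetical.

```python
import itertools
import random
from collections import Counter


def pairwise_counts(rows, d):
    """Toy Q(D): for each attribute pair (i, j), i <= j, count rows with both bits set."""
    return [sum(r[i] & r[j] for r in rows)
            for i, j in itertools.combinations_with_replacement(range(d), 2)]


def error(rows, target, d):
    """L1 distance between the candidate's statistics and the published ones."""
    return sum(abs(a - b) for a, b in zip(pairwise_counts(rows, d), target))


def one_run(target, n, d, steps, rng):
    """One randomized local search: flip single bits, keep non-worsening moves."""
    rows = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    best = error(rows, target, d)
    for _ in range(steps):
        if best == 0:
            break
        r, c = rng.randrange(n), rng.randrange(d)
        rows[r][c] ^= 1
        new = error(rows, target, d)
        if new <= best:
            best = new
        else:
            rows[r][c] ^= 1  # undo the flip
    # Return the distinct rows of this run's candidate dataset.
    return {tuple(r) for r in rows}


def ranked_reconstruction(target, n, d, runs=20, steps=2000, seed=0):
    """Rank candidate rows by the fraction of runs that produced them (assumed heuristic)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        votes.update(one_run(target, n, d, steps, rng))
    return [(row, count / runs) for row, count in votes.most_common()]


# Hypothetical private data; only `stats` is given to the attack.
private = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0)]
stats = pairwise_counts(private, d=4)
ranking = ranked_reconstruction(stats, n=len(private), d=4)
for row, score in ranking[:5]:
    print(row, round(score, 2), row in private)
```

Rows that recur across many independent runs are the ones the statistics constrain most tightly, which is the intuition behind ranking candidates by confidence; on real data the queries, optimizer, and scoring would all differ from this sketch.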

Keywords

U.S. Census; data privacy; differential privacy; reconstruction attack

Full text: 1 | Collection: 01-international | Database: MEDLINE | Study type: Clinical_trials | Language: En | Journal: Proc Natl Acad Sci U S A | Year: 2023 | Document type: Article
