research-article

Private coresets

Authors:
Dan Feldman

Tel-Aviv University, Tel-Aviv, Israel

Tel-Aviv University, Tel-Aviv, Israel
View Profile

,
Amos Fiat

Tel-Aviv University, Tel-Aviv, Israel

Tel-Aviv University, Tel-Aviv, Israel
View Profile

,
Haim Kaplan

Tel-Aviv University, Tel-Aviv, Israel

Tel-Aviv University, Tel-Aviv, Israel
View Profile

,
Kobbi Nissim

Ben-Gurion University, Be'er Sheva, Israel

Ben-Gurion University, Be'er Sheva, Israel
View Profile

STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computingMay 2009Pages 361–370https://doi.org/10.1145/1536414.1536465

Published:31 May 2009Publication History

STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computing

Pages 361–370

ABSTRACT

A coreset of a point set P is a small weighted set of points that captures some geometric properties of $P$. Coresets have found use in a vast host of geometric settings. We forge a link between coresets, and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed. We first show that the existence of a small coreset with low generalized sensitivity (i.e., replacing a single point in the original point set slightly affects the quality of the coreset) implies (in an inefficient manner) the existence of a private coreset for the same queries. This greatly extends the works of Blum, Ligett, and Roth [STOC 2008] and McSherry and Talwar [FOCS 2007]. We also give an efficient algorithm to compute private coresets for k-median and k-mean queries in Re^d, immediately implying efficient differentially private sanitizations for such queries. Following McSherry and Talwar, this construction also gives efficient coalition proof (approximately dominant strategy) mechanisms for location problems. Unlike coresets which only have a multiplicative approximation factor, we prove that private coresets must have an additive error. We present a new technique for showing lower bounds on this error.

References

P. Agarwal, S. Har-Peled, and K.R. Varadarajan. Geometric approximations via coresets. Combinatorial and Computational Geometry -- MSRI Publications, 52:1--30, 2005.Google Scholar
P. Agarwal, S. Har-Peled, and K.R. Varadarajan. Approximating extent measures of points. JACM: Journal of the ACM, 51, 2004. Google ScholarDigital Library
P.K. Agarwal, C.M. Procopiuc, and K.R. Varadarajan. Approximation algorithms for k-line center. ESA, pages 54--63, 2002. Google ScholarDigital Library
M. Badoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. STOC, pages 250--257, 2002. Google ScholarDigital Library
B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. PODS, pages 273--282, 2007. Google ScholarDigital Library
A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. CRYPTO, pages 451--468, 2008. Google ScholarDigital Library
A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the sulq framework. PODS, pages 128--138, 2005. Google ScholarDigital Library
A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. STOC, pages 609--618, 2008. Google ScholarDigital Library
L.S. Buriol, G. Frahling, S. Leonardi, and C. Sohler. Estimating clustering indexes in data streams. ESA, pages 618--632, 2007. Google ScholarDigital Library
T.M. Chan. Faster core-set constructions and data stream algorithms in fixed dimensions. SCG, pages 152--159, 2004. Google ScholarDigital Library
K. Chen. On k-median clusteing in high dimensions. SODA, pages 1177--1185, 2006. Google ScholarDigital Library
K.L. Clarkson. Subgradient and sampling algorithms for l1 regression. SODA, pages 257--266, 2005. Google ScholarDigital Library
A. Czumaj, F. Ergun, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. Approximating the weight of the euclidean minimum spanning tree in sublinear time. SIAM J. Computing vol. 35(1), pages 91--109, 2005. Google ScholarDigital Library
A. Czumaj and C. Sohler. Sublinear-time approximation algorithms for clustering via random sampling. RSA: Random Structures & Algorithms, 30, 2007. Google ScholarDigital Library
I. Dinur and K. Nissim. Revealing information while preserving privacy. PODS, pages 202--210. ACM, 2003. Google ScholarDigital Library
C. Dwork. Differential privacy. ICALP, LNCS, pages 1--12, 2006. Google ScholarDigital Library
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. EUROCRYPT, LNCS, pages 486--503. Springer, 2006. Google ScholarDigital Library
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC, pages 265--284, 2006. Google ScholarDigital Library
C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. STOC, pages 85--94, 2007. Google ScholarDigital Library
C. Dwork and K .Nissim. Privacy-preserving datamining on vertically partitioned databases. CRYPTO, pages 528--544, 2004.Google ScholarCross Ref
C. Dwork and S. Yekhanin. New efficient attacks on statistical disclosure control mechanisms. CRYPTO, pages 469--480, 2008. Google ScholarDigital Library
A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. PODS, pages 211--222, 2003. Google ScholarDigital Library
D. Feldman, A. Fiat, and M. Sharir. Coresets for weighted facilities and their applications. FOCS, pages 315--324, 2006. Google ScholarDigital Library
D. Feldman, A. Fiat, M. Sharir, and D. Segev. Bi-criteria linear-time approximations for generalized k-mean/median/center. SoCG, pages 19--26, 2007. Google ScholarDigital Library
D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for k-means clustering based on weak coresets. Symposium on Computational Geometry, pages 11--18, 2007. Google ScholarDigital Library
G. Frahling, P. Indyk, and C. Sohler. Sampling in dynamic data streams and applications. Int. J. Comput. Geometry Appl, 18(1/2):3--28, 2008.Google ScholarCross Ref
G. Frahling and C. Sohler. Coresets in dynamic geometric data streams. STOC, pages 209--217, 2005. Google ScholarDigital Library
R. L. Graham and N. J. A. Sloane. Lower bounds for constant weight codes. IEEE Trans. Inform. Theory, 26(1):37--43, 1980.Google ScholarDigital Library
S. Har-Peled. Clustering motion. FOCS, pages 84--93, 2001. Google ScholarDigital Library
S. Har-Peled. No coreset, no cry. FSTTCS, pages 324--335, 2004. Google ScholarDigital Library
S. Har-Peled and A. Kushal. Smaller coresets for k-median and k-means clustering. SoCG, pages 126--134, 2005. Google ScholarDigital Library
S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. STOC '04, pages 291--300, 2004. Google ScholarDigital Library
S. Har-Peled and K.R. Varadarajan. High-dimensional shape fitting in linear time. GEOMETRY: Discrete & Computational Geometry, 32, 2004.Google Scholar
S. Har-Peled and K.R. Varadarajan. Projective clustering in high dimensions using core-sets. SoCG, pages 312--318, 2002. Google ScholarDigital Library
S.P. Kasiviswanathan, H.K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? FOCS, pages 531--540, 2008. Google ScholarDigital Library
C. Lammersen and C. Sohler. Facility location in dynamic geometric data streams. ESA, pages 660--671, 2008. Google ScholarDigital Library
F. McSherry and K. Talwar. Mechanism design via differential privacy. FOCS, pages 94--103, 2007. Google ScholarDigital Library
N. Mishra and M. Sandler. Privacy via pseudorandom sketches. PODS, pages 143--152, 2006. Google ScholarDigital Library
K. Nissim. Private data analysis via output perturbation. In Charu Aggarwal and Philip S. Yu, editors, Privacy-Preserving Data Mining, Models and Algorithms, pages 383--414.Google Scholar
K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. STOC, pages 75--84, 2007. Google ScholarDigital Library
V. Rastogi, S. Hong, and D. Suciu. The boundary between privacy and utility in data publishing. VLDB, pages 531--542, 2007. Google ScholarDigital Library

Index Terms

Private coresets
1. Theory of computation
  1. Randomness, geometry and discrete structures
    1. Computational geometry

Recommendations

Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
IPSN '17: Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks

Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g. using their smartphones, we can learn global information such as traffic congestion patterns in the city, location ...
Read More
On the complexity of differentially private data release: efficient algorithms and hardness results
STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computing

We consider private data analysis in the setting in which a trusted and trustworthy curator, having obtained a large data set containing private information, releases to the public a "sanitization" of the data set that simultaneously protects the ...
Read More
Differentially private data publishing via optimal univariate microaggregation and record perturbation
Abstract
We present an approach to generate differentially private data sets that consists in adding noise to a microaggregated version of the original data set. While this idea has already been pursued in the literature to reduce the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computing
May 2009
750 pages
ISBN:9781605585062
DOI:10.1145/1536414
Program Chair:
Michael Mitzenmacher
Harvard University
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 May 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
coresets
differential privacy
privacy
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,469of4,586submissions,32%
Upcoming Conference
STOC '24

Sponsor:

sigact

56th Annual ACM Symposium on Theory of Computing (STOC 2024)

June 24 - 28, 2024

Vancouver , BC , Canada
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 78
  Total Citations
  View Citations
- 793
  Total Downloads
- Downloads (Last 12 months)42
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Private coresets

STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks

On the complexity of differentially private data release: efficient algorithms and hardness results

Differentially private data publishing via optimal univariate microaggregation and record perturbation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Private coresets

STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks

On the complexity of differentially private data release: efficient algorithms and hardness results

Differentially private data publishing via optimal univariate microaggregation and record perturbation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media