ABSTRACT
A coreset of a point set P is a small weighted set of points that captures some geometric properties of $P$. Coresets have found use in a vast host of geometric settings. We forge a link between coresets, and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed. We first show that the existence of a small coreset with low generalized sensitivity (i.e., replacing a single point in the original point set slightly affects the quality of the coreset) implies (in an inefficient manner) the existence of a private coreset for the same queries. This greatly extends the works of Blum, Ligett, and Roth [STOC 2008] and McSherry and Talwar [FOCS 2007]. We also give an efficient algorithm to compute private coresets for k-median and k-mean queries in Red, immediately implying efficient differentially private sanitizations for such queries. Following McSherry and Talwar, this construction also gives efficient coalition proof (approximately dominant strategy) mechanisms for location problems. Unlike coresets which only have a multiplicative approximation factor, we prove that private coresets must have an additive error. We present a new technique for showing lower bounds on this error.
- P. Agarwal, S. Har-Peled, and K.R. Varadarajan. Geometric approximations via coresets. Combinatorial and Computational Geometry -- MSRI Publications, 52:1--30, 2005.Google Scholar
- P. Agarwal, S. Har-Peled, and K.R. Varadarajan. Approximating extent measures of points. JACM: Journal of the ACM, 51, 2004. Google ScholarDigital Library
- P.K. Agarwal, C.M. Procopiuc, and K.R. Varadarajan. Approximation algorithms for k-line center. ESA, pages 54--63, 2002. Google ScholarDigital Library
- M. Badoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. STOC, pages 250--257, 2002. Google ScholarDigital Library
- B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. PODS, pages 273--282, 2007. Google ScholarDigital Library
- A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. CRYPTO, pages 451--468, 2008. Google ScholarDigital Library
- A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the sulq framework. PODS, pages 128--138, 2005. Google ScholarDigital Library
- A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. STOC, pages 609--618, 2008. Google ScholarDigital Library
- L.S. Buriol, G. Frahling, S. Leonardi, and C. Sohler. Estimating clustering indexes in data streams. ESA, pages 618--632, 2007. Google ScholarDigital Library
- T.M. Chan. Faster core-set constructions and data stream algorithms in fixed dimensions. SCG, pages 152--159, 2004. Google ScholarDigital Library
- K. Chen. On k-median clusteing in high dimensions. SODA, pages 1177--1185, 2006. Google ScholarDigital Library
- K.L. Clarkson. Subgradient and sampling algorithms for l1 regression. SODA, pages 257--266, 2005. Google ScholarDigital Library
- A. Czumaj, F. Ergun, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. Approximating the weight of the euclidean minimum spanning tree in sublinear time. SIAM J. Computing vol. 35(1), pages 91--109, 2005. Google ScholarDigital Library
- A. Czumaj and C. Sohler. Sublinear-time approximation algorithms for clustering via random sampling. RSA: Random Structures & Algorithms, 30, 2007. Google ScholarDigital Library
- I. Dinur and K. Nissim. Revealing information while preserving privacy. PODS, pages 202--210. ACM, 2003. Google ScholarDigital Library
- C. Dwork. Differential privacy. ICALP, LNCS, pages 1--12, 2006. Google ScholarDigital Library
- C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. EUROCRYPT, LNCS, pages 486--503. Springer, 2006. Google ScholarDigital Library
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC, pages 265--284, 2006. Google ScholarDigital Library
- C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. STOC, pages 85--94, 2007. Google ScholarDigital Library
- C. Dwork and K .Nissim. Privacy-preserving datamining on vertically partitioned databases. CRYPTO, pages 528--544, 2004.Google ScholarCross Ref
- C. Dwork and S. Yekhanin. New efficient attacks on statistical disclosure control mechanisms. CRYPTO, pages 469--480, 2008. Google ScholarDigital Library
- A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. PODS, pages 211--222, 2003. Google ScholarDigital Library
- D. Feldman, A. Fiat, and M. Sharir. Coresets for weighted facilities and their applications. FOCS, pages 315--324, 2006. Google ScholarDigital Library
- D. Feldman, A. Fiat, M. Sharir, and D. Segev. Bi-criteria linear-time approximations for generalized k-mean/median/center. SoCG, pages 19--26, 2007. Google ScholarDigital Library
- D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for k-means clustering based on weak coresets. Symposium on Computational Geometry, pages 11--18, 2007. Google ScholarDigital Library
- G. Frahling, P. Indyk, and C. Sohler. Sampling in dynamic data streams and applications. Int. J. Comput. Geometry Appl, 18(1/2):3--28, 2008.Google ScholarCross Ref
- G. Frahling and C. Sohler. Coresets in dynamic geometric data streams. STOC, pages 209--217, 2005. Google ScholarDigital Library
- R. L. Graham and N. J. A. Sloane. Lower bounds for constant weight codes. IEEE Trans. Inform. Theory, 26(1):37--43, 1980.Google ScholarDigital Library
- S. Har-Peled. Clustering motion. FOCS, pages 84--93, 2001. Google ScholarDigital Library
- S. Har-Peled. No coreset, no cry. FSTTCS, pages 324--335, 2004. Google ScholarDigital Library
- S. Har-Peled and A. Kushal. Smaller coresets for k-median and k-means clustering. SoCG, pages 126--134, 2005. Google ScholarDigital Library
- S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. STOC '04, pages 291--300, 2004. Google ScholarDigital Library
- S. Har-Peled and K.R. Varadarajan. High-dimensional shape fitting in linear time. GEOMETRY: Discrete & Computational Geometry, 32, 2004.Google Scholar
- S. Har-Peled and K.R. Varadarajan. Projective clustering in high dimensions using core-sets. SoCG, pages 312--318, 2002. Google ScholarDigital Library
- S.P. Kasiviswanathan, H.K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? FOCS, pages 531--540, 2008. Google ScholarDigital Library
- C. Lammersen and C. Sohler. Facility location in dynamic geometric data streams. ESA, pages 660--671, 2008. Google ScholarDigital Library
- F. McSherry and K. Talwar. Mechanism design via differential privacy. FOCS, pages 94--103, 2007. Google ScholarDigital Library
- N. Mishra and M. Sandler. Privacy via pseudorandom sketches. PODS, pages 143--152, 2006. Google ScholarDigital Library
- K. Nissim. Private data analysis via output perturbation. In Charu Aggarwal and Philip S. Yu, editors, Privacy-Preserving Data Mining, Models and Algorithms, pages 383--414.Google Scholar
- K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. STOC, pages 75--84, 2007. Google ScholarDigital Library
- V. Rastogi, S. Hong, and D. Suciu. The boundary between privacy and utility in data publishing. VLDB, pages 531--542, 2007. Google ScholarDigital Library
Index Terms
- Private coresets
Recommendations
Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
IPSN '17: Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor NetworksMobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g. using their smartphones, we can learn global information such as traffic congestion patterns in the city, location ...
On the complexity of differentially private data release: efficient algorithms and hardness results
STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computingWe consider private data analysis in the setting in which a trusted and trustworthy curator, having obtained a large data set containing private information, releases to the public a "sanitization" of the data set that simultaneously protects the ...
Differentially private data publishing via optimal univariate microaggregation and record perturbation
AbstractWe present an approach to generate differentially private data sets that consists in adding noise to a microaggregated version of the original data set. While this idea has already been pursued in the literature to reduce the ...
Comments