skip to main content
10.1145/1536414.1536465acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article

Private coresets

Published:31 May 2009Publication History

ABSTRACT

A coreset of a point set P is a small weighted set of points that captures some geometric properties of $P$. Coresets have found use in a vast host of geometric settings. We forge a link between coresets, and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed. We first show that the existence of a small coreset with low generalized sensitivity (i.e., replacing a single point in the original point set slightly affects the quality of the coreset) implies (in an inefficient manner) the existence of a private coreset for the same queries. This greatly extends the works of Blum, Ligett, and Roth [STOC 2008] and McSherry and Talwar [FOCS 2007]. We also give an efficient algorithm to compute private coresets for k-median and k-mean queries in Red, immediately implying efficient differentially private sanitizations for such queries. Following McSherry and Talwar, this construction also gives efficient coalition proof (approximately dominant strategy) mechanisms for location problems. Unlike coresets which only have a multiplicative approximation factor, we prove that private coresets must have an additive error. We present a new technique for showing lower bounds on this error.

References

  1. P. Agarwal, S. Har-Peled, and K.R. Varadarajan. Geometric approximations via coresets. Combinatorial and Computational Geometry -- MSRI Publications, 52:1--30, 2005.Google ScholarGoogle Scholar
  2. P. Agarwal, S. Har-Peled, and K.R. Varadarajan. Approximating extent measures of points. JACM: Journal of the ACM, 51, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P.K. Agarwal, C.M. Procopiuc, and K.R. Varadarajan. Approximation algorithms for k-line center. ESA, pages 54--63, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Badoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. STOC, pages 250--257, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. PODS, pages 273--282, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. CRYPTO, pages 451--468, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the sulq framework. PODS, pages 128--138, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. STOC, pages 609--618, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L.S. Buriol, G. Frahling, S. Leonardi, and C. Sohler. Estimating clustering indexes in data streams. ESA, pages 618--632, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. T.M. Chan. Faster core-set constructions and data stream algorithms in fixed dimensions. SCG, pages 152--159, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. K. Chen. On k-median clusteing in high dimensions. SODA, pages 1177--1185, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. K.L. Clarkson. Subgradient and sampling algorithms for l1 regression. SODA, pages 257--266, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Czumaj, F. Ergun, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. Approximating the weight of the euclidean minimum spanning tree in sublinear time. SIAM J. Computing vol. 35(1), pages 91--109, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Czumaj and C. Sohler. Sublinear-time approximation algorithms for clustering via random sampling. RSA: Random Structures & Algorithms, 30, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. I. Dinur and K. Nissim. Revealing information while preserving privacy. PODS, pages 202--210. ACM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. Dwork. Differential privacy. ICALP, LNCS, pages 1--12, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. EUROCRYPT, LNCS, pages 486--503. Springer, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC, pages 265--284, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. STOC, pages 85--94, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Dwork and K .Nissim. Privacy-preserving datamining on vertically partitioned databases. CRYPTO, pages 528--544, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  21. C. Dwork and S. Yekhanin. New efficient attacks on statistical disclosure control mechanisms. CRYPTO, pages 469--480, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. PODS, pages 211--222, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Feldman, A. Fiat, and M. Sharir. Coresets for weighted facilities and their applications. FOCS, pages 315--324, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Feldman, A. Fiat, M. Sharir, and D. Segev. Bi-criteria linear-time approximations for generalized k-mean/median/center. SoCG, pages 19--26, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for k-means clustering based on weak coresets. Symposium on Computational Geometry, pages 11--18, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Frahling, P. Indyk, and C. Sohler. Sampling in dynamic data streams and applications. Int. J. Comput. Geometry Appl, 18(1/2):3--28, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  27. G. Frahling and C. Sohler. Coresets in dynamic geometric data streams. STOC, pages 209--217, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. L. Graham and N. J. A. Sloane. Lower bounds for constant weight codes. IEEE Trans. Inform. Theory, 26(1):37--43, 1980.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Har-Peled. Clustering motion. FOCS, pages 84--93, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Har-Peled. No coreset, no cry. FSTTCS, pages 324--335, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Har-Peled and A. Kushal. Smaller coresets for k-median and k-means clustering. SoCG, pages 126--134, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. STOC '04, pages 291--300, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Har-Peled and K.R. Varadarajan. High-dimensional shape fitting in linear time. GEOMETRY: Discrete & Computational Geometry, 32, 2004.Google ScholarGoogle Scholar
  34. S. Har-Peled and K.R. Varadarajan. Projective clustering in high dimensions using core-sets. SoCG, pages 312--318, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. S.P. Kasiviswanathan, H.K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? FOCS, pages 531--540, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. C. Lammersen and C. Sohler. Facility location in dynamic geometric data streams. ESA, pages 660--671, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. F. McSherry and K. Talwar. Mechanism design via differential privacy. FOCS, pages 94--103, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. N. Mishra and M. Sandler. Privacy via pseudorandom sketches. PODS, pages 143--152, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. K. Nissim. Private data analysis via output perturbation. In Charu Aggarwal and Philip S. Yu, editors, Privacy-Preserving Data Mining, Models and Algorithms, pages 383--414.Google ScholarGoogle Scholar
  40. K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. STOC, pages 75--84, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. V. Rastogi, S. Hong, and D. Suciu. The boundary between privacy and utility in data publishing. VLDB, pages 531--542, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Private coresets

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computing
      May 2009
      750 pages
      ISBN:9781605585062
      DOI:10.1145/1536414

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 31 May 2009

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,469of4,586submissions,32%

      Upcoming Conference

      STOC '24
      56th Annual ACM Symposium on Theory of Computing (STOC 2024)
      June 24 - 28, 2024
      Vancouver , BC , Canada

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader