L15: Anonymous Data Isn't!

Materials

Download the raw slides (PDF) before lecture and use them to take notes.

Summary

  • Collecting and analyzing data is very important for science and for societal good.
  • Ad-hoc methods for anonymizing data are likely to fail:
    • Erasing names but leaving DoB, ZIP code, and sex allows linkage attacks.
    • Removing PII but keeping the search queries still lets users be re-identified (AOL example).
    • Replacing IDs with random numbers still allows complex linking with auxiliary data sources (Netflix example).
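  • K-anonymity: every record must be indistinguishable from at least k-1 other records on its quasi-identifiers.
    • This notion is syntactic (it constrains the shape of the released table, not what an attacker can infer), so it can fail as well.
  • Follow-ups: L-diversity, etc.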
  • Differential Privacy (DP): a formal mathematical framework that guards against these types of attacks.
    • It is not a silver bullet and needs to be used carefully.
    • Google and Apple both deploy sophisticated DP mechanisms.
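
The ideas in this summary can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the lecture's own code: the tables, the field names (dob, zip, sex, diagnosis), and the helper names (link, k_of, dp_count) are all hypothetical. It shows a linkage attack on quasi-identifiers, a check of how k-anonymous a table is, and a counting query answered with the Laplace mechanism, one standard way to achieve differential privacy.

```python
import random
from collections import Counter

QUASI_IDS = ("dob", "zip", "sex")  # assumed quasi-identifier columns

def qi(row, keys=QUASI_IDS):
    """Quasi-identifier tuple of a row."""
    return tuple(row[k] for k in keys)

def link(released, public, keys=QUASI_IDS):
    """Linkage attack: join a name-free table to a named public list."""
    names_by_qi = {}
    for person in public:
        names_by_qi.setdefault(qi(person, keys), []).append(person["name"])
    group_sizes = Counter(qi(row, keys) for row in released)
    hits = []
    for row in released:
        names = names_by_qi.get(qi(row, keys), [])
        # Re-identified only if the match is unique on both sides.
        if len(names) == 1 and group_sizes[qi(row, keys)] == 1:
            hits.append((names[0], row["diagnosis"]))
    return hits

def k_of(rows, keys=QUASI_IDS):
    """Largest k for which the table is k-anonymous (smallest group size)."""
    return min(Counter(qi(r, keys) for r in rows).values())

def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(rows, predicate, epsilon, rng=random):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Made-up data: names were erased from `released`, but DoB, ZIP, sex remain.
released = [
    {"dob": "1960-01-01", "zip": "02138", "sex": "F", "diagnosis": "flu"},
    {"dob": "1975-05-20", "zip": "02139", "sex": "M", "diagnosis": "asthma"},
    {"dob": "1975-05-20", "zip": "02139", "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Alice", "dob": "1960-01-01", "zip": "02138", "sex": "F"},
    {"name": "Bob", "dob": "1975-05-20", "zip": "02139", "sex": "M"},
]

print(link(released, public))  # [('Alice', 'flu')]
print(k_of(released))          # 1, so the table is not even 2-anonymous
print(dp_count(released, lambda r: r["sex"] == "M", epsilon=1.0))  # noisy, near 2
```

In this toy data, Alice's row is the only one with her quasi-identifier combination, so she is re-identified even though her name was erased. The two rows sharing Bob's combination cannot be told apart by the join alone. The noisy count changes on every run; a smaller epsilon means more noise and stronger privacy, which is part of why DP needs to be used carefully.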