L15: Anonymous Data Isn't!
Materials
Use the raw slides (pdf) before lecture to take notes.
Summary
- Collecting and analyzing data is very important for science and for the good of society
- Ad-hoc methods for anonymizing data are likely to fail:
- Erasing names but leaving date of birth, ZIP code, and sex allows linkage attacks
- Removing PII but keeping search queries still exposes users (AOL example)
- Replacing IDs with random numbers still allows complex linking with auxiliary data sources (Netflix example)
- K-anonymity
- This notion is syntactic, so it can fail as well
- Follow-ups include L-diversity, among others
- Differential Privacy (DP): a formal mathematical framework that guards against these types of attacks
- DP is not a silver bullet and needs to be used carefully
- Google and Apple both use sophisticated forms of DP
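
To make the summary concrete, here is a minimal Python sketch of three of the ideas above: a linkage attack on (DoB, Zip, Sex), a k-anonymity check, and a noisy count in the spirit of DP. Everything in it is an assumption for illustration and not taken from the lecture: the toy records, the names `linkage_attack`, `smallest_group`, and `noisy_count`, and the choice of the Laplace mechanism as the DP example.

```python
import random
from collections import Counter

# Hypothetical "anonymized" release: names erased, quasi-identifiers kept.
medical = [
    {"dob": "1950-03-02", "zip": "02138", "sex": "F", "diagnosis": "flu"},
    {"dob": "1950-03-02", "zip": "02139", "sex": "M", "diagnosis": "asthma"},
    {"dob": "1971-11-30", "zip": "02139", "sex": "M", "diagnosis": "diabetes"},
]
# Hypothetical public auxiliary data (for example a voter roll) with names.
voters = [
    {"name": "Alice", "dob": "1950-03-02", "zip": "02138", "sex": "F"},
    {"name": "Bob", "dob": "1971-11-30", "zip": "02139", "sex": "M"},
]

QUASI_IDS = ("dob", "zip", "sex")


def key(row):
    """Project a record onto its quasi-identifiers."""
    return tuple(row[q] for q in QUASI_IDS)


def linkage_attack(released, auxiliary):
    """Map names to diagnoses where the quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in auxiliary:
        names_by_key.setdefault(key(person), []).append(person["name"])
    reidentified = {}
    for row in released:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # unique match: the row is re-identified
            reidentified[names[0]] = row["diagnosis"]
    return reidentified


def smallest_group(rows):
    """Return k, the size of the smallest group sharing the same quasi-identifiers."""
    return min(Counter(key(r) for r in rows).values())


def noisy_count(rows, predicate, epsilon, rng=random):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    true_count = sum(1 for r in rows if predicate(r))
    # The difference of two Exp(rate=epsilon) draws is Laplace noise with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


print(linkage_attack(medical, voters))
print(smallest_group(medical))
print(noisy_count(medical, lambda r: r["sex"] == "M", epsilon=1.0))
```

In this toy release every record is unique on (DoB, Zip, Sex), so the table is only 1-anonymous and the join recovers diagnoses by name. The noisy count hides any single person's contribution, but a smaller `epsilon` means more noise, and repeated queries consume privacy budget, which is one reason DP needs to be used carefully.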