pmb was talking recently about how the taboo on talking about salary hurts workers at the negotiating table, because employers have more data. That reminded me of an idea I had a while ago.
That's the sort of "difficult problems of anonymity and confidentiality" I'm talking about. :) I think good security engineering can mitigate those concerns, but it might impose other constraints on the design.
In the most extreme case, users could each generate asymmetric key-pairs, and submit data through a trusted (open-source) client that encrypts data on their own computer before transmission, using each of their friends' public keys. Data for aggregated public consumption could be submitted pseudonymously through an onion-routing proxy network like Tor. Small random errors or holes can be introduced in the pseudonymous data; these perturbations make individual profiles harder to re-identify, but will tend to cancel out in aggregate statistics over large datasets.
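Here is a minimal Python sketch of the perturbation idea. It assumes zero-mean Gaussian noise for the "small random errors" and independently dropped fields for the "holes". The function names, the `noise_sd` and `drop_prob` parameters, and the synthetic salary range are hypothetical illustrations rather than part of any existing system. This is not a formal privacy guarantee such as differential privacy; it only shows why individual values get fuzzier while the aggregate mean stays close.

```python
import random
import statistics


def perturb(values, noise_sd, drop_prob, seed=None):
    """Return a perturbed copy of `values` for pseudonymous submission.

    Hypothetical scheme: each value independently becomes a hole (None) with
    probability `drop_prob`; otherwise it gets zero-mean Gaussian noise with
    standard deviation `noise_sd`. A real client would not use a fixed seed.
    """
    rng = random.Random(seed)
    out = []
    for v in values:
        if rng.random() < drop_prob:
            out.append(None)  # hole: this field is simply not reported
        else:
            out.append(v + rng.gauss(0.0, noise_sd))  # small random error
    return out


def aggregate_mean(values):
    """Mean over the fields that were actually reported (holes skipped)."""
    return statistics.fmean([v for v in values if v is not None])


# Usage with synthetic data: the noise is zero-mean and the holes are random,
# so the aggregate mean should land near the true mean.
data_rng = random.Random(0)
salaries = [data_rng.uniform(40_000, 120_000) for _ in range(10_000)]
noisy = perturb(salaries, noise_sd=5_000, drop_prob=0.1, seed=1)
print(round(statistics.fmean(salaries)), round(aggregate_mean(noisy)))
```

Larger `noise_sd` and `drop_prob` make any single record harder to re-identify, at the cost of needing a bigger dataset before the aggregate becomes trustworthy.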
With that design, the centralized site needn't be trusted, since it would receive only encrypted or already-anonymized data. On the other hand, this is a hard security model to implement correctly, and it would certainly hurt usability. You might be able to get most of the benefits through clever tricks, maybe involving pseudonyms and hashing (cf. Bloom filters for public sharing of private friend lists).
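To make the Bloom filter idea concrete, here is a minimal Python sketch. A user publishes only the filter's bit array. Anyone can then ask whether a given name *might* be on the list: a "no" is definitive, while a "yes" may be a false positive, and that false-positive rate is what provides some deniability. The class name, the per-user `salt`, the SHA-256 double-hashing scheme, and the sizes are all assumptions for illustration. Because someone who knows the salt can still test guessed names, this is weaker protection than the encrypted design above.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings, using salted SHA-256 (hypothetical design)."""

    def __init__(self, num_bits, num_hashes, salt=b""):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Derive k bit positions by double hashing: h1 + i * h2 (mod m).
        digest = hashlib.sha256(self.salt + item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is nonzero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Usage: build a filter of friends and publish only the bit array.
friends = BloomFilter(num_bits=256, num_hashes=3, salt=b"hypothetical-user-salt")
for name in ["bob", "carol", "dave"]:
    friends.add(name)
published = friends.bits.hex()  # what would actually be shared publicly
print(friends.might_contain("carol"))  # True, since "carol" was added
```

A smaller filter (fewer bits per entry) raises the false-positive rate, which gives more deniability to each listed name but makes the published list less useful for lookups.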