One Month is Enough

Jul 06, 2008 00:35



Dear friends,

The first rule of crisis management is to get ahead of the story. Since my shameful secret is about to be revealed, I decided to break it here first. I’d rather you heard it from me than from the media:

In March 2008 I watched Rick Astley’s music video Never Gonna Give You Up on YouTube. It’s widely considered the corniest music video ever made. I have no excuse; I can’t even claim to have been RickRolled. I heard about the video, and willingly went and viewed it. It was me, just me, officer!

The reason for this confession is that Google is about to hand over to Viacom a complete list of every video watched by YouTube users:

[...] the judge granted a Viacom motion that records of every video watched by YouTube users, including their login names and IP addresses, be turned over to the entertainment giant.

The order prevents Viacom from using this information to target lawsuits at users. But it makes no sense to give this information to Viacom in the first place: Google could easily make this data anonymous, and they’ve asked Viacom to do just that. Viacom have said that they won’t use any personally identifiable data, but they haven’t replied to Google’s request directly. These mixed signals make me lunge for my tin foil hat: what could explain Viacom’s behavior? Perhaps, once they have the logs in their possession, they intend to ask the judge to allow them greater use of the data. Or perhaps the data will be “accidentally” leaked; after all, that sort of thing happens all the time.

But criticizing a media company like Viacom for ignoring users’ privacy is like berating a toddler for getting food all over themselves: it’s in their nature, and they’re going to keep doing it. Let’s beat up on Google instead; that never gets old. Google shouldn’t have kept this data around for Viacom to subpoena. Google deletes personally identifiable user data after 18 months, which isn’t soon enough to hide my Rick Astley obsession. Google’s track record on privacy is spotty in general. For example, after a lot of pressure they finally added a link to their privacy policy on the Google homepage in July 2008, but not before bitching and moaning like a teenager whose parents have forced him to clean his room.

Google has some of the most sensitive data in the world; in particular, they know every search that a user makes. In their Privacy FAQ they list several good reasons why they need to keep this data:
  • To improve search results
  • To maintain the security of their systems
  • To prevent fraud and other abuses

It’s true that in order to achieve these goals Google needs to save the search logs. However, the problem isn’t that they keep the search logs; it’s that they keep personally identifiable information in the logs, which lets them (or anyone else, such as Viacom) associate searches and clicks with real people. Google keeps this information for 18 months, and that’s far too long. They could erase the personal information much sooner and still achieve all of the goals described above.

For example, Google uses the search logs to find common spelling mistakes made by users, so that they can offer automatic suggestions for the correct spelling. This doesn’t require any personally identifiable information. Another use for the search logs is to detect click fraud. For this purpose it is indeed useful to look at the search and click history of individual users. However, the benefit of this personal data quickly diminishes with time. Data about click fraud that is over a month old should be considered prehistoric; the perpetrators are long gone from whatever IP address they had been using.

Google’s privacy policy doesn’t say how long they keep search logs; probably forever. The only promise they make is to scrub out personally identifiable information after 18 months. Google is very vague about where this figure of “18 months” comes from; perhaps it has some religious significance. From Google’s Privacy FAQ:
Why are logs kept for 18 months before being anonymized?

We strike a reasonable balance between the competing pressures we face, such as the privacy of our users, the security of our systems and the need for innovation. We believe 18 months strikes the right balance.

It’s time we told Google: 18 months is too long. One month would strike the right balance between privacy, security and the need for innovation. With one month of personally identifiable information, Google will be able to catch all the fraud they are ever likely to catch. After that, it’s time to anonymize the data. The anonymized data is still useful for improving their search engine.
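To make the one-month proposal concrete, here is a minimal Python sketch of such a retention rule. It assumes a hypothetical log format in which each entry holds a timestamp, the query text, a login name (`user_id`) and an IP address; real search logs are certainly richer, and these field names are illustrative, not Google’s. It also approximates “one month” as 30 days.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical log format, for illustration only.
    timestamp: datetime
    query: str
    user_id: Optional[str]  # login name (personally identifiable)
    ip: Optional[str]       # IP address (personally identifiable)


# The proposed one-month window, approximated as 30 days.
RETENTION = timedelta(days=30)


def anonymize_old_entries(entries: List[LogEntry], now: datetime,
                          retention: timedelta = RETENTION) -> List[LogEntry]:
    """Return a new list in which entries older than `retention` have their
    identifying fields (user_id, ip) removed. The query text and timestamp
    are kept, so aggregate uses of the logs still work."""
    cutoff = now - retention
    result = []
    for e in entries:
        if e.timestamp < cutoff:
            # Older than the window: keep the search, drop who made it.
            result.append(replace(e, user_id=None, ip=None))
        else:
            # Recent entry: identifying data kept, e.g. for fraud checks.
            result.append(e)
    return result


now = datetime(2008, 7, 6)
logs = [
    LogEntry(datetime(2008, 3, 1), "never gonna give you up", "alice", "192.0.2.10"),
    LogEntry(datetime(2008, 7, 1), "viacom lawsuit", "bob", "198.51.100.7"),
]
cleaned = anonymize_old_entries(logs, now)
for e in cleaned:
    print(e.query, e.user_id, e.ip)
# never gonna give you up None None
# viacom lawsuit bob 198.51.100.7
```

Only the identifying fields are dropped; the query and timestamp survive, so uses like spotting common misspellings can carry on without knowing who typed what. Stripping the login and IP address is only a first step, though: the query text alone can sometimes point to a specific person, so a real anonymization scheme would need more than this.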

Go to Google’s Privacy Feedback page and ask them to reduce the amount of time they keep personally identifiable data in their logs. You could use a message such as this one:

Dear Google,
I’m concerned about your data retention policy: you keep personally identifiable information in your search logs for 18 months, and that’s too long. As we have seen with the recent lawsuit by Viacom, this information can easily fall into the hands of third parties. To protect my privacy and the privacy of the rest of your users, please reduce the amount of time you keep personally identifiable data to one month.
Thank you.

Google isn’t alone in this. Microsoft also anonymizes its logs after 18 months. Yahoo makes do with just 13 months (how did they come up with that number? Perhaps it also holds occult significance). Ask.com, the fourth-largest search provider, gives its users the option of making completely anonymous searches. But we should focus on Google: where the market leader goes, the rest will surely follow.

Originally published at MattEbert.com. Please leave any comments there.

internet, privacy, google
