Misleading statistics

Aug 10, 2011 22:00

Over on Twitter, someone retweeted a message from @InjusticeFacts: "Social mobility is a myth, those who are born in the top 1% have a 99.87% chance of remaining right there in the top 1%."

I see a few problems with that claim, but Twitter isn't really the best place to debate something like that, so I'm discussing it here (where I can ramble on for longer).

Firstly, they haven't cited their sources. So, where did they get those figures from? Similarly, who is writing these tweets? Are they a credible source of information?

The blurb for the Twitter feed says: "This is an open, circulating, database of facts that deal with the injustices which plague our world. Please add more facts by visiting the site: http://injusticefacts.wordpress.com/2011/05/16/please-add-the-facts-that-you-know-here/"

Following that link, it says: "If you know of any facts about injustice and inequality, whether local, global, cultural, financial, political etc. please post them in the comment section and we'll add them to our circulating database, which is read by 100,000 people per month." There are no comments listed for that blog entry, which may mean that they only get shown to the blog author.

Does the blog author then verify the facts before reposting them to Twitter? It's hard to say, since the About page for the blog simply says: "This is an example of a WordPress page, you could edit this to put information about yourself or your site so readers know where you are coming from. You can create as many pages like this one or sub-pages as you like and manage all of your content inside of WordPress." In other words, they haven't bothered to fill in that section. So, all we know is that an anonymous source has posted unverified information. This doesn't fill me with confidence.

Still, let's assume that they're telling the truth. What does that tweet even mean? Are social classes defined entirely by (household) income, or are there other criteria? Also, even if the top 1% is fixed, what about the other 99%? If people can move around within that, e.g. going from the bottom 1% to the top 2%, it doesn't really seem accurate to say that "social mobility is a myth".

Since they have a blog, I think the best approach would be to write a blog post for each topic, then include the link with the corresponding tweet. That way, people can get the quick "soundbite" and dig into the source data if they're interested.

Looking through the Twitter timeline, another tweet says: "Every minute 18 people die of starvation in the world." So, the same principle applies here: where did this information come from?

I've seen this writing style used elsewhere, e.g. "every day, X people will be assaulted" (or worse, "this year, X people will die"). The problem is that it changes the claim from a report into a prediction. Bear in mind that information can get bounced around the internet for a long time (e.g. emails with hoax virus alerts), so something that is true now may not be true in a year's time. It would be better to put specific dates on this, e.g. "Between 2008 and 2010, X people died of starvation every minute".

I also have to wonder whether it was that regular. In other words, was it literally 18 people every minute, or was that an average? If it's an average, how much did it vary? For instance, were there some minutes when nobody died at all, and other minutes where several hundred people died?
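To show why the average alone isn't enough, here's a minimal sketch using two invented sets of per-minute counts (these numbers are made up purely for illustration, not real data). Both work out to an average of 18 per minute, but one is perfectly steady and the other has nine empty minutes followed by a single spike. The helper name `summarise` is hypothetical.

```python
import statistics

def summarise(counts):
    """Return (mean, population standard deviation) of per-minute counts."""
    return statistics.mean(counts), statistics.pstdev(counts)

# Invented example data, NOT real figures: ten minutes of counts each.
steady = [18] * 10        # exactly 18 every single minute
bursty = [0] * 9 + [180]  # nothing for nine minutes, then 180 at once

for name, counts in [("steady", steady), ("bursty", bursty)]:
    mean, spread = summarise(counts)
    print(f"{name}: mean {mean} per minute, standard deviation {spread:.1f}")
```

Both lines report a mean of 18, but the standard deviation is 0.0 for the steady series and 54.0 for the bursty one, so "18 per minute" on its own hides a big difference.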

Let's assume that the source data includes the time of death. In the UK and USA, you need to have particular medical qualifications in order to officially declare someone dead; typically, that means that a doctor has to do it. An ambulance crew can say that it's futile to attempt CPR (e.g. if the casualty has been decapitated), but that person won't be declared dead until the ambulance gets to hospital. This explains the term "Dead On Arrival" (DOA), since the time of death would legally be the time when the vehicle arrives, not the time when the casualty actually died. So, I would expect to see a jump in the death rate at particular times of day, e.g. when a doctor does their rounds.
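If the records did include a recorded time of death, a quick way to look for that kind of jump would be to count how many deaths were recorded in each hour of the day. This is only a sketch: the timestamps below are invented for illustration, and the function name `deaths_per_hour` is hypothetical.

```python
from collections import Counter
from datetime import datetime

def deaths_per_hour(timestamps):
    """Count recorded times of death by hour of the day (0-23)."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)

# Invented example records, NOT real data.
records = [
    "2011-08-10T09:05",
    "2011-08-10T09:20",
    "2011-08-10T09:40",
    "2011-08-10T14:15",
]

counts = deaths_per_hour(records)
print(counts.most_common())  # busiest hours first
```

With real data, a cluster like the one at 09:00 here might reflect when deaths were recorded (e.g. a doctor's rounds) rather than when they actually happened.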

Alternatively, suppose that the source data doesn't include the time of death; maybe it only specifies the date. In that case, the summary should be expressed using days rather than minutes. (18 per minute = 25,920 per day.) This comes back to something that I learnt in Physics class: the difference between accuracy and precision. For instance, if your ruler only goes down to mm level then don't claim that something is 5.8 mm long; all you can say for certain is that it's between 5 mm and 6 mm.
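Here's a small sketch of both points: converting a per-minute rate into a per-day figure (there are 60 × 24 = 1,440 minutes in a day), and only reporting a length to the resolution your ruler supports. The function names are hypothetical, and the ruler function assumes you know roughly where the object ends and just want the pair of marks either side of it.

```python
import math

MINUTES_PER_DAY = 60 * 24  # 1,440 minutes in a day

def per_minute_to_per_day(rate_per_minute):
    """Convert a rate per minute into the equivalent rate per day."""
    return rate_per_minute * MINUTES_PER_DAY

def ruler_bounds(length_mm, resolution_mm=1):
    """Return the pair of ruler marks either side of length_mm.

    With a ruler marked every resolution_mm, this interval is all we can vouch for.
    """
    lower = math.floor(length_mm / resolution_mm) * resolution_mm
    return lower, lower + resolution_mm

print(per_minute_to_per_day(18))  # 25920
print(ruler_bounds(5.8))          # (5, 6)
```

So "18 per minute" and "25,920 per day" are the same rate; the difference is how much precision each one implies.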

It may seem as if I'm being pointlessly pedantic here, and losing sight of the larger issues. However, I think the relevant phrase here is: "You can't manage what you don't measure". Obviously famine is a serious problem, and we ought to address that. It is then useful to know whether it's getting better or worse, so that we can see how effective various changes have been, e.g. drilling new wells. If the death rate isn't uniform then the peaks and troughs may be significant. It's fine to have a quick summary, but it's wrong to mislead people.

This is similar to what I said in April about confirmation bias (in the context of the royal wedding): we have a tendency to accept claims at face value if those claims support what we already believe. I've fallen into that trap, and it can be embarrassing when the truth comes out later. (For another example of this, BBC News and PC Pro were caught out by the "people who use Internet Explorer have a lower IQ" story.) However, it can also be more serious. Some people think that the London riots have been provoked by social/economic conditions, and that may be true. However, it would be a bad idea to make changes to government policy based on fictitious figures.

physics, logic, statistics, twitter
