Thursday, March 29, 2012

Statistics Substituting for Logic

You see and hear a lot of statistics in the media. Maybe it's my nearly-middle-aged curmudgeonly nature, but it seems to me that statistics feature prominently in more news stories than ever before. No news item seems to be complete unless there's a statistic or two to underline the point.

So what's wrong with that, you say? Statistics would appear to be an unbiased, objective analysis of data relating to the subject at hand. We've been brought up to trust statistics.

It's my view that too often, statistics are used as a substitute for a persuasive, well constructed argument. How can that be?

Firstly, remember that the writer of the article has selected the statistics they're giving you specifically to support the main thrust of their argument. Unless it's an academic journal, the author probably did not include contrary statistics to lend balance and let you make up your own mind. This does not mean that such adverse evidence does not exist! The source of the statistics is crucial. Are the statistics provided by, or funded by, someone with a vested interest? Is the data even statistically significant? My favourite example of this is advertisements for cosmetics or hair products. Next time you see one of those, along with the strapline "87% of women agreed that it worked for them", look for the small print that says how many people it was tested on. More often than not you will find that the number of testers was under, say, 200. It really beggars belief, doesn't it? Huge multinational cosmetics companies, with advertising budgets of millions, can find only a couple of hundred people, at most, to test their 'fabulous' new product. Suddenly, when you consider that 87% of 78 people "agreed" or reviewed positively, it doesn't seem such a convincing figure, does it?
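To put a rough number on how much a small panel limits a claim like "87% agreed", here is a minimal sketch using the standard normal-approximation margin of error for a proportion. It assumes the testers were a simple random sample of the market, which a hand-picked advertising panel almost certainly is not, so if anything it flatters the claim. The function name and the panel sizes looped over are my own illustrative choices.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion p from n people.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). It only captures
    random sampling error and assumes a simple random sample; it says nothing
    about how the panel was chosen or how the question was worded.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare the same 87% headline figure across different panel sizes.
for n in (78, 200, 1000):
    moe = margin_of_error(0.87, n)
    print(f"87% of {n:>4} testers: roughly {0.87 - moe:.1%} to {0.87 + moe:.1%}")
```

With 78 testers the range comes out at roughly 79.5% to 94.5%, and that is before asking who the testers were or what "agreed" actually meant.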

Next, it's crucial to read between the lines of the data that's presented, to try to see what the author didn't want you to see or think. Of course, it's often difficult and time-consuming to do your own research on any given subject, but a good rule of thumb is that for every statistic pointing in one direction, there's often another statistic pointing equally strongly in the opposite direction.

However, my biggest bugbear is the use (or misuse) of the term "average". Think for a moment about what the word "average" means to you. One of the definitions offered by dictionary.com is "a typical amount, the norm". I'd suggest that this is what occurs to most of us when we see the term used. How accurate is that perception, though?

Consider this graph:
Figure 1: Normal Distribution curve

This is what's called a "normal distribution" graph. It is typical, for example, of exam results, where most results are in the middle of the range, resulting in the peak you see in this graph, with fewer results at either end of the spectrum. You might hear it referred to as a "Bell curve", so named for its shape.

You'll also notice that there are three terms on the graph: mode, median and mean. I'll attempt to explain each of them briefly.
  • Mean - this is what we typically think of as "average". You add all the results together, and divide by the number of samples.
  • Mode - this is the result which occurs most frequently. If you are looking at exam results, the mode is the result that more students achieved than any other. It will always be at the peak of the graph, as there are more samples at that value than at any other.
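  • Median - this literally means 'middle'. If you arranged all of the samples (or results) in numerical order, the median is the middle value. For example, it is the 50th result out of 99, or the 10th result out of 19; with an even number of results, such as 100, it is taken as halfway between the two middle ones (the 50th and 51st).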
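Python's standard `statistics` module computes all three measures, so they are easy to check on a small set of exam marks. The marks below are invented purely for illustration and chosen to be roughly symmetrical, like Figure 1, so mean, median and mode all come out at 60.

```python
import statistics

# Hypothetical exam marks, roughly symmetrical like Figure 1.
marks = [50, 55, 58, 60, 60, 60, 62, 65, 70]

print("mean:  ", statistics.mean(marks))    # sum of the marks / number of marks
print("median:", statistics.median(marks))  # middle value once the marks are sorted
print("mode:  ", statistics.mode(marks))    # most frequent mark

# With an even number of values, the median is halfway between the two middle ones.
print("median of [1, 2, 3, 4]:", statistics.median([1, 2, 3, 4]))
```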
So what? By now you probably think I'm being overly technical and pedantic, don't you? You can see from the graph above that mean, mode and median are all roughly the same, right? What does it matter whether someone says "average", "mean" or "median"? It makes no difference, surely? Sometimes it doesn't. Figure 1 is an example of a case where there may not be much difference.

OK, now look at this graph, Figure 2.

Figure 2: Skewed distribution

Figure 2 shows a markedly different distribution of results. It's obviously skewed: most results are bunched towards the lower end of the scale, with a long tail stretching out towards the higher values. You can clearly see that in this case the mode, mean and median are going to be quite different numbers. Now the distinction between the different kinds of "average" really matters.

Consider the wage structure of a typical company or organisation. Which graph do you think most closely resembles it, Figure 1 or Figure 2? Obviously, incomes will be concentrated at the lower end of the scale, because the number of people earning the big bucks decreases the higher up the organisation you go. This explains why the mode is the lowest figure: it's the most common wage. The median is the wage of the person in the middle when everyone is ranked by pay, so it usually sits above the mode, and the mean is pulled higher still by the small number of large incomes at the top of the scale. Which term more closely reflects the wage people are likely to receive working in that organisation?
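A made-up 20-person organisation shows the effect. The salaries below are invented purely for illustration, not taken from any real employer; what matters is the skewed shape. The mode comes out at £18,000, the median at £22,000 and the mean at £32,150, with 16 of the 20 staff earning less than the "average" in the mean sense.

```python
import statistics

# Hypothetical pay structure: many people on modest salaries and a few on
# much higher ones, giving a skewed shape like Figure 2.
salaries = ([18_000] * 8 + [22_000] * 5 + [28_000] * 3
            + [40_000] * 2 + [75_000] + [150_000])

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)
below_mean = sum(1 for s in salaries if s < mean)

print(f"mode:   £{mode:,.0f}")    # the most common salary
print(f"median: £{median:,.0f}")  # the middle person when ranked by pay
print(f"mean:   £{mean:,.0f}")    # pulled upwards by the two top salaries
print(f"{below_mean} of {len(salaries)} people earn less than the mean")
```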

Now, the next time some politician (the dictionary definition of a person with an agenda or vested interest) tells you that the "average" wage in a given industry (that they happen to be in the process of reforming) is X, think for a moment. Listen carefully to the language used. Did they say "average"? Or did they say something else?

Earlier this week, I read a mostly misleading report written by Edward Boyd, a member of the think tank Policy Exchange, defending the scandalous war being waged on police wages. The report claims that 40% of officers will be better off under the new proposals, yet at the same time attempts to portray police wages as unreasonably high by stating that the median gross police wage was £40 402 per year.

Notice the language. The median gross wage was £40 402. This figure includes overtime, which police officers are required to work if so directed: overtime that last year would have included policing the riots that spread across the country. Being the median, it tells you only that half of officers earned more than this and half earned less; it says nothing about how many officers actually earned that figure. Think back to Figure 2. If the gross median is £40k, what is the most common wage earned by officers? Is it £40k, or is it less than that?

Let's use another analogy. If someone said to you that the average wage for someone working in the banking industry was, say, £50 000, would you assume that this is what the clerk at your local branch is earning? Of course not. That would be a ridiculous assumption; clearly the figure is bolstered by the much higher wages earned in other areas of the industry.

There is one other possibility: the author of this particular report did not understand the term 'median' and misused it. If this is the case, then the report ought to be dismissed out of hand, for its author hasn't sufficient knowledge to write with authority about the subject at hand.

So the next time someone tries to persuade you of their argument by throwing statistics at you, the chances are that the numbers are hiding large holes in the actual logic of their case.

Think carefully. Think critically. Yesterday (29th March), Sky News announced that the surge in petrol buying (caused by the potential for a strike by tanker drivers) had led to an "extra £32m in fuel duty for the Government". Cue lots of outraged people claiming that this was convenient for a Government about to enter another recession. I'm no fan of Governments, of any colour. On the other hand, let's give this some thought. It's said that petrol and diesel sales rose more than 80% above normal, and this is where the figure of £32m has come from. Where is this huge extra volume of fuel going? Into the fuel tanks of the nation's car, van and truck drivers. Some people, if they've had a total lobotomy, will be storing it in their garages. When this strike issue has blown over, they'll eventually pour this stored fuel into their vehicles.

My point here is that actual fuel consumption will remain the same. If anything, it might fall as people drive less to conserve precious fuel. So, this figure of £32m: is it "extra" income, as the media claimed? Or is it simply £32m of fuel duty that has come in during March that would otherwise have come in during April? I suppose that in April, after the double-dip recession has been confirmed, the same media will be claiming that fuel buying has "suddenly plunged" as a result of the recession, when actually it's because the fuel was bought this month rather than next.
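The timing argument is simple arithmetic. Here is a minimal sketch under the assumption made above: total consumption over the two months is unchanged, and every extra litre bought in March replaces one that would have been bought in April. The baseline monthly receipt is an invented placeholder, not a real figure; only the £32m comes from the news report.

```python
# Monthly fuel-duty receipts in £m. BASELINE is a hypothetical placeholder;
# only the £32m figure comes from the news report.
BASELINE = 2_000
PULLED_FORWARD = 32

normal = {"March": BASELINE, "April": BASELINE}
# Panic buying moves purchases (and the duty on them) from April into March,
# while the fuel actually burned over the two months stays the same.
panic = {"March": BASELINE + PULLED_FORWARD, "April": BASELINE - PULLED_FORWARD}

extra_in_march = panic["March"] - normal["March"]
extra_over_both = sum(panic.values()) - sum(normal.values())

print(f"'Extra' duty in March: £{extra_in_march}m")
print(f"Extra duty over March and April together: £{extra_over_both}m")
```

If people really do drive less to save fuel, the two-month total would come out slightly lower, not higher.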

Caveat emptor.
