That's a question that is often discussed, though not always with the necessary rigor. The general consensus is that 'you can have the statistics say whatever you want'. I strongly disagree with this statement, and I'll use a real-life example to show why.

Are my shrimp chips the best ever?

I stumbled across the following label a few days ago:

[Photo: the label on the shrimp chips package]

(see also the product page on Casino's website)

The package of my (cheap and average-tasting) shrimp chips advertises that they have been chosen as having the best taste. The writing around the central circle states that they were compared with the chips of 4 competitors by 125 consumers.

Everything is wrong with this, and it is one of the main reasons behind the common saying that statistics can say whatever the commentator/journalist/politician/etc. wants them to. The truth, however, is slightly different: you can indeed make statistics say whatever you want, if and only if you use a weak enough methodology.

Here, the sample consists of only 125 consumers. France has about 65 million inhabitants, so the chips have been tested by roughly 0.0002% of the population. Sure, it wouldn't be wise to use the full population as a sample, but you might want to scale up the sample a bit. To measure a population's preferences between options that are supposedly easy to tell apart, such as politicians, polling institutes typically survey a random sample of about 2000 people. Now, if you want people to tell apart 5 similar-looking and similar-tasting chips (which rip your palate at each bite, by the way), you should use a bigger sample than a mere 125 people.
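
To put rough numbers on the sample-size argument, here is a minimal Python sketch of the standard normal-approximation (Wald) margin of error for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96). The 20% share is also an assumption: it is what each of the 5 brands would get if tasters could not tell them apart. The function name margin_of_error is hypothetical, chosen for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation (Wald) confidence interval
    for a proportion p estimated from a simple random sample of size n.
    z = 1.96 corresponds to a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

POPULATION_FRANCE = 65_000_000  # order of magnitude used in the text

# Assumed share: 1 vote in 5 if the 5 chips were indistinguishable.
EXPECTED_SHARE = 0.2

for n in (125, 2000):
    share_of_population = n / POPULATION_FRANCE * 100
    moe_points = margin_of_error(n, p=EXPECTED_SHARE) * 100
    print(f"n={n:>5}: {share_of_population:.5f}% of the population, "
          f"margin of error around a 20% share = ±{moe_points:.1f} points")
```

With 125 tasters, the interval around a 20% share is about ±7 percentage points, against less than ±2 points with 2000 people. Note that the formula depends on the sample size n but not on the population size (as long as the population is much larger than the sample), which is why about 2000 respondents are enough for a national poll.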

From the marketer's point of view, choosing such a small sample has several advantages:

  • It is less costly,
  • You are much more likely to see a false positive appear, as the confidence interval around your statistic is very wide. That is, you are much more likely to find an artefact: something that doesn't hold for the whole population, but that you nevertheless find in your sample by pure luck,
  • If you don't find the expected result, you can easily dismiss it as not significant and try again. Remember the first point: such a study on a few people is cheap, and relatively quick. So in the end, it's more like 'keep on sampling 125 people until you find the desired result'.
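
The 'keep sampling until it works' strategy can be illustrated with a small simulation. This is a hedged sketch under an assumed null hypothesis: all 5 chips taste the same, so each of the 125 tasters picks a favourite uniformly at random. The helper names (run_taste_test, our_brand_wins, tests_until_win) are hypothetical. The sketch estimates how often a given brand 'wins' a single test by pure chance, and how many tests it takes on average before it does.

```python
import itertools
import random
from collections import Counter


def run_taste_test(n_tasters=125, n_brands=5, rng=random):
    """One simulated blind test under the null hypothesis: every brand tastes
    the same, so each taster picks a favourite brand uniformly at random.
    Returns the number of votes received by each brand."""
    votes = Counter(rng.randrange(n_brands) for _ in range(n_tasters))
    return [votes[b] for b in range(n_brands)]


def our_brand_wins(counts, brand=0):
    """True if `brand` received strictly more votes than every competitor."""
    return all(counts[brand] > c for i, c in enumerate(counts) if i != brand)


def tests_until_win(rng=random, **test_kwargs):
    """Re-run the taste test until our brand 'wins' and return how many tests
    were needed. Assumes a win is possible (at least one taster)."""
    for attempt in itertools.count(1):
        if our_brand_wins(run_taste_test(rng=rng, **test_kwargs)):
            return attempt


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    trials = 2_000
    wins = sum(our_brand_wins(run_taste_test(rng=rng)) for _ in range(trials))
    print(f"share of single tests won by pure chance: {wins / trials:.1%}")
    tries = [tests_until_win(rng=rng) for _ in range(500)]
    print(f"average number of tests needed for a 'win': {sum(tries) / len(tries):.1f}")
```

Under these assumptions, a given brand comes out on top in a noticeable fraction of tests by chance alone, so repeating a 125-person test a handful of times is usually enough to earn a 'best taste' label.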

It is crystal clear here that Casino is using a very weak methodology. They actually do their best to keep costs low and to get statistics that are wrong, or at least unreliable.

Take-aways and further readings

  • Statistics are reliable if and only if the methodology used is rigorous.
  • We always need to be sceptical of statistics, and look into the methodology, sample size, assumptions, etc.
  • All domains are affected by this, from shrimp chips to finance and economics.
  • Campbell Harvey and host Russ Roberts discussed at length the potential weaknesses of statistics in an EconTalk episode.