It’s been a difficult day for data. The opinion polls leading up to the general election were confident and almost unanimous in their prediction of a neck-and-neck result. So when exit polls contradicted them, disbelief was widespread among both politicians and commentators.
But when the actual results emerged, leaving many to eat their words (and Paddy Ashdown to eat his hat), accusing eyes turned on the pollsters.
There was nothing inherently wrong with the way in which the research companies calculated their results in the run-up to the election. Each one took a sample of the population and applied the relevant algorithms to take account of age and location, and to convert vote share into an estimate of seats.
By and large they came up with the same results: within their stated 5% margin of error, it was impossible to say whether the Conservatives or Labour would take the lead.
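As a rough illustration of where such margins come from (the exact figure each pollster quotes depends on its sample size and methodology, so the numbers below are purely indicative), the textbook 95% margin of error for a simple random sample looks something like this:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% sampling margin of error for a share p
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes only, not any pollster's actual figures
for n in (400, 1000, 2000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
# n=400:  +/- 4.9 points
# n=1000: +/- 3.1 points
# n=2000: +/- 2.2 points
```

On this simple view, a poll of around a thousand people cannot reliably separate two parties that are only a couple of points apart, which is exactly the position the pre-election polls found themselves in.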
>See also: Why we need to unlock the value of public sector data
The real problem lies in the fact that we are dealing with a sample. Without interviewing the entire population, a great deal of adjustment and skill has to be applied to a relatively small sample to produce the pre-election opinion polls.
As a starting point, every communication method used by opinion polling companies creates a skewed sample group – whether the interviews are carried out by telephone, online, or in person. Factors such as the reduced use of landlines among the young and the lower use of the internet by the elderly must be taken into account and corrected by the polling organisation.
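To give a feel for what that correction involves, the sketch below reweights responses so that the sample's demographic mix matches the population's. It is a deliberate simplification with invented figures, not any polling company's actual procedure:

```python
# Simplified post-stratification sketch: reweight respondents so the
# sample's age mix matches the population's. Real pollsters weight on
# many variables at once (age, region, past vote, and so on).

# Hypothetical shares: this sample over-represents older respondents
population_share = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}
sample_share     = {"18-34": 0.18, "35-54": 0.32, "55+": 0.50}

weights = {group: round(population_share[group] / sample_share[group], 2)
           for group in population_share}

# Each respondent's answer is then counted with their group's weight,
# e.g. a 25-year-old here counts for roughly 1.56 "people".
print(weights)  # {'18-34': 1.56, '35-54': 1.06, '55+': 0.76}
```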
Secondly, in elections there is the question of constituencies. Although the polling companies are experienced in converting overall vote share into an estimation of individual seats, this is not something that can be relied upon with absolute certainty.
A poll may well have taken in a good cross section of the population as a whole, but will not necessarily be able to predict the voting patterns of people living within a small area.
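One common simplification for turning national vote share into seats is uniform national swing: apply the same change in each party's share to every constituency's previous result and see which seats change hands. The sketch below uses invented constituency figures purely to show the mechanics, and why a small, purely local difference can upset the projection:

```python
# A toy "uniform national swing" projection with invented numbers.
# Each constituency's previous shares are shifted by the same national
# swing; real seat models are far more sophisticated.

previous_results = {
    "Seat A": {"Con": 0.42, "Lab": 0.40, "Other": 0.18},
    "Seat B": {"Con": 0.35, "Lab": 0.47, "Other": 0.18},
    "Seat C": {"Con": 0.43, "Lab": 0.44, "Other": 0.13},
}

national_swing = {"Con": +0.01, "Lab": -0.02, "Other": +0.01}  # hypothetical

def project_winner(shares, swing):
    projected = {party: shares[party] + swing.get(party, 0.0) for party in shares}
    return max(projected, key=projected.get)

for seat, shares in previous_results.items():
    print(seat, "->", project_winner(shares, national_swing))
# Seat C flips from Lab to Con on these invented figures; a purely local
# shift of a point or two the other way would overturn that projection.
```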
As UKIP has been pointing out loudly, it’s possible to have over 10% of the vote yet still win only a single seat. The Greens probably feel much the same way.
Finally, when it comes to statistics, it’s easy to forget that we are dealing with humans. People who say they are going to vote in a particular way may change their minds. Or they may not vote at all. And while it’s far too easy to dismiss the results by saying that nobody told the truth, there’s no doubt that people do often feel self-conscious about admitting that they are likely to vote for a particular party, especially if it is one that their peers might frown upon.
By contrast, the exit polls surveyed around 22,000 people – a much larger sample – all of whom had actually voted.
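Plugging that figure into the same rough formula as before shows why the exit poll carried so much more weight. The numbers are again purely illustrative, and real exit polls use their own, quite different methodology:

```python
import math

# Sampling margin of error, illustrative only: a typical pre-election
# poll versus the much larger exit-poll sample mentioned above.
for n in (1_000, 22_000):
    moe = 1.96 * math.sqrt(0.25 / n)  # 95% margin for a 50/50 split
    print(f"n={n:,}: +/- {moe * 100:.1f} points")
# n=1,000:  +/- 3.1 points
# n=22,000: +/- 0.7 points
```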
Ultimately, this experience should not cost us our faith in data. Data is concrete – such as the final result of the election – and can teach us a vast amount about human behaviour. It should also, however, teach us to treat predictive sample data and extrapolations with extreme caution, particularly if the sample size is small.
>See also: Why Twitter got the UK’s first social media election so wrong
We’ve been here before. The 1992 election was also predicted to be neck and neck but resulted in a Conservative victory, triggering widespread criticism of the polling companies.
David Dimbleby suggested last night that nobody would ever trust polls again, but the evidence suggests otherwise. In business and politics, we enjoy (and sometimes rely on) predictions based on less than perfect statistics – and we will be just as surprised next time the opinion polls fail to predict the result.
Sourced from Alan Hall, SCL