The devil is in the details – streamlining through Bayesian statistics

As the big data discussion continues in earnest, it's easy to forget the swathes of data already being generated through more traditional channels.

What's more, the jury is still out as to whether businesses are correctly interpreting and employing the existing data they have. Increasingly, standard frequentist interpretations of statistics are falling short and producing distorted results that may actually be damaging. The answer then? A lesser-known branch of statistics based on Bayesian inference.


For those who may not immediately recognise the distinction between frequentist and Bayesian statistics: the frequentist models that were considered “standard” during the 20th century define probability as the long-run frequency of an outcome over many repeated trials.

Dealing solely in hard numbers and classical objectivism, frequentists draw their conclusions from the observed data alone. Bayesian statistics, on the other hand, propounds an entirely different kind of scientific reasoning, in which probability is treated as a degree of belief. Existing knowledge about a problem is stated up front as a “prior”, and each new piece of evidence is used to update that belief, refining results with increasing accuracy over time.
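To make the idea of updating concrete, here is a minimal Python sketch of a Bayesian update for an unknown rate, using the standard Beta-Binomial conjugate model. The class name, the prior and the batches of data are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Belief about an unknown success rate, expressed as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: the posterior is again a Beta distribution whose
        # parameters are the prior parameters plus the observed counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        # Posterior mean of the rate under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: a belief centred on a 10% rate, worth about 20 observations.
belief = BetaBelief(alpha=2.0, beta=18.0)

# Hypothetical batches of evidence arriving over time: (successes, failures).
batches = [(1, 4), (3, 27), (12, 88)]

for successes, failures in batches:
    belief = belief.update(successes, failures)
    print(f"posterior mean after batch: {belief.mean():.3f}")
```

With only the first five observations, a purely frequentist point estimate would be 1/5 = 0.2, whereas the posterior mean after that batch is 0.120 because the prior tempers the small sample. As further batches arrive the data increasingly dominate, and the estimate settles at 0.109 and then 0.116.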

Sea change

For the non-mathematically minded, the differences may verge on the mundane, if not the absurd – the general belief being that all statistical information is simply “fact”.

However, since the turn of the century, Bayesian statistics have become integral to fields as disparate as biology and battery development, lab research and logistics, with an emphasis on interpretation pushing many industries forward by leaps and bounds.

Bayesian analyses have begun to highlight glaring errors in many data sets previously considered reliable, and statisticians are only now beginning to cross-check frequentist and Bayesian results against each other.

This has led many to question our concept of “facts” and to look towards combining both methods, with a greater emphasis on knowledge, evidence and prediction rather than cold statistical analysis.

Real world applications

A case in point comes from clothing retailer Zalando. Faced with a complex, labour-intensive, time-consuming and expensive distribution problem, Zalando turned to Bayesian statistics to simplify its operations.

Each item shipped from the Zalando warehouse needed to be weighed manually to identify postage costs. Naturally, when dealing with such a large logistical operation, the manual approach was eating into valuable resources – both labour and money were simply being wasted.

Zalando’s idea was simple: its data scientists were to make use of a rich vein of existing data automatically generated when shipping parcels through its chosen logistics partners.

However, its initial formulas produced wildly varying results, some of which implied that parcels weighed less than zero grams. It seemed that Zalando’s simple idea was in need of something a little more complex.

Thankfully, along came the wonder of the Bayesian model. In simple terms, Zalando’s data scientists began to incorporate data from a range of different sources, including such seemingly exotic information as the many different packaging materials in use. This, combined with a confidence interval (in which the data is checked against a “common sense” scale), began to generate surprisingly accurate results.
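The account above does not spell out the model, so the following is only a minimal sketch of the general idea, not Zalando’s formula. It computes a grid-approximated posterior for one item’s weight, assuming each parcel reading equals the item weight plus a known packaging weight plus Gaussian noise, with a normal prior truncated to positive weights. All readings, noise levels, prior settings and function names are hypothetical. The 95% interval it reports is a Bayesian credible interval, the Bayesian counterpart of the confidence interval mentioned above.

```python
import math


def grid_posterior(measurements_g, packaging_g, noise_sd_g, prior_mean_g, prior_sd_g,
                   grid_max_g=3000):
    """Posterior over one item's weight on a 1 g grid (hypothetical model).

    Assumed model, for illustration only:
      parcel reading = item weight + packaging weight + Gaussian noise
      prior on item weight: normal(prior_mean_g, prior_sd_g), truncated to > 0 g
    """
    grid = list(range(1, grid_max_g + 1))  # only positive weights are possible
    log_post = []
    for w in grid:
        # Log prior (normal, up to a constant); the grid itself enforces w > 0.
        lp = -0.5 * ((w - prior_mean_g) / prior_sd_g) ** 2
        # Add the log likelihood of each parcel reading given item weight w.
        for y in measurements_g:
            resid = y - (w + packaging_g)
            lp += -0.5 * (resid / noise_sd_g) ** 2
        log_post.append(lp)
    # Normalise in a numerically stable way (subtract the max before exp).
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]


def summarise(grid, probs, level=0.95):
    """Posterior mean and an equal-tailed credible interval from grid probabilities."""
    mean = sum(w * p for w, p in zip(grid, probs))
    tail = (1 - level) / 2
    cdf = 0.0
    lower = upper = None
    for w, p in zip(grid, probs):
        cdf += p
        if lower is None and cdf >= tail:
            lower = w
        if upper is None and cdf >= 1 - tail:
            upper = w
    return mean, (lower, upper)


# Hypothetical readings (grams) for three parcels, each holding one unit of the same item.
readings = [120, 260, 330]
packaging = 150  # hypothetical packaging weight, e.g. looked up from the packaging type

naive_first = readings[0] - packaging  # -30 g: a physically impossible weight
grid, probs = grid_posterior(readings, packaging, noise_sd_g=80,
                             prior_mean_g=300, prior_sd_g=200)
mean, (lower, upper) = summarise(grid, probs)
print(f"naive estimate from first reading: {naive_first} g")
print(f"posterior mean: {mean:.0f} g, 95% credible interval: {lower}-{upper} g")
```

Because the grid contains only positive weights, the posterior can never suggest a parcel weighing less than zero grams, whereas the naive subtraction on the first reading does. A real system would estimate many items jointly from multi-item parcels; a simple grid works here only because there is a single unknown.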


The payoff

With this increase in accuracy, Zalando was able to automate the entire weighing process, increasing efficiency and saving millions of euros at the same time.

Any business thinking of applying Bayesian statistics to similar problems can find a more detailed account of Zalando’s formula on its company blog.

For Zalando, the process might not have been as simple as originally conceived, but the payoff has most definitely been worth the headache.


Ben Rossi

Ben was Vitesse Media's editorial director, leading content creation and editorial strategy across all Vitesse products, including its market-leading B2B and consumer magazines, websites, research and...
