Clive Humby, inventor of the Tesco Clubcard, on ways to stop feeling so overwhelmed by data, how to convince your CEO of its importance, and why data should look forward and not backwards.
Clive Humby came up with a brilliant idea. What if you combined those old-fashioned Green Shield stamps, where you licked hundreds of stamps into a book to afford, say, a kettle, with a personalised record of what you’d actually bought?
The understanding was that, in exchange for your shopping history, the supermarket would give you special offers and discounts.
However, back in the early Nineties, none of the supermarkets really had any idea of what their customers bought.
In 1994 Tesco, then the second-most popular supermarket after Sainsbury’s, wanted to create a new loyalty card. The executive in charge of developing it heard computer scientist Clive Humby speaking at an event and approached him afterwards.
Tesco agreed to trial the Clubcard, developed by Humby’s company dunnhumby, for three months across nine stores that same year, after which the team were asked to present their findings to the Tesco board. A gulp-inducing moment. At the end of the presentation, it was Tesco’s then-chairman Lord MacLaurin who broke an awkward minute-long silence. He said, “What scares me about this is that you know more about my customers after three months than I know after 30 years.”
Today, Clive Humby is a data science entrepreneur and professor of data science at Sheffield University. Information Age sat down with him to discuss how to stop feeling overwhelmed by data, how to use data in the right way, and why data needs to sit in the boardroom rather than be shunted off to IT or accounts.
‘We have to look at data being a predictive set of information that tells us what’s changing and what we need to do’
Clive Humby
When did you get interested in data, and what was it about data that appealed to you?
I started my career in data science in 1976. I went to work for a big American defence contractor that was looking to locate army recruiting stations at the end of the draft in America, and we started using census data to predict how many recruits we could get in the suburbs of major cities around America. That then became retail location science and led to the development of Acorn, the first of the geodemographic systems, which classified you based on where you lived. You have to remember that in those days data was extremely expensive to store, so a lot of my early work was about how to reduce large data sets, using techniques such as cluster analysis, so that the data kept its integrity but shrank dramatically. We’re in a very different age now. We have too much data, nobody tries to reduce it any more, and as a result people get into a different sort of pickle.
I worked for a company called CACI for 13 years, where we developed the first direct-marketing targeting using Acorn, as well as retail location studies and some of the first maps of customer locations.
Do most enterprise businesses know how to handle data properly?
No, I don’t think they do. They see data as a technology problem. Of course, technology is very important to the management of data, but for data to be effective you have to put it into what I call context.
I’ll use a simple example. Let’s say you are a fashion retailer selling a black dress. At the start of the season, the black dress is at full price. It is new, and the people who buy it then are very different to the people who buy it in week 12, who in turn are very different to the people who will buy it in week 26, when it goes into the January sale.
So the same product changes its nature over time, and that is true of most data sets. Most data sets have to be put into the context of the day or time the data was captured. Going back to my Tesco days, weather is a major factor. Sales of ice cream and soda soar on hot days, and sales of hearty meals on cold days, but you wouldn’t know that just by looking at sales figures. To understand what is going on, you have to look at the data in the context of the time period. Most data means different things on different days, and that is where a lot of people come unstuck. That is why it’s important not just to think about the raw data item, but to think about the metadata you can create around it. Data reduction techniques are the most important part of that journey.
Can you define what you mean by a data reduction technique?
A data reduction technique might be this: it doesn’t really matter to me exactly what you buy; perhaps I am just interested in the fact that you’re buying hard fruit. Data reduction techniques are about creating new, more relevant data from the raw transactional information.
Another example: we all use our credit cards when we buy food, but spending £200 at an expensive restaurant is very different to spending £50 in a McDonald’s. The first is not very remarkable, while the second is exceptional because, for that kind of outlet, it’s so big.
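As an illustration of creating new, more relevant data from raw transactions, here is a minimal Python sketch. It is not a description of Humby’s or dunnhumby’s actual method: the merchant categories, the typical-spend figures in `TYPICAL_SPEND_GBP` and the 3x threshold are all hypothetical. Each transaction is reduced to its amount relative to what is normal for that kind of outlet, which is what makes £50 in a fast-food restaurant stand out while £200 at an expensive restaurant looks ordinary.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    merchant_type: str
    amount_gbp: float


# Hypothetical typical spend per visit for each merchant type (illustrative figures only).
TYPICAL_SPEND_GBP = {"fine_dining": 180.0, "fast_food": 8.0}


def relative_spend(tx: Transaction) -> float:
    """Reduce a raw amount to a context-aware feature: spend relative to the norm for that merchant type."""
    return tx.amount_gbp / TYPICAL_SPEND_GBP[tx.merchant_type]


def is_exceptional(tx: Transaction, threshold: float = 3.0) -> bool:
    """Flag a transaction at least `threshold` times the typical amount (the threshold is an assumption)."""
    return relative_spend(tx) >= threshold


restaurant = Transaction("fine_dining", 200.0)
burger = Transaction("fast_food", 50.0)
for tx in (restaurant, burger):
    print(tx.merchant_type, round(relative_spend(tx), 2), is_exceptional(tx))
```

The design choice here is to keep a ratio instead of the raw amount; in practice the typical-spend baseline would be estimated from the data rather than hard-coded.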
When I talk about data reduction techniques, I mean: how do you take the fine detail and give it the context and shape to interpret what it really means? A lot of people rely on machine learning and AI to do that, but the problem is that it tends to reduce data in a way you don’t understand. You really need human intuition to understand what the patterns mean.
So, say I’m an overwhelmed data manager with millions of data points coming in. What should be my north star when deciding what to do with all this data?
You really need to think about three things. First, what do you really need? In the grocery world, the past four weeks’ transactions compared with year-on-year sales are much more insightful than having everything, because you want to know what’s changed. How do sales compare from this Easter to last Easter, or this Christmas to last Christmas? It’s about understanding relative movement in the data.
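A minimal Python sketch of “relative movement”: rather than keeping every transaction, compare a period with the same period a year earlier and keep only the percentage change. The season labels, years and sales totals are invented for illustration, and the sketch assumes each season has already been aggregated into a single total.

```python
from typing import Dict, Tuple

# Hypothetical sales totals in GBP, keyed by (season, year). Figures are invented.
SALES: Dict[Tuple[str, int], float] = {
    ("easter", 2023): 120_000.0,
    ("easter", 2024): 132_000.0,
    ("christmas", 2023): 410_000.0,
    ("christmas", 2024): 389_500.0,
}


def year_on_year_change(sales: Dict[Tuple[str, int], float], season: str, year: int) -> float:
    """Fractional change in a season's sales versus the same season in the previous year."""
    current = sales[(season, year)]
    previous = sales[(season, year - 1)]
    return (current - previous) / previous


for season in ("easter", "christmas"):
    print(season, f"{year_on_year_change(SALES, season, 2024):+.1%}")
```

Keeping the change rather than the two absolute totals is itself a small act of data reduction: it is the number that says what has moved.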
The second thing is to reduce the level of granularity in your data into what I call “baskets of interest”. I am much more interested in the mix of groceries you buy than in individual items.
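The idea of “baskets of interest” can be sketched in Python as rolling item-level purchases up into category shares. The product-to-category mapping `CATEGORY_OF` is hypothetical, and counting items rather than spend is an assumption made to keep the example short; a real system might weight by spend or quantity instead.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from individual products to broader "baskets of interest".
CATEGORY_OF = {
    "lasagne ready meal": "ready meals",
    "chicken tikka ready meal": "ready meals",
    "dried penne": "italian ingredients",
    "parmesan": "italian ingredients",
    "apples": "hard fruit",
    "pears": "hard fruit",
}


def basket_mix(items: List[str]) -> Dict[str, float]:
    """Collapse item-level purchases into the share of items in each category.

    Items missing from CATEGORY_OF are grouped under "other".
    """
    counts = Counter(CATEGORY_OF.get(item, "other") for item in items)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty shop has no mix
    return {category: count / total for category, count in counts.items()}


shop = ["lasagne ready meal", "chicken tikka ready meal", "dried penne",
        "parmesan", "apples", "bin bags"]
for category, share in basket_mix(shop).items():
    print(f"{category}: {share:.0%}")
```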
The third thing: while you might have a warehouse with all your data in it, probably every decision you make will need less than half a per cent of that data. Don’t try to analyse all of your data, all the time. If you are looking for trends, you don’t need to look at all of the data; just look at 10 per cent of it. People tend to over-engineer because the technology companies have told them to.
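To illustrate why a sample can be enough for spotting trends, the Python sketch below draws a simple random 10 per cent sample of a synthetic customer base and compares the estimated share of ready-meal buyers with the figure from a full scan. The population, the 30 per cent buying rate and the choice of simple random sampling are all assumptions for illustration; small or rare segments may need a larger or stratified sample.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, bool]


def sample_share(population: List[Record], fraction: float,
                 predicate: Callable[[Record], bool], seed: int = 0) -> float:
    """Estimate the share of records satisfying `predicate` from a simple random sample."""
    rng = random.Random(seed)
    k = max(1, int(len(population) * fraction))
    sample = rng.sample(population, k)  # sampling without replacement
    return sum(1 for record in sample if predicate(record)) / k


# Synthetic population: 100,000 hypothetical customers, roughly 30% of whom buy ready meals.
gen = random.Random(42)
customers = [{"buys_ready_meals": gen.random() < 0.30} for _ in range(100_000)]

full = sum(c["buys_ready_meals"] for c in customers) / len(customers)
estimate = sample_share(customers, 0.10, lambda c: c["buys_ready_meals"])
print(f"full scan: {full:.3f}, 10% sample: {estimate:.3f}")
```

With 10,000 sampled customers the estimate typically lands within about a percentage point of the full-scan figure, at a tenth of the processing.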
So, despite all these vast datasets grinding away, it’s the human touch and human understanding that’s important?
The way I always thought about it was that my job was to watch you unpacking your shopping when you got home and understand how you live your life. I don’t actually need to know everything about the products; the individual items are not important. I’m much more interested in product categories, such as whether you like ready meals or buy Italian food ingredients.
So don’t get hung up on the atomic level. Rely on a small sample of data: just 1-2 per cent can be incredibly insightful, compared with being overwhelmed by millions of data points. The important thing is to get the answer to the question you’re asking.
That’s the single biggest challenge, and that’s where most people go wrong. They don’t have a metadata strategy. They don’t think about which descriptors they want to build and how the data will build those descriptors.
You famously came up with the phrase ‘data is the new oil’. What would your advice be to an IT leader in a company who wants to convince the board about the importance of data?
Ultimately, a business exists to satisfy customer demand, whether it’s a B2B customer or a shopper, and the better you understand your customers, the better you will serve them. Data is all about understanding either your customers or your production cycle – say, predicting failure on a production line. Data can predict nearly everything about running a business. The problem is that historically we’ve looked at data retrospectively. It’s been the province of the finance department, acting as a check on whether we’ve done a good job. That’s how data has traditionally been seen. We have to look at data as a predictive set of information that tells us what’s changing and what we need to do, and then we will use the data in a very different way.
Two things. Don’t put data under finance, because they will make it backward-looking, and don’t put data under IT, because they will treat it as a storage issue rather than a business value-added issue.
If it’s not stored with finance and it’s not stored with IT, where should it go? Where is the natural home of data in a business?
The boardroom. Data is the dashboard. Any organisation that doesn’t have a chief data officer is not doing a very good job. I would rather have a chief data officer than a CTO.
You have talked about using data predictively, to see where your business is going. Looking into your crystal ball, how is the predictive role of data going to change how we do business?
The biggest thing that’s coming round the corner is the Internet of Things. We are already driving cars that are collecting a huge amount of data about our driving behaviour. At the moment, that’s only being used to predict when your car needs to be serviced or to understand impending part failure. The Internet of Things is going to generate a huge amount of information about how we live our lives. Eventually Samsung and other manufacturers, with all their smart devices, are going to have far more data about us than the Googles and Amazons of this world.
The other change is that we are going to see reams of open data coming out of government, and that gives us a much clearer way of understanding the context of what we see. As we start to understand how all these data sets cross-reference, we’ll be in a world where data becomes ubiquitous.
Obviously, the big question coming out of that is how we control our data as individuals. I think most people are quite rightly worried about data being used against their best interests. But most organisations don’t care about you as an individual; they care about how many people exhibit a certain behaviour, so they can build products and services for them. There’s a fundamental difference between quantifying markets and predicting individuals.
Are you saying that we should not be so worked up about our data being held because it can be anonymised or that people need to understand that they themselves are now the product and they are important to companies?
I’m saying both. We all need to take more responsibility for our data, but we will see trusted parties emerging that will offer this as a service. Of course, we’re all nervous about our data being exploited, but we have to appreciate that, as a society, UK plc needs to understand the patterns and trends in data to basically make the world a better place. It’s a compromise. For example, if you’re wearing a Fitbit, would you want your health tracker to alert an ambulance if you’re having a heart attack? Of course you would. But would you want the same data to be sent to an insurance company? It’s the same piece of data, but with two very different kinds of value. What are the ethics of that? Where we draw the line will become one of the biggest challenges that we, as a society, face.
Finally, do you think that businesses take the security of data seriously enough? I’m not talking about a Unilever or a Heinz, but smaller firms. All of us seem to be in the data collection business.
Every business is in the data business. We have to be realistic about our expectations when it comes to smaller businesses using our data and where they should draw the line.