Big data as a concept is defined around four aspects: data volume, data velocity, data veracity and data value.
Two patterns emerge when these characteristics are examined closely. While the volume and velocity aspects concern the data generation process and how to capture and store the data, the veracity and value aspects deal with the quality and usefulness of the data.
Data management is a major challenge for most enterprises – even small data is plagued by quality and management issues.
In addition, the digital world is generating new sets of data from different sources (mostly the web), in both structured and unstructured formats.
If businesses go by the volume and velocity aspects alone, this qualifies as a big data problem. In reality, however, a lot of this data comprises ‘noise’ (information or metadata with little or no real value for the enterprise).
The purpose of smart data (veracity and value) is to filter out the noise and retain the valuable data, which the enterprise can then use effectively to solve business problems.
If businesses take the smart data approach, they can argue that bigger isn’t always better. For a predictive model, will a simple random sample suffice?
What’s the marginal impact on a predictive model’s accuracy if it is trained on five million rows versus 10 billion rows? Statistically speaking, the marginal impact is negligible.
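The diminishing returns are easy to demonstrate. The sketch below (a minimal illustration using scikit-learn on synthetic data, not a benchmark of any real system) trains the same model on progressively larger random samples and compares held-out accuracy:

```python
# Minimal sketch: diminishing returns of training-set size on accuracy.
# Synthetic data and logistic regression are used purely for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A large synthetic pool of labelled rows stands in for the full data set.
X, y = make_classification(n_samples=200_000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for n in (1_000, 10_000, 100_000):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_pool[:n], y_pool[:n])   # train on a random sample of size n
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{n:>7,} training rows -> test accuracy {acc:.4f}")
```

Beyond a certain sample size, the accuracy curve typically flattens, which is exactly the argument for sampling smartly rather than training on everything.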
How does big data become smart data?
There are no formulas, but one has to better understand the clues in the questions asked of the data. Analysing data qualitatively not only makes one data-driven but also creates opportunities to become creatively driven. And this is where big data can become smart data.
Instead of just looking at the numbers and making wild guesses about why something works or doesn’t, people who work with data have to humanise it and essentially become ‘data whisperers’.
It is the skill of analysing the quantitative and qualitative aspects of data together. Businesses have to let the data tell its story, removing as much of their own bias as possible.
Having lots of data is not enough. The key is to seriously question the data – is the data uniform and regular? Can it be easily extracted and analysed? Is there a significant amount of variation? Is it embedded in a mass of other irrelevant information?
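These questions can be made concrete. A minimal sketch (assuming a hypothetical table of visitor events in visitor_events.csv; the column handling is illustrative) might translate them into basic checks:

```python
# Minimal sketch: turning the questions above into concrete checks.
# Assumes a hypothetical table of visitor events in visitor_events.csv.
import pandas as pd

df = pd.read_csv("visitor_events.csv")  # hypothetical input file

# Is the data uniform and regular? Inspect types and missing values.
print(df.dtypes)
print(df.isna().mean().sort_values(ascending=False))  # fraction missing

# Is there a significant amount of variation? Near-constant columns
# carry little signal for analysis or modelling.
numeric = df.select_dtypes("number")
print(numeric.std() / numeric.mean().abs())  # coefficient of variation

# Is it embedded in a mass of irrelevant information? Drop columns
# that are constant or mostly empty before going further.
keep = [c for c in df.columns
        if df[c].nunique() > 1 and df[c].isna().mean() < 0.5]
df = df[keep]
```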
Data interpretation should not be a random activity; it should increasingly point to clear solutions and actionable tasks, and the benefit of each interpretation should be weighed before acting on it.
The collection and exploitation of data are meaningful only when they are used to optimise and automate solutions and solve problems (data-driven decision-making).
There are numerous examples where even changing the colour of a button on a web page measurably leads to higher conversions.
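Verifying such a lift is a small, well-understood exercise. As a minimal sketch (the visitor and conversion counts below are invented for illustration), a two-proportion z-test answers whether the new colour genuinely converts better:

```python
# Minimal sketch: testing whether a button-colour change lifted
# conversions. The visitor and conversion counts are invented.
from statsmodels.stats.proportion import proportions_ztest

conversions = [580, 652]        # control (old colour), variant (new colour)
visitors = [10_000, 10_000]     # visitors exposed to each version

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the lift is unlikely to be noise alone.
```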
So the objective is not only to understand and link together the various activities visible in the data, but also to improve the performance of an existing process, or to develop capabilities to predict the next set of outcomes.
This essentially means that the focus should not just be on collecting a vast amount of all possible data, but also on placing each piece of data in its specific context.
Data needs to be understood and interpreted in context. For example, what is the value of knowing that a website visitor clicked on a link if the context that precedes and follows the click is not known?
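One common way to recover that context is sessionisation: grouping raw clicks into sessions so that each click carries its neighbours. A minimal sketch (the file and column names, clickstream.csv, user_id, ts and page, are hypothetical):

```python
# Minimal sketch: sessionising raw clicks so each click carries its
# surrounding context. File and column names are hypothetical.
import pandas as pd

events = pd.read_csv("clickstream.csv", parse_dates=["ts"])
events = events.sort_values(["user_id", "ts"])

# Start a new session after 30 minutes of inactivity for the same user.
gap = events.groupby("user_id")["ts"].diff() > pd.Timedelta(minutes=30)
events["session_id"] = gap.groupby(events["user_id"]).cumsum()

# Each click now has context: the pages visited immediately before
# and after it within the same session.
grouped = events.groupby(["user_id", "session_id"])["page"]
events["prev_page"] = grouped.shift(1)
events["next_page"] = grouped.shift(-1)
```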
Does this mean big data is dead? Not really. Understanding and having a complete view of user behaviour is critical, and here big data plays a key role.
If the need is a real-time perspective on user behaviour across channels of interaction, broken out by demographic or geographic attributes, then why discard useful data? You should go big.
However, if a machine-learning algorithm can give product recommendations using modest data sets, why take the big data route?
Approaching data science intelligently doesn’t necessarily mean everything has to revolve around the notion of big data. It just means knowing when to pull out the Swiss army knife instead of a chainsaw.
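To make the ‘modest data’ point concrete, here is a minimal sketch of item-to-item recommendations via cosine similarity on a tiny, made-up purchase matrix; no big data machinery is required:

```python
# Minimal sketch: item-to-item recommendations from a modest data set,
# using cosine similarity on a tiny, made-up user-item purchase matrix.
import numpy as np

#                    A  B  C  D
ratings = np.array([[1, 1, 0, 0],   # user 1 bought A and B
                    [1, 0, 1, 0],   # user 2 bought A and C
                    [0, 1, 1, 1],   # user 3 bought B, C and D
                    [1, 1, 1, 0]],  # user 4 bought A, B and C
                   dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / np.outer(norms, norms)
np.fill_diagonal(sim, 0)            # ignore self-similarity

# Score unseen items for a user by similarity to the items they own.
user = ratings[0]                   # user 1
scores = sim @ user
scores[user > 0] = -np.inf          # mask already-owned items
print("recommended item index:", int(np.argmax(scores)))
```

The same idea scales up with sparse matrices, but for many catalogues a data set of this modest shape is already enough to produce useful recommendations.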
The main objective is to move from a data management organisation culture (struggling to manage all kinds of data) to a learning organisation culture (leveraging all the value behind the data).
Sourced from Soumendra Mohanty, global head of data and analytics, Mindtree