I read with some interest, and a fair bit of astonishment, Richard Lee’s recent column alleging that data science is dead. Of course, everyone is entitled to an opinion, but when the central premise of the point you are making is inherently faulty, you should expect a strong rebuttal.
Richard’s first idea that ‘we are starting to see big data wane’ confuses an abatement in media hype with the actual statistics on the ground. Big data as an industry sector is growing, and growing fast – A.T. Kearney forecasts global spending on big data hardware, software and services will grow at a CAGR of 30% through 2018, reaching a total market size of $114 billion.
Big data was projected to be a $28.5 billion market in 2014, growing to $50.1 billion in 2015, according to Wikibon. Forbes published an excellent round-up of numerous studies and surveys into current and long-term trends in big data. To save you some time reading it, all the graphs are going up. Big data is alive and kicking.
Regarding the actual definition of data science used in Richard’s column – to be blunt, it’s wrong. Any good data science is based on the rigour of statistics. It was born from the merging of methodologies, techniques and ideas from both statistics and computer science.
From statistics, data science inherited, among other things, methods for designing experiments, testing hypotheses and dealing with auto-correlated data (e.g. predicting stock market movements). From computer science, data science inherited all the advances in database design and the innovations in machine learning.
Data science is definitely not about blindly adding loads of data to solve a problem – it’s about the art form of intuitively understanding a problem, imagining how to translate that into something more mathematical, applying the best techniques out there to get an answer and, finally, translating that back into something intuitive that a non-techie can understand.
The idea that data science is untested ignores the fact that many companies are using it successfully. Granted, the sector is in its infancy and as such hasn’t yet gained mass adoption, but that is changing. There’s simply no evidence of a decline in demand for data science. Going by my own experience at Profusion, and by the number of jobs recently posted for in-house data scientists, demand is actually increasing.
You don’t have to take my word for it – Capgemini posted a wonderful round-up on the health of the data science sector. The research noted a report by Transparency Market Research that revealed that the global market value of Hadoop is set to rise from $1.5 billion in 2012 to $20.9 billion in 2018. Given that Hadoop is one of the most popular data storage and processing platforms used by data scientists, that’s not bad growth in a dead or dying sector.
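For readers who want to see what that Hadoop forecast implies as a yearly growth rate, here is a minimal sketch in Python. The function name `implied_cagr` is my own, and the calculation assumes the quoted figures span six full years (2012 to 2018) of steady compound growth, which the report itself may not claim.

```python
def implied_cagr(start_value, end_value, years):
    """Return the compound annual growth rate implied by two values.

    Hypothetical helper for illustration: assumes smooth compounding
    over `years` whole periods between the two figures.
    """
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must all be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of dollars; the six-year span
# (2018 - 2012) is an assumption about how the forecast is measured.
hadoop_cagr = implied_cagr(1.5, 20.9, 2018 - 2012)
print(f"Implied Hadoop market CAGR: {hadoop_cagr:.1%}")
```

Under those assumptions the implied rate comes out at roughly 55% a year, comfortably above the 30% A.T. Kearney forecasts for big data as a whole.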
I do actually agree with Richard’s position on encouraging citizen scientists. The open data movement is wonderful. Additionally, the democratisation of statistical knowledge through the likes of Coursera and other MOOCs (massive open online courses) opens up the opportunity for many more people to become citizen scientists if they have that inclination.
However, there is still the hurdle of people being able or willing to dedicate enough time to learn the basics of statistics. That’s not to say that ordinary citizens aren’t able to do some data explorations, but without statistics it will have to be very basic.
Citizen scientists are therefore not going to replace data scientists. The level of skill currently required to make sense of big data necessitates highly skilled professionals. If you wouldn’t expect someone trained in first aid to be able to do open heart surgery, why would you expect someone with limited experience of data to do a better job than someone who has dedicated their life to the profession?
I don’t normally feel moved to respond directly to critiques of my profession. However, when the central premise of an argument against data science is built on entirely incorrect assumptions, it is only right to set the record straight. Data science is doing fine.