It is a job role that has been glamourised in recent years, with some even describing it as the ‘sexiest job of the 21st century’. No, not playing James Bond – the data scientist, known for using cutting-edge, open-source technology.
Raising the profile of the role has been essential to attracting more people to the industry at a time when there is a worrying skills shortage. But data scientists will only survive in the enterprise if they gain a reputation for helping organisations turn structured and unstructured data into real, measurable business benefits.
Data scientists’ time in the spotlight will be short-lived if they become known for jumping on the latest technology without purpose.
Of course, technology plays an important role, but data scientists should ensure analytics are truly client-focused by adopting the technology required to provide the richest answer.
The new-age data scientist uses lateral thinking to solve real business problems instead of wasting precious time and resources. This means that every new technology must be evaluated, analysed and tested to determine the real value it offers the business.
One of the areas that data scientists get excited about is open source – this is a natural fit because of the exploratory nature of data science that leads to a high level of user collaboration.
It also happens that many of the ‘hot’ new technologies on the market emerge through this movement. Established tools for data management and analysis such as Hadoop and Cassandra are both open source and offer huge gains in flexibility and cost.
It’s not just about cost, though – be it how much you save or how much you spend. Open standards are helping to create a level playing field, giving companies of all sizes the opportunity to connect different tools together.
This means businesses are able to combine the capabilities of open source and commercial technology and enjoy the benefits of both.
Exciting tools continue to arrive on the scene every day aiming to replicate the success of these established technologies, which is never easy. And some do have the potential to fundamentally change the way organisations do business – take, for example, the success of technologies like Python and Spark in the data science field.
The reality is that newer open-source technology isn’t necessarily mature enough to create a stable platform for the enterprise, and organisations need to adopt a cautious approach.
Security concerns and a lack of support may also hold enterprises back from using the latest technology on the market. It’s important that business leaders understand the risks involved in implementing new technology that is not ready or not secure enough to be used. It’s all well and good installing the latest tyres on your car, but if they fall apart when you’re on the road, you’re in big trouble.
Once the right piece of technology has been identified for the business, the focus should immediately switch to how to implement it and maximise its impact, so the business doesn’t stall once the technology is up and running.
To maximise the opportunities of innovation through collaboration, while meeting the enterprise need for reliability and ease of use, more and more start-ups are creating enterprise wrappers to go around open-source tools – for example, Wakari.
Revolution Analytics, which was recently acquired by Microsoft, is another company making waves, providing software and services for R, a programming language that is widely used across the world for statistical computing and predictive analytics.
With much more control and security, and still lower cost than proprietary enterprise software, this approach strikes the balance most enterprises need between cutting-edge technology and stability.
Bringing together the best of open source with commercial awareness has a much greater combined effect at a fraction of the cost.
The importance of data scientists will become increasingly evident as they analyse the latest products to roll off the production line. The best enterprise data scientists will not be distracted by ‘the next big thing’ and will keep their focus on the tools that help to gain the richest possible insights for the business.
At a time of huge innovation, driven by open source in particular, data scientists need to remember that the best-fit technology for the enterprise is not always the newest. The risk to a business is simply far too great to take a chance on installing the first shiny piece of new technology it sees.
Fundamentally, bleeding-edge technology must be balanced with the security and stability needs of a large enterprise.