These days, very few businesses have the luxury of standing still – especially one like bet365. We operate in an incredibly competitive and fast-moving industry. Add to that the zero cost of entry for the consumer, and any interruption in the experience risks losing your audience.
It's not surprising then, that there's zero tolerance for glitches at bet365. But maintaining the peak performance of our systems comes at a cost. It's what I call the 'Scale/Innovation Dilemma'.
As well as ensuring our systems can scale to meet demand, we are under constant pressure to develop leading-edge products that will excite customers and continue to promote growth. The problem is that when you're seeing the kind of explosive growth that we are, you end up spending most of your time re-engineering existing systems to cope with the ever-increasing loads.
And when your developers are having to run harder and faster, just to keep up with the problems of scale, it becomes increasingly difficult for them to find time to innovate as well.
The challenge of scaling our systems was the reason the R&D team was set up three years ago. Back in 2012, we could see that our existing systems would not scale to meet user demand moving forward. So, Martin Davies, CEO of bet365's technology business, brought together a small group of developers who would work on finding solutions to the difficult problems of scale and reliability.
The team would exist outside the commercial business so that it could devote all of its resource to the problems at hand.
The main remit of the R&D team is to challenge the conventional way of doing things and to look at different ways of solving problems. To do that, we look at both new technologies and re-purposing old technology.
For our massively parallel processing problems, it's an older technology rather than a new one that has provided the solution.
Although it is only now earning column inches, Erlang is actually a very mature technology. It was developed at Ericsson in the late 1980s for telephone switches. From an IT perspective, it's interesting that the problems telephone switch providers had to deal with back then are very similar to the kind of problems we have today – problems of reliability, scalability and simplicity.
With Erlang we’ve found that we can massively reduce the complexity of our code base and build relatively simple systems at pace. Hence the switch from Java.
Our first big success with Erlang was on our system that pushes data to customers using our In-Play product. We have 17 million customers worldwide and currently, during peak times, we are making up to 100,000 changes to the system every second, whilst serving over 2.5 million concurrent connections.
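To make the shape of that workload concrete, here is a minimal sketch of a push fan-out: every connection gets its own lightweight task and inbox, and each change is copied to all of them. It is written in Python with `asyncio` purely for illustration and is not bet365's code. The names `PushHub`, `connection_task` and `demo`, and the layout of the change record, are all hypothetical.

```python
import asyncio


class PushHub:
    """Fans each change out to every connected subscriber (hypothetical design)."""

    def __init__(self):
        # connection id -> asyncio.Queue of pending changes
        self._subscribers = {}

    def connect(self, conn_id):
        queue = asyncio.Queue()
        self._subscribers[conn_id] = queue
        return queue

    def disconnect(self, conn_id):
        self._subscribers.pop(conn_id, None)

    def publish(self, change):
        # Enqueue without blocking, so a slow connection never stalls the publisher.
        for queue in self._subscribers.values():
            queue.put_nowait(change)


async def connection_task(queue, received):
    # One lightweight task per connection, loosely analogous to
    # giving every connection its own Erlang process.
    while True:
        change = await queue.get()
        if change is None:  # sentinel: the connection is closing
            return
        received.append(change)  # a real system would write to the socket here


async def demo(n_connections=1000, n_changes=5):
    hub = PushHub()
    inboxes = [[] for _ in range(n_connections)]
    tasks = [
        asyncio.create_task(connection_task(hub.connect(i), inboxes[i]))
        for i in range(n_connections)
    ]
    for seq in range(n_changes):
        # Illustrative change record; the fields are assumptions.
        hub.publish({"seq": seq, "market": "example", "odds": 2.0 + seq / 10})
    hub.publish(None)  # tell every connection task to stop
    await asyncio.gather(*tasks)
    return inboxes


if __name__ == "__main__":
    inboxes = asyncio.run(demo())
    print(len(inboxes), "connections each received", len(inboxes[0]), "changes")
```

The sketch runs on a single thread, so it only illustrates the programming model. Erlang schedules its processes across all available cores and isolates their failures from one another, which is a large part of why it suits this kind of system.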
In the past we used a third-party product that worked well but was becoming more complex to tailor to our needs. It was written in Java, and there was a lot of code to rewrite. On top of its technical complexity, it also required a lot of hardware to run.
My team was tasked with making it more efficient, reliable and flexible, simplifying the code environment and reducing the hardware needed to run it.
Martin had followed Erlang’s use in large-scale middleware projects and suspected it could work for us because it was built for distribution, reliability and concurrency.
We were sceptical at first. Coding in Erlang requires a different approach to Java and, for people who have built a career as Java developers, it meant changing an ingrained mindset. That scepticism, however, didn't last long. The team picked up and adapted to Erlang very quickly and within a few weeks we were convinced it was the right tool for the job.
It took us six weeks to produce a working prototype of the new Push product despite the need to learn a new language and write the new system from scratch. It took us a few weeks more to refine the system and get it into production.
The results were amazing. We've achieved a profound reduction in the complexity of the code and maintenance of the system. We went from tens of thousands to hundreds of thousands of users supported on a single machine.
Just as importantly, we also saw a significant increase in the speed of product development and delivery.
As a result, Erlang has played an essential role in the creation of our new Cash-Out product, launched in 2014. This feature allows users to close a bet early, before an event has finished, and it requires massive computation of odds in real time.
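To give a feel for the arithmetic behind closing a bet early, here is a simplified sketch in Python. It is not bet365's pricing model. It assumes decimal odds and the common textbook approximation that an open bet is worth its potential return divided by the current odds. The function names and the `margin` parameter are hypothetical.

```python
def fair_cash_out(stake, odds_at_placement, current_odds):
    """Simplified fair value of an open bet, using decimal odds.

    Assumption: the chance of winning is roughly 1 / current_odds, so the
    expected return is (stake * odds_at_placement) / current_odds.
    """
    if stake <= 0 or odds_at_placement <= 1 or current_odds <= 1:
        raise ValueError("stake must be positive and decimal odds must exceed 1")
    potential_return = stake * odds_at_placement
    return potential_return / current_odds


def cash_out_offer(stake, odds_at_placement, current_odds, margin=0.05):
    """Fair value reduced by a hypothetical operator margin, rounded to pence."""
    fair_value = fair_cash_out(stake, odds_at_placement, current_odds)
    return round(fair_value * (1 - margin), 2)


if __name__ == "__main__":
    # A 10.00 bet placed at 3.0; the selection has since shortened to 1.5.
    print(fair_cash_out(10.0, 3.0, 1.5))
    print(cash_out_offer(10.0, 3.0, 1.5))
```

A single calculation like this is trivial. The difficulty is repeating it for every open bet each time the odds move, which is where the real-time computation load comes from.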
More recently, Erlang has taken an important role in our migration to NoSQL. Although attractive, migrating to NoSQL is not straightforward. When your legacy systems are all built on SQL, you can't just switch to NoSQL overnight, because re-engineering the existing system and its multi-million lines of code would take lifetimes of coding hours.
By integrating our Erlang systems with Basho's NoSQL system, Riak (also written in Erlang), we are able to build a NoSQL system in tandem with our core technology platform. Moving forward, this is likely to be the real answer to our scale/innovation dilemma.
While I would not suggest that Erlang will replace Java throughout all of our systems, it is true to say that the adoption of this technology has been the key to meeting the challenge of scale and ushering in a new era of innovation at bet365.
The small amount of pain in getting used to the new language has been worth it, and we now have a new tool that will play a major role in the development of our next-generation betting platform.
Having been through it and come out the other end with the successes we have, my advice to other software development teams would be to look outside your Java comfort zone and instead find the right tool for the right job.