When the Ebola virus started spreading and we realised we were dealing with a major outbreak, VBI’s rapid response team of 30 researchers and scientists initially provided the U.S. Department of Defense and West Africa’s Ministries of Health (MOH) with short-term forecasts of how the disease would spread.
However, as the number of Ebola cases climbed, VBI moved to agent-based computational modeling to provide more in-depth analysis of how the disease might spread.
According to statistics released by the U.S. Centers for Disease Control and Prevention, there had been around 22,000 reported cases of Ebola by February 2015, and the disease had claimed the lives of more than 9,000 people.
To help rapid response efforts mitigate the spread beyond the affected regions, it was critical that we understood how Ebola passes between individuals, while also monitoring outbreaks and infection clusters.
The only way to do this accurately was to create an adaptable set of global synthetic populations, capturing detailed demographics, family structures, travel patterns and activities that could be used to model what might happen as the disease spread.
The synthetic data was created so that it mirrored actual census, social, transit and telecommunications data patterns from the target population, whilst omitting personally identifiable information. In effect, we built entire virtual cities at local, regional and global levels.
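To make the idea concrete, here is a minimal sketch, not VBI’s actual pipeline, of how a synthetic household might be sampled from aggregate, census-style distributions; the names, distributions and figures are illustrative assumptions only.

```python
# Illustrative sketch only; not VBI's actual synthetic-population pipeline.
# Households are sampled from aggregate, census-style distributions, so no
# personally identifiable information is used or reproduced.
import random

# Hypothetical aggregate inputs (marginal distributions for one region)
HOUSEHOLD_SIZE_DIST = {1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15, 5: 0.10}
AGE_BRACKET_DIST = {"0-14": 0.42, "15-64": 0.55, "65+": 0.03}
DAILY_ACTIVITIES = ["home", "work", "school", "market", "clinic"]

def sample_from(dist):
    """Draw one key from a {value: probability} distribution."""
    values, weights = zip(*dist.items())
    return random.choices(values, weights=weights, k=1)[0]

def make_household(household_id, region):
    """Create one synthetic household whose members exist only statistically."""
    size = sample_from(HOUSEHOLD_SIZE_DIST)
    members = []
    for i in range(size):
        members.append({
            "person_id": f"{region}-{household_id}-{i}",
            "age_bracket": sample_from(AGE_BRACKET_DIST),
            # A simple daily activity pattern stands in for detailed travel data.
            "activities": random.sample(DAILY_ACTIVITIES, k=2),
        })
    return {"household_id": household_id, "region": region, "members": members}

# Build a tiny virtual "city" of 1,000 households for one region.
city = [make_household(h, region="region-01") for h in range(1000)]
```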
VBI prides itself on having a unique ability to create these models with incredible breadth and depth of detail. The trade-off is that, on the computational side, we have to keep pace with a constant appetite for compute and storage capacity to produce that level of detail.
Ebola outbreak modeling – the technology
To support the outbreak modeling, we needed a mix of computations, which meant the compute and storage had to be both powerful and flexible enough to handle the various workloads. Some of the models we were running required a lot of data, while others needed a constant stream of information that had to be processed in parallel.
Just one instance of our global population model requires 10 terabytes of storage, and we might run a simulation 15 times with varying conditions, which means we really needed a storage system designed specifically for big data.
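To give a sense of what "varying conditions" means in practice, here is a hypothetical sketch of a parameter sweep; the run_simulation() function and the parameter choices are assumptions for illustration, not VBI’s actual modeling interface.

```python
# Illustrative sketch of a parameter sweep. The run_simulation() function and
# the parameter names are hypothetical, not VBI's actual modeling interface.
from itertools import product

# Vary the assumptions we are least certain about: transmissibility (R0) and
# the delay, in days, before interventions take effect.
r0_values = [1.5, 1.8, 2.0, 2.3, 2.6]
intervention_delays = [0, 14, 28]

def run_simulation(r0, intervention_delay_days):
    """Placeholder for a single agent-based run over the synthetic population."""
    ...

# 5 x 3 = 15 runs, each reading the same ~10 TB synthetic-population instance,
# which is why parallel, high-throughput storage mattered so much.
for r0, delay in product(r0_values, intervention_delays):
    run_simulation(r0, intervention_delay_days=delay)
```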
To address these challenges, we took advantage of our own high performance computing (HPC) system, Shadowfax. With 2,500 cores and nearly 1 petabyte of DDN storage, VBI’s computational modeling tools are designed to scale rapidly and deliver massive performance. So much so that creating a synthetic population of the U.S. now takes 12 seconds, and expanding the model to accommodate a global population of almost 7 billion people takes around 6 minutes. To put that into context, ten years ago it took us over an hour to simulate just one city.
As you can imagine, that’s a significant amount of data being created in a very short space of time, so we used high performance storage from DDN. Our environment is built on two DDN Storage Fusion Architecture (SFA) GRIDScaler Appliances, with IBM GPFS parallel file systems embedded within the storage controllers.
We also relied heavily on our own internally developed HPC modeling tools, as well as open-source Python data analysis tools such as pandas, to help the DoD’s Defense Threat Reduction Agency (DTRA) and West Africa’s Ministries of Health (MOH) determine the resources needed to combat the outbreak.
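As an illustration of the kind of analysis involved, not VBI’s actual workflow, here is a short, hypothetical pandas example that aggregates reported cases by district to give a rough sense of where treatment capacity is needed; the figures and the bed assumption are invented.

```python
# Hypothetical example of the kind of pandas analysis involved: aggregating
# reported cases by district to gauge where treatment capacity is most needed.
# The case numbers and the bed assumption below are illustrative, not real data.
import pandas as pd

cases = pd.DataFrame({
    "district":  ["Kailahun", "Kailahun", "Kenema", "Kenema", "Bombali"],
    "week":      [1, 2, 1, 2, 2],
    "new_cases": [12, 19, 8, 15, 21],
})

# Weekly totals per district, then a crude bed estimate assuming each case
# occupies a treatment bed for roughly two weeks (an assumption for this sketch).
weekly = cases.groupby(["district", "week"])["new_cases"].sum().unstack(fill_value=0)
beds_needed = (weekly.sum(axis=1) * 2).rename("estimated_bed_weeks")
print(beds_needed.sort_values(ascending=False))
```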
Applying the theory to the real world
The synthetic global population actually forms the basis for a lot of our research both now and in the future. We’re able to act quickly when situations arise to offer insight based on hundreds of unique combinations of parameters.
But our research isn’t just theoretical – it feeds into the real world where, for example, we are able to work with agencies like the United Nations Children’s Fund (UNICEF) to help plan and provision aid and medical supplies.
We pride ourselves on our responsiveness, which has been tested on multiple occasions. Case in point: we received a call from the DoD on a Friday asking for input on where to place new emergency treatment units (ETUs) by Monday morning, when the military transport planes were taking off. With the data in hand, we quickly looked at a lot of different variables, including road infrastructure in West Africa, and were able to identify hot spots where additional outbreaks were considered imminent and advise where best to place treatment units.
With Ebola, each infected person, on average, infects two other people – so over time that’s exponential growth. Providing timely information and advice to the DoD was critical. Any delay in getting it to them would have essentially been the same as not providing an answer at all – and that’s the difference between life and death.
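To see why timing matters so much, here is a quick back-of-the-envelope calculation, assuming a serial interval of roughly two weeks, which is an approximation rather than a VBI figure.

```python
# Back-of-the-envelope illustration of why delays were so costly: with each
# infected person infecting two others on average, cases roughly double every
# serial interval (assumed here to be about two weeks, an approximate figure).
initial_cases = 100
r0 = 2

for generation in range(6):  # roughly 12 weeks of unchecked spread
    print(f"generation {generation}: ~{initial_cases * r0 ** generation} cases")

# Output runs from ~100 cases at generation 0 to ~3,200 at generation 5, so a
# one-month delay (two generations) means facing roughly four times as many cases.
```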
Buzzword for a reason
Big data may well be a buzzword, but for good reason – it can give valuable insights from seemingly disparate facts that, ultimately, can help prevent the spread of infectious diseases like Ebola.
Managing data stores associated with big data is entirely different from managing traditional data infrastructures. Storage technologies such as DDN’s are designed specifically for big data, web and cloud environments, where scalability, performance, availability and manageability are paramount. Next-gen storage technologies make it possible for VBI to deal with very large data sets, analyse them, and gain deeper insights that can make real-world differences.
But all of this depends on the ability to store and access massive amounts of data reliably and at speed in order to fully capitalise on the promise of big data.