Ask editor Mar Cabra of the International Consortium of Investigative Journalists (ICIJ), the group behind the Panama Papers, what she considers her team does, and the answer is what any journalist would say today or 50 years ago: “We use technology to tell great stories.”
It used to be the case that the press relied on the phone, a fax machine, a library of clippings and a network of contacts to write stories.
But now journalists depend on data. As life becomes inexorably more digital, it’s our digital footprint that has to be examined.
That’s why the ICIJ has embraced data-based techniques as core to its mission – and how investigations on the scale of the Panama Papers, which revealed the offshore tax haven secrets of the global elite, were made possible.
The activity of clients of Panamanian law firm Mossack Fonseca qualifies not only as the world’s largest financial scandal, but also as the largest data-driven investigation: at 2.6 terabytes of data and 12 million documents, it towers over anything Snowden or Wikileaks uncovered.
Notable as the ICIJ’s investigative work and this powerful new data journalism have been, is there a takeaway for the enterprise community from this investigation?
Yes, there is: it clearly shows what a new way of working with data at scale can offer.
So when an anonymous source tipped off the ICIJ about Mossack Fonseca, the team of journalists knew that they would need a specialised tool – one that could process a large volume of highly connected data quickly, easily and efficiently.
That analysis also had to be accessible to journalists around the globe, regardless of their technical skills.
It also had to be able to reveal patterns in a vast pool of unstructured information, mainly scanned bank statements, which are not easily searchable by conventional means.
Cabra had been exposed to complex data challenges before, and so knew graph databases were the best solution because they excel at spotting relationships inside data, allowing the ICIJ to discern patterns and spot trends that weren’t visible before.
According to Cabra, “It wasn’t until we picked up graph database technology that we started to really grasp the potential of the data. And the reaction we started to get from colleagues when we put the data there? ‘Oh my God, this is magic!’”
Graphs reflect the way we understand the world
How does graph technology outperform other, more traditional ways of working with data at spotting relationships?
Instead of breaking up data artificially into tables, the way a relational database does, graphs use a notational structure that echoes the way humans intuitively think about and work with information.
Once that data model is coded in a scalable architecture, a graph database is second to none at analysing the connections in huge and complex datasets.
That matters to investigators. As Cabra says, “Relationships are all-important in telling you where the criminality lies, who works with whom, and so on”.
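To make the idea of following relationships concrete, here is a minimal sketch in plain Python rather than in any particular graph database. It stores a handful of invented records as nodes and relationships, then uses a breadth-first search to find the shortest chain linking two people. Every name, relationship type and function here is hypothetical and chosen for illustration; none of it comes from the ICIJ's data or tooling.

```python
from collections import deque

# Hypothetical, invented records: (source, relationship, target).
# None of these names come from the Panama Papers data.
EDGES = [
    ("Alice Example", "OFFICER_OF", "Blue Shell Ltd"),
    ("Blue Shell Ltd", "REGISTERED_BY", "Intermediary X"),
    ("Intermediary X", "REGISTERED", "Red Shell Inc"),
    ("Bob Example", "SHAREHOLDER_OF", "Red Shell Inc"),
    ("Carol Example", "OFFICER_OF", "Green Holdings"),
]


def build_graph(edges):
    """Index every node by its neighbours, ignoring edge direction."""
    graph = {}
    for source, relationship, target in edges:
        graph.setdefault(source, []).append((relationship, target))
        graph.setdefault(target, []).append((relationship, source))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search: return the shortest chain of nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relationship, neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


GRAPH = build_graph(EDGES)
print(find_connection(GRAPH, "Alice Example", "Bob Example"))
```

In this toy dataset the two people are linked through two companies and a shared intermediary, a chain that would take several joins to express against relational tables. A real graph database would add persistence, indexing and a query language on top of the same basic idea.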
Graph technology allows not just investigative journalists, but any team that wants to build and manipulate big data structures, to spot trends and uncover secrets in ways they have never been able to before.
Web giants Google, Facebook and LinkedIn have, for example, been using graph databases to derive value from connected data for some time: the famed PageRank algorithm at Google, which mines the links between web pages, is, at heart, a graph application – as are Facebook’s and LinkedIn’s tools for mapping real-time networks and connections to help us traverse our “social graphs.”
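To show why PageRank counts as a graph application, the sketch below implements the simplified textbook form of the algorithm by repeated iteration over a tiny link graph. It is an illustration only and not Google's production system: the page names, the `pagerank` function, the damping factor of 0.85 and the iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site, invented for illustration.
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],
}

RANKS = pagerank(LINKS)
for page, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

The score of each page depends entirely on which pages link to it and how well ranked those pages are, so the computation is driven by the connections in the data and not by the content of any single record.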
As graph database technology has matured, such highly scalable connected data analysis is now available to the masses.
The analyst community is predicting high take-up, with Forrester Research claiming 25% of all enterprises will be using graph databases shortly, while Gartner reports that graphs are the fastest-growing category in database management systems, predicting 70% of leading companies will pilot a graph database project of significance by 2018.
Graphs make big data more tractable
Indeed, graph databases are proving their worth in managing much more abstract data.
With the arrival of the Internet of Things (IoT), the era of the petabyte is upon us, but graphs can handle this magnitude easily.
As the line between analytical and operational repositories blurs, graphs can help enterprises get at their data in ways that weren’t possible with data warehouses and relational databases.
There are potentially many use cases for graph databases, well beyond what the Panama Papers showed is possible in unlocking the secrets of unstructured data.
In any context where large, complex datasets need to be mined, graphs are increasingly the tool of choice.
Consider retail (sophisticated personalisation), financial services (fraud detection), healthcare (the investigation of diseases and cells), media (complex data structures) and government (security, plus networks of donors to voters) – in each of these areas, and others, there are many examples of graph technology proving highly useful.
It should be pointed out that graph databases aren’t applicable or helpful for every problem; there are transactional and analytical processing needs in business for which relational technology will probably always be the correct option – systems of record such as your financial, HR or ERP systems may suit a SQL approach better.
What’s more, there are NoSQL (Not Only SQL) database alternatives that handle other vast datasets well.
But a graph database does make sense for any organisation seeking to make the most of its connected data.
In each case, complicated relationship datasets are what graph databases address – and that has to interest any business leader wanting to find new ways of working in our super-connected, data-driven market.
As we’ve seen, graph databases have certainly changed the face of journalism forever.
But as important as the ICIJ’s work has been, it is only the start, as graph databases have the potential to do the same in many more industries, including yours.
Emil Eifrem, co-founder and CEO of Neo Technology