Developing a data strategy means adopting an organised approach to data curation, integration, governance, sharing and security in a manner that spans applications, systems, communities of interest, organisations and even nations.
The key to accomplishing this is managing data independently of its applications. And the overarching strategy needs to incorporate data life cycles, stakeholder attributes, access controls, master data management, and geospatial, temporal and semantic context.
Given the variety and volatility of data used by intelligence agencies, this approach is not without its challenges. Here are four key IT issues intelligence agencies are struggling with.
1. Data and IT systems proliferation
It’s the age-old problem: too many IT systems and not enough data integration. It stems from different departments using systems that were developed organically, bought for specific purposes and acquired at different times.
Even though each of these systems may still be doing the job for which it was originally intended, no one anticipated at the time that they would need to be integrated in new ways: to create information applications agile enough to respond to new threats, to incorporate new sensors or screening methods, and to keep pace with evolving analytical intelligence techniques.
System proliferation directly impacts a government’s ability to protect its borders by increasing the likelihood that some data will be segregated, orphaned or left entirely unmanaged.
2. A rabbit warren of data silos
Understandably, intelligence services have specialist teams engaged in analysis around specific applications or systems, such as statistics, link analysis and social media analytics.
The problem arises when each separate area’s data feeds, works-in-progress, and even finished intelligence products become trapped in their own fiefdoms and silos.
This ends up complicating interoperability, creating data synchronisation and consistency issues, and reducing the return on investment from initiatives such as data centre consolidation and cloud architecture adoption.
Using individual applications – each with its own database – greatly limits the ability of intelligence agencies to adapt to threats, particularly when evaluating objects and entities related to people, organisations, events, places and chronologies. It also diverts essential money, time and resources to infrastructure rather than to the critical enhancement of operations and analysis.
3. Multiple communities of interest
Inevitably, when it comes to the use, monitoring and analysis of any intelligence data, multiple stakeholders and communities of interest are involved. Therefore it’s vital to strike the right balance between sharing intelligence data and safeguarding sensitive content, such as personally identifiable information and even health records.
If threat management systems aren’t designed from the ground up with the understanding that multiple communities of interest are involved in combating extremism, true information sharing and collaboration will remain elusive.
4. Data science is still in its infancy
There’s no doubt that innovation in data science is going to transform many aspects of security and public safety. To broaden that innovation, however, two key points need to be addressed.
First, precision is key: the same rigour applied to creating a screening algorithm for border control, for example, needs to be applied to every other area.
Second, the IT architecture surrounding data science – frequently a collection of open source tools anchored by Hadoop – requires so much effort and time to wire together that instead of being a platform on which to conduct experiments, it runs the risk of becoming the experiment itself.
A better option is a platform that, from the outset, supports the scientific process as well as the algorithms, models, filters and pattern detectors it produces.
Tempting as it may be to simply rip out IT systems and start again, wholesale modernisation is not only economically untenable, but also practically unthinkable from an IT point of view. It ignores the reality that users are trained, productive and familiar with all of the quirks of their existing systems.
Nor can the answer be incremental, point-to-point integration of these systems, as this dooms organisations to an endless loop of engineering and increasingly complex maintenance and quality assurance measures.
Across national security, defence and other organisations, dozens of systems have been implemented and procured in an unsynchronised way; ripping out and replacing all of them is simply not viable.
One approach gaining momentum is an operational data hub: an architecture that brings together all data regardless of format or schema. It indexes structured, unstructured, semantic, geospatial and temporal content, along with metadata and security information, and renders it all easily searchable.
As well as avoiding point-to-point integration, an operational data hub reduces the need for costly wholesale IT modernisation efforts and can be quickly adapted, extended and enhanced.
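To make the idea concrete, here is a minimal sketch in plain Python – not any vendor’s API – of what ‘index everything, regardless of schema’ means in practice. The DataHub class, its methods and the sample records are all invented for illustration.

```python
# Conceptual sketch of an operational data hub: ingest documents of any
# shape as-is, index every field uniformly, and search across all of them.
# The DataHub class and the sample records are illustrative inventions,
# not any vendor's API.
from collections import defaultdict

class DataHub:
    def __init__(self):
        self.docs = {}                 # doc_id -> original document, unchanged
        self.index = defaultdict(set)  # lower-cased token -> matching doc_ids

    def ingest(self, doc_id, doc):
        """Store the document as-is and index all of its values."""
        self.docs[doc_id] = doc
        for token in self._tokens(doc):
            self.index[token].add(doc_id)

    def _tokens(self, value):
        """Walk any nested structure and yield searchable tokens."""
        if isinstance(value, dict):
            for v in value.values():
                yield from self._tokens(v)
        elif isinstance(value, list):
            for v in value:
                yield from self._tokens(v)
        else:
            yield from str(value).lower().split()

    def search(self, term):
        """Return every document, whatever its schema, matching the term."""
        return [self.docs[i] for i in self.index.get(term.lower(), ())]

hub = DataHub()
# Three differently shaped sources: no upfront harmonisation required.
hub.ingest("rpt-1", {"type": "report", "body": "sighting near northern border"})
hub.ingest("evt-7", {"event": "crossing", "location": {"lat": 60.1, "lon": 24.9},
                     "tags": ["border", "vehicle"]})
hub.ingest("msg-3", {"sender": "unit-4", "text": "border checkpoint clear"})

print(len(hub.search("border")))  # -> 3: one query spans all three schemas
```

A production hub would of course also index security labels, lineage and temporal validity, but the principle is the same: one search surface over many schemas.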
Object-based intelligence and production
Those involved with counter-terrorism and threat management need to be able to create, share, discover and relate information on entities or objects, such as people, organisations, events, observations and chronologies, each of which has multiple attributes of differing values.
With the inclusion of specialised metadata, object-based production means the intelligence lifecycle – the collection, processing, exploitation and dissemination of data – can become more dynamic.
By liberating facts from the confines of their underlying sources or summary documents, cooperating agencies can flexibly and securely share the information they need to fight extremism.
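As a rough sketch of this model – assuming a simple fact-with-provenance structure, where the class names and classification levels are illustrative rather than any agency’s standard – each attribute of an entity carries its own source and classification, so it can be released independently of the document it came from:

```python
# Sketch of object-based production: each fact about an entity carries its
# own provenance and classification metadata, so facts can be shared at the
# level of the consumer's clearance rather than as whole source documents.
# Class names and classification levels are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Fact:
    name: str            # e.g. "alias", "last_seen"
    value: object
    source: str          # originating report or system
    classification: str  # e.g. "OFFICIAL", "SECRET"

@dataclass
class Entity:
    kind: str            # person, organisation, event, place...
    entity_id: str
    facts: list = field(default_factory=list)

    def releasable_to(self, clearances):
        """Return a copy containing only the facts the consumer may see."""
        shared = [f for f in self.facts if f.classification in clearances]
        return Entity(self.kind, self.entity_id, shared)

person = Entity("person", "P-0042", [
    Fact("alias", "J. Doe", source="rpt-1", classification="OFFICIAL"),
    Fact("last_seen", "2016-03-01, Vienna", source="evt-7", classification="SECRET"),
])

# A partner agency cleared only to OFFICIAL receives the alias,
# decoupled from the SECRET report it originally came from.
print(person.releasable_to({"OFFICIAL"}).facts)
```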
Whilst both relational database management systems (RDBMSs) and NoSQL databases have their strengths, in today’s multi-format, content-rich world, where an estimated 80% of data is unstructured, counter-extremism and security operations need data management systems that are flexible, agile and unfailingly reliable.
Although enterprise architectural patterns built upon legacy RDBMSs, such as data warehouses and data marts, address some of these data challenges, they are inflexible and brittle because data must first be harmonised into fixed, predefined schemas, such as dimensional ‘star’ schemas.
In contrast, modern NoSQL databases are flexible and schema-agnostic as they are designed to cope with the rapidly changing, multi-structured, complex nature of intelligence data.
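The contrast can be sketched in a few lines of plain Python – both ‘stores’ below are toy stand-ins, not real database engines: a fixed relational schema rejects fields it was never designed for, while a document store absorbs new shapes without a migration.

```python
# Toy contrast between a rigid, up-front schema and a schema-agnostic
# document store. Both stores are illustrative stand-ins, not real engines.
FIXED_COLUMNS = ("id", "name", "nationality")   # rigid, defined in advance

def insert_row(row):
    """Reject anything the relational schema did not anticipate."""
    unknown = set(row) - set(FIXED_COLUMNS)
    if unknown:
        raise ValueError(f"schema migration needed for: {sorted(unknown)}")
    return row

documents = []                                   # schema-agnostic store

def insert_document(doc):
    """Accept any shape as-is; no migration required."""
    documents.append(doc)
    return doc

insert_document({"id": 1, "name": "J. Doe"})
insert_document({"id": 2, "name": "A. N. Other",
                 "social_handles": ["@example"],          # new field, day one
                 "last_seen": {"lat": 51.5, "lon": -0.1}})

insert_row({"id": 1, "name": "J. Doe"})                   # fits the schema
try:
    insert_row({"id": 2, "name": "A. N. Other",
                "social_handles": ["@example"]})
except ValueError as err:
    print(err)  # the rigid schema forces a migration before new data lands
```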
However, choosing the right NoSQL database is key: many open-source variants do not offer all the enterprise-grade security features intelligence organisations require. Only an enterprise NoSQL database combines the flexibility of NoSQL with the enterprise-proven features found in relational databases.
These capabilities include government-grade security, high availability, disaster recovery and backup, elasticity, scalability and support for ACID (atomicity, consistency, isolation, durability) transactions – a set of properties that guarantee database transactions are processed reliably.
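The atomicity in ACID is the easiest of these properties to sketch. In the toy example below – the TinyStore class is an invented stand-in, not a real database engine – an update that links two entities either fully applies or fully rolls back, so analysts never observe a half-applied relationship.

```python
# Sketch of atomicity: linking two entities must update both records
# in one all-or-nothing step. TinyStore is an illustrative stand-in.
import copy

class TinyStore:
    def __init__(self):
        self.data = {}

    def transaction(self):
        return _Txn(self)

class _Txn:
    def __init__(self, store):
        self.store = store

    def __enter__(self):
        self.snapshot = copy.deepcopy(self.store.data)  # rollback point
        return self.store.data

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.store.data = self.snapshot  # failure: undo everything
        return False                         # re-raise any error

store = TinyStore()
store.data = {"P-0042": {"links": []}, "O-0007": {"links": []}}

try:
    with store.transaction() as data:
        data["P-0042"]["links"].append("O-0007")
        raise RuntimeError("power loss mid-update")  # simulated failure
        data["O-0007"]["links"].append("P-0042")     # never reached
except RuntimeError:
    pass

print(store.data["P-0042"]["links"])  # -> []: the half-done update rolled back
```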
As well as considering the technological implications of implementing a new data strategy, there are, of course, also significant organisational, cultural and process changes to be addressed.
Sourced from William Sokol, CTO, Global Public Sector, MarkLogic