It is a question of trust. The value of business intelligence (BI) in the enterprise is undermined entirely if the data being analysed is erroneous, incomplete or contradictory. Even a perfect data model will render misleading results if the underlying data is defective.
Dirty data threatens to undermine critical business processes. In some cases, it can trigger a breakdown in the supply chain, with inventory data indicating plenty of stock even though the shelves are empty; it can inspire the wrong decisions as managers act on misinformation; and it can lead to inferior customer service and to customer defection when personal details are incorrectly recorded or duplicated.
The difference between bad data and dependable data can also – quite literally – be a matter of life and death. Just ask the Ministry of Defence (MoD).
Its logistics arm, which supports key supplies to the Royal Navy, Army and RAF, uses the NATO Codification System, under which each item has its own unique identification code. Unfortunately for the MoD, however, its coding was until recently in disarray: the Navy, Army and RAF each used a separate code to identify the same item. The result was that someone in the Navy could enter their stock number for a printed circuit board, only to have the Army’s system deliver a plastic bag.
The MoD addressed the issue by creating a single master database for use across the three forces, before integrating all item codes.
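In outline, the remedy is a cross-reference: each force’s legacy code resolves to a single master identifier, so the same item is returned whichever code is used. The sketch below is a hypothetical illustration of that mapping; the codes, field names and item descriptions are invented rather than drawn from the MoD’s actual database.

```python
# Hypothetical illustration of consolidating three service-specific item codes
# into one master identifier; the codes and descriptions below are invented.
MASTER_ITEMS = {
    "NSN-5998-01-435-2879": "Printed circuit board",
}

# Legacy Navy/Army/RAF codes all resolve to the same master identifier.
LEGACY_TO_MASTER = {
    "NAVY-PCB-0042": "NSN-5998-01-435-2879",
    "ARMY-77315":    "NSN-5998-01-435-2879",
    "RAF-EL-9904":   "NSN-5998-01-435-2879",
}

def resolve(legacy_code):
    """Look up an item by any force's legacy code via the master database."""
    master_id = LEGACY_TO_MASTER[legacy_code]
    return master_id, MASTER_ITEMS[master_id]

print(resolve("NAVY-PCB-0042"))   # same item whichever legacy code is used
print(resolve("ARMY-77315"))
```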
“Organisations have to overcome issues like these and do band-aid work to rectify the problems,” says Ed Wrazen, marketing vice president of data quality vendor Trillium Software, which helped the MoD solve its data management problem. “Because data quality is pervasive through many aspects of business, it causes lots of broken processes.”
The weakest link? Human error
The issue is widespread. In Information Age’s recent reader survey on Business Intelligence, sponsored by business intelligence and performance management software company Cognos, one in every eight businesses interviewed said that data quality was having a highly negative impact on their organisation’s business intelligence efforts.
Moreover, almost two thirds of the 529 respondents said that data quality was having at least a moderate effect on their ability to implement BI.
That echoes a similar survey of 600 UK, US and Australian businesses by consultancy PricewaterhouseCoopers. It found that over 75% of respondents had suffered “significant problems, costs or losses” because of poor data quality.
But at the core of data quality issues are people, rather than technologies.
According to the Data Warehouse Institute, 76% of all dirty data is the result of poor data entry by employees or their contractors. A case in point is mobile phone retailer The Carphone Warehouse. As part of an initiative to create a single view of its customers, Carphone Warehouse embarked on a vast data integration project. But before the company’s 11 million records could be integrated, the source fields, their format and their content needed to be understood.
One of the observations uncovered during the data quality assessment process was that customers who bought products at outlets of the Carphone Warehouse in airports often had no address. It appeared that staff members at airport shops had been using the customer address fields to log the customer’s flight numbers.
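A data quality tool catches that kind of error with simple profiling rules. As a hypothetical illustration, the sketch below flags address fields that are empty or that match a flight-number pattern; the field names and pattern are assumptions, not Carphone Warehouse’s actual schema.

```python
import re

# Hypothetical profiling rule: an 'address' value that looks like a flight
# number (e.g. "BA1492") is almost certainly mis-keyed data, not a postal address.
FLIGHT_NUMBER = re.compile(r"^[A-Z]{2,3}\s?\d{1,4}[A-Z]?$")

def flag_suspect_addresses(records):
    """Return records whose address is missing or appears to hold a flight number."""
    suspects = []
    for rec in records:
        address = (rec.get("address") or "").strip().upper()
        if not address or FLIGHT_NUMBER.match(address):
            suspects.append(rec)
    return suspects

# Example: the second and third records would be flagged for manual review.
sample = [
    {"customer_id": 1, "address": "12 High Street, Leeds"},
    {"customer_id": 2, "address": "BA1492"},
    {"customer_id": 3, "address": ""},
]
print(flag_suspect_addresses(sample))
```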
The reconciliation of out-of-sync data is a huge chore, and one that is often not taken into account until data integration projects are already well under way. “Without data quality tools, organisations simply don’t know how big their problem is,” says Tony Fisher, CEO of data quality software vendor Dataflux.
The problem of bad data is becoming so acute that users often do not trust the results they obtain from their BI front-end tools.
“At present, information quality issues are the weak underbelly of data warehousing, ERP, CRM, and diverse enterprise applications. However, this situation is a well-kept secret because lack of information quality opens up potential governance and compliance liabilities that organisations have tended to sweep under the rug,” says Ted Friedman, an analyst at IT industry advisory group, Gartner.
A view to a skill
While organisations do not want to air the fact that they are often working with poor and untrusted data, they are working to fix the problem. The growing need for accurate BI has fuelled the adoption of data quality software and processes, such that industry analyst group Forrester Research predicts that worldwide sales for data quality technology will surpass the $1 billion mark by 2008.
And the benefits are already appearing. There is more intelligence required in the reporting process now, particularly in industries such as financial services, manufacturing, retail and utilities, says Dataflux’s Fisher. BI provides some of the analytics behind that reporting, but it is highly dependent on accurate and consistent data. “What most companies don’t realise,” he says, “is that BI tools do not provide them with any help in terms of data quality improvement.”
Now, BI tools are being used across initiatives that pull data in from multiple sources – and that is exacerbating the data quality problems.
But data cleansing is technology applied after the fact; a different set of technologies is emerging around the notion of master data management. The aim here is to provide a centralised mechanism that enforces rules on structured data as it is created or changed. By imposing common definitions and propagating any changes out from a ‘golden copy’ through managed workflows, organisations can synchronise their data across applications and databases.
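In outline, the mechanics of that ‘golden copy’ approach look something like the sketch below: one validated master record per item, with every change checked against shared rules and then pushed out to subscribing systems. The record fields, validation rules and identifiers are illustrative assumptions, not any particular vendor’s product.

```python
# A minimal sketch of the 'golden copy' idea behind master data management:
# one validated master record per item, with every change checked against
# shared rules before being propagated to subscribing systems. The field
# names and rules here are illustrative assumptions, not a vendor's API.

MASTER = {}          # golden copies, keyed by a single agreed identifier
SUBSCRIBERS = []     # downstream applications notified of every change

def register_subscriber(callback):
    SUBSCRIBERS.append(callback)

def validate(record):
    # Common definitions enforced centrally, e.g. a 13-digit NATO-style stock number.
    if not record.get("stock_number", "").isdigit() or len(record["stock_number"]) != 13:
        raise ValueError("stock_number must be a 13-digit code")
    if not record.get("description"):
        raise ValueError("description is required")

def upsert(record):
    """Create or change a golden copy, then push the change to all subscribers."""
    validate(record)
    MASTER[record["stock_number"]] = record
    for notify in SUBSCRIBERS:
        notify(record)

register_subscriber(lambda rec: print("sync to inventory system:", rec["stock_number"]))
upsert({"stock_number": "5998014352879", "description": "Printed circuit board"})
```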
That raises another question: who, within the organisation, should be responsible for managing the quality of data? Gartner’s Friedman says that at present, the IT department typically bears the burden for data quality – and this needs to change.
As data quality issues are being addressed more systematically, says Friedman, the role of the ‘data steward’ is taking shape – a business (rather than IT) person who ‘owns’ his department’s data and who takes responsibility for quality goals and changes to centralised definitions. “The IT department can help, but data stewardship puts accountability for data where it belongs – in the business,” he says.