When two worlds collide, the results can be cataclysmic: for those that are caught up in it, or even trying to make sense of what is happening from afar, that can mean a lot of confusion.

For the best part of five years, the corporate world has been eagerly anticipating, and bracing itself for, an almighty collision between the two big – but essentially separate – sides of information management. Now that it is happening, even most experts agree that the results are confusing.

From one side, there is structured data, consisting of billions of rows of numerical data, or carefully disambiguated textual data, organised in databases, spreadsheets and reports; in many organisations, teams of data management specialists, trained in formatting, organising and optimisation, ensure that this information is good enough to drive business processes and inform decisions.

On the other side, there is unstructured data – sprawling continents of text files, memos, emails, PDFs, blogs, and even, in recent times, voice and video. These files are packed full of meaning and informal structures – but virtually none of it has been formally organised.

To many suppliers – and their customers – it makes a lot of sense to bring these two worlds of data together under one, unified information management architecture. It means that a call centre operator, for example, can see in one screen a customer’s sales history, perhaps his profitability to the company – and his tendency to complain about service to the ombudsman. It means that a series of revealing emails between customers and staff can help to explain a sales manager’s poor figures.

Combined knowledge

Placing two information sources side by side in one screen or portal can involve all kinds of issues of data quality, but it is not technically difficult. Convergence doesn’t stop there, however, and as it becomes more ambitious, the much-heralded collision of two worlds is creating a lot of hype, uncertainty and confusion – nowhere more so than in the field of analytics and business intelligence. This is an area of so much interest that Gerry Brown, lead BI analyst at Bloor Research, has given it a new name – “Content Intelligence”.

The convergence appears to be in full swing (see chart). Since mid-2006, just about every major BI company has acquired or partnered with a company that specialises not just in finding the right documents (which is challenging enough), but in extracting meaning from them.

Business Objects’ June 2007 acquisition of Inxight, a text mining company with sales of $25 million, is the most recent example. “We will be able to quickly and easily incorporate text analytics and unstructured data into the decision-making workflows of millions of information workers and start to bring these technologies into the mainstream,” says Ian Bonner, CEO of Inxight.

But customers are a little sceptical. They want to understand – as off-the-record conversations at recent Information Age roundtable lunches have shown. They want to know how it is possible to extract and aggregate real insights from documents that have no structure. And even if the documents have some structure, surely it is not enough to allow BI-style numerical reports and insights to be extracted?

But behind these sceptical questions, there is also real interest. In spite of various plans to “democratise” BI, the difficulty of creating and manipulating reports has prevented it from becoming the pervasive desktop tool that suppliers and managers would like it to be. But search boxes are another matter: everyone uses them, and no one needs training. Could search provide the new route into BI?

Suppliers admit they are being persistently asked about unstructured information by potential customers, but many are sceptical about its application in BI. Indeed, when Silvija Seres from Fast Search and Transfer stepped off the stage at the Information Age BI conference in 2006, having articulated the case for Search as a BI tool, several delegates questioned the whole notion: “Search is about finding information, but BI is about aggregating and summarising. The two are solving entirely different problems”, said one. Another was more scathing: “No data warehouse, no ETL (extraction, transformation and loading of data), no idea.”

The confusion is exacerbated by the competitive battle between the different technical approaches being taken (see box two), and by the unproven or unreliable technical capabilities of some of the technologies used to extract meaning from text.

And there is possibly some self-interest at play: data management specialists don’t much like the idea that a big bucket of unstructured data and a search box can replicate the kind of insights they have spent their careers carefully crafting.

Gartner, the analyst company, captures the cultural differences by warning: “Enterprises should avoid treating business intelligence as ‘search with a suit on’ or search as ‘business intelligence on a night out at the disco’.” There is, it says, a long way to go.

Not one problem

Paul Sonderegger, chief strategist for Endeca, an enterprise search company, and a former Forrester analyst, puts the convergence issue into a historical perspective: “A curious thing is that the relational database (RDB) was first designed for speed and for efficient data storage. Access was a secondary consideration.” Search and BI, he argues, are both reactions against the RDB, and both have created indexes that cut across the schema in the database. “With Search, you get lists, with BI, you get lots of summarisation.”

Endeca and FAST, two leading search engine companies, both take the view that they can recreate a lot of the structure of traditional databases within their own fast, flat indexes. Both encourage the idea that structured data sources should be indexed.

This can have several enormous advantages: data management becomes a lot easier, because new information sources can be added to an index more easily than they can be merged through the complex mapping that structured data requires; retrieval is near instantaneous; and there is no need to use a structured query language. As John Lervik, the CEO of FAST, says, “SQL is completely limiting as a means of finding information. It should be killed dead. It’s like getting everyone to speak a foreign language.”

FAST’s commitment to the BI market goes beyond that of search rivals Autonomy and Endeca, in that it acquired a data cleansing company and a BI reporting tool company, Radar, before it put together its Adaptive Information Warehouse suite. Data can be extracted, cleansed and tagged before going into the index, which enables the results to be at a more granular level than just a document – and results can be displayed using a reporting tool.

Most BI companies are doing things entirely differently. They have either worked out ways of offering a different, search-oriented way of simply locating BI reports, or, alternatively, they are mining the text to find underlying patterns, structures and meaning.

Text analytics can use a variety of techniques, including entity extraction (identifying common nouns, dates, etc) and natural language processing (understanding nouns, verbs and subjects). Other well-known search techniques, such as statistical frequency of words, proximity searches, and metadata, can also play a role, along with the more advanced “Bayesian” techniques that Autonomy, for example, uses.
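As a rough illustration of two of the simpler techniques mentioned above, word frequency and proximity searching, the following Python sketch counts terms in a short memo and checks whether two words fall close together. The function names, the sample memo and the max_gap parameter are all assumptions made for this example; commercial products use far more sophisticated linguistic and statistical models.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequency(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def within_proximity(text, first, second, max_gap=5):
    """Return True if `first` and `second` occur within `max_gap` words of each other."""
    tokens = tokenize(text)
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    return any(abs(i - j) <= max_gap
               for i in first_positions
               for j in second_positions)


# Hypothetical memo text, invented for illustration.
memo = "The hinge snapped after a week. The customer wants a refund for the hinge."
counts = term_frequency(memo)

print(counts["hinge"])
print(within_proximity(memo, "hinge", "refund", max_gap=3))
```

A frequency count hints at what a document is about, while a proximity test suggests that two terms are related; neither amounts to understanding the text, which is why vendors layer entity extraction and linguistic analysis on top.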

Most BI companies use this data to create new fields, which can then be used to create new queries, reports or operational BI systems. Teradata, for example, uses Attensity, while SAS, Business Objects and others use Inxight.

But even here, there is much discussion over what is really possible and useful. Suppliers commonly talk of applications such as fraud analysis, consumer behaviour analysis, counter-terrorism, and sentiment analysis, all of which usually involve taking the metadata and the extracted meaning and then doing a large-scale analysis.

“The idea of analysing the tone and context of people’s communications and linking it to their lifestyles will soon be commonplace”, says Roger Llewellyn, CEO of Kognitio, the data warehousing company. He gives an example where “loop analysis” of phone calls might link a group of terrorists together – after text analysis of messages had first identified the threat.

A more common example, where simple BI reporting tools and an index might suffice, would involve capturing customer feedback alongside maintenance and returns data. Entities (invoice and part numbers, shop or town names) can easily be captured, alongside sentiment (furious, delighted) and possibly some verbs (snapped, melted, broke). Engineers can drill into the emails and letters if they need to, or look at aggregated data. This type of application is being worked on by suppliers using both indexes and structured databases.
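The customer feedback example can be sketched in a few lines of Python. This is a minimal illustration, not any supplier’s method: the word lists, the “P-1042” part-number format and the sample messages are all invented for the purpose, and a real text analytics engine would rely on much richer linguistic models.

```python
import re
from collections import Counter

# Hypothetical vocabularies, assumed for illustration only.
SENTIMENT_WORDS = {"furious": "negative", "delighted": "positive"}
FAILURE_VERBS = {"snapped", "melted", "broke"}
# Assumed part-number format (e.g. "P-1042"); real formats will differ.
PART_NUMBER = re.compile(r"\bP-\d{4}\b")


def extract_fields(message):
    """Turn one free-text message into a small structured record."""
    words = re.findall(r"[a-z]+", message.lower())
    parts = PART_NUMBER.findall(message)
    return {
        "part": parts[0] if parts else None,
        "sentiment": next(
            (SENTIMENT_WORDS[w] for w in words if w in SENTIMENT_WORDS), None),
        "failure": next((w for w in words if w in FAILURE_VERBS), None),
    }


# Invented sample feedback.
feedback = [
    "I am furious: the handle on P-1042 snapped after two days.",
    "Part P-1042 melted in the dishwasher.",
    "Delighted with P-2210, it arrived early.",
]

# New "fields" created from unstructured text ...
records = [extract_fields(m) for m in feedback]
# ... which can then be aggregated like any other structured data.
failures_by_part = Counter(r["part"] for r in records if r["failure"])

print(records[0])
print(failures_by_part)
```

The aggregated count is what an engineer would see in a report; the original messages remain available for anyone who wants to drill down into the detail.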

But some analysts warn buyers against getting too carried away by the prospect of easy indexing and user-friendly searches. “Where enterprise search can bring a new layer of value is in its ability to rummage through unstructured information to extract potential structure or to establish it where it does not exist at all,” says a recent Gartner report. “But it does not perform sophisticated structured analyses on such information.” The convergence will continue, but it will not be absolute.

How it works

The convergence of traditional BI and search, or text analytics, has so far taken several forms, some of which overlap.

Screen or portal integration. This is integration at its most simple. BI reports appear in one part of the screen, or information portal, while a search box and results from an enterprise search engine are displayed elsewhere. Effectively, the two systems are entirely separate, although users (for example, call centre staff) have access to a large amount of information within a couple of clicks.

Search of BI reports. Most BI tools have now added a free text search capability to their products. In some cases, this has been done through partnerships with, for example, Google, whose search appliance can be used to index the reports. This means that BI reports are much easier to find, and the operator has a familiar user interface. This can drive up the use of BI reports by non-technical end users.

Search interface on structured information sources. This is a variant of the previous approach, search of BI reports. Although it is arguably not a BI application, it is proving to be a popular way of improving access to data feeds and structured databases, because the search box can eliminate the need for a user to learn about particular applications or SQL. It can also be very fast, if all the data is indexed. The search engine displays the results, and when a result is selected, the original structured data is displayed.

Analysing and structuring unstructured or semi-structured data. This is the approach taken by most of the major BI vendors and analytics vendors, including Business Objects, Teradata and others. Using technology from specialist companies such as Inxight (acquired by Business Objects) and Attensity, these companies attempt to identify the hidden structures in so-called unstructured data using techniques such as entity extraction and linguistic analysis. The extracted structure can then be loaded into a structured database, and analysed and presented using traditional reporting tools. The results can range from excellent to poor, depending on the effectiveness of the analytical method and on the level of structure in the document.

Ordering structured and unstructured data into an index for further analysis. The leading search engine vendors, notably Fast Search and Transfer (FAST), Endeca, and Autonomy, have long argued that their indexes, rather than structured databases, can be used to provide some business intelligence. FAST, in particular, has taken this one step further, and has turned the approach of most BI vendors on its head. Using its own data cleansing tool, data is extracted from structured databases and the relationships stored in an index. On retrieval, the original data can be displayed from the structured database, or a BI reporting tool, Radar, can be used to display relationships stored in the index. The advantage of the approach is speed, and the use of a simple interface to customise or create reports.
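The idea of indexing structured data so that it can be queried from a search box, without SQL, can be illustrated with a toy inverted index in Python. This is only a sketch under stated assumptions: the rows, field names and functions below are hypothetical, and it does not represent how FAST, Endeca or Autonomy actually build their indexes.

```python
from collections import defaultdict

# Hypothetical rows standing in for records pulled from a structured database.
rows = [
    {"id": 1, "customer": "Acme Ltd", "town": "Leeds", "status": "complaint"},
    {"id": 2, "customer": "Bolt plc", "town": "Leeds", "status": "paid"},
    {"id": 3, "customer": "Acme Ltd", "town": "York", "status": "paid"},
]


def build_index(records):
    """Map every word found in any field to the ids of the rows containing it."""
    index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            if field == "id":
                continue
            for token in str(value).lower().split():
                index[token].add(record["id"])
    return index


def search(index, query):
    """Return the ids of rows that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(rows)

print(search(index, "acme paid"))
print(search(index, "leeds"))
```

The index is flat: it knows nothing about the schema, which is why new sources are easy to add and lookups are fast. The row ids it returns can be used to fetch the original records from the structured database for display.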

Converging worlds
BI vendor	Unstructured capability*
Business Objects	Inxight (acquired, 2007)
SAS	Inxight (partner)
Teradata	Attensity (partner, 2006)
Kognitio	Active Navigation (partner, 2007)
Cognos	Autonomy (partner)
Informatica	Itemfield (2006)
SPSS	Lexiquest (2002)
IBM	IBM Content Discovery and Omnifind (in house, 2005/6)
Information Builders	Magnify Search (in house, 2007)

Search vendor	Structured capability*
FAST	Adaptive Information Warehouse (acquired Radar, 2006)
Endeca	IAP guided navigation
Autonomy	NCorp (2005)

* These partnerships and acquisitions are not exclusive. Google, for example, has relationships with many BI companies.

The search for intelligence

Combined knowledge

Not one problem

How it works

Further reading in Information Age

Pete Swabey
