The semantic web has a hallowed position among futuristic technologies, not least because its chief champion is Tim Berners-Lee, the man who invented the world wide web.
Unfortunately for Berners-Lee and his acolytes, there is little chance that the semantic revolution – whereby information on the web is supplemented with metadata that describes its meaning, thereby improving the power of search and other information management functions – will happen as quickly, as dramatically or as visibly as the rise of the web in the 1990s.
Instead, the development of the semantic web is more likely to mirror that of Web 2.0 – it will be gradual and partial. It will suit some websites to add semantic metadata to their information, others not at all.
However, in the (slightly) more self-contained world of enterprise information management, semantic technology has some more immediate applications and is already helping some organisations make sense of their oceans of unstructured data.
The umbrella heading of ‘semantic technology’ includes tools that analyse text in order to divine its meaning, as well as formats and standards for codifying and integrating information on the basis of that meaning.
In the first case, there are two approaches. One is statistical; some of the meaning of a document can be retrieved by mathematically analysing the text.
An example of a business application of this approach comes from information service provider Thomson Reuters, a sophisticated adopter of semantic technology in all its forms.
“We applied some statistical semantic technology to our marketing campaigns,” explains Peter Jackson, chief scientist at Thomson Reuters Professional. “We used a document categorisation system to analyse the documents that people visit online, based on word-pairs found in the text. That analysis informs the marketing mail-outs we send them, based on what they are interested in.”
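Jackson does not describe the system's internals, but the general idea of categorising text by the word-pairs it contains can be sketched in a few lines. The sketch below is a toy illustration, not Thomson Reuters' method: the category names and their characteristic word-pairs are hypothetical, and each document is simply assigned to the category whose pairs it contains most often.

```python
import re
from collections import Counter

# Hypothetical interest categories, each described by characteristic word-pairs.
# These are illustrative assumptions, not data from any real system.
CATEGORY_PAIRS = {
    "tax": {("tax", "return"), ("capital", "gains"), ("tax", "law")},
    "litigation": {("supreme", "court"), ("class", "action"), ("case", "law")},
}


def word_pairs(text):
    """Count adjacent lower-cased word pairs (bigrams) in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))


def categorise(text):
    """Return the category whose word-pairs occur most often, or None if none occur."""
    pairs = word_pairs(text)
    scores = {
        category: sum(pairs[pair] for pair in category_pairs)
        for category, category_pairs in CATEGORY_PAIRS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(categorise("The Supreme Court heard the class action on Monday."))
```

A production system would learn its word-pairs and weights statistically from large volumes of labelled documents rather than from a hand-written list; the hard-coded table here just keeps the mechanism visible.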
More intriguing is the second approach, an ongoing attempt to train computers to decode the meaning of words based on linguistic principles. “The second approach is to look into our own brains and try to codify what all these words mean,” explains Hans Uszkoreit, professor of computational linguistics at
The technology available today can recognise defined concepts, such as company names, with 90% accuracy, Uszkoreit explains. It can also divine simple relationships between those concepts (such as which company sells which products); however, the more complex the relationships between the concepts, the lower the rate of recognition currently achievable.
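Why complexity hurts recognition is easy to see in a deliberately crude sketch. The code below is not how linguistically grounded systems work; it assumes a hypothetical hand-made list (a "gazetteer") of company names and a single pattern, "<company> sells <product>", to extract the simple relationship Uszkoreit mentions.

```python
import re

# Hypothetical gazetteer of known company names (an assumption for illustration).
COMPANIES = {"Acme Corp", "Globex"}


def find_companies(text):
    """Return gazetteer companies mentioned in the text, in order of first appearance."""
    hits = [(text.find(name), name) for name in COMPANIES if name in text]
    return [name for _, name in sorted(hits)]


def extract_sells(text):
    """Return sorted (company, product) pairs matching '<company> sells <product>'."""
    relations = []
    for name in COMPANIES:
        # Product: a run of lower-case words, stopping at punctuation.
        pattern = re.escape(name) + r" sells ([a-z][a-z -]*[a-z])"
        for match in re.finditer(pattern, text):
            relations.append((name, match.group(1)))
    return sorted(relations)


text = "Acme Corp sells garden tools. Globex sells software, while Initech does not."
print(find_companies(text))
print(extract_sells(text))
```

The toy version misses any company absent from its list (Initech above), and a slightly more complex sentence such as "Globex, founded in 1990, now sells software." yields no relation at all. Real systems generalise far better, but they show the same trend: accuracy falls as the relationships grow more complex.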
Thomson Reuters has also applied this technology to a business problem in its legal information division. “We developed a system that used natural language processing to identify names of people and companies involved in case reports,” explains
This is emphatically not the same as search technology,
Meaningful standards
Standards that have arisen to describe the semantic meaning of a given data set include the resource description framework (RDF) and the web ontology language (OWL). These provide a framework of meanings – an address or a name, to use two prosaic examples – that can be assigned to data.
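RDF expresses information as subject–predicate–object statements ("triples"), each term usually identified by a URI, and OWL builds on RDF to define the classes and properties those URIs refer to. The sketch below models a tiny RDF graph as a plain Python set of triples. The `rdf:type`, `foaf:Person` and `foaf:name` URIs are real vocabulary terms; the `example.org` customer and address property are hypothetical. A real application would use an RDF library (such as rdflib) and a query language such as SPARQL rather than this hand-rolled matcher.

```python
# Real vocabulary URIs: RDF's type property and the FOAF (Friend of a Friend) vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
graph = {
    (EX + "customer/42", RDF_TYPE, FOAF + "Person"),
    (EX + "customer/42", FOAF + "name", "Jane Smith"),
    (EX + "customer/42", EX + "ontology#address", "1 High Street, London"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Any program that understands foaf:name can find names, whatever it calls the field internally.
names = [obj for _, _, obj in match(graph, p=FOAF + "name")]
print(names)
```

The point of the shared URI is that the meaning ("this is a person's name") travels with the data rather than living in one application's field labels.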
Not only can these standards help businesses codify their unstructured data, but they can also help in application development and integration, argues Orestis Terzidis, a director at SAP’s research campus.
“When messages pass between two applications, you have to make sure you copy the data from one input field to the correct corresponding field,” he explains. “One approach to that is to refer to an ontology such as OWL. You can ensure that in both applications the fields are defined as an address, for example, and use the ontology to direct the transfer of data.
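Terzidis's example can be illustrated with a simplified sketch. Here two hypothetical applications, a CRM system and a billing system, each label their own fields with concepts from a shared (also hypothetical) ontology. The field-to-field mapping is never written by hand; it is derived from the shared concepts. This is a plain-Python stand-in, not OWL itself; in OWL, such correspondences might instead be stated with constructs such as `owl:equivalentProperty`.

```python
# Each application's fields are annotated with a concept from a shared ontology.
# Field names and "ex:" concepts are hypothetical, for illustration only.
CRM_SCHEMA = {"cust_name": "ex:Name", "addr_line": "ex:Address"}
BILLING_SCHEMA = {"account_holder": "ex:Name", "postal_address": "ex:Address"}


def translate(record, source_schema, target_schema):
    """Copy each field into the target field that shares its ontology concept."""
    concept_to_target = {concept: field for field, concept in target_schema.items()}
    out = {}
    for field, value in record.items():
        concept = source_schema.get(field)
        if concept is None or concept not in concept_to_target:
            # No shared meaning: refuse rather than guess where the data belongs.
            raise KeyError(f"no shared concept for field {field!r}")
        out[concept_to_target[concept]] = value
    return out


crm_record = {"cust_name": "Jane Smith", "addr_line": "1 High Street"}
print(translate(crm_record, CRM_SCHEMA, BILLING_SCHEMA))
```

If a third application joins, it only needs its own fields annotated with the shared concepts; no new pairwise mapping has to be coded, which is the reuse Terzidis describes.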
“That will shift the integration difficulties from a technical problem to a question of the true definition of things,” he adds. That shift, he says, will allow greater business involvement in integration projects.
Terzidis believes that, of the many approaches to this kind of integration, which brings ‘meaning’ into the realm of computing, semantic technology may be one of the most powerful.
“People have compared semantic technology to the relational database,” he says, “in that you can do almost anything that you can do with semantic technology using alternative methods. But with semantic technology you can do it in a simpler and in a more reusable way.”