In 2006, information business Thomson Reuters began developing WestlawNext, the successor to its online legal research service Westlaw.
While Westlaw stored 5 billion documents and was used by 98% of the largest law firms in the US, Thomson Reuters wanted to make its successor faster and more intuitive for its customers, who rely on the fast and accurate retrieval of information to win legal cases.
Rather than requiring users to enter ‘formal’ search queries, as Westlaw did, Thomson Reuters wanted an agile, efficient IT infrastructure that would not only support non-literal searches across vast quantities of data, but would also scale to serve other areas of the business.
To achieve this, Thomson Reuters developed a distributed, cloud-like search architecture called Novus. It patented the technology in 2006 and launched the system in 2010.
Novus uses thousands of Linux-based search servers, each running proprietary search software. Each server holds part of the content index in memory for the company’s various information products: WestlawNext, the tax and accounting research system Checkpoint and the financial service Eikon.
When a search is executed, it is sent to thousands of search servers at once. Each server sends back relevant results to a controller that ranks, sorts and aggregates them before returning them to the requesting application.
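The scatter-gather pattern described above (each server holds a shard of the index in memory, a controller fans the query out and then ranks and merges the replies) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Novus itself, whose software is proprietary. The class names `SearchServer` and `Controller`, the helper `build_cluster`, the term-frequency scoring, the CRC32-based sharding and the use of a thread pool in place of network calls are all hypothetical choices made for the example.

```python
import heapq
import itertools
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class SearchServer:
    """Hypothetical search node that keeps one shard of the index in memory."""

    def __init__(self, shard_id: int) -> None:
        self.shard_id = shard_id
        # Inverted index: term -> {doc_id: term frequency}
        self.index: dict[str, dict[str, int]] = defaultdict(dict)

    def add_document(self, doc_id: str, text: str) -> None:
        for term in text.lower().split():
            postings = self.index[term]
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, query: str, k: int) -> list[tuple[int, str]]:
        """Score local documents by summed term frequency (a stand-in ranking)."""
        scores: dict[str, int] = defaultdict(int)
        for term in query.lower().split():
            for doc_id, tf in self.index.get(term, {}).items():
                scores[doc_id] += tf
        # Local top k as (score, doc_id), highest first; ties broken by doc_id.
        return heapq.nlargest(k, ((score, doc_id) for doc_id, score in scores.items()))


class Controller:
    """Sends a query to every server at once, then ranks and merges the replies."""

    def __init__(self, servers: list[SearchServer]) -> None:
        self.servers = servers

    def search(self, query: str, k: int = 10) -> list[str]:
        # Scatter: the thread pool stands in for parallel network requests.
        with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
            partials = list(pool.map(lambda s: s.search(query, k), self.servers))
        # Gather: every document lives in exactly one shard, so the union of
        # the per-shard top-k lists always contains the global top k.
        best = heapq.nlargest(k, itertools.chain.from_iterable(partials))
        return [doc_id for _, doc_id in best]


def build_cluster(docs: dict[str, str], n_servers: int) -> Controller:
    """Assign each document to one shard using a stable hash of its id."""
    servers = [SearchServer(i) for i in range(n_servers)]
    for doc_id, text in docs.items():
        servers[zlib.crc32(doc_id.encode()) % n_servers].add_document(doc_id, text)
    return Controller(servers)


docs = {
    "case-1": "breach of contract damages awarded",
    "case-2": "contract formation offer and acceptance",
    "case-3": "negligence duty of care breach",
    "case-4": "tax treatment of contract income",
}
controller = build_cluster(docs, n_servers=4)
print(controller.search("contract breach", k=2))  # ['case-1', 'case-4']
```

Because each server returns only its local top k, the controller merges at most k results per server rather than every match. This is one reason the fan-out can reach thousands of servers without flooding the aggregator.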
If the user then requests a specific document, the search servers retrieve it over 10-Gigabit Ethernet from NetApp NAS storage, pulling content stored in Oracle database clusters.
Because the NetApp storage is shared over NFS (Network File System), all of the search servers can access the same stored databases. This is what allows WestlawNext to search 50 times more data than Westlaw, and in half the time.
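The two headline figures can be combined into a rough throughput comparison. This goes beyond what the article states: it assumes both figures describe the same kind of search and that ‘half the time’ refers to the time per search. Let D be the volume of data Westlaw searched and T the time it took:

```latex
\frac{\text{WestlawNext throughput}}{\text{Westlaw throughput}}
  = \frac{50D \,/\, (T/2)}{D \,/\, T}
  = 50 \times 2
  = 100
```

On those assumptions, WestlawNext would process roughly 100 times as much data per unit of time as Westlaw.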
“Our whole world is the idea that we want a vast array of data linked to indexes that are really accessible,” explains Rick King, chief operating officer for technology at Thomson Reuters. “We want our computing power and server technologies there to be interrelated. That way, anything can operate on anything else at any time.”
Because the full content indexes are stored on NetApp NAS storage alongside the content, servers can be reassigned at the click of a button to whichever product is experiencing heavy demand. “A technician would get onto a master console and go to those servers virtually, tick them off and assign them to another product, such as Eikon, and they would be suddenly reallocated,” says King.
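Since no index data has to be copied between servers, the console operation King describes amounts to changing which product a server is assigned to. The sketch below models only that bookkeeping. The `ServerPool` class, its `reassign` and `shift` methods and the server names are hypothetical. The step where a reassigned server loads the new product’s index from shared storage is assumed rather than shown.

```python
from collections import Counter


class ServerPool:
    """Hypothetical master-console view: which product each search server serves."""

    def __init__(self, assignments: dict[str, str]) -> None:
        self.assignments = dict(assignments)

    def capacity(self) -> Counter:
        """Number of servers currently assigned to each product."""
        return Counter(self.assignments.values())

    def reassign(self, servers: list[str], product: str) -> None:
        """'Tick off' the given servers and point them at another product."""
        unknown = [s for s in servers if s not in self.assignments]
        if unknown:
            raise KeyError(f"unknown servers: {unknown}")
        for server in servers:
            # Assumed: the server would now load `product`'s index from the
            # shared NAS storage; nothing is copied from other servers.
            self.assignments[server] = product

    def shift(self, source: str, target: str, count: int) -> list[str]:
        """Move up to `count` servers from `source` to `target`; return those moved."""
        donors = sorted(s for s, p in self.assignments.items() if p == source)[:count]
        self.reassign(donors, target)
        return donors


pool = ServerPool({
    "srv-01": "WestlawNext",
    "srv-02": "WestlawNext",
    "srv-03": "WestlawNext",
    "srv-04": "Eikon",
})
moved = pool.shift("WestlawNext", "Eikon", 2)
print(moved)                  # ['srv-01', 'srv-02']
print(dict(pool.capacity()))  # {'Eikon': 3, 'WestlawNext': 1}
```

Validating every server before changing any assignment keeps a mistyped name from leaving the pool half-reassigned.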
In 2011, Thomson Reuters enhanced the infrastructure further by adding Flash Cache to specific NetApp systems that host Oracle RAC database clusters needing little capacity but high performance. This let the organisation boost performance without buying extra storage capacity it would not use. “Basically, we did it to avoid buying a lot more hardware and under-utilising it,” says King.
As new content is added to the various growing applications, King says, Thomson Reuters can scale out the infrastructure simply by adding more servers and NetApp NAS storage and connecting them with switches.
Dynamically reallocating back-end resources let Thomson Reuters meet demand for its various products while avoiding an estimated $65 million in costs for building a new data centre dedicated to WestlawNext.