Wednesday, December 12, 2007

Interwoven on Enterprise Search Done Right

Interwoven presented a webinar on enterprise search, focusing on their Interwoven Universal Search product, which uses the Vivisimo Velocity search engine. Interwoven wants to be the exclusive provider of enterprise search to the professional services sector.

Gerald Reid, CIO of Milbank, Tweed, Hadley & McCloy LLP, was the presenter. Milbank has over 600 lawyers and over 1000 employees in ten offices.

In 1999 Milbank tried saving emails into their matter management system. Searching those emails turned out to be too slow to be workable. In 2001 they brought in AltaVista (remember them?) to implement a search engine for their document management system. Attorneys quickly saw the value of the system. The search engine combined full-text search with the metadata for each document. This was particularly valuable because it let you easily search across multiple document libraries. Previously it was easy to search your local library, but hard to search the other libraries. The attorneys loved it. Gerald got love notes. How often does IT get love notes praising a new tool?

Then, AltaVista went out of business. Milbank kept the product running, but did so naked, with no further vendor support. At this point they were running DocsOpen and Hummingbird. They ran a bake-off between FAST and Hummingbird's Search Server. FAST took days to index the documents and had an index bigger than the document library. The Hummingbird product never quite met the performance of AltaVista.

They moved on to test Autonomy, Recommind and Vivisimo. They presented each of the three with a 700,000-document library. Autonomy took 6 weeks to index. Recommind and Vivisimo took two days. Autonomy also split big documents into multiple pieces. Recommind had performance issues because it did not multi-thread: each search runs in order, so you need to wait for the search in front of you to finish.
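To make the multi-threading point concrete, here is a minimal Python sketch of the difference between running searches one after another and running them side by side. It is only an illustration: `run_search` is a made-up stand-in that sleeps to simulate engine latency, and nothing here reflects how Recommind or Vivisimo actually work internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_search(query, delay=0.05):
    """Hypothetical stand-in for a search call; sleeps to simulate latency."""
    time.sleep(delay)
    return f"results for {query}"


def serial_search(queries):
    """Each query waits for the one in front of it to finish."""
    return [run_search(q) for q in queries]


def concurrent_search(queries, workers=4):
    """Queries are handed to a thread pool and run side by side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_search, queries))


def timed(search_fn, queries):
    """Return the results together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = search_fn(queries)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    queries = ["chase", "citibank", "citigroup", "goldman"]
    _, serial_secs = timed(serial_search, queries)
    _, pooled_secs = timed(concurrent_search, queries)
    print(f"serial: {serial_secs:.2f}s, thread pool: {pooled_secs:.2f}s")
```

With four users searching at once, the serial version makes the last user wait for the other three, while the pooled version keeps everyone's wait close to the time of a single search.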

Vivisimo's initial search results were not as relevant. But with a few minutes of tweaking, the results were just as good as Recommind's. Vivisimo does have multi-threaded processing. The semantic clustering was a bonus feature.

His word of advice is that enterprise search is a killer app. If you do not have enterprise search, any one of these products will provide extraordinary results. Since Milbank already had enterprise search, they were a little picky.

Advantages of Vivisimo:

* Searches all versions of a document. Many products search only the latest version.
* Searches email attachments.
* Snippets. You can see a piece of relevant text in the search results.
* Clustering. They group similar results. See Clusty.com.
* Stemming. Deals with plurals and tense.
* Thesaurus. They have a legal-specific thesaurus, and some for other industries.
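For readers who have not seen stemming and snippets up close, here is a rough Python sketch of both ideas. The suffix rules are deliberately naive and are my own invention for illustration; real engines, Vivisimo included, use far richer linguistic rules than this.

```python
def naive_stem(word):
    """Strip a few common English suffixes so plurals and tenses match.

    Illustrative only: a handful of hard-coded suffixes, not a real stemmer.
    """
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def snippet(text, query, context=3):
    """Return a few words around the first stemmed match, or None."""
    words = text.split()
    target = naive_stem(query)
    for i, word in enumerate(words):
        if naive_stem(word.strip(".,;:")) == target:
            return " ".join(words[max(0, i - context): i + context + 1])
    return None


if __name__ == "__main__":
    for form in ("loan", "loans", "loaned", "loaning"):
        print(form, "->", naive_stem(form))
    print(snippet("The borrower repaid both loans before the maturity date.", "loan"))
```

The point is that a search for "loan" also finds "loans" and "loaned", and the snippet shows the matching text in context so you can judge a hit without opening the document.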

The document environment for Milbank's deployment consisted of:

* 3.5 million documents
* 8.2 million saved emails
* 5.7 million email attachments

The initial indexing of documents took 2 days, and emails and attachments took about 7 days. He thought the time could be reduced with better hardware than they used.

He had a few surprises, mostly around the philosophy of how search results should be returned. They are deploying to a pilot group of secretaries and attorneys this month.

I found it interesting that they do not use Interwoven as their document management system. They use DocsOpen.

They allow the individual to decide whether to show snippets or not.

He ran a complicated search that returned 31 documents from around the world, with the results coming back almost instantly.

He ran a generic search for: Chase citibank citigroup. 3,100 documents from around the world came back quickly, clustered into "proposal", "credit agreement", "Goldman Sachs", and a few other clusters. He quickly filtered those results on metadata such as author.
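The browse-by-cluster, then filter-by-metadata workflow from the demo can be sketched in a few lines of Python. Everything in the sample data is invented for illustration (titles, authors, and the hit structure), and the cluster labels are assigned by hand here, whereas a clustering engine derives them from the text of the results.

```python
from collections import defaultdict

# Hypothetical search hits; none of these are real documents or authors.
HITS = [
    {"title": "Revolver term sheet", "author": "asmith", "cluster": "credit agreement"},
    {"title": "Amended credit facility", "author": "bjones", "cluster": "credit agreement"},
    {"title": "Pitch deck", "author": "asmith", "cluster": "proposal"},
    {"title": "Engagement proposal", "author": "cwu", "cluster": "proposal"},
    {"title": "Co-underwriter memo", "author": "bjones", "cluster": "Goldman Sachs"},
]


def group_by_cluster(hits):
    """Bucket hits under their cluster label for browsing."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["cluster"]].append(hit)
    return dict(groups)


def filter_by(hits, **metadata):
    """Keep only the hits whose metadata matches every given field."""
    return [
        hit for hit in hits
        if all(hit.get(field) == value for field, value in metadata.items())
    ]


if __name__ == "__main__":
    for label, members in group_by_cluster(HITS).items():
        print(label, len(members))
    print([hit["title"] for hit in filter_by(HITS, author="asmith")])
```

Clusters give you a quick map of a large result set, and the metadata filter narrows it using fields the document management system already stores.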

It was an impressive display. I had been skeptical of how the clustering would work with a set of legal documents. Let's face it, legal documents generally do not have a diverse vocabulary to distinguish among them. But the clustering worked well.

They are planning to add their internal portal to be indexed and searched by the product. They also want to index the finance system to add other metadata from that system onto the documents.

Interwoven also announced that DLA Piper has selected Universal Search. [Correction: The International branch of DLA Piper has selected Universal Search]

4 comments:

  1. To add some clarification, the International branch of DLA Piper has selected Interwoven Universal Search as their enterprise engine. The US Branch of DLA Piper is still using Recommind MindServer and remains committed to this platform as our enterprise engine.

  2. So the US Branch of DLA Piper is not part of the DLA Enterprise?
    though seriously, can either side do a single global search?

  3. As I said during the presentation referred to above, any comments about other search engines were from our experience at the time we conducted those tests. Each product has probably advanced since then so please do your own homework.

  4. Michael and Gerald -

    Thank you for your comments and clarifications.

    Search engines keep advancing, and what was true during a selection process 6 months ago may no longer be true today.

    One thing that does ring true is the value of enterprise search, regardless of the vendor.

