Guest post by John Gillies, Director of Practice Support, Cassels Brock
This is the fifth in a series of posts about the process of choosing and implementing an enterprise search engine in a law firm. The others have addressed, in order:
- Establishing the Business Requirements
- Picking the Right Search Engine
- Proof of Concept
- Database cleanup
In this instalment, I’ll look at the pilot testing that you’ll want to engage in once you have passed through the first four stages. (You can do limited testing while the database cleanup is under way, but you can’t test accurately with anyone beyond you and your immediate team members until the cleanup is done.)
There should be two aspects to your testing: the mechanics of your testing, and the actual testing itself (in other words, the “how” and the “what”). I’ll look first at the actual mechanics.
Mechanics of testing. One of the most important things you can do is to ensure that you have a logical and well-documented test process. It should of course reflect the usual quality assurance (QA) testing done in connection with any software adoption. But it is crucial that your QA process here be tailored to reflect your testing of an enterprise search engine. There are three steps here.
- First, prepare a set of what you expect to be standard search queries.
- Next, prepare use cases based on those search queries.
- Finally, establish a formal process to document the results of each use case test. As the discussion below of what you’re testing for will show, you’ll often need to refer back to the results of earlier tests before you can draw conclusions.
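The documentation step above can be as simple as a structured log that your team fills in for every run. Here is a minimal sketch in Python; all of the field names, queries, and document IDs are hypothetical, and in practice a spreadsheet with the same columns works just as well:

```python
# Minimal sketch of a documented test log for search use cases.
# Field names and sample data are hypothetical, not from any product.
from dataclasses import dataclass
from datetime import date

@dataclass
class UseCaseResult:
    query: str          # the standard search query under test
    run_date: date      # when the test was run
    tester: str         # who ran it
    results: list       # document IDs returned, in rank order
    notes: str = ""     # observations (speed, surprises, etc.)

log = []

def record(query, tester, results, notes=""):
    """Append one documented test run so later runs can be compared."""
    log.append(UseCaseResult(query, date.today(), tester, results, notes))

record("share purchase agreement precedent", "jsmith",
       ["doc-412", "doc-87", "doc-230"], "top hit looks right")

# Later, pull every prior run of the same query for comparison:
history = [r for r in log if r.query == "share purchase agreement precedent"]
```

The point of the structure is the last line: when you come to the consistency testing described below, you need every earlier run of a query retrievable in one step.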
What you’re testing for. There are then three key performance aspects that you will want to assess as part of your testing process: consistency, speed, and relevance.
Consistency. This is the most straightforward aspect to test. You’ll want first to confirm that the same search string delivers up the same results for the same user over time. In other words, the fact that a search on Day 1 was satisfactory is insufficient in and of itself. You’ll want to confirm that on Days 5 and 10 your engine delivers the same result, in the same order (not including, of course, new content that has been indexed in the meantime).
Second, you’ll want to ensure that users with the same profile get the same results regardless of where or how they are logged on (in other words, whether they are on different desktops or in different offices). Note also that if you have adjusted the weighting according to the user profile, you have to be very careful to ensure that you compare apples to apples.
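That first consistency check, same query, same user, same ranked results over time, setting aside newly indexed content, is mechanical enough to script. A hypothetical sketch, assuming you have saved the ranked lists of document IDs from each run:

```python
# Hypothetical consistency check between two runs of the same query.
def consistent(day1_results, later_results, newly_indexed=()):
    """True if the later run returns the same documents in the same
    order, once documents indexed in the meantime are set aside."""
    skip = set(newly_indexed)
    filtered = [d for d in later_results if d not in skip]
    return filtered == list(day1_results)

day1 = ["doc-412", "doc-87", "doc-230"]
day5 = ["doc-511", "doc-412", "doc-87", "doc-230"]  # doc-511 indexed on day 3

assert consistent(day1, day5, newly_indexed=["doc-511"])        # same order
assert not consistent(day1, ["doc-87", "doc-412", "doc-230"])   # order changed
```

The same comparison works for the second check (two users with the same profile on different desktops): run the query from both locations and feed the two result lists to the function with an empty `newly_indexed` set.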
Speed. Testing for speed is more difficult, since most users will assume that Google is the benchmark and that they should be getting results in nanoseconds. Here’s where you’ll need to manage expectations.
I understand that response times for enterprise search generally tend to range somewhere between five and seven seconds (although longer, of course, if the user is logging in remotely). If this is your first enterprise search implementation, however, users will likely be delighted with those sorts of speeds if they are getting good, solid results. Ultimately, though, most of your legal professionals look only to the relevance of the results returned when judging your search engine.
Relevance. Relevance is the most difficult aspect to test properly prior to launch. As noted above, you will want to establish use cases of the different categories of searches that you anticipate your users will be conducting after launch. The difficulty, of course, will be trying to assess in advance what those categories of search will be and what particular queries your typical users will run.
At this point, it may be useful to note the difference between precision and recall in search testing. The best description of the difference between the two that I have read is the following, from an article entitled Testing Search for Relevancy and Precision, by John Ferrara:
Precision is the ability to retrieve the most relevant results. Higher precision means better relevance, but may imply fewer results returned. Recall, for a given query, means the ability to retrieve as many documents as possible that match or are related to that query.
Recall may be improved by linguistic processing such as lemmatization, spell-checking, and synonym expansion. In information retrieval, there’s a classic tension between recall and precision. If you push for more recall (trying to find all the relevant items), you often get a lot of junk. If you limit your search, trying to find only precisely relevant items, you can miss important items because they don’t use quite the same vocabulary.
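The standard arithmetic behind those two terms is worth seeing once. In this hypothetical example, your engine returns four documents for a query, and your testers have judged three documents in the collection to be truly on point:

```python
# Standard precision/recall arithmetic, shown as a worked example.
# The document IDs and judgments are hypothetical.
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant)

retrieved = ["doc-1", "doc-2", "doc-3", "doc-4"]   # engine returned 4 docs
relevant  = ["doc-1", "doc-2", "doc-5"]            # 3 docs truly on point

assert precision(retrieved, relevant) == 0.5    # 2 of the 4 returned are relevant
assert recall(retrieved, relevant) == 2 / 3     # 2 of the 3 relevant were found
```

Note that the two numbers move in opposite directions as you loosen or tighten the query, which is exactly the tension described in the quotation above.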
Getting the balance right between precision and recall is more art than science and is one of the areas where input from consultants who have engaged in other search implementations will prove particularly valuable.
In trying to get this balance right and running what we expect to be “typical” queries, we knowledge management professionals tend to over-estimate the sophistication of our users. In my firm, a review of the search strings that users actually ran (which we could of course only examine after launch) showed that most users tend to use only a few words, and generally without using quotation marks to search for exact phrases. I suggest that you prepare your test cases accordingly.
The best course is to go to several lawyers who you know are supportive of your project and review your draft use cases with them, to confirm that these are the types of searches that they might reasonably anticipate running.
Next, you need to get your pilot testers to run those use cases while you look over their shoulder, watching what they actually do and recording their results. This will serve two purposes. First, it will allow you to see how users actually use the search engine in “real life.” You’ll also see the “mistakes” they make, and be able to adjust your training accordingly.
Second, once you launch your engine, you will want to go back and run the same searches from those use cases and confirm that you get the same results as when you were in pilot testing. It’s possible that those results may vary after launch and, if so, you’ll want empirical data to study and take to the vendor, if needs be.
Once you’ve done your testing, tweaked your settings, and made all the other technical, behind-the-scenes changes you need to make, you’re ready for roll-out. That will be the subject of my last posting in this series.