
Enterprise Search Implementation, Step 5: Pilot Testing

21 Aug

Guest post by John Gillies, Director of Practice Support, Cassels Brock

This is the fifth in a series of posts about the process of choosing and implementing an enterprise search engine in a law firm. The others have addressed, in order: establishing the business requirements, picking the right search engine, the proof of concept, and database cleanup.

In this instalment, I’ll look at the pilot testing that you’ll want to engage in once you have passed through the first four stages. (You can begin testing on a limited basis while your database cleanup is under way, but you can’t do accurate testing with anyone beyond your immediate team until the cleanup is done.)

There are two aspects to your testing: the mechanics of the testing, and the testing itself (in other words, the “how” and the “what”). I’ll look first at the mechanics.

Mechanics of testing. One of the most important things you can do is to ensure that you have a logical and well-documented test process. It should of course reflect the usual quality assurance (QA) testing done in connection with any software adoption. But it is crucial that your QA process here be tailored to reflect your testing of an enterprise search engine. There are three steps here.

  • First, prepare a set of what you expect to be standard search queries.
  • Next, prepare use cases based on those search queries.
  • Finally, establish a formal process to document the results of each use case test. You will see why, in the discussion of the aspects that you are testing for, it is important to be able to refer to the results of previous testing to come to certain conclusions.
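By way of illustration only (the post does not prescribe a format), the documented test process described in these three steps can be as simple as a structured log of each use-case run. Everything in this sketch, from the field names to the CSV layout, is an assumption:

```python
import csv
from dataclasses import dataclass, asdict

@dataclass
class SearchTestResult:
    """One documented run of a use-case query. All field names are illustrative."""
    case_id: str      # which use case this run belongs to
    query: str        # the exact search string entered
    run_date: str     # when the test was run (ISO date)
    tester: str       # who ran it
    top_results: str  # e.g. the first five document IDs, in order
    notes: str        # observations: speed, surprises, missing items

def log_results(path: str, results: list[SearchTestResult]) -> None:
    """Append test runs to a CSV log so later runs can be compared with earlier ones."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(results[0])))
        if f.tell() == 0:  # brand-new log file: write the header row first
            writer.writeheader()
        writer.writerows(asdict(r) for r in results)
```

The point of the append-only log is exactly the one made above: being able to refer back to the results of previous testing when drawing conclusions later.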

What you’re testing for. There are then three key performance aspects that you will want to assess as part of your testing process, namely

  • consistency
  • speed
  • relevance

Consistency. This is the most straightforward aspect to test. You’ll want first to confirm that the same search string delivers the same results for the same user over time. In other words, the fact that a search on Day 1 was satisfactory is insufficient in and of itself. You’ll want to confirm that on Days 5 and 10 your engine delivers the same results, in the same order (not including, of course, new content that has been indexed in the meantime).

Second, you’ll want to ensure that users with the same profile get the same results regardless of where or how they are logged on (in other words, whether they are on different desktops or in different offices). Note also that if you have adjusted the weighting according to the user profile, you have to be very careful to ensure that you compare apples to apples.
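A consistency check of this kind reduces to comparing ordered result lists across runs, after setting aside anything indexed in the interim. A minimal sketch, in which the function name and arguments are hypothetical:

```python
def consistent(earlier: list[str], later: list[str],
               newly_indexed: set[str] = frozenset()) -> bool:
    """True if two runs of the same query returned the same documents in the
    same order, once content indexed between the runs is set aside."""
    return earlier == [doc for doc in later if doc not in newly_indexed]
```

For example, a Day 1 run returning ["d1", "d2", "d3"] and a Day 5 run returning ["d1", "new", "d2", "d3"] are consistent, provided "new" was indexed in the meantime.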

Speed. Testing for speed is more difficult, since most users will assume that Google is the benchmark and that they should be getting results in nanoseconds. Here’s where you’ll need to manage expectations.

I understand that response times for enterprise search generally range between five and seven seconds (longer, of course, if the user is logging in remotely). If this is your first enterprise search implementation, however, users will likely be delighted with those sorts of speeds if they are getting good, solid results. Ultimately, though, most of your legal professionals will look only to the relevance of the results returned when judging your search engine.

Relevance. Relevance is the most difficult aspect to test properly prior to launch. As noted above, you will want to establish use cases of the different categories of searches that you anticipate your users will be conducting after launch. The difficulty, of course, will be trying to assess in advance what those categories of search will be and what particular queries your typical users will run.

At this point, it may be useful to note the difference between relevance and precision in search testing. The best description of the difference between the two that I have read is the following, from an article entitled Testing Search for Relevancy and Precision, by John Ferrara:

Precision is the ability to retrieve the most precise results. Higher precision means better relevance and more precise results, but may imply fewer results returned. For a query, recall means the ability to retrieve as many documents as possible that match or are related to a query.

Recall may be improved by linguistic processing such as lemmatization, spell-checking, and synonym expansion. In information retrieval, there’s a classic tension between recall and precision. Specifying more recall (trying to find all the relevant items), you often get a lot of junk. If you limit your search trying to find only precisely relevant items, you can miss important items because they don’t use quite the same vocabulary.

Getting the balance right between precision and recall is more art than science and is one of the areas where input from consultants who have engaged in other search implementations will prove particularly valuable.
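Precision and recall are straightforward to compute once you have judged which documents are relevant to a given test query. A sketch, assuming documents are identified by simple IDs:

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: what fraction of the retrieved documents are relevant.
    Recall: what fraction of all relevant documents were retrieved."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A query that returns four documents, two of which are among five judged relevant, scores 0.5 precision and 0.4 recall; the tension described in the quoted passage shows up as broadening the result set tending to raise recall while lowering precision.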

In trying to get this balance right and in preparing what we expect to be “typical” queries, we knowledge management professionals tend to overestimate the sophistication of our users. In my firm, a review of the search strings that users actually ran (which we could of course only examine after launch) showed that most users tend to use only a few words, generally without quotation marks to search for exact phrases. I suggest that you prepare your test cases accordingly.

The best course is to go to several lawyers who you know are supportive of your project and review your draft use cases with them, to confirm that these are the types of searches that they might reasonably anticipate running.

Next, you need to get your pilot testers to run those use cases with you standing over their shoulder, watching what they actually do and recording their results. This will serve two purposes. First, it will allow you to see how users actually use the search engine in “real life.” You’ll also see the “mistakes” they make, and be able to adjust your training accordingly.

Second, once you launch your engine, you will want to go back and run the same searches from those use cases and confirm that you get the same results as when you were in pilot testing. It’s possible that those results may vary after launch and, if so, you’ll want empirical data to study and take to the vendor, if needs be.

Once you’ve done your testing, tweaked your settings, and made all the other technical, behind-the-scenes changes you need to make, you’re ready for roll-out. That will be the subject of my last posting in this series.

Enterprise Search Implementation, Step 4: Database Cleanup (Hiding What You Shouldn’t Be Able To Find)

30 Nov

Guest Post By John Gillies, Director of Practice Support at Cassels Brock

This is the fourth in a series of posts about the process of choosing and implementing an enterprise search engine in a law firm. The first addressed Establishing the Business Requirements and the second looked at Picking the Right Search Engine. The last one looked at the Proof of Concept stage, which is where you put your selected engine to the test and ensure that it performs as expected in your environment. Assuming that it passed those tests and the decision has been made to proceed, the next hurdle is cleaning up the databases you will be indexing.

As part of your strategic planning, you will have decided which databases those are. The primary advantage of indexing two or more databases is that users are able to see aggregate results brought together that would otherwise have to be searched separately. The main disadvantage of doing so is that you will have to ensure that, in mixing apples and oranges (as it were), the results are displayed in a way that users can understand and use. In the initial roll-out of enterprise search at our firm, for example, we opted to index only the documents in the document management system (DMS). We did this so that we could start with focused content, train users on using the tool for that content, and then slowly build the available content.

Among the databases commonly indexed for enterprise search are the accounting system, the DMS, the library catalogue, the KM/precedent repository, relevant content on the firm intranet, and the legal updates on your firm website. While indexing the other four items on this list should be fairly straightforward, the accounting and DM systems pose challenges of their own.


Accounting System

Indexing the accounting system requires you to make policy decisions as to who will be able to see what content. For example, can all users search the financial data? Only certain users? All the accounting data, or only certain segments? Furthermore, from a usability perspective, while the search engine offers the ability to deliver all the content that corresponds to the search criteria, you may wish to narrow the financial data indexed so as not to overwhelm the user.

One issue to address is whether to index time entry narratives. Those narratives may provide very relevant information, particularly when identifying internal expertise. The question is whether the firm wishes to expose this information to all users. This is one area where the solution is not all or nothing. You may choose, for example, to index this data and use the results for determining relevance, without displaying the actual content.

Document Management System

You will have several concerns with indexing your DMS content. First and foremost you will need to ensure that your confidentiality screens effectively deal with relevant content. This works both ways. In other words, those behind a screen need to be able to find content that they are entitled to view, and those outside the screen need to be blocked from seeing any of that content.

Dealing With “Sensitive” Documents

It is, however, the problem of “sensitive” documents in the DMS that will prove to be the most vexing. “Sensitive content” may include, for example, confidential memos from firm committees, memos regarding partner allocations and associate compensation, performance reviews, and so forth.

(You may wish to review the PowerPoint slides from the ILTA 11 presentation entitled Managing Risks Associated with Enterprise Search, a panel composed of Lisa Kellar Gianakos, the Director of Knowledge Management at Pillsbury Winthrop, Rizwan Khan, the Vice-President of Customer Service at Autonomy, and me.)

Typically, in the process of implementing enterprise search, firms discover that sensitive content that should not, for whatever reason, be public has in fact been filed in a publicly accessible part of the DMS. Until that point, that content had not really been available because, realistically, users would have been unable to find it (colloquially referred to as “security through obscurity”). With the advent of better search, that approach is no longer possible.

One way to start finding and securing this content is to draw up a list of “dirty words”. You may wish to begin by referring to the terms on List A that formed part of our ILTA presentation (which are also reproduced as an appendix at the end of this article).

This slide from our presentation shows the most frequently recurring “dirty words” as a tag cloud:

Dirty Word Tag Cloud

You will, however, need to exercise discretion when reviewing the results that a search for these terms returns. For example, while it might seem logical to search for curse words, they are frequently used in e-mails and other documents sent to the firm, as well as in court transcripts, so you should not set up an absolute rule to exclude these terms.

Consider searching for some or all of the following:

  1. Terms related to the payment of personal income taxes (e.g., where a lawyer has saved to the DMS letters related to the amount and/or payment of personal income taxes).
  2. Wills and related documents such as “last will and testament”, “living will”, and related terms, such as “life support”. Do the same relating to family law matters, like “divorce”, “separation”, “alimony”, “cohabitation”, etc. (The exact terms will depend on the terms used in your jurisdiction.) Note, however, that if your firm has an estates or a family law practice, a number of these terms may legitimately form part of client files. If firm members have used the services of either the estates or family law group, ensure those files are protected.
  3. Names of firm committees such as “executive committee”, “management committee”, etc. Confidential e-mails to and between committee members are not infrequently filed in publicly accessible locations.
  4. Terms like “cottage”, “country house”, or whatever people may call their secondary residence.
  5. Content saved under the personal matter numbers of firm members (if you have such numbers), although there may be relevant public material there, such as conference papers, articles, publications, etc.

Check with your Finance and HR departments to find out what terms they would search for. Also, seek suggestions from your pilot group, since they may well come up with terms that your implementation team will not have thought of.
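A first-pass sweep for terms like these can be automated, with the hits routed to human review rather than excluded automatically (as the curse-word caveat above suggests). A sketch, seeded with a few terms from the appendix list; the function name and interface are assumptions:

```python
SENSITIVE_TERMS = [
    "partner compensation", "performance review", "executive committee",
    "last will and testament", "termination letter",  # seeded from the appendix list
]

def flag_sensitive(text: str, terms: list[str] = SENSITIVE_TERMS) -> list[str]:
    """Return the sensitive terms appearing in a document's text.
    Hits are candidates for human review, not automatic exclusion."""
    lowered = text.lower()
    return [term for term in terms if term in lowered]
```

Running a sweep like this against publicly filed DMS content gives you a review queue; whether each hit is actually sensitive remains a judgment call.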

This is perhaps a good opportunity to determine whether any of your internal policies (for example, on confidentiality screens) or external policies (for example, relating to the protection of personal information) need to be updated or whether more internal training is needed.

Understand, as well, that this process should be iterative. Even after you are confident that you have plugged the leaks in the dike, you should continue to do different searches to ensure that you have stopped as much as you can. Consider setting up a reminder system to test these issues post roll-out.

Particularly in the first few months after launch, you will want to review reports of the search terms that users have been using, in part to get a sense of what user behaviour actually is (as opposed to what you’ve assumed it will be!) but also to determine whether users are using terms you had not thought of that might turn up other sensitive documents.

When setting expectations as to implementation, you should be aware that your testing for “sensitive” documents may end up being the most time-intensive portion of your project. Depending on your variables (primarily, the number and size of the repositories you will be indexing), you will want to devote several months to ensuring that you are satisfied with the results that users will be seeing. You will want to avoid any unnecessary bumps at the outset, since that can impair the impression of the search engine you will have spent so much time preparing for!

When you are satisfied on this point, you are now ready for pilot testing, which is the topic I will treat in my next article.

Appendix: “Dirty word” list

  • Associate reviews
  • Bonus allocation
  • Bonus decision
  • Bonus structure
  • Charitable contributions
  • Charitable donations
  • Department budget
  • Direct deposit
  • Discretionary bonus program
  • Equity partner
  • Operations committee/Executive committee
  • Partner admission
  • Partner compensation
  • Partner remuneration
  • Partnership admission
  • Partnership issues
  • Partnership meeting
  • Performance review
  • Promote/promotion
  • Resignation
  • Staff bonus
  • Termination letter/letter of termination


Enterprise Search Implementation, Step 3: The Proof of Concept

16 Sep

Guest Post By John Gillies, Director of Practice Management, Cassels Brock

This is the third in a series of postings about the process of choosing and implementing an enterprise search engine in a law firm. The first addressed Establishing the Business Requirements, while the second looked at Picking the Right Search Engine.

In my previous posting, I talked about the process that should have resulted in your choosing the search engine that (ideally) best meets your needs. The next stage, the proof of concept, is where you put it to the test and ensure that it performs as expected in your environment.

The following is one definition of “proof of concept”: “A proof of concept [PoC] is a test of an idea made by building a prototype of the application. It is an innovative, scaled-down version of the system you intend to develop. In order to create a prototype, you require tools, skills, knowledge, and design specifications.” (I don’t know if the two terms are related, but for me “proof of concept” brings to mind the often misquoted phrase, “the proof of the pudding is in the eating.”)

Essentially, in the PoC you and your IT colleagues are looking to see whether the engine not only does what the vendor has promised, but whether it also does what you need it to do, in the way you need it done, and does so properly in your technical environment.

Generally, this will involve loading the search software in a test environment, indexing a small percentage of the documents that it will be searching, and running tests to determine how it responds.

While your due diligence up to this point will have included confirmation that the engine should perform in your IT environment, now is the time that the nitty gritty testing will take place. (Since every firm’s IT infrastructure is unique, it’s important to do this testing before proceeding any further.)

You will have identified, in your business requirements document, the various document repositories that you will be indexing (such as the DMS, your accounting system, the intranet, etc.). Once the initial technical testing is done, you will take an appropriately sized “slice” of each of those repositories to index. Ideally, this slice will include a representative mix of document types, sizes, and security profiles. It will provide the raw data used for the rest of your PoC testing, as well as for pilot testing. You will want to make sure that the resulting data subset hangs together well enough to provide meaningful results; you may wish, for example, to index only documents published within the last 30 days. Be mindful that, as a result, there may be content that pilot testers expect to find but that is not in fact in the PoC database, so it will be important to manage expectations at that point.

Once your various data repositories have been crawled and indexed, you’ll need to set up the user interface so that it displays properly, and then set up and test the security modules. One of the first questions that lawyers will ask is whether the search engine respects the permission walls erected around sensitive information. This should be straightforward to confirm for DMS documents, but if you are including other repositories, particularly for your accounting information, you will want to pay special attention to this issue. Nothing will sink acceptance of your search engine faster than the discovery that users are suddenly able to access documents that should be hidden from them.

This will be the point where you should determine when you are going to conduct your database cleanup. While I will deal with this issue in more detail in my next posting, it’s important to note that you will find that there are a number of sensitive documents that currently exist but are essentially hidden because your current search tools are inadequate to reveal them. (This is referred to as “security through obscurity”.) Since you will have to conduct this cleanup before launch, your question is whether to do it now, before the pilot, or afterwards.

Now you will finally be at the point where you can start some serious testing. Your colleagues in IT will need to carry out their technical tests while you use your business requirements document to test the four key aspects of the search engine, namely

  • relevance
  • responsiveness (i.e., speed)
  • consistency
  • proper working of the key functions

You should know that responsiveness is difficult to test in the PoC, because it’s not really until a full index of your data sources is performed, and released into production, that you’ll know how fast it is. The other three aspects, however, are what you should be focusing on.

As to the third bullet, the search engine should consistently apply the established rules for weighting, ranking, and security criteria on a user-by-user basis. Different users may therefore see different results, but each user should have a consistent experience regardless of how he or she is connecting to the engine.

It will be very helpful if you develop use cases to test these aspects. Develop various types of typical searches that you expect different users to conduct, then carry out those searches and record the results. It’s useful in this context to develop personas (e.g., first year associate who knows nothing, experienced senior associate managing a deal, partner in a litigation matter, assistant doing a search on behalf of his or her lawyer, etc.). With your knowledge of the business requirements, you should also develop test cases that highlight what a specific type of user should “not” find (due to data source, or document security). You will want to keep these use cases for more testing during the pilot phase.

Assuming everything has gone well in your PoC, you will be ready to accept the software, engage in the database cleanup (if you haven’t done so already), and proceed to your pilot testing.


Enterprise Search Implementation, Step 2: Picking The Right Search Engine

6 Jul

Guest Post by John Gillies, Director of Practice Support at Cassels Brock

In my previous posting on implementing a search engine in a law firm, I focused on the first step of the process, namely Establishing the Business Requirements. Getting a detailed list of your business requirements is the essential starting point, because you will use it to compare the features of your “finalists.”

Having got to this point, your next hurdle is figuring out which search engines you’re going to want to test. You may well choose at some point to involve an outside consultant with experience in search engine implementations to help guide you through the process. We found the help of our consultant (Joshua Fireman from ii3) to be invaluable. If you haven’t done so before now, this would be a good time.

In determining what sort of search engine you’re looking for, you’re faced with two choices, namely to restrict your search to the engines that have been customized for the legal market, or to look at engines that have been designed for the general market (for the Fortune 500 crowd, if you will), knowing that you’ll need to do a fair bit of customizing to address the many unique aspects of a legal environment.

You should only consider going the second route if you have reliable support on the technology front so you’re confident that your business requirements can be realized in your environment. For example, if integration with your DMS is an important requirement, will you actually be able to optimize your “Fortune 500 search engine” to do that effectively? Also, how much ongoing coding work will need to be done so it continues to function properly in your environment as it is upgraded? Many firms choose the first route simply because they do not want to rely so much on variables, many of which are beyond their control.

We opted to limit our selection to engines optimized for the legal environment, namely Autonomy iManage Universal Search (a/k/a “IUS”) and Recommind’s Decisiv Search. It is here that the investment of time in defining your business requirements really pays big dividends. In my previous posting, I noted that we had ranked our requirements by importance (from “Essential” down to “Useful but not critical”). You can now use that list to create an Excel spreadsheet to compare your finalists.

We created five categories of rankings, with weights ranging from 5 down to 1. We then added another column that rated each engine on how well it met the particular requirement, also on a five-point scale. We then systematically went through each item in the business requirements document, assessed how well the particular feature performed, and assigned a score. Excel will calculate the weighted score for each item (so, for example, an “Essential” item that you give a score of 5 gets a weighted score of 25).
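The spreadsheet arithmetic is straightforward to sketch. The endpoint weights below follow the post (“Essential” = 5, the lowest category = 1), while the middle weights are assumptions:

```python
CATEGORY_WEIGHTS = {
    "Essential": 5,
    "Very Important": 4,      # assumed ordering of the middle categories
    "Important": 3,
    "Nice to have": 2,
    "Useful but not critical": 1,
}

def weighted_total(requirements: list[tuple[str, int]]) -> int:
    """Sum of weight x engine score over all requirements, mirroring the
    per-item weighted scores the spreadsheet computes."""
    return sum(CATEGORY_WEIGHTS[category] * score
               for category, score in requirements)
```

An “Essential” item scored 5 contributes 25 to the total, matching the example in the text; summing these contributions across all requirements gives each finalist its overall score.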

While there is no “ideal” minimum or maximum score that you are hoping to see at the end of this process, it’s possible that the ultimate scores will be so low that you have to reassess your entire process, though the likelihood of that is minimal. What you will most likely get is a total score for each of the finalists that enables you to engage in a much more objective comparison than if you had just seen vendor demos of each.

You can also use Excel to compare your finalists just on their scores for the “Essential” items. (You may find, for example, that the overall result is fairly even between them, but one of them scores significantly higher when comparing just the “Essential” items. Once again, this is important information in helping you make your final decision.)

That is not, however, the end of the matter. Whether your finalists essentially get the same score (which is what happened in our case) or whether there is a clear winner, there are other non-quantifiable factors that you need to take into account, all of which can significantly influence your final decision.

The first factor is, of course, price. (You may in fact have taken this factor into consideration at the outset in determining which engines were, or were not, going to be tested.) Then, there are some factors that are likely relevant for any firm, as well as others that may be unique to your environment.

Among the common factors might be items such as:

  • What is your relationship with the vendor? If you use other applications from this vendor, what is their history of responsiveness to issues you’ve raised about those applications?
  • What are the announced upgrades for the next version of their engine, and what is their development roadmap? What process do they follow in determining which features to focus on for the future?

Aspects relating to your unique environment depend on the state of your current IT infrastructure and might include:

  • How well will this engine integrate with your current applications?
  • What repositories do you intend to index and what are the implications for integrating those different repositories?
  • What internal support requirements are there?

At the end of this process, you should have all the necessary elements for making a final decision, picking a “winner,” and then moving to the next stage, namely the proof of concept.


Establishing Business Requirements For Enterprise Search Selection

1 Jun

Guest Post by John Gillies, Director of Practice Support at Cassels Brock & Blackwell

More and more law firms are looking to adopt an enterprise search engine as a way of finding a way into (and out of) the mass of information that they manage. The right search tool can give a firm immediate access to content that is essentially unfindable. Choosing and implementing the right search tool is, therefore, a critical knowledge management activity. I’ll share here the steps we used in our process, and provide detail about the first, perhaps most critical step, establishing business requirements.

We are a one-office firm of about 200 lawyers, with about eight million documents in our iManage document management system (DMS). We have three databases: the Legal db, for client-matter workspaces and precedents; the Support db, which is used by the admin side of the firm; and the Admin db. Every firm member has a personal workspace on Admin, with a public and a private folder. The intention, when these were created, was that lawyers would use their public folders to save things like precedents, business development content, articles they had written, and so forth.

Our process involved the following steps:

  • Establishing the business requirements
  • Identifying the search engines that we would test and picking the “winner”
  • The proof of concept phase
  • Database cleanup
  • Pilot testing
  • Roll-out

There are several possible search engines that either could service the legal market or are designed specifically to do so, and the only way to choose the right one is to know exactly what your needs are. That is why a detailed business requirements document is essential.

This document starts off as a wish list that itemizes all the features that you would want from a search engine. Ideally, you will have a good sense what your users need, which will get you started. (If you don’t have a good sense, or even if you do, a user survey asking about searching and pain points around finding work product could be helpful.) Your list will then be supplemented by your reading, comments from your counterparts at other firms who have already gone through this process, and any other research you can do. (In preparing your list, make sure that you review Doug Cornelius’ posts on Four Types of Document Searches, which should help you focus your ideas.)

We ended up with a list of approximately 150 requirements.

The next step is to group all the items under related topics, and then to prioritize them. We had five categories: “Essential,” “Very Important,” “Important,” “Nice to have,” and “Useful but not critical.” We also established a list of the top ten essential items, which proved very useful in doing a focused comparison between the two search engines we tested.

We used an Excel spreadsheet for our business requirements document, which allowed us to give marks to each item on the list (on a scale of one to five) and also to provide a weighted score. (For example, the “Essential” items got a weighting of five and the “Useful but not critical” items got a weighting of one. Accordingly, a particular “Essential” item that got a rating of 3 would have a weighted score of 15.) We then compared the functions of each of the search engines we tested, giving each a score, and Excel then did the computations to come up with an overall score.

At the end of the process, we had meaningful scores for the two search engines we tested, to help us make the final choice.

One aspect to bear in mind, though, is that this scoring and weighting process is designed to select a search engine to test. There will necessarily be other relevant factors that cannot be reduced to a mere score. This process does, however, concentrate the mind wonderfully on all of the necessary features of your search engine.