The Knowledge Gateway: A Primer


Applying the Concept Taxonomy


Phase three: Dynamic Conceptual Grouping

The Knowledge Gateway continues the post-processing procedure by working with a concept taxonomy. A concept taxonomy is a knowledge structure composed of frequently used, related concepts organized into a hierarchical index. These concepts are drawn from the words used in the documents being searched.

The tree-like structure in figure 3 illustrates how words relate within a concept taxonomy. The words at the top levels are more general and represent the core categories of the knowledge structure. In this illustration, the core category is operating systems.

The branches of this tree cascade into sub-categories that, at each level, represent ever more specific instances of the core meaning. These specific instances are conceptually related to one another.

For example, Linux, Solaris, Windows NT, and Windows 98 are all related to operating systems. Yet Linux and Solaris are more specifically related to UNIX operating systems, and Windows NT and Windows 98 are more related to Microsoft operating systems.
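The hierarchy in this example can be sketched as a nested mapping. This is only an illustration of the idea, not the product's internal format: the names TAXONOMY, path_to, and siblings are hypothetical, and the two intermediate categories are taken from the UNIX and Microsoft groupings described above.

```python
# Hypothetical in-memory form of the operating-systems example.
# Each key is a concept; its value holds the more specific concepts below it.
TAXONOMY = {
    "operating systems": {
        "UNIX": {"Linux": {}, "Solaris": {}},
        "Microsoft": {"Windows NT": {}, "Windows 98": {}},
    }
}


def path_to(concept, tree=TAXONOMY, trail=()):
    """Return the chain of categories from the root down to `concept`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == concept:
            return here
        found = path_to(concept, children, here)
        if found:
            return found
    return None


def siblings(concept):
    """Return the concepts that share a parent category with `concept`."""
    path = path_to(concept)
    if path is None or len(path) < 2:
        return []
    node = TAXONOMY
    for name in path[:-1]:  # walk down to the parent's children
        node = node[name]
    return [name for name in node if name != concept]
```

Under this sketch, path_to("Linux") walks from operating systems through UNIX to Linux, and siblings("Linux") yields Solaris, which mirrors the statement that the two are more specifically related to each other than to the Windows entries.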

In a keyword search, the words "driver problem" are likely to retrieve dozens, if not hundreds, of documents containing these words. The search will retrieve documents that apply to every one of the operating systems mentioned above, plus many that have no application at all. You must then sift through the mass of documents to find one that is relevant.

Figure 4 shows the steps the system performs in phase two.

A Knowledge Gateway search using the same keywords, "driver problem", is likely to retrieve the same number of documents, but here the similarity ends. In post-processing, the Knowledge Gateway refines the search by generating questions. Using the concept taxonomy, the system recognizes, based on the documents returned, that it can generate a question about operating systems.

The Knowledge Gateway detects a relationship between the words Linux and Solaris in its concept taxonomy and similar words extracted from the documents. It then generates the question "Which UNIX operating system?" Depending on which operating system applies, the user might answer "Linux". The system then goes out to Site Server or Lotus Notes and retrieves documents that pertain to both "driver problem" and Linux.
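As a rough sketch of how such a question might be derived, the snippet below matches taxonomy concepts against the text of the returned documents and proposes a question for any category where more than one child concept appears. The CATEGORIES table, the candidate_questions function, and the plain substring matching are all assumptions made for illustration; the primer does not describe how the product extracts or matches words.

```python
# Hypothetical taxonomy slice: parent category -> more specific child concepts.
CATEGORIES = {
    "UNIX": ["Linux", "Solaris"],
    "Microsoft": ["Windows NT", "Windows 98"],
}


def candidate_questions(docs):
    """Map each question worth asking to the answers found in the documents."""
    questions = {}
    for category, children in CATEGORIES.items():
        # Keep only the child concepts that some returned document mentions.
        seen = [c for c in children if any(c.lower() in d.lower() for d in docs)]
        if len(seen) >= 2:  # a question only helps when there is a real choice
            questions[f"Which {category} operating system?"] = seen
    return questions
```

With documents that mention Linux, Solaris, and Windows NT, this sketch proposes only the UNIX question, because the Microsoft category offers a single matching answer and so would not narrow the result set.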

The concept taxonomy performs two important functions:

  • It provides the gateway with a knowledge structure for analyzing incoming documents for words or phrases that best match the original query.
  • It drives the query refinement process by enabling users to answer questions generated by the system.

Using an advanced statistical measure of word relevance, called Information Gain, the Knowledge Gateway takes both the keywords from the original query and the related words from the returned documents and compares them to the concepts in the taxonomy. If there are matches, the related concepts in the taxonomy are linked to the keywords.

The Information Gain measure scores the keywords and sets the search refinement process in motion:

  • The keyword that best matches the related concepts in the taxonomy scores the most points.
  • The word that scores the most points forms the question that will refine the search.
  • The system poses that question to the user.
  • The user’s answer to the question becomes the next query sent to the third-party external data source.
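The primer does not give the formula behind Information Gain, so the following is only a sketch of the textbook entropy-based definition, not the product's scoring. It assumes that every returned document is equally likely to be the one the user wants, and that a document belongs to the first answer it mentions (or to an "other" group if it mentions none). The function names information_gain and best_question are hypothetical.

```python
import math


def information_gain(docs, answers):
    """Expected bits learned about the wanted document once the user picks an answer.

    Assumption: each document in `docs` is equally likely to be the wanted one.
    """
    total = len(docs)
    if total == 0:
        return 0.0
    buckets = {}
    for doc in docs:
        text = doc.lower()
        # Assign the document to the first answer it mentions, else "other".
        label = next((a for a in answers if a.lower() in text), "other")
        buckets[label] = buckets.get(label, 0) + 1
    before = math.log2(total)  # uncertainty over all documents
    after = sum((n / total) * math.log2(n) for n in buckets.values())
    return before - after


def best_question(docs, questions):
    """Pick the question whose answers split the documents most informatively."""
    return max(questions, key=lambda q: information_gain(docs, questions[q]))
```

Under these assumptions, a question whose answers divide the returned documents into several groups of similar size scores higher than one that leaves most documents in a single group, which is one way to read "scores the most points" above.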

Figure 5 shows the steps the system performs in the dynamic question generation phase.

The Gateway then sends the refined query to Site Server or Lotus Notes.

The system continues generating questions relating to driver problems, refining the search as it goes, until either the user's problem is resolved or the query sent to Site Server or Lotus Notes is so refined that nothing is returned. If no documents are returned, you know with certainty that the desired documents are not in the knowledge base. Through the k-Commerce Support Enterprise statistical reporting utility, you can analyze the search results to determine the kind of documents that are needed in the knowledge base to solve the problem in the future.
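The refinement cycle described here can be sketched as a simple loop. Everything in the snippet is a stand-in: search represents the call to the external data source (Site Server or Lotus Notes), next_question represents the taxonomy-driven question generator, and ask represents the interaction with the user. Appending the answer to the previous query is also an assumption, chosen to match the earlier example in which the refined search covers both "driver problem" and Linux.

```python
def refine(query, search, next_question, ask, max_rounds=5):
    """Repeat search, question, and answer until resolved or nothing is returned.

    All three callables are hypothetical stand-ins:
      search(query)         -> list of documents from the external data source
      next_question(docs)   -> (question, answers), or None when none is left
      ask(question, answers) -> the chosen answer, or None if the user is done
    """
    docs = search(query)
    for _ in range(max_rounds):
        if not docs:
            return query, []  # nothing in the knowledge base matches
        generated = next_question(docs)
        if generated is None:
            break  # no further question can narrow the results
        answer = ask(*generated)
        if answer is None:
            break  # the user found a resolution
        query = f"{query} {answer}"  # assumption: the answer narrows the query
        docs = search(query)
    return query, docs
```

The loop ends in one of the two ways the text describes: the user stops because the problem is resolved, or the search comes back empty. The max_rounds limit is an added safeguard for the sketch and is not part of the described behavior.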


February, 2000