Applying the Concept Taxonomy
Phase three: Dynamic Conceptual Grouping
The Knowledge Gateway continues the post-processing
procedure by working with a concept taxonomy. A concept
taxonomy is a knowledge structure composed of frequently
used, related concepts, organized into a hierarchical
index. These concepts are drawn
from the words used in the documents being searched.
The tree-like structure in figure 3 illustrates how words
relate within a concept taxonomy. The words at the top levels
are more general and represent the core categories
of the knowledge structure. In this illustration, the core
category is operating systems.
The branches of this tree cascade into sub-categories
that, at each level, represent ever more specific instances
of the core meaning. These specific instances are conceptually
related to one another.
For example, Linux, Solaris, Windows
NT, and Windows 98 are all related to operating systems.
Yet Linux and Solaris are more specifically related to UNIX
operating systems, and Windows NT and Windows 98 are more
related to Microsoft operating systems.
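The hierarchy described above can be sketched as a small tree. This is
only an illustration: the Concept class and its find_path method are
hypothetical names, not part of the Knowledge Gateway, and the node
names are taken from the example in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node in a hypothetical concept taxonomy."""
    name: str
    children: list[Concept] = field(default_factory=list)

    def find_path(self, target: str) -> list[str] | None:
        """Return the names from this node down to `target`, or None."""
        if self.name.lower() == target.lower():
            return [self.name]
        for child in self.children:
            sub_path = child.find_path(target)
            if sub_path is not None:
                return [self.name] + sub_path
        return None


# Core category at the top, more specific instances below it.
taxonomy = Concept("operating systems", [
    Concept("UNIX", [Concept("Linux"), Concept("Solaris")]),
    Concept("Microsoft", [Concept("Windows NT"), Concept("Windows 98")]),
])

print(taxonomy.find_path("Linux"))
# ['operating systems', 'UNIX', 'Linux']
```

The path from the root to a word shows how general each related concept
is: Linux sits under UNIX, which sits under operating systems.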
In a keyword search, the words "driver problem"
are likely to retrieve dozens, if not hundreds, of documents
containing these words. The search will retrieve documents
that apply to every one of the operating systems mentioned
above, plus many that have no application at all. You must
then sift through the mass of documents to find one that
is relevant.
Figure 4 shows the steps the system
performs in phase two.
A Knowledge Gateway search using the same keywords, "driver
problem," is likely to retrieve the same number of documents,
but here the similarity ends. In post-processing, the Knowledge
Gateway refines the search by generating questions. Using
the concept taxonomy, the system recognizes, based
on the documents returned, that it can generate a question
about operating systems.
The Knowledge Gateway detects a relationship between the
words Linux and Solaris, located in its concept taxonomy, and
similar words extracted from the documents. It then generates
the question, "Which UNIX operating system?" Depending on
which operating system applies, the user might answer
"Linux." The system then queries Site Server or Lotus Notes
and retrieves documents that pertain to both "driver problem"
and Linux.
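As a rough sketch of this step, the code below looks for taxonomy
concepts in the returned documents and builds a question from the
parent concept whose children appear most often. The TAXONOMY mapping,
the function names, and the simple counting rule are all assumptions
made for illustration; the actual system scores words with Information
Gain, described later in this section.

```python
# Hypothetical taxonomy fragment: parent concept -> more specific concepts.
TAXONOMY = {
    "UNIX": ["Linux", "Solaris"],
    "Microsoft": ["Windows NT", "Windows 98"],
}


def generate_question(documents):
    """Return (question, choices) for the parent with the most children found."""
    best_parent, best_choices = None, []
    for parent, children in TAXONOMY.items():
        # Keep only the child concepts that occur in at least one document.
        found = [child for child in children
                 if any(child.lower() in doc.lower() for doc in documents)]
        if len(found) > len(best_choices):
            best_parent, best_choices = parent, found
    if len(best_choices) < 2:
        return None, []  # nothing to disambiguate
    return f"Which {best_parent} operating system?", best_choices


def refine_query(query, answer):
    """Combine the original keywords with the user's answer."""
    return f"{query} {answer}"


docs = [
    "Driver problem after upgrading the Linux kernel",
    "Solaris network driver problem",
    "Printer driver problem on Windows NT",
]
question, choices = generate_question(docs)
print(question)                                  # Which UNIX operating system?
print(refine_query("driver problem", "Linux"))   # driver problem Linux
```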
The concept taxonomy performs two important functions:
- It provides the gateway with a knowledge structure that
analyzes incoming documents for words or phrases that
best match the original query.
- It drives the query refinement process by enabling users
to answer questions generated by the system.
Using an advanced statistical measure of word relevance,
called Information Gain, the Knowledge Gateway takes both
the keywords from the original query and the related words
from the returned documents and compares them to the concepts
in the taxonomy. If there are matches, the related concepts
in the taxonomy are linked to the keywords.
The Information Gain measure scores the keywords and sets in motion
the search refinement process:
- The keyword that best matches the related concepts in
the taxonomy scores the most points.
- The word that scores the most points forms the question
that will refine the search.
- The system poses that question to
the user.
- The user’s answer to the question becomes
the next query sent to the third-party external data source.
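The text does not spell out how the Knowledge Gateway computes
Information Gain, so the following is only a sketch of the textbook
definition: the reduction in entropy over document categories when the
documents are split by whether they contain a given word. The sample
documents, their category labels, and the function names are
assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, word):
    """Entropy reduction from splitting docs on whether they contain word.

    docs is a list of (text, category) pairs.
    """
    labels = [category for _, category in docs]
    with_word = [c for text, c in docs if word.lower() in text.lower()]
    without_word = [c for text, c in docs if word.lower() not in text.lower()]
    remainder = sum(len(part) / len(docs) * entropy(part)
                    for part in (with_word, without_word))
    return entropy(labels) - remainder


# Hypothetical returned documents, each tagged with a taxonomy category.
docs = [
    ("kernel module driver problem", "UNIX"),
    ("kernel panic after driver problem", "UNIX"),
    ("driver problem on boot", "UNIX"),
    ("registry error and driver problem", "Microsoft"),
]
candidates = ("driver", "kernel", "registry")
scores = {word: information_gain(docs, word) for word in candidates}
best_word = max(scores, key=scores.get)
print(best_word, scores)
```

In this sample, "driver" appears in every document and so tells the
system nothing about the category, while a word that separates the
categories cleanly scores highest and is the better basis for a
refining question.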
Figure 5 shows the steps the system performs in the dynamic
question generation phase.
The Gateway then sends each refined query to Site Server or Lotus Notes.
The system continues generating questions relating to
driver problems, refining the search as it goes, until either
the user gets problem resolution or the query sent to Site
Server or Lotus Notes is so refined that nothing is returned.
If no documents are returned, you know with certainty that
the desired documents are not in the knowledge base. Through
the k-Commerce Support Enterprise statistical reporting
utility, you can analyze the search results to determine
the kind of documents that are needed in the knowledge base
to solve the problem in the future.
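The refinement loop described above can be sketched as follows. The
search function is a stand-in for the external data source (Site
Server or Lotus Notes), and the names refine_until_done and
is_resolved are hypothetical; the sketch assumes each user answer is
simply appended to the query.

```python
def search(knowledge_base, query):
    """Stand-in for the external data source: match every query term."""
    terms = query.lower().split()
    return [doc for doc in knowledge_base
            if all(term in doc.lower() for term in terms)]


def refine_until_done(knowledge_base, query, answers, is_resolved):
    """Apply the user's answers one at a time until resolved or empty."""
    results = search(knowledge_base, query)
    for answer in answers:
        if not results or is_resolved(results):
            break
        query = f"{query} {answer}"
        results = search(knowledge_base, query)
    return query, results


knowledge_base = [
    "Linux sound driver problem",
    "Linux network driver problem",
    "Solaris driver problem",
    "Windows NT driver problem",
]

# The user answers "Linux", then "network"; one document remains.
final_query, hits = refine_until_done(
    knowledge_base, "driver problem", ["Linux", "network"],
    is_resolved=lambda results: len(results) == 1,
)
print(final_query, hits)
```

An empty result list at the end of the loop corresponds to the case
where the refined query matches nothing in the knowledge base.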