The search module contains two parts: a search language and a search engine. The search language takes a query from the user and transforms it into a formal representation that can be applied to the index. The search engine then applies this representation to the index and derives the set of records that meet the criteria specified by the query.
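The two-stage division of labor can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular system: it assumes (hypothetically) that the formal representation is a tuple `("AND", terms)` and that the index is an inverted index mapping each lowercase word to the set of record ids containing it. The names `search_language` and `search_engine` are illustrative only.

```python
# A minimal sketch of the search module's two parts.
# Assumptions (hypothetical): the formal query is ("AND", [terms]) and the
# index is an inverted index: lowercase word -> set of record ids.
from typing import Dict, List, Set, Tuple

Query = Tuple[str, List[str]]


def search_language(user_input: str) -> Query:
    """Transform the user's free-text query into a formal representation."""
    terms = [word.lower() for word in user_input.split()]
    return ("AND", terms)


def search_engine(query: Query, index: Dict[str, Set[int]]) -> Set[int]:
    """Apply the formal query to the index and return matching record ids."""
    op, terms = query
    if op != "AND":
        raise ValueError(f"unsupported operator {op!r}")
    if not terms:
        return set()
    # Start from the first term's records and keep only those shared by every term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = {
    "computer": {1, 2, 4},
    "networks": {2, 3, 4},
    "music": {3},
}

query = search_language("Computer Networks")
print(query)                                # ('AND', ['computer', 'networks'])
print(sorted(search_engine(query, index)))  # [2, 4]
```

Keeping the two stages separate means the search language can change (for example, to add boolean operators) without touching how the engine reads the index.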
The capabilities of the search language depend on the index that has been generated: the more sophisticated the technology used to build the index, the more sophisticated the search language can be. There are two basic types of search languages on the Web: keyword matching (also called associative search) and boolean search.
Keyword matching (associative search) is the type of search language that accepts a "keyword" from the user as the search query. It is the most common search capability and is supported by almost all search systems for the World Wide Web. If a search system supports keyword matching, the user can enter any word, a document title, or even part of a word as the query. Keyword matching is really pattern matching on the characters of the text. For example, a search query of the string "comp" would yield all occurrences of words containing "comp", such as "composer", "computer", "incompetent", and "composure".
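Because keyword matching is character pattern matching, a substring test is enough to reproduce the "comp" example. The sketch below assumes case-insensitive matching and uses an illustrative word list; real systems may match case-sensitively or against full text rather than a vocabulary.

```python
# A minimal sketch of keyword matching as plain character pattern matching.
# Assumption: matching ignores case; the vocabulary is illustrative only.
from typing import List


def keyword_match(keyword: str, words: List[str]) -> List[str]:
    """Return every word that contains `keyword` anywhere, ignoring case."""
    needle = keyword.lower()
    return [w for w in words if needle in w.lower()]


vocabulary = ["composer", "computer", "incompetent", "composure", "network", "Compiler"]
print(keyword_match("comp", vocabulary))
# ['composer', 'computer', 'incompetent', 'composure', 'Compiler']
```

Note that "incompetent" matches because the pattern "comp" occurs inside it, which is exactly why keyword matching returns matches from unrelated meanings and contexts.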
Boolean search is a more sophisticated search language that serves to narrow and refine keyword matching searches. It lets the user combine keywords with the boolean operators AND, OR, and NOT, which can be used together to build a search query. If a system supports boolean search, the user can enter words, phrases, or combinations of phrases as a search query. The example above shows that keyword matching queries often yield a large number of matches covering a broad range of meanings and contexts. What if we are looking for information only on computers? To narrow the query, we can take the boolean search approach and try "((computer NOT composure) NOT incompetent)". When we want a result set closer to what we are looking for, boolean search may give better results than keyword matching. Boolean search allows the construction of sentence-like queries from combinations of boolean operators.
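One way to evaluate such a query is a small recursive-descent parser over an inverted index, where AND intersects sets of documents, OR unions them, and NOT subtracts them. This is a sketch under stated assumptions, not how any specific search system works: NOT is treated as a binary "and not" operator (as in the query above), AND and NOT bind tighter than OR, operators must be written in upper case, and the four sample documents are made up.

```python
# A minimal sketch of boolean search over an inverted index.
# Assumptions: NOT is binary ("A NOT B" = A and not B); AND/NOT bind tighter
# than OR; operators are upper case; the sample documents are hypothetical.
import re
from typing import Dict, List, Set

Index = Dict[str, Set[int]]


def tokenize(query: str) -> List[str]:
    # Parentheses become their own tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)


class BooleanQuery:
    def __init__(self, query: str, index: Index):
        self.tokens = tokenize(query)
        self.pos = 0
        self.index = index

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def take(self) -> str:
        token = self.peek()
        self.pos += 1
        return token

    def evaluate(self) -> Set[int]:
        result = self.or_expr()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.peek()!r}")
        return result

    def or_expr(self) -> Set[int]:
        result = self.and_expr()
        while self.peek() == "OR":
            self.take()
            result = result | self.and_expr()  # OR: union of document sets
        return result

    def and_expr(self) -> Set[int]:
        result = self.term()
        while self.peek() in ("AND", "NOT"):
            op = self.take()
            right = self.term()
            # AND: intersection; NOT: set difference
            result = result & right if op == "AND" else result - right
        return result

    def term(self) -> Set[int]:
        token = self.take()
        if token == "(":
            result = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if token in ("", ")", "AND", "OR", "NOT"):
            raise ValueError(f"expected a keyword, got {token!r}")
        return set(self.index.get(token.lower(), set()))


docs = {
    1: "the computer lost its composure",
    2: "a new computer for the composer",
    3: "an incompetent computer technician",
    4: "computer networks",
}
index: Index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

print(sorted(BooleanQuery("((computer NOT composure) NOT incompetent)", index).evaluate()))
# [2, 4]
```

All four documents contain "computer", so a plain keyword search returns every one of them; the two NOT clauses remove document 1 ("composure") and document 3 ("incompetent").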
The search engine takes the formal query produced by the search language, applies it to the index, and returns the matching records to the user. In other words, the objective of the search engine is to apply the query to the index (database) in order to find matches between the attribute values of the records stored in the database and the attribute values specified by the search query. If a match is found, that record is retrieved. The set of retrieved records, called the result set, is then returned to the user.
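Viewed as a database operation, the engine compares attribute values in each stored record against those in the query and collects the matches into a result set. The sketch below assumes (hypothetically) that each record is a dictionary of attribute names to values and that a record matches only when every queried attribute has exactly the requested value; the attribute names and records are invented for illustration.

```python
# A minimal sketch of a search engine matching attribute values.
# Assumptions (hypothetical): records are dicts of attribute -> value, and a
# record matches when every attribute in the query has exactly that value.
from typing import Dict, List

Record = Dict[str, str]


def apply_query(query: Dict[str, str], database: List[Record]) -> List[Record]:
    """Return the result set: every record whose attributes match the query."""
    result_set = []
    for record in database:
        if all(record.get(attr) == value for attr, value in query.items()):
            result_set.append(record)  # a match is found, so retrieve the record
    return result_set


database = [
    {"title": "Intro to Computers", "type": "text/html", "host": "cs.example.edu"},
    {"title": "Composer Biographies", "type": "text/html", "host": "music.example.org"},
    {"title": "Computer Lab Photo", "type": "image/gif", "host": "cs.example.edu"},
]

for record in apply_query({"host": "cs.example.edu", "type": "text/html"}, database):
    print(record["title"])
# Intro to Computers
```

Exact equality is the simplest matching rule; an engine serving keyword or boolean queries would instead test whether a record's text contains the query terms.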
WAIS (Wide Area Information Server) is the most popular indexing and searching method currently used on the Web. WAIS is a client-server Internet service that was developed independently of the Web. It offers a sophisticated search language and search mechanism, using keyword search with optional boolean operators. WAIS provides most of the components of a search system (an indexer, a searcher, and retrieval), though it has no gatherer. Combined with a robot, which fills the role of the gatherer, WAIS makes up a complete search system. Traditional databases are also very popular tools for indexing and searching Web resources.
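How a robot completes an indexer, searcher, and retrieval stage can be illustrated with a toy pipeline. This is not WAIS or its protocol; it is a hypothetical sketch in which the "Web" is an in-memory dictionary of URL to (text, outgoing links) so that it runs offline, and the URLs and page texts are made up. The searcher here does whole-word lookups in the index rather than substring matching.

```python
# A minimal sketch of a complete search system: gatherer (robot), indexer,
# searcher, and retrieval. Not WAIS. Assumption: the "Web" is an in-memory
# dict of URL -> (text, outgoing links); URLs and texts are hypothetical.
from collections import deque
from typing import Dict, List, Set, Tuple

Web = Dict[str, Tuple[str, List[str]]]


def gather(web: Web, start: str) -> Dict[str, str]:
    """Robot: follow links breadth-first from `start`, collecting page text."""
    pages: Dict[str, str] = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in pages or url not in web:
            continue  # skip pages already visited or not present
        text, links = web[url]
        pages[url] = text
        queue.extend(links)
    return pages


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Indexer: map each lowercase word to the URLs of pages containing it."""
    index: Dict[str, Set[str]] = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index: Dict[str, Set[str]], keyword: str) -> Set[str]:
    """Searcher: find URLs whose pages contain `keyword` as a whole word."""
    return index.get(keyword.lower(), set())


def retrieve(pages: Dict[str, str], urls: Set[str]) -> List[Tuple[str, str]]:
    """Retrieval: return (URL, text) pairs for the matching pages."""
    return [(url, pages[url]) for url in sorted(urls)]


web: Web = {
    "http://a.example/": ("home page about computer science", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("computer networks course", ["http://a.example/"]),
    "http://c.example/": ("music composer archive", []),
    "http://d.example/": ("unlinked computer page", []),
}

pages = gather(web, "http://a.example/")
index = build_index(pages)
for url, text in retrieve(pages, search(index, "computer")):
    print(url, "-", text)
# http://a.example/ - home page about computer science
# http://b.example/ - computer networks course
```

The page at `http://d.example/` mentions "computer" but is never returned, because no gathered page links to it: without a gatherer that reaches a document, the other components cannot find it.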
Copyright © 1996 Aixiang (I Song) Yao, All Rights Reserved
Aixiang (I Song) Yao<ayao@csgrad.cs.vt.edu>
Last modified: November 21, 1996