NPAC Technical Report SCCS-752

WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data

Gang Cheng, Piotr Sokolowski, Marek Podgorny, Geoffrey Fox

Submitted January 26, 1996


Abstract

We describe our experience in developing Web search systems using Oracle's SQL*TextRetrieval. In the prototype system, we store on-line books in HTML format together with the HTML documents of a web site. SQL*TextRetrieval indexes the full text and other structured data in this 'web space' and provides an efficient engine for free-text search. Putting a hypertext-based text retrieval system on the Web enables global access to, and maximum sharing of, the information it holds. Using Oracle's free-text retrieval technology, we implement various search options, including basic word stemming, phrase, fuzzy, and soundex searching, as well as more advanced proximity and concept searches. For the concept search option, we have integrated a public domain "Roget Thesaurus" into our text search system to support synonym expansion. We also describe an advanced search mechanism that recursively refines the search domain via the Web. The prototype system can be found at URL http://kayak.npac.syr.edu:1963/search/index.html. A full production system will be implemented on a multiprocessor parallel machine running parallel Oracle 7 with the Parallel Server and Parallel Query options.
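The two search ideas that go beyond plain keyword matching, thesaurus-based concept search and recursive refinement of the search domain, can be illustrated with a small in-memory sketch. This is not SQL*TextRetrieval's API, and it is not how the authors implemented their system. It assumes a simple inverted index, a hypothetical toy thesaurus standing in for the Roget Thesaurus, no stemming, and a hypothetical `concept_search` helper. Each query term matches a document if the term or any of its synonyms appears (OR within a term, AND across terms). Passing an earlier result set as `domain` restricts the new search to those documents, which is one way to realize "search within results".

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus; the real system used the public domain
# Roget Thesaurus, whose entries and format are not reproduced here.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}


def tokenize(text):
    """Split text into lowercase word tokens (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def expand(term):
    """Concept search: return the term followed by its thesaurus synonyms."""
    term = term.lower()
    return [term] + THESAURUS.get(term, [])


def concept_search(index, terms, domain=None):
    """AND across query terms, OR across each term's synonym expansion.

    If `domain` (a set of doc ids from an earlier search) is given, the
    result is intersected with it, so repeated calls refine the domain.
    """
    result = None
    for term in terms:
        hits = set()
        for word in expand(term):
            # .get() avoids inserting empty entries into the defaultdict
            hits |= index.get(word, set())
        result = hits if result is None else result & hits
    if result is None:  # no query terms given
        result = set()
    return result if domain is None else result & domain


if __name__ == "__main__":
    docs = {
        1: "A fast automobile on the highway",
        2: "The quick vehicle was sold",
        3: "A car parked downtown",
        4: "Rapid transit systems",
    }
    index = build_index(docs)
    first = concept_search(index, ["car"])
    print(sorted(first))  # [1, 2, 3]: car, automobile, vehicle
    refined = concept_search(index, ["fast"], domain=first)
    print(sorted(refined))  # [1, 2]: fast/quick matches inside the first result
```

Keeping the previous result set as the refinement domain means each step only narrows what the user has already seen. A database-backed system would more likely keep that set server-side, for example in a table of hit document ids, rather than in memory.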

