NPAC Technical Report SCCS-752
WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data
Gang Cheng, Piotr Sokolowski, Marek Podgorny, Geoffrey Fox
Submitted January 26, 1996
Abstract
We describe our experience in developing Web Search Systems using
Oracle's SQL*TextRetrieval. In the prototype system we store on-line
books in HTML format and the HTML documents of a web site.
SQL*TextRetrieval is used to index full text and other structured
data in the 'web space' and to provide an efficient
search engine for free-text search. The Web enables global
access to and maximum information sharing with a
hypertext-based text retrieval system. Using Oracle's
Free Text Retrieval technology, we implement various search options,
including basic word stemming, phrase, fuzzy, and
soundex searching, as well as more advanced proximity search
and concept search. For the concept search option, we have
integrated a public domain "Roget Thesaurus" into our text
search system to support synonym expansions. An advanced
search mechanism that recursively refines the search domain via the Web
is also described. The prototype system can be found at URL
http://kayak.npac.syr.edu:1963/search/index.html.
A full production system will be implemented on a multiprocessor
parallel machine where parallel Oracle 7 with the parallel server and
parallel query options is used.
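
To make the concept search and recursive refinement ideas concrete, the following is a minimal Python sketch. It does not use SQL*TextRetrieval or the Roget Thesaurus; the tiny thesaurus, the in-memory document set, and the function names (expand, concept_search) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy thesaurus; stands in for a real synonym source.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "fast": {"quick", "rapid"},
}

# Hypothetical in-memory collection; stands in for the indexed documents.
DOCS = {
    1: "A quick automobile on the road",
    2: "The car was parked",
    3: "Rapid growth of the web",
}


def tokenize(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(term):
    """Return the term together with its synonyms from the toy thesaurus."""
    term = term.lower()
    return {term} | THESAURUS.get(term, set())


def concept_search(term, domain=None):
    """Return sorted ids of documents that contain the term or a synonym.

    If `domain` is given, only those document ids are searched, which
    lets a later query refine the hits of an earlier one.
    """
    candidates = DOCS.keys() if domain is None else domain
    wanted = expand(term)
    return sorted(d for d in candidates if tokenize(DOCS[d]) & wanted)


first = concept_search("car")            # search the whole collection
refined = concept_search("fast", first)  # search only within the first hits
```

Here the second call searches only the documents returned by the first, which is one simple way to narrow a search domain step by step.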