Global HTML version of Foils prepared February 11, 1996
Foil 50 The Indexing Subsystem
From IBM Tutorial on Web Technology for HPCC, IBM Poughkeepsie -- February 7, 1996, by Geoffrey Fox
How the text of web documents and files is stored and indexed internally in the text database to support efficient and effective searching
Common approach - 'inverted index', which maps each term to the list of documents containing it
Major issues - direct impact on database size and search performance
compression scheme for storing the text and its indexes - minimize space consumption
index scheme, tightly coupled with the search engine - speed up search
indexing modes - real-time, batch, or incremental indexing
high-performance web robot - minimize impact on network traffic and database loading
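The 'inverted index' approach named on this foil can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, integer document IDs, and Boolean AND queries; the function names `build_inverted_index`, `intersect`, and `search_all` are hypothetical. Keeping each posting list sorted lets the search step merge lists in linear time, one simple way an index scheme can be "tightly coupled" with the search engine.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to a sorted list of the doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumption: naive whitespace tokenizer
            postings[term].add(doc_id)
    # Sorted posting lists allow linear-time merging at query time.
    return {term: sorted(ids) for term, ids in postings.items()}

def intersect(a, b):
    """Merge two sorted posting lists, keeping only IDs present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search_all(index, query):
    """Boolean AND query: return doc IDs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the shortest posting list to keep intermediate results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result

docs = {1: "parallel web search", 2: "web robot", 3: "parallel computing"}
index = build_inverted_index(docs)
print(index["parallel"])                  # [1, 3]
print(search_all(index, "parallel web"))  # [1]
```

A query never scans document text; it only touches the posting lists of its own terms, which is why this layout dominates full-text search.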
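The foil does not say which compression scheme is used. One widely used technique for index postings, shown here as an assumed example rather than the system's actual method, stores the differences between successive sorted document IDs (d-gaps), which are usually small, and then packs each gap into as few bytes as possible with variable-byte coding: 7 data bits per byte, with the high bit marking a number's last byte. All function names are hypothetical.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints, most significant 7-bit group first."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        groups.reverse()        # emit the most significant group first
        groups[-1] |= 0x80      # high bit flags the final byte of this number
        out.extend(groups)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:         # end of the current number
            numbers.append(n)
            n = 0
    return numbers

def compress_postings(doc_ids):
    """Encode a sorted posting list as variable-byte d-gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    """Rebuild doc IDs by summing the decoded gaps."""
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        ids.append(total)
    return ids

postings = [3, 7, 130, 1000]
packed = compress_postings(postings)
print(len(packed))                  # 5
print(decompress_postings(packed))  # [3, 7, 130, 1000]
```

Here the gaps are 3, 4, 123 and 870, so the list fits in 5 bytes instead of 16 as fixed 32-bit integers. Decoding is a cheap sequential scan, which suits the linear merge used at query time.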
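Of the indexing modes listed, incremental indexing is the least obvious. A common approach, assumed here for illustration, gives each new document a larger ID than any before it. Its postings can then be appended to the end of each list without re-sorting, and deletions are recorded as tombstones until a later batch rebuild purges them. The `IncrementalIndex` class is hypothetical. A batch indexer would instead rebuild the whole index from scratch; a real-time indexer would call `add` as each document arrives.

```python
from collections import defaultdict

class IncrementalIndex:
    """Hypothetical incremental indexer: IDs increase monotonically, so
    appending a new document's postings keeps every list sorted."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.next_id = 0
        self.deleted = set()  # tombstones, to be purged by a periodic batch rebuild

    def add(self, text):
        doc_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):  # set(): one posting per term per doc
            # doc_id is the largest ID so far, so the list remains sorted.
            self.postings[term].append(doc_id)
        return doc_id

    def delete(self, doc_id):
        self.deleted.add(doc_id)

    def lookup(self, term):
        """Return live doc IDs for a term, skipping deleted documents."""
        return [d for d in self.postings.get(term.lower(), []) if d not in self.deleted]

idx = IncrementalIndex()
a = idx.add("web search")
b = idx.add("web robot")
print(idx.lookup("web"))  # [0, 1]
idx.delete(a)
print(idx.lookup("web"))  # [1]
```

The trade-off is that tombstones and unmerged updates slowly degrade space and query speed, which is why systems often combine incremental updates with occasional batch reindexing.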
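The foil asks for a web robot that minimizes its impact on network traffic but does not say how. One standard technique, assumed here, is a "polite" crawl frontier that enforces a minimum delay between successive requests to the same host, so that load is spread across servers. The sketch below uses a simulated clock (`now`) in place of real fetching; the `PoliteFrontier` class and its `min_delay` parameter are hypothetical.

```python
import heapq
from collections import deque
from urllib.parse import urlparse

class PoliteFrontier:
    """Hypothetical crawl frontier enforcing a per-host minimum delay."""

    def __init__(self, min_delay=5.0):
        self.min_delay = min_delay
        self.queues = {}      # host -> deque of pending URLs
        self.ready = []       # heap of (earliest allowed fetch time, host)
        self.last_fetch = {}  # host -> time of the most recent fetch

    def add(self, url):
        host = urlparse(url).netloc
        if host not in self.queues:
            self.queues[host] = deque()
            # A host fetched before must still respect its delay.
            earliest = self.last_fetch.get(host, float("-inf")) + self.min_delay
            heapq.heappush(self.ready, (earliest, host))
        self.queues[host].append(url)

    def next_url(self, now):
        """Return a URL whose host may be contacted at time `now`, else None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.last_fetch[host] = now
        if self.queues[host]:
            heapq.heappush(self.ready, (now + self.min_delay, host))
        else:
            del self.queues[host]
        return url

f = PoliteFrontier(min_delay=5.0)
for u in ["http://a.org/1", "http://a.org/2", "http://b.org/1"]:
    f.add(u)
print(f.next_url(0.0))  # http://a.org/1
print(f.next_url(0.0))  # http://b.org/1
print(f.next_url(0.0))  # None (a.org must wait until t=5.0)
print(f.next_url(5.0))  # http://a.org/2
```

A production robot would also honor each site's robots.txt rules; this sketch covers only request pacing.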
Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.