Global HTML version of Foils prepared February 11, 1996

Foil 50 The Indexing Subsystem

From IBM Tutorial on Web Technology for HPCC, IBM Poughkeepsie, February 7, 1996, by Geoffrey Fox

How the text of web documents/files is stored and indexed internally in the text database to support efficient and effective searching
Common approach - 'inverted index': a map from each term to the list of documents (and optionally positions) that contain it
Major issues - each has a direct impact on database size and search performance
  • compression scheme for storing the text and its indexes - minimize space consumption
  • index scheme, tightly coupled with the search engine - speed up search
  • indexing modes - real-time, batch, or incremental indexing
  • high-performance web robot - minimize impact on network traffic and database loading
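A minimal sketch of the inverted-index approach, assuming documents arrive as a dict from integer ID to plain text and that a term is any lowercase alphanumeric run. The helper names (`tokenize`, `build_inverted_index`, `intersect`, `search_and`) are hypothetical, not taken from the system the tutorial describes. Keeping every posting list sorted is what lets the search engine answer an AND query with a linear merge, one illustration of why the index scheme and the search engine are designed together.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted list of the IDs of documents containing it.

    docs: dict from integer document ID to document text (assumed layout).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Linear merge of two sorted posting lists, keeping IDs present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Return the sorted IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the shortest lists first so intermediate results stay small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


docs = {
    1: "Web robots crawl web pages",
    2: "An inverted index speeds up search",
    3: "Search engines index pages crawled by robots",
}
index = build_inverted_index(docs)
print(search_and(index, "index search"))  # [2, 3]
print(search_and(index, "robots pages"))  # [1, 3]
```

A query never scans document text: it touches only the posting lists of its own terms, which is where the search speedup comes from.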
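For the compression issue, one widely used technique (not necessarily the one used in the system described here) stores each sorted posting list as gaps between successive document IDs and then variable-byte encodes the gaps, so that small gaps take a single byte. The sketch assumes non-negative integer IDs in increasing order; the function names are hypothetical.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers.

    Each number is split into 7-bit groups, most significant first; the high
    bit is set only on the final byte of each number to mark its end.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte encoding needs non-negative integers")
        groups = [n & 0x7F]
        n >>= 7
        while n:
            groups.append(n & 0x7F)
            n >>= 7
        groups.reverse()      # most significant group first
        groups[-1] |= 0x80    # flag the terminating byte
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode."""
    numbers = []
    n = 0
    for byte in data:
        if byte & 0x80:  # terminating byte of the current number
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_postings(doc_ids):
    """Gap-encode a sorted posting list, then variable-byte encode the gaps."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Decode the gaps and take running sums to recover the document IDs."""
    doc_ids = []
    total = 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

For example, the posting list [3, 7, 150, 1000] becomes the gaps [3, 4, 143, 850] and packs into 6 bytes, compared with 16 bytes for four 32-bit integers. Frequent terms have dense posting lists and therefore small gaps, so they compress best.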
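The indexing modes differ mainly in when new documents become searchable. A minimal sketch of incremental indexing, assuming a small in-memory "delta" index that queries consult alongside the main index and that is folded into the main index once it holds `merge_threshold` documents. `IncrementalIndex` and its parameters are hypothetical, and a plain dict stands in for the on-disk main index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalIndex:
    """Hypothetical main index plus in-memory delta, merged in periodically."""

    def __init__(self, merge_threshold=100):
        self.main = {}                 # term -> sorted doc IDs (stands in for the on-disk index)
        self.delta = defaultdict(set)  # term -> doc IDs added since the last merge
        self.pending = 0               # documents waiting in the delta
        self.merge_threshold = merge_threshold

    def add(self, doc_id, text):
        """Index a new document in the delta; merge once enough have accumulated."""
        for term in tokenize(text):
            self.delta[term].add(doc_id)
        self.pending += 1
        if self.pending >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Fold the delta into the main index and start a fresh delta."""
        for term, ids in self.delta.items():
            self.main[term] = sorted(set(self.main.get(term, [])) | ids)
        self.delta.clear()
        self.pending = 0

    def lookup(self, term):
        """Search both indexes, so new documents are visible before the merge."""
        term = term.lower()
        merged = set(self.main.get(term, [])) | self.delta.get(term, set())
        return sorted(merged)
```

With `merge_threshold=2`, adding document 1 ("web search") leaves the main index empty, yet `lookup("search")` already returns [1] from the delta; adding document 2 ("search engine") triggers a merge, after which the main index holds [1, 2] for "search". A threshold of 1 approximates real-time indexing, while building the whole index in one pass over a fixed collection corresponds to batch mode.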
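One common way for a web robot to limit its impact on network traffic is a politeness rule: contact any single host at most once every `min_delay` seconds, and never fetch the same URL twice. The sketch below is an assumed design, not the robot used in the original system; `PoliteFrontier` and its methods are hypothetical, and the current time is passed in explicitly so the behavior is deterministic (a real robot would read a clock such as `time.monotonic()`).

```python
import heapq
from collections import deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """Hypothetical URL frontier enforcing a per-host delay between fetches.

    Invariant: a host appears in the heap exactly when its queue is non-empty.
    """

    def __init__(self, min_delay=1.0):
        self.min_delay = min_delay
        self.queues = {}   # host -> deque of URLs waiting to be fetched
        self.next_ok = {}  # host -> earliest time it may be contacted again
        self.heap = []     # (earliest allowed fetch time, host)
        self.seen = set()  # URLs already queued, to avoid refetching

    def add(self, url):
        """Queue a URL unless it has been seen before."""
        if url in self.seen:
            return
        self.seen.add(url)
        host = urlsplit(url).netloc
        queue = self.queues.setdefault(host, deque())
        if not queue:
            # The host becomes active; schedule it no earlier than allowed.
            heapq.heappush(self.heap, (self.next_ok.get(host, 0.0), host))
        queue.append(url)

    def next_url(self, now):
        """Return a URL that may be fetched at time `now`, or None if every
        queued host is still inside its politeness delay."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, host = heapq.heappop(self.heap)
        url = self.queues[host].popleft()
        self.next_ok[host] = now + self.min_delay
        if self.queues[host]:
            heapq.heappush(self.heap, (self.next_ok[host], host))
        return url
```

With `min_delay=2.0` and two URLs queued for `a.example` plus one for `b.example`, the robot can fetch one page from each host at time 0, must wait at time 1, and fetches the second `a.example` page at time 2. Spreading requests across hosts this way keeps throughput up without overloading any single server.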


Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Tue Feb 18 1997