HTML version of Scripted Foils prepared 5 July 97

Foil 30 Web Search Indexes

From Overview of Basic Web and Internet Technologies, Beijing Web Tutorial, May 27-30 1997, by Geoffrey Fox

1 Indexing: the information gathered by the robots is organized into an indexing database at the search server.
  • Keyword indexing is currently the primary method; full-text searching is mostly limited to single-site search engines.
  • A key issue is the size of the resulting database.
2 Searching: the indexing database allows (keyword) searches by the user.
  • Queries are formed, and some number of the most highly ranked results are returned.
3 User Interface
  • A uniform interface covers HTTP, FTP, Gopher, WAIS, Harvest, and Lycos.
4 Challenge of WWW search:
  • Size - the estimated total is 30 gigabytes in 5 million documents; many search engines now take months to crawl the web and update their index files.
  • Diversity - a huge distributed database of unstructured, non-relational, hierarchical information in many formats.
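
The indexing and searching steps above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index and ranks results by raw keyword occurrence counts; the foil does not say how any particular search engine ranks results, so that scoring rule, the function names, and the example.org pages are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to {doc_id: occurrence count} (an inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def search(index, query, limit=10):
    """Return up to `limit` document ids, most keyword occurrences first.

    Assumption: the score is the summed occurrence count of the query
    keywords; ties are broken by document id so the order is stable.
    """
    scores = defaultdict(int)
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]


# Hypothetical pages standing in for the information gathered by robots.
pages = {
    "http://example.org/a": "Web search engines index the web",
    "http://example.org/b": "Gopher and FTP archives",
    "http://example.org/c": "A web robot crawls pages",
}
index = build_index(pages)
print(search(index, "web search"))
```

The index stores only keywords and counts, not the document text, which is one way to keep the database small relative to the collection. For scale, the estimate on this foil of 30 gigabytes across 5 million documents works out to roughly 6 kilobytes per document on average.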

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

Page produced by wwwfoil on Thu Aug 21 1997