
Foil 19: Web Search Engines

From Introduction to World Wide Web (WWW), ECS400 Senior Undergraduate Course -- Spring Semester 1997, by Nancy J. McCracken

Search engines enable users to look up text documents stored on the Web, usually by one or more keywords appearing in the document.
Information gathering and filtering
  • This is done by web "robots": programs that automatically connect to web servers and retrieve some number of documents, usually following links only down to a fixed "depth", such as 4 (a sketch of such a robot appears after this list).
  • For each document, the robot returns keywords and other information to the search index. For example, Lycos returns: the title, any headings and subheadings, the 100 most "weighty" words, the first 20 lines, the size in bytes, and the number of words (see the summary sketch after this list).
  • Problems with information gathering:
    • Information update: keeping the index current as documents change, move, or disappear.
    • Information resulting from CGI scripts is not available.
    • Resource intensive: robots repeatedly connect to a site, so informal protocols, such as the robots exclusion convention (robots.txt), try to prevent "rapid fire" requests or a "robot attack".
    • Preventing robot loops when links are circular.
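
The depth, rapid-fire, and loop points above can be made concrete. The following is a minimal sketch of such a robot in Python (not part of the original foil): it follows links only down to a fixed depth (4, as above), remembers visited URLs so circular links cannot trap it, and pauses between requests as a crude stand-in for the informal rapid-fire protections. The function names, one-second delay, and timeout are illustrative assumptions.

    import time
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        """Collect href targets from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    def crawl(start_url, max_depth=4, delay=1.0):
        visited = set()                   # prevents loops on circular links
        frontier = [(start_url, 0)]
        while frontier:
            url, depth = frontier.pop()
            if url in visited or depth > max_depth:
                continue
            visited.add(url)
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    html = resp.read().decode("utf-8", errors="replace")
            except (OSError, ValueError):
                continue                  # unreachable or non-HTTP link
            yield url, html               # hand the page to the indexer
            parser = LinkExtractor()
            parser.feed(html)
            for href in parser.links:
                frontier.append((urljoin(url, href), depth + 1))
            time.sleep(delay)             # crude politeness between requests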
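
The per-document information listed above for Lycos can be sketched the same way. The field list (title, headings, the 100 "weighty" words, the first 20 lines, size in bytes, word count) comes from this foil; the foil does not give Lycos's weighting formula, so plain term frequency stands in for it below, and the function name is an illustrative assumption.

    import re
    from collections import Counter

    def summarize_document(html):
        """Extract the index fields named on the foil from raw HTML."""
        title = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
        headings = re.findall(r"<h[1-6][^>]*>(.*?)</h[1-6]>", html, re.I | re.S)

        text = re.sub(r"<[^>]+>", " ", html)    # strip tags for word statistics
        words = re.findall(r"[a-z']+", text.lower())

        return {
            "title": title.group(1).strip() if title else "",
            "headings": [h.strip() for h in headings],
            # Stand-in for the "100 most weighty words": raw term frequency.
            "weighty_words": [w for w, _ in Counter(words).most_common(100)],
            "first_20_lines": text.splitlines()[:20],
            "size_in_bytes": len(html.encode("utf-8")),
            "word_count": len(words),
        }

In a full robot, each page yielded by crawl would be passed through summarize_document and the resulting record returned to the search index.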



© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
