Search Engines enable users to look up text documents stored on the Web, usually by one or more keywords appearing in the document.
Information gathering and filtering
  • This is done by web "robots" - programs which automatically connect to Web servers and retrieve some number of documents - usually following links down to a certain "depth", such as 4. (A minimal crawler along these lines is sketched after this list.)
  • For each document, the robot returns keywords and other information to the search index. For example, Lycos returns: the title, any headings and subheadings, the 100 most "weighty" words, the first 20 lines, the size in bytes, and the number of words. (A sketch of such a summary also follows the list.)
  • Problems with information gathering:
    • Information update: the index goes stale as documents change or disappear between robot visits.
    • Information generated by CGI scripts is not available, since the robot only sees pages it can reach through static links.
    • Resource intensive: robots repeatedly connect to a site; informal protocols try to prevent "rapid fire" or "robot attacks" (see the politeness sketch below).
    • Preventing robot loops when links are circular - typically by remembering which URLs have already been visited, as in the crawler sketch below.
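
To make the crawling idea concrete, here is a minimal sketch of a depth-limited robot in Python - a modern illustration, not the code of any actual search engine. The start URL, the depth limit of 4, and all function and class names are assumptions for the example; the "visited" set is what prevents loops on circular links.

    # A minimal sketch of a depth-limited web robot (standard library only).
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects href targets from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_depth=4):
        """Breadth-first crawl up to max_depth links away from start_url.

        The 'visited' set stops the robot from looping when links are
        circular (page A links to B, which links back to A).
        """
        visited = set()
        queue = deque([(start_url, 0)])
        while queue:
            url, depth = queue.popleft()
            if url in visited or depth > max_depth:
                continue
            visited.add(url)
            try:
                page = urlopen(url).read().decode("utf-8", errors="replace")
            except (OSError, ValueError):
                continue  # unreachable server or unusable URL: skip it
            parser = LinkExtractor()
            parser.feed(page)
            for link in parser.links:
                queue.append((urljoin(url, link), depth + 1))
        return visited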
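The per-document summary a Lycos-style robot returns could look like the following sketch. Plain-text input is assumed (title and heading extraction would need an HTML parse), and the raw frequency count used here as the "weight" is only a stand-in for whatever weighting function Lycos actually applied.

    # Fields the foil lists for Lycos: the 100 "weightiest" words, the
    # first 20 lines, the size in bytes, and the total number of words.
    from collections import Counter

    def summarize(text, n_words=100, n_lines=20):
        words = text.lower().split()
        return {
            "size_bytes": len(text.encode("utf-8")),
            "word_count": len(words),
            "weighty_words": [w for w, _ in Counter(words).most_common(n_words)],
            "first_lines": text.splitlines()[:n_lines],
        }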
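One informal convention of the kind mentioned above is robots.txt (the robots exclusion standard); another is a self-imposed delay between requests to the same host. The sketch below assumes a one-second minimum delay and a hypothetical agent name; it uses Python's standard robotparser module.

    # Politeness check: honor robots.txt and avoid "rapid fire" requests.
    import time
    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    _last_hit = {}  # host -> time of our previous request to that host

    def polite_fetch_allowed(url, agent="ExampleRobot", min_delay=1.0):
        """Return True once it is polite to fetch url: robots.txt permits
        it, and at least min_delay seconds separate requests to one host."""
        host = urlparse(url).netloc
        rp = RobotFileParser("http://%s/robots.txt" % host)
        try:
            rp.read()
        except OSError:
            rp.allow_all = True   # robots.txt unreachable: assume permissive
        if not rp.can_fetch(agent, url):
            return False          # the site has asked robots to stay away
        wait = min_delay - (time.time() - _last_hit.get(host, 0.0))
        if wait > 0:
            time.sleep(wait)      # wait out the remainder of the delay
        _last_hit[host] = time.time()
        return True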
