Indexing: the information gathered by the robots is organized into an indexing database at the search server.
- Keyword indexing is currently the primary method; full-text searching is mostly limited to single-site search engines.
- Key issue is size of resulting database.
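As a rough illustration of keyword indexing, the sketch below builds a small inverted index that maps each keyword to the pages containing it. The page URLs, the sample text, and the helper names `tokenize` and `build_index` are hypothetical and chosen only for this example; real search servers use far more compact on-disk structures.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

# Hypothetical pages standing in for what a robot might have gathered.
pages = {
    "http://example.org/a": "Gopher and WAIS servers",
    "http://example.org/b": "WAIS full text search",
}
index = build_index(pages)

# Number of (keyword, page) entries: a crude measure of index size.
postings = sum(len(urls) for urls in index.values())
print(sorted(index["wais"]), postings)
```

Even in this toy case the index holds one entry per distinct keyword per page, which suggests why database size becomes the key issue as the number of documents grows.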
|
Searching: the indexing database allows (keyword) searches by the user.
- Queries are formed, and some number of the most highly ranked results are returned.
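A minimal sketch of a ranked keyword search follows. The notes do not say how results are ranked, so the scoring here (counting how often the query keywords occur in each page) is an assumption made purely for illustration, as are the sample pages and the names `tokenize` and `search`.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(pages, query, top_k=2):
    """Return up to top_k (url, score) pairs, highest score first."""
    terms = tokenize(query)
    scores = {}
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        # Assumed ranking: total occurrences of the query keywords.
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[url] = score
    # Sort by descending score, breaking ties by URL for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]

# Hypothetical pages used only for this example.
pages = {
    "http://example.org/a": "web search engines index the web",
    "http://example.org/b": "gopher menus",
    "http://example.org/c": "search the web",
}
results = search(pages, "web search")
print(results)
```

Pages with no matching keyword are dropped, and `top_k` caps how many of the highest-ranked results are returned to the user.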
|
User Interface
- uniform interface for HTTP, FTP, GOPHER, WAIS, Harvest, Lycos
|
Challenge of WWW search:
- size - estimated total is 30 gigabytes across 5 million documents; many search engines now take months to crawl the web and update their index files.
- diversity - huge distributed database, unstructured, non-relational, hierarchical information with many formats.
|