Indexing: the information gathered by the robots is organized into an indexing database at the search server.
-
Keyword indexing is primarily used at present; full-text searching is found only on some single-site search engines.
-
The key issue is the size of the resulting database.
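A keyword index of this kind is commonly built as an inverted index, which maps each keyword to the documents that contain it. The following is only a minimal sketch under that assumption, not a description of any particular search engine; the URLs, the page text, and the function names `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to the set of document URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Hypothetical pages, standing in for what a robot might have gathered.
documents = {
    "http://example.org/a": "Gopher and WAIS servers",
    "http://example.org/b": "Searching WAIS databases",
}

index = build_index(documents)
print(sorted(index["wais"]))
```

Because only keywords and document references are stored, rather than the full text, the index stays much smaller than the collection itself, which bears directly on the database size issue.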
|
Searching: the indexing database allows (keyword) searches by the user.
-
Queries are formed, and some number of the most highly ranked results are returned.
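A keyword query against such a database might be sketched as follows. The ranking used here, a plain count of how often the query keywords occur in each document, is an assumption chosen for brevity; real engines use more elaborate scoring. The documents and the names `build_index`, `search`, and `top_k` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to a per-document count of its occurrences."""
    index = defaultdict(Counter)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(query, index, top_k=10):
    """Return up to top_k URLs, highest keyword count first."""
    scores = Counter()
    for term in tokenize(query):
        # .get avoids creating empty entries for unknown keywords.
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Sort by descending score; ties are broken by URL for a stable order.
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return ranked[:top_k]


# Hypothetical pages used only for illustration.
documents = {
    "http://example.org/a": "WAIS servers and WAIS clients",
    "http://example.org/b": "Searching WAIS databases",
    "http://example.org/c": "Gopher menus",
}

index = build_index(documents)
results = search("wais servers", index, top_k=2)
print(results)
```

The `top_k` cutoff corresponds to returning only "some number" of the highest-ranked results rather than every match.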
|
User Interface
-
A uniform interface for HTTP, FTP, Gopher, WAIS, Harvest, and Lycos.
|
Challenge of WWW search:
-
size - the estimated total is 30 gigabytes in 5 million documents (many search engines now take months to crawl the web and update their index files).
-
diversity - a huge distributed database of unstructured, non-relational, hierarchical information in many formats.
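The size estimate can be turned into rough per-document figures with simple arithmetic. This is a back-of-the-envelope sketch only: it assumes decimal gigabytes, and the 90-day crawl is a hypothetical reading of "months", not a figure from these notes.

```python
# Figures from the estimate above.
total_bytes = 30 * 10**9   # 30 gigabytes, assuming decimal units
document_count = 5 * 10**6  # 5 million documents

# Average document size implied by the estimate.
avg_bytes = total_bytes / document_count  # 6000.0 bytes, about 6 KB

# Hypothetical assumption: one full crawl takes 90 days.
crawl_days = 90
seconds_per_crawl = crawl_days * 24 * 60 * 60

# Sustained fetch rate needed to cover every document once in that time.
docs_per_second = document_count / seconds_per_crawl

print(f"average document size: {avg_bytes:.0f} bytes")
print(f"required fetch rate: {docs_per_second:.2f} documents/second")
```

Under these assumptions a single crawler averages well under one document per second, which is consistent with index updates taking months.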
|