Gather WWW pages/files from remote web servers and filter them into an indexed text database
Use 'Web Robot' or 'Web Agent' technology - a class of programs that automatically traverse network hosts and bring back information via various network protocols (e.g. HTTP)
Major issues - direct impact on database size, search coverage and performance
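The gather-and-index step can be sketched as a small breadth-first web robot in Python. This is a minimal illustration under stated assumptions, not the method these notes describe. The names `crawl`, `build_index`, `http_fetch` and `_PageParser` are hypothetical. The "indexed text database" is modelled here as an in-memory inverted index (word -> set of URLs). The sketch also skips things a real robot needs, such as honouring robots.txt, rate limiting per host, and filtering out script/style text. The `max_pages` bound shows the trade-off in the last point: a larger crawl gives more search coverage, at the cost of database size and crawl time.

```python
import re
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urldefrag, urljoin


class _PageParser(HTMLParser):
    """Collects <a href> targets and all character data from one HTML page.
    (Hypothetical helper; script/style content is NOT filtered out here.)"""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def http_fetch(url: str) -> Optional[str]:
    """Fetch one page over HTTP(S); returns None for errors or non-HTML."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        return None


def crawl(seed: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first traversal from `seed`; returns {url: extracted text}.
    Assumption: no robots.txt check and no politeness delay."""
    seen: Set[str] = {seed}
    frontier = deque([seed])
    pages: Dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # unreachable or not HTML: skip it
            continue
        parser = _PageParser()
        parser.feed(html)
        parser.close()
        pages[url] = " ".join(parser.text)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: lowercase alphanumeric word -> set of URLs containing it."""
    index: Dict[str, Set[str]] = {}
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the network protocol. `http_fetch` handles HTTP. A dictionary's `.get` method can serve as an offline stand-in for testing. A search then reduces to looking up query words in the dictionary returned by `build_index`.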