HELP! * GREY=local Full HTML for

LOCAL foilset Web Search Presentation for HPDC95 Tutorial

Given by Geoffrey C. Fox at HPDC95 Pentagon City on August 1,1995. Foils prepared July 28,1995
Abstract * Foil Index for this file

See also color IMAGE
This was prepared for tutorial at HPDC-4 Conference
It starts with motivation and Identification of four components of a Web Search system -- Information Gathering and Filtering, Indexing, Searching and User Interface
Web Robots (gatherers) are reviewed followed by
Discussion in detail of 3 examples Lycos, FreeWAIS and Harvest -- the associated demonstrations also include Oracle Free text search
We end with discussion of future technologies including natural language frontends, distributed queries, metadata, caching and artificial intelligence

Table of Contents for full HTML of Web Search Presentation for HPDC95 Tutorial


1 Web Search
2 Abstract of Web Search Presentation
3 Motivations
4 Challenges
5 Major components in a web search system
6 Major components in a web search system (cont.)
7 Web Robots (also called Spiders, Web Worms or Web Wanderers)
8 Major uses of Web Robots
9 Implementation Issues of Web Rebots
10 Costs & Dangers of Using Web Robots
11 Costs & Dangers of Using Web Robots (conÕt)
12 Examples of Web Search Systems
13 Lycos
14 Total Volume and Data Capture in Lycos
15 Data Content in Lycos
16 FreeWAIS - Wide Area Information Server
17 Indexing in FreeWAIS
18 Search in FreeWAIS
19 Harvest System
20 Harvest Architecture
21 Harvest Overview
22 Harvest Gatherer
23 Customized Content Extraction (Essence)
24 Summary Object Interchange Format (SOIF)
25 Harvest Broker
26 Distributed Gatherer-Broker Arrangement
27 Index & Search in Harvest
28 Cache in Harvest
29 Perfomance of Harvest -- Gatherer
30 Performance of Search in Harvest -- Glimpse
31 Implementation of Harvest
Standalone

32 Implementations of Harvest
with Other Systems -- Continued

33 Future Technologies in Web Search
34 Future Technologies in
Web Search - NLP

35 Future Technologies in
Web Search - DQ

36 Future Technologies in Web Search - MDF
37 Future Technologies in
Web Search - AI

38 Future Technologies in
Web Search - CSCM

39 More about Web Search Systems and Web Robots-- Yahoo
40 More about Web Search Systems and Web Robots
41 Recent Capital Ventures in Web Search Business

This table of Contents Abstract



HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 1 Web Search

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Tutorial Presentation at HPDC95
Pentagon City
August 1,1995
Gang Cheng,Srinivas Polisetty
Presented by Geoffrey Fox
NPAC -- Syracuse University
111 College Place
Syracuse NY 13244-4100

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 2 Abstract of Web Search Presentation

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
This was prepared for tutorial at HPDC-4 Conference
It starts with motivation and Identification of four components of a Web Search system -- Information Gathering and Filtering, Indexing, Searching and User Interface
Web Robots (gatherers) are reviewed followed by
Discussion in detail of 3 examples Lycos, FreeWAIS and Harvest -- the associated demonstrations also include Oracle Free text search
We end with discussion of future technologies including natural language frontends, distributed queries, metadata, caching and artificial intelligence

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 3 Motivations

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Information discovery, locate relevant sources with reasonable efforts/time
Cache/Replicate information to alleviate regional network and server overhead
A unified Web search interface to many different information resources. e.g.: HTTP,FTP,Gopher,WAIS,Usenet newsgroup and various on-line databases

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 4 Challenges

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Data Volume
  • Estimated Web total size: 30 GB, 5 million documents, grows daily
  • require more sophisticated information search interfaces than browsing and organizing in hyperlinks
Data Diversity
  • Web - a huge distributed database, unstructured, non-relational, hierarchical (multimedia) information entities with various data formats: MIME -- html,plain text,PostScript, LaTex, images,audio/video clips, etc.
  • Web repositories are heterogeneous . inconsistent . incomplete
User Base
  • different requirements in query patterns, search topics and response time
  • rapid growth in number and search requests daily

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 5 Major components in a web search system

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
an information gathering and filtering subsystem
  • gather source data from remote/local web repositories to local indexing database. Source (searchable text based information on Internet): Web space, including information on servers of HTTP,FTP,Gopher,WAIS etc. Usenet newsgroup, various databases on-line, e.g. periodicals
  • information conversion and extraction
  • deal with update of source
  • usually implemented by a ÒWeb RobotÓ
an indexer
  • meta-data
  • determine the scope and accuracy of the search engine
  • size of the indexed database
  • common indexing schemes: object names, selected keywords, full text
  • pre-computed indexing, automatic indexers

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 6 Major components in a web search system (cont.)

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
a search engine
  • built on the indexed database
  • client-based v.s. server-based
  • common search capability: keyword-based search, attribute-based search, content-based search, concept-based search, NLP(Natural Language Processing)-based search, SQL queries (with RDBMS server) etc.
a search interface
  • HTML Form based
  • CGI gateway from HTTP server to search engine
  • user customizable (for forming query and query results)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 7 Web Robots (also called Spiders, Web Worms or Web Wanderers)

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Definition: web robots are a class of software programs that traverse network hosts gathering information from and about resources -- lists, information and collections
Problems Addressed
  • No pre-defined data set or fields for a document on the Web
  • Manual traversal of the Web is virtually impossible
  • Contents of the resources change as also the hyperlinks
  • Improperly maintained Web sites produce dead links
Limitation: Information generated Òon-the-flyÓ (e.g. by CGI scripts) cannot be retrieved by Web rebots

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 8 Major uses of Web Robots

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Resource Discovery - Summarize large parts of Web, and provide access to them
Statistical Analysis - Count number of Web servers, avg number of documents per server etc
Mirroring - Cache an entire Sub-space of a Web server, and allow load sharing, faster access etc
Maintenance - Assist authors in locating dead links, and help maintain content, structure etc of a document

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 9 Implementation Issues of Web Rebots

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Past Implementations
  • Browsing - A small list was maintained at CERN for browsing through all available servers
  • Listing - Lists of references to resources on the Web, such as Yahoo, HCC NetServices List, NCSA MetaIndex etc
  • Searching - Searchable databases like the GNA Meta Library
Present Implementations
  • Automatic Collection - Automatically retrieve, a fixed set of documents that the Robot has been programmed to parse regularly like the NCSA What new list
  • Automatic Discovery - Exploit automation, analyze and store the documents encountered
  • Automatic Indexing - Automatically index the set of documents cached
  • Web robot can be a standalone program plugged into a search system or a built-in gathering component in a indexing/search engine

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 10 Costs & Dangers of Using Web Robots

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Network Resource & Server Load
  • Robots require considerable bandwidth over the internet, disadvantaging corporate users
  • Heavy load on server, resulting in lower service rate and frequent server crashes
  • Remote parts of the network is also effected
Continuous requests, which the robots usually make (Òrapid firesÓ) are even more dangerous
Current HTTP protocol is inefficient to robot attacks

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 11 Costs & Dangers of Using Web Robots (conÕt)

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Bad implementation of the robot traversing algorithm may lead to the robot running in loops
Updating Overhead
No efficient change control mechanism for URL documents that got changed after caching
Client-side robot disadvantages
  • once distributed, bugs cannot be fixed
  • no knowledge of problem areas can be added
  • no new efficient facilities can be taken advantage of, as not everyone will upgrade to the latest version

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 12 Examples of Web Search Systems

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Lycos
FreeWAIS
Harvest
Web Search Demos - The demo set is designed for three different search systems to search in the same database: G. FoxÕs newest book ÒParallel Computing Works !Ó
  • Harvest
  • WWWWAIS
  • Oracle Text*Retrieval

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 13 Lycos

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
developed at Carnegie MellonÕs Center for Machine Translation in the School of Computer Science
Hardware Architecture: 7 workstations used as server engines, independently accessing the same index database using replicated data on local disks
Gathering software: Written in Perl; uses C and libwww to fetch URLs
SCOUT Indexer: full boolean retrieval with spelling correction and approximate phonetic matching
Search Capability: document title, heading, link and keyword, approximate matching and probabilistic retrieval

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 14 Total Volume and Data Capture in Lycos

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
descriptors (title, heading and first 20 lines) of 4.24 million (large base) or 505k (small base) URLs
keywords from 767,000 documents (for more than 5.4GB of text), update each day, 5000 new documents/day
so far 16 million queries answered, 175,000 hits/week
total 1493,787 URLs from 23,550 unique HTTP servers
averaged size of a text file downloaded: 7,920 characters .
Current total size of the index database: 737 MB

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 15 Data Content in Lycos

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Web URLs, including HTTP, FTP, Gopher servers. Not included: WAIS, UseNet News, Telnet Services, Email, CGI scripts
for each URL: Title, Headings, Subheadings, 100 most ÒweightyÓ words, First 20 lines, Size in bytes, Number of words
Performance Related Info:
  • response time is frequently > 20 sec.
  • Limitations:
    • with 7-10 servers running full time the poor response time (or rejection) when high load query requests, due to not scale up, no parallelization and the replicated database approach

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 16 FreeWAIS - Wide Area Information Server

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Internet indexing and search engine suitable for use on any set of textual documents
client server model allows effective and local access
  • X-client available (xwais)
  • WWW cgi-bin gateway interface available (wwwwais) as form-based web interface
  • waisindex command allows construction of index data for any Unix directory tree of text documents

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 17 Indexing in FreeWAIS

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
index data can be bulky - pure text documents will require almost same file space for index data
index process can be run as a daemon process running nightly
document + index set can be registered with central wais registry making information available to Internet community
docroot of a WWW server can be indexed or partially indexed.
disadvantage of freeWAIS over Harvest for example is that need access to the directory of Unix files to make index data. Although remote documents/index data can be accessed, you cannot index someone elseÕs WWW server with freeWAIS, unlike Harvest

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 18 Search in FreeWAIS

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Queries are on list of keywords. freeWAIS does not allow complex query formulation, but commercial WAIS product is Z39.50 compliant and does allow Boolean and other complex query formulation
freeWAIS is entirely based on free-text search. It has no ability to recognize special data fields in documents. This can be a problem if documents do not have meaningful filenames, as no title can be returned by the search process
search results are scored according to a matching criterion and can be ordered by search score
all these deficiencies are addressed by the commercial WAIS product

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 19 Harvest System

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Major components
  • efficient distributed Gathering architecture with customizable content extraction in each Gatherer
  • topic/community focussed Brokers
  • plug-and-play index/search engine in each Broker
  • network-aware caching and replication for scalable access
  • structured content summary interchange protocol

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 20 Harvest Architecture

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 21 Harvest Overview

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Gatherer collects info from resources available at Provider sites
Broker retrieves indexing info from gatherers, suppresses duplicate info, indexes collected info, and provides WWW interface to it
Replicator replicates Brokers around the Internet
Users retrieve located info from through the Cache

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 22 Harvest Gatherer

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index
Collection specified as enumerated URLs with stop lists and other parameters
Actions performed
  • Enumeration
  • Retrieval
  • Unnesting (compression, tar, etc.)
  • Type-specific summarizing
  • Content summary serving

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 23 Customized Content Extraction (Essence)

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 24 Summary Object Interchange Format (SOIF)

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
A type of Meta-Data
Gatherer-Broker protocol
Structured object summary stream
Arbitrary binary data & extensible fields
Type + URL + set of attribute-value pairs
Easy interoperation with other standards for communication/data exchange

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 25 Harvest Broker

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Less than a dozen routines define index/search API
Current Engines: freeWAIS, Glimpse, Nebula, Verity, WAIS
Current query interface: WWW
Customizable result formatting
MD5-based duplicate removal
  • MD5: an algorithm for computing a 128 bit ÒdigestÓ of arbitrary-length data such that any alterations in the data will be reflected in the digest
Provide data for indexing from other brokers as well as well as themselves

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 26 Distributed Gatherer-Broker Arrangement

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 27 Index & Search in Harvest

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
General Broker-Indexer interface to accomodate variety of index/search engines
Can use many backends (WAIS, Ingres etc.)
Index & Search Implemented using two specialized backends:
  • Glimpse for space-efficiency & flexible interactive queries
  • Nebula for time-efficiency & complex standing queries

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 28 Cache in Harvest

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index
alleviates bottlenecks on web servers from popular objects
improve gathering efficiency in fetching data from file system
two modes: a standalone HTTPD accelerator and an object cache hierarchy using Mosaic proxy interface

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 29 Perfomance of Harvest -- Gatherer

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Server load efficiency:
  • scans objects periodically & builds cache
  • allows info to be retirieved in single stream
  • retrieving data via Harvest GathererÕs cache has 6,600 times less server loading than via local file system
Network traffic efficiency:
  • supports incremental updates
  • transmittal of only the content summary is 59 times faster than that of full-text indexes

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 30 Performance of Search in Harvest -- Glimpse

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index
booleans, reg. expressions, approximate searches
incremental indexing
searches take several seconds
indexing ~6000 files (~70MB) on SparcStation IPC:
  • incremental indexing in 2-4 min.
  • complete indexing in 20 min.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 31 Implementation of Harvest
Standalone

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * Critical Information in IMAGE
Full HTML Index
A complete, information resource discovery, and access system
Archie, Veronica, WWW etc.
  • Gatherer configuration + Essence extraction script
Content Router
  • WAIS enumerator + Essence extraction script

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 32 Implementations of Harvest
with Other Systems -- Continued

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
WAIS
  • Essence full text extraction + ranking search engine
WebCrawler
  • Gatherer configuration + Essence full text extraction
WHOIS++
  • Gatherer configuration + Essence extraction script for centroids. Query front-end to search SOIF records

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 33 Future Technologies in Web Search

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Natural Language Processing (NLP)
Distributed Queries (DQ)
Meta-Data Format (MDF)
Artificial Intelligence (AI)
Client-Side Data Caching/Mining (CDCM)
A combination of the above technologies

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 34 Future Technologies in
Web Search - NLP

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Web Interface and web robot should be more user-friendly, allow natural language expressions
Documents should be treated as linear list of words for search (ex: WAIS)
Improved accuracy and richer representation of search output

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 35 Future Technologies in
Web Search - DQ

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Conventional web robots take a lot of time to traverse and locate the information
Distributed Query enables faster processing, and low response time
Larger data processed in a short time
Let user do distributed search at several databases at once
This is also called ÒThe Webants approachÓ

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 36 Future Technologies in Web Search - MDF

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Relational attributes pave way for NLP and SQL queries
MetaData Format most suitable
  • Subject: The topic addressed
  • Title: The name of the object
  • Author: Person primarily responsible for content
  • Publisher: Agent who made the object available
  • Other Agent: Editors, transcribers etc
  • Date: The date of publication
  • Object Type: novel, poem, dictionary etc
  • Form: Postscript file, Windows text file etc
  • Identifier: Unique identifier (no.) for the object
  • Relation: Relationship to other objects
  • Source: Objects, from which this one is derived
  • Language: Language of the intellectual content
  • Coverage: Spatial locations and temporal durations characteristic of the object

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 37 Future Technologies in
Web Search - AI

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
User preferences - Keep track of a user personalized preferences and adapt to his/her needs
User feedback - Let users mark what documents are relevant/irrelevant and build new queries based on this information
Dynamic link creation -
  • link to related documents in the bottom
  • links to discussions on Ôa phrase in the currentÕ

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 38 Future Technologies in
Web Search - CSCM

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
Cache/Mirror data for faster access, and indexing
  • When - the user requested, or by the maintainer of the cache
  • What - frequently used documents, and those which arenÕt already cached
  • Who has access to copies - proper cache is accessed according to user preferences
  • What way to present the copies - users shouldnÕt realize that they are looking at cached copies

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 39 More about Web Search Systems and Web Robots-- Yahoo

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
use listing/browsing as a simplified and restricted (yet more structured) way for indexing/searching web sites
hierarchical lists of URLs by topics - thousands of increasingly refined topic-specific areas
search by browsing the predefined links, while keyword-based title-search is available on the local web site
information gathering by manual categorization or user submissions (not clear how Yahoo discovers all the listed web sites)
Yahoo is a web site, not a search service

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 40 More about Web Search Systems and Web Robots

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
InfoSeek -- A popular commercial net search provider
WebCrawler -- gathers indexes of the total contents of documents, as well as URLSs titles
WWW Worm -- gathers information about titles and URLs from Web servers
WebAnts -- a project for cooperating, distributed Web spiders
JumpStation -- indexes the titles and headers of documents on the Web
MOMspider -- a spider that you can install on your system (Unix/Perl)
NIKOS -- allows a topic-oriented search of a spider database
RBSE URL -- a database of URL references, with full WAIS indexing of the contents of the documents
SG-Scout -- a robot for finding Web servers
Wandex -- -- index from the World Wide Web Wanderer

HELP! * GREY=local HTML version of LOCAL Foils prepared July 28,1995

Foil 41 Recent Capital Ventures in Web Search Business

From HPDC95 Web Search Presentation HPDC95 Pentagon City -- August 1,1995. * See also color IMAGE
Full HTML Index
America Online ---> (acquisition) WebCrawler (U. of Washington), WAIS (WAIS Inc.)
InfoSeek ---> (the first commercialized web search provider) net search service charged on a pay-per-document basis
Sequoia Capital ---> (captital venture) Yahoo (Standford U.)
CMG@Ventures ---> (captital venture) Lycos (Carnegie Mellon U.)
Microsoft Inc. ---> (licensed) Lycos (CMU)
KPC & Byers and Institutional Venture Partners ---> (capital venture) Architext Inc. (formed by six Standford graduates)

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Mon Feb 17 1997