Data Volume
-
Estimated total Web text size: 0.1-1 terabytes, 5-10 million documents (estimate based on the text stored on the NPAC Web server: 110 MB of text, 36,000 text URLs, avg. 3 KB/page); grows daily
-
Requires more sophisticated search mechanisms than browsing and hyperlink-based organization
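The per-page average in the NPAC figures can be checked with a short calculation. This is a minimal Python sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 KB = 1000 bytes), and the names `NPAC_TEXT_BYTES`, `NPAC_TEXT_URLS` and `avg_page_size_kb` are illustrative, not taken from the source.

```python
# NPAC Web server figures quoted on the slide (decimal units assumed: 1 MB = 10**6 bytes)
NPAC_TEXT_BYTES = 110 * 10**6
NPAC_TEXT_URLS = 36_000


def avg_page_size_kb(total_bytes: int, num_pages: int) -> float:
    """Return the mean page size in kilobytes (1 KB = 1000 bytes assumed)."""
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    return total_bytes / num_pages / 1000


if __name__ == "__main__":
    # Prints "3.06 KB/page"
    print(f"{avg_page_size_kb(NPAC_TEXT_BYTES, NPAC_TEXT_URLS):.2f} KB/page")
```

With binary units (1 MB = 2^20 bytes, 1 KB = 1024 bytes) the result is about 3.13 KB/page; either way it rounds to the slide's figure of roughly 3K per page.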
|
Data Diversity
-
WWW: a gigantic distributed database of unstructured, non-relational, hierarchical (multimedia) information entities in various data formats (MIME types): HTML, plain text, PostScript, LaTeX, etc.
-
Web repositories are heterogeneous, inconsistent and incomplete.
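Because the formats differ, an indexer has to turn each document into plain text before it can be searched. A minimal Python sketch of dispatching on the MIME type is shown here. The extractor functions, the `EXTRACTORS` table and the use of `application/x-latex` for LaTeX are illustrative assumptions, and the regex-based stripping is only a crude stand-in for real HTML and LaTeX parsers. PostScript is deliberately given no extractor (it would need a dedicated converter), so such documents are skipped.

```python
import html
import re
from typing import Callable, Dict, Optional


def _normalize(text: str) -> str:
    # Collapse runs of whitespace into single spaces
    return " ".join(text.split())


def extract_html(raw: str) -> str:
    # Hypothetical extractor: crude tag stripping, not a real HTML parser
    return html.unescape(re.sub(r"<[^>]+>", " ", raw))


def extract_plain(raw: str) -> str:
    # Plain text needs no conversion
    return raw


def extract_latex(raw: str) -> str:
    # Hypothetical extractor: drop control sequences such as \section, keep their arguments
    without_commands = re.sub(r"\\[A-Za-z]+\*?", " ", raw)
    return without_commands.replace("{", " ").replace("}", " ")


# Assumed MIME-type-to-extractor table; application/postscript is intentionally absent
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "text/html": extract_html,
    "text/plain": extract_plain,
    "application/x-latex": extract_latex,
}


def extract_text(mime_type: str, raw: str) -> Optional[str]:
    """Return normalized plain text, or None if the format has no extractor."""
    # Ignore parameters such as "; charset=iso-8859-1" and compare case-insensitively
    base_type = mime_type.split(";", 1)[0].strip().lower()
    extractor = EXTRACTORS.get(base_type)
    if extractor is None:
        return None
    return _normalize(extractor(raw))


if __name__ == "__main__":
    print(extract_text("text/html; charset=iso-8859-1", "<p>Hello &amp; world</p>"))
    print(extract_text("application/postscript", "%!PS"))
```

Keeping the dispatch in a table means that supporting one more format only requires registering one more extractor.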
|
User Base
-
Users differ in query patterns, search topics and response-time requirements
-
Rapid daily growth in the number of users and search requests
|