Given by Gang Cheng (Geoffrey Fox) at CPS600 Spring Semester on April 1995. Foils prepared Jan 8, 1996
Abstract * Foil Index for this file
These foils were prepared by Gang Cheng and presented by Geoffrey Fox in CPS600 |
They contrast the strengths and weaknesses of Web and Database Information models |
They describe Web, Conventional and DB-WEb linked Client Servers Models |
They describe Oracle's WOW Oracle to Web Interface |
They describe one of our experimental implementations which links mh mail to Oracle |
This table of Contents Abstract
Gang Cheng |
NPAC |
111 College Place |
Syracuse NY 13244-4100 |
These foils were prepared by Gang Cheng and presented by Geoffrey Fox in CPS600 |
They contrast the strengths and weaknesses of Web and Database Information models |
They describe Web, Conventional and DB-WEb linked Client Servers Models |
They describe Oracle's WOW Oracle to Web Interface |
They describe one of our experimental implementations which links mh mail to Oracle |
Internet as the major vehicle in networking industry |
Driving force: information access and dissemination |
WWW: Internet navigation and multimedia information access --- a killer networking application |
Main WWW components today:
|
Client-server as the communication model |
(Flat hierarchical UNIX) File system as the major file (data) management system |
Focus on transparent networking navigation and a graphical user-interface for hypertext-based information dissemination |
Relational database management system (RDBMS) as the major data model in database industry |
Driving force: information management, retrieval and processing |
Database server: store, manipulate and manage data in a reliable and powerful information system |
Main RDBMS components
|
database techniques used in Web technology
|
networking techniques used in distributed database technology
|
static, browser-oriented client |
data (information in a narrow sense) repository |
tedious to manage and update (HTML) files on the server side |
difficult to keep most updated data automatically --- modifies usually done by a human using word processor or editing tools |
A plain UNIX file system only supports primitive functions such as open,read/write and close.
|
a non-universal, non-portable remote networking access to database server |
plain-text oriented, no multimedia |
nonstandard front-end browser (most are line-based) |
a powerful backend text searching engine supporting complex queries |
an efficient, reliable and sophisticated data management system supporting representation and organization of logical relationships among information entities |
a dynamic data processing engine, provided with opportunity for optimized searching performance in large information systems |
a time-dimension added HTML programming engine (see case study examples later) |
a transparent networking interface for remote database server access |
a universal GUI-based search interface for hypertext-added data retrieval |
a truly globally-shared data system for maximum information dissemination |
an integrated multi-media information system |
Basic features in HTTP protocol -- a client-server model:
|
client & server open the connection ---> |
client sends request messages to server ---> |
server locates requested information and sends |
response back to client ----> client & server close the connection (note each URL represents a separate connection process) |
Fill-out forms are currently used for the server to accept clients run-time input from a HTML page |
Two different strategies for WWW client-server interactions:
|
SQL is the standard way to extract(query) data out from the database. |
Low-level SQL-based interfaces are available on different vendors RDBMS |
most commonly used one: Embedded SQL in C |
SQL programs: static (compiled) SQL and dynamic (interpretive) SQL |
clients send SQL statements to server ---> |
server parses and executes the SQL against the current database ---> |
returned rows are buffered in server's main memory ---> |
server fetches buffered rows and sends them to clients |
Embedded SQL in C + CGI (C program) |
Form-based relational querying and text searching |
Different approaches for different vendor's RDBMS systems (Sybase,Oracle,Informix, DB2, etc) |
more details can be found at "http://greatwall.npac.syr.edu:1963/PDB/ |
web-db.html" |
Note: Web Server and DB client must be on the same machine, |
while DB client and DB server can be on different machine and a proprietary networking protocol (e.g., SQL*NET for Oracle7) is used. |
Inter-process Comunication (IPC) is used for the Web server to communicate with DB client on the same machine |
Data and SQL engine
|
use PL/SQL for data manipulation and processing. |
PL/SQL is
|
in a specific environment: PL/SQL stored programs |
major features:
|
A small shell-script to set environment-variables |
A C stub called wowstub implemented as a normal CGI program |
A PL/SQL package called wow |
An optional, but strongly recommended set of PL/SQL packages called HTP and HTF. These encapsulate HTML formatting, and will make migration to future HTML versions easy |
A stand-alone PL/SQL compiler with CGI enhancements. The compiler is a |
stand-alone gateway that will allow you to execute pl/sql code directly from within an HTML document |
Note: the gateway wowstub program simply passes PL/SQL program name and input parameters gathered from forms to DB server. |
The DB server does both SQL query and HTML processing/formatting |
implemented using wow |
another Web interface to MH mailbox (similar to xmh but under web) |
a enhanced backend mail archives server with advanced search capability |
basic components
|
Note: field definition of each table is not shown here; |
Inserting a parsed MH mail into the database will affect the tables: |
folder,msg header,msg body and people |
home page --- generated by a PL/SQL procedure which first stores the client's information of the current connection to a 'visitor' table and then generates links of query options. clients history can be easily pull over from the database. (fig. 1) |
query the mailbox --- find out the summary information about the mailbox (fig. 2) |
query the folder --- choose a folder from all folders currently available in the database (fig. 3), and find out summary information of the chosen folder (fig. 4) |
All the above queries are summary information which are meta-data (accumulated or statistic data about basic tables) and produced at the time when clients requested (done by simple SQL calculation statements) |
Client chooses all folders or a folder currently available in the database (fig. 5) |
browse all mail headers (show msgid,date,sender,size in bytes and subject) (fig. 6) |
show chosen mail's body after the mail header browsing (fig. 14) |
optionally show the complete original MH mail in UNIX mailbox (fig. 15) |
Note that full table scan only on the msg header table is required to list all the mail header in a folder (msg body is not touched during the header browsing). |
This is Optimized browsing performance, compared to MH header browsing where header and body are stored in a single file and are not indexed |
Most recent five mails currently in the db |
mail related to now (the exact time as clients click the query-by-date page) in all folders:
|
mail related to today (the same day as clients click the query-by-date page) in selected folders:
|
mail at any specified time or within any specified time range. Choose from hour(1-12), am/pm, day(1-31), month(1-12), year (fig. 8) |
system clock will be consulted to form above queries |
Query by a keyword --- return mails containing the entered keyword (non-indexed pattern matching, future option: context searching using Oracle TextServer search engine) (fig. 9, 10) |
the keyword can be a typed in or chosen from a table of previously entered keywords |
search in all folders or a selected folder from currently available folders in the DB |
search keyword occurrence in:
|
search with either case insensitive or sensitive for the keyword |
typein keyword will or will not be added to the DB table for future use or keyword in the current DB table will or will not be removed from the DB |
database insert/delete operations are used to add/remove keywords |
in the table --- the power of remembering clients interactions on the web |
Query all mail sent from a selected sender, based on all senders gathered in the current mail database (fig. 11) |
sender is uniquely identified by his/her userid |
a specified folder can be used to narrow the searching, as in many other query options |
Note the sender list is not pre-entered or edited. It is automatically built when a new mail is being incorporated into the database from the MH mail folder by the parser --- another dynamic feature of web-db integration |
For security and database consistence reason, userid/password at the database level (not Mosaic page-level) is enforced to update the database (fig. 12) |
Two kinds of folders: MH folder in UNIX file system and folder table in the database system. They hold the same information but in different places. |
Original MH mail (in MH format) in a folder must be converted into the database tables (in Oracle table format) to enable all the above query options on the Oracle db server |
Assume each database folder must have a unique MH folder in UNIX file system. The database folder is created when the first mail in the UNIX MH folder is being incorporated in the database |
Before incorporating new mail from a UNIX folder, the PL/SQL procedure first finds out current maximum msg id in the corresponding DB folder and only try to parse/incorporate those MH mails with msg id greater than the maximum msg id in DB (fig. 13) |
If the the MH folder contains subfolders, new mails in all subfolders will be considered to incorporate, and this is a recursive process. |
Implemented by both a perl script (for parsing) and PL/SQL procedures |
A time-intensive process as insertions may be occurred on up to four DB tables which are indexed. But this process can be done in batch ! |
Incorporating can be done at the same time when other query options are requested by other clients on the web. The DB server will ensure the query concurrency and data consistency --- a reliable information system ! |
List of mail headers satisfying the query (show msgid,date,sender,size in bytes and subject) (fig. 6) |
Show chosen mail's body after the mail header browsing (fig. 14) |
Optionally show the complete original MH mail in UNIX mailbox (fig. 15) |
Query the People |
Query By Mail Type |
Query By Date and Sender |
Query By Date and Subject |
Query By Date and Sender and Subject |
Interface the mail database with WebMail tools (CPS600 server) |
Another example using WOW for a similar database can be found at |
http://kayak.npac.syr.edu:1963 under link TravelVenture Demo for "travel agency" application |
Use Oracle Database Server for Well-structured Information Entities |
WWW Interface to access Oracle-based EIS |
Central New York Tourism Image Database -- Living Schoolbook |
Patient Record Database Prototype -- will enhance with customizable Java frontend |
See http://kayak.npac.syr.edu:1963 |
Use Oracle RDBMS Server and Oracle's Context Server Technology as Indexing and Search Engines |
Use High Performance and Intelligent Web agents to gather local and/or remote Web Pages |
Customizable (form based) Web Search Interface |
See example as on-line HTML book search (Solving Problems on Concurrent Processors) |
Search programming/close captioning attached to digital video database (Living Schoolbook) |
Search traditional text based CD-Rom's (NewsBank in Living SchoolBook) |
and search on 30,000 URL's in NPAC Web Site (http://kayak.npac.syr.edu:1963/search |
Oracle RDBMS server as core engine for relational data |
Oracle's Context Server to deal with non-structures full-text data |
Example is mail or searchable USENET Newsgroups archives -- |
see http://asknpac.npac.npac.syr.edu |
Here To,From,Date,Subject are structured fields |
Body of mail message is unstructured |
Migrating many of above projects to SP2 -- currently 4 wide nodes |
Use Parallel Oracle7 on IBM SP2 (V7.3 beta, Parallel Server and query Options) |
i.e. use Parallelism from multiple users and within query |
Multiple Web Servers and Oracle RDBMS Instances supporting single client image for database and search interface |
See http://128.230.3.39:1963 |