CPS 600 slides for GCF (04/10/95) Integration of Database Server and Web Technology Gang Cheng NPAC 111 College Place Syracuse NY 13244-4100 Abstract of Web Oracle Integration Presentation These foils were prepared by Gang Cheng and presented by Geoffrey Fox in CPS600 They contrast the strengths and weaknesses of Web and Database Information models They describe Web, Conventional and DB-WEb linked Client Servers Models They describe Oracle's WOW Oracle to Web Interface They describe one of our experimental implementations which links mh mail to Oracle Key points in Web Technology Internet as the major vehicle in networking industry Driving force: information access and dissemination WWW: Internet navigation and multimedia information access --- a killer networking application Main WWW components today: HTTP: the transport protocol HTML: the makeup language CGI: an interface to extend HTTP and HTML Fill-Out Form: a set of HTML constructs to allow clients passing run-time input data to server through GUI interface Client-server as the communication model (Flat hierarchical UNIX) File system as the major file (data) management system Focus on transparent networking navigation and a graphical user-interface for hypertext-based information dissemination Key points in database technology Relational database management system (RDBMS) as the major data model in database industry Driving force: information management, retrieval and processing Database server: store, manipulate and manage data in a reliable and powerful information system Main RDBMS components a search engine: the database management system SQL: the query language embedded SQL interfaces: interfaces to extend the SQL non-procedure programming in high-level programming languages database tools: various (GUI) application interfaces to the DBMS client-server as the major communication model customized proprietary network protocols built on TCP/IP as the major transport protocol focus on file management and I/O processing for a powerful text-based searching and data manipulation system The Synergy of Web Networking and Database Technologies database techniques used in Web technology data storage and data caching index searching data processing networking techniques used in distributed database technology distributed database two-phase commit data replication client/server model Weak Points in Web Technology static, browser-oriented client data (information in a narrow sense) repository tedious to manage and update (HTML) files on the server side difficult to keep most updated data automatically --- modifies usually done by a human using word processor or editing tools A plain UNIX file system only supports primitive functions such as open,read/write and close. More powerful operations on files are required for an information system which is supposed to provide information instead of raw data, i.e., low-level data manipulation operators such as those provided in RDBMS: select, insert, update, commit, rollback,etc. Meaningful data needs to be generated on-the-fly from raw data Weak Points in Database Technology a non-universal, non-portable remote networking access to database server plain-text oriented, no multimedia nonstandard front-end browser (most are line-based) By integrating a database server into Web technology, a Web server can be enhanced with: a powerful backend text searching engine supporting complex queries an efficient, reliable and sophisticated data management system supporting representation and organization of logical relationships among information entities a dynamic data processing engine, provided with opportunity for optimized searching performance in large information systems a time-dimension added HTML programming engine (see case study examples later) By integrating Web into database technology, a Database server can be enhanced with: a transparent networking interface for remote database server access a universal GUI-based search interface for hypertext-added data retrieval a truly globally-shared data system for maximum information dissemination an integrated multi-media information system Interfacing Web server and a database server -- Basic Features of WWW Client Server System Basic features in HTTP protocol -- a client-server model: single transaction per connection: stateless nature of the connection: end-user's previous connection run-time input will not be seen by current connection client & server open the connection ---> client sends request messages to server ---> server locates requested information and sends response back to client ----> client & server close the connection (note each URL represents a separate connection process) WWW Client Server Relationship Fill-out forms are currently used for the server to accept clients run-time input from a HTML page Two different strategies for WWW client-server interactions: Pre-edited HTML files (links) --- static and plain ASCII read-only files, identified by a file path; exist on disk; no need to process when used by the server (server performance is only determined by disk -> core memory I/O transfer) CGI scripts --- programs (shell,perl,programming languages) to generate data (usually in HTML format) to be returned to the server (which will then be transferred to the requesting client in HTTP protocol); the resultant data are dynamically generated on the fly; pipes are used to feed the output (via STDOUT) to server; server performance is mostly determined by CPU processing) Basic features in database access -- a SQL-based query interface SQL is the standard way to extract(query) data out from the database. Low-level SQL-based interfaces are available on different vendors RDBMS most commonly used one: Embedded SQL in C SQL programs: static (compiled) SQL and dynamic (interpretive) SQL Basic Client-Server model clients send SQL statements to server ---> server parses and executes the SQL against the current database ---> returned rows are buffered in server's main memory ---> server fetches buffered rows and sends them to clients Major approaches for interfacing web into DB server environment Embedded SQL in C + CGI (C program) Form-based relational querying and text searching Different approaches for different vendor's RDBMS systems (Sybase,Oracle,Informix, DB2, etc) more details can be found at "http://greatwall.npac.syr.edu:1963/PDB/ web-db.html" Basic WWW DB Integration Model Note: Web Server and DB client must be on the same machine, while DB client and DB server can be on different machine and a proprietary networking protocol (e.g., SQL*NET for Oracle7) is used. Inter-process Comunication (IPC) is used for the Web server to communicate with DB client on the same machine Oracle approach to Web-RDBMS Integration Data and SQL engine use tables, instead of static files, to store data build a schema on data tables to define data and the relationships of different information entities (the relational model) Web pages can be created, managed and delivered in more powerful way SQL operators for data and file processing: Select data from tables to provide accurate and up-to-date information, and data can be entered, updated through a web page to keep a consistent information base Procedural and Programmable mechanisms use PL/SQL for data manipulation and processing. PL/SQL is Oracle's procedural language for SQL application development support stored procedures, database triggers, optimized concurrent data access, and executable programs as meta-data (i.e., programs are stored together with and attached to the data objects to be manipulated on) a database programming language with expressive capability between C and SQL: loops, decisions, modules and function calls, subroutines, etc. database server is both a SQL engine and a compute engine programmable HTML pages Web-Oracle-Web (WOW) --- a Web-Oracle interface tool for developing gateways in a specific environment: PL/SQL stored programs major features: a utility to develop CGI gateways for Web-servers with PL/SQL in an Oracle7 database Wow procedures are executed within an Oracle database. They automatically inherit the following characteristics of Oracle7 database server: Scalability, Portability, National Language Support, Multi Threaded Servers, Replication, Distribution, Administration (SNMP), Object encapsulation (PL/SQL packages) Basic components in WOW A small shell-script to set environment-variables A C stub called wowstub implemented as a normal CGI program A PL/SQL package called wow An optional, but strongly recommended set of PL/SQL packages called HTP and HTF. These encapsulate HTML formatting, and will make migration to future HTML versions easy A stand-alone PL/SQL compiler with CGI enhancements. The compiler is a stand-alone gateway that will allow you to execute pl/sql code directly from within an HTML document Client-Server Architecture for WOW Note: the gateway wowstub program simply passes PL/SQL program name and input parameters gathered from forms to DB server. The DB server does both SQL query and HTML processing/formatting Web-oracle Mail Database: a case study of integrating Oracle7 database server into Web implemented using wow another Web interface to MH mailbox (similar to xmh but under web) a enhanced backend mail archives server with advanced search capability basic components a MH mail -> Mail tables parser in perl a Entity-Relation Model for mails a PL/SQL package for basic data query and processing functionality the CPS600 demonstration mail database is available at URL: http://kayak.npac.syr.edu:1963, under CPS600's Demo Mail Database The Entity-Relation Model Note: field definition of each table is not shown here; Inserting a parsed MH mail into the database will affect the tables: folder,msg header,msg body and people Functionality of the mail database home page --- generated by a PL/SQL procedure which first stores the client's information of the current connection to a 'visitor' table and then generates links of query options. clients history can be easily pull over from the database. (fig. 1) query the mailbox --- find out the summary information about the mailbox (fig. 2) query the folder --- choose a folder from all folders currently available in the database (fig. 3), and find out summary information of the chosen folder (fig. 4) All the above queries are summary information which are meta-data (accumulated or statistic data about basic tables) and produced at the time when clients requested (done by simple SQL calculation statements) Query by mail header Client chooses all folders or a folder currently available in the database (fig. 5) browse all mail headers (show msgid,date,sender,size in bytes and subject) (fig. 6) show chosen mail's body after the mail header browsing (fig. 14) optionally show the complete original MH mail in UNIX mailbox (fig. 15) Note that full table scan only on the msg header table is required to list all the mail header in a folder (msg body is not touched during the header browsing). This is Optimized browsing performance, compared to MH header browsing where header and body are stored in a single file and are not indexed Query by date Most recent five mails currently in the db mail related to now (the exact time as clients click the query-by-date page) in all folders: the same hour/day/week/month/ (this hour/today/this week/this month) one day before the same day (last hour/yesterday/last week/last month) (fig . 7) mail related to today (the same day as clients click the query-by-date page) in selected folders: N days/weeks/months ago from today (N >=0) N hours ago from now mail at any specified time or within any specified time range. Choose from hour(1-12), am/pm, day(1-31), month(1-12), year (fig. 8) system clock will be consulted to form above queries Query by Subject Query by a keyword --- return mails containing the entered keyword (non-indexed pattern matching, future option: context searching using Oracle TextServer search engine) (fig. 9, 10) the keyword can be a typed in or chosen from a table of previously entered keywords search in all folders or a selected folder from currently available folders in the DB search keyword occurrence in: both mail header (subject line) and body, or mail header only, or mail body only search with either case insensitive or sensitive for the keyword typein keyword will or will not be added to the DB table for future use or keyword in the current DB table will or will not be removed from the DB database insert/delete operations are used to add/remove keywords in the table --- the power of remembering clients interactions on the web Query by Sender Query all mail sent from a selected sender, based on all senders gathered in the current mail database (fig. 11) sender is uniquely identified by his/her userid a specified folder can be used to narrow the searching, as in many other query options Note the sender list is not pre-entered or edited. It is automatically built when a new mail is being incorporated into the database from the MH mail folder by the parser --- another dynamic feature of web-db integration Incorporating new mail into the Database from your UNIX mailbox --I For security and database consistence reason, userid/password at the database level (not Mosaic page-level) is enforced to update the database (fig. 12) Two kinds of folders: MH folder in UNIX file system and folder table in the database system. They hold the same information but in different places. Original MH mail (in MH format) in a folder must be converted into the database tables (in Oracle table format) to enable all the above query options on the Oracle db server Assume each database folder must have a unique MH folder in UNIX file system. The database folder is created when the first mail in the UNIX MH folder is being incorporated in the database Incorporating new mail into the Database from your UNIX mailbox --II Before incorporating new mail from a UNIX folder, the PL/SQL procedure first finds out current maximum msg id in the corresponding DB folder and only try to parse/incorporate those MH mails with msg id greater than the maximum msg id in DB (fig. 13) If the the MH folder contains subfolders, new mails in all subfolders will be considered to incorporate, and this is a recursive process. Implemented by both a perl script (for parsing) and PL/SQL procedures A time-intensive process as insertions may be occurred on up to four DB tables which are indexed. But this process can be done in batch ! Incorporating can be done at the same time when other query options are requested by other clients on the web. The DB server will ensure the query concurrency and data consistency --- a reliable information system ! Format of Query results from all query options List of mail headers satisfying the query (show msgid,date,sender,size in bytes and subject) (fig. 6) Show chosen mail's body after the mail header browsing (fig. 14) Optionally show the complete original MH mail in UNIX mailbox (fig. 15) Query options to be added in the future to Oracle-based Web Mail System Query the People Query By Mail Type Query By Date and Sender Query By Date and Subject Query By Date and Sender and Subject Interface the mail database with WebMail tools (CPS600 server) Another example using WOW for a similar database can be found at http://kayak.npac.syr.edu:1963 under link TravelVenture Demo for "travel agency" application Some Current(January 96) NPAC Web-RDBMS Activities -- Enterprise Information Systems(EIS) Use Oracle Database Server for Well-structured Information Entities WWW Interface to access Oracle-based EIS Central New York Tourism Image Database -- Living Schoolbook Patient Record Database Prototype -- will enhance with customizable Java frontend See http://kayak.npac.syr.edu:1963 Some Current(January 96) NPAC Web-RDBMS Activities -- Full Text Web Search System Use Oracle RDBMS Server and Oracle's Context Server Technology as Indexing and Search Engines Use High Performance and Intelligent Web agents to gather local and/or remote Web Pages Customizable (form based) Web Search Interface See example as on-line HTML book search (Solving Problems on Concurrent Processors) Search programming/close captioning attached to digital video database (Living Schoolbook) Search traditional text based CD-Rom's (NewsBank in Living SchoolBook) and search on 30,000 URL's in NPAC Web Site (http://kayak.npac.syr.edu:1963/search Some Current(January 96) NPAC Web-RDBMS Activities -- Mixture of well Structured and Full Text Entitities Oracle RDBMS server as core engine for relational data Oracle's Context Server to deal with non-structures full-text data Example is mail or searchable USENET Newsgroups archives -- see http://asknpac.npac.npac.syr.edu Here To,From,Date,Subject are structured fields Body of mail message is unstructured Some Current(January 96) NPAC Web-RDBMS Activities -- Parallel Server Technology Migrating many of above projects to SP2 -- currently 4 wide nodes Use Parallel Oracle7 on IBM SP2 (V7.3 beta, Parallel Server and query Options) i.e. use Parallelism from multiple users and within query Multiple Web Servers and Oracle RDBMS Instances supporting single client image for database and search interface See http://128.230.3.39:1963