Second Part of RDBMS Discussion
Parallel Relational Database Management Systems II
Gang Cheng, Geoffrey C. Fox, Nancy McCracken, Marek Podgorny, Chao-Wei Ou
Northeast Parallel Architectures Center, 111 College Place, Syracuse, New York 13244-4100

Abstract of Second Part of Parallel Relational Database Systems Presentation
This discusses the last three parts of the presentation:
The situation at NPAC, including hardware, InfoMall activities, and the DR-LINK product of Textwise (Liz Liddy)
Detailed discussion of parallel Oracle on the SP2 and nCUBE
Results of NPAC's benchmarking activities, especially the TPC-D work

Parallel Database Efforts at NPAC and in InfoMall
NPAC's Experience in Parallel Database Technology
Experience with Oracle, nCUBE, and IBM for their parallel RDBMS products
Direct contacts with parallel RDBMS developers at Oracle and IBM
Hands-on expertise in large-scale (up to 100 GB database size) DSS benchmarking
Experience in parallel computing and applications in industry, parallel system integration, and data visualization

InfoMall (NPAC) Parallel Database Evaluation and Benchmarking Service
Planning, design, implementation, and interpretation of Decision Support benchmarks using parallel database technology
InfoMall product, supported by technology vendors
Standard and customer-defined benchmarks
NPAC provides state-of-the-art infrastructure, MPP and parallel database expertise, and trained personnel
InfoMall assumes most of the risks and hazards involved in the benchmarking process
Non-disclosure arrangements available

Integrated Decision Support Systems InfoMall Project
NPAC teams with a small software vendor, a major software vendor, and a hardware vendor to provide an integrated Decision Support system for a number of Fortune 500 companies
NPAC role:
help parallelize the code kernel of the Decision Support package
interface the package to a parallel database system and provide capacity planning based on customer performance requirements and NPAC parallel database expertise
provide infrastructure and system support for the project duration
Project deliverable: an integrated turnkey DS system with all elements implemented on the same MPP platform

Overview of DR-LINK High Level Natural Language Database Interface
This research activity of Professor E. Liddy of the School of Information Studies at Syracuse University is being commercialized by an InfoMall member -- the start-up company Textwise.
The surface realizations, i.e. the particular strings of words in documents or database queries, do not reflect the breadth, richness, or precision of meaning intended by either the author of a document or the presenter of a query
A simple word-level representation cannot produce the recall and precision possible when the full conceptual intent of queries and documents is the basis of retrieval
Thus DR-LINK matches on the concepts of interest to the user and not just the words of interest
DR-LINK achieves this deeper conceptual representation of texts by utilizing all the levels of language at which meaning is conveyed: the lexical, syntactic, semantic, and discourse levels
DR-LINK combines these multiple levels of conceptual representation of text

Features of DR-LINK High Level Natural Language Database Interface
Accepts lengthy, ambiguous, complex Natural Language queries
Translates Natural Language queries into precise Boolean representations of the user's implied requirements for the relevance of a document to a query
Produces summary-level semantic SFC (Subject Field Codes) vector representations of query and documents for quick filtering of large databases
Captures Text Structure dimensions of time, source, state of completion, credibility, definiteness, intentionality, and causality
Fulfills Proper Noun requirements of queries by:
precise matching of focused Proper Noun requests
accurate Proper Noun category-level matching
expansion of categories to constituent members
Provides both high recall and high precision via controlled expansion of Complex Nominals
Evaluates Implicit and
Explicit Semantics to assign similarity between Query and Documents using its Integrated Matcher
Predicts how many documents need to be viewed to achieve the user-specified level of recall via the Cut-Off Criterion

Structure of DR-LINK High Level Database Interface
Raw Documents

Parallel Oracle7 RDBMS -- A Case Study
What is the Oracle Parallel Server? -- Introduction
A high-performance RDBMS for loosely coupled and MPP systems
Currently supported systems: IBM SP2, nCUBE2, SGI SMP, Sun SMP, HP SMP, Amdahl, Encore, KSR, CM5
Also known as multi-instance Oracle: multiple instances running against a single, shared database
An evolutionary approach built on the existing sequential Oracle RDBMS

Oracle Parallel RDBMS Architecture -- Overall Structure
Two independent options:
Parallel Server (PS) -- since v. 6.2
Parallel Data Query (PDQ) -- since v. 7.1
Parallel Server option available for most Unix platforms
Parallel Data Query available for a few SMPs
The combination of PS and PDQ is mostly in beta, available for a handful of SMP clusters and MPP machines

Oracle Parallel RDBMS Architecture -- Parallel Data Query Option
parallelizes execution of a single SQL query
parallel execution carried out by concurrent Unix processes communicating via shared memory
a single Oracle instance is sufficient -- no PS necessary
uses multiple CPUs on SMP architectures, but can run on single-CPU machines as well, with modest performance benefits
essential for Decision Support, irrelevant for OLTP

Oracle Parallel RDBMS Architecture -- Parallel Server Option
lets multiple Oracle instances access the same database
requires a cluster of single-CPU workstations, a cluster of SMPs, or an MPP to run
requires a Distributed Lock Manager to ensure data consistency
requires all disk storage to be accessible by each Oracle node
supports PDQ distributed across multiple instances
essential for OLTP and "application partitioning"
relevant for scalability of ad hoc DS queries

What are the versions of the Oracle Parallel Server?
First introduced with Oracle 6.2:
version 6.2 = version 6.0 (sequential) + Parallel Cache Management
version 7.1 = version 6.2 + parallel query processing
latest version 7.2 = sequential Oracle + Parallel Server Option + Parallel Query Option
production release on nCUBE in Jan. 1995
The parallel implementation is transparent to users
Key technology: Parallel Cache Management and Parallel Query Processing
Key components: a Distributed Lock Manager and a Parallel Query Optimizer

What is the Oracle Parallel Server File Structure and Architecture?
Oracle 7 Server Architecture shown in the next foil
Oracle 7 Database Files:
Data Files --- contain all the database data; logical structures, such as tables and indexes, are physically stored in the data files
Redo Log Files --- transaction logs that record all changes made to database information, for recovery
Control Files --- files that record the physical structure of the database, including all data files and redo log files

Oracle7 Server Architecture
The Oracle Server Instance
Every time the Oracle 7 Server is started, the System Global Area (SGA) is allocated and the Oracle 7 Server background processes are started. The combination of memory buffers and background processes is called an Oracle Server Instance.
The SGA is a group of shared memory buffers allocated by the Oracle 7 Server for an instance
The background processes asynchronously perform distinct tasks on behalf of all database users of the instance

Oracle7 Parallel Server Characteristics
Start an Oracle Server instance on each node
Each instance has its own set of redo log files
All instances can concurrently execute transactions against the same database, and each instance can have multiple users executing transactions

The Oracle7 Parallel Server
Each node of a parallel system runs an instance of the Oracle7 Server that accesses a shared database
Application Example of Parallel Server (functional parallelism)

Parallel Server Option
Multiple instances on different nodes share a single physical database
The data is accessed by all nodes through a parallel file subsystem
All instances have common data and control files, and their individual log files
All instances can simultaneously execute transactions against the single database, and each instance can have multiple users executing transactions
Data is kept consistent using PCM technology, implemented by the Distributed Lock Manager

Oracle7 on IBM SP2 Architecture
Oracle components: Distributed Lock Manager, Oracle Parallel Server
IBM components: High Performance Switch, Virtual Shared Disk (VSD)

Parallel Cache Management (PCM) and Distributed Lock Manager (DLM)
Concurrency control of buffer caches among instances
Maintains independent, yet consistent, caches of blocks for all instances
Allows multiple nodes to simultaneously read/write to the database (disks)
Maximizes concurrent data access and minimizes (disk) I/O activity
Block-level PCM locks; row-level locking preserved
A fault-tolerant DLM: if a node goes down, the DLMs will automatically recover and reconstruct the lock database on behalf of the failed DLM

Oracle Parallel Server On-Line Recovery
Three possible failures
Example of failure due to lack of concurrency control
Time steps:
Step 1: Node 1 reads block 1 from
disk and stores it in Node 1's memory cache; Node 2 reads block 1 from disk and stores it in Node 2's memory cache
Step 2: Node 1 writes information to block 1; Node 2 writes information to block 1
Step 3: Node 1 writes block 1 to disk
Step 4: Node 3 reads block 1 from disk and stores it in Node 3's memory cache
In this situation, Nodes 1 and 3 have the same information in their memory caches, but Node 2 has an inconsistent cache.

Discussion of PCM Lock Example
The example of the previous and following foils illustrates why concurrency control is needed in a parallel DB server environment. It shows how contention for shared resources (data, disk access) is handled in a multi-node parallel system sharing a single database. The blackened boxes in the figure are the same blocks in each node (not rows!) requested by different nodes (instances) and located in different SGA buffers. Concurrency control needs to be implemented to ensure a consistent database, i.e. the same block from the disk must be reflected as exactly the same copy in the different buffers. This is achieved by PCM locks.
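The four time steps above can be sketched as a toy simulation (our illustration in Python, not Oracle code): three node caches read and write a shared block with no locking, and Node 2 is left holding a stale copy.

```python
# Toy illustration of the cache-inconsistency scenario above:
# three nodes cache the same disk block; without concurrency
# control, Node 2's cached copy ends up inconsistent.

disk = {"block1": "v0"}   # shared disk image
cache = {}                # per-node buffer caches, keyed by node number

# Step 1: Nodes 1 and 2 read block 1 into their caches
cache[1] = disk["block1"]
cache[2] = disk["block1"]

# Step 2: both nodes update their cached copies independently
cache[1] = "written-by-node-1"
cache[2] = "written-by-node-2"

# Step 3: Node 1 writes its copy back to disk
disk["block1"] = cache[1]

# Step 4: Node 3 reads block 1 from disk
cache[3] = disk["block1"]

# Nodes 1 and 3 agree, but Node 2's cache no longer matches the disk
assert cache[1] == cache[3] == "written-by-node-1"
assert cache[2] != disk["block1"]
```

A PCM lock prevents step 2: only the lock owner may change its cached copy of the block, so the two writes would be serialized instead of diverging.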
Contention for Shared Resources
3 Oracle Instances Accessing a Shared Database
Using Oracle 7 Server's PCM locks:
An instance acquires a PCM lock as a resource for update purposes
All buffer caches must contain the same copy of a resource
PCM locks:
ensure cache coherency by only allowing instances to change blocks while holding a lock
work independently of transaction locks (row-level locking)
ensure cache coherency using a minimum amount of communication
PCM locks are allocated to files when an instance is started
Blocks are typical O/S disk blocks and of fixed size

Consider the following PCM example:
Time 1: Instance X attempts to update row 1 in data block n; Instance X becomes owner of the PCM lock for data block n; Instance X updates row 1
Time 2: Instance Y attempts to update row 2 in data block n; Instance Y requests the PCM lock from Instance X for data block n; Instance X releases the PCM lock; Instance Y becomes owner of the PCM lock for data block n; Instance Y updates row 2
Time 3: Instance X attempts to update row 3 in data block n; Instance X requests the PCM lock from Instance Y for data block n; Instance Y releases the PCM lock; Instance X becomes owner of the PCM lock for data block n; Instance X updates row 3
PCM locking is handled separately from transaction locking -- when instances wish to update the same data (e.g., Instances X and Y attempt to update row 1), normal transaction locking is involved.
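The three-step lock hand-off above can be sketched in a few lines of Python. This is a minimal sketch of the assumed semantics (block-level ownership passing between instances), not Oracle's implementation.

```python
# Minimal sketch (assumed semantics, not Oracle internals) of a
# block-level PCM lock whose ownership passes between two instances,
# while the row updates inside the block remain independent.

class PCMLock:
    """One lock per data block; ownership moves on request."""
    def __init__(self, block):
        self.block = block
        self.owner = None

    def acquire(self, instance):
        # In the real server the current owner releases (and flushes the
        # block) before ownership moves; here we simply reassign it.
        self.owner = instance

block_n = {"row1": None, "row2": None, "row3": None}
lock = PCMLock("block n")

lock.acquire("X"); block_n["row1"] = "updated by X"   # Time 1
lock.acquire("Y"); block_n["row2"] = "updated by Y"   # Time 2
lock.acquire("X"); block_n["row3"] = "updated by X"   # Time 3

assert lock.owner == "X"
assert block_n["row2"] == "updated by Y"   # Y's row update survives
```

Note that each hand-off moves the whole block's lock even though X and Y touch different rows; only when both instances target the same row does normal transaction locking come into play.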
Parallel Query Option
Parallel Query Processing
enables the parallel processing of long and complex queries with parallelizable operators such as scans, joins, and sorts
speeds up query response time (vs. throughput in the Parallel Server Option)
Parallel Data Load
loads data in both a direct and a parallel method
speeds up large data loads
Parallel Index Creation
speeds up the creation of large indexes
Parallel Recovery
enables concurrent recovery sessions

Parallel Query Processing: Overview
Typical data flow, with 5 steps:
1 A client process issues a query
2 The statement is parsed and a shared server process executes the statement
3 The shared server process determines if the statement can utilize parallel processing, and then enlists two or more query servers to execute the statement in parallel. The shared server process partitions the work of each operation in the query equally among the query servers
4 Query servers process certain operations in the query independently and return sub-results back to the shared server process
5 The shared server process reassembles the results from the query servers and returns the query's results to the client

Query Processing
Three processing phases in a parallel query:
1 Parse --- the query optimizer determines the optimal execution plan, parallelizing the query if a full scan is included in the execution plan (note that the query is parallelized dynamically at execution time, automatically adapting to changes in data partitioning and server configuration)
2 Execute --- perform the operations
3 Fetch (queries) --- return rows from the query server processes
The execution and fetch phases can be performed in parallel (pipelined parallelism) by multiple server processes
Example of parallel execution of a full scan
The shared server process is known as the Query Coordinator, which is responsible for parallelizing operations and interpreting the results from the query servers
Not all operators can be parallelized in Oracle 7.2

Oracle Server Parallelizable Operations
Table Scans
Sorts (for ORDER BY, GROUP BY, Joins, DISTINCT)
Joins (Sort-Merge, Nested Loops)
Aggregates (GROUP BY, MIN, MAX, AVG, etc.)
SET operations (e.g., UNION, INTERSECT, or MINUS) are not parallelized
See the following two figures

Parallel Query Execution
Full Table Scan without the Parallel Query Option
Multiple Query Servers Performing a Full Table Scan
The query coordinator dynamically partitions the work among the query servers

Data Partitioning in Oracle 7
Oracle 7 on the nCUBE2 and IBM SP2 is actually implemented as a shared-disk (vs. shared-nothing) architecture
See the earlier discussion of the sophisticated nCUBE implementation
The database is shared by sharing disks through a parallel file system or parallel I/O system:
nCUBE -- the GFS (Global File System) over a Gigacache subcube and nCUBE2's parallel I/O system
IBM SP2 -- the VSD (Virtual Shared Disk) parallel I/O system (over the HPS)

File Striping in Oracle
Automatic file striping and disk partitioning by the operating system on the nCUBE2
general OS and server level; less flexible, but relatively transparent to the DBA
data skew is less likely, due to the Gigacache
Manual file striping at the database table level on the IBM SP2
application-specific and flexible, but difficult to handle with a large number of files, and impossible to determine the optimal striping length
data skew is highly likely

Benchmarking Parallel RDBMS
Industry Benchmarking Tests-I
Transaction Processing Performance Council TPC A, B, C -- standards for OLTP benchmarks
TPC-B: OLTP without a network
TPC-A: OLTP with a LAN or wide area network
TPC-C: online business transaction processing
TPC-D: DSS benchmark, not yet finalized
Wisconsin Benchmark (aka DeWitt)
initially a benchmark suite for the DIRECT database machine, but later used in academia and research for benchmarking parallel RDBMS with relational queries

Industry Benchmarking Tests-II
Set Query
complex queries and reporting for DSS and data mining applications
The Neal Nelson Database Benchmark (tm)
multi-user UNIX business
benchmarking for specific application domains and customers, produced by Neal Nelson Associates
SPEC (Standard Performance Evaluation Corporation)
industry body to coordinate industry-standard benchmarking in the DB (and scientific) arena
The Customer -- or worse, the Consultant
customized domain-specific benchmarking developed from the customer's own applications and specifications

Characteristics of DSS Workloads
Few insert/update/delete transactions; read-only queries
Non-selective and data-intensive; few in number, large in size
May return a large volume of data, e.g. a mailing list
Queries frequently scan a large percentage of the data in the database
Often used to mine data in a recursive, drill-down fashion

TPC-D - a Decision Support Benchmark? Overall Structure
A common suite for DSS benchmarking and for comparing:
different software and hardware systems (such as Oracle 7 on nCUBE2 vs. DB2 on IBM SP2)
different software products on the same machine (such as Oracle7 and DB2 on IBM SP2)
different machines in a compatible family (such as Oracle7 on IBM SP1 and SP2, or on an RS/6000)
different releases of a product on the same machine
Our TPC-D benchmarking project has gone through most of the above examples

TPC-D - a Decision Support Benchmark? Nature of Benchmark Examples
The TPC-D definition requires that either one (power test) or many (scale-up test) query streams are run and timed against at least three sizes of a predefined database, with sizes growing exponentially (e.g., a 1GB, 10GB, 100GB sequence)
A query stream consists of 17 complex queries plus two concurrently running update operations
No extensive database tuning is allowed - optimizer hints are explicitly forbidden
Timings for individual queries are not part of the TPC-D metric
The database structure well represents the most common DSS databases in applications, i.e. the largest table takes ~70% of the total DB size

TPC-D - a Decision Support Benchmark?
Nature of Queries
models complex queries against a relatively large and complex database
All currently parallelizable relational operators (Oracle 7 Server) are used, including:
Aggregate (Group by) - Q2, Q6, Q11, Q14, Q15
Merge Join - Q8
Nested Loops - All except Q1
Sort (Group by) - All except Q2
Sort (Join) - Q8, Q12
Sort (Order by) - Q3, Q5, Q10, Q11, Q16
Sort (Unique) - Q4
Table Access (Full) - All

TPC-D - a Decision Support Benchmark? Problems with TPC-D Design
Query stream design: Query 17 takes more than 95% of the time needed to execute the stream
The overall design unnecessarily mimics the TPC OLTP benchmarks
Metrics are more suitable for Data Warehouse applications than for ad hoc DS queries
The benchmark is expensive to run (huge disk space requirements)

TPC-D - a Decision Support Benchmark? Scalability Issues
Is TPC-D suitable for parallel DBMS evaluation?
No scalability performance metrics suitable for parallel data query evaluation, i.e., query time vs. number of instances (CPUs), and/or query time vs. available I/O resources (I/O channels, drivers, disks, and disk partitions)
Scalability of a parallel database system relies on both CPU resources and I/O resources. By using multiple instances (CPUs), the parallel server/query decreases query response time and increases throughput. By using partitioned data on multiple disks (with multiple I/O channels), the parallel server/query achieves higher I/O bandwidth for large DBs. The two characteristics are equally important and must be benchmarked together. In the current TPC-D specification, neither performance metric is defined.

TPC-D - a Decision Support Benchmark? Relevance to Real Applications
For DS applications, query stream execution time is largely irrelevant. It is far more important to understand the performance and scalability of elementary SQL operations and of typical queries encountered in customer applications. Loading and indexing may take a large portion of the overall time in preparing a large DB.
Both parallel loading and indexing become important functionality of a parallel RDBMS. Their performance for large DSS databases should be defined and benchmarked together with the query benchmarks. Currently neither is defined or mentioned in the TPC-D specification.

TPC-D - a Decision Support Benchmark? Overall Evaluation
The TPC-D design contains flaws that make its use questionable even for sequential database systems
Given the complexity of parallel database systems, attempts to reduce their performance estimate to one number are most likely futile
TPC-D defines no metrics useful for understanding the behavior of parallel RDBMSes
The TPC-D designers should re-examine academic research work pertinent to the field
Future research work towards the definition of a DS benchmark for parallel databases seems important

Some Lessons Learned in NPAC's Benchmarking
Data partitioning must be very carefully designed to balance I/O and CPU
Parallel loading is great! --- almost linear speedup
Backup may be important to save time (we did not have it)
The most difficult and most troublesome phase is the database preparation/loading/indexing phase, which always involves storage management parameters
You must be both a DBA and a sysadmin, and benchmarking relies critically on system stability
Be prepared to bring down the whole parallel system a couple of times a day
Be prepared for negative results (nothing may be obtained!)
You need at least twice the disk space of your original database size
Performance tuning is extremely difficult with the Oracle 7 parallel server
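The parallel query processing flow described earlier -- a query coordinator partitioning a full table scan among query servers and reassembling their sub-results -- can be sketched as follows. This is our illustration of the pattern, not Oracle internals; Oracle uses separate query server processes, while threads keep the sketch self-contained.

```python
# Sketch of coordinator/query-server parallelism for a full table scan:
# the coordinator splits the table into ranges, "query servers" scan
# their ranges concurrently and return partial aggregates, and the
# coordinator reassembles the final result.

from concurrent.futures import ThreadPoolExecutor

TABLE = list(range(100_000))        # stand-in for a large table

def scan_partition(lo, hi):
    # each query server computes a partial aggregate over its range
    return sum(x for x in TABLE[lo:hi] if x % 2 == 0)

def parallel_scan(degree):
    step = len(TABLE) // degree
    ranges = [(i * step, len(TABLE) if i == degree - 1 else (i + 1) * step)
              for i in range(degree)]
    with ThreadPoolExecutor(max_workers=degree) as pool:  # query servers
        partials = pool.map(lambda r: scan_partition(*r), ranges)
    return sum(partials)            # coordinator reassembles sub-results

# the parallel result matches a sequential full scan
assert parallel_scan(4) == sum(x for x in TABLE if x % 2 == 0)
```

The `degree` parameter plays the role of the degree of parallelism; in the real server, how well such a scan scales with `degree` depends on how the underlying data files are striped across disks, which is exactly the I/O-side scalability dimension the TPC-D metrics fail to capture.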