HELP! * GREY=local Full HTML for

LOCAL foilset Second Part (Benchmarking,Explicit Systems, Textwise) of Parallel Relational Databases

Given by Gang Cheng Marek Podgorny (Geoffrey Fox) at CPS600 Spring Semester on April 1995. Foils prepared July 6,1995
Abstract * Foil Index for this file

See also color IMAGE
This discusses the last three parts of presentation
What is situation at NPAC including hardware, InfoMall activities and the DR-LINK product of Textwise (Liz Liddy)
Detailed Discussion of Parallel Oracle on SP2 and nCUBE
Results of NPAC's Benchmarking Activities
  • Especially TPCD work

Table of Contents for full HTML of Second Part (Benchmarking,Explicit Systems, Textwise) of Parallel Relational Databases


1 Parallel Relational Database Management Systems II
2 Abstract of Second Part of Parallel Relational Database Systems Presentation
3 Parallel Database efforts at NPAC and in InfoMall
4 NPAC's Experience in Parallel Database Technology
5 nCUBE Database Hardware and Software Infrastructure in NPAC
6 IBM Database Hardware and Software Infrastructure in NPAC
7 InfoMall(NPAC) Parallel Database Evaluation and Benchmarking Service
8 Integrated Decision Support Systems
9 Overview of DR-LINK High Level Natural Language Database Interface
10 Features of DR-LINK High Level Natural Language Database Interface
11 Structure of DR-LINK High Level Database Interface
12 Parallel Oracle7 RDBMS -- A Case Study
13 What is the Oracle Parallel Server -- Introduction ?
14 Oracle Parallel RDBMS Architecture
Overall Structure

15 Oracle Parallel RDBMS Architecture
Parallel Data Query option:

16 Oracle Parallel RDBMS Architecture
Parallel Server option

17 What are the versions of the Oracle Parallel Server ?
18 What is the Oracle Parallel Server File Structure and Architecture?
19 Oracle7 Server Architecture
20 The Oracle Server Instance
21 Oracle7 Parallel Server Characteristics
22 The Oracle7 Parallel Server
23 Application Example of Parallel Server (functional parallelism)
24 Parallel Server Option
25 Oracle7 on IBM SPX Architecture
26 Parallel Cache Management (PCM) and Distributed Lock Manager(DLM)
27 Oracle Parallel Server On-Line Recovery
28 Example of failure due to lack of concurrency control
29 Discussion of PCM Lock Example
30 Contention for Shared Resources
31 Using Oracle 7 Server's PCM locks:
32 Consider the following PCM Example
33 Parallel Query Option
34 Parallel Query Processing: Overview
35 Query Processing
36 Three Processing Phases in Parallel Query
37 Example of parallel execution of a full scan
38 Parallel Query Execution
39 The query coordinator dynamically partitions the work among the query servers
40 Data Partitioning in Oracle 7.1.3
41 File Striping in Oracle
42 Benchmarking Parallel RDBMS
43 Industry Benchmarking Tests-I
44 Industry Benchmarking Tests-II
45 Characteristics of DSS Workloads
46 TPC-D - a Decision Support Benchmark?
Overall Structure

47 TPC-D - a Decision Support Benchmark?
Nature of Benchmark Examples

48 TPC-D - a Decision Support Benchmark? Nature of Queries
49 TPC-D - a Decision Support Benchmark?
Problems with TPC-D design

50 TPC-D - a Decision Support Benchmark?
Scalability Issues

51 TPC-D - a Decision Support Benchmark?
Relevance to Real Applications

52 TPC-D - a Decision Support Benchmark?
Overall Evaluation

53 Some Lessons Learned in NPAC's Benchmarking

This table of Contents Abstract



HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 1 Parallel Relational Database Management Systems II

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Gang Cheng
Marek Podgorny
(and a bit of Geoffrey Fox)
gcheng@npac.syr.edu,marek@npac.syr.edu
NPAC
111 College Place
Syracuse
New York 13244-4100

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 2 Abstract of Second Part of Parallel Relational Database Systems Presentation

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
This discusses the last three parts of presentation
What is situation at NPAC including hardware, InfoMall activities and the DR-LINK product of Textwise (Liz Liddy)
Detailed Discussion of Parallel Oracle on SP2 and nCUBE
Results of NPAC's Benchmarking Activities
  • Especially TPCD work

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 3 Parallel Database efforts at NPAC and in InfoMall

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 4 NPAC's Experience in Parallel Database Technology

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Beta test sites with Oracle, nCUBE and IBM for their parallel RDBMS products
Direct contacts with PRDBMS developers at Oracle and IBM
Hands-on Expertise in large-scale (up to 100 GB DB size) DSS benchmarking
Experience in parallel computing and applications in industry, parallel system integration and data visualization

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 5 nCUBE Database Hardware and Software Infrastructure in NPAC

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
nCUBE 2
  • 64 computing nodes, 2 GB RAM
  • 32 disk I/O nodes, 256 MB RAM, 32 SCSI2 channels
  • 96 2 GB disks, 192 GB total
  • OS 3.3 (GFS) and 3.4 beta (PFS, NFS)
  • Oracle 7.1.3 Parallel Server with PDQ
  • Oracle Parallel Text Server 3.0 beta
  • Video-On-Demand software

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 6 IBM Database Hardware and Software Infrastructure in NPAC

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
IBM SP2
  • 8 Thin and 4 Wide nodes, 256 MB RAM each
  • 13/21 GB SCSI2 disks per node, 188 GB total
  • TB2 high-performane switch
  • parallel I/O supported via VSD software
  • database software
    • IBM parallel DB2 (beta)
    • Oracle 7.1.3.2 Parallel Server (beta)
Upcoming upgrade:
  • switched Ethernet/FDDI/ATM hook-up to NPAC backbone

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 7 InfoMall(NPAC) Parallel Database Evaluation and Benchmarking Service

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Planning, design, implementation, and interpretation of Decision Support benchmarks using parallel database technology
InfoMall product, supported by technology vendors
Standard and customer defined benchmarks
NPAC provides state-of-the-art infrastructure, MPP and parallel database expertise, and trained personnel
InfoMall assumes most of the risks and hazards involved in the benchmarking process
Non-disclosure arrangements available

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 8 Integrated Decision Support Systems

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
InfoMall project
  • NPAC teams with a small software vendor, major software vendor, and hardware vendor to provide an integrated Decision Support system for a number of Fortune 500 companies
  • NPAC role:
    • help parallelizing the code kernel of the Decision Support package
    • interface the package to a parallel database system and provide capacity planning based on customer performance requirements and NPAC parallel database expertise
    • provide infrastructure and system support for project duration
  • Project deliverable: integrated turnkey DS system with all elements implemented of the same MPP platform

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 9 Overview of DR-LINK High Level Natural Language Database Interface

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
This research activity of Professor E. Liddy of Information Studies School at Syracuse University is being commercialized by InfoMall member -- the start-up company Textwise.
The Surface Realizations i.e. particular strings of words in documents or database quert, do not reflect the breadth, richness or precision of meaning intended by either
  • Author of Document
  • Presenter of Query
Simple word level representation cannot produce the recall and prescision possible when the full conceptual intent of queries and documents is the basis of retrieval
  • Thus DR-LINK matches on the concepts of interest to the user and not just the words of interest
DR-LINK achieves this deeper conceptual representation of texts by utilizing all the levels of language at which meaning is conveyed:
  • Lexical, Syntactic, Semantic and Discourse levels of language
  • DR-LINK combines these multiple levels of conceptual representation of text

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 10 Features of DR-LINK High Level Natural Language Database Interface

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Accepts lengthy ambiguous complex Natural Language Queries
Translates Natural Language queries into precise Boolean representations of the user's implied requirements for relevance of a document to query
Produces summary-level semantic SFC (Subject Field Codes) vector representations of query and documents for quick filtering of large databases
Catures Text Structure dimensions of time, source, state of completion, credibility, definiteness, intentionality and causality
Fulfills Proper Noun requirements of queries by:
  • Precise matching of focused Proper Noun requests
  • Accurate Proper Noun category-level matching
  • Expansion of categories to constituent members
Provides both high recall and high precision via controlled expansion of Complex Nominals
Evaluates Implicit and Explicit Semantics to assign similiarity between Query and Documents using its Integrated Matcher
Predicts how many documents need to be viewed to acieve the user-specified level of recall via the Cut-Off Criterion

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 11 Structure of DR-LINK High Level Database Interface

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
Raw
Documents

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 12 Parallel Oracle7 RDBMS -- A Case Study

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 13 What is the Oracle Parallel Server -- Introduction ?

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
A high-performance RDBMS for loosely coupled and MPP systems
Currently supported systems: IBM SP2, nCUBE2, SGI SMP, SUN SMP, HP SMP, Amdahl, Encore, KSR, CM5,
Also known as multi-instance Oracle: Multiple instances running against a single, shared database
A evolutionary approach on existing sequential Oracle RDBMS

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 14 Oracle Parallel RDBMS Architecture
Overall Structure

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Two independent options
  • Parallel Server (PS) - since v. 6.2
  • Parallel Data Query (PDQ) - since v. 7.1
Parallel Server option available for most Unix platforms
Parallel Data Query available for few SMPs
combination of PS and PDQ mostly in beta, available for a handful of SMP clusters and MPP machines

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 15 Oracle Parallel RDBMS Architecture
Parallel Data Query option:

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
parallelizes execution of a single SQL query
parallel execution carried out by concurrent Unix processes communicating via shared memory
single Oracle instance sufficient - no PS necessary
uses multiple CPUs on SMP architectures but can run on single CPU machines as well with modest performance benefits
essential for Decision Support
irrelevant for OLTP

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 16 Oracle Parallel RDBMS Architecture
Parallel Server option

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
lets multiple Oracle instances access the same database
requires a cluster of single CPU workstations, a cluster of SMPs, or an MPP to run
requires a Distributed Lock Manager to ensure data consistency
requires all disk storage accessible by each Oracle node
supports PDQ distributed across multiple instances
essential for OLTP and Òapplication partitioningÓ
relevant for scalability of ad hoc DS queries

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 17 What are the versions of the Oracle Parallel Server ?

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
First introduced with Oracle 6.2
version 6.2 = version 6.0 (sequential) + Parallel Cache Management
version 7.1 = version 6.2 + parallel query processing
newest version 7.1.3 = sequential Oracle + Parallel Server Option + Parallel Query Option
production release on nCUBE is in Jan. 1995
All parallel implementation is transparent to users
Key technology: Parallel Cache Management and Parallel Query Processing
Key components: a Distributed Lock Manager and a Parallel Query Optimizer

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 18 What is the Oracle Parallel Server File Structure and Architecture?

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Oracle 7 Server Architecture
  • shown in next foil
Oracle 7 Database Files:
  • Data Files --- Contain all the database data; logical structure, such as tables and indexes, are physically stored in the data files
  • Redo Log Files --- Transaction logs that record all changes made to database information for recovery
  • Control Files --- File that record the physical structure of the database including all datafiles and redo log files

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 19 Oracle7 Server Architecture

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 20 The Oracle Server Instance

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Every time the Oracle 7 Server is started, the System Global Area (SGA) is allocated and the Oracle 7 Server background processes are started. The combination of memory buffers and background processes is called an Oracle Server Instance.
The SGA is a group of shared memory buffers allocated by the Oracle 7 Server for an instance
The background processes asynchronously perform distinct tasks on behalf of all database users of the instance

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 21 Oracle7 Parallel Server Characteristics

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Start an Oracle Server instance on each node
Each instance has its own set of redo log files
All instances can concurrently execute transactions against the same database and each instance can have multiple users executing transactions

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 22 The Oracle7 Parallel Server

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
Each node of a parallel system runs an instance of the Oracle7 Server that accesses a shared database

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 23 Application Example of Parallel Server (functional parallelism)

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 24 Parallel Server Option

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Multiple Instances on different nodes share a single physical database
The data is accessed by all nodes through a parallel file subsystem
All instances have common data & controls files and their individual log files
All instances can simultaneously execute transactions against the single database, and each instance can have multiple users executing transactions
Data is kept consistent using PCM Technology
Implemented by the Distributed Lock Manager

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 25 Oracle7 on IBM SPX Architecture

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
Oracle Components
  • Distributed Lock Manager
  • Oracle Parallel Server
IBM Components
  • High Performance Switch
  • Virtual System Disk (VSD)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 26 Parallel Cache Management (PCM) and Distributed Lock Manager(DLM)

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Concurrency control of buffer caches among instances
Maintain independent, yet consistent cache of blocks for all instances
Allow multiple nodes to simultaneous read/write to database (disks)
Maximizes concurrent data access and minimizes (disk) I/O activity
Block-level PCM locks, row-level locking preserved
A fault-tolerant DLM: if a node goes down, the DLMs will automatically recover and reconstruct the lock database on behalf of the failed DLM

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 27 Oracle Parallel Server On-Line Recovery

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
Three possible failures

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 28 Example of failure due to lack of concurrency control

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Time Steps:
  • Step 1: Node 1 reads block 1 from disk and stores in Node 1 memory cache, Node 2 reads block 1 from disk and stores in Node 2 memory cache
  • Step 2: Node 1 writes information to block 1, Node 2 writes information to block 1
  • Step 3: Node 1 writes block 1 to disk
  • Step 4: Node 3 reads block 1 from disk and stores in Node 3 memory cache
In this situation, Node 1 and 3 have the same information in each memory cache, but Node 2 has an inconsistent cache.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 29 Discussion of PCM Lock Example

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
The example of previous and following foil, illustrates why concurrency control is needed in a parallel DB server environment.
It shows how contention for shared resources(data,disk access) is being handled under a multi-node parallel system sharing a single database.
The blackened boxes in the figure are the same blocks in each node (not rows !) requested by different nodes(instances) and located in different SGA bufferes.
Concurrency control needs to be implemented to ensure a consistant database,
i.e. the same block from the disk must be reflected as exactly the same copy in different buffers.
This is achieved by PCM locks.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 30 Contention for Shared Resources

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
3 Oracle Instances Accessing Shared database

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 31 Using Oracle 7 Server's PCM locks:

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
An instance acquires a PCM lock as a resource for update purposes
All buffer caches must contain the same copy of a resource
PCM blocks:
  • Ensure cache coherency by only allowing instances to change blocks while holding a lock
  • PCM lock work independently of transaction locks (row-level locking)
  • Ensure cache coherency using minimum amount of communication
  • PCM locks are allocated to files when a instance is started
  • Blocks are typical O/S disk blocks and of fixed size

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 32 Consider the following PCM Example

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Time
  • 1 Instance X attempts to update row 1 in (data block n)
  • Instance X becomes owner of PCM lock for data block n
  • Instance X update row 1
  • 2 Instance Y attempts to update row 2 in (data block n)
  • Instance Y requests the PCM lock from instance X for data block n
  • Instance X releases the PCM lock
  • Instance Y becomes owner of PCM lock for data block n
  • Instance Y update row 2
  • 3 Instance X attempts to update row 3 in (data block n)
  • Instance X requests the PCM lock from instance Y for data block n
  • Instance Y releases the PCM lock
  • Instance X becomes owner of PCM lock for data block n
  • Instance X update row 3
The PCM is handled separately from transaction lock -- when instance wish to update the same data (e.g.: Instance X and Y attempt to update Row 1), normal transaction locking is involved.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 33 Parallel Query Option

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Parallel Query Processing
  • enables the parallel processing of lon and complex queries with parallelizable operators such as scan,join and sorts
  • speeds up query response time (vs throughput in Parallel Server Option)
Parallel Data Load
  • Loads data in both a direct and parallel method
  • speeds up the loading of large data loads
Parallel Index Creation
  • speeds up the creation of large indexes
Parallel Recovery
  • enables concurrent recovery sessions

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 34 Parallel Query Processing: Overview

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Typical Data Flow with 5 steps:
  • 1 A client process issues a query
  • 2 The statement is parsed and a shared server process executes the statement
  • 3 The shared server process determines if the statement can utilize parallel processing, and then enlists two or more query servers to execute the statement in parallel. The shared server process partitions the work of each operation in the query equally among the query servers
  • 4 Query servers process certain operations in the query independently and return sub-results back to the shared server process
  • 5 The shared server process reassembles the results from the query servers and returns the query's results to the client

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 35 Query Processing

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 36 Three Processing Phases in Parallel Query

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
1 Parse --- a query optimizer determines optimal execution plan, parallelize query if a full scan was included in the execution plan (note the query is parallelized dynamically at execution time, automatically adapting changes in data partition and server configuration)
2 Execute --- perform operations
3 Fetch (Queries) --- return rows from query server processes
The execution and fetch phases can be performed in parallel ( pipelined parallelism) by multiple server processes

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 37 Example of parallel execution of a full scan

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
The shared query server is known as Query Coordinator who is responsible for parallelizing operations and interpreting the results from query servers
Not all operators can be parallelized in Oracle 7.1.3
Oracle Server Parallizable Operations
  • Table Scans
  • Sorts (for ORDER BY, GROUP BY, Joins, DISTINCT)
  • Joins (Sort-Merge, Nested Loops)
  • Aggregates (GROUP BY, MIN, MAX, AVG, etc )
  • SET operations (eg., UNION, INTERSECT or MINUS) are not parallelized
See following two figures

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 38 Parallel Query Execution

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
Full Table Scan without Parallel Query Option
Multiple Query Servers Performing a Full Table Scan

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 39 The query coordinator dynamically partitions the work among the query servers

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 40 Data Partitioning in Oracle 7.1.3

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Oracle 7 on nCUBE2 and IBM SP2 are actually implemented as shared-disk (vs. shared-nothing) architecture
  • See earlier discussion of sophisticated nCUBE implementation
Shared database by sharing disks through parallel file system or parallel I/O system:
  • nCUBE -- the GFS (Global File System) over a Gigacache Subcube and nCUBE2's parallel I/O system;
  • IBM SP2 -- VSD (Virtual Shared Disk) parallel I/O system (over the HPS)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 41 File Striping in Oracle

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Automatic file striping and disk partitions by the operating system on nCUBE2
  • General OS and server level, less flexible, but relatively transparent to DBA
  • data skew is less possible, due to the Gigacache
Manually file striping at database table level on IBM SP2
  • Application-specific, flexible, but difficult to handle with large number of files and impossible to determine optimal striping length
  • data skew is highly possible

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 42 Benchmarking Parallel RDBMS

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 43 Industry Benchmarking Tests-I

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Transaction Processing Council
  • TPC A,B,C -- standards for OLTP benchmarks
    • TPCB OLTP without network
    • TPCA OLTP with a LAN or wide area network
    • TPCC Online Business Transaction Processing
  • TPCD -- not yet finalized DSS benchmark
Wisconsin Benchmark (aka Dewitt)
  • Initially a benchmark suite for DIRECT database machine but later used in academia and research for benchmarking parallel RDBMS with relational queries

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 44 Industry Benchmarking Tests-II

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Set Query
  • Complex queries and reporting for DSS applications and datamining applications
The Neal Nelson Database Benchmark (tm)
  • Multi-user UNIX business benchmarking for specific application domains and customers produced by Neal Nelson Associates
SPEC (Standard Performance Evaluation Corporation)
  • Industry body to coordinate Industry standard benchmarking in DB (and scientific) arena.
The Customer -- or worse >> The Consultant
  • Customized domain-specific benchmarking developed by cutomer's own applications and specifications

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 45 Characteristics of DSS Workloads

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Few insert/update/delete transactions, read-only queries.
Non-selective & data-intensive; few in number, large in size
May return a large volume of data: e.g.: mailing list
Queries frequently scan a large percentage of the data in the database
Often used to mine data in a recursive, drill-down fashion

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 46 TPC-D - a Decision Support Benchmark?
Overall Structure

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
A common suite for DSS benchmarking and comparing:
  • different software and hardware systems (such as Oracle 7 on nCUBE2 vs. DB2 on IBM SP2)
  • different software Products on the same machine (such as Oracle7 and DB2 on IBM SP2)
  • different machines in a compatible family (such as Oracle7 on IBM SP1 and SP2, or on a RS/6000)
  • different releases of a product on the same machine (such as Oracle 7.1.0, 7.1.2, 7.1.3 on nCUBE2)
Our TPCD benchmarking project has gone through most of the above examples

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 47 TPC-D - a Decision Support Benchmark?
Nature of Benchmark Examples

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
TPC-D definition requires that either one (power test) or many (scale-up test) query streams are run and timed against at least three sizes of a predefined database, with sizes growing exponentially (e.g., 1G, 10GB, 100GB sequence)
Query stream consists of 17 complex queries plus two concurrently running update operations
No extensive database tuning in allowed - optimizer hints are explicitly forbidden
timings for individual queries are not part of the TPC-D metric
Database structure well represents most common DSS DBs in applications, i.e. one largest table takes ~70% of the total DB size.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 48 TPC-D - a Decision Support Benchmark? Nature of Queries

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
model complex queries against a relatively large and complex database
All currently parallelizable relational operators (Oracle 7 Server) are used, including:
  • Aggregate (Group by) - Q2, Q6, Q11, Q14, Q15
  • Merge Join - Q8
  • Nested Loops - All except Q1
  • Sort (Group by) - All except Q2
  • Sort (Join) - Q8, Q12
  • Sort (Order by) - Q3, Q5, Q10, Q11, Q16
  • Sort (Unique) - Q4
  • Table Access (Full) - All

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 49 TPC-D - a Decision Support Benchmark?
Problems with TPC-D design

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Query stream design: Query 17 takes more than 95% of the time needed to execute the stream
Overal design unnecessarily mimics TPC OLTP benchmarks
Metrics more suitable for Data Warehouse applications than for ad hoc DS queries
Benchmark expensive to run (huge disk space requirements)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 50 TPC-D - a Decision Support Benchmark?
Scalability Issues

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Is TPC-D suitable for parallel DBMS evaluation?
  • No scalability performance metrics suitable for parallel data query evaluation i.e., query time vs. number of instances (CPUs), and/or, query time vs. available I/O resources ( I/O channels, drivers, disks and disk partitions)
Scalability of a parallel database system relies on both CPU resources and I/O resources.
By using multiple instances (CPUs), parallel server/query decreases query response time and increases throughput.
By using partitioned data on multiple disks (with multiple I/O channels), parallel server/query achieves higher I/O bandwidth for large DBs.
The two characteristics are equally important and must be benchmarked together. In current TPC-D specification, neither performance metrics is defined.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 51 TPC-D - a Decision Support Benchmark?
Relevance to Real Applications

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
For DS applications, query stream execution time is largely irrelevant. It is far more important to understand performance and scalability of elementary SQL operations and of typical queries encountered in customer application.
Loading and indexing may take a large portion of the overall time in preparing a large DB. Both parallel loading and indexing become important functionality of a parallel RDBMS. Their performance for large DSS databases should be defined and benchmarked, together with the query benchmarks. Currently none of them is defined or mentioned in the TPC-D specification.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 52 TPC-D - a Decision Support Benchmark?
Overall Evaluation

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
TPC-D design contains flaws that make its use questionable even for sequential database systems
Given the complexity of the parallel database systems, attempts to reduce their performance estimate to one number are most likely futile
TPC-D defines no metrics useful for understanding behavior of parallel RDBMSes. TPC-D designers should re-examine academic research work pertinent to the field
Future research work towards definition of DS benchmark for parallel databases seems important

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 53 Some Lessons Learned in NPAC's Benchmarking

From Explicit Examples in Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Data partitioning must be very carefully designed to balance I/O and CPU
Parallel Loading is great ! --- almost linear speedup
Backup may be important to save time (we don't have)
The most difficult and trouble-making phase is the database preparation/loading/indexing phase which always involves storage management parameters
you must be both a DBA and sysadm and benchmarking relies critically on system stability
Be prepared to bring down the whole parallel system a couple of times a day
Be prepared that the results are negative (nothing may be obtained !)
you need at least twice the disk space of your original database size
performance tuning is extremely difficult with Oracle 7 parallel server

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Tue Feb 18 1997