Second Part of RDBMS Discussion Parallel Relational Database Management Systems II Gang Cheng Marek Podgorny (and a bit of Geoffrey Fox) gcheng@npac.syr.edu,marek@npac.syr.edu NPAC 111 College Place Syracuse New York 13244-4100 Abstract of Second Part of Parallel Relational Database Systems Presentation This discusses the last three parts of presentation What is situation at NPAC including hardware, InfoMall activities and the DR-LINK product of Textwise (Liz Liddy) Detailed Discussion of Parallel Oracle on SP2 and nCUBE Results of NPAC's Benchmarking Activities Especially TPCD work Parallel Database efforts at NPAC and in InfoMall NPAC's Experience in Parallel Database Technology Beta test sites with Oracle, nCUBE and IBM for their parallel RDBMS products Direct contacts with PRDBMS developers at Oracle and IBM Hands-on Expertise in large-scale (up to 100 GB DB size) DSS benchmarking Experience in parallel computing and applications in industry, parallel system integration and data visualization nCUBE Database Hardware and Software Infrastructure in NPAC nCUBE 2 64 computing nodes, 2 GB RAM 32 disk I/O nodes, 256 MB RAM, 32 SCSI2 channels 96 2 GB disks, 192 GB total OS 3.3 (GFS) and 3.4 beta (PFS, NFS) Oracle 7.1.3 Parallel Server with PDQ Oracle Parallel Text Server 3.0 beta Video-On-Demand software IBM Database Hardware and Software Infrastructure in NPAC IBM SP2 8 Thin and 4 Wide nodes, 256 MB RAM each 13/21 GB SCSI2 disks per node, 188 GB total TB2 high-performane switch parallel I/O supported via VSD software database software IBM parallel DB2 (beta) Oracle 7.1.3.2 Parallel Server (beta) Upcoming upgrade: switched Ethernet/FDDI/ATM hook-up to NPAC backbone InfoMall(NPAC) Parallel Database Evaluation and Benchmarking Service Planning, design, implementation, and interpretation of Decision Support benchmarks using parallel database technology InfoMall product, supported by technology vendors Standard and customer defined benchmarks NPAC provides state-of-the-art infrastructure, MPP and parallel database expertise, and trained personnel InfoMall assumes most of the risks and hazards involved in the benchmarking process Non-disclosure arrangements available Integrated Decision Support Systems InfoMall project NPAC teams with a small software vendor, major software vendor, and hardware vendor to provide an integrated Decision Support system for a number of Fortune 500 companies NPAC role: help parallelizing the code kernel of the Decision Support package interface the package to a parallel database system and provide capacity planning based on customer performance requirements and NPAC parallel database expertise provide infrastructure and system support for project duration Project deliverable: integrated turnkey DS system with all elements implemented of the same MPP platform Overview of DR-LINK High Level Natural Language Database Interface This research activity of Professor E. Liddy of Information Studies School at Syracuse University is being commercialized by InfoMall member -- the start-up company Textwise. The Surface Realizations i.e. particular strings of words in documents or database quert, do not reflect the breadth, richness or precision of meaning intended by either Author of Document Presenter of Query Simple word level representation cannot produce the recall and prescision possible when the full conceptual intent of queries and documents is the basis of retrieval Thus DR-LINK matches on the concepts of interest to the user and not just the words of interest DR-LINK achieves this deeper conceptual representation of texts by utilizing all the levels of language at which meaning is conveyed: Lexical, Syntactic, Semantic and Discourse levels of language DR-LINK combines these multiple levels of conceptual representation of text Features of DR-LINK High Level Natural Language Database Interface Accepts lengthy ambiguous complex Natural Language Queries Translates Natural Language queries into precise Boolean representations of the user's implied requirements for relevance of a document to query Produces summary-level semantic SFC (Subject Field Codes) vector representations of query and documents for quick filtering of large databases Catures Text Structure dimensions of time, source, state of completion, credibility, definiteness, intentionality and causality Fulfills Proper Noun requirements of queries by: Precise matching of focused Proper Noun requests Accurate Proper Noun category-level matching Expansion of categories to constituent members Provides both high recall and high precision via controlled expansion of Complex Nominals Evaluates Implicit and Explicit Semantics to assign similiarity between Query and Documents using its Integrated Matcher Predicts how many documents need to be viewed to acieve the user-specified level of recall via the Cut-Off Criterion Structure of DR-LINK High Level Database Interface Raw Documents Parallel Oracle7 RDBMS -- A Case Study What is the Oracle Parallel Server -- Introduction ? A high-performance RDBMS for loosely coupled and MPP systems Currently supported systems: IBM SP2, nCUBE2, SGI SMP, SUN SMP, HP SMP, Amdahl, Encore, KSR, CM5, Also known as multi-instance Oracle: Multiple instances running against a single, shared database A evolutionary approach on existing sequential Oracle RDBMS Oracle Parallel RDBMS Architecture Overall Structure Two independent options Parallel Server (PS) - since v. 6.2 Parallel Data Query (PDQ) - since v. 7.1 Parallel Server option available for most Unix platforms Parallel Data Query available for few SMPs combination of PS and PDQ mostly in beta, available for a handful of SMP clusters and MPP machines Oracle Parallel RDBMS Architecture Parallel Data Query option: parallelizes execution of a single SQL query parallel execution carried out by concurrent Unix processes communicating via shared memory single Oracle instance sufficient - no PS necessary uses multiple CPUs on SMP architectures but can run on single CPU machines as well with modest performance benefits essential for Decision Support irrelevant for OLTP Oracle Parallel RDBMS Architecture Parallel Server option lets multiple Oracle instances access the same database requires a cluster of single CPU workstations, a cluster of SMPs, or an MPP to run requires a Distributed Lock Manager to ensure data consistency requires all disk storage accessible by each Oracle node supports PDQ distributed across multiple instances essential for OLTP and Òapplication partitioningÓ relevant for scalability of ad hoc DS queries What are the versions of the Oracle Parallel Server ? First introduced with Oracle 6.2 version 6.2 = version 6.0 (sequential) + Parallel Cache Management version 7.1 = version 6.2 + parallel query processing newest version 7.1.3 = sequential Oracle + Parallel Server Option + Parallel Query Option production release on nCUBE is in Jan. 1995 All parallel implementation is transparent to users Key technology: Parallel Cache Management and Parallel Query Processing Key components: a Distributed Lock Manager and a Parallel Query Optimizer What is the Oracle Parallel Server File Structure and Architecture? Oracle 7 Server Architecture shown in next foil Oracle 7 Database Files: Data Files --- Contain all the database data; logical structure, such as tables and indexes, are physically stored in the data files Redo Log Files --- Transaction logs that record all changes made to database information for recovery Control Files --- File that record the physical structure of the database including all datafiles and redo log files Oracle7 Server Architecture The Oracle Server Instance Every time the Oracle 7 Server is started, the System Global Area (SGA) is allocated and the Oracle 7 Server background processes are started. The combination of memory buffers and background processes is called an Oracle Server Instance. The SGA is a group of shared memory buffers allocated by the Oracle 7 Server for an instance The background processes asynchronously perform distinct tasks on behalf of all database users of the instance Oracle7 Parallel Server Characteristics Start an Oracle Server instance on each node Each instance has its own set of redo log files All instances can concurrently execute transactions against the same database and each instance can have multiple users executing transactions The Oracle7 Parallel Server Each node of a parallel system runs an instance of the Oracle7 Server that accesses a shared database Application Example of Parallel Server (functional parallelism) Parallel Server Option Multiple Instances on different nodes share a single physical database The data is accessed by all nodes through a parallel file subsystem All instances have common data & controls files and their individual log files All instances can simultaneously execute transactions against the single database, and each instance can have multiple users executing transactions Data is kept consistent using PCM Technology Implemented by the Distributed Lock Manager Oracle7 on IBM SPX Architecture Oracle Components Distributed Lock Manager Oracle Parallel Server IBM Components High Performance Switch Virtual System Disk (VSD) Parallel Cache Management (PCM) and Distributed Lock Manager(DLM) Concurrency control of buffer caches among instances Maintain independent, yet consistent cache of blocks for all instances Allow multiple nodes to simultaneous read/write to database (disks) Maximizes concurrent data access and minimizes (disk) I/O activity Block-level PCM locks, row-level locking preserved A fault-tolerant DLM: if a node goes down, the DLMs will automatically recover and reconstruct the lock database on behalf of the failed DLM Oracle Parallel Server On-Line Recovery Three possible failures Example of failure due to lack of concurrency control Time Steps: Step 1: Node 1 reads block 1 from disk and stores in Node 1 memory cache, Node 2 reads block 1 from disk and stores in Node 2 memory cache Step 2: Node 1 writes information to block 1, Node 2 writes information to block 1 Step 3: Node 1 writes block 1 to disk Step 4: Node 3 reads block 1 from disk and stores in Node 3 memory cache In this situation, Node 1 and 3 have the same information in each memory cache, but Node 2 has an inconsistent cache. Discussion of PCM Lock Example The example of previous and following foil, illustrates why concurrency control is needed in a parallel DB server environment. It shows how contention for shared resources(data,disk access) is being handled under a multi-node parallel system sharing a single database. The blackened boxes in the figure are the same blocks in each node (not rows !) requested by different nodes(instances) and located in different SGA bufferes. Concurrency control needs to be implemented to ensure a consistant database, i.e. the same block from the disk must be reflected as exactly the same copy in different buffers. This is achieved by PCM locks. Contention for Shared Resources 3 Oracle Instances Accessing Shared database Using Oracle 7 Server's PCM locks: An instance acquires a PCM lock as a resource for update purposes All buffer caches must contain the same copy of a resource PCM blocks: Ensure cache coherency by only allowing instances to change blocks while holding a lock PCM lock work independently of transaction locks (row-level locking) Ensure cache coherency using minimum amount of communication PCM locks are allocated to files when a instance is started Blocks are typical O/S disk blocks and of fixed size Consider the following PCM Example Time 1 Instance X attempts to update row 1 in (data block n) Instance X becomes owner of PCM lock for data block n Instance X update row 1 2 Instance Y attempts to update row 2 in (data block n) Instance Y requests the PCM lock from instance X for data block n Instance X releases the PCM lock Instance Y becomes owner of PCM lock for data block n Instance Y update row 2 3 Instance X attempts to update row 3 in (data block n) Instance X requests the PCM lock from instance Y for data block n Instance Y releases the PCM lock Instance X becomes owner of PCM lock for data block n Instance X update row 3 The PCM is handled separately from transaction lock -- when instance wish to update the same data (e.g.: Instance X and Y attempt to update Row 1), normal transaction locking is involved. Parallel Query Option Parallel Query Processing enables the parallel processing of lon and complex queries with parallelizable operators such as scan,join and sorts speeds up query response time (vs throughput in Parallel Server Option) Parallel Data Load Loads data in both a direct and parallel method speeds up the loading of large data loads Parallel Index Creation speeds up the creation of large indexes Parallel Recovery enables concurrent recovery sessions Parallel Query Processing: Overview Typical Data Flow with 5 steps: 1 A client process issues a query 2 The statement is parsed and a shared server process executes the statement 3 The shared server process determines if the statement can utilize parallel processing, and then enlists two or more query servers to execute the statement in parallel. The shared server process partitions the work of each operation in the query equally among the query servers 4 Query servers process certain operations in the query independently and return sub-results back to the shared server process 5 The shared server process reassembles the results from the query servers and returns the query's results to the client Query Processing Three Processing Phases in Parallel Query 1 Parse --- a query optimizer determines optimal execution plan, parallelize query if a full scan was included in the execution plan (note the query is parallelized dynamically at execution time, automatically adapting changes in data partition and server configuration) 2 Execute --- perform operations 3 Fetch (Queries) --- return rows from query server processes The execution and fetch phases can be performed in parallel ( pipelined parallelism) by multiple server processes Example of parallel execution of a full scan The shared query server is known as Query Coordinator who is responsible for parallelizing operations and interpreting the results from query servers Not all operators can be parallelized in Oracle 7.1.3 Oracle Server Parallizable Operations Table Scans Sorts (for ORDER BY, GROUP BY, Joins, DISTINCT) Joins (Sort-Merge, Nested Loops) Aggregates (GROUP BY, MIN, MAX, AVG, etc ) SET operations (eg., UNION, INTERSECT or MINUS) are not parallelized See following two figures Parallel Query Execution Full Table Scan without Parallel Query Option Multiple Query Servers Performing a Full Table Scan The query coordinator dynamically partitions the work among the query servers Data Partitioning in Oracle 7.1.3 Oracle 7 on nCUBE2 and IBM SP2 are actually implemented as shared-disk (vs. shared-nothing) architecture See earlier discussion of sophisticated nCUBE implementation Shared database by sharing disks through parallel file system or parallel I/O system: nCUBE -- the GFS (Global File System) over a Gigacache Subcube and nCUBE2's parallel I/O system; IBM SP2 -- VSD (Virtual Shared Disk) parallel I/O system (over the HPS) File Striping in Oracle Automatic file striping and disk partitions by the operating system on nCUBE2 General OS and server level, less flexible, but relatively transparent to DBA data skew is less possible, due to the Gigacache Manually file striping at database table level on IBM SP2 Application-specific, flexible, but difficult to handle with large number of files and impossible to determine optimal striping length data skew is highly possible Benchmarking Parallel RDBMS Industry Benchmarking Tests-I Transaction Processing Council TPC A,B,C -- standards for OLTP benchmarks TPCB OLTP without network TPCA OLTP with a LAN or wide area network TPCC Online Business Transaction Processing TPCD -- not yet finalized DSS benchmark Wisconsin Benchmark (aka Dewitt) Initially a benchmark suite for DIRECT database machine but later used in academia and research for benchmarking parallel RDBMS with relational queries Industry Benchmarking Tests-II Set Query Complex queries and reporting for DSS applications and datamining applications The Neal Nelson Database Benchmark (tm) Multi-user UNIX business benchmarking for specific application domains and customers produced by Neal Nelson Associates SPEC (Standard Performance Evaluation Corporation) Industry body to coordinate Industry standard benchmarking in DB (and scientific) arena. The Customer -- or worse >> The Consultant Customized domain-specific benchmarking developed by cutomer's own applications and specifications Characteristics of DSS Workloads Few insert/update/delete transactions, read-only queries. Non-selective & data-intensive; few in number, large in size May return a large volume of data: e.g.: mailing list Queries frequently scan a large percentage of the data in the database Often used to mine data in a recursive, drill-down fashion TPC-D - a Decision Support Benchmark? Overall Structure A common suite for DSS benchmarking and comparing: different software and hardware systems (such as Oracle 7 on nCUBE2 vs. DB2 on IBM SP2) different software Products on the same machine (such as Oracle7 and DB2 on IBM SP2) different machines in a compatible family (such as Oracle7 on IBM SP1 and SP2, or on a RS/6000) different releases of a product on the same machine (such as Oracle 7.1.0, 7.1.2, 7.1.3 on nCUBE2) Our TPCD benchmarking project has gone through most of the above examples TPC-D - a Decision Support Benchmark? Nature of Benchmark Examples TPC-D definition requires that either one (power test) or many (scale-up test) query streams are run and timed against at least three sizes of a predefined database, with sizes growing exponentially (e.g., 1G, 10GB, 100GB sequence) Query stream consists of 17 complex queries plus two concurrently running update operations No extensive database tuning in allowed - optimizer hints are explicitly forbidden timings for individual queries are not part of the TPC-D metric Database structure well represents most common DSS DBs in applications, i.e. one largest table takes ~70% of the total DB size. TPC-D - a Decision Support Benchmark? Nature of Queries model complex queries against a relatively large and complex database All currently parallelizable relational operators (Oracle 7 Server) are used, including: Aggregate (Group by) - Q2, Q6, Q11, Q14, Q15 Merge Join - Q8 Nested Loops - All except Q1 Sort (Group by) - All except Q2 Sort (Join) - Q8, Q12 Sort (Order by) - Q3, Q5, Q10, Q11, Q16 Sort (Unique) - Q4 Table Access (Full) - All TPC-D - a Decision Support Benchmark? Problems with TPC-D design Query stream design: Query 17 takes more than 95% of the time needed to execute the stream Overal design unnecessarily mimics TPC OLTP benchmarks Metrics more suitable for Data Warehouse applications than for ad hoc DS queries Benchmark expensive to run (huge disk space requirements) TPC-D - a Decision Support Benchmark? Scalability Issues Is TPC-D suitable for parallel DBMS evaluation? No scalability performance metrics suitable for parallel data query evaluation i.e., query time vs. number of instances (CPUs), and/or, query time vs. available I/O resources ( I/O channels, drivers, disks and disk partitions) Scalability of a parallel database system relies on both CPU resources and I/O resources. By using multiple instances (CPUs), parallel server/query decreases query response time and increases throughput. By using partitioned data on multiple disks (with multiple I/O channels), parallel server/query achieves higher I/O bandwidth for large DBs. The two characteristics are equally important and must be benchmarked together. In current TPC-D specification, neither performance metrics is defined. TPC-D - a Decision Support Benchmark? Relevance to Real Applications For DS applications, query stream execution time is largely irrelevant. It is far more important to understand performance and scalability of elementary SQL operations and of typical queries encountered in customer application. Loading and indexing may take a large portion of the overall time in preparing a large DB. Both parallel loading and indexing become important functionality of a parallel RDBMS. Their performance for large DSS databases should be defined and benchmarked, together with the query benchmarks. Currently none of them is defined or mentioned in the TPC-D specification. TPC-D - a Decision Support Benchmark? Overall Evaluation TPC-D design contains flaws that make its use questionable even for sequential database systems Given the complexity of the parallel database systems, attempts to reduce their performance estimate to one number are most likely futile TPC-D defines no metrics useful for understanding behavior of parallel RDBMSes. TPC-D designers should re-examine academic research work pertinent to the field Future research work towards definition of DS benchmark for parallel databases seems important Some Lessons Learned in NPAC's Benchmarking Data partitioning must be very carefully designed to balance I/O and CPU Parallel Loading is great ! --- almost linear speedup Backup may be important to save time (we don't have) The most difficult and trouble-making phase is the database preparation/loading/indexing phase which always involves storage management parameters you must be both a DBA and sysadm and benchmarking relies critically on system stability Be prepared to bring down the whole parallel system a couple of times a day Be prepared that the results are negative (nothing may be obtained !) you need at least twice the disk space of your original database size performance tuning is extremely difficult with Oracle 7 parallel server