Full HTML for

Basic foilset First Part (Motivation, Theory) of Parallel Relational Databases

Given by Gang Cheng Marek Podgorny Chao-Wei Ou at ARL Database Tutorial on February 98. Foils prepared 7 October 97
Outside Index Summary of Material


This presentation contains the first two sections
  • Parallel Database Technology in Commercial Applications and Industry
  • Parallel Database Technology and Theory
Of the full ARL Parallel Database Module
The first section sets the scene by motivating the need for paraalel databases while
The second section reviews both Sequentional and Parallel Relational Databases looking at explicit examples
  • nCUBE and SP2 with Oracle and DB2
We also discuss database system architectures and review
The SQL Query language

Table of Contents for full HTML of First Part (Motivation, Theory) of Parallel Relational Databases

Denote Foils where Image Critical
Denote Foils where HTML is sufficient

1 Parallel Relational Database Management Systems -- I
2 Abstract of Parallel Relational Database Management Systems -- I
3 Outline of Full Database Presentation
4 Section I:
Parallel Database Technology in Commercial Applications and Industry

5 Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal

6 Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions

7 Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications

8 Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness

9 Application Areas for Parallel Database: commercial, administration, scientific
10 General Classes of Commercial Applications
11 An Application Example ---
Intelligent Business systems
1)Objectives

12 Intelligent Business systems ---
2)Typical Questions

13 Intelligent Business systems ---
3) Major Technology Challenges

14 Intelligent Business systems ---
4) Solutions

15 Major Software and Hardware vendors in Parallel Database Technology
16 Some Current Major Commercial Users
17 Parallel Database Technology and Theory
18 Hardware architectures for parallel DBMS
-- Generic System

19 Hardware Architectures and forms of Parallelism
20 Three (Hardware) architectures for parallel DBMS
21 Topological view of MP machines (with "Scalability" in mind)
22 Notes on Shared Nothing Architecture
23 Dataflow perspective
24 Parallelisms in parallel database systems
25 Data Partitioning --- How to divide data among multiple disks ?
26 Basic Data Partitioning Schemes
27 Major Approaches in Data Partitioning
28 Danger of Data Skew arises on
Shared Nothing with Data Partitioning

29 Pitfalls in data partitioning
30 Performance Metrics In Parallel Database Systems
31 Performance barriers
32 Some basic terminology for relational database model
33 Examples of Typical Relational Operations
34 Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table
35 Overview of Structure Query Language (SQL)
36 Features of Structure Query Language (SQL)
37 Major RDBMS functionality
38 What is a 3GL or 4GL?
39 What is PL/SQL -- I ?
40 What is PL/SQL -- II?
41 What is PL/SQL -- III?
42 What is Data Integrity?
43 What are Schemes?
44 What are Roles?
45 What are Profiles and Auditing?
46 What are Two-phase Database Schema?
47 Example: Relational Joins
48 General Structure of Parallel and Sequential Relational Joins
49 Parallel Algorithm for Relational Joins
50 Parallel Database Software Architecture
Distributed Lock Manager

51 Parallel Database Software Architecture

Outside Index Summary of Material



HTML version of Basic Foils prepared 7 October 97

Foil 1 Parallel Relational Database Management Systems -- I

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Click here for subtitle
Gang Cheng, Geoffrey C. Fox, Nancy McCracken, Marek Podgorny, Chao-Wei Ou
October 1997
Northeast Parallel Architectures Center
at Syracuse University

HTML version of Basic Foils prepared 7 October 97

Foil 2 Abstract of Parallel Relational Database Management Systems -- I

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
This presentation contains the first two sections
  • Parallel Database Technology in Commercial Applications and Industry
  • Parallel Database Technology and Theory
Of the full ARL Parallel Database Module
The first section sets the scene by motivating the need for paraalel databases while
The second section reviews both Sequentional and Parallel Relational Databases looking at explicit examples
  • nCUBE and SP2 with Oracle and DB2
We also discuss database system architectures and review
The SQL Query language

HTML version of Basic Foils prepared 7 October 97

Foil 3 Outline of Full Database Presentation

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
This Presentation has five Sections
Parallel Database Technology in Commercial Applications and Industry
Parallel Database Technology and Theory
Parallel Database Projects at NPAC
Parallel Oracle7 RDBMS -- A Case Study
Parallel Database Benchmarking

HTML version of Basic Foils prepared 7 October 97

Foil 4 Section I:
Parallel Database Technology in Commercial Applications and Industry

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 5 Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 6 Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Conventional (mainframe) systems unable to support
  • time-critical queries
  • simultaneous OLTP and decision support systems
  • gigantic scientific and commercial databases
  • complex queries (approximate matching, pattern recognition ...)
  • sophisticated optimization as in market segmentation

HTML version of Basic Foils prepared 7 October 97

Foil 7 Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
In commercial business world, MPP systems
  • Can find more profitable applications than the scientific world,
  • As SQL intrinsically parallel, one needs only parallelize the database system and not the many SQL applications
    • in contrast for scientific computing, one needs to both produce a parallel compiler and parallelize each application at language and algorithm level
  • Have better cost/performance than the mainframe system
  • Are mature and highly available NOW

HTML version of Basic Foils prepared 7 October 97

Foil 8 Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 9 Application Areas for Parallel Database: commercial, administration, scientific

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
High transaction rate applications
  • banks, airlines, telecommunication, security brokers, retailers ...
Real-time applications
  • stock trading, C3I (military, civil defense), air traffic, process control (manufacturing, accelerators, reactors/generators), real-time insurance claim analysis ...
Complex queries
  • decision support (market analysis), molecular biology, chemistry/pharmacology ...
Massive amount of data
  • EOS, HEP, weather, census, global environment models, oil exploration, medical history files, multimedia support ...

HTML version of Basic Foils prepared 7 October 97

Foil 10 General Classes of Commercial Applications

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
OLTP (On Line Transaction Processing) --- consists of performing many small independent requests submitted by many clients simultaneously and operating
  • Performance needed is linear in database size
DSS (Decision Support System)
  • Performance needed can be exponential in database size for sophisticated optimization or queries
Data Warehouse --- operations mixing DSS and OLTP
Data mining --- uncover/extract meaningful information from massive data sources

HTML version of Basic Foils prepared 7 October 97

Foil 11 An Application Example ---
Intelligent Business systems
1)Objectives

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
understand customer buying habits and product preference
predict market tendency
make efficient, productive and profitable decisions
must be based on answers to questions on ALL historical business transaction data (Lots of !)

HTML version of Basic Foils prepared 7 October 97

Foil 12 Intelligent Business systems ---
2)Typical Questions

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
"Give me a breakdown of all customers likely to default in the coming year"
"Do three things: predict year-end demand, tell me which customers will fuel that demand, and tell me why"
"Define a highly concentrated micromarket (make special offers to customer segments that meet very select criteria) within my database"
"Create a model that really explains why some customers renew their subscriptions and others don't"
"Suggest ways of regrouping our customers into new market segments (for Direct-Marketing)"
"Find some customer-preference pattern I might not be aware of"

HTML version of Basic Foils prepared 7 October 97

Foil 13 Intelligent Business systems ---
3) Major Technology Challenges

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Lots of Historical Data (and they are growing daily !)
High Performance Computing System and DBMS
Intelligent Data Analysis (Optimization)
Not possible on a mainframe system in terms of both performance and capacity

HTML version of Basic Foils prepared 7 October 97

Foil 14 Intelligent Business systems ---
4) Solutions

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
MPP system
Parallel Relational Database Management System
Intelligent Datamining Algorithms
  • Expert Systems
  • Fractal Compression
  • Neural Networks
  • Modelling
  • Genetic Algorithms

HTML version of Basic Foils prepared 7 October 97

Foil 15 Major Software and Hardware vendors in Parallel Database Technology

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Oracle (Oracle7 with parallel server and parallel query options)
Sybase (Navigation Server on AT&T 3600, SUN SMPs, and IBM SP2 (planned),SQL Server 10)
Informix (INFORMIX-OnLine Dynamic Server)
IBM (Oracle7, DB2/6000, DB2/2, SP2, IBM 390 Parallel Sysplex)
SGI (Oracle7 on SMP Challenge database server)
Taradata with AT&T and NCR
Tandem (Himalaya K10000
Cray Research (SuperServer)
SUN (Oracle7, DB2 on SUN's SMP)
AT&T and NCR
TMC (CM5, Darwin, Oracle7, Decision/SQL, Parasort)
KSR (Query Decomposor, Oracle7)
Amdahl (Oracle7)
nCUBE (Oracle7, nCUBE2)
data-CACHE Corp (SQL dataMANAGER)
Data General Corp (Oracle7, DSO)
DEC (POLYCENTER Manager)
HP (Oracle7, IMAGE/SQL, HP 3000)
Encore Computer Corp (Oracle7 on Infinity 90)

HTML version of Basic Foils prepared 7 October 97

Foil 16 Some Current Major Commercial Users

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Kmart
American Express
AAA Airline
Citibank
Prudential Securities
Bell Atlantic Corp
BellSouth
BT (British Telecom)
Criminal Investigative Technology Inc.
Check http://greatwall.npac.syr.edu:1963 for more info about parallel database technology in industry

HTML version of Basic Foils prepared 7 October 97

Foil 17 Parallel Database Technology and Theory

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 18 Hardware architectures for parallel DBMS
-- Generic System

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
A typical hardware setup is:

HTML version of Basic Foils prepared 7 October 97

Foil 19 Hardware Architectures and forms of Parallelism

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 20 Three (Hardware) architectures for parallel DBMS

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
shared-memory (also called Symmetric Multiprocessors SMP)
  • all processors share direct access to a common global memory and to all disks. example: Oracle7 on SGI Challenge
shared-disks
  • each processor has a private memory but has direct access to all disk. example: Oracle 7 on IBM SP2
shared-nothing
  • each memory and disk is owned by some processor that acts as a server for that data. Mass storage is distributed among processors by connecting one or more disk to each processor. example: DB2 on SP2
  • This is "owner compute" rule as in much scientific computing (HPF default implies processor where variable "stored" performs computations setting variable's value)

HTML version of Basic Foils prepared 7 October 97

Foil 21 Topological view of MP machines (with "Scalability" in mind)

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Shared-Memory and Shared-Disk
  • interference limits scalability
  • complicated and relatively expensive
Shared Nothing
  • minimal interference between processors
  • moves only questions and answers over the network
  • can scale up to hundreds (maybe thousands) of processors
  • composed of cheap, stock parts and a fast interconnecting network
  • achieve near-linear improvements
  • good for system scalability and portability
Shared Data
  • Little experience so far

HTML version of Basic Foils prepared 7 October 97

Foil 22 Notes on Shared Nothing Architecture

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
On a shared-nothing architecture, one or more disks are connectedto each processor, depending on machine's (parallel) I/O architecture, via direct connecting or some I/O channels.
The shared-nothing does not focus on the physical I/O connection architectures, rather, it refers tohow the data is partitioned on the disk arrays and how the data are movedinto the procesor buffers for the parallel query processing.
In this architecture, data locations determines where the data will be processed and how data is shared by processors other than its local processor (ie. the processor which has the local disk holding the data if direct connection like DB2 on SP2, or the shortest path from disks to the process if other I/O architectures like Oracle on ncube).
Unlike shared-disk where data may be shared via the interconnection network when it is first readfrom a disk to a remote processor (buffer), if the data local to processorA is required by a remote processor B to perfom some parallel query processing, A sends the data to B via the communication network.
That is the only shared media for shared-nothing is the communication network.
In most case, data placement has determined how and where (even when) a parallel query processing is decomposed or partitioned.
Data (or I/O) load balance determines the CPU load balance in the system.
This is built in the query decomposer. eg.
if Processor A gets 10% data local to its disk, while B gets remaining 90%.
Then in a shared-nothing system, A spends 10% CPU and then is idle while B takes most of CPU time.
On the otherhand for a shared-disk system, A and B will process similiar amount of data with some overhead of sending data from B to A.

HTML version of Basic Foils prepared 7 October 97

Foil 23 Dataflow perspective

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Pipelined: "assembly line"
  • average query does not have that many steps
  • some operational stages cannot be pipelined (e.g. Sort)
  • one operator is longer than the other (a kind of skew)
  • This is task parallelism in scientific computing
Partitioned: "divide and conquer"
  • fairly straight forward to implement
  • "tough" operations are divided, parts run in parallel
  • This is classic data parallelism in scientific computing
Combination: "best of both world"
  • several partitions running parallel
  • each partition is, where possible, a short pipeline

HTML version of Basic Foils prepared 7 October 97

Foil 24 Parallelisms in parallel database systems

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
pipelined parallelism (inter-operator parallelism)
query parallelism
  • inter-query parallelism
  • intra-operator parallelism
partitioned (declustered) parallelism (I/O parallelism)
transaction parallelism

HTML version of Basic Foils prepared 7 October 97

Foil 25 Data Partitioning --- How to divide data among multiple disks ?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
a common practice on large number of disks and mass storage systems (file striping)
high I/O bandwidth of multiple disks and parallel I/O
no need for specialized hardware
important for query load balance on a shared-nothing system
this is of course a central problem in scientific computing and much of High Performance Fortran is devoted to data partitioning

HTML version of Basic Foils prepared 7 October 97

Foil 26 Basic Data Partitioning Schemes

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 27 Major Approaches in Data Partitioning

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Round Robin ( cyclic in HPF/Scientific Computing Notation)
  • data scattered
  • cannot locate specific records
Hashing ( scattered decomposition in HPF style)
  • scatters the data
  • can locate specific records
  • danger of data skew
Range Partitioning (block cyclic in HPF style)
  • data not scattered
  • can locate specific records
  • danger of data skew
  • "related" data can be clustered

HTML version of Basic Foils prepared 7 October 97

Foil 28 Danger of Data Skew arises on
Shared Nothing with Data Partitioning

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 29 Pitfalls in data partitioning

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
operations/application dependent
danger of data skew arises on shared-nothing with data partitioning
data skew is the focus of a number of research projects
automatic OS file striping vs manual (application-specific) table striping

HTML version of Basic Foils prepared 7 October 97

Foil 30 Performance Metrics In Parallel Database Systems

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
speedup (degree of parallelism)
  • (fixed problem size)
scaleup (scalability) -- transaction scaleup, batch scaleup, processor scaleup, I/O scaleup
  • (problem size increases)
sizeup (both)
  • (scaled speedup in scientific computing increases problem and computer size by same factor)

HTML version of Basic Foils prepared 7 October 97

Foil 31 Performance barriers

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
startup (latency)
interference (communication, locking)
load balance (execution skew - all the execution occurs in one partition)
data-partition (data skew - all the data is placed in one partition)
imbalance of system resources (CPUs and I/O channels)
CPU-intensive or I/O intensive queries

HTML version of Basic Foils prepared 7 October 97

Foil 32 Some basic terminology for relational database model

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Data Structure
  • relations (files, tables)
  • tuples (records, rows)
  • attributes (fields, columns)
Relation operators
  • scan (select-project) (a relation, a predicate, and an attribute list)
  • sort (reorder)
  • aggregate operators (SUM,AVG,MAX,MIN,...)
  • insert/delete/update
  • set operators (union, intersection, difference)
  • join, merge and division
embedded operators
  • uniformity of the data and operators
  • source of data-flow execution model

HTML version of Basic Foils prepared 7 October 97

Foil 33 Examples of Typical Relational Operations

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 34 Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
SQL: Select A.A1,A.A3,B.B1,B.B2 from A,B where A.A1=B.B1;

HTML version of Basic Foils prepared 7 October 97

Foil 35 Overview of Structure Query Language (SQL)

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
a database language specifies the semantics of various components of a DBMS: structures and operation of a data model, data definition, data access, security, programming language interface, and data administration
Industry accepted Standard, first introduced by ANSI in 1986, current SQL92 by ANSI and ISO, new standard SQL3 with enhancements in object-oriented data management is undergoing. Portable to all RDBMS systems.
built on relational model, simple and powerful
non-procedural 4GL language, only specify "what-to-do", not "how-to-do" , extended to object-oriented programming interface
  • this extended model competes with fledging object-oriented database in industry

HTML version of Basic Foils prepared 7 October 97

Foil 36 Features of Structure Query Language (SQL)

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
applications in SQL do not need to be rewritten to run on parallel server (all parallelizations of SQL programs are carried out by the server), compared to scientific areas where both parallel application algorithms and a parallel compiler are required
data-independent --- two-phase database schema (logical and physical database)
SQL in Oracle 7
  • create tables in the database
  • store information in tables
  • select exactly the information you need from your database
  • make changes to your data and the structure of underlying tables, and
  • combine and calculate data to generate the information you need

HTML version of Basic Foils prepared 7 October 97

Foil 37 Major RDBMS functionality

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Data Access --- SQL, Transactions, PL/SQL, Data Integrity
Data Concurrency and Consistency --- Concurrency, Locking
Data Security --- Users and Schemes, Roles and Privileges, Profiles and Auditing
Database Backup and Recovery --- Redo log, Rolling Forward/Back
Distributed Processing --- Client-Server environment

HTML version of Basic Foils prepared 7 October 97

Foil 38 What is a 3GL or 4GL?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Historians of programming languages and database people tend to classify programming languages by generations:
1GL --- machine language (in 0/1 binary)
2GL --- assembly language (in symbolic machine language)
3GL --- high-level programming language (structural, modular and usually procedural)
4GL --- non-procedural (declarative). Most AI languages are 4GL.

HTML version of Basic Foils prepared 7 October 97

Foil 39 What is PL/SQL -- I ?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
PL/SQL illustrates some low-level data access tools/languages that are available in most RDBMS.
Basically, there are three levels of database access interfaces:
1. highest level --- SQL
  • Non-procedure language, only specify what-to-do, not how-to-do.
  • So-called fouth-generation language (4GL). no control structures(loop,if)
  • No module (block) structure, very expressive or declarative.
  • One Other Language example in this category is Prolog.

HTML version of Basic Foils prepared 7 October 97

Foil 40 What is PL/SQL -- II?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
2. Middle-level --- PL/SQL
  • fill the gap between the highest SQL level and lowest C level programming.
  • It is basically a procedual implementation of SQL access.
  • Pure SQL is a subset of PL/SQL where more C-like constructs are introduced
  • Such as block-structure,control statments, variables,module,interface specification,functions, etc).
  • But it doesnt have the rich data-types and low-level operators as in C such as
  • Structure(record),enumeration,array, pointer(reference), and set datatypes, and bitwise-operator,I/O and file handling.

HTML version of Basic Foils prepared 7 October 97

Foil 41 What is PL/SQL -- III?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
3. Embedded SQL in C (or f77,Cobol) --- programming language interfaces
  • Procedural high-level programming langauges (often called 3GL).
  • All capability and flexibility in C-level programming as a host language to embed the SQL statement as the data-access for querying the database engine.
  • Poweful in programming but complex sematics and inconvenient interface to SQL.
  • Usually you need to know many programming/libary details to interface the both in programming.
  • Good for pre-/post-processing to the query but bad for query programming.

HTML version of Basic Foils prepared 7 October 97

Foil 42 What is Data Integrity?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
A way to enforce error-checking and formating in data-types. Usually the rules for data integrity is created when a table is specified. Examples:
  • Null -- a data field may or may not be allowed to be Null.
  • Unique -- a data field may or may not be allowed to be uniquely identified among all the other raws in a table
  • Primary key -- a data field being a primary key must be unique
  • Referential integrity -- inserting or updating a field's value on table A must depend on if it matches the value of a field on another table B.
Data integrity is the major way to enforce the data-consistency in the database.

HTML version of Basic Foils prepared 7 October 97

Foil 43 What are Schemes?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Schemes are used to define scope of different data objects to enforce data secuity and naming issues in a DBMS.
A Scheme in a DB is a logical hierarchy of data storage structures.
Major scheme objects are: users, tablespaces, tables, views, index, cluster, synonyms, columns.
Access to Different Schemes require different roles and privileges for different users.

HTML version of Basic Foils prepared 7 October 97

Foil 44 What are Roles?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
A way to classify level of database users. Like sysadm usually has superuser privileges, roles are used to group users with different privileges.
Because a DBMS system has many levels of operations which set many levels(kinds) of operation privilages,
Role is another logical level of grouping to define/specify who will have what system privileges.
It makes system management easier and more flexible.

HTML version of Basic Foils prepared 7 October 97

Foil 45 What are Profiles and Auditing?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Profiles and Auditing are the way to monitor (system logging) and record user database actions
They are used for:
  • investigating suspicious activity and .
  • gathering data about specific activity for performance tuning and system statistics.
These are just a DB way of saying logging.
As in other cases, Database field sometimes use its own different terms for the same thing in other fields.

HTML version of Basic Foils prepared 7 October 97

Foil 46 What are Two-phase Database Schema?

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
For a RDBMS there are two levels of abstractions of how data being stored and represented in the system:
1) Logical database consists of the conceptual view of a DB system which is described by an ER (entity-relationship) model, defined and access by SQL and represented by tables,columns,views and other abstractions of data-object;
2) The physical database consists of the actual phyical view of the DB which is represented by files,blocks,indexes,clusters,partitions,etc.
End-user and developers only need to deal with the logical level,
While the DBMS and a DBA (database administrator) define and perform the mapping between the two levels.
This is the reason why SQL achieves portability (oompared to f77), from a viewpoint of a data independent model.

HTML version of Basic Foils prepared 7 October 97

Foil 47 Example: Relational Joins

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 48 General Structure of Parallel and Sequential Relational Joins

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 49 Parallel Algorithm for Relational Joins

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 50 Parallel Database Software Architecture
Distributed Lock Manager

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Implementation of Parallel Cache Management
Support transaction parallelism of multiple OLTP
A simple and effective approach to port sequential RDBMS to MPP and loosely coupled systems with shared-nothing architecture
support parallel loading, parallel indexing, parallel insert/update and parallel recovery
Major functionality
  • keep track of the current "ownship" of a resource
  • accepts requests for resources from application processes
  • notifies the requesting process when a resource is available
  • get exclusive access to a resource for a resource
Note this can work fine in OLTP with many uncorrelated queries but will not work in scientific computing where ALL updates are correclated and reserving a resource introduces a sequential bottleneck

HTML version of Basic Foils prepared 7 October 97

Foil 51 Parallel Database Software Architecture

From untitled presentation ARL Database Tutorial -- February 98. *
Full HTML Index
Parallel Query Optimizer
  • Parallel Query Optimizer builds a query execution plan employing all system resources on the basis of data-partition knowledge, it can easily employ few tens of processors to execution a single query with high degree of parallelism. Both data-parallel and data-flow paradigms are used to achieve this goal
  • Support complex CPU-intensive or I/O-intensive queries for DSS
  • near-linear or even super-linear speedup can be achieved (due to larger effective caches)
  • Major focus for research --- optimized parallel algorithms needed
  • carried out by a query coordinator on the RDBMS server (the query coordinator generates the optimal parallel plan rather than having the best serial plan simply executed in parallel. This optimization is performed by the coordinator without the need for any external information (DB2 and Oracle7 use different approaches)
  • most relational operators are parallizable, including full-table scan, sort, aggregate operators, join, merge
  • System/Server tuning plays an important role in achieving optimized performance in a parallel RDBMS --- benchmarking is an effective means of performing tuning

© on Mon Oct 13 1997