Full HTML for Basic First Part (Motivation, Theory) of Parallel Relational Databases

Full HTML for

Basic foilset First Part (Motivation, Theory) of Parallel Relational Databases

Given by Gang Cheng Marek Podgorny Chao-Wei Ou at ARL Database Tutorial on February 98. Foils prepared 7 October 97
Outside Index Summary of Material

This presentation contains the first two sections

Parallel Database Technology in Commercial Applications and Industry
Parallel Database Technology and Theory

Of the full ARL Parallel Database Module

The first section sets the scene by motivating the need for paraalel databases while

The second section reviews both Sequentional and Parallel Relational Databases looking at explicit examples

nCUBE and SP2 with Oracle and DB2

We also discuss database system architectures and review

The SQL Query language

Table of Contents for full HTML of First Part (Motivation, Theory) of Parallel Relational Databases

Denote Foils where Image Critical

Denote Foils where HTML is sufficient

1

Parallel Relational Database Management Systems -- I
2

Abstract of Parallel Relational Database Management Systems -- I
3

Outline of Full Database Presentation
4

Section I:
Parallel Database Technology in Commercial Applications and Industry
5

Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal
6

Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions
7

Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications
8

Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness
9

Application Areas for Parallel Database: commercial, administration, scientific
10

General Classes of Commercial Applications
11

An Application Example ---
Intelligent Business systems
1)Objectives
12

Intelligent Business systems ---
2)Typical Questions
13

Intelligent Business systems ---
3) Major Technology Challenges
14

Intelligent Business systems ---
4) Solutions
15

Major Software and Hardware vendors in Parallel Database Technology
16

Some Current Major Commercial Users
17

Parallel Database Technology and Theory
18

Hardware architectures for parallel DBMS
-- Generic System
19

Hardware Architectures and forms of Parallelism
20

Three (Hardware) architectures for parallel DBMS
21

Topological view of MP machines (with "Scalability" in mind)
22

Notes on Shared Nothing Architecture
23

Dataflow perspective
24

Parallelisms in parallel database systems
25

Data Partitioning --- How to divide data among multiple disks ?
26

Basic Data Partitioning Schemes
27

Major Approaches in Data Partitioning
28

Danger of Data Skew arises on
Shared Nothing with Data Partitioning
29

Pitfalls in data partitioning
30

Performance Metrics In Parallel Database Systems
31

Performance barriers
32

Some basic terminology for relational database model
33

Examples of Typical Relational Operations
34

Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table
35

Overview of Structure Query Language (SQL)
36

Features of Structure Query Language (SQL)
37

Major RDBMS functionality
38

What is a 3GL or 4GL?
39

What is PL/SQL -- I ?
40

What is PL/SQL -- II?
41

What is PL/SQL -- III?
42

What is Data Integrity?
43

What are Schemes?
44

What are Roles?
45

What are Profiles and Auditing?
46

What are Two-phase Database Schema?
47

Example: Relational Joins
48

General Structure of Parallel and Sequential Relational Joins
49

Parallel Algorithm for Relational Joins
50

Parallel Database Software Architecture
Distributed Lock Manager
51

Parallel Database Software Architecture

Outside Index Summary of Material

HTML version of Basic Foils prepared 7 October 97

Foil 1 Parallel Relational Database Management Systems -- I

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Click here for subtitle

Gang Cheng, Geoffrey C. Fox, Nancy McCracken, Marek Podgorny, Chao-Wei Ou

October 1997

Northeast Parallel Architectures Center

at Syracuse University

HTML version of Basic Foils prepared 7 October 97

Foil 2 Abstract of Parallel Relational Database Management Systems -- I

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

This presentation contains the first two sections

Parallel Database Technology in Commercial Applications and Industry
Parallel Database Technology and Theory

Of the full ARL Parallel Database Module

The first section sets the scene by motivating the need for paraalel databases while

The second section reviews both Sequentional and Parallel Relational Databases looking at explicit examples

nCUBE and SP2 with Oracle and DB2

We also discuss database system architectures and review

The SQL Query language

HTML version of Basic Foils prepared 7 October 97

Foil 3 Outline of Full Database Presentation

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

This Presentation has five Sections

Parallel Database Technology in Commercial Applications and Industry

Parallel Database Technology and Theory

Parallel Database Projects at NPAC

Parallel Oracle7 RDBMS -- A Case Study

Parallel Database Benchmarking

HTML version of Basic Foils prepared 7 October 97

Foil 4 Section I:
Parallel Database Technology in Commercial Applications and Industry

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 5 Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

scalability
system expandability alleviates migration problems
- business branches of different size can use the same technology
power and price/performance ratio
- nCUBE + Oracle capable of >100 tps at ~$2000/tps (TPC-B benchmark, compare with 425 tps for 4*VAX cluster at $16,500/tps)
simultaneous OLTP and complex Decision Support queries (single enterprise image
High availability
Integrability
- parallel Oracle RDBMS integrates as easily as conventional RDBMS
does not require re-investment in application software
- Industry standard database language - SQL (Structured Query Language)
- Parallelization is carried out at RDBMS server side
- Migration to parallel RDBMS is painless (compare to scientific applications)

HTML version of Basic Foils prepared 7 October 97

Foil 6 Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Conventional (mainframe) systems unable to support

time-critical queries
simultaneous OLTP and decision support systems
gigantic scientific and commercial databases
complex queries (approximate matching, pattern recognition ...)
sophisticated optimization as in market segmentation

HTML version of Basic Foils prepared 7 October 97

Foil 7 Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

In commercial business world, MPP systems

Can find more profitable applications than the scientific world,
As SQL intrinsically parallel, one needs only parallelize the database system and not the many SQL applications
- in contrast for scientific computing, one needs to both produce a parallel compiler and parallelize each application at language and algorithm level
Have better cost/performance than the mainframe system
Are mature and highly available NOW

HTML version of Basic Foils prepared 7 October 97

Foil 8 Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Most commercial cooperations re-organize their businesses
to be more efficient with a smaller workforce
need more computing capacity to grow business and fast response to have better services
need better system scalability and price/performance

HTML version of Basic Foils prepared 7 October 97

Foil 9 Application Areas for Parallel Database: commercial, administration, scientific

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

High transaction rate applications

banks, airlines, telecommunication, security brokers, retailers ...

Real-time applications

stock trading, C3I (military, civil defense), air traffic, process control (manufacturing, accelerators, reactors/generators), real-time insurance claim analysis ...

Complex queries

decision support (market analysis), molecular biology, chemistry/pharmacology ...

Massive amount of data

EOS, HEP, weather, census, global environment models, oil exploration, medical history files, multimedia support ...

HTML version of Basic Foils prepared 7 October 97

Foil 10 General Classes of Commercial Applications

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

OLTP (On Line Transaction Processing) --- consists of performing many small independent requests submitted by many clients simultaneously and operating

Performance needed is linear in database size

DSS (Decision Support System)

Performance needed can be exponential in database size for sophisticated optimization or queries

Data Warehouse --- operations mixing DSS and OLTP

Data mining --- uncover/extract meaningful information from massive data sources

HTML version of Basic Foils prepared 7 October 97

Foil 11 An Application Example ---
Intelligent Business systems
1)Objectives

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

understand customer buying habits and product preference

predict market tendency

make efficient, productive and profitable decisions

must be based on answers to questions on ALL historical business transaction data (Lots of !)

HTML version of Basic Foils prepared 7 October 97

Foil 12 Intelligent Business systems ---
2)Typical Questions

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

"Give me a breakdown of all customers likely to default in the coming year"

"Do three things: predict year-end demand, tell me which customers will fuel that demand, and tell me why"

"Define a highly concentrated micromarket (make special offers to customer segments that meet very select criteria) within my database"

"Create a model that really explains why some customers renew their subscriptions and others don't"

"Suggest ways of regrouping our customers into new market segments (for Direct-Marketing)"

"Find some customer-preference pattern I might not be aware of"

HTML version of Basic Foils prepared 7 October 97

Foil 13 Intelligent Business systems ---
3) Major Technology Challenges

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Lots of Historical Data (and they are growing daily !)

High Performance Computing System and DBMS

Intelligent Data Analysis (Optimization)

Not possible on a mainframe system in terms of both performance and capacity

HTML version of Basic Foils prepared 7 October 97

Foil 14 Intelligent Business systems ---
4) Solutions

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

MPP system

Parallel Relational Database Management System

Intelligent Datamining Algorithms

Expert Systems
Fractal Compression
Neural Networks
Modelling
Genetic Algorithms

HTML version of Basic Foils prepared 7 October 97

Foil 15 Major Software and Hardware vendors in Parallel Database Technology

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Oracle (Oracle7 with parallel server and parallel query options)

Sybase (Navigation Server on AT&T 3600, SUN SMPs, and IBM SP2 (planned),SQL Server 10)

Informix (INFORMIX-OnLine Dynamic Server)

IBM (Oracle7, DB2/6000, DB2/2, SP2, IBM 390 Parallel Sysplex)

SGI (Oracle7 on SMP Challenge database server)

Taradata with AT&T and NCR

Tandem (Himalaya K10000

Cray Research (SuperServer)

SUN (Oracle7, DB2 on SUN's SMP)

AT&T and NCR

TMC (CM5, Darwin, Oracle7, Decision/SQL, Parasort)

KSR (Query Decomposor, Oracle7)

Amdahl (Oracle7)

nCUBE (Oracle7, nCUBE2)

data-CACHE Corp (SQL dataMANAGER)

Data General Corp (Oracle7, DSO)

DEC (POLYCENTER Manager)

HP (Oracle7, IMAGE/SQL, HP 3000)

Encore Computer Corp (Oracle7 on Infinity 90)

HTML version of Basic Foils prepared 7 October 97

Foil 16 Some Current Major Commercial Users

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Kmart

American Express

AAA Airline

Citibank

Prudential Securities

Bell Atlantic Corp

BellSouth

BT (British Telecom)

Criminal Investigative Technology Inc.

Check http://greatwall.npac.syr.edu:1963 for more info about parallel database technology in industry

HTML version of Basic Foils prepared 7 October 97

Foil 17 Parallel Database Technology and Theory

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 18 Hardware architectures for parallel DBMS
-- Generic System

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

A typical hardware setup is:

HTML version of Basic Foils prepared 7 October 97

Foil 19 Hardware Architectures and forms of Parallelism

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 20 Three (Hardware) architectures for parallel DBMS

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

shared-memory (also called Symmetric Multiprocessors SMP)

all processors share direct access to a common global memory and to all disks. example: Oracle7 on SGI Challenge

shared-disks

each processor has a private memory but has direct access to all disk. example: Oracle 7 on IBM SP2

shared-nothing

each memory and disk is owned by some processor that acts as a server for that data. Mass storage is distributed among processors by connecting one or more disk to each processor. example: DB2 on SP2
This is "owner compute" rule as in much scientific computing (HPF default implies processor where variable "stored" performs computations setting variable's value)

HTML version of Basic Foils prepared 7 October 97

Foil 21 Topological view of MP machines (with "Scalability" in mind)

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Shared-Memory and Shared-Disk

interference limits scalability
complicated and relatively expensive

Shared Nothing

minimal interference between processors
moves only questions and answers over the network
can scale up to hundreds (maybe thousands) of processors
composed of cheap, stock parts and a fast interconnecting network
achieve near-linear improvements
good for system scalability and portability

Shared Data

Little experience so far

HTML version of Basic Foils prepared 7 October 97

Foil 22 Notes on Shared Nothing Architecture

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

On a shared-nothing architecture, one or more disks are connectedto each processor, depending on machine's (parallel) I/O architecture, via direct connecting or some I/O channels.

The shared-nothing does not focus on the physical I/O connection architectures, rather, it refers tohow the data is partitioned on the disk arrays and how the data are movedinto the procesor buffers for the parallel query processing.

In this architecture, data locations determines where the data will be processed and how data is shared by processors other than its local processor (ie. the processor which has the local disk holding the data if direct connection like DB2 on SP2, or the shortest path from disks to the process if other I/O architectures like Oracle on ncube).

Unlike shared-disk where data may be shared via the interconnection network when it is first readfrom a disk to a remote processor (buffer), if the data local to processorA is required by a remote processor B to perfom some parallel query processing, A sends the data to B via the communication network.

That is the only shared media for shared-nothing is the communication network.

In most case, data placement has determined how and where (even when) a parallel query processing is decomposed or partitioned.

Data (or I/O) load balance determines the CPU load balance in the system.

This is built in the query decomposer. eg.

if Processor A gets 10% data local to its disk, while B gets remaining 90%.

Then in a shared-nothing system, A spends 10% CPU and then is idle while B takes most of CPU time.

On the otherhand for a shared-disk system, A and B will process similiar amount of data with some overhead of sending data from B to A.

HTML version of Basic Foils prepared 7 October 97

Foil 23 Dataflow perspective

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Pipelined: "assembly line"

average query does not have that many steps
some operational stages cannot be pipelined (e.g. Sort)
one operator is longer than the other (a kind of skew)
This is task parallelism in scientific computing

Partitioned: "divide and conquer"

fairly straight forward to implement
"tough" operations are divided, parts run in parallel
This is classic data parallelism in scientific computing

Combination: "best of both world"

several partitions running parallel
each partition is, where possible, a short pipeline

HTML version of Basic Foils prepared 7 October 97

Foil 24 Parallelisms in parallel database systems

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

pipelined parallelism (inter-operator parallelism)

query parallelism

inter-query parallelism
intra-operator parallelism

partitioned (declustered) parallelism (I/O parallelism)

transaction parallelism

HTML version of Basic Foils prepared 7 October 97

Foil 25 Data Partitioning --- How to divide data among multiple disks ?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

a common practice on large number of disks and mass storage systems (file striping)

high I/O bandwidth of multiple disks and parallel I/O

no need for specialized hardware

important for query load balance on a shared-nothing system

this is of course a central problem in scientific computing and much of High Performance Fortran is devoted to data partitioning

HTML version of Basic Foils prepared 7 October 97

Foil 26 Basic Data Partitioning Schemes

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 27 Major Approaches in Data Partitioning

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Round Robin ( cyclic in HPF/Scientific Computing Notation)

data scattered
cannot locate specific records

Hashing ( scattered decomposition in HPF style)

scatters the data
can locate specific records
danger of data skew

Range Partitioning (block cyclic in HPF style)

data not scattered
can locate specific records
danger of data skew
"related" data can be clustered

HTML version of Basic Foils prepared 7 October 97

Foil 28 Danger of Data Skew arises on
Shared Nothing with Data Partitioning

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 29 Pitfalls in data partitioning

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

operations/application dependent

danger of data skew arises on shared-nothing with data partitioning

data skew is the focus of a number of research projects

automatic OS file striping vs manual (application-specific) table striping

HTML version of Basic Foils prepared 7 October 97

Foil 30 Performance Metrics In Parallel Database Systems

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

speedup (degree of parallelism)

(fixed problem size)

scaleup (scalability) -- transaction scaleup, batch scaleup, processor scaleup, I/O scaleup

(problem size increases)

sizeup (both)

(scaled speedup in scientific computing increases problem and computer size by same factor)

HTML version of Basic Foils prepared 7 October 97

Foil 31 Performance barriers

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

startup (latency)

interference (communication, locking)

load balance (execution skew - all the execution occurs in one partition)

data-partition (data skew - all the data is placed in one partition)

imbalance of system resources (CPUs and I/O channels)

CPU-intensive or I/O intensive queries

HTML version of Basic Foils prepared 7 October 97

Foil 32 Some basic terminology for relational database model

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Data Structure

relations (files, tables)
tuples (records, rows)
attributes (fields, columns)

Relation operators

scan (select-project) (a relation, a predicate, and an attribute list)
sort (reorder)
aggregate operators (SUM,AVG,MAX,MIN,...)
insert/delete/update
set operators (union, intersection, difference)
join, merge and division

embedded operators

uniformity of the data and operators
source of data-flow execution model

HTML version of Basic Foils prepared 7 October 97

Foil 33 Examples of Typical Relational Operations

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 34 Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

SQL: Select A.A1,A.A3,B.B1,B.B2 from A,B where A.A1=B.B1;

HTML version of Basic Foils prepared 7 October 97

Foil 35 Overview of Structure Query Language (SQL)

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

a database language specifies the semantics of various components of a DBMS: structures and operation of a data model, data definition, data access, security, programming language interface, and data administration

Industry accepted Standard, first introduced by ANSI in 1986, current SQL92 by ANSI and ISO, new standard SQL3 with enhancements in object-oriented data management is undergoing. Portable to all RDBMS systems.

built on relational model, simple and powerful

non-procedural 4GL language, only specify "what-to-do", not "how-to-do" , extended to object-oriented programming interface

this extended model competes with fledging object-oriented database in industry

HTML version of Basic Foils prepared 7 October 97

Foil 36 Features of Structure Query Language (SQL)

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

applications in SQL do not need to be rewritten to run on parallel server (all parallelizations of SQL programs are carried out by the server), compared to scientific areas where both parallel application algorithms and a parallel compiler are required

data-independent --- two-phase database schema (logical and physical database)

SQL in Oracle 7

create tables in the database
store information in tables
select exactly the information you need from your database
make changes to your data and the structure of underlying tables, and
combine and calculate data to generate the information you need

HTML version of Basic Foils prepared 7 October 97

Foil 37 Major RDBMS functionality

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Data Access --- SQL, Transactions, PL/SQL, Data Integrity

Data Concurrency and Consistency --- Concurrency, Locking

Data Security --- Users and Schemes, Roles and Privileges, Profiles and Auditing

Database Backup and Recovery --- Redo log, Rolling Forward/Back

Distributed Processing --- Client-Server environment

HTML version of Basic Foils prepared 7 October 97

Foil 38 What is a 3GL or 4GL?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Historians of programming languages and database people tend to classify programming languages by generations:

1GL --- machine language (in 0/1 binary)

2GL --- assembly language (in symbolic machine language)

3GL --- high-level programming language (structural, modular and usually procedural)

4GL --- non-procedural (declarative). Most AI languages are 4GL.

HTML version of Basic Foils prepared 7 October 97

Foil 39 What is PL/SQL -- I ?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

PL/SQL illustrates some low-level data access tools/languages that are available in most RDBMS.

Basically, there are three levels of database access interfaces:

1. highest level --- SQL

Non-procedure language, only specify what-to-do, not how-to-do.
So-called fouth-generation language (4GL). no control structures(loop,if)
No module (block) structure, very expressive or declarative.
One Other Language example in this category is Prolog.

HTML version of Basic Foils prepared 7 October 97

Foil 40 What is PL/SQL -- II?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

2. Middle-level --- PL/SQL

fill the gap between the highest SQL level and lowest C level programming.
It is basically a procedual implementation of SQL access.
Pure SQL is a subset of PL/SQL where more C-like constructs are introduced
Such as block-structure,control statments, variables,module,interface specification,functions, etc).
But it doesnt have the rich data-types and low-level operators as in C such as
Structure(record),enumeration,array, pointer(reference), and set datatypes, and bitwise-operator,I/O and file handling.

HTML version of Basic Foils prepared 7 October 97

Foil 41 What is PL/SQL -- III?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

3. Embedded SQL in C (or f77,Cobol) --- programming language interfaces

Procedural high-level programming langauges (often called 3GL).
All capability and flexibility in C-level programming as a host language to embed the SQL statement as the data-access for querying the database engine.
Poweful in programming but complex sematics and inconvenient interface to SQL.
Usually you need to know many programming/libary details to interface the both in programming.
Good for pre-/post-processing to the query but bad for query programming.

HTML version of Basic Foils prepared 7 October 97

Foil 42 What is Data Integrity?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

A way to enforce error-checking and formating in data-types. Usually the rules for data integrity is created when a table is specified. Examples:

Null -- a data field may or may not be allowed to be Null.
Unique -- a data field may or may not be allowed to be uniquely identified among all the other raws in a table
Primary key -- a data field being a primary key must be unique
Referential integrity -- inserting or updating a field's value on table A must depend on if it matches the value of a field on another table B.

Data integrity is the major way to enforce the data-consistency in the database.

HTML version of Basic Foils prepared 7 October 97

Foil 43 What are Schemes?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Schemes are used to define scope of different data objects to enforce data secuity and naming issues in a DBMS.

A Scheme in a DB is a logical hierarchy of data storage structures.

Major scheme objects are: users, tablespaces, tables, views, index, cluster, synonyms, columns.

Access to Different Schemes require different roles and privileges for different users.

HTML version of Basic Foils prepared 7 October 97

Foil 44 What are Roles?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

A way to classify level of database users. Like sysadm usually has superuser privileges, roles are used to group users with different privileges.

Because a DBMS system has many levels of operations which set many levels(kinds) of operation privilages,

Role is another logical level of grouping to define/specify who will have what system privileges.

It makes system management easier and more flexible.

HTML version of Basic Foils prepared 7 October 97

Foil 45 What are Profiles and Auditing?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Profiles and Auditing are the way to monitor (system logging) and record user database actions

They are used for:

investigating suspicious activity and .
gathering data about specific activity for performance tuning and system statistics.

These are just a DB way of saying logging.

As in other cases, Database field sometimes use its own different terms for the same thing in other fields.

HTML version of Basic Foils prepared 7 October 97

Foil 46 What are Two-phase Database Schema?

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

For a RDBMS there are two levels of abstractions of how data being stored and represented in the system:

1) Logical database consists of the conceptual view of a DB system which is described by an ER (entity-relationship) model, defined and access by SQL and represented by tables,columns,views and other abstractions of data-object;

2) The physical database consists of the actual phyical view of the DB which is represented by files,blocks,indexes,clusters,partitions,etc.

End-user and developers only need to deal with the logical level,

While the DBMS and a DBA (database administrator) define and perform the mapping between the two levels.

This is the reason why SQL achieves portability (oompared to f77), from a viewpoint of a data independent model.

HTML version of Basic Foils prepared 7 October 97

Foil 47 Example: Relational Joins

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 48 General Structure of Parallel and Sequential Relational Joins

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 49 Parallel Algorithm for Relational Joins

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

HTML version of Basic Foils prepared 7 October 97

Foil 50 Parallel Database Software Architecture
Distributed Lock Manager

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Implementation of Parallel Cache Management

Support transaction parallelism of multiple OLTP

A simple and effective approach to port sequential RDBMS to MPP and loosely coupled systems with shared-nothing architecture

support parallel loading, parallel indexing, parallel insert/update and parallel recovery

Major functionality

keep track of the current "ownship" of a resource
accepts requests for resources from application processes
notifies the requesting process when a resource is available
get exclusive access to a resource for a resource

Note this can work fine in OLTP with many uncorrelated queries but will not work in scientific computing where ALL updates are correclated and reserving a resource introduces a sequential bottleneck

HTML version of Basic Foils prepared 7 October 97

Foil 51 Parallel Database Software Architecture

From untitled presentation ARL Database Tutorial -- February 98. *

Full HTML Index

Parallel Query Optimizer

Parallel Query Optimizer builds a query execution plan employing all system resources on the basis of data-partition knowledge, it can easily employ few tens of processors to execution a single query with high degree of parallelism. Both data-parallel and data-flow paradigms are used to achieve this goal
Support complex CPU-intensive or I/O-intensive queries for DSS
near-linear or even super-linear speedup can be achieved (due to larger effective caches)
Major focus for research --- optimized parallel algorithms needed
carried out by a query coordinator on the RDBMS server (the query coordinator generates the optimal parallel plan rather than having the best serial plan simply executed in parallel. This optimization is performed by the coordinator without the need for any external information (DB2 and Oracle7 use different approaches)
most relational operators are parallizable, including full-table scan, sort, aggregate operators, join, merge
System/Server tuning plays an important role in achieving optimized performance in a parallel RDBMS --- benchmarking is an effective means of performing tuning

Basic foilset First Part (Motivation, Theory) of Parallel Relational Databases

Table of Contents for full HTML of First Part (Motivation, Theory) of Parallel Relational Databases

Foil 1 Parallel Relational Database Management Systems -- I

Foil 2 Abstract of Parallel Relational Database Management Systems -- I

Foil 3 Outline of Full Database Presentation

Foil 4 Section I: Parallel Database Technology in Commercial Applications and Industry

Foil 5 Motivations for Parallel Databases -- I:Overview of Parallel Database Appeal

Foil 6 Motivations for Parallel Databases -- II:Inadequacies with Current Mainframe Solutions

Foil 7 Motivations for Parallel Databases -- III:Commercial versus Scientific Applications

Foil 8 Motivations for Parallel Databases -- IV:Market Demand from Competitiveness

Foil 9 Application Areas for Parallel Database: commercial, administration, scientific

Foil 10 General Classes of Commercial Applications

Foil 11 An Application Example ---Intelligent Business systems1)Objectives

Foil 12 Intelligent Business systems ---2)Typical Questions

Foil 13 Intelligent Business systems ---3) Major Technology Challenges

Foil 14 Intelligent Business systems ---4) Solutions

Foil 15 Major Software and Hardware vendors in Parallel Database Technology

Foil 16 Some Current Major Commercial Users

Foil 17 Parallel Database Technology and Theory

Foil 18 Hardware architectures for parallel DBMS-- Generic System

Foil 19 Hardware Architectures and forms of Parallelism

Foil 20 Three (Hardware) architectures for parallel DBMS

Foil 21 Topological view of MP machines (with "Scalability" in mind)

Foil 22 Notes on Shared Nothing Architecture

Foil 23 Dataflow perspective

Foil 24 Parallelisms in parallel database systems

Foil 25 Data Partitioning --- How to divide data among multiple disks ?

Foil 26 Basic Data Partitioning Schemes

Foil 27 Major Approaches in Data Partitioning

Foil 28 Danger of Data Skew arises on Shared Nothing with Data Partitioning

Foil 29 Pitfalls in data partitioning

Foil 30 Performance Metrics In Parallel Database Systems

Foil 31 Performance barriers

Foil 32 Some basic terminology for relational database model

Foil 33 Examples of Typical Relational Operations

Foil 34 Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table

Foil 35 Overview of Structure Query Language (SQL)

Foil 36 Features of Structure Query Language (SQL)

Foil 37 Major RDBMS functionality

Foil 38 What is a 3GL or 4GL?

Foil 39 What is PL/SQL -- I ?

Foil 40 What is PL/SQL -- II?

Foil 41 What is PL/SQL -- III?

Foil 42 What is Data Integrity?

Foil 43 What are Schemes?

Foil 44 What are Roles?

Foil 45 What are Profiles and Auditing?

Foil 46 What are Two-phase Database Schema?

Foil 47 Example: Relational Joins

Foil 48 General Structure of Parallel and Sequential Relational Joins

Foil 49 Parallel Algorithm for Relational Joins

Foil 50 Parallel Database Software ArchitectureDistributed Lock Manager

Foil 51 Parallel Database Software Architecture

Foil 4 Section I:
Parallel Database Technology in Commercial Applications and Industry

Foil 5 Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal

Foil 6 Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions

Foil 7 Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications

Foil 8 Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness

Foil 11 An Application Example ---
Intelligent Business systems
1)Objectives

Foil 12 Intelligent Business systems ---
2)Typical Questions

Foil 13 Intelligent Business systems ---
3) Major Technology Challenges

Foil 14 Intelligent Business systems ---
4) Solutions

Foil 18 Hardware architectures for parallel DBMS
-- Generic System

Foil 28 Danger of Data Skew arises on
Shared Nothing with Data Partitioning

Foil 50 Parallel Database Software Architecture
Distributed Lock Manager