HELP! * GREY=local Full HTML for

LOCAL foilset First Part (Motivation, Theory) of Parallel Relational Databases

Given by Gang Cheng Marek Podgorny (Geoffrey Fox) at CPS600 Spring Semester on April 1995. Foils prepared July 6,1995
Abstract * Foil Index for this file

See also color IMAGE
This presentation contains the first two sections
  • Parallel Database Technology in Commercial Applications and Industry
  • Parallel Database Technology and Theory
Of the full CPS616 Parallel Database Module
The first section sets the scene by motivating the need for paraalel databases while
The second section reviews both Sequentional and Parallel Relational Databases looking at explicit examples
  • nCUBE and SP2 with Oracle and DB2
We also discuss database system architectures and review
The SQL Query language

Table of Contents for full HTML of First Part (Motivation, Theory) of Parallel Relational Databases


1 Parallel Relational Database Management Systems -- I
2 Abstract of Parallel Relational Database Management Systems -- I
3 Outline of Full Database Presentation
4 Section I:
Parallel Database Technology in Commercial Applications and Industry

5 Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal

6 Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions

7 Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications

8 Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness

9 Application Areas for Parallel Database: commercial, administration, scientific
10 General Classes of Commercial Applications
11 An Application Example ---
Intelligent Business systems
1)Objectives

12 Intelligent Business systems ---
2)Typical Questions

13 Intelligent Business systems ---
3) Major Technology Challenges

14 Intelligent Business systems ---
4) Solutions

15 Major Software and Hardware vendors in Parallel Database Technology
16 Some Current Major Commercial Users
17 Parallel Database Technology and Theory
18 Hardware architectures for parallel DBMS
-- Generic System

19 Hardware Architectures and forms of Parallelism
20 Three (Hardware) architectures for parallel DBMS
21 Notes on Shared Nothing Architecture
22 Shared-data Architecture
23 Shared Data Architecture
24 Note on Oracle nCUBE2 Hybrid Architecture
25 Topological view of MP machines (with ÒScalabilityÓ in mind)
26 Dataflow perspective
27 Parallelisms in parallel database systems
28 Data Partitioning --- How to divide data among multiple disks ?
29 Basic Data Partitioning Schemes
30 Major Approaches in Data Partitioning
31 Danger of Data Skew arises on
Shared Nothing with Data Partitioning

32 Pitfalls in data partitioning
33 Performance Metrics In Parallel Database Systems
34 Performance barriers
35 Some basic terminology for relational database model
36 Examples of Typical Relational Operations
37 Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table
38 Overview of Structure Query Language (SQL)
39 Features of Structure Query Language (SQL)
40 Major RDBMS functionality
41 What is a 3GL or 4GL?
42 What is PL/SQL -- I ?
43 What is PL/SQL -- II?
44 What is PL/SQL -- III?
45 What is Data Integrity?
46 What are Schemes?
47 What are Roles?
48 What are Profiles and Auditing?
49 What are Two-phase Database Schema?
50 Example: Relational Joins
51 General Structure of Parallel and Sequential Relational Joins
52 Parallel Algorithm for Relational Joins
53 Parallel Database Software Architecture
Distributed Lock Manager

54 Parallel Database Software Architecture

This table of Contents Abstract



HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 1 Parallel Relational Database Management Systems -- I

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Click here for subtitle
Gang Cheng, Marek Podgorny
April 1995
Northeast Parallel Architectures Center
at Syracuse University

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 2 Abstract of Parallel Relational Database Management Systems -- I

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
This presentation contains the first two sections
  • Parallel Database Technology in Commercial Applications and Industry
  • Parallel Database Technology and Theory
Of the full CPS616 Parallel Database Module
The first section sets the scene by motivating the need for paraalel databases while
The second section reviews both Sequentional and Parallel Relational Databases looking at explicit examples
  • nCUBE and SP2 with Oracle and DB2
We also discuss database system architectures and review
The SQL Query language

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 3 Outline of Full Database Presentation

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
This Presentation has five Sections
Parallel Database Technology in Commercial Applications and Industry
Parallel Database Technology and Theory
Parallel Database Projects at NPAC
Parallel Oracle7 RDBMS -- A Case Study
Parallel Database Benchmarking

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 4 Section I:
Parallel Database Technology in Commercial Applications and Industry

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 5 Motivations for Parallel Databases -- I:
Overview of Parallel Database Appeal

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 6 Motivations for Parallel Databases -- II:
Inadequacies with Current Mainframe Solutions

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Conventional (mainframe) systems unable to support
  • time-critical queries
  • simultaneous OLTP and decision support systems
  • gigantic scientific and commercial databases
  • complex queries (approximate matching, pattern recognition ...)
  • sophisticated optimization as in market segmentation

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 7 Motivations for Parallel Databases -- III:
Commercial versus Scientific Applications

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
In commercial business world, MPP systems
  • Can find more profitable applications than the scientific world,
  • As SQL intrinsically parallel, one needs only parallelize the database system and not the many SQL applications
    • in contrast for scientific computing, one needs to both produce a parallel compiler and parallelize each application at language and algorithm level
  • Have better cost/performance than the mainframe system
  • Are mature and highly available NOW

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 8 Motivations for Parallel Databases -- IV:
Market Demand from Competitiveness

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 9 Application Areas for Parallel Database: commercial, administration, scientific

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
High transaction rate applications
  • banks, airlines, telecommunication, security brokers, retailers ...
Real-time applications
  • stock trading, C3I (military, civil defense), air traffic, process control (manufacturing, accelerators, reactors/generators), real-time insurance claim analysis ...
Complex queries
  • decision support (market analysis), molecular biology, chemistry/pharmacology ...
Massive amount of data
  • EOS, HEP, weather, census, global environment models, oil exploration, medical history files, multimedia support ...

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 10 General Classes of Commercial Applications

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
OLTP (On Line Transaction Processing) --- consists of performing many small independent requests submitted by many clients simultaneously and operating
  • Performance needed is linear in database size
DSS (Decision Support System)
  • Performance needed can be exponential in database size for sophisticated optimization or queries
Data Warehouse --- operations mixing DSS and OLTP
Data mining --- uncover/extract meaningful information from massive data sources

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 11 An Application Example ---
Intelligent Business systems
1)Objectives

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
understand customer buying habits and product preference
predict market tendency
make efficient, productive and profitable decisions
must be based on answers to questions on ALL historical business transaction data (Lots of !)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 12 Intelligent Business systems ---
2)Typical Questions

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
ÒGive me a breakdown of all customers likely to default in the coming yearÓ
ÒDo three things: predict year-end demand, tell me which customers will fuel that demand, and tell me whyÓ
ÒDefine a highly concentrated micromarket (make special offers to customer segments that meet very select criteria) within my databaseÓ
ÒCreate a model that really explains why some customers renew their subscriptions and others donÕtÓ
ÒSuggest ways of regrouping our customers into new market segments (for Direct-Marketing)Ó
ÒFind some customer-preference pattern I might not be aware ofÓ

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 13 Intelligent Business systems ---
3) Major Technology Challenges

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Lots of Historical Data (and they are growing daily !)
High Performance Computing System and DBMS
Intelligent Data Analysis (Optimization)
Not possible on a mainframe system in terms of both performance and capacity

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 14 Intelligent Business systems ---
4) Solutions

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
MPP system
Parallel Relational Database Management System
Intelligent Datamining Algorithms
  • Expert Systems
  • Fractal Compression
  • Neural Networks
  • Modelling
  • Genetic Algorithms

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 15 Major Software and Hardware vendors in Parallel Database Technology

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Oracle (Oracle7 with parallel server and parallel query options)
Sybase (Navigation Server on AT&T 3600, SUN SMPs, and IBM SP2 (planned),SQL Server 10)
Informix (INFORMIX-OnLine Dynamic Server)
IBM (Oracle7, DB2/6000, DB2/2, SP2, IBM 390 Parallel Sysplex)
SGI (Oracle7 on SMP Challenge database server)
Taradata with AT&T and NCR
Tandem (Himalaya K10000
Cray Research (SuperServer)
SUN (Oracle7, DB2 on SUNÕs SMP)
AT&T and NCR
TMC (CM5, Darwin, Oracle7, Decision/SQL, Parasort)
KSR (Query Decomposor, Oracle7)
Amdahl (Oracle7)
nCUBE (Oracle7, nCUBE2)
data-CACHE Corp (SQL dataMANAGER)
Data General Corp (Oracle7, DSO)
DEC (POLYCENTER Manager)
HP (Oracle7, IMAGE/SQL, HP 3000)
Encore Computer Corp (Oracle7 on Infinity 90)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 16 Some Current Major Commercial Users

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Kmart
American Express
AAA Airline
Citibank
Prudential Securities
Bell Atlantic Corp
BellSouth
BT (British Telecom)
Criminal Investigative Technology Inc.
Check http://greatwall.npac.syr.edu:1963 for more info about parallel database technology in industry

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 17 Parallel Database Technology and Theory

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 18 Hardware architectures for parallel DBMS
-- Generic System

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
A typical hardware setup is:

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 19 Hardware Architectures and forms of Parallelism

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 20 Three (Hardware) architectures for parallel DBMS

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
shared-memory (also called Symmetric Multiprocessors SMP)
  • all processors share direct access to a common global memory and to all disks. example: Oracle7 on SGI Challenge
shared-disks
  • each processor has a private memory but has direct access to all disk. example: Oracle 7 on IBM SP2
shared-nothing
  • each memory and disk is owned by some processor that acts as a server for that data. Mass storage is distributed among processors by connecting one or more disk to each processor. example: DB2 on SP2
  • This is Òowner computeÓ rule as in much scientific computing (HPF default implies processor where variable ÒstoredÓ performs computations setting variableÕs value)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 21 Notes on Shared Nothing Architecture

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
On a shared-nothing architecture, one or more disks are connectedto each processor, depending on machineÕs (parallel) I/O architecture, via direct connecting or some I/O channels.
The shared-nothing does not focus on the physical I/O connection architectures, rather, it refers tohow the data is partitioned on the disk arrays and how the data are movedinto the procesor buffers for the parallel query processing.
In this architecture, data locations determines where the data will be processed and how data is shared by processors other than its local processor (ie. the processor which has the local disk holding the data if direct connection like DB2 on SP2, or the shortest path from disks to the process if other I/O architectures like Oracle on ncube).
Unlike shared-disk where data may be shared via the interconnection network when it is first readfrom a disk to a remote processor (buffer), if the data local to processorA is required by a remote processor B to perfom some parallel query processing, A sends the data to B via the communication network.
That is the only shared media for shared-nothing is the communication network.
In most case, data placement has determined how and where (even when) a parallel query processing is decomposed or partitioned.
Data (or I/O) load balance determines the CPU load balance in the system.
This is built in the query decomposer. eg.
if Processor A gets 10% data local to its disk, while B gets remaining 90%.
Then in a shared-nothing system, A spends 10% CPU and then is idle while B takes most of CPU time.
On the otherhand for a shared-disk system, A and B will process similiar amount of data with some overhead of sending data from B to A.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 22 Shared-data Architecture

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Combine features of the above three models, data sharing technology directly interfaces with the individual processors to enable each processor to cache shared data in its own local memory with full read/write capability (as in shared-memory model) Data sharing maintains data integrity by ensuring cache coherency across all processors and provide efficient cross-system locking mechanisms (as in shared-nothing model)All disk devices are accessible to each processor so that a workload may be scheduled to run on any available processor (as in shared-disks model), instead of being run only on the processors with data affinity (as in shared-nothing model). Example of shared-data system: DB2 on IBM System/390 Parallel Sysplex.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 23 Shared Data Architecture

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 24 Note on Oracle nCUBE2 Hybrid Architecture

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
It is difficult to characterise the Oracle7 on ncube2 as it mixed both shared-disk and shared-nothing architectures features by using an additonal subcube as a giga-cache which lies between ncube I/O system (including I/O nodes, multiple I/O channels and disks drives) and the compute-node subcube(s).
Before data is read into the buffer in compute-processor, they first are cached in a Giga-cache node.
So if you look just at the compute-subcube <--> Giga-cache subcube, it looks likea shared-nothing system,
But if you look at the whole compute-subcube <--> Giga-cache subcube <--> disk-arrays, it looks like a shared-disk system.
Strictly speaking and compared to DB2 on SP2, it is not a shared-nothing system as the data-placement on disk-arrays has little to do with how query processing is paralleliazed or decomposed.
This is also the reason why we found data partition schemes have less I/O performance impact for Oracle7 on nucbe than that on SP2 (the latter is a shared-disk system).

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 25 Topological view of MP machines (with ÒScalabilityÓ in mind)

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Shared-Memory and Shared-Disk
  • interference limits scalability
  • complicated and relatively expensive
Shared Nothing
  • minimal interference between processors
  • moves only questions and answers over the network
  • can scale up to hundreds (maybe thousands) of processors
  • composed of cheap, stock parts and a fast interconnecting network
  • achieve near-linear improvements
  • good for system scalability and portability
Shared Data
  • Little experience so far

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 26 Dataflow perspective

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Pipelined: Òassembly lineÓ
  • average query does not have that many steps
  • some operational stages cannot be pipelined (e.g. Sort)
  • one operator is longer than the other (a kind of skew)
  • This is task parallelism in scientific computing
Partitioned: Òdivide and conquerÓ
  • fairly straight forward to implement
  • ÒtoughÓ operations are divided, parts run in parallel
  • This is classic data parallelism in scientific computing
Combination: Òbest of both worldÓ
  • several partitions running parallel
  • each partition is, where possible, a short pipeline

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 27 Parallelisms in parallel database systems

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
pipelined parallelism (inter-operator parallelism)
query parallelism
  • inter-query parallelism
  • intra-operator parallelism
partitioned (declustered) parallelism (I/O parallelism)
transaction parallelism

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 28 Data Partitioning --- How to divide data among multiple disks ?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
a common practice on large number of disks and mass storage systems (file striping)
high I/O bandwidth of multiple disks and parallel I/O
no need for specialized hardware
important for query load balance on a shared-nothing system
this is of course a central problem in scientific computing and much of High Performance Fortran is devoted to data partitioning

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 29 Basic Data Partitioning Schemes

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 30 Major Approaches in Data Partitioning

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Round Robin ( cyclic in HPF/Scientific Computing Notation)
  • data scattered
  • cannot locate specific records
Hashing ( scattered decomposition in HPF style)
  • scatters the data
  • can locate specific records
  • danger of data skew
Range Partitioning (block cyclic in HPF style)
  • data not scattered
  • can locate specific records
  • danger of data skew
  • ÒrelatedÓ data can be clustered

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 31 Danger of Data Skew arises on
Shared Nothing with Data Partitioning

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 32 Pitfalls in data partitioning

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
operations/application dependent
danger of data skew arises on shared-nothing with data partitioning
data skew is the focus of a number of research projects
automatic OS file striping vs manual (application-specific) table striping

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 33 Performance Metrics In Parallel Database Systems

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
speedup (degree of parallelism)
  • (fixed problem size)
scaleup (scalability) -- transaction scaleup, batch scaleup, processor scaleup, I/O scaleup
  • (problem size increases)
sizeup (both)
  • (scaled speedup in scientific computing increases problem and computer size by same factor)

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 34 Performance barriers

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
startup (latency)
interference (communication, locking)
load balance (execution skew - all the execution occurs in one partition)
data-partition (data skew - all the data is placed in one partition)
imbalance of system resources (CPUs and I/O channels)
CPU-intensive or I/O intensive queries

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 35 Some basic terminology for relational database model

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Data Structure
  • relations (files, tables)
  • tuples (records, rows)
  • attributes (fields, columns)
Relation operators
  • scan (select-project) (a relation, a predicate, and an attribute list)
  • sort (reorder)
  • aggregate operators (SUM,AVG,MAX,MIN,...)
  • insert/delete/update
  • set operators (union, intersection, difference)
  • join, merge and division
embedded operators
  • uniformity of the data and operators
  • source of data-flow execution model

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 36 Examples of Typical Relational Operations

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 37 Join Operation: a SELECT operation that combines rows from two or more tables. Each returned row contains data from more than one table

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index
SQL: Select A.A1,A.A3,B.B1,B.B2 from A,B where A.A1=B.B1;

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 38 Overview of Structure Query Language (SQL)

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
a database language specifies the semantics of various components of a DBMS: structures and operation of a data model, data definition, data access, security, programming language interface, and data administration
Industry accepted Standard, first introduced by ANSI in 1986, current SQL92 by ANSI and ISO, new standard SQL3 with enhancements in object-oriented data management is undergoing. Portable to all RDBMS systems.
built on relational model, simple and powerful
non-procedural 4GL language, only specify Òwhat-to-doÓ, not Òhow-to-doÓ , extended to object-oriented programming interface
  • this extended model competes with fledging object-oriented database in industry

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 39 Features of Structure Query Language (SQL)

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
applications in SQL do not need to be rewritten to run on parallel server (all parallelizations of SQL programs are carried out by the server), compared to scientific areas where both parallel application algorithms and a parallel compiler are required
data-independent --- two-phase database schema (logical and physical database)
SQL in Oracle 7
  • create tables in the database
  • store information in tables
  • select exactly the information you need from your database
  • make changes to your data and the structure of underlying tables, and
  • combine and calculate data to generate the information you need

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 40 Major RDBMS functionality

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Data Access --- SQL, Transactions, PL/SQL, Data Integrity
Data Concurrency and Consistency --- Concurrency, Locking
Data Security --- Users and Schemes, Roles and Privileges, Profiles and Auditing
Database Backup and Recovery --- Redo log, Rolling Forward/Back
Distributed Processing --- Client-Server environment

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 41 What is a 3GL or 4GL?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Historians of programming languages and database people tend to classify programming languages by generations:
1GL --- machine language (in 0/1 binary)
2GL --- assembly language (in symbolic machine language)
3GL --- high-level programming language (structural, modular and usually procedural)
4GL --- non-procedural (declarative). Most AI languages are 4GL.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 42 What is PL/SQL -- I ?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
PL/SQL illustrates some low-level data access tools/languages that are available in most RDBMS.
Basically, there are three levels of database access interfaces:
1. highest level --- SQL
  • Non-procedure language, only specify what-to-do, not how-to-do.
  • So-called fouth-generation language (4GL). no control structures(loop,if)
  • No module (block) structure, very expressive or declarative.
  • One Other Language example in this category is Prolog.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 43 What is PL/SQL -- II?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
2. Middle-level --- PL/SQL
  • fill the gap between the highest SQL level and lowest C level programming.
  • It is basically a procedual implementation of SQL access.
  • Pure SQL is a subset of PL/SQL where more C-like constructs are introduced
  • Such as block-structure,control statments, variables,module,interface specification,functions, etc).
  • But it doesnt have the rich data-types and low-level operators as in C such as
  • Structure(record),enumeration,array, pointer(reference), and set datatypes, and bitwise-operator,I/O and file handling.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 44 What is PL/SQL -- III?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
3. Embedded SQL in C (or f77,Cobol) --- programming language interfaces
  • Procedural high-level programming langauges (often called 3GL).
  • All capability and flexibility in C-level programming as a host language to embed the SQL statement as the data-access for querying the database engine.
  • Poweful in programming but complex sematics and inconvenient interface to SQL.
  • Usually you need to know many programming/libary details to interface the both in programming.
  • Good for pre-/post-processing to the query but bad for query programming.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 45 What is Data Integrity?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
A way to enforce error-checking and formating in data-types. Usually the rules for data integrity is created when a table is specified. Examples:
  • Null -- a data field may or may not be allowed to be Null.
  • Unique -- a data field may or may not be allowed to be uniquely identified among all the other raws in a table
  • Primary key -- a data field being a primary key must be unique
  • Referential integrity -- inserting or updating a fieldÕs value on table A must depend on if it matches the value of a field on another table B.
Data integrity is the major way to enforce the data-consistency in the database.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 46 What are Schemes?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Schemes are used to define scope of different data objects to enforce data secuity and naming issues in a DBMS.
A Scheme in a DB is a logical hierarchy of data storage structures.
Major scheme objects are: users, tablespaces, tables, views, index, cluster, synonyms, columns.
Access to Different Schemes require different roles and privileges for different users.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 47 What are Roles?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
A way to classify level of database users. Like sysadm usually has superuser privileges, roles are used to group users with different privileges.
Because a DBMS system has many levels of operations which set many levels(kinds) of operation privilages,
Role is another logical level of grouping to define/specify who will have what system privileges.
It makes system management easier and more flexible.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 48 What are Profiles and Auditing?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Profiles and Auditing are the way to monitor (system logging) and record user database actions
They are used for:
  • investigating suspicious activity and .
  • gathering data about specific activity for performance tuning and system statistics.
These are just a DB way of saying logging.
As in other cases, Database field sometimes use its own different terms for the same thing in other fields.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 49 What are Two-phase Database Schema?

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
For a RDBMS there are two levels of abstractions of how data being stored and represented in the system:
1) Logical database consists of the conceptual view of a DB system which is described by an ER (entity-relationship) model, defined and access by SQL and represented by tables,columns,views and other abstractions of data-object;
2) The physical database consists of the actual phyical view of the DB which is represented by files,blocks,indexes,clusters,partitions,etc.
End-user and developers only need to deal with the logical level,
While the DBMS and a DBA (database administrator) define and perform the mapping between the two levels.
This is the reason why SQL achieves portability (oompared to f77), from a viewpoint of a data independent model.

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 50 Example: Relational Joins

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 51 General Structure of Parallel and Sequential Relational Joins

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 52 Parallel Algorithm for Relational Joins

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * Critical Information in IMAGE
Full HTML Index

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 53 Parallel Database Software Architecture
Distributed Lock Manager

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Implementation of Parallel Cache Management
Support transaction parallelism of multiple OLTP
A simple and effective approach to port sequential RDBMS to MPP and loosely coupled systems with shared-nothing architecture
support parallel loading, parallel indexing, parallel insert/update and parallel recovery
Major functionality
  • keep track of the current ÒownshipÓ of a resource
  • accepts requests for resources from application processes
  • notifies the requesting process when a resource is available
  • get exclusive access to a resource for a resource
Note this can work fine in OLTP with many uncorrelated queries but will not work in scientific computing where ALL updates are correclated and reserving a resource introduces a sequential bottleneck

HELP! * GREY=local HTML version of LOCAL Foils prepared July 6,1995

Foil 54 Parallel Database Software Architecture

From General Discussion of Parallel Relational Databases CPS600 Spring Semester -- April 1995. * See also color IMAGE
Full HTML Index
Parallel Query Optimizer
  • Parallel Query Optimizer builds a query execution plan employing all system resources on the basis of data-partition knowledge, it can easily employ few tens of processors to execution a single query with high degree of parallelism. Both data-parallel and data-flow paradigms are used to achieve this goal
  • Support complex CPU-intensive or I/O-intensive queries for DSS
  • near-linear or even super-linear speedup can be achieved (due to larger effective caches)
  • Major focus for research --- optimized parallel algorithms needed
  • carried out by a query coordinator on the RDBMS server (the query coordinator generates the optimal parallel plan rather than having the best serial plan simply executed in parallel. This optimization is performed by the coordinator without the need for any external information (DB2 and Oracle7 use different approaches)
  • most relational operators are parallizable, including full-table scan, sort, aggregate operators, join, merge
  • System/Server tuning plays an important role in achieving optimized performance in a parallel RDBMS --- benchmarking is an effective means of performing tuning

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Tue Feb 18 1997