High transaction rate applications
-
banks, airlines, telecommunications, securities brokers, retailers ...
|
Real-time applications
-
stock trading, C3I (military, civil defense), air traffic, process control (manufacturing, accelerators, reactors/generators), real-time insurance claim analysis ...
|
Complex queries
-
decision support (market analysis), molecular biology, chemistry/pharmacology ...
|
Massive amount of data
-
EOS, HEP, weather, census, global environment models, oil exploration, medical history files, multimedia support ...
|
Oracle (Oracle7 with parallel server and parallel query options)
|
Sybase (Navigation Server on AT&T 3600, SUN SMPs, and IBM SP2 (planned), SQL Server 10)
|
Informix (INFORMIX-OnLine Dynamic Server)
|
IBM (Oracle7, DB2/6000, DB2/2, SP2, IBM 390 Parallel Sysplex)
|
SGI (Oracle7 on SMP Challenge database server)
|
Teradata with AT&T and NCR
|
Tandem (Himalaya K10000)
|
Cray Research (SuperServer)
|
SUN (Oracle7, DB2 on SUN's SMP)
|
AT&T and NCR
|
TMC (CM5, Darwin, Oracle7, Decision/SQL, Parasort)
|
KSR (Query Decomposer, Oracle7)
|
Amdahl (Oracle7)
|
nCUBE (Oracle7, nCUBE2)
|
data-CACHE Corp (SQL dataMANAGER)
|
Data General Corp (Oracle7, DSO)
|
DEC (POLYCENTER Manager)
|
HP (Oracle7, IMAGE/SQL, HP 3000)
|
Encore Computer Corp (Oracle7 on Infinity 90)
|
In a shared-nothing architecture, one or more disks are connected to each processor, depending on the machine's (parallel) I/O architecture, either directly or via I/O channels.
|
The term shared-nothing does not refer to the physical I/O connection architecture; rather, it refers to how the data is partitioned across the disk arrays and how the data is moved into the processor buffers for parallel query processing.
|
In this architecture, data location determines where the data will be processed and how the data is shared by processors other than its local processor (i.e., the processor whose local disk holds the data when disks are directly connected, as in DB2 on SP2, or the processor with the shortest path from the disks under other I/O architectures, as in Oracle on nCUBE).
|
Unlike shared-disk, where data may be shared via the interconnection network when it is first read from a disk into a remote processor's buffer, in shared-nothing, if data local to processor A is required by a remote processor B to perform some parallel query processing, A sends the data to B over the communication network.
|
That is, the only shared medium in a shared-nothing system is the communication network.
|
In most cases, data placement determines how and where (and even when) parallel query processing is decomposed or partitioned.
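As a minimal sketch of how placement drives decomposition, assume rows are hash-partitioned on a key across the processors' local disks and that each "processor" is simulated by one list in a single Python process. The names `hash_partition`, `local_select` and `parallel_select` are hypothetical, not any vendor's API.

```python
from collections import defaultdict

def hash_partition(rows, key, n_procs):
    # Hypothetical placement rule: a row is stored on the disk of
    # processor hash(row[key]) % n_procs.
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n_procs].append(row)
    return [parts[p] for p in range(n_procs)]

def local_select(partition, predicate):
    # Work done by one processor: scan only the rows on its own disk.
    return [row for row in partition if predicate(row)]

def parallel_select(partitions, predicate):
    # Decomposition follows placement: one local scan per partition,
    # then the partial results are shipped over the network and merged.
    partials = [local_select(p, predicate) for p in partitions]
    return partials, [row for part in partials for row in part]

rows = [{"id": i, "amount": 10 * i} for i in range(8)]
partitions = hash_partition(rows, "id", n_procs=2)
partials, result = parallel_select(partitions, lambda r: r["amount"] > 30)
print([len(p) for p in partitions])               # [4, 4]
print([[r["id"] for r in p] for p in partials])   # [[4, 6], [5, 7]]
```

Which processor scans which rows is fixed entirely by where the rows were stored; no runtime scheduler is consulted.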
|
Data (or I/O) load balance determines the CPU load balance in the system.
|
This is built into the query decomposer. E.g.:
|
if processor A holds 10% of the data on its local disk, while B holds the remaining 90%,
|
then in a shared-nothing system, A spends 10% of the CPU time and then sits idle, while B takes most of the CPU time.
|
On the other hand, in a shared-disk system, A and B will process similar amounts of data, with some overhead for sending data from B to A.
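The 10%/90% example can be made concrete with a toy cost model. This sketch assumes that work is proportional to the amount of data processed, that elapsed time is set by the busiest processor, and that shared-disk transfer overhead is linear in the data moved; the `transfer_cost` of 0.25 is an arbitrary illustrative value, not a measurement, and all function names are hypothetical.

```python
def shared_nothing_work(local_pct):
    # Each processor can only process the data on its own disk.
    return [float(p) for p in local_pct]

def shared_disk_work(local_pct, transfer_cost=0.0):
    # Work is split evenly; a processor holding less than its share pulls
    # the remainder from elsewhere, paying transfer_cost per unit moved
    # (hypothetical linear cost model).
    share = sum(local_pct) / len(local_pct)
    return [share + transfer_cost * max(0.0, share - p) for p in local_pct]

def elapsed(work):
    # The parallel step finishes when the busiest processor finishes.
    return max(work)

split = [10, 90]   # A holds 10% of the data, B holds 90%
sn = shared_nothing_work(split)
sd = shared_disk_work(split, transfer_cost=0.25)
print(sn, elapsed(sn))   # [10.0, 90.0] 90.0
print(sd, elapsed(sd))   # [60.0, 50.0] 60.0
```

Even with the transfer penalty on A, the shared-disk split finishes well before the shared-nothing one, whose elapsed time is dictated by B's 90%.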
|
It is difficult to characterize Oracle7 on nCUBE2, as it mixes features of both shared-disk and shared-nothing architectures by using an additional subcube as a giga-cache, which lies between the nCUBE I/O system (including I/O nodes, multiple I/O channels and disk drives) and the compute-node subcube(s).
|
Before data is read into a compute processor's buffer, it is first cached in a giga-cache node.
|
So if you look only at compute subcube <--> giga-cache subcube, it looks like a shared-nothing system;
|
but if you look at the whole path compute subcube <--> giga-cache subcube <--> disk arrays, it looks like a shared-disk system.
|
Strictly speaking, and compared to DB2 on SP2, it is not a shared-nothing system, as data placement on the disk arrays has little to do with how query processing is parallelized or decomposed.
|
This is also why we found that data partitioning schemes have less I/O performance impact for Oracle7 on nCUBE than on SP2 (the former behaves like a shared-disk system).
|
Data Structure
-
relations (files, tables)
-
tuples (records, rows)
-
attributes (fields, columns)
|
Relation operators
-
scan (select-project) (a relation, a predicate, and an attribute list)
-
sort (reorder)
-
aggregate operators (SUM, AVG, MAX, MIN, ...)
-
insert/delete/update
-
set operators (union, intersection, difference)
-
join, merge and division
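The operators above can be sketched over an in-memory relation of tuples. This is a minimal, single-processor illustration under our own assumptions: a hypothetical `Relation` class, set operators that drop duplicates, and a hash join as one possible join method. It omits insert/delete/update, merge and division.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    attrs: tuple   # attribute (column) names
    rows: list     # tuples (records), one value per attribute

    def col(self, name):
        return self.attrs.index(name)

def scan(rel, predicate, attr_list):
    # Select rows satisfying the predicate, then project onto attr_list.
    idx = [rel.col(a) for a in attr_list]
    out = [tuple(r[i] for i in idx)
           for r in rel.rows if predicate(dict(zip(rel.attrs, r)))]
    return Relation(tuple(attr_list), out)

def sort(rel, attr):
    i = rel.col(attr)
    return Relation(rel.attrs, sorted(rel.rows, key=lambda r: r[i]))

def aggregate(rel, attr, fn):
    # fn is any function over a list of values, e.g. sum, max, min.
    i = rel.col(attr)
    return fn([r[i] for r in rel.rows])

def union(a, b):
    return Relation(a.attrs, list(dict.fromkeys(a.rows + b.rows)))

def intersection(a, b):
    bs = set(b.rows)
    return Relation(a.attrs, [r for r in dict.fromkeys(a.rows) if r in bs])

def difference(a, b):
    bs = set(b.rows)
    return Relation(a.attrs, [r for r in dict.fromkeys(a.rows) if r not in bs])

def hash_join(a, b, attr):
    # Equi-join on a shared attribute: build a hash table on b, probe with a.
    ia, ib = a.col(attr), b.col(attr)
    table = {}
    for r in b.rows:
        table.setdefault(r[ib], []).append(r)
    rest = [k for k in range(len(b.attrs)) if k != ib]
    out = [ra + tuple(rb[k] for k in rest)
           for ra in a.rows for rb in table.get(ra[ia], [])]
    return Relation(a.attrs + tuple(b.attrs[k] for k in rest), out)

emp = Relation(("name", "dept", "salary"),
               [("ann", "db", 100), ("bob", "os", 80), ("cy", "db", 120)])
dept = Relation(("dept", "floor"), [("db", 3), ("os", 5)])
print(scan(emp, lambda r: r["salary"] > 90, ["name"]).rows)  # [('ann',), ('cy',)]
print(aggregate(emp, "salary", max))                         # 120
print(hash_join(emp, dept, "dept").rows[0])                  # ('ann', 'db', 100, 3)
```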
|
embedded operators
-
uniformity of the data and operators
-
source of data-flow execution model
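Because every operator consumes and produces tuples, operators can be chained into a data-flow pipeline. The generator sketch below assumes that model; it runs sequentially in one process, whereas a parallel engine would typically place operators on different processors connected by data streams. The operator names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Sequence

Row = tuple

def scan_op(table: Iterable[Row]) -> Iterator[Row]:
    # Leaf operator: emits stored tuples one at a time.
    for row in table:
        yield row

def select_op(source: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    # Passes through only the tuples that satisfy the predicate.
    for row in source:
        if pred(row):
            yield row

def project_op(source: Iterator[Row], cols: Sequence[int]) -> Iterator[Row]:
    # Keeps only the listed column positions.
    for row in source:
        yield tuple(row[i] for i in cols)

emp = [("ann", "db", 100), ("bob", "os", 80), ("cy", "db", 120)]
# Operators nest freely because each consumes and produces a stream of tuples;
# each tuple flows through the whole pipeline before the next one is read.
plan = project_op(select_op(scan_op(emp), lambda r: r[2] > 90), [0])
print(list(plan))   # [('ann',), ('cy',)]
```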
|
a database language specifies the semantics of various components of a DBMS: structures and operation of a data model, data definition, data access, security, programming language interface, and data administration
|
Industry-accepted standard, first introduced by ANSI in 1986; the current standard is SQL92 (ANSI and ISO), and a new standard, SQL3, with object-oriented data management enhancements is under development. Portable across RDBMS systems.
|
built on relational model, simple and powerful
|
non-procedural 4GL: specifies only "what to do", not "how to do it"; extended with an object-oriented programming interface
-
this extended model competes with fledgling object-oriented databases in industry
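To illustrate the "what to do" style, the sketch below uses Python's built-in `sqlite3` module, a sequential embedded engine chosen only because it is readily available, not one of the parallel systems listed above. The query states the result wanted; the engine chooses the access path and grouping strategy. The table and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("ann", "db", 100), ("bob", "os", 80), ("cy", "db", 120)])

# Declarative: we say WHAT we want (average salary per department),
# not HOW to get it (no loops, no access-path or grouping algorithm).
rows = conn.execute(
    "SELECT dept, AVG(salary) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)   # [('db', 110.0), ('os', 80.0)]
conn.close()
```

The same statement could be handed unchanged to a parallel engine, which would be free to decompose it across processors.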
|
Data Access --- SQL, Transactions, PL/SQL, Data Integrity
|
Data Concurrency and Consistency --- Concurrency, Locking
|
Data Security --- Users and Schemas, Roles and Privileges, Profiles and Auditing
|
Database Backup and Recovery --- Redo log, Rolling Forward/Back
|
Distributed Processing --- Client-Server environment
|
For an RDBMS, there are two levels of abstraction for how data is stored and represented in the system:
|
1) The logical database is the conceptual view of a DB system, which is described by an ER (entity-relationship) model, defined and accessed by SQL, and represented by tables, columns, views and other data-object abstractions;
|
2) The physical database is the actual physical view of the DB, which is represented by files, blocks, indexes, clusters, partitions, etc.
|
End users and developers only need to deal with the logical level,
|
while the DBMS and a DBA (database administrator) define and perform the mapping between the two levels.
|
This data independence is the reason why SQL achieves portability (compared to f77).
|
Implementation of Parallel Cache Management
|
Supports transaction parallelism across multiple OLTP applications
|
A simple and effective approach to porting a sequential RDBMS to MPP and loosely coupled systems with a shared-nothing architecture
|
Supports parallel loading, parallel indexing, parallel insert/update and parallel recovery
|
Major functionality
-
keeps track of the current "ownership" of a resource
-
accepts requests for resources from application processes
-
notifies the requesting process when a resource is available
-
grants the requesting process exclusive access to a resource
|
Note: this can work well in OLTP with many uncorrelated queries, but will not work in scientific computing, where ALL updates are correlated and reserving a resource introduces a sequential bottleneck.
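The four lock-manager functions listed above can be sketched as a toy, single-node exclusive-lock manager. Everything here is an assumption for illustration: the class and method names are hypothetical, only exclusive mode is modeled, and a real distributed lock manager coordinates ownership across nodes over the network. The sketch also shows the bottleneck in the note: requests for the same resource are granted strictly one at a time.

```python
import threading
from collections import deque

class LockManager:
    # Toy exclusive-lock manager (hypothetical): tracks ownership,
    # queues requests, and notifies waiters when a resource frees up.

    def __init__(self):
        self._owner = {}     # resource -> current owner
        self._waiters = {}   # resource -> FIFO queue of requesters
        self._cv = threading.Condition()

    def acquire(self, resource, requester):
        with self._cv:
            queue = self._waiters.setdefault(resource, deque())
            queue.append(requester)
            # Block until the resource is free and this requester is first in line.
            self._cv.wait_for(
                lambda: resource not in self._owner and queue[0] == requester)
            queue.popleft()
            self._owner[resource] = requester   # exclusive access granted

    def release(self, resource, requester):
        with self._cv:
            if self._owner.get(resource) != requester:
                raise RuntimeError(f"{requester} does not own {resource}")
            del self._owner[resource]
            self._cv.notify_all()   # wake waiters so the next in line proceeds

    def owner(self, resource):
        with self._cv:
            return self._owner.get(resource)

lm = LockManager()
order = []
lm.acquire("block-17", "A")

def worker():
    lm.acquire("block-17", "B")   # blocks until A releases
    order.append("B got it")
    lm.release("block-17", "B")

t = threading.Thread(target=worker)
t.start()
order.append("A releasing")
lm.release("block-17", "A")
t.join()
print(order)   # ['A releasing', 'B got it']
```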
|