Mapping the Data Warehouse Architecture to Multiprocessor Architecture

1. Relational database technology for the data warehouse

Linear speed-up: refers to the ability to increase the number of processors in order to reduce response time for the same workload.
Linear scale-up: refers to the ability to provide the same performance on the same requests as the database size increases.

1.1 Types of parallelism

I. Inter-query parallelism:
Different server threads or processes handle multiple requests at the same time.

II. Intra-query parallelism:
This form of parallelism decomposes a serial SQL query into lower-level operations such as scan, join and sort. These lower-level operations are then executed concurrently, in parallel.
Intra-query parallelism can be done in either of two ways:
- Horizontal parallelism: the database is partitioned across multiple disks, and parallel processing occurs within a specific task that is performed concurrently on different processors against different sets of data.
- Vertical parallelism: this occurs among different tasks. All query components such as scan, join and sort are executed in parallel in a pipelined fashion; in other words, the output of one task becomes the input of another task (a sketch of such a pipeline follows).
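To make the pipelined form of intra-query parallelism concrete, here is a minimal sketch (in Python, with an in-memory list of rows standing in for a table) in which a scan, a filter and a sort run as separate threads connected by queues, so the output of one operator streams straight into the next. The stage names and the queue-based hand-off are illustrative assumptions, not the internals of any particular DBMS.

```python
# A minimal sketch of vertical (pipelined) intra-query parallelism.
import threading
import queue

SENTINEL = object()  # marks end of stream between pipeline stages

def scan(rows, out_q):
    # Scan stage: stream rows one at a time to the next stage.
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)

def filter_stage(in_q, out_q, predicate):
    # Filter stage: consumes rows while the scan is still producing them.
    while (row := in_q.get()) is not SENTINEL:
        if predicate(row):
            out_q.put(row)
    out_q.put(SENTINEL)

def sort_stage(in_q, results, key):
    # Sort is a blocking operator: it must see all input before emitting output.
    buffered = []
    while (row := in_q.get()) is not SENTINEL:
        buffered.append(row)
    results.extend(sorted(buffered, key=key))

rows = [{"id": i, "amount": i * 10} for i in range(100)]
q1, q2, results = queue.Queue(), queue.Queue(), []

stages = [
    threading.Thread(target=scan, args=(rows, q1)),
    threading.Thread(target=filter_stage, args=(q1, q2, lambda r: r["amount"] > 500)),
    threading.Thread(target=sort_stage, args=(q2, results, lambda r: -r["amount"])),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(results[:3])
```

Note that the sort stage must buffer its entire input before producing any output; such blocking operators limit how much of a query pipeline can actually overlap.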
III. Data partitioning:
Data partitioning is the key component for effective parallel execution of database operations. Partitioning can be done randomly or intelligently.
- Random partitioning: includes random data striping across multiple disks on a single server. Another option for random partitioning is round-robin partitioning, in which each record is placed on the next disk assigned to the database.
- Intelligent partitioning: assumes that the DBMS knows where a specific record is located and does not waste time searching for it across all disks. The various intelligent partitioning schemes include the following (a small routing sketch follows this list):
  - Hash partitioning: a hash algorithm is used to calculate the partition number based on the value of the partitioning key for each row.
  - Key-range partitioning: rows are placed and located in the partitions according to the value of the partitioning key; for example, all rows with key values from A to K go to partition 1, L to T to partition 2, and so on.
  - Schema partitioning: an entire table is placed on one disk, another table on a different disk, and so on. This is useful for small reference tables.
  - User-defined partitioning: allows a table to be partitioned on the basis of a user-defined expression.
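Here is a minimal sketch contrasting round-robin, hash and key-range placement. The partition count, the key names and the A-F/G-K/L-T/U-Z boundaries are assumptions chosen for illustration, not rules from any particular DBMS.

```python
# A minimal sketch of the partitioning schemes described above, assuming 4 partitions.
from itertools import count

NUM_PARTITIONS = 4
_round_robin = count()

def round_robin_partition(_row):
    # Random/round-robin: each new row goes to the next partition in turn.
    return next(_round_robin) % NUM_PARTITIONS

def hash_partition(row, key="customer_id"):
    # Hash partitioning: a hash of the partitioning key picks the partition,
    # so the DBMS can recompute the location instead of searching every disk.
    return hash(row[key]) % NUM_PARTITIONS

def key_range_partition(row, key="name"):
    # Key-range partitioning: rows are placed by the value of the key,
    # e.g. A-F -> 0, G-K -> 1, L-T -> 2, U-Z -> 3 (boundaries are arbitrary here).
    first = row[key][0].upper()
    if first <= "F":
        return 0
    if first <= "K":
        return 1
    if first <= "T":
        return 2
    return 3

row = {"customer_id": 1042, "name": "Lakshmi"}
print(round_robin_partition(row))  # depends only on insertion order
print(hash_partition(row))         # depends only on customer_id
print(key_range_partition(row))    # 2, because 'L' falls in the L-T range
```

Because hash and key-range placement are computed from the row itself, the target partition can be recomputed at lookup time rather than searched for, which is exactly the advantage claimed for intelligent partitioning above.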
2. Database architectures of parallel processing

There are three DBMS software architecture styles for parallel processing:
- Shared-memory (shared-everything) architecture
- Shared-disk architecture
- Shared-nothing architecture

2.1 Shared Memory (or Shared Everything) Architecture:
It has the following characteristics:
- Multiple PUs (processing units) share memory.
- Each PU has full access to all shared memory through a common bus.
- Communication between nodes occurs via shared memory.
- Performance is limited by the bandwidth of the memory bus.
- It is simple to implement and provides a single system image, which makes it a natural way of implementing an RDBMS on an SMP (symmetric multiprocessor). A minimal sketch of this shared address space follows this subsection.

Disadvantage:
- Scalability is limited by bus bandwidth and latency, and by available memory.
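The following is a minimal sketch of the shared-everything idea: several worker threads scan different slices of one table held in a single shared address space. The table contents and thread count are assumptions for illustration, and CPython's GIL means this shows the structure of the approach rather than a real speed-up.

```python
# A minimal sketch of shared-memory (shared-everything) parallel scanning.
import threading

table = [{"order_id": i, "amount": (i * 7) % 100} for i in range(1_000)]
partial_sums = [0] * 4            # one slot per worker, all in shared memory

def scan_slice(worker_id, num_workers):
    # Every worker reads the same in-memory table directly; no data shipping
    # is needed because all PUs share the same memory over a common bus.
    total = 0
    for row in table[worker_id::num_workers]:
        total += row["amount"]
    partial_sums[worker_id] = total

workers = [threading.Thread(target=scan_slice, args=(i, 4)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sum(partial_sums))          # aggregate computed from the shared results
```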
2.2 Shared Disk Architecture
Shared disk systems are typically loosely coupled. Such systems have the following characteristics:
- Each node consists of one or more PUs and associated memory.
- Memory is not shared between nodes.
- Communication occurs over a common high-speed bus.
- Each node has access to the same disks and other resources.
- The bandwidth of the high-speed bus limits the number of nodes (the scalability) of the system.
- A Distributed Lock Manager (DLM) is required (a minimal sketch of this coordination follows this subsection).

Advantages:
- Shared disk systems permit high availability: all data remains accessible even if one node dies.
- These systems have the concept of one database, which is an advantage over shared-nothing systems.
- Shared disk systems provide for incremental growth.

Disadvantages:
- Inter-node synchronization is required, involving DLM overhead and a greater dependency on the high-speed interconnect.
- If the workload is not partitioned well, there may be high synchronization overhead.
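The following is a minimal sketch of shared-disk coordination, assuming a dict stands in for the common disk and a table of per-page locks stands in for the DLM; names such as update_page are illustrative, not a real DLM interface.

```python
# A minimal sketch of shared-disk access coordinated by a lock manager.
import threading

shared_disk = {"page_1": 100}                  # every node can reach this "disk"
dlm_locks = {"page_1": threading.Lock()}       # one lock per shared page (DLM stand-in)

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}                        # memory is NOT shared between nodes

    def update_page(self, page_id, delta):
        # Access to a shared page must first be granted by the lock manager;
        # this grant/release traffic is the DLM overhead noted above.
        with dlm_locks[page_id]:
            value = shared_disk[page_id]       # read the page from the common disk
            self.cache[page_id] = value + delta
            shared_disk[page_id] = self.cache[page_id]   # write it back

nodes = [Node(f"node_{i}") for i in range(3)]
threads = [threading.Thread(target=n.update_page, args=("page_1", 10)) for n in nodes]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_disk["page_1"])                   # 130: all three updates applied once each
```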
2.3 Shared Nothing Architecture (Distributed)
Shared nothing systems are typically loosely coupled. In shared nothing systems only one CPU is connected to a given disk; if a table or database is located on that disk, access to it depends entirely on the node that owns it. Shared nothing systems are concerned with access to disks, not access to memory. Adding more PUs and disks can improve scale-up. A minimal routing sketch follows the lists below.

Advantages for parallel processing:
- Shared nothing systems provide for incremental growth.
- System growth is practically unlimited.
- MPPs (massively parallel processing systems) are good for read-only databases and decision-support applications.
- Failure is local: if one node fails, the others stay up.

Disadvantages for parallel processing:
- More coordination is required.
- More overhead is required when a process must work on a disk belonging to another node.
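The following is a minimal sketch of shared-nothing routing, assuming hash partitioning of a key across the nodes' private disks; the Node and Coordinator classes are illustrative, not the API of any real MPP system.

```python
# A minimal sketch of shared-nothing data ownership and query routing.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_disk = {}                  # only this node touches this disk

    def insert(self, key, row):
        self.local_disk[key] = row

    def get(self, key):
        # Any other node needing this row must ask the owner, which is the
        # extra coordination overhead listed among the disadvantages.
        return self.local_disk.get(key)

class Coordinator:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def owner(self, key):
        return self.nodes[hash(key) % len(self.nodes)]

    def insert(self, key, row):
        self.owner(key).insert(key, row)      # route to the owning node only

    def lookup(self, key):
        return self.owner(key).get(key)

cluster = Coordinator(num_nodes=4)
for cid in range(10):
    cluster.insert(cid, {"customer_id": cid, "region": "south" if cid % 2 else "north"})
print(cluster.lookup(7))                      # served entirely by one node
```

A point lookup touches only the owning node, while anything spanning partitions requires cross-node coordination, which is the overhead noted above.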