Parallel and Distributed Database

 

Welcome to class! 

In today’s class, we will be talking about parallel and distributed database. Enjoy the class!

Parallel database:

A parallel database improves processing and input/output speeds by using multiple CPUs and disks in parallel. A parallel database system seeks to improve performance through parallelization of various operations, such as loading data, building indexes and evaluating queries. In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed one after another.
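
As a rough illustration of the difference between serial and parallel processing, the sketch below adds up a list of numbers in two ways: one value after another, and by splitting the list into chunks that are handed to a pool of workers. The names `serial_sum` and `parallel_sum` and the choice of four workers are assumptions made for this example only. A real parallel database spreads work over CPUs and disks, not Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

def serial_sum(values):
    """Serial processing: visit every value one after another."""
    total = 0
    for v in values:
        total += v
    return total

def parallel_sum(values, workers=4):
    """Parallel-style processing: split the data into chunks and
    hand each chunk to a worker in a pool, then combine the results."""
    chunk_size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(serial_sum, chunks))
    return sum(partial_sums)

records = list(range(1, 1001))   # hypothetical data: the numbers 1 to 1000
print(serial_sum(records))       # 500500
print(parallel_sum(records))     # 500500
```

Both functions give the same answer; only the way the work is organized differs. Because of Python's global interpreter lock, the threads here show the structure of the computation and are not expected to make it faster.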

Organizations of every size benefit from databases because they improve the management of information. A database system has a server, a specialized program that handles all user requests. Organizations use the parallel database approach when they have a large user base and millions of records to process, because parallel databases are fast, flexible and reliable.

Architectures for parallel databases

There are three main architectures for building a parallel DBMS:

  • Shared memory system
  • Shared disk system
  • Shared nothing system

  1. Shared memory system: Multiple processors are attached to an interconnection network and access a common region of memory.

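Advantages

  1. It is closer to a conventional machine and easy to program.
  2. Communication overhead is low, because processors exchange data through the shared memory.
  3. Operating system services are leveraged to utilize the additional CPUs.
  4. It is less sensitive to how the data is partitioned, since every processor can reach all of the data.

Disadvantages

  1. The shared memory and the interconnection network become a bottleneck as more processors are added.
  2. It is expensive to build.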
 
  2. Shared disk system: Each processor has its own main memory and direct access to all disks through an interconnection network.

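Advantages

  1. The same as for the shared memory system.
  2. It is less sensitive to how the data is partitioned, since every processor can access every disk.

Disadvantages

  1. There is more interference, because processors contend for access to the shared disks.
  2. It increases the demand for network (N/W) bandwidth, since all disk access goes over the interconnection network.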
 
  3. Shared nothing system: Each processor has its own local main memory and disk space. No two processors can access the same storage area, and all communication between processors is through a network connection.

Advantages

  1. It provides linear scale-up and linear speedup.
  2. It benefits from “good” partitioning of the data.
  3. It is cheap to build.

Disadvantages

  1. It is hard to program.
  2. Adding new nodes requires reorganizing (repartitioning) the data.
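
The terms linear speedup and linear scale-up, listed among the advantages of shared nothing, can be made concrete with a small calculation. The sketch below uses the usual definitions: speedup compares the time for the same job on a small system and on a larger one, while scale-up compares a small job on a small system with a proportionally larger job on a proportionally larger system. The function names and all timings are hypothetical values chosen for illustration.

```python
def speedup(time_small_system, time_large_system):
    """Same job, more hardware. With N times the hardware,
    linear speedup means the result is N."""
    return time_small_system / time_large_system

def scaleup(time_small_job_small_system, time_large_job_large_system):
    """N times the data on N times the hardware.
    Linear scale-up means the result stays at 1."""
    return time_small_job_small_system / time_large_job_large_system

# Hypothetical timings in seconds.
# A query takes 120 s on 1 node and 30 s on 4 nodes.
print(speedup(120.0, 30.0))   # 4.0, linear speedup for 4 nodes

# 1 million records on 1 node take 60 s;
# 4 million records on 4 nodes also take 60 s.
print(scaleup(60.0, 60.0))    # 1.0, linear scale-up
```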

Evaluation

  1. Define parallel database.
  2. Enumerate the three architectures for a parallel database.

Parallel query evaluation

A relational query execution plan is a graph (or tree) of relational algebra operators, and the operators in the graph can be executed in parallel. If one operator consumes the output of a second operator while that operator is still producing it, we have pipelined parallelism.
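
Pipelined parallelism can be sketched with two operators that run at the same time and are joined by a small queue: a scan operator that produces rows and a select operator that filters each row as soon as it arrives. This is a minimal sketch using Python threads; the operator names `scan` and `select`, the `DONE` marker and the student data are all assumptions made for the example.

```python
import queue
import threading

DONE = object()  # marker that tells the consumer there are no more rows

def scan(rows, out_q):
    """Producer operator: emits rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)

def select(in_q, predicate, results):
    """Consumer operator: filters each row as soon as it arrives,
    without waiting for the scan to finish."""
    while True:
        row = in_q.get()
        if row is DONE:
            break
        if predicate(row):
            results.append(row)

# Hypothetical relation of (name, score) tuples.
students = [("Ada", 72), ("Bola", 45), ("Chidi", 88), ("Dayo", 51)]
pipe = queue.Queue(maxsize=2)   # small buffer between the two operators
passed = []

producer = threading.Thread(target=scan, args=(students, pipe))
consumer = threading.Thread(target=select, args=(pipe, lambda r: r[1] >= 50, passed))
producer.start()
consumer.start()
producer.join()
consumer.join()

print(passed)   # [('Ada', 72), ('Chidi', 88), ('Dayo', 51)]
```

The queue holds at most two rows, so the select operator has to start consuming before the scan has finished producing.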

Data partitioning: A large database is partitioned horizontally across several disks. This lets us exploit the I/O bandwidth of the disks by reading and writing them in parallel. Tuples can be assigned to processors in the following ways:

  1. Round-robin partitioning: If there are n processors, the i-th tuple is assigned to processor i mod n. Round-robin partitioning is suitable for efficiently evaluating queries that access the entire relation. If only a subset of the tuples is required, hash partitioning and range partitioning are better than round-robin partitioning.
  2. Hash partitioning: A hash function is applied to (selected fields of) a tuple to determine its processor. Hash partitioning has the additional virtue that it keeps data evenly distributed even if the data grows and shrinks over time.
  3. Range partitioning: Tuples are sorted, and ranges are chosen for the sort key values so that each range contains roughly the same number of tuples. Tuples in range i are assigned to processor i. Range partitioning can lead to data skew, that is, partitions with widely varying numbers of tuples.
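
The three partitioning methods above can be sketched as three small functions, each returning one list of tuples per processor. This is an illustrative sketch only: the function names, the `toy_hash` function, the range boundaries and the sample records are assumptions, and processors are numbered from 0.

```python
from bisect import bisect_right

def round_robin_partition(tuples, n):
    """The i-th tuple goes to processor i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def toy_hash(value):
    """A deliberately simple hash function, for illustration only."""
    return value * 31 + 7

def hash_partition(tuples, n, key):
    """A hash of the chosen field decides the processor."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[toy_hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, boundaries, key):
    """boundaries is a sorted list of split points. Values below the
    first boundary go to processor 0, and so on; a value equal to a
    boundary goes to the higher-numbered processor."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_right(boundaries, key(t))].append(t)
    return parts

# Hypothetical relation of (id, name, score) tuples.
records = [(1, "Ada", 72), (2, "Bola", 45), (3, "Chidi", 88),
           (4, "Dayo", 51), (5, "Efe", 95), (6, "Femi", 30)]

by_round_robin = round_robin_partition(records, 3)
by_hash = hash_partition(records, 3, key=lambda t: t[0])
by_range = range_partition(records, [50, 80], key=lambda t: t[2])

for name, parts in [("round-robin", by_round_robin),
                    ("hash", by_hash),
                    ("range", by_range)]:
    print(name, [[t[1] for t in p] for p in parts])
```

With range partitioning, a query for scores of 80 and above only has to visit the last processor, which is why it suits queries that need a subset of the tuples. Badly chosen boundaries would leave some processors with far more tuples than others, which is the data skew mentioned above.
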

Advantages of parallel databases

A parallel database runs on many processors, and often many computers, at the same time. This gives it the following advantages:

  1. High performance
  2. Speed
  3. Reliability
  4. Capacity

Disadvantages of parallel databases

  1. Implementation is highly expensive.
  2. Managing a parallel database is difficult and complex.
  3. A lot of resources are needed to support and maintain the database.

General evaluation

  1. Define parallel query evaluation.
  2. State three methods by which data can be partitioned.
  3. What are the advantages and disadvantages of a parallel database?

Reading assignment

Understanding Data Processing for senior secondary schools by Dinehin Victoria pages 269 – 271

Weekend assignment

  1. A ……….. system seeks to improve performance through parallelization of various operations. (a) parallel database (b) distributed database (c) relational database (d) flat database
  2. The architecture where multiple processors are attached to an interconnection network and access a common region of memory is called ………. (a) shared memory (b) shared disk (c) shared nothing (d) all of the above
  3. In ………. partitioning, tuples are sorted and ranges are chosen for the sort key values. (a) round-robin (b) hash (c) range (d) table
  4. ……….. partitioning is suitable for efficiently evaluating queries that access the entire relation. (a) range (b) round-robin (c) hash (d) query
  5. The following are advantages of a parallel database except (a) implementation is highly expensive (b) speed (c) reliability (d) capacity

Theory

  1. Define parallel database.
  2. Enumerate the three architectures for a parallel database.
  3. State three methods by which data can be partitioned.
  4. What are the advantages and disadvantages of a parallel database?

 

In our next class, we will be talking more about Parallel And Distributed Database.  We hope you enjoyed the class.

