Parallel And Distributed Database II

Welcome to class!

In today’s class, we will be talking more about parallel and distributed databases. Enjoy the class!

Distributed Database

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to users. A distributed database system permits physical data storage across several sites, and each site (or node) is managed by a DBMS that is capable of running independently of the other sites. In such a database, the storage devices are not all attached to a common processing unit such as a CPU; the whole is controlled by a distributed database management system. The data may be stored on multiple computers in the same physical location, or dispersed over a network of interconnected computers. System administrators can distribute collections of data (e.g., in a database) across multiple physical locations. A distributed database can reside on network servers on the internet, on corporate intranets, or on other company networks.

Two processes ensure that the distributed databases remain up to date and current:

Replication: This involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Replication can be complex and time-consuming, depending on the size and number of the distributed databases.
Duplication: This process is less complex. It identifies one database as a master and then duplicates that database. Duplication is normally done at a set time, to ensure that each distributed location has the same data. In the duplication process, users may change only the master database, which ensures that local data will not be overwritten.
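
The difference between the two processes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular DBMS works: each database is modelled as a plain dictionary, and the function names duplicate and replicate, along with the site names, are made up for this example.

```python
import copy

def duplicate(master, sites):
    """Duplication: give every site a full, independent copy of the master."""
    return {site: copy.deepcopy(master) for site in sites}

def replicate(changes, databases):
    """Replication: apply every detected change to every database."""
    for key, value in changes:
        for db in databases:
            db[key] = value

# Duplication: the master is copied wholesale to each location.
master = {"item1": 10, "item2": 5}
copies = duplicate(master, ["site_a", "site_b"])

# Replication: only the identified changes are propagated,
# after which all the databases look the same again.
databases = [master, copies["site_a"], copies["site_b"]]
replicate([("item2", 7), ("item3", 1)], databases)
```

Duplication moves the whole database each time, while replication moves only the changes, which is why replication needs extra logic to detect them.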

A distributed database management system may have to work across heterogeneous database platforms, that is, sites running different database management systems. The following properties are considered desirable:

Distributed Data Independence: Users should be able to ask queries without specifying where the referenced relations, or copies or fragments of the relations, are located.
Distributed Transaction Atomicity: Users should be able to write transactions that access and update data at several sites just as they would write transactions over purely local data.
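
Distributed data independence can be illustrated with a small Python sketch. It assumes a hypothetical catalog that records which sites hold fragments of each relation; the names CATALOG, SITES and query, and the sample rows, are invented for this example and do not come from any real system.

```python
# Hypothetical catalog: relation name -> sites that hold a fragment of it.
CATALOG = {
    "students": ["site_a", "site_b"],
}

# Hypothetical storage: site -> relation -> rows kept at that site.
SITES = {
    "site_a": {"students": [{"id": 1, "name": "Ada"}]},
    "site_b": {"students": [{"id": 2, "name": "Bola"}]},
}

def query(relation, predicate=lambda row: True):
    """Return matching rows; the caller never says where the data lives."""
    rows = []
    for site in CATALOG[relation]:
        rows.extend(r for r in SITES[site][relation] if predicate(r))
    return rows

# The user names only the relation, never a site.
all_students = query("students")
student_two = query("students", lambda r: r["id"] == 2)
```

If a fragment is later moved to another site, only the catalog changes; the calls to query stay the same.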

Types of distributed database

There are two major types of distributed database systems:

Homogeneous distributed database
Heterogeneous distributed database.

Homogeneous distributed database:

The following conditions must be satisfied for a homogeneous database:
The operating system used at each location must be the same or compatible.
The data structures and the database application (or DBMS) used at each location must be the same or compatible.

Heterogeneous distributed database:

The following characteristics describe a heterogeneous database:

Different sites may use different schemas and software.
Different nodes or locations may have different hardware, software and data structures.

Evaluation

Explain what a distributed database is.
List and explain the two major types of distributed database.

Architectures of distributed database systems

The three major distributed DBMS architectures are:

Client-Server
Collaborating Server
Middleware

Client-Server Architecture: In this architecture, the client (front end) does data presentation or processing, while the server (back end) does storage, security and major data processing. The client is responsible for user-interface issues, and servers manage data and execute transactions. A client-server system has one or more client processes and one or more server processes, and a client process can send a query to any one server process. Thus, a client process could run on a personal computer and send queries to a server running on a mainframe.

Client characteristics

Always initiates requests to servers.
Waits for replies.
Receives replies.
Usually connects to a small number of servers at one time.

Server characteristics

Always waits for a request from one of the clients.
Serves client requests, then replies to the clients with the requested data.
May communicate with other servers to serve a client request.
Is the source of the data that clients request on behalf of users.
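
The request and reply pattern described above can be sketched in Python. This is a toy, in-process simulation under stated assumptions: a thread stands in for the server process, queues stand in for the network, and the names server, client_request and the sample data are hypothetical.

```python
import queue
import threading

def server(requests, data):
    """Wait for requests, look up the data, and reply to the client."""
    while True:
        key, reply_box = requests.get()   # block until a client asks
        if key is None:                   # sentinel value: shut down
            break
        reply_box.put(data.get(key))      # reply with the requested data

def client_request(requests, key):
    """Initiate a request, then wait for and return the reply."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((key, reply_box))
    return reply_box.get(timeout=5)

requests = queue.Queue()
worker = threading.Thread(target=server, args=(requests, {"x": 42}))
worker.start()

answer = client_request(requests, "x")    # key the server knows
missing = client_request(requests, "y")   # key the server does not know

requests.put((None, None))                # ask the server to stop
worker.join()
```

The client always speaks first and then waits, while the server never starts a conversation on its own; it only replies.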

Advantages of client-server architecture

Very easy to implement because of its clear separation of functionality and its centralized server.
Allows users to run a graphical user interface.
It enables the roles and responsibilities of a computing system to be distributed among several independent computers known to each other only through the network. It also provides greater ease of maintenance.
Servers can better control access and resources, to guarantee that only those clients with the appropriate permissions may access and change data.
Since data storage is centralized, updates to that data are much easier for administrators.
Many advanced client-server technologies are designed to ensure security, user-friendly interfaces and ease of use.
It works with multiple different clients of different specifications.

Disadvantages of client-server

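The client-server architecture does not permit a single query to span multiple servers, because the client process would have to break the query into subqueries for the different servers and then combine the answers.
A client process that did this would become complex, and distinguishing between clients and servers would become harder.
The capabilities of such a client process would begin to overlap with those of the server.
Network traffic congestion is one of the problems related to the client-server model.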

Collaborating server system: This is a collection of database servers, each capable of running transactions against local data, which cooperatively execute transactions spanning multiple servers. This overcomes the client-server limitation that a single query cannot span multiple servers.
Middleware architecture: All web transactions take place on the servers. The web server is responsible for communicating with the browser while the database server is responsible for storing the required information.

Advantages of distributed databases

Data is stored at many sites, also referred to as nodes.
The processors at nodes are interconnected by a computer network rather than a multiprocessor configuration.
The distributed database is indeed a true database, not a collection of files that can be stored individually at each node.
The overall system has the full functionality of a database management system.
Reliable transactions, due to replication of the database.
Hardware, operating system, network, fragmentation, DBMS, replication and location independence.
Continuous operation, even if some nodes go offline.
Distributed query processing can improve performance.
Easier expansion.
Local autonomy or site autonomy: a department can control the data about itself.
Protection of valuable data: if there is a fire outbreak at one site, the data is not lost, because it is distributed across multiple sites.
Modularity: systems can be modified, added and removed from the distributed database without affecting other systems or modules.
It can be economical: a network of smaller computers may cost less than a single large computer of the same power.

Disadvantages of distributed databases

Data integrity is difficult to maintain.
Distributed data is very complex in nature. For example, extra work must be done to maintain multiple disparate systems, instead of one big one.
It can also raise costs, because a more extensive infrastructure implies extra labour costs.
Lack of standards.
Additional software is needed.
Complexity in database design.
The operating system should support a distributed environment.

Storing data in DDBS

Data storage in a distributed database involves two concepts:

Fragmentation
Replication
Fragmentation: This is the process of splitting a relation into smaller relations, or fragments, and storing the fragments, possibly at different sites. In horizontal fragmentation, each fragment consists of a subset of the rows of the original relation. In vertical fragmentation, each fragment consists of a subset of the columns of the original relation.
Replication: This means that several copies of a relation or relation fragment can be stored. An entire relation can be replicated at one or more sites. Similarly, one or more fragments of a relation can be replicated at other sites. For example, if a relation R is fragmented into R1, R2 and R3, there might be just one copy of R1, whereas R2 is replicated at two other sites and R3 is replicated at all sites.
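
Both kinds of fragmentation, and the placement of replicas, can be sketched in Python. The sketch assumes a relation is simply a list of dictionaries; the employees data, the function names and the site names are all made up for illustration.

```python
employees = [
    {"id": 1, "name": "Ada", "branch": "north", "salary": 500},
    {"id": 2, "name": "Bola", "branch": "south", "salary": 650},
    {"id": 3, "name": "Chidi", "branch": "north", "salary": 700},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a subset of rows)."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, columns):
    """Keep only the named columns of every row (a subset of columns)."""
    return [{c: row[c] for c in columns} for row in rows]

north = horizontal_fragment(employees, lambda r: r["branch"] == "north")
south = horizontal_fragment(employees, lambda r: r["branch"] == "south")

# The key "id" is kept in both vertical fragments so rows can be rejoined.
profile = vertical_fragment(employees, ["id", "name", "branch"])
pay = vertical_fragment(employees, ["id", "salary"])

# Horizontal fragments are recombined with a union of rows.
rebuilt_rows = sorted(north + south, key=lambda r: r["id"])

# Vertical fragments are recombined with a join on the key.
pay_by_id = {r["id"]: r for r in pay}
rebuilt_cols = [{**p, **pay_by_id[p["id"]]} for p in profile]

# Replication: a hypothetical placement where one fragment has a single
# copy and another fragment is stored at every site.
placement = {
    "north": ["site_a"],
    "south": ["site_a", "site_b", "site_c"],
}
```

Keeping the key column in every vertical fragment is a design choice of this sketch; without some shared identifier the original rows could not be reassembled.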

Parallel DBMS versus distributed DBMS

Parallel Database System: This seeks to improve performance through parallelization of various operations, such as data loading, index building and query evaluation.
Distributed Database System: Data is physically stored across several sites, and each site is typically managed by a DBMS capable of running independently of the other sites. The distribution of data is governed by factors such as local ownership and increased availability.
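
The idea of parallelizing query evaluation can be sketched in Python as a partitioned aggregate. This is a rough illustration only: the function names are hypothetical, and threads are used just to show the split, evaluate and combine structure, since threads in standard Python do not speed up CPU-bound work the way the separate processors of a parallel DBMS would.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Evaluate the aggregate over one partition of the data."""
    return sum(partition)

def parallel_sum(values, workers=4):
    """Split the data, aggregate each part concurrently, then combine."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    partitions = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, partitions))

# Roughly equivalent to SELECT SUM(x) evaluated over four partitions.
total = parallel_sum(list(range(1, 101)))
```

The same partition and combine pattern applies to the other operations mentioned above, such as loading data or building an index over each partition.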

System component:

A distributed DBMS consists of many geo-distributed, autonomous sites connected by low-bandwidth links, while a parallel DBMS consists of tightly coupled, non-autonomous nodes connected by high-bandwidth links.

Component role:

Sites in a distributed DBMS can work independently to handle local transactions or work together to handle global transactions, while nodes in a parallel DBMS can only work together to handle global transactions.

Design purposes:

A distributed DBMS is designed for data sharing, local autonomy and high availability, while a parallel DBMS is designed for high performance and high availability.

General evaluation

Explain the distributed database
List and explain two major types of distributed database.
State the two concepts of data storage in a distributed database.

In our next class, we will be talking about Networking. We hope you enjoyed the class.

