A formal model for atomic commit protocols for a distributed database system is introduced. The model is used to prove existence results about resilient protocols for site failures that do not partition the network and then for partitioned networks. For site failures, a pessimistic recovery technique, called independent recovery, is introduced and the class of failures for which resilient protocols exist is identified. For partitioned networks, two cases are studied: the pessimistic case in which messages are lost, and the optimistic case in which no messages are lost. In all cases, fundamental limitations on the resiliency of protocols are derived.
Index Terms-Commit protocols, crash recovery, distributed database systems, distributed systems, fault tolerance, transaction management.
A formal model for transaction processing in a distributed database and then extend it to model several classes of failures and crash recovery techniques. These models are used to study whether or not resilient protocols exist for various failure classes.
Crash recovery in distributed systems has been studied extensively in the literature Many protocols have been designed that are resilient in some environments. All have an "ad hoc" flavor to them in the sense that the class of failures they will survive is not clearly delineated.
The purpose of this project is to formalize the crash recovery problem in a distributed database environment and then give some preliminary results concerning the existence of resilient protocols in various well-defined situations.
Consequently, in the next section we give a brief introduction to transactions in a distributed database.
In Section III we indicate the assumed network environment and our model for transaction processing. In Section IV we extend the model to include the possibility of site failure and give results concerning the existence of resilient protocols in this situation. Section V turns to the possibility of network failure and shows the class of failures for which a resilient protocol exists. Section VI summarizes the results in the previous two sections.
1.2 BACKGROUND OF THE STUDY
A distributed database management system supports a database physically distributed over multiple sites interconnected by a communications network. By definition, a transaction in a distributed database system is a (logically) atomic operation: it must be processed at all sites or at none of them. De-signing protocols resilient to various failures, including arbitrary site failures and partitioning of the communications network, is a very difficult task.
Preserving transaction atomicity in the single-site case is a well-understood problem. The processing of a single transaction is viewed as follows. At some time during its execution, a "commit point" is reached where the site decides to commit or to abort the transaction. A commit is an uncon¬ditional guarantee to execute the transaction to completion, even in the event of multiple failures. Similarly, an abort is an unconditional guarantee to back out the transaction so that none of its effects persist. If a failure occurs before the commit point is reached, then immediately upon recovery the site will abort the transaction. Both commit and abort are irreversible.In the multiple site case, it is the task of a commit protocol to enforce global atomicity. Assuming that each site has a recovery strategy that provides atomicity at the local level, the problem becomes one of ensuring that the sites either unanimously abort or unanimously commit the transaction. A mixed decision results in an inconsistent database. In the absence of failures, a unanimous consensus is easily obtained by a simple protocol. The challenge then is to find protocols ensuring atomicity in the presence of inopportune and perhaps repetitive failures.
A basic assumption within this project is that during the initial phase of distributed transaction processing any participating site can unilaterally abort the transaction. A site may choose to abort for any of the following reasons:
1) One or more sites fail,
2) The network fails,
3) The transaction deadlocks with another transaction,
4) The user aborts the transaction.
Clearly, before any site can commit the transaction, all sites must relinquish their right to unilaterally abort it. Once a site has relinquished that right, it can abort the transaction only in concordance with the other sites.
1.2 PURPOSE OF THE STUDY
To design and implement an optimized commit protocol (1 PC and 2 PC) for transactions in
distributed Database Management Systems which ensure atomicity and durability, in particularfor handheld devices.
1.4 OBJECTIVE OF THE STUDY
1. The protocol development involves algorithms for the coordinator and participating processes within Transaction Manager to provide global atomicity.
2. It guarantees uniform execution of transaction ensuring that all participating sites commit or all abort even when subject to network failures, coordinating and participating site failures.
3. The distributed computer system ensures reliability by providing redundant resources on different nodes. It leads to problems like lack of global state information, the possibility of partial failure and performing parallel operations. Hence, to maintain consistency an atomic transaction model is required.
4. In D-DBMS, database may be replicated or fragmented on multiple sites. A single transaction may involve the modifications in the database at multiple sites. For maintaining the consistency, more specifically, guarantying the ACID properties it is necessary for the transaction manager to ensure that the transaction gets successfully completed at all the sites participating in the transaction or on none of them. The Commit Protocol is to be designed to handle all these issues.
5. This protocol is intended to be applicable in DBMS for handheld devices, e.g. Simputer. It has to handle the issues common to the mobile environment like frequent disconnections, high communication prices etc.
6. Both the protocol 2PC and 1PC, have been designed and implemented
DEFINITIONS OF TERM
Coordinating site: The site that initiates the transaction.
Participating sites: The sites at which the transaction gets executed.
Deferred constraints: The constraints checks that are validated only at the time of the commit.
Voting phase: The phase in which the coordinating site communicate.