Transactions and Replication

One Copy Serializability

  • Replicated transactional service
    • Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
  • The effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single set of data items
  • Additional complications
    • Failures should be serialized w.r.t. transactions, i.e., any failure observed by a transaction must appear to have happened before the transaction started

Replication Scheme

  • Primary copy
    • All client requests are directed to a single primary server
  • Read one / write all
    • Cannot handle network partitions
    • Each write operation sets a write lock at each replica manager
    • Each read operation sets a read lock at one replica manager (see the sketch after this list)
  • Schemes that can handle network partitions
    • Available copies with validation
    • Quorum consensus
    • Virtual Partition
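
A minimal sketch of the read one / write all locking rule, assuming replica manager objects with set_read_lock, set_write_lock, read, and write methods; these names are illustrative and not part of the notes.

    # Read one / write all: a read locks one replica manager, a write locks all of them.
    def read(item, replica_managers):
        rm = next(iter(replica_managers))     # any single RM is enough for a read
        rm.set_read_lock(item)
        return rm.read(item)

    def write(item, value, replica_managers):
        for rm in replica_managers:           # every RM must be write-locked,
            rm.set_write_lock(item)           # so one unreachable RM blocks all writes
        for rm in replica_managers:
            rm.write(item, value)

Because a write must reach every replica manager, the scheme stalls as soon as any replica is unreachable, which is why it cannot handle network partitions.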

Available Copies Replication

  • Can handle the case when some replica managers are unavailable because they have failed or because of a communication failure
  • Reads can be performed by any available replica manager but writes must be performed by all available replica managers
  • The normal case is like read one/write all
    • As long as the set of available replica managers does not change during a transaction
  • RM failure case
    • One-copy serializability requires that failures and recoveries be serialized w.r.t. transactions
    • This is not achieved when different transactions make conflicting failure observations
    • Examples show that local concurrency control alone is not enough
    • Additional concurrency control procedure (called local validation) has to be performed to ensure correctness
  • Available copies with local validation assumes no network partition
    • i.e., functioning replica managers can communicate with one another.
  • Local validation
    • Before a transaction commits, it checks for failures (and recoveries) of the replica managers of the data items it has accessed
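
A rough sketch of local validation, assuming the transaction recorded which replica managers it found available and which it found failed for the items it accessed; the function and set names are illustrative.

    # Local validation: commit only if the failure/recovery picture is unchanged.
    def local_validation(observed_available, observed_failed, currently_available):
        if not observed_available <= currently_available:
            return False   # an RM the transaction accessed has since failed: abort
        if observed_failed & currently_available:
            return False   # an RM treated as failed has since recovered: abort
        return True        # all observed failures can be serialized before the transaction

If the check passes, every failure the transaction observed can be placed before its start in the serial order, which is exactly the requirement stated above.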

Handling Network Partitions

  • Network partitions separate replica managers into two or more subgroups, such that the members of a subgroup can communicate with one another but members of different subgroups cannot
  • Optimistic approaches
    • Available copies with validation
  • Pessimistic approaches
    • Quorum consensus

Available Copies with Validation

  • Available copies algorithm applied within each partition
    • Maintains availability for read operations
  • When partition is repaired, possibly conflicting transactions in separate partitions are validated
      • The effects of a committed transaction that is now aborted on validation will have to be undone
      • Only feasible for applications where such compensating actions can be taken
  • Validation
    • Version vectors (to detect write-write conflicts; a small sketch follows this list)
    • Precedence graphs (each partition maintains a log of the data items affected by the Read and Write operations of transactions)
    • The log is used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
      • The graph for each individual partition contains no cycles
    • If the combined graph across partitions contains a cycle, validation fails.
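
A small sketch of how version vectors detect write-write conflicts when a partition is repaired; each entry counts the updates an item received in one partition, and the dictionary layout and names are assumptions made for illustration.

    # Two version vectors that are incomparable mean both partitions wrote the item.
    def compare(v1, v2):
        keys = set(v1) | set(v2)
        le = all(v1.get(k, 0) <= v2.get(k, 0) for k in keys)
        ge = all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)
        if le and ge:
            return "equal"
        if le:
            return "v1 <= v2"   # the second copy strictly dominates: keep it
        if ge:
            return "v2 <= v1"
        return "conflict"       # independent writes in both partitions

    compare({"P1": 2, "P2": 0}, {"P1": 1, "P2": 3})   # -> "conflict"

Version vectors only catch write-write conflicts; the precedence graphs described above are what detect read-write conflicts between transactions in different partitions.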

Quorum Consensus

  • A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
  • Majority voting is one instance of a quorum consensus scheme
    • R + W > total number of votes in the group
    • W > half of the total votes
    • This ensures that each read quorum intersects a write quorum, and any two write quorums intersect
  • Each replica has a version number that is used to detect whether the replica is up to date.
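
A minimal sketch of majority-voting quorum consensus with version numbers, assuming each replica manager exposes get(), returning a (version, value) pair, and put(version, value); the method names are illustrative.

    # Choose R and W so that R + W > N and W > N/2, e.g. N = 5, R = 3, W = 3.
    def quorum_read(replicas, R):
        responses = [rm.get() for rm in replicas[:R]]   # (version, value) pairs
        return max(responses, key=lambda vv: vv[0])     # highest version is up to date

    def quorum_write(replicas, W, value):
        version, _ = quorum_read(replicas, W)           # learn the latest version first
        for rm in replicas[:W]:
            rm.put(version + 1, value)                  # install the new version

Because every read quorum overlaps every write quorum, at least one replica contacted by a read holds the latest version number, which is how out-of-date replicas are detected.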

Virtual Partition Scheme

  • Combines available copies and quorum consensus
  • Virtual partition = a set of replica managers that has both a read quorum and a write quorum
  • If a virtual partition can be formed, available copies is used
    • Improves the performance of reads
  • If a failure occurs and the virtual partition changes during a transaction, the transaction is aborted
  • Have to ensure virtual partitions do not overlap.
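
A brief sketch of the virtual partition rule: a transaction proceeds only if the reachable replica managers form both quorums, and it aborts if the membership changes before it commits; the names and membership checks are illustrative assumptions.

    # A virtual partition must contain a read quorum and a write quorum.
    def can_form_virtual_partition(reachable, R, W):
        return len(reachable) >= R and len(reachable) >= W

    def run_in_virtual_partition(txn, members_at_start, members_at_commit, R, W):
        if not can_form_virtual_partition(members_at_start, R, W):
            raise RuntimeError("no virtual partition: cannot execute")
        txn.run(members_at_start)                    # available copies inside the partition
        if members_at_commit != members_at_start:    # membership changed mid-transaction
            txn.abort()
        else:
            txn.commit()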

CAP Conjecture

  • Is it possible to achieve consistency, availability, and partition tolerance?
  • Classic distributed systems: focused on ACID semantics
    • Atomic
    • Consistent
    • Isolated
    • Durable
  • Modern internet systems: focused on BASE
    • Basically Available
    • Soft-state (or scalable)
    • Eventually consistent
  • ACID vs. BASE

Why the Divide

  • What goals might you want for a shared-data system?
    • consistency
    • availability
    • partition tolerance
  • Strong consistency
    • all clients see the same view, even in the presence of updates
  • High availability
    • All clients can find some replica of the data, even in the presence of failures
  • Partition-tolerance
    • The system properties hold even when the system is partitioned. 

CAP Conjecture

  • You can only have two out of these three properties
  • The choice of which feature to discard determines the nature of your system. 

Consistency and Availability

  • Comment
    • Providing transaction semantics requires all nodes to be in contact with each other
  • Example
    • Single-site clustered databases
  • Typical features
    • Two-phase commit
    • Cache invalidation protocol
    • Classic distributed system style

Consistency and Partition Tolerance

  • Comment
    • If one is willing to tolerate system-wide blocking, then can provide consistency even when there are temporary partitions
  • Examples
    • Distributed databases
    • Distributed locking
    • Quorum (majority) protocols
  • Typical features
    • Pessimistic locking
    • Minority partitions unavailable
      • Also a common distributed-system style
        • voting vs. primary replicas

Partition-Tolerance and Availability

  • Comment
    • sacrifice consistency
  • Examples
    • DNS
    • Web cache
    • Coda
    • Bayou
  • Typical features
    • TTLs and lease-based cache management
    • Optimistic updating with conflict resolution

Techniques

  • Expiration-based caching: AP
  • Quorum/majority algorithm: PC
  • Two-phase commit: AC
