Basic Architecture for Replica Data Management
System Model
Five phases in performing a request
- Front end issues the request
- Either sent to a single replica or multicast to all replicate managers
- Coordination
- Replica managers coordinate in preparation for the execution of the request
- i.e., agree is requests is to be performed and the ordering of the request relative to others
- Execution
- Agreement
- Reach consensus on effect of the request, e.g., agree to commit or abort in a transactional system
- Response
Mechanism for Sequential Consistency
- Primary-based replication protocols
- Replicated-write protocols
- Active replication using multicast communication
- Quorum-based protocols
Primary-backup Model
- Front ends only communicate with primary
Procedures
- Request
- FE issues a request containing a unique identifier to the primary replica manager
- Coordination
- The primary takes each request in the order in which it receives it
- Execution
- The primary executes the request and store the response
- Agreement
- If the request is an update, the primary sends the updated state, the response and the unique id to all backups. The backups send an acknowledgement.
- Response
- The primary responds to the front end, which hands the response back to the client.
Implementation
- Implements linearizability if primary is correct, since primary sequences all the operations
Failures
- If primary fails, the system retains linearizabilty if a single back becomes the new primary and if the new system configuration takes over exactly where the last left off
- If primary fails, it should be replaced with a unique backup
- Replica managers that survive have to agree upon which operation has been performed when the replacement primayr is over
- Requirements met if replica managers organized as a group and if primary uses view-synchronous communication to propagate updates to backups.
Replicate-Write Protocol
Active replication
- Front end multicasts request to each replica using a totally ordered reliable multicast
- System achieves sequential consistency but not linearizability
- Total order in which replica managers process requests may not be the same as real-time order in which clients made request.
Implementing Ordered Multicast
- Incoming messages are held back in a queue until delivery guarantees can be met
- The hold-back queue for arriving multicast messages
- Coordinate all machines needed to determine delivery order
- FIFO-ordering
- Easy
- Use a separate sequence number for each process
- Total ordering
- Use a sequencer
- Distributed algorithm with three phases
- Causal order
- Use vector timestamps
The ISIS algorithm for total ordering
Causal Ordering using Vector Timestamps
Quorum-based Protocols
- Procedures
- Assign a number of votes to each replica
- Let N be the total number of votes
- Define R = read quorum, W = write quorum
- R+W > N
- W > N/2
- guarantee that no two writes at the same time
- since if yes, than the vote for w_1 and w_2 are larger than N
- Only one writer at a time can achieve write quorum
- Every reader sees at least one copy of the most recent read (takes one with most recent version number)
- Examples
Scaling
- None of the protocols for sequential consistency scale
- To read or write, you have to either
- Contact a primary copy
- Use reliable totally ordered multicast
- Contact over half the replicas
- All this complexity is to ensure sequential consistency
- Even the protocols for causal consistency and FIFO consistency are difficult to scale if they use reliable multicast
- Can we weaken sequential consistency without losing some important features?
Highly Available Service
- Emphasis on giving clients access to the service with reasonable response time, even if some results do not conform to sequential consistency
- Examples
- Gossip
- Relaxed consistency
- Causal update ordering
- Bayou
- Eventual consistency
- Domain-specific conflict detection and resolution
- Coda (file system)
- Disconnected operation
- Use vector timestamp to detect conflicts