VM Live Migration’s Impacts on the Running Applications

1. Will the IP address change after migration?

Both types of live migration exist: migration that changes the VM’s IP address and migration that preserves it [5].

  • According to Google Cloud [1], it can migrate a client’s VM without affecting the customer, which implies the IP address of the VM is not changed in this case.
    • To retain the same IP address, Hyper-V requires the source and destination hosts to be within the same subnet. I think Google Cloud may not have this requirement.
    • I think network virtualization [4] would be able to remove the restrictions on the location of the destination hosts: “Hyper-V Network Virtualization decouples virtual networks for customer virtual machines from the physical network infrastructure.”

2. Will the migration interrupt the Internet service?

This depends on the implementation; different implementations give different answers.

  • According to Google Cloud [1], there will be no service interruption.
    • During the post-migration brownout, the VM executes on the target. The source VM is still present and may provide supporting functionality for the target. For instance, until the network fabric has caught up with the new location of the VM, the source VM provides forwarding services for packets to and from the target VM.
  • According to Hyper-V [2]
    • The migration is not downtime-free, but the interruption is almost immeasurably brief. Usually the longest delay is at the network layer, while the virtual machine’s MAC address is registered on the new physical switch port and its new location is propagated throughout the network.
    • According to [3], to use live migration the VM needs to keep the same IP address across data centers so that clients have continuous access to the virtual machine during and after the migration.

3. How is the network migrated?

The most challenging issue in VM migration is to keep the network working.

In a LAN, different hypervisors use different strategies.

  • Xen
    • It uses ARP to bind the IP address to the new host.
      • The VM broadcasts an unsolicited ARP reply announcing that its IP address has moved to the new host. This may not be allowed for security reasons.
  • VMware
    • VMotion uses VNICs to keep network connections alive.
      • The VNIC is migrated together with the VM. Every VNIC has a unique MAC address in the LAN and is connected to one or more physical NICs.
      • Since the VNIC’s MAC address is independent of the physical network address, network traffic continues as normal across a live migration.
      • Note that due to the restrictions of Ethernet, the source and destination hosts have to be in the same subnet.

In WAN

  • The VM is given a new IP address on the destination host. To keep network connections alive, we can combine an IP tunnel with dynamic DNS: build an IP tunnel between the source and destination IP addresses and use it to forward packets from the source host to the destination host while the migration is in progress. Once the migration is done and the DNS entry has been updated, new connections refer to the new IP address and the tunnel can be torn down. A rough sketch of this sequence is given below.
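
As a toy illustration of that sequence, here is a Python sketch; the addresses, hostname, and helper functions are all invented and do not correspond to any particular hypervisor’s API.

# Toy sketch of a WAN live migration using an IP tunnel plus dynamic DNS.
# The addresses, hostname, and helper functions are all invented; real
# systems would configure tunnels and DNS through their own tooling.

dns = {"vm.example.com": "10.0.0.5"}        # clients resolve the VM via DNS
src_ip, dst_ip = "10.0.0.5", "172.16.0.9"   # old and new VM addresses
tunnel_active = False

def start_migration():
    """Bring up the src -> dst IP tunnel while VM state is being copied."""
    global tunnel_active
    tunnel_active = True

def deliver(packet_dst_ip, payload):
    """Where a packet addressed to the VM ends up during/after migration."""
    if packet_dst_ip == src_ip and tunnel_active:
        return f"forwarded over tunnel to {dst_ip}: {payload}"
    return f"delivered to {packet_dst_ip}: {payload}"

def finish_migration():
    """Update dynamic DNS; in practice the tunnel stays up until cached
    DNS entries expire, then it is torn down."""
    global tunnel_active
    dns["vm.example.com"] = dst_ip
    tunnel_active = False

start_migration()
print(deliver(dns["vm.example.com"], "request during migration"))   # via tunnel
finish_migration()
print(deliver(dns["vm.example.com"], "request after migration"))    # direct to new IP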

Reference
[1] Google Cloud VM live migration
[2] Hyper-V live migration
[3] Live Migration — Implementation considerations
[4] Hyper-V Network Virtualization Overview
[5] Research on virtual machine migration

Transactions and Replications

One Copy Serializability

  • Replicated transactional service
    • Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
  • The effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single copy of the data items
  • Additional complications
    • Failures should be serialized w.r.t. transactions, i.e., any failure observed by a transaction must appear to have happened before the transaction started

Replication Scheme

  • Primary copy
    • All client requests are directed to a single primary server
  • Read one — write all
    • cannot handle network partitions
    • Each write operation sets a write lock to each replica manager
    • Each read sets a read lock at one replica manager
  • Schemes that can handle network partitions
    • Available copies with validation
    • Quorum consensus
    • Virtual Partition

Available Copies Replication

  • Can handle the case when some replica managers are unavailable because they have failed or because of communication failures
  • Reads can be performed by any available replica manager, but writes must be performed by all available replica managers
  • The normal case is like read one/write all
    • As long as the set of available replica managers does not change during a transaction
  • RM failure case
    • One-copy serializability requires that failures and recoveries be serialized w.r.t. transactions
    • This is not achieved when different transactions make conflicting failure observations
    • Examples show that local concurrency control is not enough
    • Additional concurrency control procedure (called local validation) has to be performed to ensure correctness
  • Available copies with local validation assumes no network partition
    • i.e., functioning replica managers can communicate with one another
  • Local validation
    • Before a transaction commits it checks for failures (and recoveries) of replica managers of data items it has accessed

Handling Network Partitions

  • Network partitions separate replica managers into two or more subgroups, such that the members of a subgroup can communicate with one another but members of different subgroups cannot
  • Optimistic approaches
    • Available copies with validation
  • Pessimistic approaches
    • Quorum consensus

Available Copies with Validation

  • Available copies algorithm applied within each partition
    • Maintains availability for read operations
  • When partition is repaired, possibly conflicting transactions in separate partitions are validated
    • The effects of a committed transaction that is now aborted on validation will have to be undone
      • Only feasible for applications where such compensating actions can be taken
  • Validation
    • Version vectors (detect write-write conflicts)
    • Precedence graphs (each partition maintains a log of data items affected by the Read and Write operations of transactions)
    • The log is used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
      • Within each partition, the graph has no cycles
    • If the combined graph (after the partition is repaired) has cycles, validation fails

Quorum Consensus

  • A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
  • Majority voting is one instance of a quorum consensus scheme
    • R + W > total number of votes in the group
    • W > half of the total votes
    • Ensures that each read quorum intersects a write quorum, and any two write quorums intersect (sketched below)
  • Each replica has a version number that is used to detect if the replica is up to date. 
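
A minimal Python sketch of these constraints, assuming each replica is just a record with a version number and one vote; a real system would contact replicas over the network and handle failures.

# Sketch: majority-voting quorum consensus with one vote per replica.

def check_quorum(n_votes, r, w):
    """Valid quorum sizes must satisfy R + W > N and W > N/2."""
    return (r + w > n_votes) and (w > n_votes / 2)

def quorum_write(replicas, w, value):
    """Write to a write quorum of W replicas, tagging a new version number."""
    new_version = max(rep["version"] for rep in replicas[:w]) + 1
    for rep in replicas[:w]:
        rep["version"], rep["value"] = new_version, value

def quorum_read(replicas, r):
    """Read from R replicas and return the value with the highest version."""
    contacted = replicas[:r]                     # any R reachable replicas
    return max(contacted, key=lambda rep: rep["version"])["value"]

replicas = [{"version": 0, "value": None} for _ in range(5)]   # N = 5 votes
assert check_quorum(5, r=2, w=4)     # 2 + 4 > 5 and 4 > 2.5
quorum_write(replicas, w=4, value="x = 1")
print(quorum_read(replicas, r=2))    # the read quorum overlaps the write quorum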

Virtual Partition Scheme

  • Combines available copies and quorum consensus
  • Virtual partition = set of replica managers that have a read and write quorum
  • If a virtual partition can be formed, available copies is used
    • Improve performance of Reads
  • If a failure occurs, and virtual partition changes during a transaction, it is aborted
  • Have to ensure virtual partitions do not overlap.

CAP Conjecture

  • Is it possible to achieve consistency, availability, and partition tolerance?
  • Classic distributed systems: focused on ACID semantics
    • Atomic
    • Consistent
    • Isolated
    • Durable
  • Modern internet systems: focused on BASE
    • Basically Available
    • Soft-state (or scalable)
    • Eventually consistent
  • ACID vs. BASE

Why the Divide

  • What goals might you want for a shared-data system?
    • consistency
    • availability
    • partition tolerance
  • Strong consistency
    • all clients see the same view, even in the presence of updates
  • High availability
    • All clients can find some replica of the data, even in the presence of failures
  • Partition-tolerance
    • The system properties hold even when the system is partitioned. 

CAP Conjecture

  • You can only have two out of these three properties
  • The choice of which feature to discard determines the nature of your system. 

Consistency and Availability

  • Comment
    • Providing transaction semantics requires all nodes to be in contact with each other
  • Example
    • Single-site clustered databases
  • Typical features
    • Two-phase commit
    • Cache invalidation protocol
    • Classic distributed system style

Consistency and Partition Tolerance

  • Comment
    • If one is willing to tolerate system-wide blocking, then can provide consistency even when there are temporary partitions
  • Examples
    • Distributed databases
    • Distributed locking
    • Quorum (majority) protocols
  • Typical features
    • Pessimistic locking
    • Minority partitions unavailable
    • Also common distributed system
      • voting vs. primary replicas

Partition-Tolerance and Availability

  • Comment
    • sacrifice consistency
  • Examples
    • DNS
    • Web cache
    • Coda
    • Bayou
  • Typical features
    • TTLs and lease cache management
    • Optimistic updating with conflict resolution

Techniques

  • Expiration-based caching: AP
  • Quorum/majority algorithm: PC
  • Two phase commit: AC
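
Of the three techniques just listed, expiration-based caching is the simplest to sketch. The toy cache below keeps answering during a partition by serving possibly stale values, which is the availability-over-consistency trade-off; the class and its interface are invented for illustration.

import time

# Toy expiration-based (TTL) cache: it keeps answering during a partition,
# trading consistency (possibly stale data) for availability.

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}                          # key -> (value, expiry time)

    def put(self, key, value):
        self.entries[key] = (value, time.time() + self.ttl)

    def get(self, key, fetch_from_origin):
        value, expiry = self.entries.get(key, (None, 0))
        if time.time() < expiry:
            return value                           # fresh enough: no origin contact
        try:
            fresh = fetch_from_origin(key)         # may fail during a partition
            self.put(key, fresh)
            return fresh
        except ConnectionError:
            return value                           # serve stale data rather than block

cache = TTLCache(ttl_seconds=30)
cache.put("profile:42", {"name": "alice"})
print(cache.get("profile:42", fetch_from_origin=lambda k: {"name": "alice"}))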

Fault Tolerance

Definitions

  • Availability
    • Probability the system operates correctly at any given moment
  • Reliability
    • Ability to run correctly for a long interval of time
  • Safety
    • Failure to operate correctly does not lead to catastrophic failures
  • Maintainability
    • Ability to “easily” repair a failed system

Failure Models

  • Different types of failures

Two-Army Problem

Byzantine Agreement 

  • [Lamport et al. (1982)]
  • Goal
    • Each process learn the true values sent by correct processes
  • Assumption
    • Every message that is sent is delivered correctly
    • The receiver knows who sent the message
    • Message delivery time is bounded
  • Byzantine Agreement Result
    • In a system with m faulty processes, agreement can be achieved only if 2m+1 processes are functioning correctly (a total of 3m+1 processes)
    • Note
      • This result only guarantees that each process receives the true values sent by correct processes; it does not identify which processes are faulty

Byzantine General Problem

  • Phase 1: Generals announce their troop strengths to each other

  • Phase 2: Each general constructs a vector with all the troop strengths it heard

  • Phase 3: Generals send their vectors to each other and take a majority vote on each element (a toy sketch follows)
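
A toy Python run of the three phases with four generals, one of whom is a traitor; the troop strengths and the traitor’s lies are made up for illustration.

from collections import Counter

# Four generals, general 3 is the traitor. Loyal generals' true strengths:
truth = {0: 1000, 1: 2000, 2: 3000}

def announced(sender, receiver):
    """Phase 1: what `sender` tells `receiver` about its own strength."""
    if sender == 3:
        return 4000 + receiver                   # the traitor lies differently to each
    return truth[sender]

# Phase 2: each general builds a vector of everything it heard directly.
vectors = {g: {s: announced(s, g) for s in range(4) if s != g} for g in range(4)}

def relayed_vector(sender, receiver):
    """Phase 3: `sender` forwards its vector; the traitor forwards garbage."""
    if sender == 3:
        return {s: 9999 for s in range(4) if s != 3}
    return vectors[sender]

def decide(general, about):
    """Majority vote over the directly heard value and the relayed ones."""
    values = [vectors[general][about]]
    values += [relayed_vector(other, general)[about]
               for other in range(4) if other not in (general, about)]
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None   # None = no majority

for g in (0, 1, 2):                              # each loyal general decides
    print(g, {s: decide(g, s) for s in range(4) if s != g})

Every loyal general agrees on the loyal generals’ values and marks the traitor’s entry as unknown; with only 2m loyal generals instead of 2m+1, the majority in decide would not exist and agreement would fail.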

Reliable Group Communication

  • Reliable Multicast
    • All nonfaulty processes that do not join/leave during communication receive the message

  • Atomic Multicast
    • All messages are delivered in the same order to all processes

View Delivery

  • A view reflects current membership of group
  • A view is delivered when a membership change occurs and the application is notified of the change
    • Receiving a view is different from delivering a view
      • All members have to agree to the delivery of a view
  • View synchronous group communication
    • The delivery of a new view draws a conceptual line across the system and every message is either delivered on one side or the other of that line


Atomic Multicast

  • All messages are delivered in the same order to “all” processes
  • Group view
    • The set of processes known by the sender when it multicast the message
  • Virtual synchronous multicast
    • A message multicast to a group view G is delivered to all nonfaulty processes in G
      • If sender fails after sending the message, the message may be delivered to no one

Virtual Synchrony Implementation
  • Only stable messages are delivered
  • Stable message
    • A message received by all processes in the message’s group view
  • Assumptions (can be ensured by using TCP)
    • Point-to-point communication is reliable
    • Point-to-point communication ensures FIFO-ordering


Message Ordering

  • Total ordering does not imply causality or FIFO!


Virtualization and Code Migration

Motivation for Code Migration

  • Load sharing in distributed systems
    • Long-running process can be migrated to idle processors
  • Client-server systems
    • Code for data entry shipped to client systems
    • If large quantities of data need to be processed, it is better to ship the data processing component to the client
    • Dynamically configurable client software
      • More flexibility, easier maintenance and upgrade of client software
  • Enterprise and “Desktop Grids”
    • Computationally-intensive tasks shipped to idle PCs around the network

Models for Code Migration

  • A process has three segments
    • Code segment
      • Set of instructions making up the program
    • Execution segment
      • Private data, stack, PC, registers
    • Resource segment
      • Reference to external resources such as files, printers, devices etc
  • Weak vs Strong Mobility
    • Weak mobility
      • Only code segment + initialization data migrated
        • e.g., Java Applets
    • Strong mobility
      • Code segment + Execution segment
  • Sender-initiated vs. receiver-initiated migration
    • Receiver-initiated is much easier considering security

Migration and Local Resources

  • Process-to-resource bindings make code migration difficult
  • Three types of process-to-resource bindings
    • Binding by identifier
      • When a process refers to a resource by its identifier
        • e.g., URL, IP address, local communication endpoint (socket)
    • Binding by value
      • Weaker form of binding when only the value of a resource is needed
        • e.g., when a program relies on standard language libraries
    • Binding by type
      • Weakest form of binding when a process indicates the type of a resource
        • e.g., a printer

Transactions and Concurrency

Transactions

Motivation

  • Provide atomic operations at servers that maintain shared data for clients
  • Provide recoverability from server crashes

Properties (ACID)

  • Atomicity
  • Consistency
  • Isolation
  • Durability

Concurrency Control

Motivation

  • Without concurrency control, we have lost updates, inconsistent retrievals, dirty reads, etc. 
  • Concurrency control schemes are designed to allow two or more transactions to be executed correctly while maintaining serial equivalence
    • Serial Equivalence is correctness criterion
      • Schedule produced by concurrency control scheme should be equivalent to a serial schedule in which transactions are executed one after the other.

Scheme

  • Locking
  • Optimistic concurrency control
  • Time-stamp based concurrency control

Use of Locks in Strict Two-Phase Locking

When an operation accesses an object within a transaction
  • (1) If the object is not already locked, it is locked and the operation proceeds
  • (2) If the object has a conflicting lock set by another transaction, the transaction must wait until it is unlocked
  • (3) If the object has a non-conflicting lock set by another transaction, the lock is shared and the operation proceeds
  • (4) If the object has already been locked in the same transaction, the lock will be promoted if necessary and the operation proceeds
    • When promotion is prevented by a conflicting lock, rule 2 is used
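
A toy lock manager implementing the four rules above; blocking under rule (2) is modelled by raising an exception rather than suspending the caller, and the shared/exclusive lock modes are a simplification.

# Toy lock manager for strict two-phase locking with shared ('read') and
# exclusive ('write') locks. Rule numbers refer to the list above.

class MustWait(Exception):
    """Rule (2): a conflicting lock is held, the transaction must wait."""

class LockManager:
    def __init__(self):
        self.locks = {}                          # object -> (mode, set of holders)

    def acquire(self, tx, obj, mode):
        if obj not in self.locks:                # rule (1): unlocked -> lock it
            self.locks[obj] = (mode, {tx})
            return
        held_mode, holders = self.locks[obj]
        if holders == {tx}:                      # rule (4): same transaction
            if mode == "write" and held_mode == "read":
                self.locks[obj] = ("write", {tx})   # promote read -> write
            return
        if held_mode == "read" and mode == "read":
            holders.add(tx)                      # rule (3): share the read lock
            return
        raise MustWait(f"{tx} must wait for {obj}")  # rule (2): conflict

    def release_all(self, tx):
        """Strictness: all locks are released together at commit/abort."""
        for obj, (mode, holders) in list(self.locks.items()):
            holders.discard(tx)
            if not holders:
                del self.locks[obj]

lm = LockManager()
lm.acquire("T1", "x", "read")
lm.acquire("T2", "x", "read")            # rule (3): shared read lock
try:
    lm.acquire("T1", "x", "write")       # promotion blocked by T2 -> rule (2)
except MustWait as e:
    print(e)
lm.release_all("T2")
lm.acquire("T1", "x", "write")           # now the promotion succeeds (rule 4)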

Strict Two-Phase Locking

Deadlock

Example

Resolution of Deadlock

  • Timeout

Optimistic Concurrency Control

Drawback of locking

  • Overhead of lock maintenance
  • Deadlocks
  • Reduced concurrency

Optimistic Concurrency Control

  • In most applications, the likelihood of conflicting accesses by concurrent transactions is low
  • Transactions proceed as though there are no conflicts
  • Three phases
    • Working Phase
      • Transactions read and write private copies of objects
    • Validation phase
      • Each transaction is assigned a transaction number when it enters the validation phase
    • Update phase

Validation of Transaction

Timestamp Based Concurrency Control

  • Each transaction is assigned a unique timestamp at the moment it starts
    • In distributed transactions, Lamport’s timestamps can be used.
  • Every data item has a timestamp
    • Read timestamp = timestamp of transaction that last read the item
    • Write timestamp = timestamp of transaction that most recently changed an item

Timestamp ordering write rule
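
The heading refers to the standard basic timestamp-ordering rules; below is a minimal sketch of the write rule (and the matching read rule) under that assumption, using the per-item read/write timestamps defined above.

from dataclasses import dataclass

class Abort(Exception):
    """The transaction arrived too late for this item and must be restarted."""

@dataclass
class Item:
    value: object = None
    read_ts: int = 0      # timestamp of the transaction that last read the item
    write_ts: int = 0     # timestamp of the transaction that most recently wrote it

def write(tx_ts, item, value):
    # Write rule: abort if a younger transaction has already read or written
    # the item; otherwise perform the write and advance the write timestamp.
    if tx_ts < item.read_ts or tx_ts < item.write_ts:
        raise Abort(f"write by T{tx_ts} rejected")
    item.value, item.write_ts = value, tx_ts

def read(tx_ts, item):
    # Matching read rule: abort if a younger transaction already wrote the item.
    if tx_ts < item.write_ts:
        raise Abort(f"read by T{tx_ts} rejected")
    item.read_ts = max(item.read_ts, tx_ts)
    return item.value

x = Item()
write(5, x, "a")       # T5 writes x
print(read(7, x))      # T7 reads x -> 'a', read_ts becomes 7
# write(6, x, "b")     # would raise Abort: T6 is older than the last reader T7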

Concurrency Control for Distributed Transactions

  • Locking
    • Distributed deadlocks possible
  • Timestamp ordering
    • Lamport time stamps

The Two-Phase Commit Protocol
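
A minimal coordinator-side sketch of two-phase commit; the Participant class and the synchronous method calls stand in for the VOTE-REQUEST / VOTE-COMMIT / GLOBAL-COMMIT messages and are invented for illustration.

# Sketch of the coordinator side of two-phase commit. A lost reply is
# treated as a VOTE-ABORT; message passing is modelled as method calls.

class Participant:
    def __init__(self, will_commit=True):
        self.will_commit = will_commit
        self.state = "init"
    def prepare(self):                    # phase 1: VOTE-REQUEST -> VOTE-COMMIT/ABORT
        self.state = "ready" if self.will_commit else "abort"
        return self.will_commit
    def commit(self):                     # phase 2: GLOBAL-COMMIT
        self.state = "committed"
    def abort(self):                      # phase 2: GLOBAL-ABORT
        self.state = "aborted"

def two_phase_commit(participants):
    votes = []
    for p in participants:                # phase 1: collect votes
        try:
            votes.append(p.prepare())
        except ConnectionError:
            votes.append(False)           # no reply counts as an abort vote
    if all(votes):                        # phase 2: unanimous -> commit everywhere
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                # otherwise abort everywhere
        p.abort()
    return "aborted"

print(two_phase_commit([Participant(), Participant()]))                   # committed
print(two_phase_commit([Participant(), Participant(will_commit=False)]))  # aborted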

Three Phase Commit

  • Problem with two-phase commit
    • If coordinator crashes, participants cannot reach a decision, stay blocked until coordinator recovers
  • Three-phase commit
    • There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state
    • There is no state in which it is not possible to make a final decision, and from which a transition to COMMIT can be made

Update Propagation in Distributed System

Update Propagation

  • A comparison between push-based and pull-based protocols in the case of a multiple-client, single-server system

Epidemic Protocols

  • Update propagation for systems that only need eventual consistency
  • Randomized approaches based on the theory of epidemics
    • Infective, susceptible and removed servers
  • Anti-entropy propagation model
    • A server P picks another server Q at random, and exchanges updates
    • Three approaches
      • P only pushes updates to Q
      • P only pulls new updates from Q
      • P and Q send updates to each other
    • If there are many infective servers, the pull-based approach is better
    • If there is only one infective server, either approach will eventually propagate all updates
    • Rumor spreading (gossiping) speeds up propagation (see the sketch below)
      • If server P has been updated, it randomly contacts Q and tries to push the update to Q; if Q was already updated by another server, then with some probability (1/k) P loses interest in spreading the update any further
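
A toy simulation of the push-based rumor-spreading variant just described, with loss-of-interest probability 1/k; the server identities and parameters are made up.

import random

# Toy rumor spreading (gossiping): infective servers push the update to
# randomly chosen peers and lose interest with probability 1/k after
# contacting a peer that already has the update.

def gossip(n_servers, k=4, seed=0):
    random.seed(seed)
    updated = {0}                              # server 0 holds the new update
    infective = {0}                            # servers still spreading it
    rounds = 0
    while infective:
        rounds += 1
        for p in list(infective):
            q = random.randrange(n_servers)    # pick a random peer to push to
            if q in updated:
                if random.random() < 1.0 / k:  # lose interest with probability 1/k
                    infective.discard(p)
            else:
                updated.add(q)                 # q is now updated and infective too
                infective.add(q)
    return rounds, len(updated)

rounds, reached = gossip(100)
print(rounds, reached)    # pure rumor spreading may miss some servers;
                          # periodic anti-entropy guarantees eventual delivery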

The Gossip System

  • Guarantees
    • Each client obtains a consistent service over time
      • i.e., replica managers only provide a client with data that reflect the updates the client has observed so far
    • Relaxed consistency between replicas
      • Primarily causal consistency, but support also provided for sequential consistency
      • Choice up to the application designer
  • Procedure
    • Request
      • The front end sends a query or update request to a replica manager that is reachable
    • Update Response
      • The RM replies as soon as it receives the update request
    • Coordination
      • The RM does not process the request until it can meet the required ordering constraints
        • This may involve receiving updates from other replica managers in gossip messages
    • Execution
    • Query Response
      • If the request is a query, the RM replies at this point
    • Agreement
      • The replica managers update each other by exchanging gossip messages, which contain the most recent updates they have received
      • This is done in a lazy fashion
  • Query and update operations in a gossip service
  • Front ends propagate their timestamps whenever clients communicate directly
  • A gossip replica manager, showing its main state components

Bayou: Eventual Consistency

  • If no updates take place for a long time, all replicas will gradually become consistent
  • Domain specific conflict detection and resolution
    • Appropriate for applications like shared calendars

Motivation for eventual consistency

  • Sequential consistency requires that at every point, every replica has a value that could be the result of the globally agreed sequential application of writes
  • This does not require that all replicas agree at all times, just that they always take on the same sequence of values
  • Writes
    • When they arrive, are applied in the same order at all replicas
    • Easily done with timestamps

Conflict Resolution

  • Dealing with inconsistency
    • Every Bayou update contains a dependency check and a merge procedure in addition to the operation’s specification
  • Replication not transparent to application
    • Only the application knows how to resolve conflicts
  • Split of responsibility
    • Replication system
      • Propagate updates
    • Application
      • resolve conflicts
    • Optimistic application of writes requires that writes be “undoable”

Rolling Back Updates

  • Keep log of updates
  • Order by timestamp
  • When a new update comes in, place it in the correct order and reapply the log of updates
  • Need to establish when you can truncate the log
  • Requires old updates to be “committed”, new ones tentative
  • A committed order can be achieved by designating one replica manager as the primary replica manager (a sketch of the log-based scheme follows)
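
A minimal sketch of such a timestamp-ordered update log: state is rebuilt by replaying the log, so an update that arrives late is simply inserted in timestamp order and the later updates are in effect rolled back and reapplied. The ReplayLog class is invented for illustration; real Bayou additionally runs dependency checks and merge procedures.

import bisect

# Toy Bayou-style update log: tentative writes are kept sorted by timestamp
# and the state is rebuilt by replaying the whole log.

class ReplayLog:
    def __init__(self, initial_state):
        self.initial_state = dict(initial_state)
        self.log = []                            # list of (timestamp, key, value)

    def apply(self, timestamp, key, value):
        bisect.insort(self.log, (timestamp, key, value))   # insert in timestamp order

    def current_state(self):
        # Roll forward from the initial state; an update that arrived late is
        # effectively "rolled back over" because the whole log is replayed.
        state = dict(self.initial_state)
        for _, key, value in self.log:
            state[key] = value
        return state

log = ReplayLog({})
log.apply(5, "meeting", "10:00")
log.apply(3, "meeting", "09:00")     # an older update arrives late
print(log.current_state())           # {'meeting': '10:00'}  (timestamp order wins)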


Consistency and Replication (II)

Basic Architecture for Replicated Data Management

System Model

Five phases in performing a request
  • Front end issues the request
    • Either sent to a single replica or multicast to all replica managers
  • Coordination
    • Replica managers coordinate in preparation for the execution of the request
      • i.e., agree on whether the request is to be performed and on the ordering of the request relative to others
  • Execution
  • Agreement
    • Reach consensus on effect of the request, e.g., agree to commit or abort in a transactional system
  • Response

Mechanism for Sequential Consistency

  • Primary-based replication protocols
  • Replicated-write protocols
    • Active replication using multicast communication
    • Quorum-based protocols

Primary-backup Model

  • Front ends only communicate with primary

Procedures

  • Request
    • FE issues a request containing a unique identifier to the primary replica manager
  • Coordination
    • The primary takes each request in the order in which it receives it
  • Execution
    • The primary executes the request and stores the response
  • Agreement
    • If the request is an update, the primary sends the updated state, the response and the unique id to all backups. The backups send an acknowledgement.
  • Response
    • The primary responds to the front end, which hands the response back to the client.

Implementation

  • Implements linearizability if primary is correct, since primary sequences all the operations

Failures

  • If the primary fails, the system retains linearizability if a single backup becomes the new primary and if the new system configuration takes over exactly where the last one left off
    • If the primary fails, it should be replaced by a unique backup
    • The replica managers that survive have to agree upon which operations had been performed when the replacement primary takes over
    • Requirements met if replica managers organized as a group and if primary uses view-synchronous communication to propagate updates to backups.

Replicated-Write Protocols

Active replication

  • Front end multicasts request to each replica using a totally ordered reliable multicast
  • System achieves sequential consistency but not linearizability
    • The total order in which replica managers process requests may not be the same as the real-time order in which clients made requests

Implementing Ordered Multicast

  • Incoming messages are held back in a queue until delivery guarantees can be met
    • The hold-back queue for arriving multicast messages
  • Coordinate all machines needed to determine delivery order
  • FIFO-ordering
    • Easy
    • Use a separate sequence number for each process
  • Total ordering
    • Use a sequencer
    • Distributed algorithm with three phases
  • Causal order
    • Use vector timestamps
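
A minimal sketch of the FIFO case from the list above, using per-sender sequence numbers and a hold-back queue; total and causal ordering would keep the same hold-back structure but change the delivery test. The class is invented for illustration.

# FIFO-ordered delivery with a hold-back queue: a message from sender s with
# sequence number n is delivered only after messages 1..n-1 from s.

class FifoReceiver:
    def __init__(self):
        self.expected = {}                 # sender -> next sequence number expected
        self.hold_back = {}                # (sender, seq) -> held-back message
        self.delivered = []

    def receive(self, sender, seq, msg):
        self.hold_back[(sender, seq)] = msg
        nxt = self.expected.get(sender, 1)
        while (sender, nxt) in self.hold_back:      # deliver the in-order prefix
            self.delivered.append(self.hold_back.pop((sender, nxt)))
            nxt += 1
        self.expected[sender] = nxt

r = FifoReceiver()
r.receive("P1", 2, "b")    # held back: message 1 from P1 has not been delivered yet
r.receive("P1", 1, "a")    # now both are delivered, in FIFO order
print(r.delivered)         # ['a', 'b']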

The ISIS algorithm for total ordering


Causal Ordering using Vector Timestamps
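
A minimal sketch of the usual causal-delivery condition with vector timestamps: a message from process j carrying vector vts is delivered once vts[j] is the next message expected from j and every other entry of vts is already covered locally. The class and the example timestamps are invented for illustration.

# Causal delivery with vector timestamps. Each receiver keeps a vector v;
# a message from process j carrying timestamp vts is delivered when
#   vts[j] == v[j] + 1               (it is the next message expected from j)
#   vts[k] <= v[k] for all k != j    (all causally preceding messages seen)

class CausalReceiver:
    def __init__(self, n_procs):
        self.v = [0] * n_procs
        self.pending = []                      # held-back (sender, vts, msg) triples
        self.delivered = []

    def _deliverable(self, sender, vts):
        return vts[sender] == self.v[sender] + 1 and all(
            vts[k] <= self.v[k] for k in range(len(self.v)) if k != sender)

    def receive(self, sender, vts, msg):
        self.pending.append((sender, vts, msg))
        progress = True
        while progress:                        # deliver whatever has become enabled
            progress = False
            for entry in list(self.pending):
                s, ts, m = entry
                if self._deliverable(s, ts):
                    self.pending.remove(entry)
                    self.delivered.append(m)
                    self.v[s] += 1
                    progress = True

r = CausalReceiver(n_procs=3)
r.receive(1, [1, 1, 0], "reply")       # causally depends on process 0's message
r.receive(0, [1, 0, 0], "question")    # now both can be delivered, in causal order
print(r.delivered)                     # ['question', 'reply']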

Quorum-based Protocols

  • Procedures
    • Assign a number of votes to each replica
    • Let N be the total number of votes
    • Define R = read quorum, W = write quorum
    • R+W > N
    • W > N/2
      • Guarantees that no two writes can happen at the same time
      • Otherwise, the votes gathered for w_1 and w_2 together would exceed N
    • Only one writer at a time can achieve write quorum
    • Every reader sees at least one copy of the most recent write (it takes the one with the most recent version number)
  • Examples

    Scaling

    • None of the protocols for sequential consistency scale
    • To read or write, you have to either
      • Contact a primary copy
      • Use reliable totally ordered multicast
      • Contact over half the replicas
    • All this complexity is to ensure sequential consistency
      • Even the protocols for causal consistency and FIFO consistency are difficult to scale if they use reliable multicast
    • Can we weaken sequential consistency without losing some important features?

    Highly Available Service

    • Emphasis on giving clients access to the service with reasonable response time, even if some results do not conform to sequential consistency
    • Examples
      • Gossip
        • Relaxed consistency
          • Causal update ordering
      • Bayou
        • Eventual consistency
        • Domain-specific conflict detection and resolution
      • Coda (file system)
        • Disconnected operation
        • Use vector timestamp to detect conflicts

    Consistency and Replication (I)

    1. Replication

    • Motivation

      • Performance enhancement
      • Enhanced availability/reliability
      • Fault tolerance
      • Scalability
        • Tradeoff between benefits of replication and work required to keep replica consistent

    • Requirements

      • Consistency
        • Depends upon applications
        • In many applications, we want different clients that make (read/write) requests to different replicas of the same logical item not to obtain different results
      • Replica transparency
        • Desirable for most applications

    2. Consistency Models

    • Consistency model is a contract between processes and a data store
      • If processes follow certain rules, then the store will work correctly
    • Needed for understanding how concurrent reads and writes behave w.r.t. shared data
    • Relevant for shared memory multiprocessors
      • Cache coherence algorithms
    • Shared database, files
      • Independent operations
      • transactions

    Strict Consistency

    • Any read on a data item x returns a value corresponding to the result of the most recent write on x
    • Challenge
      • It requires absolute global time

    Sequential Consistency

    • The result of any execution is the same as if the read and write operation by all processes were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.
    • i.e., all processes see the same interleaving (order) of operations
    • Example

    • Linearizability
      • Definition of sequential consistency says nothing about time
        • There is no reference to the “most recent” write operation
      • Linearizability
        • Weaker than strict consistency, stronger than sequential consistency
        • Operations are assumed to receive a timestamp from a globally available clock that is loosely synchronized
        • The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program. In addition, if ts_op1(x) < ts_op2(y), then operation op1(x) should precede op2(y) in this sequence.

    Causal Consistency

    • Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
    • Example
      • (a) is not causally consistent, since in process P2 the write of a is performed before the write of b, so processes P3 and P4 should also see a before b
      • In (b) there is no causal relationship between a and b

    FIFO consistency

    • Writes done by a single process are seen by all other processes in the order in which they were issued.
    • But writes from different processes may be seen in a different order by different process.

    Weak Consistency

    • Accesses to synchronization variables associated with a data store are sequentially consistent.
    • No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere.
    • No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed.

    Release Consistency

    • Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.
    • Before a release is allowed to be performed, all previous reads and writes by the process must have completed.
    • Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).


    Entry Consistency

    • An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
    • Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
    • After an exclusive mode access to a synchronization variable has been performed, any other process’s next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable’s owner.

    3. A summary of Consistency Models

    4. Weak Consistency Models

    • The weak consistency models that use synchronization variables (release, entry consistency) are mostly relevant to shared-memory multiprocessor systems
      • Also modern CPU with multiple pipelines, out-of-order instruction execution, asynchronous writes, etc.
    • In distributed systems, weak consistency typically refers to weaker consistency models than sequential consistency
      • Causal consistency
        • e.g., used in the Gossip system
      • Optimistic approaches such as those used in Bayou, Coda that use application specific operations to achieve eventual consistency

    5. Eventual Consistency

    • Session Guarantees

      • When clients move around and connect to different replicas, strange things can happen
        • Updates you just made are missing
        • Database goes back in time
      • Design choice
        • Insist on strict consistency
        • Enforce some session guarantees, client-centric consistency

    • Monotonic Reads

      • If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
      • Disallow reads to a database less current than previous read
      • Example error
        • You get a list of email messages, but when you attempt to read one, a “message does not exist” error pops up

    • Monotonic Writes

      • A write operation by a process on a data item x is completed before any successive write operation on x by the same process
      • Writes must follow any previous writes that occurred within their session

    • Read your Writes

      • The effect of a write operation by a process on a data item x will always be seen by a successive read operation on x by the same process.
      • Every read in a session should see all previous writes in that session.
      • Example error
        • Deleted email message re-appear

    • Writes Follow Reads

      • A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than was read.
        • If a write W followed a read R at a server X, then at all other servers
          • If W is in that server’s database, then any writes relevant to R are also available
      • Affects users outside the session
      • Traditional write/read dependencies are preserved at all servers
      • Two guarantees: ordering and propagation
        • Ordering
          • If a read precedes a write in a session, and that read depends on a previous non-session write, then the previous write will never be seen after the second write at any server; it may not be seen at all.
        • Propagation
          • The previous write will actually have propagated to any database where the second write is applied.

    6. Supporting Session Guarantees

    • Responsibility of session manager, not servers
    • Two sets
      • Read set
        • set of writes that are relevant to session reads
      • Write set
        • Set of writes that were performed in the session
    • Update dependencies captured in read-sets and write-sets
    • Causal order of writes
      • Use Lamport clocks
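
    A toy sketch of how a session manager might use the read-set to enforce monotonic reads; the Replica class, write identifiers, and method names are invented for illustration.

    # Toy session manager enforcing Monotonic Reads: a replica may answer a
    # read only if it has already applied every write in the session's read-set.

    class Replica:
        def __init__(self):
            self.applied_writes = set()      # ids of writes this replica has applied
            self.data = {}
        def write(self, write_id, key, value):
            self.data[key] = (value, write_id)
            self.applied_writes.add(write_id)
        def read(self, key):
            value, write_id = self.data[key]
            return value, {write_id}         # value + writes relevant to this read

    class Session:
        def __init__(self):
            self.read_set = set()            # writes relevant to this session's reads
            self.write_set = set()           # writes performed in this session
        def read(self, key, replicas):
            for r in replicas:
                if self.read_set <= r.applied_writes:   # replica is current enough
                    value, relevant = r.read(key)
                    self.read_set |= relevant
                    return value
            raise RuntimeError("no sufficiently up-to-date replica reachable")

    r1, r2 = Replica(), Replica()
    r1.write("w1", "x", 1)                   # the update has reached only r1 so far
    session = Session()
    print(session.read("x", [r1, r2]))       # reads from r1, read_set = {'w1'}
    print(session.read("x", [r2, r1]))       # r2 is too stale -> falls back to r1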

    Naming

    Naming Entities

    • A name in a distributed system is a string of bits or characters that is used to refer to an entity 
    • Type of names
      • Address
        • The name of an access point of an entity
      • Identifiers
        • A name that uniquely identifies an entity
      • Location-independent name
        • A name that is independent from its addresses

    Name Resolution

    • A naming system maintains a name-to-address binding for resources
    • Main problem in naming
      • How to do name resolution in a distributed systems in a scalable way
    • Approaches for name resolution closely tied to naming scheme
    • Flat vs Structure naming
      • In a flat name space, identifiers are random bit strings
      • No information on location embedded in name

    Name Resolution in a Flat Name Space

    • Simple solutions that work in a LAN environment
      • Broadcasting & Multicasting
        • A message containing the identifier of the entity is broadcast
        • Machine with access point for the entity replies with the address of the access point
          • ARP protocol for finding the data-link address of a machine given the IP address
          • When the network grows, multicasting is more efficient than broadcasting
      • Forwarding pointers
        • When an entity moves from A to B, it leaves behind a reference to its new location at B
    • Home-based Approaches
      • Home agent keeps track of current location of mobile entity
    • Hierarchical Approaches
      • Similar to DNS

    What is a DHT

    • Distributed Hash Table
      • key = hash(data)
      • lookup(key) -> IP address
      • send-RPC(IP address, PUT, key, value)
      • Send-RPC(IP address, GET, key)  -> value
    • Chord
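
    A minimal sketch of this put/get interface over a fixed ring of nodes, hashing keys and node names into one identifier space in the style of Chord but without finger tables or RPC; all names are invented for illustration.

    import hashlib

    # Minimal DHT sketch: keys and node ids share one hash space, and a key is
    # stored on the first node whose id is >= hash(key), wrapping around the ring.

    def h(data):
        return int(hashlib.sha1(data.encode()).hexdigest(), 16) % 2**16   # small id space

    class TinyDHT:
        def __init__(self, node_names):
            # node id -> local store; a real DHT has one process per node plus RPC
            self.nodes = {h(name): {} for name in node_names}
            self.ring = sorted(self.nodes)

        def _lookup(self, key_id):
            for node_id in self.ring:        # successor of the key on the ring
                if node_id >= key_id:
                    return node_id
            return self.ring[0]              # wrap around

        def put(self, value):
            key = h(value)                   # key = hash(data)
            self.nodes[self._lookup(key)][key] = value
            return key

        def get(self, key):
            return self.nodes[self._lookup(key)].get(key)

    dht = TinyDHT(["node-a", "node-b", "node-c"])
    k = dht.put("hello world")
    print(dht.get(k))                        # 'hello world'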

    Structure Naming Systems

    • Names are organized into name spaces
      • A name space can be represented as a labeled, directed graph with two types of nodes
        • Leaf nodes and directory nodes
        • Absolute vs relative path names
        • Local names vs global names
      • Name resolution: the process of looking up a name
        • Closure mechanism
          • Knowing where and how to start name resolution
    • Unix File Name Spaces

    Linking and Mounting

    • Symbolic link

    • Mounting remote name spaces

    Implementing Name Spaces

    • Naming service
      • A service that allows users and processes to add, remove, and lookup names
    • Name spaces for large-scale widely distributed systems are typically organized hierarchically
    • Three layers used to implement such distributed name spaces
      • Global layer
        • root node and its children
      • Administrational layer
        • Directory nodes within a single organization
      • Managerial layer
    • Example: DNS name space

    Iterative vs Recursive Resolution

    • Recursive

      • puts a higher performance demand on each name server
      • Cons
        • Too high a load for global-layer name servers
      • Pros
        • Caching is more effective
        • Communication costs may be reduced
      • Example

    • Iterative
      • Example

    Attribute-based Naming

    • An entity is described in terms of (attribute, value) pairs
    • Each entity can have several attributes
    • Attribute-based naming services are called directory services
    • LDAP (Lightweight Directory Access Protocol) is the de facto industry standard
      • Based on OSI X.500 directory service
    • Another example: UDDI for web services

    Client-Server Design Issues

    Threads in Distributed Systems

    • Multithreaded clients
      • A thread can block waiting for a server response while the application/process is not blocked
    • Multithreaded servers
      • Simplified server code as opposed to the finite-state-machine approach that uses non-blocking system calls
      • Can handle multiple client requests in parallel (while making blocking system calls)
      • Improved performance over iterative servers on multiprocessor systems

    NFS Architecture

    Google File System (GFS)

    Semantics of File Sharing