A distributed system is a collection of independent computers that appears to its users as a single coherent system
Lamport
You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done.
Distributed Applications
Applications that consist of a set of processes that are distributed across a network of machines and work together as an ensemble to solve a common problem.
Types of Distributed Systems
Distributed computing systems
Cluster computing: homogeneous
Grid computing: heterogenous via virtual organizations
Cloud computing: everything as a service
Distributed Information Systems
Transaction processing system (Transaction RPC)
Enterprise Information integration
Publish/Subscribe systems (message oriented v.s. RPC/RMI)
Distributed Pervasive Systems
Home systems
Healthy care systems
Sensor networks
Characteristic properties of transactions (ACID)
Atomic
To the outside world, the transaction happends individually
Consistent
The transaction does not violate system invariants
Isolated
Concurrent transactions do not interfere with each other.
Durable
Once a transaction commits, the changes are permanent.
Goal/Beneftis
Resource sharing
Distribution transparency
Scalability
Fault tolerance and availability
Performance
Parallel computing can be considered a subset of distributed computing.
Challenges
Heterogeneity
Need for “openness”
Open standards: key interfaces in software and communication protocols need to be standardized
Often defined in Interface Definition Language (IDL)
Security
Denial of service attacks
Scalability
size
geographically
administratively
Transparency
Failure handling
Quality of service
Scalability
Factors
Size
Concerning centralized services/data/algorithm
Geographically
Synchronous communication in LAN vs. asynchronous communication in WAN
Administratively
Policy conflicts from different organizations (e.g., for security, access control)
Scalability techniques
Hiding communication latency
Asynchronous communication
Code migration (to client)
Distribution
Splitting a large component to parts (e.g., DNS)
Replication
Caching (decision of clients vs. servicers)
On demand (pull) vs. planned (push)
Communication
Communication Paradigms
Interprocess communication
Socket programming, message passing, etc.
Remote invocation
Request/Reply
RPC/RMI
Indirect communication
Group communication
Publisher-subscriber
Message queues
Tuple spaces
Communication Patterns
Client-servier
Group-oriented/Peer-to-Peer
Applications that require reliability, scalability
Distributed Software
Middleware handles heterogeneity
High-level support
Make distributed nature of application transparent to the user/programmmer
Remote procedure callls
RPC + Object orientation = CORBA
Higher-level support BUT expose remote objects, partial failure, etc. to the programmer
Scalability
Fundamental/Abstract Models
Interaction Model
Reflects the assumptions about the progresses and the communication channels in the distributed system.
Failure Model
Distinguish between the types of failures of the processes and the communication channels.
Security Model
Assumptions about the principals and the adversary
Interaction Models
Synchronous Distributed Systems
A system in which the following bounds are defined
The time to execute each step of a process has an upper and lower bound
Each message transmitted over a channel is received within a known bounded delay.
Each process has a local clock whose drift rate from real times has a known bound.
Asynchronous distributed system
Each step of a process can take an arbitrary time
Message delivery time is arbitrary
Clock drift rate are arbirary
Some implications
In a synchronous system, timeout can be used to detect failures
While in asynchrous system, it is impossible to detect failures or “reach aggrement”.