Introduction to Distributed Computing Systems

Lecture 1

Definition

  • Distributed System
    • Tanenbaum
      • A distributed system is a collection of independent computers that appears to its users as a single coherent system
    • Lamport
      • You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done.
  • Distributed Applications
    • Applications that consist of a set of processes that are distributed across a network of machines and work together as an ensemble to solve a common problem.
  • Types of Distributed Systems
    • Distributed computing systems
      • Cluster computing: homogeneous
      • Grid computing: heterogeneous via virtual organizations
      • Cloud computing: everything as a service
    • Distributed Information Systems
      • Transaction processing system (Transaction RPC)
      • Enterprise Information integration
      • Publish/Subscribe systems (message-oriented vs. RPC/RMI)
    • Distributed Pervasive Systems
      • Home systems
      • Health care systems
      • Sensor networks

Characteristic properties of transactions (ACID)

  • Atomic
    • To the outside world, the transaction happens indivisibly
  • Consistent
    • The transaction does not violate system invariants
  • Isolated
    • Concurrent transactions do not interfere with each other.
  • Durable
    • Once a transaction commits, the changes are permanent.
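The ACID properties above can be illustrated with SQLite, whose transactions are ACID. The sketch below is illustrative: the account table and the simulated crash are assumptions, not part of the lecture. If a failure occurs mid-transaction, atomicity guarantees the partial update is rolled back.

```python
import sqlite3

# In-memory database; a transfer between two accounts must be atomic:
# either both updates commit or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        raise RuntimeError("crash mid-transaction")  # simulated failure
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so invariants still hold.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

Durability is the one property an in-memory database cannot show; with a file-backed database the committed state would also survive a restart.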

Goals/Benefits

  • Resource sharing
  • Distribution transparency
  • Scalability
  • Fault tolerance and availability
  • Performance
    • Parallel computing can be considered a subset of distributed computing.

Challenges

  • Heterogeneity
  • Need for “openness”
    • Open standards: key interfaces in software and communication protocols need to be standardized
    • Often defined in Interface Definition Language (IDL)
  • Security
    • Denial of service attacks
  • Scalability
    • size
    • geographically
    • administratively
  • Transparency
  • Failure handling
  • Quality of service

Scalability

  • Factors
    • Size
      • Problems with centralized services/data/algorithms
    • Geographically
      • Synchronous communication in LAN vs. asynchronous communication in WAN
    • Administratively
      • Policy conflicts from different organizations (e.g., for security, access control)
  • Scalability techniques
    • Hiding communication latency
      • Asynchronous communication
      • Code migration (to client)
    • Distribution
      • Splitting a large component into parts (e.g., DNS)
    • Replication
      • Caching (decision of clients vs. servers)
      • On demand (pull) vs. planned (push)
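The on-demand (pull) caching technique above can be sketched as a client-side cache with expiry. This is a minimal sketch under stated assumptions: `fetch_from_server` and the TTL are hypothetical stand-ins for a real remote call and freshness policy.

```python
import time

def fetch_from_server(key):
    """Stand-in for an expensive remote call across the network."""
    return key.upper()

class PullCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, time cached)

    def get(self, key):
        entry = self.store.get(key)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]                 # fresh copy: serve locally
        value = fetch_from_server(key)      # stale or missing: pull on demand
        self.store[key] = (value, time.monotonic())
        return value

cache = PullCache(ttl_seconds=60)
print(cache.get("dns.example"))  # first call pulls from the "server"
print(cache.get("dns.example"))  # second call is served from the cache
```

A push (planned) scheme would instead have the server invalidate or refresh entries proactively, trading extra traffic for better freshness.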

Communication

  • Communication Paradigms

    • Interprocess communication
      • Socket programming, message passing, etc.
    • Remote invocation
      • Request/Reply
      • RPC/RMI
    • Indirect communication
      • Group communication
      • Publisher-subscriber
      • Message queues
      • Tuple spaces
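The key property of indirect communication is that senders and receivers are decoupled: a publisher does not address subscribers directly. A minimal in-process sketch of the publish-subscribe paradigm (the `Broker` class and topic names are illustrative assumptions, not any particular middleware's API):

```python
from collections import defaultdict

class Broker:
    """Decouples publishers from subscribers via named topics."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never learns who (if anyone) receives the message.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("alerts", received.append)
broker.publish("alerts", "disk full")
broker.publish("other", "ignored")  # no subscriber on this topic
print(received)  # ['disk full']
```

In a real system the broker would sit on the network, and message queues or tuple spaces would additionally decouple sender and receiver in time.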

  • Communication Patterns

    • Client-server
    • Group-oriented/Peer-to-Peer
      • Suits applications that require reliability and scalability

Distributed Software

  • Middleware handles heterogeneity
  • High-level support
    • Make distributed nature of application transparent to the user/programmer
      • Remote procedure calls
      • RPC + Object orientation = CORBA
  • Higher-level support still exposes remote objects, partial failure, etc. to the programmer
  • Scalability
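How RPC middleware makes the distributed nature transparent can be sketched with a pair of stubs: the client stub marshals the call, the server stub unmarshals and dispatches it. This is a toy sketch, not a real RPC framework; the JSON wire format, `server_dispatch`, and the `add` procedure are all illustrative assumptions, and a plain function call stands in for the network.

```python
import json

def server_dispatch(request_bytes):
    """Server-side stub: unmarshal the request and invoke the procedure."""
    request = json.loads(request_bytes)
    procedures = {"add": lambda a, b: a + b}  # registered remote procedures
    result = procedures[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode()

class RemoteProxy:
    """Client-side stub: a local object that marshals calls for the wire."""
    def add(self, a, b):
        request = json.dumps({"method": "add", "params": [a, b]}).encode()
        reply = server_dispatch(request)  # would be a network round trip
        return json.loads(reply)["result"]

calc = RemoteProxy()
print(calc.add(2, 3))  # looks like a local call; prints 5
```

The transparency is exactly what middleware like CORBA automates, generating both stubs from an IDL description; partial failure (the round trip never returning) is what the proxy cannot hide.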

Fundamental/Abstract Models

  • Interaction Model
    • Reflects the assumptions about the processes and the communication channels in the distributed system.
  • Failure Model
    • Distinguish between the types of failures of the processes and the communication channels. 
  • Security Model
    • Assumptions about the principals and the adversary

Interaction Models

  • Synchronous Distributed Systems
    • A system in which the following bounds are defined
      • The time to execute each step of a process has an upper and lower bound
      • Each message transmitted over a channel is received within a known bounded delay.
      • Each process has a local clock whose drift rate from real time has a known bound.
  • Asynchronous distributed system
    • Each step of a process can take an arbitrary time
    • Message delivery time is arbitrary
    • Clock drift rates are arbitrary
  • Some implications
    • In a synchronous system, timeout can be used to detect failures
    • While in an asynchronous system, it is impossible to detect failures or “reach agreement”.
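The timeout implication can be sketched directly: if message delay has a known upper bound, a missed deadline safely implies failure. In the sketch below a queue stands in for the channel and a deliberately silent thread plays the crashed process; the 0.1-second bound is an illustrative assumption.

```python
import queue
import threading

channel = queue.Queue()  # stands in for the communication channel

def crashed_server():
    pass  # a failed process: it never sends a reply

threading.Thread(target=crashed_server).start()

try:
    reply = channel.get(timeout=0.1)  # known upper bound on message delay
    status = "alive"
except queue.Empty:
    # Sound only in a synchronous system: silence past the bound means
    # failure. In an asynchronous system the message might just be slow.
    status = "suspected failed"

print(status)  # suspected failed
```

This is why the synchronous/asynchronous distinction matters: the same timeout in an asynchronous system can only *suspect* a failure, never confirm it, which underlies the impossibility of agreement noted above.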
