Middleware

Lecture 2 — part 5

Middleware

  • Definition
    • Middleware is a set of common business-unaware services that enable applications and end-users to interact with each other across a network
    • Distributed system services that have standard programming interfaces and protocols 
      • Services sit in the middle above OS and network software
      • and below industry-specific applications
  • Examples
    • ftp, email
    • web browsers
    • database drivers and gateways
    • CORBA (Common object request broker architecture)
    • Microsoft .NET
    • Java RMI, JINI, Javaspaces, JMS
    • Web services software — SOAP, REST

Functional View of Middleware

  • Information exchange services
  • Application-specific services
    • Specialized services
      • e.g., transaction services and replication services for distributed databases
      • group services for collaborative applications, specialized services for multimedia applications
    • business-unaware
  • Management and support services
    • needed for locating distributed resources and administering resources across the network

System Architecture — Peer to Peer Computing

Lecture 2 — part 3

Organization of nodes in P2P Systems

  • Centralized directory
    • Original Napster
      • Pros
        • Simple
      • Cons
        • O(N) states
        • single point of failure
  • Unstructured P2P systems
    • Gnutella and its successors (flood queries)
      • Pros
        • Robust
      • Cons
        • Worst case O(n) messages per lookup
  • Structured P2P systems
    • Based upon Distributed Hash Tables (DHTs)
    • Chord, CAN, Tapestry…

Distributed Hash Table (DHT)

  • Distributed Hash Table (see the sketch after this list)
    • key = hash(data)
    • lookup(key) -> IP address
    • send-RPC(IP address, PUT, key, value)
    • send-RPC(IP address, GET, key) -> value
  • Chord
  • Example: BT content distribution
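A minimal, in-process sketch of the put/get interface above (the node addresses, the send_rpc stand-in, and the modulo placement are illustrative assumptions; a real DHT such as Chord routes each lookup in O(log N) hops):

  import hashlib

  # A toy in-process "network": each node is a dict keyed by its address.
  NODES = {"10.0.0.1": {}, "10.0.0.2": {}, "10.0.0.3": {}}
  ADDRS = sorted(NODES)

  def lookup(key):
      # lookup(key) -> address of the node responsible for the key
      h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
      return ADDRS[h % len(ADDRS)]

  def send_rpc(addr, op, key, value=None):
      # Stand-in for a network RPC to the node at addr
      store = NODES[addr]
      if op == "PUT":
          store[key] = value
      elif op == "GET":
          return store.get(key)

  def put(data):
      key = hashlib.sha1(data).hexdigest()   # key = hash(data)
      send_rpc(lookup(key), "PUT", key, data)
      return key

  def get(key):
      return send_rpc(lookup(key), "GET", key)

  k = put(b"some block of a file")
  assert get(k) == b"some block of a file"

Because the key is a hash of the data itself, storage is content-addressed, which is how BitTorrent-style content distribution identifies pieces.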

System Architecture — Centralized architecture

Lecture 2 — part 2

Client-server applications

  • Clients
    • Interacts with users through a user interface
    • Performs application functions
    • Interacts with client middleware using middleware API
    • Receives responses and displays them if needed
  • Servers
    • Implement services
    • Invoked by server middleware
    • Provide error-recovery and failure handling service

Overview

  • Common communication patterns in distributed applications
    • Client-Server
    • Group (multicast)
    • Function-shipping/Applets
  • Client
    • Process that requests services
  • Server
    • Process that provides services
  • Details
    • Client usually blocks until the server responds (see the request/reply sketch after this list)
    • Client is usually invoked by end users when they require services
    • Server usually waits for incoming requests
    • Server can have many clients making concurrent requests
    • Server is usually a program with special privileges
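A minimal request/reply sketch of the client-server pattern above (the host, port, and echo service are placeholders chosen for illustration):

  import socket, threading, time

  HOST, PORT = "127.0.0.1", 5000   # placeholder address for the sketch

  def server():
      # Server: waits for incoming requests and serves clients one at a time.
      with socket.create_server((HOST, PORT)) as srv:
          while True:
              conn, _ = srv.accept()
              with conn:
                  request = conn.recv(1024)          # receive the request
                  conn.sendall(b"echo: " + request)  # send the reply

  def client(message):
      # Client: sends a request and blocks until the server responds.
      with socket.create_connection((HOST, PORT)) as sock:
          sock.sendall(message)
          return sock.recv(1024)

  threading.Thread(target=server, daemon=True).start()
  time.sleep(0.2)                 # give the server time to start listening
  print(client(b"hello"))         # b'echo: hello'

The client blocks in recv() until the reply arrives, matching the "client usually blocks" point above.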

Application Software Architectures

  • Many applications can be considered to be made up of three software components or logical tiers

    • user interface
    • processing layer
    • data layer

  • Client/server architecture

    • Single physical tier, two physical tiers
    • multi-tiered
    • Distributed Data
      • e.g., distributed database
    • Remote data
      • e.g., network file system 
    • Distributed programs
      • e.g., world wide web
    • Remote presentation
      • e.g., telnet
    • Distributed presentation
      • e.g., X Windows

  • Motivation for multi-tier architectures

    • Frees clients from dependencies on the exact implementation of the database
    • It allows business logic to be concentrated in one place
      • software updates are restricted to middle layer
    • Performance improvements possible by batching requests from many clients to database
    • Database and business logic tiers could be implemented by multiple servers for scalability

Distributed Software and System Architectures

Lecture 2

1. Distributed Architectures

  • Software Architecture
    • Logical organization of the collection of software components that make up a distributed application
  • System Architecture
    • Instantiation of a software architecture, i.e., physical placement of software components on computers

2. Architecture Style

  • Layered architectures
    • Pros
      • Division of task
      • Scalability
      • Transparency
      • Portability
    • Cons
      • e.g., layer 1 and layer 3 cannot talk directly
  • Object-based architectures
    • Pros
      • Independent components
      • Free to talk to anyone
  • Data-centered architectures
    • e.g., Google Docs
  • Event-based architectures
    • e.g., Facebook

3. System Architecture

  • Centralized architecture
    • client-server applications
  • Decentralized architecture
    • Peer-to-peer applications
  • Hybrid architecture

Introduction to Distributed Computing Systems

Lecture 1

Definition

  • Distributed System
    • Tanenbaum
      • A distributed system is a collection of independent computers that appears to its users as a single coherent system
    • Lamport
      • You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done.
  • Distributed Applications
    • Applications that consist of a set of processes that are distributed across a network of machines and work together as an ensemble to solve a common problem.
  • Types of Distributed Systems
    • Distributed computing systems
      • Cluster computing: homogeneous
      • Grid computing: heterogeneous via virtual organizations
      • Cloud computing: everything as a service
    • Distributed Information Systems
      • Transaction processing system (Transaction RPC)
      • Enterprise Information integration
      • Publish/Subscribe systems (message-oriented vs. RPC/RMI)
    • Distributed Pervasive Systems
      • Home systems
      • Health care systems
      • Sensor networks

Characteristic properties of transactions (ACID)

  • Atomic
    • To the outside world, the transaction happens indivisibly (see the sketch after this list)
  • Consistent
    • The transaction does not violate system invariants
  • Isolated
    • Concurrent transactions do not interfere with each other.
  • Durable
    • Once a transaction commits, the changes are permanent.
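A minimal sketch of atomicity using Python's sqlite3 (the accounts table and the "no overdraft" invariant are made up for illustration):

  import sqlite3

  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
  db.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
  db.commit()

  def transfer(amount):
      try:
          with db:  # transaction: commits on success, rolls back on exception
              db.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                         (amount,))
              if amount > 100:
                  raise ValueError("insufficient funds")  # invariant violated
              db.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                         (amount,))
      except ValueError:
          pass  # the partial debit was rolled back: the transfer is all-or-nothing

  transfer(150)
  print(db.execute("SELECT * FROM accounts ORDER BY name").fetchall())
  # [('alice', 100), ('bob', 0)] -- balances unchanged, invariants preserved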

Goals/Benefits

  • Resource sharing
  • Distribution transparency
  • Scalability
  • Fault tolerance and availability
  • Performance
    • Parallel computing can be considered a subset of distributed computing.

Challenges

  • Heterogeneity
  • Need for “openness”
    • Open standards: key interfaces in software and communication protocols need to be standardized
    • Often defined in Interface Definition Language (IDL)
  • Security
    • Denial of service attacks
  • Scalability
    • size
    • geographically
    • administratively
  • Transparency
  • Failure handling
  • Quality of service

Scalability

  • Factors
    • Size
      • Concerning centralized services/data/algorithm
    • Geographically
      • Synchronous communication in LAN vs. asynchronous communication in WAN
    • Administratively
      • Policy conflicts from different organizations (e.g., for security, access control)
  • Scalability techniques
    • Hiding communication latency
      • Asynchronous communication
      • Code migration (to client)
    • Distribution
      • Splitting a large component into parts (e.g., DNS)
    • Replication
      • Caching (decided by clients vs. servers; see the sketch after this list)
      • On demand (pull) vs. planned (push)
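A minimal cache-aside sketch of the caching/pull techniques above (fetch_from_server and the TTL value are illustrative assumptions):

  import time

  cache = {}       # local replica of recently used values
  TTL = 5.0        # seconds before a cached entry is considered stale

  def fetch_from_server(key):
      # Stand-in for a remote request with noticeable latency.
      time.sleep(0.1)
      return f"value-of-{key}"

  def get(key):
      entry = cache.get(key)
      if entry is not None and time.time() - entry[1] < TTL:
          return entry[0]                    # cache hit: no network round trip
      value = fetch_from_server(key)         # cache miss: pull on demand
      cache[key] = (value, time.time())
      return value

  get("profile:42")
  get("profile:42")   # second call is served from the local cache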

Communication

  • Communication Paradigms

    • Interprocess communication
      • Socket programming, message passing, etc.
    • Remote invocation
      • Request/Reply
      • RPC/RMI
    • Indirect communication
      • Group communication
      • Publish-subscribe (see the sketch after this list)
      • Message queues
      • Tuple spaces

  • Communication Patterns

    • Client-server
    • Group-oriented/Peer-to-Peer
      • Applications that require reliability, scalability
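A minimal in-process publish-subscribe sketch of the indirect, group-oriented style above (topics and callbacks are illustrative; a real system delivers messages over the network through a broker):

  from collections import defaultdict

  subscribers = defaultdict(list)   # topic -> list of subscriber callbacks

  def subscribe(topic, callback):
      subscribers[topic].append(callback)

  def publish(topic, message):
      # One-to-many (group) delivery; publisher and subscribers are decoupled.
      for callback in subscribers[topic]:
          callback(message)

  subscribe("news", lambda m: print("reader A got:", m))
  subscribe("news", lambda m: print("reader B got:", m))
  publish("news", "new lecture notes posted")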

Distributed Software

  • Middleware handles heterogeneity
  • High-level support
    • Make the distributed nature of the application transparent to the user/programmer
      • Remote procedure calls
      • RPC + Object orientation = CORBA
  • Higher-level support, BUT exposes remote objects, partial failure, etc. to the programmer
  • Scalability

Fundamental/Abstract Models

  • Interaction Model
    • Reflects the assumptions about the processes and the communication channels in the distributed system.
  • Failure Model
    • Distinguish between the types of failures of the processes and the communication channels. 
  • Security Model
    • Assumptions about the principals and the adversary

Interaction Models

  • Synchronous Distributed Systems
    • A system in which the following bounds are defined
      • The time to execute each step of a process has an upper and lower bound
      • Each message transmitted over a channel is received within a known bounded delay.
      • Each process has a local clock whose drift rate from real time has a known bound.
  • Asynchronous distributed system
    • Each step of a process can take an arbitrary time
    • Message delivery time is arbitrary
    • Clock drift rates are arbitrary
  • Some implications
    • In a synchronous system, timeouts can be used to detect failures (see the sketch after this list)
    • In an asynchronous system, it is impossible to reliably detect failures or reach agreement.
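A minimal sketch of timeout-based failure detection, which is only sound under the synchronous bounds above (the host, port, ping protocol, and 2-second bound are placeholders):

  import socket

  def probe(host, port, timeout=2.0):
      # Suspect a failure if the peer does not answer within the assumed bound.
      try:
          with socket.create_connection((host, port), timeout=timeout) as sock:
              sock.sendall(b"ping")
              return sock.recv(4) == b"pong"
      except OSError:          # covers connection errors and timeouts
          return False

  print(probe("10.0.0.99", 7000))   # placeholder peer; likely prints False

In an asynchronous system the same timeout cannot distinguish a crashed peer from a merely slow one.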

    Blob File System vs. GFS/HDFS

    Similarities

    • Master nodes
      • Both GFS and TFS (Taobao File System) use a single master node with multiple slaves.
      • My comment: Facebook may not use such an approach; I think it uses a P2P approach.

    Differences

    • Functions
      • GFS/HDFS are more popular
        • On top of them, we can build table stores such as BigTable, Hypertable, and HBase.
      • Blob File System
        • Usually used for photos and albums (so-called blob data)
    • Challenges
      • Blob FS
        • For each write, the client asks the master node to assign a blob number and a list of machines to write to.
        • Challenge
          • The volume of metadata is huge
            • E.g., Taobao has more than 10 billion photos; assuming each photo has 20 bytes of metadata, the total is 20 B × 10 billion = 200 GB, far more than the memory of a single machine.
        • Solution
          • The metadata is not stored in Blob FS.
          • The metadata is stored in external systems.
            • e.g., Taobao TFS has an ID for each photo; the IDs are stored in external databases, such as Oracle or a MySQL sharding cluster.
          • Blob FS uses chunks to organize data.
            • Every blob file is a logical file.
            • Every chunk is a physical file.
            • Multiple logical files share a physical file, which reduces the number of physical files.
              • All metadata for physical files can be kept in memory, so every read of a blob file needs only one I/O access.
      • HDFS/GFS
        • GFS v2 may be able to combine GFS and Blob FS into one system. This is difficult, since
          • It needs to support both large and small files.
          • The metadata becomes too large, so the master nodes also need to be distributed.

    Google File System (GFS) vs. HDFS

    Overview

    HDFS is a simplified version of GFS.

    Similarities

    • Master and Slaves
      • Both GFS and HDFS use a single-master + multiple-slaves mode.
      • The master node handles checkpointing, data migration, and logging
    • Data blocks and replication
      • Both maintain multiple copies of each data block (usually 3) for better reliability and performance
    • Tree structure
      • Both maintain a tree-structured file system and allow operations like those under a Linux system
        • copy, rename, move, delete, etc.

    Differences

    • File appends
      • GFS 
        • Allows multiple appends and allows multiple clients to append simultaneously
        • If every append had to visit the master node, efficiency would be low.
          • GFS uses a leasing mechanism to delegate write permission for a chunk to a chunk server.
          • The chunk server can then write the chunk within the lease period (e.g., 12 s).
          • Since multiple clients may write simultaneously and the API is asynchronous, the records may end up in a different order. This makes the system design very complicated.
      • HDFS
        • Only allows one writer to open a file and append data
        • The client first writes the data to a local temporary file; when the temporary data reaches the size of a chunk (64 MB), it asks the HDFS master to assign a machine and a chunk number, and then writes the chunk data (see the client-side buffering sketch after this list).
        • Advantage
          • The master will not be a bottleneck, since it is contacted only after 64 MB of data has accumulated.
        • Disadvantage
          • If the client machine goes down during this process, data still in the local temporary file has not reached HDFS, so some data may be lost.
    • Master failure
      • GFS
        • Backup master nodes: when the main master fails, a new master is elected from the backup nodes.
        • Supports snapshots using a copy-on-write approach.
      • HDFS
        • HDFS requires human intervention to recover from master failure.
        • Does not support snapshots.
    • Garbage Collection (GC)
      • GFS
        • Lazy GC. 
        • It marks files as to-be-deleted (e.g., renames the file to a name containing time information), so the files can no longer be accessed by normal users.
        • The master node periodically checks these files and deletes the expired ones (usually files older than 3 days).
      • HDFS
        • HDFS uses a simple, direct delete mechanism.
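    A minimal sketch of the HDFS-style client-side buffering described under "File appends" above (ask_master and write_chunk are assumed placeholders, not real HDFS APIs):

      CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, as described above

      class BufferedAppender:
          # Buffer writes locally; contact the master only when a full chunk is ready.
          def __init__(self, ask_master, write_chunk):
              self.buffer = bytearray()
              self.ask_master = ask_master      # () -> (machine, chunk_id)
              self.write_chunk = write_chunk    # (machine, chunk_id, data) -> None

          def append(self, data):
              self.buffer.extend(data)          # data is at risk until it is flushed
              while len(self.buffer) >= CHUNK_SIZE:
                  machine, chunk_id = self.ask_master()   # one master RPC per 64 MB
                  self.write_chunk(machine, chunk_id, bytes(self.buffer[:CHUNK_SIZE]))
                  del self.buffer[:CHUNK_SIZE]

      appender = BufferedAppender(lambda: ("datanode-1", 0),
                                  lambda m, c, d: print(f"wrote {len(d)} bytes to {m}"))
      appender.append(b"x" * CHUNK_SIZE)        # triggers exactly one master request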

      Facebook Distributed System Optimization via Asynchronization — Big Data Queries

      Asynchronous Queries

      • Asynchronous queries
        • Each query is asynchronous; all functions return a future object
        • Every DB query is divided into two parts
          • Set up the request
          • Receive response
      • Future Object tree
        • Each future object has two states
          • Waiting for execution
          • Finished execution
        • Once the tree structure is constructed, execution starts from the bottom and proceeds toward the root.
          • When the root finishes executing, the page load is complete.
        • Lazy manner
          • The execution process is lazy: the tree is constructed first and only then executed (see the sketch after this list). This is similar to Spark map-reduce, where the functions form a DAG and a node's predecessors are executed only when that node is needed.
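      A minimal sketch of the lazy future tree described above (LazyFuture and the query names are illustrative, not Facebook's actual API):

        class LazyFuture:
            # Two states: waiting for execution / finished execution.
            def __init__(self, fn, *children):
                self.fn, self.children = fn, children
                self.done, self.value = False, None

            def result(self):
                # Nothing runs until result() is called on the root; children are
                # forced first, so execution proceeds from the leaves up to the root.
                if not self.done:
                    inputs = [child.result() for child in self.children]
                    self.value, self.done = self.fn(*inputs), True
                return self.value

        # Stand-ins for asynchronous DB requests (names are made up).
        friends = LazyFuture(lambda: ["ann", "bob"])
        photos = LazyFuture(lambda f: [name + "'s photos" for name in f], friends)
        page = LazyFuture(lambda p: "page with " + ", ".join(p), photos)  # root

        print(page.result())   # the page load completes when the root finishes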

      Memcache

      • The decision of which query should be executed first should not be made during coding.
        • Instead, there should be an extra phase that determines this schedule.
      • Why the order of query execution matters
        • For example, suppose we have two queries: one finds your friends who have bought something on Taobao, and the other finds your friends who have bought a Porsche on Taobao. Intuitively, we would first look up your friends on Taobao and then apply the second condition, like this:
          IdList friends = waitFor(getFriends(myId));
          yield return getTaoBaoBuyers(friends);
          For the Porsche query, however, this order is wrong: very few people on Taobao buy a Porsche (perhaps one or two), while you may have hundreds of friends. So the Porsche query is better executed in this order:
          IdList buyers = waitFor(getPorscheBuyer());
          yield return getFriends(buyers);

      Reference
      [1] http://www.infoq.com/cn/news/2015/04/async-distributed-haiping

      Facebook Cassandra

      Introduction

      Cassandra is an open source distributed database system that is designed for storing and managing large amounts of data across commodity servers. 
      • Provides scalability and high availability without compromising performance
      • Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
      • Supports replication across multiple datacenters and provides low latency for users
      • Offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.

      History

      • Combines Google's BigTable (data model) with Amazon's Dynamo (distributed design)

      Architecture

      • Overview
        • Peer-to-Peer distributed system
        • All nodes play the same role
        • Data partitioned among all nodes in the cluster
        • Custom data replication to ensure fault tolerance
        • Read/Write anywhere design
      • Communication
        • Nodes communicate with each other through the Gossip protocol, which exchanges information across the cluster every second
        • A commit log is used on each node to capture write activity
        • Data durability is assured
        • Data is also written to an in-memory structure (memtable) and then flushed to disk as an SSTable once the memory structure is full (see the write-path sketch after this list)
      • Advantage
        • Scalability
          • Since it uses a peer-to-peer design, it is easy to add nodes and grow the system from TB to PB
        • Fault-tolerance
          • Replication
        • Post-relational database
          • Since it uses column indexes, its dynamic schema design allows much more flexible data storage
        • Data compression
          • Uses Google’s Snappy data compression algorithm
          • Compresses data on a per column family level
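      A minimal sketch of the commit-log / memtable / SSTable write path described above (the file names and the tiny memtable limit are illustrative; this is not Cassandra's actual code):

        import json

        MEMTABLE_LIMIT = 3          # tiny limit so the sketch flushes quickly
        memtable, sstables = {}, []

        def write(key, value, commit_log="commit.log"):
            # 1) Append to the commit log so the write survives a crash (durability).
            with open(commit_log, "a") as log:
                log.write(json.dumps({"key": key, "value": value}) + "\n")
            # 2) Apply the write to the in-memory memtable.
            memtable[key] = value
            # 3) When the memtable is full, flush it to disk as an immutable SSTable.
            if len(memtable) >= MEMTABLE_LIMIT:
                flush()

        def flush():
            path = f"sstable-{len(sstables)}.json"
            with open(path, "w") as f:
                json.dump(dict(sorted(memtable.items())), f)   # sorted, immutable file
            sstables.append(path)
            memtable.clear()

        for i in range(4):
            write(f"user:{i}", {"name": f"u{i}"})
        print(sstables, memtable)   # one SSTable on disk; the last write is still in memory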

      Current status

      Facebook used to use Cassandra for Inbox search; it now uses HBase for that workload.

      Reference