OpenStack Nova VM live migration flow

Original post: OpenStack Nova VM migration flow

  • nova.api.openstack.compute.contrib.admin_actions._migrate_live()
  • nova.compute.api.live_migrate()
        – update the instance state to MIGRATING
        – call into the scheduler to live migrate (the scheduler hint is set to the selected host, which may be None)

  • nova.scheduler.manager.live_migration()
  • nova.scheduler.manager._schedule_live_migration()
  • nova.conductor.tasks.live_migrate.LiveMigrationTask.execute()
        – check that the instance is running
        – check that the instance's host is up
        – if a destination host is provided, check that it:
              1. is different from the instance's host
              2. is up
              3. has enough memory
              4. is compatible with the instance's host (i.e., hypervisor type and version)
              5. passes live migration checks (call over AMQP RPC into nova.compute.manager check_can_live_migrate_destination())
        – else, when no destination host is provided, find a candidate host and check that it:
              1. is compatible with the instance's host (i.e., hypervisor type and version)
              2. passes live migration checks
        – call over AMQP RPC into nova.compute.manager live_migration()
          Note: migration data is initially set by check_can_live_migrate_destination and can carry implementation-specific parameters from this point on.
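  A minimal Python sketch of the destination-host checks above. The function, its arguments, and the dict fields are illustrative stand-ins; the real logic lives in nova/conductor/tasks/live_migrate.py and is more involved:

    # Sketch only: all names and data structures here are illustrative.
    def check_destination(context, instance, source, dest, servicegroup, compute_rpc):
        """Validate a caller-supplied destination host before migrating."""
        if dest['host'] == source['host']:                    # 1. different host
            raise ValueError("destination must differ from the instance's host")
        if not servicegroup.service_is_up(dest['service']):   # 2. host is up
            raise RuntimeError("destination host is down")
        if dest['free_ram_mb'] < instance['memory_mb']:       # 3. enough memory
            raise RuntimeError("not enough free memory on destination")
        # 4. hypervisors must be compatible: same type, version not going backwards
        if (source['hypervisor_type'] != dest['hypervisor_type'] or
                source['hypervisor_version'] > dest['hypervisor_version']):
            raise RuntimeError("incompatible hypervisor type/version")
        # 5. driver-specific checks on the destination, over AMQP RPC; the
        # returned migrate_data is threaded through the rest of the migration
        return compute_rpc.check_can_live_migrate_destination(context, instance,
                                                              dest['host'])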


  • nova.compute.manager.check_can_live_migrate_destination()
        – driver.check_can_live_migrate_destination()
        – call over AMQP RPC into nova.compute.manager check_can_live_migrate_source
        – driver.check_can_live_migrate_destination_cleanup()
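  The notable pattern here is that the destination check brackets the source-side RPC with driver-specific setup and cleanup. A hedged sketch, with the driver/RPC method names taken from the list above and the surrounding plumbing illustrative:

    def check_can_live_migrate_destination(context, instance, driver, compute_rpc,
                                           block_migration=False):
        """Destination-side checks, plus a source-side check over AMQP RPC."""
        dest_check_data = driver.check_can_live_migrate_destination(
            context, instance, block_migration)
        try:
            # the source validates itself against what the destination gathered
            migrate_data = compute_rpc.check_can_live_migrate_source(
                context, instance, dest_check_data)
        finally:
            # always release whatever temporary resources the check created
            driver.check_can_live_migrate_destination_cleanup(context,
                                                              dest_check_data)
        return migrate_data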

  • nova.compute.manager.check_can_live_migrate_source()
        – determine whether the instance is volume-backed and add the result to the migration data
        – driver.check_can_live_migrate_source()

  • nova.compute.manager.live_migration()
        – if a block migration was requested: driver.get_instance_disk_info()
        – call over AMQP RPC into nova.compute.manager pre_live_migration()
              – error handler: _rollback_live_migration
        – driver.live_migration()
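  A sketch of this source-side sequencing, showing where the _rollback_live_migration error handler hooks in; the argument plumbing is illustrative, not the actual manager signature:

    def live_migration(context, instance, dest, driver, compute_rpc,
                       block_migration, migrate_data,
                       post_method, recover_method):
        """Kick off a live migration from the source compute manager (sketch)."""
        disk_info = None
        if block_migration:
            # block migration also moves local disks, so collect their layout
            disk_info = driver.get_instance_disk_info(instance)
        try:
            # the destination prepares devices, networks and filtering rules
            compute_rpc.pre_live_migration(context, instance, dest,
                                           disk_info, migrate_data)
        except Exception:
            recover_method(context, instance, dest, migrate_data)  # rollback
            raise
        # hand off to the hypervisor driver, which calls post_method on success
        # or recover_method on failure once the guest has (or hasn't) moved
        driver.live_migration(context, instance, dest, post_method,
                              recover_method, block_migration, migrate_data)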

  • nova.compute.manager.pre_live_migration()
        – get the block device information for the instance
        – get the network information for the instance
        – driver.pre_live_migration()
        – set up networks on the destination host by calling the network API setup_networks_on_host()
        – driver.ensure_filtering_rules_for_instance()
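  A sketch of that destination-side preparation; the volume/network API objects and their method names (other than the driver calls listed above) are illustrative:

    def pre_live_migration(context, instance, host, driver, volume_api, network_api):
        """Prepare the destination host before the guest starts moving (sketch)."""
        # gather what the driver needs to recreate the guest's devices here
        block_device_info = volume_api.get_block_device_info(context, instance)
        network_info = network_api.get_instance_nw_info(context, instance)
        driver.pre_live_migration(context, instance, block_device_info, network_info)
        # plug the instance's ports on this host so traffic can land here
        network_api.setup_networks_on_host(context, instance, host)
        # firewall/security-group rules must exist before the first packet arrives
        driver.ensure_filtering_rules_for_instance(instance, network_info)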

  • nova.compute.manager._rollback_live_migration()
  • nova.compute.manager._post_live_migration()
        – driver.get_volume_connector()
        – for each of the instance's volume connections, call the volume API terminate_connection()
        – driver.unfilter_instance()
        – call into the conductor's network_migrate_instance_start(), which eventually calls the network API migrate_instance_start()
        – call over AMQP RPC into nova.compute.manager post_live_migration_at_destination()
        – if block migration or storage is not shared: driver.destroy()
        – else: driver.unplug_vifs()
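  A sketch of the source-side cleanup, including the block-migration/shared-storage branch at the end; the argument plumbing and the instance dict layout are illustrative:

    def _post_live_migration(context, instance, dest, driver, volume_api,
                             network_api, compute_rpc,
                             block_migration, shared_storage):
        """Source-side cleanup once the guest runs on the destination (sketch)."""
        connector = driver.get_volume_connector(instance)
        for volume_id in instance.get('volume_ids', []):  # illustrative layout
            volume_api.terminate_connection(context, volume_id, connector)
        driver.unfilter_instance(instance)    # drop source-side firewall rules
        network_api.migrate_instance_start(context, instance, dest)
        compute_rpc.post_live_migration_at_destination(context, instance, dest)
        if block_migration or not shared_storage:
            driver.destroy(instance)          # local disks: tear the guest down
        else:
            driver.unplug_vifs(instance)      # shared storage: just unplug VIFs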

    Google Cloud VM Live Migration

    Introduction

    The Heartbleed bug was disclosed on April 7, 2014. On that day, most cloud customers were impacted, because patching the host systems required VM reboots. At Google, no customers were impacted, thanks to the transparent maintenance functionality introduced in Google Compute Engine in December 2013.

    Through a combination of datacenter topology innovations and live migration technology, Google can move customers' running VMs out of the way of planned hardware and software maintenance events, keeping the infrastructure protected and reliable without customers' VMs, applications, or workloads noticing that anything happened.

    VM Migration Procedure 

    The high-level steps are as follows:

    • The process begins with a notification that VMs need to be evicted from their current host machine. The notification might start with a file change (e.g., a release engineer indicating that a new BIOS is available), Hardware Operations scheduling maintenance, an automatic signal from an impending hardware failure, and so on.
    • Once a VM is selected for migration, we provide a notification to the guest that a migration is imminent. After a waiting period, a target host is selected and the host is asked to set up a new, empty “target” VM to receive the migrating “source” VM. Authentication is used to establish a connection between the source and target. 
    • There are three stages involved in the VM's migration:
      • During pre-migration brownout, the VM is still executing on the source, while most state is sent from the source to the target. For instance, we copy all the guest memory to the target, while tracking the pages that have been re-dirtied on the source. The time spent in pre-migration brownout is a function of the size of the guest memory and the rate at which pages are being dirtied.
      • During blackout, a very brief moment when the VM is not running anywhere, the VM is paused, and all the remaining state required to begin running the VM on the target is sent.
      • During post-migration brownout, the VM executes on the target. The source VM is still present and may provide supporting functionality for the target. For instance, until the network fabric has caught up with the VM's new location, the source VM forwards packets to and from the target VM.
    • Finally, the migration is completed, and the system deletes the source VM.
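    These three stages amount to the classic iterative pre-copy algorithm. A minimal sketch, where the VM objects, the dirty-page tracking interface, and the blackout threshold are all assumptions for illustration, not Google's implementation:

      # Sketch only: VM objects, page granularity and thresholds are illustrative.
      def precopy_migrate(source_vm, target_vm, max_rounds=30, blackout_threshold=256):
          # Pre-migration brownout: the guest keeps running on the source while
          # memory is copied and re-dirtied pages are tracked.
          dirty = set(source_vm.all_pages())
          for _ in range(max_rounds):
              if len(dirty) <= blackout_threshold:   # few enough pages to pause now
                  break
              source_vm.start_dirty_tracking()
              for page in dirty:
                  target_vm.write_page(page, source_vm.read_page(page))
              dirty = source_vm.stop_dirty_tracking()  # pages re-dirtied meanwhile
          # Blackout: the only moment the VM runs nowhere; send the last dirty
          # pages plus CPU/device state, then resume on the target.
          source_vm.pause()
          for page in dirty:
              target_vm.write_page(page, source_vm.read_page(page))
          target_vm.load_cpu_state(source_vm.save_cpu_state())
          target_vm.resume()
          # Post-migration brownout: the source lingers to forward stray packets
          # until the network fabric learns the VM's new location.
          source_vm.forward_packets_to(target_vm)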

      Reference
      [1] Google Compute Engine Uses Live Migration Technology to Service Infrastructure Without Application Downtime

      Hyper-V Live VM Migration Procedure


      1. Live migration setup occurs. 

      During the live migration setup stage, the source server creates a connection with the destination server. This connection transfers the virtual machine configuration data to the destination server. A skeleton virtual machine is set up on the destination server and memory is allocated to the destination virtual machine.



      2. Memory pages are transferred from the source node to the destination node.

      • In the second stage of a live migration, the memory assigned to the migrating virtual machine is copied over the network to the destination server. This memory is referred to as the “working set” of the migrating virtual machine. A page of memory is 4 KB.
      • In addition to copying the working set to the destination server, Hyper-V monitors the pages in the working set on the source server. As memory pages are modified in the source server, they are tracked and marked as being modified. 
      • During this phase of the migration, the migrating virtual machine continues to run. Hyper-V iterates the memory copy process several times, with each iteration requiring a smaller number of modified pages to be copied.


      3. Modified pages are transferred.

      • This third phase of the migration is a memory copy process that duplicates the remaining modified memory pages to the destination server. The source server transfers the CPU and device state of the virtual machine to the destination server.
      • During this stage, the network bandwidth available between the source and destination servers is critical to the speed of the live migration, so using 1 gigabit Ethernet or faster is important. The faster the source server transfers the modified pages from the migrating virtual machine's working set, the more quickly the live migration is completed.
      • The number of pages transferred in this stage is determined by how actively the virtual machine accesses and modifies the memory pages. The more modified pages there are, the longer it takes to transfer all pages to the destination server.
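      To get a feel for why bandwidth matters here, a back-of-the-envelope estimate in Python; the working-set size and dirty rate are made-up numbers, and real migrations overlap copying with tracking:

        GBIT_PER_SEC = 1_000_000_000 / 8      # bytes/second on a 1 gigabit link
        PAGE = 4 * 1024                       # Hyper-V copies memory in 4 KB pages

        working_set = 4 * 1024**3             # assume a 4 GB working set
        dirty_rate = 10_000                   # assume 10k pages re-dirtied per second

        first_pass = working_set / GBIT_PER_SEC        # ~34 s to copy it all once
        redirtied = dirty_rate * first_pass * PAGE     # bytes dirtied during that pass
        second_pass = redirtied / GBIT_PER_SEC         # ~11 s; passes keep shrinking
        print(f"first pass {first_pass:.0f}s, second pass {second_pass:.0f}s")
        # ...but only if the link is fast relative to the dirty rate; too slow a
        # link and the iterations never converge.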

      4. The storage handle is moved from the source server to the destination server.

      • During the fourth stage of a live migration, control of the storage, such as any virtual hard disk files or physical storage attached through a virtual Fibre Channel adapter, is transferred to the destination server.


      5. The virtual machine is brought online on the destination server.

      • In the fifth stage of a live migration, the destination server now has the up-to-date working set as well as access to any storage used by the virtual machine. At this point, the virtual machine is resumed. 
      6. Network cleanup occurs. 
      • In the final stage of a live migration, the migrated virtual machine is running on the destination server. At this point, a message is sent to the network switch. This message causes the network switch to obtain the new MAC addresses of the migrated virtual machine so that network traffic to and from the virtual machine can use the correct switch port.

      Reference
      [1] Virtual Machine Live Migration Overview

      VM Live Migration’s Impacts on the Running Applications

      1. Will the IP address change after migration?

      Both types of live migration exist: migrations that change the IP address and migrations that preserve it [5].

      • Google Cloud [1] can migrate customers' VMs without affecting them, which implies that the VM's IP address is not changed in this case.
        • To retain the same IP address, Hyper-V requires the source and destination hosts to be within the same subnet. I think Google Cloud may not have this requirement.
        • I think network virtualization [4] would be able to remove the restrictions on the location of the destination host: "Hyper-V Network Virtualization decouples virtual networks for customer virtual machines from the physical network infrastructure."

      2. Will the migration interrupt the Internet service?

      This depends on the implementation.

      • According to Google Cloud [1], there will be no service interruption.
        • During post-migration brownout, the VM executes on the target while the source VM is still present and may provide supporting functionality. For instance, until the network fabric has caught up with the VM's new location, the source VM forwards packets to and from the target VM.
      • According to Hyper-V [2]:
        • The migration is not downtime-free, but the interruption is almost immeasurably brief. Usually the longest delay is at the network layer, while the virtual machine's MAC address is registered on the new physical switch port and its new location is propagated throughout the network.
        • According to [3], to use live migration the VM needs to keep the same IP address across data centers, so that clients retain continuous access to the virtual machine during and after the migration.

      3. How is the network migrated?

      The most challenging issue in VM migration is keeping the network working.

      In a LAN, different hypervisors use different strategies.

      • Xen
        • Xen uses ARP to bind the IP address to the new host.
          • The VM broadcasts an ARP message announcing that its IP address has moved to a new host (see the sketch after this list). However, such broadcasts may not be allowed for security reasons.
      • VMware
        • VMotion uses the virtual NIC (VNIC) to preserve the network connection.
          • The VNIC is migrated along with the VM. Every VNIC has a MAC address that is unique within the LAN and is connected to one or more physical NICs.
          • Because the VNIC's MAC address is independent of the physical network address, the network continues to work as normal across a live migration.
          • Note that, due to the restrictions of Ethernet, the source and destination hosts have to be in the same subnet.
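      As a concrete illustration of the Xen-style announcement above, here is a sketch of a gratuitous ARP using scapy; the addresses and interface are placeholders, and Xen itself emits the ARP from the guest/toolstack rather than via scapy:

        # Gratuitous ARP sketch (needs root); placeholders throughout.
        from scapy.all import ARP, Ether, sendp

        VM_IP = "192.0.2.10"              # the migrated guest's unchanged IP
        NEW_MAC = "02:00:00:00:00:01"     # the MAC now answering for that IP

        frame = Ether(src=NEW_MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
            op="is-at",                   # gratuitous reply: "VM_IP is-at NEW_MAC"
            hwsrc=NEW_MAC, psrc=VM_IP,
            hwdst="ff:ff:ff:ff:ff:ff", pdst=VM_IP)
        sendp(frame, iface="eth0")        # switches re-learn the port for NEW_MAC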

      In a WAN

      • The VM is given a new IP address on the destination host. To keep network connections alive, we can combine an IP tunnel with dynamic DNS: build an IP tunnel between the source and destination IP addresses, and use it to forward packets from the source host to the destination host. Once the migration is done, the VM can respond at its new address, i.e., the DNS record is updated and new connections refer to the new IP address (see the sketch below).
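      A minimal sketch of this approach using iproute2 and nsupdate; every host, address, and DNS name here is a placeholder, and a real deployment would script this far more carefully:

        # Tunnel-plus-dynamic-DNS sketch (needs root); placeholders throughout.
        import subprocess

        OLD_HOST, NEW_HOST = "198.51.100.1", "203.0.113.1"    # source/destination
        VM_OLD_IP, VM_NEW_IP = "198.51.100.10", "203.0.113.10"

        # on the source host: forward packets for the VM's old address through
        # an IP-in-IP tunnel to the new host
        for cmd in (
            ["ip", "tunnel", "add", "mig0", "mode", "ipip",
             "local", OLD_HOST, "remote", NEW_HOST],
            ["ip", "link", "set", "mig0", "up"],
            ["ip", "route", "add", f"{VM_OLD_IP}/32", "dev", "mig0"],
        ):
            subprocess.run(cmd, check=True)

        # once the VM runs on the destination: repoint the DNS name at the new IP
        subprocess.run(["nsupdate"], check=True, text=True, input=(
            "server ns.example.com\n"
            "update delete vm.example.com A\n"
            f"update add vm.example.com 60 A {VM_NEW_IP}\n"
            "send\n"))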

      Reference
      [1] Google cloud VM live migration
      [2] Hyper-V live migration
      [3] Live Migration — Implementation considerations
      [4] Hyper-V Network Virtualization Overview (Hyper-V 网络虚拟化概述)
      [5] Research on Virtual Machine Migration (虚拟机迁移研究)