Live Migration

Posted on March 19, 2016 by admin

1. Live Migration Workflow

Verify the storage backend is appropriate for the migration type

Perform a shared storage check for normal migrations
For block migrations, check the inverse: the instance storage must not be shared
Checks are run on both the source and destination, orchestrated via RPC calls from the scheduler
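The storage check above can be sketched as a small guard function. This is only an illustration of the rule, not Nova's actual code; the function name and the two boolean inputs are assumptions made for the example.

```python
def check_storage_for_migration(shared_storage: bool, block_migration: bool) -> None:
    """Raise ValueError when the storage layout does not fit the migration type.

    Hypothetical helper: a normal live migration expects the instance
    disks to be on shared storage, a block migration expects the opposite.
    """
    if block_migration and shared_storage:
        raise ValueError("block migration requested, but instance storage is shared")
    if not block_migration and not shared_storage:
        raise ValueError("live migration requested, but instance storage is not shared")


# Valid combinations pass silently.
check_storage_for_migration(shared_storage=True, block_migration=False)
check_storage_for_migration(shared_storage=False, block_migration=True)
```

In the workflow described here, the same check would run for both the source and the destination host before anything is set up on the destination.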

On the destination

Create the necessary volume connections
If block migration, create the instance directory, populate missing backing files from Glance and create empty instance disks

On the source

Initiate the actual live migration

Upon completion

Generate the Libvirt XML and define it on the destination

2. Migrations

Why migration

Operations

Key to performing non-disruptive work
Re-balancing workloads and resources

Expectations versus reality

Special snowflakes
Ephemeral instances and the “cloud” way

Types of migration

Migrate

Completely “cold”, libvirt does almost nothing
Shares a code path with “resize”
Extremely brittle (uses SSH and copies files around)

Live migration

Orchestrated almost entirely by Libvirt (via DomainMigrateToURI)

Block migration

Similar code path to live migration
More risky and brittle (disks are moving along with state)

3. Live Migrations

Nova offloads capabilities comparisons to Libvirt

The API equivalent of `virsh capabilities` is run by the scheduler on the source and destination
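Conceptually, the capabilities comparison asks whether the destination host offers every CPU feature the guest already sees. The sketch below models that with plain sets. The real comparison is done inside Libvirt; the function name and the feature lists here are made up for illustration.

```python
def missing_cpu_features(guest_features, dest_host_features):
    """Return the guest CPU features the destination host lacks, sorted.

    Hypothetical helper: an empty result means the destination looks
    compatible under this simplified model.
    """
    return sorted(set(guest_features) - set(dest_host_features))


# Illustrative feature names only.
guest = ["sse4.2", "aes", "avx"]
destination = ["sse4.2", "aes"]
print(missing_cpu_features(guest, destination))
```

A non-empty result is the kind of mismatch that standardized virtual CPU flags are meant to avoid.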

Nova live migration

Important config options

live_migration_flag += VIR_MIGRATE_LIVE
block_migration_flag += VIR_MIGRATE_LIVE
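To see what adding `VIR_MIGRATE_LIVE` to these options means in practice, here is a minimal sketch that turns a comma-separated option value into the bitmask passed to Libvirt. It assumes the option is a comma-separated list of Libvirt constant names. The numeric values are copied from Libvirt's `virDomainMigrateFlags` enum so the example runs without the libvirt bindings, and `parse_migration_flags` is a hypothetical name.

```python
# Subset of libvirt's virDomainMigrateFlags, hard-coded so the sketch
# does not need the libvirt Python bindings installed.
MIGRATE_FLAGS = {
    "VIR_MIGRATE_LIVE": 1,
    "VIR_MIGRATE_PEER2PEER": 2,
    "VIR_MIGRATE_TUNNELLED": 4,
    "VIR_MIGRATE_PERSIST_DEST": 8,
    "VIR_MIGRATE_UNDEFINE_SOURCE": 16,
    "VIR_MIGRATE_NON_SHARED_INC": 128,
}


def parse_migration_flags(option_value):
    """OR together the flags named in a comma-separated option string."""
    mask = 0
    for name in option_value.split(","):
        name = name.strip()
        if not name:
            continue  # tolerate trailing commas
        if name not in MIGRATE_FLAGS:
            raise ValueError("unknown migration flag: %s" % name)
        mask |= MIGRATE_FLAGS[name]
    return mask


flags = parse_migration_flags(
    "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE"
)
is_live = bool(flags & MIGRATE_FLAGS["VIR_MIGRATE_LIVE"])
```

Without the `VIR_MIGRATE_LIVE` bit set, the guest is paused for the duration of the transfer, which is why it is worth checking that both options include it.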

Standardized virtual CPU flags

libvirt_cpu_mode = custom
libvirt_cpu_model = cpu64-rhel6

“Max Downtime” (not currently tunable)

Look for upstream patches soon
QEMU will keep iterating until the final cutover can be done within “30” milliseconds
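The effect of the max downtime setting can be illustrated with a toy model of pre-copy migration: each pass copies the memory that is still dirty while the guest keeps dirtying more, and the cutover only happens once the remainder fits into the downtime budget. This is a deliberately simplified sketch, not QEMU's actual algorithm; all names and numbers are assumptions.

```python
def simulate_precopy(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                     max_downtime_ms, max_rounds=30):
    """Return the number of copy passes before cutover, or None if the
    migration does not converge within max_rounds (toy model)."""
    # Amount of memory that can be sent while the guest is paused.
    budget_mb = bandwidth_mb_s * max_downtime_ms / 1000.0
    remaining_mb = float(memory_mb)
    rounds = 0
    while rounds < max_rounds:
        if remaining_mb <= budget_mb:
            return rounds
        # Copy what is dirty now; the guest dirties more memory meanwhile.
        transfer_time_s = remaining_mb / bandwidth_mb_s
        remaining_mb = min(memory_mb, dirty_rate_mb_s * transfer_time_s)
        rounds += 1
    return rounds if remaining_mb <= budget_mb else None


# A mostly idle guest converges after a few passes.
print(simulate_precopy(4096, 1000, 100, 30))
# A guest dirtying memory faster than the link can carry it never does.
print(simulate_precopy(4096, 1000, 1200, 30))
```

In this model a guest whose dirty rate exceeds the available bandwidth never reaches the cutover, which matches the behaviour of workloads with rapidly changing memory.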

4. Brittle Operations

Any long running, synchronous tasks

All migrations (memory sync, disk sync, etc)

No graceful way to stop services
Most prone to failure

Migrate and resize
Live migration (block or otherwise)
Instance snapshot

5. Recovering from failures

Always investigate before forcing actions

Look at the logs for exceptions
Check whether an instance is running on multiple hypervisors
`nova reset-state --active` and `nova reboot --hard` can go a long way
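Checking whether an instance is running on multiple hypervisors is easy to script once you have a list of running domains per host. The sketch below assumes that list has already been gathered somehow (for example from `virsh list` on each compute node); the function name and the input mapping are hypothetical.

```python
from collections import defaultdict


def find_duplicated_instances(domains_by_host):
    """Map each instance UUID seen on more than one host to those hosts.

    domains_by_host: dict of hypervisor hostname -> iterable of the
    instance UUIDs running there (assumed to be collected beforehand).
    """
    hosts_by_instance = defaultdict(set)
    for host, uuids in domains_by_host.items():
        for uuid in uuids:
            hosts_by_instance[uuid].add(host)
    return {
        uuid: sorted(hosts)
        for uuid, hosts in hosts_by_instance.items()
        if len(hosts) > 1
    }


# Made-up inventory: instance "aaa" shows up on two compute nodes.
inventory = {
    "compute1": ["aaa", "bbb"],
    "compute2": ["aaa", "ccc"],
}
print(find_duplicated_instances(inventory))
```

Any instance reported here deserves a closer look before resetting state or rebooting, since one of the two copies is usually a leftover from a failed migration.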

Sometimes, brute force is going to be required

`kill -9` the qemu or kvm processes
Alter the database records, commonly `host`

6. “Stuck” Live Migrations

Live migrations can get stuck
Instances left in a paused state on both ends

Monitor socket is unresponsive, Libvirt is helpless

Generally a result of an overly aggressive “max downtime” and rapidly changing memory state (e.g., JVM)
Can be a result of a QEMU issue/bug

managedSave (suspend) will generally be prone to this as well

7. Live Migration Flow

http://bodenr.blogspot.com/2014/03/openstack-nova-vm-migration-live-and.html
