1. Libvirt Python API

  • Compute manager calls the driver
    • `self.driver.pause(instance)`
      • It pauses the instance (libvirt calls this operation a suspend)
  • Nova Libvirt Driver
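
The bottom of this call chain can be sketched with the libvirt Python bindings. This is a minimal sketch, not Nova's code: `pause_instance` is a hypothetical helper, and `conn` is assumed to be an open connection such as the one returned by `libvirt.open("qemu:///system")`. The two binding calls used, `lookupByUUIDString` and `suspend`, are the only libvirt API it relies on.

```python
def pause_instance(conn, instance_uuid):
    """Pause a running domain; libvirt names this operation "suspend".

    `conn` is assumed to be an open libvirt connection (or anything that
    exposes the same lookupByUUIDString() method).
    """
    dom = conn.lookupByUUIDString(instance_uuid)  # find the domain by UUID
    dom.suspend()                                 # stop the vCPUs, keep memory resident
    return dom.name()                             # e.g. the Nova instance_name
```

Resuming is the mirror image: the same lookup followed by `dom.resume()`.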

2. Libvirt Domain 

  • Libvirt domains are defined via XML
  • Domains defined by Nova are persistent
  • XML is re-generated on Hard Reboot
    • Manual customizations will be overwritten
  • Logging occurs per-domain
    • /var/log/libvirt/qemu/instance_name.log

3. Virsh 

  • virsh is the command line tool for Libvirt
    • Consumes the same API referenced earlier via C
  • virsh list
    • List all domains by name
  • virsh domname uuid
    • Returns the instance_name
  • virsh dumpxml uuid
    • Configuration for an individual domain
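
The three commands above can be scripted. A minimal sketch, assuming `virsh` is on the PATH of the compute node; `virsh_argv` and `run_virsh` are hypothetical helper names, not part of any library.

```python
import subprocess


def virsh_argv(subcommand, *args):
    """Build the argument list for a virsh call, e.g. ["virsh", "domname", uuid]."""
    return ["virsh", subcommand] + list(args)


def run_virsh(subcommand, *args):
    """Run virsh and return its stdout as text (raises if virsh exits non-zero)."""
    return subprocess.check_output(virsh_argv(subcommand, *args),
                                   universal_newlines=True)


# Usage on a compute node (the uuid is whatever Nova reports for the instance):
#   print(run_virsh("list"))
#   name = run_virsh("domname", uuid).strip()
#   xml = run_virsh("dumpxml", uuid)
```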

4.  Libvirt Domain XML

  • Translated by Libvirt to ultimately call QEMU with the right arguments
  • The XML generated by Nova ends up being modified, fairly heavily, by Libvirt upon definition
  • Key difference between “active” and “inactive” XML
    • Numerous values are derived at time of instance start
    • Output from an active “dumpxml” will likely fail to define
      • Use `virsh dumpxml --inactive uuid`
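
To see why an active dump may not define cleanly, it helps to spot the values Libvirt fills in at start time. The sketch below looks for two of them (the domain `id` attribute and device `<alias>` elements). The sample XML is a hand-written, heavily trimmed illustration, not real Nova output, and `runtime_only_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment of an *active* dumpxml (trimmed by hand).
ACTIVE_XML = """<domain type='kvm' id='7'>
  <name>instance-0000002a</name>
  <devices>
    <interface type='bridge'>
      <target dev='vnet3'/>
      <alias name='net0'/>
    </interface>
  </devices>
</domain>"""


def runtime_only_fields(xml_text):
    """List a few values that only exist while the domain is running."""
    root = ET.fromstring(xml_text)
    found = []
    if "id" in root.attrib:               # numeric id is assigned at start
        found.append("domain id=%s" % root.get("id"))
    for alias in root.iter("alias"):      # device aliases are assigned at start
        found.append("alias name=%s" % alias.get("name"))
    return found


print(runtime_only_fields(ACTIVE_XML))
```

From the Python bindings, the counterpart of `--inactive` is passing `libvirt.VIR_DOMAIN_XML_INACTIVE` to `dom.XMLDesc()`.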

    5. Libvirtd Configuration

    • Critically important tuning for Libvirt 1.1.x and newer, for example
      • max_clients = 50
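
A quick way to check what a node is actually configured with is to read the `key = value` lines out of `libvirtd.conf` (normally `/etc/libvirt/libvirtd.conf`). This is a simplified sketch: `parse_libvirtd_conf` is a hypothetical helper, the sample text is made up, and the parser ignores list values and `#` characters inside quoted strings.

```python
def parse_libvirtd_conf(text):
    """Return the uncommented key = value settings as a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments (simplification)
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


SAMPLE_CONF = """
# illustrative fragment, not a recommended configuration
max_clients = 50
#max_workers = 20
"""

print(parse_libvirtd_conf(SAMPLE_CONF))
```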

    Live Migration

    1. Live Migration Workflow

    • Verify the storage backend is appropriate for the migration type
      • Perform a shared storage check for normal migrations
      • Do the inverse for block migrations
      • Checks are run on both the source and destination, orchestrated via RPC calls from the scheduler
    • On the destination
      • Create the necessary volume connections
      • If block migration, create the instance directory, populate missing backing files from Glance and create empty instance disks
    • On the source
      • Initiate the actual live migration
    • Upon completion
      • Generate the Libvirt XML and define it on the destination
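
The ordering of the workflow can be written down as a small planning function. This is only a restatement of the bullets above, not Nova's implementation; `plan_live_migration` and the step strings are made up for illustration.

```python
def plan_live_migration(block_migration, shared_storage):
    """Return the ordered steps, or raise if storage does not fit the migration type."""
    # Storage check: normal migrations need shared storage, block migrations the inverse.
    if block_migration and shared_storage:
        raise ValueError("block migration expects non-shared storage")
    if not block_migration and not shared_storage:
        raise ValueError("normal live migration expects shared storage")

    steps = ["destination: create volume connections"]
    if block_migration:
        steps += [
            "destination: create instance directory",
            "destination: fetch missing backing files from Glance",
            "destination: create empty instance disks",
        ]
    steps.append("source: initiate live migration")
    steps.append("destination: generate and define Libvirt XML")
    return steps


for step in plan_live_migration(block_migration=True, shared_storage=False):
    print(step)
```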

      2. Migrations

      • Why migration
        • Operations
          • Key to performing non-disruptive work
          • Re-balancing workloads and resources
        • Expectations versus reality
          • Special snowflakes
          • Ephemeral instances and the “cloud” way
      • Types of migration
        • Migrate
          • Completely “cold”, libvirt does almost nothing
          • Shares a code path with “resize”
          • Extremely brittle (uses SSH and copies files around)
        • Live migration
          • Orchestrated almost entirely by Libvirt (via DomainMigrateToURI)
        • Block migration
          • Similar code path to live migration
          • Riskier and more brittle (disks are moving along with state)

      3. Live Migrations

      • Nova offloads capabilities comparisons to Libvirt
        • The API equivalent of `virsh capabilities` is run by the scheduler on the source and destination
      • Nova live migration
        • Important config options
          • live_migration_flag += VIR_MIGRATE_LIVE
          • block_migration_flag += VIR_MIGRATE_LIVE
        • Standardized virtual CPU flags
          • libvirt_cpu_mode = custom
          • libvirt_cpu_model = cpu64-rhel6
        • “Max Downtime” (not currently tunable)
          • Look for upstream patches soon
          • QEMU keeps iterating until the cutover can be done within “30” milliseconds
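
The flag options are comma-separated lists of libvirt constant names that get OR-ed into one integer. A minimal sketch of that combination follows; `parse_migration_flags` is a hypothetical helper, the option value is only an example, and the numeric values are copied by hand from libvirt's `virDomainMigrateFlags` enum. With the bindings installed, `getattr(libvirt, name)` is the safer lookup.

```python
# Bit values from libvirt's virDomainMigrateFlags (copied by hand; prefer
# getattr(libvirt, name) when the bindings are available).
MIGRATE_FLAGS = {
    "VIR_MIGRATE_LIVE": 1,
    "VIR_MIGRATE_PEER2PEER": 2,
    "VIR_MIGRATE_UNDEFINE_SOURCE": 16,
    "VIR_MIGRATE_NON_SHARED_INC": 128,
}


def parse_migration_flags(option_value):
    """OR together the flags named in a comma-separated option string."""
    flags = 0
    for name in option_value.split(","):
        name = name.strip()
        if name:
            flags |= MIGRATE_FLAGS[name]
    return flags


example = "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE"
flags = parse_migration_flags(example)
print(flags, bool(flags & MIGRATE_FLAGS["VIR_MIGRATE_LIVE"]))
```

Without the `VIR_MIGRATE_LIVE` bit the guest is paused for the duration of the transfer, which is why it is worth checking that it is set.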

      4. Brittle Operations

      • Any long running, synchronous tasks
        • All migrations (memory sync, disk sync, etc)
      • No graceful way to stop services
      • Most prone to failure
        • Migrate and resize
        • Live migration (block or otherwise)
        • Instance snapshot

      5. Recovering from failures

      • Always investigate before forcing actions
        • Look at the logs for exceptions
        • Check whether an instance is running on multiple hypervisors
        • `nova reset-state --active` and `nova reboot --hard` can go a long way
      • Sometimes, brute force is going to be required
        • `kill -9` the qemu or kvm processes
        • Alter the database records, commonly `host`
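
Checking whether an instance is running on more than one hypervisor is easy to automate once the domain UUIDs of each host have been collected (for example from `virsh list --uuid` on every node). A minimal sketch; `instances_on_multiple_hosts` and the host names are made up.

```python
from collections import defaultdict


def instances_on_multiple_hosts(domains_by_host):
    """Map each UUID seen on more than one host to the sorted list of those hosts."""
    hosts_by_uuid = defaultdict(list)
    for host, uuids in domains_by_host.items():
        for uuid in uuids:
            hosts_by_uuid[uuid].append(host)
    return {uuid: sorted(hosts)
            for uuid, hosts in hosts_by_uuid.items() if len(hosts) > 1}


seen = {"compute1": ["uuid-a", "uuid-b"], "compute2": ["uuid-b"]}
print(instances_on_multiple_hosts(seen))
```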

      6. “Stuck” Live Migrations

      • Live migrations can get stuck
      • Instances left in a paused state on both ends
        • Monitor socket is unresponsive, Libvirt is helpless
      • Generally a result of an overly aggressive “max downtime” and rapidly changing memory state (e.g., JVM)
      • Can be a result of a QEMU issue/bug
        • managedSave (suspend) will generally be prone to the same problem
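
Why a tight max downtime and fast-changing memory do not mix can be shown with a toy model of pre-copy migration. This is a deliberately simplified sketch and not how QEMU computes anything: it assumes constant bandwidth and a constant dirty rate, and `rounds_until_cutover` and all numbers are invented for illustration.

```python
def rounds_until_cutover(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                         max_downtime_ms=30, max_rounds=50):
    """Return how many copy rounds pass before the final cut, or None if it never fits."""
    remaining_mb = float(memory_mb)
    for rounds in range(max_rounds + 1):
        transfer_s = remaining_mb / bandwidth_mb_s
        if transfer_s * 1000.0 <= max_downtime_ms:
            return rounds                      # what is left fits in the downtime window
        # While this round was copying, the guest dirtied more memory.
        remaining_mb = dirty_rate_mb_s * transfer_s
    return None                                # never converged: the "stuck" case


print(rounds_until_cutover(4096, bandwidth_mb_s=1000, dirty_rate_mb_s=100))
print(rounds_until_cutover(4096, bandwidth_mb_s=1000, dirty_rate_mb_s=1200))
```

In this model the migration only converges when the guest dirties memory more slowly than the link can copy it, and a larger downtime budget reduces the number of rounds needed.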

      OpenStack Overview

      1. Architecture

      2. Commands

      • On a compute node, run `virsh capabilities` to see the capabilities of that node.
      • `virsh dumpxml instance-id`
        • Describes the VM
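
The capabilities output is XML, so the host CPU details can be pulled out with a few lines. A minimal sketch: the sample below is a hand-trimmed illustration of the `<host><cpu>` part of the document, and `host_cpu_summary` is a hypothetical helper. The same XML is available from the bindings via `conn.getCapabilities()`.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed illustration of `virsh capabilities` output.
SAMPLE_CAPS = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <feature name='vmx'/>
    </cpu>
  </host>
</capabilities>"""


def host_cpu_summary(caps_xml):
    """Extract architecture, CPU model and feature names from capabilities XML."""
    cpu = ET.fromstring(caps_xml).find("./host/cpu")
    return {
        "arch": cpu.findtext("arch"),
        "model": cpu.findtext("model"),
        "features": sorted(f.get("name") for f in cpu.findall("feature")),
    }


print(host_cpu_summary(SAMPLE_CAPS))
```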

      3. Reboot

      • Soft reboot
        • It relies completely on the guest OS and ACPI passed through QEMU
      • Hard reboot
        • Just make it work. 
        • It resolves most issues
        • It operates at the hypervisor and Nova level
        • It makes zero assumptions about the state of the hypervisor
          • Notable effort has gone into making internal operations idempotent so that they can be called here.
        • Steps
          • Destroy the domain
            • Equivalent of `virsh destroy`
            • Does not destroy data, only the QEMU process
            • Effectively a `kill -9` of the QEMU process
          • Re-establish any and all volume connections.
          • Regenerate the Libvirt XML
          • Check for and re-download any missing backing files (instance_dir/_base)
          • Plug VIFs (re-create bridges, VLAN interfaces, etc.)
          • Regenerate and apply iptables rules
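
The Libvirt side of these steps maps onto a handful of binding calls. The sketch below covers only destroy, redefine and start; volume connections, backing files, VIF plugging and iptables are left out, and the final `create()` is my assumption about how the domain is brought back up. `hard_reboot` is a hypothetical helper, not Nova's method, and `conn` is assumed to be an open libvirt connection.

```python
def hard_reboot(conn, instance_uuid, regenerated_xml):
    """Destroy, redefine and start a domain without assuming its current state."""
    dom = conn.lookupByUUIDString(instance_uuid)
    if dom.isActive():
        dom.destroy()                      # like `virsh destroy`: kills QEMU, disks untouched
    dom = conn.defineXML(regenerated_xml)  # persistent definition from freshly generated XML
    dom.create()                           # start the domain again (assumed final step)
    return dom
```

Checking `isActive()` first is what keeps the destroy step safe to repeat when the QEMU process is already gone.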

        My paper list to read


        • Peizhe Cheng, Shuaiqiang Wang, Jun Ma, Jiankai Sun and Hui Xiong. Learning to Recommend Accurate and Diverse Items. The 26th International World Wide Web Conference (WWW)
        • Dimitrios Serbos, Shuyao Qi, Nikos Mamoulis, Evaggelia Pitoura and Panayiotis Tsaparas. Fairness in Package-to-Group Recommendations
        • Exploring Rated Datasets with Rating Maps






        A Large-scale Analysis of the Mnemonic Password Advice
        Show Me the Money! Finding Flawed Implementations of Third-party In-app Payment in Android Apps


        A Call to ARMs: Understanding the Costs and Benefits of JIT Spraying Mitigations
        Internet-scale Probing of CPS: Inference, Characterization and Orchestration Analysis
        Dachshund: Digging for and Securing (Non-)Blinded Constants in JIT Code


        Ramblr: Making Reassembly Great Again
        BOOMERANG: Exploiting the Semantic Gap in Trusted Execution Environments
        A Broad View of the Ecosystem of Socially Engineered Exploit Documents
        Dark Hazard: Learning-based, Large-Scale Discovery of Hidden Sensitive Operations in Android Apps
        ASLR on the Line: Practical Cache Attacks on the MMU
        Hey, My Malware Knows Physics! Attacking PLCs with Physical Model Aware Rootkit
        Wi-Fly?: Detecting Privacy Invasion Attacks by Consumer Drones
        HOP: Hardware makes Obfuscation Practical
        TenantGuard: Scalable Runtime Verification of Cloud-Wide VM-Level Network Isolation
        Broken Hearted: How To Attack ECG Biometrics


        DELTA: A Security Assessment Framework for Software-Defined Networks
        Obfuscation-Resilient Privacy Leak Detection for Mobile Apps Through Differential Analysis
        A2C: Self Destructing Exploit Executions via Input Perturbation
        Address Oblivious Code Reuse: On the Effectiveness of Leakage Resilient Diversity


        You are Who You Know and How You Behave: Attribute Inference Attacks via Users’ Social Friends and Behaviors 

        Stealing Machine Learning Models via Prediction APIs

        FlowFence: Practical Data Protection for Emerging IoT Application Frameworks

        Towards Measuring and Mitigating Social Engineering Malware Download Attacks

        Specification Mining for Intrusion Detection in Networked Control Systems

        APISan: Sanitizing API Usages through Semantic Cross-checking

        Undermining Entropy-based Information Hiding (And What to do About it)

        zxcvbn: Low-Budget Password Strength Estimation

        Mirror: Enabling Proofs of Data Replication and Retrievability in the Cloud

        ARMageddon: Cache Attacks on Mobile Devices 

        Hidden Voice Commands

        OblivP2P: An Oblivious Peer-to-Peer Content Sharing System

        AuthLoop: End-to-End Cryptographic Authentication for Telephony over Voice Channels

        Trusted Browsers for Uncertain Times

        Virtual U: Defeating Face Liveness Detection by Building Virtual Models From Your Public Photos

        One Bit Flips, One Cloud Flops: Cross-VM Row Hammer Attacks and Privilege Escalation

        All Your Queries Are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption

        Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks

        SGX-Enabled Oblivious Machine Learning

        Poking Holes into Information Hiding

        Off-Path TCP Exploits: Global Rate Limit Considered Dangerous

        Request and Conquer: Exposing Cross-Origin Resource Size


        WebPerf: Evaluating What-If Scenarios for Cloud-hosted Web Applications

        Taking the Blame Game out of Data Centers Operations with NetPoirot 

        Accurate Spear Phishing Campaign Attribution and Early Detection

        Rich Cloud-Based Web Applications with CloudBrowser 2.0 
        Controlling the Elasticity of Web Applications on Cloud Computing


        StormDroid: A Streaminglized Machine Learning-based System for Detecting Android Malware

        Bilateral-secure Signature by Key Evolving

        Efficient Authenticated Multi-Pattern Matching

        Attestation Transparency: Building secure Internet services for legacy clients

        Congesting the Internet with Coordinated And Decentralized Pulsating Attacks

        Privacy and Utility of Inference Control Mechanisms for Social Computing Applications

        StemJail: Dynamic Role Compartmentalization

        Your Credentials Are Compromised, Do Not Panic: You Can Be Well Protected

        Power-aware Checkpointing: Toward the Optimal Checkpointing Interval under Power Capping

        A Sharper Sense of Self: Probabilistic Reasoning of Program Behaviors for Anomaly Detection with Context Sensitivity

        Characterizing the Consistency of Online Services

        Balancing Security and Performance for Agility in Dynamic Threat Environments

        CCS 2016
        SmartWalk: Enhancing Social Network Security via Adaptive Random Walks

        Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence

        Content Security Problems? Evaluating the Effectiveness of Content Security Policy in the Wild

        CSP is Dead, Long Live CSP: On the Insecurity of Whitelists and the Future of the Content Security Policy

        CSPAutoGen: Black-box Enforcement of Content Security Policy upon Real-World Websites

        EpicRec: Towards Practical Differentially Private Framework for Personalized Recommendation

        Generic Attacks on Secure Outsourced Databases

        Identifying the Scanners and Attack Infrastructure behind Amplification DDoS attacks

        Lurking Malice in the Cloud: Understanding and Detecting Cloud Repository as a Malicious Service