ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
TASK-AWARE VIRTUAL MACHINE
SCHEDULING FOR I/O PERFORMANCE

Hwanju Kim, Hyeontaek Lim, Jinkyu Jeong, Heeseung Jo
   (Korea Advanced Institute of Science and Technology)
         Joonwon Lee (Sungkyunkwan Univ.)
                VEE 2009 March 13
Virtual Machine
          Consolidation
Centralized various computing environments
  Virtual desktop infrastructure
    VMware, Sun, HP, MS
  Cloud computing
    Amazon EC2
Virtual Machine
            Consolidation
  Centralized various computing environments
    Virtual desktop infrastructure
      VMware, Sun, HP, MS
    Cloud computing
      Amazon EC2




Unpredictable workloads
  due to the diversity
Virtual Machine
          Consolidation
Performance enhancement
 Paravirtualization
 Hardware-assisted techniques
   Intel VT, AMD SVM
 Optimization
Virtual Machine
            Consolidation
 Performance enhancement
   Paravirtualization
   Hardware-assisted techniques
     Intel VT, AMD SVM
   Optimization




High degree of consolidation
Virtual Machine
            Consolidation
 Performance enhancement
   Paravirtualization
   Hardware-assisted techniques
     Intel VT, AMD SVM
   Optimization                   Unpredictable workloads



                                        Intelligent CPU
High degree of consolidation         management can improve
                                        the performance
Background
A semantic gap between the VMM and a guest OS
  VMM¡¯s lack of knowledge of VM internal

  No tracking characteristics of guest-level tasks

     Internal workload-agnostic scheduling

     Poor decision about ¡°when¡± to schedule a VM

  Simple design of the VMM

                       OS awareness
 Low overheads
                                           E?cient resource management
   Low TCB
Background
Task-unawareness leading to poor responsiveness


Head        Run queue sorted based on CPU fairness                Tail

                                                       I/O-
                                                                I/O-
                                                     bound
     VM               VM                              task
                                                              bound
                                                               task

   (VCPU)           (VCPU)                            CPU-
                                                      bound    mixed
                                                       task    task




       VMM
Background
Task-unawareness leading to poor responsiveness


Head           Run queue sorted based on CPU fairness                Tail

                                                          I/O-
                                                                   I/O-
                                                        bound
     VM                  VM                              task
                                                                 bound
                                                                  task

   (VCPU)              (VCPU)                            CPU-
                                                         bound    mixed
                                                          task    task




        VMM

   I/O event
Background
Task-unawareness leading to poor responsiveness
                                                That event
                                                 is mine!
Head           Run queue sorted based on CPU fairness                     Tail

                                                               I/O-
                                                                        I/O-
                                                             bound
     VM                  VM                                   task
                                                                      bound
                                                                       task

   (VCPU)              (VCPU)                                 CPU-
                                                              bound    mixed
                                                               task    task




        VMM

   I/O event
Background
Task-unawareness leading to poor responsiveness
                                                  That event
                                                   is mine!
Head           Run queue sorted based on CPU fairness                       Tail

                                                                 I/O-
                                                                          I/O-
                         I have no idea this incoming          bound
     VM                    VM event is yours                    task
                                                                        bound
                                                                         task

   (VCPU)              (VCPU) and                               CPU-
                                                                bound    mixed
                        your VM has low priority now.            task    task
                          Sorry not to schedule you..



        VMM
                                      Responsiveness VS. Fairness
   I/O event
Background
Task-unawareness leading to poor responsiveness
                                                  That event
                                                   is mine!
Head           Run queue sorted based on CPU fairness                       Tail

                                                                 I/O-
                                                                          I/O-
                         I have no idea this incoming          bound
     VM                    VM event is yours                    task
                                                                        bound
                                                                         task

   (VCPU)              (VCPU) and                               CPU-
                                                                bound    mixed
                        your VM has low priority now.            task    task
                          Sorry not to schedule you..



        VMM
                                      Responsiveness VS. Fairness
   I/O event
Background
       The worst case example for 6 domains consolidated
       1                                          Workloads
                    Native Linux(I/O+CPU)         1 dom: Server & CPU-bound task
      0.8                   XenoLinux(I/O)        5doms: CPU-bound task
                     XenoLinux(I/O+CPU)

      0.6
CDF




      0.4

      0.2

       0
            0   2         4        6         8   10
                      Response time (ms)
Background
       The worst case example for 6 domains consolidated
       1                                          Workloads
                    Native Linux(I/O+CPU)         1 dom: Server & CPU-bound task
      0.8                   XenoLinux(I/O)        5doms: CPU-bound task
                     XenoLinux(I/O+CPU)

      0.6
CDF




      0.4

      0.2

       0
            0   2         4        6         8   10
                      Response time (ms)
Background
       The worst case example for 6 domains consolidated
       1                                              Workloads
                        Native Linux(I/O+CPU)         1 dom: Server & CPU-bound task
      0.8                       XenoLinux(I/O)        5doms: CPU-bound task
                         XenoLinux(I/O+CPU)

      0.6       By boosting mechanism of
CDF




                  Xen Credit scheduler
      0.4

      0.2                                Boosting mechanism realizes
                                     I/O boundness with only VCPU-level
       0
            0       2         4        6         8   10
                          Response time (ms)
Main Goals
Improve responsiveness of an I/O-bound task
  Priority boosting with task-level granularity

    ¡°Partial boosting¡±

CPU fairness guarantee
Transparency
Low management overheads
Issues
How to identify an I/O-bound task




How to know an incoming event is for the I/O-
bound task
Approach
Non-intrusive approach
  No guest OS modi?cation
    No explicit interface to inform I/O-bound task and event
    data

  Pros.
    No additional engineering cost for di?erent OSes
    Strong trustworthiness
  Cons.
    False decision
HOW TO IDENTIFY I/O-BOUND TASKS
Tracking I/O-bound Tasks
Observable information at the VMM
  Task switching

     Monitoring address space changes (Antfarm USENIX¡¯06)

  CPU time usage

     Running time of a task

Example (x86)   event
                        CR3 update            CR3 update

                                )                        ()
                              c(                     ts
                                                       c      Time
                          d ts                    rd
                         r
                                 Task running time
Tracking I/O-bound Tasks
Inference based on common gray-box knowledge
  Kernel policy to improve responsiveness of I/O-
  bound tasks
    An I/O-bound task is preemptively scheduled in
    response to its incoming event

  Characteristic of I/O-bound tasks
    Short running time

    Threshold to decide a short running time: IOthreashold
Tracking I/O-bound Tasks
Three disjoint observation
                                                Observation
classes based on two gray-box
criteria                               Ambiguity

  Positive evidence
                                           Positive        Negative
    supports I/O-boundness                 evidence        evidence


  Negative evidence
    supports non-I/O-boundness
                             Preemptively scheduling in response to an event
  Ambiguity                                        &
    No evidence                   Short running time( < IOthreshold )
Tracking I/O-bound Tasks
Example
                           IOthreshold
Event pending
  to VCPU1



                                                                Time



                Positive                 Negative   Ambiguity
Tracking I/O-bound Tasks
Example
                        IOthreshold
Event pending
  to VCPU1
             T1


                                                             Time
      Scheduling
       VCPU1



             Positive                 Negative   Ambiguity
Tracking I/O-bound Tasks
Example
                        IOthreshold
Event pending
  to VCPU1
             T1 T2


                                                             Time
      Scheduling
       VCPU1



             Positive                 Negative   Ambiguity
Tracking I/O-bound Tasks
Example
                        IOthreshold
Event pending
  to VCPU1
             T1 T2


                                                             Time
      Scheduling
       VCPU1



             Positive                 Negative   Ambiguity

                                                 T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3


                                                               Time
      Scheduling
       VCPU1



             Positive                   Negative   Ambiguity

                                                   T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3


                                                               Time
      Scheduling
       VCPU1



             Positive                   Negative   Ambiguity

            T2                                     T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4


                                                               Time
      Scheduling
       VCPU1



             Positive                   Negative   Ambiguity

            T2                                     T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4


                                                               Time
      Scheduling
       VCPU1



             Positive                   Negative   Ambiguity

            T2       T3                            T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4                  T5


                                                                Time
      Scheduling
       VCPU1



             Positive                   Negative    Ambiguity

            T2       T3                             T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4                  T5


                                                                Time
      Scheduling
       VCPU1



             Positive                   Negative    Ambiguity

            T2       T3                 T4          T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4                  T5   T6


                                                                     Time
      Scheduling
       VCPU1



             Positive                   Negative         Ambiguity

            T2       T3                 T4               T1
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4                  T5   T6


                                                                     Time
      Scheduling
       VCPU1



             Positive                   Negative         Ambiguity

            T2       T3                 T4               T1   T5
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4                  T5   T6


                                                                             Time
      Scheduling                                              Descheduling
       VCPU1                                                    VCPU1



             Positive                   Negative         Ambiguity

            T2       T3                 T4               T1     T5
Tracking I/O-bound Tasks
Example
                          IOthreshold
Event pending
  to VCPU1
             T1 T2   T3    T4                  T5   T6


                                                                             Time
      Scheduling                                              Descheduling
       VCPU1                                                    VCPU1



             Positive                   Negative         Ambiguity

            T2       T3                 T4   T6          T1     T5
Tracking I/O-bound Tasks
  Weighted evidence accumulation
     The degree of belief to reinforce the inference

     Weight of positive evidence < Weight of negative evidence

       More penalize for negative evidence


                                                          BelThreshold
The degree
    of                      At this time, this task is believed as an I/O-bound task
  belief
                 # of sequential observations
HOW TO KNOW AN INCOMING
EVENT IS FOR AN I/O-BOUND TASK
Correlation Mechanism
To distinguish an incoming event for I/O-bound task


Block I/O
  Block read
Network I/O
  Packet reception
Correlation Mechanism
               - Block I/O -
Request/response style
If T1 requests for reading B1 and T1 is I/O-bound
    Completion event for B1 is for I/O-bound task

How to decide ¡°T1 read B1¡± at the VMM
Correlation Mechanism
               - Block I/O -
Request/response style
If T1 requests for reading B1 and T1 is I/O-bound
    Completion event for B1 is for I/O-bound task

How to decide ¡°T1 read B1¡± at the VMM
  When the VMM observes a read event, it checks
  whether the current task is I/O-bound
Correlation Mechanism
               - Block I/O -
Request/response style
If T1 requests for reading B1 and T1 is I/O-bound
    Completion event for B1 is for I/O-bound task

How to decide ¡°T1 read B1¡± at the VMM
  When the VMM observes a read event, it checks
  whether the current task is I/O-bound

  But, how about ¡°delayed read event¡± ?
       Guest OS dependent (e.g. block I/O scheduler)
Correlation Mechanism
                        - Block I/O -
Inspection window
               inspection window

 user     T1      T2     T3   T4

 kernel                read

 VMM                             actual
                              read request

If an I/O-bound task in inspection window
     The actual read request is for I/O-bound task
  False positive VS. False negative
Correlation Mechanism
             - Network I/O -
Event identi?cation
  Socket-like information is too heavy for the VMM

  Destination port number for TCP/IP communication

    Most speci?c to a recipient task

Asynchronous packet reception
  No prior information about incoming packets
  History-based prediction mechanism
Correlation Mechanism
              - Network I/O -
History-based prediction mechanism
  Inference
    ¡°If an incoming packet is for I/O-bound task, this
    packet makes the I/O-bound task to be
    preemptively scheduled¡±


  Monitoring the ?rst woken task in response to an
  incoming packet
Correlation Mechanism
                      - Network I/O -
 History-based prediction mechanism (cont¡¯)
    Portmap
      An entry for each destination port number
      Each entry is an N-bit saturating counter
                                     Example (2-bit counter)
            00             01
portmap      non-           weak
          I/O-bound      I/O-bound
                                            If the ?rst woken task is I/O-bound
                                            Otherwise

            10             11
                           strong
          I/O-bound
                         I/O-bound
Correlation Mechanism
                      - Network I/O -
 History-based prediction mechanism (cont¡¯)
    Portmap
      An entry for each destination port number
      Each entry is an N-bit saturating counter
                                     Example (2-bit counter)
            00             01
portmap      non-           weak
          I/O-bound      I/O-bound
                                             If the ?rst woken task is I/O-bound
                                             Otherwise

            10             11
                           strong    If portmap counter¡¯s MSB is set, this packet
          I/O-bound
                         I/O-bound
                                                  is for I/O-bound
PARTIAL BOOSTING
Partial Boosting
Priority boosting with task-level granularity
  Priority boosting lasts during the run of an I/O-bound task



Why?
  To prevent CPU-bound tasks in a boosted VCPU from
  compromising CPU fairness
Partial Boosting
Procedure

Head         Run queue sorted based on CPU fairness                 Tail


    VM1                VM2                               VM3
   (VCPU)             (VCPU)                      I/O-   CPU-    CPU-
                                                 bound   bound   bound
                                                  task    task    task




       VMM
Partial Boosting
Procedure

Head         Run queue sorted based on CPU fairness                 Tail


     VM1               VM2                               VM3
    (VCPU)            (VCPU)                      I/O-   CPU-    CPU-
                                                 bound   bound   bound
                                                  task    task    task




       VMM
                     If this event is inferred for an I/O-bound task in VM3,
I/O event                           do partial boosting for VM3
Partial Boosting
Procedure

Head              Run queue sorted based on CPU fairness         Tail


         VM3                  VM1                  VM2
  I/O-   CPU-    CPU-        (VCPU)               (VCPU)
 bound   bound   bound
  task    task    task




         VMM
                          If this event is inferred for an I/O-bound task in VM3,
I/O event                                do partial boosting for VM3
Partial Boosting
Procedure

Head              Run queue sorted based on CPU fairness         Tail


         VM3                  VM1                  VM2
  I/O-   CPU-    CPU-        (VCPU)               (VCPU)
 bound   bound   bound
  task    task    task




         VMM
                          If this event is inferred for an I/O-bound task in VM3,
I/O event                                do partial boosting for VM3
Partial Boosting
Procedure

Head              Run queue sorted based on CPU fairness         Tail


         VM3                  VM1                  VM2
  I/O-   CPU-    CPU-        (VCPU)               (VCPU)
 bound   bound   bound
  task    task    task




         VMM
                          If this event is inferred for an I/O-bound task in VM3,
I/O event                                do partial boosting for VM3
Partial Boosting
Procedure

Head         Run queue sorted based on CPU fairness                 Tail


     VM1               VM2                               VM3
    (VCPU)            (VCPU)                      I/O-   CPU-    CPU-
                                                 bound   bound   bound
                                                  task    task    task




       VMM
                     If this event is inferred for an I/O-bound task in VM3,
I/O event                           do partial boosting for VM3
IMPLEMENTATION & EVALUATION
Implementation
Based on Credit scheduler in Xen 3.2.1
Task information maintained by hash
  Limited number of tasks maintained

  Remove of a task with infrequent I/O

Correlation
  Block I/O : using grant table in Xen

  Network I/O : supported by network backend driver

No consideration of multiple VCPUs
Evaluation
Interactive workload        Worst case scenario
                            1 dom: Server & CPU-bound task
  Packet request-response   5doms: CPU-bound task

                            Think time: 100 ~ 1000 ms
                            IOthreshold = 0.5 ms
Evaluation
Interactive workload        Worst case scenario
                            1 dom: Server & CPU-bound task
  Packet request-response   5doms: CPU-bound task

                            Think time: 100 ~ 1000 ms
                            IOthreshold = 0.5 ms
Evaluation
    Correlation evaluation
        Partial boosting hit ratio (PBHR)
                                         h
              PBHR (%) =                                  ¡Á 100
                         T he number of partial boostings
where
          1    , if an I/O-bound task awakes during partial boosting.
   h=
          0    , otherwise.

          de?ned as true positive ratio
              False positive ratio = (100 - PBHR) %
Evaluation
             Correlation evaluation: Block I/O
                             PBHR (%)
                                                    Workloads
           100
                                                    1 dom: 8 tasks
                                                        1 task: I/O-bound task
            75                                          7 tasks: I/O+CPU task
PBHR (%)




                                                        (CPU usage 1~300ms between IOs)
            50                                      5doms: CPU-bound task

            25

            0
                 1   2   3   4   5   6   7   8 NC
                     Inspection window size
Evaluation
                    Correlation evaluation: Block I/O
                                      Throughput
                    120                                       Workloads
                                                              1 dom: 8 tasks
Throughput (KB/s)




                                                                  1 task: I/O-bound task
                    90                                            7 tasks: I/O+CPU task
                                                                  (CPU usage 1~300ms between IOs)
                                                              5doms: CPU-bound task
                    60


                    30
                          1   2   3    4   5   6   7   8 NC
                              Inspection window size
Evaluation
            Correlation evaluation: Network I/O

           100           90      93             Workloads
                                                1 dom: 8 tasks
            75    64                                1 task: I/O-bound task
PBHR (%)




                                                    7 tasks: I/O+CPU task
            50                                      (CPU usage 1~300ms between IOs)
                                                5doms: CPU-bound task

            25
                                         10
            0                                        2-bit is reasonable
                   1      2       4     NC      in terms of space overheads
                 Bit-width of portmap counter
Evaluation
                Response time for text editing during CPU-intensive
                workload
       1                                                                1
      0.9                                      Normal                  0.9                                   Normal
                                                TAVS                                                          TAVS
      0.8                                                              0.8
      0.7                                                              0.7
      0.6                                                              0.6



                                                                 CDF
CDF




      0.5                                                              0.5
      0.4                                                              0.4
      0.3                                                              0.3
      0.2                   Text editing                               0.2              Text editing
      0.1                (Xen compilation)                             0.1   (Web browsing with Flash animations)
       0                                                                0
            0     0.05    0.1     0.15       0.2    0.25   0.3               0   0.05   0.1     0.15       0.2    0.25   0.3
                           Response time (sec)                                           Response time (sec)
Evaluation
Execution time of I/O-bound tasks with CPU-intensive
workloads
                                              Normal           TAVS
         Normalized execution time



                                     1.00

                                     0.75

                                     0.50
                                                                          PBHR = 99%
                                     0.25

                                       0
                                             grep   find   wget   smbcp   smbcp: copy ?les from
                                                                           remote samba server
 Background workload                         compilation    compression
Evaluation
Overheads
 Task tracking overheads: 0.06%

 No overhead for inspecting incoming packets

 Increased network throughput : decreased CPU throughput
 = 48 : 1

 Space overhead of N-bit portmap

   N * 8KB for each VM

   e.g. 2-bit portmaps for TCP and UDP: 32KB for each VM
Conclusions
Task-aware VM scheduling
  Bridging the semantic gap in CPU management
  Transparency by VMM-level inference
    Gray-box technique
  Low overheads
Future Work
Extension on multicore system


Simulation-based analysis for more intelligent
scheduling


Evaluation for more various workloads
THANK YOU!
BACKUP SLIDES
Implementation
Block correlation
                                          Shared
                     IDD   Dom1         grant table     Dom2

                                       1




              Active                  Dom1's           Dom2's
     VMM    grant table            recent task list recent task list
                                     T1 T3            T2 T3 T1
            1



                                                   Hardware

                           Machine memory
Implementation
Network correlation

                        IDD         Dom1                 Dom2




                  Incoming packet
                     information
      VMM         Dom1, TCP, 1000
                  Dom2, UDP, 2000
                                    update   Dom1's     Dom2's
                  Dom2, TCP, 1000
                                             portmap    portmap
                  Shared memory


                                                       Hardware
            NIC
Throughput
                100                                          25                                               100                                          60
                                                                                      Throughput
                 95                                                                   Dom6                     95
                 90                                                                   Dom5                     90
                 85                                                                   Dom4                     85                                          50
                 80                                          20                       Dom3                     80
                 75                                                                   Dom2                     75
                                                                                      Dom1(I/O)
                 70                                                                   IDD                      70
                                                                                                               65                                          40




                                                                                                                                                                Throughput (Mbps)
                 65




                                                                  Throughput (MB/s)




                                                                                              CPU usage (%)
CPU usage (%)




                 60                                          15                                                60
                 55                                                                                            55
                 50                                                                                            50                                          30
                 45                                                                                            45
                 40                                          10                                                40
                 35                                                                                            35                                          20
                 30                                                                                            30
                 25                                                                                            25
                 20                                          5                                                 20
                 15                                                                                            15                                          10
                 10                                                                                            10
                  5                                                                                             5
                  0                                          0                                                  0                                          0
                      Base   0   0.0625 0.125   0.25   0.5                                                          Base   0   0.0625 0.125   0.25   0.5
                                    PBratio                                                                                       PBratio



                                           Allowed CP U usage f or partial boosting
                                 PBratio =
                                                     T otal CP U usage
Degree of Belief
Degree of belief (grep, ?nd + compilation)
                        300                                                grep
The degree of belief




                        250                                                cc1
                        200
                        150
                        100
                         50
                          0
                        -50
                       -100
                              0     100     200              300     400
                                          Elapsed time (sec)
                        300                                                find
The degree of belief




                        250                                                cc1
                        200
                        150
                        100
                         50
                          0
                        -50
                       -100
                              0            100                     200
                                          Elapsed time (sec)

More Related Content

Task-aware Virtual Machine Scheduling for I/O Performance

  • 1. TASK-AWARE VIRTUAL MACHINE SCHEDULING FOR I/O PERFORMANCE Hwanju Kim, Hyeontaek Lim, Jinkyu Jeong, Heeseung Jo (Korea Advanced Institute of Science and Technology) Joonwon Lee (Sungkyunkwan Univ.) VEE 2009 March 13
  • 2. Virtual Machine Consolidation Centralized various computing environments Virtual desktop infrastructure VMware, Sun, HP, MS Cloud computing Amazon EC2
  • 3. Virtual Machine Consolidation Centralized various computing environments Virtual desktop infrastructure VMware, Sun, HP, MS Cloud computing Amazon EC2 Unpredictable workloads due to the diversity
  • 4. Virtual Machine Consolidation Performance enhancement Paravirtualization Hardware-assisted techniques Intel VT, AMD SVM Optimization
  • 5. Virtual Machine Consolidation Performance enhancement Paravirtualization Hardware-assisted techniques Intel VT, AMD SVM Optimization High degree of consolidation
  • 6. Virtual Machine Consolidation Performance enhancement Paravirtualization Hardware-assisted techniques Intel VT, AMD SVM Optimization Unpredictable workloads Intelligent CPU High degree of consolidation management can improve the performance
  • 7. Background A semantic gap between the VMM and a guest OS VMM¡¯s lack of knowledge of VM internal No tracking characteristics of guest-level tasks Internal workload-agnostic scheduling Poor decision about ¡°when¡± to schedule a VM Simple design of the VMM OS awareness Low overheads E?cient resource management Low TCB
  • 8. Background Task-unawareness leading to poor responsiveness Head Run queue sorted based on CPU fairness Tail I/O- I/O- bound VM VM task bound task (VCPU) (VCPU) CPU- bound mixed task task VMM
  • 9. Background Task-unawareness leading to poor responsiveness Head Run queue sorted based on CPU fairness Tail I/O- I/O- bound VM VM task bound task (VCPU) (VCPU) CPU- bound mixed task task VMM I/O event
  • 10. Background Task-unawareness leading to poor responsiveness That event is mine! Head Run queue sorted based on CPU fairness Tail I/O- I/O- bound VM VM task bound task (VCPU) (VCPU) CPU- bound mixed task task VMM I/O event
  • 11. Background Task-unawareness leading to poor responsiveness That event is mine! Head Run queue sorted based on CPU fairness Tail I/O- I/O- I have no idea this incoming bound VM VM event is yours task bound task (VCPU) (VCPU) and CPU- bound mixed your VM has low priority now. task task Sorry not to schedule you.. VMM Responsiveness VS. Fairness I/O event
  • 12. Background Task-unawareness leading to poor responsiveness That event is mine! Head Run queue sorted based on CPU fairness Tail I/O- I/O- I have no idea this incoming bound VM VM event is yours task bound task (VCPU) (VCPU) and CPU- bound mixed your VM has low priority now. task task Sorry not to schedule you.. VMM Responsiveness VS. Fairness I/O event
  • 13. Background The worst case example for 6 domains consolidated 1 Workloads Native Linux(I/O+CPU) 1 dom: Server & CPU-bound task 0.8 XenoLinux(I/O) 5doms: CPU-bound task XenoLinux(I/O+CPU) 0.6 CDF 0.4 0.2 0 0 2 4 6 8 10 Response time (ms)
  • 14. Background The worst case example for 6 domains consolidated 1 Workloads Native Linux(I/O+CPU) 1 dom: Server & CPU-bound task 0.8 XenoLinux(I/O) 5doms: CPU-bound task XenoLinux(I/O+CPU) 0.6 CDF 0.4 0.2 0 0 2 4 6 8 10 Response time (ms)
  • 15. Background The worst case example for 6 domains consolidated 1 Workloads Native Linux(I/O+CPU) 1 dom: Server & CPU-bound task 0.8 XenoLinux(I/O) 5doms: CPU-bound task XenoLinux(I/O+CPU) 0.6 By boosting mechanism of CDF Xen Credit scheduler 0.4 0.2 Boosting mechanism realizes I/O boundness with only VCPU-level 0 0 2 4 6 8 10 Response time (ms)
  • 16. Main Goals Improve responsiveness of an I/O-bound task Priority boosting with task-level granularity ¡°Partial boosting¡± CPU fairness guarantee Transparency Low management overheads
  • 17. Issues How to identify an I/O-bound task How to know an incoming event is for the I/O- bound task
  • 18. Approach Non-intrusive approach No guest OS modi?cation No explicit interface to inform I/O-bound task and event data Pros. No additional engineering cost for di?erent OSes Strong trustworthiness Cons. False decision
  • 19. HOW TO IDENTIFY I/O-BOUND TASKS
  • 20. Tracking I/O-bound Tasks Observable information at the VMM Task switching Monitoring address space changes (Antfarm USENIX¡¯06) CPU time usage Running time of a task Example (x86) event CR3 update CR3 update ) () c( ts c Time d ts rd r Task running time
  • 21. Tracking I/O-bound Tasks Inference based on common gray-box knowledge Kernel policy to improve responsiveness of I/O- bound tasks An I/O-bound task is preemptively scheduled in response to its incoming event Characteristic of I/O-bound tasks Short running time Threshold to decide a short running time: IOthreashold
  • 22. Tracking I/O-bound Tasks Three disjoint observation Observation classes based on two gray-box criteria Ambiguity Positive evidence Positive Negative supports I/O-boundness evidence evidence Negative evidence supports non-I/O-boundness Preemptively scheduling in response to an event Ambiguity & No evidence Short running time( < IOthreshold )
  • 23. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 Time Positive Negative Ambiguity
  • 24. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 Time Scheduling VCPU1 Positive Negative Ambiguity
  • 25. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 Time Scheduling VCPU1 Positive Negative Ambiguity
  • 26. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 Time Scheduling VCPU1 Positive Negative Ambiguity T1
  • 27. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 Time Scheduling VCPU1 Positive Negative Ambiguity T1
  • 28. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T1
  • 29. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T1
  • 30. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T3 T1
  • 31. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 T5 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T3 T1
  • 32. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 T5 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T3 T4 T1
  • 33. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 T5 T6 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T3 T4 T1
  • 34. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 T5 T6 Time Scheduling VCPU1 Positive Negative Ambiguity T2 T3 T4 T1 T5
  • 35. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 T5 T6 Time Scheduling Descheduling VCPU1 VCPU1 Positive Negative Ambiguity T2 T3 T4 T1 T5
  • 36. Tracking I/O-bound Tasks Example IOthreshold Event pending to VCPU1 T1 T2 T3 T4 T5 T6 Time Scheduling Descheduling VCPU1 VCPU1 Positive Negative Ambiguity T2 T3 T4 T6 T1 T5
  • 37. Tracking I/O-bound Tasks Weighted evidence accumulation The degree of belief to reinforce the inference Weight of positive evidence < Weight of negative evidence More penalize for negative evidence BelThreshold The degree of At this time, this task is believed as an I/O-bound task belief # of sequential observations
  • 38. HOW TO KNOW AN INCOMING EVENT IS FOR AN I/O-BOUND TASK
  • 39. Correlation Mechanism To distinguish an incoming event for I/O-bound task Block I/O Block read Network I/O Packet reception
  • 40. Correlation Mechanism - Block I/O - Request/response style If T1 requests for reading B1 and T1 is I/O-bound Completion event for B1 is for I/O-bound task How to decide ¡°T1 read B1¡± at the VMM
  • 41. Correlation Mechanism - Block I/O - Request/response style If T1 requests for reading B1 and T1 is I/O-bound Completion event for B1 is for I/O-bound task How to decide ¡°T1 read B1¡± at the VMM When the VMM observes a read event, it checks whether the current task is I/O-bound
  • 42. Correlation Mechanism - Block I/O - Request/response style If T1 requests for reading B1 and T1 is I/O-bound Completion event for B1 is for I/O-bound task How to decide ¡°T1 read B1¡± at the VMM When the VMM observes a read event, it checks whether the current task is I/O-bound But, how about ¡°delayed read event¡± ? Guest OS dependent (e.g. block I/O scheduler)
  • 43. Correlation Mechanism - Block I/O - Inspection window inspection window user T1 T2 T3 T4 kernel read VMM actual read request If an I/O-bound task in inspection window The actual read request is for I/O-bound task False positive VS. False negative
  • 44. Correlation Mechanism - Network I/O - Event identi?cation Socket-like information is too heavy for the VMM Destination port number for TCP/IP communication Most speci?c to a recipient task Asynchronous packet reception No prior information about incoming packets History-based prediction mechanism
  • 45. Correlation Mechanism - Network I/O - History-based prediction mechanism Inference ¡°If an incoming packet is for I/O-bound task, this packet makes the I/O-bound task to be preemptively scheduled¡± Monitoring the ?rst woken task in response to an incoming packet
  • 46. Correlation Mechanism - Network I/O - History-based prediction mechanism (cont¡¯) Portmap An entry for each destination port number Each entry is an N-bit saturating counter Example (2-bit counter) 00 01 portmap non- weak I/O-bound I/O-bound If the ?rst woken task is I/O-bound Otherwise 10 11 strong I/O-bound I/O-bound
  • 47. Correlation Mechanism - Network I/O - History-based prediction mechanism (cont¡¯) Portmap An entry for each destination port number Each entry is an N-bit saturating counter Example (2-bit counter) 00 01 portmap non- weak I/O-bound I/O-bound If the ?rst woken task is I/O-bound Otherwise 10 11 strong If portmap counter¡¯s MSB is set, this packet I/O-bound I/O-bound is for I/O-bound
  • 49. Partial Boosting Priority boosting with task-level granularity Priority boosting lasts during the run of an I/O-bound task Why? To prevent CPU-bound tasks in a boosted VCPU from compromising CPU fairness
  • 50. Partial Boosting Procedure Head Run queue sorted based on CPU fairness Tail VM1 VM2 VM3 (VCPU) (VCPU) I/O- CPU- CPU- bound bound bound task task task VMM
  • 51. Partial Boosting Procedure Head Run queue sorted based on CPU fairness Tail VM1 VM2 VM3 (VCPU) (VCPU) I/O- CPU- CPU- bound bound bound task task task VMM If this event is inferred for an I/O-bound task in VM3, I/O event do partial boosting for VM3
  • 52. Partial Boosting Procedure Head Run queue sorted based on CPU fairness Tail VM3 VM1 VM2 I/O- CPU- CPU- (VCPU) (VCPU) bound bound bound task task task VMM If this event is inferred for an I/O-bound task in VM3, I/O event do partial boosting for VM3
  • 53. Partial Boosting Procedure Head Run queue sorted based on CPU fairness Tail VM3 VM1 VM2 I/O- CPU- CPU- (VCPU) (VCPU) bound bound bound task task task VMM If this event is inferred for an I/O-bound task in VM3, I/O event do partial boosting for VM3
  • 54. Partial Boosting Procedure Head Run queue sorted based on CPU fairness Tail VM3 VM1 VM2 I/O- CPU- CPU- (VCPU) (VCPU) bound bound bound task task task VMM If this event is inferred for an I/O-bound task in VM3, I/O event do partial boosting for VM3
  • 55. Partial Boosting Procedure Head Run queue sorted based on CPU fairness Tail VM1 VM2 VM3 (VCPU) (VCPU) I/O- CPU- CPU- bound bound bound task task task VMM If this event is inferred for an I/O-bound task in VM3, I/O event do partial boosting for VM3
  • 57. Implementation Based on Credit scheduler in Xen 3.2.1 Task information maintained by hash Limited number of tasks maintained Remove of a task with infrequent I/O Correlation Block I/O : using grant table in Xen Network I/O : supported by network backend driver No consideration of multiple VCPUs
  • 58. Evaluation Interactive workload Worst case scenario 1 dom: Server & CPU-bound task Packet request-response 5doms: CPU-bound task Think time: 100 ~ 1000 ms IOthreshold = 0.5 ms
  • 59. Evaluation Interactive workload Worst case scenario 1 dom: Server & CPU-bound task Packet request-response 5doms: CPU-bound task Think time: 100 ~ 1000 ms IOthreshold = 0.5 ms
  • 60. Evaluation Correlation evaluation Partial boosting hit ratio (PBHR) h PBHR (%) = ¡Á 100 T he number of partial boostings where 1 , if an I/O-bound task awakes during partial boosting. h= 0 , otherwise. de?ned as true positive ratio False positive ratio = (100 - PBHR) %
  • 61. Evaluation Correlation evaluation: Block I/O PBHR (%) Workloads 100 1 dom: 8 tasks 1 task: I/O-bound task 75 7 tasks: I/O+CPU task PBHR (%) (CPU usage 1~300ms between IOs) 50 5doms: CPU-bound task 25 0 1 2 3 4 5 6 7 8 NC Inspection window size
  • 62. Evaluation Correlation evaluation: Block I/O Throughput 120 Workloads 1 dom: 8 tasks Throughput (KB/s) 1 task: I/O-bound task 90 7 tasks: I/O+CPU task (CPU usage 1~300ms between IOs) 5doms: CPU-bound task 60 30 1 2 3 4 5 6 7 8 NC Inspection window size
  • 63. Evaluation Correlation evaluation: Network I/O 100 90 93 Workloads 1 dom: 8 tasks 75 64 1 task: I/O-bound task PBHR (%) 7 tasks: I/O+CPU task 50 (CPU usage 1~300ms between IOs) 5doms: CPU-bound task 25 10 0 2-bit is reasonable 1 2 4 NC in terms of space overheads Bit-width of portmap counter
  • 64. Evaluation Response time for text editing during CPU-intensive workload 1 1 0.9 Normal 0.9 Normal TAVS TAVS 0.8 0.8 0.7 0.7 0.6 0.6 CDF CDF 0.5 0.5 0.4 0.4 0.3 0.3 0.2 Text editing 0.2 Text editing 0.1 (Xen compilation) 0.1 (Web browsing with Flash animations) 0 0 0 0.05 0.1 0.15 0.2 0.25 0.3 0 0.05 0.1 0.15 0.2 0.25 0.3 Response time (sec) Response time (sec)
  • 65. Evaluation Execution time of I/O-bound tasks with CPU-intensive workloads Normal TAVS Normalized execution time 1.00 0.75 0.50 PBHR = 99% 0.25 0 grep find wget smbcp smbcp: copy ?les from remote samba server Background workload compilation compression
  • 66. Evaluation Overheads Task tracking overheads: 0.06% No overhead for inspecting incoming packets Increased network throughput : decreased CPU throughput = 48 : 1 Space overhead of N-bit portmap N * 8KB for each VM e.g. 2-bit portmaps for TCP and UDP: 32KB for each VM
  • 67. Conclusions Task-aware VM scheduling Bridging the semantic gap in CPU management Transparency by VMM-level inference Gray-box technique Low overheads
  • 68. Future Work Extension on multicore system Simulation-based analysis for more intelligent scheduling Evaluation for more various workloads
  • 71. Implementation Block correlation Shared IDD Dom1 grant table Dom2 1 Active Dom1's Dom2's VMM grant table recent task list recent task list T1 T3 T2 T3 T1 1 Hardware Machine memory
  • 72. Implementation Network correlation IDD Dom1 Dom2 Incoming packet information VMM Dom1, TCP, 1000 Dom2, UDP, 2000 update Dom1's Dom2's Dom2, TCP, 1000 portmap portmap Shared memory Hardware NIC
  • 73. Throughput 100 25 100 60 Throughput 95 Dom6 95 90 Dom5 90 85 Dom4 85 50 80 20 Dom3 80 75 Dom2 75 Dom1(I/O) 70 IDD 70 65 40 Throughput (Mbps) 65 Throughput (MB/s) CPU usage (%) CPU usage (%) 60 15 60 55 55 50 50 30 45 45 40 10 40 35 35 20 30 30 25 25 20 5 20 15 15 10 10 10 5 5 0 0 0 0 Base 0 0.0625 0.125 0.25 0.5 Base 0 0.0625 0.125 0.25 0.5 PBratio PBratio Allowed CP U usage f or partial boosting PBratio = T otal CP U usage
  • 74. Degree of Belief Degree of belief (grep, ?nd + compilation) 300 grep The degree of belief 250 cc1 200 150 100 50 0 -50 -100 0 100 200 300 400 Elapsed time (sec) 300 find The degree of belief 250 cc1 200 150 100 50 0 -50 -100 0 100 200 Elapsed time (sec)