Weird CPU scaling behavior with VM

So I know there are some limitations with load detection and such with VMs, but I had this working properly before; now, after a reinstall of the host OS, my VMs are acting strange.

Basically, when I put a load on a single core in the VM (which has its cores pinned to real cores), instead of that one physical core ramping up on the host, ALL the cores pinned to the VM ramp up together.

That wouldn't be so much of an issue except that it costs me 0.4 GHz. I have a full 8-core CPU (Hyper-Threading disabled) passed through to this VM. With only one core under load it should turbo to 3.3 GHz, but under a full 8-core load it only reaches 2.9 GHz.

 

 

 

 

[Screenshots: allcoresboost.png, singlecorehot.png]
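(For anyone who wants to reproduce the observation, here is a rough sketch that polls the host's per-core clocks. It assumes a Linux host exposing the cpufreq sysfs interface and the core 8-15 pinning shown later in the thread; adjust the range for your own setup.)

import time

PINNED = range(8, 16)  # host cores the VM's vCPUs are pinned to (see config later in the thread)

def core_mhz(cpu):
    # cpufreq sysfs reports the current frequency in kHz
    with open(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq") as f:
        return int(f.read()) // 1000

while True:
    print(", ".join(f"cpu{c}: {core_mhz(c)} MHz" for c in PINNED))
    time.sleep(1)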

Build: Intel S2600GZ, 2x E5-2670, EVGA SC 1070, Zotac 1060 6GB Mini, 48GB Micron 1333MHz ECC DDR3, 2x Intel DPS-750XB 750W PSUs

https://pcpartpicker.com/user/elerek/saved/3T7D4D


Host: Ubuntu Server

Hypervisor: KVM (via virsh)

Guest: Windows 10 Enterprise Evaluation

Hardware:

CPU: 2x Intel E5-2670 with Hyper-Threading disabled in the BIOS

MB: Intel S2600GZ


Seems a bit odd; try recreating the VM and attaching the existing virtual disk. When you assign vCPUs to a VM the host has to schedule pCPU time for every assigned vCPU at once, but if only 1 vCPU is active, only 1 pCPU should actually ramp up, not all of them.


12 hours ago, leadeater said:

Seems a bit odd; try recreating the VM and attaching the existing virtual disk. When you assign vCPUs to a VM the host has to schedule pCPU time for every assigned vCPU at once, but if only 1 vCPU is active, only 1 pCPU should actually ramp up, not all of them.

I thought hypervisors could run instructions of a single vCPU on multiple pCPUs? If I run a single-threaded process in a guest and watch my host usage, I often see the load bouncing between multiple different CPUs, and sometimes it gets split 50/50 or 33/33/33 between CPUs. This is on Hyper-V, but I imagine KVM is capable of the same. When you pin cores to a VM, I wouldn't assume that pins vCPU 1 to pCPU 1, etc. I may be wildly wrong; I'm just presenting an idea.


6 minutes ago, brwainer said:

I thought hypervisors could run instructions of a single vCPU on multiple pCPUs? If I run a single-threaded process in a guest and watch my host usage, I often see the load bouncing between multiple different CPUs, and sometimes it gets split 50/50 or 33/33/33 between CPUs. This is on Hyper-V, but I imagine KVM is capable of the same. When you pin cores to a VM, I wouldn't assume that pins vCPU 1 to pCPU 1, etc. I may be wildly wrong; I'm just presenting an idea.

Yes, it can run vCPU load on any pCPU thread it likes, but if a VM asks for CPU time it must be given access to as many pCPU threads as it has vCPUs configured. This is why reducing the number of vCPUs a VM has can actually make it perform better (application dependent), and every other VM will also perform better. In the virtualization world, less is more.

 

In the case of ESXi and other hypervisors you can limit which pCPU threads a VM's vCPUs may be allocated to, but doing that is extremely rare.
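(For KVM/libvirt, the setup in this thread, the same kind of restriction can also be applied to a running guest. Below is a rough sketch using the libvirt Python bindings, assuming they're installed; the domain name "win10" and the vCPU 0-7 to pCPU 8-15 mapping are placeholders matching the pinning discussed later in the thread.)

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("win10")          # placeholder domain name

host_threads = conn.getInfo()[2]          # number of pCPU threads on the host
for vcpu in range(8):
    pcpu = 8 + vcpu                       # e.g. vCPU 0 -> pCPU 8, vCPU 1 -> pCPU 9, ...
    cpumap = tuple(i == pcpu for i in range(host_threads))
    dom.pinVcpu(vcpu, cpumap)             # live equivalent of a <vcpupin> entry

conn.close()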


1 minute ago, leadeater said:

Yes, it can run vCPU load on any pCPU thread it likes, but if a VM asks for CPU time it must be given access to as many pCPU threads as it has vCPUs configured. This is why reducing the number of vCPUs a VM has can actually make it perform better (application dependent), and every other VM will also perform better. In the virtualization world, less is more.

 

In the case of ESXi and other hypervisors you can limit which pCPU threads a VM's vCPUs may be allocated to, but doing that is extremely rare.

Right, so my thought is: the configuration says "run the processes on any of these 8 CPUs", and the hypervisor may be splitting the time evenly. That's the only thing I can think of.


29 minutes ago, brwainer said:

Right, so my thought is: the configuration says "run the processes on any of these 8 CPUs", and the hypervisor may be splitting the time evenly. That's the only thing I can think of.

Correct, the hypervisor has a CPU scheduler that constantly re-evaluates CPU demand and moves load between threads to optimize as best it can.

 

To expand a little on how removing vCPUs can increase performance, I'll give an example.

 

Host: 16 threads (8c/16t)

VM 1: 4 vCPU

VM 2: 2 vCPU

VM 3: 4 vCPU

VM 4: 8 vCPU

VM 5: 4 vCPU

VM 6: 2 vCPU

VM 7: 4 vCPU

VM 8: 8 vCPU

vCPU:pCPU ratio: 2.25:1 (4.5:1 against real cores)

 

Here is a host that, going by vCPU:pCPU ratio, is lightly loaded, so in theory every VM should perform very well. But the owner of VM 4 is complaining that his batch processing is taking way too long and runs much faster on his dev laptop with only 4 cores, half the RAM and a laptop HDD, so what gives?

 

Further investigation shows that the application can only utilize 1 thread, but the VM has 8. Why does this matter? When the VM asks for CPU time to execute instructions it can't ask for 1; it must be given CPU time for all 8, and there are 7 other VMs on the host all asking for CPU time too. It is much easier to schedule time for 1 vCPU than for 8, so the VM is constantly in CPU wait cycles. If VM 4 were reduced to 1 or 2 vCPUs it would rarely be in CPU wait, and the perceived application performance would go up.
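(Here's a toy model of the behaviour described above, purely illustrative and not from this thread: a VM only gets a time slice when all of its vCPUs fit on free pCPU threads at once. Real hypervisors use more relaxed co-scheduling, but even this crude simulation shows the over-sized VMs spending more time waiting. The VM mix matches the example host above.)

import random

HOST_THREADS = 16
VMS = {"VM1": 4, "VM2": 2, "VM3": 4, "VM4": 8,
       "VM5": 4, "VM6": 2, "VM7": 4, "VM8": 8}

def simulate(slices=100_000, demand=0.5):
    # Each time slice, every VM wants CPU time with probability `demand`
    # and is only scheduled if ALL of its vCPUs fit on free pCPU threads at once.
    wanted = {vm: 0 for vm in VMS}
    granted = {vm: 0 for vm in VMS}
    for _ in range(slices):
        free = HOST_THREADS
        for vm in random.sample(list(VMS), len(VMS)):  # random order, no favourites
            if random.random() < demand:
                wanted[vm] += 1
                if VMS[vm] <= free:
                    granted[vm] += 1
                    free -= VMS[vm]
                # else: the whole VM sits in CPU wait for this slice
    for vm, vcpus in VMS.items():
        print(f"{vm} ({vcpus} vCPU): got {granted[vm] / max(wanted[vm], 1):.0%} "
              f"of the slices it asked for")

simulate()

With these made-up numbers, the 8 vCPU VMs get noticeably fewer of the slices they ask for than the 2 vCPU VMs, which is the CPU wait effect described above.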

 

As a virtual farm administrator, one of the key parts of monitoring performance is making sure VMs aren't getting blocked by CPU waits as the hypervisor tries to schedule time. The way I like to describe it is "don't pee in the pool": one badly configured VM can affect every VM. So don't let anyone ask for over-sized VMs, as it hurts them and you; this is a very hard concept for customers to understand.

 

If you need to run many VMs with large vCPU counts, you should actually run them on the same host. This is counter-intuitive, since basic logic would lead you to spread them across hosts and make sure they never live together. Why is it actually a good thing? When a VM with 8 vCPUs releases its threads, 8 threads instantly become available, which is perfect when there are 4 other 8 vCPU VMs on the host. If that 8 vCPU VM were living on a host with eight 1 vCPU VMs and four 2 vCPU VMs, the chance of 8 pCPUs being naturally available at the same time would be very slim, so the hypervisor has to put VMs into CPU wait, again reducing performance.

 

Edit:

Oh, and a CPU thread in a guest VM at 100% CPU time (this is important) doesn't mean a pCPU thread on the host will be at 100%. Windows shows CPU time, not actual utilization; on every CPU cycle it could be executing a very easy task that is accelerated by microcode, or just a simple spin wait.
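(As a trivial illustration of that last point, not from this thread: a spin wait like the sketch below shows one core at 100% "CPU time" in Task Manager or top while doing no useful work at all.)

import threading
import time

stop = False

def spin_wait():
    # Pegs one core at 100% CPU time while accomplishing nothing useful
    while not stop:
        pass

t = threading.Thread(target=spin_wait)
t.start()
time.sleep(5)   # watch one core sit at 100% during this window
stop = True
t.join()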


3 minutes ago, leadeater said:

-snip-

This is basically what I had assumed, but I hadn't learned or deduced that all the cores assigned to a VM had to be available together for the VM to get any CPU time at all. That is quite fascinating.


This shouldn't be the case here, though. Before running Cinebench, all cores are clocked down on the host. I pinned Cinebench to run only on core 7 of the vCPUs using CPU affinity in Task Manager, and each vCPU is pinned to a specific pCPU in the config:

 

  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
  </cputune>

 

Also, as I said, this worked originally, but after a reinstall of the host OS it doesn't anymore.
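(One way to sanity-check that the pinning actually took effect on the fresh install is to read the live placement back through the libvirt Python bindings. Rough sketch only; "win10" is a placeholder for the actual domain name.)

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("win10")   # placeholder domain name

# dom.vcpus() returns per-vCPU info plus the allowed-pCPU map for each vCPU
vcpu_info, cpumaps = dom.vcpus()
for (number, state, cpu_time, cpu), cpumap in zip(vcpu_info, cpumaps):
    allowed = [i for i, ok in enumerate(cpumap) if ok]
    print(f"vCPU {number}: currently on pCPU {cpu}, allowed pCPUs {allowed}")

conn.close()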


3 hours ago, Elerek said:

-snip-

Is this VM imported from the previous install, or a new VM? Just wondering if it's an import issue from when the hypervisor read in the VM configuration.

