
Running ESXi (VMVisor) 7.0U3f as a virtual machine on TrueNAS SCALE 22.02.2.1


👋

 

So, I've been running a TrueNAS SCALE server on a small form factor system for a while now, and part of why I went with SCALE over Core (which is admittedly more stable) is its virtualization capabilities. I want to be able to provision and manage my VMs from a desktop client rather than a web interface, and libvirt UIs are lacking on Windows, with the most prominent option (QtEmu) last modified roughly 13 years ago.

 

I've been comfortable with VMware Workstation since long before I owned servers, back when I first started tinkering with virtualization, so I figured I could run an ESXi server within TrueNAS SCALE using QEMU/KVM and manage it from Workstation. I got myself a copy of ESXi 7.0U3f-20036589 and got started.

 

Problems showed up immediately. First and foremost, as of this writing, SCALE defaults to the i440fx machine type, while the online guides that cover ESXi under QEMU specify q35 (source). ESXi 7 onwards has also dropped support for e1000 NICs (source), and SCALE only lets you pick between e1000 and VirtIO (which ESXi doesn't support), excluding other models QEMU can emulate (like vmxnet3, e1000e, rtl8139).

 

So naturally I tried ESXi 6.7.0.update03-14320388 (which should, in theory, still have drivers for e1000 NICs) and got my first purple screen of death. At first I thought it was because I'm passing through my 11900K directly, so, following the guide that talked about q35, I changed the CPU model to Westmere, but that didn't help.

 

[screenshot: purple screen of death while booting ESXi 6.7 under QEMU/KVM]

 

Booting the ESXi 7.0U3f-20036589 installer gives me a "no supported network card" prompt. I could, in theory, just modify the VM configuration files to use q35 and vmxnet3, but that doesn't seem ideal: TrueNAS stores configuration data in its own database, which takes precedence over manual configuration (as far as I understand what's written), and you're expected to use its APIs to make any modifications (source).

 

I've thought about using the "Community Networking Driver for ESXi" (source) and constructing a custom ISO with PowerCLI, but none of the drivers list the 82540EM (the device QEMU's e1000 emulates), and even if I wanted to attempt a hail mary, due to some stuff I'm barely qualified to make sense of, building a custom ISO seems unlikely to work (source).

 

So I return from my slumber to ask for tech tips: how do I leverage the TrueNAS SCALE APIs to modify VM configurations in the "intended(TM)" manner?


Keep in mind that running a nested hypervisor is really not recommended.

 

I'd install KVM from the Linux command line in SCALE with a web frontend and go from there.


 


27 minutes ago, NelizMastr said:

I'd install KVM from the Linux command line in SCALE with a web frontend and go from there.

It may not like that… truenas is REALLY meant to be an appliance, it doesn’t like you messing with things. 
 

7 hours ago, MSMSMSM said:

[...] So I return from my slumber to ask for tech tips: how do I leverage the TrueNAS SCALE APIs to modify VM configurations in the "intended(TM)" manner?

Why not just run truenas UNDER ESXi? I wouldn't ever run a hypervisor under truenas, that's just bass ackwards. But virtualizing truenas is pretty standard.
 

I ran core under ESXi for years, but have since moved to proxmox and have been much happier. Either way, why not flip this around and virtualize truenas? 



3 hours ago, LIGISTX said:

Why not just run truenas UNDER ESXi? I wouldn’t ever run a hypervisor under truenas, that’s just bass ackwards. But virtualize truenas is pretty standard. 

And you're absolutely correct, except I've already spent all my money on this server (this was a recent exploration of mine), and I've already created ZFS pools, datasets and volumes (I was planning to just use SCALE without ESXi, and migrated my disk images to zvols thinking I could manage the VMs from a non-web interface). I don't really have an appetite for spending any more; I'm already down the hole by a lot for subpar hardware.

 

The reason I even care about ESXi is the integration and polish it has with VMware Workstation. I don't like web interfaces, and VMware's tight integration makes it appealing to me.

3 hours ago, LIGISTX said:

truenas is REALLY meant to be an appliance, it doesn’t like you messing with things

The folks on the TrueNAS forums were pretty clear about this, and since I really fancy their "back up your preferences and restore them at will" feature, I'm leaning towards doing what the official guides say. That's why I was hoping someone familiar with the TrueNAS APIs could pitch in...


4 minutes ago, MSMSMSM said:

And you're absolutely correct, except, I've already spent all my money on this server (this was a recent exploration of mine) and I've already created ZFS pools, datasets, volumes (I was planning to just use SCALE without ESXi and migrated my disk images to Zvols thinking I could manage the VMs from a non-web interface) and I don't really have an appetite for spending anymore. I'm already down the hole by a lot for subpar hardware.

What hardware are you using?

 

You probably can install esxi on the host, then pass the disk to a truenas vm, then use that vm to store the other vms on.


8 minutes ago, MSMSMSM said:

And you're absolutely correct, except, I've already spent all my money on this server (this was a recent exploration of mine) and I've already created ZFS pools, datasets, volumes (I was planning to just use SCALE without ESXi and migrated my disk images to Zvols thinking I could manage the VMs from a non-web interface) and I don't really have an appetite for spending anymore. I'm already down the hole by a lot for subpar hardware.

No need for new hardware... Just need to back up your truenas XML, format your boot media with ESXi, use an HBA to pass the drives through (that's 50 bucks.....), install a truenas VM, load the XML, done. Truenas will work just as it always did, it will just be running under ESXi.

 

https://www.ebay.com/itm/394094836815?hash=item5bc1e1fc4f:g:Oa4AAOSwDQFhsw6x

 

Then you just need some SAS to SATA cables, you pass the entire PCIe device through to truenas, and you're done.



21 hours ago, MSMSMSM said:

So naturally I tried using ESXi 6.7.0.update03-14320388 (which should in theory have drivers for e1000 NICs) and got my first purple screen of death. At first I thought this is perhaps because I'm passing through my 11900K directly and so following the guide that talked about q35, I changed it to Westmere but that didn't help.

You can run ESXi 6.7 on KVM/QEMU using the e1000 NIC. 

This is my test lab below, running in UnRAID (2 x ESXi hosts, with TrueNAS providing the LUNs for the datastores via iSCSI).

 

[screenshots: the two nested ESXi hosts running as VMs in UnRAID]

21 hours ago, MSMSMSM said:

Booting off ESXi 7.0U3f-20036589 will give me a "no supported network card" prompt. I could in theory just modify the VM configuration files to use q35 and vmxnet3 but that doesn't seem to be ideal as TrueNAS stores configuration data in their own databases which will be preferred over manual configuration (as far as I understand what's written) and you're expected to use their APIs to make any modifications (source).

 

I've thought about using the "Community Networking Driver for ESXi" (source) and constructing a custom ISO using PowerCLI but none of them list the 82540EM (the corresponding device emulated by e1000 in QEMU) and even if I wanted to do a hail mary, due to some stuff that I'm barely qualified to make sense out of, the ability to make a custom ISO seems unlikely (source).

 

So I return from my slumber to ask for tech tips. How do I leverage TrueNAS SCALE APIs to modify VM configurations in the "intended(TM)" manner.

 

I have actually upgraded them to 7.0.3, and they're running fine. I did have to change them from e1000 to vmxnet3 to get the network adapters to work though. 

UnRAID is the same in that it doesn't list vmxnet3 as a supported adapter in its UI, but if you check QEMU it should still have support for it. You can see here that I manually changed the type to vmxnet3:

[screenshot: the VM's XML edited to use the vmxnet3 NIC model]

 

You can see here I'm running 7.0.3. I set the machine type to Q35 (5.1):

[screenshot: ESXi 7.0.3 running as a VM with machine type Q35 (5.1)]

 

The issue you'll have, as you've found out, is that unlike UnRAID you can't just edit an XML or JSON file; TrueNAS SCALE holds the configuration in a database.

You'll need to connect with a websocket client and do an advanced config:

[screenshot: a VM's advanced configuration being edited through a websocket API client]

 

Here's some info about the websocket API for VMs: https://www.truenas.com/docs/api/scale_websocket_api.html#vm
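If you'd rather poke at it from the SCALE shell, the same middleware API is reachable through midclt. A rough sketch of what that could look like is below — the VM and device IDs are examples, and whether the validator actually accepts values the UI doesn't offer (a VMXNET3 NIC type, for instance) is exactly the open question here, so verify the field names and allowed values against the API docs above:

    # list VMs and their devices to find the NIC device you want to change
    midclt call vm.query
    midclt call vm.device.query '[["vm", "=", 1]]'

    # attempt to change the NIC model on device 5 (the IDs, "VMXNET3" and "br0"
    # are illustrative values, not something I've confirmed the middleware accepts)
    midclt call vm.device.update 5 '{"attributes": {"type": "VMXNET3", "nic_attach": "br0"}}'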



Further to the info above, I've checked TrueNAS SCALE, and its version of QEMU does have e1000e and vmxnet3:

 

[screenshot: QEMU on SCALE listing e1000e and vmxnet3 among its supported NIC models]



On 8/5/2022 at 1:04 AM, LIGISTX said:

No need for new hardware... Just need to back up your truenas XML, format your boot media with ESXi, use an HBA to pass the drives through (that's 50 bucks.....), install a truenas VM, load the XML, done. Truenas will work just as it always did, it will just be running under ESXi.

This sounds like a good idea, except I have an ITX motherboard and my PCIe and M.2 slots are completely occupied, to the point where I'm converting USB headers into ports so I can attach more hard drives (I did emphasize that the server is SFF when I built it).

On 8/5/2022 at 11:27 AM, Jarsky said:

Further to the info above, I've checked TrueNAS SCALE, and its version of QEMU does have e1000e and vmxnet3.

Talk about being a tease 😛

I ran a similar command via SSH to see if TrueNAS ships a version of QEMU that perhaps doesn't support e1000e/vmxnet3, and as you've shown, that isn't the case; this is more a limitation of their API/GUI.

On 8/5/2022 at 12:58 AM, Electronics Wizardy said:

You probably can install esxi on the host, then pass the disk to a truenas vm, then use that vm to store the other vms on.

Based on some reading I had done on the TrueNAS forums, they strongly recommend against virtualizing TrueNAS, so I figured that since the ESXi VMs aren't as valuable to me as the data stored on TrueNAS, I'm willing to accept the occasional ESXi hiccup in exchange for TrueNAS stability. Though if I'm totally out of options, I will consider this, even though I'm not a huge fan of it.


So,

 

I tried a few things out, starting with Proxmox containing an ESXi 7 VM and a TrueNAS Core VM, then realizing that my internal network needs a "router" of sorts (DHCP, gateway, etc.) and experimenting with vyOS, RouterOS (MikroTik) and pfSense on said Proxmox instance, bringing the total number of virtual machines to three.

 

I learned a few things and also didn't sleep for two nights, but here's what I've learned so far. I'm writing this down from memory, so I could be completely wrong; please correct me. Everything specified is as of this writing.

  • PCI pass-through on Proxmox is hit-or-miss and is considered experimental for a reason
    It'll still work, just... breakage is expected
     
  • If you have created a pool and configuration on TrueNAS SCALE, you cannot just install TrueNAS Core and restore it there, as Core has a lower version number
    This is probably by design; SCALE apparently uses a newer version of ZFS than Core
     
  • Just because something is Linux-based doesn't mean it can be made into an LXC package
    I found myself doing this while on Proxmox, "optimizing" TrueNAS SCALE and vyOS by converting their clean installs into LXC container tarballs, despite not having gotten either working yet and never having used LXC before
     
  • RouterOS is paid software
    I didn't read deeply enough until I got it working and realized that it needs a license; in my defense, I was sleep deprived
     
  • Use pfSense
    Yeah....
     
  • You can't mix nested virtualization and PCI passthrough with ESXi
    See https://kb.vmware.com/s/article/67272
     
On 8/5/2022 at 1:04 AM, LIGISTX said:

No need for new hardware... Just need to back up your truenas XML, format your boot media with ESXi, use an HBA to pass the drives through (that's 50 bucks.....), install a truenas VM, load the XML, done. Truenas will work just as it always did, it will just be running under ESXi.

 

On 8/5/2022 at 12:58 AM, Electronics Wizardy said:

You probably can install esxi on the host, then pass the disk to a truenas vm, then use that vm to store the other vms on.

 

I essentially did this. Proxmox wasn't doing it for me; it felt janky to have a hypervisor within a hypervisor, and if I had to use Proxmox anyway, why bother with ESXi? But I don't like Proxmox, so I gave ESXi a shot.

 

I was avoiding it for so long because I thought it wouldn't be compatible with my hardware, and surprisingly.... it is!

 

While I was on Proxmox, after battling both vyOS and RouterOS, I found myself using pfSense (which is an amazing piece of software), and using the experience I gained from trying to run ESXi+SCALE+pfSense under Proxmox, I decided to ditch Proxmox and go pfSense+SCALE on ESXi...

 

It took a lot of research, false starts and failed attempts but finally, I...

  • Installed ESXi
    I needed to use the installer option systemMediaSize=min to discourage ESXi from allocating too much space to the OS partitions (VMFSL) and leave some for the local datastore (VMFS); see the sketch at the end of this post
     
  • Used the free space to create a new partition and assign it as a local datastore
    This step really stumped me, since Proxmox gives you a local store out of the box, whereas ESXi 7 required me to enable SSH and create the datastore by hand following a guide on GitHub Gist; until I figured it out, ESXi felt like an intimidating ghost town
     
  • Whitelisted my SATA controller
    I found a guide that showed me how to make sure my controller won't be greyed out when I try to enable PCIe passthrough, by modifying passthru.map
     
  • Went into maintenance mode, set the SATA controller as passthrough-enabled, rebooted the ESXi host, then disabled SSH and exited maintenance mode
     
  • Uploaded my pfSense and SCALE ISOs, created a SCALE VM and assigned the passed-through SATA controller to the SCALE VM

    It turns out that when you assign RAM to a VM, the RAM isn't "hard allocated" to that VM but rather "soft allocated" (made available when needed) by default, and mixing that with passthrough makes ESXi upset with this error:

    Quote

    Invalid memory setting for FPT: memory reservation (sched.mem.min) should be equal to memsize(xxxx)

    The solution is to go into the advanced options within the VM's settings and ensure that the memory reserved equals the memory assigned.
     

  • Created 4 port groups: one for management, one for iSCSI, one for accessing the internet (through my general home network) and one for the intranet (managed by the pfSense instance inside)

    I named them ManageNet (bound to a statically assigned IP attached to my home network through VMkernel NIC 0), PublicNet (used by all VMs as vNIC0), PrivateNet (used by all VMs as vNIC1) and SCSINet (bound to a statically assigned IP attached to the internal network through VMkernel NIC 1)

    P.S. PublicNet is just the standard "VM Network" port group renamed
     

  • Assigned two NICs to each VM

  • Started them up, installed what's needed, imported configurations and keys

  • Set up iSCSI to connect to the SCALE instance and used the exposed drive as the ZFS-backed datastore for non-core VMs

    The way I'm doing it is janky, so I won't be elaborating any further on it :P... I didn't expect this to work at all, but it somehow did!

  • Set up logging on persistent storage by following the steps of a knowledge base article

and...

 

That's all folks!
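For anyone retracing this, here's roughly what the ESXi-side plumbing looks like from the shell. Treat it as a sketch reconstructed from memory and the docs rather than a copy of my exact setup: the device name, partition number, sector range, PCI device ID, IP addresses and memory size below are all placeholders you'd need to substitute.

    # 1) ESXi installer: at the boot prompt, press Shift+O and append
    #      systemMediaSize=min
    #    so the OS-data (VMFSL) partition stays small and free space is left for VMFS.

    # 2) Create a VMFS datastore from the leftover free space (over SSH).
    #    NOTE: setptbl rewrites the ENTIRE partition table, so repeat every existing
    #    partition line reported by getptbl and append the new one at the end.
    partedUtil getptbl /vmfs/devices/disks/t10.EXAMPLE_BOOT_DEVICE
    partedUtil setptbl /vmfs/devices/disks/t10.EXAMPLE_BOOT_DEVICE gpt \
        "<existing partition lines from getptbl>" \
        "9 <firstFreeSector> <lastFreeSector> AA31E02A400F11DB9590000C2911D1B8 0"
    vmkfstools -C vmfs6 -S local-vmfs /vmfs/devices/disks/t10.EXAMPLE_BOOT_DEVICE:9

    # 3) Whitelist the SATA controller for passthrough by appending a line to
    #    /etc/vmware/passthru.map (8086 xxxx is a placeholder vendor/device ID),
    #    then toggle passthrough on the device in the UI and reboot the host.
    echo "8086  xxxx  d3d0  false" >> /etc/vmware/passthru.map

    # 4) Extra port groups on the standard vSwitch, plus a VMkernel NIC for iSCSI.
    esxcli network vswitch standard portgroup add -v vSwitch0 -p PrivateNet
    esxcli network vswitch standard portgroup add -v vSwitch0 -p SCSINet
    esxcli network ip interface add -i vmk1 -p SCSINet
    esxcli network ip interface ipv4 set -i vmk1 -t static -I 10.10.10.2 -N 255.255.255.0

    # 5) The FPT memory error goes away once the VM's memory is fully reserved,
    #    e.g. these .vmx settings (or "Reserve all guest memory" in the UI):
    #      memsize       = "16384"
    #      sched.mem.min = "16384"
    #      sched.mem.pin = "TRUE"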


2 hours ago, MSMSMSM said:

I tried a few things out, starting with Proxmox containing an ESXi 7 VM and a TrueNAS Core VM [...]

  • PCI pass-through on Proxmox is hit-or-miss and is considered experimental for a reason

  • Just because something is Linux-based doesn't mean it can be made into an LXC package

[...] That's all folks!

Lol… sounds like you did the amount of learning and trial and error I have done over ~5 years in 2 nights. 
 

But I am curious what proxmox PCIe passthrough issues you had - I have not had issues with passthrough on my server nor on two of my buddies'. I have pfsense and truenas core (plus simple stuff) virtualized under proxmox with PCIe passthrough for HBAs and NICs.
 

Why did you opt to attempt to create LXC containers instead of just standard VMs?

 

Glad you got stuff working, sounds like an exciting few days lol. 



16 hours ago, LIGISTX said:

But I am curious what proxmox PCIe pass through issues you had

Despite being an ESXi newbie, this wasn't my first Proxmox rodeo; my first experiments with Proxmox were when I was hackintoshing my Ryzen 1800X rig and wanted 32-bit support. It probably wouldn't be fair to blame Proxmox entirely, but the whole process of doing VFIO on Linux is... tiresome and hit-or-miss.


Having to activate kernel modules, pass IOMMU-related kernel parameters, blacklist drivers, and then, depending on what IOMMU groups you get, apply kernel patches (to split out an unfavourable IOMMU layout), which may or may not work and one day may just decide to not work at all.
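For context, this is roughly the kind of host-side plumbing involved on Proxmox; a generic sketch, not my exact setup, and the 1234:5678 device ID is a placeholder you'd replace with the output of lspci -nn:

    # /etc/default/grub - enable the IOMMU (Intel example), then run update-grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

    # /etc/modules - load the VFIO modules at boot
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd

    # bind the target device to vfio-pci instead of its normal driver,
    # then rebuild the initramfs and reboot
    echo "options vfio-pci ids=1234:5678" > /etc/modprobe.d/vfio.conf
    update-initramfs -u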

 

Combine this with the fact that nested virtualization on Proxmox is less a few-clicks affair (like in ESXi) and more an under-the-hood one, despite Proxmox supposedly being treated like an appliance (having to "ignore MSRs" was something that made me raise an eyebrow), and I found myself with a SCALE-esque problem: an appliance that needs under-the-hood changes which are liable to break at any time, as they're technically either unsupported or "advanced".

 

I wasn't expecting a one-click solution, but this is going to be my production server, so I'd like to minimize my risk exposure.

 

16 hours ago, LIGISTX said:

Why did you opt to attempt to create LXC containers instead of just standard VM’s?

Somewhere along the line, I realized that SCALE, vyOS and Proxmox are all based on Debian, and figured it would be wasteful to run three distinct instances of similar logic if I could instead make them share one kernel. The guides I saw more or less summarized the process as extracting the root filesystem (from a squashfs or elsewhere) and creating a gzipped tarball of it, then creating a metadata file and a gzipped tarball of that, leaving you with a root.tar.gz and a metadata.tar.gz; a rough sketch of that is below.
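Something along these lines is what those guides describe — purely illustrative: the source path and metadata values are made up, and this is the part Proxmox never ended up accepting from me anyway:

    # extract the root filesystem (source squashfs path is hypothetical)
    unsquashfs -d rootfs /path/to/distro-rootfs.squashfs

    # tarball of the root filesystem
    tar -czf root.tar.gz -C rootfs .

    # minimal metadata file, then its own tarball
    cat > metadata.yaml <<'EOF'
    architecture: x86_64
    creation_date: 1659657600
    properties:
      description: TrueNAS SCALE rootfs (experiment)
      os: debian
    EOF
    tar -czf metadata.tar.gz metadata.yaml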

 

After messing around for a bit and having Proxmox refuse to recognize the root.tar.gz, I got tired and gave up. Right now, containers wouldn't benefit me anyway: pfSense is based on FreeBSD 12, SCALE is based on Debian and ESXi is a proprietary POSIX-compatible OS, so there is no common base, so to speak.

 

I would like to enable PCI passthrough and nested virtualization on my SCALE VM so I can use that as a Linux base and run containers while minimizing resource wastage, but that's future me's problem.

 

16 hours ago, LIGISTX said:

Glad you got stuff working, sounds like an exciting few days lol. 

Thank you! It wouldn't have been possible without the wealth of information available on blogs, wikis, forums, discussion boards, knowledge bases and comment sections (yes, really). I think I'd have given up a long time ago if the only thing I had was the official documentation.

 

For example, setting up iSCSI networking was aided by these two guides (here, here), and while I eventually gave up on setting up CHAP, this forum thread was still helpful.

 

----

 

Also, a small rant: I have an X550T2BLK NIC for my 10GbE needs and am currently using a QNA-UC5G1T (a 5GbE USB NIC) connected to my desktop computer, so I was expecting a 5GbE link.

 

Nope! The Intel card reports itself as capable of 100Mbps, 1000Mbps and 10000Mbps; it doesn't advertise 2500Mbps or 5000Mbps for reasons I do not understand. I remember on TrueNAS SCALE (back when I was on bare metal) having to edit my rc.conf so that ethtool would modify the advertised link modes of both ports on system startup.

 

 | 0x          008   |         100baseT Full |
 | 0x          020   |        1000baseT Full |
 | 0x         1000   |       10000baseT Full |
 | 0x 800000000000   |        2500baseT Full |
 | 0x1000000000000   |        5000baseT Full |
 |-------------------------------------------|
 | 0x1800000001028   |                       |

 Source: https://man7.org/linux/man-pages/man8/ethtool.8.html

Taking 0x1800000001028 to override what the ixgbe driver reported the card as capable of was janky, but it worked. Now that I'm on ESXi, I can't do this! Apparently this is a known problem (source, source), but unlike on SCALE, all I can do is sit on my hands (or buy a 10GbE card for my desktop, which is exactly what I did, spending 75 USD on a TX401)...

 

This, despite the X550-T2 being on the QVL (source)
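For reference, the override on SCALE looked roughly like this — the interface names are placeholders, and the mask is just the sum of the link-mode bits from the table above:

    # advertise 100/1000/10000baseT Full plus 2.5G and 5G on both ports at startup
    ethtool -s enp1s0f0 advertise 0x1800000001028
    ethtool -s enp1s0f1 advertise 0x1800000001028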

