
4x10Gbit (40Gbit) Fiber-optic SAN/iPXE Network Boot Build Log

Now we're entering uncharted territory. I haven't tested anything from this point forward, so it's going to be a learning experience.

 

Getting DHCP consolidated into the iSCSI server was actually surprisingly easy. Just had to find a good article that outlined the process.

 

How to install DHCP Server on FreeBSD

 

It seems FreeBSD uses "ports" and installing a DHCP server service takes a fairly easy series of commands:

portsnap fetch
portsnap extract
cd /usr/ports/net/isc-dhcp44-server
make

 

At this point there's a series of on-screen menu options you can select/deselect. The guide tells you to select BIND_SYMBOLS and deselect IPV6. I selected BIND_SYMBOLS and forgot to deselect IPV6 but it doesn't look like that impacted anything. After this all other prompts can be left on default.

 

When it returns you to the console run:

make install

 

It tells you to edit lines in /etc/rc.conf, which appears to be the startup configuration file that controls which services run at boot, but they didn't specify what to edit; they just show a picture...how helpful...

 

Setting /etc/rc.conf aside for now, we have to edit /usr/local/etc/dhcpd.conf, which miraculously uses the exact same syntax as isc-dhcp-server on Debian, so I can just cut/paste my configuration over. 😆

 

Scrolling down a ways it tells you to edit rc.conf in order to start DHCP with the OS:

dhcpd_enable="YES"
dhcpd_ifaces="em0"

In my case "em0" was "vtnet3". I will have to test how the second line handles the other 3 interfaces being added, since all four NICs are going to need to be hosting DHCP in order for this to work.
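For what it's worth, the port's rc script hands dhcpd_ifaces to dhcpd as a list of interface names, so additional NICs should be able to go in the same quoted string, separated by spaces. A minimal sketch, assuming the other three interfaces are named vtnet0 through vtnet2 (only vtnet3 is named in this log) and that dhcpd.conf has a subnet declaration for each of them:

```sh
# /etc/rc.conf fragment (rc.conf is plain sh variable assignments).
# vtnet0 through vtnet2 are assumed names; only vtnet3 appears in this log.
dhcpd_enable="YES"
dhcpd_ifaces="vtnet0 vtnet1 vtnet2 vtnet3"
```

Restart the service afterwards so the new interface list is picked up.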

 

Next they tell you to run this in order to start the service without rebooting the server:

/usr/local/etc/rc.d/isc-dhcpd.sh start

But I just get an error saying isc-dhcpd.sh doesn't exist. Looking in the /usr/local/etc/rc.d directory, there's no file ending in .sh, but there is an isc-dhcpd file, so running this seemed to get things rolling:

/usr/local/etc/rc.d/isc-dhcpd start

 

From here you can apparently verify the service is running with:

/usr/local/etc/rc.d/isc-dhcpd status

 

I took the 2x2670v1 server and removed its config from the Debian DHCP server, then verified that the server no longer got a reply from DHCP. It didn't.

 

Next I allowed the new FreeBSD DHCP server to run on the subnet and to my delight it worked immediately without hassle.

 

So we've eliminated the need for a separate DHCP server and better yet the config file is identical so there's nothing new to learn there.

 

Next up: can we get TFTP chain-loading working...?


I'm noting this down here right now so I don't forget or lose the information (very easy for me to do 😅)

 

At the FlexBoot splash screen you can press Ctrl+B, but booting to a SAN disk from there is not a simple process. You're much better off leaving it up to DHCP and, by extension, iPXE. I just discovered how I may get chainloading to work this time: compile a custom iPXE image, served over TFTP, for each individual client. That's very much doable, but the following commands are imperative for it to work. More to come as I continue to investigate this option.

 

dhcp
set initiator-iqn iqn.2021-9.boinc.com:lun3
sanboot iscsi:10.3.0.2::::iqn.2021-9.boinc.com:lun3

 


Got TFTP working on FreeBSD. It's not iPXE so we're not booting yet but one step at a time.

 

I found what looks almost like a high school PowerPoint presentation explaining how to set it up on FreeBSD.

 

Pretty straightforward:

Make sure the service starts by adding it to rc.conf:

nano /etc/rc.conf
inetd_enable="YES"

 

Edit /etc/inetd.conf and uncomment the two TFTP lines:

...
# ntalk is required for the 'talk' utility to work correctly
#ntalk	dgram	udp	wait	tty:tty	/usr/libexec/ntalkd	ntalkd
tftp	dgram	udp	wait	root	/usr/libexec/tftpd	tftpd -l -s /tftpboot
tftp	dgram	udp6	wait	root	/usr/libexec/tftpd	tftpd -l -s /tftpboot
#bootps	dgram	udp	wait	root	/usr/libexec/bootpd	bootpd
...

Gotta scroll down a little ways in the file to find them.

 

Start the service:

/etc/rc.d/inetd start

 

Check if it's running:

netstat -an | grep "*.69"

Should get the output:

udp6       0      0 *.69                   *.*                    
udp4       0      0 *.69                   *.*

If you don't get an output then something's wrong.

 

From here create the TFTP directory:

mkdir /tftpboot

 

Now just drop your files into the /tftpboot directory and have the DHCP server point your clients to the file:

...
subnet 10.3.0.0 netmask 255.255.255.0 {
    range 10.3.0.10 10.3.0.254;
#    option routers 10.3.0.1;
    next-server 10.3.0.2;
    filename "undionly.kpxe";
}
...

"next-server" is the TFTP server & "filename" is the updated iPXE image.

 

Now as you can see when you go to boot the server it chain loads the updated iPXE image from TFTP.

 

[Screenshot from 2021-10-01 18:02:47]

 

This does present a problem as is, though. After DHCP hands out the boot information, it has no idea it already served the updated file, so when the updated version of iPXE asks DHCP what to do next, DHCP just provides the same information again. iPXE then pulls iPXE from TFTP again, and it will do this again, and again, and again...

 

There are a couple of ways to break the loop. One of them is to modify the iPXE image so that it runs some embedded commands instead of defaulting to requesting information from DHCP, and that's what we're going to attempt now.

 

And that's exactly where the post I made over an hour ago comes in. 😆 Let's see if we can make this happen! iPXE's website outlines the process fairly extensively so I hope for this to be relatively painless.
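For reference, the other common way to break the loop (not the one used in this build) is to have DHCP answer differently once the client is already running iPXE, since iPXE identifies itself with the user-class "iPXE". The sketch below only writes an example dhcpd.conf fragment to a scratch file for review. The scratch file name and "boot.ipxe" are made up for illustration; boot.ipxe would be a plain-text script starting with #!ipxe sitting in /tftpboot.

```sh
#!/bin/sh
# Sketch only: writes an example dhcpd.conf fragment to a scratch file.
# "boot.ipxe" is a hypothetical script name; addresses are from the config above.
cat > ./dhcpd-chainload-example.conf <<'EOF'
subnet 10.3.0.0 netmask 255.255.255.0 {
    range 10.3.0.10 10.3.0.254;
    next-server 10.3.0.2;

    if exists user-class and option user-class = "iPXE" {
        # Second request: the client is already running iPXE, hand it a script.
        filename "boot.ipxe";
    } else {
        # First request: plain PXE ROM, hand it the iPXE binary.
        filename "undionly.kpxe";
    }
}
EOF
```

The trade-off is that the boot logic lives in DHCP and a script file rather than inside the image, so changing what a client boots doesn't require a rebuild.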


Holy lord it worked. 🤣

 

Start by downloading the latest version of iPXE from the Git repository:

git clone git://git.ipxe.org/ipxe.git

 

Now create a file with the list of commands you want iPXE to run on system boot starting with the line: #!ipxe

#!ipxe

dhcp
set initiator-iqn iqn.2021-9.boinc.com:lun3
sanboot iscsi:10.3.0.2::::iqn.2021-9.boinc.com:lun3

Save this file with whatever name you want.

 

Before we proceed I was missing liblzma-dev so:

sudo apt install liblzma-dev

 

Now run the commands:

cd ipxe/src
make bin/undionly.kpxe EMBED=/home/user/filename

Where user is your username and filename is your script for iPXE.

 

When it's done you can copy & rename undionly.kpxe if you like. From here we can move the file to the TFTP server and it works! 😁

 

[Screenshot from 2021-10-01 19:55:24]

 

With this we just gained two things:

  1. We can simplify our DHCP server config by removing all the iSCSI lines. Our custom iPXE images will handle what each system boots to.
  2. We should 🤞 have UEFI boot support now too. Haven't tested it yet though.

I'm going to finish creating the other custom images and clean up the DHCP config. Then we'll go from there.
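Since every client image is the same three-command script with a different LUN, the remaining images can be scripted. A rough sketch, with several assumptions that are not from this log: the ./ipxe-scripts output directory, an iPXE checkout at ~/ipxe/src, and the portal addresses for lun2 and lun4 (only 10.3.0.2 for lun3 is confirmed above). The build step is skipped if no checkout is found, so the script can be dry-run to inspect the generated files first.

```sh
#!/bin/sh
# Sketch: one embedded iPXE script (and optionally one image) per client LUN.
# Assumed, not from the log: ./ipxe-scripts, the ~/ipxe/src checkout path,
# and the portal IPs for lun2 and lun4.
OUT_DIR="./ipxe-scripts"
IPXE_SRC="${IPXE_SRC:-${HOME:-.}/ipxe/src}"
mkdir -p "$OUT_DIR"

# write_script LUN PORTAL_IP: emit the boot script for one client.
write_script() {
    cat > "$OUT_DIR/$1.ipxe" <<EOF
#!ipxe

dhcp
set initiator-iqn iqn.2021-9.boinc.com:$1
sanboot iscsi:$2::::iqn.2021-9.boinc.com:$1
EOF
}

while read -r lun portal; do
    write_script "$lun" "$portal"
    # Build only if an iPXE checkout is present; otherwise keep just the script.
    if [ -d "$IPXE_SRC" ]; then
        make -C "$IPXE_SRC" bin/undionly.kpxe EMBED="$(pwd)/$OUT_DIR/$lun.ipxe"
        cp "$IPXE_SRC/bin/undionly.kpxe" "$OUT_DIR/sanboot.$lun"
    fi
done <<'LIST'
lun2 10.2.0.2
lun3 10.3.0.2
lun4 10.4.0.2
LIST
```

The resulting sanboot.lunN files are what would get copied into /tftpboot on the FreeBSD server.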


All looks to be working well. I cleaned up the DHCP server config file quite a bit and re-arranged some things:

subnet 10.1.0.0 netmask 255.255.255.0 { range 10.1.0.10 10.1.0.254; next-server 10.1.0.1; }

subnet 10.2.0.0 netmask 255.255.255.0 { range 10.2.0.10 10.2.0.254; next-server 10.2.0.1; }

host boinc-node-5960x   { hardware ethernet 00:02:c9:57:1e:ae; fixed-address 10.2.0.11; filename "sanboot.lun2"; }

subnet 10.3.0.0 netmask 255.255.255.0 { range 10.3.0.10 10.3.0.254; next-server 10.3.0.1; }

host boinc-rig-2x2670v1 { hardware ethernet 00:02:c9:56:b4:14; fixed-address 10.3.0.11; filename "sanboot.lun3"; }

subnet 10.4.0.0 netmask 255.255.255.0 { range 10.4.0.10 10.4.0.254; next-server 10.4.0.1; }

host boinc-node-3930k   { hardware ethernet 00:02:c9:54:b6:44; fixed-address 10.4.0.11; filename "sanboot.lun4"; }

 

This actually provides a much cleaner look and makes more sense for my setup. Instead of declaring the filename globally, we now declare it within each host declaration. This lets us customize which file on the TFTP server DHCP points each client to. Each file is an iPXE image with an embedded script pointing that client to its respective iSCSI LUN.
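With one boot file per host, a typo in a filename line just shows up as a client that fails to chainload. A small sanity check can catch that before rebooting anything. This is only a sketch: check_boot_files is a hypothetical helper, and it naively matches any filename "..." on a line, including commented-out ones.

```sh
#!/bin/sh
# Sketch: report boot files named in a dhcpd.conf that are missing from the
# TFTP root. check_boot_files is a hypothetical helper, not part of dhcpd.
check_boot_files() {
    conf="$1"; tftp_root="$2"; missing=0
    # Pull every quoted name that follows the keyword "filename".
    for f in $(sed -n 's/.*filename "\([^"]*\)".*/\1/p' "$conf" | sort -u); do
        if [ ! -f "$tftp_root/$f" ]; then
            echo "missing: $tftp_root/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

On the server this would be called as check_boot_files /usr/local/etc/dhcpd.conf /tftpboot; it prints nothing and returns 0 when every referenced file exists.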

 

So now our FreeBSD server is hosting three services:

  • iSCSI
  • DHCP
  • TFTP

Which means we can install FreeBSD bare metal and should be able to UEFI boot systems. Next up, I just got a couple of NICs in and I'd like to see how they respond to this configuration, but I think I'll save that for tomorrow. Good night!


I am extremely happy to see that iPXE is a very universal utility when it comes to NICs. No changes besides MAC addresses were necessary in the DHCP config when testing the Mellanox ConnectX-3 CX311A & Intel X520-DA1; both were drop-in plug'n'play. Interestingly, though, the Intel X520 performed far better than both Mellanox NICs when it came to OS load time...

 

The only other thing left to do is test & validate UEFI boot support. Beyond that, we're ready to write a tutorial.


17 hours ago, Windows7ge said:

I am extremely happy to see that iPXE is a very universal utility when it comes to NICs. No changes besides MAC addresses were necessary in the DHCP config when testing the Mellanox ConnectX-3 CX311A & Intel X520-DA1; both were drop-in plug'n'play. Interestingly, though, the Intel X520 performed far better than both Mellanox NICs when it came to OS load time...

 

The only other thing left to do is test & validate UEFI boot support. Beyond that, we're ready to write a tutorial.

Does the Intel card support some kind of hardware offloading that the Mellanox cards don't?


30 minutes ago, RollinLower said:

Does the Intel card support some kind of hardware offloading that the Mellanox cards don't?

The only thing I can think of would possibly be RDMA (Remote Direct Memory Access), but it's also possible it's just better-supported, more mature firmware.

 

I opened the network monitor on FreeBSD and could see that with the Mellanox NICs, after iPXE initialization and after GRUB finished its job, the OS would load at single-digit MB/s, if not hundreds of KB/s, until close to final boot, where it would jump to over 100MB/s (possibly once the OS finished loading the NIC driver? ¯\_(ツ)_/¯)

 

Meanwhile, with the Intel X520, once iPXE connected to the iSCSI drive it would instantly jump to well over 100MB/s and the OS would just immediately start loading, i.e. no wait time.

 

At more than twice the price though of the CX311A on eBay I don't know how keen I'd be on buying more of them just for faster boot on a server.

 

For shits'n'giggles, though, I went ahead and updated the firmware on the Mellanox ConnectX-3 CX311A to the latest. It's a pretty straightforward process on Debian, but the firmware version number wasn't a huge jump so I don't think it will have changed anything drastically. Likely just bug fixes. I'll test what impact it had, if any, another weekend.

 

Something important to note, though: installing Ubuntu Desktop doesn't appear to be an easy process. You can install open-iscsi from the live media, but after you switch to the iSCSI target you lose all ability to talk to DNS. Internet access still exists, but when you try to install open-iscsi on the iSCSI drive you can't reach the repositories. I haven't found a workaround for this yet, but you can just install Ubuntu Server (comes with open-iscsi pre-installed) and then install a desktop, which is just as good IMO.

 

Will test UEFI boot today. I hope it works given most platforms today use it by default.


Hmm...well this isn't a good sign...

 

[Screenshot from 2021-10-03 14:09:21]

 

Installed Ubuntu Server via UEFI so it has a 512MiB EFI partition but even after chain-loading the updated iPXE file we're still stuck as though we don't have UEFI support...

 

It looks like I'm going to have to pop the CX311A in and see if that changes anything. If not might have to test the X520 as well. I wonder if it's not just iPXE that must support UEFI but the NIC itself.


Hmm, I knew my luck was going to run out. Questions just led to more questions...

 

So, I found out why UEFI wasn't working...sort of...

  1. I was using the wrong iPXE file. I needed to be using ipxe.efi, not undionly.kpxe.
  2. None of the three NICs I've tested want to work with ipxe.efi...

Every NIC complains either that the file is in an unknown format or that it doesn't fit in some sort of memory buffer.
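One thing that may be worth ruling out: ipxe.efi is an EFI executable, so it can only be started when the machine's firmware network-boots in UEFI mode, while a legacy PXE option ROM can only run undionly.kpxe. A common way to serve the right file to each kind of client is to switch on DHCP option 93 (client architecture). The sketch below writes an example fragment to a scratch file and has not been tested against this setup. "sanboot-efi.lun3" is a hypothetical name for an image built with make bin-x86_64-efi/ipxe.efi and the same EMBED script.

```sh
#!/bin/sh
# Sketch only: writes an example dhcpd.conf fragment to a scratch file.
# "sanboot-efi.lun3" is a hypothetical name for an ipxe.efi build with the
# same embedded script; the host details are copied from the config above.
cat > ./dhcpd-uefi-example.conf <<'EOF'
# DHCP option 93 carries the client's firmware architecture.
option arch code 93 = unsigned integer 16;

host boinc-rig-2x2670v1 {
    hardware ethernet 00:02:c9:56:b4:14;
    fixed-address 10.3.0.11;
    if option arch = 00:07 {
        # x86-64 UEFI firmware usually reports 00:07 (some report 00:09).
        filename "sanboot-efi.lun3";
    } else {
        # Everything else, including legacy BIOS PXE, gets the .kpxe build.
        filename "sanboot.lun3";
    }
}
EOF
```

If the client is still booting through the NIC's legacy ROM, it will land in the else branch and never request the EFI image at all.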

 

@RollinLower Some good news about the CX311A. With the latest firmware it actually outperforms the MNPA19-XTR by 10~20x.

 

MNPA19-XTR = Loads the OS at around 500KB/s~1MB/s.

CX311A = Loads the OS at around 10~12MB/s.

 

Big improvement. Same cost on eBay. I think I'm going to be changing out some NIC's. 😅

 

All of this does amount to one big downside. iPXE, at least to the best of what I could accomplish here, will only do BIOS/Legacy boot. Their website swears by supporting UEFI w/ ipxe.efi but I just cannot get it working.

 

I'm nearly out of time for this weekend so I think this is where I'm going to leave it. I'll start working on the tutorial next weekend. If any new information comes up, though, I'll re-investigate UEFI boot.
