GCP - GPU staging time reduction

Asking here because nobody seems to be able to figure this out.


I have an application that requires the smallest possible boot time/TTL for VMs with GPUs attached in GCP Compute Engine. To keep costs down, my infrastructure depends on starting and stopping dedicated instances as demand increases and decreases. (I would switch to containers, but the GPU costs are too high.)

I have achieved sub-5-second start times with custom images without GPUs, but as soon as I attach a GPU, the time to "RUNNING" always exceeds 20-30s.

I have tried multiple distros: Clear Linux, prepackaged Nvidia driver images, minimal installs of Fedora, and a minimised Debian, as well as reductions to the kernel and userspace. systemd-analyze reports a boot time of 3s, but when a GPU is attached the VM spends 20-30s in "STAGING" before it reaches "RUNNING".

This only occurs when a GPU is attached to the VM. When the GPU is removed, the VM starts within the time reported by systemd-analyze. The behaviour is consistent across all distros and boot images.
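
To separate guest boot time from platform staging time, the time spent in each instance status can be computed from polled status samples. The following is a minimal sketch, assuming the status was polled and recorded as (timestamp, status) pairs; the function name `time_per_status` is hypothetical and the sample timestamps are illustrative placeholders, not real measurements.

```python
from datetime import datetime


def time_per_status(samples):
    """Return seconds spent in each status.

    samples: list of (iso_timestamp, status) tuples ordered by time,
    one entry per observed status change. The final status has no
    following sample, so no duration is reported for it.
    """
    durations = {}
    for (t0, status), (t1, _next_status) in zip(samples, samples[1:]):
        delta = datetime.fromisoformat(t1) - datetime.fromisoformat(t0)
        durations[status] = durations.get(status, 0.0) + delta.total_seconds()
    return durations


# Illustrative placeholder samples (assumed values, not measured data).
samples = [
    ("2024-01-01T12:00:00", "PROVISIONING"),
    ("2024-01-01T12:00:02", "STAGING"),
    ("2024-01-01T12:00:25", "RUNNING"),
]

for status, seconds in time_per_status(samples).items():
    print(f"{status}: {seconds:.1f}s")
```

Comparing the "STAGING" duration with and without a GPU attached, next to the systemd-analyze figure, shows how much of the delay happens before the guest OS starts booting.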

Are there any packages or documentation I am missing that would reduce this staging time with a GPU attached, or is this a limitation of GCP's internal staging of GPU instances?

I'd very much appreciate any help or advice.

