
Help with RL using stable baselines 3 [Python]

ViniciusSilvestre

I'm trying to use SB3 for an RL project. The "issue" I'm facing is the following:

I have two GPUs in my system right now, an RTX 3070 and an RTX 2070 Super. The 3070 is running two instances of the env at the same time, while the 2070 is running one instance (a limitation of the way I'm doing a grid search for the optimal rewards).

During the episodes everything runs fine, but when the 2070's model reaches the rollout stage, all of its processing moves over to the 3070 and the 2070 sits at 0% usage. This happens even though I specified which device the model should run on.

The model is trying to learn how to play a game I developed. The way things are now it's still faster than running on just the 3070, because VRAM limits that card to two instances of the model in its 8 GB frame buffer, but I can't help feeling I'm leaving performance behind.

Any ideas what could be happening, and if so, how to fix it? I tried specifying the device more explicitly with:


import torch
from stable_baselines3 import PPO

model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_path, device='cuda:1')
with torch.cuda.device('cuda:1'):
    model.learn(total_timesteps=80000, callback=callback)

but it didn't help.

Attachments: A1100a.txt, Env.py, PlagueGame.py, Run Model 1.py, Run Model 2.py, Run Model 3.py


One dirty hack you could do is to use Docker and specify which GPU the container is allowed to use.
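
If a container feels like overkill: with the NVIDIA Container Toolkit the Docker route is roughly docker run --gpus '"device=1"' <your image>. You can get the same per-process isolation in plain Python by hiding the other GPU before torch gets imported. A rough sketch (env, log_path and callback being whatever you already build in your scripts):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # this process only sees the 2070, which now shows up as cuda:0

from stable_baselines3 import PPO

model = PPO('MultiInputPolicy', env, verbose=1, tensorboard_log=log_path, device='cuda:0')
model.learn(total_timesteps=80000, callback=callback)

With the other card hidden, nothing in that process can fall back onto the 3070 during the rollout/update phase.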

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga

