
TUTORIAL FREE 1.7MPPD+ (F@H, and BOINC) GCloud GPU (Preemptible instances) FOR A MONTH

Introduction

This is an extension of the existing guide written by @Metallus97 on using cloud GPUs to run folding applications. I will refer to some steps in that guide, so it is probably a good idea to have it open in another tab. That guide shows you how to set up Folding at Home on a normal Google Cloud instance for 1.7MPPD+ for free for almost two weeks. In this guide I will show you how to set up Folding at Home on "preemptible" Google Cloud instances, which cost a little under half as much with little to no PPD penalty. That gives us almost a month of GPU time at around 1.7MPPD for free! The catch is that these instances last at most 24 hours before they get shut down. Depending on the time of day, I have had instances stay up for anywhere between 1 and 10 hours, so it takes some extra setup to keep folding almost uninterrupted on this instance type. (Don't worry about that though, I have made a Terraform project to do almost ALL of the setup for you!)

 

Let's get started!

1. Pick where you want to run your folding VMs. Using this calculator, change the machine class to "Preemptible" and "Machine type" to "n1-standard-2" or whichever type has at least one core per GPU that you want to have. Add the GPUs that you want and click "Add to estimate". Different regions have different costs, so you can click "edit" to test which regions give you the best bang for your buck and/or are near you. I have chosen what I found to be the cheapest region, so if you just want to get started you can ignore this step, and steps 8 and 9.


1242422762_Screenshot_2020-04-11GoogleCloudPlatformPricingCalculator.png.e25db854b2e4eaf17024e661fa7a2001.png
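To see how the hourly price from the calculator turns into folding time, the arithmetic is just credit divided by cost per day. A rough sketch: the function name is made up, and the credit amount and hourly prices below are placeholder assumptions, not quotes from the calculator, so plug in the numbers you get for your region.

```shell
# Days of runtime that a given credit buys at a given hourly price.
# Usage: days_of_credit <credit> <hourly_cost>
days_of_credit() {
    awk -v credit="$1" -v hourly="$2" \
        'BEGIN { printf "%.1f\n", credit / (hourly * 24) }'
}

# Placeholder numbers only (assumed, not real prices):
days_of_credit 300 0.90   # regular instance     -> 13.9
days_of_credit 300 0.43   # preemptible instance -> 29.1
```

With a price a little under half, the same credit lasts a little over twice as long, which is where the "almost two weeks" versus "almost a month" difference comes from.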

 

2. Next, follow steps 1.1-1.5 of @Metallus97's guide.

3. Then download and install Terraform, as that is what we will be using to automatically set up almost everything you need!

4. Download and unzip this file.
5. Open `terraform.tfvars` in a text editor.

6. In the `terraform.tfvars` file you should:

    - Fill in `remote_access_ip` and `remote_access_password` if you want to access BOINC or F@H remotely. (If you leave the password blank, Terraform will generate a password for you.)

    - Fill in your F@H username, passkey, and team if you want to use Folding at Home

    - Choose how many of each instance type you want to be running (F@H exclusive, BOINC exclusive, and both F@H and BOINC)

7. If you want, you can also set the machine type, GPU type and GPU count in `terraform.tfvars` to what you chose before, or leave the defaults: a single instance with one Tesla P100, which is the best $/PPD/day configuration for F@H. (More GPUs, as long as you keep 1 core per GPU, will yield more PPD at the same bang for buck, but give you less time. The default config gives you almost a month!)

 

You can skip steps 8 and 9 if you don't want to do the work of choosing regions yourself; the project is configured to use what I found to be the cheapest regions by default.


8. Choose the regions to set `function_region` and `app_region` to using these links. Ideally, choose whatever is closest to where you chose to run your VMs.

9. Set `default_region` and `default_zone` to the region and zone you chose to run the VMs in.

10. Open your project dashboard and set `project` in `terraform.tfvars` to the "Project ID" shown on your dashboard, e.g. `project="my_project_id"`.


1130363927_Screenshot_2020-04-11HomeMyFirstProjectGoogleCloudPlatform.png.02eecb47ef6ab5c07c31171ccd3aa8a7.png
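For reference, here is roughly what the edited part of `terraform.tfvars` might look like after steps 6 to 10. This is only a sketch that prints an example to the terminal: every value is a placeholder, I am assuming `remote_access_ip` takes a single address as a string, and the F@H username, passkey, team, instance counts and machine settings are left out because their variable names are not spelled out in this guide (check the real file for those).

```shell
# Print an example of the tfvars settings named in this guide.
example_tfvars() {
    cat <<'EOF'
# Placeholder values; replace them with your own.
project                = "my_project_id"  # "Project ID" from the dashboard
remote_access_ip       = "203.0.113.10"   # placeholder: IP you connect from
remote_access_password = ""               # blank: Terraform generates one
app_region             = "us-west2"       # the default mentioned in step 11
EOF
}

example_tfvars
```

Compare the output against your own file rather than copying it over the top, since the real file contains more settings than this.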

11. Go to the functions and scheduler web pages and wait for them to initialise. When asked to set a region for the scheduler, pick what you chose for the app engine (or if you are going with the defaults, select "us-west2").

12. Go to the service accounts page, add a service account for Terraform, and give it these roles:

  • Cloud Functions Developer
  • Cloud Scheduler Admin
  • Compute Admin
  • Service Account User
  • Pub/Sub Editor
  • Storage Admin

Then create and download the account key.


Creating the service account:

1780949417_Screenshot_2020-04-11ServiceaccountsIAMAdminMyFirstProjectGoogleCloudPlatform.thumb.png.932eaa95c3f05391171b288408e515a2.png

 

Giving the service account a name:

1424138405_Screenshot_2020-04-11CreateserviceaccoIAMAdminMyFirstProjectGoogleCloudPlatform.thumb.png.039e6c7a828e2f5bbf01d80878d0b3aa.png

 

Assigning roles to the service account:

817196157_Screenshot_2020-04-11CreateserviceaccoIAMAdminMyFirstProjectGoogleCloudPlatform(2).thumb.png.e15df6c21d074ca6bdec687988089fab.png

 

200231116_Screenshot_2020-04-11CreateserviceaccoIAMAdminMyFirstProjectGoogleCloudPlatform(1).thumb.png.1463ee23b59f52d84b90587c3154b202.png

Creating and downloading the account key:

142492354_Screenshot_2020-04-11CreateserviceaccoIAMAdminMyFirstProjectGoogleCloudPlatform(3).thumb.png.dc7c950bff3607828069901919d03d2a.png

 

1931846267_Screenshot_2020-04-11CreateserviceaccoIAMAdminMyFirstProjectGoogleCloudPlatform(4).png.95258148d53c21a648bdce4ca9fbc59b.png
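If you prefer the command line over clicking through the console, step 12 can probably be done with the `gcloud` CLI instead. A sketch, assuming `gcloud` is installed and logged in; the function names and the account name `terraform` are my own picks, and the role IDs are the ones that correspond to the six role names listed above.

```shell
# Role IDs for the six roles listed in step 12.
terraform_roles() {
    cat <<'EOF'
roles/cloudfunctions.developer
roles/cloudscheduler.admin
roles/compute.admin
roles/iam.serviceAccountUser
roles/pubsub.editor
roles/storage.admin
EOF
}

# Create the service account, grant the roles, and download a key.
# Usage: create_terraform_sa <project-id> [account-name] [key-file]
create_terraform_sa() {
    project="$1"
    name="${2:-terraform}"               # hypothetical account name
    key_file="${3:-credentials.json}"
    email="${name}@${project}.iam.gserviceaccount.com"

    gcloud iam service-accounts create "$name" --project "$project" || return 1
    for role in $(terraform_roles); do
        gcloud projects add-iam-policy-binding "$project" \
            --member "serviceAccount:${email}" --role "$role" || return 1
    done
    gcloud iam service-accounts keys create "$key_file" --iam-account "$email"
}
```

Calling `create_terraform_sa my_project_id` would leave a `credentials.json` in the current directory, ready for the next step.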

13. Rename the downloaded .json file to "credentials.json" and move it into the `resources/` folder in the unzipped project folder.
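In a terminal, the rename and move from step 13 could look something like this. The helper name is made up for illustration; the `chmod` is an extra precaution of mine, since the key carries admin roles.

```shell
# Move a downloaded key into <project-dir>/resources/credentials.json.
# Usage: install_credentials <downloaded-key.json> <project-dir>
install_credentials() {
    key="$1"
    dest="$2/resources/credentials.json"
    [ -f "$key" ] || { echo "key file not found: $key" >&2; return 1; }
    mkdir -p "$2/resources"
    mv "$key" "$dest"
    chmod 600 "$dest"   # readable by you only
    echo "$dest"        # print where the key ended up
}
```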

14. Open the project folder in a terminal, and run:

`terraform init`

$ terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "archive" (hashicorp/archive) 1.3.0...
- Downloading plugin for provider "random" (hashicorp/random) 2.2.1...
- Downloading plugin for provider "google" (hashicorp/google) 3.16.0...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Then run `terraform apply`:

$ terraform apply
data.archive_file.start_vm_function_source: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

......

Plan: 10 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: 

and enter "yes".
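If you would rather script step 14, one option is to save the plan to a file and apply exactly that plan, which skips the "yes" prompt. A sketch only: the function name is hypothetical, and the check for `resources/credentials.json` is just there to catch a skipped step 13 early.

```shell
# Run init, save a plan, then apply exactly that saved plan.
# Usage: deploy_folding [project-dir]
deploy_folding() {
    dir="${1:-.}"
    [ -f "$dir/resources/credentials.json" ] || {
        echo "missing $dir/resources/credentials.json (see step 13)" >&2
        return 1
    }
    (
        cd "$dir" || exit 1
        terraform init &&
        terraform plan -out=tfplan &&   # writes the plan to the file "tfplan"
        terraform apply tfplan          # a saved plan is applied without a prompt
    )
}
```

Note that this does not pause between plan and apply. If you want to read the plan first, run the `terraform plan -out=tfplan` and `terraform apply tfplan` commands by hand instead.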

 

15. If you configured an access IP and password, add your instance as a remote client (using port 36331 for F@H) and wait for it to come up. You might need to wait a while for the instance to finish installing the NVIDIA drivers and the F@H and/or BOINC client. You can keep tabs on the installation progress by running:

tail -n 100 -f /var/log/cloud-init-output.log

on the instance over SSH (see how below).

 

 

Notes

You can SSH (remote terminal) into the instance by clicking the SSH button for the instance on this page:


177671793_Screenshot_2020-04-11ComputeEngine-MyFirstProject-GoogleCloudPlatform.thumb.png.8ee0177c73d45b4e051734980e14d379.png
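If you have the `gcloud` CLI installed locally, you should also be able to follow the install log without opening the browser terminal. A sketch; the function name is made up, and the instance name and zone are whatever shows up for your VM on the page above.

```shell
# Follow the cloud-init log of an instance from your own terminal.
# Usage: follow_install_log <instance-name> <zone>
follow_install_log() {
    gcloud compute ssh "$1" --zone "$2" \
        --command 'tail -n 100 -f /var/log/cloud-init-output.log'
}
```

Press Ctrl+C to stop following the log.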

 

When you are finished (probably when you have run out of free credit), open the project folder in a terminal again and run:

terraform destroy

 

EDIT:

I have been dealing with strange FAHCore interruptions (code 102) on GPU work units on my GCE instances. After looking through the system logs and resource usage, it turns out that F@H was chewing up all available memory and then getting OOM killed. If you are dealing with this, try changing the machine type to `n1-standard-2`. Once I have seen that it is stable, I may change the default machine type to this even though it costs a bit more (the difference is negligible compared to the GPU costs anyway).
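To check whether you are hitting the same thing, you can look for OOM-killer messages in the kernel log on the instance. A small sketch, assuming the kernel logs these with the usual "Out of memory" wording; the helper name is made up.

```shell
# Count OOM-killer events in kernel log text read from stdin.
count_oom_kills() {
    grep -ci 'out of memory' || true   # grep exits non-zero on zero matches
}

# On the instance (over SSH), something like:
#   sudo dmesg | count_oom_kills
#   free -m    # current memory usage in MiB
```

Anything above 0 while the FAHCore keeps getting interrupted is a good hint that the machine type is too small.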

 

EDIT2:

I should also note that this setup doesn't do anything special for BOINC to handle the instances starting and stopping. I couldn't find much online, and what I did find suggests that BOINC already handles this fine. Let me know if you have any problems or know otherwise.

Edited by WhyKickAmooCow
Update to include that now the Terraform project can provision F@H, and BOINC instances.

Once again and in a public post: THANKS BRO! That essentially gives double the free points and with your scripts isn't hard to do!

FOLDING MONTH 2021! GOGOGO and save on some heating costs 🙂

 


Thanks for the tutorial but I get the following error after I apply the terraform config:

 

Error: googleapi: Error 409: Sorry, that name is not available. Please try a different one., conflict

  on main.tf line 153, in resource "google_storage_bucket" "bucket":
 153: resource "google_storage_bucket" "bucket" {

I did make the VM. Should I kill it, or will it work fine? I think it happened because I ran the apply script twice, and it failed the first time because I ran it before my GPU quota was accepted.


@Menzo

That should be fine. As long as you can see here that there are a bunch of "start" messages then it means that it was created successfully. Running `terraform refresh` and then `terraform apply` may fix the issue though.

 

EDIT:

And if that doesn't work, try `terraform import google_storage_bucket.bucket folding-bucket`.


On 4/11/2020 at 2:01 PM, WhyKickAmooCow said:

No problem. The least I can do while in lockdown is my part to help it end sooner!

Please remember to quote or mention with @starttypingtheusername, because I did not get a notification about this.


 


@WhyKickAmooCow

I did not see lots of start messages and when my VM was preempted and I restarted it manually all progress was lost. So I'm now running 'terraform destroy' and starting again.


Thanks for the guide. I had the same error message during installation but it doesn't seem to have caused any issues and the instances are running properly.


I think I got it all fixed now.

 

When I change the name of the bucket from 'folding-bucket' to 'folding-buckets' in main.tf and do a 'terraform refresh' and 'terraform apply' it works with no errors.

 

I also had to add a timezone to the scheduler to make it work.

 

Thanks for the tutorial and help.


@Menzo

It is strange that happened in the first place. That probably means that Terraform created the bucket but didn't save the state. Did you happen to kill Terraform while it was still running?

 

And yes, I can see that a timezone is required for scheduler, which does make it strange that mine works fine without a timezone set (even though it shows as mandatory). I wonder if it is just being forced to use the timezone that I used for a different scheduler. I will add a timezone to the scheduler created by Terraform so that should help.

 

EDIT:

I can see that it was creating the scheduler with a default timezone of Etc/UTC, so I wonder why it wasn't working in your case. Either way, there is now a variable to set the scheduler timezone.


  • 3 weeks later...

Ah, for anyone else having the issue with not being able to create buckets due to a naming conflict, I think I found the problem. I am pretty sure that it was my bad: bucket names have to be globally unique (they can't conflict with the name of anyone else's bucket either). I will address this soon, along with an update to provision a BOINC instance, so that should be pretty neat.
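Since bucket names are global, one way to dodge the conflict in the meantime is to derive the name from something that is already unique, such as your project ID. A small sketch; the helper name is hypothetical and this is not necessarily how the Terraform project will end up handling it.

```shell
# Build a bucket name that is unlikely to collide by appending the project ID.
# Usage: unique_bucket_name <project-id> [prefix]
unique_bucket_name() {
    prefix="${2:-folding-bucket}"
    # Bucket names must be lowercase and at most 63 characters long.
    printf '%s-%s\n' "$prefix" "$1" | tr '[:upper:]' '[:lower:]' | cut -c1-63
}

unique_bucket_name my-project-123456   # -> folding-bucket-my-project-123456
```

You would then use the printed name in place of `folding-bucket` in `main.tf`.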
