
use dedicated GCP runners #184

Closed
proppy opened this issue Feb 24, 2022 · 18 comments
Labels
enhancement New feature or request

Comments

proppy (Contributor) commented Feb 24, 2022

We should consider using https://antmicro.com/blog/2021/08/open-source-github-actions-runners-with-gcp-and-terraform/ to get quicker feedback on the CI jobs.
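For reference, the approach in that blog post provisions self-hosted runner VMs on GCP with Terraform, and workflow jobs then opt into them through runner labels. A minimal sketch of the workflow side, assuming the runners register with a hypothetical gcp label:

```yaml
# Sketch only: the label names depend on how the self-hosted runners
# are registered; "gcp" here is a hypothetical example label.
name: ci
on: [push]
jobs:
  build:
    runs-on: [self-hosted, Linux, X64, gcp]
    steps:
      - uses: actions/checkout@v3
      - run: make build  # placeholder for the actual CI job
```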

kgugala commented Feb 24, 2022

This is, of course, doable. We'd just need to set up the runners in a GCP project. Do we have a project dedicated to the HDL GH org?

proppy (Contributor, Author) commented Feb 24, 2022

umarcor (Member) commented Feb 26, 2022

Actually, we do want to use "self-hosted" runners in multiple repos of this org (incl. containers, conda-* and maybe packages). See hdl/containers#51.

There is an 'hdl-containers' project in GCP, which is used for the container registry (gcr.io/hdl-containers). I do have management access there, using my personal gmail account. That allows me to e.g. update the tokens, which expire every 1-3 months. See https://hdl.github.io/containers/dev/Tasks.html#credentials.

However, I'm unsure whether 'hdl-containers' and 'github-hdl' are the same project in GCP. Specifically, I don't have permission to access https://console.cloud.google.com/?project=github-hdl; I'm missing resourcemanager.projects.get.

At the end of September 2021, I talked to @mithro and @QuantamHD about this (through e-mail). Ethan said it was problematic to add either my university account or my personal gmail account for these purposes. Since I do have an antmicro account now, maybe we can reconsider.

umarcor added the enhancement label on Feb 26, 2022
mithro (Member) commented Feb 27, 2022

Looks like both the hdl-conda and github-hdl Google Cloud Platform projects were set up in the past to host resources for this organization.

I believe the plan was that the github-hdl project would not be allowed to have publicly accessible resources (like Google Cloud Storage buckets), while hdl-conda (and hdl-containers) would be where results are published and would thus be significantly more constrained.

I thus think it makes the most sense to deploy the GitHub runners under github-hdl for this organization. I'll make sure that @umarcor and @kgugala have access to the organization through their Antmicro accounts.

mithro (Member) commented Feb 27, 2022

@umarcor / @kgugala - You should have pending invites to manage the github-hdl organization.

proppy (Contributor, Author) commented Dec 13, 2022

@PiotrZierhoffer @ajelinski @kgugala how can I help set up a dedicated runner for conda-eda? Per #210 (comment), this will become necessary for bigger packages like Xyce.

kgugala commented Dec 13, 2022

@proppy We can set up the infrastructure. Do you know what machines will be needed?

proppy (Contributor, Author) commented Dec 13, 2022

There are currently 64 package building jobs being run concurrently on every commit.

The longest job (XLS) takes around 2 hours and dominates the total build time.

Assuming you can't have multiple runners running on a single node, what about starting with 4x n2-standard-32?

kgugala commented Dec 13, 2022

We can choose the machine type per job in CI (you can configure this in the YAML file). We just need a list of the machine types we want to use.
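As an illustration of that per-job selection, assuming each runner advertises its GCE machine type as a runner label (the labels and the build script below are hypothetical):

```yaml
jobs:
  build-small:
    # Default-sized runner, comparable to the GitHub-hosted workers.
    runs-on: [self-hosted, n2-standard-2]
    steps:
      - run: ./build.sh some-small-package
  build-xls:
    # Larger machine for the ~2h XLS build mentioned above.
    runs-on: [self-hosted, n2-standard-32]
    steps:
      - run: ./build.sh xls
```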

proppy (Contributor, Author) commented Dec 13, 2022

> you can configure this in the yaml file

per job or per workflow?

mithro (Member) commented Dec 13, 2022

per job I think?

proppy (Contributor, Author) commented Dec 14, 2022

> We just need to have a list of machine types we want to use

  • n2-standard-2 # same as the GitHub-hosted workers
  • n2-standard-32 # for larger builds (xls, xyce); see the sketch below
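If the package builds run from a single matrix job, one way to route each package to one of those two machine types is an explicit runner field in each matrix entry (a sketch; the labels, packages, and build script are illustrative):

```yaml
jobs:
  build:
    strategy:
      matrix:
        include:
          - { package: iverilog, runner: n2-standard-2 }
          - { package: xls,      runner: n2-standard-32 }
          - { package: xyce,     runner: n2-standard-32 }
    # runs-on accepts expressions, so each matrix entry picks its runner label.
    runs-on: ["self-hosted", "${{ matrix.runner }}"]
    steps:
      - run: ./build.sh ${{ matrix.package }}
```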

umarcor (Member) commented Dec 14, 2022

@proppy what's the issue with building Xyce on default runners? We build it in hdl/containers and it takes less than 2h, which is far below the limit. Are you cross-compiling it for architectures other than x64?

proppy (Contributor, Author) commented Dec 21, 2022

@umarcor this was based on a conversation with @cbalint13 here: #210 (comment)

cbalint13 commented

> @umarcor this was based on a conversation with @cbalint13 here: #210 (comment)

@umarcor, @proppy

Trilinos takes ~15h in total (Release, no debug) for all 3 complete builds {native, openmpi, mpich}.

Xyce takes ~1h in total (Release, no debug) for all 3 complete builds {native, openmpi, mpich}.

The automated builds enable all possible flags/features, except CUDA for now (coming soon).
All builders {aarch64, x86-64, ppc64le} are native, with no cross-compilation; RAM + rootramfs is ~150 GB.

proppy (Contributor, Author) commented Jan 17, 2023

@AdamOlech configured the custom runner and posted a PoC here:

@ajelinski @PiotrZierhoffer should we migrate all the jobs at once, or incrementally (starting with the ones that currently fail because of limited resources, #263 #238)?

PiotrZierhoffer (Contributor) commented

We are working on moving the whole workflow. It should be easier than dividing the current one, as dependencies should be generally the same for everything.

proppy (Contributor, Author) commented Feb 24, 2023

Thanks for doing this!

@PiotrZierhoffer @ajelinski @AdamOlech

proppy closed this as completed on Feb 24, 2023