
Poll GitHub API instead of using webhooks #14

Open
pzehner opened this issue Jun 25, 2024 · 4 comments

Comments

@pzehner

pzehner commented Jun 25, 2024

This project sounds very interesting for the needs of my lab. Unfortunately, opening a port plus having a reverse-proxy running on a machine that submits SLURM jobs is not really an option for me. As an alternative to using a webhook, would it be possible to poll the GitHub API instead?

I dug in a little bit; it sounds like you get the workflows of a repository with:

https://api.github.com/repos/USER/NAME/actions/workflows

then you get the latest runs of a workflow with:

https://api.github.com/repos/USER/NAME/actions/workflows/WORKFLOW_ID/runs

filter out the queued runs, then you get the jobs of a run with:

https://api.github.com/repos/USER/NAME/actions/runs/RUN_ID/jobs

filter for the self-hosted jobs, and you have the labels of the runner to spawn.
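The three steps above could be sketched roughly like this in Rust (stdlib-only; `Run` and the filter are simplified stand-ins, not slurmactiond's actual types, and the HTTP/JSON handling is left to whatever GitHub API client the project already uses):

```rust
// Sketch of the three polling steps: URL builders for the GitHub REST
// endpoints, plus a filter over already-decoded run data. Networking and
// JSON decoding are deliberately omitted.

const API: &str = "https://api.github.com";

fn workflows_url(owner: &str, repo: &str) -> String {
    format!("{API}/repos/{owner}/{repo}/actions/workflows")
}

fn workflow_runs_url(owner: &str, repo: &str, workflow_id: u64) -> String {
    format!("{API}/repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs")
}

fn run_jobs_url(owner: &str, repo: &str, run_id: u64) -> String {
    format!("{API}/repos/{owner}/{repo}/actions/runs/{run_id}/jobs")
}

/// Minimal stand-in for a decoded workflow run.
struct Run {
    id: u64,
    status: String, // "queued" | "in_progress" | "completed"
}

/// Keep only the runs that are still waiting for a runner.
fn queued_run_ids(runs: &[Run]) -> Vec<u64> {
    runs.iter()
        .filter(|r| r.status == "queued")
        .map(|r| r.id)
        .collect()
}
```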

I'd love to contribute to this, but I'm not familiar with Rust.

@fknorr
Contributor

fknorr commented Jun 25, 2024

It would certainly be possible to poll GitHub periodically, but I would be concerned about running into API rate limits unless we allow rather high latency between a CI job triggering and slurmactiond launching a job.

Have you considered running the reverse proxy on a separate, publicly reachable server and forwarding a port to the SLURM headnode / slurmactiond host through your local network?

@pzehner
Author

pzehner commented Jun 25, 2024

> It would certainly be possible to poll GitHub periodically, but I would be concerned about running into API rate limits unless we allow rather high latency between a CI job triggering and slurmactiond launching a job.

I checked the documentation: unauthenticated requests are limited to 60 per hour, authenticated requests to 5000 per hour, and authenticated requests on Enterprise Cloud to 15000 per hour.

As each poll needs 3 requests, in the worst (unauthenticated) case you are limited to 20 polls per hour, i.e. one every 3 minutes, which sounds reasonable.
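The interval arithmetic can be expressed as a tiny helper (names and the integer-division rounding are my own choices, not from the project):

```rust
/// Shortest safe polling interval, in seconds, given an hourly request
/// budget and the number of API requests each poll issues. Integer
/// division rounds the number of polls down, so the result is conservative.
fn min_poll_interval_secs(requests_per_hour: u32, requests_per_poll: u32) -> u32 {
    let polls_per_hour = requests_per_hour / requests_per_poll;
    3600 / polls_per_hour
}
```

With the unauthenticated budget of 60 requests per hour and 3 requests per poll, this gives an interval of 180 seconds, matching the estimate above.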

> Have you considered running the reverse proxy on a separate, publicly reachable server and forwarding a port to the SLURM headnode / slurmactiond host through your local network?

I have to check this option, but I'm afraid it may not be possible in my case.

@fknorr
Contributor

fknorr commented Jun 25, 2024

I took a look at the code again this afternoon, and doing what you propose is unfortunately more complex than adding a timer loop around the update function (which I had hoped would suffice), because we also need to know about jobs completing or failing, not just new ones arriving. I do not have the capacity at the moment to make this happen.

Nonetheless, a polling feature would be neat even when a webhook is present, because webhook deliveries can be spuriously dropped due to a bad network connection. For such setups, a polling period of ~10 minutes or similar would suffice.

For reference: slurmactiond maintains a Scheduler state machine which keeps track of the active jobs and runners and triggers state transitions when a webhook event arrives to signal a job update. For a polling-based approach, we need to periodically poll each job (and trigger state updates where detected) and also query for new, unassigned jobs. There probably needs to be a separate module next to webhook.rs for these periodic tasks.
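One possible shape for the poll-side transition detection, as a rough stdlib-only sketch (`JobStatus` and the map-based bookkeeping are hypothetical stand-ins for the Scheduler's real state, not slurmactiond code):

```rust
use std::collections::HashMap;

/// Simplified stand-in for whatever job state the Scheduler tracks.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum JobStatus {
    Queued,
    InProgress,
    Completed,
}

/// Compare the last known job states against a freshly polled snapshot and
/// return the transitions the Scheduler would need to be told about:
/// newly observed jobs and jobs whose status changed since the last poll.
fn detect_transitions(
    known: &HashMap<u64, JobStatus>,
    polled: &HashMap<u64, JobStatus>,
) -> Vec<(u64, JobStatus)> {
    let mut transitions = Vec::new();
    for (&id, &status) in polled {
        if known.get(&id) != Some(&status) {
            transitions.push((id, status));
        }
    }
    // HashMap iteration order is unspecified; sort for deterministic output.
    transitions.sort_by_key(|&(id, _)| id);
    transitions
}
```

A periodic task could then feed each detected transition into the same state-transition path the webhook handler uses today, so both event sources converge on one Scheduler.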

@pzehner
Author

pzehner commented Jun 26, 2024

Extending an event-driven design with a polled-state one is indeed not trivial.

I guess your current Scheduler could be re-used. As I said, I'd propose a PR for this if it were in Python or C++…
