
Runners on EKS with an EFS volume in K8s-mode can't start a job pod. #3885

Open
4 tasks done
sierrasoleil opened this issue Jan 14, 2025 · 1 comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers


Controller Version

0.10.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Deploy a kubernetes-mode runner in EKS using CONTAINER_HOOKS, with EFS as an RWX storage volume.
2. Run a job that uses only the runner container; everything works fine.
3. Run the same job with the `container:` key added to the workflow; the runner pod never gets past "Pending".

Describe the bug

First and foremost, has anyone successfully used EFS for the `_work` volume in kubernetes-mode runners? I can't find any examples, so maybe this approach is simply wrong. I don't know of any other readily available CSI driver for EKS that supports RWX, which I believe is required for k8s-mode.
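For context, the StorageClass behind `gh-efs-sc` is assumed to look roughly like the sketch below, using dynamic provisioning via the AWS EFS CSI driver. The `fileSystemId` is a placeholder, not my real value:

```yaml
# Sketch of an EFS StorageClass for dynamic provisioning with the
# AWS EFS CSI driver. fileSystemId is a placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gh-efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap          # provision via EFS access points
  fileSystemId: fs-0123456789abcdef0  # placeholder
  directoryPerms: "700"
```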

All runners, successful or not, show a few error events while waiting for the EFS volume to become available.

Warning  FailedScheduling  33s   default-scheduler  0/8 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "arc-amd-8jt4l-runner-kll9r-work". preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.

I assume EFS is just slow to provision, but I don't understand why that would prevent the runner from starting at all.

Describe the expected behavior

I expected a runner with the `container:` key to create a job pod using that container.

Additional Context

My Runner definition:

githubConfigSecret: github-auth
githubConfigUrl: <url>

controllerServiceAccount:
  namespace: gh-controller
  name: github-arc

# containerMode:
#   kubernetesModeWorkVolumeClaim:
#     accessModes: ["ReadWriteOnce"]

template:
  spec:
    nodeSelector:
      beta.kubernetes.io/arch: amd64
    serviceAccountName: github-runner
    # securityContext:
    #   fsGroup: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        #image: 823996030995.dkr.ecr.us-west-2.amazonaws.com/github-runner-robust:amd64
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/config/hook-extension.yaml
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "false"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: hook-extension
            mountPath: /home/runner/config/hook-extension.yaml
            subPath: hook-extension.yaml
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteMany"]
              storageClassName: "gh-efs-sc"
              resources:
                requests:
                  storage: 10Gi
      - name: hook-extension
        configMap:
          name: hook-extension
          items:
            - key: content
              path: hook-extension.yaml
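For comparison, the chart's built-in way to request the same RWX work volume (instead of the explicit `template.spec.volumes` entry above) would be something like the following sketch; I haven't confirmed whether it behaves any differently:

```yaml
# Chart-managed equivalent of the ephemeral "work" volume above (sketch).
containerMode:
  type: kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]
    storageClassName: gh-efs-sc
    resources:
      requests:
        storage: 10Gi
```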


The hook extension only adds a serviceAccountName to the worker pod:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension
data:
  content: |
    spec:
      serviceAccountName: github-runner
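If the job pod also needs write access to the EFS-backed volume, my understanding of the hook template format is that the same extension file could carry a securityContext as well. This is an untested sketch, reusing the `fsGroup: 1001` value from the commented-out section of my values file:

```yaml
# Hypothetical extended hook template (untested): also sets fsGroup
# so the job pod can write to the EFS-backed _work volume.
spec:
  serviceAccountName: github-runner
  securityContext:
    fsGroup: 1001
```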


The following job will work:

name: Actions Runner Controller
on:
  workflow_dispatch:
jobs:
  Base-Runner:
    runs-on: arc-amd
    #container:
    #  image: alpine:latest
    steps:
      - run: echo "hooray!"


However, if I uncomment `container:` and `image:`, the runner pod gets stuck at `Pending` and never even creates the job pod.
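For clarity, the failing variant is the same workflow with those two lines uncommented:

```yaml
name: Actions Runner Controller
on:
  workflow_dispatch:
jobs:
  Base-Runner:
    runs-on: arc-amd
    container:
      image: alpine:latest   # with this present, the runner pod stays Pending
    steps:
      - run: echo "hooray!"
```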

It's also worth noting the commented-out `fsGroup:` key: that setting previously got the runner working, but after some CSI driver updates it became a problem.
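The previously working arrangement was just the runner pod's own securityContext, now commented out in the values above:

```yaml
# What used to make EFS writable for the runner (now commented out).
template:
  spec:
    securityContext:
      fsGroup: 1001
```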

Controller Logs

The controller logs just show this every minute or so while the pod is pending:

2025-01-14T20:52:05Z    INFO    EphemeralRunnerSet      Ephemeral runner counts {"version": "0.10.1", "ephemeralrunnerset": {"name":"arc-amd-8jt4l","namespace":"gh-runners"}, "pending": 1, "running": 0, "finished": 0, "failed": 0, "deleting": 0}
2025-01-14T20:52:05Z    INFO    EphemeralRunnerSet      Scaling comparison      {"version": "0.10.1", "ephemeralrunnerset": {"name":"arc-amd-8jt4l","namespace":"gh-runners"}, "current": 1, "desired": 1}
2025-01-14T20:52:05Z    INFO    AutoscalingRunnerSet    Find existing ephemeral runner set      {"version": "0.10.1", "autoscalingrunnerset": {"name":"arc-amd","namespace":"gh-runners"}, "name": "arc-amd-8jt4l", "specHash": "76b6bcbfbb"}

Runner Pod Logs

The runner pod never reaches a point where it can produce logs.
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@sierrasoleil sierrasoleil changed the title <Please write what didn't work for you here> Runners on EKS with an EFS volume in K8s-mode can't start a job pod. Jan 14, 2025