Commit 4a6c3fa

add argo integration based on pod integration (#3897)

* add argo integration based on pod integration
* address pr comments
* external workload based support
* address feedback

1 parent a8c759f commit 4a6c3fa

9 files changed: +163 -11 lines changed

site/content/en/docs/tasks/_index.md

+3 -2
@@ -35,16 +35,16 @@ batch user is a researcher, AI/ML engineer, data scientist, among others.
 
 As a batch user, you can learn how to:
 - [Run a Kueue managed batch/Job](run/jobs).
-- [Run a Kueue managed Flux MiniCluster](run/flux_miniclusters).
 - [Run a Kueue managed Kubeflow Job](run/kubeflow).
 Kueue supports MPIJob v2beta1, PyTorchJob, TFJob, XGBoostJob and PaddleJob.
 - [Run a Kueue managed KubeRay RayJob](run/rayjobs).
 - [Run a Kueue managed KubeRay RayCluster](run/rayclusters).
-- [Run a Kueue managed AppWrapper](run/appwrappers).
 - [Submit Kueue jobs from Python](run/python_jobs).
 - [Run a Kueue managed plain Pod](run/plain_pods).
 - [Run a Kueue managed JobSet](run/jobsets).
 - [Submit jobs to MultiKueue](run/multikueue).
+- [Run external workloads](run/external_workloads).
+Kueue allows one to use built-in integrations (such as Pods or Jobs) to run external workloads.
 
 ### Serving user
 
@@ -61,6 +61,7 @@ A _platform developer_ integrates Kueue with other software and/or contributes t
 
 As a platform developer, you can learn how to:
 - [Integrate a custom Job with Kueue](dev/integrate_a_custom_job).
+- [Integrate a custom workload with Kueue using built-in frameworks](dev/external_frameworks).
 - [Enable pprof endpoints](dev/enabling_pprof_endpoints).
 - [Develop a custom AdmissionCheck Controller](dev/develop-acc).
 

@@ -0,0 +1,10 @@
+---
+title: "External Frameworks"
+weight: 8
+date: 2025-01-17
+description: >
+  How to run Kueue with external frameworks
+---
+
+See [external frameworks](/docs/tasks/run/external_workloads) for examples of using existing
+integrations to integrate external frameworks.

@@ -0,0 +1,19 @@
+---
+
+title: "Supporting External Frameworks"
+linkTitle: "External Frameworks"
+weight: 9
+date: 2025-01-23
+description: >
+  How to run Kueue with external frameworks
+---
+
+The tasks below show you how to build a custom integration.
+You can use AppWrapper, job-based workloads and pod-based workloads.
+
+### [AppWrapper](https://project-codeflare.github.io/appwrapper/) Integration
+- [Run a custom workload using AppWrappers](/docs/tasks/run/external_workloads/wrapped_custom_workload).
+
+### Integrations based on built-in frameworks
+- [Run a Flux MiniCluster using job integration](/docs/tasks/run/external_workloads/flux_miniclusters).
+- [Run an Argo Workflow using pod integration](/docs/tasks/run/external_workloads/pod_based_workloads/argo_workflow).

@@ -0,0 +1,51 @@
+---
+title: "Run An Argo Workflow"
+date: 2025-01-23
+weight: 3
+description: >
+  Integrate Kueue with Argo Workflows.
+---
+
+This page shows how to leverage Kueue's scheduling and resource management capabilities when running [Argo Workflows](https://argo-workflows.readthedocs.io/en/latest/).
+
+This guide is for [batch users](/docs/tasks#batch-user) that have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).
+
+Currently Kueue doesn't support Argo Workflows [Workflow](https://argo-workflows.readthedocs.io/en/latest/workflow-concepts/) resources directly,
+but you can take advantage of Kueue's ability to [manage plain pods](/docs/tasks/run_plain_pods) to integrate them.
+
+## Before you begin
+
+1. Learn how to [install Kueue with a custom manager configuration](/docs/installation/#install-a-custom-configured-released-version).
+
+2. Follow steps in [Run Plain Pods](/docs/tasks/run/plain_pods/#before-you-begin)
+to learn how to enable and configure the `v1/pod` integration.
+
+3. Install [Argo Workflows](https://argo-workflows.readthedocs.io/en/latest/installation/#installation)
+
+## Workflow definition
+
+### a. Targeting a single LocalQueue
+
+If you want the entire workflow to target a single [local queue](/docs/concepts/local_queue),
+it should be specified in the `spec.podMetadata` section of the Workflow configuration.
+
+{{< include "examples/pod-based-workloads/workflow-single-queue.yaml" "yaml" >}}
+
+### b. Targeting a different LocalQueue per template
+
+If you prefer to target a different [local queue](/docs/concepts/local_queue) for each step of your Workflow,
+you can define the queue in the `spec.templates[].metadata` section of the Workflow configuration.
+
+In this example `hello1` and `hello2a` will target `user-queue` and `hello2b` will
+target `user-queue-2`.
+
+{{< include "examples/pod-based-workloads/workflow-queue-per-template.yaml" "yaml" >}}
+
+### c. Limitations
+
+- Kueue will only manage pods created by Argo Workflows. It does not manage the Argo Workflows resources in any way.
+- Each pod in a Workflow will create a new Workload resource and must wait for admission by Kueue.
+- There is no way to ensure, before a Workflow starts, that it will be able to complete. If one step of a multi-step Workflow does not have
+available quota, Argo Workflows will run all previous steps and then wait for quota to become available.
+- Kueue does not understand Argo Workflows `suspend` flag and will not manage it.
+- Kueue does not manage `suspend`, `http`, or `resource` template types since they do not create pods.
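
For reference, enabling the `v1/pod` integration mentioned in step 2 of "Before you begin" usually comes down to listing `pod` among the frameworks in the Kueue manager configuration. The fragment below is a minimal sketch and not part of this commit. It assumes the `config.kueue.x-k8s.io/v1beta1` Configuration API, and it deliberately leaves out namespace-selection settings, which differ between Kueue releases. The Run Plain Pods page remains the authoritative source.

```yaml
# Sketch of a Kueue manager configuration fragment (assumed, not from this commit).
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "batch/job"   # keep whichever integrations you already rely on
  - "pod"         # lets Kueue manage plain pods, including those created by Argo Workflows
```

Because Kueue only sees the pods, each pod created by a Workflow step is queued and admitted on its own, as the Limitations section describes.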

site/content/en/docs/tasks/run/flux_miniclusters.md → site/content/en/docs/tasks/run/external_workloads/flux_miniclusters.md

+1 -1
@@ -2,7 +2,7 @@
 title: "Run A Flux MiniCluster"
 linkTitle: "Flux MiniClusters"
 date: 2022-02-14
-weight: 6
+weight: 2
 description: >
   Run a Kueue scheduled Flux MiniCluster.
 ---

site/content/en/docs/tasks/run/wrapped_custom_workload.md → site/content/en/docs/tasks/run/external_workloads/wrapped_custom_workload.md

+1 -1
@@ -2,7 +2,7 @@
 title: "Run A Wrapped Custom Workload"
 linkTitle: "Custom Workload"
 date: 2025-01-14
-weight: 7
+weight: 1
 description: >
   Use an AppWrapper to Run a Custom Workload on Kueue.
 ---

site/static/_redirects

+7 -7
@@ -15,16 +15,16 @@
 /docs/tasks/enabling_pprof_endpoints /docs/tasks/dev/enabling_pprof_endpoints 301
 /docs/tasks/integrate_a_custom_job /docs/tasks/dev/integrate_a_custom_job 301
 
-/docs/tasks/run_flux_minicluster /docs/tasks/run/flux_miniclusters 301
-/docs/tasks/run_jobs /docs/tasks/run/jobs 301
-/docs/tasks/run_jobsets /docs/tasks/run/jobsets 301
-/docs/tasks/run_kubeflow_jobs /docs/tasks/run/kubeflow 301
-/docs/tasks/run_plain_pods /docs/tasks/run/plain_pods 301
-/docs/tasks/run_rayclusters /docs/tasks/run/rayclusters 301
-/docs/tasks/run_rayjobs /docs/tasks/run/rayjobs 301
+/docs/tasks/run_jobs /docs/tasks/run/jobs 301
+/docs/tasks/run_jobsets /docs/tasks/run/jobsets 301
+/docs/tasks/run_kubeflow_jobs /docs/tasks/run/kubeflow 301
+/docs/tasks/run_plain_pods /docs/tasks/run/plain_pods 301
+/docs/tasks/run_rayclusters /docs/tasks/run/rayclusters 301
+/docs/tasks/run_rayjobs /docs/tasks/run/rayjobs 301
 
 /docs/tasks/run_kubeflow_jobs/run_mpijobs /docs/tasks/run/kubeflow/mpijobs 301
 /docs/tasks/run_kubeflow_jobs/run_paddlejobs /docs/tasks/run/kubeflow/paddlejobs 301
 /docs/tasks/run_kubeflow_jobs/run_pytorchjobs /docs/tasks/run/kubeflow/pytorchjobs 301
 /docs/tasks/run_kubeflow_jobs/run_tfjobs /docs/tasks/run/kubeflow/tfjobs 301
 /docs/tasks/run_kubeflow_jobs/run_xgboostjobs /docs/tasks/run/kubeflow/xgboostjobs 301
+

@@ -0,0 +1,52 @@
+apiVersion: argoproj.io/v1alpha1
+kind: Workflow
+metadata:
+  generateName: steps-
+spec:
+  entrypoint: hello-hello-hello
+
+  templates:
+  - name: hello-hello-hello
+    steps:
+    - - name: hello1 # hello1 is run before the following steps
+        template: whalesay
+        arguments:
+          parameters:
+          - name: message
+            value: "hello1"
+    - - name: hello2a # double dash => run after previous step
+        template: whalesay
+        arguments:
+          parameters:
+          - name: message
+            value: "hello2a"
+      - name: hello2b # single dash => run in parallel with previous step
+        template: whalesay-queue-2
+        arguments:
+          parameters:
+          - name: message
+            value: "hello2b"
+
+  - name: whalesay
+    metadata:
+      labels:
+        kueue.x-k8s.io/queue-name: user-queue # Pods from this template will target user-queue
+    inputs:
+      parameters:
+      - name: message
+    container:
+      image: docker/whalesay
+      command: [cowsay]
+      args: ["{{inputs.parameters.message}}"]
+
+  - name: whalesay-queue-2
+    metadata:
+      labels:
+        kueue.x-k8s.io/queue-name: user-queue-2 # Pods from this template will target user-queue-2
+    inputs:
+      parameters:
+      - name: message
+    container:
+      image: docker/whalesay
+      command: [cowsay]
+      args: ["{{inputs.parameters.message}}"]
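
The `kueue.x-k8s.io/queue-name` labels in this Workflow refer to LocalQueues by name, so both `user-queue` and `user-queue-2` need to exist in the namespace where the Workflow's pods are created. The manifests below are a minimal sketch and not part of this commit. They assume the `kueue.x-k8s.io/v1beta1` API; the `default` namespace and the ClusterQueue name `cluster-queue` are hypothetical placeholders to adapt to your cluster.

```yaml
# Sketch only: LocalQueues matching the labels used by the Workflow templates.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default            # hypothetical namespace
  name: user-queue              # targeted by the whalesay template
spec:
  clusterQueue: cluster-queue   # hypothetical ClusterQueue name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default            # hypothetical namespace
  name: user-queue-2            # targeted by the whalesay-queue-2 template
spec:
  clusterQueue: cluster-queue   # hypothetical; may point to a different ClusterQueue
```

Pointing the two LocalQueues at different ClusterQueues is one way to give individual Workflow steps separate quota, which is the main reason to label per template rather than per Workflow.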

@@ -0,0 +1,19 @@
+apiVersion: argoproj.io/v1alpha1
+kind: Workflow
+metadata:
+  generateName: hello-world-
+spec:
+  entrypoint: whalesay
+  podMetadata:
+    labels:
+      kueue.x-k8s.io/queue-name: user-queue # All pods will target user-queue
+  templates:
+    - name: whalesay
+      container:
+        image: docker/whalesay
+        command: [ cowsay ]
+        args: [ "hello world" ]
+        resources:
+          limits:
+            memory: 32Mi
+            cpu: 100m
