Allow mutating queue name in StatefulSet Webhook. #3520

Merged

mbobrovskyi
Contributor

@mbobrovskyi mbobrovskyi commented Nov 13, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Allow mutating the queue-name label in the StatefulSet Webhook when ReadyReplicas equals zero.
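A minimal sketch of the rule stated above, assuming the webhook compares the queue-name label on the old and new objects and only enforces immutability once the StatefulSet has ready replicas. Kueue itself is written in Go; this is a Python model of the check only. The `StatefulSet` class and `validate_update` function are hypothetical simplifications, and reading the replica count from the old object (rather than the new one) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Label Kueue reads to find the LocalQueue for a workload.
QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"


@dataclass
class StatefulSet:
    """Hypothetical, trimmed-down StatefulSet: only labels and status.readyReplicas."""
    labels: Dict[str, str] = field(default_factory=dict)
    ready_replicas: int = 0


def validate_update(old: StatefulSet, new: StatefulSet) -> List[str]:
    """Hypothetical update check: the queue-name label may change only
    while the StatefulSet has no ready replicas."""
    errors: List[str] = []
    old_queue = old.labels.get(QUEUE_NAME_LABEL, "")
    new_queue = new.labels.get(QUEUE_NAME_LABEL, "")
    # Assumption: the replica count is read from the existing (old) object.
    if old_queue != new_queue and old.ready_replicas > 0:
        errors.append(f"metadata.labels[{QUEUE_NAME_LABEL}]: field is immutable")
    return errors


if __name__ == "__main__":
    pending = StatefulSet({QUEUE_NAME_LABEL: "queue-a"}, ready_replicas=0)
    running = StatefulSet({QUEUE_NAME_LABEL: "queue-a"}, ready_replicas=2)
    print(validate_update(pending, StatefulSet({QUEUE_NAME_LABEL: "queue-b"}, 0)))  # []
    print(validate_update(running, StatefulSet({QUEUE_NAME_LABEL: "queue-b"}, 2)))
```

In this model, adding the label to a StatefulSet that previously had none is also accepted while no replicas are ready, since an absent label compares as an empty queue name.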

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Allow mutating queue-name label in StatefulSet Webhook when ReadyReplicas equals zero.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Nov 13, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 13, 2024

netlify bot commented Nov 13, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

🔨 Latest commit 972193d
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/678e5ef50acc0b0008054645
😎 Deploy Preview https://deploy-preview-3520--kubernetes-sigs-kueue.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 13, 2024
@mimowo
Contributor

mimowo commented Nov 13, 2024

/hold
I want to understand the flow e2e first from the user's perspective.
In particular, how can a user start such a StatefulSet? Will adding the label make it start?

IIRC for Jobs we start such a Job (but please double-check and confirm).

I synced with @mbobrovskyi: this is to align the behavior with Deployment, but another option is to simply reject such Deployments if they are not supported anyway.

I think it deserves an e2e test.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2024
@mbobrovskyi mbobrovskyi marked this pull request as draft November 13, 2024 09:53
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2024
@dgrove-oss
Contributor

I'd also like to understand how it interacts with the namespaceSelector on the pod integration. In the Pod webhook Default method, if the namespaceSelector doesn't match then we never get to the code that consults manageJobsWithoutQueueName.
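The ordering described above can be modeled as follows. This is a simplified Python sketch, not Kueue's Go code: the namespace selector is reduced to a predicate over the namespace's labels, `exclude_namespaces` mimics a `NotIn` expression on the `kubernetes.io/metadata.name` label, and every other check the real Pod webhook performs is omitted. The function names are hypothetical.

```python
from typing import Callable, Dict

QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"
NS_NAME_LABEL = "kubernetes.io/metadata.name"

# A namespace selector modeled as a predicate over the namespace's labels.
NamespaceSelector = Callable[[Dict[str, str]], bool]


def exclude_namespaces(*names: str) -> NamespaceSelector:
    """Mimics a `NotIn` match expression on the namespace name label."""
    def matches(ns_labels: Dict[str, str]) -> bool:
        return ns_labels.get(NS_NAME_LABEL) not in names
    return matches


def should_manage_pod(
    pod_labels: Dict[str, str],
    ns_labels: Dict[str, str],
    selector: NamespaceSelector,
    manage_jobs_without_queue_name: bool,
) -> bool:
    # 1. Namespace filter first: a non-matching namespace returns early,
    #    so manageJobsWithoutQueueName is never consulted for it.
    if not selector(ns_labels):
        return False
    # 2. Explicit opt-in through the queue-name label.
    if QUEUE_NAME_LABEL in pod_labels:
        return True
    # 3. Implicit opt-in through the global flag.
    return manage_jobs_without_queue_name


if __name__ == "__main__":
    sel = exclude_namespaces("kube-system", "kueue-system")
    print(should_manage_pod({}, {NS_NAME_LABEL: "kube-system"}, sel, True))  # False
    print(should_manage_pod({}, {NS_NAME_LABEL: "team-a"}, sel, True))  # True
```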

We don't support namespaceSelectors to modify manageJobsWithoutQueueName for any other integration (discussed at length in #2119).

What is the intended semantics for a StatefulSet or Deployment that is deployed in a namespace that doesn't match the namespaceSelector for the Pod integration when manageJobsWithoutQueueName is true?

@mimowo
Contributor

mimowo commented Nov 14, 2024

In the Pod webhook Default method, if the namespaceSelector doesn't match then we never get to the code that consults manageJobsWithoutQueueName.

Yes, this is WAI. The intention was to have a mechanism to exclude pods (like static pods or DaemonSet pods) in kube-system and kueue-system. We made the mechanism more generic (to exclude arbitrary namespaces).

We don't support namespaceSelectors to modify manageJobsWithoutQueueName for any other integration (discussed at length in #2119).

Right, we don't do it for any other integration. However, I think Deployments and StatefulSets need to be exceptions. First, Deployments are used in kube-system and kueue-system, so we had better not touch them. Second, the support is based on the PodGroup integration, so we inherit the namespaceSelector lookup from the pod integration.

What is the intended semantics for a StatefulSet or Deployment that is deployed in a namespace that doesn't match the namespaceSelector for the Pod integration when manageJobsWithoutQueueName is true?

IIUC this basically means "for Deployments and StatefulSets in the kube-system or kueue-system namespaces". I think we should not manage them: no workload should be created. Since Deployments and StatefulSets are based on the PodGroup integration, this should happen "for free".

Let me know if this matches your expectations and understanding.

cc @mwielgus

@dgrove-oss
Contributor

It honestly feels a bit like our implementation is leaking through to the API. In particular, treating StatefulSets one way and Jobs another wrt manageJobsWithoutQueueName.

I think it could be less surprising / easier to explain if the boolean manageJobsWithoutQueueName was replaced with a namespaceSelector across all integrations. I know this was discussed before, but maybe it is worth revisiting now that (a) we see what we need for Deployment and StatefulSet and (b) we are thinking about what a v1 API would look like and what perhaps should be improved between now and then.

@mimowo
Contributor

mimowo commented Nov 14, 2024

It honestly feels a bit like our implementation is leaking through to the API. In particular, treating StatefulSets one way and Jobs another wrt manageJobsWithoutQueueName.

Yeah, I see the point: it is not clear why StatefulSet or Deployment pods are controlled by podOptions.namespaceSelector, whilst for other Jobs it is not respected.

I think it could be less surprising / easier to explain if the boolean manageJobsWithoutQueueName was replaced with a namespaceSelector across all integrations.

You mean "replaced"? Or something like "restricted" - so that we only manage workloads matching the namespaceSelector?

I know this was discussed before, but maybe it is worth revisiting now that (a) we see what we need for Deployment and StatefulSet and (b) we are thinking about what a v1 API would look like and what perhaps should be improved between now and then.

I would be in favor of that. The original intention of podOptions.namespaceSelector was to exclude pods in "kube-system" and "kueue-system". Back then we didn't foresee the need to exclude Jobs or other supported CRDs from management. However, now that we support Deployments, it also makes sense to exclude "kube-system" and "kueue-system" for them. Luckily we get this for free via the Pod integration, but as you say, it means leaking implementation details.

Let me also cc @mwielgus and @tenzen-y for their opinions, but +1 from me to decouple namespaceSelector from podOptions.

The remaining question from me: do we support both places, or do we validate that only one is set? We could consider supporting both places for v1beta1 and deprecating the one in podOptions, but it would be good to have a KEP for that. Are you interested in driving this?
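The two options raised in this question can be sketched side by side. Everything in this block is hypothetical: `Configuration`, its field names, and both functions are illustrative and are not the Kueue Configuration API, and selectors are reduced to plain matchLabels dicts.

```python
import warnings
from dataclasses import dataclass
from typing import Dict, List, Optional

Selector = Dict[str, str]  # simplified: matchLabels only


@dataclass
class Configuration:
    """Hypothetical configuration with the selector in two places."""
    pod_options_namespace_selector: Optional[Selector] = None  # existing location
    namespace_selector: Optional[Selector] = None  # hypothetical new location


def validate_only_one(cfg: Configuration) -> List[str]:
    """Option A: reject configurations that set both selectors."""
    if cfg.pod_options_namespace_selector is not None and cfg.namespace_selector is not None:
        return ["set at most one of podOptions.namespaceSelector and namespaceSelector"]
    return []


def effective_selector(cfg: Configuration) -> Optional[Selector]:
    """Option B: accept both, prefer the new field, warn on the deprecated one."""
    if cfg.namespace_selector is not None:
        return cfg.namespace_selector
    if cfg.pod_options_namespace_selector is not None:
        warnings.warn("podOptions.namespaceSelector is deprecated", DeprecationWarning)
        return cfg.pod_options_namespace_selector
    return None
```

Option A keeps the semantics unambiguous at the cost of a hard error on upgrade; option B lets existing configurations keep working during a deprecation window.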

@dgrove-oss
Contributor

You mean "replaced"? Or something like "restricted" - so that we only manage workloads matching the namespaceSelector?

restricted is a better word :).

Yes, I'd propose that we do a uniform filtering by namespaceSelector for all integrations when manageJobsWithoutQueueName is true. I'll give people some time to comment, but if there is interest in exploring this I'd be happy to kick off a KEP and drive it.
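A sketch of the "restricted" semantics proposed here, assuming the selector only narrows the implicit opt-in from `manageJobsWithoutQueueName`, and that objects carrying the queue-name label are managed regardless of namespace (an assumption; the proposal does not say). The function takes no kind argument, which is the point: Jobs, StatefulSets and Deployments all go through the same rule. All names are hypothetical.

```python
from typing import Callable, Dict

QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"
NS_NAME_LABEL = "kubernetes.io/metadata.name"

NamespaceSelector = Callable[[Dict[str, str]], bool]


def default_selector(ns_labels: Dict[str, str]) -> bool:
    """Example selector that excludes the system namespaces."""
    return ns_labels.get(NS_NAME_LABEL) not in ("kube-system", "kueue-system")


def is_managed(
    obj_labels: Dict[str, str],
    ns_labels: Dict[str, str],
    selector: NamespaceSelector,
    manage_jobs_without_queue_name: bool,
) -> bool:
    # Explicit opt-in via the queue-name label (assumed unaffected by the selector).
    if QUEUE_NAME_LABEL in obj_labels:
        return True
    # Implicit opt-in, restricted to matching namespaces; identical for every kind.
    return manage_jobs_without_queue_name and selector(ns_labels)
```

Under this rule nothing depends on podOptions, so a StatefulSet and a batch Job in the same namespace with the same labels get the same answer.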

@mbobrovskyi mbobrovskyi changed the title from "Allow manageJobsWithoutQueueName on StatefulSet." to "Allow mutating queue name in StatefulSet Webhook." Nov 18, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from f9aa14c to 752d4dc November 18, 2024 05:08
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 18, 2024
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch 3 times, most recently from 43a4d26 to 3d997ec November 27, 2024 11:49
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from 3d997ec to 534b8aa November 27, 2024 11:56
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 27, 2024
@mbobrovskyi
Contributor Author

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Nov 27, 2024
@trasc
Contributor

trasc commented Jan 14, 2025

/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from trasc January 14, 2025 11:56
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2025
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from c585c28 to 8838fd8 January 18, 2025 09:23
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2025
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch 2 times, most recently from 2b3460b to c9e499c January 20, 2025 04:12
@mbobrovskyi mbobrovskyi marked this pull request as ready for review January 20, 2025 04:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 20, 2025
@k8s-ci-robot k8s-ci-robot requested a review from gabesaba January 20, 2025 04:12
@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from c9e499c to 3e3b763 January 20, 2025 04:13
@mbobrovskyi mbobrovskyi requested a review from mimowo January 20, 2025 04:22
Contributor

@mimowo mimowo left a comment

Thanks!
LGTM, just minor suggestions for the testing coverage

@mbobrovskyi mbobrovskyi force-pushed the fix/manageJobsWithoutQueueName branch from 3e3b763 to 972193d January 20, 2025 14:34
@mbobrovskyi mbobrovskyi requested a review from mimowo January 20, 2025 14:37
@mimowo
Contributor

mimowo commented Jan 20, 2025

/hold
I want to understand the flow e2e first from the user's perspective.
In particular, how can a user start such a StatefulSet? Will adding the label make it start?

/hold cancel
I believe this is now covered by the e2e test, which demonstrates how a user can change the label so that the StatefulSet starts running.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2025
@mimowo
Contributor

mimowo commented Jan 20, 2025

It honestly feels a bit like our implementation is leaking through to the API. In particular, treating StatefulSets one way and Jobs another wrt manageJobsWithoutQueueName.

I believe this concern is already addressed by the KEP: #3589. We are yet to follow up on deprecating and removing the podOptions, but I wouldn't block this work on that.

Contributor

@mimowo mimowo left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 20, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: e9af5e65a7a809073ed4c8f9785c2f093b911a98

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mbobrovskyi, mimowo

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit 74dd940 into kubernetes-sigs:main Jan 20, 2025
17 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Jan 20, 2025
FillZpp pushed a commit to leptonai/kueue that referenced this pull request Feb 5, 2025
* Allow mutating queue-name in StatefulSet Webhook.

* Add test cases to check set queue-name.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants