✨ Check known required permissions for install before installing with the helm applier #1858

Open
wants to merge 34 commits into
base: main
from

Conversation

bentito
Contributor

@bentito bentito commented Mar 10, 2025

Description

This is a successor PR to #1716 and is primarily the contributions of @trgeiger and @joelanford.

The goal and title remain the same. The approach is slightly modified:

Pulls in the RBAC authorization code from k8s.io/kubernetes and uses it to check GET and other verb permissions, both as a prelude to a Helm dry-run and in response to its output.

To pull in the RBAC auth code concisely, repeatably, and with warnings if the code we use changes, we add a maintenance utility that adds the needed replace directives for all related staging modules (e.g., k8s.io/api, k8s.io/apimachinery) and automatically pins them to the corresponding published version.

All this code is initially called at

missingRules, err := h.PreAuthorizer.PreAuthorize(ctx, &ceServiceAccount, strings.NewReader(tmplRel.Manifest))

in internal/operator-controller/applier/helm.go

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2025

netlify bot commented Mar 10, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit d6bab4a
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/67edafc387c1460008463618
😎 Deploy Preview https://deploy-preview-1858--olmv1.netlify.app

@bentito bentito force-pushed the rbac-auth-k8s-replacer branch from 2991d5d to 65ef8a2 Compare March 10, 2025 20:03
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2025
@bentito bentito marked this pull request as ready for review March 10, 2025 20:04
@bentito bentito requested a review from a team as a code owner March 10, 2025 20:04
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2025

codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 32.27273% with 447 lines in your changes missing coverage. Please review.

Project coverage is 64.80%. Comparing base (19eacb0) to head (e2f3968).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
hack/tools/k8smaintainer/main.go 0.00% 245 Missing ⚠️
internal/operator-controller/authorization/rbac.go 59.33% 116 Missing and 19 partials ⚠️
internal/operator-controller/applier/helm.go 6.94% 66 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1858      +/-   ##
==========================================
- Coverage   68.94%   64.80%   -4.15%     
==========================================
  Files          66       68       +2     
  Lines        5236     5890     +654     
==========================================
+ Hits         3610     3817     +207     
- Misses       1394     1822     +428     
- Partials      232      251      +19     
Flag Coverage Δ
e2e 46.26% <6.74%> (-4.10%) ⬇️
unit 53.78% <30.60%> (-3.02%) ⬇️

azych

This comment was marked as outdated.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2025
@trgeiger
Contributor

trgeiger commented Mar 15, 2025

I added some tests, but they still need to be tweaked and finalized. While writing them, I noticed that because missing rules are checked before escalation, if bind/escalate are in play but we're missing the explicit permissions that bind/escalate would give us, we end up with a result where there's no error but we do have missing rules. @joelanford, is that what we want? I would think that if we can bind or escalate, we should not report those rules as missing, since the SA can grant them.

EDIT: This isn't a concern, I misunderstood the permissions logic here

@bentito bentito force-pushed the rbac-auth-k8s-replacer branch from 7a6a943 to e974006 Compare March 18, 2025 13:44
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2025
@trgeiger trgeiger force-pushed the rbac-auth-k8s-replacer branch from e974006 to 8f76fa8 Compare March 18, 2025 14:29
@bentito bentito force-pushed the rbac-auth-k8s-replacer branch from 9a80b06 to 28211af Compare March 31, 2025 18:17
Comment on lines +230 to +236
k8s.io/api v0.32.3
k8s.io/apiextensions-apiserver v0.32.3
k8s.io/apimachinery v0.32.3
k8s.io/apiserver v0.32.3
k8s.io/cli-runtime v0.32.3
k8s.io/client-go v0.32.3
k8s.io/component-base v0.32.3
Member

Nit: It feels off to me that these direct dependencies are now showing up in the require block with all of the indirect dependencies. Any idea why this is happening?

Contributor Author

I think layout is up to "golang.org/x/mod/modfile"

Comment on lines +40 to +41
require k8s.io/kube-openapi v0.0.0-20241105132330-32ad38e42d3f // indirect

Member

Any idea why this is now moved into a separate require statement? Before it was in the require grouping of all of the indirect dependencies.

Contributor Author

maybe a bug in the k8smaintainer code? I'll check there

Contributor Author

Refactored the k8smaintainer code (see the long reply comment below), but I don't think we're going to get better control of the file layout; golang.org/x/mod controls that.


replace k8s.io/api => k8s.io/api v0.32.2

replace k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.32.3
Member

Nit: is it possible to group all of these into a single replace section?

Contributor Author

possible if the "golang.org/x/mod/modfile" library allows for it, I'll look

Contributor Author

refactored k8smaintainer code, but it still won't change layout, see comment below

go.mod Outdated
Comment on lines 258 to 260
replace k8s.io/client-go => k8s.io/client-go v0.32.2

replace k8s.io/cloud-provider => k8s.io/cloud-provider v0.32.3
Member

Why are some of these at 0.32.2, and some are at 0.32.3?

Contributor Author

possible bug, I'll check

Contributor Author

Okay I’ve refactored the k8smaintainer/main.go code to make it easier to understand and made a logical change: moving from an assumption-based version pinning approach to one that actively verifies tag existence. The script now uses go list -m -versions to check if the target staging version (derived from k8s.io/kubernetes) exists for each dependency. If the exact tag is missing, it falls back to check for and use the immediately preceding patch version, preventing failures caused by unsynchronized Kubernetes tagging while still logging warnings for transparency.
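
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in hack/tools/k8smaintainer/main.go: the helper names stagingTarget, previousPatch, and pickVersion are hypothetical, and the list of available versions is passed in directly instead of being read from go list -m -versions.

```go
package main

import (
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// stagingTarget converts a k8s.io/kubernetes version such as "v1.32.3"
// into the matching staging-module version "v0.32.3".
func stagingTarget(k8sVersion string) (string, error) {
	rest, ok := strings.CutPrefix(k8sVersion, "v1.")
	if !ok {
		return "", fmt.Errorf("unexpected k8s.io/kubernetes version %q", k8sVersion)
	}
	return "v0." + rest, nil
}

// previousPatch returns the version one patch below target, for example
// "v0.32.3" -> "v0.32.2". ok is false when there is no lower patch.
func previousPatch(target string) (prev string, ok bool) {
	i := strings.LastIndex(target, ".")
	if i < 0 {
		return "", false
	}
	patch, err := strconv.Atoi(target[i+1:])
	if err != nil || patch == 0 {
		return "", false
	}
	return target[:i+1] + strconv.Itoa(patch-1), true
}

// pickVersion (hypothetical helper) prefers the exact target tag and falls
// back to the immediately preceding patch version if the target is missing.
func pickVersion(target string, available []string) (string, bool) {
	if slices.Contains(available, target) {
		return target, true
	}
	if prev, ok := previousPatch(target); ok && slices.Contains(available, prev) {
		return prev, true
	}
	return "", false
}

func main() {
	target, err := stagingTarget("v1.32.3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Assumed input: tags published for one staging module.
	available := []string{"v0.32.1", "v0.32.2"}
	if v, ok := pickVersion(target, available); ok {
		fmt.Printf("pin to %s (target %s)\n", v, target)
	} else {
		fmt.Printf("WARNING: neither %s nor its predecessor found; skipping\n", target)
	}
}
```

Modules for which neither version exists are left unpinned, which matches the "Skipping pinning" warnings in the output below.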

PTAL at the code and the go.mod it kicked out when run with make tidy. The formatting issues (nits) likely can't be fixed because the libs we're using to operate on go.mod are in charge of the layout. But it seems like all the staging versions are aligned. The output from running is here:

operator-controller rbac-auth-k8s-replacer $ make tidy
go run hack/tools/k8smaintainer/main.go
Running in module root: /Users/btofel/workspace/operator-controller
Found k8s.io/kubernetes version: v1.32.3
Target staging version calculated: v0.32.3
Running 'go list -m -json all'...
WARNING: Neither target version v0.32.3 nor its predecessor found for k8s.io/kube-openapi. Skipping pinning.
WARNING: Neither target version v0.32.3 nor its predecessor found for k8s.io/system-validators. Skipping pinning.
WARNING: Neither target version v0.32.3 nor its predecessor found for k8s.io/utils. Skipping pinning.
Identified 30 k8s.io/* modules to manage.
Removing existing k8s.io/* replace directives...
Adding determined replace directives...
Adding replace: k8s.io/api => k8s.io/api v0.32.3
Adding replace: k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.32.3
Adding replace: k8s.io/apimachinery => k8s.io/apimachinery v0.32.3
Adding replace: k8s.io/apiserver => k8s.io/apiserver v0.32.3
Adding replace: k8s.io/cli-runtime => k8s.io/cli-runtime v0.32.3
Adding replace: k8s.io/client-go => k8s.io/client-go v0.32.3
Adding replace: k8s.io/cloud-provider => k8s.io/cloud-provider v0.32.3
Adding replace: k8s.io/cluster-bootstrap => k8s.io/cluster-bootstrap v0.32.3
Adding replace: k8s.io/code-generator => k8s.io/code-generator v0.32.3
Adding replace: k8s.io/component-base => k8s.io/component-base v0.32.3
Adding replace: k8s.io/component-helpers => k8s.io/component-helpers v0.32.3
Adding replace: k8s.io/controller-manager => k8s.io/controller-manager v0.32.3
Adding replace: k8s.io/cri-api => k8s.io/cri-api v0.32.3
Adding replace: k8s.io/cri-client => k8s.io/cri-client v0.32.3
Adding replace: k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.32.3
Adding replace: k8s.io/dynamic-resource-allocation => k8s.io/dynamic-resource-allocation v0.32.3
Adding replace: k8s.io/endpointslice => k8s.io/endpointslice v0.32.3
Adding replace: k8s.io/externaljwt => k8s.io/externaljwt v0.32.3
Adding replace: k8s.io/kms => k8s.io/kms v0.32.3
Adding replace: k8s.io/kube-aggregator => k8s.io/kube-aggregator v0.32.3
Adding replace: k8s.io/kube-controller-manager => k8s.io/kube-controller-manager v0.32.3
Adding replace: k8s.io/kube-proxy => k8s.io/kube-proxy v0.32.3
Adding replace: k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.32.3
Adding replace: k8s.io/kubectl => k8s.io/kubectl v0.32.3
Adding replace: k8s.io/kubelet => k8s.io/kubelet v0.32.3
Adding replace: k8s.io/kubernetes => k8s.io/kubernetes v1.32.3
Adding replace: k8s.io/metrics => k8s.io/metrics v0.32.3
Adding replace: k8s.io/mount-utils => k8s.io/mount-utils v0.32.3
Adding replace: k8s.io/pod-security-admission => k8s.io/pod-security-admission v0.32.3
Adding replace: k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.32.3
Writing updated go.mod...
Running 'go mod tidy -go=1.23.4'...
Running 'go mod download k8s.io/kubernetes'...
Successfully updated k8s dependencies.
# k8s-maintainer calls go mod tidy

Comment on lines 294 to 306
for _, ns := range sets.List(namespaces) {
	for _, v := range collectionVerbs {
		attributeRecords = append(attributeRecords, authorizer.AttributesRecord{
			User:            manifestManager,
			Namespace:       ns,
			APIGroup:        gvr.Group,
			APIVersion:      gvr.Version,
			Resource:        gvr.Resource,
			ResourceRequest: true,
			Verb:            v,
		})
	}
}
Member

Based on collectionVerbs containing list and watch, this is checking for those verbs in the object's namespace. But that is insufficient for what contentmanager needs (which is cluster-scoped list and watch).

For now, it is probably sufficient to split collectionVerbs into clusterCollectionVerbs and namespacedCollectionVerbs and then have a separate loop for clusterCollectionVerbs that hardcodes Namespace: corev1.NamespaceAll.

One problem with this approach is that this pre-authorizer implementation would be tightly coupled with the permission requirements exerted by the contentmanager, which isn't great because there is a hidden dependency that will be hard to keep track of.

My opinion: for now we accept the tight coupling (with a comment that clarifies where the cluster-scoped list and watch requirements come from). But then let's also capture a story under the GA-ification of this feature that ensures we go back and decouple things.

Contributor Author

okay I'm going to implement the split. when I commit that, I'll also make a story on the GA epic

Contributor Author

the split is done in fa839f2

return nil, err
}
attributesRecords := dm.asAuthorizationAttributesRecordsForUser(manifestManager)

Member

@joelanford joelanford Apr 1, 2025

We also need two more attributesRecords:

for _, verb := range []string{"update", "patch"} {
	attributesRecords = append(attributesRecords, authorizer.AttributesRecord{
		User:            manifestManager,
		Name:            clusterExtension.Name,
		APIGroup:        clusterExtension.Group,
		APIVersion:      clusterExtension.Version,
		Resource:        "clusterextensions/finalizers",
		ResourceRequest: true,
		Verb:            verb,
	})
}

This is required for clusters that have (or could in the future have) the OwnerReferencesPermissionEnforcement admission controller feature gate enabled.

Contributor

I'll look at this after meeting

Contributor

@joelanford so should this only be added if that feature gate is enabled?

Member

No, I think we just always add this attributes record. That means we require this one extra permission for clusters where OwnerReferencesPermissionEnforcement is disabled, but it also means a cluster admin could enable that feature without breaking existing ClusterExtensions (because we've already required the permissions that that feature requires)

Contributor

Sounds good, then this thread should be satisfied @bentito

}
sortableRules := rbacv1helpers.SortableRuleSlice(missingRules[ns])
sort.Sort(sortableRules)
allMissingPolicyRules = append(allMissingPolicyRules, ScopedPolicyRules{Namespace: ns, MissingRules: sortableRules})
Member

Once we finish this loop, allMissingPolicyRules also needs to be sorted by namespace since missingRules is a map that we iterate in random order.

Contributor

@bentito done

Comment on lines 93 to 97
if err := ec.checkEscalation(ctx, manifestManager, obj); err != nil {
	// In Kubernetes 1.32.2 the specialized PrivilegeEscalationError is gone.
	// Instead, we simply collect the error.
	preAuthEvaluationErrors = append(preAuthEvaluationErrors, err)
}
Member

We've got a problem here. ec.checkEscalation also returns missing rules, but:

  1. They are embedded in the returned err
  2. The returned error is a simple string that combines information about missing rules and separate evaluation errors.

We need the error returned by ec.checkEscalation to be something we can type assert on and extract out the missing rules, so that we can add to our missing rule set and still collect the separate evaluation errors that are possible. That was the purpose of PrivilegeEscalationError in my PoC.

Contributor

Took a stab at handling this, @bentito please review

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 2, 2025
@openshift-merge-robot

PR needs rebase.

bentito and others added 3 commits April 2, 2025 12:37
Also sort final missing rules by namespace

Signed-off-by: Tayler Geiger <[email protected]>
// In Kubernetes 1.32.2 the specialized PrivilegeEscalationError is gone.
// Instead, we simply collect the error.
missingEscalationRules, namespace := parseEscalationErrorForMissingRules(err)
// Check if we already have these escalation PolicyRules, if so don't append
Contributor

Without this check we end up with a final compacted policy rule with duplicates of the same verb over and over in the []Verbs

Member

Yeah, I had run into that in my PoC as well. Here's the code I added to the CompactRules function to take care of that: d67e50f (#1804)

Comment on lines +419 to +422
setupLog.Info("preflight permissions check enabled via feature gate")
preAuth = authorization.NewRBACPreAuthorizer(mgr.GetClient())
} else {
setupLog.Info("preflight permissions check disabled via feature gate")
Member

Nit: Rather than one-off logging like this, it's probably better if we have a general "log the feature gate status" function that does this kind of thing consistently and in one place.

Comment on lines 97 to 102
for i, rule := range missingEscalationRules {
	previousRule := missingRules[namespace][len(missingRules[namespace])-len(missingEscalationRules)+i]
	if !arePolicyRulesEqual(previousRule, rule) {
		missingRules[namespace] = append(missingRules[namespace], missingEscalationRules...)
	}
}
Member

Can we blindly append the missing rules here, and rely on:

  • upstream CompactRules that we call in line 112
  • possibly a second pass that dedups the verbs like in my PoC?

Contributor

CompactRules doesn't seem to properly dedup, I was ending up with a PolicyRule.Verb with 4 counts of "create" and stuff. Where is the dedup bit of your PoC? I looked around for it, I can look again if you're not sure

Comment on lines +120 to +123
// sort allMissingPolicyRules alphabetically by namespace
sort.Slice(allMissingPolicyRules, func(i, j int) bool {
	return allMissingPolicyRules[i].Namespace < allMissingPolicyRules[j].Namespace
})
Member

Nit: the new slices package in the standard library has slices.SortFunc, a generics-based sort implementation that makes this a bit more ergonomic (the comparison func receives the two elements directly rather than requiring an index lookup).

missingRules[namespace] = append(missingRules[namespace], missingEscalationRules...)
}
}
preAuthEvaluationErrors = append(preAuthEvaluationErrors, err)
Member

Is err the original error here still (i.e. does it still include the missing rules text)? If so, I think we need parseEscalationError to return a new error where all the missing rules text is removed.

Contributor

ah yeah it's the original full error, so you just want to have it return the error minus the missing rules bit. can do

Contributor

Do you think it would be robust enough to split the error string at ": " (a colon and a space)?


func parseEscalationErrorForMissingRules(ecError error) ([]rbacv1.PolicyRule, string) {
// Regex to capture namespace and serviceaccount
userRegex := regexp.MustCompile(`system:serviceaccount:(?P<Namespace>[^:]+):(?P<ServiceAccount>[^"]+)`)
Member

This is fragile because it assumes the user will be a service account. I don't think we should make that assumption:

  1. We've got a feature coming soon around synthetic auth
  2. It would make this code less reusable in CLI contexts where the user is not in our control.

I think we already know the namespace when we call this function though (it's the namespace of the object we're doing the escalation check against).

Member

Here's the code that generates the string. I think it would probably be best if we wrote a regex that fully describes the full string (and extracts the relevant pieces).

https://github.com/kubernetes/kubernetes/blob/3d0594d4d92653fcefea0b91363f473ae457c6b8/pkg/registry/rbac/validation/rule.go#L81-L84

Member

(that would also help with extracting out the rule resolution errors that I mentioned above)

Pass in the clusterextension to PreAuthorize instead of the user.Info
since we need the extension to create the clusterextension/finalizer

Signed-off-by: Tayler Geiger <[email protected]>
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

6 participants