✨ Check known required permissions for install before installing with the helm applier #1858
base: main
✅ Deploy Preview for olmv1 ready!
2991d5d to 65ef8a2
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1858 +/- ##
==========================================
- Coverage 68.94% 64.80% -4.15%
==========================================
Files 66 68 +2
Lines 5236 5890 +654
==========================================
+ Hits 3610 3817 +207
- Misses 1394 1822 +428
- Partials 232 251 +19
I added some tests, but they still need to be tweaked and finalized. While writing them I noticed that, because missing rules are checked before escalation, if bind/escalate are in play but we're missing the explicit permissions that bind/escalate would give us, we end up with a result that has no error but does have missing rules. @joelanford, is that what we want? I would think that if we can bind or escalate, we would not report those rules as missing, since the SA can grant them. EDIT: This isn't a concern; I misunderstood the permissions logic here.
7a6a943 to e974006
e974006 to 8f76fa8
Signed-off-by: Brett Tofel <[email protected]>
This reverts commit 2681194.
Signed-off-by: Brett Tofel <[email protected]>
9a80b06 to 28211af
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Tayler Geiger <[email protected]>
k8s.io/api v0.32.3
k8s.io/apiextensions-apiserver v0.32.3
k8s.io/apimachinery v0.32.3
k8s.io/apiserver v0.32.3
k8s.io/cli-runtime v0.32.3
k8s.io/client-go v0.32.3
k8s.io/component-base v0.32.3
Nit: It feels off to me that these direct dependencies are now showing up in the require block with all of the indirect dependencies. Any idea why this is happening?
I think layout is up to "golang.org/x/mod/modfile"
require k8s.io/kube-openapi v0.0.0-20241105132330-32ad38e42d3f // indirect
Any idea why this is now moved into a separate require statement? Before, it was in the require grouping of all of the indirect dependencies.
maybe a bug in the k8smaintainer code? I'll check there
Refactored the k8smaintainer code (see the long reply comment below), but I don't think we're going to get better control of the file layout; golang.org/x/mod is controlling that.
replace k8s.io/api => k8s.io/api v0.32.2

replace k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.32.3
Nit: is it possible to group all of these into a single replace section?
possible if the "golang.org/x/mod/modfile" library allows for it, I'll look
refactored k8smaintainer code, but it still won't change layout, see comment below
go.mod (Outdated)
replace k8s.io/client-go => k8s.io/client-go v0.32.2

replace k8s.io/cloud-provider => k8s.io/cloud-provider v0.32.3
Why are some of these at 0.32.2, and some are at 0.32.3?
possible bug, I'll check
Okay, I've refactored the k8smaintainer/main.go code to make it easier to understand, and made one logical change: moving from an assumption-based version pinning approach to one that actively verifies tag existence. The script now uses go list -m -versions to check whether the target staging version (derived from k8s.io/kubernetes) exists for each dependency. If the exact tag is missing, it falls back to the immediately preceding patch version if that one exists, preventing failures caused by unsynchronized Kubernetes tagging while still logging warnings for transparency.
PTAL at the code and the go.mod it kicked out when run with make tidy. The formatting issues (nits) likely can't be fixed, because the libs we're using to operate on go.mod are in charge of the layout. But it seems like all the staging versions are aligned. The output from the run is here:
operator-controller rbac-auth-k8s-replacer $ make tidy
go run hack/tools/k8smaintainer/main.go
Running in module root: /Users/btofel/workspace/operator-controller
Found k8s.io/kubernetes version: v1.32.3
Target staging version calculated: v0.32.3
Running 'go list -m -json all'...
WARNING: Neither target version v0.32.3 nor its predecessor found for k8s.io/kube-openapi. Skipping pinning.
WARNING: Neither target version v0.32.3 nor its predecessor found for k8s.io/system-validators. Skipping pinning.
WARNING: Neither target version v0.32.3 nor its predecessor found for k8s.io/utils. Skipping pinning.
Identified 30 k8s.io/* modules to manage.
Removing existing k8s.io/* replace directives...
Adding determined replace directives...
Adding replace: k8s.io/api => k8s.io/api v0.32.3
Adding replace: k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.32.3
Adding replace: k8s.io/apimachinery => k8s.io/apimachinery v0.32.3
Adding replace: k8s.io/apiserver => k8s.io/apiserver v0.32.3
Adding replace: k8s.io/cli-runtime => k8s.io/cli-runtime v0.32.3
Adding replace: k8s.io/client-go => k8s.io/client-go v0.32.3
Adding replace: k8s.io/cloud-provider => k8s.io/cloud-provider v0.32.3
Adding replace: k8s.io/cluster-bootstrap => k8s.io/cluster-bootstrap v0.32.3
Adding replace: k8s.io/code-generator => k8s.io/code-generator v0.32.3
Adding replace: k8s.io/component-base => k8s.io/component-base v0.32.3
Adding replace: k8s.io/component-helpers => k8s.io/component-helpers v0.32.3
Adding replace: k8s.io/controller-manager => k8s.io/controller-manager v0.32.3
Adding replace: k8s.io/cri-api => k8s.io/cri-api v0.32.3
Adding replace: k8s.io/cri-client => k8s.io/cri-client v0.32.3
Adding replace: k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.32.3
Adding replace: k8s.io/dynamic-resource-allocation => k8s.io/dynamic-resource-allocation v0.32.3
Adding replace: k8s.io/endpointslice => k8s.io/endpointslice v0.32.3
Adding replace: k8s.io/externaljwt => k8s.io/externaljwt v0.32.3
Adding replace: k8s.io/kms => k8s.io/kms v0.32.3
Adding replace: k8s.io/kube-aggregator => k8s.io/kube-aggregator v0.32.3
Adding replace: k8s.io/kube-controller-manager => k8s.io/kube-controller-manager v0.32.3
Adding replace: k8s.io/kube-proxy => k8s.io/kube-proxy v0.32.3
Adding replace: k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.32.3
Adding replace: k8s.io/kubectl => k8s.io/kubectl v0.32.3
Adding replace: k8s.io/kubelet => k8s.io/kubelet v0.32.3
Adding replace: k8s.io/kubernetes => k8s.io/kubernetes v1.32.3
Adding replace: k8s.io/metrics => k8s.io/metrics v0.32.3
Adding replace: k8s.io/mount-utils => k8s.io/mount-utils v0.32.3
Adding replace: k8s.io/pod-security-admission => k8s.io/pod-security-admission v0.32.3
Adding replace: k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.32.3
Writing updated go.mod...
Running 'go mod tidy -go=1.23.4'...
Running 'go mod download k8s.io/kubernetes'...
Successfully updated k8s dependencies.
# k8s-maintainer calls go mod tidy
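The fallback rule described above (use the target tag if it is published, otherwise the immediately preceding patch version, otherwise skip pinning) can be sketched roughly as follows. This is not the code in hack/tools/k8smaintainer/main.go; pickStagingVersion is a hypothetical helper, and the published list stands in for whatever go list -m -versions reports for a module.

```go
package main

import "fmt"

// pickStagingVersion is a hypothetical helper, not the k8smaintainer code.
// It returns the target version if it is published, otherwise the
// immediately preceding patch version if that one is published.
// ok is false when neither exists, in which case a caller would skip pinning.
func pickStagingVersion(target string, published []string) (version string, ok bool) {
	have := make(map[string]bool, len(published))
	for _, v := range published {
		have[v] = true
	}
	if have[target] {
		return target, true
	}

	// Parse vMAJOR.MINOR.PATCH; there is no predecessor for patch 0.
	var major, minor, patch int
	if _, err := fmt.Sscanf(target, "v%d.%d.%d", &major, &minor, &patch); err != nil || patch == 0 {
		return "", false
	}
	prev := fmt.Sprintf("v%d.%d.%d", major, minor, patch-1)
	if have[prev] {
		return prev, true
	}
	return "", false
}

func main() {
	// Assumed input: the module has not yet published v0.32.3.
	published := []string{"v0.32.1", "v0.32.2"}
	v, ok := pickStagingVersion("v0.32.3", published)
	fmt.Println(v, ok)
}
```

A module that only publishes pseudo-versions would match neither candidate here, which is consistent with the "Skipping pinning" warnings in the output above.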
for _, ns := range sets.List(namespaces) {
	for _, v := range collectionVerbs {
		attributeRecords = append(attributeRecords, authorizer.AttributesRecord{
			User:            manifestManager,
			Namespace:       ns,
			APIGroup:        gvr.Group,
			APIVersion:      gvr.Version,
			Resource:        gvr.Resource,
			ResourceRequest: true,
			Verb:            v,
		})
	}
}
Based on collectionVerbs containing list and watch, this is checking for those verbs in the object's namespace. But that is insufficient for what contentmanager needs (which is cluster-scoped list and watch).
For now, it is probably sufficient to split collectionVerbs into clusterCollectionVerbs and namespacedCollectionVerbs and then have a separate loop for clusterCollectionVerbs that hardcodes Namespace: corev1.NamespaceAll.
One problem with this approach is that this pre-authorizer implementation would be tightly coupled to the permission requirements imposed by the contentmanager, which isn't great because it creates a hidden dependency that will be hard to keep track of.
My opinion: for now we accept the tight coupling (with a comment that clarifies where the cluster-scoped list and watch requirements come from). But then let's also capture a story under the GA-ification of this feature that ensures we go back and decouple things.
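A minimal sketch of the proposed split, under some loud assumptions: attributesRecord is a local stand-in for authorizer.AttributesRecord reduced to three fields, namespaceAll mirrors corev1.NamespaceAll (the empty string), and the contents of namespacedCollectionVerbs are a guess for illustration only. The thread only establishes that list and watch need to be cluster-scoped.

```go
package main

import (
	"fmt"
	"sort"
)

// attributesRecord is a stand-in for authorizer.AttributesRecord,
// reduced to the fields this sketch needs.
type attributesRecord struct {
	Namespace string
	Resource  string
	Verb      string
}

// namespaceAll mirrors corev1.NamespaceAll, which is the empty string.
const namespaceAll = ""

var (
	// Assumption: "create" is the only namespaced collection verb.
	namespacedCollectionVerbs = []string{"create"}
	// list and watch are checked cluster-wide because contentmanager
	// requires cluster-scoped list and watch.
	clusterCollectionVerbs = []string{"list", "watch"}
)

// collectionRecords builds one record per namespace for the namespaced
// verbs, plus one cluster-scoped record per cluster collection verb.
func collectionRecords(resource string, namespaces []string) []attributesRecord {
	sorted := append([]string(nil), namespaces...)
	sort.Strings(sorted) // deterministic output order

	var records []attributesRecord
	for _, ns := range sorted {
		for _, v := range namespacedCollectionVerbs {
			records = append(records, attributesRecord{Namespace: ns, Resource: resource, Verb: v})
		}
	}
	for _, v := range clusterCollectionVerbs {
		records = append(records, attributesRecord{Namespace: namespaceAll, Resource: resource, Verb: v})
	}
	return records
}

func main() {
	for _, r := range collectionRecords("configmaps", []string{"b", "a"}) {
		fmt.Printf("%q %s %s\n", r.Namespace, r.Resource, r.Verb)
	}
}
```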
okay I'm going to implement the split. when I commit that, I'll also make a story on the GA epic
the split is done in fa839f2
	return nil, err
}
attributesRecords := dm.asAuthorizationAttributesRecordsForUser(manifestManager)
We also need two more attributesRecords:
for _, verb := range []string{"update", "patch"} {
attributesRecords = append(attributesRecords, authorizer.AttributesRecord{
User: manifestManager,
Name: clusterExtension.Name,
APIGroup: clusterExtension.Group,
APIVersion: clusterExtension.Version,
Resource: "clusterextensions/finalizers",
ResourceRequest: true,
Verb: verb,
})
}
This is required for clusters that have (or could in the future have) the OwnerReferencesPermissionEnforcement admission controller feature gate enabled.
I'll look at this after meeting
@joelanford so should this only be added if that feature gate is enabled?
No, I think we just always add this attributes record. That means we require this one extra permission for clusters where OwnerReferencesPermissionEnforcement is disabled, but it also means a cluster admin could enable that feature without breaking existing ClusterExtensions (because we've already required the permissions that that feature requires)
Sounds good, then this thread should be satisfied @bentito
}
sortableRules := rbacv1helpers.SortableRuleSlice(missingRules[ns])
sort.Sort(sortableRules)
allMissingPolicyRules = append(allMissingPolicyRules, ScopedPolicyRules{Namespace: ns, MissingRules: sortableRules})
Once we finish this loop, allMissingPolicyRules also needs to be sorted by namespace, since missingRules is a map that we iterate in random order.
@bentito done
if err := ec.checkEscalation(ctx, manifestManager, obj); err != nil {
	// In Kubernetes 1.32.2 the specialized PrivilegeEscalationError is gone.
	// Instead, we simply collect the error.
	preAuthEvaluationErrors = append(preAuthEvaluationErrors, err)
}
We've got a problem here. ec.checkEscalation also returns missing rules, but:
- They are embedded in the returned err
- The returned error is a simple string that combines information about missing rules and separate evaluation errors.
We need the error returned by ec.checkEscalation to be something we can type assert on and extract the missing rules from, so that we can add them to our missing rule set and still collect the separate evaluation errors that are possible. That was the purpose of PrivilegeEscalationError in my PoC.
Took a stab at handling this, @bentito please review
PR needs rebase.
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Also sort final missing rules by namespace
Signed-off-by: Tayler Geiger <[email protected]>
// In Kubernetes 1.32.2 the specialized PrivilegeEscalationError is gone.
// Instead, we simply collect the error.
missingEscalationRules, namespace := parseEscalationErrorForMissingRules(err)
// Check if we already have these escalation PolicyRules, if so don't append
Without this check we end up with a final compacted policy rule with duplicates of the same verb over and over in the []Verbs
Yeah, I had run into that in my PoC as well. Here's the code I added to the CompactRules function to take care of that: d67e50f (#1804)
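The referenced commit is not reproduced here, so the following is only a guess at the shape of such a dedup pass, not the code from d67e50f. dedupVerbs is a hypothetical helper that sorts a copy of the verbs and drops adjacent duplicates using the standard library slices package.

```go
package main

import (
	"fmt"
	"slices"
)

// dedupVerbs is a hypothetical helper: it returns a sorted copy of verbs
// with duplicates removed, leaving the input slice untouched.
func dedupVerbs(verbs []string) []string {
	out := slices.Clone(verbs)
	slices.Sort(out)
	// Compact removes consecutive duplicates, which after sorting
	// means all duplicates.
	return slices.Compact(out)
}

func main() {
	verbs := []string{"create", "get", "create", "create", "update", "create"}
	fmt.Println(dedupVerbs(verbs))
}
```

Sorting as part of the pass also gives a stable verb order, which helps when the resulting rules are compared in tests or printed in error messages.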
Signed-off-by: Brett Tofel <[email protected]>
…c-auth-k8s-replacer
	setupLog.Info("preflight permissions check enabled via feature gate")
	preAuth = authorization.NewRBACPreAuthorizer(mgr.GetClient())
} else {
	setupLog.Info("preflight permissions check disabled via feature gate")
Nit: Rather than one-off logging like this, it's probably better if we have a general "log the feature gate status" function that does this kind of thing consistently and in one place.
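As a rough illustration of that suggestion, a single helper could render the status of every gate in one consistent format. Everything here is hypothetical: the helper name, the map-based input, and the gate names in main are placeholders rather than the project's actual feature gate API.

```go
package main

import (
	"fmt"
	"sort"
)

// featureGateStatusLines is a hypothetical helper that renders one log line
// per feature gate, sorted by name so the output is stable across runs.
func featureGateStatusLines(gates map[string]bool) []string {
	names := make([]string, 0, len(gates))
	for name := range gates {
		names = append(names, name)
	}
	sort.Strings(names)

	lines := make([]string, 0, len(names))
	for _, name := range names {
		state := "disabled"
		if gates[name] {
			state = "enabled"
		}
		lines = append(lines, fmt.Sprintf("feature gate %s: %s", name, state))
	}
	return lines
}

func main() {
	// Placeholder gate names, for illustration only.
	gates := map[string]bool{"ExampleGateB": false, "ExampleGateA": true}
	for _, line := range featureGateStatusLines(gates) {
		fmt.Println(line)
	}
}
```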
for i, rule := range missingEscalationRules {
	previousRule := missingRules[namespace][len(missingRules[namespace])-len(missingEscalationRules)+i]
	if !arePolicyRulesEqual(previousRule, rule) {
		missingRules[namespace] = append(missingRules[namespace], missingEscalationRules...)
	}
}
Can we blindly append the missing rules here, and rely on:
- upstream CompactRules that we call in line 112
- possibly a second pass that dedups the verbs like in my PoC?
CompactRules doesn't seem to properly dedup; I was ending up with a PolicyRule.Verb with 4 counts of "create" and so on. Where is the dedup bit of your PoC? I looked around for it, and I can look again if you're not sure.
// sort allMissingPolicyRules alphabetically by namespace
sort.Slice(allMissingPolicyRules, func(i, j int) bool {
	return allMissingPolicyRules[i].Namespace < allMissingPolicyRules[j].Namespace
})
Nit: the new slices package in the standard library has slices.SortFunc, which is a generics-based sort implementation that makes this a bit more ergonomic (the comparison func receives the two elements directly rather than requiring the index lookup).
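For comparison with the sort.Slice call above, a slices.SortFunc version might look like the sketch below. scopedPolicyRules is a simplified stand-in for the PR's ScopedPolicyRules type, with the rules reduced to strings. Note that slices.SortFunc takes a three-way comparison function returning an int, for which cmp.Compare is a convenient fit.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// scopedPolicyRules is a simplified stand-in for ScopedPolicyRules.
type scopedPolicyRules struct {
	Namespace    string
	MissingRules []string
}

// sortByNamespace sorts in place, alphabetically by namespace.
func sortByNamespace(rules []scopedPolicyRules) {
	slices.SortFunc(rules, func(a, b scopedPolicyRules) int {
		return cmp.Compare(a.Namespace, b.Namespace)
	})
}

func main() {
	rules := []scopedPolicyRules{
		{Namespace: "kube-system"},
		{Namespace: ""},
		{Namespace: "default"},
	}
	sortByNamespace(rules)
	for _, r := range rules {
		fmt.Printf("%q\n", r.Namespace)
	}
}
```

The empty namespace, which represents cluster-scoped rules, sorts first under plain string comparison.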
			missingRules[namespace] = append(missingRules[namespace], missingEscalationRules...)
		}
	}
	preAuthEvaluationErrors = append(preAuthEvaluationErrors, err)
Is err still the original error here (i.e., does it still include the missing rules text)? If so, I think we need parseEscalationError to return a new error where all the missing rules text is removed.
ah yeah it's the original full error, so you just want to have it return the error minus the missing rules bit. can do
Do you think it would be ... robust enough to split the error string at ": " (a colon and a space)?
func parseEscalationErrorForMissingRules(ecError error) ([]rbacv1.PolicyRule, string) {
	// Regex to capture namespace and serviceaccount
	userRegex := regexp.MustCompile(`system:serviceaccount:(?P<Namespace>[^:]+):(?P<ServiceAccount>[^"]+)`)
This is fragile because it assumes the user will be a service account. I don't think we should make that assumption:
- We've got a feature coming soon around synthetic auth
- It would make this code less reusable in CLI contexts where the user is not in our control.
I think we already know the namespace when we call this function though (it's the namespace of the object we're doing the escalation check against).
Here's the code that generates the string. I think it would probably be best if we wrote a regex that could fully describe the full string (and extract the relevant pieces of it).
(that would also help with extracting out the rule resolution errors that I mentioned above)
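A sketch of parsing the whole message rather than just the user portion is shown below. It swaps the suggested regex for strings.Cut on two fixed markers, and it rests on an unverified assumption about the message shape: a preamble naming the user, the phrase "is attempting to grant RBAC permissions not currently held:" followed by one rule per line, and an optional "; resolution errors: " suffix. The linked upstream code is not reproduced in this thread, so the markers should be checked against it before relying on this.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed markers; verify against the upstream code that builds the message.
const (
	rulesMarker      = "is attempting to grant RBAC permissions not currently held:\n"
	resolutionMarker = "; resolution errors: "
)

// parsedEscalation holds the pieces of an escalation error message.
type parsedEscalation struct {
	Preamble         string   // text before the rules marker (user and groups)
	RuleLines        []string // one entry per missing rule line
	ResolutionErrors string   // text after the resolution marker, if any
}

// parseEscalationMessage is a hypothetical parser. ok is false when the
// message does not contain the rules marker at all.
func parseEscalationMessage(msg string) (parsed parsedEscalation, ok bool) {
	preamble, rest, found := strings.Cut(msg, rulesMarker)
	if !found {
		return parsedEscalation{}, false
	}
	rulesText, resolution, _ := strings.Cut(rest, resolutionMarker)

	var lines []string
	for _, line := range strings.Split(rulesText, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			lines = append(lines, line)
		}
	}
	return parsedEscalation{
		Preamble:         strings.TrimSpace(preamble),
		RuleLines:        lines,
		ResolutionErrors: resolution,
	}, true
}

func main() {
	msg := "user \"alice\" (groups=[\"dev\"]) " + rulesMarker + "rule-a\nrule-b" + resolutionMarker + "[boom]"
	parsed, ok := parseEscalationMessage(msg)
	fmt.Println(ok, parsed.Preamble, parsed.RuleLines, parsed.ResolutionErrors)
}
```

Matching on the full markers makes no assumption about the user being a service account, which was the fragility raised above, though it remains string parsing and would break if the upstream wording changes.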
Pass in the clusterextension to PreAuthorize instead of the user.Info since we need the extension to create the clusterextension/finalizer
Signed-off-by: Tayler Geiger <[email protected]>
Description
This is a successor PR to #1716 and is primarily the contributions of @trgeiger and @joelanford.
The goal and title remain the same. The approach is slightly modified:
- Pulls in RBAC authorization code from k8s.io/kubernetes and uses that code to check GET and other verb permissions as a prelude to, and in response to, a Helm dry-run.
- To pull in the RBAC auth code concisely, repeatably, and with warnings if the used code changes, we add a maintenance utility that adds the needed replace directives for all related staging modules (e.g., k8s.io/api, k8s.io/apimachinery, etc.), which are automatically pinned to the corresponding published version.
- All this code is initially called in internal/operator-controller/applier/helm.go
Reviewer Checklist