
CFE-1162: Updates enhancement to reflect hcp usecase and with latest information on aws tags support #1700

Open
wants to merge 4 commits into base: master

Conversation

TrilokGeer
Contributor

This PR updates the enhancement to reflect the HCP use case and the latest information on AWS tags support.

@openshift-ci openshift-ci bot requested review from hasbro17 and jerpeter1 October 17, 2024 16:05
Contributor

openshift-ci bot commented Oct 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from trilokgeer. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TrilokGeer TrilokGeer changed the title Updates enhancement to reflect hcp usecase and with latest information on aws tags support [CFE-1162] Updates enhancement to reflect hcp usecase and with latest information on aws tags support Oct 18, 2024
@TrilokGeer TrilokGeer changed the title [CFE-1162] Updates enhancement to reflect hcp usecase and with latest information on aws tags support CFE-1162: Updates enhancement to reflect hcp usecase and with latest information on aws tags support Oct 18, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 18, 2024
@openshift-ci-robot

openshift-ci-robot commented Oct 18, 2024

@TrilokGeer: This pull request references CFE-1162 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR updates the enhancement to reflect the HCP use case and the latest information on AWS tags support.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


A new field `propagateUserTags` is added in the GA release version. The `experimentalPropagateUserTags` field will be deprecated and removed in a future release version.
When both fields are set, `experimentalPropagateUserTags` takes precedence.
If the `userTags` field is changed post-install, there is no guarantee about how an in-cluster operator will respond to the change. Some operators may reconcile the change and update tags on the AWS resource; some operators may ignore the change. However, if tags are removed from `userTags`, the tags will not be removed from the AWS resource.
Contributor

You moved this paragraph, but did you mean to remove it entirely (or perhaps keep it for historical reference, but mark it as outdated)? We've run into an obstacle with cluster-ingress-operator and cloud-provider-aws (see openshift/cluster-ingress-operator#1148 (comment)), so I want to understand whether (1) handling updates and (2) handling removals in particular are hard requirements or not.

Member

In contrast to cluster-ingress-operator and cloud-provider-aws, I checked the behaviour with aws-load-balancer-operator#137, and it can update and remove the tags.

oc get infrastructures.config.openshift.io cluster -oyaml | yq .status.platformStatus

aws:
  region: us-east-2
  resourceTags:
    - key: conflict-key1
      value: plat-value2
    - key: conflict-key2
      value: plat-value3
type: AWS

aws elbv2 describe-tags --resource-arns arn:aws:elasticloadbalancing:us-east-*******

{
    "TagDescriptions": [
        {
            "ResourceArn": "arn:aws:elasticloadbalancing:us-east-*******",
            "Tags": [
                {
                    "Key": "service.k8s.aws/resource",
                    "Value": "LoadBalancer"
                },
                {
                    "Key": "service.k8s.aws/stack",
                    "Value": "aws-load-balancer-test-default-lb-class/echoserver"
                },
                {
                    "Key": "conflict-key2",
                    "Value": "plat-value3"
                },
                {
                    "Key": "conflict-key1",
                    "Value": "plat-value2"
                },
                {
                    "Key": "elbv2.k8s.aws/cluster",
                    "Value": "ckyal-20241021-2e71f8-pmfc8"
                }
            ]
        }
    ]
}

Updated infrastructure status: removed conflict-key1 and updated conflict-key2

oc get infrastructures.config.openshift.io cluster -oyaml | yq .status.platformStatus
aws:
  region: us-east-2
  resourceTags:
    - key: conflict-key2
      value: new-plat-value3
type: AWS

aws elbv2 describe-tags --resource-arns arn:aws:elasticloadbalancing:us-east-*******

{
    "TagDescriptions": [
        {
            "ResourceArn": "arn:aws:elasticloadbalancing:us-east-*******",
            "Tags": [
                {
                    "Key": "service.k8s.aws/resource",
                    "Value": "LoadBalancer"
                },
                {
                    "Key": "service.k8s.aws/stack",
                    "Value": "aws-load-balancer-test-default-lb-class/echoserver"
                },
                {
                    "Key": "conflict-key2",
                    "Value": "new-plat-value3"
                },
                {
                    "Key": "elbv2.k8s.aws/cluster",
                    "Value": "ckyal-20241021-2e71f8-pmfc8"
                }
            ]
        }
    ]
}

So, there is a bit of inconsistency between upstream cloud-provider-aws and aws-load-balancer-controller in the way they handle tags.

From the OpenShift side (cluster-ingress-operator and aws-load-balancer-operator), I think we can only control

  • the aws-load-balancer-additional-resource-tags annotation for cluster-ingress-operator, and
  • the --default-tags container arg for aws-load-balancer-operator

to propagate relevant tags to their consumers, which in this case are cloud-provider-aws and aws-load-balancer-controller, respectively (see the sketch below). If we need to standardise the behaviour, then we might need to change the upstream logic.
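
For illustration, a minimal sketch of the two propagation points mentioned above. The annotation key and the controller flag are real; the Service shown and the tag values are only examples borrowed from the outputs above, not taken from the enhancement.

# cluster-ingress-operator -> cloud-provider-aws:
# tags are passed via an annotation on the LoadBalancer-type Service it manages.
apiVersion: v1
kind: Service
metadata:
  name: router-default
  namespace: openshift-ingress
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "conflict-key1=plat-value2,conflict-key2=plat-value3"
spec:
  type: LoadBalancer

# aws-load-balancer-operator -> aws-load-balancer-controller:
# tags are passed via a container argument on the controller deployment.
args:
- --default-tags=conflict-key1=plat-value2,conflict-key2=plat-value3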

Contributor Author
@TrilokGeer TrilokGeer Oct 21, 2024

You moved this paragraph, but did you mean to remove it entirely (or perhaps keep it for historical reference, but mark it as outdated)? We've run into an obstacle with cluster-ingress-operator and cloud-provider-aws (see openshift/cluster-ingress-operator#1148 (comment)), so I want to understand whether (1) handling updates and (2) handling removals in particular are hard requirements or not.

Ah, I think it is a residue of the initial version, thanks @Miciah. In summary, all Red Hat supported operators (in-cluster or day-2) will reconcile the tags on resources created and managed by them. Removal of tags is not supported, as it could unintentionally remove tags that were added externally by the user, if any.

Member
@chiragkyal chiragkyal Oct 22, 2024

Removal of tags is not supported, as it could unintentionally remove tags that were added externally by the user, if any.

aws-load-balancer-controller provides a command-line flag called --external-managed-tags through which a list of tag keys can be excluded from reconciliation by the controller.

xRef :

external-managed-tags  stringList  AWS Tag keys that will be managed externally. Specified Tags are ignored during reconciliation

This means that if a user wants to manage a tag externally, it has to be added to the --external-managed-tags list. If not, aws-load-balancer-controller will remove unsolicited tags and apply the tags defined in --default-tags. A sketch of how the two flags interact follows below.
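
As a hedged sketch (values are illustrative, not from this PR), the interaction of the two flags on the controller would look roughly like this:

# Tags listed in --default-tags are applied and kept in sync by the controller;
# tag keys listed in --external-managed-tags are skipped during reconciliation,
# so externally added tags with those keys are preserved instead of removed.
containers:
- name: controller
  args:
  - --default-tags=conflict-key2=plat-value3
  - --external-managed-tags=owner,cost-center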

@alebedev87 As far as I can see, aws-load-balancer-operator doesn't provide a field to specify ExternalManagedTags. Was there any discussion around this, or is it an existing gap?

Comment on lines +97 to +98
For the resources created and managed by the hosted control plane, the cluster API provider for AWS reconciles the user tags on AWS resources. The hosted control plane updates the `infrastructure.config.openshift.io` resource to reflect new tags in `resourceTags`. The OpenShift operators, both core and non-core (managed by Red Hat), reconcile the respective AWS resources created and managed by them.
Given that there is no universal controller to update all resources created by OpenShift, day-2 updates of tags are not supported for standalone OpenShift deployments.
Contributor

What does "not supported for standalone OpenShift deployments" mean? What will happen if a cluster admin changes resourceTags in the infrastructure resource? Line 95 ("If the userTags field is changed post-install,") says it's going to be applied to the cloud objects.

Contributor Author

At present, in standalone OpenShift deployments, resourceTags in the infrastructure resource is treated as immutable, and updating it is not a supported operation. Updates to userTags are allowed for HCP use cases, where the HCP control plane updates the infrastructure status based on edits to the HostedCluster and NodePool objects (see the sketch below).
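
For context, a minimal HostedCluster sketch of where such an edit would originate in the HCP flow; the field layout follows my reading of the HyperShift AWS platform API, and all values are invented:

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  platform:
    type: AWS
    aws:
      region: us-east-2
      # Tags edited here are expected to be rolled down by the hosted
      # control plane into the guest cluster's infrastructure status
      # as resourceTags.
      resourceTags:
      - key: conflict-key2
        value: new-plat-value3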

Contributor
@jsafrane jsafrane Nov 19, 2024

resourceTags in the infrastructure resource is treated as immutable, and updating it is not a supported operation

This is not true; status.platformStatus.aws.resourceTags is editable on a standalone cluster.
Something will happen on the standalone cluster. I think with the current PR, the volumes just get re-tagged, just as in HCP.

Contributor Author

Ah yes, it is definitely a bug. The feature was implemented before kubectl supported editing subresources.
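
For reference, with an oc/kubectl version that supports --subresource, the status edit discussed here can be reproduced roughly like this (a sketch; the key and value are taken from the example output above):

oc patch infrastructures.config.openshift.io cluster \
  --subresource=status --type=merge \
  -p '{"status":{"platformStatus":{"aws":{"resourceTags":[{"key":"conflict-key2","value":"new-plat-value3"}]}}}}'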

Contributor Author

The controllers should not be enabled to reconcile tags for standalone clusters. Even in the event of an update in a standalone cluster, the change must not be reflected on the cloud provider resources.

Contributor

I do not agree here. I think there should be as little difference as possible between a standalone cluster and a HyperShift-managed one, especially when it comes to consuming OpenShift's own APIs such as Infrastructure. Having different behavior in HyperShift-managed clusters will create an inconsistent experience for users.

In addition, for example, the cluster registry operator already syncs tags on existing S3 buckets when a user changes infrastructure status.platformStatus.aws.resourceTags in a standalone cluster, and I think all observers of this field should have the same behavior.

@JoelSpeed @openshift/openshift-staff-engineers

Contributor Author

Well, standalone OCP cannot be considered because the solution would be incomplete there. The completeness comes with the HyperShift cluster spec, which reconciles tags for VPCs, subnets, etc. The day-2 tags workflow being considered here is for SD+ROSA+HCP deployments. I am happy to consider better alternatives other than adding differentiation in operators along with an additional annotation (which exists on infrastructure) that identifies the clusters as HyperShift-deployed.

Contributor

Based on discussion on Slack, let's consider a user updating the infrastructure status as unsupported on standalone OCP, so the statement in the enhancement is OK.
Still, I'd prefer all implementations to do the same: if the registry operator syncs tags on a status update, storage should too. It will greatly simplify our testing.

Contributor

In addition, for example, the cluster registry operator already syncs tags on existing S3 buckets when a user changes infrastructure status.platformStatus.aws.resourceTags in a standalone cluster, and I think all observers of this field should have the same behavior.

The Machine API providers behave in the same way.

We previously didn't want tag updating because of the inconsistency between some resources being updated and some not being updated.

In the case of HyperShift, we have an opportunity to make the tagging consistent across all resources. However, it was my understanding that a lot of resources for a HyperShift cluster were pre-provisioned and handed to the cluster, and it's not actually the Cluster API controllers that provision these resources. Will CAPI update tags on resources it didn't create?

@chiragkyal
Member

/cc @alebedev87

Contributor

openshift-ci bot commented Nov 6, 2024

@TrilokGeer: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

### Existing design details

New `experimentalPropagateUserTags` field added to `.platform.aws` of install config to indicate that the user tags should be applied to AWS
New `propagateUserTags` field added to `.platform.aws` of install config to indicate that the user tags should be applied to AWS
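
For illustration, a hedged install-config sketch of how the new field sits next to userTags; the values are invented and the exact shape should be checked against the installer documentation:

apiVersion: v1
metadata:
  name: example
platform:
  aws:
    region: us-east-2
    # GA field; experimentalPropagateUserTags is the deprecated equivalent
    propagateUserTags: true
    userTags:
      conflict-key1: plat-value2
      conflict-key2: plat-value3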
Contributor
@jsafrane jsafrane Dec 10, 2024

Where is the field stored after installation? In other words, how does a CSI driver operator know whether it should sync tags or not? The operator does not read the install-config.

Contributor Author

The field is applied from the install-config while generating the infrastructure resource. No explicit control fields are proposed. Hence, it is enabled by default.

Contributor

Ack.
I understood that the propagateUserTags flag enables the propagation as a whole, but it just copies the tags from the install-config to the initial infrastructure resource. I thought that the installer already does that, but I may be wrong here.

Contributor Author

Correct, the change here is about deprecating the experimentalPropagateUserTags field. The field applies to the installer config.

If the `userTags` field is changed post-install, there is no guarantee about how an in-cluster operator will respond to the change. Some operators may reconcile the change and update tags on the AWS resource; some operators may ignore the change. However, if tags are removed from `userTags`, the tags will not be removed from the AWS resource.

For the resources created and managed by the hosted control plane, the cluster API provider for AWS reconciles the user tags on AWS resources. The hosted control plane updates the `infrastructure.config.openshift.io` resource to reflect new tags in `resourceTags`. The OpenShift operators, both core and non-core (managed by Red Hat), reconcile the respective AWS resources created and managed by them.
Contributor

Where do the CAPI controllers actually source the tags from? The hosted control plane resource will copy them into the AWSCluster, which is within the control plane, not the guest cluster. So why do we need to adjust the infrastructure resource at all? Are there things beyond what the AWS CAPI provider manages that also support updating the tags?

