
Worker Nodes using Access Entry are Not Joining Cluster when IAM role used by Access Entry is recreated #3329


Closed
kamalverma1 opened this issue Mar 21, 2025 · 1 comment

kamalverma1 commented Mar 21, 2025

Is your request related to a problem? Please describe.

I ran into an issue where worker nodes were unable to join the cluster, and it has happened a couple of times. I create and manage a nodegroup with Terraform, and when the nodegroup was created, the associated IAM role and the Access Entry used by the nodegroup were created as well. However, when we remove a nodegroup and its IAM role, the Access Entry associated with the nodegroup is not removed or recreated; it stays as it was when initially created.

This caused a problem. I learned that the Access Entry is tied to some metadata of the IAM role, and it becomes non-functional when that IAM role is recreated with the same name (which usually happens when the nodegroup is recreated).

There is currently no way to find out whether an Access Entry is functional other than verifying that it was created or recreated after the IAM role. In our use case, the nodegroup and its IAM role are closely associated: if a nodegroup is recreated, the IAM role is recreated too. So we always run into cases where the Access Entry is non-functional even though it appears to be attached to the correct (recreated) IAM role with valid attached policies. This makes it hard to debug by just looking at the Access Entry configuration, the associated IAM role, and its attached policies.
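The manual check described above (was the Access Entry created after the IAM role?) could be scripted roughly as follows. This is only a sketch: the function names `entry_predates_role` and `check_access_entry` are made up for illustration, and it assumes that comparing the `createdAt` field from the boto3 EKS `describe_access_entry` response with the `CreateDate` field from the IAM `get_role` response is a good enough signal. It is a heuristic, not an official validity check.

```python
from datetime import datetime, timezone


def entry_predates_role(entry_created_at: datetime, role_created_at: datetime) -> bool:
    """Return True when the IAM role is newer than the access entry.

    Heuristic (assumption): an entry older than the current role was most
    likely created for a previous role that had the same name.
    """
    return entry_created_at < role_created_at


def check_access_entry(eks_client, iam_client, cluster_name: str, role_name: str) -> bool:
    """Look up the role and its access entry and apply the heuristic above.

    eks_client / iam_client are expected to behave like boto3.client("eks")
    and boto3.client("iam"); they are passed in so the logic can be tested
    without AWS credentials.
    """
    role = iam_client.get_role(RoleName=role_name)["Role"]
    entry = eks_client.describe_access_entry(
        clusterName=cluster_name,
        principalArn=role["Arn"],
    )["accessEntry"]
    return entry_predates_role(entry["createdAt"], role["CreateDate"])


# Usage (requires boto3 and AWS credentials; the names are placeholders):
#   import boto3
#   stale = check_access_entry(
#       boto3.client("eks"), boto3.client("iam"), "my-cluster", "my-node-role"
#   )
```

A `True` result only means the entry is older than the role, which is the situation described in this issue; it does not prove that the entry is broken.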

This is very frustrating to debug, and it can take a few days to figure out what the real issue is. It took a long time even for AWS support to identify it. I am sure others have faced similar issues. The same can happen with an admin Access Entry whose IAM role is recreated with the same name: the user can lose access to the cluster and may find it even harder to debug.

Describe the solution you'd like.

The possible solutions are:

  1. Add a dependency so that when an Access Entry is associated with an IAM role, deletion of the IAM role is blocked until the Access Entry is deleted.
  2. Provide a way to test/verify whether the Access Entry is valid.
  3. Automatically re-associate the Access Entry with the IAM role if the role is recreated with the same name. It should not depend on the metadata of the old IAM role with the same name.
  4. Document this behavior clearly in the Access Entry docs and the troubleshooting docs so users can avoid such issues.
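Until something like option 3 exists, one workaround would be to delete and recreate the Access Entry after the IAM role has been recreated. The sketch below assumes a boto3-style EKS client and a self-managed node role. The helper name `recreate_access_entry` and the default type `EC2_LINUX` are assumptions for illustration. Deleting an entry also drops any access policies associated with it, so those would have to be re-associated separately.

```python
def recreate_access_entry(eks_client, cluster_name: str, principal_arn: str,
                          entry_type: str = "EC2_LINUX") -> dict:
    """Delete the access entry for principal_arn (if any) and create a fresh one.

    eks_client is expected to behave like boto3.client("eks"). A fresh entry
    is created against the role that currently exists under that ARN.
    """
    try:
        eks_client.delete_access_entry(
            clusterName=cluster_name,
            principalArn=principal_arn,
        )
    except eks_client.exceptions.ResourceNotFoundException:
        # No existing entry for this principal; nothing to delete.
        pass

    response = eks_client.create_access_entry(
        clusterName=cluster_name,
        principalArn=principal_arn,
        type=entry_type,
    )
    return response["accessEntry"]


# Usage (requires boto3 and AWS credentials; the names are placeholders):
#   import boto3
#   recreate_access_entry(
#       boto3.client("eks"),
#       "my-cluster",
#       "arn:aws:iam::111122223333:role/my-node-role",
#   )
```

For entries that EKS creates and manages itself, recreating them by hand may not be appropriate, so treat this as a starting point for self-managed roles only.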

Describe alternatives you've considered.

We did not face any issues while using aws-auth, but the way Access Entry works with IAM roles is a real problem.

Additional context

I would like Access Entries to be managed in an easier and more transparent way, so that they do not break functionality and cost users a lot of valuable time debugging such issues.

@bryantbiggs
Member

EKS managed node group, Fargate, and Auto Mode are all responsible for creating/managing the access entry for the role used - not this module
