Performing an update w/o the cluster stopped messed up the CloudFormation stack and now I can't do any mods. #271

Open
gwolski opened this issue Oct 27, 2024 · 1 comment

gwolski commented Oct 27, 2024

I was trying to update the cluster, so I added ClusterConfig/SlurmSettings/ScaledownIdletime to my cluster config file:

  ParallelClusterConfig:
    Version: 3.11.1
    Architecture: x86_64
    Image:
      Os: rocky8
      CustomAmi: ami-0d68c6538XXXXXXX 
    DisableSimultaneousMultithreading: true
    ClusterConfig:
      SlurmSettings:
        ScaledownIdletime: 20

I had noticed earlier that when I reduced some node counts, you stopped the cluster first. I assumed you would do the same here, but apparently not.
The cluster was not stopped, the update failed, and the stack ended up in UPDATE_ROLLBACK_FAILED.
I then stopped my cluster and tried the update again, but the update can't run because the cluster is in a bad state. How does one get out of a bad state to continue using the aws-eda-slurm-cluster install.sh?

That said, I've since discovered that my syntax above is not valid: the ${cluster}-config gets created, but the actual cluster stack doesn't. Please document how to add things like SlurmSettings better; I just can't figure it out. I will file another issue for that.

@cartalla
Contributor

Usually what you'll need to do is resume the rollback. When you do that, you'll get an option to skip the failed resource. I'm assuming that the failure came from a custom resource like the ParallelCluster or UpdateHeadNode resource. Expand the section for the failed resources, select the one that failed, and then continue the rollback. That should get you to a rollback complete state.
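
For anyone who prefers the command line to the console, the same steps can be sketched with the AWS CLI's `continue-update-rollback` command. This is only a rough sketch: the stack name `my-cluster` and the logical resource ID `ParallelCluster` are placeholders I made up for illustration, so check the stack's events for the real logical ID of the resource that failed. As far as I know, only resources left in `UPDATE_FAILED` by the failed rollback can be skipped. The helper below just builds and prints the command so you can review it before running it yourself.

```python
import shlex
from typing import Iterable, List


def continue_rollback_command(stack_name: str, skip: Iterable[str] = ()) -> List[str]:
    """Build the AWS CLI argument list that resumes a failed update rollback.

    stack_name: name of the stack stuck in UPDATE_ROLLBACK_FAILED.
    skip: logical IDs of the failed resources to skip (may be empty).
    """
    cmd = [
        "aws", "cloudformation", "continue-update-rollback",
        "--stack-name", stack_name,
    ]
    skip_ids = list(skip)
    if skip_ids:
        # --resources-to-skip takes one or more space-separated logical IDs.
        cmd += ["--resources-to-skip", *skip_ids]
    return cmd


if __name__ == "__main__":
    # Hypothetical names: replace with your own stack name and the logical ID
    # shown for the failed resource in the stack's events.
    command = continue_rollback_command("my-cluster", ["ParallelCluster"])
    # Print the command for review instead of executing it.
    print(shlex.join(command))
```

Once the stack reports a rollback complete state, it should accept updates again. Skipping a resource tells CloudFormation to treat it as rolled back, so its real state may no longer match the template.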
