Spark operator blueprint fails to deploy #740

Open
1 task done
alanty opened this issue Feb 6, 2025 · 2 comments
@alanty
Contributor

alanty commented Feb 6, 2025

Description

When deploying the spark-operator blueprint, terraform fails with the error below when creating the spark-operator helm chart:

execution error at (spark-operator/templates/prometheus/podmonitor.yaml:22:4): The cluster does not support the required API version `monitoring.coreos.com/v1` for `PodMonitor`.
with module.eks_data_addons.helm_release.spark_operator[0],
on .terraform/modules/eks_data_addons/spark-operator.tf line 7, in resource "helm_release" "spark_operator":
7: resource "helm_release" "spark_operator"
  • ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following for Terraform examples:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is hopefully a best practice you are already following!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: v1.0.3

  • Terraform version: v1.10.5

  • Provider version(s):

provider registry.terraform.io/alekc/kubectl v2.1.3
provider registry.terraform.io/hashicorp/aws v5.85.0
provider registry.terraform.io/hashicorp/cloudinit v2.3.5
provider registry.terraform.io/hashicorp/helm v2.17.0
provider registry.terraform.io/hashicorp/kubernetes v2.35.1
provider registry.terraform.io/hashicorp/null v3.2.3
provider registry.terraform.io/hashicorp/random v3.6.3
provider registry.terraform.io/hashicorp/time v0.12.1
provider registry.terraform.io/hashicorp/tls v4.0.6

Reproduction Code [Required]

Steps to reproduce the behavior:

  1. git clone the DoEKS repo
  2. cd to analytics/terraform/spark-k8s-operator
  3. execute the install.sh script

Expected behavior

the install completes successfully

Actual behavior

it fails with the PodMonitor API version error shown in the description :(

Additional context

Looks like this started after #737

@alanty alanty self-assigned this Feb 6, 2025
@alanty alanty added the bug Something isn't working label Feb 6, 2025
@alanty
Contributor Author

alanty commented Feb 6, 2025

This one is my bad, sorry for the headache.

This is a race between the kube-prometheus-stack helm chart, which creates the PodMonitor CRD, and the spark-operator helm chart.
When I enabled the metrics on the spark operator (#737) my stack was already deployed, so the CRD already existed. On a fresh create, the spark-operator chart appears to try to create the pod monitor before kube-prometheus-stack has rolled out.

For a quick fix I've got a PR (#742) to comment out and disable the Pod Monitor again.
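For reference, a rough sketch of what disabling the pod monitor could look like when the values are passed through the module. This is not the exact change in the PR. It assumes the module input is named `spark_operator_helm_config` and that the chart key is `prometheus.podMonitor.create`, so check both against the module and chart versions in use:

```hcl
module "eks_data_addons" {
  # source, version and the other existing arguments are omitted in this sketch

  enable_spark_operator = true

  # Assumed input name: extra helm settings for the spark-operator release
  spark_operator_helm_config = {
    values = [
      <<-EOT
        prometheus:
          podMonitor:
            # Assumed chart key. Leave this false until the PodMonitor CRD
            # (installed by kube-prometheus-stack) exists in the cluster.
            create: false
      EOT
    ]
  }
}
```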

@alanty
Contributor Author

alanty commented Feb 6, 2025

I looked into adding a depends_on between the helm resources so that kube-prometheus-stack deploys before the spark-operator, but I don't think there is an easy way to do it at the helm chart level.

The helm charts are deployed by eks_data_addons for spark-operator and by eks_blueprints_addons for kube-prometheus-stack. We can add a dependency between those modules to fix this and keep the pod monitor creation, but I'm sure that will cause problems for us later at some point.
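A minimal sketch of that module-level dependency, assuming both modules are declared in the same root module under the names used above (the other arguments are left out):

```hcl
module "eks_data_addons" {
  # source, version and the other existing arguments are omitted in this sketch

  # Wait for everything in eks_blueprints_addons, including the
  # kube-prometheus-stack release that installs the PodMonitor CRD,
  # before creating anything in this module.
  depends_on = [module.eks_blueprints_addons]
}
```

The catch is that a module-level depends_on is coarse: every resource in eks_data_addons then waits on every resource in eks_blueprints_addons, and data sources inside the dependent module can end up deferred until apply.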

Maybe it's easier to keep that in the values file, but with it commented out in case someone wants those metrics?
