Spark operator blueprint fails to deploy #740

alanty · 2025-02-06T14:15:57Z

Description

When deploying the spark-operator blueprint, Terraform fails with the following error while creating the spark-operator Helm release:

execution error at (spark-operator/templates/prometheus/podmonitor.yaml:22:4): The cluster does not support the required API version `monitoring.coreos.com/v1` for `PodMonitor`.
with module.eks_data_addons.helm_release.spark_operator[0],
on .terraform/modules/eks_data_addons/spark-operator.tf line 7, in resource "helm_release" "spark_operator":
7: resource "helm_release" "spark_operator"

✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following for Terraform examples:

Remove the local .terraform directory (ONLY if state is stored remotely, which is hopefully the case, as that is best practice!): rm -rf .terraform/
Re-initialize the project root to pull down modules: terraform init
Re-attempt your terraform plan or apply and check if the issue still persists

Versions

Module version [Required]: v1.0.3
Terraform version: v1.10.5

Provider version(s):

provider registry.terraform.io/alekc/kubectl v2.1.3
provider registry.terraform.io/hashicorp/aws v5.85.0
provider registry.terraform.io/hashicorp/cloudinit v2.3.5
provider registry.terraform.io/hashicorp/helm v2.17.0
provider registry.terraform.io/hashicorp/kubernetes v2.35.1
provider registry.terraform.io/hashicorp/null v3.2.3
provider registry.terraform.io/hashicorp/random v3.6.3
provider registry.terraform.io/hashicorp/time v0.12.1
provider registry.terraform.io/hashicorp/tls v4.0.6

Reproduction Code [Required]

Steps to reproduce the behavior:

Clone the DoEKS repo with git
cd into analytics/terraform/spark-k8s-operator
Run the install.sh script

Expected behavior

The install completes successfully.

Actual behavior

The install fails with the PodMonitor error shown above. :(

Additional context

Looks like this started after #737


alanty · 2025-02-06T14:35:19Z

This one is my bad, sorry for the headache.

This is a race between kube-prometheus-stack, which creates the PodMonitor CRD, and the spark-operator Helm chart.
When I enabled metrics on the spark operator (#737), my stack was already deployed. On a fresh create, however, the spark-operator chart tries to create the PodMonitor before kube-prometheus-stack has rolled out and installed the CRD, so the chart's API version check fails.

As a quick fix, I've opened a PR (#742) that comments out and disables the PodMonitor again.

alanty · 2025-02-06T14:39:45Z

I looked into adding a depends_on between the Helm resources, but I don't think there is an easy way to do it at the Helm chart level, i.e. to make kube-prometheus-stack deploy before spark-operator.

The Helm charts are deployed by different modules: eks_data_addons for spark-operator and eks_blueprints_addons for kube-prometheus-stack. We could add a dependency between those modules to fix this and keep the PodMonitor creation, but I suspect that will cause problems for us at some point.
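A minimal sketch of the module-level dependency described above, assuming the blueprint's root module calls the two modules as `module.eks_data_addons` and `module.eks_blueprints_addons` (the names from the error output and this comment) and that kube-prometheus-stack is enabled in the latter. The registry `source` shown is an assumption; keep whatever source and version the blueprint already uses.

```hcl
module "eks_data_addons" {
  # Assumed source; reuse the blueprint's existing source/version.
  source = "aws-ia/eks-data-addons/aws"

  # ... existing arguments unchanged ...

  # Module-level depends_on: no resource in this module (including the
  # spark-operator helm_release) is created until every resource in
  # eks_blueprints_addons, including the kube-prometheus-stack release
  # that installs the PodMonitor CRD, has been applied.
  depends_on = [module.eks_blueprints_addons]
}
```

This is coarse by design: everything in eks_data_addons waits on all of eks_blueprints_addons, not just kube-prometheus-stack. Terraform's documentation recommends depends_on only as a last resort because it can produce more conservative plans, which is one concrete way this route could cause the problems mentioned above.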

Maybe it's easier to keep the PodMonitor setting in the values file, but commented out, in case someone wants those metrics?
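A sketch of the values-file approach, assuming the spark-operator values are passed inline through `spark_operator_helm_config` on the eks_data_addons module and that the chart exposes the toggle as `prometheus.podMonitor.create` (consistent with the failing template path `templates/prometheus/podmonitor.yaml`). Both names are assumptions to verify against the blueprint and chart version in use. Rather than commenting the block out, this sketch sets `create: false` explicitly so the outcome does not depend on the chart's default.

```hcl
module "eks_data_addons" {
  # Assumed source; reuse the blueprint's existing source/version.
  source = "aws-ia/eks-data-addons/aws"

  # ... existing arguments unchanged ...

  enable_spark_operator = true
  spark_operator_helm_config = {
    values = [
      <<-EOT
        prometheus:
          metrics:
            # Operator still exposes its metrics endpoint.
            enable: true
          podMonitor:
            # Keep false on fresh clusters: the chart refuses to render a
            # PodMonitor unless monitoring.coreos.com/v1 is already served.
            # Flip to true once kube-prometheus-stack has installed the CRD.
            create: false
      EOT
    ]
  }
}
```

If the blueprint already passes other spark-operator values, merge these keys into that existing block rather than replacing it, since supplying `values` may override what the blueprint sets today.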

alanty self-assigned this Feb 6, 2025

alanty added the bug label Feb 6, 2025

alanty mentioned this issue Feb 6, 2025

fix: Disable spark-operator podmonitor #742

Merged
