Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate zookeeper-operator #600

Open
janhoy opened this issue Aug 15, 2023 · 7 comments
Open

Deprecate zookeeper-operator #600

janhoy opened this issue Aug 15, 2023 · 7 comments
Labels
zookeeper Related to Zookeeper or the Zookeeper Operator

Comments

@janhoy
Copy link
Contributor

janhoy commented Aug 15, 2023

Ref discussions in #517 (comment)

The ZK operator is a project that is not well maintained (74 open issues and no response to PRs). They lag in releases (on zk 3.7.1 while latest is 3.9.0) and you cannot even upgrade Zookeeper by referencing another ZK docker image, since the zk-operator require their own custom image with some added tools. This will be a problem in the future unless pravega ups their game one this.

Other issues with the architecture:

  • it is a bit awkward to have the Solr operator add a CRD for the Zk operator to pick up. Hard to debug issues.
  • If you try to delete a solr cluster in e.g. ArgoCD, it will try to delete leaf resources first, not being aware that they are managed by ZK-operator, so ZK-operator will re-create them in an endless loop
  • Upgrading Solr operator requires manual upgrading of zk-operator CRD in the right order..
  • ZK-operator POD runs as root and there is no way to patch the pod-spec through Solr-operator :(

So let's consider declaring the zk-operator support deprecated. We can then either recommend users to use bitnami ZK chart separately and copy the ZK address. Or if possible, a tighter integration where you can just provide a namespace+name and let solr-operator figure out the connection string.

@janhoy janhoy added the zookeeper Related to Zookeeper or the Zookeeper Operator label Aug 15, 2023
@madrob
Copy link
Contributor

madrob commented Aug 15, 2023

Do we have input from the zookeeper community about their preferred or recommended approach?

@janhoy
Copy link
Contributor Author

janhoy commented Aug 16, 2023

Do we have input from the zookeeper community about their preferred or recommended approach?

Who/where do you want to ask? The Zookeeper project does not do any Docker stuff afaik. A volunteer publishes the official zookeeper Docker image on hub. There are various ZK helm charts out there, of which bitnami seems to be the best maintained one.

One option would of course be for Solr Operator to also take on managing the ZK cluster by crafting k8s specs for the sts, pod, svc, pdb, svcmonitor, cert etc resources needed for each Solr cluster. But I know @HoustonPutman is not very keen on that idea as the maintenance burden is too big. And I agree.

@madrob
Copy link
Contributor

madrob commented Aug 16, 2023

Who/where do you want to ask? The Zookeeper project does not do any Docker stuff afaik.

We would ask on their dev list. Maybe we need to ask them to reconsider their stance on no container development, I don't know the history behind that decision. Maybe their PMC is not aware of how bad the situation is in the community, and as part of the community of users we would be bringing new context to them. I don't know that I trust any of the third party volunteer supported tooling to exist and be up to date long term.

Of course, if their response is to tell us to go pound sand, then we are back in the same place we are now, but maybe there's a chance we get a better outcome.

@cdmikechen
Copy link

@janhoy
I use https://github.com/strimzi/strimzi-kafka-operator to manage kafka cluster in k8s.
Strimzi has built a unified image that includes both zookeeper and kafka. The user only needs to define a kafka resource, and the operator will create zookeeper first, and then kafka after zookeeper is running properly. I think solr-operator can also follow strimzi's idea and build zookeeper and solr in an image.

By the way, it seems that apache pulsar's current image is also an unified image (with bookkeeper).

@janhoy
Copy link
Contributor Author

janhoy commented Sep 28, 2023

@cdmikechen What your're proposing is effectively for Solr Operator project to take on the whole responsibility of zookeeper-operator, and that is likely not something the project is willing or ready to do.

That's why I opened this issue. In the future there is hope for the core Solr project to take in a new zookeeper node-role in which case it makes more sense for solr-operator to manage those roles natively.

@bergner
Copy link

bergner commented Feb 23, 2024

I ran into the issue of having ArgoCD get into a loop when trying to delete an ArgoCD application that used the Solr helm chart as one of its components, and the only workaround was to manually force delete the SolrCloud CR. ArgoCD has an annotation that can be set on resources ArgoCD should ignore (i.e. resources managed by something else):

argocd.argoproj.io/compare-options: IgnoreExtraneous

Looking at the helm chart configuration options at https://github.com/apache/solr-operator/tree/main/helm/solr I tried setting all the exposed "annotations" options accordingly in the values.yaml file used when deploying the Solr helm chart but this was still not sufficient.

image:
  tag: 8.11
replicas: 1
solrOptions:
  logLevel: "DEBUG"

fullnameOverride: solr

zk:
  provided:
    replicas: 1
    persistence:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    adminServerService:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    clientService:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    headlessService:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    zookeeperPodPolicy:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
podOptions:
  serviceAccountName: solr-service-account
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
  podSecurityContext:
    runAsNonRoot: false
    seccompProfile:
      type: RuntimeDefault
  startupProbe:
    timeoutSeconds: 30
    periodSeconds: 10
statefulSetOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
commonServiceOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
headlessServiceOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
nodeServiceOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
ingressOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
configMapOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
dataStorage:
  type: persistent
  capacity: "20Gi"
  persistent:
    reclaimPolicy: "Delete"
    pvc:
      name: "solr-data"
      annotations:
        solrapp: "solr-data"
        argocd.argoproj.io/compare-options: IgnoreExtraneous
      labels:
        solrapp: "solr-data"

Looking at the resources underneath the SolrCloud CR I found the following that lacked the IgnoreExtraneous annotation:

  • ZookeeperCluster CR itself.
  • ConfigMap solr-solrcloud-zookeeper-config
  • StatefulSet solr-solrcloud-zookeper
  • PersistentVolumeClaim data-solr-solrcloud-zookeeper-0

Then there were a few other resources that also lacked it but I think these are ok without it (based on previous experience with ArgoCD and other operators):

  • PodDisruptionBudget (both for SolrCloud and ZK)
  • Endpoints and EndpointSlice resources both sit underneath Service resources and the Service does have the annotation

I also think the Bitnami Zookeeper chart would be a better choice here. I haven't used the Bitnami ZK chart myself but I have had a pretty good experience using some containers and other charts from Bitnami in the past and the Bitnami ZK chart has good options that seemingly makes it easy to set annotations on all resources:

commonAnnotations | Add annotations to all the deployed resources | {}

@janhoy
Copy link
Contributor Author

janhoy commented Feb 24, 2024

This issue would mainly be to add DEPRECATION warning to our documentation and also log files that the zk-operator option is deprecated and will be removed in the future. Is this a way we want to move?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
zookeeper Related to Zookeeper or the Zookeeper Operator
Projects
None yet
Development

No branches or pull requests

4 participants