Bug Report
This one is really odd, and might be somewhat related to the way bundle unpack Job names are generated from the hash value of the bundle (and namespaces?).
When I installed an operator into one namespace and then, 20 days later, attempted to install the same operator into a different namespace, the 2nd Subscription hung and was never reconciled.
What did you do?
On Apr 4th, installed operand-deployment-lifecycle-manager.v4.0.0 into namespace cp30test. All good there.
On Apr 24th, attempted to install the same package (same catalogsource, same channel, same packagename) but into namespace cp46test - a different namespace, but with a very similar name.
The Subscription in cp46test is hung - i.e. it is never fully reconciled, except for the status field being updated to report that all the catalog sources are healthy.
In the openshift-operator-lifecycle-manager namespace, the catalog-operator-8586f5974d-khh7g Pod logs error messages like:
E0424 20:12:47.382632 1 queueinformer_operator.go:319] sync "cp46test" failed: bundle unpacking failed with an error: jobs.batch "8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6" already exists
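For reference, the Job named in that error can be inspected directly; a hedged example invocation (cluster-admin access assumed):

```
oc get job 8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6 -n openshift-marketplace -o yaml
```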
Indeed, a Job named 8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6 exists in the openshift-marketplace namespace. BUT, that Job was created on Apr 4th and completed on Apr 4th (and today is Apr 24th).
So it seems that OLM fails to install a 2nd instance of the same operator when the bundle unpack Job names collide - and because the existing Job is already Completed, it is not re-generated and OLM is stuck.
Relevant code, including the ConfigMap which holds the bundle details:
https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go#L92
https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go#L665
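To make the suspected mechanism concrete: if the unpack Job name is derived purely from a hash of the bundle, and not from the installing namespace or the time of the request, then a second install of the same bundle always maps onto the name of the already-Completed Job from the first install. A minimal sketch of that kind of naming, assuming a content-only hash - illustrative only, not OLM's actual code, and the bundle reference below is made up:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// unpackJobName sketches a purely content-based naming scheme: the name
// depends only on the bundle reference, so two installs of the same bundle
// (even into different namespaces, weeks apart) yield the same Job name.
// Illustrative assumption - not OLM's actual implementation.
func unpackJobName(bundleRef string) string {
	sum := sha256.Sum256([]byte(bundleRef))
	// Truncate the hex digest to 63 characters so the name fits a DNS label;
	// this matches the length of the Job name seen in the error above.
	return fmt.Sprintf("%x", sum)[:63]
}

func main() {
	bundle := "example.io/odlm-bundle@sha256:abc123" // hypothetical bundle reference

	// First install (Apr 4th, namespace cp30test) and second install
	// (Apr 24th, namespace cp46test) produce the identical Job name:
	fmt.Println(unpackJobName(bundle))
	fmt.Println(unpackJobName(bundle))
}
```

If OLM then treats "jobs.batch ... already exists" as a hard error instead of reusing or re-creating the Completed Job, the second Subscription can never make progress, which matches what is observed above.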
Some more screenshots:
Attaching the relevant YAML resources and the generated OCP must-gather (it is 176MB, above the size limit; I can submit a requested subset of files, or you can ping me to get access to the whole package):
What did you expect to see?
I would expect the 2nd installation of the operator in a separate namespace to work just fine.
What did you see instead? Under which circumstances?
As described above, the installation hung.
Environment
Kubernetes version information:
Kubernetes cluster kind: OCP
k8s version: v1.27.11+ec42b99
Possible Solution
A mitigation is to manually remove the Job which completed on Apr 4th; the installation then proceeds.
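For completeness, the manual cleanup amounts to deleting the stale unpack Job named in the error message; a hedged sketch of the command (cluster-admin access assumed):

```
oc delete job 8d67f73b77c43214c1f31adf025bfc258a4b6d671a34f339926a897eb6d45c6 -n openshift-marketplace
```

After that, the installation of the 2nd Subscription proceeds, as noted above.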
Additional context
N/A