[Bug] Brokers keep on restarting when not able to make connection with Read-only global zk #23838

Open
2 of 3 tasks
Meet0861 opened this issue Jan 10, 2025 · 0 comments
Labels
type/bug The PR fixed a bug or issue reported a bug


Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Pulsar 3.0.7, which uses ZooKeeper 3.9.2.
The broker and the configstore are both on the same Pulsar version.

Minimal reproduce step

Consider a global ZooKeeper spanning two regions: R1 with 3 participants and 2 observers, and R2 with 2 participants and 2 observers. The leader is in R2.
In some situation, such as a network partition, the global ZooKeeper loses quorum and the R2 ZooKeeper servers go into read-only mode.
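
For reference, ZooKeeper's default majority quorum counts only participants; observers do not vote. The following is a minimal sketch of that arithmetic for the setup above, assuming a clean split between R1 and R2. The helper name has_quorum is hypothetical and is not a ZooKeeper API.

```python
from typing import Dict, Iterable


def has_quorum(voters_by_region: Dict[str, int], reachable: Iterable[str]) -> bool:
    """Return True if the reachable regions hold a strict majority of voters.

    Observers are deliberately left out: they do not vote in ZooKeeper.
    """
    total = sum(voters_by_region.values())
    alive = sum(voters_by_region[region] for region in set(reachable))
    return 2 * alive > total


# Participant counts from the setup above (assumption: observers excluded).
voters = {"R1": 3, "R2": 2}

for side in (["R1"], ["R2"], ["R1", "R2"]):
    state = "quorum" if has_quorum(voters, side) else "no quorum"
    print(side, state)
```

Under this assumption the R2 side (2 of 5 voters) is below the majority, which matches the R2 servers going into read-only mode. If fewer than 3 participants can reach each other anywhere, no side has quorum.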

What did you expect to see?

  • The cluster should not become unstable (brokers kept restarting) when it cannot establish a connection with the global ZooKeeper for any reason; it should handle this gracefully.
  • Get calls should work even when quorum is lost and the global ZooKeeper is operating in RO mode with local sessions enabled.
  • New produce/consume should also work as long as it does not make any write call to the global ZooKeeper internally.

What did you see instead?

Observations from our testing with the global ZooKeeper setup described above, while the global ZooKeeper is operating in read-only (RO) mode:

Pulsar v2.9.3 brokers and configstore v3.9.2 (Pulsar v3.0.7):
The zookeeperStoreAllowReadOnlyOperations flag is not set in the brokers. The cluster still stays stable and existing reads/writes work. A few admin get calls also work. However, on the configstore we still see exceptions like "refusing the connection from not RO clients".

Pulsar v3.0.7 brokers and configstore v3.9.2 (Pulsar v3.0.7):
The metadataStoreAllowReadOnlyOperations flag is not set in the brokers. When the global ZooKeeper loses quorum and is in RO mode, the brokers are not able to connect to the configstore and keep restarting.
If we enable metadataStoreAllowReadOnlyOperations in the broker and local sessions in the configstore, RO session establishment works and existing reads/writes also work, but any admin call, even get tenants, fails with the keeperSessionExpired exception.
This is because the broker sends a close session call to the configstore on any call made via the Pulsar client or Pulsar admin, so fetching the admin policies fails.
Once quorum is restored, the session upgrades automatically and everything works as expected.
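
To re-check the failing admin path while the configstore is in RO mode, a small probe against the broker's admin REST endpoint for listing tenants (GET /admin/v2/tenants) can help. This is only a minimal sketch: the helper names tenants_url and probe are hypothetical, the base URL is an assumption about the broker's web service address, and authentication is not handled.

```python
import json
import urllib.error
import urllib.request


def tenants_url(base_url: str) -> str:
    """Build the admin REST URL for listing tenants."""
    return base_url.rstrip("/") + "/admin/v2/tenants"


def probe(base_url: str, timeout: float = 10.0) -> str:
    """Call 'get tenants' once and describe the outcome as a string."""
    try:
        with urllib.request.urlopen(tenants_url(base_url), timeout=timeout) as resp:
            tenants = json.loads(resp.read().decode("utf-8"))
            return "ok: " + ", ".join(tenants)
    except urllib.error.HTTPError as err:
        # The broker answered, but the admin call itself failed.
        body = err.read().decode("utf-8", "replace")
        return "admin call failed: HTTP {} {}".format(err.code, body)
    except OSError as err:
        # Covers URLError, connection refused, and timeouts.
        return "broker unreachable: {}".format(err)
```

Calling probe("http://localhost:8080") (address assumed) before the quorum loss, during RO mode, and after quorum is restored gives three data points to compare against the behavior described above.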

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
Meet0861 added the type/bug label on Jan 10, 2025