
fix(consumer): add recovery from no leader partitions #3101

Merged: 3 commits merged into IBM:main on Feb 23, 2025

Conversation

liutao365
Contributor

When some topic partitions have no leader due to Kafka broker failures, the Sarama consumer group should be able to continue consuming partitions that have leaders and resume consuming the partitions that previously had no leader once they return to normal. This pull request addresses this issue.

Signed-off-by: liutao366 <[email protected]>
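
For context, here is a minimal sketch of the consumer-group loop this fix concerns. It is not part of the PR; the broker address, topic, group name, and Kafka version are assumptions for illustration. The point is that Consume is called in a loop, so once the affected partitions regain a leader the group can claim and consume them again.

```go
package main

import (
	"context"
	"log"

	"github.com/IBM/sarama"
)

// handler is a minimal sarama.ConsumerGroupHandler used only to illustrate
// the scenario: keep consuming partitions that still have a leader, and pick
// the leaderless ones back up once they recover.
type handler struct{}

func (handler) Setup(sarama.ConsumerGroupSession) error   { return nil }
func (handler) Cleanup(sarama.ConsumerGroupSession) error { return nil }
func (handler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		sess.MarkMessage(msg, "")
	}
	return nil
}

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_8_0_0                       // assumed broker version
	cfg.Consumer.Offsets.Initial = sarama.OffsetOldest  // start from the beginning of the topic

	group, err := sarama.NewConsumerGroup([]string{"localhost:9092"}, "example-group", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer group.Close()

	ctx := context.Background()
	for {
		// Consume returns when the session ends (e.g. on a rebalance); calling
		// it again lets the group re-claim its partitions, including ones that
		// had no leader while a broker was down.
		if err := group.Consume(ctx, []string{"example-topic"}, handler{}); err != nil {
			log.Printf("consume error: %v", err)
		}
		if ctx.Err() != nil {
			return
		}
	}
}
```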
Contributor

@puellanivis puellanivis left a comment


👍 nothing significant I can see that should be improved. While there is a suggestion here, I don’t think it’s necessarily any better than the existing approach.

@liutao365
Contributor Author

Add test case:

  1. Create a topic with more than 6 partitions and a replication factor of 1 (to simplify the test case) and produce a few million messages (a sketch of this step follows the list).
  2. Shut down one of the Kafka brokers, which causes some of the topic partitions to lose their leader.
  3. Run examples/consumergroup/main.go with an additional configuration (shown as a screenshot in the original comment).
  4. Restore the broker that was previously shut down.
  5. Describe the consumer group and check that none of the partitions have any consumer lag.
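
For reference, a sketch of step 1 using Sarama's admin and producer clients; the broker address, topic name, partition count, and message count are placeholders, not values taken from the original test run.

```go
package main

import (
	"fmt"
	"log"

	"github.com/IBM/sarama"
)

func main() {
	brokers := []string{"localhost:9092"} // placeholder broker address

	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required by SyncProducer

	// Create a topic with more than 6 partitions and a replication factor of 1,
	// so that losing a single broker leaves some partitions without a leader.
	admin, err := sarama.NewClusterAdmin(brokers, cfg)
	if err != nil {
		log.Fatal(err)
	}
	if err := admin.CreateTopic("no-leader-test", &sarama.TopicDetail{
		NumPartitions:     8,
		ReplicationFactor: 1,
	}, false); err != nil {
		log.Fatal(err)
	}
	_ = admin.Close()

	// Produce enough messages that consumption is still in progress when a
	// broker is shut down in step 2.
	producer, err := sarama.NewSyncProducer(brokers, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	for i := 0; i < 1_000_000; i++ {
		if _, _, err := producer.SendMessage(&sarama.ProducerMessage{
			Topic: "no-leader-test",
			Value: sarama.StringEncoder(fmt.Sprintf("msg-%d", i)),
		}); err != nil {
			log.Fatal(err)
		}
	}
}
```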

@dnwe dnwe changed the title fix: Fix the problem when some partitions have no leader fix(consumer): add recovery from no leader partitions Feb 19, 2025
@dnwe dnwe force-pushed the main branch 3 times, most recently from 990524c to 5dc4e24 on February 19, 2025 at 16:53
@dnwe
Collaborator

dnwe commented Feb 19, 2025

@liutao365 thanks for proposing this change, the approach looks good to me – I added a commit to fixup the client locking (the FV was failing for the race condition) and another commit to slightly refactor the consume partition code into a single named func to cover both paths. Can you take a look and confirm you're happy? Also @puellanivis if you wouldn't mind re-reviewing that would be great too.

We should probably add a unit test to cover this scenario, but I'm happy for us to do that in a follow-up PR, as it is probably worth landing this fix sooner rather than later.

@dnwe dnwe added the fix label Feb 19, 2025
@liutao365
Contributor Author

Hi @dnwe, I've read the changes and they're better than my original commit. I'm happy with these improvements.

Member

@prestona prestona left a comment


LGTM!

Just to avoid some duplication here

Signed-off-by: Dominic Evans <[email protected]>
@dnwe dnwe merged commit 60592f6 into IBM:main Feb 23, 2025
16 checks passed