
KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 #18727

Open

ijuma wants to merge 6 commits into trunk from kafka-18659-librdkafka-compressed-produce-fails

Conversation

@ijuma (Member) commented Jan 28, 2025

Advertise support for produce v0-v2, but return an UnsupportedVersion error if a produce request with these versions is attempted.

Since clients pick the highest version supported by both client and broker during version negotiation, this solves the problem with minimal tech debt (even though it's not ideal that the protocol definition becomes inconsistent with the actual protocol support). Adjust the protocol HTML generation to exclude produce v0-v2 and fix an existing bug in the response output where removed versions were not excluded.
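
To make the negotiation concrete, here is a minimal sketch of the client-side selection logic (an illustration of the general mechanism, not Kafka's actual client code):

    // Sketch: intersect the client's supported range with the range the broker
    // advertises in its ApiVersions response, then pick the highest version in
    // the intersection. Advertising v0-v2 therefore never causes a modern
    // client to actually use them, since both sides also support v3+.
    static short negotiateVersion(short clientMin, short clientMax,
                                  short brokerMin, short brokerMax) {
        short min = (short) Math.max(clientMin, brokerMin);
        short max = (short) Math.min(clientMax, brokerMax);
        if (min > max)
            throw new IllegalStateException("no commonly supported version");
        return max;
    }

A client that additionally insists on seeing v0 in the advertised range (as librdkafka's check apparently does for compressed produce) is satisfied as well.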

Add an integration test to verify the behavior for produce v0-v2.

Reference to related librdkafka issue: confluentinc/librdkafka#4956

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ijuma changed the title from "(WIP) KAFKA-18659: librdkafka compressed produce fails unless api versions …" to "(WIP) KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0" on Jan 28, 2025
@github-actions bot added the core (Kafka Broker), clients, and small (Small PRs) labels on Jan 28, 2025
@ijuma changed the title from "(WIP) KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0" to "KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0" on Jan 28, 2025
@ijuma force-pushed the kafka-18659-librdkafka-compressed-produce-fails branch from 30ea3d7 to 6b145a6 on January 29, 2025 06:03
@@ -208,7 +211,7 @@ public static String toHtml() {
         // Responses
         b.append("<b>Responses:</b><br>\n");
         Schema[] responses = key.messageType.responseSchemas();
-        for (int i = 0; i < responses.length; i++) {
+        for (int i = oldestVersion; i < responses.length; i++) {
@ijuma (Member Author) commented on the diff:

This bug fix is unrelated to the main change in this PR.

@ijuma force-pushed the kafka-18659-librdkafka-compressed-produce-fails branch from 6b145a6 to 6bbf021 on January 29, 2025 06:07
@ijuma force-pushed the kafka-18659-librdkafka-compressed-produce-fails branch from 6bbf021 to a252374 on January 29, 2025 06:10
@ijuma (Member Author) commented Jan 29, 2025

The test failures look related:

FAILED ❌ KafkaAdminClientTest > testAdminClientApisAuthenticationFailure()
FAILED ❌ ProcessorTest > testParseRequestHeaderWithUnsupportedApiVersion()
FAILED ❌ KafkaApisTest > shouldReplaceProducerFencedWithInvalidProducerEpochInProduceResponse()
FAILED ❌ KafkaApisTest > testTransactionalParametersSetCorrectly()

Investigating.

@chia7712 (Member) left a comment:

@ijuma thanks for this fix. Does this patch enable Kafka clients to communicate with older servers that support Produce v0-v2? If so, should we disable that request on the client side?

    // See `ProduceRequest.MIN_VERSION` for details on why we need to do this
    if (produceRequest.version < ProduceRequest.MIN_VERSION) {
      requestHelper.sendErrorResponseMaybeThrottle(request, Errors.UNSUPPORTED_VERSION.exception())
      return;
    }
Member:

The `;` is unnecessary.

@ijuma (Member Author) commented Jan 29, 2025

@chia7712 Good point regarding the client behavior; I'll look into adjusting that too.

@ijuma (Member Author) commented Jan 29, 2025

Thinking about it, there is probably a cleaner way to make this change. Let me try that.

Merge remote-tracking branch 'apache-github/trunk' into kafka-18659-librdkafka-compressed-produce-fails

* apache-github/trunk:
  MINOR: prevent exception from HdrHistogram (apache#18674)
  KAFKA-18653: Fix mocks and potential thread leak issues causing silent RejectedExecutionException in share group broker tests (apache#18725)
  KAFKA-18646: Null records in fetch response breaks librdkafka (apache#18726)
  KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (apache#18668)
  KAFKA-18488: Improve KafkaShareConsumerTest (apache#18728)
@@ -44,7 +47,7 @@
     // transaction V2 (KIP_890 part 2) is enabled, the produce request will also include the function for a
     // AddPartitionsToTxn call. If V2 is disabled, the client can't use produce request version higher than 11 within
     // a transaction.
-    "validVersions": "3-12",
+    "validVersions": "0-12",
Contributor:
Instead of bringing back v0 and v1, could we just customize the minVersion of the produce ApiKey in ApiResponse?

@ijuma (Member Author) replied:
I am indeed exploring something involving customizing ApiKey/ApiResponse (that's what I meant when I said that there may be a cleaner way to make the change). But I was planning to leave 0-12 here and reduce the range within the code.

The main advantage is that you get an appropriate error if you try to produce with v0-v2. The option you suggest would result in a disconnection. I think a disconnection is too confusing if the given version is included as part of the api versions response.

Thoughts?
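
To spell out the trade-off: keeping v0-v2 in the schema means the broker can still parse such a request and answer with an explicit error, whereas dropping the schema makes the request unparseable, so the broker's only option is to close the connection. A rough sketch of the first behavior (hypothetical helper names, not the actual broker code):

    // The request parses because the v0-v2 schemas still exist; the handler
    // then rejects it explicitly, so the client sees UNSUPPORTED_VERSION
    // instead of an unexplained disconnect. sendErrorResponse is a
    // hypothetical helper standing in for the real response path.
    if (request.version() < ProduceRequest.MIN_VERSION) {
        sendErrorResponse(request, Errors.UNSUPPORTED_VERSION.exception());
        return;
    }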

Contributor:
It seems that UNSUPPORTED_VERSION was added in 0.10.0 (https://github.com/apache/kafka/pull/986/files#diff-0ae188e241ee35148145e7e04ed2acb3c20356650194f3603811c51b886ebe20), while Produce v0/v1 already existed in 0.9. So, even if we send UNSUPPORTED_VERSION in the error, a 0.9 client won't be able to understand it. It really only helps with the Produce v2 request.

@ijuma (Member Author) replied:
I agree that it's nearly impossible to make the experience good for these super ancient clients. I was thinking more about client developers implementing the Kafka protocol.

Contributor:
Yes, we could make an effort to send a more meaningful error code for produce v0-v2. But since they are not really supported on the broker, just doing a disconnect like any other unsupported version also seems reasonable.

@github-actions bot removed the small (Small PRs) label on Jan 30, 2025
-            .filter(apiKey -> apiKey.inScope(ApiMessageType.ListenerType.BROKER))
-            .collect(Collectors.toList());
-        return EnumSet.copyOf(apis);
+        return brokerApis();
@ijuma (Member Author) commented on the diff:
This simplification is unrelated to the main change in this PR - just something I noticed could be cleaned up.
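
For context, the removed stream pipeline presumably moved into the `brokerApis()` helper; a guess at its shape based on the removed lines (not verified against the actual code):

    // Hypothetical reconstruction of brokerApis(), inferred from the lines
    // removed in the diff above. Requires java.util.{Arrays, EnumSet, List}
    // and java.util.stream.Collectors.
    static EnumSet<ApiKeys> brokerApis() {
        List<ApiKeys> apis = Arrays.stream(ApiKeys.values())
                .filter(apiKey -> apiKey.inScope(ApiMessageType.ListenerType.BROKER))
                .collect(Collectors.toList());
        return EnumSet.copyOf(apis);
    }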

@@ -208,26 +208,26 @@ public static String toHtml() {
         // Responses
         b.append("<b>Responses:</b><br>\n");
         Schema[] responses = key.messageType.responseSchemas();
-        for (int i = 0; i < responses.length; i++) {
-            Schema schema = responses[i];
+        for (int version = key.oldestVersion(); version < key.latestVersion(); version++) {
@ijuma (Member Author) commented on the diff:
Use the same approach we use for requests for consistency - it should result in the same behavior, but fail more clearly if there is some inconsistency.
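
The version-indexed pattern referred to above looks roughly like this (a sketch of the pattern only; `renderSchema` is a hypothetical placeholder, and the exact bound handling follows the requests loop in the real code):

    // Iterating explicit version numbers makes the supported range the source
    // of truth: an index outside the schema array would throw immediately
    // instead of silently documenting a removed version.
    for (short version = key.oldestVersion(); version <= key.latestVersion(); version++) {
        Schema schema = key.messageType.responseSchemas()[version];
        renderSchema(b, version, schema); // hypothetical placeholder
    }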

    public Optional<ApiVersionsResponseData.ApiVersion> toApiVersion(boolean enableUnstableLastVersion,
                                                                     Optional<ApiMessageType.ListenerType> listenerType) {
        // see `PRODUCE_MIN_VERSION` for details on why we do this
        short oldestVersion = (this == PRODUCE && listenerType.map(l -> l == ApiMessageType.ListenerType.BROKER).orElse(false)) ?
Member:
Excuse me, but why change the (produce) schema version after we decided to change the API response? I assumed all we needed to do was return the API response including v0-v2 produce requests to fix the librdkafka issue. Then, the schema could remain at v3-v12, so the v0-v2 requests would still be rejected.

@ijuma (Member Author) replied:
I will explain the changes in more detail after I confirm that the test results are passing.

@github-actions bot added the tools label on Jan 30, 2025
@ijuma (Member Author) commented Jan 30, 2025

@chia7712 @junrao The tests should now be passing for this PR. The comments should also make it clear what's going on. The main point of contention is whether we should remove v0-v2 from the protocol definition files. If we do that, there are three main implications for this PR:

  1. The ApiKeys.oldestVersion special case would no longer be required.
  2. The ApiKeys.toApiVersion code would have to refer to a separate constant for Produce min version (instead of retrieving it from the generated class).
  3. We would have to remove testProduceRequestV0ToV2IsRejectedByBroker since there would be no way to produce with v0-v2 (a sketch of that test's shape follows this comment). We would probably have to create a ducktape test to replace it. I am not sure how easy or hard that would be (ducktape tests typically use clients instead of testing specific requests).

Given that, I would prefer to keep the current approach to unblock 4.0. I am ok with filing a ticket to do the 3 things listed above, if we prefer that approach. Overall, I think it's a 50/50 situation (neither approach is clearly better than the other), but I am ok if someone else wants to tackle the alternative option.

Thoughts?
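
Regarding implication 3: a raw-protocol test of this kind could take roughly the following shape (the helper names are assumptions for illustration, not the PR's actual test code):

    // Send raw Produce requests at the removed versions and assert that the
    // broker answers with an explicit UNSUPPORTED_VERSION error rather than
    // disconnecting. buildProduceRequest and sendAndReceive are hypothetical.
    for (short version = 0; version <= 2; version++) {
        short errorCode = sendAndReceive(broker, buildProduceRequest(version, records));
        assertEquals(Errors.UNSUPPORTED_VERSION.code(), errorCode);
    }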
