-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YARN-11745: Fix TimSort contract violation in PriorityQueueComparator Class #7278
base: trunk
Are you sure you want to change the base?
YARN-11745: Fix TimSort contract violation in PriorityQueueComparator Class #7278
Conversation
🎊 +1 overall
This message was automatically generated. |
Thanks @Hean-Chhinling for the fix!
|
...er/resourcemanager/scheduler/capacity/policy/TestPriorityUtilizationQueueOrderingPolicy.java
Outdated
Show resolved
Hide resolved
I think there are some check style error what should be fixed: |
Thank you for pointing this out. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hean-Chhinling!
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hean-Chhinling for the patch! Nice find and clean fix, I only had some smaller styling comments.
QueueCapacities mockQueueCapacities = mock(QueueCapacities.class); | ||
when(mockQueueCapacities.getAbsoluteUsedCapacity(partition)) | ||
.thenReturn(4.2f); // Could be any number | ||
when(mockQueueCapacities.getUsedCapacity(partition)) | ||
.thenReturn(1.0f); // Could be any number | ||
when(mockQueueCapacities.getAbsoluteCapacity(partition)) | ||
.thenReturn(6.2f); // Could be any number | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: why not use the randomQueueCapacities method below? Would make the code a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @brumi1024!
I try using the randomQueueCapacities
method but the exception did not occur.
The reason that I am using the this mockQueueCapacities
is to have the same values for queues capacities that are generated. So that the PriorityQueueComparator
class compare
methods will fall into using the configureMinResource
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. In that case I think the "// Could be any number" comment is a bit misleading, maybe we could specify that it could be any specific number, so that there are equal values.
QueueResourceQuotas mockResourceQuotas = mock(QueueResourceQuotas.class); | ||
when(mockResourceQuotas.getConfiguredMinResource(partition)) | ||
.thenReturn(resource); | ||
when(mockQueue.getQueueResourceQuotas()) | ||
.thenReturn(mockResourceQuotas); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, randomResourceQuotas seems to be similar functionality. AFAICS the (0,0) case is the reason this was needed, that probably could be solved by creating a "zeroResourceQuotas" or something similar which ensures a few queues with zero minResource are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion here, I have refactored my code to use randomResourceQuotas
. And randomly override queue resource to zero.
boolean isZeroResource = ThreadLocalRandom.current().nextBoolean();
if (isZeroResource) {
resourceQuotas.setConfiguredMinResource(partition, Resource.newInstance(0, 0));
}
...er/resourcemanager/scheduler/capacity/policy/TestPriorityUtilizationQueueOrderingPolicy.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
QueueCapacities mockQueueCapacities = mock(QueueCapacities.class); | ||
when(mockQueueCapacities.getAbsoluteUsedCapacity(partition)) | ||
.thenReturn(4.2f); // Could be any number | ||
when(mockQueueCapacities.getUsedCapacity(partition)) | ||
.thenReturn(1.0f); // Could be any number | ||
when(mockQueueCapacities.getAbsoluteCapacity(partition)) | ||
.thenReturn(6.2f); // Could be any number | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. In that case I think the "// Could be any number" comment is a bit misleading, maybe we could specify that it could be any specific number, so that there are equal values.
@@ -331,6 +386,7 @@ private QueueResourceQuotas randomResourceQuotas(String partition) { | |||
QueueResourceQuotas qr = new QueueResourceQuotas(); | |||
qr.setConfiguredMinResource(partition, | |||
Resource.newInstance(randInt(1, 10) * 1024, randInt(1, 10))); | |||
// System.out.println("Quotas: " + qr.getConfiguredMinResource(partition)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed, I will delete it. It was used during my testing that I forgot to remove it.
queues.add(createMockPriorityQueueResourcesForSorting(partition)); | ||
queues.add(createMockPriorityQueueResourcesForSorting(partition)); | ||
queues.add(createMockPriorityQueueResourcesForSorting(partition)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are no values supplied in the params the 3 calls seems unnecessary, you could simply increate the limit from 300 to 900.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I am sorry for the oversight.
🎊 +1 overall
This message was automatically generated. |
Description of PR
This PR addresses the TimSort contract violation issue in the
PriorityQueueComparator
class, which was identified when sorting queue with resource (0, 0) and queue resource(any number, any number). The comparator previously failed to maintain transitivity of TimSort, leading tojava.lang.IllegalArgumentException: Comparison method violates its general contract!
during sorting.Root Cause
The issue occurred due to inconsistent comparison logic that violated the transitivity rules required by TimSort.
Specifically, at the following code lines the AND condition that only compare the resources if both queues' resources are not none. However, when one of the queue resource is
none or (0, 0)
and the other queue resource isnot none
it skips this condition and go to compare based on theabsoluteCapacity
and when both of the queues'absoluteCapacity
are the same, it leads to both queues equal each other even though their resources are different.For more detail example of how this behaviour break the TimSort algorithm please see this attachment. ExampleZeroQueueResourceProblem
Solution
Instead of checking if both queues' resource are not none. We should only check if one of the queue's resource is not none. This way to avoid skipping the queue resource comparison when we have one queue resource is not none and the other one is none. Specifically, change from AND condition to OR condition at the following codes:
if (!minEffRes1.equals(Resources.none()) || !minEffRes2.equals( Resources.none())) { return minEffRes2.compareTo(minEffRes1); }
Testing
Added a unit test
testPriorityQueueComparatorClassDoesNotViolateTimSortContract
to verify that sorting no longer throwsjava.lang.IllegalArgumentException: Comparison method violates its general contract!
.The test includes setting resource instance (0, 0) and resource(any number, any number) then shuffle the repeated queues that were created and then sort in using the
PriorityQueueComparator
class. It also mock the necessary elements, for example,priority
label
,absoluteUsedCapacity
,usedCapacity
andabsoluteCapacity
. These elements can be of any number.Impact
TimSort
violation, ensuring stable and predictable sorting forPriorityQueueComparator
class.PriorityQueueComparator
sorting algorithm may change in some behaviour when the queue resource is (0, 0).For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?