[SPARK-51008][SQL] Add ResultStage for AQE #49715
Conversation
cc @ulysses-you
@@ -588,7 +639,7 @@ case class AdaptiveSparkPlanExec(
     if (plan.children.isEmpty) {
       CreateStageResult(newPlan = plan, allChildStagesMaterialized = true, newStages = Seq.empty)
     } else {
-      val results = plan.children.map(createQueryStages)
+      val results = plan.children.map(createQueryStagesInternal)
       CreateStageResult(
         newPlan = plan.withNewChildren(results.map(_.newPlan)),
         allChildStagesMaterialized = results.forall(_.allChildStagesMaterialized),
It seems the new code is a bit hard to read; not sure if there is some development context I'm missing.
Can we create the result query stage here? If the plan is the root query and allChildStagesMaterialized, then we wrap it in ResultQueryStage. Since it is not a materialized stage, AQE will materialize it.
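The suggestion above can be illustrated with a standalone sketch (hypothetical names and simplified types, not Spark's real classes): wrap the root plan in a result stage only once every child stage has materialized; the result stage itself starts unmaterialized, so the AQE loop will pick it up and materialize it as the final step.

```scala
// Toy plan model standing in for SparkPlan / QueryStageExec.
sealed trait Plan
case class Stage(id: Int, materialized: Boolean) extends Plan
case class Node(children: Seq[Plan]) extends Plan
case class ResultStage(child: Plan) extends Plan // always created unmaterialized

def allMaterialized(p: Plan): Boolean = p match {
  case Stage(_, m)    => m
  case Node(cs)       => cs.forall(allMaterialized)
  case ResultStage(_) => false // a fresh result stage is never materialized
}

// Wrap the root in a ResultStage once every child stage is ready.
def maybeWrapRoot(root: Plan): Plan =
  if (allMaterialized(root)) ResultStage(root) else root

val ready    = Node(Seq(Stage(0, materialized = true), Stage(1, materialized = true)))
val notReady = Node(Seq(Stage(0, materialized = true), Stage(1, materialized = false)))
assert(maybeWrapRoot(ready).isInstanceOf[ResultStage]) // all children ready: wrapped
assert(maybeWrapRoot(notReady) == notReady)            // still waiting: unchanged
```

Because the wrapper reports itself as unmaterialized, the normal AQE materialization loop drives the final stage the same way it drives shuffle and broadcast stages.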
sounds reasonable to me, cc @liuzqt
so this PR is just one of the stage-level feature PRs?
@ulysses-you yes, after this PR, we can implement the proposed idea in #44013 (comment) and keep contexts in the AQE query stage.
…e/AdaptiveSparkPlanExec.scala Co-authored-by: Wenchen Fan <[email protected]>
…e/AdaptiveSparkPlanExec.scala Co-authored-by: Wenchen Fan <[email protected]>
@@ -579,23 +592,52 @@ case class AdaptiveSparkPlanExec(
         allChildStagesMaterialized = false,
         newStages = Seq(newStage))

-    case q: QueryStageExec =>
+    case q: QueryStageExec if q ne currentPhysicalPlan =>
what does this condition protect?
We can have plan like this:
ShuffleQueryStage 0
+- Exchange hashpartitioning(key#17, 5), REPARTITION_BY_COL, [plan_id=89]
+- *(1) SerializeFromObject [invoke(knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key()) AS key#17, static_invoke(UTF8String.fromString(invoke(knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value()))) AS value#18]
+- Scan[obj#14]
where the root plan is a ShuffleQueryStageExec, and we have to create a ResultQueryStage on top of it:
==>
ResultQueryStage 1
+- AQEShuffleRead coalesced
+- ShuffleQueryStage 0
+- Exchange hashpartitioning(key#17, 5), REPARTITION_BY_COL, [plan_id=89]
+- *(1) SerializeFromObject [invoke(knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key()) AS key#17, static_invoke(UTF8String.fromString(invoke(knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value()))) AS value#18]
+- S...
This refactor is equivalent to my previous implementation: using createQueryStagesInternal and creating the result stage in the external createQueryStages.
ah I see, let's add comments to explain it
// We can skip creating a new query stage if the given plan is already a query stage.
// Note: if this query stage is the root node, we still need to create a result query stage.
actually, the caller always invokes createQueryStages with currentPhysicalPlan, so we know when to deal with the result stage. Now I feel the previous code is clearer. Maybe just name it better? e.g. createQueryStages and createNonResultQueryStages.
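The two-level split suggested here can be sketched in a self-contained way (hypothetical, heavily simplified stand-ins for the real Spark classes): an internal recursion that never produces a result stage, plus an external entry point that wraps the root once everything below it is materialized, which also covers the case where the root is itself an already-materialized query stage.

```scala
// Toy model of the plan tree; names mirror the suggested split, not real Spark code.
sealed trait Plan
case class QueryStage(id: Int, materialized: Boolean) extends Plan
case class Exchange(child: Plan) extends Plan
case class ResultStage(child: Plan) extends Plan

// Internal recursion: creates/reuses non-result stages, reports readiness.
def createNonResultQueryStages(p: Plan): (Plan, Boolean) = p match {
  case QueryStage(_, m) => (p, m) // reuse an existing stage as-is
  case Exchange(c) =>
    val (newChild, ready) = createNonResultQueryStages(c)
    // A real implementation would turn a ready Exchange into a new stage;
    // here we simply propagate readiness upward.
    (Exchange(newChild), ready)
  case other => (other, true) // leaf operators need no stage
}

// External entry point: only place where a ResultStage is ever created.
def createQueryStages(root: Plan): Plan = {
  val (newRoot, allReady) = createNonResultQueryStages(root)
  if (allReady) ResultStage(newRoot) else newRoot
}

// Even when the root is already a materialized QueryStage, we still wrap it:
assert(createQueryStages(QueryStage(0, materialized = true)).isInstanceOf[ResultStage])
```

Keeping result-stage creation out of the recursion means the "root is already a stage" and "root stage materialized via reuse" cases need no special-casing inside the traversal.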
I just noticed an even more complicated pattern from a broken test: we create a new non-result query stage as the root node, and that query stage is immediately materialized due to stage reuse, so we have to create the result stage right after. The current implementation cannot handle such a case, and fixing it might be hacky...
So yes, I think separating result and non-result query stage creation is the better option. I'll rename it and add some comments to clarify.
assert(plan2.isInstanceOf[ResultQueryStageExec])
assert(plan1 ne plan2)
assert(plan1.asInstanceOf[ResultQueryStageExec].plan
  .fastEquals(plan2.asInstanceOf[ResultQueryStageExec].plan))
should they be equal? I think these two result stages should have different handler functions?
Yes, they have different handler functions. But the root plan they wrap should be the same (which is the original AQE root plan).
currentPhysicalPlan = newPhysicalPlan
currentLogicalPlan = newLogicalPlan
stagesToReplace = Seq.empty[QueryStageExec]
if (!currentPhysicalPlan.isInstanceOf[ResultQueryStageExec]) {
why do we need to skip ResultQueryStageExec?
The result stage is already the last step; there is nothing left to reoptimize.
 * Run `fun` on finalized physical plan
 */
def withFinalPlanUpdate[T](fun: SparkPlan => T): T = lock.synchronized {
  _isFinalPlan = false
so when we call df.collect multiple times, we will re-optimize the final stage multiple times, because for each call we need to wrap a new ResultQueryStageExec.
In this case we construct ResultQueryStageExec directly and won't re-optimize it: https://github.com/apache/spark/pull/49715/files#diff-ec42cd27662f3f528832c298a60fffa1d341feb04aa1d8c80044b70cbe0ebbfcR536
}
_isFinalPlan = true
finalPlanUpdate
currentPhysicalPlan.asInstanceOf[ResultQueryStageExec].resultOption.get().get.asInstanceOf[T]
does it mean we would cache the result data? Is that expected?
Good point, this is actually a side effect of all QueryStageExec...
We can implement a "fetch-once" semantic which only fetches once at the end of the AQE loop. But we still cannot prevent users from accessing it multiple times, as long as they can access the ResultQueryStageExec node from the query plan.
@cloud-fan what do you think
This is a good catch! This stops the result from being GCed if the users throw away the result of df.collect() but still keep the df around.
Maybe the final outcome of a ResultStage should be Unit, which is only used to trigger the final plan calculation. The caller side is still responsible for running the function to get the result.
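The Unit-outcome proposal can be sketched with hypothetical, heavily simplified classes (not Spark's real API): materializing the result stage only records *that* the final plan is ready, while the data itself comes from re-running the caller's function, so nothing pins collected rows inside the plan tree and they can be GCed once the caller drops them.

```scala
// Hypothetical result stage whose materialization outcome is Unit:
// it triggers finalization once but never stores row data.
class ResultStage(finalizePlan: () => Unit) {
  private var done = false
  def materialize(): Unit = { if (!done) finalizePlan(); done = true }
  def isMaterialized: Boolean = done
}

// Hypothetical driver-side wrapper mirroring withFinalPlanUpdate's shape.
class AdaptivePlan {
  private var optimizeRuns = 0
  private val resultStage = new ResultStage(() => optimizeRuns += 1)
  def withFinalPlanUpdate[T](fun: () => T): T = {
    resultStage.materialize() // finalizes the plan at most once
    fun()                     // caller produces the result; nothing is cached
  }
  def runs: Int = optimizeRuns
}

val plan = new AdaptivePlan
val a = plan.withFinalPlanUpdate(() => Seq(1, 2, 3))
val b = plan.withFinalPlanUpdate(() => Seq(1, 2, 3))
assert(a == b)
assert(plan.runs == 1) // plan finalized once; results recomputed per call, never pinned
```

This also matches the follow-up observation: because the result stage carries no data, it never needs to be recreated for repeated collects once the final plan exists.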
The proposal above can also simplify things: once a result stage is created, we never need to recreate it, as the final plan is finalized. It's similar to the def getFinalPhysicalPlan() style before.
What changes were proposed in this pull request?
Added ResultQueryStageExec for AQE
What the query plan looks like in the explain string:
What the query plan looks like in the Spark UI:
Why are the changes needed?
Currently, the AQE framework is not fully self-contained, since not all plan segments can be put into a query stage: the final "stage" is basically executed as a non-AQE plan. This PR adds a result query stage for AQE to unify the framework. With this change, we can build more query-stage-level features; one use case is #44013 (comment).
Does this PR introduce any user-facing change?
NO
How was this patch tested?
New unit tests.
Also, existing tests that are impacted by this change are updated to keep their original test semantics.
Was this patch authored or co-authored using generative AI tooling?
NO