Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSE may extract Window functions into a Project node #15190

Open
Blizzara opened this issue Mar 12, 2025 · 1 comment
Open

CSE may extract Window functions into a Project node #15190

Blizzara opened this issue Mar 12, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@Blizzara
Copy link
Contributor

Describe the bug

It seems the Common Subexpression Elimination rule may eliminate also WindowFunctions by moving them into a Project node, which then causes physical planning to fail since a Project cannot actually execute those WindowFunctions.

To Reproduce

The following test fails on current main:

#[tokio::test]
async fn window() -> Result<()> {
    let wexpr = row_number_udwf().call(vec![]);
    let plan = LogicalPlanBuilder::empty(true)
        .window(vec![wexpr.clone(), wexpr.alias("aliased")])?
        .build()?;
    println!("plan: {}", plan.display_indent_schema());
    let ctx = SessionStateBuilder::new().build();
    let plan = ctx.optimize(&plan)?;
    println!("optimized plan: {}", plan.display_indent_schema());
    DataFrame::new(ctx, plan).collect().await?;
    Ok(())
}

with

plan: WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS aliased]] [row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING:UInt64, aliased:UInt64]
  EmptyRelation []

optimized plan: Projection: row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, aliased [row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING:UInt64, aliased:UInt64]
  WindowAggr: windowExpr=[[__common_expr_1 AS row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, __common_expr_1 AS aliased]] [__common_expr_1:UInt64, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING:UInt64, aliased:UInt64]
    Projection: row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS __common_expr_1 [__common_expr_1:UInt64]
      EmptyRelation []


Error: NotImplemented("Physical plan does not support logical expression WindowFunction(WindowFunction { fun: WindowUDF(WindowUDF { inner: RowNumber { signature: Signature { type_signature: Nullary, volatility: Immutable } } }), params: WindowFunctionParams { args: [], partition_by: [], order_by: [], window_frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false }, null_treatment: None } })")

Note how the optimize() has introduced a Project node which contains the row_number() call.

Removing either expr from the window() call makes the test pass.

Expected behavior

Test succeeds. Preferrably the Window node notices the common subexpression and deduplicates it, but that's a bonus, as long as the plan runs.

Additional context

No response

@Blizzara Blizzara added the bug Something isn't working label Mar 12, 2025
@Blizzara
Copy link
Contributor Author

Adding Expr::WindowFunction(_) to the list here seems to fix this, but I don't know if that is the right fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant