Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer equality in boolean comparisons #34166

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jul 5, 2024

In most databases, equality comparisons can take advantage of indexing and inequalities cannot.

Fixes #34164.

@@ -839,7 +839,7 @@ public override async Task Rewrite_compare_bool_with_bool(bool async)
"""
SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE "e"."BoolA" <> "e"."NullableBoolB"
WHERE "e"."BoolA" = (NOT ("e"."NullableBoolB"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a positive change? I doubt any database out there will use index with equality and NOT, more than it would for inequality, no? Should we make this change more targeted, so that it doesn't do this specific transformation?

Copy link
Contributor Author

@ranma42 ranma42 Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a positive change?

It is a positive change, at least as long as we do not analyze the provenance of the columns.
In this specific case, it is basically irrelevant: the whole table is going to be scanned linearly (just once).
If the left column and the right column came from different tables, it would be much more efficient.

I doubt any database out there will use index with equality and NOT, more than it would for inequality, no?

At least Sqlite does. This is basically an instance of #34048 (comment)

Should we make this change more targeted, so that it doesn't do this specific transformation?

Do you expect worse plans on some db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some examples in the issue #34164.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Sqlite and Postgres examples 🚀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So between a <> b and a = NOT(b), the former certainly seems more natural, and what I'd expect a standard SQL query to look like. If these perform the same, I'd definitely prefer the first, at least for readability etc. (and we do generally care about that).

I could alsi imagine a database where the planner optimizes to use an index on b (effectively transforming the inequality into a not on a), where this change would cause a regression. of course, this is entirely speculative, and I have an actually looked into which databases basis to which optimizations.

I'd prefer us to do a bit more cross database research before merging a change like this, which at the very least makes our SQL less readable/standard/expected (and that does tend to have some correlation sometimes with performance). If this address is a very specific Sqlite behavior, where equality is always better, we always have the option of doing this change for sqlite only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I wrote the above comment before noticing you posted data on other databases… some remarks:

  • For the Sqlite case, have you confirmed that the second option, where an index is built, is actually faster than the first?
  • For the SQL Server case, the total subtree cost is actually higher with the second method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will perform some measurement, but I do not have real code/an actual database that is using this filter; I will try to do some synthetic examples (I'll try to cover some interesting cases, but a real-world case would definitely be more relevant).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some (synthetic) benchmarks have been posted in #34164

@ranma42
Copy link
Contributor Author

ranma42 commented Jul 12, 2024

Maybe I should have explained this in advance: this is not a new/different approach to translate boolean (in)equalities; it is just making the code more consistent in choosing = over <> (as per the issue).

Ideally there should be no duplication of this code, hence it should be "inevitably" consistent.

@ranma42 ranma42 force-pushed the prefer-equal branch 2 times, most recently from 7f57ec3 to 32d05b3 Compare July 13, 2024 17:45
@ranma42
Copy link
Contributor Author

ranma42 commented Jul 13, 2024

I cleaned up the code a little (the same equality conversion logic is now shared between OptimizeComparison and RewriteNullSemantics.

@ranma42
Copy link
Contributor Author

ranma42 commented Jul 29, 2024

Rebased to resolve conflicts

@ranma42
Copy link
Contributor Author

ranma42 commented Dec 23, 2024

Rebased to resolve conflicts

…Entities

Problem was that in EF9 we moved some optimizations from sql nullability processor to SqlExpressionFactory (so that we optimize things early). One of the optimizations:

```
!(true == a) -> false == a
!(false == a) -> true == a
```

is not safe to do when a is a constant or parameter null value, because it evaluates to true IS NULL (two value logic). Fix is to remove this optimization from SqlExpressionFactory and instead put it back into OptimizeNotExpression, once we've converted everything we could to IS NULL checks already.

Fixes dotnet#35393
@ranma42 ranma42 marked this pull request as draft January 1, 2025 09:06
They can take advantage of indexing.
These are the baselines regenerated after a rebase on top of
35fc423.
@ranma42
Copy link
Contributor Author

ranma42 commented Jan 6, 2025

This is currently marked as draft as it is based on (an old version of):

@maumar
Copy link
Contributor

maumar commented Jan 7, 2025

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@maumar
Copy link
Contributor

maumar commented Jan 7, 2025

@ranma42 I think it makes sense to use this PR in favor of 35395 as the basis for the fix (if we iron out all the kinks), at least for main. We could consider scoping it for patch, but that can be a separate discussion

@ranma42
Copy link
Contributor Author

ranma42 commented Jan 7, 2025

👍 @maumar your fix is still needed, at least the part that drops the invalid optimization from SqlExpressionFactory, but we can probably avoid re-introducing the code in SqlNullabilityProcessor.
I will work on cleaning this up a bit and experiment with showing & fixing the regression that occurs on value-converted types as mentioned in #35395 (comment)

@maumar
Copy link
Contributor

maumar commented Jan 14, 2025

@ranma42 also consider extending Rewrite_compare_bool_with_bool test to include constants. I did that as part of the patch fix and it indeed caught the error e12547f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prefer equality when comparing values
4 participants