Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6145] Function 'TRIM' without parameters throw NullPointerEx… #3869

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

timgrein
Copy link
Contributor

@timgrein timgrein commented Jul 19, 2024

Added an explicit check at parsing time to make sure that the string to trim is not null

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should improve the description in reference.md

@timgrein
Copy link
Contributor Author

You should improve the description in reference.md

Do you've something explicit on your mind? I've checked other functions from Character string operators and functions inside reference.md and it doesn't seem to be common to explicitly specify behavior, if certain args are null. I've fixed one issue though in another change: TRIM function documentation refers to string1 two times. Did you have this on your mind?

@timgrein timgrein requested a review from caicancai July 21, 2024 18:02
@caicancai
Copy link
Member

You should improve the description in reference.md

Do you've something explicit on your mind? I've checked other functions from Character string operators and functions inside reference.md and it doesn't seem to be common to explicitly specify behavior, if certain args are null. I've fixed one issue though in another change: TRIM function documentation refers to string1 two times. Did you have this on your mind?

Thanks for reminding me, I didn't notice this.

@@ -10857,6 +10857,7 @@ void assertSubFunReturns(boolean binary, String s, int start,
f.checkFails("trim('' from 'abcde')",
"Trim error: trim character must be exactly 1 character",
true);
f.checkFails("trim()", "String to trim cannot be null", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the expected error for this function?
Or should the result be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creator of the original https://issues.apache.org/jira/browse/CALCITE-6145 suggested to throw an exception with an exact description.

I've checked the behavior of PostgreSQL (v15):

SELECT TRIM() AS trimmed_string;

leads to a syntax error: Query Error: error: syntax error at or near ")".

SELECT TRIM("") AS trimmed_string;

leads to an error: Query Error: error: zero-length delimited identifier at or near """".

SELECT TRIM(null) AS trimmed_string;

returns null.

So I think this aligns with the behavior of PostgreSQL throwing an error during query parsing, but I think the message could be a bit better: trim cannot be called without arguments maybe?

I've added an explicit test for when the value is null, which indeed returns null (aligns with PostgreSQL's behavior).

WDYT? @mihaibudiu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function trim() need to have arguments, then the type checker should have complained about not finding a function with a signature without arguments. I don't expect the parser to complain about it, that is strange. The parser should just accept the function, and the type checker should validate it.

The original message is not good, since clearly there is no 'null' involved. However, if the error comes from the parser, there isn't much you can do about it.

@timgrein timgrein requested a review from mihaibudiu July 22, 2024 20:18
@@ -130,6 +130,9 @@ public SqlTrimFunction(String name, SqlKind kind,
if (operands[1] == null) {
operands[1] = SqlLiteral.createCharString(" ", pos);
}
if (operands[2] == null) {
throw new IllegalArgumentException("String to trim cannot be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The second argument of trim cannot be null"
It's not clear what "string" you are referring to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's either the first or the third:

You can specify one argument (only the string to trim) or three (where to trim (leading, trailing both sides), what to trim from the string, the string to trim).

Copy link
Contributor Author

@timgrein timgrein Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The second argument of trim cannot be null"

This is not 100% correct I think as the string to trim from can be null: trim('a' from cast(null as varchar(1))) returns null, which is valid. I think it's more about the absence of the argument, which feels to me like a different thing than a value/field being null.

Ideally we should correctly fall through to the default case or handle the cases, where the string to trim is absent explicitly.

@@ -130,6 +130,9 @@ public SqlTrimFunction(String name, SqlKind kind,
if (operands[1] == null) {
operands[1] = SqlLiteral.createCharString(" ", pos);
}
if (operands[2] == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is wrong here. If you reached this point you have one argument.

Copy link
Contributor Author

@timgrein timgrein Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've two (flag from which side(s) to trim, string to trim from the 3rd operand) right?

For trim(both 'a' from 'aAa') you've three operands:

operand[0]: "BOTH"
operand[1]: "a"
operand[2]: "aAa"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and the message says that there are "no arguments"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, currently adjusting this

Copy link
Contributor Author

@timgrein timgrein Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I've debugged through this:

If you call trim() the following happens:

  • Enters the block for 3 operands (first gets a default value, second and third are null, not in the sense of their value, but of their absence)
  • operands[0] gets set to BOTH
  • operands[1] gets so to " "
  • operands[2] is null (indicating absence of the arg not that its actual value is null)

So the actual method call from a user perspective was one without arguments (trim()).

trim(" a ") works fine
trim(both " a ") works fine
trim("a" from) syntax error (string to trim is absent)
trim(both "a" from) syntax error (string to trim is absent)

So to handle the no-args case I think it's correct to check for if (operands[2] == null) in the block with 3 operands. I'll add a comment explaining it, otherwise it's confusing I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the first two arguments have some default values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihaibudiu @timgrein Perhaps we can refer to SparkSQL's exception information.
SparkSQL:

spark-sql> select trim();
Error in query: Invalid number of arguments for function trim. Expected: one of 1 and 2; Found: 0; line 1 pos 7

@mihaibudiu
Copy link
Contributor

Please let us know when this is again ready for review.

@timgrein
Copy link
Contributor Author

Please let us know when this is again ready for review.

Looking into this again, is there any guidance on where these type of validations/semantic analysis should live? I see there's another method called validateCall, which feels like a more appropriate place for these kind of validations.

@mihaibudiu
Copy link
Contributor

I think your code is in the right place.
the validation methods do not know anything about the function being called, just its type signature.

@mihaibudiu
Copy link
Contributor

So it this ready for review?

@timgrein
Copy link
Contributor Author

timgrein commented Aug 6, 2024

So it this ready for review?

@mihaibudiu Sorry I was on vacation the last week, I'll give it a read again and let you know, whether it's ready to review!

@timgrein timgrein requested a review from mihaibudiu August 7, 2024 10:55
@timgrein
Copy link
Contributor Author

timgrein commented Aug 7, 2024

Thank you for your patience!

I've adjusted the error message to follow what Calcite uses at a lot of other places Invalid number of arguments to function.... @mihaibudiu also added an explanatory comment, so it's ready to review again. Let me know, if you want to use a different approach or if this fine as a bug fix.

Edit: grepped through the codebase and it seems like messages starting with Invalid number of arguments to function... shouldn't be constructed manually, but rather thrown through CalciteResource#invalidArgCount during type checking (there's also only one instance, where this is called in a specific function SqlDatePartFunction). I'm wondering if a better fix should adapt the corresponding type checker. Maybe something like ONE_OR_MORE would be useful before checking ANY_STRING_STRING? I think that's also something you've mentioned already, I'll check if I can move the fix to the type checker, although I'm not sure yet what's the order of execution (call creation before type checking?).

Copy link

sonarqubecloud bot commented Aug 7, 2024

@mihaibudiu
Copy link
Contributor

Looks like you have some lines that are too long.
You can validate your commits before submitting them by running ./gradlew build

@mihaibudiu
Copy link
Contributor

@timgrein this is almost ready, with just a little more work we can merge it

Copy link

github-actions bot commented Dec 5, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants