-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add decimal argument support to round function #713
Conversation
extensions/functions_rounding.yaml
Outdated
and this value cannot be exactly represented, this specifies how | ||
to round it. | ||
- TIE_TO_EVEN: round to nearest value; if exactly halfway, tie |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen with decimal representations? I'd argue all of the floating point handling stuff here does not apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would these not apply? For example, the input value 2.5
could be represented exactly in a decimal type, but rounding it to the nearest integer would result in 2
if the rounding mode is TIE_TO_EVEN
or 3
if the mode is TIE_AWAY_FROM_ZERO
.
extensions/functions_rounding.yaml
Outdated
@@ -268,3 +268,43 @@ scalar_functions: | |||
AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ] | |||
nullability: DECLARED_OUTPUT | |||
return: fp64? | |||
- args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all other decimal functionality we have placed them in _decimal.yaml
files. Not sure if we want to have just this one function in a file by itself though.
extensions/functions_rounding.yaml
Outdated
When `s` is a negative number, the rounding is | ||
performed to the left side of the decimal point | ||
as specified by `s`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this operation affect the scale? We should probably clarify that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this function could return a different decimal type (i.e. reduce the precision and the scale parameters), but I was working on the assumption that it would just return a different value. I'm not sure if that is what you are asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not assume. Let's add some expected behaviors here. Once we get tests inside core, we can transplant those into test cases. And if this is the behaviors of spark we're trying to match, we shouldn't probably just put this in a spark function file (or name it spark_round here). Decimal behavior is often quite different between different systems.
97b20bd
to
1dbdb4b
Compare
Just revisiting this. I've updated the parameters of the decimal return type to match the logic used by the Spark round functions: I note the comment here: #671 (comment) Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use any constants in the calculation. The s parameter in round could be not a constant although I suspect some backends only will allow constants here. If it is not a constant then we don't know a lot. If the number turns out to be negative the scale could even increase. Because that second value could change the value in any manner we likely have to return maximum scale and precision here.
I will take it as an action to try running these tests against a backend (as the consumer testing does) to see if the tests are functional/correct.
|
||
# negative_rounding: Examples with negative rounding | ||
round(2::dec<2,0>, -2::i32) = 0::dec<2,0> | ||
round(123::dec<2,0>, -2::i32) = 100::dec<2,0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are three digits here so the precision needs to be 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last two lines should be:
round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this file is the rounding_decimal directory I'd name this file ceil.test.
|
||
# negative_rounding: Examples with negative rounding | ||
round(2::dec<2,0>, -2::i32) = 0::dec<2,0> | ||
round(123::dec<2,0>, -2::i32) = 100::dec<2,0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last two lines should be:
round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>
### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml' | ||
|
||
# basic: Basic examples without any special cases | ||
ceil(2.25::dec<8,2>) = 3::dec<7,0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, DuckDB returns decimal<8,0> for the first two and decimal<2,0> for the last one (and decimal<8,0> for the floor tests too). There may be some variation between systems here.
### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml' | ||
|
||
# basic: Basic examples without any special cases | ||
round(2::dec<2,0>, 2::i32) = 2::dec<3,0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DuckDB returns:
2,0
8,1
2,0
3,0
4,0
Thanks @EpsilonPrime, that's helpful. It looks like the scale of the decimal returned by DuckDB is also a function of the second parameter of Given the following query: select num, floor(num), ceil(num),
round(num, -2), round(num, -1), round(num, 0),
round(num, 1), round(num, 2), round(num, 3)
from (values (0.5), (-0.5), (999.9), (-999.9), (2.75)) as table(num) Spark produces the following output and type schema:
I propose, then, that the given an input of type return: |-
precision = min(P + 1, 38)
decimal?<precision, S> Which is not necessarily what it actually returns, but is the maximum precision/scale of what it could return (taking into account your earlier comment). How does that sound? |
The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals. The precision of the resultant decimal type is one greater than the precision of the input decimal to allow for rounding up to the next decimal digit. The scale of the resultant decimal type is the same as the input type since the result of rounding cannot add any further decimal places. Signed-off-by: Andrew Coleman <[email protected]>
1dbdb4b
to
92c964b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me.
Thanks for the approval @EpsilonPrime. Thanks! |
@jacques-n, would you be happy with this? ^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the two points correctly they are:
- If we had a notion of constant arguments we could potentially express a more accurate return type.
- Some engines choose to return data-dependent return types that are more narrow than the worst-case scenario.
Regarding (1) I don't think this corner case warrants the additional complexity of constant arguments. I'm not aware of any significant planner improvements that could be made by more clearly knowing the decimal scale/precision.
Regarding (2) I don't think this is a Substrait concern. Engines are always permitted to return more efficient encodings of data as long as the values fit the range defined by the return type. It's not possible for Substrait (which doesn't have access to data) to make a better decision with the data it does have.
So I approve.
The only thing that might be nice to clarify is what should happen if a 38 digit number consisting of all 9's is rounded up. Does it saturate (return the number unchanged) or emit an error?
I guess that would be implementation specific? In the case of Spark, it throws an error:
|
Ok. Let's worry about it in a follow-up. We can either document the error behavior or add an overflow option as we learn more. |
We do. We just haven't figured out yet how to expose to type derivation.
Then those engines don't implement these functions. |
Good point. I thought we removed them but I was thinking of optional arguments. Why can't we expose constant arguments to type derivation? It seems the concern was:
But if we mark
In that case it doesn't sound like there are any engines that implement these functions? I read the following as "engine decides return type based on data":
|
The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals. This is required for the spark module.