feat: add decimal argument support to round function #713

andrew-coleman · 2024-09-26T15:43:15Z

The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals. This is required for the spark module.

EpsilonPrime · 2024-10-01T06:44:10Z

extensions/functions_rounding.yaml

+              and this value cannot be exactly represented, this specifies how
+              to round it.
+
+                - TIE_TO_EVEN: round to nearest value; if exactly halfway, tie


Can this happen with decimal representations? I'd argue all of the floating point handling stuff here does not apply.

andrew-coleman · 2024-10-15

Why would these not apply? For example, the input value 2.5 can be represented exactly in a decimal type, but rounding it to the nearest integer gives 2 if the rounding mode is TIE_TO_EVEN and 3 if the mode is TIE_AWAY_FROM_ZERO.
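To make the point above concrete, here is a minimal sketch using Python's standard `decimal` module. The mapping of TIE_TO_EVEN to `ROUND_HALF_EVEN` and TIE_AWAY_FROM_ZERO to `ROUND_HALF_UP` is an assumption made for illustration; it is not taken from the Substrait spec.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

value = Decimal("2.5")  # exactly representable as a decimal

# Assumed mapping: TIE_TO_EVEN behaves like Python's ROUND_HALF_EVEN.
tie_to_even = value.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)

# Assumed mapping: TIE_AWAY_FROM_ZERO behaves like ROUND_HALF_UP, which
# (despite its name) breaks ties away from zero for both signs.
tie_away = value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
neg_tie_away = Decimal("-2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(tie_to_even, tie_away, neg_tie_away)  # 2 3 -3
```

So the rounding option still changes the result even when the input itself is exact.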

EpsilonPrime · 2024-10-01T06:45:15Z

extensions/functions_rounding.yaml

@@ -268,3 +268,43 @@ scalar_functions:
              AWAY_FROM_ZERO, TIE_DOWN, TIE_UP, TIE_TOWARDS_ZERO, TIE_TO_ODD ]
        nullability: DECLARED_OUTPUT
        return: fp64?
+      - args:


For all other decimal functionality we have placed them in _decimal.yaml files. Not sure if we want to have just this one function in a file by itself though.

EpsilonPrime · 2024-10-01T06:46:56Z

extensions/functions_rounding.yaml

+
+              When `s` is a negative number, the rounding is
+              performed to the left side of the decimal point
+              as specified by `s`.


Does this operation affect the scale? We should probably clarify that here.

andrew-coleman · 2024-10-15

I guess this function could return a different decimal type (i.e. reduce the precision and the scale parameters), but I was working on the assumption that it would just return a different value. I'm not sure if that is what you are asking.

jacques-n · 2024-10-16

Let's not assume. Let's add some expected behaviors here. Once we get tests inside core, we can transplant those into test cases. And if it is the behavior of Spark we're trying to match, we should probably just put this in a Spark function file (or name it spark_round here). Decimal behavior often differs considerably between systems.

andrew-coleman · 2025-02-06T09:45:14Z

Just revisiting this. I've updated the parameters of the decimal return type to match the logic used by the Spark round functions:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1492

I note the comment here: #671 (comment)
Does this still apply? If so, how would you suggest I work around this since the output decimal scale depends on that second argument?

Thanks!

EpsilonPrime

I don't think we can use any constants in the calculation. The s parameter in round could be not a constant although I suspect some backends only will allow constants here. If it is not a constant, then we know very little about the result. If the number turns out to be negative the scale could even increase. Because that second argument could change the result in any manner, we likely have to return the maximum scale and precision here.

I will take it as an action to try running these tests against a backend (as the consumer testing does) to see if the tests are functional/correct.

EpsilonPrime · 2025-02-14T07:02:35Z

tests/cases/rounding_decimal/round_decimal.test

+
+# negative_rounding: Examples with negative rounding
+round(2::dec<2,0>, -2::i32) = 0::dec<2,0>
+round(123::dec<2,0>, -2::i32) = 100::dec<2,0>


There are three digits here, so the precision needs to be 3.

The last two lines should be:

round(123::dec<3,0>, -2::i32) = 100::dec<3,0>
round(8793::dec<4,0>, -2::i32) = 8800::dec<4,0>
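The corrected cases can be checked with a short sketch built on Python's `decimal` module. The helper names `round_decimal` and `fits` are hypothetical, and half-up tie-breaking is an assumption (none of these inputs is a tie, so the mode does not affect them).

```python
from decimal import Decimal, ROUND_HALF_UP


def round_decimal(value: Decimal, s: int) -> Decimal:
    """Round to s decimal places; a negative s rounds left of the point."""
    quantum = Decimal(1).scaleb(-s)  # s=-2 gives 1E+2, s=2 gives 0.01
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


def fits(value: Decimal, precision: int, scale: int) -> bool:
    """True if value is representable as dec<precision, scale>."""
    unscaled = value.scaleb(scale)  # shift the scale digits left of the point
    is_integer = unscaled == unscaled.to_integral_value()
    return is_integer and abs(unscaled) < Decimal(10) ** precision


# 123 has three digits, so it cannot be a dec<2,0> literal.
print(fits(Decimal(123), 2, 0), fits(Decimal(123), 3, 0))

# The corrected test lines: rounding to hundreds.
print(round_decimal(Decimal(123), -2) == 100)
print(round_decimal(Decimal(8793), -2) == 8800)
```

Note that `quantize` returns values such as `1E+2`; they compare equal to 100 but print in exponent form.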

EpsilonPrime · 2025-02-21T02:44:48Z

tests/cases/rounding_decimal/ceil_decimal.test

Since this file is in the rounding_decimal directory, I'd name it ceil.test.

EpsilonPrime · 2025-02-21T03:58:36Z

tests/cases/rounding_decimal/ceil_decimal.test

+### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml'
+
+# basic: Basic examples without any special cases
+ceil(2.25::dec<8,2>) = 3::dec<7,0>


FWIW, DuckDB returns decimal<8,0> for the first two and decimal<2,0> for the last one (and decimal<8,0> for the floor tests too). There may be some variation between systems here.

EpsilonPrime · 2025-02-21T04:02:02Z

tests/cases/rounding_decimal/round_decimal.test

+### SUBSTRAIT_INCLUDE: '/extensions/functions_rounding_decimal.yaml'
+
+# basic: Basic examples without any special cases
+round(2::dec<2,0>, 2::i32) = 2::dec<3,0>


DuckDB returns:

2,0
8,1
2,0
3,0
4,0

andrew-coleman · 2025-02-25T14:58:18Z

Thanks @EpsilonPrime, that's helpful. It looks like the scale of the decimal returned by DuckDB is also a function of the second parameter of round() as is the case with Spark.

Given the following query:

select num, floor(num), ceil(num),
     round(num, -2), round(num, -1), round(num, 0),
     round(num, 1), round(num, 2), round(num, 3)
from (values (0.5), (-0.5), (999.9), (-999.9), (2.75)) as table(num)

Spark produces the following output and type schema:

+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+
|    num|FLOOR(num)|CEIL(num)|round(num, -2)|round(num, -1)|round(num, 0)|round(num, 1)|round(num, 2)|round(num, 3)|
+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+
|   0.50|         0|        1|             0|             0|            1|          0.5|         0.50|         0.50|
|  -0.50|        -1|        0|             0|             0|           -1|         -0.5|        -0.50|        -0.50|
| 999.90|       999|     1000|          1000|          1000|         1000|        999.9|       999.90|       999.90|
|-999.90|     -1000|     -999|         -1000|         -1000|        -1000|       -999.9|      -999.90|      -999.90|
|   2.75|         2|        3|             0|             0|            3|          2.8|         2.75|         2.75|
+-------+----------+---------+--------------+--------------+-------------+-------------+-------------+-------------+

root
 |-- num: decimal(5,2) (nullable = false)
 |-- FLOOR(num): decimal(4,0) (nullable = true)
 |-- CEIL(num): decimal(4,0) (nullable = true)
 |-- round(num, -2): decimal(4,0) (nullable = true)
 |-- round(num, -1): decimal(4,0) (nullable = true)
 |-- round(num, 0): decimal(4,0) (nullable = true)
 |-- round(num, 1): decimal(5,1) (nullable = true)
 |-- round(num, 2): decimal(6,2) (nullable = true)
 |-- round(num, 3): decimal(6,2) (nullable = true)
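One rule that reproduces the `round` rows of this schema is sketched below. It is inferred from this single input type, assumes the second argument is a constant known at planning time, and the function name `observed_round_type` is hypothetical; the linked Spark source remains the authority, and the negative branch in particular is not pinned down by one example.

```python
MAX_PRECISION = 38


def observed_round_type(p: int, s: int, places: int) -> tuple[int, int]:
    """Guess the (precision, scale) of round(decimal(p, s), places)."""
    # One spare integral digit for a carry such as 999.9 -> 1000.
    integral_digits = p - s + 1
    if places < 0:
        # Rounding left of the point: the result has no fractional digits.
        precision = max(integral_digits, -places + 1)
        return (min(precision, MAX_PRECISION), 0)
    # Rounding cannot add decimal places beyond the input scale.
    new_scale = min(s, places)
    return (min(integral_digits + new_scale, MAX_PRECISION), new_scale)


for places in (-2, -1, 0, 1, 2, 3):
    print(places, observed_round_type(5, 2, places))
```

Under this reading, `round(num, 2)` and `round(num, 3)` share a type because the scale is capped at the input scale, not because of the data values.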

I propose, then, that given an input of type decimal<P, S>, the return type expression should be:

        return: |-
          precision = min(P + 1, 38)
          decimal?<precision, S>

This is not necessarily the type an engine actually returns, but it is the maximum precision and scale of what it could return (taking into account your earlier comment).
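As a rough check of the proposal, the sketch below computes the declared type and tests whether each type Spark reported for the decimal(5,2) input fits inside it. The helper names are hypothetical, and "fits" is taken to mean no more integral digits and no more fractional digits than the declared type, which is an assumption about what "maximum" should mean here.

```python
def declared_round_type(p: int, s: int) -> tuple[int, int]:
    """Proposed return type: precision = min(P + 1, 38), scale = S."""
    return (min(p + 1, 38), s)


def fits_within(actual: tuple[int, int], declared: tuple[int, int]) -> bool:
    """True if every value of `actual` is representable in `declared`."""
    (ap, a_scale), (dp, d_scale) = actual, declared
    return a_scale <= d_scale and (ap - a_scale) <= (dp - d_scale)


declared = declared_round_type(5, 2)

# Types Spark reported for round(num, -2) through round(num, 3).
observed = [(4, 0), (4, 0), (4, 0), (5, 1), (6, 2), (6, 2)]

print(declared)
print(all(fits_within(t, declared) for t in observed))
```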

How does that sound?

The round function has a number of variants to support different numeric types. This commit adds support for rounding decimals.

The precision of the resultant decimal type is one greater than the precision of the input decimal to allow for rounding up to the next decimal digit. The scale of the resultant decimal type is the same as the input type since the result of rounding cannot add any further decimal places.

Signed-off-by: Andrew Coleman <[email protected]>

EpsilonPrime

Works for me.

andrew-coleman · 2025-03-05T13:46:56Z

Thanks for the approval @EpsilonPrime.
Would it be possible to merge this? I can then follow up with the Java implementation.
cc @vbarua @jacques-n @westonpace @cpcloud

Thanks!

andrew-coleman · 2025-03-11T09:07:27Z

@jacques-n, would you be happy with this? ^^

westonpace

If I understand the two points correctly they are:

1. If we had a notion of constant arguments we could potentially express a more accurate return type.
2. Some engines choose to return data-dependent return types that are more narrow than the worst-case scenario.

Regarding (1) I don't think this corner case warrants the additional complexity of constant arguments. I'm not aware of any significant planner improvements that could be made by more clearly knowing the decimal scale/precision.

Regarding (2) I don't think this is a Substrait concern. Engines are always permitted to return more efficient encodings of data as long as the values fit the range defined by the return type. It's not possible for Substrait (which doesn't have access to data) to make a better decision with the data it does have.

So I approve.

The only thing that might be nice to clarify is what should happen if a 38 digit number consisting of all 9's is rounded up. Does it saturate (return the number unchanged) or emit an error?

andrew-coleman · 2025-03-13T13:47:52Z

> The only thing that might be nice to clarify is what should happen if a 38 digit number consisting of all 9's is rounded up. Does it saturate (return the number unchanged) or emit an error?

I guess that would be implementation specific?

In the case of Spark, it throws an error:

spark.sql("select round(99999999999999999999999999999999999999, -1)").show()

org.apache.spark.SparkArithmeticException: [DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 39 exceeds max precision 38.
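The arithmetic behind this overflow can be sketched with plain Python integers. The two policies below ("error" and "saturate") are hypothetical names for the options discussed in this thread, not behavior defined by Substrait, and the rounding helper only handles non-negative values.

```python
MAX_PRECISION = 38


def round_half_up_to_tens(unscaled: int) -> int:
    """Round a non-negative integer to the nearest multiple of 10, ties up."""
    return (unscaled + 5) // 10 * 10


def num_digits(n: int) -> int:
    return len(str(abs(n)))


def apply_overflow_policy(n: int, policy: str) -> int:
    """Hypothetical handling of a result that needs more than 38 digits."""
    if num_digits(n) <= MAX_PRECISION:
        return n
    if policy == "error":
        raise OverflowError(
            f"decimal precision {num_digits(n)} exceeds max {MAX_PRECISION}"
        )
    # "saturate": clamp to the largest 38-digit value.
    return 10**MAX_PRECISION - 1


all_nines = 10**38 - 1                      # 38 digits, all 9s
rounded = round_half_up_to_tens(all_nines)  # carries into a 39th digit
print(num_digits(all_nines), num_digits(rounded))  # 38 39
print(apply_overflow_policy(rounded, "saturate") == all_nines)  # True
```

With the "error" policy the same call raises, which mirrors the Spark behavior quoted above.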

westonpace · 2025-03-13T15:12:39Z

Ok. Let's worry about it in a follow-up. We can either document the error behavior or add an overflow option as we learn more.

jacques-n · 2025-03-15T16:39:15Z

> If we had a notion of constant arguments

We do. We just haven't figured out yet how to expose to type derivation.

> Some engines choose to return data-dependent return types that are more narrow than the worst-case scenario.

Then those engines don't implement these functions.

westonpace · 2025-03-19T17:10:28Z

> We do. We just haven't figured out yet how to expose to type derivation.

Good point. I thought we removed them but I was thinking of optional arguments. Why can't we expose constant arguments to type derivation? It seems the concern was:

> The s parameter in round could be not a constant although I suspect some backends only will allow constants here.
> (@EpsilonPrime)

But if we mark s as a constant argument, then this concern no longer seems valid?

> Then those engines don't implement these functions.
> (@jacques-n)

In that case it doesn't sound like there are any engines that implement these functions? I read the following as "engine decides return type based on data":

 |-- round(num, 2): decimal(6,2) (nullable = true)
 |-- round(num, 3): decimal(6,2) (nullable = true)

andrew-coleman requested review from jacques-n, cpcloud, westonpace, EpsilonPrime and vbarua as code owners September 26, 2024 15:43

EpsilonPrime reviewed Oct 1, 2024

EpsilonPrime self-assigned this Dec 11, 2024

andrew-coleman force-pushed the round branch 2 times, most recently from 97b20bd to 1dbdb4b Compare February 6, 2025 09:43

andrew-coleman requested a review from EpsilonPrime February 12, 2025 10:36

EpsilonPrime reviewed Feb 14, 2025

EpsilonPrime reviewed Feb 21, 2025

andrew-coleman force-pushed the round branch from 1dbdb4b to 92c964b Compare February 25, 2025 15:33

andrew-coleman requested a review from EpsilonPrime February 28, 2025 07:38

EpsilonPrime approved these changes Mar 4, 2025

westonpace approved these changes Mar 13, 2025

westonpace merged commit eb696b5 into substrait-io:main Mar 13, 2025
13 checks passed

andrew-coleman deleted the round branch March 13, 2025 15:20
