-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Substrait plan read relation baseSchema does not include the struct with type information #12244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I belive this was fixed in #12245 ? |
The behavior was modified in #12245 and the original issue looks addressed but the plans I'm getting don't validate. DataFusion currently hardcodes struct nullability to
The "baseSchema": {
"names": [
"species",
"island",
"bill_length_mm",
"bill_depth_mm",
"body_mass_g",
"sex",
"year"
],
"struct": {
"types": [
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"i32": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"i32": {
"nullability": "NULLABILITY_NULLABLE"
}
}
]
}
}, The above implicitly sets the nullability on the baseSchema to
I can get the plan to validate if I set nullability to
Is the validator right and should DataFusion change its behavior here? |
I didn‘t see something in substrait.io where it‘d be explicitly said, but the example there (https://substrait.io/tutorial/sql_to_substrait/#types-and-schemas) does use NULLABILITY_REQUIRED. I don’t think it makes any difference in reality, given it’s the fields inside that struct type that matter, not the root struct itself. If the validator is happier with NULLABILITY_REQUIRED, fine by me to change it to that. |
Thanks @Blizzara. I'll file a PR to change this soon and we can close this issue out. I asked for some clarification on the Substrait Slack so we can be sure it's the right change and I'll create a PR once I'm sure. Hopefully also update the Substrait docs too. |
I was able to confirm this is probably something Substrait should make required and put up a PR with the change at #15011. That should close this issue out. |
Describe the bug
Datafusion produces substrait plans that do not include a struct with type information
It should look more like this to be valid
To Reproduce
Generate any substrait plan that includes a read relation and you'll be able to see that the plan output doesn't include the struct field with type information in the baseSchema.
base_schema is a
NamedStruct
https://substrait.io/relations/logical_relations/#__tabbed_1_1
https://substrait.io/types/named_structs/
Expected behavior
No response
Additional context
You can also vaidate plans by running them through the substrait-validator
The text was updated successfully, but these errors were encountered: