-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix timestamp with timezone mapping in iceberg type converter #23534
base: master
Are you sure you want to change the base?
Fix timestamp with timezone mapping in iceberg type converter #23534
Conversation
8e1716e
to
38919d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some tests with column types Timestamp with timezone
?
+1. Let's add some end to end tests. Additionally, we may want to remove the validation added here, since I believe we should support this properly now: #22926 |
Also add test cases involving |
Just to clarify, do you want the timestamp with timezone to be a part of the table that is being partitioned or the type of the partition column? Currently it is not supported as one of the types for partition columns. |
Yes, that's right. But I think it's better for us to first figure out how to handle it in these cases when we start to support it. A very important question is, what format of long type data do we plan to actually store in data files for If we store the data following Iceberg spec, then we will lose the information of time zone; and if we store the data following Presto's format, then we may meet problems involving cross-engine compatibility. |
@hantangwangd I don't see this as a choice, we must store the data according to the Iceberg spec, which means we'll lose the embedded time zone information. This is fine--semantically, it's the same thing, and the only thing that might be confusing is the user, when retrieving stored Iceberg timestamps, will see that the timezones have been adjusted to UTC. But the point in time values will remain the same, and this is merely a limitation of the Iceberg table format. |
@tdcmeehan Completely agree with your viewpoint. That means we need to perform transformation logics for data with type of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits. One question about removing the verifyTypeSupported
method
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergNativeMetadata.java
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergTypes.java
Outdated
Show resolved
Hide resolved
a562f45
to
d877bcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me.
- Please add a document entry in https://prestodb.io/docs/current/connector/iceberg.html#type-mapping
- Squash all the commits into 1 "Fix timestamp with timezone mapping in iceberg type converter"
@@ -117,6 +117,10 @@ public static Type toPrestoType(org.apache.iceberg.types.Type type, TypeManager | |||
case TIME: | |||
return TimeType.TIME; | |||
case TIMESTAMP: | |||
Types.TimestampType timestampType = (Types.TimestampType) type.asPrimitiveType(); | |||
if (timestampType.shouldAdjustToUTC()) { | |||
return TimestampWithTimeZoneType.TIMESTAMP_WITH_TIME_ZONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add static import for this
this.batchReaderEnabledQueryRunner = createIcebergQueryRunner( | ||
ImmutableMap.of(), | ||
ImmutableMap.of(), | ||
ImmutableMap.of(PARQUET_BATCH_READ_OPTIMIZATION_ENABLED, "true"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no need to refactor the IcebergQueryRunner.createIcebergQueryRunner(...)
, just set hive.parquet-batch-read-optimization-enabled
to true
in extraProperties
would be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is another way, but adding this to extra properties actually causes the tests to break since it is an unused property there. This sets it as a configuration property, but it needs to be a session property. The session is actually passed to the distributed query runner builder in it's constructor, so we need some way to add properties to that session before building the runner. This is why I decided to make changes to IcebergQueryRunner
. Please let me know if this clears things up, if you have another approach in mind here I would certainly be open to it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, sorry for the overlooking that hive.parquet-batch-read-optimization-enabled
belongs to the config in presto-hive-common
. Since presto-main
does not dependent on presto-hive-common
, TestingPrestoServer
does not inject HiveCommonModule
either. So you are right currently we cannot set hive.parquet-batch-read-optimization-enabled
in extraProperties
here.
else { | ||
type.writeLong(blockBuilder, utcMillis); | ||
} | ||
type.writeLong(blockBuilder, utcMillis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real long type data for timestamp with tz stored in parquet file is of presto's format? This may lead to many problems. As guided by @tdcmeehan, we should store the data in Iceberg format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to modify the writer as well for this, but I need to verify. I noticed that we were improperly packing the date in the previous logic which led to incorrect values being read.
For example the following set of queries produced bad results:
create table t(t TIMESTAMP WITH TIME ZONE);
INSERT INTO t VALUES TIMESTAMP '1980-12-08 0:10:0 America/Los_Angeles';
presto:tpch> SELECT * FROM t;
t
-------------------------------
+46764-05-25 18:40:01.825 UTC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to modify the writer as well for this.
Agree, the bad results in your example seems to be the problem caused by writing data with the incorrect format. So we need to customize a dedicated writer logic for timestamp with tz, which transform the long values encoded in presto's format to long values encoded in iceberg's format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some changes that I thought would fix this, added the unpackMillisUtc
method to a new TimestampWithTimezoneValueWriter
that I added to parquet, but we are losing all timezone information in the unpacking bit shift. This theoretically shouldn't be a problem since we are supposed to be storing millisUtc
in the first section of the bits, and the last 12 are reserved for whatever timezone we want to display to the user. However this is clearly not the case, since the filter operations are failing, the program expects the millis UTC to be transformed based on the timezone info, to get an actual millis UTC.
This is some pretty misleading variable naming and I think it leaves us with two options...
- leave the system the same and build methods on top of it that apply the timezone part of the ts with tz to the millis part to get a correct instant (and change the variable names)
- change the way ts with tz is read so at the time of unpacking we already have the millis UTC correct.
Sorry if this is redundant, I couldn't find documentation about how iceberg stores ts w tz under the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referring to https://iceberg.apache.org/spec/#primitive-types, Iceberg defines timestamp tz as follows:
Timestamp, microsecond precision, with timezone
Timestamp values with time zone represent a point in time: values are stored as UTC and do not retain a source time zone (2017-11-16 17:10:34 PST is stored/retrieved as 2017-11-17 01:10:34 UTC and these values are considered identical).
So it just contains UTC values in micros and do not retain the source time zone.
By the way, did the filtering operation fail because we forgot to do the same convert for timestamp with tz in ExpressionConverter.getIcebergLiteralValue(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the filtering operation. I had to add TimestampWithTimezoneType
to the cases in expression converter but so far I have just treated it like the other date time types that are included. I tried adding unpacking to the expression converter for ts w tz and now we have the regular tests passing, but failing with the batch reader enabled. Looking into this now
Edit: fixed by adding the long value conversion (packing) to the plain batch values decoder
69a4878
to
361d23a
Compare
@@ -60,6 +60,9 @@ interface Int64TimeAndTimestampMicrosValuesDecoder | |||
void readNext(long[] values, int offset, int length) | |||
throws IOException; | |||
|
|||
void readNextWithTimezone(long[] values, int offset, int length) | |||
throws IOException; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because the decoder is accesses the individual raw values from parquet, not the batch reader. The batch reader has the column descriptor (metadata) which should tell it whether or not to pack the value with timezone, like is done in the regular column reader. The actual packing should be done within the decoder, so there needs to be a way for decoders to dynamically switch between with and without timezone mode.
The approach I took here is just to copy the readNext method from each implementation and add packDateTimewithZone
. A stateful approach (instance variable bool withTimestamp or something) would probably be more efficient for future development and code execution but for now I am seeing if this implementation works.
79d53ce
to
607edf0
Compare
a381447
to
c7deb43
Compare
public Object[][] createTestTimestampWithTimezoneData() | ||
{ | ||
return new Object[][] { | ||
{getQueryRunner()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we need a whole separate test for this. Can't we just create a dataProvider which passes in true/false values and lets us construct a valid session in the beginning of the test method? Then you can pass the session to all of the execute/assertQuery methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this please? I'm not sure what you are referring to as the separate test. I can have the data provider pass in one true and one false value and add a condition inside the test function itself, is that what you are asking for here? If so what purpose would that serve? Thanks
|
||
boolean isWithTimezone(); | ||
|
||
void setWithTimezone(boolean withTimezone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API for this is an anti-pattern. We should aim for the reader to be immutable. I think the implementors of this ValuesDecoder
should have a constructor where this is set instead.
@@ -69,7 +91,13 @@ public void readNext(long[] values, int offset, int length) | |||
final LongDictionary localDictionary = dictionary; | |||
for (int srcIndex = currentBuffer.length - currentCount; destinationIndex < endIndex; srcIndex++) { | |||
long dictionaryValue = localDictionary.decodeToLong(localBuffer[srcIndex]); | |||
values[destinationIndex++] = MICROSECONDS.toMillis(dictionaryValue); | |||
long millisValue = MICROSECONDS.toMillis(dictionaryValue); | |||
if (isWithTimezone()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having a branch statement inside the hot loop for a lot of these readers, I think we should set the "reading function" in the constructor of the reader as the reading behavior shouldn't change. Do you see a performance impact when this conditional is introduced?
for (int i = 0; i < block.getPositionCount(); i++) { | ||
if (!block.isNull(i)) { | ||
long value = unpackMillisUtc(type.getLong(block, i)); | ||
long scaledValue = writeMicroseconds ? MILLISECONDS.toMicros(value) : value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment with avoiding branching inside hot loop
3d1e9b9
to
8b3ce9e
Compare
Please let me know when this or other doc is added to the PR, and I'll be happy to review the documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
7c4c8b6
to
85d6261
Compare
* - ``TIMESTAMP`` | ||
- ``TIMESTAMP_WITH_TIMEZONE`` | ||
* - ``STRING`` | ||
- ``VARCHAR`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need the mapping for the table below (prestodb -> iceberg)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not sure about this. The below table contains...
* - ``TIMESTAMP``
- ``TIMESTAMP WITHOUT ZONE``
* - ``TIMESTAMP WITH TIMEZONE``
- ``TIMESTAMP WITH ZONE``
I suppose with and without zone just represent isAdjustedUTC
in this case? Either way it looks like this part is covered. Maybe we should standardize the way we discuss these in the documentation though
} | ||
|
||
@Override | ||
public void readNext(long[] values, int offset, int length) | ||
{ | ||
int endOffset = offset + length; | ||
for (int i = offset; i < endOffset; i++) { | ||
values[i] = MICROSECONDS.toMillis(innerReader.readLong()); | ||
long curValue = MICROSECONDS.toMillis(innerReader.readLong()); | ||
packFunction.pack(curValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to use the return value of the pack
function?
07b1613
to
6feb435
Compare
9e8764b
to
e2f6634
Compare
24a43af
to
4c4117b
Compare
be489f1
to
8add1c4
Compare
8add1c4
to
67895c0
Compare
Description
Fixes bug described in #23529