Time zone variant calculator: does it let us fully handle zoned datetime formatting? #5466
Instead of '1 less' couldn't you query the tz data to look for a transition from that data and use it? In other words, couldn't your table have both a standard offset and a daylight offset?
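Here is a minimal sketch of that suggestion. It assumes a hypothetical table that stores both a standard and a daylight offset (in seconds east of UTC) per IANA zone name and picks the variant by exact match instead of assuming a one-hour difference. The table layout and all names are illustrative, not ICU4X APIs.

```rust
use std::collections::HashMap;

/// Which named variant of a zone an offset corresponds to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneVariant {
    Standard,
    Daylight,
}

/// Hypothetical per-zone entry; both offsets are in seconds east of UTC.
struct ZoneOffsets {
    standard: i32,
    daylight: i32,
}

/// Chooses the variant by exact comparison with the stored offsets, so a DST
/// shift of 30 minutes (or any other amount) works the same as a 1-hour shift.
fn variant_for(
    table: &HashMap<&str, ZoneOffsets>,
    zone: &str,
    offset: i32,
) -> Option<ZoneVariant> {
    let entry = table.get(zone)?;
    if offset == entry.standard {
        Some(ZoneVariant::Standard)
    } else if offset == entry.daylight {
        Some(ZoneVariant::Daylight)
    } else {
        // Offset matches neither: the caller should fall back to a GMT-style format.
        None
    }
}
```

As later replies point out, a single (standard, daylight) pair per zone is not enough once a zone changes metazones or offsets over time.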
Actually, querying the offset table for that exact time …
My goal is, assuming an IXDTF string is correct (i.e., it has the correct offset for the given date, time, and time zone), to format that data without relying directly on the TZDB at runtime. I can store both the standard offset and the daylight offset for each time zone. I guess my questions, then, would be:
Actually, I guess the counterexample is when a city switches from one metazone to another, rather than just changing its transition dates, as happened last year in Chihuahua, Mexico, which switched from Mountain Time to Central Time: https://www.timeanddate.com/time/zone/mexico/chihuahua So maybe this mapping needs to go from metazones, not time zones, to their standard and daylight offsets?
A metazone's offsets are valid for that zone for a certain time period, so the Mexico_Pacific and America_Central offsets will be different. https://github.com/eggert/tz/blob/main/northamerica#L2731-L2732
<timezone type="America/Chihuahua">
  <usesMetazone to="1998-04-05 09:00" mzone="America_Central"/>
  <usesMetazone to="2022-10-30 08:00" from="1998-04-05 09:00" mzone="Mexico_Pacific"/>
  <usesMetazone from="2022-10-30 08:00" mzone="America_Central"/>
</timezone>
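To make the time dependence concrete, here is a sketch that resolves the metazone for America/Chihuahua from the three `usesMetazone` periods above. It assumes the `from`/`to` values are UTC and relies on the fixed-width "YYYY-MM-DD HH:MM" format sorting correctly as plain text; the `MetazonePeriod` type and `metazone_at` function are hypothetical.

```rust
/// One `<usesMetazone>` entry, applying on [from, to); `None` means unbounded.
struct MetazonePeriod {
    from: Option<&'static str>,
    to: Option<&'static str>,
    mzone: &'static str,
}

/// The America/Chihuahua entries quoted above.
const CHIHUAHUA: &[MetazonePeriod] = &[
    MetazonePeriod { from: None, to: Some("1998-04-05 09:00"), mzone: "America_Central" },
    MetazonePeriod { from: Some("1998-04-05 09:00"), to: Some("2022-10-30 08:00"), mzone: "Mexico_Pacific" },
    MetazonePeriod { from: Some("2022-10-30 08:00"), to: None, mzone: "America_Central" },
];

/// Finds the metazone in effect at `utc` ("YYYY-MM-DD HH:MM"). Because the format is
/// fixed-width, plain string comparison orders the timestamps correctly.
fn metazone_at(periods: &[MetazonePeriod], utc: &str) -> Option<&'static str> {
    periods
        .iter()
        .find(|p| p.from.map_or(true, |f| f <= utc) && p.to.map_or(true, |t| utc < t))
        .map(|p| p.mzone)
}
```

Since any per-metazone offsets would hang off whichever period applies, the same zone can map to different offsets depending on the date.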
Does a particular metazone always have the same offsets corresponding to its standard and daylight variants?
It seems that ICU4C determines the zone variant by reading the "is this datetime in DST?" bit from the TZDB. That bit appears to be available in TZif data, and the tzif crate exposes it: https://unicode-org.github.io/icu4x/rustdoc/tzif/data/tzif/struct.LocalTimeTypeRecord.html I still think my previous question is worth asking, though: does a particular metazone always have the same offsets corresponding to its standard and daylight variants? That could perhaps be data added to CLDR. Also, regarding whether the DST shift should be fixed at 1 hour: ICU4C currently seems to assume this in multiple places, such as https://github.com/unicode-org/icu/blob/eda184e6af63d6eee1b3a59c61d1695eef44fcb4/icu4c/source/i18n/timezone.cpp#L1241
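For reference, here is a sketch of how the DST bit can be read from TZif-style data: find the last transition at or before the instant and look up its local time type. `LocalTimeType` is a hypothetical simplification of the record described in RFC 8536, not the tzif crate's actual type. Instants after the last transition would really be governed by the footer TZ string, which this sketch ignores.

```rust
/// Simplified local time type record (see RFC 8536, the TZif format).
#[derive(Debug, Clone, Copy, PartialEq)]
struct LocalTimeType {
    utoff: i32,   // seconds east of UTC
    is_dst: bool, // the record's DST indicator
}

/// Returns the local time type in effect at `t` (Unix seconds).
/// `transitions` must be ascending; `type_idx[i]` is the type that starts at `transitions[i]`.
fn type_at(transitions: &[i64], type_idx: &[usize], types: &[LocalTimeType], t: i64) -> LocalTimeType {
    // Count the transitions that happened at or before `t`.
    let n = transitions.partition_point(|&tt| tt <= t);
    if n == 0 {
        // Before the first transition, RFC 8536 says local time type 0 applies.
        types[0]
    } else {
        types[type_idx[n - 1]]
    }
}
```

Note that this only yields a boolean; it says nothing about how large the DST shift is, which is the separate question raised above.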
My favorite counter-example to this is
And then there is also the case of Ireland, whose DST shift is inverted from what's typical:
As you noted, TZ strings invert the sign. So
FWIW, here's a markdown table of the output of
The sign inversion in POSIX TZ strings has already been noted, but I just found the quote below in the "TZ Variable" section of the GNU C Library manual.
I think the question "how to set the "ampa": {
"dt": "Pacific Daylight Time",
"st": "Pacific Standard Time"
} store "ampa": {
"-7:00": "Pacific Daylight Time",
"-8:00": "Pacific Standard Time"
}
This doesn't require any additional lookup at runtime, since we already have the offset, and it naturally handles any kind of DST (even multiple DST offsets).
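Here is a sketch of what the offset-keyed lookup could look like. It assumes the keys are "±H:MM" strings as in the example and that the offset has already been parsed from the IXDTF string (in seconds east of UTC); `parse_key` and `build_names` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Parses a key such as "-7:00" or "+5:30" into seconds east of UTC.
fn parse_key(key: &str) -> Option<i32> {
    let (sign, rest) = match key.strip_prefix('-') {
        Some(rest) => (-1, rest),
        None => (1, key.strip_prefix('+').unwrap_or(key)),
    };
    let (h, m) = rest.split_once(':')?;
    let minutes = h.parse::<i32>().ok()? * 60 + m.parse::<i32>().ok()?;
    Some(sign * minutes * 60)
}

/// Builds an offset -> name map for one metazone; returns None on a malformed key.
fn build_names(entries: &[(&str, &'static str)]) -> Option<BTreeMap<i32, &'static str>> {
    entries
        .iter()
        .map(|&(key, name)| parse_key(key).map(|offset| (offset, name)))
        .collect()
}
```

At format time, the offset from the IXDTF string is used directly as the key (`names.get(&offset)`), so no TZDB query is involved.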
I agree that I think data in this format would be ideal. "ampa": {
"-7:00": "Pacific Daylight Time",
"-8:00": "Pacific Standard Time"
}
This data could be added to … However, there are a few things to consider:

1) Has a metazone ever changed its associated time variants? If not, the data is straightforward, exactly as shown above. If so, this data could still reasonably be captured and added to the file. Consider a hypothetical situation where …
"ampa": {
"-7:00": "Pacific Daylight Time",
"-8:00": "Pacific Standard Time"
},
"amce": {
"usesTimeVariants": {
"-5:00": "Central Daylight Time",
"-6:00": "Central Standard Time",
"_to": "2024-09-06 00:00"
},
"usesTimeVariants": {
"-5:00": "Central Daylight Time",
"-5:30": "Central Standard Time",
"_from": "2024-09-06 00:00",
"_to": "2025-09-06 00:00"
},
"usesTimeVariants": {
"-5:00": "Central Daylight Time",
"-6:00": "Central Standard Time",
"_from": "2025-09-06 00:00"
}
},
This format seems reasonable and has the same structure as how time zone IDs are mapped to metazones in the same file.

2) What would happen if a time zone within an associated metazone observes the same time-variant offsets, but transitions between them at different datetimes than other zones in that metazone? One relevant example is the recent proposal for some West Coast states to observe permanent Daylight Saving Time: https://www.opb.org/article/2024/02/20/oregon-bill-to-end-daylight-saving-time-fails-legislature/ If this were the case, then the offset would remain … This all seems okay to me.

3) What would happen if an individual time zone wants to use different offsets than the current time-variant offsets established by the metazone? I am not aware of any existing case like this, but I think there are two reasonable solutions:
A) That time zone could switch to another metazone (either new or preexisting) that matches its desired offsets. This happens all the time.
B) We could add that offset data to CLDR.
"ampa": {
"-7:00": "Pacific Daylight Time",
"-7:30": "Pacific Cool New Time",
"-8:00": "Pacific Standard Time"
},
The time zones that use the prior offsets would go on as usual, and the time zone with the new offset would have its new localized name. I recall a conversation with @sffc years ago that perhaps … A format such as this would allow us to be agnostic of naming conventions, instead tying the localized name of the variant to an offset. However, there are a few more considerations to take into account in this case:

3.1) What if a time zone wants to add a new offset but have the same localized name as another offset?
"ampa": {
"-7:00": "Pacific Daylight Time",
"-7:30": "Pacific Standard Time",
"-8:00": "Pacific Standard Time"
},
This probably wouldn't cause a data ambiguity issue, but I think it would be incredibly confusing, as "Pacific Standard Time" would now be semantically ambiguous. This should not be allowed.

3.2) What if a metazone wants to add a new localized name for an offset that is already present?
"ampa": {
"-7:00": "Pacific Daylight Time",
"-7:00": "Pacific Cool New Time",
"-8:00": "Pacific Standard Time"
},
This would cause a data issue (a duplicate key) and should not be allowed.

Conclusion

I don't feel that I have the cycles to take on this work myself right now, but I would support collaborating on making this data available (if people agree it is sound). Here is an example of when the short metazone identifiers were added to that same CLDR file: https://unicode-org.atlassian.net/browse/CLDR-14607 Filing an issue on Jira would be a good next step if we reach a consensus here.
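One way to make the proposal concrete is sketched below, assuming the hypothetical `usesTimeVariants` periods above and treating their `_from`/`_to` bounds like the `usesMetazone` ones (UTC, "YYYY-MM-DD HH:MM"). Because repeated keys in a JSON object are not reliably preserved by parsers, the sketch models the periods as a list. `validate` rejects the two cases ruled out in 3.1 and 3.2, and `name_at` resolves a name from the period in effect plus the offset. All names here are illustrative.

```rust
use std::collections::HashSet;

/// Hypothetical time-bounded set of variant names, valid on [from, to).
/// Bounds are UTC "YYYY-MM-DD HH:MM" strings, which compare correctly as text.
struct TimeVariants {
    from: Option<&'static str>,
    to: Option<&'static str>,
    names: Vec<(i32, &'static str)>, // (offset in seconds east of UTC, localized name)
}

/// Rejects case 3.2 (one offset with two names) and case 3.1 (one name on two offsets).
fn validate(periods: &[TimeVariants]) -> Result<(), String> {
    for period in periods {
        let mut offsets = HashSet::new();
        let mut names = HashSet::new();
        for &(offset, name) in &period.names {
            if !offsets.insert(offset) {
                return Err(format!("offset {offset} has more than one name"));
            }
            if !names.insert(name) {
                return Err(format!("{name:?} is used for more than one offset"));
            }
        }
    }
    Ok(())
}

/// Looks up the name for `offset` in the period that contains `utc`.
fn name_at(periods: &[TimeVariants], utc: &str, offset: i32) -> Option<&'static str> {
    let period = periods
        .iter()
        .find(|p| p.from.map_or(true, |f| f <= utc) && p.to.map_or(true, |t| utc < t))?;
    period
        .names
        .iter()
        .find(|&&(o, _)| o == offset)
        .map(|&(_, name)| name)
}
```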
All questions of the form "what if a time zone wants to do something different from the rest of the metazone" should be answered by creating a new metazone. My expectation is that all zones in a metazone fully agree on offsets today and in the future, but maybe that's not guaranteed.
That would be much simpler and more stringent. I would agree with imposing these restrictions; I was just trying to think through all the cases.
Another counterexample to the 60-minute transition: https://www.atlasobscura.com/places/lord-howe-islands-time
I agree with the workaround of creating a new metazone if the offset invariants ever break down. Metazones are purely a CLDR/ICU construct, not a TZDB one, so we have a lot of latitude in how we handle them. For example, if all US West Coast states decided to abolish daylight saving time and that Pacific Time should be GMT-7 instead of GMT-8 (a proposal I don't support, but which is good for illustrative purposes), then we would need to create a new metazone such as … It is highly likely that such changes have already occurred in the last 50 years, and we should probably look for them in datagen.
As far as data sources are concerned, it seems perfectly fine to me for this data to be derived from TZDB. ICU4C currently uses TZDB to determine which zone variant to use when formatting, so if ICU4X used TZDB during datagen, we should be able to guarantee consistency with ICU4C. ICU4X could manually create new "private use" metazones as needed.
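As a sketch of the kind of datagen check this implies: given, for each zone, its metazone and the offsets it has actually observed (which datagen would presumably derive from TZDB), flag the zones that observe an offset their metazone has no name for. Those would be candidates for a new or private-use metazone. The input shapes and function name are hypothetical.

```rust
use std::collections::BTreeMap;

/// Returns the zones that observe an offset their metazone has no name for.
fn zones_needing_new_metazone(
    // metazone -> offsets (seconds east of UTC) that have localized names
    named_offsets: &BTreeMap<&'static str, Vec<i32>>,
    // (zone, metazone, offsets observed in the period being checked)
    zones: &[(&'static str, &'static str, Vec<i32>)],
) -> Vec<&'static str> {
    zones
        .iter()
        .filter(|(_, metazone, observed)| match named_offsets.get(metazone) {
            Some(named) => observed.iter().any(|o| !named.contains(o)),
            None => true, // unknown metazone: needs attention as well
        })
        .map(|(zone, _, _)| *zone)
        .collect()
}
```

This uses a subset check rather than equality on purpose. A zone such as America/Phoenix, which uses only one of its metazone's offsets, passes, whereas an equality check would flag every zone that skips DST.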
OK, one other issue I realized: there are numerous countries that use their own country name as the metazone. The first one I pulled is "kyrg", Kyrgyzstan: https://en.wikipedia.org/wiki/Kyrgyzstan_Time Kyrgyzstan has switched between UTC+5 and UTC+6 multiple times, but presumably the metazone has not changed.
Yeah, this was going to be my concern: cases where oddball metazones are tidally locked to a country. I assume this means that the "use the offset only" idea won't work?
I think it can still "work"; it's just something we need to factor in. A few ways of resolving this:
One other note: I very frequently encounter people using "PST" to mean Pacific Time, not specifically Pacific Standard Time, and similarly with EST, CST, and others. For example, it is very common to see people say "let's meet in San Francisco on September 7 at 10am PST", and if you show up at that time according to the TZDB/CLDR definition, you will be an hour late (unless it is a meetup of time zone nerds). What this means: this is all so imprecise anyway that we should just land something reasonable and otherwise encourage people to use city-based time zone names. Maybe CLDR could focus on adding a short location format, such as "LA Time" or "NYC Time", to use instead of the ambiguous names it currently uses.
Normal people (other than those who are super-familiar with how IANA time zones work, which is a very small overlap in the Venn diagram with "normal people") don't say "LA Time" or "NYC Time". So I'm not sure it would make sense to add that to CLDR. I understand the desire for consistency, but this seems to be a case where there's no evading the inconsistency of human language use.
My hypothesis is that "normal people" would understand what you meant by "LA Time", even if they haven't often seen it before, and it is also the least ambiguous form for an i18n library to produce.
Random comment on earlier replies:
I think the concept of ZoneVariant in the struct is problematic.
Random observation:
These are all technically correct, though confusing. They're both Mountain Time; it's just that Denver is on Mountain Daylight Time and Phoenix is on Mountain Standard Time, because Arizona does not observe DST. I would argue that this is a reason why populating the …

EDIT: To clarify, the "Mountain Time" strings above use the "generic non-location format". The UTS #35 spec defines several formats, with fallback rules between them:
Generic non-location format
Generic partial location format
Generic location format
Specific non-location format
Localized GMT format
ISO 8601 time zone formats
It was years ago, so I'm not sure whether the current ICU4X implementation is exactly the same, but I tried to implement the fallback rules according to the spec. The strings above carry enough information to use either the generic location format (e.g., "Phoenix Time") or the generic partial location format (e.g., "Mountain Time (Denver)").
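Here is a rough sketch of that disambiguation idea. It is a simplification, not the exact UTS #35 rule: it compares the zone's offset at the given instant with the offset of a reference zone for the same metazone (assumed here to be America/Denver for Mountain Time) and switches from the generic non-location name to a partial-location-style name when they differ. The function and its parameters are hypothetical.

```rust
/// Returns the bare metazone name ("Mountain Time") when the zone currently agrees
/// with the metazone's reference zone; otherwise appends the city to disambiguate.
fn generic_name(metazone_name: &str, city: &str, zone_offset: i32, reference_offset: i32) -> String {
    if zone_offset == reference_offset {
        metazone_name.to_string()
    } else {
        format!("{metazone_name} ({city})")
    }
}
```

With these inputs, Phoenix in August (-7:00 against Denver's -6:00) gets "Mountain Time (Phoenix)", while in January both are at -7:00 and the bare "Mountain Time" is used.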
FWIW, I think this is a nice solution to the problem described above: if there's a colloquial name for a time zone like "Pacific Time", it's still used, but with a disambiguator for less common cases like Arizona.
The observation about generic non-location being ambiguous is well known and largely working as intended. It should only be used if the location of the event is known from context. Here is the language I wrote for how to select your time zone style in semantic skeleta:
Example use cases where generic time zone style is acceptable:
Note: In most or all of these cases, it would be acceptable to say "local time" or simply drop the qualifier.
Example where the generic style is not acceptable and a different style should be used, unless the location is otherwise known from context:
My point is that there are enough legitimate use cases for the generic non-location format, but since it can introduce ambiguity, it should only be used if the developer opts in.
This seems to be the unambiguous version of the generic non-location format. We don't seem to support it in ICU4X, though? What we need for full correctness is a … If there is sufficient overlap between the offset list and the metazone list for each location, they could be combined, since the bulk of these structures will be the keys.
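If the two per-location structures do overlap heavily, the merge itself is simple. This sketch, with hypothetical value types, stores each time zone key once and keeps both payloads as optional halves of a tuple.

```rust
use std::collections::BTreeMap;

/// Merges two tables keyed by time zone ID so each key is stored only once.
/// Missing entries on either side become `None`.
fn combine<A: Clone, B: Clone>(
    metazones: &BTreeMap<String, A>,
    offsets: &BTreeMap<String, B>,
) -> BTreeMap<String, (Option<A>, Option<B>)> {
    let mut out: BTreeMap<String, (Option<A>, Option<B>)> = BTreeMap::new();
    for (id, a) in metazones {
        out.entry(id.clone()).or_default().0 = Some(a.clone());
    }
    for (id, b) in offsets {
        out.entry(id.clone()).or_default().1 = Some(b.clone());
    }
    out
}
```

In a real data struct the savings would come from sharing the key column; the `Option`s just show how non-overlapping entries could be represented.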
Re the generic partial location format, it sounds like we're meant to detect when a metazone is …
LGTM
#5466 Supersedes #5515
Co-authored-by: Shane F. Carr
At its core, ICU4X time zones have 4 fields, which fully determine the strings to be selected for formatting:
Let's say someone gives us an IXDTF string like:
2024-08-29T11:53:18-0700[America/Los_Angeles]
From this string, we can already populate two fields:
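The two fields in question are presumably the UTC offset and the time zone ID. Here is a minimal hand-rolled sketch (not the `ixdtf` crate's API) that pulls both out of the string. It assumes a numeric offset directly before the first bracketed annotation, does not handle "Z", and does not validate the date or time.

```rust
/// Extracts (offset in seconds east of UTC, IANA name) from a string such as
/// "2024-08-29T11:53:18-0700[America/Los_Angeles]".
fn parse_ixdtf_zone(s: &str) -> Option<(i32, &str)> {
    let open = s.find('[')?;
    let close = open + s[open..].find(']')?;
    // A leading '!' marks a critical annotation; it is not part of the name.
    let zone = s[open + 1..close].trim_start_matches('!');
    let before = &s[..open];
    // The offset starts at the last '+' or '-'; earlier '-' characters belong to the date.
    let sign_pos = before.rfind(|c: char| c == '+' || c == '-')?;
    let sign = if before.as_bytes()[sign_pos] == b'-' { -1 } else { 1 };
    let digits: String = before[sign_pos + 1..].chars().filter(|&c| c != ':').collect();
    if digits.len() != 4 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None; // e.g. "Z" or a malformed offset
    }
    let hours: i32 = digits[..2].parse().ok()?;
    let minutes: i32 = digits[2..].parse().ok()?;
    Some((sign * (hours * 3600 + minutes * 60), zone))
}
```

The IANA name would still need mapping to a BCP-47 ID (e.g. `uslax`) before it could be used as a data key.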
We have MetazoneCalculator, which takes the time portion of the string and lets us calculate the metazone field:
However, how do we calculate the ZoneVariant field?
I learned today that TZif files (at least version 2 and 3 files) contain a footer that looks like this:
The "8" in that footer means that this time zone's standard offset is 8 hours behind UTC. (Note that the sign is inverted from what we normally see.)
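To illustrate, here is a sketch that extracts the standard (and, if present, daylight) offset from a POSIX-style TZ footer and flips the sign to the usual east-positive convention. It is a partial parser written for this example: it ignores the transition rules after the first comma and applies the POSIX default of a daylight offset one hour ahead when none is given. As far as I know, the America/Los_Angeles footer is `PST8PDT,M3.2.0,M11.1.0`, and Europe/Dublin's is `IST-1GMT0,M10.5.0,M3.5.0/1`, where the daylight offset is lower than the standard one.

```rust
/// UTC offsets, in seconds east of UTC, found in a POSIX-style TZ string.
#[derive(Debug, PartialEq)]
struct PosixOffsets {
    std: i32,
    dst: Option<i32>,
}

/// Skips a zone abbreviation ("PST", or quoted like "<+0530>") and returns the rest.
fn skip_name(s: &str) -> Option<&str> {
    if let Some(rest) = s.strip_prefix('<') {
        let end = rest.find('>')?;
        Some(&rest[end + 1..])
    } else {
        let end = s.find(|c: char| !c.is_ascii_alphabetic()).unwrap_or(s.len());
        // POSIX requires unquoted abbreviations to be at least three letters.
        if end < 3 { None } else { Some(&s[end..]) }
    }
}

/// Parses "[+|-]hh[:mm[:ss]]" and returns (POSIX-signed seconds, rest).
fn parse_offset(s: &str) -> Option<(i32, &str)> {
    let (sign, s) = match *s.as_bytes().first()? {
        b'+' => (1, &s[1..]),
        b'-' => (-1, &s[1..]),
        _ => (1, s),
    };
    let end = s.find(|c: char| !(c.is_ascii_digit() || c == ':')).unwrap_or(s.len());
    let mut secs = 0;
    let mut unit = 3600; // hours, then minutes, then seconds
    for part in s[..end].split(':') {
        if unit == 0 {
            return None; // more than three components
        }
        secs += part.parse::<i32>().ok()? * unit;
        unit /= 60;
    }
    Some((sign * secs, &s[end..]))
}

/// POSIX offsets are "time to add to local time to get UTC", so they are negated here.
fn parse_posix_offsets(tz: &str) -> Option<PosixOffsets> {
    let rest = skip_name(tz)?;
    let (std_posix, rest) = parse_offset(rest)?;
    let std = -std_posix;
    if rest.is_empty() || rest.starts_with(',') {
        return Some(PosixOffsets { std, dst: None });
    }
    let rest = skip_name(rest)?;
    // With no explicit DST offset, POSIX puts daylight time one hour ahead of standard.
    let dst = match parse_offset(rest) {
        Some((dst_posix, _)) => -dst_posix,
        None => std + 3600,
    };
    Some(PosixOffsets { std, dst: Some(dst) })
}
```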
Does this mean that we could build a table with standard offsets and use that table to generate zone variants? For example, we could create a data file with the following data, which can all be generated from the TZDB:
Then, when reading the IXDTF string, we use the following algorithm to select the zone variant:
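Going by the first reply's reference to "1 less", the proposed selection seems to amount to the following. This is a hedged reconstruction, not the original algorithm text, with offsets in seconds east of UTC. The comments mark the cases that later replies raise (Lord Howe's 30-minute shift and Ireland's negative DST).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneVariant {
    Standard,
    Daylight,
}

/// Selects the variant from the zone's standard offset and the IXDTF offset,
/// assuming daylight time is always exactly one hour ahead of standard time.
fn select_variant(standard_offset: i32, ixdtf_offset: i32) -> Option<ZoneVariant> {
    match ixdtf_offset - standard_offset {
        0 => Some(ZoneVariant::Standard),
        3600 => Some(ZoneVariant::Daylight),
        // Any other difference is unrecognized, e.g. Lord Howe (+30 min), or Ireland,
        // where winter time is one hour behind the TZDB standard offset.
        _ => None,
    }
}
```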
Mechanically, we can generate this table by using a combination of our own tzif crate, which contains a struct ZoneVariantInfo with this information pre-parsed, and a tzif source, which could potentially be jiff_tzdb.
Note: the time zone ID would probably be stored as a BCP-47 ID, and the standard offset would be bit-packed into an i8. We could possibly fold this data into one of our existing data structs to be more efficient.
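Here is a sketch of one possible i8 packing, assuming offsets are stored in 15-minute units (an assumption, not a decided format). All offsets I know of in current use are multiples of 15 minutes and fit easily. Historical local mean time offsets do not, so they would need to be excluded or rounded.

```rust
/// Packs an offset (seconds east of UTC) into a signed count of 15-minute units.
/// Returns `None` if it is not a whole number of units or does not fit in an i8.
fn pack_offset(seconds: i32) -> Option<i8> {
    if seconds % 900 != 0 {
        return None;
    }
    let units = seconds / 900;
    if units < i32::from(i8::MIN) || units > i32::from(i8::MAX) {
        return None;
    }
    Some(units as i8)
}

/// Inverse of `pack_offset`.
fn unpack_offset(packed: i8) -> i32 {
    i32::from(packed) * 900
}
```

An i8 in these units covers roughly ±31 hours, which leaves plenty of headroom beyond the real-world range of about -12:00 to +14:00.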
Note: I assume that this mapping of time zone IDs to standard offsets is fairly stable over time, such that we do not need to worry about shipping updates at a cadence different than normal CLDR data updates.
Please help me understand: is the proposed algorithm correct and robust, or is it flawed in some edge cases?
@nekevss @leftmostcat @nordzilla @yumaoka @justingrant