Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why isn't a datetime string like 2024-08-24T14:00:00-05 allowed to be parsed into a ZonedDateTime? #2930

Closed
BurntSushi opened this issue Aug 25, 2024 · 17 comments
Labels

Comments

@BurntSushi
Copy link

In particular, one possible result of parsing 2024-08-24T14:00:00-05 would be as if it were written 2024-08-24T14:00:00-05[-05]?

Some users of Jiff are running into this restriction and I guess it seems somewhat limiting for the sake of being limiting in some respect. In particular, a ZonedDateTime isn't just limited to an instant in a time zone, but it can be an instant in a fixed offset as well. So it seems like it would be fine to accept just a 2024-08-24T14:00:00-05 as-is. Is this meant more as a gentle nudging toward using RFC 9557 everywhere? Or is there a footgun here that this restriction is preventing that I'm not seeing?

And if Jiff were to allow parsing of 2024-08-24T14:00:00-05 directly into its analog of ZonedDateTime, how bad of a deviation do you think that would be in your opinion?

@nekevss
Copy link

nekevss commented Aug 25, 2024

Huh, isn't 2024-08-24T14:00:00-05[-05] supported by the specification?

Temporal grammar lists UTCOffset as taking ASCIISign Hour as a valid, and the same goes for RFFC3339 and RFC9557 by extension. Although, I could definitely be missing something.

FWIW, I did test the below cases in ixdtf locally, and they seem to pass as well.

#[test]
fn test_hour_utc_offset() {
    let tz_test = "2024-08-24T14:00:00-05[-05]";
    let result = IxdtfParser::from_str(&tz_test).parse();
    assert!(result.is_ok());

    let tz_test = "2024-08-24T14:00:00-05";
    let result = IxdtfParser::from_str(&tz_test).parse();
    assert!(result.is_ok());
}

So at least from that perspective, there wouldn't be any deviation 😄

@BurntSushi
Copy link
Author

Huh, isn't 2024-08-24T14:00:00-05[-05] supported by the specification?

Just to be clear, do you mean 2024-08-24T14:00:00-05?

If you try to parse 2024-08-24T14:00:00-05 using Temporal.ZonedDateTime.from on the documentation page, it fails with an error indicating that a bracketed time zone is required. Is that version of Temporal out of sync with the spec?

@BurntSushi
Copy link
Author

It's not the grammar I'm asking about here. It makes sense that the lower level parsing routine accepts it. It's about what ZonedDateTime can be parsed from specifically.

@nekevss
Copy link

nekevss commented Aug 25, 2024

Oh, sorry! 😅 The documentation isn't out of sync. ZonedDateTime in Temporal does require TimeZoneAnnotation according to the grammar.

But I'd 100% defer to the champions on this in general.

@justingrant
Copy link
Collaborator

is there a footgun here that this restriction is preventing that I'm not seeing?

Yep! There is definitely a footgun.

If you have a ZonedDateTime object, then callers expect that they can derive other ZDT objects from it, e.g. by adding a Duration, by using with, etc. In order for those derived objects to be accurate, the original ZDT needs to know how offsets will change for any Instant in the resulting derived object. In other words, the original ZDT needs to know its time zone in order for derived objects to be correct.

If Temporal allowed parsing RFC3339 strings like 2024-08-24T14:00:00-05:00 into a ZonedDateTime, then any code that received a ZDT object could no longer know whether the time zone in the ZDT was an actual time zone, or whether an ignorant or lazy developer just parsed a timezone-less timestamp string.

In other words, if a developer tries to parse a timezone-less string into a ZDT, it's almost certainly a programmer bug not an intentional bypassing of the ZDT guardrails. Temporal is pretty agressive across the API at making "likely bug" cases illegal. For example, you also can't parse a string like 2024-08-24T14:00:00Z into a PlainDateTime because per RFC9557 the Z means "I don't know the time zone" so it would be a mistake for developers to assume that the calendar date or wall-clock time of that string is meaningful.

Similarly, to prevent programmer bugs from timezone-less strings, we disallow parsing strings into a ZDT unless there's a time zone annotation. This ensures that developers won't accidentally parse a timezone-less string and then treat the resulting ZDT as if it had a real time zone. They can still fake it using an offset time zone, but they have to opt into extra work to do it, which makes is somewhat more likely that they actually are intending to break the rules and are aware of the consequences of doing so: that downstream consumers of that ZDT can't rely on its time zone to derive more ZDT objects.

Instead, users who have RFC3339 strings should be parsing those strings into Temporal.Instant objects, and then using instant.withTimeZone() to add the actual time zone of the timestamp to create a ZDT that can be used to derive other ZDT objects.

Out of curiosity, what are the use cases where people are wanting to parse these strings into a ZDT? Why isn't Instant good enough?

If the need is just to get the individual components (offset, year, month, day, hour, ...) from an RFC3339 string, then a developer can use this (intentionally less ergonomic than regular parsing in order to discourage accidental use) code like this:

function getOffsetOfRFC3339String(s) {
  return Temporal.Instant.from(s).toZonedDateTimeISO(s).offset;
}
getOffsetOfRFC3339String('2024-08-24T14:00:00-05:00')
// => '-05:00'

I'd strongly advise against passing ZDTs parsed in this way to any other code, because it's not really a time zone and will break downstream code.

@justingrant
Copy link
Collaborator

And if Jiff were to allow parsing of 2024-08-24T14:00:00-05 directly into its analog of ZonedDateTime, how bad of a deviation do you think that would be in your opinion?

Sorry, forgot to answer this question directly in my earlier comment: I think this would be bad! It would mean that Jiff's ZDT could not be trusted by downstream callers, because there'd be no way to know if the time zone of the ZDT is a real human time zone, or a programmer bug where they parsed a timezone-less string as if it had a real time zone.

@BurntSushi
Copy link
Author

Thanks for the reply @justingrant! Much appreciated. I now understand the motivation on Temporal's side here much better.

If the need is just to get the individual components (offset, year, month, day, hour, ...) from an RFC3339 string, then a developer can use this (intentionally less ergonomic than regular parsing in order to discourage accidental use) code like this:

function getOffsetOfRFC3339String(s) {
  return Temporal.Instant.from(s).toZonedDateTimeISO(s).offset;
}
getOffsetOfRFC3339String('2024-08-24T14:00:00-05:00')
// => '-05:00'

Ah yeah this is interesting. The downside with this though is that it isn't just less ergonomic, but it's also more expensive since it appears to need to parse s twice. Jiff might have a different weighting of the trade-offs here than Temporal. With that said, you can achieve the equivalent with Jiff's strptime API.

I also didn't realize you could do that with the Temporal API. I'm not sure if the docs for Instant.toZonedDateTimeISO are out of sync or not, but I'm not sure that behavior is something I would have inferred from the current docs.

Out of curiosity, what are the use cases where people are wanting to parse these strings into a ZDT? Why isn't Instant good enough?

I am still trying to understand that myself. Here's one example. You'll notice that I phrase "I do not understand" in a few different ways. :-)

I filed this issue because I also wanted to make sure I understood things from the other side too here, even if the use cases end up not being compelling enough.

Temporal is pretty agressive across the API at making "likely bug" cases illegal.

Yeah this makes sense. It's a big part of what attracted me to Temporal. :-)

Sorry, forgot to answer this question directly in my earlier comment: I think this would be bad! It would mean that Jiff's ZDT could not be trusted by downstream callers, because there'd be no way to know if the time zone of the ZDT is a real human time zone, or a programmer bug where they parsed a timezone-less string as if it had a real time zone.

I think this is the crux of the matter. And in particular:

This ensures that developers won't accidentally parse a timezone-less string and then treat the resulting ZDT as if it had a real time zone.

The way I conceptualize this is that the [-05] in 2024-08-24T14:00:00-05[-05] is an extra assertion meant to convey, "yes, I know what I am doing, this is a timezone-less string but it's okay in this specific case."

My sense here is that it might be worth Jiff growing a opt-in option to parse a ZDT from a timezone-less RFC3339 instant (accepting a fixed offset, but not Zulu). I think this would match the spirit of "intentionally less ergonomic than regular parsing in order to discourage accidental use" but without requiring double-parsing. It would look something like this:

use jiff::fmt::temporal::DateTimeParser;

static PARSER: DateTimeParser = DateTimeParser::new().time_zone_annotation_requried(false);

let zdt = PARSER.parsed_zoned("2024-08-24T14:00:00-05").unwrap();
assert_eq!(zdt.offset(), jiff::tz::offset(-5));

This is a lot more verbose than the typical way to parse a ZDT in Jiff. That is, "2024-08-24T14:00:00-05".parse::<Zoned>() would still fail.

@justingrant
Copy link
Collaborator

I also didn't realize you could do that with the Temporal API. I'm not sure if the docs for Instant.toZonedDateTimeISO are out of sync or not, but I'm not sure that behavior is something I would have inferred from the current docs.

It's not specific to Instant.toZonedDateTimeISO](https://tc39.es/proposal-temporal/docs/instant.html#toZonedDateTimeISO). Anywhere time zones are accepted as a property or function argument, you can pass any of the following:

  • a time zone identifier string
  • a ZonedDateTime instance - its time zone will be used
  • an RFC 9557 string that includes a time zone annotation - the annotation will be used
  • an RFC 3339 string, in which case the offset will be used as the time zone

The way I conceptualize this is that the [-05] in 2024-08-24T14:00:00-05[-05] is an extra assertion meant to convey, "yes, I know what I am doing, this is a timezone-less string but it's okay in this specific case."

That's correct, and because the caller has gone through the extra effort to opt in, then ZDT can parse it.

I think this would match the spirit of "intentionally less ergonomic than regular parsing in order to discourage accidental use" but without requiring double-parsing.

I guess I would first ask: what use cases require a ZDT in this case as opposed to an Instant? What are callers trying to do with the returned ZDT that they can't do with an Instant?

If the issue is purely a parsing thing ("I want to know every individual field of an RFC 9557 string) then you may be better served by a parsing-only function instead of a function that returns a ZDT.

@BurntSushi
Copy link
Author

BurntSushi commented Aug 28, 2024

I guess I would first ask: what use cases require a ZDT in this case as opposed to an Instant? What are callers trying to do with the returned ZDT that they can't do with an Instant?

Yes, I agree. The use cases need to be flushed out more.

If the issue is purely a parsing thing ("I want to know every individual field of an RFC 9557 string) then you may be better served by a parsing-only function instead of a function that returns a ZDT.

That's a possibility too. Jiff supports this via its BrokenDownTime via the strptime API. And one can craft strptime-format strings that resemble the Temporal ISO 8601 grammar, but not every corner case. So it might make sense to have a similar BrokenDownTime type that can be parsed from Jiff's implementation of Temporal's ISO 8601 grammar. I think my hang-up there is that I believe there are ambiguous cases between "parse something that contains a date" and "parse something that is only a time." But I'd have to go back and check that.

@justingrant
Copy link
Collaborator

Yes, I agree. The use cases need to be flushed out more.

Yep, that's the place to start. Let's better understand what users are trying to do, and why, and then we can figure out if ZDT is the right solution for those cases. Do you want to post a comment here when you learn more about the use cases behind these asks?

@BurntSushi
Copy link
Author

Yes, will do.

@francois08
Copy link

francois08 commented Sep 11, 2024

@justingrant I see that you are a contributor to RFC 9557. Question:

If I want to store a datetime in the future (e.g. for a future meeting), like "2035-11-08 14:00:00" in Belgium,

  1. why isn't it possible to write:
2035-11-08T14:00:00[Europe/Brussels]

Omitting the offset, specifying at this local time in Europe/Brussels, whatever the offset from UTC at that time (we don't know for sure; who knows, DST is probably abolished in EU by that time). Instead, to be compliant, it seems I would need to "guess" the offset (as DST rules might change):

2035-11-08T14:00:00+01:00[Europe/Brussels]
  1. if we must write the offset, why can't we a specify a priority? E.g. in that same example:
2035-11-08T14:00:00+01:00[Europe/Brussels]

I know for sure the time zone identifier is correct, but not the offset. I want to mention: resolve the date time first by the provided time zone ID.
We have the exclamation mark [!Europe/Brussels] but that only says in a vague way that the inconsistency must be resolved.

@ptomato
Copy link
Collaborator

ptomato commented Sep 11, 2024

@francois08 RFC 9557 is for exact time strings, and 2035-11-08T14:00:00[Europe/Brussels] isn't an exact time. You can write a parser that resolves the wall-clock time in this string to an exact time in whatever way you prefer (and we have this functionality in Temporal) but the string itself is just serialized data; it can't carry a meaning, it only has a meaning in the context of an application that parses it.

@francois08
Copy link

francois08 commented Sep 11, 2024

@ptomato I assumed the point of this standard is to address the lack of time zone information in ISO 8601 and RFC 3339, where we lose daylight saving time details. But DST and exact times don't go well together as DST is uncertain/variable for future date times.

@ptomato
Copy link
Collaborator

ptomato commented Sep 11, 2024

@francois08 Nonetheless, 2035-11-08T14:00:00+01:00[Europe/Brussels] is still an exact time even if the time zone changes the offset: namely, 2035-11-08T13:00Z.

@justingrant
Copy link
Collaborator

Sorry for late reply, I've been traveling for work for a few weeks.

I assumed the point of this standard is to address the lack of time zone information in ISO 8601 and RFC 3339, where we lose daylight saving time details.

The specific goal of RFC 9557 is to extend RFC 3339 with time zone information. Therefore, the standard's format's data model was intended to be backwards compatible with RFC 3339, so applications can strip off all content after the first "[" and send it to another program that only knows about RFC 3339.

There was discussion during the development of the standard about whether to handle the case you mentioned (we called it "floating" timestamps) but ultimately we decided to keep the scope limited to only a superset of RFC 3339, which leaves space for a future standard to define how floating timestamps should be handled. (And ensured that RFC 9557, which had already taken years, would finally ship!)

As @ptomato notes above, Temporal does accept formats like 2035-11-08T14:00:00[Europe/Brussels] without an offset in the string. Temporal in general allows components of RFC 9557 to be omitted if there are sensible defaults that Temporal can apply. For example, you can also omit the time: 2035-11-08[Europe/Brussels] is equivalent to 2035-11-08T00:00:00[Europe/Brussels].

But DST and exact times don't go well together as DST is uncertain/variable for future date times.

Handling this case is exactly why Temporal allows the offset to be omitted. By omitting the offset, you're telling Temporal that you want to evaluate the string always using the time zone rules that are effective when the string is parsed.

Including both is also useful, because then you can deal with conflicts better. For example, you can parse the string using the option offset: "reject" and if it throws, then you can prompt the user (or use your own code) for how to handle the conflict between the old offset and the new, now-incorrect offset.

@justingrant
Copy link
Collaborator

I think we've addressed the questions above, so I'll close this issue now. Feel free to re-open if more discussion needed. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants