Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IXDTF parser calendared APIs (was: Consider adding an AnyCalendarFactory) #5739

Open
sffc opened this issue Oct 28, 2024 · 16 comments · Fixed by #5984
Open

IXDTF parser calendared APIs (was: Consider adding an AnyCalendarFactory) #5739

sffc opened this issue Oct 28, 2024 · 16 comments · Fixed by #5984
Assignees
Labels
C-calendar Component: Calendars discuss-priority Discuss at the next ICU4X meeting S-medium Size: Less than a week (larger bug fix or enhancement)

Comments

@sffc
Copy link
Member

sffc commented Oct 28, 2024

Currently AnyCalendar::try_new_unstable loads the data every time, making it seem inefficient since calendars are often (but not always) short-lived objects.

Should we consider adding AnyCalendarFactory?

pub struct AnyCalendarFactory {
    payloads: (
        DataPayload<JapaneseErasV1Marker>,
        DataPayload<ChineseCacheV1Marker>,
        // ...
    )
}

impl AnyCalendarFactory {
    pub fn try_new_unstable(...) { ... }

    pub fn calendar_from_kind(&self, kind: AnyCalendarKind) -> AnyCalendar /* or Ref<AnyCalendar> ??? */ { ... }
}

This type could be used internally in the IxdtfParser (#5736).

@Manishearth @robertbastian

@sffc sffc added the C-calendar Component: Calendars label Oct 28, 2024
@Manishearth
Copy link
Member

Are they short-lived? The intent of the whole AsCalendar infrastructure was to make it easy to have long lived calendars.

@sffc
Copy link
Member Author

sffc commented Oct 28, 2024

There are lots of cases when you might want a short-lived Date<Ref<AnyCalendar>> object. The question is how you get the Ref<AnyCalendar>. Currently you can only get a AnyCalendar from its own constructors, which means you need to get a new instance every time. But IxdtfParser allows returning in arbitrary calendars chosen at runtime.

Actually, perhaps a more efficient approach here would be a type that pairs Date<Iso> with AnyCalendarKind and which implements the IsInCalendar trait. It can also have functions that take a data provider and return Date<AnyCalendar>.

In other words:

pub struct SecretCalendarDate {
    iso_date: Date<Iso>,
    kind: AnyCalendarKind,
}

// With this impl, SecretCalendarDate can be passed directly to DateTimeFormatter
impl IsInCalendar for SecretCalendarDate { ... }

impl SecretCalendarDate {
    pub fn to_any_date_unstable(&self, provider: ...) -> Result<Date<AnyCalendar>, DataError> { ... }
    pub fn to_iso_date(&self) -> Date<Iso> { ... }
}

impl IxdtfParser {
    pub fn parse_date(&self) -> Result<SecretCalendarDate, ParseError> { ... }
}

@Manishearth
Copy link
Member

To me this seems like more an IxdtfParser issue. I don't think we need a factory type here.

I think it returning (Date<Iso>, AnyCalendarKind) makes sense, and if we want to name that type and give it conversions that would be fine too.

@sffc
Copy link
Member Author

sffc commented Oct 29, 2024

It occurs to me that my claim about IsInCalendar doesn't really hold up because the type won't be able to emit the correct YearInfo and stuff without the calendar data. So you would still need to pass it into ConvertAndFormat. However, it would still be nice to be able to enforce that the IXDTF calendar matches the DateTimeFormatter calendar without having to load the AnyCalendar data extra times.

@sffc
Copy link
Member Author

sffc commented Oct 29, 2024

So the main functional difference (beyond the type system) of DateTimeFormatter and FixedCalendarDateTimeFormatter is that the former contains a calendar field and the latter does not contain a calendar field. This means that DateTimeFormatter is able to convert dates, but FixedCalendarDTF is not.

Ideally the output of the IXDTF parser can be passed into DateTimeFormatter and succeed if its AnyCalendarKind matches. An external way to do this would be to make ConvertCalendar be fallible:

https://unicode-org.github.io/icu4x/rustdoc/icu/datetime/scaffold/trait.ConvertCalendar.html

It could return Infallible on types where the conversion is infallible. Unfortunately it means that convert_and_format needs to start returning a Result.

We could instead make a new trait TryConvertCalendar and a corresponding format function.

@sffc sffc added the discuss-priority Discuss at the next ICU4X meeting label Oct 29, 2024
@Manishearth
Copy link
Member

Not a huge fan of adding another format function, but I get it. I think it's fine to require an explicit conversion

@sffc
Copy link
Member Author

sffc commented Oct 29, 2024

It's not about the explicit conversion; it's about the general problem of having data that is tagged with a calendar be able to use a DateTimeFormatter without having to load the calendar object an extra time (since it already lives inside of the DateTimeFormatter) but also still enforce that the calendar in the formatter matches the calendar annotation.

So there are basically 3 situations for formatting with DateTimeFormatter:

  1. Date Calendar not known (Date<Iso>). Convert with DateTimeFormatter's calendar.
  2. Date Calendar known, but data not loaded (this thread). Fail if calendar does not match DateTimeFormatter's calendar, and also perform conversion.
  3. Date Calendar known and data is loaded. Fail if calendar does not match DateTimeFormatter's calendar. No conversion is needed.

@robertbastian
Copy link
Member

Agree with the approach, but don't like the name "tagged".

@Manishearth
Copy link
Member

Partial discussion:

  • @robertbastian Seems like calendrical calculations should be part of IXDTF parsing. It is already dataful. This would allow us to use IXDTF parser output with FixedCalendarFormatter.
  • @sffc That's what I originally thought, but since DateTimeFormatter already owns the calendar, it's silly to load it again each time. AnyCalendarFactory is kind-of not a very elegant utility.
  • @Manishearth What does it get us?
  • @sffc
    • Enforcing validation with TaggedDate + (Any) DateTimeFormatter, since otherwise it always has convert_and_format behavior
    • Convenient conversions to Date<AnyCalendar> where the data loading is explicit
  • @robertbastian My preferred solution is we have an AnyDate parser, and it needs to be given the data to do these things. That mirrors the IXDTF parser, which has data. I'm not a fan of passing around the TaggedDate, because it seems like we are passing around half-parsed things. We could add the tuple thing to the icu_datetime crate. Only the DateTimeFormatter can handle the tuple. So I'd rather not have this type at all.
  • @sffc Could you address teh thing about DTF owning a calendar, and that's my main point. We shouldn't need to load data for it. You want to be able to parse a lot of things in a loop. In your proposal the downside is that the AnyCalendar parser needs to load data for everyone: Japanese, Chinese (cached), Islamic (cached). Most people don't need to load chinese dates (etc).
  • @robertbastian: This is an advantage IMO: you don't want to error at parse time, you should error at construction time. Ctor should make sure all the data that is possibly needed is available.
  • @sffc But if you've created your datetime formatter and the calendars don't match that's an error. You don't need to compute the calendar in the IXDTF parser because the DateTimeFormatter already owns it.
  • @robertbastian Your mantra over the last 6 months has been the thin membrane, that the dtf should not do any calendar calculations, it shouldn't do any timezone calculations, why are we now saying that the datetime formatter should do date calculations?
  • @Manishearth Seems like we're hitting on a case where you have a multi-calendar use case.

@sffc
Copy link
Member Author

sffc commented Oct 29, 2024

Conclusions:

// icu_calendar
impl<A: AsCalendar> Date<A> {
    pub fn try_new_from_codes(
        era: Option<Era>,
        year: i32,
        month_code: MonthCode,
        day: u8,
        calendar: A,
    ) -> Result<Self, DateError>
    
    pub fn try_from_str(
        s: &str,
        calendar: A,
    ) -> Result<Self, ParseError>
}

enum ParseError {
    ....
    MismatchedCalendar(AnyCalendarKind /* or id string */),
}

impl DateTimeFormatter {
    pub fn get_calendar(&self) -> Ref<AnyCalendar>
}

// Call site
let date = Date::try_from_str("...", dtf.get_calendar())?;
let fdt = dtf.strict_format(date)?;

// Possible future
let fdt = dtf.format_ixdtf("...")?;

LGTM: @sffc @Manishearth @robertbastian

Time zone parser:

impl IxdtfParser {
    pub fn try_location_only_iso_from_str(
        &self,
        ixdtf_str: &str,
    ) -> Result<CustomZonedDateTime<Iso, TimeZoneInfo<models::AtTime>>, ParseError> { ... }

    pub fn try_location_only_from_str<A: AsCalendar>(
        &self,
        ixdtf_str: &str,
        calendar: A,
    ) -> Result<CustomZonedDateTime<A, TimeZoneInfo<models::AtTime>>, ParseError> { ... }
}

LGTM: @sffc @robertbastian @Manishearth

@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Oct 29, 2024
@sffc sffc removed the discuss-priority Discuss at the next ICU4X meeting label Oct 29, 2024
@sffc sffc added this to icu4x 2.0 Oct 29, 2024
@sffc sffc moved this to Unclaimed for sprint in icu4x 2.0 Oct 29, 2024
@Manishearth Manishearth changed the title Consider adding an AnyCalendarFactory IXDTF parser calendared APIs (was: Consider adding an AnyCalendarFactory) Jan 7, 2025
@sffc sffc added the S-medium Size: Less than a week (larger bug fix or enhancement) label Jan 7, 2025
@Manishearth
Copy link
Member

This probably can be done alongside #5859 and #5929, which @robertbastian is already working on.

@sffc
Copy link
Member Author

sffc commented Jan 14, 2025

From #5982 (review)

I don't think we explicitly discussed this above. I think that

Date::try_from_str("2024-07-17", Gregorian)

should fail. You should write

Date::try_from_str("2024-07-17", Iso)
// or
Date::try_iso_from_str("2024-07-17")

and then you can call .with_calendar to convert it.

I don't have an opinion right at the moment about how we should behave with

Date::try_iso_from_str("2024-07-17[u-ca=buddhist]")

@sffc sffc added the discuss-priority Discuss at the next ICU4X meeting label Jan 14, 2025
@robertbastian
Copy link
Member

DateTime doesn't have a to_calendar method, so always explicitly going through ISO makes things very verbose. It's

DateTime::try_from_str(string, formatter.calendar())

vs

let iso = DateTime::try_iso_from_string(string);
DateTime { date: iso.date.to_calendar(formatter.calendar()), time: iso.time }

@github-project-automation github-project-automation bot moved this from Unclaimed for sprint to Done in icu4x 2.0 Jan 15, 2025
@sffc
Copy link
Member Author

sffc commented Jan 15, 2025

I'll live with the error handling that we landed in #5984.

Should IxdtfParser::try_from_str<A>(&self, &str, A) be named parse_str? It is not a constructor.

Should IxdtfParser be named ZonedDateTimeParser?

@sffc sffc reopened this Jan 15, 2025
robertbastian pushed a commit that referenced this issue Jan 15, 2025
#5739

This is mostly docs. The only non-docs is adding PartialEq bounds,
including to DateTime and ZonedDateTime, which became reachable via one
of my new docs tests. I think adding PartialEq to DateTime is not in
contradiction to the thin DateTime effort.
@robertbastian
Copy link
Member

Should IxdtfParser::try_from_str(&self, &str, A) be named parse_str? It is not a constructor.

Yes

Should IxdtfParser be named ZonedDateTimeParser?

Yes

@robertbastian
Copy link
Member

Actually I think the _str is implicit in a parse method, I'd prefer if we used parse and parse_from_utf8. This makes the longer methods like parse_iso_location_only a bit more manageable.

robertbastian added a commit that referenced this issue Jan 15, 2025
Now that we parse `Date<A: AsCalendar>` instead of `Date<AnyCalendar>`,
the `Date<Iso>` specializations are not needed anymore.

#5739
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-calendar Component: Calendars discuss-priority Discuss at the next ICU4X meeting S-medium Size: Less than a week (larger bug fix or enhancement)
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants