Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neo DateTime FFI #5940

Open
sffc opened this issue Dec 20, 2024 · 3 comments
Open

Neo DateTime FFI #5940

sffc opened this issue Dec 20, 2024 · 3 comments
Assignees
Labels
C-datetime Component: datetime, calendars, time zones S-large Size: A few weeks (larger feature, major refactoring)

Comments

@sffc
Copy link
Member

sffc commented Dec 20, 2024

Some thoughts on the structure of DateTime FFI.

Types: we might not need very many

The Rust type DateTimeFormatter<CompositeFieldSet> does everything. We can have many constructors that load different subsets of data. The reason to have multiple types is to reduce stack size (memory usage in FFI) and make the Drop impl a bit smaller, which I think is valuable.

It probably makes sense to start with these types:

FFI type Rust type Approximate stack size
TimeFormatter TimeFormatter<TimeFieldSet> 272
DateFormatter DateTimeFormatter<DateFieldSet> 440
DateTimeFormatter DateTimeFormatter<CompositeDateTimeFieldSet> 480
ZonedDateTimeFormatter DateTimeFormatter<CompositeFieldSet> 1472
GregorianDateTimeFormatter FixedCalendarDateTimeFormatter<Gregorian, CompositeDateTimeFieldSet> 424
GregorianZonedDateTimeFormatter FixedCalendarDateTimeFormatter<Gregorian, CompositeFieldSet> 1416
size_test!(TimeFormatter<fieldsets::enums::TimeFieldSet>, time_formatter_size, 272);
size_test!(DateTimeFormatter<fieldsets::enums::DateFieldSet>, date_formatter_size, 440);
size_test!(DateTimeFormatter<fieldsets::enums::CompositeDateTimeFieldSet>, date_time_formatter_size, 480);
size_test!(DateTimeFormatter<fieldsets::enums::CompositeFieldSet>, zoned_date_time_formatter_size, 1472);
size_test!(FixedCalendarDateTimeFormatter<icu_calendar::Gregorian, fieldsets::enums::CompositeDateTimeFieldSet>, gregorian_date_time_formatter_size, 424);
size_test!(FixedCalendarDateTimeFormatter<icu_calendar::Gregorian, fieldsets::enums::CompositeFieldSet>, gregorian_zoned_date_time_formatter_size, 1416);

Maybe we don't need the Gregorian types. The only difference is the presence of the AnyCalendar field and its associated destructor, which seems to be fairly small in the grand scheme. We can still of course have the Gregorian constructors, but they can resolve to these types.

I think we can also have just DateTimeFormatter and not DateFormatter.

Basically: time zone data is way bigger than date data which is way bigger than time data. So if you already have date data, removing time data is mostly in the noise, and if you already have zone data, removing date data is mostly in the noise.

Static Field Set Constructors

For dates, times, and datetimes, I think we can just have all the constructors. There are 20ish.

I think we should flatten the field set options into positional arguments, like we do in some other types. This avoids having a bunch of tiny field set structs like we do in Rust.

For time zones, I see a few options…

  1. Maybe we could have a fn to "add" time zone data to a DateTimeFormatter, transforming it into a ZonedDateTimeFormatter. This would require a bit of work on the Rust side, but I think it's doable.
  2. Or we do a hybrid approach where we have time zone constructors for each style and they take a dynamic datetime field set as an argument. You won't get slicing of date data, but date data is small relative to time zone data.
  3. Anything else?

Dynamic Field Set Constructors

The Rust approach is infeasible over FFI, since it is built on the idea of a bunch of little structs that compose into bigger structs and eventually into an all-encompassing dataful enum.

I think we instead want to introduce a builder API. I think we should start in Rust and then port it to FFI. I've had in mind for a while that the builder API would work similarly to the Serde impl.

There are a few approaches for specifying the field sets:

  1. Enums and structs: specify all date, calendar period, time, and zone fields in plain enums. Then you put the ones you want into a struct. The constructor is fallible if you put together an invalid combination of field sets, but the field sets individually are always valid. This is slightly more type-safe.
  2. Set of fields: export a single enum with possible fields, and the field set is a vector of that enum. Then we check to see if the fields you put in the vector compose a valid field set. This is what the Serde impl does.
  3. String: specify the semantic field set as a string, perhaps in JSON syntax, and use the Serde impl directly.

I lean toward option 2 because it is a smaller API surface, but I can be easily persuaded. If option 1 can be implemented more efficiently since it is structs and enums instead of a vector, that would convince me.

Alternatively, we could say that we don't export dynamic field sets over FFI. Clients in each language can build their own delegation mechanism that sits on top of the static constructors. This would be annoying for clients, but it is totally feasible. And if CLDR adds a string semantic skeleton syntax, we can add a single narrow constructor for that.

@Manishearth

@sffc sffc added the C-datetime Component: datetime, calendars, time zones label Dec 20, 2024
@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Dec 20, 2024
@Manishearth
Copy link
Member

Manishearth commented Dec 20, 2024

The only difference is the presence of the AnyCalendar field and its associated destructor, which seems to be fairly small in the grand scheme

AnyCalendar pulls in a lot of data statically, no? We could add try_new_gregorian() (etc) functions to it to reduce it a bit, but we'd still be stuck with tons of symbols data for the formatters.

I like your first set of six types.

Maybe we could have a fn to "add" time zone data to a DateTimeFormatter, transforming it into a ZonedDateTimeFormatter. This would require a bit of work on the Rust side, but I think it's doable.

I'd prefer to avoid doing mutation/moving over FFI if possible, there are safety concerns there. Eventually I'd like diplomat to start enforcing these things which would make me a bit more relaxed about it but right now our enforcement basically is "we have very few mutable types and we know how they get used".

In particular, I see hazards of mixing this with parts formatting since that involves persisting borrows.

Not opposed, but disprefer it.

I think we instead want to introduce a builder API. I think we should start in Rust and then port it to FFI. I've had in mind for a while that the builder API would work similarly to the Serde impl.

Quite interested.

(Yes, builder APIs also involve mutation, but this is a bit more relaxed: it's abnormal to ever borrow from a builder, builder types exist to be mutated and then consumed, and there are no other operations people typically add)

I lean toward option 2 because it is a smaller API surface, but I can be easily persuaded. If option 1 can be implemented more efficiently since it is structs and enums instead of a vector, that would convince me.

Requiring additional allocation for each DTF construction doesn't seem great. I do like option 1 and think that validation won't be too hard since Rust exhaustive matching will force us to handle everything. Might be messy, I'm not sure, if I recall correctly the "valid" fieldset combinations are actually pretty easily described.

String: specify the semantic field set as a string, perhaps in JSON syntax, and use the Serde impl directly.

I think this would be a fine API to have for 2.0. People probably will be wanting this anyway. It's not going to be efficient, but it's a nice API to have and gives us breathing room to design something good if we want.

Edit: Shane mentioned that semantic skeleton string formats are not specified in UTS 35, so this probably needs more work and is not an easy way out.

May have more thoughts later.

@sffc
Copy link
Member Author

sffc commented Dec 21, 2024

I put up a new builder API in #5944.

We now have several ways of specifying skeletons:

2.0-BETA1 Serde

{
  "fieldSet": ["year","month","day","weekday","time","zoneGeneric"],
  "length":"medium",
  "alignment":"column",
  "yearStyle":"always",
  "timePrecision":"secondF3"
}

PR #5944

{
  "dateFields":"YMDE",
  "zoneStyle":"Zs",
  "length":"medium",
  "alignment":"column",
  "yearStyle":"always",
  "timePrecision":"secondF3"
}

Rust API

fieldsets::YMDET::medium()
  .with_alignment(Alignment::Column)
  .with_year_style(YearStyle::Always)
  .with_time_precision(TimePrecision::SecondF3)
  .with_zone_generic()

ECMA-402 (for comparison)

{
  "era": "short",
  "year": "numeric",
  "month": "short",
  "day": "2-digit",
  "hour": "2-digit",
  "minute": "numeric",
  "second": "numeric",
  "fractionalSecondDigits": 3,
  "timeZoneName": "shortGeneric",
}

Comparing and contrasting:

2.0-BETA1 Serde PR #5944 Rust API ECMA-402
Type Safety Enums for options, but allows invalid sets of fields and/or options. Enums for options and fields, but allows invalid combinations of options. Fully type-safe Allows invalid sets of fields and/or options, including options not allowed by semantic skeletons.
FFI Compatibility Medium; needs a vector Easy Difficult Easy
ECMA-402 Compatibility Conversion is relatively straightforward Conversion is possible but takes a bit of work and a bunch of matching in userland Conversion is possible but likely requires even more work than PR #5944 N/A
Readability @sffc thinks this is the most readable since it avoids jargon like "YMDET" @Manishearth thinks this is the most readable since it is easier to parse, and people should know what "YMDET" means Mostly readable, aside from the YMDET Mostly readable, though verbose

What I think we should do, after a brief conversation with @Manishearth:

  1. Adopt PR Add new datetime field set builder #5944 for FFI
  2. Export the builder to Rust users as an alternative to the type-safe API
  3. Change the Serde impl to use the new schema
  4. Export a standalone API to map from ECMA-402 skeletons to ICU4X skeletons (instead of doing this in userland)
  5. Ask CLDR-TC to adopt this new schema as the standard serialized form in UTS 35

Thoughts? @zbraniecki

sffc added a commit that referenced this issue Dec 21, 2024
@sffc
Copy link
Member Author

sffc commented Dec 28, 2024

AnyCalendar pulls in a lot of data statically, no?

Only if the AnyCalendar constructor is used. I propose using the Gregorian constructor and then casting the resulting FixedCalendarDateTimeFormatter into a DateTimeFormatter via into_formatter.

I really want a way to allow for time zone selection over FFI without including all time zone data, but I guess for 2.0 we can just say, here's the builder API, and if you want less data, upvote this issue and write a custom FFI fn in the interim.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones S-large Size: A few weeks (larger feature, major refactoring)
Projects
None yet
Development

No branches or pull requests

2 participants