-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Holistic zerovec abstraction design #5523
Comments
Suggestion: something I needed in #5510 is a dataful enum ULE. Maybe |
Yeah. I've stuck that in step 4 but we should start thinking about designs now |
So the most basic dataful enum representation is to spend a byte on the discriminant and then have a single VarULE value (we should consider supporting regular ULEs, too, but I'll focus on VarULE). But, as I did in #5510, you don't normally need the whole discriminant byte, so I took half of it and used it as its own standalone piece of metadata. A key question to answer would be, can the inner VarULEs depend on the extra data in the discriminant byte? Doing so might mean that we need a new trait such as Okay, with that in mind, a design could be pub enum VarEnum2<A, B> {
A(A),
B(B),
}
pub struct VarEnum2WithMetadata<A, B> {
metadata: u8,
value: VarEnum2<A, B>
}
impl<A, B> VarEnum2WithMetadata<A, B> {
pub fn try_new(metadata: u8, value: VarEnum2<A, B>) -> Result<Self> {
if metadata & 0x80 != 0 {
Err(...)
} else {
Ok(Self { metadata, value })
}
}
}
pub struct VarEnum2ULE<A, B> {
_a: PhantomData,
_b: PhantomData,
/// Representation:
/// - First byte: discriminant and metadata
/// - Remainder: either A or B
bytes: [u8]
}
impl<A, B> VarEnum2ULE<A, B> {
pub fn get(&self) -> VarEnum2WithMetadata<&A, &B> {
// ...
}
} And then we could add additional numbers of variants (VarEnum3, ...) as needed. Having this at hand, in addition to the proposals in the OP, should allow me to remove unsafe code from PluralElementsPackedULE. |
Another thought: we could build a single type such as |
Yeah, this is roughly what I'm thinking for enums, but there may be reasons to support some kind of packing. |
Discussion between @Manishearth and @sffc: Design principle: Every data model the macro supports should also be supported via publicly exported ZeroVec types. This may not lead to 100% the same experience1, but it should be possible. Footnotes
|
It is impossible for an IndexN array to need more than a length integer of size N, anyway, the max index is always `>=` the length. Part of #5523 Builds on #5593 We could in theory just have a `VZVFormatCombo<Index, Len>` type that allows free selection, however I'm trying to keep this minimal. Overall the main use case for that is picking things like "a small array of ;argely-sized elements" and we could just expose Index16Len8 for that. I can see that being useful for things like #5580, though it also feels like a data microoptimization. The "total" lines in fingerprints.csv are interspersed in giant diffs, and this basically only gets a max of 2-6 byte wins per data, but the overall data size went down by 200KB. Not amazing, not terrible. ```rust [18:26:22] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ $ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 23501369 [18:26:08] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ $ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 23391499 ```
The VarULE counterpart of TupleNULE Part of #5523. Planned to be used in #4437 I'm not super happy with the naming with this vs VarTupleULE, but I've tried to make it clearer with the module names and it's fine for now. We can rename as desired since zerovec isn't on the ICU4X stability track. I do plan to add serde/etc impls but that's going to be a separate PR. <!-- Thank you for your pull request to ICU4X! Reminder: try to use [Conventional Comments](https://conventionalcomments.org/) to make comments clearer. Please see https://github.com/unicode-org/icu4x/blob/main/CONTRIBUTING.md for general information on contributing to ICU4X. -->
See also: #5561 for a holistic design for the core vector types
Jumping off of #5520 (comment)
I think we should try to make our
zerovec
abstractions a bit more holistically designed. I have these goals in mind:unsafe
code in the crate should be well organized and minimal.For the purposes of this discussion, ICU4X is considered a typical large-scale user, as it is the primary user.
The design I've had in mind is the following. It roughly matches @sffc's design in #5520 (comment) but goes in more depth on how the unsafe code should be structured.
ZeroVec<T>
,VarZeroVec<T, Format>
, plus slice types (exists)MultiFieldsULE<N, Format>
type. Would enjoy suggestions on the name2 since it's more than just fields. MaybeUntypedMultiVarULE
. (planned for 2.0)syn
deps #5127 (unplanned for 2.0, but is a purely internal implementation change that can happen whenever we want)VarTupleULE
that represents(ULE, VarULE)
. (already exists, would like name suggestions).TupleNULE
that represents(ULE, ULE, ...)
. We've done a good job minimizing unsafe here. (already exists, would like name suggestions)VarTupleNULE
that represents(VarULE, VarULE, ...)
. Implemented usingMultiFieldsULE
, basically a typed version of the same. Should not have too much unsafe, ideally using macros or something to do the N-tuples. I'd like to have some of this on track for 2.0, I'm okay with not getting all of it but I want to try.OptionULE
,OptionVarULE
that representOption<T>
. We already have these, but I'd like to potentially switch these over to usingVarTupleULE
in their internal implementation. (No change planned for 2.0, nice-to-have)NichedOption
, but aren't just Option. Ideally they let us do more complex enums.I am not fully happy with the names of these generic types, but we can bikeshed and we can punt naming past 2.0. I would like to see if we can come up with something better than
TupleN
,VarTuple
, andVarTupleN
(orVarVar
). I don't really findVarVar
to be evocative of the right thing, it sounds nested rather than just being "twoVar
s".TupleN
andVarTupleN
sound about right to me, but we then need a name for the(ULE, VarULE)
type. MaybeMixedTuple
?Overall I think this is work I can get done in time for 2.0, at least sufficiently so that we're not blocked on stable data formats in the future, even if the zerovec APIs may change a bit.
cc @sffc @robertbastian
Footnotes
In the long run it would be nice to support that as well, for now I am comfortable not trying to cover that too much in the current holistic design. ↩
I don't consider the bikesheds here to be 2.0 blockers, but I do wish to solicit name suggestions. ↩
The text was updated successfully, but these errors were encountered: