Use `serde` exclusively for serialization #109

fjarri · 2023-01-03T21:12:22Z

Currently we have a two-step system: umbral objects are serialized manually as byte blobs (with SerializableToArray trait) and then the result can be (optionally) serialized with serde (to enable composition in nucypher-core). Originally, this choice was made for the following reasons:

compatibility with ByteStringSplitter (not important anymore)
support for no-alloc targets (serde can handle that, for certain formats)
efficient byte array serialization (now I know how to do it with serde)

I don't think there was anything else, was there?

So I think it might be the time to retire SerializableToArray/DeserializableFromArray and just use serde. The upside is significant code simplification and less boilerplate when adding new structures (which I bumped into when working on #107). The downsides are:

Changed ABI, so protocol versions in nucypher-core will have to be bumped.
Slightly less efficient packaging in binary formats, since serde does not support constant-sized arrays, so every byte blob (curve point, hash, or signature) will have its length serialized too.
Less efficient packaging in text formats, since we will only be able to mandate encoding for "leaf" types (aforementioned points, hashes, and signatures), so instead of encoding, say, the whole cfrag in base64 as a byte blob, we will have a hierarchical structure with hex-encoded components. (Porter can still encode cfrags in binary and then in base64 if it prefers)
All the users who need serialization (and umbral is not particularly useful without it) will have to use serde and a crate for whatever format they use, which are somewhat heavy dependencies.

Speaking of the last point, we might as well just make serde a mandatory dependency (this will help avoid some problems with documenting the serde trait support, too).

Also this will resolve #95 - we will just use hex for small objects and that's it.

@piotr-roslaniec, @jMyles, @KPrasch, @theref - thoughts?

The text was updated successfully, but these errors were encountered:

piotr-roslaniec · 2023-01-04T16:12:39Z

I don't think there was anything else, was there?

I don't recall any other reasons.

Changed ABI, so protocol versions in nucypher-core will have to be bumped.

IMHO benefits outweigh this minor inconvenience. To my best knowledge there are few adopters of nucypher-core, and from the perspective of nucypher-ts, this is not an issue (breaking change, transparent to the user).

Less efficient packaging in text formats (...)

So this is a breaking change in terms of serialization behavior, but not for the clients that implement their own serialization format (e.g. serialize into raw bytes, then serialize into base64 in the client). Do you think it would be possible to use serde_as macros to circumvent this issue? I don't have enough experience with it to recommend it for this particular issue, so just throwing an idea.

All the users who need serialization (and umbral is not particularly useful without it) will have to use serde and a crate for whatever format they use, which are somewhat heavy dependencies.

I think serde is so heavily entrenched in the crate ecosystem some users may view it only as a benefit. I try to implement a philosophy of refactoring into a blessed crate when possible.

fjarri · 2023-01-04T17:50:58Z

So this is a breaking change in terms of serialization behavior, but not for the clients that implement their own serialization format (e.g. serialize into raw bytes, then serialize into base64 in the client).

No, for them too, since the binary format will change as well.

Do you think it would be possible to use serde_as macros to circumvent this issue?

Unfortunately it does not change serde's behavior with arrays, the length is still serialized.

fjarri · 2023-01-04T21:07:50Z

Some specific things to consider in relation with this:

The state of serde_bytes module. umbral won't use Base64 encoding anymore, but nucypher-core still relies on it (since it has some long bytestrings to be encoded). Ideally, it should be extracted to its own crate. There is already serde_bytes crate, but it only works for binary formats. I opened Support for human-readable formats? serde-rs/bytes#37 there to test the waters. There's also serdect, which supports hex encoding, but not base64, and it is targeted at constant-time execution. I wouldn't want to create another similar crate, but that might be the only option.
Reference serialization format. That is, what format does bytes() use in Python, or toBytes() in JS? It needs to be documented and made consistent throughout the codebase. Possibly a trait? So that one could, say, serialize an object in Rust and deserialize it in Python. Moreover, should umbral re-exported by nucypher-core use that trait? Or should we disable serialization for anything other than protocol objects in nucypher-core?
to_secret_bytes() and from_secret_bytes(). These methods don't sit well with me. nucypher doesn't use from_secret_bytes() at all, and only uses to_secret_bytes() to create a seed for the certificate - so maybe instead we can have a corresponding method of SecretKeyFactory that would produce such a seed. Not to mention that the certificate uses SECP384R1, so technically our seed is currently only 2/3 of the maximum size.

fjarri added scoping Scoping required for further action API Related to public API ABI Changes the format of serialized objects labels Jan 3, 2023

fjarri mentioned this issue Jan 6, 2023

Leave serde as the only serialization method #110

Merged

fjarri closed this as completed in #110 Jan 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `serde` exclusively for serialization #109

Use `serde` exclusively for serialization #109

fjarri commented Jan 3, 2023 •

edited

Loading

piotr-roslaniec commented Jan 4, 2023

fjarri commented Jan 4, 2023

fjarri commented Jan 4, 2023 •

edited

Loading

Use serde exclusively for serialization #109

Use serde exclusively for serialization #109

Comments

fjarri commented Jan 3, 2023 • edited Loading

piotr-roslaniec commented Jan 4, 2023

fjarri commented Jan 4, 2023

fjarri commented Jan 4, 2023 • edited Loading

Use `serde` exclusively for serialization #109

Use `serde` exclusively for serialization #109

fjarri commented Jan 3, 2023 •

edited

Loading

fjarri commented Jan 4, 2023 •

edited

Loading