Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use serde exclusively for serialization #109

Closed
fjarri opened this issue Jan 3, 2023 · 3 comments · Fixed by #110
Closed

Use serde exclusively for serialization #109

fjarri opened this issue Jan 3, 2023 · 3 comments · Fixed by #110
Labels
ABI Changes the format of serialized objects API Related to public API scoping Scoping required for further action

Comments

@fjarri
Copy link
Contributor

fjarri commented Jan 3, 2023

Currently we have a two-step system: umbral objects are serialized manually as byte blobs (with SerializableToArray trait) and then the result can be (optionally) serialized with serde (to enable composition in nucypher-core). Originally, this choice was made for the following reasons:

  • compatibility with ByteStringSplitter (not important anymore)
  • support for no-alloc targets (serde can handle that, for certain formats)
  • efficient byte array serialization (now I know how to do it with serde)

I don't think there was anything else, was there?

So I think it might be the time to retire SerializableToArray/DeserializableFromArray and just use serde. The upside is significant code simplification and less boilerplate when adding new structures (which I bumped into when working on #107). The downsides are:

  • Changed ABI, so protocol versions in nucypher-core will have to be bumped.
  • Slightly less efficient packaging in binary formats, since serde does not support constant-sized arrays, so every byte blob (curve point, hash, or signature) will have its length serialized too.
  • Less efficient packaging in text formats, since we will only be able to mandate encoding for "leaf" types (aforementioned points, hashes, and signatures), so instead of encoding, say, the whole cfrag in base64 as a byte blob, we will have a hierarchical structure with hex-encoded components. (Porter can still encode cfrags in binary and then in base64 if it prefers)
  • All the users who need serialization (and umbral is not particularly useful without it) will have to use serde and a crate for whatever format they use, which are somewhat heavy dependencies.

Speaking of the last point, we might as well just make serde a mandatory dependency (this will help avoid some problems with documenting the serde trait support, too).

Also this will resolve #95 - we will just use hex for small objects and that's it.

@piotr-roslaniec, @jMyles, @KPrasch, @theref - thoughts?

@fjarri fjarri added scoping Scoping required for further action API Related to public API ABI Changes the format of serialized objects labels Jan 3, 2023
@piotr-roslaniec
Copy link
Contributor

I don't think there was anything else, was there?

I don't recall any other reasons.

Changed ABI, so protocol versions in nucypher-core will have to be bumped.

IMHO benefits outweigh this minor inconvenience. To my best knowledge there are few adopters of nucypher-core, and from the perspective of nucypher-ts, this is not an issue (breaking change, transparent to the user).

Less efficient packaging in text formats (...)

So this is a breaking change in terms of serialization behavior, but not for the clients that implement their own serialization format (e.g. serialize into raw bytes, then serialize into base64 in the client). Do you think it would be possible to use serde_as macros to circumvent this issue? I don't have enough experience with it to recommend it for this particular issue, so just throwing an idea.

All the users who need serialization (and umbral is not particularly useful without it) will have to use serde and a crate for whatever format they use, which are somewhat heavy dependencies.

I think serde is so heavily entrenched in the crate ecosystem some users may view it only as a benefit. I try to implement a philosophy of refactoring into a blessed crate when possible.

@fjarri
Copy link
Contributor Author

fjarri commented Jan 4, 2023

So this is a breaking change in terms of serialization behavior, but not for the clients that implement their own serialization format (e.g. serialize into raw bytes, then serialize into base64 in the client).

No, for them too, since the binary format will change as well.

Do you think it would be possible to use serde_as macros to circumvent this issue?

Unfortunately it does not change serde's behavior with arrays, the length is still serialized.

@fjarri
Copy link
Contributor Author

fjarri commented Jan 4, 2023

Some specific things to consider in relation with this:

  • The state of serde_bytes module. umbral won't use Base64 encoding anymore, but nucypher-core still relies on it (since it has some long bytestrings to be encoded). Ideally, it should be extracted to its own crate. There is already serde_bytes crate, but it only works for binary formats. I opened Support for human-readable formats? serde-rs/bytes#37 there to test the waters. There's also serdect, which supports hex encoding, but not base64, and it is targeted at constant-time execution. I wouldn't want to create another similar crate, but that might be the only option.
  • Reference serialization format. That is, what format does bytes() use in Python, or toBytes() in JS? It needs to be documented and made consistent throughout the codebase. Possibly a trait? So that one could, say, serialize an object in Rust and deserialize it in Python. Moreover, should umbral re-exported by nucypher-core use that trait? Or should we disable serialization for anything other than protocol objects in nucypher-core?
  • to_secret_bytes() and from_secret_bytes(). These methods don't sit well with me. nucypher doesn't use from_secret_bytes() at all, and only uses to_secret_bytes() to create a seed for the certificate - so maybe instead we can have a corresponding method of SecretKeyFactory that would produce such a seed. Not to mention that the certificate uses SECP384R1, so technically our seed is currently only 2/3 of the maximum size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Changes the format of serialized objects API Related to public API scoping Scoping required for further action
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants