Add encodeExtendedJson and decodeExtendedJson with support for BigInts, TypedArrays, and Dates #108

Open · wants to merge 1 commit into base: master

Conversation

mainnet-pat
Contributor

This is a counterpart to libauth's stringify function.

It currently supports parsing back Uint8Array and BigInt; I'm not sure how to go about Symbol and Function.

@bitjson force-pushed the master branch 10 times, most recently from d670d17 to 5ad6520 on January 12, 2024
@bitjson
Member

bitjson commented Feb 11, 2024

Thanks for this PR @mainnet-pat! Sorry to keep you waiting here.

I think it makes sense to formalize this sort of format a bit more, maybe calling it (within Libauth) "extended JSON" and providing encodeExtendedJson and decodeExtendedJson functions (to match our cleaned up naming conventions).

I'd previously (~5 years ago 🤯) avoided including this re-encoding behavior because the <TYPE: string prefix didn't seem as reliable as I'd like for an encode/decode-capable format. I also expected the wider JS ecosystem to have arrived at a more native standard for serializing JSON with bigints by now, but that's clearly going to happen in user space, if at all.

So: I'd love to pull this in!

I think stringify should continue to work how it currently works, and we'll need a completely separate pair of encodeExtendedJson and decodeExtendedJson functions that support (for now) only TypedArrays (all of them), bigint, and Dates, as those types can be safely re-instantiated in the receiving JS context. Functions can technically be re-instantiated, but supporting them might create some pretty serious vulnerabilities in many applications. And symbols, by design, can't be reproduced in the receiving JS context (but supporting them is also useful for logging the unknown with stringify).

I think the existing string pattern approach (<TYPE: [...] >) is still the best choice. It's visually minimal in that we don't use objects or arrays to represent the complex values (so console.log(obj) and console.log(stringify(obj)) have almost the same shape), the closing > adds a final bit of verification that the value is intended to be an encoded complex value rather than a string that happens to share the prefix, and the RegExp matching can be just as performant as (or better than) object or array deserialization.
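
For illustration, a minimal sketch of how a JSON.stringify replacer and JSON.parse reviver could round-trip these three types with that string pattern (the helper names and the exact encoding below, e.g. no 0x prefix or n suffix, are assumptions rather than the final API):

// Illustrative sketch only: names and exact encoding are assumptions, not Libauth's API.
const binToHex = (bin: Uint8Array) =>
  Array.from(bin, (byte) => byte.toString(16).padStart(2, '0')).join('');
const hexToBin = (hex: string) =>
  Uint8Array.from(hex.match(/.{2}/g) ?? [], (byte) => parseInt(byte, 16));

const encodeExtendedJsonSketch = (value: unknown) =>
  JSON.stringify(value, function (key, val) {
    // Date.prototype.toJSON runs before the replacer, so check the original value on the holder.
    const original = (this as Record<string, unknown>)[key];
    if (original instanceof Date) return `<Date: ${original.toISOString()}>`;
    if (typeof val === 'bigint') return `<bigint: ${val.toString()}>`;
    if (val instanceof Uint8Array) return `<Uint8Array: ${binToHex(val)}>`;
    return val;
  });

const decodeExtendedJsonSketch = (text: string) =>
  JSON.parse(text, (_key, val) => {
    if (typeof val !== 'string') return val;
    const match = /^<(bigint|Uint8Array|Date): (.*)>$/.exec(val);
    if (match === null) return val;
    if (match[1] === 'bigint') return BigInt(match[2]);
    if (match[1] === 'Uint8Array') return hexToBin(match[2]);
    return new Date(match[2]);
  });

// e.g. encodeExtendedJsonSketch({ a: 1n, b: Uint8Array.of(0xde, 0xad) })
// → '{"a":"<bigint: 1>","b":"<Uint8Array: dead>"}'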

If you (and/or @jimtendo, who just reminded me about this issue) are interested in expanding this PR to break out encodeExtendedJson and decodeExtendedJson (and add them to encodings-and-formats.md), I'd love to merge it!

@emergent-reasons

Whatever it is in the end, I beg that it be explicit about bigint vs. number (float) behavior. A common problem I've found with extended JSON schemes is that they become dangerous by trying to be smart: automatically choosing integer types (in JS implementations) based on the size of the integer. This leads to messy, ambiguous type situations.

I think this does that already, but just putting it out there hopefully as a red line that we don't cross.

@emergent-reasons

I never quite understood the need for a special Uint8Array datatype when it will be marginally more efficient at best, and further reduce portability over hex. This is in contrast to a solution for bigint which is strictly necessary.

@mainnet-pat is there an explanation somewhere for wanting that?

@jimtendo
Contributor

I never quite understood the need for a special Uint8Array datatype when it will be marginally more efficient at best, and further reduce portability over hex. This is in contrast to a solution for bigint which is strictly necessary.

@mainnet-pat is there an explanation somewhere for wanting that?

This approach will generally always be less efficient than manually encoding/decoding specific fields/paths to their desired types as it requires traversing every field in the JSON tree. But, the trade-off here is that it greatly reduces code-complexity in the sense that we do not have to double-up on interfaces. So the best lens to view this from is probably as a way to decrease complexity, not as a way to increase efficiency.

For example, imagine we have the following interface which we need to transport over the network as JSON:

interface AnyHedgeContract {
  makerPayoutLockingBytecode: Uint8Array;
  takerPayoutLockingBytecode: Uint8Array;
  makerPublicKey: Uint8Array;
  takerPublicKey: Uint8Array;
  hedgeSats: bigint;
  longSats: bigint;
  // ... other fields
}

If we wanted to type this correctly for transport, we would basically need to duplicate it and replace the Uint8Array and "bigint" with string types to store the hex/bigint data:

interface AnyHedgeContractJSON {
  makerPayoutLockingBytecode: string;
  takerPayoutLockingBytecode: string;
  makerPublicKey: string;
  takerPublicKey: string;
  hedgeSats: string;
  longSats: string;
  // ... other fields
}

If we were just doing the above, this is probably still quite easily manageable (the interfaces above aren't too complex). But, if we were to do this for every interface that needs to be transported (and those with arrays of data), we wind up with a lot of code that is (near) duplicated and also a lot of additional code to encode/decode to/from these transport interfaces.

Having encodeExtendedJson/decodeExtendedJson just means that we can avoid the need for these additional transport-specific interfaces and instead perform the encode/decode just before a payload is about to be transported.
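
For instance (a sketch assuming the encodeExtendedJson/decodeExtendedJson names proposed above), transporting the interface from the earlier example becomes a one-liner on each side:

// Sketch only: assumes the proposed encodeExtendedJson/decodeExtendedJson functions.
const contract: AnyHedgeContract = {
  makerPayoutLockingBytecode: Uint8Array.of(0x76, 0xa9, 0x14),
  takerPayoutLockingBytecode: Uint8Array.of(0x76, 0xa9, 0x14),
  makerPublicKey: new Uint8Array(33),
  takerPublicKey: new Uint8Array(33),
  hedgeSats: 10_000n,
  longSats: 40_000n,
};

// One call just before transport; no AnyHedgeContractJSON interface needed.
const payload = encodeExtendedJson(contract);

// On the receiving side, one call restores the original runtime types.
const restored = decodeExtendedJson(payload) as AnyHedgeContract;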

be explicit about bigint vs. number (float) behavior

It is explicit. It checks the types being passed in, such that:

{
  // ...
  takerPublicKey: Uint8Array;
  hedgeSats: bigint;
  // ...
}

... will become the JSON payload:

{
  takerPublicKey: "<Uint8Array: HEX_DATA>", // Because type in Object is Uint8Array
  hedgeSats: "<bigint: 123456789123456789>", // Because type in Object is bigint
}

(There's no clever "does this number need to be a bigint"... it just looks at what the native type itself is).

@emergent-reasons

emergent-reasons commented Feb 11, 2024

Thanks for the explanation Jim. I would really like to split the discussion - I'm not making any statement about the bigint part. That is strictly necessary. I'm saying that by pushing a custom binary format for Uint8Array, we are making it even less portable than needed except in the JS/TS case where we get some additional convenience benefit. Maybe the convenience benefit is large enough, but it's worth having a discussion about it.

Regarding hex vs. whatever encoding, it will be more efficient or less efficient or whatever - I think we can agree that doesn't really matter when the difference is going to be marginal either way.

Also thanks for showing that it's explicit 🙏. Hope it stays that way.

@bitjson
Member

bitjson commented Feb 12, 2024

@emergent-reasons I agree that trying to pass Uint8Arrays between JS contexts is needlessly high-friction; it's probably always better to use a hex-encoded string. 👍

In the JS context, Libauth requires Uint8Arrays in places where we want more assurance about the shape of the data (non-malformed hex) or we need to repeatedly work with the binary representation (the VM; decoded addresses, keys, etc.).

These days (as of Libauth v2), the Libauth formats that are intended for significant external consumption are all represented with JSON-safe data types (there's lots of hex-encoded strings and string-encoded BigInts in the WalletTemplate schema, for example). I plan to continue that approach with all the external formats exposed by the wallet engine too.

In the case of these encodeExtendedJson and decodeExtendedJson utilities, it's independently useful to be able to easily encode and later decode complex JS objects in a new context. I'd expect them to mostly be used for debugging purposes, but (as @jimtendo describes) there are probably also a lot of single-purpose protocols or simpler systems where using this convention will be easier than creating and using JSON schemas for transmission.

That said, I don't expect to use encodeExtendedJson/decodeExtendedJson at all in the wallet engine, since all wallet engine data types will follow strictly-typed formats with Libauth-generated JSON schemas. (In fact, when you get to that level of abstraction, we rarely use hex-encoded strings either: e.g. CashAddress, WIF, hd key encoding, etc.)

@bitjson changed the title from "Add extended parse function, compatible with Uint8Array and BigInt" to "Add encodeExtendedJson and decodeExtendedJson with support for BigInt, TypedArrays, and Dates" on Feb 12, 2024
@bitjson changed the title to "Add encodeExtendedJson and decodeExtendedJson with support for BigInts, TypedArrays, and Dates" on Feb 12, 2024
@jimtendo
Contributor

@bitjson would you be open to a tree-traversing function to go along with this also?

There are some middleware use-cases where this would be useful. To give an example, if one wanted to do something like this as ExpressJS middleware:

// Use express.json() to handle JSON payloads.
// NOTE: We could rewrite the functionality contained in here, but it's high effort because it also supports things like Gzipped payloads.
expressInstance.use(express.json());

// Add middleware to convert to extended JSON.
expressInstance.use(extendedJson()); 

... we would have to do something similar to the following, where we double up on JSON parsing (e.g. string to object, back to string, and back to object again):

(I've posted the whole example in case it's useful for others and highlighted the portions I'm referring to as "PROBLEM POINT".)

import { NextFunction, Request, Response } from 'express';
// NOTE: decodeExtendedJson/encodeExtendedJson are the functions proposed in this PR.
import { decodeExtendedJson, encodeExtendedJson } from '@bitauth/libauth';

export const extendedJson = function(): Array<(req: Request, res: Response, next: NextFunction) => void>
{
	// TODO: This is currently VERY inefficient in that we are doubling up on stringify and parse operations.
	//       We either want to replace with a custom tree-traversal for objects or implement express.json() ourselves.

	// Define Middleware function that decodes extended JSON (for requests).
	const decodeExtendedJsonMiddleware = function(req: Request, _res: Response, next: NextFunction): void
	{
		// Decode only if there is a request body.
		if(req.body)
		{
			// PROBLEM POINT
			req.body = decodeExtendedJson(JSON.stringify(req.body));
		}

		// Move forward to next middleware in our route.
		next();
	};

	// Define Middleware function that encodes extended JSON (for responses).
	const encodeExtendedJsonMiddleware = function(_req: Request, res: Response, next: NextFunction): void
	{
		// Store our original send method so that we can invoke it later.
		const originalSend = res.send;

		// Define updated send method that supports Extended JSON.
		const sendWithExtendedJsonSupport = function(data: any): Response<any, Record<string, any>>
		{
			// Check if the data is an object
			if(typeof data === 'object' || typeof data === 'bigint')
			{
				// Convert the object to our Extended JSON format.
				// PROBLEM POINT
				const dataAsExtJSON = JSON.parse(encodeExtendedJson(data));

				// Send the Extended JSON format
				return originalSend.call(res, dataAsExtJSON);
			}

			// If the data is not an object, just send it as is.
			return originalSend.call(res, data);
		};

		// Override the send function
		res.send = sendWithExtendedJsonSupport;

		// Move forward to next middleware in our route.
		next();
	};

	// Return an array of our middleware for handling Extended JSON in a) our requests and b) our responses.
	return [ decodeExtendedJsonMiddleware, encodeExtendedJsonMiddleware ];
};

If we added a tree-traversal function, we could optionally avoid that inefficient doubling up. This applies to most libraries that would cast from string -> object or object -> string too (e.g. PeerJS).


@bitjson
Member

bitjson commented Feb 14, 2024

@bitjson would you be open to a tree-traversing function to go along with this also?

Certainly! As long as it has full test coverage (so we can maintain it easily), no problem exposing additional related utilities. If I'm understanding correctly, you're looking for a decodeExtendedJson equivalent that accepts an object rather than a string (where some string values have the extendedJson encodings for other types, e.g. <bigint: 1n>)?

I'd suggest choosing a similar name with a clarifying prefix, maybe decodeExtendedJsonObject and encodeExtendedJsonObject.
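
For reference, here's a rough sketch of what the object-accepting decoder could look like (just an illustration; decodeExtendedJsonObject doesn't exist yet, and the string pattern is assumed):

// Sketch only: a possible shape for the object-level variant suggested above.
const hexToBin = (hex: string) =>
  Uint8Array.from(hex.match(/.{2}/g) ?? [], (byte) => parseInt(byte, 16));

const decodeExtendedJsonObjectSketch = (value: unknown): unknown => {
  if (typeof value === 'string') {
    const match = /^<(bigint|Uint8Array|Date): (.*)>$/.exec(value);
    if (match === null) return value;
    if (match[1] === 'bigint') return BigInt(match[2]);
    if (match[1] === 'Uint8Array') return hexToBin(match[2]);
    return new Date(match[2]);
  }
  if (Array.isArray(value)) return value.map(decodeExtendedJsonObjectSketch);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, decodeExtendedJsonObjectSketch(v)])
    );
  }
  return value;
};

In the request middleware above, something like this would replace decodeExtendedJson(JSON.stringify(req.body)) with a single traversal of req.body; an encodeExtendedJsonObject analog would do the same for responses.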

@jimtendo
Contributor

If I'm understanding correctly, you're looking for a decodeExtendedJson equivalent that accepts an object rather than a string

Yep, exactly!

I'd suggest choosing a similar name with a clarifying prefix, maybe decodeExtendedJsonObject and encodeExtendedJsonObject.

Sounds perfect, thank you!

@emergent-reasons

emergent-reasons commented Mar 11, 2024

Jim is putting a placeholder for this feature into some code we are working on. Looking at the details, one thing jumped out at me: I recommend not using a facsimile of javascript bigint naming and notation. It sets up an expectation that the format is just a wrapped-up javascript bigint, which I doubt is a design goal due to the additional complexity. Is it a design goal?

For example, <bigint: 123_456n> would seem like a reasonable value given the double suggestion of bigint naming and ___n notation, yet in the implementation I saw, the parser would just pass that string through.

On the other hand, if javascript bigint notation is a design goal, then every language will need to implement a complete version of javascript bigint parsing, including all possible representations.

I'd recommend something neutral and narrowly defined for the purpose, such as:

  • int:123
  • <int:123>

or something similar. Actually using <____> gets into weird HTML territory and might be dangerous in some way I don't understand. I don't have a strong opinion on that though.

@jimtendo
Contributor

For me, the main goal would just be landing on something that the ecosystem can standardize on.

But I do tend to agree with @emergent-reasons that it may be better to have these encoded in a JS-agnostic manner (e.g. <int:123> or int:123). That way, they might feel a bit more native to any other languages that need to consume or create these payloads (e.g. C++). In JS, the BigInt type isn't necessarily used to support values above Number.MAX_SAFE_INTEGER; it's often just used to enforce that the number must be an integer, not a float/double.

<Uint8Array:...>, etc., might already be sensible, as I think names like this are pretty common across languages/frameworks. The other thought I had was hex: or bin:, but these might be a bit too loose.

I'm pretty pathetic with RegEx, but it'd be a nice-to-have if the expressions were straightforward to use with JSON Schema. Some data structures become very cumbersome to manually convert to standard JSON for transport (e.g. doubling up on interfaces; some of those I work with have LOTS of Uint8Array and int fields), so being able to encode/decode these fields with a one-liner for transport is probably the biggest selling point of these functions for me.
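
As a rough illustration of the JSON Schema point, hypothetical schema fragments for fields encoded with this convention might look like the following (the patterns are just guesses at what the final format would allow):

// Sketch only: hypothetical JSON Schema fragments; the encoded patterns are assumptions.
const hedgeSatsSchema = {
  type: 'string',
  pattern: '^<bigint: -?[0-9]+>$',
};
const makerPublicKeySchema = {
  type: 'string',
  pattern: '^<Uint8Array: ([0-9a-f]{2}){33}>$', // 33-byte (compressed) public key
};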
