Design of API for setting JS prototypes #2

Open
tlively opened this issue Feb 19, 2025 · 15 comments

@tlively
Member

tlively commented Feb 19, 2025

One question this does bring up is whether we allow descriptor values to flow into JS. An ideal value representation for a descriptor in SpiderMonkey would just be the pointer to the SM shape; however, shapes are not JS objects, and I'm not sure yet how feasible it would be to expose them to JS. We have a similar problem in wasm code, but it's a little easier to add support there.

The bigger issue is that our shapes are also immutable, so you can't set a prototype for an already created shape; instead, you have to create a new one with the right prototype. So with this design we'd need to wrap a descriptor in a new object and add an indirection so that this mutation is reflected everywhere. And it wouldn't work if you change a descriptor's prototype after structs have been allocated with it, because they would have been allocated with different shapes and there's no easy way to find them all and assign them new shapes (AFAIK). So it'd be nice if this API were one-shot and could only be called before a descriptor has been used anywhere.

Originally posted by @eqrion in ea16da7

This seems to suggest that the original API from the design issue was actually a better fit. That API design set the prototype immutably at the point where the custom descriptor was allocated, and it did so in a way that the custom descriptor could be used from subsequent constant initializers. Alternative designs that e.g. import functions to allocate custom descriptors would lack this last property.

The only downside of the original API was that it depended on a custom section for identifying which field of the custom descriptor was meant to be the prototype. We could get rid of the custom section by standardizing a convention like "the first field is used as a prototype if it has a type that matches (ref null extern)," but that would be super ad hoc, and I don't think that's any better than using a custom section.

Additional ideas would be very welcome here!

@sjrd

sjrd commented Feb 19, 2025

We could get rid of the custom section by standardizing a convention like "the first field is used as a prototype if it has a type that matches (ref null extern)," but that would be super ad hoc, and I don't think that's any better than using a custom section.

That seems actively harmful to me. It's going to quickly break any codegen whose first descriptor field has, by chance, that type. For example, because of the type of JS string builtins, strings typically get that type. The first field of a descriptor might be the class name, for run-time reflection. That could very well have the type (ref null extern), but one most certainly does not want it to be used as prototype.

@tlively
Member Author

tlively commented Feb 20, 2025

Another idea due to @syg is that we could simply have each custom descriptor object start out with an associated empty prototype object that will be the JS prototype of the objects described by the descriptor. After instantiation, the prototype could be retrieved from an exported custom descriptor and then populated with methods.

The benefits of this idea are that it does not require custom sections and it still allows custom descriptors to be used in constant expressions.

The downsides are the overhead of allocating an associated prototype object for every custom descriptor (but I don't expect this to be a problem in practice) and the need to use Object.setPrototypeOf to set up the proper prototype inheritance chains.
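A minimal TypeScript model of this idea shows where the Object.setPrototypeOf cost comes from. `Descriptor`, `installMethods`, and the use of `Object.create` to stand in for allocating a described struct are assumptions made for illustration, not part of any proposed API.

```typescript
// Hypothetical model: every custom descriptor starts out with its own
// eagerly allocated, empty prototype object.
class Descriptor {
  readonly prototype: object = {};
}

type Method = (...args: unknown[]) => unknown;

// After instantiation, JS fetches the prototype from an exported descriptor
// and populates it. Adding properties keeps the prototype's identity intact.
function installMethods(desc: Descriptor, methods: Record<string, Method>): void {
  Object.assign(desc.prototype, methods);
}

const baseDesc = new Descriptor();
const derivedDesc = new Descriptor();
installMethods(baseDesc, { describe: () => "base" });
installMethods(derivedDesc, { kind: () => "derived" });

// Inheritance between the associated prototypes can only be wired up after
// the fact, which is the Object.setPrototypeOf downside mentioned above.
Object.setPrototypeOf(derivedDesc.prototype, baseDesc.prototype);

// Object.create stands in for allocating a struct described by derivedDesc.
const described = Object.create(derivedDesc.prototype);
```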

@sjrd

sjrd commented Feb 20, 2025

the need to use Object.setPrototypeOf to set up the proper prototype inheritance chains.

That is exactly what I was thinking as I read the message. Doesn't that trigger severe deoptimization paths in the main engines?

@tlively
Member Author

tlively commented Feb 20, 2025

Yes, I believe so.

A workaround would be to provide the supertype's custom descriptor when allocating the subtype's custom descriptor and have an annotation telling the engine to make the proper connection between the associated prototypes, but that gets us back to where we started with the custom sections.

@eqrion
Contributor

eqrion commented Feb 20, 2025

This seems to suggest that the original API from the WebAssembly/design#1552 was actually a better fit. That API design set the prototype immutably at the point where the custom descriptor was allocated, and it did so in a way that the custom descriptor could be used from subsequent constant initializers. Alternative designs that e.g. import functions to allocate custom descriptors would lack this last property.

Yeah, I think that making the prototype of a custom descriptor immutably set at creation is the ideal way to go here.

As yet another alternative to using a custom section here, what if we added a new struct creation instruction for descriptors that let us pass embedding specific host information?

e.g.

struct.new_descriptor $typeIndex
  $typeIndex refers to a struct that describes another struct
  [$struct-fields, externref] -> [(ref exact $typeIndex)]

It takes an extra optional externref that is interpreted by the host however we need to. So it could be an options bag with a prototype field, or just the prototype.
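One way a JS embedding might interpret that extra operand is sketched below in TypeScript. `interpretHostValue` is a hypothetical name, and the rule that an object with a `prototype` key is an options bag while any other object is the prototype itself is an assumption; the comment above leaves the exact format open.

```typescript
// Hypothetical host hook for the extra externref operand of the proposed
// struct.new_descriptor instruction. Core Wasm never inspects this value; only
// the embedder does, when it sets up the new descriptor's internal shape.
function interpretHostValue(hostValue: unknown): object | null {
  // null, undefined, primitives, and (for simplicity) functions carry no
  // prototype information in this sketch.
  if (typeof hostValue !== "object" || hostValue === null) {
    return null;
  }
  // Assumed convention: an object with a "prototype" key is an options bag.
  if ("prototype" in hostValue) {
    const proto = (hostValue as { prototype: unknown }).prototype;
    return typeof proto === "object" ? proto : null;
  }
  // Otherwise the value itself is taken to be the prototype.
  return hostValue;
}
```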

This would be minimally invasive on the core semantics, and I believe it would be useful for other embeddings than just JS. JS isn't going to be the only host for wasm that might want to have these wasm GC objects interop with their host type system.

@tlively
Member Author

tlively commented Feb 20, 2025

@rossberg, how would you feel about adding new allocation instructions for descriptor types that take an arbitrary externref as an extension point for host interop?

@eqrion
Contributor

eqrion commented Feb 20, 2025

As a quick follow-up to struct.new_descriptor: we'd want to ensure that, from core wasm's perspective, the host extension value you pass to new_descriptor doesn't change any observable behavior in core wasm. So no changing casts or anything like that. But it could change an embedder's semantics, as in the JS embedding spec.

We'd also want to ensure that there's no struct.get_descriptor_host_value to retrieve this value after you pass it, so that hosts that don't care about it can drop it immediately.

@rossberg
Member

@tlively, to be honest, it strikes me as a bit random. Is there any concrete host use case outside JS's peculiar prototype mechanism?

If we had type imports, we could perhaps introduce a js builtin module that defines an opaque type proto, and a struct using that as its first field could be interpreted specially as suggested above, while avoiding the risk @sjrd described of abusing plain externref.

@eqrion
Contributor

eqrion commented Feb 21, 2025

@tlively, to be honest, it strikes me as a bit random. Is there any concrete host use case outside JS's peculiar prototype mechanism?

I don't think it's just about JS prototypes, there are other parts of how a wasm-GC struct is reflected to JS that we'll want to control. At least one thing I think we'll want is to have specified wasm fields get reflected as JS own properties with specific names. The externref that's provided wouldn't just be the prototype, but it'd be a collection of settings to control the initialization of the rtt.

For non-JS hosts, I could imagine similar use-cases. If you embedded Wasm-GC code in a Java VM or .NET VM, you'd probably want to have the ability to control how these objects are reflected to the rest of the code in the system.

If we had type imports, we could perhaps introduce a js builtin module that defines an opaque type proto, and a struct using that as its first field could be interpreted specially as suggested above, while avoiding the risk @sjrd described of abusing plain externref.

I think that might help out here, but I haven't been able to fully think through how type imports and compile-time imports (basically what JS builtins gives us) work together.

@rossberg
Member

I don't think it's just about JS prototypes, there are other parts of how a wasm-GC struct is reflected to JS that we'll want to control.

Okay, yes, but as you say: it is about reflecting Wasm values in JS. It is not relevant to Wasm itself, so would be strange as a Wasm instruction.

For non-JS hosts, I could imagine similar use-cases. If you embedded Wasm-GC code in a Java VM or .NET VM, you'd probably want to have the ability to control how these objects are reflected to the rest of the code in the system.

I am not so sure. The expectation to be able to directly use foreign values as if they were "natively implemented" host language objects is a very unique JavaScriptism. I doubt that anything like that will be done in any other embedding. It's not how embedded VMs or data representations usually work. You normally have an API or wrapper objects for accessing embedded values.

@jakobkummerow
Contributor

I share the concern about internal shape descriptors flowing out to JS as first-class values; I don't think we would want that (mostly for code robustness reasons). I think our implementation will use an indirection there: Wasm struct points to internal shape descriptor, internal shape descriptor points to Wasm-level "custom descriptor". That way the custom descriptor can flow out to JS without the internal shape descriptor ever floating around userspace code as a first-class value. To make the custom descriptor be the thing that's passed to struct.new, etc., it'll also have to have a pointer back to the internal shape descriptor it's associated with.
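The pointer structure described here can be sketched as plain TypeScript classes. The class and function names are hypothetical, and real engines implement these as internal heap objects rather than JS values; the sketch only shows which object points to which.

```typescript
// Engine-internal shape descriptor: never exposed to user code.
class ShapeDescriptor {
  // Back pointer to the user-visible custom descriptor (set right after allocation).
  customDescriptor: CustomDescriptor | null = null;
  // The prototype identity is fixed for the shape's lifetime.
  constructor(readonly prototype: object | null) {}
}

// User-visible custom descriptor: may flow out to JS as a first-class value.
class CustomDescriptor {
  readonly shape: ShapeDescriptor;
  constructor(prototype: object | null, readonly fields: unknown[]) {
    // The internal shape is allocated together with the custom descriptor,
    // and the two are linked in both directions.
    this.shape = new ShapeDescriptor(prototype);
    this.shape.customDescriptor = this;
  }
}

// A Wasm struct points at the internal shape, not at the custom descriptor.
class WasmStruct {
  constructor(readonly shape: ShapeDescriptor, readonly fields: unknown[]) {}
}

// struct.new receives the custom descriptor and follows its pointer to the shape.
function structNew(desc: CustomDescriptor, fields: unknown[]): WasmStruct {
  return new WasmStruct(desc.shape, fields);
}

// Reading a struct's descriptor goes struct -> shape -> custom descriptor.
function structDescriptor(s: WasmStruct): CustomDescriptor {
  const desc = s.shape.customDescriptor;
  if (desc === null) throw new Error("shape has no custom descriptor");
  return desc;
}
```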

I strongly share the concern that replacing a prototype after any object using that descriptor has already been exposed to user code (e.g. via the "start" function) is a no-go: we'd have to allocate a fresh internal shape descriptor for the new prototype, and updating all pointers to that would be somewhere between "a big mess" and "infeasible". (Having the prototype itself be mutable is fine, it just needs to preserve its object identity.)

For scaling reasons (many shapes with many methods installed on prototypes) I believe that we'll really want a declarative way to specify them, which AFAICS will mean some kind of section (because that's the declarative mechanism that Wasm has). So I think we should be spending our brainstorming time on coming up with a kind of section (core Wasm? traditional "custom section"? something new in between? something... else?) that we can agree on for this purpose.

Just so it doesn't get lost in the discussion, I'd like to highlight that besides specifying which JS object should be the prototype for which Wasm RTT, there will be two additional things that will likely be even more performance sensitive, and hence even more reason to want a declarative approach:

  • specifying the prototype relations between these prototypes (so that a Java/Kotlin/... class hierarchy is properly reflected in the JS-side prototypes)
  • specifying which Wasm function (from the same module) is installed under which name as a method on one of these prototypes. I don't have hard numbers, but as a rough guess, I wouldn't be surprised if we had on average ten or so methods per prototype, so setting these up will be an order of magnitude more impactful for performance than configuring the object identity for the prototypes.

@tlively
Member Author

tlively commented Feb 28, 2025

I consider these to be the requirements constraining the solution space here:

  • Immutable Prototype Identity
  • Custom Descriptors Available in Constant Expressions
  • Maintain Core Wasm Abstractions
  • No Additions to Core Wasm (that are not independently useful)
  • No Slowdown for Unrelated Code
  • Admits a Declarative API

These requirements imply that if a solution sets up an association between a custom descriptor and a particular prototype object, the information about that association must be encapsulated in an imported value passed to the allocation of the custom RTT. More details about these requirements and their implications are included at the bottom of this post, under "Explanation of requirements and implications".

Furthermore, the specified algorithm for [[GetPrototypeOf]] must be able to look up the prototype from some combination of the custom descriptor's identity, type, and value.

This seems to give four families of solutions, although there might be other solutions I am missing. All are described below, although the last solution is the only new one.

1. Trivial Prototypes

The simplest solution is to use only the custom descriptor's identity to look up the associated prototype. In the spec, this would mean there is a global map from custom descriptor addresses to prototype objects. In implementations, this would mean that every custom descriptor has an eagerly allocated empty associated prototype. One downside of this approach is the (probably negligible) overhead of allocating an empty prototype for every custom descriptor, whether it needs one or not.
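In spec terms, this solution needs only a map keyed by descriptor identity. A minimal TypeScript sketch follows, with a `WeakMap` standing in for the map from custom descriptor addresses; `allocateDescriptor` and `prototypeFor` are hypothetical names.

```typescript
// Stand-in for the global map from custom descriptor addresses to prototypes.
// A WeakMap keys on object identity and lets dead descriptors be collected.
const descriptorPrototypes = new WeakMap<object, object>();

// Models custom descriptor allocation: the empty prototype is allocated
// eagerly, whether or not anyone will ever use it.
function allocateDescriptor(fields: unknown[]): object {
  const descriptor = Object.freeze({ fields });
  descriptorPrototypes.set(descriptor, {});
  return descriptor;
}

// [[GetPrototypeOf]] for a described object uses only its descriptor's identity.
function prototypeFor(descriptor: object): object {
  const proto = descriptorPrototypes.get(descriptor);
  if (proto === undefined) throw new Error("not a custom descriptor");
  return proto;
}
```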

All other solutions import values to use during custom descriptor allocation to set up the association with the prototypes. These imported values could be the prototypes themselves, but it is probably better for them to be option bags containing the prototypes to allow us to additionally configure own properties or other things in the future.

2. Field Order Convention

If we're not using the custom descriptor's identity to look up the prototype, we have to use some combination of its type and value. In fact, we must use the type at least to ensure we are getting the prototype from an immutable field. This ensures that the spec algorithm, which respects Wasm abstraction boundaries, and what real engines would do, namely eagerly copying the prototype into the prototype slot at custom descriptor allocation time, produce the same value.

The simplest way to look up the prototype from the custom descriptor is to use some fixed convention of where to look. For example:

  • Take the value of the first field if that field is immutable, else null.
  • Take the value of the first immutable field if it exists, else null.
  • Take the value of the first field if it is immutable and matches externref, else null.
  • Take the value of the first immutable field that matches externref if it exists, else null.
  • etc.

We can bikeshed the precise convention later. No matter what the convention is, it is something producers have to be aware of. In some cases, they might need to insert a null placeholder field to prevent some other field from being unintentionally picked as holding the prototype.
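To make the bikeshed concrete, two of the listed conventions are sketched below in TypeScript over a hypothetical field model (`Field`, `matchesExternref`, and the string spellings of types are assumptions). The test cases illustrate the hazard @sjrd raised and the null-placeholder workaround.

```typescript
// Illustrative model of one custom descriptor field; not an engine API.
interface Field {
  mutable: boolean;
  type: string; // e.g. "i32", "externref", "(ref extern)", "(ref $vtable)"
  value: unknown;
}

// Simplification: only these spellings count as matching externref here.
function matchesExternref(type: string): boolean {
  return type === "externref" || type === "(ref null extern)" || type === "(ref extern)";
}

// Convention: take the value of the first field if it is immutable, else null.
function firstFieldIfImmutable(fields: Field[]): unknown {
  const first = fields[0];
  return first !== undefined && !first.mutable ? first.value : null;
}

// Convention: take the value of the first immutable field that matches
// externref if it exists, else null.
function firstImmutableExternref(fields: Field[]): unknown {
  const found = fields.find((f) => !f.mutable && matchesExternref(f.type));
  return found !== undefined ? found.value : null;
}
```

With a class-name string in the first immutable externref field, `firstImmutableExternref` returns that string; a producer that wants no prototype would prepend an immutable null externref field so that the placeholder is picked instead.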

3. Distinguished Types

We can alternatively allow producers to communicate their intention for a particular field to contain the prototype by having a distinguished type (i.e. not externref) that they can use for that field. Since we don't want to add anything to core Wasm that does not stand alone, we can't just add a new subtype of externref for this, but we could take a dependency on type imports and provide such a subtype as an importable builtin type.

With this solution, the spec algorithm would use the first immutable field with the distinguished builtin type to look up the prototype. Real engines would do a similar iteration, considering only fields referring to imported subtypes of extern, at custom descriptor allocation time.
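A TypeScript sketch of this lookup follows. `PROTOTYPE_TYPE` is a hypothetical placeholder for the builtin type that type imports would provide (no such type exists today), and types are modeled as strings for simplicity.

```typescript
// Placeholder spelling for a distinguished builtin subtype of extern that a
// module would obtain through type imports. Purely hypothetical.
const PROTOTYPE_TYPE = "(ref null $js.prototype)";

interface FieldDef {
  mutable: boolean;
  type: string;
}

// The search looks only at field types, so its result is the same for every
// descriptor of a given type; a plain externref field is never picked.
function prototypeFieldIndex(fields: FieldDef[]): number {
  return fields.findIndex((f) => !f.mutable && f.type === PROTOTYPE_TYPE);
}

// At allocation time, read the prototype from the chosen field, if any.
function lookupPrototype(fields: FieldDef[], values: unknown[]): unknown {
  const i = prototypeFieldIndex(fields);
  return i >= 0 ? values[i] : null;
}
```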

4. Distinguished Values (✨NEW!✨)

Rather than iterating through the type of the custom descriptor to find the field used to look up the prototype, it would be possible to iterate through the values of the fields instead. We could have an API that sets an internal sentinel property on objects intended to be imported for use as prototypes (or option bags containing prototypes), and the spec algorithm could iterate through immutable fields looking for the first value that had that sentinel property set. Real engines would do the same search when allocating custom descriptors.

The advantage of this approach is that it lets producers signal their intent explicitly, just like with distinguished types, but it doesn't depend on type imports. The downside is that there might be more fields to search through than there would be with the distinguished types approach, but I doubt this would make much of a difference in practice.
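A TypeScript sketch of the value-based search follows. `markAsPrototype` is a hypothetical name for the API that would set the internal sentinel, and a `WeakSet` models that sentinel as an internal slot that ordinary property access cannot see or spoof.

```typescript
// Models the internal sentinel; invisible to ordinary property access.
const prototypeSentinel = new WeakSet<object>();

// Hypothetical JS API: flag an object for use as a prototype (or options bag).
function markAsPrototype<T extends object>(obj: T): T {
  prototypeSentinel.add(obj);
  return obj;
}

interface DescriptorField {
  mutable: boolean;
  value: unknown;
}

// At descriptor allocation: the first immutable field whose value carries
// the sentinel supplies the prototype; otherwise there is none.
function findPrototypeByValue(fields: DescriptorField[]): object | null {
  for (const field of fields) {
    const v = field.value;
    if (!field.mutable && typeof v === "object" && v !== null && prototypeSentinel.has(v)) {
      return v;
    }
  }
  return null;
}
```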

Declarative APIs

All of the above solutions admit a declarative API where a custom section describes the intended shape and contents of the prototypes and the engine automatically realizes that plan via some combination of synthesizing imports and mutating prototypes at the end of instantiation. In all the solutions, methods cannot be attached to the prototypes until after instantiation finishes because the exported functions the methods should call are not available until then. The start function is either able to observe that the methods don't yet work or is able to observe that this core Wasm abstraction has been broken, and we don't want to allow the latter. On the other hand, all solutions but the first allow things that do not depend on Wasm exports, such as own properties or prototype chains, to be configured eagerly when the imported prototypes (or option bags containing prototypes) are first created.

The first solution, Trivial Prototypes, does not allow any eager configuration at all. A declarative API for this solution would require mapping custom descriptor identities to intended prototype configurations, but the custom descriptor identities are not available outside the instance until exports are made available at the end of instantiation. The prototypes would all have to have the same, default shape during the start function.
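A rough TypeScript sketch of how such a declarative plan might be realized in two phases for the import-based solutions. The `ProtoPlan` format and the function names are invented for illustration; the sketch assumes the plan has already been decoded from a custom section and that parent links form no cycles.

```typescript
// Hypothetical plan entry, as it might be decoded from a custom section.
interface ProtoPlan {
  name: string;                    // identifies the prototype (e.g. its import)
  parent?: string;                 // prototype chain: configurable eagerly
  methods: Record<string, string>; // method name -> export name: only after instantiation
}

// Phase 1, before instantiation: create every prototype with its final
// [[Prototype]] already set, so no Object.setPrototypeOf is needed later.
function createPrototypes(plans: ProtoPlan[]): Map<string, object> {
  const byName = new Map(plans.map((p) => [p.name, p] as const));
  const protos = new Map<string, object>();
  const build = (name: string): object => {
    const existing = protos.get(name);
    if (existing !== undefined) return existing;
    const plan = byName.get(name);
    if (plan === undefined) throw new Error(`unknown prototype: ${name}`);
    const proto: object = plan.parent !== undefined ? Object.create(build(plan.parent)) : {};
    protos.set(name, proto);
    return proto;
  };
  for (const plan of plans) build(plan.name);
  return protos;
}

// Phase 2, after instantiation: the exported functions now exist, so they can
// be installed as methods. The prototypes' identities never change.
function attachMethods(
  plans: ProtoPlan[],
  protos: Map<string, object>,
  instanceExports: Record<string, unknown>,
): void {
  for (const plan of plans) {
    const proto = protos.get(plan.name);
    if (proto === undefined) throw new Error(`missing prototype: ${plan.name}`);
    for (const method of Object.keys(plan.methods)) {
      Object.defineProperty(proto, method, {
        value: instanceExports[plan.methods[method]],
        writable: true,
        configurable: true,
      });
    }
  }
}
```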

Explanation of requirements and implications

Immutable Prototype Identity

Ultimately, engines need to install the prototype on the underlying shape descriptor for the WasmGC object. This engine-managed shape descriptor may or may not be the same as the user-visible "custom descriptor" object in core Wasm. Both SpiderMonkey and V8 require the prototype identity associated with a particular shape descriptor to be immutable. The prototype object itself can be mutable, but its identity must be the same for the lifetime of the shape descriptor. In particular, this means that the prototype object must be available when the shape descriptor is allocated. The shape descriptor must be allocated before the shape of a described object can be observed, and in particular before a described object can flow into a cast or out to JS. In practice, this means that the shape descriptor must be allocated at the same time as the custom descriptor. Trying to allocate it lazily while ensuring it is allocated before it is needed would be too complicated and would introduce new work at the Wasm/JS boundary and on casts. The prototype objects must therefore be available when custom descriptors are allocated.

Custom Descriptors Available in Constant Expressions

Today, vtables are allocated in immutable Wasm globals. This allows Binaryen optimizations to safely propagate their fields and eventually devirtualize many method calls. With the Custom RTTs proposal, the vtable structs will become custom descriptors and will be associated with the JS prototypes for the objects they describe. To keep Binaryen's devirtualization optimizations working, the vtable custom descriptors must also be allocated in immutable Wasm globals. The custom descriptors must therefore be allocatable in constant expressions or imported. Due to the previous requirement, this means that the JS prototypes must be available when constant expressions, and in particular global initializers, are evaluated.

Maintain Core Wasm Abstractions

As a matter of design hygiene, the WebAssembly CG requires that the design of the mechanism for associating JS prototypes with custom descriptors be specifiable in terms of the existing Core Wasm embedding API. Implementations of the mechanism may of course punch through any abstraction boundaries they want, but they must not be forced to do so by the nature of the design.

In particular this means that the specification of this mechanism in the JS embedder spec may:

  • Inspect the names and types of a module's imports.
  • Inspect the names and types of a module's exports.
  • Inspect a module's custom sections.
  • Inspect and modify the imports provided at instantiation.
  • Access the exports of an instance.

However, because the JS prototypes must be associated with descriptor objects and the objects they describe at all times, and because these prototypes can be observed before instantiation is finished via calls to imports from the start function, the specification of the mechanism cannot depend on an instance's exports.

Since separate instances of a module must be able to have different custom descriptors and different JS prototypes to access the separate state of each instance, no solution can depend solely on a module's import and export definitions or custom sections. The solution must therefore depend on observing and possibly modifying the imports provided at instantiation time since that's the only capability that has not been ruled out.

Since the custom descriptors must be available in constant expressions, they may themselves be imported or the information necessary to describe their association with a JS prototype may be imported and then used to allocate the custom descriptors. But custom descriptors must contain non-nullable references to functions in the same instance to support devirtualization, so it is not possible to initialize them from outside the instance before instantiation has finished and the functions can be made available as exports. That rules out importing the custom descriptors. Instead, we must import information sufficient to form the association with the JS prototypes themselves when a custom descriptor is allocated.

No Additions to Core Wasm

As another matter of design hygiene, the WebAssembly CG also requires that we not add anything to core Wasm only for the benefit of particular embedders. Building on existing proposals or proposing new mechanisms that have independent value to core Wasm is acceptable, though.

No Slowdown for Unrelated Code

The proposed mechanism should not have performance penalties for code that does not use the new mechanism. In particular, the design should require no new work on the Wasm/JS boundary or for casts. This is the requirement that rules out lazily allocating shape descriptors.

Admits a Declarative API

The proposed mechanism must be declarative in nature or allow a declarative API to be layered on top of it. This is a requirement from the V8 team, which expects users to configure thousands of prototypes with tens of thousands of methods and expects that doing so declaratively will be the only way to get reasonable startup performance.

@rossberg
Member

The distinguished value idea is a clever option that could work.

Both options 3 and 4 could be combined with the most constrained version of option 2 to avoid any search through the fields. That is, only consider the very first field, and if that does not have the right attribute+type+value, the magic behaviour does not apply and the prototype is left as null.
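Under this most constrained variant the engine's check becomes constant-time. A TypeScript sketch combining it with the distinguished-values idea follows; the `WeakSet` standing in for the internal sentinel and all names are hypothetical.

```typescript
// Hypothetical internal sentinel, modeled as a WeakSet.
const markedPrototypes = new WeakSet<object>();

interface InitField {
  mutable: boolean;
  value: unknown;
}

// Inspect only field 0, so there is no search. If that field is missing,
// mutable, or its value lacks the sentinel, no prototype applies (null).
function prototypeFromFirstField(fields: InitField[]): object | null {
  const first = fields[0];
  if (first === undefined || first.mutable) return null;
  const v = first.value;
  return typeof v === "object" && v !== null && markedPrototypes.has(v) ? v : null;
}
```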

@jakobkummerow
Contributor

For module size reasons, it would actually be interesting to not require functions that are installed as prototype methods to also be exported on their own.
A specific estimate we've heard from a prospective user of all this is "200K functions". When exporting 200K functions, even if you give them nonsensical size-optimized names (in the style of a JS minifier), each export takes 8 bytes (1 for the export name length, 3 for the export name, 1 for the kind, 3 for the LEB-encoded index), adding up to 1.6 MB of totally useless module size bloat (because nobody will call these by their export names: they'll be called via prototypes instead).
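The estimate can be checked with a short TypeScript sketch. `uleb128Size` and `exportSectionBytes` are illustrative helpers; the per-entry layout (name length, name bytes, kind byte, LEB128 function index) follows the breakdown above, with 3-byte names and indices 0 through 199,999 assumed and section headers ignored.

```typescript
// Bytes needed to encode n as an unsigned LEB128 integer (7 payload bits per byte).
function uleb128Size(n: number): number {
  let size = 1;
  let rest = n;
  while (rest >= 0x80) {
    rest = Math.floor(rest / 0x80);
    size++;
  }
  return size;
}

// Export entries only: name length + name bytes + kind byte + function index.
// Assumes every export name has the same length and indices 0..count-1.
function exportSectionBytes(count: number, nameLength: number): number {
  let total = 0;
  for (let index = 0; index < count; index++) {
    total += uleb128Size(nameLength) + nameLength + 1 + uleb128Size(index);
  }
  return total;
}

const upperBound = 200000 * (1 + 3 + 1 + 3); // 8 bytes per export
const exact = exportSectionBytes(200000, 3);
```

Because many indices fit in one or two LEB128 bytes, the exact total for these entries comes to 1,583,488 bytes, so the 1.6 MB figure is a close upper bound.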

I fully expect that "design hygiene" will take precedence over practical concerns like module size, but wanted to have this consideration on the record nevertheless.

@tlively
Member Author

tlively commented Mar 12, 2025

Note that exporting all 200k functions will also completely kill wasm-opt's ability to meaningfully optimize types, so the long-term solution for this will be a combined JS-Wasm call-graph analysis to remove unused exports. Binaryen has a tool for this called wasm-metadce that is used by Emscripten, so there is precedent for this kind of thing.
