You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What always intrigued me were the various little tricks that JavaScript implementations use to be
as fast as they are. One of those tricks is how JS implementations represent values to be more time
and space efficient.
I think it’s worth the time to look at those neat little tricks for two reasons: Firstly, some of
them are also applicable in other situations (though most of them admittedly are not). And secondly,
because they remind you of how various data is represented at the low level (stuff you may have
learned in your CS courses at some point - or maybe not).
So, let’s start!
The trivial approach: Tagged unions
JavaScript is a dynamically typed language and as such needs some kind of “value” data structure,
which stores the current type and a value of that type. The trivial approach to this problem is to
use a tagged union. This could look roughly like this:
As you can see the idea is very simple. You have a type tag, which specifies which type the value
currently has. And a union that’s used to store the various types. Examples to store integers and
objects would look like that:
The problem with this straightforward approach is that the size of the above object would be
16 bytes = 128 bits (on a 64 bit machine), even though the actual information content is much
lower (any individual value can be stored in 64 bits or less). Especially if this value structure
is stored in an array, the overhead quickly adds up. Even outside of arrays, simply passing around
this structure will already take up two 64 bit registers, and so on.
So what we want is to make values smaller, ideally reducing them to just the size of the payload.
Before we get to that for the general case, let’s first have a look at a technique called pointer
tagging:
An interlude: Pointer tagging
Have a look at the above structure again. It consists of a union, which has a size of 8 bytes, and
an integer which could fit into a single byte (a char). So theoretically the total size should be 9
bytes. But it’s not - it’s 16 bytes. This is because the compiler adds padding to the data structure
in order to align it in memory.
This is done for performance reasons: The CPU can access the memory only in word sized chunks. So if
our data always starts at a word it can be fetched efficiently. If it were to start somewhere in the
middle of a word the CPU would have to apply masks and shifts in order to get the data - which would
be too slow.
A word on a 64 bit machine is 64 bit = 8 bytes (obviously ^^). So if all pointers to our object will
be aligned to 8 bytes it means that they will be multiples of 8. A pointer thus can be 8, 16, 24,
109144, etc. But it can not be 7 or 13. Or speaking in binary: 0b1000 (=8), 0b10000 (=16),
0b11000 (=24), 0b11010101001011000 (=109144).
As you can see the lower three bits are always zero. So those three bits are basically free to use.
We can use them to store a “tag”, which is an integer between 0 (0b000) and 7 (0b111).
pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppTTT
^- actual pointer three tag bits -^
Here is a sample implementation of how to do so:
#include<cassert>
#include<stdint.h>template<typenameT,intalignedTo>classTaggedPointer{private:static_assert(alignedTo!=0&&((alignedTo&(alignedTo-1))==0),"Alignment parameter must be power of two");
<span>// for 8 byte alignment tagMask = alignedTo - 1 = 8 - 1 = 7 = 0b111</span>
<span>// i.e. the lowest three bits are set, which is where the tag is stored</span>
<span>static</span> <span>const</span> <span>intptr_t</span> <span>tagMask</span> <span>=</span> <span>alignedTo</span> <span>-</span> <span>1</span><span>;</span>
<span>// pointerMask is the exact contrary: 0b...11111000</span>
<span>// i.e. all bits apart from the three lowest are set, which is where the pointer is stored</span>
<span>static</span> <span>const</span> <span>intptr_t</span> <span>pointerMask</span> <span>=</span> <span>~</span><span>tagMask</span><span>;</span>
<span>// save us some reinterpret_casts with a union</span>
<span>union</span> <span>{</span>
<span>T</span> <span>*</span><span>asPointer</span><span>;</span>
<span>intptr_t</span> <span>asBits</span><span>;</span>
<span>};</span>
The technique described above is useful in many contexts. Basically it’s applicable to any situation
where you want to add a small bit of info to an aligned pointer. And this is also the situation we
are in :)
But before we get to that let’s first have a look at another, very similar trick:
Storing integers in the pointer
JavaScript does not have an integer type per se, it only has IEEE 754 doubles. Still, people mainly
work with small integral numbers (think of loops, array indexes, etc). Thus it makes sense to store
those small integers in a real integer variable instead of a double, because many operations are
much faster this way. That’s also the reason why in the Value class above I have a distinct int32
type and not just a double type.
Especially for integers the above Value approach seems like overkill: The 16 byte Value structure
only stores 4 bytes of actual data, one byte of type information, and the remaining 11 bytes being
padding.
The solution obviously is to use something similar to the tagged pointers introduced above. You can
say: “If the last bit of the pointer is 1 then it actually isn’t a pointer at all, but it’s an
integer.”
Here is how a pointer would look like:
pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppTT0
^- actual pointer two tag bits -^ ^- last bit 0
And this is how the integer would look like (note that as we need one bit for distinguishing the
type the integer has only 63 bits):
iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiii1
^- actual integer ^- last bit 1
To get the actual integer value we just need to shift off the 1 bit using integer >> 1.
Here again a sample implementation:
#include<cassert>
#include<stdint.h>template<typenameT,intalignedTo>classTaggedPointerOrInt{private:static_assert(alignedTo!=0&&((alignedTo&(alignedTo-1))==0),"Alignment parameter must be power of two");static_assert(alignedTo>1,"Pointer must be at least 2-byte aligned in order to store an int");
<span>// for 8 byte alignment tagMask = alignedTo - 1 = 8 - 1 = 7 = 0b111</span>
<span>// i.e. the lowest three bits are set, which is where the tag is stored</span>
<span>static</span> <span>const</span> <span>intptr_t</span> <span>tagMask</span> <span>=</span> <span>alignedTo</span> <span>-</span> <span>1</span><span>;</span>
<span>// pointerMask is the exact contrary: 0b...11111000</span>
<span>// i.e. all bits apart from the three lowest are set, which is where the pointer is stored</span>
<span>static</span> <span>const</span> <span>intptr_t</span> <span>pointerMask</span> <span>=</span> <span>~</span><span>tagMask</span><span>;</span>
<span>// save us some reinterpret_casts with a union</span>
<span>union</span> <span>{</span>
<span>T</span> <span>*</span><span>asPointer</span><span>;</span>
<span>intptr_t</span> <span>asBits</span><span>;</span>
<span>};</span>
<span>inline</span> <span>void</span> <span>setPointer</span><span>(</span><span>T</span> <span>*</span><span>pointer</span><span>,</span> <span>int</span> <span>tag</span> <span>=</span> <span>0</span><span>)</span> <span>{</span>
<span>// make sure that the pointer really is aligned</span>
<span>assert</span><span>((</span><span>reinterpret_cast</span><span><</span><span>intptr_t</span><span>></span><span>(</span><span>pointer</span><span>)</span> <span>&</span> <span>tagMask</span><span>)</span> <span>==</span> <span>0</span><span>);</span>
<span>// make sure that the tag isn't too large</span>
<span>assert</span><span>(((</span><span>tag</span> <span><<</span> <span>1</span><span>)</span> <span>&</span> <span>pointerMask</span><span>)</span> <span>==</span> <span>0</span><span>);</span>
<span>// last bit isn't part of tag anymore, but just zero, thus the << 1</span>
<span>asPointer</span> <span>=</span> <span>pointer</span><span>;</span>
<span>asBits</span> <span>|=</span> <span>tag</span> <span><<</span> <span>1</span><span>;</span>
<span>}</span>
<span>inline</span> <span>void</span> <span>setInt</span><span>(</span><span>intptr_t</span> <span>number</span><span>)</span> <span>{</span>
<span>// make sure that when we << 1 there will be no data loss</span>
<span>// i.e. make sure that it's a 31 bit / 63 bit integer</span>
<span>assert</span><span>(((</span><span>number</span> <span><<</span> <span>1</span><span>)</span> <span>>></span> <span>1</span><span>)</span> <span>==</span> <span>number</span><span>);</span>
<span>// shift the number to the left and set lowest bit to 1</span>
<span>asBits</span> <span>=</span> <span>(</span><span>number</span> <span><<</span> <span>1</span><span>)</span> <span>|</span> <span>1</span><span>;</span>
<span>}</span>
<span>inline</span> <span>T</span> <span>*</span><span>getPointer</span><span>()</span> <span>const</span> <span>{</span>
<span>assert</span><span>(</span><span>isPointer</span><span>());</span>
<span>return</span> <span>reinterpret_cast</span><span><</span><span>T</span> <span>*></span><span>(</span><span>asBits</span> <span>&</span> <span>pointerMask</span><span>);</span>
<span>}</span>
<span>inline</span> <span>int</span> <span>getTag</span><span>()</span> <span>const</span> <span>{</span>
<span>assert</span><span>(</span><span>isPointer</span><span>());</span>
<span>return</span> <span>(</span><span>asBits</span> <span>&</span> <span>tagMask</span><span>)</span> <span>>></span> <span>1</span><span>;</span>
<span>}</span>
<span>inline</span> <span>intptr_t</span> <span>getInt</span><span>()</span> <span>const</span> <span>{</span>
<span>assert</span><span>(</span><span>isInt</span><span>());</span>
<span>return</span> <span>asBits</span> <span>>></span> <span>1</span><span>;</span>
<span>}</span>
<span>inline</span> <span>bool</span> <span>isPointer</span><span>()</span> <span>const</span> <span>{</span>
<span>return</span> <span>(</span><span>asBits</span> <span>&</span> <span>1</span><span>)</span> <span>==</span> <span>0</span><span>;</span>
<span>}</span>
<span>inline</span> <span>bool</span> <span>isInt</span><span>()</span> <span>const</span> <span>{</span>
<span>return</span> <span>(</span><span>asBits</span> <span>&</span> <span>1</span><span>)</span> <span>==</span> <span>1</span><span>;</span>
<span>}</span>
};
Usage example:
// either a pointer to a double (with a two bit tag) or a 63 bit integerdoublenumber=17.0;TaggedPointerOrInt<double,8>taggedPointerOrInt(&number,3);taggedPointerOrInt.isPointer();// == true;taggedPointerOrInt.getPointer();// == &numbertaggedPointerOrInt.getTag();// == 3
Now using the above technique we have a powerful way to optimize the original Value class. Integers
will be stored directly in the pointer; doubles, strings and objects will be stored as a pointer
with a tag identifying their type (e.g 0 = 0b00 = object, 1 = 0b01 = string, 2 = 0b10 = double).
Booleans could be stored similarly to ints (and fetched using a simple bool >> 3):
00000000|00000000|00000000|00000000|00000000|00000000|00000000|0000b110
actual bool value -^ ^- last bit 0 (as it isn't an int)
tag = 3 = 0b11 to identify that it's a bool -^^
So what have we gained overall? Quite a bit: The size of the value structure is back to 64 bits,
making it half as large. However, there is also a cost: Doubles now need to be stored as a pointer,
which also implies that they have to be allocated on the heap. As such, we have improved the situation
for most types, but introduced a large regression for doubles.
Wouldn’t it be nice if we could avoid that heap allocation for doubles as well?
Pointers in the NaN-Space
Recap: How do doubles look like?
Doubles in JS are following the IEEE 754 Standard for Floating-Point Arithmetic. In C++ doubles
aren’t specified to follow any standard, but we assume that they use IEEE 754, too (which,
practically speaking is a pretty good assumption. In general this whole article is nothing more than
a large set of evil assumptions that any self-respecting C++ developer would murder you for.)
As you probably know IEEE 754 doubles are internally represented using the following bit sequence:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
^- exponent ^- mantissa
^- sign bit
The resulting float is basically (-1)^s * m * 2^e (speaking oversimplifyingly).
What is much more important to us are some special values that doubles can have:
Positive zero (0.0):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
00000000|00000000|00000000|00000000|00000000|00000000|00000000|00000000
^- sign bit 0 (= +), all other bits also 0
Negative zero (-0.0):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
10000000|00000000|00000000|00000000|00000000|00000000|00000000|00000000
^- sign bit 1 (= -), all other 0
IEEE defines two zeros: +0 and -0. Both have exponent and mantissa zeroed out and are distinguished
by the value of the sign bit. This may seem pointless at first - zero is zero after all - but it
allows for some subtle distinctions. For example 1.0 / 0.0 is +INF whereas 1.0 / -0.0 is -INF. There
are also other operations where the sign of the zero makes a difference.
One interesting behavior of the zeros is that they are considered equal in comparison. I.e.
0.0 == -0.0. The only way to find out whether a number is negative zero is to check whether the
sign bit is set:
Positive infinity (+INF):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
01111111|11110000|00000000|00000000|00000000|00000000|00000000|00000000
^- exponent bits all 1 mantissa bits all 0 -^
^- sign bit 0 (= +)
Negative infinity (-INF):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
11111111|11110000|00000000|00000000|00000000|00000000|00000000|00000000
^- exponent bits all 1 mantissa bits all 0 -^
^- sign bit 1 (= -)
Infinity has all exponent bits set and a zero mantissa. Again there is +INF and -INF, distinguished
by the sign bit.
Signaling NaN:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
s1111111|11110ppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp
^- first mantissa bit 0 everything else is "payload" -^
^- exponent bits all 1 and mustn't be all-zero (as it
^- any sign bit would be INF then)
Quiet NaN:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
s1111111|11111ppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp
^- first mantissa bit 1 everything else is "payload" -^
^- exponent bits all 1
^- any sign bit
NaNs represent values that are Not a Number. E.g. 0.0/0.0 = NaN. NaNs also have interesting
comparison semantics, as any comparison with NaN will be false (including NaN == NaN).
The representations of NaNs also have all exponent bits set (like infinity) but have a non-zero
mantissa. The sign bit is irrelevant for NaNs.
There are two types of NaNs: Signaling NaNs (sNaN) and quiet NaNs (qNaN). They are distinguished by
the first bit after the exponent: If it is 1 then the NaN is quiet; if it is 0 it is signaling.
Signaling NaNs - in theory - should throw an invalid operation exception (EM_INVALID) when they are
operated upon, whereas quiet NaNs should just be left alone. Practically this doesn’t seem to be
used much, at least in MSVC this is disabled by default and needs to be enabled by compiler option +
__controlfp() call.
Where it starts to get interesting for us is that NaNs additionally encode a 51 bit “payload” in the
mantissa. This payload was originally designed to contain error information.
But deceitful as we are, we will misuse that NaN payload to stuff other things in it - like integers
and pointers.
64 bit is a lie
On 64 bit architectures pointers have a size of 64 bits, obviously. But think about that again: How
much is 64 bit actually? That’s 2^64 = 1.84467441 * 10^19 addressable memory bytes. That’s 16 EiB
(Exabytes). Or talking in more convenient terms that’s 17179869184 Gigabytes. I don’t know about you,
but I only have 16GB of memory. Some high-end server setups may have 4 TiB of memory, but even those
only need a 42 bit address space (2^42 = 4 TiB).
Unsurprisingly we aren’t the first to notice this: the x86-64 architecture utilizes only the lower
48 bits (which still allows 256 TiB) of a pointer. Additionally bits 63 through 48 must be copies
of bit 47. Pointers that follow this pattern are called canonical.
This leads to a strange looking address space (you can find a nicer version of this image on
Wikipedia):
The operating system normally assigns only pointers from the lower half to applications, reserving
the upper half for itself. As always there are outliers - e.g. Solaris assigns addresses from the
upper half under some circumstances (mmap). But we’ll just ignore that and assume that only the
lower half from 0x00000000.00000000 to 0x00007FFF.FFFFFFFF is used. (Again, all these evil
assumptions…)
At this point you should see what I am driving at: If pointers are really only 48 bits large they
fit perfectly into our 51 bit payload space.
Sample implementation
Here is an implementation stub for doing so (just implements doubles, ints and void pointers; a
real implementation would look similar, just with more types):
<span>// if the double can be losslessly stored as an int32 do so</span>
<span>// (int32 doesn't have -0, so check for that too)</span>
<span>if</span> <span>(</span><span>number</span> <span>==</span> <span>asInt32</span> <span>&&</span> <span>!</span><span>isNegativeZero</span><span>(</span><span>number</span><span>))</span> <span>{</span>
<span>*</span><span>this</span> <span>=</span> <span>Value</span><span>(</span><span>asInt32</span><span>);</span>
<span>return</span><span>;</span>
<span>}</span>
<span>asDouble</span> <span>=</span> <span>number</span><span>;</span>
<span>}</span>
<span>inline</span> <span>Value</span><span>(</span><span>const</span> <span>int32_t</span> <span>number</span><span>)</span> <span>{</span>
<span>asBits</span> <span>=</span> <span>number</span> <span>|</span> <span>Int32Tag</span><span>;</span>
<span>}</span>
<span>inline</span> <span>Value</span><span>(</span><span>void</span> <span>*</span><span>pointer</span><span>)</span> <span>{</span>
<span>// ensure that the pointer really is only 48 bit</span>
<span>assert</span><span>((</span><span>reinterpret_cast</span><span><</span><span>uint64_t</span><span>></span><span>(</span><span>pointer</span><span>)</span> <span>&</span> <span>PtrTag</span><span>)</span> <span>==</span> <span>0</span><span>);</span>
<span>asBits</span> <span>=</span> <span>reinterpret_cast</span><span><</span><span>uint64_t</span><span>></span><span>(</span><span>pointer</span><span>)</span> <span>|</span> <span>PtrTag</span><span>;</span>
<span>}</span>
<span>inline</span> <span>bool</span> <span>isDouble</span><span>()</span> <span>const</span> <span>{</span>
<span>return</span> <span>asBits</span> <span><</span> <span>MaxDouble</span><span>;</span>
<span>}</span>
<span>inline</span> <span>bool</span> <span>isInt32</span><span>()</span> <span>const</span> <span>{</span>
<span>return</span> <span>(</span><span>asBits</span> <span>&</span> <span>Int32Tag</span><span>)</span> <span>==</span> <span>Int32Tag</span><span>;</span>
<span>}</span>
<span>inline</span> <span>bool</span> <span>isPointer</span><span>()</span> <span>const</span> <span>{</span>
<span>return</span> <span>(</span><span>asBits</span> <span>&</span> <span>PtrTag</span><span>)</span> <span>==</span> <span>PtrTag</span><span>;</span>
<span>}</span>
<span>inline</span> <span>double</span> <span>getDouble</span><span>()</span> <span>const</span> <span>{</span>
<span>assert</span><span>(</span><span>isDouble</span><span>());</span>
<span>return</span> <span>asDouble</span><span>;</span>
<span>}</span>
<span>inline</span> <span>int32_t</span> <span>getInt32</span><span>()</span> <span>const</span> <span>{</span>
<span>assert</span><span>(</span><span>isInt32</span><span>());</span>
<span>return</span> <span>static_cast</span><span><</span><span>int32_t</span><span>></span><span>(</span><span>asBits</span> <span>&</span> <span>~</span><span>Int32Tag</span><span>);</span>
<span>}</span>
<span>inline</span> <span>void</span> <span>*</span><span>getPointer</span><span>()</span> <span>const</span> <span>{</span>
<span>assert</span><span>(</span><span>isPointer</span><span>());</span>
<span>return</span> <span>reinterpret_cast</span><span><</span><span>void</span> <span>*></span><span>(</span><span>asBits</span> <span>&</span> <span>~</span><span>PtrTag</span><span>);</span>
<span>}</span>
};
Again, the code should be easy enough to understand. As an aid, here are the three consts from the
top in binary:
Maximum double (qNaN with sign bit set without payload):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
0xfff8000000000000: 11111111|11111000|00000000|00000000|00000000|00000000|00000000|00000000
32 bit integer:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
0xfff9000000000000: 11111111|11111001|00000000|00000000|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii
^- integer value
Pointer:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
0xfffa000000000000: 11111111|11111010|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp
^- 48 bit pointer value
A few more considerations
The above code makes doubles accessible directly, without further operations and accesses pointers
using a bit mask (that’s what Mozilla does). An alternative implementation could do it the other way
around and make pointers accessible directly and doubles using an offset (i.e. you would add
0x0001000000000000 on inserting a double and subtract it when retrieving it). The latter is what
WebKit does.
On 32 bit machines one can access the lower 32 bits of the 64 bit value directly. So both pointers
and integers can be accessed directly in any case. On the other hand, whereas the tagged pointer
with embedded integer approach would need only a 32 bit value on 32 bit machines, the NaN approach
will need a 64 bit value in both 32 bit and 64 bit environments. So on 32 bit two words need to be
passed around, instead of just one (that’s why Mozilla calls this approach fatvals).
Still, considering the overall wins it is worth it: Both integers and doubles (and for that matter
also bools and all other “small” values) can be stored directly in the value, without accessing the
heap.
Links
</section></div></div></div><br>
via www.npopov.com https://www.npopov.com
February 13, 2025 at 10:17AM
The text was updated successfully, but these errors were encountered:
Pointer magic for efficient dynamic value representations
https://ift.tt/gDNKhzx
What always intrigued me were the various little tricks that JavaScript implementations use to be as fast as they are. One of those tricks is how JS implementations represent values to be more time and space efficient.
I think it’s worth the time to look at those neat little tricks for two reasons: Firstly, some of them are also applicable in other situations (though most of them admittedly are not). And secondly, because they remind you of how various data is represented at the low level (stuff you may have learned in your CS courses at some point - or maybe not).
So, let’s start!
The trivial approach: Tagged unions
JavaScript is a dynamically typed language and as such needs some kind of “value” data structure, which stores the current type and a value of that type. The trivial approach to this problem is to use a tagged union. This could look roughly like this:
As you can see the idea is very simple. You have a type tag, which specifies which type the value
currently has. And a union that’s used to store the various types. Examples to store integers and
objects would look like that:
The problem with this straightforward approach is that the size of the above object would be 16 bytes = 128 bits (on a 64 bit machine), even though the actual information content is much lower (any individual value can be stored in 64 bits or less). Especially if this value structure is stored in an array, the overhead quickly adds up. Even outside of arrays, simply passing around this structure will already take up two 64 bit registers, and so on.
So what we want is to make values smaller, ideally reducing them to just the size of the payload. Before we get to that for the general case, let’s first have a look at a technique called pointer tagging:
An interlude: Pointer tagging
Have a look at the above structure again. It consists of a union, which has a size of 8 bytes, and an integer which could fit into a single byte (a
char
). So theoretically the total size should be 9 bytes. But it’s not - it’s 16 bytes. This is because the compiler adds padding to the data structure in order to align it in memory.This is done for performance reasons: The CPU can access the memory only in word sized chunks. So if our data always starts at a word it can be fetched efficiently. If it were to start somewhere in the middle of a word the CPU would have to apply masks and shifts in order to get the data - which would be too slow.
A word on a 64 bit machine is 64 bit = 8 bytes (obviously ^^). So if all pointers to our object will be aligned to 8 bytes it means that they will be multiples of 8. A pointer thus can be 8, 16, 24, 109144, etc. But it can not be 7 or 13. Or speaking in binary: 0b1000 (=8), 0b10000 (=16), 0b11000 (=24), 0b11010101001011000 (=109144).
As you can see the lower three bits are always zero. So those three bits are basically free to use. We can use them to store a “tag”, which is an integer between 0 (0b000) and 7 (0b111).
Here is a sample implementation of how to do so:
The code is fairly straightforward I think. You could then use it like this:
The technique described above is useful in many contexts. Basically it’s applicable to any situation where you want to add a small bit of info to an aligned pointer. And this is also the situation we are in :)
But before we get to that let’s first have a look at another, very similar trick:
Storing integers in the pointer
JavaScript does not have an integer type per se, it only has IEEE 754 doubles. Still, people mainly work with small integral numbers (think of loops, array indexes, etc). Thus it makes sense to store those small integers in a real integer variable instead of a double, because many operations are much faster this way. That’s also the reason why in the
Value
class above I have a distinct int32 type and not just a double type.Especially for integers the above
Value
approach seems like overkill: The 16 byteValue
structure only stores 4 bytes of actual data, one byte of type information, and the remaining 11 bytes being padding.The solution obviously is to use something similar to the tagged pointers introduced above. You can say: “If the last bit of the pointer is 1 then it actually isn’t a pointer at all, but it’s an integer.”
Here is how a pointer would look like:
And this is how the integer would look like (note that as we need one bit for distinguishing the type the integer has only 63 bits):
To get the actual integer value we just need to shift off the 1 bit using
integer >> 1
.Here again a sample implementation:
Usage example:
Now using the above technique we have a powerful way to optimize the original Value class. Integers
will be stored directly in the pointer; doubles, strings and objects will be stored as a pointer
with a tag identifying their type (e.g 0 = 0b00 = object, 1 = 0b01 = string, 2 = 0b10 = double).
Booleans could be stored similarly to ints (and fetched using a simple
bool >> 3
):So what have we gained overall? Quite a bit: The size of the value structure is back to 64 bits, making it half as large. However, there is also a cost: Doubles now need to be stored as a pointer, which also implies that they have to be allocated on the heap. As such, we have improved the situation for most types, but introduced a large regression for doubles.
Wouldn’t it be nice if we could avoid that heap allocation for doubles as well?
Pointers in the NaN-Space
Recap: How do doubles look like?
Doubles in JS are following the IEEE 754 Standard for Floating-Point Arithmetic. In C++ doubles aren’t specified to follow any standard, but we assume that they use IEEE 754, too (which, practically speaking is a pretty good assumption. In general this whole article is nothing more than a large set of evil assumptions that any self-respecting C++ developer would murder you for.)
As you probably know IEEE 754 doubles are internally represented using the following bit sequence:
The resulting float is basically (-1)^s * m * 2^e (speaking oversimplifyingly).
What is much more important to us are some special values that doubles can have:
IEEE defines two zeros: +0 and -0. Both have exponent and mantissa zeroed out and are distinguished by the value of the sign bit. This may seem pointless at first - zero is zero after all - but it allows for some subtle distinctions. For example 1.0 / 0.0 is +INF whereas 1.0 / -0.0 is -INF. There are also other operations where the sign of the zero makes a difference.
One interesting behavior of the zeros is that they are considered equal in comparison. I.e. 0.0 == -0.0. The only way to find out whether a number is negative zero is to check whether the sign bit is set:
Infinity has all exponent bits set and a zero mantissa. Again there is +INF and -INF, distinguished by the sign bit.
NaNs represent values that are Not a Number. E.g. 0.0/0.0 = NaN. NaNs also have interesting comparison semantics, as any comparison with NaN will be false (including NaN == NaN).
The representations of NaNs also have all exponent bits set (like infinity) but have a non-zero mantissa. The sign bit is irrelevant for NaNs.
There are two types of NaNs: Signaling NaNs (sNaN) and quiet NaNs (qNaN). They are distinguished by the first bit after the exponent: If it is 1 then the NaN is quiet; if it is 0 it is signaling. Signaling NaNs - in theory - should throw an invalid operation exception (EM_INVALID) when they are operated upon, whereas quiet NaNs should just be left alone. Practically this doesn’t seem to be used much, at least in MSVC this is disabled by default and needs to be enabled by compiler option +
__controlfp()
call.Where it starts to get interesting for us is that NaNs additionally encode a 51 bit “payload” in the mantissa. This payload was originally designed to contain error information.
But deceitful as we are, we will misuse that NaN payload to stuff other things in it - like integers and pointers.
64 bit is a lie
On 64 bit architectures pointers have a size of 64 bits, obviously. But think about that again: How much is 64 bit actually? That’s 2^64 = 1.84467441 * 10^19 addressable memory bytes. That’s 16 EiB (Exabytes). Or talking in more convenient terms that’s 17179869184 Gigabytes. I don’t know about you, but I only have 16GB of memory. Some high-end server setups may have 4 TiB of memory, but even those only need a 42 bit address space (2^42 = 4 TiB).
Unsurprisingly we aren’t the first to notice this: the x86-64 architecture utilizes only the lower 48 bits (which still allows 256 TiB) of a pointer. Additionally bits 63 through 48 must be copies of bit 47. Pointers that follow this pattern are called canonical.
This leads to a strange looking address space (you can find a nicer version of this image on Wikipedia):
The operating system normally assigns only pointers from the lower half to applications, reserving the upper half for itself. As always there are outliers - e.g. Solaris assigns addresses from the upper half under some circumstances (mmap). But we’ll just ignore that and assume that only the lower half from 0x00000000.00000000 to 0x00007FFF.FFFFFFFF is used. (Again, all these evil assumptions…)
At this point you should see what I am driving at: If pointers are really only 48 bits large they fit perfectly into our 51 bit payload space.
Sample implementation
Here is an implementation stub for doing so (just implements doubles, ints and void pointers; a real implementation would look similar, just with more types):
Again, the code should be easy enough to understand. As an aid, here are the three consts from the
top in binary:
A few more considerations
The above code makes doubles accessible directly, without further operations and accesses pointers using a bit mask (that’s what Mozilla does). An alternative implementation could do it the other way around and make pointers accessible directly and doubles using an offset (i.e. you would add
0x0001000000000000
on inserting a double and subtract it when retrieving it). The latter is what WebKit does.On 32 bit machines one can access the lower 32 bits of the 64 bit value directly. So both pointers and integers can be accessed directly in any case. On the other hand, whereas the tagged pointer with embedded integer approach would need only a 32 bit value on 32 bit machines, the NaN approach will need a 64 bit value in both 32 bit and 64 bit environments. So on 32 bit two words need to be passed around, instead of just one (that’s why Mozilla calls this approach fatvals).
Still, considering the overall wins it is worth it: Both integers and doubles (and for that matter also bools and all other “small” values) can be stored directly in the value, without accessing the heap.
Links
via www.npopov.com https://www.npopov.com
February 13, 2025 at 10:17AM
The text was updated successfully, but these errors were encountered: