playing with rpython: build via `rpython -O jit rptoy.py`
this is an attempt to (a) learn how to structure an rpython bytecode interpreter, and (b) check that we can get the JIT to optimise in the cases we hope it will
very promising! `gcd` shows that multiway loop nests are JIT'ed as well as plain loops; that `cfac` is JIT'ed to similar performance to `fac` shows that the JIT successfully elides the gratuitous closure call that occurs in the former and not in the latter. `csum` and `sum` repeat the comparison with a much cheaper op in the closure, and the JIT still performs admirably. `hfac` and `hsum` are jitted equivalently to their non-inner-function variants, and `hgcd` comes close. Finally, `tare` (4000x the null program) shows that we don't have an onerous startup overhead.
test | w/ JIT | no JIT |
---|---|---|
tare | 0m0.008s | 0m0.008s |
fac | 0m0.886s | 0m1.316s |
cfac | 0m0.883s | 0m1.641s |
hfac | 0m0.870s | 0m1.382s |
gcd | 0m0.038s | 0m0.670s |
hgcd | 0m0.042s | 0m0.740s |
sum | 0m0.035s | 0m1.085s |
csum | 0m0.034s | 0m1.884s |
hsum | 0m0.035s | 0m1.229s |
odd | 0m0.032s | 0m0.650s |
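
To make the comparison easier to read, the timings can be turned into speedup ratios. The snippet below is only a convenience sketch: the numbers are copied from the table, and the helper names (`seconds`, `speedups`) are made up for illustration.

```python
# Timings copied from the table: (with JIT, without JIT), in `time`-style format.
TIMINGS = {
    "tare": ("0m0.008s", "0m0.008s"),
    "fac":  ("0m0.886s", "0m1.316s"),
    "cfac": ("0m0.883s", "0m1.641s"),
    "hfac": ("0m0.870s", "0m1.382s"),
    "gcd":  ("0m0.038s", "0m0.670s"),
    "hgcd": ("0m0.042s", "0m0.740s"),
    "sum":  ("0m0.035s", "0m1.085s"),
    "csum": ("0m0.034s", "0m1.884s"),
    "hsum": ("0m0.035s", "0m1.229s"),
    "odd":  ("0m0.032s", "0m0.650s"),
}


def seconds(stamp):
    """Convert a string such as '0m1.316s' to seconds as a float."""
    minutes, rest = stamp.split("m")
    return 60 * int(minutes) + float(rest.rstrip("s"))


def speedups(timings):
    """Map each test name to (no-JIT time) / (JIT time)."""
    return {name: seconds(nojit) / seconds(jit)
            for name, (jit, nojit) in timings.items()}


if __name__ == "__main__":
    # Print the tests from smallest to largest speedup.
    for name, ratio in sorted(speedups(TIMINGS).items(), key=lambda kv: kv[1]):
        print("%-5s %5.1fx" % (name, ratio))
```

The ratios for the factorial tests stay small, while the tests built on cheap operations (`sum`, `gcd`, `odd`) gain far more from the JIT.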
- `fac`
  - Tail-recursive factorial with an accumulator
- `cfac`
  - Same as `fac` but with a gratuitous closure call enclosing the multiplication
- `hfac`
  - Factorial again, but with a recursive inner helper
- `gcd`
  - Dijkstra's symmetric two-branched subtractive gcd; tests bridge generation
- `hgcd`
  - GCD with a recursive inner helper
- `sum`
  - Like `fac` but adds instead of multiplies to increase the relative cost of control flow
- `csum`
  - Like `cfac`, it wraps the addition in a gratuitous closure call
- `hsum`
  - Like `hfac`, but for `sum`
- `odd`
  - Parity check via mutually recursive inner helpers
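
For orientation, here is roughly what a few of these tests compute, written as plain Python. This is a sketch of the algorithms as described in the list, not the hand-assembled bytecode the tests actually use; the function bodies and argument conventions are assumptions. The `sum` variants follow the same shapes as the `fac` variants with `+` in place of `*`.

```python
def fac(n, acc=1):
    """Tail-recursive factorial with an accumulator."""
    if n <= 1:
        return acc
    return fac(n - 1, acc * n)


def cfac(n, acc=1):
    """Like fac, but the multiplication hides behind a needless closure."""
    if n <= 1:
        return acc

    def multiply():  # closes over n and acc; exists only to be called once
        return acc * n

    return cfac(n - 1, multiply())


def hfac(n):
    """Factorial where the recursion lives in an inner helper."""
    def go(k, acc):
        if k <= 1:
            return acc
        return go(k - 1, acc * k)

    return go(n, 1)


def gcd(a, b):
    """Dijkstra's subtractive gcd for positive integers: two symmetric branches."""
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


def odd(n):
    """Parity of a non-negative integer via mutually recursive inner helpers."""
    def is_even(k):
        return True if k == 0 else is_odd(k - 1)

    def is_odd(k):
        return False if k == 0 else is_even(k - 1)

    return is_odd(n)
```

The two branches in `gcd` are what make it a test of bridge generation: whichever branch the first trace does not follow has to be compiled separately and attached later.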
- `rpython` does a much better job with class-structured internals
- to make things easy on the typer, use `assert` and bind directly to variables
- `can_enter_jit` and `jit_merge_point` must not have any ops in between
- the "virtualizables" argument is plural but can only take a single argument
- interpreter locals that aren't live between calls don't need to be listed in reds
- interpreter locals that aren't live between calls don't need to be listed in reds
- machine int/bigint shimmering is very effective
- virtualisation is very sensitive to other changes, but when it works, it's amazing
- "unpredictable loops" have a longer jit warmup time
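
A minimal sketch of how the hint placement in these notes might look in a dispatch loop. It is not the code in `rptoy.py`: the opcodes, the flat `[op, arg, op, arg, ...]` program layout, and the `run` function are invented for illustration, and the fallback `JitDriver` class is a no-op stand-in so the sketch also runs on plain Python without an RPython checkout.

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        """No-op stand-in (an assumption of this sketch) for plain Python."""
        def __init__(self, greens=None, reds=None, virtualizables=None):
            self.greens = greens or []
            self.reds = reds or []

        def jit_merge_point(self, **livevars):
            pass

        def can_enter_jit(self, **livevars):
            pass

# Hypothetical opcodes for a tiny counter machine.
OP_ADD, OP_DEC, OP_JUMP_IF_POS, OP_HALT = range(4)

# Greens identify the position in the user program; reds are the remaining
# state that is live across the merge point.
driver = JitDriver(greens=['pc', 'code'], reds=['counter', 'acc'])


def run(code, counter):
    pc = 0
    acc = 0
    while True:
        driver.jit_merge_point(pc=pc, code=code, counter=counter, acc=acc)
        # op and arg are recomputed every iteration, so they are not live
        # across the merge point and are not listed in reds.
        op = code[pc]
        arg = code[pc + 1]
        if op == OP_ADD:
            acc += counter
            pc += 2
        elif op == OP_DEC:
            counter -= 1
            pc += 2
        elif op == OP_JUMP_IF_POS:
            if counter > 0:
                pc = arg
                # Backward jump: nothing else runs between this hint and the
                # jit_merge_point at the top of the loop.
                driver.can_enter_jit(pc=pc, code=code,
                                     counter=counter, acc=acc)
            else:
                pc += 2
        elif op == OP_HALT:
            return acc
        else:
            raise ValueError("unknown opcode")


# acc += counter; counter -= 1; loop while counter > 0; halt
PROGRAM = [OP_ADD, 0, OP_DEC, 0, OP_JUMP_IF_POS, 0, OP_HALT, 0]
```

With this layout, `run(PROGRAM, 4)` adds 4 + 3 + 2 + 1. Every green and red is passed by keyword to both hints, and the only path from `can_enter_jit` back to `jit_merge_point` is the loop edge itself.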
- continue testing/integrating CF Bolz-Tereick's other recommendations
- try out floats
- maybe move beyond hand-assembled tests?