Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing test suite - all.lua #62

Open
Pavel-Durov opened this issue Aug 21, 2023 · 4 comments
Open

Failing test suite - all.lua #62

Pavel-Durov opened this issue Aug 21, 2023 · 4 comments
Assignees

Comments

@Pavel-Durov
Copy link
Contributor

Issue

Failing test suite all.lua.
all.lua file used as a test suite, it sets up the environment and executes multiple tests.

The all.lua execution failure can be tofo with all.lua itself or with the invoked induvisual tests.

Steps to reproduce

$ ../src/lua -e"_U=true" all.lua

        Starting Tests
random seeds: 1692615247, 27885112
current path:
****/usr/local/share/lua/5.4/?.lua;/usr/local/share/lua/5.4/?/init.lua;/usr/local/lib/lua/5.4/?.lua;/usr/local/lib/lua/5.4/?/init.lua;./?.lua;./?/init.lua****

    ---- total memory: 35.8K, max memory: 35.8K ----

time: 3e-05 (+3e-05)

***** FILE 'main.lua'*****
Segmentation fault (core dumped)

Notes

all.lua file overrides the native dofile function https://github.com/ykjit/yklua/blob/main/tests/all.lua#L143
Where each test file:

  1. loaded
  2. stringified to a binary representation
  3. loaded back from binary
  4. executed as a function

Based on the attached backtrace, it looks like we have an exception when freeing YK trace locations, probably caused by this dump/load process. Needs further investigation 🕵️

Backtrace

Program received signal SIGSEGV, Segmentation fault.
core::sync::atomic::AtomicUsize::fetch_sub (self=0xfffffffffffffff0) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/sync/atomic.rs:2540
2540    /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/sync/atomic.rs: No such file or directory.
(gdb) 
(gdb) 
(gdb) bt
#0  core::sync::atomic::AtomicUsize::fetch_sub (self=0xfffffffffffffff0)
    at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/sync/atomic.rs:2540
#1  alloc::sync::{impl#27}::drop<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>> (self=0x7fffffffcc20)
    at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/alloc/src/sync.rs:1864
#2  0x00007ffff7b062eb in core::ptr::drop_in_place<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>>> ()
    at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/ptr/mod.rs:497
#3  0x00007ffff7b2eabe in core::mem::drop<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>>> (_x=...)
    at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/mem/mod.rs:987
#4  0x00007ffff7b0a4c0 in ykrt::location::{impl#1}::drop (self=0x7fffffffcc78) at ykrt/src/location.rs:196
#5  0x00007ffff7b04d3b in core::ptr::drop_in_place<ykrt::location::Location> ()
    at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/ptr/mod.rs:497
#6  0x00007ffff7b04fdd in core::mem::drop<ykrt::location::Location> (_x=...) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/mem/mod.rs:987
#7  0x00007ffff7b04cdd in ykcapi::yk_location_drop (loc=...) at ykcapi/src/lib.rs:87
#8  0x000000000088aede in free_loc (f=<optimised out>, i=<optimised out>, idx=<optimised out>) at lyk.c:66
#9  0x000000000088b075 in yk_free_locactions (f=0x926290) at lyk.c:76
#10 0x0000000000807c08 in luaF_freeproto (L=0x916e38, f=0x926290) at lfunc.c:276
#11 0x000000000080a4ae in freeobj (L=0x916e38, o=0x926290) at lgc.c:767
#12 0x0000000000815358 in deletelist (L=0x916e38, p=0x926290, limit=<optimised out>) at lgc.c:1494
#13 0x0000000000815164 in luaC_freeallobjects (L=0x916e38) at lgc.c:1511
#14 0x000000000084fd4d in close_state (L=0x916e38) at lstate.c:276
#15 0x00000000008503ce in lua_close (L=0x916e38) at lstate.c:421
#16 0x00000000007d9aa0 in main (argc=<optimised out>, argv=<optimised out>) at lua.c:663

Versions

YKLua - main/0a80b8adafdba03d111dcc5e032ce3d6e3c913f7
YK - master/a690e6eaf7253a4fa9c04bb54406c07b7efc0085

@Pavel-Durov
Copy link
Contributor Author

Progress update:
yk_location_drop is called multiple times, sometimes with YK location having Location { inner: 0 } value.

$1 = ykrt::location::Location {inner: core::sync::atomic::AtomicUsize {v: core::cell::UnsafeCell<usize> {value: 117585}}}
....
$2 = ykrt::location::Location {inner: core::sync::atomic::AtomicUsize {v: core::cell::UnsafeCell<usize> {value: 0}}}}}

backtrace:

Program received signal SIGSEGV, Segmentation fault.
core::sync::atomic::AtomicUsize::fetch_sub (self=0xfffffffffffffff0) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/sync/atomic.rs:2540
2540    /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/sync/atomic.rs: No such file or directory.
#0  core::sync::atomic::AtomicUsize::fetch_sub (self=0xfffffffffffffff0) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/sync/atomic.rs:2540
#1  alloc::sync::{impl#27}::drop<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>> (self=0x7fffffffce60) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/alloc/src/sync.rs:1864
#2  0x00007ffff7b062eb in core::ptr::drop_in_place<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>>> () at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/ptr/mod.rs:497
#3  0x00007ffff7b2ebce in core::mem::drop<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>>> (_x=...) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/mem/mod.rs:987
#4  0x00007ffff7b0a53b in ykrt::location::{impl#1}::drop (self=0x7fffffffcf18) at ykrt/src/location.rs:197
#5  0x00007ffff7b04d3b in core::ptr::drop_in_place<ykrt::location::Location> () at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/ptr/mod.rs:497
#6  0x00007ffff7b04fdd in core::mem::drop<ykrt::location::Location> (_x=...) at /rustc/8c74a5d27c644a0f7a22bb2fa8dd3ff8257bc220/library/core/src/mem/mod.rs:987
#7  0x00007ffff7b04cdd in ykcapi::yk_location_drop (loc=...) at ykcapi/src/lib.rs:87
#8  0x000000000088d8ed in yk_free_proto (f=<optimised out>) at lyk.c:99
#9  0x0000000000809f28 in luaF_freeproto (L=0x919e38, f=0x8f9410) at lfunc.c:276
#10 0x000000000080c7ce in freeobj (L=0x919e38, o=0x8f9410) at lgc.c:767
#11 0x0000000000817678 in deletelist (L=0x919e38, p=0x8f9410, limit=<optimised out>) at lgc.c:1494
#12 0x0000000000817484 in luaC_freeallobjects (L=0x919e38) at lgc.c:1511
#13 0x000000000085206d in close_state (L=0x919e38) at lstate.c:276
#14 0x00000000008526ee in lua_close (L=0x919e38) at lstate.c:421
#15 0x00000000007dbd40 in main (argc=<optimised out>, argv=<optimised out>) at lua.c:663

Need to see how these locations are set 🕵️‍♀️

bors bot added a commit that referenced this issue Aug 25, 2023
64: Track initialised yk locations r=ltratt a=Pavel-Durov

Partially fixes (only for serial compilation)  - #62

# Changes

- Remove reallocarray in favour of calloc
- Change YkLocations to be stored as pointers
- Change lyk interface to reflect its "hooks" functionality
- Move tests to a separate test script and enable serialised all.lua test suite
- Added logging in lyk module for ease of debugging
- Updated readme

`test.sh` was tested for resiliency multiple times:
```shell
$ try_repeat -v 100 sh ./test.sh
``` 
# Issues

## Uninitialised locations memory 
`reallocarray` does not initialise memory with default zero bytes locations as `calloc`. 
Moving to `calloc` and using pointers allowed to check for NULL with default values set at initialisation time.

## Tests

When lua tests are executed one by one as single files it produces different results from when it's executed via `all.lua` test suite.

Example:

```shell

$ YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" ./main.lua 
...
../src/lua: ./main.lua:343: assertion failed!
stack traceback:
        [C]: in function 'assert'
        ./main.lua:343: in main chunk
        [C]: in ?
```
When run through `all.lua` it passes:
```shell
$ YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" ./all.lua 
...
***** FILE 'main.lua'*****
...
***** FILE 'gc.lua'*****
...
```
Since `all.lua` is what we base the stability of yklua, I think we should include it in the tests as is, currently only serialised compilation works.

Running `all.lua` prior to these changes results in error (`main/56c5787799b876f36babbae24e9afc025806b831`):
```
$ YKD_SERIALISE_COMPILATION=1 gdb -ex 'r' -ex 'bt' --args ../src/lua -e"_U=true" ./all.lua 
...
***** FILE 'main.lua'*****

Program received signal SIGSEGV, Segmentation fault.
core::sync::atomic::AtomicUsize::fetch_sub (self=<optimised out>) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs:2575
2575    /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs: No such file or directory.
#0  core::sync::atomic::AtomicUsize::fetch_sub (self=<optimised out>) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/sync/atomic.rs:2575
#1  alloc::sync::{impl#33}::drop<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global> (self=0x7fffffffb410)
    at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/alloc/src/sync.rs:2370
#2  0x00007ffff7b0d4ab in core::ptr::drop_in_place<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global>> () at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/ptr/mod.rs:497
#3  0x00007ffff7b0067e in core::mem::drop<alloc::sync::Arc<lock_api::mutex::Mutex<parking_lot::raw_mutex::RawMutex, ykrt::location::HotLocation>, alloc::alloc::Global>> (_x=...) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/mem/mod.rs:987
#4  0x00007ffff7b121f0 in ykrt::location::{impl#1}::drop (self=0x7fffffffb468) at ykrt/src/location.rs:197
#5  0x00007ffff7affa9b in core::ptr::drop_in_place<ykrt::location::Location> () at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/ptr/mod.rs:497
#6  0x00007ffff7aff6bd in core::mem::drop<ykrt::location::Location> (_x=...) at /rustc/ef85656a10657ba5e4f7fe2931a4ca6293138d51/library/core/src/mem/mod.rs:987
#7  0x00007ffff7b004bd in ykcapi::yk_location_drop (loc=...) at ykcapi/src/lib.rs:90
#8  0x000000000088aa0e in free_loc (f=<optimised out>, i=<optimised out>, idx=<optimised out>) at lyk.c:66
#9  0x000000000088aba5 in yk_free_locactions (f=0x928cc0) at lyk.c:76
--Type <RET> for more, q to quit, c to continue without paging--
#10 0x0000000000807738 in luaF_freeproto (L=0x916e38, f=0x928cc0) at lfunc.c:276
#11 0x0000000000809fde in freeobj (L=0x916e38, o=0x928cc0) at lgc.c:767
#12 0x0000000000817145 in sweepgen (L=0x916e38, g=<optimised out>, p=<optimised out>, limit=<optimised out>, pfirstold1=<optimised out>) at lgc.c:1106
#13 0x0000000000816713 in youngcollection (L=0x916e38, g=0x916f00) at lgc.c:1239
#14 0x000000000081557d in genstep (L=0x916e38, g=0x916f00) at lgc.c:1434
#15 0x0000000000815044 in luaC_step (L=0x916e38) at lgc.c:1686
#16 0x00000000007e4447 in lua_pushstring (L=0x916e38, s=<optimised out>) at lapi.c:553
#17 0x000000000089e60c in findloader (L=0x916e38, name=0x928df8 "tracegc") at loadlib.c:641
#18 0x000000000089de0a in ll_require (L=0x916e38) at loadlib.c:666
#19 0x00000000007fda75 in precallC (L=0x916e38, func=<optimised out>, nresults=<optimised out>, f=0x89db80 <ll_require>) at ldo.c:506
#20 0x00000000007fe016 in luaD_precall (L=0x916e38, func=0x923780, nresults=1) at ldo.c:569
#21 0x0000000000883f68 in luaV_execute (L=0x916e38, ci=<optimised out>) at lvm.c:1655
#22 0x00000000007feb3b in ccall (L=0x916e38, func=<optimised out>, nResults=<optimised out>, inc=<optimised out>) at ldo.c:609
#23 0x00000000007fec61 in luaD_callnoyield (L=0x916e38, func=0x917730, nResults=-1) at ldo.c:627
#24 0x00000000007eaf03 in f_call (L=0x916e38, ud=<optimised out>) at lapi.c:1041
#25 0x00000000007f8be7 in luaD_rawrunprotected (L=0x916e38, f=0x7eae40 <f_call>, ud=0x7ffff2487308) at ldo.c:144
#26 0x0000000000801406 in luaD_pcall (L=0x916e38, func=0x7eae40 <f_call>, u=0x7ffff2487308, old_top=<optimised out>, ef=<optimised out>) at ldo.c:926
#27 0x00000000007ea9ec in lua_pcallk (L=0x916e38, nargs=<optimised out>, nresults=<optimised out>, errfunc=<optimised out>, ctx=<optimised out>, k=<optimised out>)
    at lapi.c:1067
#28 0x00000000007dc623 in docall (L=0x916e38, narg=0, nres=-1) at lua.c:160
#29 0x00000000007dbd24 in handle_script (L=0x916e38, argv=<optimised out>) at lua.c:255
#30 0x00000000007d9fe3 in pmain (L=0x916e38) at lua.c:634
#31 0x00000000007fda75 in precallC (L=0x916e38, func=<optimised out>, nresults=<optimised out>, f=0x7d97f0 <pmain>) at ldo.c:506
#32 0x00000000007fe0c8 in luaD_precall (L=0x916e38, func=0x9176f0, nresults=1) at ldo.c:572
#33 0x00000000007fea7f in ccall (L=0x916e38, func=0x9176f0, nResults=1, inc=<optimised out>) at ldo.c:607
#34 0x00000000007fec61 in luaD_callnoyield (L=0x916e38, func=0x9176f0, nResults=1) at ldo.c:627
#35 0x00000000007eaf03 in f_call (L=0x916e38, ud=<optimised out>) at lapi.c:1041
#36 0x00000000007f8be7 in luaD_rawrunprotected (L=0x916e38, f=0x7eae40 <f_call>, ud=0x7ffff2487058) at ldo.c:144
#37 0x0000000000801406 in luaD_pcall (L=0x916e38, func=0x7eae40 <f_call>, u=0x7ffff2487058, old_top=<optimised out>, ef=<optimised out>) at ldo.c:926
#38 0x00000000007ea9ec in lua_pcallk (L=0x916e38, nargs=<optimised out>, nresults=<optimised out>, errfunc=<optimised out>, ctx=<optimised out>, k=<optimised out>)
    at lapi.c:1067
#39 0x00000000007d94b0 in main (argc=<optimised out>, argv=<optimised out>) at lua.c:660
```

That's cause we're freeing uninitialised yk locations.


Co-authored-by: Pavel Durov <[email protected]>
@Pavel-Durov
Copy link
Contributor Author

Lua tests behave differently when executed through all.lua and individually.

Direct execution of math.lua error:

YKD_PRINT_JITSTATE=1 YKD_SERIALISE_COMPILATION=1 /home/pd/yklua-fork/src/lua -e"_U=true" ./math.lua 
instrinsic is missing `yk.intrinsic.inlined` metadata:   %13 = call double @llvm.floor.f64(double %0), !dbg !9453

Execution ofmath.lua via all.lua:

YKD_PRINT_JITSTATE=1 YKD_SERIALISE_COMPILATION=1 /home/pd/yklua-fork/src/lua -e"_U=true" ./all.lua 
...
final OK !!!

We know that math.lua test should fail, but it doesn't 🤔.

One assumption was that all.lua executed sub-tests with the non-yk lua interpreter.
But then I added the lua script to print process information:

local function print_env()
    print(string.format("YKD_SERIALISE_COMPILATION=%s", os.getenv("YKD_SERIALISE_COMPILATION") or "UNDEFINED"))
    print(string.format("YKD_PRINT_JITSTATE=%s", os.getenv("YKD_PRINT_JITSTATE") or "UNDEFINED"))
end

local function exec_cmd(cmd)
    local openPop = assert(io.popen(cmd, 'r'))
    local output = openPop:read('*all')
    openPop:close()
    return output
end

local function print_proc_info()
    print(exec_cmd("ps -Flww -p $(pgrep -x lua)"))
    print(exec_cmd("pstree -p $(pgrep -x lua)"))
end

function proc_print_info(file_name)
    print(string.format("> %s proc info.", file_name))
    print_env()
    print_proc_info()
end

all.lua tests:

YKD_PRINT_JITSTATE=1 YKD_SERIALISE_COMPILATION=1 /home/pd/yklua-fork/src/lua -e"_U=true" ./all.lua
...
> all.lua proc info.
YKD_SERIALISE_COMPILATION=1
YKD_PRINT_JITSTATE=1
F S UID          PID    PPID  C PRI  NI ADDR SZ WCHAN    RSS PSR STIME TTY          TIME CMD
0 S pd       2049911 2049788 25  80   0 - 24603 pipe_r 40016   7 13:15 pts/19   00:00:00 /home/pd/yklua-fork/src/lua -e_U=true ./all.lua

lua(2049911)---sh(2049915)---pstree(2049917)
...
***** FILE 'math.lua'*****
> math.lua proc info.
YKD_SERIALISE_COMPILATION=1
YKD_PRINT_JITSTATE=1
F S UID          PID    PPID  C PRI  NI ADDR SZ WCHAN    RSS PSR STIME TTY          TIME CMD
0 S pd       2049911 2049788 60  80   0 - 24666 pipe_r 40596   7 13:15 pts/19   00:00:00 /home/pd/yklua-fork/src/lua -e_U=true ./all.lua

lua(2049911)---sh(2049921)---pstree(2049923)
...
final OK !!!

math.lua tests:

[13:14:20] $ YKD_PRINT_JITSTATE=1 YKD_SERIALISE_COMPILATION=1 /home/pd/yklua-fork/src/lua -e"_U=true" ./math.lua 
> math.lua proc info.
YKD_SERIALISE_COMPILATION=1
YKD_PRINT_JITSTATE=1
F S UID          PID    PPID  C PRI  NI ADDR SZ WCHAN    RSS PSR STIME TTY          TIME CMD
0 S pd       2049733 2036361 81  80   0 - 24658 pipe_r 40448   0 13:14 pts/26   00:00:00 /home/pd/yklua-fork/src/lua -e_U=true ./math.lua

lua(2049733)---sh(2049737)---pstree(2049739)
...
instrinsic is missing `yk.intrinsic.inlined` metadata:   %13 = call double @llvm.floor.f64(double %0), !dbg !9453 x pd@bencher8 ~/yklua-fork/tests sort-test-serialised/4bdeede7db8b8a3b6eb10e94e0a9d654c54a3212

Based on the output, I can conclude that in both cases yk-enabled lua interpreter is used.

The mystery continues 🕵️

@ltratt
Copy link
Contributor

ltratt commented Sep 17, 2023

@ptersilie Perhaps when you're done on the depot speed up, you could have a look at this, to see if we can find any clues to help @Pavel-Durov ?

@Pavel-Durov
Copy link
Contributor Author

After looking at the issue with @ptersilie
Looks like the problem is with the yk_lookup for dynamically loaded functions (load calls in all.lua).
When functions are loaded dynamically, the locations are set but later on, the lookup fails to get the locations by the PC index.

bors bot added a commit that referenced this issue Sep 18, 2023
68: Fix location set on proto loading. r=ptersilie a=Pavel-Durov

Related: #62

+ Identifying loop instructions using proto code instead of loop index.

Co-authored-by: Pavel Durov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants