perf(rpc): io_uring integration & redesign #477

InKryption · 2025-01-03T16:42:44Z

get receive working
get send working
error handling:
- eliminate explicit TODO panics
- improve first_error error handling if practical

follow up work (not part of this pr): #517

And re-organize some methods based on that change

Do not exit for *any* errors that are specific to the related connection, simply free them and continue to the next CQE. Specifically in the case of `error.SubmissionQueueFull`, instead of immediately failing, we instead first try to flush the submission queue and then try again to submit; if it fails a second time, that means despite flushing the submission queue, it somehow still failed, so we panic, since this indicates something is *very* wrong. This also eliminates the `pending_cqes_buf`, since there is actually no situation in which `consumeOurCqe` returns an error, and we resume work afterwards - either we process all the received CQEs, or we hard exit - this was already essentially the case before, now it's more obvious. For the main submit, we now wait for at least 1 connection, but we also add a timeout SQE to make it terminate if we don't receive a connection or completion of another task for 1 second; this alleviates the busy loop that was running before.

Also slightly refactor error sets. Now instead of checking to see if we need to set a flag to re-queue the multishot accept, we just pass in the server context on init and queue it, which now makes sense since the context and workpool are separate.

Also add fix for rebase

* Move more specific functions to the only files they're used. * Move the `serve*` functions outside of `Context`, making them free functions which just accept the context and work pool. * Remove `acceptAndServeConnection`; originally this was required to be able to nicely structure the unit test, and used to be more integrated, however it no longer makes sense as a concept. * Inline `handleRequest` into the basic backend. * Make the `acceptHandled` function, moved into the basic backend, guarantee the specified `sync` behavior, and inline `have_accept4`. * Appropriately re-export the relevant parts of the server API. * Added top level doc comments.

And disable the rpc server test when it is enabled

codecov · 2025-02-10T19:03:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines	Coverage Δ
src/accountsdb/db.zig	`88.98% <ø> (ø)`
src/accountsdb/snapshots.zig	`88.84% <100.00%> (ø)`
src/ledger/blockstore.zig	`100.00% <ø> (ø)`
src/ledger/cleanup_service.zig	`84.04% <ø> (ø)`
src/ledger/database/hashmap.zig	`91.66% <ø> (ø)`
src/ledger/database/rocksdb.zig	`95.58% <ø> (ø)`
src/ledger/fuzz.zig	`0.00% <ø> (ø)`
src/rpc/server/server.zig	`100.00% <100.00%> (ø)`
src/utils/fmt.zig	`86.66% <ø> (ø)`
src/utils/io.zig	`96.66% <100.00%> (+4.66%)`	⬆️

0xNineteen

amazing work

* RPC server: io_uring upgrade Separates the server into two parts: the context, and the work pool; the context contains everything generally needed to run the server, the work pool contains a statically polymorphic implementation for a pool to dispatch the actual work to. In doing this, we also separate certain things out into a few different files. The RPC server context API has been modified slightly to reflect this, and the work pool directly exposed, for now. * Don't use file-as-struct * Run style script, respect line length limit * Improve accept failure handling & update TODOs * Handle potentially failing/cancelling of `accept_multishot` by re-queueing it, based on the `IORING_CQE_F_MORE` flag. * Revise/simplify the queueing logic for the `accept_multishot` SQE. * Resolve the EINTR TODO panics, returning a catch-all error value indicating it as a bad but non-critical error. * Update the `a: ?noreturn` `if (a) |*b|` TODO, adding that it's solved in 0.14; it should be resolved after we update to 0.14. * Unify EAGAIN panic message. * Add TODO to remove hacky-ish workaround * Use `self: Type` convention * A few minor fixups and improvements * Simplify test, make server socket nonblocking On MacOS, on basic WorkPool, this means we now need to manually set the accepted socket's flags to the right things, ie, blocking, as opposed to the server socket's nonblocking mode. Means we also have to handle EAGAIN a bit differently in the io_uring backend, but that's a fine tradeoff. * server.zig -> server/lib.zig * Segregate out the basic backend And re-organize some methods based on that change * Re-organize server module * Update LOGGER_SCOPE * Simplify & improve io_uring backend error handling * De-scope `accept_flags` * Simplify `can_use` for linux cross-compilation * (io_uring) Rework error handling, add timeout Do not exit for *any* errors that are specific to the related connection, simply free them and continue to the next CQE. Specifically in the case of `error.SubmissionQueueFull`, instead of immediately failing, we instead first try to flush the submission queue and then try again to submit; if it fails a second time, that means despite flushing the submission queue, it somehow still failed, so we panic, since this indicates something is *very* wrong. This also eliminates the `pending_cqes_buf`, since there is actually no situation in which `consumeOurCqe` returns an error, and we resume work afterwards - either we process all the received CQEs, or we hard exit - this was already essentially the case before, now it's more obvious. For the main submit, we now wait for at least 1 connection, but we also add a timeout SQE to make it terminate if we don't receive a connection or completion of another task for 1 second; this alleviates the busy loop that was running before. * (io_uring) Remove multishot_accept_submitted Also slightly refactor error sets. Now instead of checking to see if we need to set a flag to re-queue the multishot accept, we just pass in the server context on init and queue it, which now makes sense since the context and workpool are separate. * (io_uring) Simplify new entry creation Also add fix for rebase * Misc fixups * Re-organize alias/import * General restructure * Move more specific functions to the only files they're used. * Move the `serve*` functions outside of `Context`, making them free functions which just accept the context and work pool. * Remove `acceptAndServeConnection`; originally this was required to be able to nicely structure the unit test, and used to be more integrated, however it no longer makes sense as a concept. * Inline `handleRequest` into the basic backend. * Make the `acceptHandled` function, moved into the basic backend, guarantee the specified `sync` behavior, and inline `have_accept4`. * Appropriately re-export the relevant parts of the server API. * Added top level doc comments. * Re-oorganize loggers & scopes * Refactor `build_options` imports * Add `no_network_tests` build option And disable the rpc server test when it is enabled * Update circleci with `-Dno-network-tests`

InKryption self-assigned this Jan 3, 2025

0xNineteen changed the title ~~RPC server: io_uring upgrade WIP~~ perf(rpc): io_uring WIP Jan 3, 2025

InKryption force-pushed the ink/rpc-server-optimize branch 6 times, most recently from 4f8ccba to 6b56293 Compare January 14, 2025 20:36

InKryption force-pushed the ink/rpc-server-optimize branch 6 times, most recently from 9f42c0c to 22c4f6d Compare January 20, 2025 19:09

InKryption changed the title ~~perf(rpc): io_uring WIP~~ perf(rpc): io_uring integration & redesign Jan 20, 2025

InKryption force-pushed the ink/rpc-server-optimize branch from 72c0b7f to 945988d Compare January 20, 2025 19:25

InKryption marked this pull request as ready for review January 21, 2025 10:33

InKryption requested review from Sobeston and 0xNineteen January 21, 2025 10:33

InKryption force-pushed the ink/rpc-server-optimize branch 2 times, most recently from 922a7bb to 8fdab03 Compare January 23, 2025 05:24

0xNineteen requested a review from kprotty January 23, 2025 16:38

InKryption mentioned this pull request Jan 24, 2025

perf(rpc server): Improve crossplatform basic workpool performance #514

Open

InKryption force-pushed the ink/rpc-server-optimize branch 7 times, most recently from 6195797 to ecfe590 Compare January 24, 2025 21:37

InKryption added 16 commits February 10, 2025 19:12

server.zig -> server/lib.zig

6de1c00

Segregate out the basic backend

733cb36

And re-organize some methods based on that change

Re-organize server module

142214a

Update LOGGER_SCOPE

2e469d0

Simplify & improve io_uring backend error handling

871791d

De-scope accept_flags

b9c91d3

Simplify can_use for linux cross-compilation

b1bf268

(io_uring) Simplify new entry creation

f2850a1

Also add fix for rebase

Misc fixups

a62bd8c

Re-organize alias/import

b180b08

Re-oorganize loggers & scopes

2708dd9

Refactor build_options imports

ff55d3f

Add no_network_tests build option

1566938

And disable the rpc server test when it is enabled

InKryption force-pushed the ink/rpc-server-optimize branch from c70555e to ef620c8 Compare February 10, 2025 18:47

0xNineteen previously approved these changes Feb 10, 2025

View reviewed changes

Update circleci with -Dno-network-tests

fb5963e

InKryption dismissed 0xNineteen’s stale review via fb5963e February 10, 2025 18:53

InKryption force-pushed the ink/rpc-server-optimize branch from ef620c8 to fb5963e Compare February 10, 2025 18:53

InKryption requested a review from 0xNineteen February 10, 2025 19:05

0xNineteen approved these changes Feb 10, 2025

View reviewed changes

0xNineteen added this pull request to the merge queue Feb 10, 2025

Merged via the queue into main with commit 694aa99 Feb 10, 2025
16 checks passed

0xNineteen deleted the ink/rpc-server-optimize branch February 10, 2025 19:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(rpc): io_uring integration & redesign #477

perf(rpc): io_uring integration & redesign #477

InKryption commented Jan 3, 2025 •

edited

Loading

codecov bot commented Feb 10, 2025 •

edited

Loading

0xNineteen left a comment

perf(rpc): io_uring integration & redesign #477

perf(rpc): io_uring integration & redesign #477

Conversation

InKryption commented Jan 3, 2025 • edited Loading

codecov bot commented Feb 10, 2025 • edited Loading

Codecov Report

0xNineteen left a comment

Choose a reason for hiding this comment

InKryption commented Jan 3, 2025 •

edited

Loading

codecov bot commented Feb 10, 2025 •

edited

Loading