feat: context cache #70

Open · wants to merge 8 commits into master

Conversation

@jeswr (Contributor) commented Oct 27, 2023

Creates a context caching mechanism. The use case for this is parsing numerous JSON-LD objects that all share the same context object, where context parsing is often the main bottleneck; see the performance results below:

Parse a context that has not been cached; and without caching in place x 108 ops/sec ±0.32% (86 runs sampled)
Parse a list of iri contexts that have been cached x 79,985 ops/sec ±0.44% (90 runs sampled)
Parse a context object that has not been cached x 1,950 ops/sec ±1.36% (90 runs sampled)
Parse a context object that has been cached x 7,637 ops/sec ±0.20% (91 runs sampled)
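For illustration, a minimal sketch of what a hash-keyed context cache along these lines could look like (a sketch only, not this PR's actual code: `CachedContextParser` is a hypothetical name, and it assumes the lru-cache package with its v10 named export and a plain JSON.stringify key):

```ts
import { LRUCache } from 'lru-cache';
import { ContextParser, JsonLdContextNormalized } from 'jsonld-context-parser';

// Hypothetical wrapper: caches fully parsed contexts, keyed by a hash of the raw context.
class CachedContextParser {
  private readonly parser = new ContextParser();
  // Bounded LRU cache so old contexts are evicted instead of accumulating forever.
  private readonly cache = new LRUCache<string, JsonLdContextNormalized>({ max: 100 });

  public async parse(context: any): Promise<JsonLdContextNormalized> {
    // JSON.stringify is a cheap but non-canonical key: differing key order causes misses.
    const key = typeof context === 'string' ? context : JSON.stringify(context);
    const cached = this.cache.get(key);
    if (cached) {
      return cached;
    }
    const parsed = await this.parser.parse(context);
    this.cache.set(key, parsed);
    return parsed;
  }
}
```

The hashing step in such a scheme is also where the overhead discussed below comes from: every parse call pays for serializing the context, even on a cache hit.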

@coveralls commented Oct 27, 2023

Pull Request Test Coverage Report for Build 6666516387

  • 53 of 53 (100.0%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 99.904%

Totals (Coverage Status):
  • Change from base Build 6492290900: -0.1%
  • Covered Lines: 570
  • Relevant Lines: 570

💛 - Coveralls

@rubensworks (Owner)

This looks very neat, thanks @jeswr!

Before I do an in-depth review, I have some general questions.

  • What is the overall performance impact of this when caching is not very valuable? Might be useful to run the parser perf checks again with and without this cache. I suspect hashing to have some overhead.
  • Do you have insights on memory consumption? My fear is that for parsers that are reused, many (potentially large) contexts can be cached. I know there's the LRU cache, but I wonder if we should consider disabling the cache by default (not sure tbh).
  • Do all JSON-LD spec tests still pass with this change?

Pinging @sdevalk, as this will interest him regarding rubensworks/rdf-dereference.js#48

@jeswr (Contributor, Author) commented Oct 27, 2023

> What is the overall performance impact of this when caching is not very valuable? Might be useful to run the parser perf checks again with and without this cache. I suspect hashing to have some overhead.

In the worst case it seems to be an extra 50% overhead. It can be worked around by trying to work by-reference rather than by hash most of the time (see https://github.com/inrupt/solid-client-vc-js/blob/0f8ce276b6ea8a977b9d2ea189bc92385ef44b48/src/parser/jsonld.ts#L70-L111); however, the danger here is that you accidentally consume a large amount of memory by using contexts as keys, so I'm leaving this for a subsequent piece of work once better benchmarking is in place and we have a good way of pruning them quickly.
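Roughly, a by-reference lookup of that kind could be sketched as below (a hypothetical helper, not the linked code; it assumes callers reuse the exact same context object between parses). The memory danger mentioned above comes from the Map holding strong references to every context ever used as a key:

```ts
import { ContextParser, JsonLdContextNormalized } from 'jsonld-context-parser';

const parser = new ContextParser();
// Keyed on object identity: a hit only happens when the caller passes the *same* object again.
// A plain Map keeps strong references, so every context used as a key stays in memory.
const byReference = new Map<any, Promise<JsonLdContextNormalized>>();

function parseByReference(context: any): Promise<JsonLdContextNormalized> {
  let parsed = byReference.get(context);
  if (!parsed) {
    parsed = parser.parse(context);
    byReference.set(context, parsed);
  }
  return parsed;
}
```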

FYI the custom caching mechanism in https://github.com/inrupt/solid-client-vc-js/blob/0f8ce276b6ea8a977b9d2ea189bc92385ef44b48/src/parser/jsonld.ts reduced the time on e2e tests from 20 minutes to 20 seconds.

> Do you have insights on memory consumption? My fear is that for parsers that are reused, many (potentially large) contexts can be cached. I know there's the LRU cache, but I wonder if we should consider disabling the cache by default (not sure tbh).

I've disabled caching by default in b48cc07, so we will need to update jsonld-streaming-parser to allow custom contexts / caches to be passed in.
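Downstream usage might then look something like the following; this is purely illustrative, and the `cache` option name is hypothetical, since the actual option introduced by this PR isn't shown in this thread:

```ts
import { LRUCache } from 'lru-cache';
import { ContextParser } from 'jsonld-context-parser';

// Illustrative only: with caching off by default, a consumer would opt in explicitly,
// e.g. by handing the parser a bounded cache instance (option name is hypothetical).
const parser = new ContextParser({
  cache: new LRUCache<string, any>({ max: 50 }),
} as any);
```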

> Do all JSON-LD spec tests still pass with this change?

I'm not seeing rdf-test-suite as a dev dependency of this library or any commands to run spec testing. Do you have a script somewhere to run spec tests on this lib?

@rubensworks (Owner)

> danger here is that you accidentally consume a large amount of memory by using contexts as keys, so I'm leaving this for a subsequent piece of work once better benchmarking is in place and we have a good way of pruning them quickly.

I suspect that caching by reference would consume less memory than hashing, as the cache would only hold pointers to shared memory.

> reduced the time on e2e tests from 20 minutes to 20 seconds.

Ooh, nice!

> I'm not seeing rdf-test-suite as a dev dependency of this library or any commands to run spec testing. Do you have a script somewhere to run spec tests on this lib?

This will have to be tested by manually plugging this into jsonld-streaming-parser.

@jeswr (Contributor, Author) commented Oct 27, 2023

> as the cache would only hold pointers to shared memory.

My main concern here is that if we have a poor configuration of the lru-cache then we are preventing GC on context objects that have already been parsed and may no longer be relevant.
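For what it's worth, the retention window can be kept small through the lru-cache options themselves; a tightly bounded configuration (illustrative numbers, a sketch rather than anything this PR prescribes) would look like:

```ts
import { LRUCache } from 'lru-cache';

// Illustrative: at most 20 cached contexts, each proactively purged after 5 minutes,
// so parsed contexts that are no longer used become collectable again.
const contextCache = new LRUCache<string, any>({
  max: 20,
  ttl: 5 * 60 * 1000,
  ttlAutopurge: true,
});
```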

@rubensworks (Owner)

> My main concern here is that if we have a poor configuration of the lru-cache then we are preventing GC on context objects that have already been parsed and may no longer be relevant.

I would suspect lru-cache to be very well battle-tested by now, but I don't know enough of its internals to make any hard claims :-)
