Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inactive DocHandles are not evicted from handleCache #358

Open
yarolegovich opened this issue Jul 11, 2024 · 0 comments
Open

Inactive DocHandles are not evicted from handleCache #358

yarolegovich opened this issue Jul 11, 2024 · 0 comments

Comments

@yarolegovich
Copy link
Contributor

yarolegovich commented Jul 11, 2024

Hi, wanted to share my view on the DocHandle eviction problem and hear your thoughts. If the suggested approach sounds reasonable, I can start working on it this or early next week.

The problem:
DocHandles are hard-referenced in Repo#handleCache and never destroyed. The longer the program runs the more "idle" documents there are.

Constraints:

  • We definitely don't want to unload a document if client code is working with it (holds a reference).
  • We don't want to unload a document another peer is actively replicating to us, so that we don't need to load it back to memory on every new message.
  • There might be >10k handles if objects are small and each reside in their own document, unlikely >100k (?).

Suggested solution:

  1. Rename DocHandle to InternalDocHandle, all the private logic is going to stay here.
  2. Add DocHandle that'll hold a hard-ref to InternalDocHandle. Public API goes here.
  3. InternalDocHandle holds a weak-ref to DocHandle that hard-references it. We need this to avoid unloading documents referenced by client code.
  4. Maintain a lastChange timestamp on InternalDocHandle. We need it to avoid unloading documents which are being replicated.
  5. setInterval to scan all handles, removing those where DocHandle weak-ref is undefined and it's been more than a minute (?) since the last document update. Should probably be user-configurable.
  6. Scan can be incremental, yielding after 10k items not to hold event-loop for too long, we're ok with eventual eviction. We can continue the scan from where we left on the next interval trigger or just schedule a macro-task to continue after other callbacks had a go. I'd probably go with the latter approach.

Alternatives:

  1. Maintain an LRU-cache, moving handles in response to change events, when garbage-collecting we can iterate from least to most recently used and stop iteration as soon as time passed since lastChange doesn't exceed eviction threshold. Not a bad idea, but we don't gain much if user code holds references to a lot of objects which rarely change and sit in the tail. We'll need to go through all of them first. In addition with our data set size reference walking can be worse than linear array scan even if the latter checks more elements.
  2. Don't introduce InternalDocHandle. Let's say we maintain an LRU cache as in alternative#1 and based on lastChange timestamp we move handles from strong-ref active cache to inactive: Map<DocumentId, WeakMap<DocHandle>> cache. On sync message or find we can look up in both and return an existing handle which was still in-use by user-code. On change we can bring handles back to activeCache if they weren't there. Don't like this solution because:
    • Quite error-prone and unjustifiably complicated: when a handle is moved to inactiveCache we need to ensure that all internal references are cleaned-up. When we bring it back to active cache we need to ensure everything is in place, like there's an active docSynchronizer.
    • We'll still need to full-scan and periodically clean inactiveCache entries where deref resolves to undefined.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant