Add support for clearing doc from DocHandle #365

georgewsu · 2024-07-30T00:02:22Z

Summary:
Add support for clearing doc from DocHandle so that reference can be released and memory freed, without deleting document from storage

Issue:
#330

Background:
After enough documents have been created in a repo, out-of-memory errors crash the sync server because memory is never freed. Repo holds a reference to each DocHandle in handleCache, and each DocHandle holds a reference to its automerge doc.

Proposed solution:
https://github.com/georgewsu/automerge-repo/tree/gsu/cache-eviction-test2 has code for an initial implementation of cache eviction and releasing document reference so that memory can be freed. I've tested this with a sync server running locally.

Part 1:
This PR only covers introducing a new state and event to DocHandle to allow clearing the document.

alexjg · 2024-08-14T21:51:25Z

Thanks for putting a bunch of work into this and the follow up PRs. I'm going to post my comments here but this feedback also relates to the linked work.

The addition of a new state to the DocHandle makes me uneasy. I've been thinking about why and I think it's because from the perspective of the application, the CLEARED state is not useful. I think instead we should just re-use the LOADING state. This is important because we need to figure out what to do if we evict a document which the application has a live DocHandle for (as discussed in #358 ). There seems to me to be not much difference between a handle which is loading because it is waiting for storage or network and a handle which is loading because the Repo has decided to evict it.

As to the time-based cache expiration policy: I think there are some tactical problems with it. The accessTime seems to me like it would be based on when the handle was created, not on the last time the application used it. Maybe that works for some use cases but it doesn't seem very general. This feels more like something which should be configurable in some way, maybe by passing a function to the Repo.clearCache method.
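
A minimal sketch of what passing a function to a cache-clearing method might look like. Everything here (HandleCache, CacheEntry, touch, olderThan, and the shape of clearCache) is hypothetical and is not the automerge-repo API; it only illustrates recording the last application access rather than creation time, and letting the caller supply the eviction policy.

```typescript
// Hypothetical names throughout; none of this is the automerge-repo API.
type DocumentId = string

interface CacheEntry {
  documentId: DocumentId
  lastAccessed: number // ms timestamp, updated on every application access
}

// The caller supplies the policy; the cache only applies it.
type EvictionPolicy = (entry: CacheEntry, now: number) => boolean

class HandleCache {
  private readonly entries = new Map<DocumentId, CacheEntry>()

  /** Record an application access, creating the entry if needed. */
  touch(documentId: DocumentId, now: number = Date.now()): void {
    const entry = this.entries.get(documentId)
    if (entry) entry.lastAccessed = now
    else this.entries.set(documentId, { documentId, lastAccessed: now })
  }

  /** Remove every entry the policy selects and return the evicted ids. */
  clearCache(shouldEvict: EvictionPolicy, now: number = Date.now()): DocumentId[] {
    const evicted: DocumentId[] = []
    this.entries.forEach((entry, id) => {
      if (shouldEvict(entry, now)) {
        this.entries.delete(id)
        evicted.push(id)
      }
    })
    return evicted
  }

  has(documentId: DocumentId): boolean {
    return this.entries.has(documentId)
  }
}

// Example policy: evict anything not accessed within maxAgeMs.
const olderThan = (maxAgeMs: number): EvictionPolicy =>
  (entry, now) => now - entry.lastAccessed > maxAgeMs
```

With this shape, a time-based policy is just one possible argument, for example cache.clearCache(olderThan(60_000)), and an application could swap in a size-based or usage-based predicate without changing the cache.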

georgewsu · 2024-08-14T22:49:43Z

> Thanks for putting a bunch of work into this and the follow up PRs. I'm going to post my comments here but this feedback also relates to the linked work.
>
> The addition of a new state to the DocHandle makes me uneasy. I've been thinking about why and I think it's because from the perspective of the application, the CLEARED state is not useful. I think instead we should just re-use the LOADING state. This is important because we need to figure out what to do if we evict a document which the application has a live DocHandle for (as discussed in #358 ). There seems to me to be not much difference between a handle which is loading because it is waiting for storage or network and a handle which is loading because the Repo has decided to evict it.
>
> As to the time-based cache expiration policy: I think there are some tactical problems with it. The accessTime seems to me like it would be based on when the handle was created, not on the last time the application used it. Maybe that works for some use cases but it doesn't seem very general. This feels more like something which should be configurable in some way, maybe by passing a function to the Repo.clearCache method.

Sounds good, agree with not having another state in DocHandle if it isn't necessary. @pvh had originally suggested reusing the IDLE state but I wasn't sure how that would work since the constructor sends the BEGIN event to transition to LOADING state. I'll work on removing the added state and reusing the LOADING state instead and will test it out.

The intention of accessTime is to track the last time the handle was used. I will follow up with you on that approach, the implementation details, and the changes needed to support it.

Thanks!

pvh · 2024-08-15T00:35:08Z

Just a note that I'm going to let @alexjg be the reviewer for this one (unless he asks for help) since he's already up to speed. I'm mostly done refactoring in this area -- I am considering retiring xstate just to reduce dependencies (cc/ @HerbCaudill), but I'll try and wait until after this lands.

alexjg

I've left a few small comments. I have a bigger question which is bothering me. Currently it's quite common for people to depend on this idea that when a DocHandle is ready then the application can assume it will always be ready. For example, code like this is quite common:

const handle = repo.find(...)
await handle.whenReady()
console.log(handle.docSync()) // <-- this line depends on the document never becoming unready

Introducing a transition back to IDLE breaks this assumption and it's the reason that in #358 @yarolegovich spent a lot of time trying to find a way to avoid evicting handles which the application is known to be using.

I think that finding a solution to this problem is orthogonal to having a way to transition in the first place though so I think that - once the comments I've made are addressed - we can do that later.

packages/automerge-repo/test/DocHandle.test.ts

alexjg · 2024-08-19T12:04:38Z

packages/automerge-repo/src/DocHandle.ts

@@ -426,6 +434,11 @@ export class DocHandle<T> extends EventEmitter<DocHandleEvents<T>> {
    this.#machine.send({ type: DELETE })
  }

+  /** Called by the repo to free memory used by the document. */
+  reset() {


I think that rather than calling this reset we should call it idle, as I think that more clearly describes what it actually does.

pvh · 2024-08-19T15:39:34Z

> I've left a few small comments. I have a bigger question which is bothering me. Currently it's quite common for people to depend on this idea that when a DocHandle is ready then the application can assume it will always be ready. For example, code like this is quite common:
>
> const handle = repo.find(...)
> await handle.whenReady()
> console.log(handle.docSync()) // <-- this line depends on the document never becoming unready

JS is single threaded -- as long as you don't yield, the above code is indeed safe. This is also a benefit of the promise-oriented API. Overall though, your point that introducing non-monotonic loading behaviour could have pitfalls is true but unavoidable.

I think ideally, most people will never encounter this behaviour, but we might in fact want a new handle state (or to rename the state to perhaps UNLOADED) so that we can give better error messages if it does happen.
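
To make the "as long as you don't yield" point concrete, here is a small sketch using a toy handle. ToyHandle and demo are hypothetical stand-ins, not the automerge-repo DocHandle; the timer simply simulates a repo deciding to evict the document at some later turn of the event loop.

```typescript
// Toy stand-in for a handle; not the automerge-repo DocHandle.
class ToyHandle {
  private doc: { text: string } | undefined = { text: "hello" }

  /** Resolves immediately because the toy document is already loaded. */
  async whenReady(): Promise<void> {}

  docSync(): { text: string } | undefined {
    return this.doc
  }

  /** Simulates the repo releasing the document to free memory. */
  unload(): void {
    this.doc = undefined
  }
}

async function demo(): Promise<[boolean, boolean]> {
  const handle = new ToyHandle()

  // Simulated eviction, queued as a timer task that runs on a later turn.
  setTimeout(() => handle.unload(), 0)

  await handle.whenReady()
  // No yield between the await above and this read, so the timer
  // callback cannot have run yet.
  const beforeYield = handle.docSync() !== undefined

  // Awaiting a timer yields to the event loop, so the eviction runs first.
  await new Promise<void>(resolve => setTimeout(resolve, 10))
  const afterYield = handle.docSync() !== undefined

  return [beforeYield, afterYield] // resolves to [true, false]
}
```

The first read is safe because nothing else can run between the await and the synchronous call. Only code that awaits again before touching the document needs to re-check readiness.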

georgewsu · 2024-08-19T21:18:01Z

Thanks for the reviews @alexjg @pvh

Agree about the need to determine which handles are in use, will follow up with questions I had in existing threads.

For the handle state, will follow up with you about going with IDLE or UNLOADED.

alexjg · 2024-08-21T16:05:09Z

It does seem like we need a new state if we want to provide good error messages, apologies for making you go around in circles. I think we should go for a new state called UNLOADED (distinct from IDLE) and rename reset() -> unload().

georgewsu · 2024-08-21T18:23:28Z

> It does seem like we need a new state if we want to provide good error messages, apologies for making you go around in circles. I think we should go for a new state called UNLOADED (distinct from IDLE) and rename reset() -> unload().

Great - that sounds good, I'll make that change. Do you think UNLOADED should be a terminal state, or should it allow reusing the handle and reloading, either via a new RELOAD event or reusing the BEGIN event? If reloading is not needed yet, I could start with the terminal state and leave it to be refactored later.

alexjg · 2024-08-21T19:43:20Z

I think we should definitely support reloading via BEGIN yeah but I'm happy to leave that for a future PR if it's complicated.

georgewsu · 2024-08-21T22:20:42Z

> I think we should definitely support reloading via BEGIN yeah but I'm happy to leave that for a future PR if it's complicated.

Cool, @alexjg just saw your comment - I actually just pushed 3 versions:

234f838 which has unloaded state as final
42706f7 which allows calling begin() again
8f0f839 which adds reload()

Happy to go with any of these

alexjg · 2024-08-26T15:46:30Z

@georgewsu I like 8f0f! If you can fix the lint I'll merge.
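
For readers following along, a simplified sketch of the shape discussed above: a distinct UNLOADED state, an unload() that drops the document reference, and a reload() that goes back to loading. This is not the code from 8f0f839. The transition table, SketchHandle, docReady, and the error messages are all assumptions made for illustration, and the real DocHandle has more states and events than shown here.

```typescript
// Hypothetical, simplified states and events; the real DocHandle has more.
type HandleState = "idle" | "loading" | "ready" | "unloaded"
type HandleEvent = "BEGIN" | "DOC_READY" | "UNLOAD" | "RELOAD"

// Allowed transitions; anything not listed is rejected.
const transitions: Record<HandleState, Partial<Record<HandleEvent, HandleState>>> = {
  idle: { BEGIN: "loading" },
  loading: { DOC_READY: "ready" },
  ready: { UNLOAD: "unloaded" },
  unloaded: { RELOAD: "loading" },
}

class SketchHandle<T> {
  state: HandleState = "idle"
  private doc: T | undefined

  private send(event: HandleEvent): void {
    const next = transitions[this.state][event]
    if (next === undefined) {
      throw new Error(`Cannot handle ${event} in state ${this.state}`)
    }
    this.state = next
  }

  begin(): void {
    this.send("BEGIN")
  }

  /** Called when storage or the network delivers the document. */
  docReady(doc: T): void {
    this.send("DOC_READY")
    this.doc = doc
  }

  /** Drops the document reference so its memory can be reclaimed. */
  unload(): void {
    this.send("UNLOAD")
    this.doc = undefined
  }

  /** Returns to loading; the document must be delivered again. */
  reload(): void {
    this.send("RELOAD")
  }

  docSync(): T {
    // A distinct state allows a specific message instead of a generic one.
    if (this.state === "unloaded") {
      throw new Error("Document was unloaded; call reload() before reading it")
    }
    if (this.state !== "ready" || this.doc === undefined) {
      throw new Error(`Document is not ready (state: ${this.state})`)
    }
    return this.doc
  }
}
```

Keeping UNLOADED separate from IDLE is what lets docSync() tell "never started" apart from "was loaded and then evicted", which is the error-message benefit raised earlier in the thread.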

Add support for clearing doc from DocHandle so that reference can be released and memory freed, without deleting document from storage

d790dec

georgewsu marked this pull request as ready for review July 30, 2024 18:19

This was referenced Jul 30, 2024

Issue 330: Support removeDocument from synchronizer and isDocSubscribedTo #366

Open

Support removeDocument from synchronizer and isDocSubscribedTo georgewsu/automerge-repo#2

Merged

George Su added 3 commits August 15, 2024 14:33

Merge branch 'main' into gsu/clear-doc-handle

68ed39f

Add support for clearing doc from DocHandle

bcbe358

Update test for doc handle reset and being idle

5cd749c

alexjg reviewed Aug 19, 2024

Update idle() implementation

bcc7fee

George Su added 3 commits August 21, 2024 14:28

Add support to unload doc from DocHandle

234f838

Add support to unload doc from DocHandle, with begin again allowed

42706f7

Add support to unload doc from DocHandle, with reload allowed

8f0f839

npm run format

af26390

alexjg merged commit 7c486df into automerge:main Aug 27, 2024
2 checks passed
