Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for clearing doc from DocHandle #365

Merged
merged 9 commits into from
Aug 27, 2024

Conversation

georgewsu
Copy link
Contributor

Summary:
Add support for clearing doc from DocHandle so that reference can be released and memory freed, without deleting document from storage

Issue:
#330

Background:
After creating enough documents in a repo, out of memory errors will cause a synch server to crash because memory isn't being freed. Repo holds references to each DocHandle in handleCache and each DocHandle holds a reference to the automerge doc.

Proposed solution:
https://github.com/georgewsu/automerge-repo/tree/gsu/cache-eviction-test2 has code for an initial implementation of cache eviction and releasing document reference so that memory can be freed. I've tested this with a sync server running locally.

Part 1:
This PR only covers introducing a new state and event to DocHandle to allow clearing the document.

…released and memory freed, without deleting document from storage
@alexjg
Copy link
Contributor

alexjg commented Aug 14, 2024

Thanks for putting a bunch of work into this and the follow up PRs. I'm going to post my comments here but this feedback also relates to the linked work.

The addition of a new state to the DocHandle makes me uneasy. I've been thinking about why and I think it's because from the perspective of the application, the CLEARED state is not useful. I think instead we should just re-use the LOADING state. This is important because we need to figure out what to do if we evict a document which the application has a live DocHandle for (as discussed in #358 ). There seems to me to be not much difference between a handle which is loading because it is waiting for storage or network and a handle which is loading because the Repo has decided to evict it.

As to how to the time based cache expiration policy. I think there are some tactical problems with it - the accessTime seems to me like it would be based on when the handle was created, not on the last time the application used it. Maybe that works for some usecases but it doesn't seem very general. This feels more like something which should be configurable in some way, maybe by passing a function to the Repo.clearCache method.

@georgewsu
Copy link
Contributor Author

Thanks for putting a bunch of work into this and the follow up PRs. I'm going to post my comments here but this feedback also relates to the linked work.

The addition of a new state to the DocHandle makes me uneasy. I've been thinking about why and I think it's because from the perspective of the application, the CLEARED state is not useful. I think instead we should just re-use the LOADING state. This is important because we need to figure out what to do if we evict a document which the application has a live DocHandle for (as discussed in #358 ). There seems to me to be not much difference between a handle which is loading because it is waiting for storage or network and a handle which is loading because the Repo has decided to evict it.

As to how to the time based cache expiration policy. I think there are some tactical problems with it - the accessTime seems to me like it would be based on when the handle was created, not on the last time the application used it. Maybe that works for some usecases but it doesn't seem very general. This feels more like something which should be configurable in some way, maybe by passing a function to the Repo.clearCache method.

Sounds good, agree with not having another state in DocHandle if it isn't necessary. @pvh had originally suggested reusing the IDLE state but I wasn't sure how that would work since the constructor sends the BEGIN event to transition to LOADING state. I'll work on removing the added state and reusing the LOADING state instead and will test it out.

The intention of accessTime is to track the last time the handle was used - will follow up with you on that approach / implementation details / changes need to support that.

Thanks!

@pvh
Copy link
Member

pvh commented Aug 15, 2024

Just a note that I'm going to let @alexjg be the reviewer for this one (unless he asks for help) since he's already up to speed. I'm mostly done refactoring in this area -- I am considering retiring xstate just to reduce dependencies (cc/ @HerbCaudill), but I'll try and wait until after this lands.

Copy link
Contributor

@alexjg alexjg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few small comments. I have a bigger question which is bothering me. Currently it's quite common for people to depend on this idea that when a DocHandle is ready then the application can assume it will always be ready. For example, code like this is quite common:

const handle = repo.find(...)
await handle.whenReady()
console.log(handle.docSync()) // <-- this line depends on the document never becoming unready

Introducing a transition back to IDLE breaks this assumption and it's the reason that in #358 @yarolegovich spent a lot of time trying to find a way to avoid evicting handles which the application is known to be using.

I think that finding a solution to this problem is orthogonal to having a way to transition in the first place though so I think that - once the comments I've made are addressed - we can do that later.

packages/automerge-repo/test/DocHandle.test.ts Outdated Show resolved Hide resolved
@@ -426,6 +434,11 @@ export class DocHandle<T> extends EventEmitter<DocHandleEvents<T>> {
this.#machine.send({ type: DELETE })
}

/** Called by the repo to free memory used by the document. */
reset() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that rather than calling this reset we should all it idle as I think that more clearly describes what it actally does.

@pvh
Copy link
Member

pvh commented Aug 19, 2024

I've left a few small comments. I have a bigger question which is bothering me. Currently it's quite common for people to depend on this idea that when a DocHandle is ready then the application can assume it will always be ready. For example, code like this is quite common:

const handle = repo.find(...)
await handle.whenReady()
console.log(handle.docSync()) // <-- this line depends on the document never becoming unready

JS is single threaded -- as long as you don't yield, the above code is indeed safe. This is also a benefit of the promise-oriented API. Overall though, your point that introducing non-monotonic loading behaviour could have pitfalls is true but unavoidable.

I think ideally, most people will never encounter this behaviour, but we might in fact want a new handle state (or to rename the state to perhaps UNLOADED) so that we can give better error messages if it does happen.

@georgewsu
Copy link
Contributor Author

Thanks for the reviews @alexjg @pvh

Agree about the need to determine which handles are in use, will follow up with questions I had in existing threads.

For the handle state, will follow up with you about going with IDLE or UNLOADED.

@alexjg
Copy link
Contributor

alexjg commented Aug 21, 2024

It does seem like we need a new state if we want to provide good error messages, apologies for making you go around in circles. I think we should go for a new state called UNLOADED (distinct from IDLE) and rename reset() -> unload().

@georgewsu
Copy link
Contributor Author

It does seem like we need a new state if we want to provide good error messages, apologies for making you go around in circles. I think we should go for a new state called UNLOADED (distinct from IDLE) and rename reset() -> unload().

Great - that sounds good, I'll make that change. Do you think UNLOADED should be a terminal state, or should it allow reusing the handle and reloading, either via a new RELOAD event or reusing the BEGIN event? If reloading is not needed yet, I could start with the terminal state and leave it to be refactored later.

@alexjg
Copy link
Contributor

alexjg commented Aug 21, 2024

I think we should definitely support reloading via BEGIN yeah but I'm happy to leave that for a future PR if it's complicated.

@georgewsu
Copy link
Contributor Author

I think we should definitely support reloading via BEGIN yeah but I'm happy to leave that for a future PR if it's complicated.

Cool, @alexjg just saw your comment - I actually just pushed 3 versions:

  1. 234f838 which has unloaded state as final
  2. 42706f7 which allows calling begin() again
  3. 8f0f839 which adds reload()

Happy to go with any of these

@alexjg
Copy link
Contributor

alexjg commented Aug 26, 2024

@georgewsu I likke 8f0f! If you can fix the lint I'll merge.

@alexjg alexjg merged commit 7c486df into automerge:main Aug 27, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants