Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak #74

Open
bowenwang1996 opened this issue Mar 25, 2020 · 9 comments
Open

Memory leak #74

bowenwang1996 opened this issue Mar 25, 2020 · 9 comments

Comments

@bowenwang1996
Copy link

In get_data_decode_matrix the inverted matrix will always be added to the trie but nothing seems to every gets removed from the trie, so this can potentially consume a lot of memory. Is it expected that an instance of ReedSolomon<F> wouldn't be used too many times?

@darrenldl
Copy link
Contributor

It is expected there would not be a lot of inverted matrices required for a run of the application, which may not be a valid assumption, especially after the upgrade to GF16 where the number of possible matrices may be much larger than GF8.

At the time of writing the library, most of the efforts was dedicated to making sure the code was sound, and thus this memory leak was overlooked (and also it only supported GF8 at the time).

@darrenldl
Copy link
Contributor

Adding a cache eviction procedure would be ideal (obviously), but I think I'm overwhelmed by things at the moment to warrant me implementing it.

@darrenldl
Copy link
Contributor

darrenldl commented Mar 25, 2020

I roughly outline the changes that need to be made as follows to help the fix get started

  • Either define the upper bound as a constant, or as a parameter in the ReedSolomon<F> instantiation function, and passed to the tree creation procedure
  • InversionTree type in src/inversion_tree.rs needs to have at least one more field for cache eviction metadata (number of fields depends on which cache eviction algorithm we want to use)
  • We need an eviction procedure to clear one spot if the tree is full, and this is to be called within insert_inverted_matrix, after the square matrix check
  • The quickcheck test qc_tree_same_as_hash_map_prop in src/inversion_tree.rs will need to keep the upper bound of matrices in mind

@darrenldl darrenldl added the bug label Mar 25, 2020
@darrenldl
Copy link
Contributor

Lastly, I definitely could have documented this memory leak, which I know from the start, better and lodged it as a backlog issue to tackle later on or for others to tackle.

At the time of porting from Klaus Post's Go implementation to Rust's, I was mostly focusing on correctness (via heavy testing) and matching the feature parity of Klaus's. Flaws carried over from Klaus's were overlooked if they didn't (directly) lead to incorrect results, such as this bug.

Sorry for the inconvenience to whomever is affected, and thanks for reporting @bowenwang1996 !

@bowenwang1996
Copy link
Author

@darrenldl Thanks for the quick response. We have found a workaround so it is not super urgent.

bowenwang1996 added a commit to near/nearcore that referenced this issue Mar 25, 2020
…#2317)

There is a memory leak in the reed solomon code that we use rust-rse/reed-solomon-erasure#74 which results in our nodes crashing after reaching about 100k blocks. This PR fixes it by occasionally reinstantiating the reed solomon instance and cap its memory usage at 500MB.

Test plan
---------
Make sure existing tests pass.
@darrenldl
Copy link
Contributor

@bowenwang1996 Just to get some sanity checks before I attempt to do anything: is there a cache eviction policy in mind that you think would enable best performance in, say, your company's use case?

I was thinking of just doing a simple LRU, but reckon I should check with others before too late just in case that's a terrible idea.

@bowenwang1996
Copy link
Author

I don't have anything specific in mind. I think LRU works.

@burdges
Copy link
Collaborator

burdges commented Feb 4, 2021

I suppose ReedSolomon::new gets somewhat heavy too, so maybe worth mentioning https://github.com/dermesser/memoize somewhere in the docs.

Miezhiko added a commit to Masha/reed-solomon-erasure that referenced this issue Feb 15, 2021
@Miezhiko
Copy link
Contributor

@bowenwang1996 hello, may you please check if #83 is helping?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants