
When an SOC is overwritten, the old version is served from cache? #4983

Open
Cafe137 opened this issue Feb 4, 2025 · 8 comments
@Cafe137

Cafe137 commented Feb 4, 2025

Context

Bee 2.4.0

Summary

My understanding is that SOCs can be overwritten. However, when updating an existing SOC, old versions keep being served, likely from cache. I suspect the cache, but I am not 100% sure. Getting the latest version requires workarounds such as disabling the cache[0], or waiting a long time.

Additionally, there is an intermediate period after an update during which new and old versions are both served, seemingly at random, depending on which nodes are queried. This observation assumes that the local cache is disabled.

[0] While disabling the local cache has worked in some cases, remote nodes can still serve previously cached versions.

Expected behavior

After updating an SOC, its new version should quickly propagate in the network and replace previous versions.

Actual behavior

Please see summary.

Steps to reproduce

  • Write an SOC.
  • Update the same SOC.
  • Fetch the SOC => the first version is returned.
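The failure mode can be sketched with a toy model (illustrative only, not Bee's actual code): a SOC's address is derived from its identifier and owner rather than its payload, so an update reuses the same address, and an address-keyed cache keeps returning whatever payload it saw first.

```python
# Toy model of a node with an address-keyed retrieval cache. Because a
# SOC update reuses the same address, the cache never notices the change.

class Node:
    def __init__(self):
        self.store = {}  # authoritative storage, keyed by address
        self.cache = {}  # retrieval cache, keyed by address

    def put(self, addr, payload):
        self.store[addr] = payload  # note: the cache is NOT invalidated

    def get(self, addr):
        if addr in self.cache:
            return self.cache[addr]  # cache hit: possibly stale
        payload = self.store[addr]
        self.cache[addr] = payload
        return payload

node = Node()
addr = "soc(owner, id)"    # stays constant across updates

node.put(addr, b"v1")
print(node.get(addr))      # b'v1' -- and now cached

node.put(addr, b"v2")
print(node.get(addr))      # still b'v1': served from the cache
```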

Possible solution

When a new version of an SOC is stored, Bee should clear the previous version from the cache, if one exists.
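Continuing the toy model above (again illustrative, not Bee's storage code), the fix would be to evict the cache entry for an address whenever a new version is stored under it:

```python
# Toy sketch of the proposed fix: storing a new version of a SOC evicts
# any cached copy under the same address, so the next read falls through
# to the updated store.

class Node:
    def __init__(self):
        self.store = {}  # authoritative storage, keyed by address
        self.cache = {}  # retrieval cache, keyed by address

    def put(self, addr, payload):
        self.store[addr] = payload
        self.cache.pop(addr, None)  # clear the stale cache entry, if one exists

    def get(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.store[addr]
        return self.cache[addr]

node = Node()
node.put("soc(owner, id)", b"v1")
print(node.get("soc(owner, id)"))  # b'v1'
node.put("soc(owner, id)", b"v2")
print(node.get("soc(owner, id)"))  # b'v2': the cache was invalidated on put
```

Note that this only fixes the node that stored the update; caches on remote nodes along the retrieval path are unaffected, as footnote [0] observes.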

@Cafe137 Cafe137 added the needs-triaging new issues that need triaging label Feb 4, 2025
@istae
Member

istae commented Feb 4, 2025

You are right. The cache, as it is currently implemented, will not update itself with the latest SOC; this is a known problem.

@NoahMaizels
Contributor

NoahMaizels commented Feb 5, 2025

My understanding is that SOCs can be overwritten. However, when updating an existing SOC, old versions are served, likely from cache. I suspect cache, but not 100% on it. Getting the latest version requires workarounds such as disabling cache[0], or waiting a long time.

If I understand correctly, based on an earlier call with @istae, every node along the path the chunk hops through would need to disable its cache, not only the receiving node itself.

@NoahMaizels
Contributor

This would also mean increased latency, but I think that's not an issue when it comes to single-chunk retrievals.

@Cafe137
Author

Cafe137 commented Feb 5, 2025

These are the options I naively see, sharing just for brainstorming purposes:

  • Never cache SOCs: simple, but very inefficient.
  • Have a "cache busting" protocol that signals to the whole network that a new version of an SOC is available.
  • Have a request flag that asks for a fresh version of an SOC, bypassing the cache on all hop nodes.
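The third option could look roughly like this; a hypothetical sketch, not Bee's actual retrieval protocol: each forwarding node propagates the flag, so the cache is bypassed on every hop along the path.

```python
# Sketch of a per-request "fresh" flag that every forwarding node honors,
# bypassing its cache along the whole retrieval path. The protocol and
# names here are made up for illustration.

class HopNode:
    def __init__(self, origin=None):
        self.cache = {}
        self.origin = origin  # next node toward the storer, or None if storer

    def retrieve(self, addr, fresh=False):
        if not fresh and addr in self.cache:
            return self.cache[addr]  # normal request: cache may be stale
        if self.origin is None:
            payload = STORE[addr]    # this node holds the chunk itself
        else:
            payload = self.origin.retrieve(addr, fresh)  # forward the flag
        self.cache[addr] = payload   # refresh this hop's cache on the way back
        return payload

STORE = {"soc": b"v1"}
storer = HopNode()
hop = HopNode(origin=storer)
entry = HopNode(origin=hop)

print(entry.retrieve("soc"))              # b'v1' -- now cached at every hop
STORE["soc"] = b"v2"
print(entry.retrieve("soc"))              # b'v1' -- stale, from the cache
print(entry.retrieve("soc", fresh=True))  # b'v2' -- the flag bypasses all hops
```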

@mfw78
Collaborator

mfw78 commented Feb 10, 2025

Another option is to expand the SOC so that it has a cache_ttl field in its chunk definition. Nodes could then simply decline to cache any SOC that doesn't have a cache_ttl.
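A sketch of what a caching node might do with such a hint (the cache_ttl field is this proposal, not part of Bee's SOC format; the layout and names here are made up):

```python
# TTL-hinted cache: the uploader attaches a best-effort cache_ttl, and a
# caching node either honors it or skips caching the SOC entirely.

import time

class TTLCache:
    def __init__(self):
        self.entries = {}  # addr -> (payload, expiry time)

    def put(self, addr, payload, cache_ttl=None):
        if cache_ttl is None:
            return  # no cache_ttl in the SOC: don't cache it at all
        self.entries[addr] = (payload, time.monotonic() + cache_ttl)

    def get(self, addr):
        entry = self.entries.get(addr)
        if entry is None:
            return None  # miss: caller must fetch from the network
        payload, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.entries[addr]  # expired: treat as a miss
            return None
        return payload

cache = TTLCache()
cache.put("soc-a", b"v1", cache_ttl=60)  # cached for up to 60 seconds
cache.put("soc-b", b"v1")                # no hint: never cached
print(cache.get("soc-a"))                # b'v1'
print(cache.get("soc-b"))                # None
```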

@acha-bill acha-bill self-assigned this Mar 17, 2025
@acha-bill
Contributor

Have a request flag that requests a fresh version of an SOC, by-passing cache for all hopped nodes.

This option looks good to me. The client controls when they want consistency (the latest chunk) vs. performance (maybe the latest). It's also a pragmatic approach.

Have a "cache busting" protocol

Inefficient, as you have to notify everyone on the network starting from the node that stored the update. Also, in the case of network partitioning, you still end up with nodes that didn't invalidate their cached entry. It's also harder to implement. lol

Another option is to expand the SOC so that it has a cache_ttl

Since the TTL is set by the uploader, but the cache problem is experienced by downloaders who may be unrelated, there's a misalignment of control and needs.

@mfw78
Collaborator

mfw78 commented Mar 18, 2025

Since the TTL is set by the uploader but the cache problem is experienced by downloaders who may be unrelated, there's a misalignment of control and needs

The intent would be not to make it binding, but to treat it on a best-effort basis. More specifically, this provides hints to intermediate nodes about how to manage their cache most effectively, which would maximise their potential revenue from bandwidth incentives.

@NoahMaizels
Contributor

NoahMaizels commented Mar 18, 2025

Maybe the nodes which cache the SOC could record when the last update was, and requesting nodes can specify how fresh of an update they are looking for?

Like, they could specify 5 minutes in the request; if the intermediate node has a SOC update that is less than 5 minutes old, it will serve that one, and if it's older than 5 minutes, it will attempt to fetch a newer one and, if one exists, update its cache with it?
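This freshness-bounded lookup could be sketched as follows (the names and API are illustrative, not an existing Bee interface): cached entries remember when they were stored, the request carries a max_age, and anything older is refetched.

```python
# Freshness-bounded cache: the requester specifies how stale an answer
# it will tolerate, and the node refetches anything older than that.

import time

class FreshnessCache:
    def __init__(self, fetch):
        self.fetch = fetch  # function that retrieves the chunk remotely
        self.entries = {}   # addr -> (payload, time it was stored)

    def get(self, addr, max_age):
        entry = self.entries.get(addr)
        if entry is not None and time.monotonic() - entry[1] <= max_age:
            return entry[0]  # young enough for this requester: serve it
        payload = self.fetch(addr)  # too old or missing: fetch a newer one
        self.entries[addr] = (payload, time.monotonic())
        return payload

network = {"soc": b"v1"}
cache = FreshnessCache(fetch=network.get)

print(cache.get("soc", max_age=300))  # b'v1', fetched and cached
network["soc"] = b"v2"
print(cache.get("soc", max_age=300))  # b'v1': the cached copy is fresh enough
print(cache.get("soc", max_age=0))    # b'v2': the requester demanded freshness
```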
