
[BUG] Reconcile on disk state with assigned partitions for given resource and delete unused partitions #650

Open · 2 of 12 tasks
ZacAttack opened this issue Sep 21, 2023 · 1 comment
Labels: bug (Something isn't working)

Comments

@ZacAttack (Contributor)

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Venice version

Observed since March in production

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.04): Mariner 5.15.111.1-1.cm2
  • JDK version: 17

Describe the problem

Currently, the on-disk state inside the RocksDB folder of a Venice server for a given resource (store version) can contain folders for partitions that are no longer assigned to the host, due to missed "partition drops".

We need to ensure that only the known (metadata?) or "assigned" partitions are present in the folder for each store_version, and remove the rest quickly in order to recover disk space.

It is NOT sufficient to clean up this data at startup without some introspection. Venice relies on delayed rebalance on its controller to avoid unnecessary bootstraps, so some examination may be needed to determine what the appropriate disk state should be.

Tracking information

No response

Code to reproduce bug

No response

What component(s) does this bug affect?

  • [ ] Controller: This is the control-plane for Venice. Used to create/update/query stores and their metadata.
  • [ ] Router: This is the stateless query-routing layer for serving read requests.
  • [x] Server: This is the component that persists all the store data.
  • [ ] VenicePushJob: This is the component that pushes derived data from Hadoop to Venice backend.
  • [ ] VenicePulsarSink: This is a Sink connector for Apache Pulsar that pushes data from Pulsar into Venice.
  • [ ] Thin Client: This is a stateless client users use to query Venice Router for reading store data.
  • [ ] Fast Client: This is a stateful client users use to query Venice Server for reading store data.
  • [x] Da Vinci Client: This is an embedded, stateful client that materializes store data locally.
  • [ ] Alpini: This is the framework that fast-client and routers use to route requests to the storage nodes that have the data.
  • [ ] Samza: This is the library users use to make nearline updates to store data.
  • [ ] Admin Tool: This is the stand-alone client used for ad-hoc operations on Venice.
  • [ ] Scripts: These are the various ops scripts in the repo.
ZacAttack added the bug label on Sep 21, 2023
@ZacAttack (Contributor, Author)

Ok. So here are my notes on how to make the bug fix.

There are two tasks: reconcile disk state for DaVinci and for servers. I'm going to explain them separately.

Servers

Our task will be to consult Helix's IdealState to understand which partitions will be assigned to the server. That should be a sufficient signal to tell us whether the server will get state transitions for the data that is on disk.

Within one of the constructors for StorageService.java there is an argument for providing a functional interface called checkWhetherStorageEngineShouldBeKeptOrNot. When the storage service starts up, it checks every folder on its local disk and determines whether or not each folder should be deleted. Today we use this in DaVinci, but we don't in the server. You can see this because when VeniceServer.java calls new StorageService() it uses a constructor that doesn't take a checkWhetherStorageEngineShouldBeKeptOrNot parameter.

Our task here, then, is to have VeniceServer invoke the constructor that takes this parameter, and to define the function that should be used (see the sketch below).
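
As a rough sketch, assuming the predicate takes the on-disk storage engine (store version) name and returns whether to keep it (the Function<String, Boolean> shape and the isStillAssignedToThisHost helper here are illustrative, not the exact Venice signatures):

import java.util.function.Function;

// Sketch only: the keep-or-drop decision that VeniceServer would hand to the
// StorageService constructor overload that accepts it. The helper
// isStillAssignedToThisHost is hypothetical; it would consult the Helix
// IdealState as described in the tips below.
Function<String, Boolean> checkWhetherStorageEngineShouldBeKeptOrNot =
    storageEngineName -> isStillAssignedToThisHost(storageEngineName);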

Some Tips:

To get the IdealState, you should be able to use the SafeHelixDataAccessor object and call getProperty. What you should pass here is a PropertyKey which points to the IdealState of the resource we're checking up on. That call should look something like:

PropertyKey.Builder propertyKeyBuilder = new PropertyKey.Builder(clusterConfig.getClusterName());
// resourceName is the Helix resource, i.e. the store version (e.g. myStore_v1)
IdealState idealState = safeHelixDataAccessor.getProperty(propertyKeyBuilder.idealStates(resourceName));

Once you have the IdealState object, you can interrogate it for which instance names currently host a given partition. If the current instance is in the list, then you know we should keep the partition. Yay!
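
For example, here is a sketch of that membership check, assuming Helix's standard <resourceName>_<partitionId> partition naming and a hypothetical instanceId for this server:

import java.util.Set;
import org.apache.helix.model.IdealState;

// Sketch: returns true if this host is still assigned the given partition.
boolean shouldKeepPartition(IdealState idealState, String resourceName, int partitionId, String instanceId) {
  if (idealState == null) {
    // The resource no longer exists in the cluster, so none of its on-disk data should be kept.
    return false;
  }
  String partitionName = resourceName + "_" + partitionId; // standard Helix partition naming
  Set<String> assignedInstances = idealState.getInstanceSet(partitionName);
  return assignedInstances != null && assignedInstances.contains(instanceId);
}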

Some notes now that I'm looking at this. First of all, the current implementation of checkWhetherStorageEngineShouldBeKeptOrNot seems to work off of store names, not partition names. We will probably have to tweak that somehow; I leave it to you to decide how best to do that.

Next, it would be great if the implementation avoided looking up the IdealState from ZK over and over again. Some caching per store would be ideal here so we only have to look it up once. So please keep that in mind (a sketch follows).
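
A minimal sketch of that memoization, reusing the accessor and key builder from the snippet above (the cache itself is illustrative):

import java.util.HashMap;
import java.util.Map;
import org.apache.helix.model.IdealState;

// Sketch: hit ZK at most once per resource during the startup scan.
// Note: computeIfAbsent does not cache a null result, so a missing IdealState
// would be re-fetched; wrap the value in an Optional if that matters.
Map<String, IdealState> idealStateCache = new HashMap<>();

IdealState getIdealState(String resourceName) {
  return idealStateCache.computeIfAbsent(
      resourceName,
      name -> safeHelixDataAccessor.getProperty(propertyKeyBuilder.idealStates(name)));
}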
