Is your proposal related to a problem?
I'm working on two Kubernetes jobs which read/write/delete blocks from object storage:
periodic rewrites on new blocks uploaded to object storage
running maintenance scripts in case of corruption that halts the Compactor (which happens quite often :/)
For the first job, I want to ensure that the Compactor isn't currently running and working on the same blocks I want to rewrite
For the second job, I want to notify the Compactor that it's fine to carry on
Describe the solution you'd like
It would be great if my tasks could call an endpoint like ://thanos-compactor/suspend when they are starting their work.
This would notify Compactor to drop & forget everything it's doing immediately, going into its halted state.
After everything is done, a call to ://thanos-compactor/resume would then get it running again, leaving the halted state and resyncing block information to continue (or rather restart) compaction.
Compactor should support storing its suspension status locally, so it stays suspended after being restarted.
These endpoints should be opt-in via a flag (or separate ones?), as this shouldn't be made available without proper monitoring for Compactors that stay halted for too long.
Having "Resume" in the UI would be nice as well; not so sure about a "Suspend" button, as it could raise concerns about accidental (or malicious) activations impacting stability and performance. But on the other hand, there is already a --disable-admin-operations flag.
Describe alternatives you've considered
Putting thanos compact in between my scripts
My first idea was to run Compactor as a job as well, making it easier to perform the operations in sequence.
While this would be possible with a custom image executing some pre and post thanos compact tasks, it also gets more complex when I don't want to execute those tasks on the same schedule.
Locking streams or bucket instead of Compactor
suspend / resume would work in my use case, but I'm only working with one Compactor instance right now, against a simple, non-replicated bucket.
With more than one Compactor instance working on different streams, discovering, suspending and resuming the correct one might get a bit more involved.
In that case it might be easier to put some lock file into the bucket, which could be checked by the Compactor before every upload.
Haven't thought about it too much, so I'm not sure whether this would even work properly in all cases once replication gets involved.
This would also waste more CPU cycles, since the Compactor only learns about the lock at a later point.
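A minimal sketch of the lock-file alternative, assuming a hypothetical bucket client with `exists`/`upload` methods (a stand-in, not a real Thanos interface), and a made-up lock object name:

```python
LOCK_KEY = "compactor.lock"  # hypothetical object name, not a Thanos convention


def upload_if_unlocked(bucket, block_path, dest_key):
    """Skip the upload when a lock object is present in the bucket.

    `bucket` is any object exposing exists(key) and upload(key, path);
    it stands in for a real object-storage client.
    """
    if bucket.exists(LOCK_KEY):
        # The Compactor only learns about the lock here, after the
        # compaction work has already been done: wasted CPU cycles.
        return False
    bucket.upload(dest_key, block_path)
    return True
```

This illustrates the downside mentioned above: the check can only happen just before the upload, so the preceding compaction work is thrown away.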
Doing relabeling in Compactor
As described in #4941 (comment), my particular use case of relabeling would be handled even better directly by the Compactor.
Additional context
/resume would even be useful without /suspend, in cases where it's easier to do a request (or click a button in the UI) rather than restarting the process, for example when the Compactor isn't running in the cluster.
These endpoints should be opt-in via a flag (or separate ones?), as this shouldn't be made available without proper monitoring for Compactors that stay halted for too long.
This could also be addressed by having a mandatory "requestUntil" timestamp for suspensions, at which the Compactor would auto-resume if it wasn't resumed earlier and the suspension wasn't extended with another call.
A max-suspend flag on process start could also specify the maximum duration a Compactor can be suspended.
The suspend endpoint would respond with how long it will stay suspended, min(requestUntil, max-suspend), so callers know when to schedule another request to renew the suspension.
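The deadline computation described above could be sketched like this (names illustrative, not an actual Thanos API):

```python
from datetime import datetime, timedelta, timezone


def effective_suspend_until(request_until, max_suspend, now=None):
    """Return the timestamp a /suspend response would report back.

    The suspension ends at min(requestUntil, now + max-suspend), so
    callers know when to schedule the next renewal request.
    """
    now = now or datetime.now(timezone.utc)
    hard_limit = now + max_suspend  # cap imposed by the max-suspend flag
    return min(request_until, hard_limit)
```

A caller asking for a longer suspension than max-suspend allows would get the capped deadline back and could renew before it expires.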