Add capability to discard duplicate jobs with concurrency configuration #523
Conversation
```
@@ -66,6 +66,8 @@ def prepare_for_execution

  def dispatch
    if acquire_concurrency_lock then ready
    elsif discard_concurrent?
      discard
```
I don't think this is the way 🤔 We're in the middle of a transaction here, and the job hasn't even been committed to the DB. It makes no sense to delete a record in the same transaction you're creating it. It'd make sense to roll that transaction back instead.
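For illustration, a minimal sketch of the rollback idea, assuming `dispatch` runs inside the transaction that creates the job row (as discussed further down, it is invoked from an `after_create` callback); `acquire_concurrency_lock`, `ready`, and `discard_concurrent?` are the PR's own methods:

```ruby
# Minimal sketch, not the PR's code: roll back the enclosing transaction
# instead of deleting the row that was just inserted.
def dispatch
  if acquire_concurrency_lock
    ready
  elsif discard_concurrent?
    # ActiveRecord::Rollback is swallowed by the surrounding transaction
    # block, undoing the job INSERT. The enqueueing side would still need
    # to mark the Active Job as not successfully enqueued.
    raise ActiveRecord::Rollback
  end
end
```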
Perhaps I don't fully understand how we're getting into this code path, but from my investigation, it looks as though there isn't an open transaction at this point.

I tried adding the following check and running the test suite, and didn't hit any open transactions:

```ruby
raise "open transactions" if ApplicationRecord.connection.open_transactions.positive?
```

I still believe we'd want to discard here. Let me know your thoughts.
app/models/solid_queue/semaphore.rb

```
@@ -39,6 +43,14 @@ def initialize(job)
    @job = job
  end

  def at_limit?
    if semaphore = Semaphore.find_by(key: key)
      semaphore.value.zero?
```
This is vulnerable to race conditions, and it's the reason we don't check it this way when blocking jobs. If two concurrent jobs are claiming the semaphore, both of them will see it open and neither will be discarded. Then both will run together, because we won't block them either.
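To make the race concrete: a read followed by a separate write is a classic check-then-act race. One standard remedy is a single guarded UPDATE, sketched below; this is illustrative only, not this PR's code, and `attempt_acquire` is a hypothetical name:

```ruby
# Sketch: claim a semaphore slot with one atomic statement. The guard in
# the WHERE clause makes the check and the decrement happen together, so
# two concurrent jobs cannot both claim the last slot. update_all returns
# the number of rows changed: 1 means we got the slot, 0 means we lost.
def attempt_acquire
  Semaphore.where(key: key).where("value > 0").update_all("value = value - 1") > 0
end
```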
You raise an excellent point. Do you have a suggestion for how we might avoid this race condition? Are you thinking pessimistic locking might help?
Yes, I was thinking that perhaps we could rely on the same check we're already doing to block the job, but instead of blocking the job, we'd roll back the transaction 🤔 What I'm not sure about is whether this should actually raise some exception to indicate the job hasn't been enqueued. Maybe that's not necessary, but at the very least it should set the `successfully_enqueued` attribute on the Active Job to `false`.
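For context, here is how a caller might then detect the discard; a sketch, where `MyJob` and the log message are placeholders:

```ruby
# Sketch: if a concurrency conflict sets successfully_enqueued to false,
# callers can check the flag on the job instance returned by perform_later.
job = MyJob.perform_later(record)
unless job.successfully_enqueued?
  Rails.logger.info "Job #{job.job_id} was discarded by concurrency limits"
end
```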
Thanks for the review @rosa ❤️, I'll review your comments more thoroughly and make some changes. 😊
Hi @rosa, I've made a few changes and believe I've encountered some weird interactions with bulk enqueuing, which might be a bug. Let me know if you'd like those changes split out into another PR.
```ruby
active_jobs.each do |active_job|
  job = enqueued_jobs_by_active_job_id[active_job.job_id]
  active_job.provider_job_id = job&.id
  active_job.successfully_enqueued = job.present?
```
I've reorganized this method so that we can efficiently set `successfully_enqueued` for both enqueued/blocked and discarded jobs.
🤔 I think the previous implementation did that as well (only setting `successfully_enqueued = true` for jobs that had actually been enqueued), and it should work without doing anything for discarded jobs, as long as discarded jobs don't create a job record. I must be missing something here: why do you think the previous implementation didn't set that correctly?
@rosa the previous implementation would only set `successfully_enqueued = true`; it wouldn't set `successfully_enqueued = false` if the job wasn't enqueued.

If you're okay with that quirk, then I can revert this change.
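To make the quirk concrete, an illustrative before/after; neither line is a verbatim quote of the codebase:

```ruby
# Before (sketch): the flag was only ever set on success, so a job that
# was discarded kept whatever value the attribute already had.
active_job.successfully_enqueued = true if job.present?

# After (sketch): the flag is set explicitly in both outcomes.
active_job.successfully_enqueued = job.present?
```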
```diff
@@ -49,7 +52,7 @@ def create_from_active_job(active_job)
   def create_all_from_active_jobs(active_jobs)
     job_rows = active_jobs.map { |job| attributes_from_active_job(job) }
     insert_all(job_rows)
-    where(active_job_id: active_jobs.map(&:job_id))
+    where(active_job_id: active_jobs.map(&:job_id)).order(id: :asc)
```
I think this may have been an unintentional bug, or at least something that's good to fix moving forward with more concurrency configuration.

Before this order was added, jobs could be dispatched out of the order they were provided in. That isn't really a problem when there are no concurrency controls on the job, but with concurrency controls it might have unintended consequences.
So if you were to do:

```ruby
jobs = [
  job_1 = DiscardedNonOverlappingJob.new(@result, name: "A"),
  job_2 = DiscardedNonOverlappingJob.new(@result, name: "B")
]
enqueued_jobs_count = SolidQueue::Job.enqueue_all(jobs)
```

there wasn't any guarantee that `job_1` would be dispatched before `job_2`. Now there should be: since these jobs are bulk-inserted with auto-incrementing IDs, `job_1` will have the lower `id`.
Concurrency controls don't guarantee anything about order, though 🤔 I think when you're enqueuing two concurrency-controlled jobs together (in bulk), there shouldn't be any guarantee that one runs before the other.
Intuitively, if I were using bulk enqueue with concurrency controls, I would expect the jobs to be enqueued in the order they were created, no?

If you don't want to make that guarantee, I can remove these changes and instead add a more prominent call-out in the bulk-enqueue documentation that ordering is not guaranteed.
```
@@ -66,6 +66,7 @@ def prepare_for_execution

  def dispatch
    if acquire_concurrency_lock then ready
    elsif discard_concurrent? then discard
```
I don't believe `discard` does anything here, since it just deletes the execution. But I believe at this point we won't have an execution?

This isn't deleting the job as I once thought it was. This is effectively doing nothing.
> I don't believe `discard` does anything here, since it just deletes the execution. But I believe at this point we won't have an execution?

That's right, we won't have an execution yet. However, we need to ensure we don't create the job, and I think this will result in the job record being created. That's why I think we need to roll back the current transaction. There should be a transaction open, since this is called from an `after_create` callback on the job.
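A sketch of the wiring being described, assuming `dispatch` is registered as a model callback (class names per SolidQueue; the exact registration is an assumption):

```ruby
# Sketch: because dispatch is an after_create callback, it runs inside the
# same transaction as the INSERT that creates the job row, so raising
# ActiveRecord::Rollback there would undo the job's creation.
class SolidQueue::Job < SolidQueue::Record
  after_create :dispatch
end
```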
@rosa ah, thanks for pointing out the callback. Because `dispatch` is also used in bulk enqueueing, should we roll back all jobs in the bulk enqueue if one of them failed to be enqueued due to concurrency controls?
I am really in need of this feature. As it's not yet available, could you help me implement it?
@rajeevriitm if you have a look at the README, you should find that concurrency controls are already available for you to use. The only difference is that jobs will always be blocked, so you may end up with several jobs awaiting execution.
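For reference, the existing blocking behavior is configured roughly like this (a sketch based on the SolidQueue README; the job class and key are placeholders):

```ruby
class DeliverNotificationJob < ApplicationJob
  # Current behavior: jobs over the limit are blocked until the semaphore
  # is released or the duration expires; they are never discarded.
  limits_concurrency to: 1, key: ->(contact) { contact.id }, duration: 5.minutes

  def perform(contact)
    # deliver the notification...
  end
end
```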
Closes #176

Adds the ability to discard duplicate jobs instead of having them become blocked. When configuring `concurrency_limits` with `at_limit: discard`, jobs scheduled above the concurrency limit are discarded and not executed.

Implementation Details

- Adds `concurrency_at_limit` and an `at_limit` option to `concurrency_limits`
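Based on that description, usage might look like the following sketch; the job class is a placeholder, and the exact option name and value are per this PR and could change before merge:

```ruby
class SyncInventoryJob < ApplicationJob
  # Proposed in this PR (sketch): when a job with the same key is already
  # at the concurrency limit, discard the new job instead of blocking it.
  limits_concurrency to: 1, key: ->(sku) { sku }, at_limit: :discard

  def perform(sku)
    # sync the inventory for this SKU...
  end
end
```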