Add capability to discard duplicate jobs with concurrency configuration #523
Changes from all commits
f290af1
e561356
90793ad
b8dae8e
ddb513f
a3a7049
@@ -10,19 +10,22 @@ class EnqueueError < StandardError; end
 
     class << self
       def enqueue_all(active_jobs)
-        active_jobs_by_job_id = active_jobs.index_by(&:job_id)
+        enqueued_jobs_count = 0
 
         transaction do
           jobs = create_all_from_active_jobs(active_jobs)
-          prepare_all_for_execution(jobs).tap do |enqueued_jobs|
-            enqueued_jobs.each do |enqueued_job|
-              active_jobs_by_job_id[enqueued_job.active_job_id].provider_job_id = enqueued_job.id
-              active_jobs_by_job_id[enqueued_job.active_job_id].successfully_enqueued = true
-            end
+          enqueued_jobs_by_active_job_id = prepare_all_for_execution(jobs).index_by(&:active_job_id)
+
+          active_jobs.each do |active_job|
+            job = enqueued_jobs_by_active_job_id[active_job.job_id]
+            active_job.provider_job_id = job&.id
+            active_job.successfully_enqueued = job.present?
           end
+
+          enqueued_jobs_count = enqueued_jobs_by_active_job_id.count
         end
 
-        active_jobs.count(&:successfully_enqueued?)
+        enqueued_jobs_count
       end
 
       def enqueue(active_job, scheduled_at: Time.current)
@@ -49,7 +52,7 @@ def create_from_active_job(active_job)
         def create_all_from_active_jobs(active_jobs)
           job_rows = active_jobs.map { |job| attributes_from_active_job(job) }
           insert_all(job_rows)
-          where(active_job_id: active_jobs.map(&:job_id))
+          where(active_job_id: active_jobs.map(&:job_id)).order(id: :asc)
         end
 
         def attributes_from_active_job(active_job)

Review comment:
I think this may have been an unintentional bug, or at least something worth doing going forward as more concurrency configuration is added. Before this `order` was added, jobs could be dispatched in a different order from the one in which they were provided. That isn't really a problem when the job has no concurrency controls, but with concurrency controls it might have unintended consequences. So if you were to do:

    jobs = [
      job_1 = DiscardedNonOverlappingJob.new(@result, name: "A"),
      job_2 = DiscardedNonOverlappingJob.new(@result, name: "B")
    ]
    enqueued_jobs_count = SolidQueue::Job.enqueue_all(jobs)

there was no guarantee that job_1 would be dispatched before job_2. Now there should be: because these jobs are bulk inserted with auto-incrementing IDs, job_1 will have the lower id.

Reply:
Concurrency controls don't guarantee anything about the order, though 🤔 I think that when you enqueue two concurrency-controlled jobs together (in bulk), there shouldn't be any guarantee that one runs before the other.

Reply:
Intuitively, if I were using bulk enqueue and concurrency controls, I would want it to enqueue the jobs in the order they were created, no? If you don't want to make that guarantee, I can remove these changes and instead add a more prominent call-out in the bulk enqueue documentation that ordering is not guaranteed.
@@ -66,6 +66,7 @@ def prepare_for_execution
 
       def dispatch
         if acquire_concurrency_lock then ready
+        elsif discard_concurrent? then discard
         else
           block
         end

Review comment:
I don't believe `discard` does anything here, since it just deletes the execution, and I believe at this point we won't have an execution. This isn't deleting the job as I once thought it was; it is effectively doing nothing.

Reply:
That's right, we won't have an execution yet. However, we need to ensure we don't create the job, and I think this will result in the job record being created. That's why I think we need to roll back the current transaction. There should be a transaction open since this is called as an […]

Reply:
@rosa ah - thanks for pointing out the callback. Because […]
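The three-way branch in `dispatch` can be sketched in isolation as follows. This is a minimal illustration under stated assumptions: `SketchJob`, its constructor arguments, and the `:ready`, `:discarded`, and `:blocked` symbols are hypothetical, and the sketch models only the order in which the branches are tried. It does not model persistence, so it says nothing about the point raised above that the job record may still be created unless the transaction is rolled back.

```ruby
# Hypothetical stand-in; Solid Queue's real job is an Active Record model.
class SketchJob
  attr_reader :state

  def initialize(lock_available:, discard_on_conflict:)
    @lock_available = lock_available
    @discard_on_conflict = discard_on_conflict
    @state = :pending
  end

  # Mirrors the branch order in the diff: try the lock first,
  # then discard if configured to, otherwise block.
  def dispatch
    if acquire_concurrency_lock then @state = :ready
    elsif discard_concurrent? then @state = :discarded
    else
      @state = :blocked
    end
    state
  end

  private

  # Assumption: returns true when the concurrency lock was obtained.
  def acquire_concurrency_lock
    @lock_available
  end

  # Assumption: true when the job is configured to discard duplicates.
  def discard_concurrent?
    @discard_on_conflict
  end
end

ready     = SketchJob.new(lock_available: true,  discard_on_conflict: true).dispatch
discarded = SketchJob.new(lock_available: false, discard_on_conflict: true).dispatch
blocked   = SketchJob.new(lock_available: false, discard_on_conflict: false).dispatch
```

Note that the discard branch is only reached when the lock could not be acquired, so a job configured to discard duplicates still runs normally when nothing else holds the lock.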
Review comment:
I've reorganized this method so that we can efficiently set `successfully_enqueued` for both enqueued/blocked and discarded jobs.

Reply:
🤔 I think the previous implementation did that as well (only setting `successfully_enqueued = true` for jobs that had actually been enqueued). It should work without doing anything for discarded jobs, as long as discarded jobs don't create a job record. I must be missing something here: why do you think the previous implementation didn't set that correctly?

Reply:
@rosa the previous implementation would only set `successfully_enqueued = true`; it wouldn't set `successfully_enqueued = false` for a job that wasn't enqueued.
If you're okay with that quirk, then I can revert this change.
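The difference described in the last reply can be sketched in plain Ruby. `FakeActiveJob`, `FakeRow`, `mark_previous`, and `mark_proposed` are hypothetical names used only for illustration; they loosely mirror the two versions of `enqueue_all` in the diff, with `!row.nil?` standing in for Active Support's `present?`.

```ruby
# Hypothetical stand-ins for Active Job instances and persisted job rows.
FakeActiveJob = Struct.new(:job_id, :provider_job_id, :successfully_enqueued)
FakeRow = Struct.new(:id, :active_job_id)

# Previous approach: only jobs that produced a row are touched.
def mark_previous(active_jobs, enqueued_rows)
  by_job_id = active_jobs.to_h { |job| [job.job_id, job] }
  enqueued_rows.each do |row|
    by_job_id[row.active_job_id].provider_job_id = row.id
    by_job_id[row.active_job_id].successfully_enqueued = true
  end
  active_jobs.count(&:successfully_enqueued)
end

# Proposed approach: every job is visited, so the flag is always set.
def mark_proposed(active_jobs, enqueued_rows)
  rows_by_job_id = enqueued_rows.to_h { |row| [row.active_job_id, row] }
  active_jobs.each do |job|
    row = rows_by_job_id[job.job_id]
    job.provider_job_id = row&.id
    job.successfully_enqueued = !row.nil?
  end
  rows_by_job_id.size
end

rows = [FakeRow.new(10, "a")] # job "b" was not enqueued
old_jobs = [FakeActiveJob.new("a"), FakeActiveJob.new("b")]
new_jobs = [FakeActiveJob.new("a"), FakeActiveJob.new("b")]

old_count = mark_previous(old_jobs, rows)
new_count = mark_proposed(new_jobs, rows)

old_flags = old_jobs.map(&:successfully_enqueued) # => [true, nil]
new_flags = new_jobs.map(&:successfully_enqueued) # => [true, false]
```

Both versions return the same count; they differ only in whether a job that was not enqueued ends up with `nil` or an explicit `false`.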