Add capability to discard duplicate jobs with concurrency configuration #523

Open · wants to merge 6 commits into main
19 changes: 11 additions & 8 deletions app/models/solid_queue/job.rb
@@ -10,19 +10,22 @@ class EnqueueError < StandardError; end

class << self
def enqueue_all(active_jobs)
active_jobs_by_job_id = active_jobs.index_by(&:job_id)
enqueued_jobs_count = 0

transaction do
jobs = create_all_from_active_jobs(active_jobs)
prepare_all_for_execution(jobs).tap do |enqueued_jobs|
enqueued_jobs.each do |enqueued_job|
active_jobs_by_job_id[enqueued_job.active_job_id].provider_job_id = enqueued_job.id
active_jobs_by_job_id[enqueued_job.active_job_id].successfully_enqueued = true
end
enqueued_jobs_by_active_job_id = prepare_all_for_execution(jobs).index_by(&:active_job_id)

active_jobs.each do |active_job|
job = enqueued_jobs_by_active_job_id[active_job.job_id]
active_job.provider_job_id = job&.id
active_job.successfully_enqueued = job.present?
Author:

I've reorganized this method so that we can efficiently set successfully_enqueued for both enqueued/blocked and discarded jobs.

Member:

🤔 I think the previous implementation did that as well (only setting successfully_enqueued = true for jobs that had actually been enqueued). It should work without doing anything for discarded jobs, as long as discarded jobs don't create a job record. I must be missing something here: why do you think the previous implementation didn't set that correctly?

Author:

@rosa the previous implementation would only set successfully_enqueued = true; it wouldn't set successfully_enqueued = false when a job wasn't enqueued.

If you're okay with that quirk, then I can revert this change.
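
To make the quirk concrete, here's a rough before/after sketch. The names are illustrative only, and it relies on ActiveJob's successfully_enqueued? simply returning the underlying instance variable, which stays nil until the adapter assigns it:

# Illustrative sketch of the quirk, not code from this PR.
jobs = [enqueued_job, discarded_job] # two active jobs passed to SolidQueue::Job.enqueue_all

# Previous implementation: only actually-enqueued jobs were touched.
enqueued_job.successfully_enqueued?   # => true
discarded_job.successfully_enqueued?  # => nil (never assigned, merely falsy)

# With this change: every active job handed to enqueue_all gets an explicit value.
enqueued_job.successfully_enqueued?   # => true
discarded_job.successfully_enqueued?  # => false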

end

enqueued_jobs_count = enqueued_jobs_by_active_job_id.count
end

active_jobs.count(&:successfully_enqueued?)
enqueued_jobs_count
end

def enqueue(active_job, scheduled_at: Time.current)
@@ -49,7 +52,7 @@ def create_from_active_job(active_job)
def create_all_from_active_jobs(active_jobs)
job_rows = active_jobs.map { |job| attributes_from_active_job(job) }
insert_all(job_rows)
where(active_job_id: active_jobs.map(&:job_id))
where(active_job_id: active_jobs.map(&:job_id)).order(id: :asc)
Author:

I think this may have been an unintentional bug, or at least something that's good to do moving forward with more concurrency configuration.

Prior to adding this order, jobs could be dispatched out of the order they were provided in. This isn't really a problem when the job has no concurrency controls, but when using concurrency controls, it might have unintended consequences.

So if you were to do:

jobs = [
  job_1 = DiscardedNonOverlappingJob.new(@result, name: "A"),
  job_2 = DiscardedNonOverlappingJob.new(@result, name: "B")
]

enqueued_jobs_count = SolidQueue::Job.enqueue_all(jobs)

There wasn't any guarantee that job_1 would be dispatched before job_2. Now there should be: since these jobs are bulk-inserted with auto-incrementing IDs, job_1 will have the lower id.

Member:

Concurrency controls don't guarantee anything about the order, though 🤔 I think in the case where you're enqueuing two concurrency-controlled jobs together (in bulk), there shouldn't be any guarantee that one runs before the other.

Author:

Intuitively, if I were using bulk enqueue and concurrency controls, I would want it to enqueue the jobs in the order they were created, no?

If you don't want to make that guarantee, I can remove these changes and instead add a more prominent call-out in the bulk enqueue documentation that ordering is not guaranteed.
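
To make the stakes concrete, here is a sketch of what the ordering affects when two jobs share a key and use at_limit: :discard, using the DiscardedNonOverlappingJob class from the test changes below. The outcome assumes the bulk insert assigns ids in the order the rows are given:

# Sketch only: which job survives depends on the order prepare_all_for_execution sees them.
jobs = [
  job_1 = DiscardedNonOverlappingJob.new(@result, name: "A"),
  job_2 = DiscardedNonOverlappingJob.new(@result, name: "B")
]

SolidQueue::Job.enqueue_all(jobs)

# With order(id: :asc), job_1 (inserted first, lower id) acquires the concurrency
# lock and job_2 is discarded as the duplicate. Without it, either could win.
job_1.successfully_enqueued?  # => true
job_2.successfully_enqueued?  # => false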

end

def attributes_from_active_job(active_job)
6 changes: 5 additions & 1 deletion app/models/solid_queue/job/concurrency_controls.rb
@@ -8,7 +8,7 @@ module ConcurrencyControls
included do
has_one :blocked_execution

delegate :concurrency_limit, :concurrency_duration, to: :job_class
delegate :concurrency_limit, :concurrency_at_limit, :concurrency_duration, to: :job_class

before_destroy :unblock_next_blocked_job, if: -> { concurrency_limited? && ready? }
end
@@ -34,6 +34,10 @@ def blocked?
end

private
def discard_concurrent?
concurrency_at_limit == :discard
end

def acquire_concurrency_lock
return true unless concurrency_limited?

1 change: 1 addition & 0 deletions app/models/solid_queue/job/executable.rb
@@ -66,6 +66,7 @@ def prepare_for_execution

def dispatch
if acquire_concurrency_lock then ready
elsif discard_concurrent? then discard
Author:

I don't believe discard does anything here, since it just deletes the execution. But I believe at this point we won't have an execution?

This isn't deleting the job as I once thought it was. This is effectively doing 'nothing'.

Member:

> I don't believe discard does anything here, since it just deletes the execution. But I believe at this point we won't have an execution?

That's right, we won't have an execution yet. However, we need to ensure we don't create the job; I think this will result in the job record being created. That's why I think we need to roll back the current transaction. There should be a transaction open, since this is called as an after_create callback on the job.

Author:

@rosa ah, thanks for pointing out the callback. Because dispatch is also used in bulk enqueueing, should we roll back all jobs in the bulk enqueue if one fails to be enqueued due to concurrency controls?
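
For reference, one possible shape of the rollback idea discussed above. This is a sketch, not the PR's implementation, and it assumes an ActiveRecord::Rollback raised from the after_create callback is caught by the transaction that wraps the job creation:

# Hypothetical sketch, not part of this diff.
def dispatch
  if acquire_concurrency_lock then ready
  elsif discard_concurrent?
    # Roll back the surrounding transaction so the just-created job record
    # is not persisted either (discard alone would only skip the execution).
    raise ActiveRecord::Rollback
  else
    block
  end
end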

else
block
end
4 changes: 3 additions & 1 deletion lib/active_job/concurrency_controls.rb
@@ -11,15 +11,17 @@ module ConcurrencyControls
class_attribute :concurrency_group, default: DEFAULT_CONCURRENCY_GROUP, instance_accessor: false

class_attribute :concurrency_limit
class_attribute :concurrency_at_limit
class_attribute :concurrency_duration, default: SolidQueue.default_concurrency_control_period
end

class_methods do
def limits_concurrency(key:, to: 1, group: DEFAULT_CONCURRENCY_GROUP, duration: SolidQueue.default_concurrency_control_period)
def limits_concurrency(key:, to: 1, group: DEFAULT_CONCURRENCY_GROUP, duration: SolidQueue.default_concurrency_control_period, at_limit: :block)
self.concurrency_key = key
self.concurrency_limit = to
self.concurrency_group = group
self.concurrency_duration = duration
self.concurrency_at_limit = at_limit
end
end

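For context, this is how a job class would opt into the new behaviour with the at_limit keyword added above. The job class and key are hypothetical; at_limit defaults to :block, preserving the current behaviour:

# Hypothetical job class using the new option.
class DeliverWebhookJob < ApplicationJob
  # Duplicate jobs for the same account are discarded instead of blocked.
  limits_concurrency key: ->(account) { account.id }, to: 1, duration: 5.minutes, at_limit: :discard

  def perform(account)
    # ...
  end
end
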
63 changes: 61 additions & 2 deletions test/models/solid_queue/job_test.rb
@@ -10,6 +10,14 @@ def perform(job_result)
end
end

class DiscardedNonOverlappingJob < NonOverlappingJob
limits_concurrency key: ->(job_result, **) { job_result }, at_limit: :discard
end

class DiscardedOverlappingJob < NonOverlappingJob
limits_concurrency to: 2, key: ->(job_result, **) { job_result }, at_limit: :discard
end

class NonOverlappingGroupedJob1 < NonOverlappingJob
limits_concurrency key: ->(job_result, **) { job_result }, group: "MyGroup"
end
@@ -98,6 +106,53 @@ class NonOverlappingGroupedJob2 < NonOverlappingJob
assert_equal active_job.concurrency_key, job.concurrency_key
end

test "enqueue jobs with discarding concurrency controls" do
assert_ready do
active_job = DiscardedNonOverlappingJob.perform_later(@result, name: "A")
assert_equal 1, active_job.concurrency_limit
assert_equal "SolidQueue::JobTest::DiscardedNonOverlappingJob/JobResult/#{@result.id}", active_job.concurrency_key
end

assert_discarded do
active_job = DiscardedNonOverlappingJob.perform_later(@result, name: "A")
assert_equal 1, active_job.concurrency_limit
assert_equal "SolidQueue::JobTest::DiscardedNonOverlappingJob/JobResult/#{@result.id}", active_job.concurrency_key
end
end

test "enqueuing multiple jobs with enqueue_all and concurrency controls" do
jobs = [
job_1 = DiscardedNonOverlappingJob.new(@result, name: "A"),
job_2 = DiscardedNonOverlappingJob.new(@result, name: "B")
]

enqueued_jobs_count = SolidQueue::Job.enqueue_all(jobs)
assert_equal 1, enqueued_jobs_count

assert job_1.successfully_enqueued?
assert_not job_2.successfully_enqueued?
end

test "enqueue jobs with discarding concurrency controls when below limit" do
assert_ready do
active_job = DiscardedOverlappingJob.perform_later(@result, name: "A")
assert_equal 2, active_job.concurrency_limit
assert_equal "SolidQueue::JobTest::DiscardedOverlappingJob/JobResult/#{@result.id}", active_job.concurrency_key
end

assert_ready do
active_job = DiscardedOverlappingJob.perform_later(@result, name: "A")
assert_equal 2, active_job.concurrency_limit
assert_equal "SolidQueue::JobTest::DiscardedOverlappingJob/JobResult/#{@result.id}", active_job.concurrency_key
end

assert_discarded do
active_job = DiscardedOverlappingJob.perform_later(@result, name: "A")
assert_equal 2, active_job.concurrency_limit
assert_equal "SolidQueue::JobTest::DiscardedOverlappingJob/JobResult/#{@result.id}", active_job.concurrency_key
end
end

test "enqueue jobs with concurrency controls in the same concurrency group" do
assert_ready do
active_job = NonOverlappingGroupedJob1.perform_later(@result, name: "A")
@@ -289,8 +344,12 @@ def assert_blocked(&block)
assert SolidQueue::Job.last.blocked?
end

def assert_job_counts(ready: 0, scheduled: 0, blocked: 0, &block)
assert_difference -> { SolidQueue::Job.count }, +(ready + scheduled + blocked) do
def assert_discarded(&block)
assert_job_counts(discarded: 1, &block)
end

def assert_job_counts(ready: 0, scheduled: 0, blocked: 0, discarded: 0, &block)
assert_difference -> { SolidQueue::Job.count }, +(ready + scheduled + blocked + discarded) do
assert_difference -> { SolidQueue::ReadyExecution.count }, +ready do
assert_difference -> { SolidQueue::ScheduledExecution.count }, +scheduled do
assert_difference -> { SolidQueue::BlockedExecution.count }, +blocked, &block