This is a bag-of-thoughts issue; I encourage everyone to file more specific, detailed issues and send pull requests.
do not use keyword arguments in plain Sidekiq workers: Sidekiq does not support keyword arguments (sidekiq/sidekiq#2372); check whether they work with ActiveJob
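For illustration only (the worker name is made up): Sidekiq serializes arguments to JSON, so a keyword argument written like this comes back as a plain string-keyed Hash and does not bind to the keyword.
class SyncAccountWorker
  include Sidekiq::Worker

  # Avoid: after the JSON round-trip Sidekiq effectively calls
  # perform(1, { "force" => true }), which does not match the keyword argument.
  def perform(account_id, force: false)
    # ...
  end
end
# Prefer positional arguments of simple JSON-native types instead:
#   def perform(account_id, force)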
do not schedule jobs inside a transaction if a record created or updated inside that transaction will be directly or indirectly accessed by the job: the job risks not finding the record, or reading stale data, if it runs before the transaction is committed. Use Sidekiq::Postpone.wrap around each transaction do ... end block.
https://github.com/palkan/isolator can be used to detect jobs scheduled from inside a transaction
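A minimal sketch of the Sidekiq::Postpone.wrap pattern (assuming the sidekiq-postpone gem; the model and job names are placeholders):
Sidekiq::Postpone.wrap do
  ApplicationRecord.transaction do
    user = User.create!(email: "user@example.com")
    # Buffered by Sidekiq::Postpone and pushed to Redis only after the wrap
    # block exits, i.e. after the transaction has committed, so the job can
    # never run against an uncommitted or rolled-back record.
    WelcomeEmailJob.perform_async(user.id)
  end
end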
avoid setting the queue through sidekiq_options
explain the difference between bucket and window
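A possible pair of examples for that explanation (Sidekiq Enterprise limiter API from memory; names, numbers, and argument order should be double-checked against the wiki):
# Bucket: at most 30 calls per discrete one-second bucket; a fresh bucket starts
# every second regardless of when the calls happened.
STRIPE_LIMITER = Sidekiq::Limiter.bucket("stripe", 30, :second, wait_timeout: 5)

# Window: at most 50 calls within any rolling 60-second window.
ERP_LIMITER = Sidekiq::Limiter.window("erp", 50, :minute, wait_timeout: 5)

ERP_LIMITER.within_limit do
  # the rate-limited call goes here
end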
extend the default limiter backoff guideline: the default backoff is in minutes, which is not effective if you want jobs to run ASAP (within seconds), because rescheduling puts the job behind all other scheduled jobs. Do not set it too low either (e.g. 1s), or the rescheduling overhead starts kicking in.
It'd be nice to have more concrete argumentation for when you need the commercial features
limiter policy: use policy: :skip to avoid raising when over the limit
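A sketch, assuming the policy: option can be passed when constructing a limiter, as the item implies (names and numbers are illustrative):
NOTIFY_LIMITER = Sidekiq::Limiter.window("notify", 100, :minute, policy: :skip)

NOTIFY_LIMITER.within_limit do
  # best-effort work: with policy: :skip the block is silently skipped when the
  # limit is hit, instead of raising Sidekiq::Limiter::OverLimit and rescheduling
end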
write about the lack of concurrency on CPU-intensive jobs: https://www.mikeperham.com/2019/07/19/something-for-nothing/
think of a better way of dealing with limiter backoff, keeping all of them in a single file is weird (#default-limiter-backoff guideline)
mention that in Sidekiq-ent 2.0 it can be per-limiter
race conditions in business logic in jobs (stale data, a job that should no longer be performed): the same problem exists both with and without transactions when no locking is used
avoid using unique jobs? it's mentioned in sidekiq/sidekiq@40cc629#diff-0469ba29524ee01a49be37ff05e1f499R12
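For reference, this is the feature being questioned (a sketch assuming Sidekiq Enterprise's unique jobs option; the class name is made up):
class RefreshBalanceJob
  include Sidekiq::Worker
  # Enterprise unique jobs: at most one job with these exact arguments may be
  # queued or running within the 10-minute window; duplicates are dropped.
  sidekiq_options unique_for: 10.minutes
end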
precaution - don't use queue settings in sidekiq_options
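To make the precaution concrete, this is the pattern it refers to (the worker name is illustrative):
class ImportWorker
  include Sidekiq::Worker
  # The precaution above: the queue name is hardcoded inside the worker
  # via sidekiq_options.
  sidekiq_options queue: :critical
end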
serialization: pass only simple objects, or whatever ActiveJob supports; cover custom serializers
what happens if the record is removed and can't be deserialized? use discard_on ActiveRecord::RecordNotFound / discard_on ActiveJob::DeserializationError in ApplicationJob?
? how does discard_on work with batches
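A sketch of the option being weighed here (standard Active Job API; how it interacts with Pro batches is the open question above):
class ApplicationJob < ActiveJob::Base
  # The GlobalID-serialized record no longer exists: drop the job rather than
  # retrying it forever.
  discard_on ActiveJob::DeserializationError
  # For jobs that look records up by id themselves:
  discard_on ActiveRecord::RecordNotFound
end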
adopt the style suggested by Dan Allen - have separate good and bad code examples with explanations in their captions.
https://asciidoctor.org/docs/asciidoc-syntax-quick-reference/#source-code
? ALLOW_CONCURRENCY=true vs RAILS_ENV=production in local testing: nested concurrent limiters misbehave, and sometimes neither works. Prefer EAGER_LOAD=true?
emphasize the distinction between jobs that don't need automatic retries and ephemeral jobs
The retry property can be set on a worker or specific job to disable retries completely (job goes straight to Dead) or disable death (the failed job is simply discarded). If your Failed count is increasing but you don't see anything in the Retry or Dead tabs, it's likely you've disabled one or both of those:
class SomeWorker
  include Sidekiq::Worker
  # will be completely ephemeral, not in Retry or Dead
  sidekiq_options retry: false
  # or: will go immediately to the Dead tab upon the first failure
  # sidekiq_options retry: 0
end
mention limiters for a long-running one-off in addition to batches
improve limiter backoff guideline and provide an example of how:
a limiter can be shared across jobs
a limiter can be locally specified (injected?)
a limiter is something-specific, e.g. keyed by user_id (see the sketch after this list)
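For the user_id case, a per-user concurrent limiter might look like this (Sidekiq Enterprise; names and numbers are illustrative):
class SyncUserJob
  include Sidekiq::Worker

  def perform(user_id)
    # One sync per user at a time; other users are unaffected because each
    # user_id gets its own limiter key.
    limiter = Sidekiq::Limiter.concurrent("sync-user-#{user_id}", 1, wait_timeout: 0)
    limiter.within_limit do
      # per-user work goes here
    end
  end
end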
mention leaky bucket rate limiter
? set lock_timeout to longer than the average job execution time to prevent premature lock release. The lock_timeout option ensures a crashed Ruby process does not hold a lock forever; you must ensure that your operations take less than this number of seconds.
The same Limiter (based on the name) may be used with different lock_timeout values, which allows different blocks of code to lock on the same resource with different lock_timeouts.
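A sketch of the trade-off (Enterprise concurrent limiter; the 90-second figure is illustrative):
# The lock is force-expired after lock_timeout seconds, so a crashed process
# cannot hold it forever -- but every block run under this limiter must finish
# well within that time, or the lock is released while the work is still running.
ERP_LIMITER = Sidekiq::Limiter.concurrent("erp", 5, lock_timeout: 90)

ERP_LIMITER.within_limit do
  # must complete in well under 90 seconds
end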
track your errors in error tracking software
infra: use persistent, reliable, dedicated redis (sentinels, ...)
https://github.com/mperham/sidekiq/wiki/Using-Redis
don't use Process.clock_gettime(Process::CLOCK_MONOTONIC) to share data across processes/machines
Monotonic times are not comparable across processes (sidekiq/sidekiq#4105)
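In short: monotonic clock readings are only meaningful inside the process that took them, so anything stored in Redis or compared across machines should use wall-clock time (do_work and REDIS below are placeholders):
# Fine: a duration measured within one process.
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
do_work
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

# For timestamps shared via Redis or across machines, use wall-clock time instead.
REDIS.set("report:last_run_at", Time.now.to_f)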
timeouts (this includes Net::HTTP and libraries built on it; provide a reference to its source)
Don't ever use Ruby's Timeout module. You will get mysterious stuck or hung processes randomly.
http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api
https://github.com/mperham/sidekiq/wiki/Problems-and-Troubleshooting#frozen-processes
Net::HTTP-based libraries: httparty, restclient, open-uri (see https://github.com/httprb/http#another-ruby-http-library-why-should-i-care)
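A minimal example of the alternative: let the HTTP client enforce its own socket timeouts instead of wrapping the call in Timeout.timeout (host and values are illustrative):
require "net/http"

http = Net::HTTP.new("api.example.com", 443)
http.use_ssl = true
http.open_timeout = 2 # seconds allowed to establish the TCP/TLS connection
http.read_timeout = 5 # seconds allowed for each read from the socket
response = http.get("/status")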
exception to the pass-by-id guideline for external ids - harmonize with the cop (salesforce_external_id)
uncached:
# In Rails 5, ActiveJob execution is wrapped with the code reloader, which
# enables the query cache as one of its callbacks. Some background jobs
# (e.g. {MidasReportJob}) traverse many records and do not create any; for
# those, the query cache fills up with data that is never used again, and it
# gets huge. In Rails 4.2, jobs ran without the query cache and we did not
# notice any slowdowns or database overloads. Disabling the cache prevents it
# from being built and makes the job server use much less memory.
class ApplicationJob < ActiveJob::Base
  extend Memoist

  around_perform do |_job, block|
    ActiveRecord::Base.uncached do
      block.call
    end
  end
end
recommend MALLOC_ARENA_MAX=2
Draper: Active Job Integration. Active Job allows you to pass ActiveRecord objects to background tasks directly and performs the necessary serialization and deserialization.
In order to do this, arguments to a background job must implement Global ID.
Decorated objects implement Global ID by delegating to the object they are decorating.
This means you can pass decorated objects to background jobs; however, the object won't be decorated when it is deserialized.
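So if the job needs decorator methods, it has to decorate again after deserialization; a sketch (the job and mailer names are made up):
class SendReceiptJob < ApplicationJob
  def perform(order)
    # `order` arrives as the plain ActiveRecord object (the GlobalID round-trip
    # strips the Draper decorator), so re-decorate before using decorator methods.
    decorated = order.decorate
    ReceiptMailer.receipt(decorated).deliver_now
  end
end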
extend the renaming-jobs guideline to cover changes to perform signatures: there can be scheduled jobs already serialized with the old signature (one common mitigation is sketched below)
avoid changing mailers' signatures, just like regular jobs' signatures
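One common mitigation, as a sketch (the job name is illustrative): give the new parameter a default so jobs already serialized with the old arity still run.
class ExportJob < ApplicationJob
  # Old jobs in Redis were enqueued as ExportJob.perform_later(account_id);
  # the default lets them run while new code passes the extra argument.
  def perform(account_id, format = "csv")
    # ...
  end
end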
avoid using Redis namespaces if you keep data for several different Sidekiq instances in one Redis instance (check whether advanced Sidekiq features that use Lua interfere with, or even respect, namespaces when used with e.g. redis-namespace)
when using Reliable Push, keep the Redis timeouts (the connection pool's network_timeout, the connection pool's number of connections, redis's timeout, connect_timeout, read_timeout, write_timeout, reconnect_attempts, reconnect_delay, reconnect_delay_max) to a minimum; doing otherwise results in 30s+ pushes when Redis is down or degraded (see the sketch below)
don't rely on queue weights for job prioritization
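For the Reliable Push item above, a hedged sketch of what "keep the timeouts to a minimum" can look like (option names taken from the item; exact support depends on your sidekiq/redis-rb versions, and the values are illustrative):
Sidekiq.configure_client do |config|
  # Keep every client-side timeout small so a degraded Redis makes pushes fail
  # fast (and fall back to Reliable Push's in-memory buffer) instead of
  # blocking web requests for tens of seconds.
  config.redis = {
    url: ENV["REDIS_URL"],
    network_timeout: 1,
    connect_timeout: 1,
    read_timeout: 1,
    write_timeout: 1,
    reconnect_attempts: 1
  }
end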
make sure to add mailers to Sidekiq's configuration when using it with Active Job's deliver_later
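For example, in config/sidekiq.yml: deliver_later has historically enqueued mail on the "mailers" queue (configurable via deliver_later_queue_name), so some Sidekiq process must actually listen to it.
:queues:
  - default
  - mailers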
use custom DeliveryJob to separate mailers between queues with different priorities
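One way to do that (Rails 6+; the class and queue names are illustrative):
class UrgentMailDeliveryJob < ActionMailer::MailDeliveryJob
  queue_as :mailers_urgent
end

# Globally, in config/application.rb:
#   config.action_mailer.delivery_job = "UrgentMailDeliveryJob"
# or per mailer class:
#   self.delivery_job = UrgentMailDeliveryJob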
config: how to set up different pool sizes for Rails and Sidekiq processes (per-process RAILS_MAX_THREADS, and pool in config/database.yml for Rails and Sidekiq, ...; see the sketch below)
Misses a mention of https://github.com/sidekiq/sidekiq/wiki/Advanced-Options#transactional-push (if you enqueue in transactions, you likely want to use this)
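For the pool-size item above, a common setup is to size the Active Record pool from a per-process env var so web and Sidekiq processes can differ (values are illustrative):
# config/database.yml
production:
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

# Then start each process with its own thread count, e.g.
#   RAILS_MAX_THREADS=5  for Puma
#   RAILS_MAX_THREADS=25 for a Sidekiq process running 25 threads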
Add a guideline to not pass large arguments to jobs, as they consume storage memory. This is especially risky for Redis, which may simply stop accepting jobs, causing issues. This is partially solved by Sidekiq's reliable push, but only while the process is alive: chances are a web server will be rebooted before it has a chance to submit the jobs temporarily stored in memory. With ephemeral file systems, and jobs scheduled outside of db transactions (i.e. after commit), this is a way to lose jobs.
While other storages like PostgreSQL don't apply the same hard constraints on job storage and won't fail outright when a memory limit is hit, they come with other unwanted side effects, such as Postgres TOAST. Ask the que/good_job/solid_queue people about their experience with storing many jobs. Off the top of my head, good_job had a note about processing millions of jobs, not more, which is not much compared to what I've seen with Sidekiq and Redis: thousands of jobs a second, over 50M daily, each with more than 1 KB of payload, and occasional jams of millions of jobs to process.