THREESCALE-2989 - simplify background deletion #3958
base: master
Conversation
ar_object.destroyed_by_association = DummyDestroyedByAssociationReflection.new(entry) if arguments.first != entry
ar_object.background_deletion_method_call
There is one difference from the previous logic. Previously #destroy (without the bang) was used when the object was destroyed by association. But I don't see why this should be done, and so far I haven't observed any negative effects. I think it is correct that the job should fail/retry if the #destroy! method fails.
Oh, OK, this is the response to my previous comment 😅
By the way, do you think the background_deletion_method_call method is necessary? I think it causes more confusion, and I would rather just do ar_object.send background_deletion_method here.
It would be something like ar_object.public_send(ar_object.background_deletion_method), which is IMO ugly. Or am I missing something?
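For reference, a minimal sketch of the two variants being weighed in this thread; the comments are mine and only illustrate the trade-off, not code from the PR:

  ar_object.background_deletion_method_call                    # current helper defined on the model
  ar_object.public_send(ar_object.background_deletion_method)  # inline dispatch suggested above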
payment_gateway_setting
buyer_accounts
Nice that these two are highlighted. Reading the original logic, the whole reason for the separate DeleteAccountHierarchyWorker and DeletePaymentSettingHierarchyWorker appears to be enforcing an order of executing the deletions. Since we delete serially here, the order is enforced by the order of the entries in background_deletion, so I removed all that and just arranged the entries like this.
I must say that I cannot definitively claim to have understood all implications of the original code; I only vaguely understood it. I found it more important to understand the intentions of the original authors, because the code was misbehaving anyway and has been heavily refactored.
Further testing may still reveal details that I missed, so don't blindly trust me!
I blindly trust you
I am a bit concerned about the fact that many of these associations are applicable only to providers. I am not sure how it used to work in the previous implementation, and whether buyer accounts were also deleted through background deletion (when being part of the provider's hierarchy), but when trying to schedule a buyer deletion on this branch, I eventually received:
[ActiveJob] [DeleteObjectHierarchyWorker] [e95bf17b-f675-4e05-90b0-2247188a1221] Error performing DeleteObjectHierarchyWorker (Job ID: e95bf17b-f675-4e05-90b0-2247188a1221) from Sidekiq(deletion) in 5088.16ms: DeleteObjectHierarchyWorker::DoNotRetryError (seems like unexpectedly broken delete hierarchy entry: Association-Account-6:payment_gateway_setting):
Maybe I was doing something wrong... will need to check further.
They must have been deleted by background deletion in the past. I see no reason to believe otherwise.
But your question has a point. Are you able to write a test where this is reproduced? Or add a payment gateway settings object to the existing test and see if it reproduces.
I do see special handling of payment gateway settings in the old code. I thought only the order needed to be enforced, but maybe I didn't fully understand the implications.
Well, speaking of tests, they need a serious rework, I'm afraid... 🥲
I will try to reproduce. But in general it looks like most of these discrepancies are fixed by the suggestion here: https://github.com/3scale/porta/pull/3958/files#r1946512669
So, for most of the associations in this list, if called for a buyer, they will be either empty or nil.
However, I can't say I like it, because we already know that buyers will not have any of the associations in this list, probably apart from payment_detail, users and contracts, so I think it's a bit of a waste of resources to query all the rest of the associations for them (especially as some providers have maaaaany buyers, so it can take quite a few queries to get all those non-existing associations).
Maybe we can make background_deletion an instance method instead? It could take the value from a class constant for all of the classes except Account, and for Account we'd check whether it's a provider? or not and filter the list based on that.
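A rough sketch of that idea; the constant names and the association lists here are purely illustrative assumptions, not the real ones from the PR:

  class Account < ApplicationRecord
    # hypothetical lists: the real associations live in the PR's background_deletion setup
    PROVIDER_BACKGROUND_DELETION = %i[payment_gateway_setting buyer_accounts users contracts payment_detail].freeze
    BUYER_BACKGROUND_DELETION    = %i[users contracts payment_detail].freeze

    # instance-level override: providers walk the full list,
    # buyers only the associations they can actually have
    def background_deletion
      provider? ? PROVIDER_BACKGROUND_DELETION : BUYER_BACKGROUND_DELETION
    end
  end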
I will try to reproduce. But in general it looks like most of these discrepancies are fixed by the suggestion here:
So with the change you see a difference in behavior? The original code should handle generated objects. Perhaps we just need to add a nil check.
I added 3612c2f
Nice catch!
There's a basic question I have here: what is the background deletion about?
Is it to replace dependent: :destroy with something that does the same but in background jobs? Is that it?
@@ -113,6 +113,8 @@ class StaticPage < CMS::Builtin
# CMS::Builtin::Page
class Page < CMS::Builtin

validates :user_id, :forum_id, presence: true
How is this related to this PR?
It either somehow miraculously appeared out of nowhere, or it's some GitHub confusion related to other recent commits to master. Definitely not part of this PR.
Well, if this doesn't belong here, let's remove it to avoid confusion 🙃
I have no idea how it appeared here. I don't see this locally 🤔
Will check again after the final rebase.
# @param hierarchy [Array<String>] something like ["Plain-Service-1234", "Association-Service-1234:plans" ...]
# @note processes deleting a hierarchy of objects and reschedules itself at current progress after a time limit
def perform(*hierarchy)
  return compatibility(hierarchy) unless hierarchy.first.is_a?(String)
What is this method for? Apparently the hierarchy parameter, called _object inside the method, is just being ignored.
Correct. I don't care about the object. The idea is to take the root object of the original operation and schedule it anew with the new worker logic. The job locking should take care of discarding the pile of duplicates that could be present with the old logic.
This made me think as well...
I think the issue is in the hierarchy documentation comment. If hierarchy is actually something like "gid://system/User/17", ["Hierarchy-Account-12"], "destroy", then it makes sense that the GID refers to the _object variable and that the second argument is an array.
Thanks @mayorova. So this is how hierarchy was before this PR. But in that case, is "gid://system/User/17" not a String? Because the condition to call compatibility is unless hierarchy.first.is_a?(String).
Here the variables have been set previously:

  @object, workers_hierarchy, @background_destroy_method = job.arguments

#object is just a reader of the instance variable (attr_reader :object, :caller_worker_hierarchy).

These are examples of how it was called previously:
- https://github.com/3scale/porta/blob/master/app/models/backend_api.rb#L109
- porta/app/workers/delete_object_hierarchy_worker.rb, lines 36 to 41 in 1cb7785:

  def on_finish(method_name, options)
    workers_hierarchy = options['caller_worker_hierarchy']
    info "Starting DeleteObjectHierarchyWorker##{method_name} with the hierarchy of workers: #{workers_hierarchy}"
    object = GlobalID::Locator.locate(options['object_global_id'])
    background_destroy_method = @background_destroy_method.presence || 'destroy'
    DeletePlainObjectWorker.perform_later(object, workers_hierarchy, background_destroy_method)
  end
So the first argument is an AR object, and what you see is the serialization that ActiveJob performs on such arguments.
P.S. If the object is somehow removed before the job runs, then we get the ActiveJob::DeserializationError that we talk about in another thread above. So once we stop calling the job with an object, that exception no longer needs to be handled.
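For context, this is roughly what ActiveJob does with an ActiveRecord argument (generic GlobalID behaviour shown for illustration, not code from this PR; the record and id are made up):

  gid = account.to_global_id.to_s   # => "gid://system/Account/42"
  GlobalID::Locator.locate(gid)     # => the Account record, re-fetched when the job performs

  # if the record was destroyed before the job runs, locating it raises
  # ActiveRecord::RecordNotFound, surfaced by ActiveJob as ActiveJob::DeserializationError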
OK, thanks for clarifying. One more question: in the case where we have a job that was enqueued with only an object as a param, like from https://github.com/3scale/porta/blob/master/app/models/backend_api.rb#L109, then compatibility will do nothing and return, as I understand it:

porta/app/workers/delete_object_hierarchy_worker.rb, lines 135 to 138 in 3612c2f:

  def compatibility(_object, caller_worker_hierarchy = [], _background_destroy_method = 'destroy')
    # maybe requeue first object from hierarchy would be adequate and uniqueness should deduplicate jobs
    hierarchy_root = Array(caller_worker_hierarchy).first
    return unless hierarchy_root

Then we are losing that job, or am I wrong?
Oh, yes, I had the same question actually.
If the job has just been scheduled (with the old code) but not processed yet (so the hierarchy has not been populated), I think we still need to process it, based on the first argument (the object).
Good catch, although the chance of this happening should be super low.
Should be fixed with 4395034
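For readers of this thread, a hedged sketch of what such a fallback could look like; purely illustrative, and not necessarily what commit 4395034 actually does:

  def compatibility(object, caller_worker_hierarchy = [], _background_destroy_method = 'destroy')
    hierarchy_root = Array(caller_worker_hierarchy).first
    if hierarchy_root
      # old-style job that already carries a hierarchy: requeue its root object
      _, object_class, id = hierarchy_root.split('-')
      self.class.delete_later(object_class.constantize.find(id.to_i))
    else
      # job enqueued with only the object (old callers): don't silently drop it
      self.class.delete_later(object)
    end
  end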
My understanding is that we want to split a large delete operation into smaller ones that can run in the background without DoS-ing the server. Also, if we try to delete in one go, on a busy server it is likely to hit a transaction deadlock or something similar that will make the delete always fail.
I wouldn't say it is to replace dependent: :destroy. I see it more like a way to define a reasonable hierarchy that works in practice for tenants with a lot of related objects. In general we still want to delete all dependent objects. And since the hierarchy definition is not complete, some objects will […] Walking the hierarchy from bottom to top, we make sure that there are not too many objects for […] I hope this makes sense.
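To make that concrete, a minimal, hypothetical example of how a model opts into background deletion; the association list is illustrative, not the real one:

  class Service < ApplicationRecord
    include BackgroundDeletion

    # instead of cascading everything synchronously via `dependent: :destroy`,
    # the worker walks these associations in the background, bottom to top
    self.background_deletion = %i[plans metrics backend_api_configs]
  end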
def batch_description
  "Deleting #{object.class.name} [##{object.id}]"
end

if ar_object.is_a?(Account) && !ar_object.should_be_deleted?
Maybe?

- if ar_object.is_a?(Account) && !ar_object.should_be_deleted?
+ if ar_object.is_a?(Account) && ar_object.should_not_be_deleted?
Also, I tried to find the definition of Account#scheduled_for_deletion? and couldn't. Do you know where it is?
I think should_be_deleted? should only be checked for providers, because buyer accounts are not marked for deletion.
But buyer accounts inherit the "should be deleted" status from their provider:

  def will_be_deleted?
    scheduled_for_deletion? || (buyer? && provider_account.try(:should_be_deleted?))
  end

So no issue with that check, except that background-deleting a buyer without deleting its provider will never be possible.
Should it be possible?
P.S. I believe in the past it wasn't. While a little inconsistent, I think it might be a reasonable safety measure.
Some comments:
- When deleting an account, when is the first DeleteObjectHierarchyWorker scheduled? I couldn't find the place.
- I miss some unit tests for hierarchy_entries_for, which is crucial to be sure it works fine.

There is no such place that I'm aware of. The idea is to have a cron job or something to clear old accounts regularly. Without that, accounts scheduled for deletion stay in this state indefinitely.
Many tests need to be changed/added; this is a draft to get comments before investing too much into the details.
app/models/alert.rb (Outdated)
# self.background_deletion_scope_name = :non_deleted
#
# scope :non_deleted, -> { where.not(state: :deleted) }

alias delete_truly delete # keep reference to the original AR #delete method to use for background deletion
Rather than aliasing delete as delete_truly, I would really rather just change the method that changes the alert state:

  event :mark_as_deleted do
    transition [:unread, :read] => :deleted
  end

And then, in both controllers that currently call alert.delete!, change the invocation to alert.mark_as_deleted!. I think that would be much clearer.
Indeed, I think that the alerts not being deleted when the corresponding cinstance (or the account owning it) is deleted is just an oversight.
Actually, the fact that the alerts are just marked as deleted and not actually deleted on UI actions is also questionable, I think. But I guess this is because there are two "consumers" of the alerts - the provider and the buyer.
Agreed, but I'd like it to be a separate PR.
How about first doing this refactoring in a separate PR, confirming that nothing is broken (as per the tests), and then rebase this PR on top of it?
I think it's more confusing to add these aliases, and then to remove them...
Not really enthusiastic. I'll check whether it is easy and give up if not, OK?
Seemed fairly easy to me, like 3 small changes.
Also, I think it would be nice to rebase anyway, because the current branch seems to be based on Rails 6.1
See #3959
I'm a little concerned whether that method was ever used in drops. See
https://github.com/3scale/porta/blob/1cb77853cb82d3b66d3012137d2b621bb3e24643/doc/liquid/drops.md#alert-drop
@@ -5,32 +5,17 @@ module BackgroundDeletion

included do
  class_attribute :background_deletion, default: [], instance_writer: false
  class_attribute :background_deletion_method, default: :destroy!, instance_writer: false
Looks like the original default method was destroy, while now it's destroy!. I guess that was a deliberate choice?
I'm wondering what the potential consequences are, though. I mean, it's probably better to just abort the deletion if there is an issue, rather than let some intermediate deletes silently fail... But just wanted to ask anyway :)
See my original comment. Originally the default was destroy for "by association" and destroy! for the main object.
See porta/app/workers/delete_plain_object_worker.rb, lines 49 to 53 in 1cb7785:

  def destroy_method(bang_if_possible: false)
    object_destroy_method = @destroy_method.presence || 'destroy'
    (bang_if_possible && object_destroy_method == 'destroy') ? 'destroy!' : object_destroy_method
  end

  should_destroy_by_association? ? destroy_by_association : object.public_send(destroy_method(bang_if_possible: true))

I see no reason to avoid the bang. I hope testing in stg will reveal any issues.
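For the record, the practical difference between the two is plain ActiveRecord behaviour (generic example, not code from the PR):

  record.destroy    # returns false if a before_destroy callback aborts, so the job "succeeds" silently
  record.destroy!   # raises ActiveRecord::RecordNotDestroyed instead, so the job fails and is retried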
@@ -5,32 +5,17 @@ module BackgroundDeletion

included do
  class_attribute :background_deletion, default: [], instance_writer: false
  class_attribute :background_deletion_method, default: :destroy!, instance_writer: false
  class_attribute :background_deletion_scope_name, default: :all, instance_writer: false
Do we really need this? It seems the only place it is currently used is the alerts, and even there it's commented out...
If you say so, I will remove it. In the PoC I wanted to show the two approaches. But it seems we are on the same page that it was originally an oversight, so we should just remove the objects rather than change their state.
return unless (tenant_id = object.tenant_id)
caller_worker_hierarchy.include?("Hierarchy-Account-#{tenant_id}")

def hierarchy_entries_for(...)
  self.class.send(:hierarchy_entries_for, ...)
Could it perhaps be substituted by delegate :hierarchy_entries_for, to: :class, private: true? Seems cleaner.
private: true makes the delegated method private; it doesn't allow you to call a private method on the target. This was also my original thought, though.
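A small sketch of that distinction; generic Rails behaviour, with an illustrative class and entry format:

  class Example
    class << self
      private

      def hierarchy_entries_for(object)
        ["Plain-#{object.class.name}-#{object.id}"]
      end
    end

    # `delegate :hierarchy_entries_for, to: :class, private: true` would only make the
    # generated *instance* method private; the delegated call to the class is still a
    # public dispatch, so it raises NoMethodError because the class method is private.
    # Hence the explicit send:
    def hierarchy_entries_for(...)
      self.class.send(:hierarchy_entries_for, ...)
    end
  end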
rescue ActiveRecord::UnknownPrimaryKey => exception
  Rails.logger.info "DeleteObjectHierarchyWorker#perform raised #{exception.class} with message #{exception.message}"
end

self.class.delete_later object_class.constantize.find(id.to_i)
Should we perhaps raise DoNotRetryError if the object is not found? 🤔
Yeah, good catch. Actually, it's better not to queue it at all if the object is not found.
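Something along these lines, presumably (a hedged sketch, not the actual change in the PR):

  # skip enqueueing when the record is already gone instead of letting `find` raise
  object = object_class.constantize.find_by(id: id.to_i)
  self.class.delete_later(object) if object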
def destroyable_association?(_association)
  true

# we can just use job.arguments.first but for compatibility mode we want it not to lock
def lock_key
So, I don't quite get it - is this locking used?... Is this method used by activejob-uniqueness? If so, let's leave a comment, because I think it's a bit confusing :/
This is how activejob-uniqueness works by default. If you think a comment is needed, suggest one and I'll add it.
ReflectionDestroyer.new(object, reflection, caller_worker_hierarchy).destroy_later

started = now
while hierarchy.present? && now - started < WORK_TIME_LIMIT_SECONDS
So, to make sure I understand correctly: we process the hierarchy array for N seconds (5 at the moment), and when the time is up we reschedule whatever is still left?
Correct
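Roughly, the pattern is the following (a condensed sketch; process_entry is a stand-in for the real per-entry logic, not a method from the PR):

  WORK_TIME_LIMIT_SECONDS = 5

  def perform(*hierarchy)
    started = now
    while hierarchy.present? && now - started < WORK_TIME_LIMIT_SECONDS
      hierarchy = process_entry(hierarchy) # consumes one entry, may push refined entries back
    end
    # whatever is left after the time budget is re-enqueued instead of blocking the queue
    self.class.perform_later(*hierarchy) if hierarchy.present?
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end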
case reflection.macro
when :has_many
  # here we keep original hierarchy entry if we still find an associated object
  dependent = ar_object.public_send(association).public_send(:background_deletion_scope).take
So is this .take deliberate? Is this to add just one object to the tree on each iteration?
yes, one at a time
dependent ? [hierarchy_association_string, *hierarchy_entries_for(dependent)] : []
when :has_one
  # maximum of one associated so we never keep the original hierarchy entry
  hierarchy_entries_for ar_object.public_send(association)
- hierarchy_entries_for ar_object.public_send(association)
+ dependent = ar_object.public_send(association)
+ dependent ? hierarchy_entries_for(dependent) : []

There is an issue with the :profile association on Account: it is initialized lazily, so for some providers it will not exist.
Not sure how the comment is related to precisely this point of the PR. But there is an additional check somewhere, so if the object has just been initialized, I expect it would be skipped.
I am a bit concerned about the fact that we can only (apparently) schedule the deletion via […], and I think with the current implementation it won't work.

Users are not yet fixed. This is the next step, once you have generally approved the design.

So there's no such cron job currently? Does that mean background deletion is currently dead code?

Yes, there is: porta/app/lib/three_scale/jobs.rb, line 114 in 89fd879. It's just disabled for SaaS.
What this PR does / why we need it:
Presently, background deletion relies on Sidekiq batches, which don't work well with ActiveJob. Specifically, failing a job triggers the callbacks for a successful job, if I remember correctly.
Moreover, there was a bug where calling #delete invoked a state machine method instead of the ActiveRecord method.
These two have the potential to cause indefinite rescheduling and multiplication of delete jobs, eventually exhausting server performance and Sidekiq's Redis memory.
With the new implementation, we use only a single job, do not rely on Sidekiq batches, and basically assure the order of operations, so the logic can be understood much better. Runaway jobs are hopefully impossible, but in any case should be much easier to contain, because now only the :deletion queue is used, and stopping this queue should freeze the problem until it can be properly debugged and fixed.
Which issue(s) this PR fixes
https://issues.redhat.com/browse/THREESCALE-2989
Verification steps
DeleteObjectHierarchyWorker.delete_later(myprovider)
For upgrade
Special notes for your reviewer:
This is just an early preview to gather feedback. It is very much untested in most regards except for a simple new test.
As for the review, I think you'd better initially ignore the old code and just read the new implementation of the worker as new code. Then see if there are areas in the old code that have not been addressed by the new code. I know it is a pain.
It remains to test compatibility, self-rescheduling, the other existing users of the worker (except for the service deletion, which is the only one tested atm), and who knows what else.