Batch up all deletes/updates in an Update API call #111
Closes #106
Previously, when an update call was issued to dateilager, we would run a set of queries for every object being updated. This resulted in A LOT of DB chatter. For example, during application creation in Gadget:
You can see 1000s of pgx spans nested under `update-objects` and `update-packed-objects`. This PR fixes that by bulk inserting. Since we have a dynamic number of objects to insert, we can't use a regular `INSERT` due to potential limits on the number of parameters. Instead, we use another trick: create a temporary table that is dropped after the transaction completes, stage all of the updates in it, and use `COPY` to efficiently get the data in (sketched below).

The end result is pretty good!
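Roughly, the pattern looks like this (a minimal sketch using pgx v5; the table name, columns, and final `UPDATE` are illustrative stand-ins, not dateilager's actual schema):

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// bulkUpdateObjects sketches the staging pattern: create a
// transaction-scoped temp table, COPY all pending updates into it,
// then apply them with one set-based statement.
func bulkUpdateObjects(ctx context.Context, tx pgx.Tx, rows [][]any) error {
	// ON COMMIT DROP ties the staging table's lifetime to the transaction.
	_, err := tx.Exec(ctx, `
		CREATE TEMPORARY TABLE staged_objects
			(project bigint, path text, hash bytea, mode bigint)
		ON COMMIT DROP
	`)
	if err != nil {
		return err
	}

	// COPY sidesteps the parameter-count limits of a multi-row INSERT.
	_, err = tx.CopyFrom(ctx,
		pgx.Identifier{"staged_objects"},
		[]string{"project", "path", "hash", "mode"},
		pgx.CopyFromRows(rows),
	)
	if err != nil {
		return err
	}

	// A single set-based statement applies every staged update at once.
	_, err = tx.Exec(ctx, `
		UPDATE objects o
		SET hash = s.hash, mode = s.mode
		FROM staged_objects s
		WHERE o.project = s.project AND o.path = s.path
	`)
	return err
}
```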
Ignoring the packed objects update (discussed below), the regular update call goes from 1k+ queries taking 289ms to 6 queries taking 44ms. I'll have to benchmark more thoroughly, but on average I'm thinking this could be close to an order of magnitude improvement. Hard to talk specifics, but it's also likely going to be a big win for the production DB, since it has to deal with far less chatter.
Topics of conversation
First, the regular update works really well, but I still need to investigate what's making the packed objects version so slow. It's specifically the `CopyFrom` call, which I would have expected to be significantly faster.

Second, there were some comments about doing certain things outside of a transaction to avoid deadlocks. I don't do that any more, but I suspect we'll be okay because everything now happens in single statements. The comments mentioned deadlocks observed in tests, but I didn't observe any issues with a Gadget-side PR using a DL built from this branch for testing.
Finally, we likely now need to load a lot more into memory at once. Assuming I can make the packed objects query see gains similar to the regular update, how do we feel about that? If we're feeling a bit uncomfy, I can batch the `CopyFrom` calls so that we stay within some memory budget per call (see the sketch below).
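For reference, a rough sketch of that batching fallback: chunk the rows so no single `CopyFrom` call materializes the full object set. The batch size and the table/column names are assumptions carried over from the sketch above:

```go
// copyInBatches stages rows in fixed-size chunks so each CopyFrom call
// only holds a bounded slice in memory. All identifiers are hypothetical.
func copyInBatches(ctx context.Context, tx pgx.Tx, rows [][]any, batchSize int) error {
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		_, err := tx.CopyFrom(ctx,
			pgx.Identifier{"staged_objects"},
			[]string{"project", "path", "hash", "mode"},
			pgx.CopyFromRows(rows[start:end]),
		)
		if err != nil {
			return err
		}
	}
	return nil
}
```

If we wanted to avoid materializing the rows at all, pgx also lets us implement its `CopyFromSource` interface (`Next`/`Values`/`Err`) to stream rows into a single `COPY` instead of pre-building slices.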