Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New efcore provider: Implementing batched update support #35512

Open
ronnyek opened this issue Jan 22, 2025 · 4 comments
Open

New efcore provider: Implementing batched update support #35512

ronnyek opened this issue Jan 22, 2025 · 4 comments

Comments

@ronnyek
Copy link

ronnyek commented Jan 22, 2025

Question

Greetings! I'm still plugging my way through an efcore implementation for a proprietary database engine. I've made quite a lot of progress, but noticed I'm not seeing batch support happening automatically. I'm not sure this is explicit bulk insert, but rather sending multiple updates delimited by semi colon or something. I've just got a few questions on implementation of this. Article about batched updates

  1. Its safe to assume that batched updates as discussed above are different than the bulk insert implementations floating around, right?
  2. Do those batched updates handle inserts, and if so... how do people typically handle tracking insert id's for that?
  3. For implementing, is it a simple matter of implementing IModificationCommandBatchFactory to handle that?
  4. Is there any place anyone can point me to see a reference implementation?

I've got a sample project where I'm simply trying to insert a number of records by AddRangeAsync and then calling SaveChangesAsync and they seem to run as a bunch of different queries on the same connection. For this demo project I've also configured min of 1 and max of 50 for batched statement sizes. I just don't know if batched updates are only for updates, or if they should cover inserts as well.

Thank you for your time, and any pointers you might be able to give

Your code

Stack traces


Verbose output


EF Core version

8.0.11

Database provider

Custom

Target framework

.net 8

Operating system

Windows 11

IDE

Visual Studio 2022

@AndriySvyryd
Copy link
Member

Since you mentioned IModificationCommandBatchFactory I'm assuming that you are using an ADO.NET provider to communicate with the database. If so, then #18990 should make implementing this easier in the future.

  1. Its safe to assume that batched updates as discussed above are different than the bulk insert implementations floating around, right?

At least in our terminology - yes. Bulk Insert is a highly optimized way of inserting data that doesn't allow retrieving any store-generated values from the inserted rows. Tracked by #27333

  1. Do those batched updates handle inserts, and if so... how do people typically handle tracking insert id's for that?

Yes and this is a tricky part. For SQL Server we use MERGE ... INSERT ... OUTPUT ..., however the OUTPUT clause is essentially a SELECT in the sense that it doesn't guarantee that the results are returned in the same order as they were sent to the database. So, we add an artificial ordinal column, just in the scope of the insert, that we can use to ensure that the order is maintained.

See SqlServerUpdateSqlGenerator.cs for details.

  1. For implementing, is it a simple matter of implementing IModificationCommandBatchFactory to handle that?

Yes, and choose whether to derive from ReaderModificationCommandBatch.cs or AffectedCountModificationCommandBatch.cs depending on how the number of rows affected is returned by the database.

@AndriySvyryd AndriySvyryd added this to the Discussions milestone Jan 28, 2025
@ronnyek
Copy link
Author

ronnyek commented Jan 28, 2025

Since you mentioned IModificationCommandBatchFactory I'm assuming that you are using an ADO.NET provider to communicate with the database. If so, then #18990 should make implementing this easier in the future.

Yes, and I think I'll make sure we do that. Is this meant to handle any mutation (insert/update/delete)?

Yes and this is a tricky part. For SQL Server we use MERGE ... INSERT ... OUTPUT ..., however the OUTPUT clause is essentially a SELECT in the sense that it doesn't guarantee that the results are returned in the same order as they were sent to the database. So, we add an artificial ordinal column, just in the scope of the insert, that we can use to ensure that the order is maintained.

I believe this could work the same way, though one thing I'm actually curious about is ensuring db assigned ID's are communicated back to efcore infrastructure correctly (if this is even a requirement). Eg, If I essentially have 10 inserts happening, its easy enough to batch those by delimiting with semicolon. For each insert I can easily capture the newly selected ID, or the id and columns being expected back (using affected rows and scope_identity() which works like sql servers scope_identity(), but is the sql server implementation essentially taking the output which returns insert.* and re-mapping that back to the entities that were told to insert? Is this where the artificial ordinal comes in?

I understand a bit more now as to why this is automatically just creating its own statement per 'batch' because the ModificationCommandBatchFactory is returning a SingularCommandBatch, so I get how to change that to a ModificationCommandBatch I create, I just need to make sure I can actually insert/update and be able to return data from that, that the efcore infrastructure expects (and that may not be possible)

@ronnyek
Copy link
Author

ronnyek commented Jan 28, 2025

@AndriySvyryd Do you know if there are test I can run that would show exactly what is happening with sql server as far as here are x inserts, here is the statement batch sent to sql server and what it returns back? I found ModificationCommandBatch tests and ModificationCommandBatchFactoryTests but from what I can see, it looks like its unit testing statement limits etc. Would this be a good example?

DECLARE @inserted0 TABLE ([Id] int);
INSERT INTO [dbo].[Ducks] ([Id])
OUTPUT INSERTED.[Id]
INTO @inserted0
VALUES (DEFAULT),
(DEFAULT);
SELECT [t].[Id], [t].[Computed] FROM [dbo].[Ducks] t
INNER JOIN @inserted0 i ON ([t].[Id] = [i].[Id]);

So essentially declaring a variable of table type, then insert into the table outputting inserted id into that table (in this case with default values), ultimately returning the ids in the order they were inserted?

@AndriySvyryd
Copy link
Member

Is this meant to handle any mutation (insert/update/delete)?

Yes, once we implement the support for it in EF (not this year).

but is the sql server implementation essentially taking the output which returns insert.* and re-mapping that back to the entities that were told to insert? Is this where the artificial ordinal comes in?

Yes, SQL Server needs to do it because the generated values are returned as a table. If you use a separate statement for each row there should be no problem with the order. ReaderModificationCommandBatch doesn't have any code to read back the values, but you can look at

protected override async Task ConsumeAsync(

It allows reading the results from a single resultset or single row per resultset or anything in between.

@AndriySvyryd Do you know if there are test I can run that would show exactly what is happening with sql server as far as here are x inserts, here is the statement batch sent to sql server and what it returns back?

Take a look at https://github.com/dotnet/efcore/blob/44580352b37ed0df4fa02236c7d344b5bb003ef3/test/EFCore.SqlServer.FunctionalTests/BatchingTest.cs

You could use SSMS to look at the sent SQL.

So essentially declaring a variable of table type, then insert into the table outputting inserted id into that table (in this case with default values), ultimately returning the ids in the order they were inserted?

Yes, pretty much, this is for the most complex case where the keys are generated in addition to other columns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants