Batch / bulk insert problem (EF Core)

Discussion of open issues, suggestions and bugs regarding ADO.NET provider for PostgreSQL
ole.tetzschner
Posts: 17
Joined: Thu 12 Jul 2018 10:57

Batch / bulk insert problem (EF Core)

Post by ole.tetzschner » Tue 06 Nov 2018 10:27

Hi

Can't get batch insert working with EF Core (inside a transaction). The child tables are inserted one by one with a RETURNING clause (according to dbMonitor).

In my derived context class:

Code: Select all

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UsePostgreSql(connectionString);

            PgSqlEntityProviderConfig config = PgSqlEntityProviderConfig.Instance;
            config.DmlOptions.BatchUpdates.Enabled = true;
            config.DmlOptions.BatchUpdates.BatchSize = 100;
        }
The operations must run in a transaction (because of the initial DELETE).

Code: Select all

	var pgMon = new PgSqlMonitor();
	pgMon.IsActive = true;

	using (var myContext = new MyDataModel("User Id=...;Password=...;Host=...;Database=...;Unicode=True;Persist Security Info=True;Initial Schema=..."))
	{
		myContext.Database.BeginTransaction();

		// DELETE
		myContext.myTable1.RemoveRange(myContext.myTable1
									.Where(s => s.t1Id == "00001"));
		myContext.SaveChanges();

		// INSERT (bulk/batch)
		// ...
		// myData is one record of table1 plus multiple one-to-many related records in table2 (and table3). All tables have autogenerated IDs and TIMESTAMPs (set on insert).
		// 

		myContext.myTable1.Add(myData);
		int c1 = myContext.SaveChanges();

		myContext.Database.CommitTransaction();

		pgMon.IsActive = false;
		pgMon.Dispose();
	}
BUT dbMonitor shows lots of INSERTs (each with RETURNING id and timestamp) instead of a COPY. Why?

Shalex
Site Admin
Posts: 9543
Joined: Thu 14 Aug 2008 12:44

Re: Batch / bulk insert problem (EF Core)

Post by Shalex » Fri 09 Nov 2018 10:19

ole.tetzschner wrote: Tue 06 Nov 2018 10:27
lots of INSERTs (each with RETURNING id and timestamp) instead of a COPY. Why?
When you create an instance of the context for inserting a lot of records, turn off autogeneration for TIMESTAMPs and IDs in the mapping to avoid creating the RETURNING clause.
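In EF Core Fluent API terms the same idea would look roughly like this (a sketch only; the table2 entity and its property names stand in for your generated classes):

Code: Select all

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // Sketch: mark Id and Created as not store-generated so EF Core
            // does not emit a RETURNING clause for them on insert.
            modelBuilder.Entity<table2>(entity =>
            {
                entity.Property(e => e.Id).ValueGeneratedNever();
                entity.Property(e => e.Created).ValueGeneratedNever();
            });
        }
With an Entity Developer (database-first) model the equivalent setting lives in the .efml mapping, as shown below.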

ole.tetzschner
Posts: 17
Joined: Thu 12 Jul 2018 10:57

Re: Batch / bulk insert problem (EF Core)

Post by ole.tetzschner » Mon 12 Nov 2018 11:34

Shalex wrote: Fri 09 Nov 2018 10:19
ole.tetzschner wrote: Tue 06 Nov 2018 10:27
lots of INSERTs (each with RETURNING id and timestamp) instead of a COPY. Why?
When you create an instance of the context for inserting a lot of records, turn off autogeneration for TIMESTAMPs and IDs in the mapping to avoid creating the RETURNING clause.
Thanks Shalex, but how do you accomplish this in EF Core (database-first)?

Snippet from DataModel1.efml

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<efcore namespace="ConsoleApp1" p1:Guid="..." xmlns:p1="http://devart.com/schemas/EntityDeveloper/1.0">
  <class name="table2" entity-set="table2s" table="`table2`" schema="my_shema" p1:Guid="...">
    <id name="Id" type="Int64" value-generated="OnAdd" p1:ValidateRequired="true" p1:Guid="...">
      <column name="`Id`" not-null="True" sql-type="bigserial" p1:unicode="False" />
    </id>
    <property name="Created" type="DateTime" value-generated="OnAdd" p1:ValidateRequired="true" p1:Guid="...">
      <column name="`Created`" default="now()" not-null="True" sql-type="timestamptz" p1:unicode="False" />
    </property>
    <property name="myCol3" type="String" p1:ValidateRequired="true" p1:Guid="...">
      <column name="`myCol3`" not-null="True" sql-type="text" p1:unicode="True" />
    </property>
	...

Shalex
Site Admin
Posts: 9543
Joined: Thu 14 Aug 2008 12:44

Re: Batch / bulk insert problem (EF Core)

Post by Shalex » Fri 16 Nov 2018 18:01

Please remove value-generated="OnAdd" from the definitions of your Id and Created properties:

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<efcore namespace="ConsoleApp1" p1:Guid="..." xmlns:p1="http://devart.com/schemas/EntityDeveloper/1.0">
  <class name="table2" entity-set="table2s" table="`table2`" schema="my_shema" p1:Guid="...">
    <id name="Id" type="Int64" p1:ValidateRequired="true" p1:Guid="...">
      <column name="`Id`" not-null="True" sql-type="bigserial" p1:unicode="False" />
    </id>
    <property name="Created" type="DateTime" p1:ValidateRequired="true" p1:Guid="...">
      <column name="`Created`" default="now()" not-null="True" sql-type="timestamptz" p1:unicode="False" />
    </property>
    <property name="myCol3" type="String" p1:ValidateRequired="true" p1:Guid="...">
      <column name="`myCol3`" not-null="True" sql-type="text" p1:unicode="True" />
    </property>
	...

ole.tetzschner
Posts: 17
Joined: Thu 12 Jul 2018 10:57

Re: Batch / bulk insert problem (EF Core)

Post by ole.tetzschner » Sat 17 Nov 2018 09:19

That does not work (I've already tried that).

1. Removing value-generated from Created (the timestamp with default now()) results in dates of 0001-01-01 in the table.
2. Removing value-generated from Id (a bigserial, auto-incremented on the PostgreSQL server) results in the exception below:

Code: Select all

another instance with the key value '{Id: 0}' is already being tracked


How do you tell EF Core not to insert the columns Id and Created?

Shalex
Site Admin
Posts: 9543
Joined: Thu 14 Aug 2008 12:44

Re: Batch / bulk insert problem (EF Core)

Post by Shalex » Tue 20 Nov 2018 20:19

1. Id: please follow these steps (a rough sketch follows below):
a) read the current value of the sequence used by your Id via ADO.NET
b) increment this value in your code and assign it to Id before each insert
c) execute a command like this https://stackoverflow.com/questions/213 ... postgresql via ADO.NET to sync the sequence with the max Id value

2. Created: set also not-null="False"
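A rough sketch of steps a)-c) with plain ADO.NET (the connection string, sequence name, table name and collection below are placeholders; adjust them to the sequence that backs your bigserial Id column):

Code: Select all

using System;
using Devart.Data.PostgreSql;

// Sketch only: connectionString, newTable2Rows and the sequence name are placeholders.
long nextId;

// a) read the current value of the sequence used by Id
using (var conn = new PgSqlConnection(connectionString))
using (var cmd = new PgSqlCommand("SELECT last_value FROM my_shema.\"table2_Id_seq\"", conn))
{
    conn.Open();
    nextId = Convert.ToInt64(cmd.ExecuteScalar());
}

// b) increment in code and assign to Id before each insert
foreach (var row in newTable2Rows)
    row.Id = ++nextId;

// ... add the rows to the context and call SaveChanges() ...

// c) sync the sequence with the max Id value afterwards
using (var conn = new PgSqlConnection(connectionString))
using (var cmd = new PgSqlCommand(
    "SELECT setval('my_shema.\"table2_Id_seq\"', (SELECT MAX(\"Id\") FROM my_shema.\"table2\"))", conn))
{
    conn.Open();
    cmd.ExecuteScalar();
}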

ole.tetzschner
Posts: 17
Joined: Thu 12 Jul 2018 10:57

Re: Batch / bulk insert problem (EF Core)

Post by ole.tetzschner » Thu 22 Nov 2018 08:03

Thanks, but that's a no-go. If the program increments the Id itself, it will lead to concurrency issues (when two different connections are inserting huge data sets at the same time).

But after goofing around I think I've got a solution. I remove all the child inserts from the first INSERT. Then, within the same transaction, I let PgSqlLoader (reusing the connection and transaction) take care of the huge insert (now a COPY); a rough sketch is below. In my test project it seems fast.
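Sketch of the loader step (names are placeholders, and the column-setup calls are assumptions based on the dotConnect loader samples, so double-check them against the PgSqlLoader documentation):

Code: Select all

using Microsoft.EntityFrameworkCore;
using Devart.Data.PostgreSql;

// Sketch only: reuse the context's open connection so the COPY takes part in
// the transaction EF Core started. Names below are placeholders.
var conn = (PgSqlConnection)myContext.Database.GetDbConnection();

var loader = new PgSqlLoader();
loader.Connection = conn;
loader.TableName = "my_shema.table2";

// List only the columns we actually send; Id (bigserial) and Created
// (default now()) are left to the server. The Add overload is assumed.
loader.Columns.Add("myCol3", PgSqlType.Text);

loader.Open();
foreach (var child in myChildRows)      // placeholder: the detached table2 records
{
    loader.SetValue("myCol3", child.myCol3);
    loader.NextRow();
}
loader.Close();                         // flushes the COPY stream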

Kind regards, Ole
