How to speed up INSERT

Discussion of open issues, suggestions and bugs regarding SDAC (SQL Server Data Access Components) for Delphi, C++Builder, Lazarus (and FPC)
Zoran565
Posts: 10
Joined: Mon 26 Jun 2006 19:23

How to speed up INSERT

Post by Zoran565 » Mon 26 Jun 2006 19:43

I have about 160 million records to import into one master table and one child table. All input records come from various .csv files. To improve insert speed I defined only two indexes on the master table: the IDENTITY primary key, and a nonclustered index (IX) on a binary(16) digest column, which I use to check for duplicates.

I use TMSQuery to import. The INSERT statement returns the new IDENTITY value through an output parameter (SET :RowId=SCOPE_IDENTITY()), which I need in order to insert the corresponding row into the child table. I start a transaction, commit it every 10,000 rows, and then start a new one.
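The loop described above might look roughly like the following sketch. This is illustrative only: the TMSConnection/TMSQuery component names, the `ReadNextCsvRow` helper, and the parameter names are assumptions, not code from this thread.

```
// Sketch of the batched-commit import loop described above (SDAC components).
// MasterInsert's SQL is assumed to be:
//   INSERT INTO Master (Digest) VALUES (:Digest); SET :RowId = SCOPE_IDENTITY()
const
  CommitInterval = 10000;
var
  RowId, RowCount: Integer;
begin
  MSConnection.StartTransaction;
  try
    RowCount := 0;
    while ReadNextCsvRow do            // hypothetical CSV reader
    begin
      MasterInsert.ParamByName('Digest').AsBlob := RowDigest;
      MasterInsert.Execute;
      RowId := MasterInsert.ParamByName('RowId').AsInteger;

      ChildInsert.ParamByName('MasterId').AsInteger := RowId;
      ChildInsert.Execute;

      Inc(RowCount);
      if RowCount mod CommitInterval = 0 then
      begin
        MSConnection.Commit;           // commit every 10,000 rows...
        MSConnection.StartTransaction; // ...and open the next transaction
      end;
    end;
    MSConnection.Commit;               // commit the final partial batch
  except
    MSConnection.Rollback;
    raise;
  end;
end;
```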

The whole process is painfully slow: it inserts only about 60 rows per second.

Questions:

1. How can I improve the performance?

2. I have a feeling that MSQuery is doing AutoCommit after every INSERT (regardless of transaction). Is that true and, if it is, how do I disable it?

Any help is appreciated.

Thanks

BDS2006, SDAC 3.70, SQL Server 2005 SP1, WinXP Pro SP2, 4GB RAM, 500GB Hard Disk

Jackson
Posts: 512
Joined: Thu 26 Jan 2006 10:06

Post by Jackson » Tue 27 Jun 2006 09:31

> 1. How can I improve the performance?

- Use batches. This will reduce the number of network roundtrips to the server.
- Try to use TMSSQL instead of TMSQuery.
- Use prepared execution to execute a parameterized INSERT statement.
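A prepared, parameterized INSERT with TMSSQL might look like the sketch below. This is a minimal illustration under assumed names (`MSConnection`, `ReadNextCsvRow`, the Master table); the key point is that Prepare is called once, outside the loop, so the server compiles the statement a single time.

```
var
  Q: TMSSQL;
begin
  Q := TMSSQL.Create(nil);
  try
    Q.Connection := MSConnection;
    Q.SQL.Text := 'INSERT INTO Master (Digest) VALUES (:Digest)';
    Q.Prepare;                          // compile the plan once, up front
    while ReadNextCsvRow do             // hypothetical CSV reader
    begin
      Q.ParamByName('Digest').AsBlob := RowDigest;
      Q.Execute;                        // reuses the prepared statement
    end;
  finally
    Q.Free;
  end;
end;
```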

> 2. I have a feeling that MSQuery is doing AutoCommit after every INSERT (regardless of transaction). Is that true and, if it is, how do I disable it?

This should not happen. If your tests show that a COMMIT is actually issued after every INSERT, please send us a complete small test project that reproduces the problem. Preferably use Northwind or master schema objects; otherwise include the definitions of your own database objects, and don't use third-party components.

Zoran565
Posts: 10
Joined: Mon 26 Jun 2006 19:23

Post by Zoran565 » Tue 27 Jun 2006 22:27

> - Use batches. This will reduce the number of network roundtrips to the server.

I think I cannot use batches because I need the IDENTITY value to insert the row into the child table. Is there a way to use batches and still get the IDENTITY for each inserted master row?

> - Try to use TMSSQL instead of TMSQuery.

I changed from TMSQuery to TMSSQL but did not see any performance improvement.

> - Use prepared execution to execute a parameterized INSERT statement.

I cannot prepare the statement because it has an output parameter (SET :RowId=SCOPE_IDENTITY()).

Well, I guess 60 rows per second is what I'm stuck with. That is roughly 5 million rows in 24 hours. Tough on the initial import, but later additions of new rows will be moderate.
> 2. I have a feeling that MSQuery is doing AutoCommit after every INSERT (regardless of transaction). Is that true and, if it is, how do I disable it?

> This should not happen.

It is not happening; my mistake. The slow insert speed threw me off...

Thanks.

Jackson
Posts: 512
Joined: Thu 26 Jan 2006 10:06

Post by Jackson » Thu 29 Jun 2006 13:03

You can create a stored procedure or an extended stored procedure on the server that takes care of the insertion logic.
Then just pass the actual data as arguments to the procedure.
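Such a procedure might look like the T-SQL sketch below. All table, column, and parameter names here are assumptions for illustration, not definitions from this thread; the point is that one server-side call handles the duplicate check, the master insert, the SCOPE_IDENTITY() capture, and the child insert, so the client makes a single roundtrip per row.

```
-- Sketch: one call inserts a master row and its child row.
-- dbo.Master, dbo.Child, and the columns are illustrative placeholders.
CREATE PROCEDURE dbo.ImportRow
  @Digest binary(16),
  @ChildData varchar(100),
  @RowId int OUTPUT
AS
BEGIN
  SET NOCOUNT ON;

  -- skip duplicates using the index on the digest column
  IF EXISTS (SELECT 1 FROM dbo.Master WHERE Digest = @Digest)
    RETURN;

  INSERT INTO dbo.Master (Digest) VALUES (@Digest);
  SET @RowId = SCOPE_IDENTITY();

  INSERT INTO dbo.Child (MasterId, Data) VALUES (@RowId, @ChildData);
END
```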
