Large batch insert: memory issue? too many connections

Large batch insert: memory issue? too many connections

Postby wtftc » Thu 14 Oct 2010 22:34

I am trying to do a large batch insert of data from a flat file. The flat file is essentially a long csv from a third party data vendor: my test file has about 20k lines, but ideally I'd like this to be able to handle a file with upwards of a million lines.

To help illustrate my problem, I'll use a toy example. Imagine the flatfile has these columns: author_id|author_name|book_id|book_name. Also assume that my database starts out blank. I have two tables, an author table and a book table. The author table has an author_id column (primary key) and an author_name column. The book table has book_id (primary key), author_id and book_name columns.

There are a few problems here.
1) I'd like to be able to insert all data in a single transaction.
2) The file is too large to fit into memory so I'd like to be able to process the file line by line
3) I insert new authors as I see them, but one author may be scattered in the file. In other words, lines 8 and 9 may be books by Author1, lines 10 and 11 by Author2 and then line 12 will be another book by Author1. This leads to the need to query the author table by author_id to ensure that I don't try and insert the author twice.

So I have a question and an error. First the question. I am trying to do this all in a single transaction. Does the devart DataContext cache the change set in memory until the transaction has been committed? In other words, am I going to hit a memory problem even though my code processes the file line by line?

Second, I am getting an exception that says Too many connections. I am only using a single DataContext in a single transaction. How could I have used up too many connections? I understand that I am executing a lot of queries, but I don't see why that should be a problem.

Here is the stack trace:
Code: Select all
Test.API.RelationalStoreException : Error occurred during write operation.
  ----> Devart.Data.Linq.LinqCommandExecutionException : Error on opening DbConnection.
  ----> Devart.Data.MySql.MySqlException : Too many connections
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass8a.b__89() in MySqlRelationalStorage.cs: line 1022
at Test.Storage.Relational.MySqlRelationalStorage.WrapTx(Action dbCall) in MySqlRelationalStorage.cs: line 1057
at Test.Storage.Relational.MySqlRelationalStorage.WriteTx(Action dbCall) in MySqlRelationalStorage.cs: line 1007
at Test.Storage.Relational.MySqlRelationalStorage.BootstrapData(RelationalBootstrapDataArg arg) in MySqlRelationalStorage.cs: line 712
at Test.Storage.Relational.Tests.MySqlRelationalStorageTests.TestBootstrapData() in MySqlRelationalStorageTests.cs: line 244
--LinqCommandExecutionException
at Devart.Data.Linq.LinqCommandExecutionException.a(String A_0, Exception A_1)
at Devart.Data.Linq.Provider.n.g()
at Devart.Data.Linq.Provider.n.b(IConnectionUser A_0)
at Devart.Data.Linq.Provider.DataProvider.ExecuteQuery(CompiledQuery compiledQuery, Object[] parentArgs, Object[] userArgs, Object lastResult)
at Devart.Data.Linq.Provider.DataProvider.ExecuteAllQueries(CompiledQuery compiledQuery, Object[] userArguments)
at Devart.Data.Linq.Provider.DataProvider.CompiledQuery.Devart.Data.Linq.Provider.ICompiledQuery.Execute(IProvider provider, Object[] userArgs)
at Devart.Data.Linq.CompiledQuery.a(DataContext A_0, Object[] A_1)
at Devart.Data.Linq.CompiledQuery.Invoke(a A_0, b A_1)
at Test.Storage.Relational.MySqlRelationalStorage.ProcessBootstrapData(StreamReader bsymSR, DateTime bsymTime, StreamReader cusipSR, DateTime cusipTime) in MySqlRelationalStorage.cs: line 806
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass68.b__61() in MySqlRelationalStorage.cs: line 712
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass8a.b__89() in MySqlRelationalStorage.cs: line 1011
--MySqlException
at Devart.Data.MySql.bk.s()
at Devart.Data.MySql.bk.d()
at Devart.Data.MySql.u.a(String A_0, String A_1, String A_2, String A_3, Int32 A_4, String A_5, Int32 A_6, SshOptions A_7, SslOptions A_8, ProxyOptions A_9, MySqlHttpOptions A_10, HttpOptions A_11)
at Devart.Data.MySql.MySqlInternalConnection.Connect(MySqlConnection owner, String userId, String password, String host, String database, Int32 port, Int32 connectionTimeout, MySqlProtocol protocol, Boolean compress, Boolean clientInteractive)
at Devart.Data.MySql.MySqlInternalConnection..ctor(o connectionOptions, MySqlConnection owner)
at Devart.Data.MySql.ar.a(w A_0, Object A_1, DbConnectionBase A_2)
at Devart.Common.DbConnectionFactory.a(DbConnectionPool A_0, w A_1, DbConnectionBase A_2)
at Devart.Common.DbConnectionPoolGroup.a(DbConnectionPool A_0, DbConnectionBase A_1)
at Devart.Common.DbConnectionPool.a(DbConnectionBase A_0)
at Devart.Common.DbConnectionPool.GetObject(DbConnectionBase owningConnection)
at Devart.Common.DbConnectionFactory.a(DbConnectionBase A_0)
at Devart.Common.DbConnectionClosed.Open(DbConnectionBase outerConnection)
at Devart.Common.DbConnectionBase.Open()
at Devart.Data.MySql.MySqlConnection.Open()
at Devart.Data.Linq.Provider.n.g()
wtftc
 
Posts: 10
Joined: Mon 26 Jul 2010 04:10

Postby wtftc » Fri 15 Oct 2010 15:03

I tried setting DataContext.MaxUsedConnections = 1, but that did not help with the error either.
wtftc
 
Posts: 10
Joined: Mon 26 Jul 2010 04:10

Postby StanislavK » Fri 15 Oct 2010 16:18

We will analyze the situation and inform you about the results.

Also, did you try using the MySqlLoader class for this operation? For its description, please refer to
http://www.devart.com/dotconnect/mysql/docs/Devart.Data.MySql~Devart.Data.MySql.MySqlLoader.html
StanislavK
Devart Team
 
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Postby wtftc » Fri 15 Oct 2010 20:32

The loader doesn't seem like it will work because I have to load multiple tables and query the data I am adding mid-load.

I have a workaround that I really, really don't like to the exception in my original post, which is to mix direct sql statements, which I don't like with the linq code that I do like. That way, I don't run into the random "too many connections" problem with CompiledQuery or DataContext.ExecuteQuery. In other words, I would still like a resolution to that problem.

However, I have another problem. My transaction is very long lived because I would like to insert on the order of 4m items into 4 different tables. As such, I seem to be running into another problem with something that looks like a timeout (stack trace below). Do you know where this is happening? Is it a connection timeout, a command timeout, or something else? Is there any way to set a timeout on the connection string or something so that this wouldn't happen?

Code: Select all
Test.API.RelationalStoreException : Error occurred during write operation.
  ----> Devart.Data.Linq.LinqCommandExecutionException : Error on executing DbCommand.
  ----> Devart.Data.MySql.MySqlException : Lost connection to MySQL server during query
  ----> Devart.Common.i : Transport channel is closed.
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass93.b__92() in MySqlRelationalStorage.cs: line 1103
at Test.Storage.Relational.MySqlRelationalStorage.WrapTx(Action dbCall) in MySqlRelationalStorage.cs: line 1138
at Test.Storage.Relational.MySqlRelationalStorage.WriteTx(Action dbCall) in MySqlRelationalStorage.cs: line 1088
at Test.Storage.Relational.MySqlRelationalStorage.BootstrapData(RelationalBootstrapDataArg arg) in MySqlRelationalStorage.cs: line 730
at Test.Storage.Relational.Tests.MySqlRelationalStorageTests.TestBootstrapData_2() in MySqlRelationalStorageTests.cs: line 364
--LinqCommandExecutionException
at Devart.Data.Linq.LinqCommandExecutionException.a(String A_0, Exception A_1)
at Devart.Data.Linq.Provider.DataProvider.ExecuteQuery(String commandText, List`1 resultsetParameters, IList`1 parameters, Boolean isBatch, ref IDbCommand dbCommand)
at Devart.Data.Linq.h.a(SubmitCommand A_0, SubmitCommandBuilder A_1)
at Devart.Data.Linq.q.a(SubmitCommandBuilder A_0)
at Devart.Data.Linq.y.a(DataContext A_0, ConflictMode A_1)
at Devart.Data.Linq.DataContext.SubmitChanges(ConflictMode failureMode)
at Test.Storage.Relational.MySqlRelationalStorage.ProcessBootstrapData(StreamReader bsymSR, DateTime bsymTime, StreamReader cusipSR, DateTime cusipTime) in MySqlRelationalStorage.cs: line 913
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass71.b__64() in MySqlRelationalStorage.cs: line 731
at Test.Storage.Relational.MySqlRelationalStorage.TimeIt(String actionName, Func`1 action) in MySqlRelationalStorage.cs: line 1074
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass71.b__63() in MySqlRelationalStorage.cs: line 730
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass93.b__92() in MySqlRelationalStorage.cs: line 1092
--MySqlException
at Devart.Data.MySql.bg.d(Exception A_0)
at Devart.Common.t.d(Byte[] A_0, Int32 A_1, Int32 A_2)
at Devart.Data.MySql.bk.d(Byte[] A_0, Int32 A_1, Int32 A_2)
at Devart.Data.MySql.bk.n()
at Devart.Data.MySql.bh.c()
at Devart.Data.MySql.ap.b(Boolean A_0)
at Devart.Data.MySql.a0.a(Boolean A_0)
at Devart.Common.DbCommandBase.Unprepare()
at Devart.Data.MySql.MySqlCommand.Unprepare()
at Devart.Common.DbCommandBase.Dispose(Boolean disposing)
at System.ComponentModel.Component.Dispose()
at Devart.Common.l.a(Int32 A_0, Object A_1, Int32 A_2, Object A_3)
at Devart.Common.l.a.a(Int32 A_0, Int32 A_1, Object A_2, l A_3)
at Devart.Common.l.a(Int32 A_0, Int32 A_1, Object A_2)
at Devart.Common.DbConnectionBase.a(Int32 A_0)
at Devart.Common.DbConnectionBase.Close()
at Devart.Data.MySql.MySqlCommand.InternalExecute(CommandBehavior behavior, IDisposable stmt, Int32 startRecord, Int32 maxRecords)
at Devart.Common.DbCommandBase.ExecuteDbDataReader(CommandBehavior behavior, Boolean nonQuery)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader()
at Devart.Data.Linq.Provider.DataProvider.ExecuteQuery(String commandText, List`1 resultsetParameters, IList`1 parameters, Boolean isBatch, ref IDbCommand dbCommand)
--i
at Devart.Common.t.d(Byte[] A_0, Int32 A_1, Int32 A_2)
wtftc
 
Posts: 10
Joined: Mon 26 Jul 2010 04:10

Postby StanislavK » Mon 18 Oct 2010 14:58

The recommended way of performing such update is to explicitly open a transaction before starting inserts, and commit it after the file is read completely. In this case, all submits of the data context should be performed in the same transaction. For more information about this, please see the corresponding topic of our documentation:
http://www.devart.com/linqconnect/docs/Transactions.html#explicit

We could not reproduce the problem with the 'Too many connections' exception; could you please specify how exactly you are inserting data? If possible, please send us a small test project so that we are able to investigate the issue in details.

As for the 'Lost connection' problem, it is possible that the command timeout expired. To increase the command timeout, you can either change the CommandTimeout property of each MySqlCommand, or set the 'Default Command Timeout' parameter of the connection string.
StanislavK
Devart Team
 
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Postby wtftc » Mon 18 Oct 2010 16:06

I have sent the test project separately.
wtftc
 
Posts: 10
Joined: Mon 26 Jul 2010 04:10

Postby wtftc » Mon 18 Oct 2010 17:47

This exception really has me stumped. I am just trying to do the same insert/query batch job and now I seem to be getting bytes out of order, which doesn't usually happen with TCP. Am I seeing all of these problems because I'm using 6.0 beta? Should I switch to 5.8? Would that be more stable? Note that this is also witch the Connection Timeout set to 2 hours.

Code: Select all
Test.API.RelationalStoreException : Error occurred during write operation.
  ----> Devart.Data.Linq.LinqCommandExecutionException : Error on executing DbCommand.
  ----> Devart.Data.MySql.MySqlException : Net packets out of order: received[1], expected[2]
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass93.b__92() in MySqlRelationalStorage.cs: line 1117
at Test.Storage.Relational.MySqlRelationalStorage.WrapTx(Action dbCall) in MySqlRelationalStorage.cs: line 1152
at Test.Storage.Relational.MySqlRelationalStorage.WriteTx(Action dbCall) in MySqlRelationalStorage.cs: line 1102
at Test.Storage.Relational.MySqlRelationalStorage.BootstrapData(RelationalBootstrapDataArg arg) in MySqlRelationalStorage.cs: line 733
at Test.Storage.Relational.Tests.MySqlRelationalStorageTests.TestBootstrapData_2() in MySqlRelationalStorageTests.cs: line 304
--LinqCommandExecutionException
at Devart.Data.Linq.LinqCommandExecutionException.a(String A_0, Exception A_1)
at Devart.Data.Linq.Provider.DataProvider.ExecuteQuery(String commandText, List`1 resultsetParameters, IList`1 parameters, Boolean isBatch, ref IDbCommand dbCommand)
at Devart.Data.Linq.h.a(SubmitCommand A_0, SubmitCommandBuilder A_1)
at Devart.Data.Linq.q.a(SubmitCommandBuilder A_0)
at Devart.Data.Linq.y.a(DataContext A_0, ConflictMode A_1)
at Devart.Data.Linq.DataContext.SubmitChanges(ConflictMode failureMode)
at Test.Storage.Relational.MySqlRelationalStorage.ProcessBootstrapData(StreamReader bsymSR, DateTime bsymTime, StreamReader cusipSR, DateTime cusipTime) in MySqlRelationalStorage.cs: line 927
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass71.b__64() in MySqlRelationalStorage.cs: line 734
at Test.Storage.Relational.MySqlRelationalStorage.TimeIt(String actionName, Func`1 action) in MySqlRelationalStorage.cs: line 1088
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass71.b__63() in MySqlRelationalStorage.cs: line 733
at Test.Storage.Relational.MySqlRelationalStorage.<>c__DisplayClass93.b__92() in MySqlRelationalStorage.cs: line 1106
--MySqlException
at Devart.Data.MySql.bk.a()
at Devart.Data.MySql.bk.d()
at Devart.Data.MySql.u.a(ag[]& A_0, Int32& A_1)
at Devart.Data.MySql.u.a(Byte[] A_0, Int32 A_1, Boolean A_2)
at Devart.Data.MySql.a0.e()
at Devart.Data.MySql.MySqlCommand.InternalExecute(CommandBehavior behavior, IDisposable stmt, Int32 startRecord, Int32 maxRecords)
at Devart.Common.DbCommandBase.ExecuteDbDataReader(CommandBehavior behavior, Boolean nonQuery)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader()
at Devart.Data.Linq.Provider.DataProvider.ExecuteQuery(String commandText, List`1 resultsetParameters, IList`1 parameters, Boolean isBatch, ref IDbCommand dbCommand)
wtftc
 
Posts: 10
Joined: Mon 26 Jul 2010 04:10

Postby StanislavK » Tue 19 Oct 2010 14:54

We have answered you by mail.
StanislavK
Devart Team
 
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Postby StanislavK » Tue 30 Nov 2010 17:49

We have implemented the 'Transaction Scope Local' connection string parameter which specifies whether a single physical connection will be used for all local transactions opened by the data context. Please try setting this parameter to true and tell us if this helps.

The 'Transaction Scope Local' parameter is available in the release of dotConnect for MySQL 6. The release can be downloaded from
http://www.devart.com/dotconnect/mysql/download.html
(the trial version) or from Registered Users' Area (for users with active subscription only):
http://secure.devart.com/

For the detailed information on fixes and improvements available in dotConnect for MySQL 6, please refer to
http://www.devart.com/forums/viewtopic.php?t=19623
StanislavK
Devart Team
 
Posts: 1710
Joined: Thu 03 Dec 2009 10:48


Return to dotConnect for MySQL