Malformed string exception with Firebird 2.5 UTF8 database

Discussion of open issues, suggestions and bugs regarding IBDAC (InterBase Data Access Components) for Delphi, C++Builder, Lazarus (and FPC)
Post Reply
tsteinmaurer
Posts: 52
Joined: Thu 17 Dec 2009 08:25

Malformed string exception with Firebird 2.5 UTF8 database

Post by tsteinmaurer » Sun 21 Nov 2010 20:07

Hello,

I'm using Delphi 2010 and IBDAC 3.10.0.14.

I read data from a Firebird 2.5 database with character set NONE, which also can contain characters like e.g. 'Ø'. This data then should be inserted into a UTF8 Firebird 2.5 database in a BLOB SUB_TYPE 1 column. For the destination connection, I use the following code in my data access layer to instantiate the connection:

Code: Select all

FIB_Connection := TIBCConnection.Create(nil);
FIB_Connection.ClientLibrary := 'fbclient.dll';
FIB_Connection.Options.Charset := 'UTF8';
FIB_Connection.Options.UseUnicode := True;
FIB_Connection.Options.EnableMemos := True;
So I guess, pretty much all I can do for "enabling" unicode stuff at connection level. But, anytime I'm trying to insert data with including e.g. 'Ø', I get a "Malformed string" exception.

Any ideas?

Thanks,
Thomas

AndreyZ

Post by AndreyZ » Tue 23 Nov 2010 11:49

Hello,

I cannot reproduce the problem. I have created two connections with the following options:

Code: Select all

// database with character set UTF8
IBCConnection1.ClientLibrary := 'fbclient.dll'; 
IBCConnection1.Options.Charset := 'UTF8'; 
IBCConnection1.Options.UseUnicode := True; 
IBCConnection1.Options.EnableMemos := True;

// database with character set NONE
IBCConnection2.ClientLibrary := 'fbclient.dll'; 
IBCConnection2.Options.Charset := ''; 
IBCConnection2.Options.UseUnicode := False; 
IBCConnection2.Options.EnableMemos := True;
and tried to copy data from one database to another in the following way:

Code: Select all

IBCQuery1.Append;
IBCQuery1.FieldByName('id').AsInteger := IBCQuery2.FieldByName('id').AsInteger;
IBCQuery1.FieldByName('blb').AsString := IBCQuery2.FieldByName('blb').AsString;
IBCQuery1.Post;
where IBCQuery1.Connection=IBCConnection1, IBCQuery2.Connection=IBCConnection2, and the blb field has the BLOB SUB_TYPE 1 database type.
The copy process was successful. Please try reproducing the problem using these options and code.

tsteinmaurer
Posts: 52
Joined: Thu 17 Dec 2009 08:25

Post by tsteinmaurer » Tue 23 Nov 2010 12:44

Hello,

I'm using a parametrized INSERT INTO query and access the BLOB paramters with ParamByName(...).AsMemo := ... Does this make a difference?

In your test, does the blob field in database 2 hold special characters like 'Ø'?

Thanks,
Thomas

tsteinmaurer
Posts: 52
Joined: Thu 17 Dec 2009 08:25

Post by tsteinmaurer » Tue 23 Nov 2010 12:51

Btw, why is UNICODE_FSS used instead of the provided UTF8 character set when UseUnicode is set to True?

Thanks,
Thomas

AndreyZ

Post by AndreyZ » Tue 23 Nov 2010 15:02

I'm using a parametrized INSERT INTO query and access the BLOB paramters with ParamByName(...).AsMemo := ... Does this make a difference?
You should use the following code:

Code: Select all

IBCQuery1.ParamByName('id').AsInteger := IBCQuery2.FieldByName('id').AsInteger;
IBCQuery1.ParamByName('blb').AsMemoRef.IsUnicode := true;
IBCQuery1.ParamByName('blb').AsMemoRef.AsString := IBCQuery2.FieldByName('blb').AsString;
IBCQuery1.Execute;

AndreyZ

Post by AndreyZ » Tue 23 Nov 2010 15:03

Btw, why is UNICODE_FSS used instead of the provided UTF8 character set when UseUnicode is set to True?
Please read this topic: http://www.devart.com/forums/viewtopic.php?t=19480

tsteinmaurer
Posts: 52
Joined: Thu 17 Dec 2009 08:25

Post by tsteinmaurer » Tue 23 Nov 2010 17:13

Hello,

thanks. With the workaround, it seems to work, but I wonder what other hacks do I need to be sure that my Firebird based application is fully Unicode-enabled? Because the mentioned "Working with Unicode character data" article doesn't even mentioned your proposed solution.

Imagine, my product connects to databases where I don't have the control over whether the user connects with UTF8/UNICODE_FSS or not. Your parameter hack isn't re-usable in that scenario.

You really should look into using UTF8 instead of UNICODE_FSS, because UNICODE_FSS is not only deprecated, but also has bugs in handling unicode data correctly.

To be honest, using D2010 with IBDAC, I thought, by using standard datatypes like String etc. I'm Unicode-enabled out-of-the box. Including BLOBs SUB_TYPE 1. Don't get me wrong, but the Unicode-support doesn't work as advertised. Hopefully things are getting better with forthcoming releases, but this also forces me to renew my subscription for that. Unicode-support for a Firebird D2010 application was my number on reason for moving to IBDAC ...

Thanks anyway,
Thomas

AndreyZ

Post by AndreyZ » Wed 24 Nov 2010 13:35

The workaround I gave you is needed only for this particular case when you are trying to copy data from a non-unicode database to a unicode database. In other cases you should set the TIBCConnection.Options.UseUnicode property to True to have full unicode support.
We are investigating the possibility of using UTF-8 instead of the UNICODE_FSS character set.

ralfiii
Posts: 25
Joined: Wed 16 Mar 2011 09:25

Post by ralfiii » Thu 17 Mar 2011 15:31

AndreyZ wrote: You should use the following code:

Code: Select all

IBCQuery1.ParamByName('blb').AsMemoRef.IsUnicode := true;
IBCQuery1.ParamByName('blb').AsMemo := xxx;
This article solved my current problem.
However, for a UTF8-Unicode-Database I expected this to be the default behaviour.

Maybe you should consider making that the standard behaviour. (I would think the more people using D2009+ the more troubles you will hear of).

At least you should mention this "fix" in the helpfile in the section "Working with Unicode Character Data".

After reading the helpfile I am still not sure:
My whole application is D2010, unicode, UTF8. To use Unicode-Parameters, do I always need to set that explicitly or use .AsWideString.
(under D2010 I would have expected .AsString to be unicode and .AsAnsiString to be AnsiStrings.)

AndreyZ

Post by AndreyZ » Fri 18 Mar 2011 08:32

We will investigate this question. As soon as we get any results, we will let you know.

Dimon
Devart Team
Posts: 2910
Joined: Mon 05 Mar 2007 16:32

Post by Dimon » Fri 08 Apr 2011 09:26

To use this functionality by default, you should set the UnicodeMemoParameters global variable from the IBC unit to True.

ralfiii
Posts: 25
Joined: Wed 16 Mar 2011 09:25

Post by ralfiii » Mon 20 Jun 2011 09:19

Dimon wrote:To use this functionality by default, you should set the UnicodeMemoParameters global variable from the IBC unit to True.
Sounds promising.
Is there any downside/bad sideeffect in doing that?

AndreyZ

Post by AndreyZ » Mon 20 Jun 2011 12:14

When you set the UnicodeMemoParameters global variable to True, IBDAC automatically sets the following TIBCParam properties: IsUnicode to True, CharsetID to CH_UTF8. In this case, all queries with Memo parameters will transfer parameters' data to the server in the Unicode format.

Post Reply