Changing from latin1 to utf8mb4

Discussion of open issues, suggestions and bugs regarding MyDAC (Data Access Components for MySQL) for Delphi, C++Builder, Lazarus (and FPC)
davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Changing from latin1 to utf8mb4

Post by davidmarcus » Sun 15 Feb 2015 22:03

I would appreciate some guidance on how to work with utf8mb4 tables.

My Delphi app communicates with a MySQL database (version 5.5.40-MariaDB-cll-lve) on a Web server. Currently, the string fields in the MySQL tables are char or varchar using the latin1 character set. I have TMyConnection.Options.CharSet set to 'latin1'.

For insert, replace, update, and delete statements, I've been using TMyScript by constructing the SQL myself, assigning it to TMyScript.SQL.Text, then doing TMyScript.Execute. I escape and quote the values myself, i.e., put single quotes around strings, prefix quotes with a backslash, etc. I end up with a string (UnicodeString in XE7) that I assign to TMyScript.SQL.Text.

For select statements, I use TMyQuery, construct the SQL myself, and assign it to TMyQuery.SQL.Text. After calling TMyQuery.Execute, I retrieve the string values via TMyQuery.FieldByName('TheFieldName').AsString.

I'm planning to convert the MySQL tables so most of the varchar fields will be utf8mb4. Some of the varchar fields and all the char fields will be ascii. What do I need to change on the Delphi side?

I'm using MyDAC 8.4.13, Delphi XE7 Professional, Windows Vista Home Premium.

ViktorV
Devart Team
Posts: 3168
Joined: Wed 30 Jul 2014 07:16

Re: Changing from latin1 to utf8mb4

Post by ViktorV » Mon 16 Feb 2015 13:05

In your case, you can set the TMyConnection.Options.CharSet property to "utf8mb4" or set the TMyConnection.Options.UseUnicode property to "True" .

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Mon 16 Feb 2015 13:20

ViktorV wrote:In your case, you can set the TMyConnection.Options.CharSet property to "utf8mb4" or set the TMyConnection.Options.UseUnicode property to "True".
Thank you. Simple as could be.

When would someone need to set the UseUnicode property rather than just setting CharSet? Is it for use with TMyTable?

ViktorV
Devart Team
Posts: 3168
Joined: Wed 30 Jul 2014 07:16

Re: Changing from latin1 to utf8mb4

Post by ViktorV » Mon 16 Feb 2015 14:41

You can find out how to use these properties in the MyDAC help: http://www.devart.com/mydac/docs/usingn ... acters.htm Both these properties affect the behavior of TMyTable.

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Sun 10 Jan 2016 18:30

I'm trying to use utf8mb4 for CharSet, but MySQL seems to be receiving
Latin 1. I'm using MyDAC 8.6.20 and MySQL 5.5.47. I create a table as
follows:

Code: Select all

create database if not exists TestDatabase;
use TestDatabase;
drop table if exists TestTable;
create table TestTable (
ID   integer unsigned not null auto_increment,
Name varchar(50)      not null,
primary key ( ID ),
index Name  ( Name ) )
default character set utf8mb4 collate utf8mb4_unicode_ci pack_keys=1 engine=MyISAM;
show warnings;
I then run this app:

Code: Select all

{$apptype console}
program TestMyDAC;

uses
   Winapi.Windows, System.SysUtils, MyAccess, MyScript;

var
   Connection: TMyConnection;
   Updater:    TMyScript;

begin

   try
      Connection := TMyConnection.Create( nil );
      Connection.Options.Direct   := true;
      Connection.Options.Compress := true;
      Connection.Options.CharSet  := 'utf8mb4';
      Connection.Server           := '127.0.0.1';
      Connection.Username         := 'root';
      Connection.Password         := '';
      Connection.Database         := 'testdatabase';
      Connection.Connected        := true;
      Updater                     := TMyScript.Create( nil );
      Updater.Connection          := Connection;
      Updater.SQL.Text            := 'insert into TestTable set Name = ''ÀÁÂÃÄÅÇÈÉ''';
      Updater.Execute;
      Connection.Connected := false;
      Updater.Free;
      Connection.Free;

   except
      on E: Exception do
         writeln( E.Message );
   end;

end.
I get the error
#HY000Incorrect string value: '\xC0\xC1\xC2\xC3\xC4\xC5...' for column 'Name' at row 1
This test only tries to insert a Latin 1 string, but I also want to be
able to use characters not in Latin 1. What am I doing wrong?

ViktorV
Devart Team
Posts: 3168
Joined: Wed 30 Jul 2014 07:16

Re: Changing from latin1 to utf8mb4

Post by ViktorV » Tue 12 Jan 2016 13:39

To solve the issue, please replace the line

Code: Select all

Connection.Options.CharSet := 'utf8mb4';
with

Code: Select all

Connection.Options.UseUnicode := True; 
and check if the issue is reproduced.

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Wed 13 Jan 2016 00:36

That works better. I tried some Japanese in the Basic Multilingual Plane, and it worked. But, when I tried a Japanese character not in the Basic Multilingual Plane, I got
#HY000Incorrect string value: '\xF0\xA0\x80\x81' for column 'Name' at row 1
Do you know if it should work for such characters?

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Wed 13 Jan 2016 15:58

MySQL Workbench seems to have no problem with utf8mb4. See http://www.davidmarcus.com/MySQL_Workbench_utf8mb4.png. I still get an error with MyDAC with Options.UseUnicode := true. Setting UseUnicode to true and also setting Options.CharSet to 'utf8mb4' gives the same "Incorrect string value" error.

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Thu 14 Jan 2016 21:18

I also tried using PHP and the mysqli extension. It has no problem with the four byte character.

Also, how should I set collation_connection to utf8mb4_unicode_ci (which is not the default collation)?

ViktorV
Devart Team
Posts: 3168
Joined: Wed 30 Jul 2014 07:16

Re: Changing from latin1 to utf8mb4

Post by ViktorV » Fri 15 Jan 2016 11:11

Thank you for the information. The problem investigation is in progress. We will inform you when we have any results.

JEFFB
Posts: 3
Joined: Wed 06 Jan 2016 17:32

Re: Changing from latin1 to utf8mb4

Post by JEFFB » Tue 19 Jan 2016 17:23

oops. I see I was responding to something really old.

Jeff
Last edited by JEFFB on Tue 19 Jan 2016 20:35, edited 1 time in total.

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Tue 19 Jan 2016 18:45

Jeff,

It works fine with MySQL Workbench. You don't need to change the server and database variables. You only need to do

Code: Select all

set names 'utf8mb4' collate 'utf8mb4_unicode_ci'
Unfortunately, MySQL Workbench defaults to utf8 instead of utf8mb4, so you have to change it every time you start a session.

For MyDAC, did you try a four byte character?

David

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Wed 20 Jan 2016 19:48

Viktor,

This seems to work (replace the asterisk with a four-byte character; the forum won't let me post it):

Code: Select all

{$apptype console}
program TestMyDAC;

uses
   Winapi.Windows, System.SysUtils, MyAccess, MyScript;

var
   Connection: TMyConnection;
   Updater:    TMyScript;

begin

   try
      Connection := TMyConnection.Create( nil );
      Connection.Options.Direct     := true;
      Connection.Options.Compress   := true;
      Connection.Options.UseUnicode := true;
      Connection.Options.CharSet    := 'utf8mb4';
      Connection.Server             := '127.0.0.1';
      Connection.Username           := 'root';
      Connection.Password           := '';
      Connection.Database           := 'testdatabase';
      Connection.Connected          := true;
      Updater                       := TMyScript.Create( nil );
      Updater.Connection            := Connection;
      Updater.SQL.Text              := 'set names ''utf8mb4'' collate ''utf8mb4_unicode_ci''';
      Updater.Execute;
      Updater.SQL.Text              := 'insert into TestTable set Name = ''*''';
      Updater.Execute;
      Connection.Connected := false;
      Updater.Free;
      Connection.Free;

   except
      on E: Exception do
         writeln( E.Message );
   end;

end.

ViktorV
Devart Team
Posts: 3168
Joined: Wed 30 Jul 2014 07:16

Re: Changing from latin1 to utf8mb4

Post by ViktorV » Thu 21 Jan 2016 09:51

You can save characters in utf8mb4 encoding correctly during execution of the provided code. But when reading them, you might get an error: imcomplete multibyte char.

davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA
Contact:

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Thu 21 Jan 2016 12:49

OK. I'll wait for an official fix.

Post Reply