Page 1 of 2

Changing from latin1 to utf8mb4

Posted: Sun 15 Feb 2015 22:03
by davidmarcus
I would appreciate some guidance on how to work with utf8mb4 tables.

My Delphi app communicates with a MySQL database (version 5.5.40-MariaDB-cll-lve) on a Web server. Currently, the string fields in the MySQL tables are char or varchar using the latin1 character set. I have TMyConnection.Options.CharSet set to 'latin1'.

For insert, replace, update, and delete statements, I've been using TMyScript by constructing the SQL myself, assigning it to TMyScript.SQL.Text, then doing TMyScript.Execute. I escape and quote the values myself, i.e., put single quotes around strings, prefix quotes with a backslash, etc. I end up with a string (UnicodeString in XE7) that I assign to TMyScript.SQL.Text.

For select statements, I use TMyQuery, construct the SQL myself, and assign it to TMyQuery.SQL.Text. After calling TMyQuery.Execute, I retrieve the string values via TMyQuery.FieldByName('TheFieldName').AsString.

I'm planning to convert the MySQL tables so most of the varchar fields will be utf8mb4. Some of the varchar fields and all the char fields will be ascii. What do I need to change on the Delphi side?

I'm using MyDAC 8.4.13, Delphi XE7 Professional, Windows Vista Home Premium.

Re: Changing from latin1 to utf8mb4

Posted: Mon 16 Feb 2015 13:05
by ViktorV
In your case, you can set the TMyConnection.Options.CharSet property to "utf8mb4" or set the TMyConnection.Options.UseUnicode property to "True" .

Re: Changing from latin1 to utf8mb4

Posted: Mon 16 Feb 2015 13:20
by davidmarcus
ViktorV wrote:In your case, you can set the TMyConnection.Options.CharSet property to "utf8mb4" or set the TMyConnection.Options.UseUnicode property to "True".
Thank you. Simple as could be.

When would someone need to set the UseUnicode property rather than just setting CharSet? Is it for use with TMyTable?

Re: Changing from latin1 to utf8mb4

Posted: Mon 16 Feb 2015 14:41
by ViktorV
You can find out how to use these properties in the MyDAC help: http://www.devart.com/mydac/docs/usingn ... acters.htm Both these properties affect the behavior of TMyTable.

Re: Changing from latin1 to utf8mb4

Posted: Sun 10 Jan 2016 18:30
by davidmarcus
I'm trying to use utf8mb4 for CharSet, but MySQL seems to be receiving
Latin 1. I'm using MyDAC 8.6.20 and MySQL 5.5.47. I create a table as
follows:

Code: Select all

create database if not exists TestDatabase;
use TestDatabase;
drop table if exists TestTable;
create table TestTable (
ID   integer unsigned not null auto_increment,
Name varchar(50)      not null,
primary key ( ID ),
index Name  ( Name ) )
default character set utf8mb4 collate utf8mb4_unicode_ci pack_keys=1 engine=MyISAM;
show warnings;
I then run this app:

Code: Select all

{$apptype console}
program TestMyDAC;

uses
   Winapi.Windows, System.SysUtils, MyAccess, MyScript;

var
   Connection: TMyConnection;
   Updater:    TMyScript;

begin

   try
      Connection := TMyConnection.Create( nil );
      Connection.Options.Direct   := true;
      Connection.Options.Compress := true;
      Connection.Options.CharSet  := 'utf8mb4';
      Connection.Server           := '127.0.0.1';
      Connection.Username         := 'root';
      Connection.Password         := '';
      Connection.Database         := 'testdatabase';
      Connection.Connected        := true;
      Updater                     := TMyScript.Create( nil );
      Updater.Connection          := Connection;
      Updater.SQL.Text            := 'insert into TestTable set Name = ''ÀÁÂÃÄÅÇÈÉ''';
      Updater.Execute;
      Connection.Connected := false;
      Updater.Free;
      Connection.Free;

   except
      on E: Exception do
         writeln( E.Message );
   end;

end.
I get the error
#HY000Incorrect string value: '\xC0\xC1\xC2\xC3\xC4\xC5...' for column 'Name' at row 1
This test only tries to insert a Latin 1 string, but I also want to be
able to use characters not in Latin 1. What am I doing wrong?

Re: Changing from latin1 to utf8mb4

Posted: Tue 12 Jan 2016 13:39
by ViktorV
To solve the issue, please replace the line

Code: Select all

Connection.Options.CharSet := 'utf8mb4';
with

Code: Select all

Connection.Options.UseUnicode := True; 
and check if the issue is reproduced.

Re: Changing from latin1 to utf8mb4

Posted: Wed 13 Jan 2016 00:36
by davidmarcus
That works better. I tried some Japanese in the Basic Multilingual Plane, and it worked. But, when I tried a Japanese character not in the Basic Multilingual Plane, I got
#HY000Incorrect string value: '\xF0\xA0\x80\x81' for column 'Name' at row 1
Do you know if it should work for such characters?

Re: Changing from latin1 to utf8mb4

Posted: Wed 13 Jan 2016 15:58
by davidmarcus
MySQL Workbench seems to have no problem with utf8mb4. See http://www.davidmarcus.com/MySQL_Workbench_utf8mb4.png. I still get an error with MyDAC with Options.UseUnicode := true. Setting UseUnicode to true and also setting Options.CharSet to 'utf8mb4' gives the same "Incorrect string value" error.

Re: Changing from latin1 to utf8mb4

Posted: Thu 14 Jan 2016 21:18
by davidmarcus
I also tried using PHP and the mysqli extension. It has no problem with the four byte character.

Also, how should I set collation_connection to utf8mb4_unicode_ci (which is not the default collation)?

Re: Changing from latin1 to utf8mb4

Posted: Fri 15 Jan 2016 11:11
by ViktorV
Thank you for the information. The problem investigation is in progress. We will inform you when we have any results.

Re: Changing from latin1 to utf8mb4

Posted: Tue 19 Jan 2016 17:23
by JEFFB
oops. I see I was responding to something really old.

Jeff

Re: Changing from latin1 to utf8mb4

Posted: Tue 19 Jan 2016 18:45
by davidmarcus
Jeff,

It works fine with MySQL Workbench. You don't need to change the server and database variables. You only need to do

Code: Select all

set names 'utf8mb4' collate 'utf8mb4_unicode_ci'
Unfortunately, MySQL Workbench defaults to utf8 instead of utf8mb4, so you have to change it every time you start a session.

For MyDAC, did you try a four byte character?

David

Re: Changing from latin1 to utf8mb4

Posted: Wed 20 Jan 2016 19:48
by davidmarcus
Viktor,

This seems to work (replace the asterisk with a four-byte character; the forum won't let me post it):

Code: Select all

{$apptype console}
program TestMyDAC;

uses
   Winapi.Windows, System.SysUtils, MyAccess, MyScript;

var
   Connection: TMyConnection;
   Updater:    TMyScript;

begin

   try
      Connection := TMyConnection.Create( nil );
      Connection.Options.Direct     := true;
      Connection.Options.Compress   := true;
      Connection.Options.UseUnicode := true;
      Connection.Options.CharSet    := 'utf8mb4';
      Connection.Server             := '127.0.0.1';
      Connection.Username           := 'root';
      Connection.Password           := '';
      Connection.Database           := 'testdatabase';
      Connection.Connected          := true;
      Updater                       := TMyScript.Create( nil );
      Updater.Connection            := Connection;
      Updater.SQL.Text              := 'set names ''utf8mb4'' collate ''utf8mb4_unicode_ci''';
      Updater.Execute;
      Updater.SQL.Text              := 'insert into TestTable set Name = ''*''';
      Updater.Execute;
      Connection.Connected := false;
      Updater.Free;
      Connection.Free;

   except
      on E: Exception do
         writeln( E.Message );
   end;

end.

Re: Changing from latin1 to utf8mb4

Posted: Thu 21 Jan 2016 09:51
by ViktorV
You can save characters in utf8mb4 encoding correctly during execution of the provided code. But when reading them, you might get an error: imcomplete multibyte char.

Re: Changing from latin1 to utf8mb4

Posted: Thu 21 Jan 2016 12:49
by davidmarcus
OK. I'll wait for an official fix.