Changing from latin1 to utf8mb4

Discussion of open issues, suggestions and bugs regarding MyDAC (Data Access Components for MySQL) for Delphi, C++Builder, Lazarus (and FPC)
davidmarcus
Posts: 50
Joined: Tue 25 Jan 2005 11:22
Location: Somerville, MA

Re: Changing from latin1 to utf8mb4

Post by davidmarcus » Sat 23 Jan 2016 17:27

I am getting an error:

Exception class: Exception
Exception message: malformed trail byte.

Stack Trace
[009FCF37]{NewRatingsCentralServerTest.exe} MyClasses.Utf8ToWs (Line 698, "MyClasses.pas" + 34)
...

Post by davidmarcus » Fri 29 Jan 2016 15:05

Any estimate of when a fix might be available?

Post by davidmarcus » Sat 06 Feb 2016 14:57

Viktor,

I would appreciate some idea of when this will be fixed. Weeks, months, years, never?

David

ViktorV
Devart Team
Posts: 3168
Joined: Wed 30 Jul 2014 07:16

Post by ViktorV » Fri 12 Feb 2016 15:46

We are investigating the issue and will notify you about the result as soon as possible.

Post by davidmarcus » Sun 10 Apr 2016 18:41

Viktor,

The following seems to work. The lines that I added are marked with "DJM". What do you think?

Code:

// Convert Utf8 buffer to WideString buffer with or without null terminator.
// Nearly copied from System.Utf8ToUnicode
function Utf8ToWs(
  const Dest: TValueArr; DestIdx: Cardinal; MaxDestBytes{w/wo #0}: Cardinal;
  const Source: TValueArr; SourceIdx, SourceBytes: Cardinal;
  const AddNull: boolean): Cardinal{bytes w/wo #0};
var
  i: Cardinal;
  c: Byte;
  wc: Cardinal;
begin
  (*OFS('+Utf8ToWs ' +
    'DestIdx = ' + IntToStr(DestIdx) + ', MaxDestBytes = ' + IntToStr(MaxDestBytes) {+ ', Length(Dest) = ' + IntToStr(Length(Dest))} +
    ', SourceIdx = ' + IntToStr(SourceIdx) + ', SourceBytes = ' + IntToStr(SourceBytes));
    OFS(Source + SourceIdx, SourceBytes);
  *)

  if false then begin // DJM
  Assert(Source <> nil);
  Assert(Dest <> nil);

  Result := 0;
  i := SourceIdx;
  while i < SourceIdx + SourceBytes do
  begin
    wc := Cardinal(Source[Integer(i)]);
    Inc(i);
    if (wc and $80) <> 0 then
    begin
      if i >= SourceIdx + SourceBytes then
        raise Exception.Create('incomplete multibyte char');
      wc := wc and $3F;
      if (wc and $20) <> 0 then
      begin
        c := Byte(Source[Integer(i)]);
        Inc(i);
        if (c and $C0) <> $80 then
          raise Exception.Create('malformed trail byte or out of range char');
        if i >= SourceIdx + SourceBytes then
          raise Exception.Create('incomplete multibyte char');
        wc := (wc shl 6) or (c and $3F);
      end;
      c := Byte(Source[Integer(i)]);
      Inc(i);
      if ((c and $C0) <> $80) and (c > $80) then
        raise Exception.Create('malformed trail byte');
      wc := (wc shl 6) or (c and $3F);
    end;

    // Assert(Result + 1 < MaxDestBytes, 'Result + 1 >= MaxDestBytes, Result = ' + IntToStr(Result) + ', MaxDestBytes = ' + IntToStr(MaxDestBytes));
    if not (Result + 1 < MaxDestBytes) then
      Break;
    Cardinal(PtrOffset(Dest, DestIdx + Result)^) := wc;
    Inc(Result, sizeof(WideChar));
  end;
  end else // DJM
     Result := cardinal( UnicodeFromLocaleChars( CP_UTF8, 0, addr( Source[ SourceIdx ] ), integer( SourceBytes ), // DJM
                                                 addr( Dest[ DestIdx ] ), integer( MaxDestBytes ) div sizeof( WideChar ) ) ) // DJM
               * sizeof( WideChar ); // DJM

  if AddNull then begin
    // Assert(Result < MaxDestBytes, 'Result >= MaxDestBytes, Result = ' + IntToStr(Result) + ', MaxDestBytes = ' + IntToStr(MaxDestBytes));
    if Result < MaxDestBytes then
      Marshal.WriteInt16(Dest, Integer(DestIdx + Result), 0)
    else
    begin
      Result := MaxDestBytes - sizeof(WideChar);
      Marshal.WriteInt16(Dest, Integer(DestIdx + Result), 0);
    end;
    Inc(Result, sizeof(WideChar));
  end;
  //OFS('-Utf8ToWs');
end;
David
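
For context on why the hand-written decoder above cannot cope with utf8mb4 data: its loop reads at most one lead byte plus two trail bytes, so a four-byte sequence (lead byte $F0..$F4) is never decoded as a single character. The following is a minimal, self-contained sketch, not MyDAC's actual fix. The names Utf8FourByteDemo and DecodeUtf8CodePoint are hypothetical, and the code assumes Delphi 2009 or later, or Free Pascal in Delphi mode. It decodes one UTF-8 sequence of one to four bytes and shows the UTF-16 surrogate pair that a character outside the BMP requires.

```pascal
program Utf8FourByteDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: decodes one UTF-8 sequence starting at Bytes[Idx]
// and advances Idx past it. It validates trail bytes but, to stay short,
// does not reject overlong forms or encoded surrogate code points.
function DecodeUtf8CodePoint(const Bytes: array of Byte; var Idx: Integer): Cardinal;
var
  Lead: Byte;
  TrailCount, k: Integer;
begin
  Lead := Bytes[Idx];
  Inc(Idx);
  Result := Lead;      // correct as-is for a single-byte (ASCII) character
  TrailCount := 0;

  if Lead >= $80 then
  begin
    if (Lead and $E0) = $C0 then
    begin
      Result := Lead and $1F;   // 2-byte sequence
      TrailCount := 1;
    end
    else if (Lead and $F0) = $E0 then
    begin
      Result := Lead and $0F;   // 3-byte sequence
      TrailCount := 2;
    end
    else if (Lead and $F8) = $F0 then
    begin
      Result := Lead and $07;   // 4-byte sequence (needs utf8mb4 in MySQL)
      TrailCount := 3;
    end
    else
      raise Exception.Create('invalid lead byte');
  end;

  for k := 1 to TrailCount do
  begin
    if Idx > High(Bytes) then
      raise Exception.Create('incomplete multibyte char');
    if (Bytes[Idx] and $C0) <> $80 then
      raise Exception.Create('malformed trail byte');
    Result := (Result shl 6) or Cardinal(Bytes[Idx] and $3F);
    Inc(Idx);
  end;
end;

const
  // The byte sequence from the "Incorrect string value" error in this thread.
  Sample: array[0..3] of Byte = ($F0, $A0, $80, $81);
var
  Idx: Integer;
  CodePoint, V: Cardinal;
begin
  Idx := 0;
  CodePoint := DecodeUtf8CodePoint(Sample, Idx);
  Writeln('Code point: U+', IntToHex(Integer(CodePoint), 4));

  if CodePoint > $FFFF then
  begin
    // Outside the BMP: UTF-16 needs two WideChar units (a surrogate pair).
    V := CodePoint - $10000;
    Writeln('High surrogate: $', IntToHex(Integer($D800 + (V shr 10)), 4));
    Writeln('Low surrogate:  $', IntToHex(Integer($DC00 + (V and $3FF)), 4));
  end;
end.
```

For the sample bytes the sketch reports code point U+20001 and the surrogate pair $D840 $DC01, so a destination buffer must reserve two WideChar units for each such character.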

Post by ViktorV » Tue 12 Apr 2016 12:26

Thank you for your interest in our products. We will review your suggestion regarding the MyDAC code and let you know the result.

Post by davidmarcus » Sat 16 Apr 2016 15:50

A new release with an official fix would be appreciated.

Post by ViktorV » Tue 26 Apr 2016 07:49

The new MyDAC 8.7.22 with support for the utf8mb4 charset is now available for download.

Post by davidmarcus » Tue 26 Apr 2016 11:37

ViktorV wrote: The new MyDAC 8.7.22 with support for the utf8mb4 charset is now available for download.

Great! Thanks.

Post by ViktorV » Tue 26 Apr 2016 11:54

Thank you for your interest in our products.
If you have any questions while using our products, please don't hesitate to contact us, and we will try to help you resolve them.

robert84
Posts: 8
Joined: Sun 28 Aug 2022 16:51

Post by robert84 » Mon 29 Aug 2022 16:30

davidmarcus wrote: Wed 13 Jan 2016 00:36
That works better. I tried some Japanese in the Basic Multilingual Plane, and it worked. But, when I tried a Japanese character not in the Basic Multilingual Plane, I got
#HY000Incorrect string value: '\xF0\xA0\x80\x81' for column 'Name' at row 1
Do you know if it should work for such characters?

If you have enabled UseUnicode, MyDAC uses 'utf8' internally, which means only BMP characters are supported. For full UTF-8 support you need 'utf8mb4'.

Unfortunately, it's not clear whether you can use 'utf8mb4' in combination with UseUnicode (I recently started a thread about this). I'm waiting for clarification from the MyDAC developers on this matter.
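
One way to tell whether your data actually needs 'utf8mb4' is to look for characters outside the BMP before a string is sent to the server. In a Delphi UnicodeString such characters are stored as surrogate pairs, so for well-formed text it is enough to find a high surrogate ($D800..$DBFF). This is only a minimal sketch, assuming Delphi 2009 or later, or Free Pascal in Delphi mode. ContainsNonBmpChar and NonBmpCheckDemo are hypothetical names and are not part of MyDAC.

```pascal
program NonBmpCheckDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: returns True if S contains a UTF-16 high surrogate,
// i.e. a character outside the BMP that takes four bytes in UTF-8.
function ContainsNonBmpChar(const S: UnicodeString): Boolean;
var
  i: Integer;
begin
  Result := False;
  // Assumes 1-based string indexing (the desktop compiler default).
  for i := 1 to Length(S) do
    if (Ord(S[i]) >= $D800) and (Ord(S[i]) <= $DBFF) then
    begin
      Result := True;
      Exit;
    end;
end;

var
  BmpText, SupplementaryText: UnicodeString;
begin
  // U+65E5 U+672C: two kanji that both lie inside the BMP.
  BmpText := WideChar($65E5);
  BmpText := BmpText + WideChar($672C);

  // U+20001 (UTF-8 bytes F0 A0 80 81) as the surrogate pair $D840 $DC01.
  SupplementaryText := WideChar($D840);
  SupplementaryText := SupplementaryText + WideChar($DC01);

  Writeln('BMP only:      ', ContainsNonBmpChar(BmpText));
  Writeln('Supplementary: ', ContainsNonBmpChar(SupplementaryText));
end.
```

The first string is reported as BMP-only and the second as containing a supplementary character. The second is the character from the quoted error, which a three-byte 'utf8' column or connection cannot store.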

pavelpd
Devart Team
Posts: 109
Joined: Thu 06 Jan 2022 14:16

Post by pavelpd » Thu 15 Sep 2022 07:33

Hi, Robert

We have responded to your request in this topic: viewtopic.php?f=7&t=58439
