Changing from latin1 to utf8mb4
-
- Posts: 50
- Joined: Tue 25 Jan 2005 11:22
- Location: Somerville, MA
- Contact:
Re: Changing from latin1 to utf8mb4
I am getting an error:
Exception class: Exception
Exception message: malformed trail byte.
Stack Trace
[009FCF37]{NewRatingsCentralServerTest.exe} MyClasses.Utf8ToWs (Line 698, "MyClasses.pas" + 34)
...
Re: Changing from latin1 to utf8mb4
Any estimate of when a fix might be available?
Re: Changing from latin1 to utf8mb4
Viktor,
I would appreciate some idea of when this will be fixed. Weeks, months, years, never?
David
Re: Changing from latin1 to utf8mb4
We are investigating the issue and will notify you about the result as soon as possible.
Re: Changing from latin1 to utf8mb4
Viktor,
The following seems to work. The lines that I added are marked with "DJM". What do you think?
David
```pascal
// Convert Utf8 buffer to WideString buffer with or without null terminator.
// Nearly copied from System.Utf8ToUnicode
function Utf8ToWs(
  const Dest: TValueArr; DestIdx: Cardinal; MaxDestBytes{w/wo #0}: Cardinal;
  const Source: TValueArr; SourceIdx, SourceBytes: Cardinal;
  const AddNull: boolean): Cardinal{bytes w/wo #0};
var
  i: Cardinal;
  c: Byte;
  wc: Cardinal;
begin
  (*OFS('+Utf8ToWs ' +
    'DestIdx = ' + IntToStr(DestIdx) + ', MaxDestBytes = ' + IntToStr(MaxDestBytes) {+ ', Length(Dest) = ' + IntToStr(Length(Dest))} +
    ', SourceIdx = ' + IntToStr(SourceIdx) + ', SourceBytes = ' + IntToStr(SourceBytes));
  OFS(Source + SourceIdx, SourceBytes);
  *)
  // Original decoder, now disabled: it only decodes 1- to 3-byte sequences,
  // so the fourth byte of a utf8mb4 character is misread as a new lead byte,
  // which can raise 'malformed trail byte'.
  if false then begin // DJM
    Assert(Source <> nil);
    Assert(Dest <> nil);
    Result := 0;
    i := SourceIdx;
    while i < SourceIdx + SourceBytes do
    begin
      wc := Cardinal(Source[Integer(i)]);
      Inc(i);
      if (wc and $80) <> 0 then
      begin
        if i >= SourceIdx + SourceBytes then
          raise Exception.Create('incomplete multibyte char');
        wc := wc and $3F;
        if (wc and $20) <> 0 then
        begin
          c := Byte(Source[Integer(i)]);
          Inc(i);
          if (c and $C0) <> $80 then
            raise Exception.Create('malformed trail byte or out of range char');
          if i >= SourceIdx + SourceBytes then
            raise Exception.Create('incomplete multibyte char');
          wc := (wc shl 6) or (c and $3F);
        end;
        c := Byte(Source[Integer(i)]);
        Inc(i);
        if ((c and $C0) <> $80) and (c > $80) then
          raise Exception.Create('malformed trail byte');
        wc := (wc shl 6) or (c and $3F);
      end;
      // Assert(Result + 1 < MaxDestBytes, 'Result + 1 >= MaxDestBytes, Result = ' + IntToStr(Result) + ', MaxDestBytes = ' + IntToStr(MaxDestBytes));
      if not (Result + 1 < MaxDestBytes) then
        Break;
      Cardinal(PtrOffset(Dest, DestIdx + Result)^) := wc;
      Inc(Result, sizeof(WideChar));
    end;
  end else // DJM
    // Let the RTL convert; it returns the number of WideChars written,
    // with 4-byte sequences emitted as UTF-16 surrogate pairs.
    Result := cardinal( UnicodeFromLocaleChars( CP_UTF8, 0, addr( Source[ SourceIdx ] ), integer( SourceBytes ), // DJM
      addr( Dest[ DestIdx ] ), integer( MaxDestBytes ) div sizeof( WideChar ) ) ) // DJM
      * sizeof( WideChar ); // DJM
  if AddNull then begin
    // Assert(Result < MaxDestBytes, 'Result >= MaxDestBytes, Result = ' + IntToStr(Result) + ', MaxDestBytes = ' + IntToStr(MaxDestBytes));
    if Result < MaxDestBytes then
      Marshal.WriteInt16(Dest, Integer(DestIdx + Result), 0)
    else
    begin
      Result := MaxDestBytes - sizeof(WideChar);
      Marshal.WriteInt16(Dest, Integer(DestIdx + Result), 0);
    end;
    Inc(Result, sizeof(WideChar));
  end;
  //OFS('-Utf8ToWs');
end;
```
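For comparison, here is a minimal standalone sketch of what the disabled loop is missing. It assumes plain dynamic arrays (`TByteArr`/`TWideArr`, hypothetical stand-ins for MyDAC's `TValueArr` and `PtrOffset` internals), and `DecodeUtf8Sketch` is a hypothetical name, not MyDAC or RTL API. A lead byte of the form `11110xxx` carries three trail bytes, and the resulting code point above $FFFF has to be written as a UTF-16 surrogate pair. That is the case the `UnicodeFromLocaleChars(CP_UTF8, ...)` call handles. The sketch does not reject overlong forms or encoded surrogates.

```pascal
program Utf8ToUtf16Sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$C+} // keep Assert checks enabled

uses
  SysUtils;

type
  TByteArr = array of Byte;      // hypothetical stand-in for TValueArr
  TWideArr = array of WideChar;

// Hypothetical helper: decodes UTF-8 into UTF-16 code units, accepting
// 4-byte sequences and writing code points above $FFFF as surrogate pairs.
function DecodeUtf8Sketch(const Src: TByteArr): TWideArr;
var
  i, k, n, Extra: Integer;
  cp: Cardinal;
  b: Byte;
begin
  // UTF-16 never needs more code units than the UTF-8 input has bytes.
  SetLength(Result, Length(Src));
  n := 0;
  i := 0;
  while i < Length(Src) do
  begin
    b := Src[i];
    Inc(i);
    if b < $80 then begin cp := b; Extra := 0; end
    else if (b and $E0) = $C0 then begin cp := b and $1F; Extra := 1; end
    else if (b and $F0) = $E0 then begin cp := b and $0F; Extra := 2; end
    else if (b and $F8) = $F0 then begin cp := b and $07; Extra := 3; end
    else
      raise Exception.Create('invalid lead byte');
    if i + Extra > Length(Src) then
      raise Exception.Create('incomplete multibyte char');
    for k := 1 to Extra do
    begin
      b := Src[i];
      Inc(i);
      if (b and $C0) <> $80 then
        raise Exception.Create('malformed trail byte');
      cp := (cp shl 6) or (b and $3F);
    end;
    if cp > $10FFFF then
      raise Exception.Create('code point out of range');
    if cp > $FFFF then
    begin
      // Outside the BMP: emit a high/low surrogate pair.
      Dec(cp, $10000);
      Result[n] := WideChar(Word($D800 + (cp shr 10)));
      Result[n + 1] := WideChar(Word($DC00 + (cp and $3FF)));
      Inc(n, 2);
    end
    else
    begin
      Result[n] := WideChar(Word(cp));
      Inc(n);
    end;
  end;
  SetLength(Result, n);
end;

var
  Src: TByteArr;
  W: TWideArr;
begin
  // The 4-byte sequence from the #HY000 error quoted later in this thread.
  SetLength(Src, 4);
  Src[0] := $F0; Src[1] := $A0; Src[2] := $80; Src[3] := $81;
  W := DecodeUtf8Sketch(Src);
  Assert(Length(W) = 2);
  Assert((Ord(W[0]) = $D840) and (Ord(W[1]) = $DC01));
  Writeln(IntToHex(Ord(W[0]), 4), ' ', IntToHex(Ord(W[1]), 4));
end.
```

Running it prints `D840 DC01`, the surrogate pair for U+20001. A single `WideChar` slot per character, as in the legacy loop, cannot hold this.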
Re: Changing from latin1 to utf8mb4
Thank you for your interest in our products. We will consider your suggestion regarding the MyDAC code and let you know the result.
Re: Changing from latin1 to utf8mb4
A new release with an official fix would be appreciated.
Re: Changing from latin1 to utf8mb4
The new MyDAC 8.7.22 with support for utf8mb4 charset is already available for download now.
Re: Changing from latin1 to utf8mb4
ViktorV wrote: The new MyDAC 8.7.22 with support for utf8mb4 charset is already available for download now.
Great! Thanks.
Re: Changing from latin1 to utf8mb4
Thank you for being interested in our products.
If you have any questions while using our products, please don't hesitate to contact us, and we will try to help you solve them.
Re: Changing from latin1 to utf8mb4
davidmarcus wrote (Wed 13 Jan 2016 00:36):
That works better. I tried some Japanese in the Basic Multilingual Plane, and it worked. But when I tried a Japanese character not in the Basic Multilingual Plane, I got
#HY000Incorrect string value: '\xF0\xA0\x80\x81' for column 'Name' at row 1
Do you know if it should work for such characters?
If you have enabled UseUnicode, MyDAC uses 'utf8' internally, which means only BMP characters are supported. For full UTF-8 support you need 'utf8mb4'.
Unfortunately, it's not clear whether you can use 'utf8mb4' in combination with UseUnicode (I recently started a thread about this). I'm waiting for clarification from the MyDAC developers on this matter.
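The bytes in the quoted error, `\xF0\xA0\x80\x81`, decode to U+20001 (CJK Unified Ideographs Extension B). MySQL's 'utf8' (utf8mb3) stores at most 3 bytes per character, so anything outside the BMP is rejected. The following minimal sketch shows the difference by encoding a code point as UTF-8 and checking whether it fits in 3 bytes. `EncodeCodePointUtf8` and `FitsUtf8mb3` are hypothetical helpers, not MyDAC or MySQL API, and the encoder does not reject surrogate code points.

```pascal
program Utf8mb3Check;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$C+} // keep Assert checks enabled

uses
  SysUtils;

type
  TByteArr = array of Byte;

// Hypothetical helper: encodes one Unicode code point as UTF-8.
function EncodeCodePointUtf8(cp: Cardinal): TByteArr;
begin
  if cp < $80 then
  begin
    SetLength(Result, 1);
    Result[0] := Byte(cp);
  end
  else if cp < $800 then
  begin
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (cp shr 6));
    Result[1] := Byte($80 or (cp and $3F));
  end
  else if cp < $10000 then
  begin
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (cp shr 12));
    Result[1] := Byte($80 or ((cp shr 6) and $3F));
    Result[2] := Byte($80 or (cp and $3F));
  end
  else if cp <= $10FFFF then
  begin
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (cp shr 18));
    Result[1] := Byte($80 or ((cp shr 12) and $3F));
    Result[2] := Byte($80 or ((cp shr 6) and $3F));
    Result[3] := Byte($80 or (cp and $3F));
  end
  else
    raise Exception.Create('code point out of range');
end;

// Hypothetical check: MySQL's 3-byte 'utf8' (utf8mb3) only accepts
// characters whose UTF-8 form is at most 3 bytes, i.e. the BMP.
function FitsUtf8mb3(cp: Cardinal): Boolean;
begin
  Result := Length(EncodeCodePointUtf8(cp)) <= 3;
end;

var
  B: TByteArr;
  i: Integer;
begin
  B := EncodeCodePointUtf8($20001);  // CJK Extension B ideograph
  Assert(Length(B) = 4);
  Assert((B[0] = $F0) and (B[1] = $A0) and (B[2] = $80) and (B[3] = $81));
  Assert(FitsUtf8mb3($65E5));        // U+65E5, a BMP kanji: 3 bytes
  Assert(not FitsUtf8mb3($20001));   // needs 4 bytes: utf8mb4 only
  for i := 0 to High(B) do
    Write('\x', IntToHex(B[i], 2));
  Writeln;
end.
```

Running it prints `\xF0\xA0\x80\x81`, the same byte sequence MySQL reported as an incorrect string value for the 'utf8' column.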