Multilingual Support with Unicode Datatypes
Do I understand correctly that ODAC does not support Multilingual Support with Unicode Datatypes (http://docs.oracle.com/cd/B19306_01/ser ... m#CACHCAHF)?
I have a database with an ANSI encoding, but some columns in my tables may have to be of a national type (NCHAR, NVARCHAR2, NCLOB). If I set Session.UseUnicode to True, then all my string columns become TWideStringField, and I can get data loss when such a field is saved into an ANSI column in the database.
Re: Multilingual Support with Unicode Datatypes
We have checked Multilingual Support when using national type columns in tables and found no issues (DB: Oracle 11g, CHARACTER SET: AL32UTF8, NATIONAL CHARACTER SET: AL16UTF16). Please describe the problem in more detail. Compose a small sample demonstrating the issue and send it to us, including the table creation script used in the sample.
Re: Multilingual Support with Unicode Datatypes
I have prepared an example. If UseUnicode = False, the user immediately sees that Unicode characters are not allowed when entering them. If UseUnicode = True, the user gets the illusion that Unicode characters can be entered into non-Unicode columns, which can lead to data loss.
Delphi project:
https://onedrive.live.com/redir?resid=D ... file%2czip
screencast:
https://onedrive.live.com/redir?resid=D ... file%2cmkv
Re: Multilingual Support with Unicode Datatypes
We have received your sample and reproduced its behavior in our test environment. When using Oracle with CHARACTER SET=AL32UTF8 and NATIONAL CHARACTER SET=AL16UTF16, the '¥' character is saved correctly. When using another code page as the CHARACTER SET (e.g., CL8MSWIN1251), the '¥' character is saved in a VARCHAR2 field incorrectly, since that encoding doesn't support it. You can verify this even without ODAC; it is enough to execute the following query in SQL*Plus:
Update UniCodeTypesTest Set SANSI=SUniCode
So, use a database CHARACTER SET that supports the required characters.
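The lossy conversion described above can be illustrated entirely outside Oracle. Here is a minimal Python sketch (an illustrative model of the server-side conversion, not Oracle itself) of what happens when '¥' is forced into the CL8MSWIN1251 character set, which corresponds to Windows code page 1251:

```python
# Simulate storing '¥' (U+00A5) in a VARCHAR2 column whose database
# character set is CL8MSWIN1251 (Windows code page 1251).
yen = "\u00a5"

# Windows-1251 is a Cyrillic code page with no yen sign at all,
# so a strict conversion fails outright...
try:
    yen.encode("cp1251")
    stored = None
except UnicodeEncodeError:
    # ...and a lossy conversion (what the server effectively does)
    # substitutes the replacement character '?'.
    stored = yen.encode("cp1251", errors="replace")

print(stored)                   # b'?'
print(stored.decode("cp1251"))  # '?' - the original character is gone
```

Once the byte b'?' is stored, no later conversion can recover the yen sign, which is why the loss only becomes visible after the value is read back.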
Re: Multilingual Support with Unicode Datatypes
Right. Oracle provides two modes of working with Unicode: a Unicode database and Unicode datatypes. When using a Unicode database, all string types (CHAR, VARCHAR2, CLOB) become Unicode, so there is no need to use the national datatypes (NCHAR, NVARCHAR2). For a number of reasons we cannot use a Unicode database, so we cannot set CHARACTER SET=AL32UTF8 and can only use the Unicode (national) datatypes. The original question was whether the ODAC components support the mode of operation that uses only Unicode datatypes, as described in http://docs.oracle.com/cd/B19306_01/ser ... nicode.htm and http://docs.oracle.com/cd/B19306_01/ser ... m#CACHCAHF
Re: Multilingual Support with Unicode Datatypes
Try executing the following statements in the SQL*Plus environment:
A. Update UniCodeTypesTest Set SUniCode='¥'
B. Update UniCodeTypesTest Set SANSI=SUniCode
C. Select SUniCode, SANSI From UniCodeTypesTest
Let us know the result returned by query C.
Re: Multilingual Support with Unicode Datatypes
SUNICODE
--------------------------------------------------
SANSI
--------------------------------------------------
?
?
UPD1
Our NLS Params:
The NLS_LANG environment variable on my host is AMERICAN_AMERICA.CL8MSWIN1251
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET CL8MSWIN1251
NLS_COMP BINARY
NLS_CURRENCY $
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_DUAL_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_LANGUAGE AMERICAN
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS .,
NLS_RDBMS_VERSION 11.2.0.4.0
NLS_SORT BINARY
NLS_TERRITORY AMERICA
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
UPD2
Nothing changed after switching NLS_LANG to UTF8:
https://onedrive.live.com/redir?resid=D ... hoto%2cPNG
Re: Multilingual Support with Unicode Datatypes
We are continuing to investigate Multilingual Support with Unicode Datatypes in ODAC. Please try the following steps:
1) Using your application, set the SUniCode field value to '¥';
2) In the same way, set the SANSI field value to '¥';
3) Using SQL*Plus, check which SUniCode and SANSI values were actually saved in the database. For this, run Command Prompt with settings that support the required code page:
- set the font to "Lucida Console" in the Command Prompt properties;
- after starting Command Prompt, set the current code page to 1252 (command: chcp 1252);
- run SQL*Plus in Command Prompt;
- execute the query: Select SANSI, SUniCode From UniCodeTypesTest.
4) Let us know what SUniCode and SANSI field values you get.
5) If SUniCode='¥' and SANSI='?', try to get SUniCode='¥' and SANSI='¥' in your DB instance in any way without using ODAC.
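The chcp 1252 setting in step 3 matters because the same raw byte is displayed differently under different console code pages. A small Python illustration (the byte value is an assumption chosen for the example, not data taken from the actual database):

```python
# The yen sign occupies byte 0xA5 in Windows-1252, but in Windows-1251
# (which corresponds to the CL8MSWIN1251 database character set here)
# that same byte is the Cyrillic letter 'Ґ'.
raw = b"\xa5"

print(raw.decode("cp1252"))  # '¥' - what a 'chcp 1252' console shows
print(raw.decode("cp1251"))  # 'Ґ' - what a 'chcp 1251' console would show
```

So a console left on the wrong code page can make a correctly stored value look corrupted, or vice versa, which is why the steps pin the code page explicitly before inspecting the data.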
Re: Multilingual Support with Unicode Datatypes
The result is the same: https://onedrive.live.com/redir?resid=D ... hoto%2cPNG
However, I do not quite understand what this experiment is supposed to show. The problem is that, with ODAC, the SANSI field on the client is a TWideStringField, so the grid control creates a column with Unicode support, even though the SANSI field in the database is not Unicode. If ODAC exposed the field as TStringField (not Unicode), the grid editor would create a column with the AnsiString data type, and as a result, entering Unicode values into such a column would immediately make the data loss obvious.
Re: Multilingual Support with Unicode Datatypes
Data loss occurs because Oracle can't store the '¥' value in a VARCHAR2 field when the CHARACTER SET is CL8MSWIN1251. That is why you couldn't insert the required value into the SANSI field even without using ODAC (this is exactly what was checked in the previous post). When TWideStringField fields are used in ODAC (UseUnicode=True), their values are passed to the server in UTF-16. The Oracle server then converts these values according to its CHARACTER SET; characters absent from the server's CHARACTER SET are lost (they become '?'). This is correct Oracle behavior. When TWideStringField fields are retrieved from the server, their values are returned in UTF-16 as well, so if a character was lost on saving to a database field, the TWideStringField value will contain '?'.
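The round trip described above can be sketched in Python (a simplified model; the actual conversions happen inside the Oracle client libraries and the server):

```python
# Client side: with UseUnicode=True, the TWideStringField value
# is sent to the server as UTF-16.
client_value = "\u00a5"                      # '¥'
wire = client_value.encode("utf-16-le")

# Server side: the value is converted to the database character set
# (CL8MSWIN1251 here); unsupported characters are replaced with '?'.
stored = wire.decode("utf-16-le").encode("cp1251", errors="replace")

# Fetch: the stored bytes are converted back for the client,
# again as UTF-16.
fetched = stored.decode("cp1251")

print(fetched)  # '?' - the loss only becomes visible after the round trip
```

This is why the client application cannot detect the problem at input time: the wide string is perfectly valid on the client, and the substitution happens on the server.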
Re: Multilingual Support with Unicode Datatypes
All right, you have correctly described the consequences. However, once again: our database is not Unicode. If we set UseUnicode=True, all string fields become TWideStringField and all their values are Unicode, so we can learn about data loss only after saving and calling Dataset.Refresh. If the ANSI columns remained TStringField fields, users would know about data loss already while entering text into the cell. Moreover, at the application-logic level we cannot work with such a field properly, because we cannot tell whether it is really Unicode or ANSI; relying on implicit type conversion is dangerous. This also causes a number of other difficulties, e.g., copying Unicode text to the clipboard instead of ANSI, situations where an incident might occur, and so on. The problem is that ODAC supports only one of Oracle's Unicode modes of operation: the one where the entire database is Unicode. Using ODAC when only some columns have Unicode datatypes is dangerous and fraught with consequences.
Re: Multilingual Support with Unicode Datatypes
Using the UseUnicode property in ODAC is completely safe for any combination of field types returned by a query. The issue is not caused by using the TWideStringField type: you can convert it to any compatible type using Data Type Mapping: https://www.devart.com/odac/docs/?data_type_mapping.htm . For example, all VARCHAR2 fields can be mapped explicitly to the Delphi String type:
uses OraDataTypeMap;
…
// Map every VARCHAR2 column to ftString (TStringField)
// instead of the default ftWideString (TWideStringField).
OraQuery1.DataTypeMap.Clear;
OraQuery1.DataTypeMap.AddDBTypeRule(oraVarchar2, ftString);
...
The fact is that Oracle is unable to store characters in VARCHAR2 fields if their encoding is not supported by the server code page (the CHARACTER_SET parameter). This is possible only for NVARCHAR2 fields.