Performance issue with LEFT OUTER JOIN on a CHAR field

alheureu · Post by **alheureu** » Mon 25 Feb 2013 20:24

Hi, here's another issue I face with a CHAR field. I do wish my DB was not full of fields of this old datatype. Anyway, here's how it goes:

Let's suppose I have table A with the two columns:
PGM_CD CHAR(4) NOT NULL, Primary Key
PGM_DESC CHAR(63) NOT NULL;

And table B with two columns and a foreign key to table A:
ETA_CD CHAR(2) NOT NULL, Primary Key Part A, Foreign Key to table A
PGM_CD CHAR(4) NOT NULL, Primary Key Part B
ETA_DESC CHAR(15);

Now, suppose I want to load all entities from table B and eager load related entites from table A. The following SQL will be generated by dotConnect:

SELECT [columns...]
FROM PRISME.PGM_ETABLISSEMENT t1
LEFT OUTER JOIN PRISME.PROGRAMME t2
ON (t1.PGM_CD = t2.PGM_CD)
OR ((TRIM(t1.PGM_CD) IS NULL) AND (TRIM(t2.PGM_CD) IS NULL))
ORDER BY t1.PGM_CD
-- Context: Devart.Data.Oracle.Linq.Provider.OracleDataProvider Model: MetaModel Build: 4.1.184.0

This statement takes 12 secondes to execute which is too long.
But take the same statement and just remove this part:

« OR ((TRIM(t1.PGM_CD) IS NULL) AND (TRIM(t2.PGM_CD) IS NULL)) »

Then the statement takes less than 2 seconds to execute.
So, I wonder if there's an option that enables me to get rid of that line from the generated SQL statement?

My guess is that it is not useful anyway because I specified that the column PGM_CD is not nullable. Isn't this line only necessary if the field is nullable?

As always your help is really appreciated. Thanks!

StanislavK · Post by **StanislavK** » Wed 27 Feb 2013 15:24

Thank you for the report, we have reproduced this behaviour. We will analyze the possibility of removing the second condition of the join.

alheureu · Post by **alheureu** » Fri 22 Mar 2013 19:51

We will analyze the possibility of removing the second condition of the join.

Did you have any time to analyze the possibility? If so, will the condition be removed or not?
Also, do you think this change would be made in a short time frame, let's say the next quarter?

Sorry to insist, but let me explain myself. I have a lot of char fields that are part of a key in the database I am working with and thus the performance issues are found in many parts of my application. The performance is so bad that I have to find an alternative. Now, I need to decide if I'm gonna live with this issue for a short time in expectation of a solution in a short-term future release or if I have to think of a plan B, like using another persistence framework.

Thanks for your consideration.

MariiaI · Post by **MariiaI** » Mon 25 Mar 2013 12:51

We have investigated this behaviour. Provided that the additional checks for possibly empty fixed chars are added, these checks will be added to the queries for loading related entities as well:
- first, the query for loading related entities is prepared regardless of the particular primary key values and cached (in general cases, this improves performance);
- second, when trimming is enabled, any fixed char (even that explicitly marked as 'not null') is potentially an empty string (i.e., a null), so the null checks are necessary for these queries.

As a workaround, you can execute queries with explicit primary key checks like:

Code: Select all

var list1 = dc.PGM_ETABLISSEMENTs.First();
var list2 = (list1.PROGRAMME.Where(p => p.PGM_CD == p.PGM_CD)).ToList();

Also, you could try adding indexes on primary key and foreign key with the 'Trim' function (this also would speed up executing of the query):

Code: Select all

CREATE INDEX FixCharTrimPK ON PGM_ETABLISSEMENT(TRIM(PGM_CD))
CREATE INDEX FixCharTrimFK ON PROGRAMME(TRIM(PGM_CD))

alheureu · Post by **alheureu** » Mon 25 Mar 2013 18:01

Hi, first of all, thanks for your quick answer, I appreciate!

Sadly, the workarounds will not help me.

you can execute queries with explicit primary key checks like:

This will not work for two reasons.
A) Each Pgm_etablissement only has one Programme (many to one relationship).
So I cannot do «list1.programme.where()» as programme is not a list of entity but only one entity. Or would there be a way to specify explicit primary key check for a single entity?

B) Even if it could be done, this would mean I go from a single sql statement (reading from two joined tables) to multiple sql statements (one to read the Pgm_etablissement and then one more for each of these pgm_etablissement to go fetch its programme). So, imagine I have to read X pgm_etablissement then X+1 queries would be generated. Can rapidly become a performance issue.

you could try adding indexes on primary key and foreign key with the 'Trim' function

Function based index are not allowed in the database I am working with. I will not explain the details, but I can assure you it is for a good reason and I can't do anything with it.
Anyway, I still tried to add the indexes just to see the impact and it did not prove to be helpful. It did not even make a significant improvement.

Also, isn't it fundamentally wrong to do that:
OR ((TRIM(t1.PGM_CD) IS NULL) AND (TRIM(t2.PGM_CD) IS NULL)).

I mean:
1)NULL shoud not be considered equal to NULL,
2)In a case other than mine, where the keys could be null, this statement could join rows for table A with a null foreign key to rows from table B with a five blanks key. Blanks are not equal to null.

MariiaI · Post by **MariiaI** » Wed 27 Mar 2013 12:49

Your arguments are completely correct. We are considering the possibility of removing the second condition of the join in such situations, but we cannot provide any timeframe at the moment.

You could try using 'join' explicitly instead of 'LoadWith' in your queries,e g.:

Code: Select all

var list = (from t in dc.PROGRAMME
               join x in dc.PGM_ETABLISSEMENTs on t.PGM_CD equals x.PGM_CD
               select new { childID = t.PGM_CD, parentID = x.PGM_CD});

This way, you should get the proper entities with the join based on the indexed columns only. However, the navigation properties won't be loaded; i.e., accessing a certain child's 'PGM_ETABLISSEMENT' property will result in executing an additional query (with checks on trimmed columns).

Devart Forums

Performance issue with LEFT OUTER JOIN on a CHAR field

Performance issue with LEFT OUTER JOIN on a CHAR field

Re: Performance issue with LEFT OUTER JOIN on a CHAR field

Re: Performance issue with LEFT OUTER JOIN on a CHAR field

Re: Performance issue with LEFT OUTER JOIN on a CHAR field

Re: Performance issue with LEFT OUTER JOIN on a CHAR field

Re: Performance issue with LEFT OUTER JOIN on a CHAR field