Page 1 of 1

PostgreSQL lookup - inefficient query

Posted: Thu 12 Jul 2018 10:27
by mike_d
Hi,

I'm using Devart SSIS Data Flow components (v 1.7.672) for SQL Server => Postgres data migration.

Trying to investigate weak performance of Devart PostgreSQL Lookup component, I profiled queries generated by it. Long story short - I got the following query (object names changed but logic stays the same):

Code: Select all

SELECT col1, col2, col3, col4 
  FROM my_table 
 WHERE (col1 = 1) OR 
       (col1 = 2) OR 
       (...)
       (col1 = 10000)
- there are 10000 OR conditions (this seems to be batch size for Lookup component).

Problem is, that type of query generates the following execution plan in Postgres (tested on Postgres 9.5 and 10):

Code: Select all

Bitmap Heap Scan on my_table  (cost=4686.00..10491.10 rows=1014 width=12) (actual time=3.963..6.279 rows=1020 loops=1)
  Recheck Cond: ((col1 = 1) OR (col1 = 2) OR (...) OR (col1 = 10000))
  Heap Blocks: exact=760
  ->  BitmapOr  (cost=4686.00..4686.00 rows=1015 width=0) (actual time=3.780..3.780 rows=0 loops=1)
        ->  Bitmap Index Scan on ix_on_my_table_on_col_1  (cost=0.00..4.43 rows=1 width=0) (actual time=0.030..0.030 rows=1 loops=1)
              Index Cond: (col1 = 1)
        ->  Bitmap Index Scan on ix_on_my_table_on_col_1  (cost=0.00..4.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
              Index Cond: (col1 = 2)       
       (...)
        ->  Bitmap Index Scan on ix_on_my_table_on_col_1  (cost=0.00..4.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
              Index Cond: (col1 = 10000)     
Basically, for those 10000 conditions I get 10000 index scans which is.... suboptimal to say the least ;)

Now, if that query is rewritten to the following form (which produces exactly the same results):

Code: Select all

SELECT col1, col2, col3, col4 
  FROM my_table 
 WHERE col1 IN (1, 2, (...), 10000)  
I get way better plan - one that executes in milliseconds.

Code: Select all

Index Scan using ix_on_my_table_on_col_1 on my_table  (cost=0.42..14.73 rows=2 width=12) (actual time=0.053..0.059 rows=2 loops=1)
  Index Cond: (col1 = ANY ('{1,2,(...), 10000}'::integer[]))             
My question is - any chance for changes in Lookup component for Postgres that would generate <column> in (all values) query instead of <column> = value OR <column> = value OR....
?

While I understand it's a postgres issue, unfortunately in current form lookup component in unusable because of performance reasons :/

Re: PostgreSQL lookup - inefficient query

Posted: Fri 13 Jul 2018 13:32
by Pinturiccio
Thank you for the provided information. We will investigate the possibility to replace OR conditions with one IN condition and post here when we get any results.

Re: PostgreSQL lookup - inefficient query

Posted: Wed 18 Jul 2018 14:06
by Pinturiccio
We have changed the behaviour: SSIS Lookup component now creates one IN clause instead of many OR clauses. We post here you when the corresponding build of SSIS Data Flow Components is available for download.

Re: PostgreSQL lookup - inefficient query

Posted: Thu 19 Jul 2018 14:04
by Pinturiccio
New build of Devart SSIS Data Flow Components 1.7.747 is available for download now!
It can be downloaded from https://www.devart.com/ssis/download.html or from Customer Portal (for users with valid subscription only).
For more information, please refer to viewtopic.php?t=37480

Re: PostgreSQL lookup - inefficient query

Posted: Fri 24 Aug 2018 10:01
by mike_d
Hi Pinturiccio,

I tested most recent version and indeed it generates IN condition - thanks a lot for quick response and fix, appreciate!