PostgreSQL lookup - inefficient query

Discussion of issues, suggestions and bugs of Devart SSIS Data Flow Components, our product line for building SSIS-based ETL solutions, performing data access to popular cloud applications and databases.
Post Reply
mike_d
Posts: 2
Joined: Thu 05 Apr 2018 12:31

PostgreSQL lookup - inefficient query

Post by mike_d » Thu 12 Jul 2018 10:27

Hi,

I'm using Devart SSIS Data Flow components (v 1.7.672) for SQL Server => Postgres data migration.

Trying to investigate weak performance of Devart PostgreSQL Lookup component, I profiled queries generated by it. Long story short - I got the following query (object names changed but logic stays the same):

Code: Select all

SELECT col1, col2, col3, col4 
  FROM my_table 
 WHERE (col1 = 1) OR 
       (col1 = 2) OR 
       (...)
       (col1 = 10000)
- there are 10000 OR conditions (this seems to be batch size for Lookup component).

Problem is, that type of query generates the following execution plan in Postgres (tested on Postgres 9.5 and 10):

Code: Select all

Bitmap Heap Scan on my_table  (cost=4686.00..10491.10 rows=1014 width=12) (actual time=3.963..6.279 rows=1020 loops=1)
  Recheck Cond: ((col1 = 1) OR (col1 = 2) OR (...) OR (col1 = 10000))
  Heap Blocks: exact=760
  ->  BitmapOr  (cost=4686.00..4686.00 rows=1015 width=0) (actual time=3.780..3.780 rows=0 loops=1)
        ->  Bitmap Index Scan on ix_on_my_table_on_col_1  (cost=0.00..4.43 rows=1 width=0) (actual time=0.030..0.030 rows=1 loops=1)
              Index Cond: (col1 = 1)
        ->  Bitmap Index Scan on ix_on_my_table_on_col_1  (cost=0.00..4.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
              Index Cond: (col1 = 2)       
       (...)
        ->  Bitmap Index Scan on ix_on_my_table_on_col_1  (cost=0.00..4.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
              Index Cond: (col1 = 10000)     
Basically, for those 10000 conditions I get 10000 index scans which is.... suboptimal to say the least ;)

Now, if that query is rewritten to the following form (which produces exactly the same results):

Code: Select all

SELECT col1, col2, col3, col4 
  FROM my_table 
 WHERE col1 IN (1, 2, (...), 10000)  
I get way better plan - one that executes in milliseconds.

Code: Select all

Index Scan using ix_on_my_table_on_col_1 on my_table  (cost=0.42..14.73 rows=2 width=12) (actual time=0.053..0.059 rows=2 loops=1)
  Index Cond: (col1 = ANY ('{1,2,(...), 10000}'::integer[]))             
My question is - any chance for changes in Lookup component for Postgres that would generate <column> in (all values) query instead of <column> = value OR <column> = value OR....
?

While I understand it's a postgres issue, unfortunately in current form lookup component in unusable because of performance reasons :/

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Re: PostgreSQL lookup - inefficient query

Post by Pinturiccio » Fri 13 Jul 2018 13:32

Thank you for the provided information. We will investigate the possibility to replace OR conditions with one IN condition and post here when we get any results.

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Re: PostgreSQL lookup - inefficient query

Post by Pinturiccio » Wed 18 Jul 2018 14:06

We have changed the behaviour: SSIS Lookup component now creates one IN clause instead of many OR clauses. We post here you when the corresponding build of SSIS Data Flow Components is available for download.

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Re: PostgreSQL lookup - inefficient query

Post by Pinturiccio » Thu 19 Jul 2018 14:04

New build of Devart SSIS Data Flow Components 1.7.747 is available for download now!
It can be downloaded from https://www.devart.com/ssis/download.html or from Customer Portal (for users with valid subscription only).
For more information, please refer to viewtopic.php?t=37480

mike_d
Posts: 2
Joined: Thu 05 Apr 2018 12:31

Re: PostgreSQL lookup - inefficient query

Post by mike_d » Fri 24 Aug 2018 10:01

Hi Pinturiccio,

I tested most recent version and indeed it generates IN condition - thanks a lot for quick response and fix, appreciate!

Post Reply