PostgreSQL COPY TO command available?

Discussion of open issues, suggestions and bugs regarding ADO.NET provider for PostgreSQL
Post Reply
migle
Posts: 27
Joined: Tue 19 Jan 2010 13:52

PostgreSQL COPY TO command available?

Post by migle » Mon 22 Feb 2010 14:22

Hi,

For inserting data into the database faster than using INSERT commands, dotConnect has the PgSqlLoader class which uses PostgreSQL COPY FROM command and is faster.
Is there some similar way to use the PostgreSQL COPY TO command?

When using PgSqlCommand and PgSqlDataReader, I can only use a SELECT statement (can't manage to use COPY TO), and it is a bit slow.

Thanks

StanislavK
Devart Team
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Post by StanislavK » Tue 23 Feb 2010 14:42

There is no special class for data transfer from the database to a file in dotConnect for PostgreSQL, but you should be able to execute the "Copy To" command via the PgSqlCommand object.

Please specify the problem you are encountering in this case.

migle
Posts: 27
Joined: Tue 19 Jan 2010 13:52

Post by migle » Wed 24 Feb 2010 12:23

Ok, maybe I didn't try hard enough. I'll try harder when I can and then post again if I can't make it work.

migle
Posts: 27
Joined: Tue 19 Jan 2010 13:52

Post by migle » Tue 23 Mar 2010 16:52

Ok,

I had some time now. I misread your answer last time: you got it wrong, I don't want to transfer data to a file. I would like to know if it is possible to transfer data to the client in using COPY TO instead of a select.

Just as it is possible to transfer data TO the database using PgSqlReader, which translates to "COPY FROM", I would like to know if there is an optimized transfer of data to the client, which would translate to "COPY TO".

Something like the following, only it doesn't work.

Code: Select all

            string sql =
                "COPY (SELECT column1, column2 FROM table1 JOIN table2 WHERE column1 > :param1) " +
                " TO STDOUT WITH BINARY";

            PgSqlCommand comm = new PgSqlCommand(sql, conn);

            comm.Parameters.Add(new PgSqlParameter("param1", 0));

            comm.PgSqlDataReader reader = comm.ExecuteReader();
            List list = new List();
            try
            {
                while (reader.Read())
                {
                    list.Add(new MyObject()
                    {
                        Column1 = reader.GetInt32(0),
                        Column2 = reader.GetInt32(1)
                    });
                }
            }
            finally
            {
                reader.Close();
            }

StanislavK
Devart Team
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Post by StanislavK » Wed 24 Mar 2010 16:50

Could you please qualify the difference between COPY TO and SELECT performance?

There is no tool for processing STDOUT of the COPY TO command in dotConnect for PostgreSQL.

You may be interested in the ExecutePageReader method of PgSqlCommand. It returns a data reader for only a subset of rows, and thus can increase the performance.

migle
Posts: 27
Joined: Tue 19 Jan 2010 13:52

Post by migle » Thu 25 Mar 2010 10:12

No, I can't qualify the difference between COPY TO and SELECT performance.
Precisely, I wanted to know if that functionality was available, to see if I could gain anything from it.
PostgreSQL documentation often refers to using COPY TO being faster.

I see one difference, even if performance is the same.
COPY TO ... WITH BINARY transfers data in binary.
Binary data transfer TO the database is possible using your PgSqlLoader class (despite bugs I already reported and have been fixed in the meanwhile). However, without using COPY TO binary transfer FROM the database is not possible.

Even though performance may be more or less the same, binary transfer can be preferrable in some cases, for example in the case of the transfer of floating point data. Even though a double precision value should theoretically be accurately reproduced from a 16 decimal digit representation, it's hard to guarantee that the deserialized value is exactly the same for all cases.
And even then, you require 16 decimal digits, a decimal dot, perhaps a +/- sign, an 'e' for exponent, plus some digits for exponent. That's well over 20 bytes for a single double precision value, that is represented in binary using only 8 bytes.

For the typical case, using text representation of numerical data takes twice as much space as binary. Now, this imposes an unnecessary burden on network bandwidth utilization.

In my case, ExecutePageReader is not what I need. I have lots of numerical data which I transfer using queries that return 400-800 rows at a time. Because it's not many rows, ExecutePageReader won't help me. The only problem is that I transfer some 20 such chunks each second (that's 10000 rows/second) and any gain I have here would be noticeable.

StanislavK
Devart Team
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Post by StanislavK » Thu 25 Mar 2010 15:01

Could you please provide us with a link to the PostgreSQL official documentation where it is stated that the COPY TO command is more performant than SELECT? Unfortunately, I couldn't find any at the moment.

Also, please note that the SELECT command transfers result sets in binary mode as well.

migle
Posts: 27
Joined: Tue 19 Jan 2010 13:52

Post by migle » Thu 25 Mar 2010 16:50

StanislavK wrote:Could you please provide us with a link to the PostgreSQL official documentation where it is stated that the COPY TO command is more performant than SELECT? Unfortunately, I couldn't find any at the moment.
Is that a joke?

Look, I asked if the feature was available.
I wanted to check if I had any performance gain with that.
If the feature is not available, why don't you just answer so?

If you know better than that, that using "COPY TO" isn't faster than using "SELECT", then great.
StanislavK wrote:Also, please note that the SELECT command transfers result sets in binary mode as well.
Does it? Great.
I'm not the one who knows the protocol here.

But then again, are you sure?
Because when I use the psql client, if I ask for the square root of 2 and check what goes on the wire with tcpdump I see the three packets bellow being transfered.
There I see a binary protocol, but the data is sent in text mode.

Now, that is using the psql command line client not with your driver (I'm not in position of checking with your driver right now).

So, again, is that really so?
If it is then I was really mistaken there.

Code: Select all

x0000:  4500 0049 64e4 4000 4006 5266 c0a8 010a  E..Id.@[email protected]....
0x0010:  c0a8 010a 839a 1538 7c02 bed2 7b5e 5954  .......8|...{^YT
0x0020:  8018 0212 83a0 0000 0101 080a 0077 f964  .............w.d
0x0030:  0077 dab1 5100 0000 1473 656c 6563 7420  .w..Q....select.
0x0040:  7371 7274 2832 293b 00                   sqrt(2);.

0x0000:  4500 007e 82fc 4000 4006 3419 c0a8 010a  E..~..@[email protected].....
0x0010:  c0a8 010a 1538 839a 7b5e 5954 7c02 bee7  .....8..{^YT|...
0x0020:  8018 0200 83d5 0000 0101 080a 0077 f965  .............w.e
0x0030:  0077 f964 5400 0000 1d00 0173 7172 7400  .w.dT......sqrt.
0x0040:  0000 0000 0000 0000 02bd 0008 ffff ffff  ................
0x0050:  0000 4400 0000 1900 0100 0000 0f31 2e34  ..D..........1.4
0x0060:  3134 3231 3335 3632 3337 3331 4300 0000  142135623731C...
0x0070:  0b53 454c 4543 5400 5a00 0000 0549       .SELECT.Z....I

0x0000:  4500 0034 64e5 4000 4006 527a c0a8 010a  E..4d.@[email protected]....
0x0010:  c0a8 010a 839a 1538 7c02 bee7 7b5e 599e  .......8|...{^Y.
0x0020:  8010 0212 54d3 0000 0101 080a 0077 f965  ....T........w.e
0x0030:  0077 f965                                .w.e
[/code]

StanislavK
Devart Team
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Post by StanislavK » Fri 26 Mar 2010 14:02

dotConnect for PostgreSQL does not have such feature.

We are now investigating the possibility of implementing it, and would be very grateful to you if you help us with this decision. In particular, any information on the COPY TO performance comparative with the SELECT one will be highly appreciated.

As for the binary format, please check which protocol you are using in your connection to the PostgreSQL server. Old versions of PostgreSQL servers use the text protocol of version 2.0, whereas newer versions support protocol 3.0, which is binary. dotConnect for PostgreSQL allows setting either version 2.0 or 3.0 for your connection:
http://www.devart.com/dotconnect/postgresql/docs/...

migle
Posts: 27
Joined: Tue 19 Jan 2010 13:52

Post by migle » Tue 06 Apr 2010 09:00

Hello, again.

I have just confirmed that when I do a SELECT using dotConnect for PostgreSQL, the data is transmitted in binary as you said.

In that case, maybe there's no reason for COPY TO being faster and my concerns about data loss on floating point data also are unfounded.

I have to leave this up to you. I have no performance comparison data and it seems not to make sense.

The only thing odd I saw right now is that while when I use the psql client which uses text protocol and I do that "select sqrt(2)" query, only 3 messages are exchanged between client and server.
When I use dotConnect for PostgreSQL, using binary protocol, tcpdump shows me 18 messages being exchanged...

Well, this is not the place to discuss the PostgreSQL protocol... but it looks redundant, with lots of repetitions of sequences like "PORTAL20318972712401650" and "PTSTMT20318972712401650".

Anyway, thanks, I have what I needed most, which is binary protocol, and can't believe that performance improvement with COPY TO can be anything better than 10% (just guessing), which makes little difference.

StanislavK
Devart Team
Posts: 1710
Joined: Thu 03 Dec 2009 10:48

Post by StanislavK » Tue 06 Apr 2010 14:33

Glad to see that the problem resolved.

As for multiple messages, this is a feature of the binary protocol. The server is called several times as the query is being prepared before execution.

sideshowleonard
Posts: 1
Joined: Tue 05 Aug 2014 22:26

Re: PostgreSQL COPY TO command available?

Post by sideshowleonard » Tue 05 Aug 2014 23:16

Can we resurrect this topic?

We know that "copy from stdin" is more performant when *loading* data:
http://www.postgresql.org/docs/9.2/static/populate.html

.."copy from stdin" would have been nice for a database conversion project I was working on several years ago, which migrated an old Access database to PostgreSQL.

There is also an interesting case for *dumping* data with "copy to stdout," even though the performance difference (as previously stated) isn't compelling...

Postgresql has already gone through the pains of writing/reading CSV format (http://secretgeek.net/csv_trouble). For example, what if your field contains a carriage return/line feed? Did you escape it properly on output? Did you parse it properly in input? Instead of relying on a CSV output library, or writing our own, which may be buggy, it would be nice to write a query like:

Code: Select all

COPY (
  SELECT
      student.name AS "Student Name"
    , score.grade AS "Grade"
    , course.name AS "Course Name"
    , student.notes AS "Student Notes"
  FROM 
         student
    JOIN score ON student.id=score.student_id
    JOIN course ON score.course_id=course.id
  WHERE
        score.semester='2014 S2'
    AND course.name='SQL 101'
) TO STDOUT WITH CSV HEADER;
..and then read that result from the query object and write it to a file (one that, for example, could be digested by a csv reporting tool.. or MS Excel.. or bulk-imported into another database.. etc...).

If necessary, I'm sure I can come up with more reasons why 'copy to stdout' is good to have.

Cheers,
-Leo

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Re: PostgreSQL COPY TO command available?

Post by Pinturiccio » Thu 07 Aug 2014 15:04

We will investigate the possibility to copy data from the stdin stream and copy to the stdout stream, but we can't tell any timeframe at the moment.

Post Reply