Knight, Doug dknight at wsi.com
Fri Oct 3 10:04:56 PDT 2008
I totally agree that a configurable method is preferable. To note, I
would not expect to see any performance improvement unless the table
being copied contained binary data. Our tables run upwards of 1.7GB of
binary data. It saves the conversion from binary to character,
transmission, then conversion back to binary. When I get a chance I'll
try implementing the WITH BINARY in slony, and see if I get the same or
similar results to my simple table "copy".

Doug

-----Original Message-----
From: Christopher Browne [mailto:cbbrowne at ca.afilias.info] 
Sent: Friday, October 03, 2008 12:28 PM
To: ""=?UTF-8?Q?Filip_Rembia=C5=82kowski?=""
Cc: Knight, Doug; slony1-general at lists.slony.info
Subject: Re: [Slony1-general] Binary COPY in slony?

""=?UTF-8?Q?Filip_Rembia=C5=82kowski?="" <plk.zuber at gmail.com> writes:
> 2008/10/2 Knight, Doug <dknight at wsi.com>:
>> I put together a quick test, copying one of my largest tables
>> (around 1.7GB), from my staging system back to my workstation (not
>> over the dedicated GB connection, but through our
>> intranet). Without the binary qualifier on the COPY command it took
>> 11 minutes and 33 seconds. With the binary option it took 2 minutes
>> and 52 seconds. That's a significant savings in replication time,
>> roughly 400% if my math is right. When I next test a slony copy,
>> I'll compare results using the dedicated link to eliminate any
>> random latencies on our intranet.
>
> Wow, that's nice improvement.
> the quick&dirty patch could be just to add WITH BINARY to both
> copy-out and copy-in invocations in remote_worker.c
> please share your success story afterwards :)

It *is* a very simple change, in the simplest form of the patch:

===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/src/slon/remote_worker.c,v
retrieving revision 1.176
diff -c -u -r1.176 remote_worker.c
cvs diff: conflicting specifications of output style
--- src/slon/remote_worker.c    29 Aug 2008 21:06:45 -0000      1.176
+++ src/slon/remote_worker.c    3 Oct 2008 16:19:58 -0000
@@ -2886,7 +2886,7 @@

                (void) slon_mkquery(&query1,
                                                        "select
%s.prepareTableForCopy(%d); "
-                                                       "copy %s %s from
stdin; ",
+                                                       "copy %s %s from
stdin WITH BINARY; ",
                                                        rtcfg_namespace,
                                                        tab_id,
tab_fqname,
                                                        PQgetvalue(res3,
0, 0)
@@ -2912,7 +2912,7 @@
                if (archive_dir)
                {
                        (void) slon_mkquery(&query1,
-                       "delete from %s;\ncopy %s %s from stdin;",
tab_fqname, tab_fqname,
+                       "delete from %s;\ncopy %s %s from stdin WITH
BINARY;", tab_fqname, tab_fqname,
 
PQgetvalue(res3, 0, 0));
                        rc = archive_append_ds(node, &query1);
                        if (rc < 0)
@@ -2933,7 +2933,7 @@
                 * Begin a COPY to stdout for the table on the provider
DB
                 */
                (void) slon_mkquery(&query1,
-                          "copy %s %s to stdout; ", tab_fqname,
PQgetvalue(res3, 0, 0));
+                          "copy %s %s to stdout WITH BINARY; ",
tab_fqname, PQgetvalue(res3, 0, 0));
                PQclear(res3);
                res3 = PQexec(pro_dbconn, dstring_data(&query1));
                if (PQresultStatus(res3) != PGRES_COPY_OUT)

I ran a small test, and didn't see a material difference between
having WITH BINARY and not having it.  

It is fair to say that my test wasn't at all comprehensive; I merely
stowed 32K records in a table, and set up a subscription, and and
with-versus-without was the difference between 0.45s and 0.51s.

The way I would prefer to make this change would be a little bit more
complex than the above.  The BINARY form is documented to be "less
portable," as follows:

   "The BINARY key word causes all data to be stored/read as binary
   format rather than as text. It is somewhat faster than the normal
   text mode, but a binary-format file is less portable across machine
   architectures and PostgreSQL versions."

Thus, my inclination would be to add a configuration parameter for
this, which allows changing between BINARY and "not BINARY" at
runtime, as opposed to picking one policy or the other at compile
time.

I'd certainly be interested in hearing how material this change is,
for larger table sizes.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://linuxdatabases.info/info/emacs.html
You don't *run* programs on Ultrix.
                - Mark Moraes
Right, you chase them.
                - Rayan Zachariassen



More information about the Slony1-general mailing list