[Slony1-general] Still having issues with wide area replication. large table , copy set 2 failed

Sat Feb 15 22:48:37 PST 2014

It's probably a firewall timing out your idle PostgreSQL connection while the indexes are being built on the replica.
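
As a rough check on that theory, the sketch below measures how long the connection back to the origin presumably sits idle, using two timestamps copied from the log quoted below ("bytes copied" and "seconds to copy" for spotlightimp). The helper names `stamp` and `idle_seconds` are made up for illustration, and the assumption that nothing crosses the link during that window is mine, not something the log proves.

```python
from datetime import datetime

# Two lines copied from the slon log quoted below (first attempt).
LOG = [
    '2014-02-15 16:46:45 PST CONFIG remoteWorkerThread_1: '
    '5643041332 bytes copied for table "tracking"."spotlightimp"',
    '2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: '
    '7870.124 seconds to copy table "tracking"."spotlightimp"',
]


def stamp(line):
    # The first 19 characters hold the timestamp; both lines share one zone.
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def idle_seconds(copied_line, finished_line):
    # Time between "data fully transferred" and "table finished",
    # i.e. the window spent on index builds and other post-processing.
    return (stamp(finished_line) - stamp(copied_line)).total_seconds()


gap = idle_seconds(LOG[0], LOG[1])
print(f"{gap:.0f} s (~{gap / 60:.0f} min) between end of transfer and end of copy")
```

If the firewall's idle timeout is shorter than that window, the next statement sent over the old connection would be the first thing to fail, which would match where the error shows up.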

Look into the TCP keepalive settings.
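
One place to set these is the libpq connection string, which accepts `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count`. (The server has matching `tcp_keepalives_idle`, `tcp_keepalives_interval` and `tcp_keepalives_count` settings in postgresql.conf.) Here is a minimal sketch of building such a string. The function name, host, database, user and the numeric values are all placeholders I made up, and I am assuming the conninfo you give slon and slonik is passed to libpq unchanged. The idle value needs to be shorter than whatever idle timeout your firewall enforces.

```python
def with_keepalives(conninfo, idle=60, interval=10, count=5):
    """Append libpq TCP keepalive parameters to a conninfo string.

    idle     -- seconds of inactivity before the first probe is sent
    interval -- seconds between unanswered probes
    count    -- unanswered probes before the connection is considered dead
    """
    extra = {
        "keepalives": 1,  # turn client-side keepalives on
        "keepalives_idle": idle,
        "keepalives_interval": interval,
        "keepalives_count": count,
    }
    return conninfo + " " + " ".join(f"{k}={v}" for k, v in extra.items())


# Hypothetical connection details, for illustration only.
print(with_keepalives("dbname=mydb host=origin.example.com user=slony"))
```

Probing every minute or so is cheap and should keep a stateful firewall from deciding the session is dead while the replica is busy building indexes.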

> On Feb 15, 2014, at 22:09, Tory M Blue <tmblue at gmail.com> wrote:
> 
> 
> So I've been fighting with this for a few months. Someone on the Slony dev side tried to lend a hand, but others in the group felt it was more of a Postgres issue. While that may be true, I'm still looking for some assistance. Everything points to a disconnect in Slony.
> 
> Wide area replication fails on one of my largest tables. The table itself copies over completely with no issues (using standard pgsql commands); it's the post-processing after the data is copied that seems to cause a SIGTERM or something similar on the connection, since Slony states that the set failed, tries again, and fails at the same place:
> 
> 2014-02-15 15:23:00 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."spotlightimp"
> 2014-02-15 16:46:45 PST CONFIG remoteWorkerThread_1: 5643041332 bytes copied for table "tracking"."spotlightimp"   <--- Completes transfer
> 2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: 7870.124 seconds to copy table "tracking"."spotlightimp"    <-- At this point it finishes the index creation and everything else
> 2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: copy table "tracking"."adimp"
> 2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."adimp"
> 2014-02-15 17:34:10 PST ERROR  remoteWorkerThread_1: "select "_slonyschema".copyFields(19);"   <--- FAILS but adimp table is there, this is a red herring. the issue is above!
> 2014-02-15 17:34:10 PST WARN   remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds
> NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
> CONTEXT:  SQL statement "SELECT "_slonyschema".logswitch_start()"
> PL/pgSQL function _slonyschema.cleanupevent(interval) line 96 at PERFORM
> 2014-02-15 17:34:14 PST INFO   cleanupThread: 7209.360 seconds for cleanupEvent()
> 
> 
> I've brought my work_mem to over 40GB and that's not shortening the processing time for this large table. I have even removed the index statement and it still doesn't cut the time. The copy is fine; all the data comes over. It's something in the processing of the table. There is a disconnect at some point between when Slony finishes the copy of spotlightimp, Postgres processes the rules on the table, and Slony starts on the next table.
> 
> 
> 2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: copy table "tracking"."spotlightimp"
> 2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."spotlightimp"
> 2014-02-15 20:11:07 PST CONFIG remoteWorkerThread_1: 5643067207 bytes copied for table "tracking"."spotlightimp"
> 2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: 7878.124 seconds to copy table "tracking"."spotlightimp"
> 2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: copy table "tracking"."adimp"
> 2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."adimp"
> 2014-02-15 20:59:46 PST ERROR  remoteWorkerThread_1: "select "_slonyschema".copyFields(19);" 
> 2014-02-15 20:59:46 PST WARN   remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds
> NOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
> CONTEXT:  PL/pgSQL function _slonyschema.cleanupevent(interval) line 94 at assignment
> 2014-02-15 20:59:50 PST INFO   cleanupThread: 7203.435 seconds for cleanupEvent()
> 
> I feel incredibly strongly that it's the size of the table and how long the process takes: the network or Postgres is either reaping the connection or otherwise leaving Slony in an unknown state, which causes the error the minute we try to move on from the spotlightimp table. If I could cut down the processing that happens after the table is copied, that might solve it, but removing the index part has not helped as I hoped it would. This is a complicated table, on top of its size.
> 
> I would love to get this sorted out. Slony should allow for this kind of remote replication, but something is going wrong, and man, would I love to get this resolved!
> 
> CentOS6.2
> Postgres 9.2.4 slony 2.1.3
> 
> Thanks
> Tory