Troy Wolf troy at troywolf.com
Thu Dec 18 12:33:03 PST 2008
This is a preemptive strike to see if this is a familiar issue with any of
you. Share your experience if this rings a bell.

Postgres 8.2
Slony 1.2
Linux

3 Nodes, 1 origin, 1 normal subscriber, 1 logshipping subscriber, 1
replication set with many tables and sequences

This has happened enough times now that I think a pattern is emerging. I
have a shell script that we use to add tables into replication. The script
uses slonik to:

1. Create a "dummy" set (id=3D999)
2. Add the new table to set 999
3. Subscribe to the new set 999
3. Merge set 999 with set 1

We've used this script many times without issue before logshipping. However,
now that we've added logshipping into the replication stew, when we use this
script to add a new table into the replication set, we have a problem where
it appears the temporary set is created, but the subscribe fails. Notice
line #11 below. What could possibly be happening inside Slony that would
cause it to suddenly have trouble writing to it's current logshipping file?

(I have log_level set to 1 to reduce noise in the logs.)

---------------------------------------------------------------------------=
------
   1| 2008-12-18 09:21:01 EST CONFIG storeSet: set_id=3D999 set_origin=3D1
set_comment=3D'temporary merge set to add a table into an existing set'
   2| 2008-12-18 09:21:03 EST CONFIG storeSubscribe: sub_set=3D999
sub_provider=3D1 sub_forward=3D't'
   3| 2008-12-18 09:21:03 EST CONFIG storeListen: li_origin=3D1 li_receiver=
=3D2
li_provider=3D1
   4| 2008-12-18 09:21:03 EST DEBUG1 copy_set 999
   5| 2008-12-18 09:21:03 EST DEBUG1 remoteWorkerThread_1: connected to
provider DB
   6| 2008-12-18 09:21:03 EST WARN   remoteWorkerThread_1: transactions
earlier than XID 1022212361 are still in progress
   7| 2008-12-18 09:21:03 EST WARN   remoteWorkerThread_1: data copy for set
999 failed - sleep 15 seconds
   8| 2008-12-18 09:21:18 EST DEBUG1 copy_set 999
   9| 2008-12-18 09:21:18 EST DEBUG1 remoteWorkerThread_1: connected to
provider DB
  10| NOTICE:  truncate of "my_schema"."my_new_table" succeeded
* 11| 2008-12-18 09:21:18 EST ERROR  remoteWorkerThread_1: Cannot write to
archive file
/opt/data/slony/logship/slony1_log_2_00000000000001183550.sql.tmp - not open
  12| 2008-12-18 09:21:18 EST WARN   remoteWorkerThread_1: data copy for set
999 failed - sleep 30 seconds
  13| WARNING:  there is no transaction in progress
---------------------------------------------------------------------------=
-------

At this same time on the origin, we start seeing this:

---------------------------------------------------------------------------=
-------
   1| 2008-12-18 09:21:01 EST CONFIG storeSet: set_id=3D999 set_origin=3D1
set_comment=3D'temporary merge set to add a table into an existing set'
   2| 2008-12-18 09:21:03 EST CONFIG storeListen: li_origin=3D2 li_receiver=
=3D1
li_provider=3D2
   3| 2008-12-18 09:21:03 EST CONFIG storeListen: li_origin=3D2 li_receiver=
=3D1
li_provider=3D2
   4| NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=3D31321
   5| CONTEXT:  SQL statement "SELECT  "_slony".cleanupNodelock()"
   6| PL/pgSQL function "cleanupevent" line 77 at perform
   7| NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=3D31327
   8| CONTEXT:  SQL statement "SELECT  "_slony".cleanupNodelock()"
   9| PL/pgSQL function "cleanupevent" line 77 at perform
  10| NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=3D31341
  11| CONTEXT:  SQL statement "SELECT  "_slony".cleanupNodelock()"
  12| PL/pgSQL function "cleanupevent" line 77 at perform
---------------------------------------------------------------------------=
-------

Today, when this happened, I just stopped slony and restarted (on the normal
subscriber) and the problem went away. In the past, I've had to stop slony,
configure it NOT to do logshipping, start slony, problem goes away, stop
slony, configure logshipping on, start slony. Of course the danger in this
second scenario is that any changes that flowed through while logshipping
was off never make it to my logshipping-subscribed node.

Troy Wolf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.slony.info/pipermail/slony1-general/attachments/20081218/=
ed7aeed0/attachment.htm


More information about the Slony1-general mailing list