DeJuan Jackson djackson
Wed Aug 25 18:07:53 PDT 2004
Well, bad form replying to my own post, but no one else has, so I'll have to.

This is the slon output at -d 4. Node 1:
CONFIG main: local node id = 1
CONFIG main: loading current cluster configuration
CONFIG storeNode: no_id=2 no_comment='Node 2'
DEBUG2 setNodeLastEvent: no_id=2 event_seq=7032
CONFIG storePath: pa_server=2 pa_client=1 
pa_conninfo="dbname=test_destination host=river user=postgres" 
pa_connretry=10
CONFIG storeListen: li_origin=2 li_receiver=1 li_provider=2
CONFIG storeSet: set_id=1 set_origin=1 set_comment='All pgbench tables'
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
CONFIG storeSet: set_id=2 set_origin=1 set_comment='seq_test table'
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
DEBUG2 main: last local event sequence = 8095
CONFIG main: configuration complete - starting threads
DEBUG1 localListenThread: thread starts
FATAL  localListenThread: Another slon daemon is serving this node already

Node 2:
CONFIG main: local node id = 2
CONFIG main: loading current cluster configuration
CONFIG storeNode: no_id=1 no_comment='Node 1'
DEBUG2 setNodeLastEvent: no_id=1 event_seq=8083
CONFIG storePath: pa_server=1 pa_client=2 
pa_conninfo="dbname=test_source host=river user=postgres" pa_connretry=10
CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
CONFIG storeSet: set_id=1 set_origin=1 set_comment='All pgbench tables'
WARN   remoteWorker_wakeup: node 1 - no worker thread
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
CONFIG storeSet: set_id=2 set_origin=1 set_comment='seq_test table'
WARN   remoteWorker_wakeup: node 1 - no worker thread
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
WARN   remoteWorker_wakeup: node 1 - no worker thread
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
CONFIG enableSubscription: sub_set=1
WARN   remoteWorker_wakeup: node 1 - no worker thread
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
CONFIG storeSubscribe: sub_set=2 sub_provider=1 sub_forward='f'
WARN   remoteWorker_wakeup: node 1 - no worker thread
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
CONFIG enableSubscription: sub_set=2
WARN   remoteWorker_wakeup: node 1 - no worker thread
DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
DEBUG2 main: last local event sequence = 7032
CONFIG main: configuration complete - starting threads
DEBUG1 localListenThread: thread starts
FATAL  localListenThread: Another slon daemon is serving this node already

Don't know if that will help.
I looked at the pg_listener layout, and the only fix I can think of is 
to check for the PID in the query.  This would only work from the DB, 
and only if stats are on.  But assuming stats are on, the 
pg_stat_get_backend_idset function combined with the 
pg_stat_get_backend_pid function would tell you which PIDs are currently 
connected.  So you could filter your listener list by this data and get 
a more representative list of active listeners.  This would 
necessitate a way to determine whether stats are on; you could just 
call pg_stat_get_backend_idset and see if any rows come back after an 
appropriate delay (I believe Tom said it can be as much as 500ms), 
because your own connection should always show up at minimum.
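
Something along these lines is what I have in mind; a rough, untested 
sketch against the 7.4 catalogs (pg_listener's relname and listenerpid 
columns, plus the two stats functions):

    -- Find pg_listener rows whose listener backend is no longer
    -- connected.  Stats must be enabled, or the subquery comes back
    -- empty and every entry would look stale.
    SELECT l.relname, l.listenerpid
      FROM pg_catalog.pg_listener l
     WHERE l.listenerpid NOT IN
           (SELECT pg_stat_get_backend_pid(s.backendid)
              FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s);

If that turns up the '_<clustername>_Restart' row, the old listener 
backend is gone and the entry is stale.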

So, should I file this as a bug, should I submit a patch, or should I 
just stick my problem somewhere dark and dank so that it's never heard 
from again?  Enquiring minds want to know.

DeJuan Jackson wrote:

> I've been putting Slony-I 1.0.2 through its paces, so to speak, and I 
> have a concern/question.
> select version();
>                                             version
> ---------------------------------------------------------------------------------------------------------
>  PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
>
> When I do the ever-faithful pull-the-power on the test box while 
> pgbench and replication are running, once the box comes back up the 
> slons (both source and destination) die with a FATAL message 
> "localListenThread: Another slon daemon is serving this node 
> already".  I tracked this down to a check in src/slon/local_listen.c.  
> The error message only happens when a row exists in 
> pg_catalog.pg_listener where relname = '_<clustername>_Restart'.
>
> I can clear the error up by issuing a NOTIFY "_<clustername>_Restart" 
> on both the source and the target, then issuing a kill -9 on the two 
> slons that are running, and then re-launching them (I've waited 
> approximately 3 minutes with no response from the slons, and a normal 
> kill doesn't work).  The NOTIFY gets rid of the old pg_listener 
> entries, the kill gets rid of the current entries, and the restart 
> prompts the new slons to pick up where they left off before the 
> simulated outage.
>
> Need any more info?
