Marc G. Fournier marc
Tue Dec 20 00:56:47 PST 2005
'k, setting up monitoring, and the script is reporting 1 out of 3 nodes 
out of sync:

./check_slony_cluster.sh dns ams ams.hub.org
ERROR - 2 of 3 nodes not in sync

no problem, figured out in the script how it is being determined, and:

  st_received |    cfmdelay
-------------+-----------------
            2 | 00:00:00.010721
            3 | 03:59:55.2181
            4 | 00:00:00.125318
(3 rows)

wow ... 3 hours and 59 minutes for Node 3, where the other two are well 
under a second (Node 4 is a remote server, somewhere in the US, while 
Node 3 is the server beside the master) ...

Now, I've checked Node 3, and it contains the same # of records as Node 1 
..

Now, I just did an update on one record in the table, and checked all 3 
slaves and they see the change, yet now I'm seeing:

  st_received |    cfmdelay
-------------+-----------------
            2 | 00:00:00.009916
            3 | 03:59:55.175099
            4 | 01:46:02.69134
(3 rows)

Node 4 just shot up ...

Looking at sl_status:

# select * from "_dns".sl_status;
  st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |   st_lag_time 
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------
          1 |           2 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-19 20:52:23.589583 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
          1 |           3 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-20 00:52:18.736552 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823
          1 |           4 |           837 | 2005-12-19 20:52:23.576685 |              837 | 2005-12-19 22:36:25.514229 | 2005-12-19 20:52:23.576685 |                 0 | 00:00:06.669823

So, what is st_last_received_ts, and why isn't Node 3 updating it?  I've 
checked my slon_ams.out file on Node 3, and there are no errors being 
generated that I can see ... and replication appears to be working fine on 
all the Nodes ...
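
As a sanity check on where the cfmdelay numbers might come from, here is a small sketch that subtracts st_last_received_event_ts from st_last_received_ts for the three sl_status rows above. That cfmdelay is computed this way is only a guess from the column names and the values, not something confirmed from the script, and the one-minute THRESHOLD is a made-up value for illustration, not the script's actual cutoff.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (st_received, st_last_received_ts, st_last_received_event_ts),
# copied from the sl_status output above.
rows = [
    (2, "2005-12-19 20:52:23.589583", "2005-12-19 20:52:23.576685"),
    (3, "2005-12-20 00:52:18.736552", "2005-12-19 20:52:23.576685"),
    (4, "2005-12-19 22:36:25.514229", "2005-12-19 20:52:23.576685"),
]

# Hypothetical cutoff for "out of sync"; the real script may use another value.
THRESHOLD = timedelta(minutes=1)


def confirm_delay(received_ts, event_ts):
    """Assumed definition: time received minus time the event was created."""
    return datetime.strptime(received_ts, FMT) - datetime.strptime(event_ts, FMT)


delays = {node: confirm_delay(recv, evt) for node, recv, evt in rows}
lagging = [node for node, delay in delays.items() if delay > THRESHOLD]

for node, delay in delays.items():
    print(node, delay)
print("nodes over threshold:", lagging)
```

Under that assumption Node 3 comes out at 3:59:55.159867, within a tenth of a second of both cfmdelay readings, so the delay seems to come straight from st_last_received_ts on that row. Node 4 comes out at 1:44:01.937544, roughly two minutes off the second reading, so that reading was presumably taken against a different event.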

Is there somewhere else I should be looking for this?

