Glyn Astill glynastill at yahoo.co.uk
Thu Oct 16 07:34:13 PDT 2014
> From: Dave Cramer <davecramer at gmail.com>
>To: slony <slony1-general at lists.slony.info> 
>Sent: Thursday, 16 October 2014, 11:19
>Subject: [Slony1-general] Lag time increasing but there are no events
> 
>
>
>I have a situation I can't explain.
>
>
>sl_status shows lag time increasing. num events is 0, and the data is being replicated.
>
>
>What exactly does lag_time represent ?
>
>

My considerably fuzzy understanding is that it's essentially the difference between the current timestamp and the timestamp of the last received event in sl_confirm, so (current_timestamp - st_last_received_event_ts).
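
That reading can be checked against the view itself. A rough sketch, assuming the sl_status column names are as I remember them from the 2.x view definition, and with a made-up cluster schema name ("_mycluster") that you'd need to replace with your own:

```sql
-- "_mycluster" is a hypothetical cluster schema; substitute your own.
set search_path to "_mycluster", public;

select st_origin,
       st_received,
       st_lag_num_events,
       st_lag_time,
       -- recomputed by hand; should closely track st_lag_time
       current_timestamp - st_last_received_event_ts as recomputed_lag
from sl_status
order by st_origin, st_received;
```

If recomputed_lag tracks st_lag_time, the lag is just the age of the last confirmed event, which would keep growing for as long as no newer event gets confirmed.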

I guess you've made sure the time is synchronized on all the nodes? That's the first thing I'd be checking as I'd expect to see a positive value in st_lag_num_events too if st_lag_time were going up.

Is there anything out of place in the slon logs, and have you tried restarting the slon daemons?  There's also the lag_interval parameter to look at. I've no idea how sl_status would look if that were set, but is there any chance it is?


Personally I'd be digging into the sl_confirm and sl_event tables to look for some sort of clue if none of the above were fruitful.  Something like the below, which more likely than not isn't completely correct:

select e.ev_origin, c.con_received, e.ev_seqno, (c.con_timestamp - e.ev_timestamp) as confirm_delay
from sl_event e inner join sl_confirm c on e.ev_seqno = c.con_seqno and e.ev_origin = c.con_origin
order by confirm_delay desc;

