Glyn Astill glynastill at yahoo.co.uk
Fri Sep 9 01:27:58 PDT 2011

> From: Steve Singer <ssinger at ca.afilias.info>
> On 11-09-08 11:43 AM, Glyn Astill wrote: 
>> 
>>       SELECT st_origin, st_received, st_lag_num_events, round(extract(epoch from st_lag_time))
>>       FROM "<my_replication_cluster>".sl_status;
>> 
>>  A graph for the weeks leading up to and after the upgrade is attached.  I
>> upgraded on the night of the 25th/26th and, ignoring any other downtime where I
>> was obviously fiddling with things, you can see the syncs going out after that
>> date.  As you can imagine, I'm massively embarrassed that it took me 3
>> months to notice it happening.
>> 
> 
> st_lag_time is a measure of the difference between now() and the last
> unconfirmed event.  The pg_dump locks sl_event, which prevents the SYNCs
> from being created, so there might not be any unconfirmed events to be measured
> by this check.
> 
> 
> Sometime between 2.0.4 and 2.0.6 we fixed a bug that prevented SYNC events from
> being generated from pure slaves. I suspect your check is now measuring the
> other half of replication (if you do your select from sl_status you should see
> at least two rows; it isn't clear whether you're graphing both of them or just
> one).
> 
> If now() - st_last_event_ts gets too high, it means that SYNC events are not being
> generated.  You might want to alert on both SYNC events not being generated and
> events not being confirmed.
> 
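A minimal sketch of the two alerts suggested above, written against the same sl_status view as the original query. It assumes the Slony 2.0 column names st_last_event_ts and st_lag_time; the 5-minute and 10-minute thresholds are hypothetical and should be tuned to your SYNC interval and backup window. It returns every row of sl_status rather than a single one, so whichever halves of replication the view reports are all checked.

```sql
-- Sketch only: thresholds are hypothetical; "<my_replication_cluster>" is the
-- same placeholder schema name used in the original query.
SELECT st_origin,
       st_received,
       st_lag_num_events,
       -- Time since this node last generated an event; large values suggest
       -- SYNC events are not being generated (e.g. sl_event locked by pg_dump).
       now() - st_last_event_ts                        AS since_last_event,
       -- Lag as reported by the view; large values suggest events are not
       -- being confirmed by the receiver.
       st_lag_time,
       (now() - st_last_event_ts) > interval '5 minutes' AS alert_sync_not_generated,
       st_lag_time > interval '10 minutes'               AS alert_not_confirmed
FROM "<my_replication_cluster>".sl_status
ORDER BY st_origin, st_received;
```

A monitoring job could then raise an alert whenever either boolean column is true; since sl_status reports from the point of view of the node you connect to, you may want to run it against each node.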

Okay, you know better than me.  However, I'm positive that when we were on 1.2 and I was in overnight, our slaves were up to date whilst the backups were running.  It's only circumstantial, of course, but I'm pretty sure I'd have noticed in 3 years if not, as I query those slaves all the time.

I've excluded the Slony schema from the dump now, so we're all good anyway.

Thanks
Glyn