Jan Wieck JanWieck at Yahoo.com
Wed May 19 02:00:33 PDT 2010
On 5/19/2010 3:25 PM, Jeff wrote:
> This is going to sound very very odd and highly improbable.  We've had  
> a long time intermittent issue where a slave will get a primary key  
> violation on a replicated table.   Sometimes we go a few weeks without  
> it occurring, sometimes a few hours. Until now I hadn't been motivated  
> enough to really dig in (as it often liked to do so at inopportune  
> times).  This has happened under PG 8.2 and 8.4 (we're currently on  
> 8.4, both origin & slave) and on various slony1 versions (we're  
> currently on 1.2.17).
> 
> The code which is causing the problem is a plpgsql function fired from  
> a trigger.
> It basically does:
> begin;
> 	delete from thetable where id = v_id;
> 	insert into thetable (id, otherjunk) values (v_id, v_otherjunk);
> end;
> 
> "thetable" has a pk on the id column.
> 
> Now, for the evidence - this is pulled from sl_log_2 and while I can't  
> include log_cmddata here, here is the rest:
> 
>    log_xid   | log_actionseq | log_cmdtype
> ------------+---------------+-------------
>   1153890130 |    2800679119 | I
>   1153890130 |    2800679120 | D
>   1153890760 |    2800716473 | D
>   1153890760 |    2800716474 | I
>   1153919695 |    2800872885 | D
>   1153919695 |    2800880852 | I

I have no idea why that would happen. I will need to check whether the 
trigger queue of 8.4 does something funky here, like actually firing all 
those AFTER triggers when the stored procedure returns, rather than at 
the end of each command within your procedure.

Notice also that there is a jump of well over 1000 in log_actionseq 
between the two operations done by the last transaction. This suggests that 
somehow that sequence is configured for caching, which is a bad thing in 
general.
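
To make both oddities easier to spot in a larger dump, something along these 
lines could be run over the exported rows. This is only a rough sketch: the 
rows are copied from the excerpt quoted above, the function name summarize is 
made up for illustration, and it is not part of Slony.

```python
from collections import defaultdict

# Rows copied from the quoted sl_log_2 excerpt:
# (log_xid, log_actionseq, log_cmdtype)
rows = [
    (1153890130, 2800679119, "I"),
    (1153890130, 2800679120, "D"),
    (1153890760, 2800716473, "D"),
    (1153890760, 2800716474, "I"),
    (1153919695, 2800872885, "D"),
    (1153919695, 2800880852, "I"),
]

def summarize(rows):
    """Per transaction, return (command order by actionseq, largest actionseq gap)."""
    by_xid = defaultdict(list)
    for xid, seq, cmd in rows:
        by_xid[xid].append((seq, cmd))

    report = {}
    for xid, entries in by_xid.items():
        entries.sort()  # order by log_actionseq, the order a replay would follow
        seqs = [seq for seq, _ in entries]
        order = "".join(cmd for _, cmd in entries)
        # Largest jump between consecutive actionseq values inside one transaction.
        max_gap = max((b - a for a, b in zip(seqs, seqs[1:])), default=0)
        report[xid] = (order, max_gap)
    return report

for xid, (order, max_gap) in sorted(summarize(rows).items()):
    print(xid, order, max_gap)
```

An order of "ID" means the INSERT was logged ahead of the DELETE, which is the 
order that would produce the primary key violation on the slave. A gap much 
larger than 1 inside a single transaction is the jump in log_actionseq 
mentioned above.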


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

