[Slony1-general] proper procedure for re-starting slony after replication slave reboots

Thu Feb 21 05:38:20 PST 2008

On Wed, Feb 20, 2008 at 02:55:32PM -0500, Geoffrey wrote:
> >1.  Somewhere, your application or some person got in and removed (or maybe
> >renamed and re-created) a table that was referenced by _something_ that was
> >still open.
> 
> The only tables that could possibly be removed would be temp tables.  I 
> assure you, none of the tables that are being replicated are being 
> removed by anyone.  The application is not designed that way.

Just because an application isn't designed to do something doesn't mean it
never does ;-)  Temp tables could indeed cause the message in question, but
_only if_ something was looking for that temp table.  (A temp table created
by a stored procedure that doesn't use EXECUTE would fall into this case,
for instance, because the plans are cached.)
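
To make the plan-caching point concrete, here is a toy sketch in Python.
It is not PostgreSQL or Slony code: the Catalog, CachedPlanFunction and
DynamicFunction classes, the starting OID and the table name "scratch" are
all made up for illustration, and it assumes a server that does not
invalidate cached plans when a table is dropped.  The cached version
resolves the table name to an OID once and keeps using it; the dynamic
version (the EXECUTE analogue) looks the name up on every call.

```python
# Toy model (not PostgreSQL code): why a cached plan can break when a temp
# table is dropped and re-created, while a dynamic lookup keeps working.

class Catalog:
    """Hypothetical stand-in for the system catalog: table name -> OID."""

    def __init__(self):
        self._next_oid = 16384   # arbitrary starting OID for the sketch
        self.by_name = {}        # table name -> current OID
        self.by_oid = {}         # OID -> list of rows

    def create_table(self, name):
        oid = self._next_oid
        self._next_oid += 1
        self.by_name[name] = oid
        self.by_oid[oid] = []
        return oid

    def drop_table(self, name):
        oid = self.by_name.pop(name)
        del self.by_oid[oid]


class CachedPlanFunction:
    """Resolves the table name once, on first call, then reuses the OID."""

    def __init__(self, catalog, table):
        self.catalog = catalog
        self.table = table
        self.oid = None          # the "cached plan"

    def call(self, row):
        if self.oid is None:
            self.oid = self.catalog.by_name[self.table]
        if self.oid not in self.catalog.by_oid:
            raise LookupError(f"relation with OID {self.oid} does not exist")
        self.catalog.by_oid[self.oid].append(row)


class DynamicFunction:
    """Resolves the table name on every call, like a dynamic EXECUTE."""

    def __init__(self, catalog, table):
        self.catalog = catalog
        self.table = table

    def call(self, row):
        oid = self.catalog.by_name[self.table]
        self.catalog.by_oid[oid].append(row)


def demo():
    catalog = Catalog()
    cached = CachedPlanFunction(catalog, "scratch")
    dynamic = DynamicFunction(catalog, "scratch")

    catalog.create_table("scratch")
    cached.call(1)               # caches the first OID
    dynamic.call(1)

    catalog.drop_table("scratch")
    catalog.create_table("scratch")   # same name, new OID

    dynamic.call(2)              # fine: the name is looked up again
    try:
        cached.call(2)           # fails: still points at the old OID
    except LookupError as exc:
        return str(exc)
    return None
```

Calling demo() returns the error text raised by the cached version, which
has the same shape as the message under discussion: the name still exists,
but the object the plan remembered does not.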

> >2.  Slony was dropped from the node without some set of your connections
> >having disconnected, and they're still expecting the triggers they can
> >still see to be able to write into that table.
> 
> Can you define 'dropped from the node?'

Somehow, that node stopped being a Slony replica, and so the Slony schema
was removed.  Someone attempted to insert something into a replicated table
(or delete something, or update something), and the trigger fired, but the
underlying table it writes into was no longer there.  If someone had
superuser permission on the database, and was fooling with the underlying
Slony tables, for instance, all bets are off.  I have seen bigger messes
created by fat fingers.

> I simply don't understand how one table in particular could get so far
> out of sync.  We're talking 300 records.

Yes.  Note, however, that 300 records could be just a couple of SYNCs, if
the failure happened at just the right moment.

> I can't imagine that slony is that fragile.  There's got to be something 
> going on that we don't see.

I agree.

A