Christopher Browne cbbrowne at ca.afilias.info
Fri Aug 20 14:28:11 PDT 2010
Greg Sabino Mullane <greg at endpoint.com> writes:
> I just found a case where Slony was refusing to start up because of
> entries in the sl_nodelock table. Further investigation showed that an
> entry in sl_nodelock.nl_backendpid matched up to a *non* Slony
> backend[1] in Postgres (definitely non-Slony as it was an application
> level username and not 'postgres'). Is Slony relying solely on the pid
> number here? I'm guessing that something killed Slony, and then some
> other process used that pid and was holding on to it when a restart of
> Slony was attempted (which was a few hours later). Any other theories
> of what might have happened?

Hmm.  Yes, Slony is relying on the pid.  It's not checking for the user
name or such, as that's really not something it can determine.  You
could use different users, if you so wished.

Is it possible that there were a large number of processes generated
during those hours?  It isn't unusual for there to be ~32K entries in
the process table on Linux or such, so if something's spawning a lot of
processes, it wouldn't be difficult to have a duplicate after a few
hours.  Something forking around 10 processes per second would roll
through a 32K table in a little less than an hour.
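
The wraparound estimate is easy to check with a little arithmetic.  What
follows is only a sketch: it assumes a pid space of 32768 (the usual
Linux default for pid_max) and a steady fork rate, and the function name
is made up for illustration.

```python
# Assumption: 32768 is the common Linux default for pid_max; check
# /proc/sys/kernel/pid_max on the machine in question.
PID_SPACE = 32768


def seconds_to_wrap(forks_per_second, pid_space=PID_SPACE):
    """Seconds until the kernel has cycled through the whole pid space,
    assuming processes are forked at a constant rate."""
    return pid_space / forks_per_second


if __name__ == "__main__":
    for rate in (1, 10, 100):
        minutes = seconds_to_wrap(rate) / 60
        print(f"{rate:>3} forks/s: about {minutes:.1f} minutes to wrap")
```

A box that forks far more slowly than that would still cycle through
the pid space eventually; it just takes proportionally longer.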

Note that the *crucial* entry in sl_nodelock is the one with "nl_conncnt
= 0", as that's the one that is established to manage the local node,
and that's what'll cause the node to fall over.
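
If the slons do connect as a dedicated user, a lock entry whose pid has
been recycled by some other backend can be spotted by comparing the two
lists.  This is a hypothetical helper, not something Slony does itself:
it assumes you have already fetched the sl_nodelock rows and a list of
live backends (pid and user name) by whatever means you like, and that
the slons connect as 'postgres'.

```python
from typing import NamedTuple


class NodeLock(NamedTuple):
    # Column names as they appear in sl_nodelock.
    nl_nodeid: int
    nl_conncnt: int
    nl_backendpid: int


class Backend(NamedTuple):
    # Hypothetical shape for one row of "who is connected right now".
    pid: int
    usename: str


def suspicious_locks(locks, backends, slony_user="postgres"):
    """Return (lock, backend) pairs where the pid recorded in the lock
    belongs to a live backend that is NOT running as the Slony user,
    i.e. the pid has probably been reused by something else."""
    by_pid = {b.pid: b for b in backends}
    found = []
    for lock in locks:
        backend = by_pid.get(lock.nl_backendpid)
        if backend is not None and backend.usename != slony_user:
            found.append((lock, backend))
    return found


if __name__ == "__main__":
    locks = [NodeLock(1, 0, 4242), NodeLock(1, 1, 4300)]
    backends = [Backend(4242, "webapp"), Backend(4300, "postgres")]
    for lock, backend in suspicious_locks(locks, backends):
        print(f"node {lock.nl_nodeid} conncnt {lock.nl_conncnt}: "
              f"pid {lock.nl_backendpid} is held by {backend.usename}")
```

A hit on the nl_conncnt = 0 row is the one worth worrying about first,
for the reason given above.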
-- 
select 'cbbrowne' || '@' || 'ca.afilias.info';
Christopher Browne
"Bother,"  said Pooh,  "Eeyore, ready  two photon  torpedoes  and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"

