Fri Sep 15 21:00:36 PDT 2006
- Previous message: [Slony1-general] Second part "Slony stop after 9 hours"
- Next message: [Slony1-general] Second part "Slony stop after 9 hours"
> In the first message I forgot to put another log:
>
>   2006-09-14 18:41:44 [12395] ERROR: duplicate key violates unique
>   constraint "trusted_chave_key"

There is no way for us to diagnose whether or not this represents any
sort of replication problem.

If this error message was seen on the origin node, it may merely
indicate that your application tried to insert invalid data that was
rejected by a UNIQUE constraint on one of your tables.  But I cannot
tell; you did not explain which node encountered that message.

> Below is the original message:
>
> ---------- Forwarded message ----------
> From: Andrew And <andcop2006 at gmail.com>
> Date: 15/09/2006 23:11
> Subject: Slony stop after 9 hours
> To: Slony1-general at gborg.postgresql.org
>
> After some time my Slony master stops replicating tables, and the
> slave doesn't receive new data.
>
> In the syslog file /var/log/postgresql/postgres.log I have:
>
> 1 - When Slony is OK:
>
>   2006-09-15 11:06:55 [1967] LOG: connection received: host=X.Y.Z.W port=1032
>   2006-09-15 11:06:55 [1967] LOG: connection authorized: user=postgres database=trusted
>
> 2 - After 9 hours, Slony stops:
>
>   2006-09-15 20:03:54 [4006] LOG: connection received: host=[local] port=
>   2006-09-15 20:03:54 [4006] LOG: connection authorized: user=postgres database=trusted
>
> In (2) I used "ps -aux" and the output showed me that Slony is not down.
>
> I am thinking that Slony loses information about the host and port
> after 9 hours.  In this situation I need to restart Slony on the
> master, but that is not good.

Some of that doesn't make very much sense.  Part of the problem is
that you are not being nearly precise enough about where the problems
are occurring.

Slony is a replication system; when problems occur, they are always
associated with some specific component.  Successfully diagnosing
problems in a distributed application made up of numerous components
requires fairly excruciating precision as to what problems were
observed where.

> What can I do in this situation?  What variables do I need to use,
> and where should I put the variables?

There isn't any real place where Slony-I uses anything that can be
well characterized as "variables."

One of the somewhat unusual features of the Slonik language is that,
unlike most programming languages, it doesn't have variables.  It
doesn't have loops.  It has a *very* limited form of "conditional"
expression.

It is quite possible that something funny happened on your network,
cutting off one or another of the database connections.  If that is
the case, stopping and restarting the slon process for each node in
your cluster might help clear things up.  It isn't likely to make
things worse.  In the early days of Slony-I, "restart the slons" was
the simple answer to numerous cases where things got stuck.

If that doesn't seem to help, you should consult the logs generated
by each slon process, looking particularly for ERROR/FATAL messages;
those should give some clue as to what is going wrong.  Sometimes
changing debug levels (the -d option) is helpful; I usually run slon
processes at debug level "-d 2".

Virtually all of the messages you might encounter are listed, with at
least some explanation, here:

  <http://linuxfinances.info/info/loganalysis.html>

Error messages in the database logs may also be of some assistance,
as long as you can distinguish the relevant entries (e.g. messages
raised by Slony-I activity) from those likely to be irrelevant
(e.g. messages raised by your own applications).
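A few concrete sketches of the suggestions above.  As a starting
point on the duplicate key error, you can find out which table the
"trusted_chave_key" constraint belongs to by querying the system
catalogs on whichever node logged it.  A minimal sketch (the database
name "trusted" comes from your log excerpt; the rest is stock
PostgreSQL):

    psql -d trusted -c "
        SELECT conrelid::regclass AS table_name
          FROM pg_constraint
         WHERE conname = 'trusted_chave_key';"

If the error only ever appears in the origin node's PostgreSQL log,
it is most likely your own application colliding with that
constraint, not anything Slony-I is doing.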
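To illustrate the point about Slonik: the closest thing it has to a
conditional is the try / on error / on success block.  A hypothetical
fragment (the cluster name, node numbers, and conninfo strings here
are placeholders, not taken from your setup):

    slonik <<_EOF_
    cluster name = trusted_cluster;
    node 1 admin conninfo = 'dbname=trusted host=master.example.com user=postgres';
    node 2 admin conninfo = 'dbname=trusted host=slave.example.com user=postgres';

    # try/on error is as close to a conditional as Slonik gets
    try {
        subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    } on error {
        echo 'subscribe set failed';
        exit 1;
    }
    _EOF_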
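If you want to try the "restart the slons" route, something along
these lines on each node should be enough; the cluster name, conninfo,
and log path are placeholders you will need to adapt to your
installation:

    # stop the slon daemon for this node (match the pattern to your cluster name)
    pkill -f 'slon trusted_cluster'

    # restart it at debug level 2 so the log is reasonably verbose
    slon -d 2 trusted_cluster 'dbname=trusted host=master.example.com user=postgres' \
        >> /var/log/slony/slon_node1.log 2>&1 &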
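Scanning those slon logs for trouble can then be as simple as the
following (the log directory is only an assumption; use wherever you
redirect slon output):

    grep -E 'ERROR|FATAL' /var/log/slony/slon_node*.log | tail -n 50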