[Slony1-general] Replication stopping

Fri Sep 14 08:27:54 PDT 2007

I put the SQL statement that you supplied and it returned no results.
That means no duplicate triggers?

BTW, I also let Slony replicate the data across to the slave db after a
10 second delay because I found anything else was not sufficient for the
replication to occur reliably.  Is that normal?

The slonik script I'm running to set up the replication is below (I
removed a bunch of tables and sequences to make this shorter):

======================== START ======================

#! /bin/bash

host=`hostname`
if [ $host = "CM" ]
then
   HOST_PRIME=${CM_eth2_CM2}
   HOST_BACKUP=${CM2_eth2_CM}
else
   HOST_PRIME=${CM2_eth2_CM}
   HOST_BACKUP=${CM_eth2_CM2}
fi
DB_PORT=5432

rm -f /root/slony_setup.log

echo "Starting Slony setup" `date` > /root/slony_setup.log

if [ x$1 = "xuninstall" ]
then
slonik <<_EOF_ 2>> /root/slony_setup.log 1>> /root/slony_setup.log
    cluster name = t1;
    node 1 admin conninfo = 'dbname=smsdb host=$HOST_PRIME port=$DB_PORT
user=slonysu password=pass';
    node 2 admin conninfo = 'dbname=smsdb host=$HOST_BACKUP
port=$DB_PORT user=slonysu password=pass';
    try {
        uninstall node (id = 1);
    }
    on error {
        echo 'Could not uninstall Slony-I on node 1';
        exit -1;
    }
	try {
        uninstall node (id = 2);
    }
    on error {
        echo 'Could not uninstall Slony-I on node 2';
        exit -1;
    }
    echo 'Slony-I successfully uninstalled on database smsdb';
_EOF_

if [ $? -ne 0 ]
then
    echo Errors were detected.  Please review /root/slony_setup.log.
Uninstall halted.
    exit -1
fi

exit 0
fi

rm -f ~/.pgpass

echo $HOST_PRIME":5432:*:slonysu:pass" >> ~/.pgpass
echo $HOST_BACKUP":5432:*:slonysu:pass" >> ~/.pgpass

chmod 600 ~/.pgpass

slonik <<_EOF_ 2>> /root/slony_setup.log 1>> /root/slony_setup.log
    cluster name = t1;
    node 1 admin conninfo = 'dbname=smsdb host=$HOST_PRIME port=$DB_PORT
user=slonysu password=pass';
    node 2 admin conninfo = 'dbname=smsdb host=$HOST_BACKUP
port=$DB_PORT user=slonysu password=pass';
    try {
        echo 'Initializing the cluster';
        init cluster (id = 1, comment = 'Node 1');
    }
    on error {
        echo 'Could not initialize the cluster!';
        exit -1;
    }
    echo 'Database cluster initialized as Node 1';
    try {
        create set (id = 1, origin = 1, comment = 'smsdb tables');
    }
    on error {
        echo 'Could not create subscription set!';
        exit -1;
    }
    echo 'Subscription set created';
    try {
        echo 'Adding tables to the subscription set';

        echo '  Adding table public.t_fl_beam...';
        set add table (set id = 1, origin = 1, id = 1, full qualified
name = 'public.t_fl_beam', comment = 'Table public.t_fl_beam');
        echo '    done';

<---- + A WHOLE OF BUNCH OF OTHER TABLES and SEQUENCES ---->

        echo '  Adding sequence public.t_sla_condition_set_id_seq...';
        set add sequence (set id = 1, origin = 1, id = 108, full
qualified name = 'public.t_sla_condition_set_id_seq', comment =
'Sequence public.t_sla_condition_set_id_seq');
        echo '    done';

    }
    on error {
        echo 'Could not add tables and sequences!';
        exit -1;
    }
    echo 'All tables added';
_EOF_

if [ $? -ne 0 ]
then
    echo Errors were detected.  Please review /root/slony_setup.log and
fix the errors.
    exit -1
fi

sleep 10

slonik <<_EOF_ 2>> /root/slony_setup.log 1>> /root/slony_setup.log
    cluster name = t1;
    node 1 admin conninfo = 'dbname=smsdb host=$HOST_PRIME port=$DB_PORT
user=slonysu password=pass';
    node 2 admin conninfo = 'dbname=smsdb host=$HOST_BACKUP
port=$DB_PORT user=slonysu password=pass';
    try {
        echo 'Storing node 2 - try 1';
        store node (id = 2, comment = 'Node 2');
    }
    on error {
        echo 'Could not create Node 2!';
        exit -1;
    }
_EOF_

slonik <<_EOF_ 2>> /root/slony_setup.log 1>> /root/slony_setup.log
    cluster name = t1;
    node 1 admin conninfo = 'dbname=smsdb host=$HOST_PRIME port=$DB_PORT
user=slonysu password=pass';
    node 2 admin conninfo = 'dbname=smsdb host=$HOST_BACKUP
port=$DB_PORT user=slonysu password=pass';

    try {
        drop node ( id = 2 );
        uninstall node ( id = 2 );
        echo 'Recreating node 2 - try 2';
        store node (id = 2, comment = 'Node 2');
    }
    on error {
        echo 'Could not create Node 2!';
    }    

    try {
        echo 'Creating store paths';
        store path (server = 1, client = 2, conninfo = 'dbname=smsdb
host=$HOST_PRIME port=$DB_PORT user=slonysu password=pass');
        store path (server = 2, client = 1, conninfo = 'dbname=smsdb
host=$HOST_BACKUP port=$DB_PORT user=slonysu password=pass');
    }
    on error {
        echo 'Could not create store paths!';
        exit -1;
    }
    echo 'Store paths created';
    try {
        echo 'Storing listen network';
        store listen (origin = 1, provider = 1, receiver = 2);
        store listen (origin = 2, provider = 2, receiver = 1);
    }
    on error {
        echo 'Could not store listen network!';
        exit -1;
    }
    echo 'listen network stored';
_EOF_

if [ $? -ne 0 ]
then
    echo Errors were detected.  Please review /root/slony_setup.log and
fix the errors.
    exit -1
fi

/usr/local/bin/slon_start 1 2>> /root/slony_setup.log 1>>
/root/slony_setup.log
/usr/local/bin/slon_start 2 2>> /root/slony_setup.log 1>>
/root/slony_setup.log

if [ $? -ne 0 ]
then
    echo Errors were detected.  Please review /root/slony_setup.log and
fix the errors.
    exit -1
else
#     /usr/local/bin/slon_start 1 2>/root/redundancy/slon-smsdb.err
>/root/redundancy/slon-smsdb.out
#     /usr/local/bin/slon_start 2 2>>/root/redundancy/slon-smsdb.err
>>/root/redundancy/slon-smsdb.out

#    slon t1 "user=slonysu host=$HOST_PRIME dbname=smsdb"
2>/root/slon-smsdb.err >/root/slon-smsdb.out &
    echo slon has been started on the master and slave and placed into
the background.
    echo It is logging STDOUT to /root/slon-smsdb.out and STDERR to
/root/slon-smsdb.err.
    echo
fi

slonik <<_EOF_ 2>> /root/slony_setup.log 1>> /root/slony_setup.log
    cluster name = t1;
    node 1 admin conninfo = 'dbname=smsdb host=$HOST_PRIME port=$DB_PORT
user=slonysu password=pass';
    node 2 admin conninfo = 'dbname=smsdb host=$HOST_BACKUP
port=$DB_PORT user=slonysu password=pass';

    try {
        subscribe set (id = 1, provider = 1, receiver = 2, forward =
no);
    }
    on error {
        echo 'Could not subscribe the set to the slaves';
        echo '-> Try again';
        try {
            echo 'Recreating node 2 - after subscribtion failure';
            store node (id = 2, comment = 'Node 2');
        }
        on error {
            echo 'Could not create Node 2!';
            exit -1;
        }
        try {
            subscribe set (id = 1, provider = 1, receiver = 2, forward =
no);
        }
        on error {
            exit -1;
        }
    }
    echo 'Database smsdb subscribed to slaves';
_EOF_

if [ $? -ne 0 ]
then
    echo Errors were detected.  Please review /root/slony_setup.log and
fix the errors.
    exit -1
fi

echo The installation has succeeded.  At this time the slaves should be
receiving the data
echo from the master.

======================== END ======================

-----Original Message-----
From: Christopher Browne [mailto:cbbrowne at ca.afilias.info] 
Sent: Friday, September 14, 2007 11:05 AM
To: Slawek Jarosz
Subject: Re: [Slony1-general] Replication stopping

"Slawek Jarosz" <Slawek.Jarosz at advantechamt.com> writes:

> Hello, first of all thanks for the quick response.
>
> I am pretty sure no one has added a trigger by hand.  The servers are 
> only controlled by me and I haven't done anything.  However the main 
> database that I'm working with has triggers on half of the tables.  
> These are BEFORE or AFTER triggers and these triggers cause another 
> table to be updated automatically (a log of all database actions 
> performed).

Well, that suggests that there shouldn't be lavish quantities of
hand-hacked stuff, at least :-).

> What puzzles me is that fact that replication is running great of a 
> long period of time (hours) before these issues can be seen.  When 
> they occur, I have to recreate the master and slave databases (BTW, 
> I'm using PostgreSQL 7.4.2), running the slonik scripts to create the 
> nodes, subscription sets and everything else before the replication 
> starts again.  slon_kill & slon_start does not fix the issue.  Is 
> there any other way to restart Slony?

It is likely that there was some more or less subtle thing broken in the
configuration that you didn't find.

slon_kill / slon_start will fix things like:
 - DB connections that have gotten mussed up due to network events
 - A slon that has gotten confused ABOUT ITS STATE (OOPS - KEYBOARD
WEIRDNESS JUST NOW...)

> When you mentioned that one of the tables has duplicate triggers on 
> it, how can I check which one?

select a.* from pg_trigger a, pg_trigger b where a.tgrelid = b.tgrelid
and a.tgname = b.tgname and a.oid <> b.oid;
--
output = ("cbbrowne" "@" "acm.org")
http://www3.sympatico.ca/cbbrowne/linux.html
                              MULTICS MAN!!!!
With his power ring PL-1, backed by the mighty resources of the powerful
H-6880, his faithful sidekick, the Fso Eagle, and his trusted gang: "The
System Daemons", he fights a never-ending battle for truth, security,
and the Honeywell Way!
-- T. Kenney