Fiel Cabral
Thu Feb 17 22:55:31 PST 2005
No, I put the WAIT FOR EVENT outside the TRY block.
This is the script that I run to perform switchover.

    cluster name = CLUSTERNAME;
    node 1 admin conninfo = 'dbname=DATABASE host=NODE1 user=slony sslmode=require';
    node 2 admin conninfo = 'dbname=DATABASE host=NODE2 user=slony sslmode=require';
    echo 'SWITCHOVER BEGIN';
    # LOCK SET is alone in its try block so that it commits before the wait.
    try {
      echo 'TRY: lock set';
      lock set (id = 1, origin = 1);
    }
    on success {
      echo 'SUCCESS: lock set';
    }
    on error {
      echo 'ERROR: lock set';
      exit 10;
    }
    echo 'wait for event';
    wait for event (origin = 1, confirmed = 2, timeout = 180);
    try {
      echo 'TRY: move set';
      move set (id = 1, old origin = 1, new origin = 2);
    }
    on success {
      echo 'SUCCESS: move set';
    }
    on error {
      echo 'ERROR: move set';
      # If MOVE SET fails, release the lock on node 1 before exiting.
      unlock set (id = 1, origin = 1);
      exit 11;
    }
    echo 'wait for event';
    wait for event (origin = 1, confirmed = 2, timeout = 180);
    echo 'SWITCHOVER END';

While slonik is executing this script the following programs may or
may not be connected to PostgreSQL:
a. a C# GUI application (connected some of the time)
b. Tomcat servlet container (connected most of the time)
c. another Java application (connected most of the time)
d. pg_dump (connects very infrequently, once a night)
e. vacuumdb (connects once a night)


I've since tested this by running the switchover 30 times, and the
hang has not recurred in those runs. However, slonik did hang the
other day and the day before that. When slonik hangs, subsequent
attempts to run it with this script make it print "a MOVE SET
operation is in progress", "a LOCK SET operation is in progress", or
"the set is already locked" (or something similar). While slonik was
hung, I stopped Tomcat, and about half a minute later slonik was able
to continue to completion.
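
To keep a hung slonik from blocking indefinitely, the script could be
run under a wrapper with a time limit. The following is only a minimal
sketch, not something from the Slony-I distribution: the function name
run_switchover, the EXIT_MEANINGS table, and the file name
switchover.slonik are all hypothetical. It relies only on the exit
codes 10 and 11 that the script above sets itself.

```python
import subprocess

# Exit codes set by the switchover script above; any other non-zero
# code is reported as an unclassified failure.
EXIT_MEANINGS = {
    0: "switchover completed",
    10: "LOCK SET failed",
    11: "MOVE SET failed",
}


def run_switchover(cmd, timeout_s):
    """Run cmd and return (status, message).

    status is the process exit code, or the string "timeout" if the
    command did not finish within timeout_s seconds.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return ("timeout", f"no result after {timeout_s}s; slonik may be blocked")
    message = EXIT_MEANINGS.get(result.returncode, "unclassified slonik failure")
    return (result.returncode, message)
```

It could be called as run_switchover(["slonik", "switchover.slonik"],
600), assuming slonik is on the PATH. Note that killing slonik after a
timeout may leave the set locked, as the error messages above suggest,
so an UNLOCK SET may be needed before retrying.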

-Fiel Cabral


On Thu, 17 Feb 2005 17:15:01 -0500, Tim Goodaire
<tgoodair at ca.afilias.info> wrote:
> Are you doing the lock set and wait for event slonik commands within the
> same try block?
> 
> eg.
> 
> try {
> LOCK SET (ID = 1, ORIGIN = 1);
> WAIT FOR EVENT (ORIGIN = ALL, CONFIRMED = ALL, WAIT ON = 1);
> }
> 
> This won't work, because the lock set and wait for event slonik commands
> are part of the same transaction. It'll sit and wait forever.
> 
> If this isn't the problem, could you post your slonik scripts for us to
> take a look at?
> 
> Tim
> 
> On Wed, Feb 16, 2005 at 11:12:44AM -0500, Fiel Cabral wrote:
> > Initially:
> > Node 1 is the master. Node 2 is the slave.
> > There is only one replication set (set ID 1).
> >
> > Switchover is defined as:
> > LOCK SET 1.
> > WAIT FOR EVENT.
> > MOVE SET 1 from node 1 to node 2.
> > WAIT FOR EVENT.
> >
> > I run the switchover script on node 2 to effect a switchover.
> > After waiting for 1 hour I realize that switchover is stalled. I find
> > that if I stop and then start the database clients that are connected
> > to node 1, the switchover may complete successfully.
> >
> > What are the factors that cause switchover to stall? What are the
> > factors that help switchover to complete?
> >
> > Thank you for any guidance in advance.
> > -Fiel Cabral
> > _______________________________________________
> > Slony1-general mailing list
> > Slony1-general at gborg.postgresql.org
> > http://gborg.postgresql.org/mailman/listinfo/slony1-general
> 
> --
> Tim Goodaire    416-673-4126    tgoodair at ca.afilias.info
> Database Administrator, Afilias Canada Corp.

