Karl Denninger karl at denninger.net
Sat Aug 22 21:53:51 PDT 2009
Here's what I attempted:

1. Set up new version of Postgresql (in this case, 8.4) against a
running 8.3 release.

2. Add the new database engine as a client of the old on a different
port (in this case, 55432.)  This worked, and the databases replicated.

3. Shut down the client software and wait for the syncs to propagate.

4. Move the origin node from 2 (the old master) to 4 (the new master)
with a "lock set (id=3D1 ....)" and then a "move set (id=3D1, old origin=3D=
2,
new origin =3D 4);.  This executed properly; a quick check showed that
Node 2 no longer had write access, but node 4 did.

5. Shut down both the old and new Postgres copies, rename the
directories, and restart node 4 as the master and 2 as the slave (swap
their ports in the postgresql.conf file)

Ok, so now we're good to go.  Well, kinda.

Node 4 came up fine as athe master.

But on the other clients slony instantly pitches a fit and claims that
the ID of the master is incorrect (which is, by the way, correct - the
connection now points to node 4 instead of node 2.)  I thought I could
get around this with a pair of "store path" statements to change the
node Ids for each client doing this:

include <slonik.preamble>;
store path (server=3D1, client=3D4, conninfo=3D'dbname=3Dticker
host=3Dgenesis.denninger.net user=3Dslony port=3D55432 password=3Dxxxxx');
store path (server=3D 4, client=3D1, conninfo=3D'dbname=3Dticker
host=3Dcolo1.denninger.net user=3Dslony port=3D5432 password=3Dxxxxx');

That logs a bunch of messages for provider configuration updates in the
syslog as expected when it is executed.  The slonik.preamble contains:

CLUSTER NAME =3D tickerforum;
node 4 admin conninfo=3D'dbname=3Dticker host=3Dcolo1.denninger.net user=3D=
slony
password=3Dxxxx port=3D5432';
node 1 admin conninfo=3D'dbname=3Dticker host=3Dlocalhost user=3Dslony port=
=3D5432
password=3Dxxxx';

(The divergence between the path stored into the dbms and the admin info
is due to the fact that the OTHER nodes "See" this machine - node 1 -
through a firewall that redirects ports, but to connect locally on this
machine I can't "bounce" off from the inside.  This is "good" and has
worked to set it up and replicate....)

No dice; I get lots of these:

Aug 22 23:40:08 dbms slon[25559]: [1830-1] CONFIG version for
"dbname=3Dticker host=3Dcolo1.denninger.net user=3Dslony port=3D5432
password=3Dxxxx" is 80400
Aug 22 23:40:08 dbms slon[25559]: [1831-1] ERROR  remoteListenThread_2:
db_getLocalNodeId() returned 4 - wrong database?
Aug 22 23:40:19 dbms slon[25559]: [1832-1] CONFIG version for
"dbname=3Dticker host=3Dcolo1.denninger.net user=3Dslony port=3D5432
password=3Dxxxx" is 80400
Aug 22 23:40:19 dbms slon[25559]: [1833-1] ERROR  remoteListenThread_2:
db_getLocalNodeId() returned 4 - wrong database?

Over and over again....

It looks like the path statement was ignored and attempting to update it
does nothing.

The only way I was able to get replication going again was to drop the
node.  That, however, while it executed, did not clear the schema (!)
nor did it stop the slon process from trying to connect; I had to go in
and drop the slony schema from each database, then re-add the node to
the replication set.  This worked, BUT it also forced a copy of the
entire freaking database!

I'm wondering what happened here.  It is almost as if the "move set"
never executed on the other subscribers - an impossibility, no?  They
WERE all replicating and current just before the shutdown - I checked
them all.  How does that happen under these circumstances?

Is there a better way for the future?  I'm back up now, but the entire
point of this exercise was to AVOID having to copy the entire database
over - while I avoided any material downtime for my users, I was left
EXPOSED to a failure for the copy period, which was kinda nasty.

Thoughts appreciated.

-- Karl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: karl.vcf
Type: text/x-vcard
Size: 265 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-general/attachments/20090822=
/7d76fc15/karl.vcf


More information about the Slony1-general mailing list