Wed Sep 19 01:39:24 PDT 2007
[Slony1-general] bug in deadlock handling?
Jeff Frost wrote:
> On Tue, 18 Sep 2007, Jeff Frost wrote:
>
>> Hi guys, I've got an interesting situation on a slony 1.2.10 3-node
>> cluster. Both slaves get their data directly from the master.
>> Everything had been running well until a few days ago. Now every
>> time we try to add a new table to the cluster, we end up with the
>> following error:
>>
>> 2007-09-18 15:56:06 EDT ERROR remoteWorkerThread_1: "select
>> "_nerdcluster".ddlScript_prepare_int(1, -1); " PGRES_FATAL_ERROR
>> ERROR: Slony-I: alterTableRestore(): Table "public"."carts" is not in
>> altered state
>> CONTEXT: SQL statement "SELECT "_nerdcluster".alterTableRestore( $1 )"
>> PL/pgSQL function "ddlscript_prepare_int" line 46 at perform
>>
>> It looks like the problem is being caused by a deadlock:
>>
>> Sep 18 15:27:54 sql1 slon[12252]: [39-1] 2007-09-18 15:27:54 EDT ERROR
>> remoteWorkerThread_1: "select "_nerdcluster".ddlScript_complete_int(1, -1); "
>> PGRES_FATAL_ERROR
>> Sep 18 15:27:54 sql1 slon[12252]: [39-2] ERROR: deadlock detected
>> Sep 18 15:27:54 sql1 slon[12252]: [39-3] DETAIL: Process 12263 waits for
>> AccessExclusiveLock on relation 121589880 of database 121589046;
>> blocked by process 12096.
>> Sep 18 15:27:54 sql1 slon[12252]: [39-4] Process 12096 waits for
>> AccessShareLock on relation 121589817 of database 121589046;
>> blocked by process 12263.
>>
>> So, my theory is that the execute script alters the tables back to
>> their normal states, doesn't get all the locks it wants, and bails out
>> without putting them back into their previously altered state, thus
>> breaking replication.
>>
>> So, is there a reasonable way to fix this without
>> dropping/resubscribing the node?
>
> BTW, this is a quiet error, i.e. slonik does not indicate anything
> untoward happened. Also, the problem only occurs on node 2 and not on
> node 3. This isn't surprising, as there's no activity on node 3 (it's
> just a warm standby), but node 2 is actively processing queries.

Hi,

I encountered the same problem and came to the conclusion that pgpool
was the cause of it.

I have 5 nodes in the replication. The databases on two of them are
accessed through pgpool. One of the two is a spare, not accessed under
normal conditions; the other one is heavily accessed.

When trying to propagate a DDL script with a slonik "EXECUTE SCRIPT"
command (a minimal invocation is sketched below), the master and the
spare slave are modified correctly, but the production node falls into
error, producing the same messages about the altered state. The only
way I found to repair that node was to drop everything Slony had
installed on it (uninstall the node) and then reinitialize it.

As a full subscription is a more-than-three-hour process, I now stop
pgpool before I run the EXECUTE SCRIPT. The last one I ran did not fail.

I wondered how I could stop pgpool without preventing the application
from connecting to the database, and it seemed a good idea to reach
pgpool or the database through a tunnel: PostgreSQL listens on 5432,
pgpool on 5433, and the application connects through 5434. I then just
have to repoint the tunnel to bypass pgpool (a sketch of that setup is
below). It seems to work that way; I just wonder whether it is really a
good solution. Do you have any thoughts on this way of doing things?

Best regards,

--
Stéphane SCHILDKNECHT
President of PostgreSQLFr
http://www.postgresqlfr.org
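
For reference, a DDL change is normally pushed out with slonik's EXECUTE
SCRIPT, roughly as in the sketch below. This is only a minimal
illustration, assuming the "nerdcluster" cluster name from the error
messages above; the conninfo strings, node hosts, and the DDL file name
are made-up placeholders, not details from this thread.

    #!/bin/sh
    # Minimal sketch: propagate a DDL file through Slony-I 1.2 EXECUTE SCRIPT.
    # Cluster name taken from the error messages; hosts, user and file name
    # are hypothetical.
    slonik <<_EOF_
    cluster name = nerdcluster;
    node 1 admin conninfo = 'dbname=nerd host=master-host user=slony';
    node 2 admin conninfo = 'dbname=nerd host=slave2-host user=slony';
    node 3 admin conninfo = 'dbname=nerd host=slave3-host user=slony';

    # Run the DDL on the origin (node 1) and replay it on the subscribers
    # as part of the replication stream.
    execute script (
        set id = 1,
        filename = '/tmp/add_table.sql',
        event node = 1
    );
    _EOF_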
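
The tunnel trick could look something like the following. This is just a
sketch assuming the forward is made with an SSH local forward (any TCP
forwarder, e.g. socat, would do the same job); "dbhost" is a placeholder,
and only the port numbers come from the mail.

    # Normal operation: the application connects to localhost:5434, which is
    # forwarded to pgpool on 5433 (ports as described above, host name is a
    # hypothetical placeholder).
    ssh -f -N -L 5434:localhost:5433 dbhost

    # Before an EXECUTE SCRIPT: stop the forward above so port 5434 is free,
    # then repoint it straight at PostgreSQL on 5432, bypassing pgpool.
    ssh -f -N -L 5434:localhost:5432 dbhost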