Jeff Frost jeff at frostconsultingllc.com
Thu Sep 20 08:04:23 PDT 2007
On Thu, 20 Sep 2007, St=E9phane Schildknecht wrote:

>> --------
>> CREATE TABLE orders.amazon_items
>> (
>>   id serial NOT NULL,
>>   order_id integer NOT NULL,
>>   item_id integer NOT NULL,
>>   amazon_item_id character varying(14) NOT NULL,
>>   CONSTRAINT amazon_items_pkey PRIMARY KEY (id),
>>   CONSTRAINT amazon_items_order_id_fkey FOREIGN KEY (order_id)
>>       REFERENCES orders.orders (id) MATCH SIMPLE
>>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>>   CONSTRAINT amazon_items_item_id_fkey FOREIGN KEY (item_id)
>>       REFERENCES orders.items (id) MATCH SIMPLE
>>       ON UPDATE NO ACTION ON DELETE NO ACTION
>> )
>> WITH OIDS;
>> ALTER TABLE orders.amazon_items OWNER TO thenerds;
>> --------
>>
>> It was called by the following slonik script:
>>
>> --------
>> #!/usr/bin/slonik
>> include </nerds/preamble.slonik>;
>>
>>         EXECUTE SCRIPT (
>>                 SET ID =3D 1,
>>                 FILENAME =3D '/nerds/thenerds.sql',
>>                 EVENT NODE =3D 1
>>         );
>> --------
>>
>> and caused the following deadlock to occur:
>>
>> 15:27:54 sql1 slon[12252]: [39-1] 2007-09-18 15:27:54 EDT ERROR
>> remoteWorkerThread_1: "select "_nerdcluster".ddlScript
>> _complete_int(1, -1); " PGRES_FATAL_ERROR
>> Sep 18 15:27:54 sql1 slon[12252]: [39-2]  ERROR:  deadlock detected
>> Sep 18 15:27:54 sql1 slon[12252]: [39-3] DETAIL:  Process 12263 waits for
>> AccessExclusiveLock on relation 121589880 of databas
>> e 121589046; blocked by process 12096.
>> Sep 18 15:27:54 sql1 slon[12252]: [39-4] Process 12096 waits for
>> AccessShareLock on relation 121589817 of database 121589046;
>> blocked by process 12263.
>>
>> Which then left the some of the tables on that slave in a bad state
>> breaking replication:
>>
>> 2007-09-18 15:56:06 EDT ERROR  remoteWorkerThread_1: "select
>> "_nerdcluster".ddlScript_prepare_int(1, -1); " PGRES_FATAL_ERROR ERROR:
>> Slony-I: alterTableRestore(): Table "public"."carts" is not in altered
>> state
>> CONTEXT:  SQL statement "SELECT  "_nerdcluster".alterTableRestore( $1 )"
>> PL/pgSQL function "ddlscript_prepare_int" line 46 at perform
>>
>> Note that it's just an AccessShareLock that's killing us.  Looks like
>> that's caused by a select query which does searches.  Our application
>> does not produce any extraneous locking, it simply does SELECTS on
>> that server.
>>
>> Interestingly, before we started using the slave for queries, we would
>> have the deadlocks happen on the master when doing DDL changes, but
>> this never caused the tables on the master to get into a bad state.
>> You could just re-run your EXECUTE SCRIPT and it would usually work
>> fine the second time.
>>
>> What other info can I provide?
>>
> First, I don't think you have to use the execute script command to
> create a new table.
>
> I'd also like to know if you use pgpool, on slave, master, both ?
>
> Does the re-executing of the script offer any solution ?

You're quite right and I think we'll start just executing the create tables=
 on =

all nodes independently without execute script, but the same problem happen=
s =

with ALTER TABLE as well, this just happened to be the latest example that =

caused a problem.

We are using pgpool in master/slave mode.  So, pgpool is load balancing =

between node 1 and node 2.  Node 3 is not in the pgpool cluster.

Unfortunately, since the first run of the script leaves node 2 in a bad sta=
te =

and breaks replication, running the script again doesn't help.  So far I've =

had to node 2 from replication and re-subscribe to fix the problem.

-- =

Jeff Frost, Owner 	<jeff at frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954


More information about the Slony1-general mailing list