Alan Hodgson ahodgson at simkin.ca
Tue Mar 27 09:49:43 PDT 2007
I recently upgraded to Slony 1.2, specifically 1.2.8.  1.1.5 was working 
fine, but I'm getting ready for 8.2, etc, and it seemed like 1.2 had 
settled down bug-wise. 

I am now having a fairly severe problem with DDL changes.

I have a relatively complex replication environment.   In the cluster that 
is having problems, I have the following:

Node 1 - master for sets 1,2,3
Node 2 - subscribes to sets 1,2,3 with forward = yes
Nodes 3-10 - subscribe to sets 1,2 from node 2
Nodes 11-12 - subscribe to sets 2,3 from node 2

I'm not sure how much of that is relevant.  I believe the issue is simply 
due to having multiple replication sets with different subscribers.

The problem that I am having is that DDL changes issues with:

  EXECUTE SCRIPT (SET ID = 1, FILENAME = 'ddl.sql', EVENT NODE = 1);

are failing on nodes 11 and 12 with errors like:

09:27:49.827654', '3676500785', '3676500786', '', 'DDL_SCRIPT', '1', 'alter 
table ppc_provider add column active boolean not null default true;
@400000004609486420df8cdc ', '-1'); insert into "_web_cluster".sl_confirm       
(con_origin, con_received, con_seqno, con_timestamp)    values (1, 
11, '147115', now()); commit transaction;" PGRES_FATAL_ERROR ERROR:  
Slony-I: alterTableForReplication(): Table "public"."geo_combined_data" is 
already in altered state
@400000004609486420dfac1c CONTEXT:  SQL 
statement "SELECT  "_web_cluster".alterTableForReplication( $1 )"
@400000004609486420dfbbbc PL/pgSQL function "ddlscript_complete_int" line 11 
at perform

... as in the following log:

@40000000460948641ba439ac 2007-03-27 09:37:46 PDT DEBUG2 
remoteListenThread_2: queue event 1,147115 DDL_SCRIPT
@40000000460948641bad5d84 2007-03-27 09:37:46 PDT DEBUG4 
remoteWorkerThread_1: update provider configuration
@40000000460948641baf844c 2007-03-27 09:37:46 PDT DEBUG1 
remoteWorkerThread_1: helper thread for provider 2 created
@40000000460948641bb0ebac 2007-03-27 09:37:46 PDT DEBUG4 
remoteWorkerThread_1: added active set 2 to provider 2
@40000000460948641bb131fc 2007-03-27 09:37:46 PDT DEBUG4 
remoteWorkerThread_1: added active set 3 to provider 2
@40000000460948641bb1707c 2007-03-27 09:37:46 PDT DEBUG2 
remoteWorkerThread_1: Received event 1,147115 DDL_SCRIPT
@40000000460948641bb2ace4 2007-03-27 09:37:46 PDT DEBUG4 
remoteHelperThread_1_2: waiting for work
@40000000460948641c405e54 2007-03-27 09:37:46 PDT CONFIG 
remoteWorkerThread_1: DDL request with 1 statements
@40000000460948641c4121a4 2007-03-27 09:37:46 PDT CONFIG 
remoteWorkerThread_1: DDL Statement 0: [alter table ppc_provider add column 
active boolean not null default true;]
@40000000460948641c57e5c4 2007-03-27 09:37:46 PDT DEBUG4 version 
for "dbname=Hitfarm user=slony host=node11.mydomain.com" is 80108
@40000000460948641dfdc5c4 2007-03-27 09:37:46 PDT DEBUG4 
remoteWorkerThread_7: update provider configuration
@40000000460948641e218a04 2007-03-27 09:37:46 PDT DEBUG4 version 
for "dbname=Hitfarm user=slony host=node11.mydomain.com" is 80108
@40000000460948641e6c4ea4 2007-03-27 09:37:46 PDT DEBUG4 
remoteWorkerThread_8: update provider configuration
@40000000460948641f25e65c 2007-03-27 09:37:46 PDT CONFIG DDL success - 
PGRES_COMMAND_OK
@400000004609486420df36ec 2007-03-27 09:37:46 PDT ERROR  
remoteWorkerThread_1: "select "_web_cluster".ddlScript_complete_int(1, -1); 
notify "_web_cluster_Event"; notify "_web_cluster_Confirm"; insert 
into "_web_cluster".sl_event     (ev_origin, ev_seqno, ev_timestamp,      
ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1, ev_data2, ev_data3    ) 
values ('1', '147115', '2007-03-27 
09:27:49.827654', '3676500785', '3676500786', '', 'DDL_SCRIPT', '1', 'alter 
table table_in_set_1 add column active boolean not null default true;
@400000004609486420df8cdc ', '-1'); insert into "_web_cluster".sl_confirm       
(con_origin, con_received, con_seqno, con_timestamp)    values (1, 
11, '147115', now()); commit transaction;" PGRES_FATAL_ERROR ERROR:  
Slony-I: alterTableForReplication(): Table "public"."geo_combined_data" is 
already in altered state
@400000004609486420dfac1c CONTEXT:  SQL 
statement "SELECT  "_web_cluster".alterTableForReplication( $1 )"
@400000004609486420dfbbbc PL/pgSQL function "ddlscript_complete_int" line 11 
at perform
@400000004609486420dffa3c 2007-03-27 09:37:46 PDT DEBUG2 slon_retry() from 
pid=11132
@400000004609486420e07f0c 2007-03-27 09:37:46 PDT DEBUG1 slon: retry 
requested
@400000004609486420e0fff4 2007-03-27 09:37:46 PDT DEBUG2 slon: notify worker 
process to shutdown


"public"."geo_combined_data", the table giving the error, is in set 2.  The 
DDL change is only to set 1.  The node that is failing is not subscribed to 
set 1, but is subscribed to sets 2 and 3.  I rebuilt node 11 from scratch a 
few minutes ago to see if it was due to any legacy issues, but it seems to 
recur quite predictably.

The events are processing correctly on node 2, which is subscribed to all 3 
sets, and also any node subscribed to set 1 and 2 but not set 3.

I would appreciate any assistance resolving this.   Using Slony 1.1, these 
changes would not have even replicated out to these nodes (I had a separate 
script to update them), but now it's just killing my replication until I 
can delete the DDL events and process them manually.

-- 
"Bad laws are the worst sort of tyranny." -- Edmund Burke (1729-1797)


More information about the Slony1-general mailing list