Tue Jan 3 08:17:00 PST 2012
- Previous message: [Slony1-general] Slony-I in strange state after attempted unsubscribe/resubscribe
- Next message: [Slony1-general] Slony-I in strange state after attempted unsubscribe/resubscribe
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 12-01-03 05:49 AM, Martijn van Oosterhout wrote: > Hoi, What version of Slony are you using? What version of Postgresql are you using? > > We have a Slony-I setup in a slightly weird situation. What happened > was that the server did a huge delete (millions of rows) in a single > transaction which caused the replication to start to run behind. For > some reason in this state it takes forever to apply the change because > the query to find out what it needs to apply does a sort or something > because it doesn't want to apply the whole set at once. A single SYNC > takes 10 minutes. > I suspect you hit bug #167 (fixed in 2.1.0) that makes sync operations slow when sl_log_x gets big. > In any case, the way we fixed it before was to unsubscribe and > resubscribe the set, because resyncing the whole database is quicker > than waiting for the deletes to complete. However, this time it broke > in a new way. The result is that slony thinks it is properly > subscribed, but the database data has not been resynced, so you get > some bastard combination of old and new data. Logs below. > > Two questions: > 1. Is there a way we could have detected the unsubscribe failed > (slonik gave no error, but we didn't ask). If so, we can add that to > the procedure as something to check The proper procedure should be unsubscribe(id=1, receiver=2); wait for event(origin=2, confirmed=all, wait on=2); subscribe set(set id=1, provider=1,receiver=2,forward=yes); In 2.1 and above slonik should automatically perform the 'wait for event'. The slonik command 'unsubscribe set' gets submitted to the node being unsubscribed (node 2). If that command failed then slonik should have caught the error. > 2. Seems like a bug to me, but pilot error is not impossible. > > Thanks in advance, > > 2012-01-01 14:24:27 CETINFO cleanupThread: 0.002 seconds for cleanupEvent() > 2012-01-01 14:24:27 CETINFO cleanupThread: 0.006 seconds for vacuuming > 2012-01-01 14:30:55 CETINFO remoteWorkerThread_1: SYNC 5000990612 > done in 733.067 seconds > 2012-01-01 14:30:55 CETINFO remoteWorkerThread_1: syncing set 1 with > 50 table(s) from provider 1 > NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1 > CONTEXT: PL/pgSQL function "cleanupevent" line 99 at assignment > 2012-01-01 14:35:22 CETINFO cleanupThread: 0.066 seconds for cleanupEvent() > > --- unsubscribe happens here --- > 2012-01-01 14:37:58 CETCONFIG unsubscribeSet: sub_set=1 > 2012-01-01 14:37:58 CETCONFIG storeListen: li_origin=1 li_receiver=2 > li_provider=1 > --- this is the weird error --- > 2012-01-01 14:44:09 CETERROR remoteWorkerThread_1: "update > "_pp_config_rep".sl_setsync set ssy_seqno = '5000990613', > ssy_snapshot = '20587190:20590885:20587190', ssy_action_list = '' > where ssy_setid in (1) and ssy_seqno< '5000990613'; " ERROR: could > not serialize access due to concurrent update > 2012-01-01 14:44:09 CETERROR remoteWorkerThread_1: SYNC aborted If the receivers remoteWorkerThread_1 is aborted because of the unsubscribe then I don't see this as an issue. > 2012-01-01 14:44:21 CETCONFIG remoteWorkerThread_1: update provider > configuration > 2012-01-01 14:44:21 CETCONFIG remoteWorkerThread_1: helper thread for > provider 1 terminated > 2012-01-01 14:44:21 CETCONFIG remoteWorkerThread_1: disconnecting from > data provider 1 > 2012-01-01 14:44:21 CETCONFIG storeSubscribe: sub_set=1 sub_provider=1 > sub_forward='t' > 2012-01-01 14:44:21 CETCONFIG storeListen: li_origin=1 li_receiver=2 > li_provider=1 > 2012-01-01 14:44:21 CETCONFIG remoteWorkerThread_1: update provider > configuration > --- and here it appears slony thinks it is properly subscribed, but > the database has not been resynced --- What do you base this statement on? I do not see anything in the logs you pasted showing me that slon2 has received the SUBSCRIBE_SET or ENABLE_SUBSCRIPTION event. What is sl_subscribe on node 2 at this point in time? Does node 2 have a record in sl_event of it having processed those two events? > NOTICE: Slony-I: Logswitch to sl_log_1 initiated > CONTEXT: SQL statement "SELECT "_pp_config_rep".logswitch_start()" > PL/pgSQL function "cleanupevent" line 101 at PERFORM > 2012-01-01 14:46:29 CETINFO cleanupThread: 0.015 seconds for cleanupEvent() > >
- Previous message: [Slony1-general] Slony-I in strange state after attempted unsubscribe/resubscribe
- Next message: [Slony1-general] Slony-I in strange state after attempted unsubscribe/resubscribe
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list