Tue Jun 17 11:08:04 PDT 2008
I just committed a small fix to the remote worker. The bug was actually revealed after a change I made to the ducttape test #2. I added "wait for event" commands there in order to start subscribing node 3, which cascades from node 2, as soon as node 2 had finished its copy set.

The problem was that node 3, as a "not subscribed to anything at all" node, was listening on node 1 for events originating from node 1. That is fine under normal circumstances. However, in this specific setup the attempt is to subscribe a set, originating on node 1, cascaded with node 2 as data provider, which at this point is for sure lagging behind (it just started to catch up after the copy set).

What happens is that the SUBSCRIBE_SET event originates on node 2 (data provider) and travels to node 1 (origin). There it causes the ENABLE_SUBSCRIPTION event to be generated. This event is received by node 3 "directly", which causes node 3 to wait and check in 5-second intervals whether node 2 has finally caught up to at least that ENABLE_SUBSCRIPTION event. In that wait loop, it never processed any confirm forward messages, which were added to the end of the internal message queue. I changed a few things to make sure that confirm forward messages are kept at the head of the remote worker's internal message queue.

There have been repeated comments that "wait for event" does not work in connection with "subscribe set". This bug may have been one cause; the other might be that people don't realize that subscribing to a set internally creates two events, and both need to be waited for in the right order.
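To illustrate the queueing change described above, here is a minimal Python sketch of a queue that keeps confirm forward messages ahead of everything else. It is not the actual remote worker code: the class name `WorkerQueue`, the message kinds `CONFIRM` and `EVENT`, and the payload strings are all hypothetical and exist only for this illustration.

```python
from collections import deque

# Hypothetical message kinds, for illustration only.
CONFIRM = "confirm_forward"
EVENT = "event"


class WorkerQueue:
    """Queue in which confirm messages always stay ahead of other messages."""

    def __init__(self):
        self._q = deque()
        self._n_confirms = 0  # number of confirm messages currently at the head

    def put(self, kind, payload):
        if kind == CONFIRM:
            # Insert behind earlier confirms (keeping their order)
            # but in front of every queued non-confirm message.
            self._q.insert(self._n_confirms, (kind, payload))
            self._n_confirms += 1
        else:
            # Everything else goes to the tail as usual.
            self._q.append((kind, payload))

    def get(self):
        kind, payload = self._q.popleft()
        if kind == CONFIRM:
            self._n_confirms -= 1
        return kind, payload

    def __len__(self):
        return len(self._q)


q = WorkerQueue()
q.put(EVENT, "ENABLE_SUBSCRIPTION")
q.put(CONFIRM, "confirm A")
q.put(EVENT, "SYNC")
q.put(CONFIRM, "confirm B")
order = [q.get()[1] for _ in range(len(q))]
```

With this ordering, a worker that is stuck retrying the event at the front of its event backlog can still drain the confirms first, which is the property the fix is meant to restore.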
The correct sequence of slonik commands to wait for a subscribe is:

    subscribe set (...);
    wait for event (origin = <data provider>, confirmed = <set origin>,
                    wait on = <set origin>, timeout = 0);
    sync (id = <set origin>);
    wait for event (origin = <set origin>, confirmed = <new subscriber>,
                    wait on = <new subscriber>, timeout = 0);

The first "wait for event" waits until the actual subscribe set command has been processed by the origin of the data set. The following "sync" command is necessary to update slonik's idea of what the last event sequence on the set origin is. The second "wait for event" will then wait until that very sync has been confirmed by the new subscriber, which means that it has finished not only the copy set, but also the very first sync operation thereafter.

The "wait for event" has a timeout. In the case of subscribe set operations, which are known to lead to hours or in some cases even days of lag, such a timeout is for sure unwanted. It is disabled with timeout = 0.

Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin