Thu Apr 22 17:26:28 PDT 2010
- Previous message: [Slony1-general] Slony 2.0.3 RPMs for RHEL5 are released
- Next message: [Slony1-general] [slony-general] error adding a table
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 4/22/2010 7:33 PM, Jaime Casanova wrote:
> On Thu, Apr 22, 2010 at 12:58 AM, Jan Wieck <JanWieck at yahoo.com> wrote:
>
>> You may be able to fix things by reinserting that sl_subscribe row with
>> sub_active = false, then restart the slon for node 2 and see how far that
>> gets you.
>>
>
> yes, that makes receiver start accepting events again... it's trying
> to get upto date now...
> thanx for your help...

Jaime was kind enough to provide me with a dump of the slony schema of node 2, and we were able to completely figure out what happened.

The whole mess was started by using direct DDL against a subscriber under Slony 1.2.x. The attempted fix for this was to drop the table from the replication set via SET DROP TABLE, fix the table definitions and resubscribe it via a temp set. The subscription failed because of an inconsistency between the system catalog and the slony catalog on the subscriber.

The exact steps after that are not 100% clear to me yet, but I think I understand them well enough to be able to reproduce them later down the road.

The SUBSCRIBE SET is actually a two-step operation. In the first step, the SUBSCRIBE_SET event causes the new subscriber and everyone in the path to create the sl_subscribe row, which causes all data forwarders to keep replication data until the new subscriber has confirmed it. The second step is an internal event, ENABLE_SUBSCRIPTION, that is generated automatically by the origin of the set and that kicks off the actual copy_set() call. That copy_set() failed due to the catalog inconsistency.

What Jaime tried then was an UNSUBSCRIBE SET, which slonik issued against the half-subscribed node 2, deleting the sl_subscribe row. The code in copy_set() doesn't use the parameters from the event, but expects the in-memory runtime configuration data to know the data provider for the set. Since the sl_subscribe row is gone now, that information is missing, and -1 is the default value for a set that the node isn't subscribed to.
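
To make the sequence above easier to follow, here is a toy model of it in Python. This is only a sketch: it is not Slony code, and the class ToyNode, its methods and the NO_PROVIDER constant are made-up names for illustration. The only things taken from the description above are the event names, the sl_subscribe row, and the -1 default for a set the node isn't subscribed to.

```python
# Toy model only. Names mirror the description above, not Slony's actual code.
NO_PROVIDER = -1  # default for a set the node is not subscribed to


class ToyNode:
    def __init__(self, node_id):
        self.node_id = node_id
        # (set_id, receiver) -> {"provider": int, "active": bool}
        self.sl_subscribe = {}

    def subscribe_set(self, set_id, provider, receiver):
        """Step 1: SUBSCRIBE_SET creates an inactive sl_subscribe row."""
        self.sl_subscribe[(set_id, receiver)] = {
            "provider": provider,
            "active": False,
        }

    def unsubscribe_set(self, set_id, receiver):
        """UNSUBSCRIBE SET deletes the row, even a half-subscribed one."""
        self.sl_subscribe.pop((set_id, receiver), None)

    def provider_for(self, set_id):
        """Provider as known from local state, not from the event."""
        row = self.sl_subscribe.get((set_id, self.node_id))
        return row["provider"] if row else NO_PROVIDER

    def enable_subscription(self, set_id):
        """Step 2: ENABLE_SUBSCRIPTION kicks off the copy."""
        provider = self.provider_for(set_id)
        if provider == NO_PROVIDER:
            return f"copy_set: no provider for set {set_id} (got {provider})"
        self.sl_subscribe[(set_id, self.node_id)]["active"] = True
        return f"copy_set: copying set {set_id} from node {provider}"


def demo():
    # Normal path: row exists when ENABLE_SUBSCRIPTION is processed.
    good = ToyNode(2)
    good.subscribe_set(set_id=1, provider=1, receiver=2)
    ok = good.enable_subscription(1)

    # Problem path: the row is deleted before ENABLE_SUBSCRIPTION is
    # processed, so the provider lookup falls back to -1.
    bad = ToyNode(2)
    bad.subscribe_set(set_id=1, provider=1, receiver=2)
    bad.unsubscribe_set(set_id=1, receiver=2)
    stale = bad.enable_subscription(1)
    return ok, stale


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The point of the sketch is only that the second step trusts local state: once the row is gone, nothing in the pending ENABLE_SUBSCRIPTION is consulted to recover the provider.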

I don't know exactly what the right fix for this bug is. My first gut feeling is to ignore the ENABLE_SUBSCRIPTION and generate another UNSUBSCRIBE_SET event just to clear out any sl_subscribe row existing in the cluster. Since I am in Toronto right now, I can discuss this with Steve Singer tomorrow morning.

Thank you Jaime. Your patience on this matter helped to track down a very nasty bug that apparently had been lingering in the system for a long time.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin