Fri Jun 22 14:46:56 PDT 2007
- Previous message: [Slony1-general] Huge database remote sync issue. Ideas?
- Next message: [Slony1-general] Huge database remote sync issue. Ideas?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Shaun Thomas wrote:
> Howdy folks,
>
> We're in the middle of a migration / upgrade, and I've got a giant
> slony set in place, and I get no errors on anything, and syncing
> starts up just great.  But something seems to be weird here:
>
> 2007-06-21 19:08:44 CDT FATAL  cleanupThread: "delete
> from "_replication".sl_log_1 where log_origin = '10' and log_xid
> < '757377'; delete from "_replication".sl_log_2 where log_origin = '10'
> and log_xid < '757377'; delete from "_replication".sl_seqlog where
> seql_origin = '10' and seql_ev_seqno < '2';
> select "_replication".logswitch_finish(); " - server closed the
> connection unexpectedly
>
> After it copies a huge amount, say 15-17GB of our 40-45GB total, the
> pace slows from about 300MB per minute to 5MB per minute, then to
> almost nothing.  The remote system we're mirroring to has an idle
> disconnect, which is likely killing the connection in question and
> causing a giant rollback of current progress.  The FATAL error above
> tells me it's doing a log switch on node 10, which makes no sense,
> since node 10 is a slave and should have no events.  This is also the
> same error I get every single time, even though the log_xid number
> itself may change.
>
> So my questions:
>
> 1. Why is log switching happening on node 10, instead of node 1,
>    which is providing the data?

That will take place routinely on all nodes; Slony-I needs to switch
between sl_log_1 and sl_log_2 periodically, and does so on every node.

There's something odd about the problem with logswitch_finish(); can
you check your logs to see whether the DBMS saw a Signal 11 or some
such at 19:08:44?

If node 1 is the origin, that set of queries should trivially run
quickly, with no muss and fuss, on node 10, as there shouldn't be
*any* data in sl_log_1/2 on node 10.

It seems as though there may be something funky happening at the
network level, which is not particularly diagnosable (nor
controllable) at the DBMS level...

> 2. Why is this mysterious log switch stalling the data copy, so that
>    our idle timer slaughters the initial table COPY commands
>    mid-progress?

The log switch shouldn't be having that effect; it doesn't make sense
for it to break things.

> 3. Is there some way the initial copy can *not* be an "all or
>    nothing" proposition?  45GB seems an awfully huge first bite, and
>    it seems unfair that not a single error or disconnect may occur
>    during the entire process of copying that much data.  Checkpoints?
>    Something?  Maybe a configuration for a heartbeat, anything I
>    missed?

If you have multiple tables, you could set up a replication set per
table, and subscribe one table at a time.  In practice, you probably
have five tables that are bigger than all the others put together; if
you set up a set for each of those five, and one set for "the rest,"
that's probably about as good as it can get.

> 4. Is it possible to somehow... bootstrap the mirror?  Make an exact
>    data copy of the current database and have slony only copy updates
>    after a certain point?  I mean, I could probably do a dump/restore
>    and let slony keep everything up to date, before our systems
>    launch the nightly insert jobs.

Jan's thinking about having a way to do this with Slony-I 2.x with PG
8.3; it's still a glimmer in the eye at this point...
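Regarding question 1: one quick way to confirm that node 10 really has
nothing in its log tables is to query them directly on the subscriber.
This is a hedged sketch; it assumes your cluster schema is
"_replication", as in the log excerpt above.

```sql
-- Run these on node 10 (the subscriber).  If node 1 is the only
-- origin, both counts should be zero, and the cleanupThread deletes
-- quoted in the FATAL message should be near-instant on this node.
SELECT count(*) FROM "_replication".sl_log_1 WHERE log_origin = 10;
SELECT count(*) FROM "_replication".sl_log_2 WHERE log_origin = 10;
```

If those counts are nonzero, something is generating events on node
10 and is worth investigating before anything else.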
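Regarding question 3, the one-set-per-table idea might look roughly
like the slonik sketch below.  Set/table IDs, table names, and
conninfo strings are all hypothetical; adapt them to your cluster, and
repeat the create/add/subscribe steps for each big table (and once
more for a catch-all set holding the rest).

```
cluster name = replication;
node 1  admin conninfo = 'dbname=app host=master';
node 10 admin conninfo = 'dbname=app host=slave';

# One set per big table, so each initial COPY is a smaller bite.
create set (id = 2, origin = 1, comment = 'big_table_a only');
set add table (set id = 2, origin = 1, id = 101,
               fully qualified name = 'public.big_table_a');

# Subscribe, then let this set finish copying before starting the next.
subscribe set (id = 2, provider = 1, receiver = 10, forward = no);
```

A failed COPY then costs you one table's worth of progress rather
than all 45GB.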
More information about the Slony1-general mailing list