[Slony1-general] Huge database remote sync issue. Ideas?

Fri Jun 22 13:18:47 PDT 2007

Howdy folks,

We're in the middle of a migration/upgrade, and I've got a giant Slony 
set in place.  I get no errors on anything, and syncing starts up just 
fine.  But something seems weird here:

2007-06-21 19:08:44 CDT FATAL  cleanupThread: "delete 
from "_replication".sl_log_1 where log_origin = '10' and log_xid 
< '757377'; delete from "_replication".sl_log_2 where log_origin = '10' 
and log_xid < '757377'; delete from "_replication".sl_seqlog where 
seql_origin = '10' and seql_ev_seqno < '2'; 
select "_replication".logswitch_finish(); " - server closed the 
connection unexpectedly

After it copies a huge amount, say 15-17GB of our 40-45GB total, the 
pace slows from about 300MB per minute to 5MB per minute, then to almost 
nothing.  The remote system we're mirroring to has an idle-disconnect 
timer, which is likely killing the connection in question and causing a 
giant rollback of current progress.  The FATAL error above tells me it's 
doing a log switch on Node 10, which makes no sense, since Node 10 is a 
slave and should have no events.  I also get this same error every 
single time, though the log_xid number itself may change.
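
To put rough numbers on that slowdown, here is a back-of-the-envelope 
sketch in Python.  The inputs are just the midpoints of the ranges I 
quoted, 1GB is treated as 1000MB for simplicity, and the function name 
is made up for illustration, so treat the result as an order-of-magnitude 
guess rather than a measurement.

```python
# Rough estimate only.  All inputs are midpoints of the ranges quoted
# in the post, and 1 GB is treated as 1000 MB (an assumption).

def minutes_remaining(total_gb, copied_gb, rate_mb_per_min):
    """Minutes needed to copy what is left at a constant rate."""
    remaining_mb = (total_gb - copied_gb) * 1000
    return remaining_mb / rate_mb_per_min

TOTAL_GB = 42.5   # midpoint of 40-45GB
COPIED_GB = 16.0  # midpoint of 15-17GB

fast = minutes_remaining(TOTAL_GB, COPIED_GB, 300)  # initial pace
slow = minutes_remaining(TOTAL_GB, COPIED_GB, 5)    # pace after the slowdown

print(f"at 300MB/min: {fast:.0f} minutes left")
print(f"at   5MB/min: {slow / 60:.0f} hours left")
```

Under those assumptions the remainder would take roughly an hour and a 
half at the original pace, versus more than three days at 5MB per 
minute, which is far longer than any idle timer is going to tolerate.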

So my questions:

1. Why is the log switch happening on node 10, instead of node 1, which 
is providing the data?

2. Why is this mysterious log switch stalling the data copy, so that our 
idle timer slaughters the initial table COPY commands mid-progress?

3. Is there some way the initial copy can *not* be an "all or nothing" 
proposition?  45GB seems an awfully huge first bite, and it seems 
unreasonable that not a single error or disconnect may occur during the 
entire process of copying that much data.  Checkpoints?  Something? 
Maybe a configuration for a heartbeat, anything I missed?

4. Is it possible to somehow... bootstrap the mirror?  Make an exact 
data copy of the current database and have slony only copy updates 
after a certain point?  I mean, I could probably do a dump/restore and 
let slony keep everything up to date, before our systems launch the 
nightly insert jobs.

5. Something else I didn't consider?
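
To make question 3 concrete: by "heartbeat" I mean something along the 
lines of TCP keepalive on the connection.  The Python sketch below only 
illustrates the idea at the socket level.  It is not taken from slon, I 
don't know whether slon exposes anything equivalent, and the function 
name and timing values are hypothetical.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle_s, interval_s and probes are hypothetical values, not
    recommendations; they would need to sit below the idle timer.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are platform-specific (Linux has all
    # three), so only set the ones this platform exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print("keepalive on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
s.close()
```

Whether probes like these would actually satisfy our idle timer depends 
on what the vendor's kickoff is watching: one that counts any packets 
might be appeased, while one that looks for database-level activity 
would not.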

Thanks in advance.  This is driving me nuts and I've scanned through 
various documentation without much luck.  We're working with our vendor 
to temporarily disable the idle kickoff, but there's a chance that may 
not be the issue, considering that weird error I pasted always having 
the same contents; I'd think the error would be different if it were 
just an idle disconnect.

-- 

Shaun Thomas
Database Administrator

Leapfrog Online 
807 Greenwood Street 
Evanston, IL 60201 
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com