Hendrik Woltersdorf
Fri Sep 10 07:26:13 PDT 2004



Hi Ed,

I have some experience with other replication systems (Sybase) and had the
same concerns about the initial sync. Once upon a time it took me a whole
weekend and 1.5 GB of temporary disk space to copy 300 MB of data from
Duesseldorf to Frankfurt.
But it's easy to build a slon daemon that does not do the initial delete
and copy, controlled by a command line option (sketched below). That way I
can do the sync any way I like: subscribe without copying, then restart
slon without the "do not copy" flag. It's also a workaround for removing a
table from a set.
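
For illustration, the workflow looks roughly like this (the option name
here is hypothetical, not necessarily the spelling used in my patch, and
"mycluster" and the conninfo are placeholders):

    # first run: subscribe, but skip the initial delete/copy
    slon -no_initial_copy mycluster "dbname=mydb host=subscriber"
    # load the data by whatever means is convenient,
    # then restart slon normally, without the flag
    slon mycluster "dbname=mydb host=subscriber"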

regards

Hendrik Woltersdorf
(source code for that hack is available)


                                                                           
From: Christopher Browne <cbbrowne at ca.afilias.info>
Sent by: slony1-general-bounces at gborg.postgresql.org
To: "Ed L." <pgsql at bluepolka.net>
Cc: Jan Wieck <JanWieck at Yahoo.com>, slony1-general at gborg.postgresql.org
Date: 09.09.2004 18:39
Subject: Re: [Slony1-general] Slony-I Capabilities . . . [virus checked]




"Ed L." <pgsql at bluepolka.net> writes:

> On Wednesday September 8 2004 5:29, Jan Wieck wrote:
>> On 9/8/2004 1:51 PM, Ed L. wrote:
>> > On Monday September 6 2004 4:20, Christopher Browne wrote:
>> >> And the real problem would most likely come in the initial "seeding."
>> >> When you provision a new subscriber, it has to take on _all_ of the
>> >> data then sitting at its provider, all in one transaction.  That
>> >> takes a while, across a slow link, and if that link is not
>> >> sufficiently reliable, you might never get the first "sync."
>> >
>> > Our typical DB is around 10GB.  Do I understand correctly that the
>> > first seeding transfer will include all of the 10GB in one very large,
>> > very long transaction on the slave?  Any concerns about that much data
>> > going across?
>>
>> Why would that concern you?
>
> The only possible concern that comes to mind would be a very long
> transaction on the master/provider.  Sounds like you have no concerns
> about the volume of data in the first sync?
>
>> > And the path traveled is from provider pgsql to provider slon to
>> > subscriber slon to subscriber pgsql?  Any concerns there about memory
>> > needs?  Or is it pipelined in transfer?
>>
>> The subscriber slon has a DB connection to the provider and the local
>> subscriber DB. It does "COPY foo TO stdout" on the provider and on the
>> subscriber DB "COPY foo FROM stdin", then it forwards the entire data in
>> chunks via PQgetCopyData(), PQputCopyData(). I didn't bother to
>> multithread that process.
>
> So, is it the case that the data for foo is ever completely buffered in
> either slon process?  In other words, to sync a 10GB table, roughly how
> much memory would the slon processes need?

At base, it requires 8192 bytes of memory for that, as that is the
size of the "copybuf" variable in remote_worker.c.

Well, there's probably some more that libpq will consume on both
sides, which is why I have seen slon processes as big as 11MB.
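
For the curious, here is a minimal sketch of the sort of chunked
forwarding loop Jan describes, written against libpq's copy API.  It
is illustrative only, not the actual remote_worker.c code; error
handling is abbreviated, and the table name "foo" is just carried over
from the example above.

    /* Forward "COPY foo" output from one connection into another,
     * one chunk at a time.  provider and subscriber are open PGconn
     * handles. */
    #include <libpq-fe.h>

    static int
    forward_copy(PGconn *provider, PGconn *subscriber)
    {
        PGresult *res;
        char     *buf;
        int       len;

        /* Put the provider into COPY OUT mode... */
        res = PQexec(provider, "COPY foo TO stdout");
        if (PQresultStatus(res) != PGRES_COPY_OUT)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        /* ...and the subscriber into COPY IN mode. */
        res = PQexec(subscriber, "COPY foo FROM stdin");
        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        /* libpq allocates each chunk; freeing it right after it has
         * been forwarded keeps memory use near the chunk size, no
         * matter how large the table is. */
        while ((len = PQgetCopyData(provider, &buf, 0)) > 0)
        {
            if (PQputCopyData(subscriber, buf, len) != 1)
            {
                PQfreemem(buf);
                return -1;
            }
            PQfreemem(buf);
        }
        if (len == -2)          /* provider-side failure */
            return -1;

        /* Collect the provider's end-of-copy result. */
        res = PQgetResult(provider);
        PQclear(res);

        /* Finish the COPY IN and check the final status. */
        if (PQputCopyEnd(subscriber, NULL) != 1)
            return -1;
        res = PQgetResult(subscriber);
        len = (PQresultStatus(res) == PGRES_COMMAND_OK) ? 0 : -1;
        PQclear(res);
        return len;
    }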

But certainly _nothing_ close to 10GB of data is needed.  "Snapshots"
on ERServer required holding all the data in RAM, which could, if
things went badly, require more RAM than the largest heap a JVM is
allowed to have :-(.

Relieving that particular situation was one of the Slony-I "design
expectations."

That the "seeding" is one big transaction is something I'm not
_completely_ thrilled with.  There is a useful workaround, if this
proved troublesome in practice, in that you could:

- build a replica on a local host, over a fast 100BaseT-or-better
  connection;
- stop that database and tar/bzip it;
- push the tarball across the 'somewhat flakey' connection;
- FTP can commonly do "continue transmission" if connections get
  dropped;
- finally, once the tarball gets to the destination, extract it, and
  reconfigure the Slony-I nodes to point them at the _new_ location.

Still another approach would be to split the "seeding" into multiple
sets, keeping the size of each set down (perhaps only a couple GB of
data apiece), subscribing them one by one, and then merging them
together once they're all subscribed.
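
In slonik terms, that would look something like the following (syntax
from memory, with made-up set and node IDs; check the slonik reference
for the exact form):

    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    subscribe set (id = 2, provider = 1, receiver = 2, forward = no);
    # wait for both subscriptions to finish copying, then:
    merge set (id = 1, add id = 2, origin = 1);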

If I had had a problem populating our "afar off" Slony-I replica, I
would have taken one or another of these approaches.  I didn't have a
problem, so the "do it all across the link as one big brute force
transaction" worked out OK.
--
let name="cbbrowne" and tld="ca.afilias.info" in String.concat "@"
[name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
_______________________________________________
Slony1-general mailing list
Slony1-general at gborg.postgresql.org
http://gborg.postgresql.org/mailman/listinfo/slony1-general



