[Slony1-general] Strange behavior adding a new node, very, VERY slow

Tue Aug 5 00:50:50 PDT 2008

Martin Eriksson wrote:
> Sorry,
> 
> I should mention that this is Postgres 8.2.4, and Slony 1.2.14
> 
> Martin Eriksson wrote:
>> Hi everyone.
>>
>> I've been using Slony for a while now and feel pretty confident with
>> what I'm doing, but I cannot understand what is going on now!
>>
>> Current setup:
>> 1 Master
>> 2 slave1 (provider = 1)
>> 3 slave2 (provider = 1)
>>
>> Adding a new node 4 (provider = 1)
>>
>> All machines run on the same hardware, pretty nice machines with 8 GB
>> of RAM each. The master has 6 GB allocated to Postgres, the slave
>> machines have 3.2 GB allocated. All of them run 64-bit Ubuntu.
>>
>> The database totals 7.9 GB (including the Slony schema); the data
>> that needs to be replicated is around 3.5 GB.
>>
>> The master and slave 1 sit next to each other, connected by a 1 Gb/s
>> link on a separate interface.
>>
>> Now, for node 4, I created a new Postgres installation on the slave 1
>> machine, running on a different port with the same memory allocation
>> (3.2 GB), so the two Postgres servers on that machine use 6.4 GB in
>> total (still 1.4 GB free).
>>
>> On Saturday I synced node 2 from scratch and it took a total of
>> 20 minutes.
>>
>> On Sunday afternoon the database was put into production and is being
>> used. It is not a heavily used database: around 18,000 Slony events
>> per 24h, with a total of 2,000-3,000 commits on the master per 24h.
>>
>> So yesterday morning I started to sync node 4, and now, 22h later, it
>> is still running!!! And it's only 1/3 done!!!
>>
>> Does anyone have a good explanation for this?
>>
>> I looked at the slave 2 machine: 0.2-0.4 load, memory is available,
>> only a fraction of the bandwidth is used, and I/O stats are low. It is
>> more or less the same on the master: low CPU load, low I/O load and
>> low bandwidth usage.
>>
>> Looking at the db, it appears that it's trying to do EVERYTHING in a
>> single transaction, as tables that have already been copied still
>> show count(*) = 0. Is there a way not to do everything in a single
>> transaction??
>>
>> Or does anyone have another idea??
>>

Do you have any error messages?
As you noticed, the initial synchronisation is done in a single transaction.
That's why any failure (network failure, schema not exactly the same on both
nodes...) will abort the whole copy and make it start from scratch, again and
again.
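The count(*) = 0 you see follows from that single transaction: rows written by a transaction that has not committed yet are invisible to every other session. Here is a minimal sketch of the effect, assuming Python's built-in sqlite3 module as a stand-in (SQLite, not PostgreSQL, so the locking details differ, although PostgreSQL's MVCC gives the same visible result); the table name `orders` is hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def uncommitted_rows_are_invisible() -> Tuple[int, int]:
    """Return (rows another session sees before commit, rows it sees after)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # "writer" stands in for the subscription copy (hypothetical setup);
        # "reader" stands in for you running SELECT count(*) from another session.
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
            writer.commit()

            # sqlite3 implicitly opens a transaction before the INSERT, so these
            # rows stay uncommitted until writer.commit() is called.
            writer.executemany("INSERT INTO orders (id) VALUES (?)",
                               [(i,) for i in range(1, 4)])
            before = reader.execute("SELECT count(*) FROM orders").fetchall()[0][0]

            # Once the writer commits, the other session sees all the rows at once.
            writer.commit()
            after = reader.execute("SELECT count(*) FROM orders").fetchall()[0][0]
            return before, after
        finally:
            writer.close()
            reader.close()


print(uncommitted_rows_are_invisible())
```

So a table that shows count(*) = 0 on the new node may in fact already be copied; its rows only become visible when the whole subscription transaction commits.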

Reading further, I don't think it can be a network problem.
How did you create the schema on that new slave?

A quick look at pg_stat_activity may tell you which table is being synchronized.
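A minimal sketch of that check, assuming PostgreSQL 8.2, where pg_stat_activity exposes `procpid` and `current_query` (9.2 later renamed them to `pid` and `query`). It also assumes the copy shows up as a COPY statement that names the table right after the COPY keyword; the regex and the helper `tables_being_copied` are hypothetical, not part of Slony.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Run this on the new node (or on the provider) while the subscription is copying.
ACTIVITY_SQL = """
SELECT procpid, datname, query_start, current_query
FROM pg_stat_activity
WHERE current_query ILIKE 'copy %'
"""

# Assumption: the table name (optionally schema-qualified and double-quoted)
# directly follows the COPY keyword, e.g. copy "public"."orders" (...) from stdin
_COPY_RE = re.compile(r'^\s*copy\s+("?[\w$]+"?(?:\."?[\w$]+"?)?)', re.IGNORECASE)

Row = Tuple[int, str, object, Optional[str]]


def tables_being_copied(rows: Iterable[Row]) -> List[Tuple[int, str]]:
    """Return (procpid, table) for every backend whose current query is a COPY."""
    result = []
    for procpid, _datname, _query_start, query in rows:
        match = _COPY_RE.match(query or "")
        if match:
            result.append((procpid, match.group(1)))
    return result
```

With psycopg2, for example, you could run `cur.execute(ACTIVITY_SQL)` and pass `cur.fetchall()` to `tables_being_copied`. Keeping the parsing separate from the database call makes it easy to check against rows pasted from psql.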

Regards,
--
Stéphane Schildknecht
PostgreSQLFr : http://www.postgresql.fr

Come meet us on October 4 at the largest French-speaking PostgreSQL
event: http://www.pgday.fr