Cyril SCETBON cscetbon.ext at orange-ftgroup.com
Mon Sep 10 13:33:17 PDT 2007

Cyril SCETBON wrote:
>
>
> Jan Wieck wrote:
>> On 9/7/2007 9:36 AM, Cyril SCETBON wrote:
>>> Hi,
>>>
>>> I got this configuration                Node1 --> Node2 (5 seconds late)
>>>                                                           |
>>>                                                           --> Node3 (2 hours late)
>>>
>>> Node2 is processing the SYNCs from both Node1 and Node3, but Node3 is
>>> processing the SYNCs from Node2 and not those from Node1, which is the
>>> origin of the sets:
>>>
>>> On Node3, `grep processing /var/log/slony1/node3-pns_profiles_preprod.log | awk '{print $5}' | sort | uniq -c` gives:
>>>      19 remoteWorkerThread_1:
>>>     963 remoteWorkerThread_2:
>>>
>>> On Node2, `grep processing /var/log/slony1/node2-pns_profiles_preprod.log | awk '{print $5}' | sort | uniq -c` gives:
>>>    1570 remoteWorkerThread_1:
>>>     865 remoteWorkerThread_3:
>>>
>>> Why are so many SYNCs not being processed on Node3?
>>>
>>> Node3 shows 22440 "queue event" and 25 "Received event" entries from
>>> remoteWorkerThread_1, while Node2 shows 4467 "queue event" and 1578
>>> "Received event" entries from the same worker.
>>>
>>> Is there anything I can do about it?
>>
>> How about looking for some error messages?
> None.
I've set slon to debug level 2.
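A minimal sketch for re-collecting the per-worker counts quoted above from a slon log, assuming (as the earlier pipeline does) that each relevant line contains the word "processing" together with the worker name. It uses `grep -o` instead of `awk '{print $5}'`, so it does not depend on the worker name being the fifth field. The function name `count_sync_by_worker` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "processing" log lines per remoteWorkerThread_N.
# Assumes the slon log lines of interest contain both the word "processing"
# and the worker name; grep -o avoids relying on a fixed field position.
count_sync_by_worker() {
    logfile=$1
    grep 'processing' "$logfile" \
        | grep -o 'remoteWorkerThread_[0-9]*' \
        | sort \
        | uniq -c
}
```

For example, `count_sync_by_worker /var/log/slony1/node3-pns_profiles_preprod.log` prints one count per worker, without the trailing colon that the awk version keeps.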
>>
>> What comes to mind would be that sl_event is grossly out of shape and 
>> that the event selection times out.
> It seems that vacuuming sl_log_1 takes too long because of
> vacuum_cost_delay, and that selects on this table use a sequential scan.
> I'm investigating.
I forced vacuum to run faster and checked the slon logs on the subscribers.
They have similar disk capabilities, and the disks seem to be the
bottleneck on all nodes (I/O wait ~50% in vmstat).
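One way to keep a manual vacuum of the Slony-I tables from being throttled is to set `vacuum_cost_delay` to 0 for the session that runs it. This is a sketch, not necessarily how vacuum was sped up here. It assumes the cluster schema is `_mycluster` (hypothetical; Slony-I names the schema after the cluster) and that sl_log_1 and sl_event, the two tables mentioned in this thread, are the ones to vacuum. The function `slony_vacuum_sql` (hypothetical) only prints SQL, so the statements can be reviewed before they are run.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that vacuums Slony-I's log and event tables
# with cost-based vacuum delay disabled for the current session only.
# $1 = Slony-I cluster schema, e.g. _mycluster (hypothetical name)
slony_vacuum_sql() {
    schema=$1
    # 0 disables the cost-based delay; the setting ends with the session
    printf 'SET vacuum_cost_delay = 0;\n'
    for tbl in sl_log_1 sl_event; do
        printf 'VACUUM ANALYZE "%s".%s;\n' "$schema" "$tbl"
    done
}
```

Usage: `slony_vacuum_sql _mycluster | psql -d mydb` (`mydb` is also a placeholder). When psql reads statements from stdin it sends them one at a time in the same session, so the SET applies to the following VACUUM commands, and VACUUM does not end up inside a multi-statement transaction block.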

I found that the replication task timings differ:

On node 3 :
                     delay in seconds = 585.974ms
                     cleanupEvent in seconds = 9.25167s

On node 2 :
                     delay in seconds = 37.6463ms
                     cleanupEvent in seconds = 0.203265s
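To make the gap concrete, here is a small sketch that divides each node 3 timing by the matching node 2 timing. Both delay values are in ms and both cleanupEvent values are in seconds, so the units cancel. The `ratio` helper is purely illustrative.

```shell
#!/bin/sh
# Illustrative helper: print a / b rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 585.974 37.6463    # delay: node 3 / node 2, prints 15.6
ratio 9.25167 0.203265   # cleanupEvent: node 3 / node 2, prints 45.5
```

So cleanupEvent takes roughly 45 times longer on node 3, and the delay is roughly 15 times longer.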

Could these times explain why node 3 lags behind node 2? What do you
think I should investigate next?

PS: both hosts show the same CPU load, but node 2 is a dual-processor
2.6 GHz machine and node 3 is a dual-processor dual-core 1.8 GHz machine
(4 processors seen by the Linux SMP kernel).
>
> Regards.
>>
>>
>> Jan
>>
>

-- 
Cyril SCETBON - Database Engineer
AUSY for France Télécom - SCR/HDI/DOP/HEBEX

Tel: +33 (0)4 97 12 87 60
Jabber : cscetbon at jabber.org
France Telecom - Orange
790 Avenue du Docteur Maurice Donat 
Bâtiment Marco Polo C2 
06250 Mougins
France

***********************************
This message and any attachments (the 'message') are confidential and
intended solely for the addressees.
Any unauthorised use or dissemination is prohibited.
Messages are susceptible to alteration. France Telecom Group shall not be
liable for the message if altered, changed or falsified.
If you are not recipient of this message, please cancel it immediately and
inform the sender.
************************************



More information about the Slony1-general mailing list