Jan Wieck JanWieck at Yahoo.com
Wed May 19 19:58:41 PDT 2010
On 5/20/2010 9:35 AM, Cyril Scetbon wrote:
> 
> Jan Wieck wrote:
>> On 5/12/2010 10:31 AM, Gurjeet Singh wrote:
>>   
>>> Hi All,
>>>
>>>     I have two Slony test beds which show the exact same symptoms!
>>>
>>> select * from sl_event order by ev_seqno;
>>>
>>>  ev_origin |  ev_seqno  |        ev_timestamp        |        ev_snapshot         | ev_type |
>>> -----------+------------+----------------------------+----------------------------+---------+-
>>>          2 | 5000000002 | 2010-04-30 08:32:38.622928 | 458:458:                   | SYNC    |
>>>          1 | 5000525721 | 2010-05-12 13:30:22.79626  | 72685915:72685915:         | SYNC    |
>>>          1 | 5000525722 | 2010-05-12 13:30:24.800943 | 72686139:72686139:         | SYNC    |
>>>          1 | 5000525723 | 2010-05-12 13:30:26.804862 | 72686224:72686224:         | SYNC    |
>>> ...
>>>
>>>     
>>
>> Slony always keeps at least the last event per origin around. Otherwise 
>> the view sl_status would not work.
>>   
> Hi Jan, can you say more about that? I posted a mail to slony1-bugs 
> today because test_slony_state.pl is warning us about old events 
> (exactly those oldest ones). This concerns events generated on the 
> local node. I only see events from the local node when I restart it:

I presume that you have set sync_interval_timeout to zero on the 
subscribers, which prevents the generation of SYNC events on those 
nodes because no actual replication work ever originates there. It 
looks like test_slony_state.pl expects that parameter to be non-zero 
(the default is -t 10000, meaning every 10 seconds).
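
A quick way to see which nodes are still producing events and how stale 
their newest one is (a rough sketch; the _OURCLUSTER schema name is 
taken from the example below, the rest is just the sl_event columns 
shown in this thread):

  -- newest event per origin and its age; an origin whose slon no longer
  -- generates SYNCs will show an ever-growing age here
  select ev_origin,
         max(ev_seqno)             as last_seqno,
         max(ev_timestamp)         as last_ts,
         now() - max(ev_timestamp) as age
    from _OURCLUSTER.sl_event
   group by ev_origin
   order by ev_origin;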


Jan
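
As for sl_status: the view derives per-subscriber lag from exactly that 
retained last event per origin, so a check along these lines (column 
names assume a stock sl_status definition) shows what would break if 
that row were purged from sl_event:

  -- lag in events and wall-clock time for each origin/receiver pair
  select st_origin, st_received, st_lag_num_events, st_lag_time
    from _OURCLUSTER.sl_status;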

> 
> select * from _OURCLUSTER.sl_event where ev_origin=102;
>  ev_origin | ev_seqno |        ev_timestamp        |     ev_snapshot      | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
> -----------+----------+----------------------------+----------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
>        102 |       51 | 2010-05-20 12:27:00.099562 | 338318875:338318875: | SYNC    |          |          |          |          |          |          |          |
> (1 row)
> 
> select * from _OURCLUSTER.sl_confirm where con_origin=102;
>  con_origin | con_received | con_seqno |       con_timestamp       
> ------------+--------------+-----------+----------------------------
>         102 |          101 |        51 | 2010-05-20 12:27:02.78581
>         102 |          103 |        51 | 2010-05-20 12:27:00.118815
>         102 |          104 |        51 | 2010-05-20 12:27:00.253975
> 
> The SYNC appears in the slony logs as "new sl_action_seq 1 - SYNC %d".
> 
>> What should worry you is that there are no newer SYNC events from node 2 
>> available. Slony does create a sporadic SYNC every now and then even if 
>> there is no activity or the node isn't an origin anyway.
>>
>> Is it possible that node 2's clock is way off?
>>
>>
>> Jan
>>
>>   
>>> The reason I think this _might_ be a bug is that on both clusters, the 
>>> slave node's sl_event has the exact same record for ev_seqno=5000000002, 
>>> except for the timestamp: same origin and same snapshot!
>>>
>>> The head of sl_confirm has:
>>>
>>>  select * from sl_confirm order by con_seqno;
>>>
>>>  con_origin | con_received | con_seqno  |       con_timestamp
>>> ------------+--------------+------------+----------------------------
>>>           2 |            1 | 5000000002 | 2010-04-30 08:32:53.974021
>>>           1 |            2 | 5000527075 | 2010-05-12 14:15:41.192279
>>>           1 |            2 | 5000527076 | 2010-05-12 14:15:43.193607
>>>           1 |            2 | 5000527077 | 2010-05-12 14:15:45.196291
>>>           1 |            2 | 5000527078 | 2010-05-12 14:15:47.197005
>>> ...
>>>
>>> Can someone comment on the health of the cluster? All events, except for 
>>> that one, are being confirmed and purged from the system regularly, so my 
>>> assumption is that the cluster is healthy and that the slave is in sync 
>>> with the master.
>>>
>>> Thanks in advance.
>>> -- 
>>> gurjeet.singh
>>> @ EnterpriseDB - The Enterprise Postgres Company
>>> http://www.enterprisedb.com
>>>
>>> singh.gurjeet@{ gmail | yahoo }.com
>>> Twitter/Skype: singh_gurjeet
>>>
>>> Mail sent from my BlackLaptop device
>>>
>>>
>>>     
>>
>>
>>   
> 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

