Gurjeet Singh singh.gurjeet at gmail.com
Wed May 12 09:44:52 PDT 2010
On Wed, May 12, 2010 at 10:56 AM, Jan Wieck <JanWieck at yahoo.com> wrote:

> On 5/12/2010 10:31 AM, Gurjeet Singh wrote:
>
>> Hi All,
>>
>>    I have two Slony test beds which show the exact same symptoms!
>>
>> select * from sl_event order by ev_seqno;
>>
>>  ev_origin |  ev_seqno  |        ev_timestamp        |        ev_snapshot         | ev_type |
>> -----------+------------+----------------------------+----------------------------+---------+-
>>          2 | 5000000002 | 2010-04-30 08:32:38.622928 | 458:458:                   | SYNC    |
>>          1 | 5000525721 | 2010-05-12 13:30:22.79626  | 72685915:72685915:         | SYNC    |
>>          1 | 5000525722 | 2010-05-12 13:30:24.800943 | 72686139:72686139:         | SYNC    |
>>          1 | 5000525723 | 2010-05-12 13:30:26.804862 | 72686224:72686224:         | SYNC    |
>> ...
>>
>>
> Slony always keeps at least the last event per origin around. Otherwise the
> view sl_status would not work.
>
> What should worry you is that there are no newer SYNC events from node 2
> available. Slony does create a sporadic SYNC every now and then even if
> there is no activity or the node isn't an origin anyway.
>
> Is it possible that node 2's clock is way off?
>

# ssh root@10.32.169.215 date; ssh root@10.32.169.216 date
Wed May 12 16:38:20 UTC 2010
Wed May 12 16:38:20 UTC 2010

Above are the current times on the two nodes; 215 hosts the origin and 216
hosts the subscriber. The clocks seem to be perfectly in sync.

I think I forgot to paste the test_slony_state.pl output before. This is
what raised the concern:
<snip>
Node: 2 Confirmations not propagating from 2 to 1
================================================
Confirmations not propagating quickly in sl_confirm -

For origin node 2, receiver node 1, earliest propagated
confirmation has age 12 days > 00:30:00

Are slons running for both nodes?

Could listen paths be missing so that confirmations are not propagating?


Node: 2 Events not propagating to node 2
================================================
Events not propagating quickly in sl_event -
For origin node 2, earliest propagated event of age 12 days 00:01:00 >
00:30:00

Are slons running for both nodes?

Could listen paths be missing so that events are not propagating?
</snip>
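
As a rough cross-check of the "12 days" figure in that report, the age of the
lone node 2 event can be recomputed from its ev_timestamp in the sl_event
output quoted earlier and the `date` output above. This is only a sketch: it
assumes ev_timestamp is in the same time zone (UTC) as the `date` output, which
nothing in this thread confirms, and the variable names are made up for the
example.

```python
from datetime import datetime, timedelta

# ev_timestamp of the only node 2 event left in sl_event (from the quoted query output)
oldest_event = datetime(2010, 4, 30, 8, 32, 38, 622928)
# `date` output on both nodes; ASSUMED to be in the same time zone as ev_timestamp
now = datetime(2010, 5, 12, 16, 38, 20)
# threshold printed in the test_slony_state.pl report (00:30:00)
threshold = timedelta(minutes=30)

age = now - oldest_event
print("age of oldest node 2 event:", age)
print("older than threshold:", age > threshold)
```

Under that assumption the age comes out at a little over 12 days, which matches
what the script reports, so the warning is about that one old event rather than
about anything more recent.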

And the path and listen configs:

system.db=# select * from sl_path;
 pa_server | pa_client |                    pa_conninfo                    | pa_connretry
-----------+-----------+---------------------------------------------------+--------------
         2 |         1 | dbname=system.db host=10.32.169.216 user=postgres |           10
         1 |         2 | dbname=system.db host=10.32.169.215 user=postgres |           10
(2 rows)
system.db=# select * from sl_listen ;
 li_origin | li_provider | li_receiver
-----------+-------------+-------------
         2 |           2 |           1
         1 |           1 |           2
(2 rows)
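
To address the script's "could listen paths be missing?" question for this
two-node setup, here is a small sketch that checks the rows above against each
other. It assumes, as far as I understand the Slony catalog, that a listen row
(li_origin, li_provider, li_receiver) needs an sl_path row with
pa_server = li_provider and pa_client = li_receiver. The helper names are
hypothetical and not part of Slony.

```python
# Rows copied from the sl_path and sl_listen output above.
sl_path = [(2, 1), (1, 2)]            # (pa_server, pa_client)
sl_listen = [(2, 2, 1), (1, 1, 2)]    # (li_origin, li_provider, li_receiver)


def missing_paths(paths, listens):
    """Return listen rows whose receiver has no path to its provider.

    ASSUMPTION: a listen row needs a path with
    pa_server = li_provider and pa_client = li_receiver.
    """
    have = set(paths)
    return [row for row in listens if (row[1], row[2]) not in have]


def missing_listens(nodes, listens):
    """Return (origin, receiver) pairs that have no listen row at all."""
    covered = {(origin, receiver) for origin, _provider, receiver in listens}
    return [(o, r) for o in nodes for r in nodes
            if o != r and (o, r) not in covered]


print("listen rows without a path:", missing_paths(sl_path, sl_listen))
print("origin/receiver pairs without a listen row:",
      missing_listens([1, 2], sl_listen))
```

Both lists come back empty for the rows shown, so under that assumption the
path and listen configuration covers both directions between nodes 1 and 2.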


Thanks and best regards,


>
>
> Jan
>
>> The reason I think this _might_ be a bug is that on both clusters, slave
>> node's sl_event has the exact same record for ev_seqno=5000000002 except for
>> the timestamp; same origin, and same snapshot!
>>
>> The head of sl_confirm has:
>>
>>  select * from sl_confirm order by con_seqno;
>>
>>  con_origin | con_received | con_seqno  |       con_timestamp
>> ------------+--------------+------------+----------------------------
>>          2 |            1 | 5000000002 | 2010-04-30 08:32:53.974021
>>          1 |            2 | 5000527075 | 2010-05-12 14:15:41.192279
>>          1 |            2 | 5000527076 | 2010-05-12 14:15:43.193607
>>          1 |            2 | 5000527077 | 2010-05-12 14:15:45.196291
>>          1 |            2 | 5000527078 | 2010-05-12 14:15:47.197005
>> ...
>>
>> Can someone comment on the health of the cluster? All events, except for
>> that one, are being confirmed and purged from the system regularly, so my
>> assumption is that the cluster is healthy and that the slave is in sync with
>> the master.
>>
>> Thanks in advance.
>> --
>> gurjeet.singh
>> @ EnterpriseDB - The Enterprise Postgres Company
>> http://www.enterprisedb.com
>>
>> singh.gurjeet@{ gmail | yahoo }.com
>> Twitter/Skype: singh_gurjeet
>>
>> Mail sent from my BlackLaptop device
>>
>
>
> --
> Anyone who trades liberty for security deserves neither
> liberty nor security. -- Benjamin Franklin
>



-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.enterprisedb.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


More information about the Slony1-hackers mailing list