Tue Apr 28 05:44:13 PDT 2009
- Previous message: [Slony1-general] [Slony 2.0.1] Events and Confirmations not propagating
- Next message: [Slony1-general] [Slony 2.0.1] Events and Confirmations not propagating
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
I found in slony log files that the worker has restarted the process on
each host in my configuration (1 provider, 3 receivers) at the same time :
2009-04-28 08:47:37 CEST INFO remoteWorkerThread_101: SYNC 4516360
sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber
0.035/1 - IUD 0.000/2
2009-04-28 08:47:38 CEST ERROR remoteListenThread_102: "select
ev_origin, ev_seqno, ev_timestamp, ev_snapshot,
"pg_catalog".txid_snapshot_xmin(ev_snapshot),
"pg_catalog".txid_snapshot_xmax(ev_snapshot), ev_type,
ev_data1, ev_data2, ev_data3, ev_data4, ev_data5,
ev_data6, ev_data7, ev_data8 from
"_pns_slony_voila_archi_0".sl_event e where (e.ev_origin = '102' and
e.ev_seqno > '29') or (e.ev_origin = '103' and e.ev_seqno > '62') order
by e.ev_origin, e.ev_seqno limit 40" - FATAL: terminating connection
due to administrator command
2009-04-28 08:47:38 CEST ERROR remoteListenThread_102: "select
con_origin, con_received, max(con_seqno) as con_seqno,
max(con_timestamp) as con_timestamp from
"_pns_slony_voila_archi_0".sl_confirm where con_received <> 104 group by
con_origin, con_received" 2009-04-28 08:47:38 CEST CONFIG slon: child
terminated status: 11; pid: 28152, current worker pid: 28152
2009-04-28 08:47:38 CEST CONFIG slon: restart of worker in 10 seconds
The origin of the first error is that postgresql has been restarted on
one receiver without stopping slon before. Is it a known source of errors ?
The slon process has been terminated with status=0 on the receiver where
postgresql has been restarted and has segfault (chid status=11) on the
others (each watchdog has restart a new slon process). This fact seems
to cause the error with events and confirmations.
Regards.
Cyril Scetbon wrote:
> A simple restart of all slon processes seems to have resolved the
> issue. weird ....
>
> Cyril Scetbon wrote:
>> I use test_slony_state and see some informations like :
>>
>> Check of event info
>> ---------------------------------------------------
>> Problem : Events not propagating to node 2
>> Problem : Events not propagating to node 4
>> Problem : Events not propagating to node 3
>>
>> Check of sl_confirm aging
>> ---------------------------------------------------
>> Confirmations not propagating from 2 to 1
>> Confirmations not propagating from 2 to 3
>> Confirmations not propagating from 2 to 4
>> Confirmations not propagating from 3 to 1
>> Confirmations not propagating from 3 to 2
>> Confirmations not propagating from 3 to 4
>> Confirmations not propagating from 4 to 1
>> Confirmations not propagating from 4 to 2
>> Confirmations not propagating from 4 to 3
>>
>> You can see the results on one of my databases :
>>
>> - for Confirmations
>>
>> select con_origin, con_received, min(con_seqno) as minseq,
>> max(con_seqno) as maxseq, date_trunc('minutes',
>> min(now()-con_timestamp)) as age1, date_trunc('minutes',
>> max(now()-con_timestamp)) as age2, min(now() - con_timestamp) >
>> '00:30:00' as tooold from _pns_slony_voila_archi_0.sl_confirm group
>> by con_origin, con_received order by con_origin, con_received;
>> con_origin | con_received | minseq | maxseq | age1 | age2 |
>> tooold
>> ------------+--------------+---------+---------+----------+----------+--------
>>
>> 101 | 102 | 4464029 | 4464792 | 00:00:00 | 00:16:00 | f
>> 101 | 103 | 4464027 | 4464792 | 00:00:00 | 00:16:00 | f
>> 101 | 104 | 4464024 | 4464792 | 00:00:00 | 00:16:00 | f
>> 102 | 101 | 29 | 29 | 03:39:00 | 03:39:00 | t
>> 102 | 103 | 29 | 29 | 03:39:00 | 03:39:00 | t
>> 102 | 104 | 29 | 29 | 03:39:00 | 03:39:00 | t
>> 103 | 101 | 62 | 62 | 03:39:00 | 03:39:00 | t
>> 103 | 102 | 62 | 62 | 03:39:00 | 03:39:00 | t
>> 103 | 104 | 62 | 62 | 03:39:00 | 03:39:00 | t
>> 104 | 101 | 57 | 57 | 03:39:00 | 03:39:00 | t
>> 104 | 102 | 57 | 57 | 03:39:00 | 03:39:00 | t
>> 104 | 103 | 57 | 57 | 03:39:00 | 03:39:00 | t
>>
>>
>> - for Events
>>
>> select ev_origin, min(ev_seqno), max(ev_seqno),
>> date_trunc('minutes', min(now() - ev_timestamp)),
>> date_trunc('minutes', max(now() - ev_timestamp)),
>> min(now() - ev_timestamp) > '00:30:00' as agehi
>> from _pns_slony_voila_archi_0.sl_event group by ev_origin;
>> ev_origin | min | max | date_trunc | date_trunc | agehi
>> -----------+---------+---------+------------+------------+-------
>> 103 | 62 | 62 | 03:49:00 | 03:49:00 | t
>> 104 | 57 | 57 | 03:49:00 | 03:49:00 | t
>> 102 | 29 | 29 | 03:49:00 | 03:49:00 | t
>> 101 | 4464493 | 4465346 | 00:00:00 | 00:14:00 | f
>>
>>
>> What can be the source of these errors and how to track them ? FYI, I
>> have logs but in debug level 1.
>>
>> I did not have issues in 1.2.15.
>>
>> Regards.
>
--
Cyril SCETBON
- Previous message: [Slony1-general] [Slony 2.0.1] Events and Confirmations not propagating
- Next message: [Slony1-general] [Slony 2.0.1] Events and Confirmations not propagating
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list