Jason Chen yunfeng82 at gmail.com
Fri Oct 29 08:12:52 PDT 2010
That is correct. On the error node, the slon on the master node does not get
the STORE_PATH event and cannot start the remoteListen and remoteWorker threads.

Below is the event table for the error node.
[root@slony-r1s1-001 ~]# psql -U postgres system.db -c "select * from _slony.sl_event where ev_seqno > 5000000078";
 ev_origin |  ev_seqno  |        ev_timestamp        | ev_snapshot |       ev_type       | ev_data1 | ev_data2 |                          ev_data3                           | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
-----------+------------+----------------------------+-------------+---------------------+----------+----------+-------------------------------------------------------------+----------+----------+----------+----------+----------
         1 | 5000000079 | 2010-10-29 16:17:23.546814 | 888:888:    | SYNC                |          |          |                                                             |          |          |          |          |
         1 | 5000000080 | 2010-10-29 09:49:06.107569 | 982:982:    | STORE_NODE          | 2        | slave    |                                                             |          |          |          |          |
         1 | 5000000081 | 2010-10-29 09:49:06.107569 | 982:982:    | ENABLE_NODE         | 2        |          |                                                             |          |          |          |          |
         1 | 5000000082 | 2010-10-29 09:49:06.433519 | 983:983:    | STORE_PATH          | 2        | 1        | host=192.168.11.12 dbname=system.db user=postgres port=5432 | 10       |          |          |          |
         1 | 5000000083 | 2010-10-29 09:49:10.311715 | 988:988:    | SUBSCRIBE_SET       | 1        | 1        | 2                                                           | t        | f        |          |          |
         1 | 5000000084 | 2010-10-29 09:49:10.311715 | 988:988:    | ENABLE_SUBSCRIPTION | 1        | 1        | 2                                                           | t        | f        |          |          |
(6 rows)
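
To make the gap explicit, here is a small Python sketch (not part of Slony).
The `events` list is just the rows above typed in by hand, `last_seen` is the
1,5000000081 that Steve mentions in the quoted message below, and the helper
name `pending_events` is made up for illustration. It lists the events that
the slon on the error node has not yet picked up:

```python
# Rows from the error node's sl_event output above: (ev_seqno, ev_type).
events = [
    (5000000079, "SYNC"),
    (5000000080, "STORE_NODE"),
    (5000000081, "ENABLE_NODE"),
    (5000000082, "STORE_PATH"),
    (5000000083, "SUBSCRIBE_SET"),
    (5000000084, "ENABLE_SUBSCRIPTION"),
]

# Assumption: the last event the error-node slon logged, taken from the
# quoted message ("it got as far as 1,5000000081").
last_seen = 5000000081


def pending_events(rows, last_seqno):
    """Return the rows whose ev_seqno is greater than last_seqno, in order."""
    return sorted(row for row in rows if row[0] > last_seqno)


pending = pending_events(events, last_seen)
for seqno, ev_type in pending:
    print(seqno, ev_type)
```

With these inputs the first pending event is STORE_PATH, which is the event
the master never seems to receive.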

The event table on the normal node, shown below, has similar records. The only
difference is that a number of SYNC events keep being generated continuously.
[root@140-r1s1-001 ~]# psql -U postgres system.db -c "select * from _slony.sl_event where ev_seqno > 5000000078 order by ev_seqno";
LOG:  duration: 5.093 ms  statement: select * from _slony.sl_event where ev_seqno > 5000000078 order by ev_seqno
 ev_origin |  ev_seqno  |        ev_timestamp        | ev_snapshot |       ev_type       | ev_data1 | ev_data2 |                          ev_data3                           | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
-----------+------------+----------------------------+-------------+---------------------+----------+----------+-------------------------------------------------------------+----------+----------+----------+----------+----------
         1 | 5000000082 | 2010-10-29 13:55:57.312714 | 892:892:    | SYNC                |          |          |                                                             |          |          |          |          |
         1 | 5000000083 | 2010-10-29 06:32:52.628858 | 997:997:    | STORE_NODE          | 2        | slave    |                                                             |          |          |          |          |
         1 | 5000000084 | 2010-10-29 06:32:52.628858 | 997:997:    | ENABLE_NODE         | 2        |          |                                                             |          |          |          |          |
         1 | 5000000085 | 2010-10-29 06:32:53.07135  | 998:998:    | STORE_PATH          | 2        | 1        | host=192.168.11.12 dbname=system.db user=postgres port=5432 | 10       |          |          |          |
         1 | 5000000086 | 2010-10-29 06:32:57.676823 | 1003:1003:  | SUBSCRIBE_SET       | 1        | 1        | 2                                                           | t        | f        |          |          |
         1 | 5000000087 | 2010-10-29 06:32:57.676823 | 1003:1003:  | ENABLE_SUBSCRIPTION | 1        | 1        | 2                                                           | t        | f        |          |          |
         1 | 5000000088 | 2010-10-29 13:58:27.852526 | 1085:1085:  | SYNC                |          |          |                                                             |          |          |          |          |
         1 | 5000000089 | 2010-10-29 13:58:37.868268 | 1087:1087:  | SYNC                |          |          |                                                             |          |          |          |          |
...................
         1 | 5000000526 | 2010-10-29 15:11:31.902246 | 1987:1987:  | SYNC                |          |          |                                                             |          |          |          |          |
         1 | 5000000527 | 2010-10-29 15:11:41.905465 | 1989:1989:  | SYNC                |          |          |                                                             |          |          |          |          |
         1 | 5000000528 | 2010-10-29 15:11:51.912475 | 1991:1991:  | SYNC                |          |          |                                                             |          |          |          |          |
         1 | 5000000529 | 2010-10-29 15:12:01.913758 | 1993:1993:  | SYNC                |          |          |                                                             |          |          |          |          |
(448 rows)
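
The comparison between the two nodes can also be done mechanically. The sketch
below is plain Python with the rows from both outputs copied in by hand (the
elided rows of the normal node are left out, and the helper names
`config_types` and `has_sync_after_config` are made up for illustration). It
drops the SYNC events, compares the remaining event types in order, and checks
whether any SYNC follows the last configuration event:

```python
# (ev_seqno, ev_type) pairs copied from the two psql outputs above.
error_node = [
    (5000000079, "SYNC"),
    (5000000080, "STORE_NODE"),
    (5000000081, "ENABLE_NODE"),
    (5000000082, "STORE_PATH"),
    (5000000083, "SUBSCRIBE_SET"),
    (5000000084, "ENABLE_SUBSCRIPTION"),
]
normal_node = [
    (5000000082, "SYNC"),
    (5000000083, "STORE_NODE"),
    (5000000084, "ENABLE_NODE"),
    (5000000085, "STORE_PATH"),
    (5000000086, "SUBSCRIBE_SET"),
    (5000000087, "ENABLE_SUBSCRIPTION"),
    (5000000088, "SYNC"),
    (5000000089, "SYNC"),
    # rows elided in the output above are not reproduced here
    (5000000526, "SYNC"),
    (5000000527, "SYNC"),
    (5000000528, "SYNC"),
    (5000000529, "SYNC"),
]


def config_types(rows):
    """Event types in ev_seqno order, with SYNC events dropped."""
    return [ev_type for _, ev_type in sorted(rows) if ev_type != "SYNC"]


def has_sync_after_config(rows):
    """True if at least one SYNC has a higher ev_seqno than the last non-SYNC event."""
    last_config = max(seqno for seqno, ev_type in rows if ev_type != "SYNC")
    return any(ev_type == "SYNC" and seqno > last_config for seqno, ev_type in rows)


print(config_types(error_node) == config_types(normal_node))
print(has_sync_after_config(error_node), has_sync_after_config(normal_node))
```

On this data the configuration events match on both nodes, and only the normal
node has SYNC events after ENABLE_SUBSCRIPTION.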


On Fri, Oct 29, 2010 at 10:39 PM, Steve Singer <ssinger at ca.afilias.info> wrote:

> On 10-10-29 10:24 AM, Jason Chen wrote:
>
>> Hi Steve,
>>
>>  >If you turn up the logging level to debug, what does slon report in
>> the log in cases where it doesn't work? It must be logging some stuff
>> even if it then stops/hangs.
>>
>> I have attached the normal configuration and error configuration master
>> node log. Can you take a look and see if there is anything abnormal?
>>
>>  >slon uses the connection settings from the service config to connect
>> to its 'local' database that it generates the syncs on.  It sounds like
>> slon isn't able to talk to this database.
>>
>> Is there any log we can check for that, since PostgreSQL on the master
>> node is running fine?
>>
>> Please also let me know if you need any other information.
>>
>> Thanks,
>> Jason
>>
>>
> The error log stops after a few minutes.  Does slon just stop writing to
> the file?
>
> As you can see in the normal log file,
> the localListener thread sees the STORE NODE event and then the STORE PATH
> event a bit further down.
>
> 2010-10-29 15:01:38 UTCDEBUG2 localListenThread: Received event 1,5000000179 STORE_NODE
> 2010-10-29 15:01:38 UTCCONFIG storeNode: no_id=2 no_comment='slave'
>
>
> Once it processes that store node event it then starts the remoteWorker and
> remoteListener threads that actually do stuff
>
> In the error case, if you query the sl_event table on the master you should
> see the STORE NODE and STORE PATH events (this is worth confirming). The
> question is why the slon is not getting to these events. What event numbers
> are assigned to them?
>
> In the error case it got as far as 1,5000000081. In the normal case the
> STORE NODE event was 1,5000000180, so if the time from when you started slon
> until when you ran the storeNode is similar in both cases, then you still
> have a fair number of events left to process (though processing 100 SYNC
> events when no tables are replicated should be pretty fast).
>
>