[Slony1-general] data copy for set 1 failed 3 times

Thu Nov 29 13:31:26 PST 2012

On Thu, Nov 29, 2012 at 2:27 PM, Tory M Blue <tmblue at gmail.com> wrote:
>
>
> On Thu, Nov 29, 2012 at 10:15 AM, Steve Singer <ssinger at ca.afilias.info>
> wrote:
>>
>> On 12-11-29 11:53 AM, Tory M Blue wrote:
>>>
>>>
>>>
>>>
>>> On Thu, Nov 29, 2012 at 12:57 AM, Glyn Astill <glynastill at yahoo.co.uk> wrote:
>>>
>>>     Hi Tory
>>>
>>>      > From: Tory M Blue <tmblue at gmail.com>
>>>
>>>      >To: slony1-general <slony1-general at lists.slony.info>
>>>      >Sent: Wednesday, 28 November 2012, 18:35
>>>      >Subject: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds
>>>      >
>>>      >
>>>      >Greetings
>>>      >
>>>      >I've just brought up a replication node across the state. We have
>>>     gigabit circuits, but traffic still goes over the internet through a
>>>     VPN tunnel, so there is some latency.
>>>      >
>>>      >I'm getting these errors and can't get the initial replication to
>>>     finish
>>>      >
>>>      >2012-11-27 16:45:17 PST WARN   remoteWorkerThread_1: data copy for set 1 failed 3 times - sleep 60 seconds
>>>      >
>>>
>>>     Is that the only line you get in your logs? If so, use a higher
>>>     setting for log_level (like log_level=4).
>>>
>>>     Also is there anything in the postgresql log to indicate the problem?
>>>
>>>
>>> I get the following:
>>>
>>> 2012-11-29 08:34:38 PST CONFIG remoteWorkerThread_1: 3858.988 seconds to copy table "cls"."listings"
>>> 2012-11-29 08:34:38 PST CONFIG remoteWorkerThread_1: copy table "cls"."customers"
>>> 2012-11-29 08:34:38 PST CONFIG remoteWorkerThread_1: Begin COPY of table "cls"."customers"
>>> 2012-11-29 08:34:38 PST ERROR  remoteWorkerThread_1: "select "_admissioncls".copyFields(8);"
>>>
>>
>> What is the structure of the table with table_id=8 (i.e. set add table(id=8,
>> fully qualified name=?????))?
>>
>> If you manually run
>> select "_admissioncls".copyFields(8);
>> from psql, what does it come back with?
>>
>>
> So I ran it again with debug = 4: same failure, same spot. It shouldn't be
> dying due to the logswitch, but I guess it could be?
>
> It failed again, this time with heavier debugging (level 4), but the log
> isn't showing me much:
> 2012-11-29 12:22:12 PST CONFIG remoteWorkerThread_1: Begin COPY of table "cls"."customers"
> 2012-11-29 12:22:12 PST ERROR  remoteWorkerThread_1: "select "_admissioncls".copyFields(8);"
> 2012-11-29 12:22:12 PST WARN   remoteWorkerThread_1: data copy for set 1 failed 1 times - sleep 15 seconds
> Followed some time later by this:
> 2012-11-29 12:22:28 PST DEBUG2 remoteWorkerThread_2: forward confirm 3,5001168772 received by 4
> 2012-11-29 12:22:28 PST INFO   copy_set 1 - omit=f - bool=0
> 2012-11-29 12:22:28 PST INFO   omit is FALSE
> And it starts all over again.
>
> Postgres logs:
>
> 2012-11-29 12:19:40 PST    HINT:  Consider increasing the configuration parameter "checkpoint_segments".
> 2012-11-29 12:22:13 PST admissionclsdb postgres [local] NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
> 2012-11-29 12:22:13 PST admissionclsdb postgres [local] CONTEXT:  SQL statement "SELECT "_admissioncls".logswitch_start()"
>     PL/pgSQL function "cleanupevent" line 96 at PERFORM
>
> There's nothing in /var/log/messages or dmesg, so it doesn't appear to be a
> system issue. Either Slony's configuration (the sync rate?) or something
> else is putting the kibosh on this.

Is there anything useful in the postgresql logs from the same time?
Like maybe lost connections or crashing backends?
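One rough way to answer that without eyeballing both files side by side is a small script that, for each slon ERROR line, lists the PostgreSQL log lines logged within a few seconds of it. This is only a sketch under assumptions: the helper names `parse_ts` and `nearby_pg_lines` are hypothetical, it assumes both logs carry a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpts quoted above, it ignores the timezone abbreviation (so both logs must use the same zone), and continuation lines without a timestamp are skipped.

```python
import re
from datetime import datetime, timedelta

# Assumption: slon and PostgreSQL log lines both contain a timestamp like
# "2012-11-29 12:22:12"; the trailing zone (e.g. PST) is ignored.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def parse_ts(line):
    """Return the first timestamp found in a log line, or None if absent."""
    m = TS_RE.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")


def nearby_pg_lines(slon_lines, pg_lines, window_s=5):
    """Pair each slon ERROR line with the PostgreSQL log lines whose
    timestamp is within +/- window_s seconds of it (hypothetical helper)."""
    window = timedelta(seconds=window_s)
    # Pre-parse PostgreSQL timestamps; lines without one get None.
    pg_stamped = [(parse_ts(l), l) for l in pg_lines]
    results = []
    for line in slon_lines:
        if " ERROR " not in line:
            continue
        ts = parse_ts(line)
        if ts is None:
            continue
        matches = [l for t, l in pg_stamped
                   if t is not None and abs(t - ts) <= window]
        results.append((line, matches))
    return results


if __name__ == "__main__":
    slon = [
        '2012-11-29 12:22:12 PST CONFIG remoteWorkerThread_1: Begin COPY of table "cls"."customers"',
        '2012-11-29 12:22:12 PST ERROR  remoteWorkerThread_1: "select "_admissioncls".copyFields(8);"',
    ]
    pg = [
        '2012-11-29 12:19:40 PST    HINT:  Consider increasing the configuration parameter "checkpoint_segments".',
        '2012-11-29 12:22:13 PST admissionclsdb postgres [local] NOTICE:  Slony-I: Logswitch to sl_log_2 initiated',
    ]
    for err, related in nearby_pg_lines(slon, pg):
        print(err)
        for r in related:
            print("    ", r)
```

With the excerpts from this thread, only the logswitch NOTICE falls inside the 5-second window around the copyFields(8) error; widening `window_s` is a cheap way to look for dropped connections or backend crashes logged a little earlier or later.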