[Slony1-general] Slave can't catch up, postgres error 'stack depth limit exceeded'

Fri Jan 27 09:33:40 PST 2012

On Fri, Jan 27, 2012 at 12:05 PM, Brian Fehrle
<brianf at consistentstate.com> wrote:
> On 01/26/2012 06:29 PM, Steve Singer wrote:
>> On Tue, 24 Jan 2012, Cédric Villemain wrote:
>>
>>> Le 22 janvier 2012 17:16, Steve Singer <steve at ssinger.info> a écrit :
>>>> On Sun, 22 Jan 2012, Brian Fehrle wrote:
>>
>>>
>>> but ... isn't it slony which should not use more than
>>> default_stack_size? Can't there be an underlying bug?
>>
>> If slony is leaking memory or if the compression routine for the
>> snapshot IDs isn't working properly then it is a bug.  I haven't seen
>> any evidence of this (nor have I analyzed the entire contents of his
>> sl_event to figure out if that is the case).
>>
>> If a single SYNC group really had a lot of active xids such that it
>> exceeded the amount of text that can be passed to a function with the
>> default stack size then this isn't a bug.
>>
>> In 2.2, on a failed SYNC, slon should now dynamically shrink the SYNC
>> group size until it works (or reaches a size of 1).
>>
> Very cool.
>
> Unfortunately I've now removed my logs due to space issues. But one
> thing that concerns me is that I had two slave nodes that were both
> behind the master at the same SYNC event. One node was on postgres 9.1.2
> (which is the one that I had this issue with), and the other on 8.4.9.
> When I brought the daemon for 8.4.9 online, it synced up and did not
> have this issue, while the 9.1.2 node still did. Both 8.4.9 and 9.1.2 instances
> had the same value for max_stack_depth.

A different thing troubles me...

The point of the "compress" step is to collapse runs of sequential
transaction ID values into ranges, and that depends on the values being
returned in sequential order so that the runs can be recognized.

It seems as though the query is no longer returning the values in
sequential order. If so, the runs would go unrecognized and the
compression would have little effect, which is a problem.
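
To illustrate why ordering matters, here is a minimal sketch of run compression. It is not Slony's actual code; the function name compress_runs and the sample transaction IDs are made up for illustration. It assumes the compressor only looks at each value relative to the one before it, which is how I read the description above.

```python
def compress_runs(xids):
    """Collapse consecutive integers into (first, last) ranges.

    Values are processed in the order given; a value extends the
    current range only if it is exactly one more than the range's end.
    """
    runs = []
    for xid in xids:
        if runs and xid == runs[-1][1] + 1:
            # Extend the current run by one.
            runs[-1] = (runs[-1][0], xid)
        else:
            # Start a new run of length one.
            runs.append((xid, xid))
    return runs


# Hypothetical transaction IDs, for illustration only.
ordered = [100, 101, 102, 103, 200, 201]
shuffled = [102, 100, 201, 103, 101, 200]

print(len(compress_runs(ordered)))           # two ranges
print(len(compress_runs(shuffled)))          # six ranges, nothing compressed
print(len(compress_runs(sorted(shuffled))))  # two ranges again after sorting
```

With the same six values, the ordered input collapses to two ranges while the shuffled input stays at six single-value ranges, so the text passed along would grow with the number of active xids instead of the number of runs.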