[Slony1-general] very slow create set

Wed Nov 11 10:05:01 PST 2009

On Wed, Nov 11, 2009 at 9:54 AM, Christopher Browne
<cbbrowne at ca.afilias.info> wrote:
> Scott Marlowe <scott.marlowe at gmail.com> writes:
>> On Mon, Nov 9, 2009 at 7:52 AM, Jan Wieck <JanWieck at yahoo.com> wrote:
>>> On 11/7/2009 11:23 PM, Scott Marlowe wrote:
>>>>
>>>> On Sat, Nov 7, 2009 at 9:02 PM, Vick Khera <vivek at khera.org> wrote:
>>>>>
>>>>> On Fri, Nov 6, 2009 at 4:07 PM, Scott Marlowe <scott.marlowe at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> I have about 1500 objects in my main db, and running create set takes
>>>>>> about 5 seconds per table or sequence.  Is this typical?  Any changes
>>>>>> I can make in terms of cost of any slony functions or creating indexes
>>>>>> to make it faster?
>>>>>>
>>>>>
>>>>> The most complicated one I have has about 250 tables and about 50
>>>>> sequences.  I don't recall it taking that long to create a set.
>>>>> Perhaps the time is proportional to the number of objects?  Which
>>>>> versions of everything are you using?  I run Pg 8.3 and slony1 1.2.x.
>>>>
>>>> Yep, pg 8.3.7 and slony 1.2.14 at the moment.  If I do this on a
>>>> machine with JUST the objects I'm replicating, I can get 10 seconds
>>>> per table and 5 or so per sequence.
>>>>
>>>> On the DB with 29000 other objects not replicated, it gets up to 30
>>>> seconds per table and 20 seconds per sequence.  Which means a 5 hour
>>>> time for create set.  And if autovac kicks in it blocks create set.
>>>> so I had to turn that off.
>>>
>>> This sounds like some of the system catalog operations done by Slony are
>>> using seq scans. I've never tested Slony with a database having that many
>>> objects.
>>
>> Could be.  But whatever it is it must be doing it over and over (i.e.
>> a seq scan feeding a nested loop ARG!!!)  because I can seq scan
>> things like pg_class in way under a second.  But if I had to do it 37k
>> times or so it would then be as slow as I'm seeing.  Got some good
>> troubleshooting tips I'm gonna look at.  Thanks to all who posted.
>> I'll keep you informed.
>
> A browse of the code in SetAddTable_int() doesn't show off anything that
> *ought* to be notably slow.
>
> There's a subquery that is a bit pointless:
>
>   ... where attrelid = (select oid from pg_catalog.pg_class where oid = v_tab_reloid)
>
> That subquery could be replaced by v_tab_reloid.  But it shouldn't
> worsen things because there's an index on pg_class(oid).
>
> The more major set of queries are in alterTableForReplication().  I'd be
> suspicious of it.
>
> You could exercise alterTableForReplication() by trying that over and
> over again in conjunction with alterTableRestore()...
>
> Pick a table, say the one with ID 5...
>
> begin;
> explain analyze select "_MyNameSpace".alterTableRestore(5);
> explain analyze select "_MyNameSpace".alterTableForReplication(5);
> commit;
>
> Throwing some notify requests into alterTableForReplication(), and
> logging to the millisecond, should help track down a query that is
> slower than it ought to be.

Thanks I'll look into that.  I'm just getting a test environment setup
I can play with that's not production for this type of thing.  I'm
guessing that somewhere in there it's calling some pg system function
that gets super slow with large numbers of objects.