[Slony1-general] really inefficient queries with slony 1.1 and postgres 7.4

Tue Aug 30 10:08:19 PDT 2005

On E, 2005-08-29 at 18:04 -0400, Christopher Browne wrote:
> Hannu Krosing <hannu at skype.net> writes:
> > I have currently some performance problems, where slony generates a
> > query that postgres 7.4.5 does not optimise well
> >
> > the query generated by slony when it replicates 4 sets from node 1 loocs
> > like this (really it is in "declare cursor")
> >
> > select     log_origin, log_xid, log_tableid,     log_actionseq,
> > log_cmdtype, log_cmddata 
> >   from "_bbb_cluster".sl_log_1 
> >  where log_origin = 1 
> > and (  
> ... much elided ...
> > ) 
> > order by log_actionseq;
> >
> > And this is done by postgres a seqscan, fro which the first FETCH runs
> > about 60 sec :(
> 
> That is indeed unfortunate.  And this is the sort of case that is most
> likely to occur when processing the first few SYNCs after subscribing
> a big new set.

Actually I have loads that cause this thing to happen when some replica
falls back for more than a few minutes, usually caused by pg_listener
bloat due to long-running analysis queries;

> > as everything other than "log_tableid in (n,n,n) is the same in all
> > OR'd parts, the query could actually be done as
> >
> > select     log_origin, log_xid, log_tableid,     log_actionseq,
> > log_cmdtype, log_cmddata 
> >   from "_bbb_cluster".sl_log_1 
> >  where log_origin = 1 
> >    and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
> > (log_xid,
> > '1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
> >    and (log_xid < '1312955843'      and "_bbb_cluster".xxid_lt_snapshot
> > (log_xid,
> > '1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044'''))
> >    and log_tableid in
> > (3,9008,9007,9005,9004,2002,2001,1002,1003,1013,1041,1037,1028,1026,1023,1031,1012,1048,1050,1046,
> > 1021,1019,1024,1027,1029,1025,1035,1011,1009,1010,1016,1032,1018,1030,1138,7001,7008,7007,7004,7039,
> > 7002,7030,7018,7038,7003,7005,7006,7009,7011,7012,7013,7016,7021,7022,7025,7026,7027,7028,7029,7031,
> > 7033,7034,7035,7036,7037,1075,9009,9011,9012,9013,9014,9015,9016,9017,1051,1052,1053,1054,1055,1056,
> > 1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,1070,1071,1072,1073,1074,1076,1077,1078,
> > 7051,7050,7052,7053,7054,7055,7056,7057,7058,7059,7060,7061,7062,7063,7064,7065,7066,7067,7068,7069,
> > 7070,7071,7072,7073,7074)
> > order by log_actionseq;
> >
> > which is done as index scan and runs in 0.4 sec (as measured by
> > explain analyse)
> >
> > So I'd propose that if there are several table sets subscribed from
> > the same master, their id lists should be merged before generating
> > the query instead of generating a similar query for each set and
> > then OR'ing these.
> 
> Regrettably, C doesn't have any "similarity" operator to detect the
> relevant similarities.

>From my casual observations it seems that if sets have the same origin,
they have (almost?) always the exactly _same_ conditions after the 

log_tableid in (...)

part, something silimar to:

and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
 (log_xid,
 '1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
    and (log_xid < '1312955843'      and "_bbb_cluster".xxid_lt_snapshot
 (log_xid,
 '1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044'''))

> It seems to me that we need to do something fairly fundamentally
> better than this.
> 
> There was a patch I sent out a while back which would compress
> sequences of log_actionseq items into a series of "between" clauses.

this was for a different case, namely "long id lists after subscribe."

> This big query points onwards to the notion of perhaps constructing
> some temp tables...
> 
> 1.  There would be one consisting of a set identifier and then
> log_tableid values.
> 
> That could turn the clause like "and log_tableid in
> (3,9008,castofthousands)" into ...
> 
> "and log_tableid = t1.table_id "

I don't think the main thing in my case is long id lists, but rather an
OR of 4 identical (except in id lists) queries which cause PG optimiser
not to notice that it can lift the identical "and log_xid >= XXX" part
out of all the or'ed subclauses, get plain

"log_origin=N and log_xid >= M and ..." and 

use index on 

(log_origin, log_xid, log_actionseq)

to do the query in an effective way.

> 2.  I think we'd stow set/minlogxid values in another temp table, and
> thus have...
> 
> "and log_xid >= t2.minlogxid "
> 
> 3.  Likewise, stow set/maxlogxid in a third temp table, and have
> 
> "and log_xid < t3.maxlogxid "
> 
> This should be a simple matter of programming, right?  :-)

sure ;)

> If we construct a temp table for each SYNC, that probably leads to a
> certain amount of bloating of pg_attribute/pg_class; they'd have to
> get vacuumed once in a while too :-).
> 
> It is a SMOP which I seriously doubt I can get to until on the order
> of November.  Feel free to volunteer...

I'd try to do it the way I proposed in my previous letter, and hopefully
clarified a little in this letter.

Just need a confirmation by somebody, that slony 1.1 does indeed get the
same sync intervals for sets subscribed from the same master (at least
most of the time).

And maybe a hint on where in the code is this query (which gets the
actual changes from sl_log_1) constructed.

again :

this is what slony generates for 3 sets (1,2,3), (11,12,13), (21,22,23):
------------------------------------------------------------------------
select log_origin, log_xid, log_tableid,     
       log_actionseq, log_cmdtype, log_cmddata 
  from "_bbb_cluster".sl_log_1 
 where log_origin = 1 
   and (  
        ( log_tableid in (1,2,3) 
        and (log_xid < 'XMIN' 
        and "_bbb_cluster".xxid_lt_snapshot(log_xid, 'AAA'))
        and (log_xid >= 'XMAX' 
        and "_bbb_cluster".xxid_ge_snapshot(log_xid,'BBB'))
        ) 
        or 
        ( log_tableid in (11,12,13) 
        and (log_xid < 'XMIN'
        and "_bbb_cluster".xxid_lt_snapshot(log_xid, 'AAA'))
        and (log_xid >= 'XMAX'
        and "_bbb_cluster".xxid_ge_snapshot(log_xid,'BBB'))
        )
        or 
        ( log_tableid in (21,22,23) 
        and (log_xid < '1312955843' 'XMIN'
        and "_bbb_cluster".xxid_lt_snapshot(log_xid, 'AAA'))
        and (log_xid >= '1312942023' 'XMAX'
        and "_bbb_cluster".xxid_ge_snapshot(log_xid,'BBB'))
        )
       ) 
order by log_actionseq;
------------------------------------------------------------------------

as everything in each of the or'ed subclauses is the same excep the
"log_tableid in (...)" list, the rest can be lifted out and the query
would look like this:
------------------------------------------------------------------------
select log_origin, log_xid, log_tableid,     
       log_actionseq, log_cmdtype, log_cmddata 
  from "_bbb_cluster".sl_log_1 
 where log_origin = 1 
        and (log_xid < 'XMIN' 
        and "_bbb_cluster".xxid_lt_snapshot(log_xid, 'AAA'))
        and (log_xid >= 'XMAX' 
        and "_bbb_cluster".xxid_ge_snapshot(log_xid,'BBB'))
   and (  
        ( log_tableid in (1,2,3) 
        ) 
        or 
        ( log_tableid in (11,12,13) 
        )
        or 
        ( log_tableid in (21,22,23) 
        )
       ) 
order by log_actionseq;
------------------------------------------------------------------------

Actually I hope I don't have to do it this way, by I just check that
XMIN, AAA, XMAX, BBB are the same and will merge the tableid lists and
generate the query once to look like this
-----------------------------------------------------------------------
select log_origin, log_xid, log_tableid,     
       log_actionseq, log_cmdtype, log_cmddata 
  from "_bbb_cluster".sl_log_1 
 where log_origin = 1 
        and  log_tableid in (1,2,3,11,12,13,21,22,23) 
        and (log_xid < '1312955843' 
        and "_bbb_cluster".xxid_lt_snapshot(log_xid, 'AAA'))
        and (log_xid >= '1312942023' 
        and "_bbb_cluster".xxid_ge_snapshot(log_xid,'BBB'))
order by log_actionseq;
-----------------------------------------------------------------------

And if it is the case (as I hope/suspect) that XMIN, AAA, XMAX, BBB are
always the same for same origin, I can even skip the "detecting" part :)

-- 
Hannu Krosing <hannu at skype.net>