[Slony1-general] really inefficient queries with slony 1.1 and postgres 7.4

Mon Aug 29 18:09:17 PDT 2005

I have currently some performance problems, where slony generates a
query that postgres 7.4.5 does not optimise well

the query generated by slony when it replicates 4 sets from node 1 loocs
like this (really it is in "declare cursor")

select     log_origin, log_xid, log_tableid,     log_actionseq,
log_cmdtype, log_cmddata 
  from "_bbb_cluster".sl_log_1 
 where log_origin = 1 
and (  
( log_tableid in (3,9008,9007,9005,9004,2002,2001) 
and (log_xid < '1312955843' 
     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044'''))

and (log_xid >= '1312942023' 
     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242'''))
) 
or 
( log_tableid in
(1002,1003,1013,1041,1037,1028,1026,1023,1031,1012,1048,1050,1046,1021,1019,1024,1027,1029,1025,1035,1011,1009,1010,1016,1032,1018,1030,1138) 

and (log_xid < '1312955843' 
     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) 

and (log_xid >= '1312942023' 
     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
)
or 
( log_tableid in
(7001,7008,7007,7004,7039,7002,7030,7018,7038,7003,7005,7006,7009,7011,7012,7013,7016,7021,
7022,7025,7026,7027,7028,7029,7031,7033,7034,7035,7036,7037,1075,9009,9011,9012,9013,9014,
9015,9016,9017,1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,
1066,1067,1068,1070,1071,1072,1073,1074,1076,1077,1078) 
and (log_xid < '1312955843' 
     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) 
and (log_xid >= '1312942023' 
     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
)
or 
( log_tableid in
(7051,7050,7052,7053,7054,7055,7056,7057,7058,7059,7060,7061,7062,7063,7064,7065,7066,7067,7068,7069,7070,7071,7072,7073,7074) 
and (log_xid < '1312955843' 
     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) 
and (log_xid >= '1312942023' 
     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) ) 
) 
order by log_actionseq;

And this is done by postgres a seqscan, fro which the first FETCH runs
about 60 sec :(

as everything other than "log_tableid in (n,n,n) is the same in all OR'd
parts, the query  could actually be done as

select     log_origin, log_xid, log_tableid,     log_actionseq,
log_cmdtype, log_cmddata 
  from "_bbb_cluster".sl_log_1 
 where log_origin = 1 
   and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot
(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
   and (log_xid < '1312955843'      and "_bbb_cluster".xxid_lt_snapshot
(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044'''))
   and log_tableid in
(3,9008,9007,9005,9004,2002,2001,1002,1003,1013,1041,1037,1028,1026,1023,1031,1012,1048,1050,1046,
1021,1019,1024,1027,1029,1025,1035,1011,1009,1010,1016,1032,1018,1030,1138,7001,7008,7007,7004,7039,
7002,7030,7018,7038,7003,7005,7006,7009,7011,7012,7013,7016,7021,7022,7025,7026,7027,7028,7029,7031,
7033,7034,7035,7036,7037,1075,9009,9011,9012,9013,9014,9015,9016,9017,1051,1052,1053,1054,1055,1056,
1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,1070,1071,1072,1073,1074,1076,1077,1078,
7051,7050,7052,7053,7054,7055,7056,7057,7058,7059,7060,7061,7062,7063,7064,7065,7066,7067,7068,7069,
7070,7071,7072,7073,7074)
order by log_actionseq;

which is done as index scan and runs in 0.4 sec (as measured by explain
analyse)

So I'd propose that if there are several table sets subscribed from the
same master, their id lists should be merged before generating the query
instead of generating a similar query for each set and then OR'ing
these.

-- 
Hannu Krosing <hannu at skype.net>