[Slony1-general] Revision to DDL handling

Wed Nov 9 15:14:25 PST 2011

I have gotten around to poking at the fairly old bug #137
<http://www.slony.info/bugzilla/show_bug.cgi?id=137>, and the fix is
heading towards a quite different implementation of DDL handling in
Slony-I version 2.2.

In 2.0 and 2.1, the process of applying DDL looked like this:

 - Start running a DDL event
    - Take the list of SQL statements out of the event, splitting them
      into individual queries
    - For each query
      - Run the query

That leaves it vague what happens if update activity is being logged
concurrently in sl_log_{1,2}.  (I'm not sure whether the updates are
applied *before* or *after* the DDL; neither is necessarily the right
answer!)

In 2.2, we add a new table, sl_log_script, which captures the SQL
statements with exactly the same transaction and ordering information
that controls the order in which the log data in sl_log_{1,2} is
applied.
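
To make the ordering idea concrete, here is a minimal Python sketch (not
Slony-I code).  It assumes that rows from sl_log_{1,2} and rows from
sl_log_script each carry a transaction ID and a sequence number drawn
from one shared counter.  The LogRow class, its field names, and
apply_order() are made up for illustration, and real log application
also has to respect snapshot visibility, which this ignores.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class LogRow:
    txid: int        # hypothetical transaction identifier
    actionseq: int   # hypothetical sequence number shared by both tables
    source: str      # "data" for sl_log_{1,2}, "script" for sl_log_script
    payload: str     # the row change or the DDL statement


def apply_order(data_rows, script_rows):
    """Interleave data changes and DDL using one shared ordering key."""
    def key(row):
        return row.actionseq

    # Sort each stream, then merge them into a single ordered stream.
    return list(merge(sorted(data_rows, key=key),
                      sorted(script_rows, key=key),
                      key=key))
```

With data rows at sequence 1 and 3 and a DDL row at sequence 2, the DDL
lands between the two updates, rather than at some unspecified point
before or after them.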

This changes the semantics a bit, but, we think, in a way that is
pretty unambiguously better.

It actually makes it easier to do DDL handling; we have a wrapper
function, ddlCapture(), which drops the DDL into the new table.  A
clever administrator might use exactly the same function to run their
own favorite bit of DDL and have it run exactly as it would have been
had they used slonik EXECUTE SCRIPT.

There is a change to EXECUTE SCRIPT; it becomes rather more meaningful
to use the EXECUTE ONLY ON option, and to have that be a list of
nodes, rather than just a single node.

And here's where a question opens...

The way I have initially implemented the new form of EXECUTE SCRIPT is
to request a specific list of nodes.

Thus:

    EXECUTE SCRIPT (set id=1, filename='/tmp/my-ddl-script.sql',
        event node=2, execute only on = '2,3,4');

My colleagues have suggested that perhaps we'd like to have the option
of a script running only on the subscribers to a single set.

I imagine this might be handled via a syntax like:
    EXECUTE SCRIPT (set id=1, filename='/tmp/my-ddl-script.sql',
        event node=2, execute only on = set nodes);

But I also imagine that this may be overkill.  Creating a syntax
specifically for the case of running just on a certain set's nodes may
be adding a complication that no one really cares to use.

Does anyone feel strongly about this?  If not, then my inclination is
to have just two behaviours:
  a) Run the script on ALL nodes, as a default behaviour
  b) Run on a specified list of nodes, e.g. EXECUTE ONLY ON = '2,3,4'

If anyone badly wants an option c), I'd appreciate hearing so.
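
For what it's worth, the two behaviours a) and b) amount to something
like the following Python sketch.  This is only an illustration of the
rule, not how slonik implements it; resolve_targets() and its arguments
are hypothetical names, and rejecting unknown node IDs is my assumption
about what a sensible implementation would do.

```python
def resolve_targets(all_nodes, execute_only_on=None):
    """Return the node IDs a script should run on.

    all_nodes       -- set of node IDs known to the cluster (assumed input)
    execute_only_on -- None, or a comma-separated string such as '2,3,4'
    """
    if execute_only_on is None:
        # Behaviour a): no list given, so run on every node.
        return sorted(all_nodes)

    # Behaviour b): run only on the explicitly listed nodes.
    requested = [int(tok) for tok in execute_only_on.split(",") if tok.strip()]
    unknown = [n for n in requested if n not in all_nodes]
    if unknown:
        raise ValueError(f"unknown node IDs: {unknown}")
    return requested
```

So resolve_targets({1, 2, 3, 4}) gives [1, 2, 3, 4], and
resolve_targets({1, 2, 3, 4}, '2,3,4') gives [2, 3, 4].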

Though I'm ready to argue "but if you don't know what your set of
nodes is, I think you're in deep, deep trouble..."