Chris Browne cbbrowne at lists.slony.info
Thu Apr 30 09:06:12 PDT 2009
Update of /home/cvsd/slony1/slony1-engine/doc/adminguide
In directory main.slony.info:/tmp/cvs-serv3474

Modified Files:
      Tag: REL_1_2_STABLE
	addthings.sgml adminscripts.sgml bestpractices.sgml 
	cluster.sgml concepts.sgml ddlchanges.sgml defineset.sgml 
	dropthings.sgml failover.sgml faq.sgml filelist.sgml 
	firstdb.sgml help.sgml installation.sgml intro.sgml legal.sgml 
	listenpaths.sgml locking.sgml loganalysis.sgml 
	logshipping.sgml maintenance.sgml monitoring.sgml 
	partitioning.sgml prerequisites.sgml releasechecklist.sgml 
	reshape.sgml slon.sgml slonconf.sgml slonik_ref.sgml 
	slony.sgml slonyupgrade.sgml startslons.sgml 
	subscribenodes.sgml supportedplatforms.sgml testbed.sgml 
	usingslonik.sgml versionupgrade.sgml 
Log Message:
Draw in doc updates from 2.0 branch



Index: legal.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/legal.sgml,v
retrieving revision 1.11
retrieving revision 1.11.2.1
diff -C2 -d -r1.11 -r1.11.2.1
*** legal.sgml	2 Aug 2006 18:34:58 -0000	1.11
--- legal.sgml	30 Apr 2009 16:06:10 -0000	1.11.2.1
***************
*** 2,6 ****
  
  <copyright>
!  <year>2004-2006</year>
   <holder>The PostgreSQL Global Development Group</holder>
  </copyright>
--- 2,6 ----
  
  <copyright>
!  <year>2004-2007</year>
   <holder>The PostgreSQL Global Development Group</holder>
  </copyright>
***************
*** 10,14 ****
  
   <para>
!   <productname>PostgreSQL</productname> is Copyright &copy; 2004-2006
    by the PostgreSQL Global Development Group and is distributed under
    the terms of the license of the University of California below.
--- 10,14 ----
  
   <para>
!   <productname>PostgreSQL</productname> is Copyright &copy; 2004-2007
    by the PostgreSQL Global Development Group and is distributed under
    the terms of the license of the University of California below.

Index: locking.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/locking.sgml,v
retrieving revision 1.10
retrieving revision 1.10.2.1
diff -C2 -d -r1.10 -r1.10.2.1
*** locking.sgml	2 Aug 2006 18:34:59 -0000	1.10
--- locking.sgml	30 Apr 2009 16:06:10 -0000	1.10.2.1
***************
*** 14,18 ****
  can access <quote>old tuples.</quote> Most of the time, this allows
  the gentle user of &postgres; to not need to worry very much about
! locks. </para>
  
  <para> Unfortunately, there are several sorts of &slony1; events that
--- 14,21 ----
  can access <quote>old tuples.</quote> Most of the time, this allows
  the gentle user of &postgres; to not need to worry very much about
! locks.  &slony1; configuration events normally grab locks on an
! internal table, <envar>sl_config_lock</envar>, which should not be
! visible to applications unless they are performing actions on &slony1;
! components.  </para>
  
  <para> Unfortunately, there are several sorts of &slony1; events that

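For reference, the sl_config_lock contention described above can be checked
from any session; a minimal sketch, assuming a cluster schema named
_mycluster and placeholder connection settings:

psql -d mydb -c "
SELECT l.pid, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_class c     ON c.oid = l.relation
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = '_mycluster'
   AND c.relname = 'sl_config_lock';"

Rows with granted = f indicate sessions waiting behind a Slony-I
configuration event.
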
Index: bestpractices.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/bestpractices.sgml,v
retrieving revision 1.24.2.1
retrieving revision 1.24.2.2
diff -C2 -d -r1.24.2.1 -r1.24.2.2
*** bestpractices.sgml	16 Mar 2007 19:01:26 -0000	1.24.2.1
--- bestpractices.sgml	30 Apr 2009 16:06:10 -0000	1.24.2.2
***************
*** 104,110 ****
  <listitem><para> The system will periodically rotate (using
  <command>TRUNCATE</command> to clean out the old table) between the
! two log tables, <xref linkend="table.sl-log-1"> and <xref
! linkend="table.sl-log-2">, preventing unbounded growth of dead space
! there.  </para></listitem>
  </itemizedlist>
  
--- 104,109 ----
  <listitem><para> The system will periodically rotate (using
  <command>TRUNCATE</command> to clean out the old table) between the
! two log tables, &sllog1; and &sllog2;, preventing unbounded growth of
! dead space there.  </para></listitem>
  </itemizedlist>
  
***************
*** 115,118 ****
--- 114,122 ----
  should be planned for ahead of time.  </para>
  
+ <para> Most pointedly, any node that is expected to be a failover
+ target must have its subscription(s) set up with the option
+ <command>FORWARD = YES</command>.  Otherwise, that node is not a
+ candidate for being promoted to origin node. </para>
+ 
  <para> This may simply involve thinking about what the priority lists
  should be of what should fail to what, as opposed to trying to
***************
*** 144,147 ****
--- 148,160 ----
  </listitem>
  
+ <listitem><para> If you are using the autovacuum process in recent
+ versions of &postgres;, you may wish to leave &slony1; tables out, as
+ &slony1; is a bit more intelligent about vacuuming when it is expected
+ to be conspicuously useful (<emphasis>e.g.</emphasis> - immediately
+ after purging old data) to do so than autovacuum can be. </para>
+ 
+ <para> See <xref linkend="maintenance-autovac"> for more
+ details. </para> </listitem>
+ 
  <listitem> <para> Running all of the &lslon; daemons on a central
  server for each network has proven preferable. </para>
***************
*** 164,174 ****
  for managing so that the connection to that node is a
  <quote>local</quote> one.  Do <emphasis>not</emphasis> run such links
! across a WAN. </para>
  
! <para> A WAN outage can leave database connections
! <quote>zombied</quote>, and typical TCP/IP behaviour <link
! linkend="multipleslonconnections"> will allow those connections to
! persist, preventing a slon restart for around two hours. </link>
! </para>
  
  <para> It is not difficult to remedy this; you need only <command>kill
--- 177,189 ----
  for managing so that the connection to that node is a
  <quote>local</quote> one.  Do <emphasis>not</emphasis> run such links
! across a WAN.  Thus, if you have nodes in London and nodes in New
! York, the &lslon;s managing London nodes should run in London, and the
! &lslon;s managing New York nodes should run in New York.</para>
  
! <para> A WAN outage (or flakiness of the WAN in general) can leave
! database connections <quote>zombied</quote>, and typical TCP/IP
! behaviour <link linkend="multipleslonconnections"> will allow those
! connections to persist, preventing a slon restart for around two
! hours. </link> </para>
  
  <para> It is not difficult to remedy this; you need only <command>kill
***************
*** 193,200 ****
  scratch.</para>
  
! <para> The exception, where it is undesirable to restart a &lslon;, is
! where a <command>COPY_SET</command> is running on a large replication
! set, such that stopping the &lslon; may discard several hours worth of
! load work. </para>
  
  <para> In early versions of &slony1;, it was frequently the case that
--- 208,215 ----
  scratch.</para>
  
! <para> The exception scenario where it is undesirable to restart a
! &lslon; is where a <command>COPY_SET</command> is running on a large
! replication set, such that stopping the &lslon; may discard several
! hours worth of load work. </para>
  
  <para> In early versions of &slony1;, it was frequently the case that
***************
*** 224,228 ****
  possibility that updates to this table can fail due to the introduced
  unique index, which means that &slony1; has introduced a new failure
! mode for your application.</para>
  </listitem>
  
--- 239,249 ----
  possibility that updates to this table can fail due to the introduced
  unique index, which means that &slony1; has introduced a new failure
! mode for your application.  
! </para>
! 
! <warning><para> In version 2 of &slony1;, <xref
! linkend="stmttableaddkey"> is no longer supported.  You
! <emphasis>must</emphasis> have either a true primary key or a
! candidate primary key.  </para></warning>
  </listitem>
  
***************
*** 281,286 ****
  lock on them; doing so via <command>execute script</command> requires
  that &slony1; take out an exclusive lock on <emphasis>all</emphasis>
! replicated tables.  This can prove quite inconvenient when
! applications are running; you run into deadlocks and such. </para>
  
  <para> One particularly dogmatic position that some hold is that
--- 302,310 ----
  lock on them; doing so via <command>execute script</command> requires
  that &slony1; take out an exclusive lock on <emphasis>all</emphasis>
! replicated tables.  This can prove quite inconvenient if applications
! are running when the DDL is applied; &slony1; is asking for those exclusive
! table locks, whilst, simultaneously, some application connections are
! gradually relinquishing locks, whilst others are backing up behind the
! &slony1; locks.  </para>
  
  <para> One particularly dogmatic position that some hold is that
***************
*** 428,433 ****
  </listitem>
  
! <listitem><para> Use <filename>test_slony_state.pl</filename> to look
! for configuration problems.</para>
  
  <para>This is a Perl script which connects to a &slony1; node and then
--- 452,457 ----
  </listitem>
  
! <listitem><para> Run &lteststate; frequently to discover configuration
! problems as early as possible.</para>
  
  <para>This is a Perl script which connects to a &slony1; node and then
***************
*** 443,446 ****
--- 467,476 ----
  tool can run through many of the possible problems for you. </para>
  
+ <para> It will also notice a number of sorts of situations where
+ something has broken.  Not only should it be run when problems have
+ been noticed - it should be run frequently (<emphasis>e.g.</emphasis>
+ - hourly, or thereabouts) as a general purpose <quote>health
+ check</quote> for each &slony1; cluster. </para>
+ 
  </listitem>
  
***************
*** 491,494 ****
--- 521,533 ----
  user out of the new subscriber because:
  </para>
+ 
+ <para> It is also a very good idea to change &lslon; configuration for
+ <xref linkend="slon-config-sync-interval"> on the origin node to
+ reduce how many <command>SYNC</command> events are generated.  If the
+ subscription takes 8 hours, there is little sense in there being 28800
+ <command>SYNC</command>s waiting to be applied.  Running a
+ <command>SYNC</command> every minute or so is likely to make catching
+ up easier.</para>
+ 
  </listitem>
  </itemizedlist>
***************
*** 575,580 ****
  
  <para> There will correspondingly be an <emphasis>enormous</emphasis>
! growth of <xref linkend="table.sl-log-1"> and <xref
! linkend="table.sl-seqlog">.  Unfortunately, once the
  <command>COPY_SET</command> completes, users have found that the
  queries against these tables wind up reverting to <command>Seq
--- 614,618 ----
  
  <para> There will correspondingly be an <emphasis>enormous</emphasis>
! growth of &sllog1;, &sllog2;, and &slseqlog;.  Unfortunately, once the
  <command>COPY_SET</command> completes, users have found that the
  queries against these tables wind up reverting to <command>Seq
***************
*** 599,605 ****
  the exact form that the index setup should take. </para> 
  
! <para> In 1.2, there is a process that runs automatically to add
! partial indexes by origin node number, which should be the optimal
! form for such an index to take.  </para>
  </listitem>
  
--- 637,643 ----
  the exact form that the index setup should take. </para> 
  
! <para> In 1.2 and later versions, there is a process that runs
! automatically to add partial indexes by origin node number, which
! should be the optimal form for such an index to take.  </para>
  </listitem>
  


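Following up the new note above that failover targets must subscribe with
FORWARD = YES: a minimal slonik sketch, assuming a cluster named something,
nodes 1 and 2, and placeholder conninfo strings.

# forward = yes keeps log data on node 2 so it can later feed other
# subscribers or be promoted to origin during failover
slonik <<_EOF_
cluster name = something;
node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
_EOF_
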
Index: slonik_ref.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonik_ref.sgml,v
retrieving revision 1.61.2.13
retrieving revision 1.61.2.14
diff -C2 -d -r1.61.2.13 -r1.61.2.14
*** slonik_ref.sgml	19 Jun 2008 20:34:00 -0000	1.61.2.13
--- slonik_ref.sgml	30 Apr 2009 16:06:10 -0000	1.61.2.14
***************
*** 49,55 ****
        The slonik command language is format free. Commands begin with
        keywords and are terminated with a semicolon. Most commands have
!       a list of parameters, some of which have default values and are
!       therefore optional. The parameters of commands are enclosed in
!       parentheses. Each option consists of one or more keywords,
        followed by an equal sign, followed by a value. Multiple options
        inside the parentheses are separated by commas. All keywords are
--- 49,55 ----
        The slonik command language is format free. Commands begin with
        keywords and are terminated with a semicolon. Most commands have
[...1308 lines suppressed...]
+     <para>
+      This completes the work done by <xref
+      linkend="stmtcloneprepare">, establishing confirmation data for
+      the new <quote>clone</quote> based on the status found for the
+      <quote>provider</quote> node.
+     </para>
+    </Refsect1>
+    <Refsect1><Title>Example</Title>
+     <Programlisting>
+      clone finish (id = 33, provider = 22);
+     </Programlisting>
+    </Refsect1>
+    <refsect1> <title> Version Information </title>
+     <para> This command was introduced in &slony1; 2.0. </para>
+    </refsect1>
+   </Refentry>
+ 
+ 
   </reference>
  <!-- Keep this comment at the end of the file

Index: subscribenodes.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/subscribenodes.sgml,v
retrieving revision 1.16
retrieving revision 1.16.2.1
diff -C2 -d -r1.16 -r1.16.2.1
*** subscribenodes.sgml	2 Aug 2006 18:34:59 -0000	1.16
--- subscribenodes.sgml	30 Apr 2009 16:06:10 -0000	1.16.2.1
***************
*** 94,98 ****
  
  <screen>
! 2005-04-13 07:11:28 PDT ERROR remoteWorkerThread_11: "declare LOG
  cursor for select log_origin, log_xid, log_tableid, log_actionseq,
  log_cmdtype, log_cmddata from "_T1".sl_log_1 where log_origin = 11 and
--- 94,98 ----
  
  <screen>
! 2007-04-13 07:11:28 PDT ERROR remoteWorkerThread_11: "declare LOG
  cursor for select log_origin, log_xid, log_tableid, log_actionseq,
  log_cmdtype, log_cmddata from "_T1".sl_log_1 where log_origin = 11 and

Index: loganalysis.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/loganalysis.sgml,v
retrieving revision 1.4.2.5
retrieving revision 1.4.2.6
diff -C2 -d -r1.4.2.5 -r1.4.2.6
*** loganalysis.sgml	22 Oct 2007 20:47:48 -0000	1.4.2.5
--- loganalysis.sgml	30 Apr 2009 16:06:10 -0000	1.4.2.6
***************
*** 26,33 ****
  </screen></para></sect2>
  
  <sect2><title>DEBUG Notices</title>
  
! <para>Debug notices are always prefaced by the name of the thread that
! the notice originates from. You will see messages from the following
  threads:
  
--- 26,48 ----
  </screen></para></sect2>
  
+ <sect2><title>INFO notices</title>
+ 
+ <para> Events that seem likely to be of general interest are recorded
+ at the INFO level, and, just as with CONFIG
+ notices, are always listed. </para>
+ 
+ </sect2>
+ 
  <sect2><title>DEBUG Notices</title>
  
! <para>Debug notices are of less interest, and will quite likely only
! need to be shown if you are running into some problem with &slony1;.</para>
! 
! </sect2>
! 
! <sect2><title>Thread name </title>
! 
! <para> Notices are always prefaced by the name of the thread from
! which the notice originates. You will see messages from the following
  threads:
  
***************
*** 60,68 ****
  </para>
  
! <para> How much information they display is controlled by
! the <envar>log_level</envar> &lslon; parameter;
! ERROR/WARN/CONFIG/INFO messages will always be displayed, while
! choosing increasing values from 1 to 4 will lead to additional DEBUG
! level messages being displayed. </para>
  </sect2>
  
--- 75,83 ----
  </para>
  
! <para> How much information they display is controlled by the
! <envar>log_level</envar> &lslon; parameter; ERROR/WARN/CONFIG/INFO
! messages will always be displayed, while choosing increasing values
! from 1 to 4 will lead to additional DEBUG level messages being
! displayed. </para>
  </sect2>
  
***************
*** 177,185 ****
  <para> This section lists numerous of the error messages found in
  &slony1;, along with a brief explanation of implications.  It is a
! fairly well comprehensive list, leaving out mostly some of
! the <command>DEBUG4</command> messages that are generally
  uninteresting.</para>
  
! <sect3 id="logshiplog"><title> Log Messages Associated with Log Shipping </title>
  
  <para> Most of these represent errors that come up if
--- 192,201 ----
  <para> This section lists numerous of the error messages found in
  &slony1;, along with a brief explanation of implications.  It is a
! fairly comprehensive list, only leaving out some of the
! <command>DEBUG4</command> messages that are almost always
  uninteresting.</para>
  
! <sect3 id="logshiplog"><title> Log Messages Associated with Log
! Shipping </title>
  
  <para> Most of these represent errors that come up if
***************
*** 1030,1034 ****
  <listitem><para><command>WARN: remoteWorkerThread_%d: event %d ignored - origin inactive</command></para> 
  
! <para> This shouldn't occur now (2006) as we don't support the notion
  of deactivating a node... </para>
  </listitem>
--- 1046,1050 ----
  <listitem><para><command>WARN: remoteWorkerThread_%d: event %d ignored - origin inactive</command></para> 
  
! <para> This shouldn't occur now (2007) as we don't support the notion
  of deactivating a node... </para>
  </listitem>
***************
*** 1044,1047 ****
--- 1060,1072 ----
  of <command>STORE_NODE</command> requests not
  propagating... </para> </listitem>
+ 
+ <listitem><para><command>insert or update on table "sl_path" violates
+ foreign key constraint "pa_client-no_id-ref".  DETAIL: Key
+ (pa_client)=(2) is not present on table "sl_node"</command></para>
+ 
+ <para> This happens if you try to do <xref linkend="stmtsubscribeset">
+ when the node is unaware of a would-be new node; probably a sign of
+ <command>STORE_NODE</command> and <command>STORE_PATH</command>
+ requests not propagating... </para> </listitem>
  </itemizedlist>
  </sect3>

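Since every log line carries a severity and the originating thread name, as
described above, plain grep is usually enough to pull out the interesting
lines; a sketch, with the log file path as a placeholder:

# anything at ERROR or WARN severity
grep -E 'ERROR|WARN' /var/log/slony1/node2-slon.log

# recent activity from the worker handling events from node 1
grep 'remoteWorkerThread_1:' /var/log/slony1/node2-slon.log | tail -n 50
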
Index: slonconf.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonconf.sgml,v
retrieving revision 1.14.2.4
retrieving revision 1.14.2.5
diff -C2 -d -r1.14.2.4 -r1.14.2.5
*** slonconf.sgml	7 May 2008 19:26:33 -0000	1.14.2.4
--- slonconf.sgml	30 Apr 2009 16:06:10 -0000	1.14.2.5
***************
*** 87,96 ****
        </indexterm>
        <listitem>
!         <para>Debug log level (higher value ==> more output).  Range: [0,4], default 2</para>
  
  	<para> There are <link linkend="nineloglevels">nine log
! 	message types</link>; using this option, some or all of
! 	the <quote>debugging</quote> levels may be left out of the
! 	slon logs. </para>
  
        </listitem>
--- 87,101 ----
        </indexterm>
        <listitem>
!         <para>Debug log level (higher value ==> more output).  Range: [0,4], default 0</para>
  
  	<para> There are <link linkend="nineloglevels">nine log
! 	message types</link>; using this option, some or all of the
! 	<quote>debugging</quote> levels may be left out of the slon
! 	logs.  In &slony1; version 2, a lot of log message levels have
! 	been revised in an attempt to ensure the <quote>interesting
! 	stuff</quote> comes in at CONFIG/INFO levels, so that you
! 	could run at level 0, omitting all of the <quote>DEBUG</quote>
! 	messages, and still have meaningful contents in the
! 	logs. </para>
  
        </listitem>
***************
*** 118,127 ****
            appear in each log line entry.
          </para>
        </listitem>
      </varlistentry>
  
- 
- 
- 
      <varlistentry id="slon-config-logging-log-timestamp-format" xreflabel="slon_conf_log_timestamp_format">
        <term><varname>log_timestamp_format</varname> (<type>string</type>)</term>
--- 123,135 ----
            appear in each log line entry.
          </para>
+ 
+         <para> Note that if <envar>syslog</envar> usage is configured,
+         then this is ignored; it is assumed that
+         <application>syslog</application> will be supplying
+         timestamps, and timestamps are therefore suppressed.
+         </para>
        </listitem>
      </varlistentry>
  
      <varlistentry id="slon-config-logging-log-timestamp-format" xreflabel="slon_conf_log_timestamp_format">
        <term><varname>log_timestamp_format</varname> (<type>string</type>)</term>
***************
*** 267,270 ****
--- 275,285 ----
            Range: [10-60000], default 100
          </para>
+ 
+         <para> This parameter is primarily of concern on nodes that
+           originate replication sets.  On a non-origin node, there
+           will never be update activity that would induce a SYNC;
+           instead, the timeout value described below will induce a
+           SYNC every so often <emphasis>despite absence of changes to
+           replicate.</emphasis> </para>
        </listitem>
      </varlistentry>
***************
*** 293,296 ****
--- 308,346 ----
            default 1000
          </para>
+ 
+         <para> This parameter is likely to be primarily of concern on
+           nodes that originate replication sets, though it does affect
+           how often events are generated on other nodes.</para>
+ 
+ 	<para>
+           On a non-origin node, there never is activity to cause a
+           SYNC to get generated; as a result, there will be a SYNC
+           generated every <envar>sync_interval_timeout</envar>
+           milliseconds.  There are no subscribers looking for those
+           SYNCs, so these events do not lead to any replication
+           activity.  They will, however, clutter sl_event up a little,
+           so it would be undesirable for this timeout value to be set
+           too terribly low.  120000ms represents 2 minutes, which is
+           not a terrible value.
+         </para>
+ 
+ 	<para> The two values function together in varying ways: </para>
+ 
+ 	<para> On an origin node, <envar>sync_interval</envar> is
+ 	the <emphasis>minimum</emphasis> time period that will be
+ 	covered by a SYNC, and during periods of heavy application
+ 	activity, it may be that a SYNC is being generated
+ 	every <envar>sync_interval</envar> milliseconds. </para>
+ 
+ 	<para> On that same origin node, there may be quiet intervals,
+ 	when no replicatable changes are being submitted.  A SYNC will
+ 	be induced, anyways,
+ 	every <envar>sync_interval_timeout</envar>
+ 	milliseconds. </para>
+ 
+ 	<para> On a subscriber node that does not originate any sets,
+ 	only the <quote>timeout-induced</quote> SYNCs will
+ 	occur.  </para>
+ 
        </listitem>
      </varlistentry>
***************
*** 302,317 ****
        </indexterm>
        <listitem>
          <para>
!           Maximum number of <command>SYNC</command> events to group
!           together when/if a subscriber falls behind.
!           <command>SYNC</command>s are batched only if there are that
!           many available and if they are contiguous. Every other event
!           type in between leads to a smaller batch.  And if there is
!           only one <command>SYNC</command> available, even
!           <option>-g60</option> will apply just that one. As soon as a
!           subscriber catches up, it will apply every single
!           <command>SYNC</command> by itself.  Range: [0,10000], default:
!           20
          </para>
        </listitem>
      </varlistentry>
--- 352,372 ----
        </indexterm>
        <listitem>
+ 
          <para>
!           Maximum number of <command>SYNC</command> events that a
!           subscriber node will group together when/if a subscriber
!           falls behind.  <command>SYNC</command>s are batched only if
!           there are that many available and if they are
!           contiguous.  Every other event type in between leads to a
!           smaller batch.  And if there is only
!           one <command>SYNC</command> available, even though you used
!           <option>-g600</option>, the &lslon; will apply just the one
!           that is available.  As soon as a subscriber catches up, it
!           will tend to apply each
!           <command>SYNC</command> by itself, as a singleton, unless
!           processing should fall behind for some reason.  Range:
!           [0,10000], default: 20
          </para>
+ 
        </listitem>
      </varlistentry>
***************
*** 331,334 ****
--- 386,420 ----
        </listitem>
      </varlistentry>
+ 
+ 
+     <varlistentry id="slon-config-cleanup-interval" xreflabel="slon_config_cleanup_interval">
+       <term><varname>cleanup_interval</varname> (<type>interval</type>)</term>
+       <indexterm>
+         <primary><varname>cleanup_interval</varname> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+         <para>
+           Controls how quickly old events are trimmed out.  That
+           subsequently controls when the data in the log tables,
+           <envar>sl_log_1</envar> and <envar>sl_log_2</envar>, get
+           trimmed out.  Default: '10 minutes'.
+         </para>
+       </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="slon-config-cleanup-deletelogs" xreflabel="slon_conf_cleanup_deletelogs">
+       <term><varname>cleanup_deletelogs</varname> (<type>boolean</type>)</term>
+       <indexterm>
+         <primary><varname>cleanup_deletelogs</varname> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+         <para>
+           Controls whether or not we use DELETE to trim old data from the log tables,
+           <envar>sl_log_1</envar> and <envar>sl_log_2</envar>.
+           Default: false
+         </para>
+       </listitem>
+     </varlistentry>
+ 
      <varlistentry id="slon-config-desired-sync-time" xreflabel="desired_sync_time">
        <term><varname>desired_sync_time</varname>  (<type>integer</type>)</term>
***************
*** 443,447 ****
        </indexterm>
        <listitem>
!         <para>How long, in milliseconds should the remote listener wait before treating the event selection criteria as having timed out?
            Range: [30-30000], default 300ms
          </para>
--- 529,533 ----
        </indexterm>
        <listitem>
!         <para>How long, in milliseconds, should the remote listener wait before treating the event selection criteria as having timed out?
            Range: [30-30000], default 300ms
          </para>

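Pulling the parameters above together, here is a sketch of a slon runtime
configuration fragment; the file name is a placeholder and the values are
the documented defaults, except sync_interval_timeout, which is bumped
toward the two-minute figure suggested above for quiet, non-origin nodes:

cat > /etc/slony1/node2-slon.conf <<'EOF'
# With the revised log levels, 0 still shows ERROR/WARN/CONFIG/INFO.
log_level = 0
# Minimum period covered by a SYNC on an origin node (milliseconds).
sync_interval = 100
# A SYNC is induced after this long even with no changes to replicate.
sync_interval_timeout = 120000
# Maximum number of SYNCs grouped together when a subscriber lags.
sync_group_maxsize = 20
# How quickly old events (and hence sl_log_1/sl_log_2 data) get trimmed.
cleanup_interval = '10 minutes'
# Whether DELETE (rather than the log-switching TRUNCATE) trims the logs.
cleanup_deletelogs = false
EOF

The file is then handed to slon via its -f config-file option when the
daemon is started.
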
Index: filelist.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/filelist.sgml,v
retrieving revision 1.18.2.1
retrieving revision 1.18.2.2
diff -C2 -d -r1.18.2.1 -r1.18.2.2
*** filelist.sgml	5 Sep 2007 21:36:31 -0000	1.18.2.1
--- filelist.sgml	30 Apr 2009 16:06:10 -0000	1.18.2.2
***************
*** 45,49 ****
--- 45,51 ----
  <!entity slonyupgrade       SYSTEM "slonyupgrade.sgml">
  <!entity releasechecklist   SYSTEM "releasechecklist.sgml">
+ <!entity raceconditions     SYSTEM "raceconditions.sgml">
  <!entity partitioning       SYSTEM "partitioning.sgml">
+ <!entity triggers           SYSTEM "triggers.sgml">
  
  <!-- back matter -->

Index: reshape.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/reshape.sgml,v
retrieving revision 1.20.2.1
retrieving revision 1.20.2.2
diff -C2 -d -r1.20.2.1 -r1.20.2.2
*** reshape.sgml	22 Oct 2007 20:50:55 -0000	1.20.2.1
--- reshape.sgml	30 Apr 2009 16:06:10 -0000	1.20.2.2
***************
*** 40,43 ****
--- 40,48 ----
  about <xref linkend="stmtstorelisten">.</para></listitem>
  
+ <listitem><para> After performing the configuration change, you
+ should, as <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para> </listitem>
+ 
  </itemizedlist>
  </para>

Index: monitoring.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/monitoring.sgml,v
retrieving revision 1.29.2.8
retrieving revision 1.29.2.9
diff -C2 -d -r1.29.2.8 -r1.29.2.9
*** monitoring.sgml	11 Jun 2007 16:01:33 -0000	1.29.2.8
--- monitoring.sgml	30 Apr 2009 16:06:10 -0000	1.29.2.9
***************
*** 5,8 ****
--- 5,168 ----
  <indexterm><primary>monitoring &slony1;</primary></indexterm>
  
+ <para> As a prelude to the discussion, it is worth pointing out that
+ since the bulk of &slony1; functionality is implemented via running
+ database functions and SQL queries against tables within a &slony1;
+ schema, most of the things that one might want to monitor about
+ replication may be found by querying tables in the schema created for
+ the cluster in each database in the cluster. </para>
+ 
+ <para> Here are some of the tables that contain information likely to
+ be particularly interesting from a monitoring and diagnostic
+ perspective.</para>
+ 
+ <glosslist>
+ <glossentry><glossterm><envar>sl_status</envar></glossterm>
+ 
+ <glossdef><para>This view is the first, most obviously useful thing to
+ look at from a monitoring perspective.  It looks at the local node's
+ events, and checks to see how quickly they are being confirmed on
+ other nodes.</para>
+ 
+ <para> The view is primarily useful to run against an origin
+ (<quote>master</quote>) node, as it is only there where the events
+ generated are generally expected to require interesting work to be
+ done.  The events generated on non-origin nodes tend to
+ be <command>SYNC</command> events that require no replication work be
+ done, and that are nearly no-ops, as a
+ result. </para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slconfirm;</glossterm>
+ 
+ <glossdef><para>Contains confirmations of replication events; this may be used to infer which events have, and <emphasis>have not</emphasis> been processed.</para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slevent;</glossterm>
+ <glossdef><para>Contains information about the replication events processed on the local node.  </para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>
+ &sllog1;
+ and
+ &sllog2;
+ </glossterm>
+ 
+ <glossdef><para>These tables contain replicable data.  On an origin node, this is the <quote>queue</quote> of data that has not necessarily been replicated everywhere.  By examining the table, you may examine the details of what data is replicable. </para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slnode;</glossterm>
+ <glossdef><para>The list of nodes in the cluster.</para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slpath;</glossterm>
+ <glossdef><para>This table holds connection information indicating how &lslon; processes are to connect to remote nodes, whether to access events, or to request replication data. </para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&sllisten;</glossterm>
+ 
+ <glossdef><para>This configuration table indicates how nodes listen
+ for events coming from other nodes.  Usually this is automatically
+ populated; generally you can detect configuration problems by this
+ table being <quote>underpopulated.</quote> </para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slregistry;</glossterm>
+ 
+ <glossdef><para>A configuration table that may be used to store
+ miscellaneous runtime data.  Presently used only to manage switching
+ between the two log tables.  </para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slseqlog;</glossterm>
+ 
+ <glossdef><para>Contains the <quote>last value</quote> of replicated
+ sequences.</para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slset;</glossterm>
+ 
+ <glossdef><para>Contains definition information for replication sets,
+ which is the mechanism used to group together related replicable
+ tables and sequences.</para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slsetsync;</glossterm>
+ <glossdef><para>Contains information about the state of synchronization of each replication set, including transaction snapshot data.</para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&slsubscribe;</glossterm>
+ <glossdef><para>Indicates what subscriptions are in effect for each replication set.</para></glossdef></glossentry>
+ 
+ <glossentry><glossterm>&sltable;</glossterm>
+ <glossdef><para>Contains the list of tables being replicated.</para></glossdef></glossentry>
+ 
+ </glosslist>
+ 
+ <sect2 id="testslonystate"> <title> test_slony_state</title>
+ 
+ <indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
+ 
+ <para> This invaluable script does various sorts of analysis of the
+ state of a &slony1; cluster.  &slony1; <xref linkend="bestpractices">
+ recommend running these scripts frequently (hourly seems suitable) to
+ find problems as early as possible.  </para>
+ 
+ <para> You specify arguments including <option>database</option>,
+ <option>host</option>, <option>user</option>,
+ <option>cluster</option>, <option>password</option>, and
+ <option>port</option> to connect to any of the nodes on a cluster.
+ You also specify a <option>mailprog</option> command (which should be
+ a program equivalent to <productname>Unix</productname>
+ <application>mailx</application>) and a recipient of email. </para>
+ 
+ <para> You may alternatively specify database connection parameters
+ via the environment variables used by
+ <application>libpq</application>, <emphasis>e.g.</emphasis> - using
+ <envar>PGPORT</envar>, <envar>PGDATABASE</envar>,
+ <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para>
+ 
+ <para> The script then rummages through <xref linkend="table.sl-path">
+ to find all of the nodes in the cluster, and the DSNs to allow it to,
+ in turn, connect to each of them.</para>
+ 
+ <para> For each node, the script examines the state of things,
+ including such things as:
+ 
+ <itemizedlist>
+ <listitem><para> Checking <xref linkend="table.sl-listen"> for some
+ <quote>analytically determinable</quote> problems.  It lists paths
+ that are not covered.</para></listitem>
+ 
+ <listitem><para> Providing a summary of events by origin node</para>
+ 
+ <para> If a node hasn't submitted any events in a while, that likely
+ suggests a problem.</para></listitem>
+ 
+ <listitem><para> Summarizes the <quote>aging</quote> of table <xref
+ linkend="table.sl-confirm"> </para>
+ 
+ <para> If one or another of the nodes in the cluster hasn't reported
+ back recently, that tends to lead to cleanups of tables like &sllog1;,
+ &sllog2; and &slseqlog; not taking place.</para></listitem>
+ 
+ <listitem><para> Summarizes what transactions have been running for a
+ long time</para>
+ 
+ <para> This only works properly if the statistics collector is
+ configured to collect command strings, as controlled by the option
+ <option> stats_command_string = true </option> in <filename>
+ postgresql.conf </filename>.</para>
+ 
+ <para> If you have broken applications that hold connections open,
+ this will find them.</para>
+ 
+ <para> If you have broken applications that hold connections open,
+ that has several unsalutory effects as <link
+ linkend="longtxnsareevil"> described in the
+ FAQ</link>.</para></listitem>
+ 
+ </itemizedlist></para>
+ 
+ <para> The script does some diagnosis work based on parameters in the
+ script; if you don't like the values, pick your favorites!</para>
+ 
+ <note><para> Note that there are two versions, one using the
+ <quote>classic</quote> <filename>Pg.pm</filename> Perl module for
+ accessing &postgres; databases, and one, with <filename>dbi</filename>
+ in its name, that uses the newer Perl <function> DBI</function>
+ interface.  It is likely going to be easier to find packaging for
+ <function>DBI</function>. </para> </note>
+ 
+ </sect2>
+ 
  <sect2> <title> &nagios; Replication Checks </title>
  
***************
*** 95,166 ****
  Options[db_replication_lagtime]: gauge,nopercent,growright
  </programlisting>
- </sect2>
- 
- <sect2 id="testslonystate"> <title> test_slony_state</title>
- 
- <indexterm><primary>script test_slony_state to test replication state</primary></indexterm>
- 
- <para> This script does various sorts of analysis of the state of a
- &slony1; cluster.</para>
- 
- <para> You specify arguments including <option>database</option>,
- <option>host</option>, <option>user</option>,
- <option>cluster</option>, <option>password</option>, and
- <option>port</option> to connect to any of the nodes on a cluster.
- You also specify a <option>mailprog</option> command (which should be
- a program equivalent to <productname>Unix</productname>
- <application>mailx</application>) and a recipient of email. </para>
- 
- <para> You may alternatively specify database connection parameters
- via the environment variables used by
- <application>libpq</application>, <emphasis>e.g.</emphasis> - using
- <envar>PGPORT</envar>, <envar>PGDATABASE</envar>,
- <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para>
- 
- <para> The script then rummages through <xref linkend="table.sl-path">
- to find all of the nodes in the cluster, and the DSNs to allow it to,
- in turn, connect to each of them.</para>
- 
- <para> For each node, the script examines the state of things,
- including such things as:
  
! <itemizedlist>
! <listitem><para> Checking <xref linkend="table.sl-listen"> for some
! <quote>analytically determinable</quote> problems.  It lists paths
! that are not covered.</para></listitem>
! 
! <listitem><para> Providing a summary of events by origin node</para>
! 
! <para> If a node hasn't submitted any events in a while, that likely
! suggests a problem.</para></listitem>
! 
! <listitem><para> Summarizes the <quote>aging</quote> of table <xref
! linkend="table.sl-confirm"> </para>
  
! <para> If one or another of the nodes in the cluster hasn't reported
! back recently, that tends to lead to cleanups of tables like <xref
! linkend="table.sl-log-1"> and <xref linkend="table.sl-seqlog"> not
! taking place.</para></listitem>
  
! <listitem><para> Summarizes what transactions have been running for a
! long time</para>
  
! <para> This only works properly if the statistics collector is
! configured to collect command strings, as controlled by the option
! <option> stats_command_string = true </option> in <filename>
! postgresql.conf </filename>.</para>
  
! <para> If you have broken applications that hold connections open,
! this will find them.</para>
  
! <para> If you have broken applications that hold connections open,
! that has several unsalutory effects as <link
! linkend="longtxnsareevil"> described in the
! FAQ</link>.</para></listitem>
  
- </itemizedlist></para>
  
! <para> The script does some diagnosis work based on parameters in the
! script; if you don't like the values, pick your favorites!</para>
  
  </sect2>
--- 255,292 ----
  Options[db_replication_lagtime]: gauge,nopercent,growright
  </programlisting>
  
! <para> Alternatively, Ismail Yenigul points out how he managed to
! monitor slony using <application>MRTG</application> without installing
! <application>SNMPD</application>.</para>
  
! <para> Here is the mrtg configuration</para>
  
! <programlisting>
! Target[db_replication_lagtime]:`/bin/snmpReplicationLagTime.sh 2`
! MaxBytes[db_replication_lagtime]: 400000000
! Title[db_replication_lagtime]: db: replication lag time
! PageTop[db_replication_lagtime]: &lt;H1&gt;db: replication lag time&lt;/H1&gt;
! Options[db_replication_lagtime]: gauge,nopercent,growright
! </programlisting>
  
! <para> and here is the modified version of the script</para>
  
! <programlisting>
! # cat /bin/snmpReplicationLagTime.sh
! #!/bin/bash
  
! output=`/usr/bin/psql -U slony -h 192.168.1.1 -d endersysecm -qAt -c
! "select cast(extract(epoch from st_lag_time) as int8) FROM _mycluster.sl_status WHERE st_received = $1"`
! echo $output
! echo $output
! echo 
! echo
! # end of script#
! </programlisting>
  
  
! <note><para> MRTG expects four lines from the script, and since there
! are only two lines provided, the output must be padded to four
! lines. </para> </note>
  
  </sect2>
***************
*** 194,198 ****
  <filename>tools</filename>, may be used to generate a cluster summary
  compatible with the popular <ulink url="http://www.mediawiki.org/">
! MediaWiki </ulink> software. </para>
  
  <para> The gentle user might use the script as follows: </para>
--- 320,330 ----
  <filename>tools</filename>, may be used to generate a cluster summary
  compatible with the popular <ulink url="http://www.mediawiki.org/">
! MediaWiki </ulink> software.  Note that the
! <option>--categories</option> permits the user to specify a set of
! (comma-delimited) categories with which to associate the output.  If
! you have a series of &slony1; clusters, passing in the option
! <option>--categories=slony1</option> leads to the MediaWiki instance
! generating a category page listing all &slony1; clusters so
! categorized on the wiki.  </para>
  
  <para> The gentle user might use the script as follows: </para>
***************
*** 201,205 ****
  ~/logtail.en>         mvs login -d mywiki.example.info -u "Chris Browne" -p `cat ~/.wikipass` -w wiki/index.php                     
  Doing login with host: logtail and lang: en
! ~/logtail.en> perl $SLONYHOME/tools/mkmediawiki.pl --host localhost --database slonyregress1 --cluster slony_regress1  > Slony_replication.wiki
  ~/logtail.en> mvs commit -m "More sophisticated generated Slony-I cluster docs" Slony_replication.wiki
  Doing commit Slony_replication.wiki with host: logtail and lang: en
--- 333,337 ----
  ~/logtail.en>         mvs login -d mywiki.example.info -u "Chris Browne" -p `cat ~/.wikipass` -w wiki/index.php                     
  Doing login with host: logtail and lang: en
! ~/logtail.en> perl $SLONYHOME/tools/mkmediawiki.pl --host localhost --database slonyregress1 --cluster slony_regress1 --categories=Slony-I  > Slony_replication.wiki
  ~/logtail.en> mvs commit -m "More sophisticated generated Slony-I cluster docs" Slony_replication.wiki
  Doing commit Slony_replication.wiki with host: logtail and lang: en
***************
*** 213,216 ****
--- 345,424 ----
  
  </sect2>
+ 
+ <sect2> <title> Analysis of a SYNC </title>
+ 
+ <para> The following is (as of 2.0) an extract from the &lslon; log for node
+ #2 in a run of <quote>test1</quote> from the <xref linkend="testbed">. </para>
+ 
+ <screen>
+ DEBUG2 remoteWorkerThread_1: SYNC 19 processing
+ INFO   about to monitor_subscriber_query - pulling big actionid list 134885072
+ INFO   remoteWorkerThread_1: syncing set 1 with 4 table(s) from provider 1
+ DEBUG2  ssy_action_list length: 0
+ DEBUG2 remoteWorkerThread_1: current local log_status is 0
+ DEBUG2 remoteWorkerThread_1_1: current remote log_status = 0
+ DEBUG1 remoteHelperThread_1_1: 0.028 seconds delay for first row
+ DEBUG1 remoteHelperThread_1_1: 0.978 seconds until close cursor
+ INFO   remoteHelperThread_1_1: inserts=144 updates=1084 deletes=0
+ INFO   remoteWorkerThread_1: sync_helper timing:  pqexec (s/count)- provider 0.063/6 - subscriber 0.000/6
+ INFO   remoteWorkerThread_1: sync_helper timing:  large tuples 0.315/288
+ DEBUG2 remoteWorkerThread_1: cleanup
+ INFO   remoteWorkerThread_1: SYNC 19 done in 1.272 seconds
+ INFO   remoteWorkerThread_1: SYNC 19 sync_event timing:  pqexec (s/count)- provider 0.001/1 - subscriber 0.004/1 - IUD 0.972/248
+ </screen>
+ 
+ <para> Here are some notes to interpret this output: </para>
+ 
+ <itemizedlist>
+ <listitem><para> Note the line that indicates <screen>inserts=144 updates=1084 deletes=0</screen> </para> 
+ <para> This indicates how many tuples were affected by this particular SYNC. </para></listitem>
+ <listitem><para> Note the line indicating <screen>0.028 seconds delay for first row</screen></para> 
+ 
+ <para> This indicates the time it took for the <screen>LOG
+ cursor</screen> to get to the point of processing the first row of
+ data.  Normally, this takes a long time if the SYNC is a large one,
+ and one requiring sorting of a sizable result set.</para></listitem>
+ 
+ <listitem><para> Note the line indicating <screen>0.978 seconds until
+ close cursor</screen></para> 
+ 
+ <para> This indicates how long processing took against the
+ provider.</para></listitem>
+ 
+ <listitem><para> sync_helper timing:  large tuples 0.315/288 </para> 
+ 
+ <para> This breaks off, as a separate item, the number of large tuples
+ (<emphasis>e.g.</emphasis> - where size exceeded the configuration
+ parameter <xref linkend="slon-config-max-rowsize">) and where the
+ tuples had to be processed individually. </para></listitem>
+ 
+ <listitem><para> <screen>SYNC 19 done in 1.272 seconds</screen></para> 
+ 
+ <para> This indicates that it took 1.272 seconds, in total, to process
+ this set of SYNCs. </para>
+ </listitem>
+ 
+ <listitem><para> <screen>SYNC 19 sync_event timing:  pqexec (s/count)- provider 0.001/1 - subscriber 0.004/1 - IUD 0.972/248</screen></para> 
+ 
+ <para> This records information about how many queries were issued
+ against providers and subscribers in function
+ <function>sync_event()</function>, and how long they took. </para>
+ 
+ <para> Note that 248 does not match against the numbers of inserts,
+ updates, and deletes, described earlier, as I/U/D requests are
+ clustered into groups of queries that are submitted via a single
+ <function>pqexec()</function> call on the
+ subscriber. </para></listitem>
+ 
+ <listitem><para> <screen>sync_helper timing:  pqexec (s/count)- provider 0.063/6 - subscriber 0.000/6</screen></para>
+ 
+ <para> This records information about how many queries were issued
+ against providers and subscribers in function
+ <function>sync_helper()</function>, and how long they took.
+ </para></listitem>
+ 
+ </itemizedlist>
+ 
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

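As a companion to the sl_status discussion above, a quick way to check
per-receiver lag from the origin node, following the same psql pattern as
the MRTG script (cluster name _mycluster and connection settings are
placeholders):

psql -h origin-host -U slony -d mydb -c "
SELECT st_received, st_lag_num_events, st_lag_time
  FROM _mycluster.sl_status
 ORDER BY st_received;"

A steadily growing st_lag_num_events for one receiver is usually the first
sign that a subscriber, or the path to it, is in trouble.
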
Index: usingslonik.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/usingslonik.sgml,v
retrieving revision 1.18.2.1
retrieving revision 1.18.2.2
diff -C2 -d -r1.18.2.1 -r1.18.2.2
*** usingslonik.sgml	11 Jun 2007 16:01:33 -0000	1.18.2.1
--- usingslonik.sgml	30 Apr 2009 16:06:10 -0000	1.18.2.2
***************
*** 112,123 ****
  
  	try {
- 		table add key (node id = 1, fully qualified name = 
-                                'public.history');
- 	}
- 	on error {
- 		exit 1;
- 	}
- 
- 	try {
  		create set (id = 1, origin = 1, comment = 
                              'Set 1 - pgbench tables');
--- 112,115 ----
***************
*** 133,137 ****
  		set add table (set id = 1, origin = 1,
  			id = 4, fully qualified name = 'public.history',
! 			key = serial, comment = 'Table accounts');
  	}
  	on error {
--- 125,129 ----
  		set add table (set id = 1, origin = 1,
  			id = 4, fully qualified name = 'public.history',
! 			comment = 'Table accounts');
  	}
  	on error {
***************
*** 173,182 ****
  $PREAMBLE
  try {
-     table add key (node id = $origin, fully qualified name = 
-                    'public.history');
- } on error {
-     exit 1;
- }
- try {
  	create set (id = $mainset, origin = $origin, 
                      comment = 'Set $mainset - pgbench tables');
--- 165,168 ----
***************
*** 192,196 ****
  	set add table (set id = $mainset, origin = $origin,
  		id = 4, fully qualified name = 'public.history',
! 		key = serial, comment = 'Table accounts');
  } on error {
  	exit 1;
--- 178,182 ----
  	set add table (set id = $mainset, origin = $origin,
  		id = 4, fully qualified name = 'public.history',
! 		comment = 'Table accounts');
  } on error {
  	exit 1;
***************
*** 222,231 ****
  $PREAMBLE
  try {
-     table add key (node id = $origin, fully qualified name = 
-                    'public.history');
- } on error {
-     exit 1;
- }
- try {
  	create set (id = $mainset, origin = $origin, 
                      comment = 'Set $mainset - pgbench tables');
--- 208,211 ----

Index: listenpaths.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/listenpaths.sgml,v
retrieving revision 1.19.2.1
retrieving revision 1.19.2.2
diff -C2 -d -r1.19.2.1 -r1.19.2.2
*** listenpaths.sgml	11 Jun 2007 16:01:33 -0000	1.19.2.1
--- listenpaths.sgml	30 Apr 2009 16:06:10 -0000	1.19.2.2
***************
*** 26,32 ****
  <emphasis>all</emphasis> nodes in order to be able to conclude that
  <command>sync</command>s have been received everywhere, and that,
! therefore, entries in <xref linkend="table.sl-log-1"> and <xref
! linkend="table.sl-log-2"> have been applied everywhere, and can
! therefore be purged.  this extra communication is needful so
  <productname>Slony-I</productname> is able to shift origins to other
  locations.</para>
--- 26,32 ----
  <emphasis>all</emphasis> nodes in order to be able to conclude that
  <command>sync</command>s have been received everywhere, and that,
! therefore, entries in &sllog1; and &sllog2; have been applied
! everywhere, and can therefore be purged.  this extra communication is
! needful so
  <productname>Slony-I</productname> is able to shift origins to other
  locations.</para>

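The purging rule described above can be checked directly, since it is the
oldest unconfirmed event per (origin, receiver) pair that holds back
trimming of sl_log_1 and sl_log_2.  A sketch, with _mycluster as a
placeholder schema name:

psql -d mydb -c "
SELECT con_origin, con_received, max(con_seqno) AS last_confirmed
  FROM _mycluster.sl_confirm
 GROUP BY con_origin, con_received
 ORDER BY con_origin, con_received;"

A pair whose last_confirmed value stops advancing points at the node (or
missing listen path) that is preventing cleanup.
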
Index: help.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/help.sgml,v
retrieving revision 1.18.2.2
retrieving revision 1.18.2.3
diff -C2 -d -r1.18.2.2 -r1.18.2.3
*** help.sgml	16 Mar 2007 19:01:26 -0000	1.18.2.2
--- help.sgml	30 Apr 2009 16:06:10 -0000	1.18.2.3
***************
*** 11,17 ****
  <listitem><para> Before submitting questions to any public forum as to
  why <quote>something mysterious</quote> has happened to your
! replication cluster, please run the <xref linkend="testslonystate">
! tool.  It may give some clues as to what is wrong, and the results are
! likely to be of some assistance in analyzing the problem. </para>
  </listitem>
  
--- 11,18 ----
  <listitem><para> Before submitting questions to any public forum as to
  why <quote>something mysterious</quote> has happened to your
! replication cluster, be sure to run the &lteststate; tool and be
! prepared to provide its output.  It may give some clues as to what is
! wrong, and the results are likely to be of some assistance in
! analyzing the problem. </para>
  </listitem>
  

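For the test_slony_state tool mentioned above, an invocation along the
following lines captures the output worth attaching to a help request.  The
option names follow the monitoring chapter's description of the script's
arguments; treat the exact spellings as an assumption and check the script
in the tools directory:

cd $SLONYHOME/tools
perl test_slony_state-dbi.pl --database=mydb --host=node1 \
     --user=slony --cluster=mycluster --port=5432 > state-report.txt
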
Index: concepts.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/concepts.sgml,v
retrieving revision 1.20
retrieving revision 1.20.2.1
diff -C2 -d -r1.20 -r1.20.2.1
*** concepts.sgml	2 Aug 2006 18:34:57 -0000	1.20
--- concepts.sgml	30 Apr 2009 16:06:10 -0000	1.20.2.1
***************
*** 41,45 ****
  <para>The cluster name is specified in each and every Slonik script via the directive:</para>
  <programlisting>
! cluster name = 'something';
  </programlisting>
  
--- 41,45 ----
  <para>The cluster name is specified in each and every Slonik script via the directive:</para>
  <programlisting>
! cluster name = something;
  </programlisting>
  

Index: adminscripts.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/adminscripts.sgml,v
retrieving revision 1.40.2.9
retrieving revision 1.40.2.10
diff -C2 -d -r1.40.2.9 -r1.40.2.10
*** adminscripts.sgml	13 Mar 2008 16:51:26 -0000	1.40.2.9
--- adminscripts.sgml	30 Apr 2009 16:06:10 -0000	1.40.2.10
***************
*** 135,147 ****
  replicated.</para>
  </sect3>
! <sect3><title>slonik_drop_node</title>
  
- <para>Generates Slonik script to drop a node from a &slony1; cluster.</para>
  </sect3>
! <sect3><title>slonik_drop_set</title>
  
  <para>Generates Slonik script to drop a replication set
  (<emphasis>e.g.</emphasis> - set of tables and sequences) from a
  &slony1; cluster.</para>
  </sect3>
  
--- 135,162 ----
  replicated.</para>
  </sect3>
! <sect3 id="slonik-drop-node"><title>slonik_drop_node</title>
! 
! <para>Generates Slonik script to drop a node from a &slony1;
! cluster.</para>
  
  </sect3>
! <sect3 id="slonik-drop-set"><title>slonik_drop_set</title>
  
  <para>Generates Slonik script to drop a replication set
  (<emphasis>e.g.</emphasis> - set of tables and sequences) from a
  &slony1; cluster.</para>
+ 
+ <para> This represents a pretty big potential <quote>foot gun</quote>
+ as this eliminates a replication set all at once.  A typo that points
+ it to the wrong set could be rather damaging.  Compare to <xref
+ linkend="slonik-unsubscribe-set"> and <xref
+ linkend="slonik-drop-node">; with both of those, attempting to drop a
+ subscription or a node that is vital to your operations will be
+ blocked (via a foreign key constraint violation) if there exists a
+ downstream subscriber that would be adversely affected.  In contrast,
+ there will be no warnings or errors if you drop a set; the set will
+ simply disappear from replication.
+ </para>
+ 
  </sect3>
  
***************
*** 232,239 ****
  
  <para>This goes through and drops the &slony1; schema from each node;
! use this if you want to destroy replication throughout a cluster.
! This is a <emphasis>VERY</emphasis> unsafe script!</para>
  
! </sect3><sect3><title>slonik_unsubscribe_set</title>
  
  <para>Generates Slonik script to unsubscribe a node from a replication set.</para>
--- 247,257 ----
  
  <para>This goes through and drops the &slony1; schema from each node;
! use this if you want to destroy replication throughout a cluster.  As
! its effects are necessarily rather destructive, this has the potential
! to be pretty unsafe.</para>
  
! </sect3>
! 
! <sect3 id="slonik-unsubscribe-set"><title>slonik_unsubscribe_set</title>
  
  <para>Generates Slonik script to unsubscribe a node from a replication set.</para>
***************
*** 344,347 ****
--- 362,408 ----
  </sect2>
  
+ <sect2 id="startslon"> <title>start_slon.sh</title>
+ 
+ <para> This <filename>rc.d</filename>-style script was introduced in
+ &slony1; version 2.0; it provides automatable ways of:</para>
+ 
+ <itemizedlist>
+ <listitem><para>Starting the &lslon;, via <command> start_slon.sh start </command> </para> </listitem>
+ </itemizedlist>
+ <para> Attempts to start the &lslon;, checking first to verify that it
+ is not already running, that configuration exists, and that the log
+ file location is writable.  Failure cases include:</para>
+ 
+ <itemizedlist>
+ <listitem><para> No <link linkend="runtime-config"> slon runtime configuration file </link> exists, </para></listitem>
+ <listitem><para> A &lslon; is found with the PID indicated via the runtime configuration, </para></listitem>
+ <listitem><para> The specified <envar>SLON_LOG</envar> location is not writable. </para></listitem>
+ <listitem><para>Stopping the &lslon;, via <command> start_slon.sh stop </command> </para> 
+ <para> This fails (doing nothing) if the PID (indicated via the runtime configuration file) does not exist; </para> </listitem>
+ <listitem><para>Monitoring the status of the &lslon;, via <command> start_slon.sh status </command> </para> 
+ <para> This indicates whether or not the &lslon; is running, and, if so, prints out the process ID. </para> </listitem>
+ 
+ </itemizedlist>
+ 
+ <para> The following environment variables are used to control &lslon; configuration:</para>
+ 
+ <glosslist>
+ <glossentry><glossterm> <envar> SLON_BIN_PATH </envar> </glossterm>
+ <glossdef><para> This indicates where the &lslon; binary program is found. </para> </glossdef> </glossentry>
+ <glossentry><glossterm> <envar> SLON_CONF </envar> </glossterm>
+ <glossdef><para> This indicates the location of the <link linkend="runtime-config"> slon runtime configuration file </link> that controls how the &lslon; behaves. </para> 
+ <para> Note that this file is <emphasis>required</emphasis> to contain a value for <link linkend="slon-config-logging-pid-file">log_pid_file</link>; that is necessary to allow this script to detect whether the &lslon; is running or not. </para>
+ </glossdef> </glossentry>
+ <glossentry><glossterm> <envar> SLON_LOG </envar> </glossterm>
+ <glossdef><para> This file is the location where &lslon; log files are to be stored, if need be.  There is an option <xref linkend ="slon-config-logging-syslog"> for &lslon; to use <application>syslog</application> to manage logging; in that case, you may prefer to set <envar>SLON_LOG</envar> to <filename>/dev/null</filename>.  </para> </glossdef> </glossentry>
+ </glosslist>
+ 
+ <para> Note that these environment variables may either be set, in the
+ script, or overridden by values passed in from the environment.  The
+ latter usage makes it easy to use this script in conjunction with the
+ <xref linkend="testbed"> so that it is regularly tested. </para>
+ 
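+ <para> A minimal usage sketch (the paths shown here are purely
+ illustrative, not values shipped with the script): </para>
+ 
+ <programlisting>
+ # start the slon, overriding the script's built-in settings from the environment
+ SLON_BIN_PATH=/usr/local/pgsql/bin \
+ SLON_CONF=/etc/slony1/node1.conf \
+ SLON_LOG=/var/log/slony1/node1.log \
+ ./start_slon.sh start
+ 
+ # later, check whether that slon is still running
+ SLON_CONF=/etc/slony1/node1.conf ./start_slon.sh status
+ </programlisting>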
+ </sect2>
+ 
  <sect2 id="launchclusters"><title> launch_clusters.sh </title>
  
***************
*** 349,356 ****
  
  <para> This is another shell script which uses the configuration as
! set up by <filename>mkslonconf.sh</filename> and is intended to either
! be run at system boot time, as an addition to the
! <filename>rc.d</filename> processes, or regularly, as a cron process,
! to ensure that &lslon; processes are running.</para>
  
  <para> It uses the following environment variables:</para>
--- 410,417 ----
  
  <para> This is another shell script which uses the configuration as
! set up by <filename>mkslonconf.sh</filename> and is intended to
! support an approach to running &slony1; involving regularly
! (<emphasis>e.g.</emphasis> via a cron process) checking to ensure that
! &lslon; processes are running.</para>
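+ 
+ <para> A minimal sketch of a crontab entry for this approach (the
+ installation path is illustrative): </para>
+ 
+ <programlisting>
+ # every 10 minutes, restart any slon processes that are found not to be running
+ */10 * * * *   /opt/slony1/tools/launch_clusters.sh
+ </programlisting>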
  
  <para> It uses the following environment variables:</para>
***************
*** 420,433 ****
  </itemizedlist>
  
- <note> <para> This script only works properly when run against an <emphasis>origin</emphasis> node. </para> </note>
- 
- <warning> <para> If this script is against a
- <emphasis>subscriber</emphasis> node, the <command>pg_dump</command>
- used to draw the schema from the <quote>source</quote> node will
- attempt to pull the <emphasis>broken</emphasis> schema found on the
- subscriber, and thus, the result will <emphasis>not</emphasis> be a
- faithful representation of the schema as would be found on the origin
- node. </para> </warning>
- 
  </sect2>
  <sect2><title> slony-cluster-analysis </title>
--- 481,484 ----
***************
*** 569,573 ****
  cluster.</para></listitem>
  
! <listitem><para> <filename>create_set.slonik</filename></para>
  
  <para>This is the first script to run; it sets up the requested nodes
--- 620,624 ----
  cluster.</para></listitem>
  
! <listitem><para> <filename>create_nodes.slonik</filename></para>
  
  <para>This is the first script to run; it sets up the requested nodes
***************
*** 644,648 ****
  <subtitle> Apache-Style profiles for FreeBSD <filename>ports/databases/slony/*</filename> </subtitle>
  
! <para> In the tools area, <filename>slon.in-profiles</filename> is a
  script that might be used to start up &lslon; instances at the time of
  system startup.  It is designed to interact with the FreeBSD Ports
--- 695,701 ----
  <subtitle> Apache-Style profiles for FreeBSD <filename>ports/databases/slony/*</filename> </subtitle>
  
! <indexterm><primary> Apache-style profiles for FreeBSD </primary> <secondary>FreeBSD </secondary> </indexterm>
! 
! <para> In the <filename>tools</filename> area, <filename>slon.in-profiles</filename> is a
  script that might be used to start up &lslon; instances at the time of
  system startup.  It is designed to interact with the FreeBSD Ports
***************
*** 650,653 ****
--- 703,762 ----
  
  </sect2>
+ 
+ <sect2 id="duplicate-node"> <title> <filename> duplicate-node.sh </filename> </title>
+ <indexterm><primary> duplicating nodes </primary> </indexterm>
+ <para> In the <filename>tools</filename> area,
+ <filename>duplicate-node.sh</filename> is a script that may be used to
+ help create a new node that duplicates one of the ones in the
+ cluster. </para>
+ 
+ <para> The script expects the following parameters: </para>
+ <itemizedlist>
+ <listitem><para> Cluster name </para> </listitem>
+ <listitem><para> New node number </para> </listitem>
+ <listitem><para> Origin node </para> </listitem>
+ <listitem><para> Node being duplicated </para> </listitem>
+ <listitem><para> New node </para> </listitem>
+ </itemizedlist>
+ 
+ <para> For each of the nodes specified, the script offers flags to
+ specify <function>libpq</function>-style parameters for
+ <envar>PGHOST</envar>, <envar>PGPORT</envar>,
+ <envar>PGDATABASE</envar>, and <envar>PGUSER</envar>; it is expected
+ that <filename>.pgpass</filename> will be used for storage of
+ passwords, as is generally considered best practice. If they are not
+ set, those values are inherited from the <function>libpq</function>
+ environment variables, which is useful for testing.  When
+ <quote>used in anger,</quote> however, it is likely that nearly all of
+ the 14 available parameters should be specified explicitly. </para>
+ 
+ <para> The script prepares files, normally in
+ <filename>/tmp</filename>, and reports the name of the directory that
+ it creates, which contains SQL and &lslonik; scripts to set up the
+ new node. </para>
+ 
+ <itemizedlist>
+ <listitem><para> <filename> schema.sql </filename> </para> 
+ <para> This is drawn from the origin node, and contains the <quote>pristine</quote> database schema that must be applied first.</para></listitem>
+ <listitem><para> <filename> slonik.preamble </filename> </para> 
+ 
+ <para> This <quote>preamble</quote> is used by the subsequent set of slonik scripts. </para> </listitem>
+ <listitem><para> <filename> step1-storenode.slonik </filename> </para> 
+ <para> A &lslonik; script to set up the new node. </para> </listitem>
+ <listitem><para> <filename> step2-storepath.slonik </filename> </para> 
+ <para> A &lslonik; script to set up path communications between the provider node and the new node. </para> </listitem>
+ <listitem><para> <filename> step3-subscribe-sets.slonik </filename> </para> 
+ <para> A &lslonik; script to request subscriptions for all replication sets.</para> </listitem>
+ </itemizedlist>
+ 
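+ <para> A sketch of how the generated files might then be applied by
+ hand (the output directory name and connection options here are purely
+ illustrative): </para>
+ 
+ <programlisting>
+ cd /tmp/duplicate-node.12345          # directory reported by the script
+ psql -h newhost -U slony -d mydb -f schema.sql    # load the pristine schema
+ cat slonik.preamble step1-storenode.slonik | slonik
+ cat slonik.preamble step2-storepath.slonik | slonik
+ cat slonik.preamble step3-subscribe-sets.slonik | slonik
+ </programlisting>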
+ <para> For testing purposes, this is sufficient to get a new node working.  The configuration may not necessarily reflect what is desired as a final state:</para>
+ 
+ <itemizedlist>
+ <listitem><para> Additional communications paths may be desirable in order to have redundancy. </para> </listitem>
+ <listitem><para> It is assumed, in the generated scripts, that the new node should support forwarding; that may not be true. </para> </listitem>
+ <listitem><para> It may be desirable later, after the subscription process is complete, to revise subscriptions. </para> </listitem>
+ </itemizedlist>
+ 
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

Index: maintenance.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/maintenance.sgml,v
retrieving revision 1.25
retrieving revision 1.25.2.1
diff -C2 -d -r1.25 -r1.25.2.1
*** maintenance.sgml	2 Aug 2006 18:34:59 -0000	1.25
--- maintenance.sgml	30 Apr 2009 16:06:10 -0000	1.25.2.1
***************
*** 11,17 ****
  <listitem><para> Deletes old data from various tables in the
  <productname>Slony-I</productname> cluster's namespace, notably
! entries in <xref linkend="table.sl-log-1">, <xref
! linkend="table.sl-log-2"> (not yet used), and <xref
! linkend="table.sl-seqlog">.</para></listitem>
  
  <listitem><para> Vacuum certain tables used by &slony1;.  As of 1.0.5,
--- 11,15 ----
  <listitem><para> Deletes old data from various tables in the
  <productname>Slony-I</productname> cluster's namespace, notably
! entries in &sllog1;, &sllog2;, and &slseqlog;.</para></listitem>
  
  <listitem><para> Vacuum certain tables used by &slony1;.  As of 1.0.5,
***************
*** 26,30 ****
  vacuuming of these tables.  Unfortunately, it has been quite possible
  for <application>pg_autovacuum</application> to not vacuum quite
! frequently enough, so you probably want to use the internal vacuums.
  Vacuuming &pglistener; <quote>too often</quote> isn't nearly as
  hazardous as not vacuuming it frequently enough.</para>
--- 24,28 ----
  vacuuming of these tables.  Unfortunately, it has been quite possible
  for <application>pg_autovacuum</application> to not vacuum quite
! frequently enough, so you may prefer to use the internal vacuums.
  Vacuuming &pglistener; <quote>too often</quote> isn't nearly as
  hazardous as not vacuuming it frequently enough.</para>
***************
*** 37,52 ****
  
  <listitem> <para> The <link linkend="dupkey"> Duplicate Key Violation
! </link> bug has helped track down some &postgres; race conditions.
! One remaining issue is that it appears that is a case where
! <command>VACUUM</command> is not reclaiming space correctly, leading
! to corruption of B-trees. </para>
! 
! <para> It may be helpful to run the command <command> REINDEX TABLE
! sl_log_1;</command> periodically to avoid the problem
! occurring. </para> </listitem>
  
  <listitem><para> As of version 1.2, <quote>log switching</quote>
! functionality is in place; every so often, it seeks to switch between
! storing data in &sllog1; and &sllog2; so that it may seek
  to <command>TRUNCATE</command> the <quote>elder</quote> data.</para>
  
--- 35,48 ----
  
  <listitem> <para> The <link linkend="dupkey"> Duplicate Key Violation
! </link> bug has helped track down a number of rather obscure
! &postgres; race conditions, so that in modern versions of &slony1; and &postgres;, there should be little to worry about.
! </para>
! </listitem>
  
  <listitem><para> As of version 1.2, <quote>log switching</quote>
! functionality is in place; every so often (by default, once per week,
! though you may induce it by calling the stored
! function <function>logswitch_start()</function>), it seeks to switch
! between storing data in &sllog1; and &sllog2; so that it may seek
  to <command>TRUNCATE</command> the <quote>elder</quote> data.</para>
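+ 
+ <para> For instance, with a cluster named <quote>MyCluster</quote>, a
+ log switch may be induced manually from <application>psql</application>
+ via: </para>
+ 
+ <programlisting>
+ select "_MyCluster".logswitch_start();
+ </programlisting>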
  
***************
*** 54,62 ****
  cleared out, so that you will not suffer from them having grown to
  some significant size, due to heavy load, after which they are
! incapable of shrinking back down </para> </listitem>
  
  </itemizedlist>
  </para>
  
  <sect2><title> Watchdogs: Keeping Slons Running</title>
  
--- 50,119 ----
  cleared out, so that you will not suffer from them having grown to
  some significant size, due to heavy load, after which they are
! incapable of shrinking back down </para> 
! 
! <para> In version 2.0, <command>DELETE</command> is no longer used to
! clear out data in &sllog1; and &sllog2;; instead, the log switch logic
! is induced frequently, every time the cleanup loop does not find a
! switch in progress, and these tables are purely cleared out
! via <command>TRUNCATE</command>.  This eliminates the need to vacuum
! these tables. </para>
! 
! </listitem>
  
  </itemizedlist>
  </para>
  
+ <sect2 id="maintenance-autovac"> <title> Interaction with &postgres;
+ autovacuum </title>
+ 
+ <indexterm><primary>autovacuum interaction</primary></indexterm>
+ 
+ <para> Recent versions of &postgres; support an
+ <quote>autovacuum</quote> process which notices when tables are
+ modified, thereby creating dead tuples, and vacuums those tables,
+ <quote>on demand.</quote> It has been observed that this can interact
+ somewhat negatively with &slony1;'s own vacuuming policies on its own
+ tables. </para>
+ 
+ <para> &slony1; requests vacuums on its tables immediately after
+ completing transactions that are expected to clean out old data, which
+ may be expected to be the ideal time to do so.  It appears as though
+ autovacuum may notice the changes a bit earlier, and attempts
+ vacuuming when transactions are not complete, rendering the work
+ pretty useless.  It seems preferable to configure autovacuum to avoid
+ vacuuming the &slony1;-managed configuration tables. </para>
+ 
+ <para> The following query (change the cluster name to match your
+ local configuration) will identify the tables that autovacuum should
+ be configured not to process: </para>
+ 
+ <programlisting>
+ mycluster=# select oid, relname from pg_class where relnamespace = (select oid from pg_namespace where nspname = '_' || 'MyCluster') and relhasindex;
+   oid  |   relname    
+ -------+--------------
+  17946 | sl_nodelock
+  17963 | sl_setsync
+  17994 | sl_trigger
+  17980 | sl_table
+  18003 | sl_sequence
+  17937 | sl_node
+  18034 | sl_listen
+  18017 | sl_path
+  18048 | sl_subscribe
+  17951 | sl_set
+  18062 | sl_event
+  18069 | sl_confirm
+  18074 | sl_seqlog
+  18078 | sl_log_1
+  18085 | sl_log_2
+ (15 rows)
+ </programlisting>
+ 
+ <para> The following statement will populate
+ <envar>pg_catalog.pg_autovacuum</envar> with suitable configuration
+ information: </para>
+ 
+ <programlisting>
+ INSERT INTO pg_catalog.pg_autovacuum
+        (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
+         anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit,
+         freeze_min_age, freeze_max_age)
+ SELECT oid, 'f', -1, -1, -1, -1, -1, -1, -1, -1
+   FROM pg_catalog.pg_class
+  WHERE relnamespace = (SELECT oid FROM pg_namespace
+                         WHERE nspname = '_' || 'MyCluster')
+    AND relhasindex;
+ </programlisting>
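+ 
+ <para> To verify what was stored, one might check the new entries
+ against <envar>pg_class</envar> (again, substituting your own cluster
+ name): </para>
+ 
+ <programlisting>
+ select c.relname, a.enabled
+   from pg_catalog.pg_autovacuum a
+   join pg_catalog.pg_class c on c.oid = a.vacrelid
+  where c.relnamespace = (select oid from pg_namespace
+                           where nspname = '_' || 'MyCluster');
+ </programlisting>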
+ </sect2>
+ 
  <sect2><title> Watchdogs: Keeping Slons Running</title>
  
***************
*** 89,92 ****
--- 146,150 ----
  <sect2 id="gensync"><title>Parallel to Watchdog: generate_syncs.sh</title>
  
+ <indexterm><primary>generate SYNCs</primary></indexterm>
  <para>A new script for &slony1; 1.1 is
  <application>generate_syncs.sh</application>, which addresses the following kind of
***************
*** 121,128 ****
  <indexterm><primary>testing cluster status</primary></indexterm>
  
! <para> In the <filename>tools</filename> directory, you may find
! scripts called <filename>test_slony_state.pl</filename> and
! <filename>test_slony_state-dbi.pl</filename>.  One uses the Perl/DBI
! interface; the other uses the Pg interface.
  </para>
  
--- 179,186 ----
  <indexterm><primary>testing cluster status</primary></indexterm>
  
! <para> In the <filename>tools</filename> directory, you will find
! &lteststate; scripts called <filename>test_slony_state.pl</filename>
! and <filename>test_slony_state-dbi.pl</filename>.  One uses the
! Perl/DBI interface; the other uses the Pg interface.
  </para>
  
***************
*** 130,136 ****
  &slony1; node (you can pick any one), and from that, determine all the
  nodes in the cluster.  They then run a series of queries (read only,
! so this should be quite safe to run) which look at the various
! &slony1; tables, looking for a variety of sorts of conditions
! suggestive of problems, including:
  </para>
  
--- 188,194 ----
  &slony1; node (you can pick any one), and from that, determine all the
  nodes in the cluster.  They then run a series of queries (read only,
! so this should be quite safe to run) which examine various &slony1;
! tables, looking for a variety of sorts of conditions suggestive of
! problems, including:
  </para>
  
***************
*** 219,222 ****
--- 277,282 ----
  <sect2><title> Other Replication Tests </title>
  
+ <indexterm><primary>testing replication</primary></indexterm>
+ 
  <para> The methodology of the previous section is designed with a view
  to minimizing the cost of submitting replication test queries; on a
***************
*** 287,290 ****
--- 347,446 ----
  </para>
  </sect2>
+ <sect2><title>mkservice </title>
+ <indexterm><primary>mkservice for BSD </primary></indexterm>
+ 
+ <sect3><title>slon-mkservice.sh</title>
+ 
+ <para> Create a slon service directory for use with svscan from
+ daemontools.  This uses multilog in a pretty basic way, which seems to
+ be standard for daemontools / multilog setups. If you want clever
+ logging, see logrep below. Currently this script has very limited
+ error handling capabilities.</para>
+ 
+ <para> For non-interactive use, set the following environment
+ variables: <envar>BASEDIR</envar>, <envar>SYSUSR</envar>,
+ <envar>PASSFILE</envar>, <envar>DBUSER</envar>, <envar>HOST</envar>,
+ <envar>PORT</envar>, <envar>DATABASE</envar>, <envar>CLUSTER</envar>,
+ and <envar>SLON_BINARY</envar>.  If any of these are not set, the script
+ asks for configuration information interactively.</para>
+ 
+ <itemizedlist>
+ <listitem><para>
+ <envar>BASEDIR</envar> where you want the service directory structure for the slon
+ to be created. This should <emphasis>not</emphasis> be the <filename>/var/service</filename> directory.</para></listitem>
+ <listitem><para>
+ <envar>SYSUSR</envar> the unix user under which the slon (and multilog) process should run.</para></listitem>
+ <listitem><para>
+ <envar>PASSFILE</envar> location of the <filename>.pgpass</filename> file to be used. (default <filename>~sysusr/.pgpass</filename>)</para></listitem>
+ <listitem><para>
+ <envar>DBUSER</envar> the postgres user the slon should connect as (default slony)</para></listitem>
+ <listitem><para>
+ <envar>HOST</envar> what database server to connect to (default localhost)</para></listitem>
+ <listitem><para>
+ <envar>PORT</envar> what port to connect to (default 5432)</para></listitem>
+ <listitem><para>
+ <envar>DATABASE</envar> which database to connect to (default dbuser)</para></listitem>
+ <listitem><para>
+ <envar>CLUSTER</envar> the name of your Slony1 cluster (default database)</para></listitem>
+ <listitem><para>
+ <envar>SLON_BINARY</envar> the full path name of the slon binary (default <command>which slon</command>)</para></listitem>
+ </itemizedlist>
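+ 
+ <para> A non-interactive invocation might therefore look like the
+ following sketch (every value shown is illustrative): </para>
+ 
+ <programlisting>
+ BASEDIR=/usr/local/etc/slony-services SYSUSR=slony \
+ DBUSER=slony HOST=localhost PORT=5432 DATABASE=mydb \
+ CLUSTER=mycluster SLON_BINARY=/usr/local/bin/slon \
+ ./slon-mkservice.sh
+ </programlisting>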
+ </sect3>
+ 
+ <sect3><title>logrep-mkservice.sh</title>
+ 
+ <para>This uses <command>tail -F</command> to pull data from log files allowing
+ you to use multilog filters (by setting the CRITERIA) to create
+ special purpose log files. The goal is to provide a way to monitor log
+ files in near realtime for <quote>interesting</quote> data without either
+ hacking up the initial log file or wasting CPU/IO by re-scanning the
+ same log repeatedly.
+ </para>
+ 
+ <para>For non-interactive use, set the following environment
+ variables: <envar>BASEDIR</envar>, <envar>SYSUSR</envar>, <envar>SOURCE</envar>,
+ <envar>EXTENSION</envar>, and <envar>CRITERIA</envar>.  If any of these are not set,
+ the script asks for configuration information interactively.
+ </para>
+ 
+ <itemizedlist>
+ <listitem><para>
+ <envar>BASEDIR</envar> where you want the service directory structure for the logrep
+ to be created. This should <emphasis>not</emphasis> be the <filename>/var/service</filename> directory.</para></listitem>
+ <listitem><para><envar>SYSUSR</envar> unix user under which the service should run.</para></listitem>
+ <listitem><para><envar>SOURCE</envar> name of the service with the log you want to follow.</para></listitem>
+ <listitem><para><envar>EXTENSION</envar> a tag to differentiate this logrep from others using the same source.</para></listitem>
+ <listitem><para><envar>CRITERIA</envar> the multilog filter you want to use.</para></listitem>
+ </itemizedlist>
+ 
+ <para> A trivial example of this would be to provide a log file of all slon
+ ERROR messages which could be used to trigger a nagios alarm.
+ <command>EXTENSION='ERRORS'</command>
+ <command>CRITERIA="'-*' '+* * ERROR*'"</command>
+ (Reset the monitor by rotating the log using <command>svc -a $svc_dir</command>)
+ </para>
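+ 
+ <para> As a sketch, setting that up non-interactively (the names shown
+ are illustrative) could look like: </para>
+ 
+ <programlisting>
+ BASEDIR=/usr/local/etc/slony-services SYSUSR=slony \
+ SOURCE=slon_node1 EXTENSION='ERRORS' \
+ CRITERIA="'-*' '+* * ERROR*'" \
+ ./logrep-mkservice.sh
+ </programlisting>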
+ 
+ <para> A more interesting application is a subscription progress log.
+ <command>EXTENSION='COPY'</command>
+ <command>CRITERIA="'-*' '+* * ERROR*' '+* * WARN*' '+* * CONFIG enableSubscription*' '+* * DEBUG2 remoteWorkerThread_* prepare to copy table*' '+* * DEBUG2 remoteWorkerThread_* all tables for set * found on subscriber*' '+* * DEBUG2 remoteWorkerThread_* copy*' '+* * DEBUG2 remoteWorkerThread_* Begin COPY of table*' '+* * DEBUG2 remoteWorkerThread_* * bytes copied for table*' '+* * DEBUG2 remoteWorkerThread_* * seconds to*' '+* * DEBUG2 remoteWorkerThread_* set last_value of sequence*' '+* * DEBUG2 remoteWorkerThread_* copy_set*'"</command>
+ </para>
+ 
+ <para>If you have a subscription log then it's easy to determine if a given
+ slon is in the process of handling copies or other subscription activity.
+ If the log isn't empty, and doesn't end with a 
+ <command>"CONFIG enableSubscription: sub_set:1"</command>
+ (or whatever set number you've subscribed) then the slon is currently in
+ the middle of initial copies.</para>
+ 
+ <para> If you happen to be monitoring the mtime of your primary slony logs to 
+ determine if your slon has gone brain-dead, checking this is a good way
+ to avoid mistakenly clobbering it in the middle of a subscribe. As a bonus,
+ recall that since the slons are running under svscan, you only need to
+ kill the slon (via the svc interface) and let svscan start it up again later.
+ I've also found the COPY logs handy for following subscribe activity 
+ interactively.</para>
+ </sect3>
+ 
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

Index: addthings.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/addthings.sgml,v
retrieving revision 1.23.2.5
retrieving revision 1.23.2.6
diff -C2 -d -r1.23.2.5 -r1.23.2.6
*** addthings.sgml	11 Jun 2007 16:01:33 -0000	1.23.2.5
--- addthings.sgml	30 Apr 2009 16:06:10 -0000	1.23.2.6
***************
*** 237,241 ****
  drops the schema and its contents, but also removes any columns
  previously added in using <xref linkend= "stmttableaddkey">.
! </para></listitem>
  </itemizedlist>
  </sect2>
--- 237,247 ----
  drops the schema and its contents, but also removes any columns
  previously added in using <xref linkend= "stmttableaddkey">.
! </para>
! 
! <note><para> In &slony1; version 2.0, <xref linkend=
! "stmttableaddkey"> is <emphasis>no longer supported</emphasis>, and
! thus <xref linkend="stmtuninstallnode"> consists very simply of
! <command>DROP SCHEMA "_ClusterName" CASCADE;</command>.  </para>
! </note></listitem>
  </itemizedlist>
  </sect2>
***************
*** 290,298 ****
  </para></listitem>
  
! <listitem><para> At this point, it is an excellent idea to run
! the <filename>tools</filename>
! script <command>test_slony_state-dbi.pl</command>, which rummages
! through the state of the entire cluster, pointing out any anomalies
! that it finds.  This includes a variety of sorts of communications
  problems.</para> </listitem>
  
--- 296,303 ----
  </para></listitem>
  
! <listitem><para> At this point, it is an excellent idea to run the
! <filename>tools</filename> script &lteststate;, which rummages through
! the state of the entire cluster, pointing out any anomalies that it
! finds.  This includes a variety of sorts of communications
  problems.</para> </listitem>
  
***************
*** 347,355 ****
  originates a replication set.</para> </listitem>
  
! <listitem><para> Run the <filename>tools</filename>
! script <command>test_slony_state-dbi.pl</command>, which rummages
! through the state of the entire cluster, pointing out any anomalies
! that it notices, as well as some information on the status of each
! node. </para> </listitem>
  
  </itemizedlist>
--- 352,359 ----
  originates a replication set.</para> </listitem>
  
! <listitem><para> Run the <filename>tools</filename> script
! &lteststate;, which rummages through the state of the entire cluster,
! pointing out any anomalies that it notices, as well as some
! information on the status of each node. </para> </listitem>
  
  </itemizedlist>

Index: testbed.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/testbed.sgml,v
retrieving revision 1.10.2.2
retrieving revision 1.10.2.3
diff -C2 -d -r1.10.2.2 -r1.10.2.3
*** testbed.sgml	20 Apr 2007 20:51:09 -0000	1.10.2.2
--- testbed.sgml	30 Apr 2009 16:06:10 -0000	1.10.2.3
***************
*** 148,151 ****
--- 148,161 ----
  
  <glossentry>
+ <glossterm><envar>TMPDIR</envar></glossterm>
+ 
+ <glossdef><para> By default, the tests will generate their output in
+ <filename>/tmp</filename>, <filename>/usr/tmp</filename>, or
+ <filename>/var/tmp</filename>, unless you set your own value for this
+ environment variable.  </para></glossdef>
+ 
+ </glossentry>
+ 
+ <glossentry>
  <glossterm><envar>SLTOOLDIR</envar></glossterm>
  
***************
*** 180,183 ****
--- 190,231 ----
  </glossentry>
  
+ <glossentry>
+ <glossterm><envar>SLONCONF[n]</envar></glossterm>
+ 
+ <glossdef><para> If set to <quote>true</quote>, for a particular node,
+ typically handled in <filename>settings.ik</filename> for a given
+ test, then configuration will be set up in a <link
+ linkend="runtime-config"> per-node <filename>slon.conf</filename>
+ runtime config file. </link> </para> </glossdef>
+ </glossentry>
+ 
+ <glossentry>
+ <glossterm><envar>SLONYTESTER</envar></glossterm>
+ 
+ <glossdef><para> Email address of the person who might be
+ contacted about the test results. This is stored in the
+ <envar>SLONYTESTFILE</envar>, and may eventually be aggregated in some
+ sort of buildfarm-like registry. </para> </glossdef>
+ </glossentry>
+ 
+ <glossentry>
+ <glossterm><envar>SLONYTESTFILE</envar></glossterm>
+ 
+ <glossdef><para> File in which to store summary results from tests.
+ Eventually, this may be used to construct a buildfarm-like repository of
+ aggregated test results. </para> </glossdef>
+ </glossentry>
+ 
+ <glossentry>
+ <glossterm><filename>random_number</filename> and <filename>random_string</filename> </glossterm>
+ 
+ <glossdef><para> If you run <command>make</command> in the
+ <filename>test</filename> directory, C programs
+ <application>random_number</application> and
+ <application>random_string</application> will be built which will then
+ be used when generating random data in lieu of using shell/SQL
+ capabilities that are much slower than the C programs.  </para>
+ </glossdef>
+ </glossentry>
  
  </glosslist>

Index: releasechecklist.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/releasechecklist.sgml,v
retrieving revision 1.3.2.6
retrieving revision 1.3.2.7
diff -C2 -d -r1.3.2.6 -r1.3.2.7
*** releasechecklist.sgml	29 Aug 2007 05:44:58 -0000	1.3.2.6
--- releasechecklist.sgml	30 Apr 2009 16:06:10 -0000	1.3.2.7
***************
*** 1,4 ****
  <!-- $Id$ -->
! <article id="releasechecklist"> <title> Release Checklist </title>
  
  <indexterm><primary>release checklist</primary></indexterm>
--- 1,4 ----
  <!-- $Id$ -->
! <sect1 id="releasechecklist"> <title> Release Checklist </title>
  
  <indexterm><primary>release checklist</primary></indexterm>
***************
*** 53,59 ****
  <filename>configure.ac</filename></para></listitem> 
  
! <listitem><para>Purge directory <filename>autom4te.cache</filename> so it is not included in the build  </para></listitem> 
  
! <listitem><para>Purge out .cvsignore files; this can be done with the command <command> find . -name .cvsignore | xargs rm </command>  </para></listitem> 
  <listitem><para> Run <filename>tools/release_checklist.sh</filename> </para>
  
--- 53,63 ----
  <filename>configure.ac</filename></para></listitem> 
  
! <listitem><para>Purge directory <filename>autom4te.cache</filename> so it is not included in the build  </para>
! <para> Does not need to be done by hand - the later <command> make distclean </command> step does this for you. </para>
! </listitem> 
  
! <listitem><para>Purge out .cvsignore files; this can be done with the command <command> find . -name .cvsignore | xargs rm </command>  </para>
! <para> Does not need to be done by hand - the later <command> make distclean </command> step does this for you. </para>
! </listitem> 
  <listitem><para> Run <filename>tools/release_checklist.sh</filename> </para>
  
***************
*** 66,70 ****
  <listitem><para>PACKAGE_VERSION=REL_1_1_2</para></listitem>
  
! <listitem><para>PACKAGE_STRING=postgresql-slony1 REL_1_1_2</para></listitem>
  
  </itemizedlist>
--- 70,74 ----
  <listitem><para>PACKAGE_VERSION=REL_1_1_2</para></listitem>
  
! <listitem><para>PACKAGE_STRING=slony1 REL_1_1_2</para></listitem>
  
  </itemizedlist>
***************
*** 94,99 ****
  
  <para> Currently this is best done by issuing <command> ./configure &&
! make all && make clean</command> but that is a somewhat ugly approach.
  
  </para></listitem> 
  
--- 98,105 ----
  
  <para> Currently this is best done by issuing <command> ./configure &&
! make all && make clean</command> but that is a somewhat ugly approach.</para>
  
+ <para> Slightly better may be <command> ./configure && make
+ src/slon/conf-file.c src/slonik/parser.c src/slonik/scan.c </command>
  </para></listitem> 
  
***************
*** 101,106 ****
  previous step(s) are removed.</para>
  
! <para> <command>make distclean</command> ought to do
! that... </para></listitem>
  
  <listitem><para>Generate HTML tarball, and RTF/PDF, if
--- 107,118 ----
  previous step(s) are removed.</para>
  
! <para> <command>make distclean</command> will do
! that... </para>
! 
! <para> Note that <command>make distclean</command> also clears out
! <filename>.cvsignore</filename> files and
! <filename>autom4te.cache</filename>, thus obsoleting some former steps
! that suggested that it was needful to delete them. </para>
! </listitem>
  
  <listitem><para>Generate HTML tarball, and RTF/PDF, if
***************
*** 135,139 ****
  
  </itemizedlist>
! </article>
  <!-- Keep this comment at the end of the file
  Local variables:
--- 147,151 ----
  
  </itemizedlist>
! </sect1>
  <!-- Keep this comment at the end of the file
  Local variables:

Index: firstdb.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/firstdb.sgml,v
retrieving revision 1.20.2.4
retrieving revision 1.20.2.5
diff -C2 -d -r1.20.2.4 -r1.20.2.5
*** firstdb.sgml	18 Feb 2009 21:02:53 -0000	1.20.2.4
--- firstdb.sgml	30 Apr 2009 16:06:10 -0000	1.20.2.5
***************
*** 31,35 ****
  
  <listitem><para> You have <option>tcpip_socket=true</option> in your
! <filename>postgresql.conf</filename> and</para></listitem>
  
  <listitem><para> You have enabled access in your cluster(s) via
--- 31,35 ----
  
  <listitem><para> You have <option>tcpip_socket=true</option> in your
! <filename>postgresql.conf</filename>;</para> <note> <para> This is no longer needed for &postgres; 8.0 and later versions.</para></note> </listitem>
  
  <listitem><para> You have enabled access in your cluster(s) via
***************
*** 95,98 ****
--- 95,116 ----
  </programlisting>
  
+ <para> One of the tables created by
+ <application>pgbench</application>, <envar>history</envar>, does not
+ have a primary key.  In earlier versions of &slony1;, a &lslonik;
+ command called <xref linkend="stmttableaddkey"> could be used to
+ introduce one.  This caused a number of problems, and so this feature
+ has been removed in version 2 of &slony1;.  It now
+ <emphasis>requires</emphasis> that there is a suitable candidate
+ primary key. </para>
+ 
+ <para> The following command will establish a proper primary key on this table: </para>
+ 
+ <programlisting>
+ psql -U $PGBENCHUSER -h $HOST1 -d $MASTERDBNAME -c "begin; alter table
+ history add column id serial; update history set id =
+ nextval('history_id_seq'); alter table history add primary key(id);
+ commit"
+ </programlisting>
+ 
  <para>Because &slony1; depends on the databases having the pl/pgSQL
  procedural language installed, we better install it now.  It is
***************
*** 142,177 ****
  procedures in the master/slave (node) databases. </para>
  
! <sect3><title>Using the altperl scripts</title>
! 
! <indexterm><primary> altperl script usage </primary></indexterm>
! 
! <para>
! Using the <xref linkend="altperl"> scripts is an easy way to get started.  The
! <command>slonik_build_env</command> script will generate output providing
! details you need to  omplete building a <filename>slon_tools.conf</filename>. 
! An example <filename>slon_tools.conf</filename> is provided in the distribution
! to get you started.  The altperl scripts will all reference
! this central configuration file in the future to ease administration. Once 
! slon_tools.conf has been created, you can proceed as follows:
! </para>
! 
! <programlisting>
! # Initialize cluster:
! $ slonik_init_cluster  | slonik 
! 
! # Start slon  (here 1 and 2 are node numbers)
! $ slon_start 1    
! $ slon_start 2
! 
! # Create Sets (here 1 is a set number)
! $ slonik_create_set 1 | slonik             
  
! # subscribe set to second node (1= set ID, 2= node ID)
! $ slonik_subscribe_set  2 | slonik
! </programlisting>
  
! <para> You have now replicated your first database.  You can skip the following section
! of documentation if you'd like, which documents more of a <quote>bare-metal</quote> approach.</para>
! </sect3>
  
  <sect3><title>Using slonik command directly</title>
--- 160,180 ----
  procedures in the master/slave (node) databases. </para>
  
! <para> The example that follows uses <xref linkend="slonik"> directly
! (or embedded directly into scripts).  This is not necessarily the most
! pleasant way to get started; there exist tools for building <xref
! linkend="slonik"> scripts under the <filename>tools</filename>
! directory, including:</para>
! <itemizedlist>
! <listitem><para> <xref linkend="altperl"> - a set of Perl scripts that
! build <xref linkend="slonik"> scripts based on a single
! <filename>slon_tools.conf</filename> file. </para> </listitem>
  
! <listitem><para> <xref linkend="mkslonconf"> - a shell script
! (<emphasis>e.g.</emphasis> - works with Bash) which, based either on
! self-contained configuration or on shell environment variables,
! generates a set of <xref linkend="slonik"> scripts to configure a
! whole cluster. </para> </listitem>
  
! </itemizedlist>
  
  <sect3><title>Using slonik command directly</title>
***************
*** 211,225 ****
   
  	#--
- 	# Because the history table does not have a primary key or other unique
- 	# constraint that could be used to identify a row, we need to add one.
- 	# The following command adds a bigint column named
- 	# _Slony-I_$CLUSTERNAME_rowID to the table.  It will have a default value
- 	# of nextval('_$CLUSTERNAME.s1_rowid_seq'), and have UNIQUE and NOT NULL
- 	# constraints applied.  All existing rows will be initialized with a
- 	# number
- 	#--
- 	table add key (node id = 1, fully qualified name = 'public.history');
- 
- 	#--
  	# Slony-I organizes tables into sets.  The smallest unit a node can
  	# subscribe is a set.  The following commands create one set containing
--- 214,217 ----
***************
*** 230,234 ****
  	set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table');
  	set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table');
! 	set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table', key = serial);
  
  	#--
--- 222,226 ----
  	set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table');
  	set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table');
! 	set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table');
  
  	#--
***************
*** 237,241 ****
  	#--
  
! 	store node (id=2, comment = 'Slave node');
  	store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
  	store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER');
--- 229,233 ----
  	#--
  
! 	store node (id=2, comment = 'Slave node', event node=1);
  	store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
  	store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER');
***************
*** 304,313 ****
  the database.  When the copy process is finished, the replication
  daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying
! the accumulated replication log.  It will do this in little steps, 10
! seconds worth of application work at a time.  Depending on the
! performance of the two systems involved, the sizing of the two
! databases, the actual transaction load and how well the two databases
! are tuned and maintained, this catchup process can be a matter of
! minutes, hours, or eons.</para>
  
  <para>You have now successfully set up your first basic master/slave
--- 296,311 ----
  the database.  When the copy process is finished, the replication
  daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying
! the accumulated replication log.  It will do this in little steps,
! initially doing about 10 seconds worth of application work at a time.
! Depending on the performance of the two systems involved, the sizing
! of the two databases, the actual transaction load and how well the two
! databases are tuned and maintained, this catchup process may be a
! matter of minutes, hours, or eons.</para>
! 
! <para> If you encounter problems getting this working, check over the
! logs for the &lslon; processes, as error messages are likely to be
! suggestive of the nature of the problem.  The tool &lteststate; is
! also useful for diagnosing problems with nearly-functioning
! replication clusters.</para>
  
  <para>You have now successfully set up your first basic master/slave
***************
*** 362,368 ****
  <filename>slony-I-basic-mstr-slv.txt</filename>.</para>
  
! <para>If this script returns <command>FAILED</command> please contact the
! developers at <ulink url="http://slony.info/">
! http://slony.info/</ulink></para></sect3>
  </sect2>
  </sect1>
--- 360,407 ----
  <filename>slony-I-basic-mstr-slv.txt</filename>.</para>
  
! <para>If this script returns <command>FAILED</command> please contact
! the developers at <ulink url="http://slony.info/">
! http://slony.info/</ulink>.  Be sure to be prepared with useful
! diagnostic information including the logs generated by &lslon;
! processes and the output of &lteststate;. </para></sect3>
! 
! <sect3><title>Using the altperl scripts</title>
! 
! <indexterm><primary> altperl script example </primary></indexterm>
! 
! <para>
! Using the <xref linkend="altperl"> scripts is an alternative way to
! get started; it allows you to avoid writing slonik scripts, at least
! for some of the simple ways of configuring &slony1;.  The
! <command>slonik_build_env</command> script will generate output
! providing details you need to build a
! <filename>slon_tools.conf</filename>, which is required by these
! scripts.  An example <filename>slon_tools.conf</filename> is provided
! in the distribution to get you started.  The altperl scripts all
! reference this central configuration file to centralize cluster
! configuration information. Once slon_tools.conf has been created, you
! can proceed as follows:
! </para>
! 
! <programlisting>
! # Initialize cluster:
! $ slonik_init_cluster  | slonik 
! 
! # Start slon  (here 1 and 2 are node numbers)
! $ slon_start 1    
! $ slon_start 2
! 
! # Create Sets (here 1 is a set number)
! $ slonik_create_set 1 | slonik             
! 
! # subscribe set to second node (1= set ID, 2= node ID)
! $ slonik_subscribe_set 1 2 | slonik
! </programlisting>
! 
! <para> You have now replicated your first database.  You can skip the
! following section of documentation if you'd like, which documents more
! of a <quote>bare-metal</quote> approach.</para>
! </sect3>
! 
  </sect2>
  </sect1>

Index: versionupgrade.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/versionupgrade.sgml,v
retrieving revision 1.9
retrieving revision 1.9.2.1
diff -C2 -d -r1.9 -r1.9.2.1
*** versionupgrade.sgml	2 Aug 2006 18:34:59 -0000	1.9
--- versionupgrade.sgml	30 Apr 2009 16:06:10 -0000	1.9.2.1
***************
*** 28,45 ****
  </itemizedlist></para>
  
! <para> And note that this led to a 40 hour outage.</para>
  
  <para> &slony1; offers an opportunity to replace that long outage with
! one a few minutes or even a few seconds long.  The approach taken is
! to create a &slony1; replica in the new version.  It is possible that
! it might take much longer than 40h to create that replica, but once
! it's there, it can be kept very nearly up to date.</para>
  
! <para> When it is time to switch over to the new database, the
! procedure is rather less time consuming:
  
  <itemizedlist>
  
! <listitem><para> Stop all applications that might modify the data</para></listitem>
  
  <listitem><para> Lock the set against client application updates using
--- 28,48 ----
  </itemizedlist></para>
  
! <para> And note that this approach led to a 40 hour outage.</para>
  
  <para> &slony1; offers an opportunity to replace that long outage with
! one as little as a few seconds long.  The approach required is to
! create a &slony1; replica running the new version.  It is possible
! that creating that replica may take considerably longer than 40h;
! however, establishing the replica requires no outage, and once it is
! there, it can be kept very nearly up to date.</para>
  
! <para> When it comes time to switch over to the new database, the
! portion of the procedure that requires an application
! <quote>outage</quote> is a lot less time consuming:
  
  <itemizedlist>
  
! <listitem><para> Stop all applications that might modify the data
! </para></listitem>
  
  <listitem><para> Lock the set against client application updates using
***************
*** 50,68 ****
  the new one</para></listitem>
  
! <listitem><para> Point the applications at the new database</para></listitem>
! </itemizedlist></para>
  
  <para> This procedure should only need to take a very short time,
! likely bound more by how quickly you can reconfigure your applications
! than anything else.  If you can automate all the steps, it might take
! less than a second.  If not, somewhere between a few seconds and a few
! minutes is likely.</para>
  
! <para> Note that after the origin has been shifted, updates now flow
! into the <emphasis>old</emphasis> database.  If you discover that due
! to some unforeseen, untested condition, your application is somehow
! unhappy connecting to the new database, you could easily use <xref
! linkend="stmtmoveset"> again to shift the origin back to the old
! database.</para>
  
  <para> If you consider it particularly vital to be able to shift back
--- 53,72 ----
  the new one</para></listitem>
  
! <listitem><para> Point the applications to the new database
! </para></listitem> </itemizedlist></para>
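+ 
+ <para> In &lslonik; terms, the switchover itself is little more than
+ the following sketch, which assumes set 1 is moved from old origin node
+ 1 to new node 2, and omits the usual cluster name and <command>admin
+ conninfo</command> preamble: </para>
+ 
+ <programlisting>
+ # block client application updates against the set on the old origin
+ lock set (id = 1, origin = 1);
+ # hand the origin role over to the new node
+ move set (id = 1, old origin = 1, new origin = 2);
+ # wait until node 2 has confirmed the change before proceeding
+ wait for event (origin = 1, confirmed = 2, wait on = 1);
+ </programlisting>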
  
  <para> This procedure should only need to take a very short time,
! likely based more on how much time is required to reconfigure your
! applications than anything else.  If you can automate all of these
! steps, the outage may conceivably be a second or less.  If manual
! handling is necessary, then it is likely to take somewhere between a
! few seconds and a few minutes.</para>
  
! <para> Note that after the origin has been shifted, updates are
! replicated back into the <emphasis>old</emphasis> database.  If you
! discover that due to some unforeseen, untested condition, your
! application is somehow unhappy connecting to the new database, you may
! readily use <xref linkend="stmtmoveset"> again to reverse the process
! to shift the origin back to the old database.</para>
  
  <para> If you consider it particularly vital to be able to shift back
***************
*** 82,86 ****
  
  <para> Thus, you have <emphasis>three</emphasis> nodes, one running
! the new version of &postgres;, and the other two the old version.</para></listitem>
  
  <listitem><para> Once they are roughly <quote>in sync</quote>, stop
--- 86,98 ----
  
  <para> Thus, you have <emphasis>three</emphasis> nodes, one running
! the new version of &postgres;, and the other two the old
! version.</para>
! 
! <para> Note that this imposes a need to have &slony1; built against
! <emphasis>both</emphasis> databases (<emphasis>e.g.</emphasis> - at
! the very least, the binaries for the stored procedures need to have
! been compiled against both versions of &postgres;). </para>
! 
! </listitem>
  
  <listitem><para> Once they are roughly <quote>in sync</quote>, stop
***************
*** 120,128 ****
  <emphasis>considerably</emphasis> since 7.2), but that this was more
  workable for him than other replication systems such as
! <productname>eRServer</productname>.  If you desperately need that,
! look for him on the &postgres; Hackers mailing list.  It is not
! anticipated that 7.2 will be supported by any official &slony1;
  release.</para></note></para>
  
  </sect1>
  <!-- Keep this comment at the end of the file
--- 132,555 ----
  <emphasis>considerably</emphasis> since 7.2), but that this was more
  workable for him than other replication systems such as
! <productname>eRServer</productname>.  &postgres; 7.2 will
! <emphasis>never</emphasis> be supported by any official &slony1;
  release.</para></note></para>
  
+ <sect2> <title>Example: Upgrading a single database with no existing replication </title>
+ 
+ <para>This example uses specific names, IP addresses, ports, and so
+ forth in order to describe in detail what is going on.</para>
+ 
+    <sect3>
+     <title>The Environment</title>
+     <programlisting>
+ 		Database machine:
+ 			name = rome 
+ 			ip = 192.168.1.23
+ 			OS: Ubuntu 6.06 LTS
+ 			postgres user = postgres, group postgres
+ 			
+ 		Current PostgreSQL 
+ 			Version = 8.2.3 
+ 			Port 5432
+ 			Installed at: /data/pgsql-8.2.3
+ 			Data directory: /data/pgsql-8.2.3/data
+ 			Database to be moved: mydb
+ 			
+ 		New PostgreSQL installation
+ 			Version = 8.3.3
+ 			Port 5433
+ 			Installed at: /data/pgsql-8.3.3
+ 			Data directory: /data/pgsql-8.3.3/data
+ 			
+ 		Slony Version to be used = 1.2.14
+     </programlisting>
+    </sect3>
+    <sect3>
+     <title>Installing &slony1;</title>
+ 
+     <para>
+      How to install &slony1; is covered quite well in other parts of
+      the documentation (<xref linkend="installation">); we will just
+      provide a quick guide here.</para>
+ 
+       <programlisting>
+        wget http://main.slony.info/downloads/1.2/source/slony1-1.2.14.tar.bz2
+       </programlisting>
+ 
+       <para> Unpack and build as root with</para>
+       <programlisting>
+ 		tar xjf slony1-1.2.14.tar.bz2
+ 		cd slony1-1.2.14
+ 		./configure --prefix=/data/pgsql-8.2.3 --with-perltools=/data/pgsql-8.2.3/slony --with-pgconfigdir=/data/pgsql-8.2.3/bin
+ 		make clean
+ 		make
+ 		make install
+ 		chown -R postgres:postgres /data/pgsq-8.2.3 
+ 		mkdir /var/log/slony
+ 		chown -R postgres:postgres /var/log/slony
+       </programlisting>
+ 
+       <para> Then repeat this for the 8.3.3 build.  A very important
+       step is the <command>make clean</command>; it is not so
+       important the first time, but when building the second time it
+       is essential to clean out the old binaries; otherwise the
+       binaries will not match the &postgres; 8.3.3 build, with the
+       result that &slony1; will not work there.  </para>
+ 
+    </sect3>
+    <sect3>
+     <title>Creating the slon_tools.conf</title>
+ 
+     <para>
+      The slon_tools.conf is <emphasis>the</emphasis> configuration
+      file. It contains all the configuration information, such as:
+ 
+      <orderedlist>
+       <listitem>
+        <para>All the nodes and their details (IPs, ports, db, user,
+ 	password)</para>
+       </listitem>
+       <listitem>
+        <para>All the tables to be replicated</para>
+       </listitem>
+       <listitem>
+        <para>All the sequences to be replicated</para>
+       </listitem>
+       <listitem>
+        <para> How the tables and sequences are arranged in sets</para>
+       </listitem>
+      </orderedlist>
+      </para>
+      <para> Make a copy of
+       <filename>/data/pgsql-8.2.3/etc/slon_tools.conf-sample</filename>
+       to <filename>slon_tools.conf</filename> and open it. The comments
+       in this file are fairly self-explanatory. Since this is a one-time
+       replication, you will generally not need to split the tables into
+       multiple sets. On a production machine running with 500 tables and
+       100 sequences, putting them all in a single set has worked fine.</para>
+       
+       <orderedlist>
+        <para>A few modifications to make:</para>
+        <listitem>
+ 	<para> In our case we only need 2 nodes, so delete the <command>add_node</command>
+ 	 entries for nodes 3 and 4.</para>
+        </listitem>
+        <listitem>
+ 	<para> The <envar>pkeyedtables</envar> entry needs to be updated with your tables that
+ 	 have a primary key. If your tables are spread across multiple
+ 	 schemas, then you need to qualify the table name with the schema
+ 	 (schema.tablename).</para>
+        </listitem>
+        <listitem>
+ 	<para> <envar>keyedtables</envar> entries need to be updated
+ 	with any tables that match the comment (with good schema
+ 	design, there should not be any).
+ 	</para>
+        </listitem>
+        <listitem>
+ 	<para> <envar>serialtables</envar> (if you have any; as it says, it is wise to avoid this).</para>
+        </listitem>
+        <listitem>
+ 	<para> <envar>sequences</envar>  needs to be updated with your sequences.
+ 	</para>
+        </listitem>
+        <listitem>
+ 	<para>Remove the whole set2 entry (as we are only using set1)</para>
+        </listitem>
+       </orderedlist>
+      <para>
+       This is what it looks like with all comments stripped out:
+       <programlisting>
+ $CLUSTER_NAME = 'replication';
+ $LOGDIR = '/var/log/slony';
+ $MASTERNODE = 1;
+ 
+     add_node(node     => 1,
+ 	     host     => 'rome',
+ 	     dbname   => 'mydb',
+ 	     port     => 5432,
+ 	     user     => 'postgres',
+          password => '');
+ 
+     add_node(node     => 2,
+ 	     host     => 'rome',
+ 	     dbname   => 'mydb',
+ 	     port     => 5433,
+ 	     user     => 'postgres',
+          password => '');
+ 
+ $SLONY_SETS = {
+     "set1" => {
+ 	"set_id" => 1,
+ 	"table_id"    => 1,
+ 	"sequence_id" => 1,
+         "pkeyedtables" => [
+ 			   'mytable1',
+ 			   'mytable2',
+ 			   'otherschema.mytable3',
+ 			   'otherschema.mytable4',
+ 			   'otherschema.mytable5',
+ 			   'mytable6',
+ 			   'mytable7',
+ 			   'mytable8',
+ 			   ],
+ 
+ 		"sequences" => [
+ 			   'mytable1_sequence1',
+    			   'mytable1_sequence2',
+ 			   'otherschema.mytable3_sequence1',
+    			   'mytable6_sequence1',
+    			   'mytable7_sequence1',
+    			   'mytable7_sequence2',
+ 			],
+     },
+ 
+ };
+ 
+ 1;
+       </programlisting>
+       </para>
+       <para> As can be seen, this database is pretty small, with only 8
+       tables and 6 sequences. Now copy your
+       <filename>slon_tools.conf</filename> into
+       <filename>/data/pgsql-8.2.3/etc/</filename> and
+       <filename>/data/pgsql-8.3.3/etc/</filename>
+       </para>
+    </sect3>
+    <sect3>
+     <title>Preparing the new &postgres; instance</title>
+     <para> You now have a fresh second instance of &postgres; running on
+      port 5433 on the same machine.  Now it is time to prepare it to
+      receive &slony1; replication data.</para>
+     <orderedlist>
+      <listitem>
+       <para>Slony does not replicate roles, so first create all the
+        users and groups on the new instance so that it is identical in
+        terms of roles/groups; one way of doing this is sketched just
+        after this list.</para>
+      </listitem>
+      <listitem>
+       <para>
+        Create your db in the same encoding as the original db, in my case
+        UTF8:
+        <command>/data/pgsql-8.3.3/bin/createdb
+ 	-E UNICODE -p5433 mydb</command>
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        &slony1; replicates data, not schemas, so take a dump of your schema
+        <command>/data/pgsql-8.2.3/bin/pg_dump
+ 	-s mydb > /tmp/mydb.schema</command>
+        and then import it on the new instance.
+        <command>cat /tmp/mydb.schema | /data/pgsql-8.3.3/bin/psql -p5433
+ 	mydb</command>
+       </para>
+      </listitem>
+     </orderedlist>
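+ 
+     <para> One possible way to copy the roles across (a sketch; it
+      assumes the postgres superuser can connect to both instances) is to
+      use <command>pg_dumpall</command> with its globals-only option:</para>
+     <programlisting>
+ /data/pgsql-8.2.3/bin/pg_dumpall -p 5432 -g > /tmp/roles.sql
+ /data/pgsql-8.3.3/bin/psql -p 5433 -d postgres -f /tmp/roles.sql
+     </programlisting>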
+ 
+     <para>The new database is now ready to start receiving replication
+     data.</para>
+ 
+    </sect3>
+    <sect3>
+     <title>Initiating &slony1; Replication</title>
+     <para>This is the point where we start changing your current
+      production database by adding a new schema to it that  contains
+      all the &slony1; replication information</para>
+     <para>The first thing to do is to initialize the &slony1;
+      schema.  Do the following as the postgres user (as in this example).</para>
+     <note>
+      <para> The scripts whose names start with <command>slonik_</command> do not do anything
+       themselves; they only generate command output that can be interpreted
+       by the slonik binary. So issuing any of the scripts starting with
+       slonik_ will not, by itself, do anything to your database. Also, by default the
+       slonik_ scripts will look for your slon_tools.conf in the etc
+       directory of the &postgres; installation; in my case
+       <filename>/data/pgsql-8.x.x/etc</filename>, depending on which installation you are working on.</para>
+     </note>
+     <para>
+      <command>/data/pgsql-8.2.3/slony/slonik_init_cluster
+       > /tmp/init.txt</command>
+     </para>
+     <para>Open /tmp/init.txt; it should look something like
+      this:</para>
+     <programlisting>
+ # INIT CLUSTER
+ cluster name = replication;
+  node 1 admin conninfo='host=rome dbname=mydb user=postgres port=5432';
+  node 2 admin conninfo='host=rome dbname=mydb user=postgres port=5433';
+   init cluster (id = 1, comment = 'Node 1 - mydb at rome');
+ 
+ # STORE NODE
+   store node (id = 2, event node = 1, comment = 'Node 2 - mydb at rome');
+   echo 'Set up replication nodes';
+ 
+ # STORE PATH
+   echo 'Next: configure paths for each node/origin';
+   store path (server = 1, client = 2, conninfo = 'host=rome dbname=mydb user=postgres port=5432');
+   store path (server = 2, client = 1, conninfo = 'host=rome dbname=mydb user=postgres port=5433');
+   echo 'Replication nodes prepared';
+   echo 'Please start a slon replication daemon for each node';
+      
+     </programlisting>
+     <para>The first section indicates node information and the
+     initialization of the cluster, then it adds the second node to the
+     cluster and finally stores communications paths for both nodes in
+     the slony schema.</para>
+     <para>
+      Now it is time to execute the command:
+      <command>cat /tmp/init.txt | /data/pgsql-8.2.3/bin/slonik</command>
+     </para>
+     <para>This will run pretty quickly and give you some output to
+     indicate success.</para>
+     <para>
+      If things do fail, the most likely reasons would be database
+      permissions, <filename>pg_hba.conf</filename> settings, or typos
+      in <filename>slon_tools.conf</filename>. Look over your problem
+      and solve it.  If the slony schemas were created but it still failed,
+      you can issue the script <command>slonik_uninstall_nodes</command> to
+      clean things up.  In the worst case you may connect to each
+      database and issue <command>drop schema _replication cascade;</command>
+      to clean up.
+     </para>
+    </sect3>
+    <sect3>
+     <title>The slon daemon</title>
+ 
+     <para>As the result from the last command told us, we should now
+     be starting a slon replication daemon for each node! The slon
+     daemon is what makes the replication work. All transfers and all
+     work is done by the slon daemon. One is needed for each node. So
+     in our case we need one for the 8.2.3 installation and one for the
+     8.3.3.</para>
+ 
+     <para> To start one for 8.2.3 you would do:
+     <command>/data/pgsql-8.2.3/slony/slon_start 1 --nowatchdog</command>
+     This starts the daemon for node 1.  The --nowatchdog option is used
+     because we are running a very small replication setup, so we do not
+     need a watchdog keeping an eye on whether the slon process stays up.  </para>
+ 
+     <para>If it says it started successfully, have a look at the log file
+      in /var/log/slony/slony1/node1/; it will show that the process was
+      started OK.</para>
+ 
+     <para> We need to start one for 8.3.3 as well:
+     <command>/data/pgsql-8.3.3/slony/slon_start 2 --nowatchdog</command>
+     </para>
+ 
+     <para>If it says it started successfully, have a look at the log
+     file in /var/log/slony/slony1/node2/; it will show that the process
+     was started OK.</para>
+    </sect3>
+    <sect3>
+     <title>Adding the replication set</title>
+     <para>We now need to let the slon replication know which tables and
+      sequences it is to replicate. We need to create the set.</para>
+     <para>
+      Issue the following:
+      <command>/data/pgsql-8.2.3/slony/slonik_create_set
+       set1 > /tmp/createset.txt</command>
+     </para>
+ 
+     <para> <filename> /tmp/createset.txt</filename> may be quite lengthy, depending on how
+      many tables you have; in any case, take a quick look and it should make sense, as it
+      defines all the tables and sequences to be replicated.</para>
+ 
+     <para>
+      If you are happy with the result, send the file to slonik for
+      execution:
+      <command>cat /tmp/createset.txt | /data/pgsql-8.2.3/bin/slonik
+      </command>
+      You will see quite a lot rolling by, one entry for each table.
+     </para>
+     <para>You have now defined what is to be replicated.</para>
+    </sect3>
+    <sect3>
+     <title>Subscribing all the data</title>
+     <para>
+      The final step is to get all the data onto the new database,
+      which is done using the subscribe script:
+      <command>/data/pgsql-8.2.3/slony/slonik_subscribe_set
+       1 2 > /tmp/subscribe.txt</command>
+      The first argument is the ID of the set, the second is the node
+      that is to subscribe.
+     </para>
+     <para>
+      <filename>/tmp/subscribe.txt</filename> will look something like this:
+      <programlisting>
+  cluster name = replication;
+  node 1 admin conninfo='host=rome dbname=mydb user=postgres port=5432';
+  node 2 admin conninfo='host=rome dbname=mydb user=postgres port=5433';
+   try {
+     subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
+   }
+   on error {
+     exit 1;
+   }
+   echo 'Subscribed nodes to set 1';
+      </programlisting>
+      Send it to slonik:
+      <command>cat /tmp/subscribe.txt | /data/pgsql-8.2.3/bin/slonik
+      </command>
+     </para>
+     <para>The replication will now start. It will copy everything in
+      the tables and sequences that are in the set. Understandably, this
+      can take quite some time, depending on the size of the database
+      and the power of the machine.</para>
+     <para>
+      One way to keep track of the progress would be to do the following:
+      <command>tail -f /var/log/slony/slony1/node2/log | grep -i copy
+      </command>
+      The slony logging is pretty verbose, and watching it this way will
+      let you know how the copying is going. At some point it will say
+      "copy completed successfully in xxx seconds"; once you see this,
+      the copy is done.
+     </para>
+     <para>Once this is done, it will start catching up with all the
+      data that has come in since the replication was started. You can
+      easily view the progress of this in the database. Connect to the
+      master database; in the replication schema there is a view called
+      sl_status. It is pretty self-explanatory. The field of most
+      interest is "st_lag_num_events", which shows how many slony events
+      the node is behind; 0 is best, but what is acceptable depends on
+      how active your db is. The field next to it, st_lag_time, is an
+      estimate of how far behind the node is in time. Take this with a
+      grain of salt; the actual event count is a more accurate measure
+      of lag.</para>
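+     <para>
+      As a rough sketch (assuming the cluster name "replication" used in
+      this example, so the schema is _replication; adjust the schema
+      name to match your own cluster), the check might look like:
+     </para>
+     <programlisting>
+ -- hypothetical monitoring query; the schema name depends on your cluster name
+ select st_origin, st_received, st_lag_num_events, st_lag_time
+   from _replication.sl_status;
+     </programlisting>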
+     <para>You now have a fully replicated database.</para>
+    </sect3>
+    <sect3>
+     <title>Switching over</title>
+     <para>Our database is fully replicated and it is keeping up. There
+      are a few different options for doing the actual switchover; it
+      all depends on how much time you have to work with and the
+      trade-off between downtime and data loss. The most brute-force,
+      fast way of doing it would be:
+     </para>
+     <orderedlist>
+      <listitem>
+       <para>First modify the postgresql.conf file for the 8.3.3
+        installation to use port 5432 so that it is ready for the restart</para>
+      </listitem>
+      <listitem>
+       <para>From this point on you will have downtime. Shut down the
+        8.2.3 PostgreSQL installation</para>
+      </listitem>
+      <listitem>
+       <para>Restart the 8.3.3 PostgreSQL installation. It should
+        come up ok.</para>
+      </listitem>
+      <listitem>
+       <para>
+        Drop all the slony objects from the 8.3.3 installation: log in
+        with psql to the 8.3.3 installation and issue
+        <command>drop schema _replication cascade;</command>
+       </para>
+      </listitem>
+     </orderedlist>
+     <para>You have now upgraded to 8.3.3 with, hopefully, minimal
+     downtime. This procedure represents roughly the simplest way to do
+     this.</para>
+    </sect3>
+   </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

Index: faq.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/faq.sgml,v
retrieving revision 1.66.2.7
retrieving revision 1.66.2.8
diff -C2 -d -r1.66.2.7 -r1.66.2.8
*** faq.sgml	19 Feb 2009 16:47:05 -0000	1.66.2.7
--- faq.sgml	30 Apr 2009 16:06:10 -0000	1.66.2.8
***************
*** 15,19 ****
  </question>
  
- 
  <answer><para> <productname>Frotznik Freenix</productname> is new to
  me, so it's a bit dangerous to give really hard-and-fast definitive
--- 15,18 ----
***************
*** 233,236 ****
--- 232,454 ----
  </qandaentry>
  
+ <qandaentry>
+ <question> <para> Problem building on Fedora/x86-64 </para>
+ 
+ <para> When trying to configure &slony1; on a Fedora x86-64 system,
+ where <application>yum</application> was used to install the package
+ <filename>postgresql-libs.x86_64</filename>, the following complaint
+ comes up:
+ 
+ <screen>
+ configure: error: Your version of libpq doesn't have PQunescapeBytea
+  this means that your version of PostgreSQL is lower than 7.3
+  and thus not supported by Slony-I.
+ </screen></para>
+ 
+ <para> This happened with &postgres; 8.2.5, which is certainly rather
+ newer than 7.3. </para>
+ </question>
+ 
+ <answer> <para> <application>configure</application> is looking for
+ that symbol by compiling a little program that calls for it, and
+ checking if the compile succeeds.  On the <command>gcc</command>
+ command line it uses <command>-lpq</command> to search for the
+ library. </para>
+ 
+ <para> Unfortunately, that package is missing a symlink, from
+ <filename>/usr/lib64/libpq.so</filename> to
+ <filename>libpq.so.5.0</filename>; that is why it fails to link to
+ libpq.  The <emphasis>true</emphasis> problem is that the compiler failed to
+ find a library to link to, not that libpq lacked the function call.
+ </para>
+ 
+ <para> Eventually, this should be addressed by those that manage the
+ <filename>postgresql-libs.x86_64</filename> package. </para>
+ </answer>
+ 
+ <answer> <para> Note that this same symptom can be the indication of
+ similar classes of system configuration problems.  Bad symlinks, bad
+ permissions, bad behaviour on the part of your C compiler, all may
+ potentially lead to this same error message. </para> 
+ 
+ <para> Thus, if you see this error, you need to look in the log file
+ that is generated, <filename>config.log</filename>.  Search down to
+ near the end, and see what the <emphasis>actual</emphasis> complaint
+ was.  That will be helpful in tracking down the true root cause of the
+ problem.</para>
+ </answer>
+ 
+ </qandaentry>
+ </qandadiv>
+ 
+ <qandadiv id="faqhowto"> <title> &slony1; FAQ: How Do I? </title>
+ 
+ <qandaentry>
+ 
+ <question> <para> I need to dump a database
+ <emphasis>without</emphasis> getting &slony1; configuration
+ (<emphasis>e.g.</emphasis> - triggers, functions, and such). </para>
+ </question>
+ 
+ <answer> <para> Up to version 1.2, this is fairly nontrivial,
+ requiring careful choice of nodes, and some moderately heavy
+ <quote>procedure</quote>.   One methodology is as follows:</para>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> First, dump the schema from the node that has the
+ <quote>master</quote> role.  That is the only place, pre-2.0, where
+ you can readily dump the schema using
+ <application>pg_dump</application> and have a consistent schema.  You
+ may use the &slony1; tool <xref linkend="extractschema"> to do
+ this. </para> </listitem>
+ 
+ <listitem><para> Take the resulting schema, which will <emphasis>not</emphasis>
+ include the &slony1;-specific bits, and split it into two pieces:
+ </para>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> Firstly, the portion comprising all of the creations
+ of tables in the schema. </para> </listitem>
+ 
+ <listitem><para> Secondly, the portion consisting of creations of indices, constraints, and triggers. </para> </listitem>
+ 
+ </itemizedlist>
+ 
+ </listitem>
+ 
+ <listitem><para> Pull a data dump, using <command>pg_dump --data-only</command>, of some node of your choice.  It doesn't need to be for the <quote>master</quote> node.  This dump will include the contents of the &slony1;-specific tables; you can discard that, or ignore it.  Since the schema dump didn't contain table definitions for the &slony1; tables, they won't be loaded. </para> </listitem>
+ 
+ <listitem><para> Finally, load the three components in proper order: </para> 
+ <itemizedlist>
+ <listitem><para> Schema (tables) </para> </listitem>
+ <listitem><para> Data dump </para> </listitem>
+ <listitem><para> Remainder of the schema </para> </listitem>
+ </itemizedlist>
+ </listitem>
+ 
+ </itemizedlist>
+ 
+ </answer>
+ 
+ <answer> <para> In &slony1; 2.0, the answer becomes simpler: Just take
+ a <command>pg_dump --exclude-schema=_Cluster</command> against
+ <emphasis>any</emphasis> node.  In 2.0, the schemas are no longer
+ <quote>clobbered</quote> on subscribers, so a straight
+ <application>pg_dump</application> will do what you want.</para>
+ </answer>
+ 
+ </qandaentry>
+ 
+ <qandaentry id="cannotrenumbernodes">
+ <question> <para> I'd like to renumber the node numbers in my cluster.
+ How can I renumber nodes? </para> </question>
+ 
+ <answer> <para> The first answer is <quote>you can't do that</quote> -
+ &slony1; node numbers are quite <quote>immutable.</quote> Node numbers
+ are deeply woven into the fibres of the schema, by virtue of being
+ written into virtually every table in the system, but much more
+ importantly by virtue of being used as the basis for event
+ propagation.  The only time that it might be <quote>OK</quote> to
+ modify a node number is at some time where we know that it is not in
+ use, and we would need to do updates against each node in the cluster
+ in an organized fashion.</para>
+ 
+ <para> To do this in an automated fashion seems like a
+ <emphasis>huge</emphasis> challenge, as it changes the structure of
+ the very event propagation system that already needs to be working in
+ order for such a change to propagate.</para> </answer>
+ 
+ <answer> <para> If it is <emphasis>enormously necessary</emphasis> to
+ renumber nodes, this might be accomplished by dropping and re-adding
+ nodes to get rid of the node formerly using the node ID that needs to
+ be held by another node.</para> </answer>
+ </qandaentry>
+ 
+ </qandadiv>
+ 
+ <qandadiv id="faqimpossibilities"> <title> &slony1; FAQ: Impossible Things People Try </title>
+ 
+ <qandaentry>
+ <question><para> Can I use &slony1; to replicate changes back and forth on my database between my two offices? </para> </question>
+ 
+ <answer><para> At one level, it is <emphasis>theoretically
+ possible</emphasis> to do something like that, if you design your
+ application so that each office has its own distinct set of tables,
+ and you then have some system for consolidating the data to give them
+ some common view.  However, this requires a great deal of design work
+ to create an application that performs this consolidation. </para>
+ </answer>
+ 
+ <answer><para> In practice, the term for that is <quote>multimaster
+ replication,</quote> and &slony1; does not support <quote>multimaster
+ replication.</quote> </para> </answer>
+ 
+ </qandaentry>
+ 
+ <qandaentry>
+ <question><para> I want to replicate all of the databases for a shared-database system I am managing.  There are multiple databases, being used by my customers.  </para> </question>
+ 
+ <answer><para> For this purpose, something like &postgres; PITR (Point
+ In Time Recovery) is likely to be much more suitable.  &slony1;
+ requires a slon process (and multiple connections) for each
+ identifiable database, and if you have a &postgres; cluster hosting 50
+ or 100 databases, this will require hundreds of database connections.
+ Typically, in <quote>shared hosting</quote> situations, DML is being
+ managed by customers, who can change anything they like whenever
+ <emphasis>they</emphasis> want.  &slony1; does not work out well when
+ not used in a disciplined manner.  </para> </answer>
+ </qandaentry>
+ 
+ <qandaentry>
+ <question><para> I want to be able to make DDL changes, and have them replicated automatically. </para> </question>
+ 
+ <answer><para> &slony1; requires that <xref linkend="ddlchanges"> be planned for explicitly and carefully.  &slony1; captures changes using triggers, and &postgres; does not provide a way to use triggers to capture DDL changes.</para>
+ 
+ <note><para> There has been quite a bit of discussion, off and on, about how
+ &postgres; might capture DDL changes in a way that would make triggers
+ useful; nothing concrete has emerged after several years of
+ discussion. </para> </note> </answer>
+ </qandaentry>
+ 
+ <qandaentry>
+ <question><para> I want to split my cluster into disjoint partitions that are not aware of one another.  &slony1; keeps generating <xref linkend="listenpaths"> that link those partitions together. </para> </question>
+ 
+ <answer><para> The notion that all nodes are aware of one another is
+ deeply imbedded in the design of &slony1;.  For instance, its handling
+ of cleanup of obsolete data depends on being aware of whether any of
+ the nodes are behind, and thus might still depend on older data.
+ </para> </answer>
+ </qandaentry>
+ 
+ <qandaentry>
+ <question><para> I want to change some of my node numbers.  How do I <quote>rename</quote> a node to have a different node number? </para> </question>
+ <answer><para> You don't.  The node number is used to coordinate inter-node communications, and changing the node ID number <quote>on the fly</quote> would make it essentially impossible to keep node configuration coordinated.   </para> </answer>
+ </qandaentry>
+ 
+ <qandaentry>
+ <question> <para> My application uses OID attributes; is it possible to replicate tables like this? </para>
+ </question>
+ 
+ <answer><para> It is worth noting that oids, as a regular table
+ attribute, have been deprecated since &postgres; version 8.1, back in
+ 2005.  &slony1; has <emphasis>never</emphasis> collected oids to
+ replicate them, and, with that functionality being deprecated, the
+ developers do not intend to add this functionality. </para>
+ 
+ <para> &postgres; implemented oids as a way to link its internal
+ system tables together; to use them with application tables is
+ considered <emphasis>poor practice</emphasis>, and it is recommended
+ that you use sequences to populate your own ID column on application
+ tables.  </para> </answer>
+ 
+ <answer><para> Of course, nothing prevents you from creating a table
+ <emphasis>without</emphasis> oids, and then adding your own
+ application column called <envar>oid</envar>, preferably with type
+ information <command>SERIAL NOT NULL UNIQUE</command>; such a column
+ <emphasis>can</emphasis> be replicated, and is likely to be
+ suitable as a candidate primary key for the table. </para> </answer>
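+ <answer><para> As a minimal sketch, using a hypothetical table name
+ (nothing here is mandated by &slony1;), such a surrogate key column
+ might be declared as follows: </para>
+ 
+ <programlisting>
+ -- hypothetical example: a replicable surrogate key in place of oids
+ create table mytable (
+     id   serial not null unique,
+     data text
+ );
+ </programlisting>
+ </answer>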
+ </qandaentry>
  </qandadiv>
  
***************
*** 416,423 ****
  could also announce an admin to take a look...  </para> </answer>
  
- <answer><para> As of &postgres; 8.3, this should no longer be an
- issue, as this version has code which invalidates query plans when
- tables are altered. </para> </answer>
- 
  </qandaentry>
  
--- 634,637 ----
***************
*** 716,732 ****
  
  <qandaentry>
! <question> <para> Replication has fallen behind, and it appears that the
! queries to draw data from <xref linkend="table.sl-log-1">/<xref
! linkend="table.sl-log-2"> are taking a long time to pull just a few
  <command>SYNC</command>s. </para>
  </question>
  
! <answer> <para> Until version 1.1.1, there was only one index on <xref
! linkend="table.sl-log-1">/<xref linkend="table.sl-log-2">, and if
! there were multiple replication sets, some of the columns on the index
! would not provide meaningful selectivity.  If there is no index on
! column <function> log_xid</function>, consider adding it.  See
! <filename>slony1_base.sql</filename> for an example of how to create
! the index.
  </para>
  </answer>
--- 930,945 ----
  
  <qandaentry>
! <question> <para> Replication has fallen behind, and it appears that
! the queries to draw data from &sllog1;/&sllog2; are taking a long time
! to pull just a few
  <command>SYNC</command>s. </para>
  </question>
  
! <answer> <para> Until version 1.1.1, there was only one index on
! &sllog1;/&sllog2;, and if there were multiple replication sets, some
! of the columns on the index would not provide meaningful selectivity.
! If there is no index on column <function> log_xid</function>, consider
! adding it.  See <filename>slony1_base.sql</filename> for an example of
! how to create the index.
  </para>
  </answer>
***************
*** 1112,1117 ****
  <question><para> Replication has been slowing down, I'm seeing
  <command> FETCH 100 FROM LOG </command> queries running for a long
! time, <xref linkend="table.sl-log-1"> is growing, and performance is,
! well, generally getting steadily worse. </para>
  </question>
  
--- 1325,1330 ----
  <question><para> Replication has been slowing down, I'm seeing
  <command> FETCH 100 FROM LOG </command> queries running for a long
! time, &sllog1;/&sllog2; is growing, and performance is, well,
! generally getting steadily worse. </para>
  </question>
  
***************
*** 1136,1142 ****
  
  <listitem><para> The cleanup thread will be unable to clean out
! entries in <xref linkend="table.sl-log-1"> and <xref
! linkend="table.sl-seqlog">, with the result that these tables will
! grow, ceaselessly, until the transaction is closed. </para>
  </listitem>
  </itemizedlist>
--- 1349,1355 ----
  
  <listitem><para> The cleanup thread will be unable to clean out
! entries in &sllog1;, &sllog2;, and &slseqlog;, with the result that
! these tables will grow, ceaselessly, until the transaction is
! closed. </para>
  </listitem>
  </itemizedlist>
***************
*** 1177,1182 ****
  
  <qandaentry id="faq17">
! <question><para>After dropping a node, <xref linkend="table.sl-log-1">
! isn't getting purged out anymore.</para></question>
  
  <answer><para> This is a common scenario in versions before 1.0.5, as
--- 1390,1395 ----
  
  <qandaentry id="faq17">
! <question><para>After dropping a node, &sllog1;/&sllog2;
! aren't getting purged out anymore.</para></question>
  
  <answer><para> This is a common scenario in versions before 1.0.5, as
***************
*** 1242,1247 ****
  <listitem><para> At the start of each
  <function>cleanupEvent</function> run, which is the event in which old
! data is purged from <xref linkend="table.sl-log-1"> and <xref
! linkend="table.sl-seqlog"></para></listitem> </itemizedlist></para>
  </answer>
  </qandaentry>
--- 1455,1460 ----
  <listitem><para> At the start of each
  <function>cleanupEvent</function> run, which is the event in which old
! data is purged from &sllog1;, &sllog2;, and
! &slseqlog;</para></listitem> </itemizedlist></para>
  </answer>
  </qandaentry>
***************
*** 1253,1263 ****
  sync through.</para></question>
  
! <answer><para> You might want to take a look at the <xref
! linkend="table.sl-log-1">/<xref linkend="table.sl-log-2"> tables, and
! do a summary to see if there are any really enormous &slony1;
! transactions in there.  Up until at least 1.0.2, there needs to be a
! &lslon; connected to the origin in order for
  <command>SYNC</command> events to be generated.</para>
  
  <para>If none are being generated, then all of the updates until the
  next one is generated will collect into one rather enormous &slony1;
--- 1466,1479 ----
  sync through.</para></question>
  
! <answer><para> You might want to take a look at the tables &sllog1;
! and &sllog2; and do a summary to see if there are any really enormous
! &slony1; transactions in there.  Up until at least 1.0.2, there needs
! to be a &lslon; connected to the origin in order for
  <command>SYNC</command> events to be generated.</para>
  
+ <note><para> As of 1.0.2, the
+ function <function>generate_sync_event()</function> provides an
+ alternative way to generate <command>SYNC</command> events as a
+ backup...</para> </note>
+ 
  <para>If none are being generated, then all of the updates until the
  next one is generated will collect into one rather enormous &slony1;
***************
*** 1331,1334 ****
--- 1547,1569 ----
  </answer> </qandaentry>
  
+ <qandaentry>
+ 
+ <question><para> I'm noticing in the logs that a &lslon; is frequently
+ switching in and out of <quote>polling</quote> mode as it is
+ frequently reporting <quote>LISTEN - switch from polling mode to use
+ LISTEN</quote> and <quote>UNLISTEN - switch into polling
+ mode</quote>. </para> </question>
+ 
+ <answer><para> The thresholds for switching between these modes are
+ controlled by the configuration parameters <xref
+ linkend="slon-config-sync-interval"> and <xref
+ linkend="slon-config-sync-interval-timeout">; if the timeout value
+ (which defaults to 10000, implying 10s) is kept low, that makes it
+ easy for the &lslon; to decide to return to <quote>listening</quote>
+ mode.  You may want to increase the value of the timeout
+ parameter. </para>
+ </answer>
+ </qandaentry>
+ 
  </qandadiv>
  <qandadiv id="faqbugs"> <title> &slony1; FAQ: &slony1; Bugs in Elder Versions </title>
***************
*** 1461,1467 ****
  nodes.  I am discovering that confirmations for set 1 never get to the
  nodes subscribing to set 2, and that confirmations for set 2 never get
! to nodes subscribing to set 1.  As a result, <xref
! linkend="table.sl-log-1"> grows and grows and is never purged.  This
! was reported as &slony1; <ulink
  url="http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1485">
  bug 1485 </ulink>.
--- 1696,1702 ----
  nodes.  I am discovering that confirmations for set 1 never get to the
  nodes subscribing to set 2, and that confirmations for set 2 never get
! to nodes subscribing to set 1.  As a result, &sllog1;/&sllog2; grow
! and grow, and are never purged.  This was reported as
! &slony1; <ulink
  url="http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1485">
  bug 1485 </ulink>.
***************
*** 1515,1520 ****
  subscriber to a particular provider are for
  <quote>sequence-only</quote> sets.  If a node gets into that state,
! replication will fail, as the query that looks for data from <xref
! linkend="table.sl-log-1"> has no tables to find, and the query will be
  malformed, and fail.  If a replication set <emphasis>with</emphasis>
  tables is added back to the mix, everything will work out fine; it
--- 1750,1755 ----
  subscriber to a particular provider are for
  <quote>sequence-only</quote> sets.  If a node gets into that state,
! replication will fail, as the query that looks for data from
! &sllog1;/&sllog2; has no tables to find, and the query will be
  malformed, and fail.  If a replication set <emphasis>with</emphasis>
  tables is added back to the mix, everything will work out fine; it
***************
*** 1611,1614 ****
--- 1846,1887 ----
  linkend="stmtsetdropsequence">.</para></answer></qandaentry>
  
+ <qandaentry>
+ <question><para> I set up my cluster using pgAdminIII, with cluster
+ name <quote>MY-CLUSTER</quote>.  Time has passed, and I tried using
+ Slonik to make a configuration change, and this is failing with the
+ following error message:</para>
+ 
+ <programlisting>
+ ERROR: syntax error at or near -
+ </programlisting>
+ </question>
+ 
+ <answer><para> The problem here is that &slony1; expects cluster names
+ to be valid <ulink url=
+ "http://www.postgresql.org/docs/8.3/static/sql-syntax-lexical.html">
+ SQL Identifiers</ulink>, and &lslonik; enforces this.  Unfortunately,
+ <application>pgAdminIII</application> did not do so, and allowed using
+ a cluster name that now causes <emphasis>a problem.</emphasis> </para> </answer>
+ 
+ <answer> <para> If you have gotten into this spot, it's a problem that
+ we may not be able to help you resolve very much.  </para>
+ 
+ <para> It's <emphasis>conceivably possible</emphasis> that running the
+ SQL command <command>alter namespace "_My-Bad-Clustername" rename to
+ "_BetterClusterName";</command> against each database may work.  That
+ shouldn't particularly <emphasis>damage</emphasis> things!</para>
+ 
+ <para> On the other hand, when the problem has been experienced, users
+ have found they needed to drop replication and rebuild the
+ cluster.</para> </answer>
+ 
+ <answer><para> As of version 2.0.2, a function that checks the
+ validity of the cluster name runs as part of loading the functions
+ into the database.  If you try to use an invalid cluster name, loading
+ the functions will fail with a suitable error message, which should
+ prevent things from going wrong even if you're using tools other than
+ &lslonik; to manage setting up the cluster. </para></answer>
+ </qandaentry>
+ 
  </qandadiv>
  
***************
*** 1818,1837 ****
  
  <para>By the time we notice that there is a problem, the seemingly
! missed delete transaction has been cleaned out of <xref
! linkend="table.sl-log-1">, so there appears to be no recovery
! possible.  What has seemed necessary, at this point, is to drop the
! replication set (or even the node), and restart replication from
! scratch on that node.</para>
  
! <para>In &slony1; 1.0.5, the handling of purges of <xref
! linkend="table.sl-log-1"> became more conservative, refusing to purge
! entries that haven't been successfully synced for at least 10 minutes
! on all nodes.  It was not certain that that would prevent the
! <quote>glitch</quote> from taking place, but it seemed plausible that
! it might leave enough <xref linkend="table.sl-log-1"> data to be able
! to do something about recovering from the condition or at least
! diagnosing it more exactly.  And perhaps the problem was that <xref
! linkend="table.sl-log-1"> was being purged too aggressively, and this
! would resolve the issue completely.</para>
  
  <para> It is a shame to have to reconstruct a large replication node
--- 2091,2108 ----
  
  <para>By the time we notice that there is a problem, the seemingly
! missed delete transaction has been cleaned out of &sllog1;, so there
! appears to be no recovery possible.  What has seemed necessary, at
! this point, is to drop the replication set (or even the node), and
! restart replication from scratch on that node.</para>
  
! <para>In &slony1; 1.0.5, the handling of purges of &sllog1; became
! more conservative, refusing to purge entries that haven't been
! successfully synced for at least 10 minutes on all nodes.  It was not
! certain that that would prevent the <quote>glitch</quote> from taking
! place, but it seemed plausible that it might leave enough &sllog1;
! data to be able to do something about recovering from the condition or
! at least diagnosing it more exactly.  And perhaps the problem was that
! &sllog1; was being purged too aggressively, and this would resolve the
! issue completely.</para>
  
  <para> It is a shame to have to reconstruct a large replication node
***************
*** 1844,1850 ****
  <para> In one case we found two lines in the SQL error message in the
  log file that contained <emphasis> identical </emphasis> insertions
! into <xref linkend="table.sl-log-1">.  This <emphasis> ought
! </emphasis> to be impossible as is a primary key on <xref
! linkend="table.sl-log-1">.  The latest (somewhat) punctured theory
  that comes from <emphasis>that</emphasis> was that perhaps this PK
  index has been corrupted (representing a &postgres; bug), and that
--- 2115,2120 ----
  <para> In one case we found two lines in the SQL error message in the
  log file that contained <emphasis> identical </emphasis> insertions
! into &sllog1;.  This <emphasis> ought </emphasis> to be impossible, as
! there is a primary key on &sllog1;.  The latest (somewhat) punctured theory
  that comes from <emphasis>that</emphasis> was that perhaps this PK
  index has been corrupted (representing a &postgres; bug), and that
***************
*** 1951,1956 ****
  
  <para> That trigger initiates the action of logging all updates to the
! table to &slony1; <xref linkend="table.sl-log-1">
! tables.</para></listitem>
  
  <listitem><para> On a subscriber node, this involves disabling
--- 2221,2225 ----
  
  <para> That trigger initiates the action of logging all updates to the
! table to &slony1; &sllog1;/&sllog2; tables.</para></listitem>
  
  <listitem><para> On a subscriber node, this involves disabling
***************
*** 2068,2072 ****
  
  <para>The solution is to rebuild the trigger on the affected table and
! fix the entries in <xref linkend="table.sl-log-1"> by hand.</para>
  
  <itemizedlist>
--- 2337,2341 ----
  
  <para>The solution is to rebuild the trigger on the affected table and
! fix the entries in &sllog1;/&sllog2; by hand.</para>
  
  <itemizedlist>
***************
*** 2087,2096 ****
  </screen>
  
! <para>You then need to find the rows in <xref
! linkend="table.sl-log-1"> that have bad 
! entries and fix them.  You may
! want to take down the slon daemons for all nodes except the master;
! that way, if you make a mistake, it won't immediately propagate
! through to the subscribers.</para>
  
  <para> Here is an example:</para>
--- 2356,2363 ----
  </screen>
  
! <para>You then need to find the rows in &sllog1;/&sllog2; that have
! bad entries and fix them.  You may want to take down the slon daemons
! for all nodes except the master; that way, if you make a mistake, it
! won't immediately propagate through to the subscribers.</para>
  
  <para> Here is an example:</para>
***************
*** 2215,2223 ****
  </question> 
  
  <para> &slony1; uses sequences to provide primary key values for log
  entries, and therefore this kind of behaviour may (perhaps
  regrettably!) be expected.  </para>
  
! <answer> <para> Calling <function>lastval()</function>, to
  <quote>anonymously</quote> get <quote>the most recently updated
  sequence value</quote>, rather than using
--- 2482,2491 ----
  </question> 
  
+ <answer>
  <para> &slony1; uses sequences to provide primary key values for log
  entries, and therefore this kind of behaviour may (perhaps
  regrettably!) be expected.  </para>
  
! <para> Calling <function>lastval()</function>, to
  <quote>anonymously</quote> get <quote>the most recently updated
  sequence value</quote>, rather than using

Index: dropthings.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/dropthings.sgml,v
retrieving revision 1.16.2.1
retrieving revision 1.16.2.2
diff -C2 -d -r1.16.2.1 -r1.16.2.2
*** dropthings.sgml	5 Jan 2007 19:11:44 -0000	1.16.2.1
--- dropthings.sgml	30 Apr 2009 16:06:10 -0000	1.16.2.2
***************
*** 159,162 ****
--- 159,172 ----
  nodes.</para>
  </sect2>
+ 
+ <sect2> <title> Verifying Cluster Health </title>
+ 
+ <para> After performing any of these procedures, it is an excellent
+ idea to run the <filename>tools</filename> script &lteststate;, which
+ rummages through the state of the entire cluster, pointing out any
+ anomalies that it finds, including a variety of communications
+ problems.</para>
+ 
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

Index: cluster.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/cluster.sgml,v
retrieving revision 1.13
retrieving revision 1.13.2.1
diff -C2 -d -r1.13 -r1.13.2.1
*** cluster.sgml	2 Aug 2006 18:34:57 -0000	1.13
--- cluster.sgml	30 Apr 2009 16:06:10 -0000	1.13.2.1
***************
*** 13,20 ****
  tables that store &slony1; configuration and replication state
  information.  See <xref linkend="schema"> for more documentation about
! what is stored in that schema.  More specifically, the tables <xref
! linkend="table.sl-log-1"> and <xref linkend="table.sl-log-2"> log
! changes collected on the origin node as they are replicated to
! subscribers.  </para>
  
  <para>Each database instance in which replication is to take place is
--- 13,19 ----
  tables that store &slony1; configuration and replication state
  information.  See <xref linkend="schema"> for more documentation about
! what is stored in that schema.  More specifically, the tables &sllog1;
! and &sllog2; log changes collected on the origin node as they are
! replicated to subscribers.  </para>
  
  <para>Each database instance in which replication is to take place is
***************
*** 24,27 ****
--- 23,31 ----
  node #1, and for the subscriber to be node #2.</para>
  
+ <para> Note that, as recorded in the <xref linkend="faq"> under <link
+ linkend="cannotrenumbernodes"> How can I renumber nodes?</link>, the
+ node number is immutable, so it is not possible to change a node's
+ node number after it has been set up.</para>
+ 
  <para>Some planning should be done, in more complex cases, to ensure
  that the numbering system is kept sane, lest the administrators be

Index: defineset.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/defineset.sgml,v
retrieving revision 1.25.2.2
retrieving revision 1.25.2.3
diff -C2 -d -r1.25.2.2 -r1.25.2.3
*** defineset.sgml	11 Jun 2007 16:01:33 -0000	1.25.2.2
--- defineset.sgml	30 Apr 2009 16:06:10 -0000	1.25.2.3
***************
*** 71,80 ****
  
  <listitem><para> If the table hasn't even got a candidate primary key,
! you can ask &slony1; to provide one.  This is done by first using
! <xref linkend="stmttableaddkey"> to add a column populated using a
! &slony1; sequence, and then having the <xref
! linkend="stmtsetaddtable"> include the directive
! <option>key=serial</option>, to indicate that &slony1;'s own column
! should be used.</para></listitem>
  
  </itemizedlist>
--- 71,82 ----
  
  <listitem><para> If the table hasn't even got a candidate primary key,
! you might ask &slony1; to provide one using 
! <xref linkend="stmttableaddkey">.</para>
! 
! <warning><para> <xref linkend="stmttableaddkey"> was always considered
! a <quote>kludge</quote>, at best, and as of version 2.0, it is
! considered such a misfeature that it is being removed.  </para>
! </warning>
! </listitem>
  
  </itemizedlist>
***************
*** 83,92 ****
  <quote>true</quote> primary key or a mere <quote>candidate primary
  key;</quote> it is, however, strongly recommended that you have one of
! those instead of having &slony1; populate the PK column for you. If you
! don't have a suitable primary key, that means that the table hasn't got
! any mechanism, from your application's standpoint, for keeping values
! unique. &slony1; may, therefore, introduce a new failure mode for your
! application, and this also implies that you had a way to enter confusing
! data into the database.</para>
  </sect2>
  
--- 85,94 ----
  <quote>true</quote> primary key or a mere <quote>candidate primary
  key;</quote> it is, however, strongly recommended that you have one of
! those instead of having &slony1; populate the PK column for you. If
! you don't have a suitable primary key, that means that the table
! hasn't got any mechanism, from your application's standpoint, for
! keeping values unique.  &slony1; may, therefore, introduce a new
! failure mode for your application, and this also implies that you had
! a way to enter confusing data into the database.</para>
  </sect2>
  
***************
*** 119,122 ****
--- 121,134 ----
  the degree of the <quote>injury</quote> to performance.</para>
  
+ <para> Another issue comes up particularly frequently when replicating
+ across a WAN; sometimes the network connection is a little bit
+ unstable, such that there is a risk that a connection held open for
+ several hours will lead to <command>CONNECTION TIMEOUT.</command> If
+ that happens when 95% done copying a 50-table replication set
+ consisting of 250GB of data, that could ruin your whole day.  If the
+ tables were, instead, associated with separate replication sets, that
+ failure at the 95% point might only interrupt, temporarily, the
+ copying of <emphasis>one</emphasis> of those tables.  </para>
+ 
  <para> These <quote>negative effects</quote> tend to emerge when the
  database being subscribed to is many gigabytes in size and where it
***************
*** 161,166 ****
  <para> Each time a SYNC is processed, values are recorded for
  <emphasis>all</emphasis> of the sequences in the set.  If there are a
! lot of sequences, this can cause <xref linkend="table.sl-seqlog"> to
! grow rather large.</para>
  
  <para> This points to an important difference between tables and
--- 173,178 ----
  <para> Each time a SYNC is processed, values are recorded for
  <emphasis>all</emphasis> of the sequences in the set.  If there are a
! lot of sequences, this can cause &slseqlog; to grow rather
! large.</para>
  
  <para> This points to an important difference between tables and
***************
*** 177,192 ****
  
  <para> If it is not updated, the trigger on the table on the origin
! never fires, and no entries are added to <xref
!        linkend="table.sl-log-1">.  The table never appears in any of the
  further replication queries (<emphasis>e.g.</emphasis> in the
  <command>FETCH 100 FROM LOG</command> queries used to find
  replicatable data) as they only look for tables for which there are
! entries in <xref linkend="table.sl-log-1">.</para></listitem>
  
  <listitem><para> In contrast, a fixed amount of work is introduced to
  each SYNC by each sequence that is replicated.</para>
  
! <para> Replicate 300 sequence and 300 rows need to be added to <xref
!        linkend="table.sl-seqlog"> on a regular basis.</para>
  
  <para> It is more than likely that if the value of a particular
--- 189,205 ----
  
  <para> If it is not updated, the trigger on the table on the origin
! never fires, and no entries are added to &sllog1;/&sllog2;.  The table never appears in any of the
  further replication queries (<emphasis>e.g.</emphasis> in the
  <command>FETCH 100 FROM LOG</command> queries used to find
  replicatable data) as they only look for tables for which there are
! entries in &sllog1;/&sllog2;.</para></listitem>
  
  <listitem><para> In contrast, a fixed amount of work is introduced to
  each SYNC by each sequence that is replicated.</para>
  
! <para> Replicate 300 sequences, and 300 rows need to be added to
! &slseqlog; on a regular basis, at least until the 2.0 branch, where
! updates are only applied when the value of a given sequence is seen
! to change.</para>
  
  <para> It is more than likely that if the value of a particular

Index: prerequisites.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/prerequisites.sgml,v
retrieving revision 1.26.2.2
retrieving revision 1.26.2.3
diff -C2 -d -r1.26.2.2 -r1.26.2.3
*** prerequisites.sgml	11 Jun 2007 16:01:33 -0000	1.26.2.2
--- prerequisites.sgml	30 Apr 2009 16:06:10 -0000	1.26.2.3
***************
*** 8,17 ****
  <indexterm><primary> platforms where &slony1; runs </primary> </indexterm>
  
! <para>The platforms that have received specific testing at the time of
! this release are FreeBSD-4X-i368, FreeBSD-5X-i386, FreeBSD-5X-alpha,
! OS-X-10.3, Linux-2.4X-i386 Linux-2.6X-i386 Linux-2.6X-amd64,
  <trademark>Solaris</trademark>-2.8-SPARC,
! <trademark>Solaris</trademark>-2.9-SPARC, AIX 5.1, OpenBSD-3.5-sparc64
! and &windows; 2000, XP and 2003 (32 bit).</para>
  
  <sect2>
--- 8,19 ----
  <indexterm><primary> platforms where &slony1; runs </primary> </indexterm>
  
! <para>The platforms that have received specific testing are
! FreeBSD-4X-i386, FreeBSD-5X-i386, FreeBSD-5X-alpha, OS-X-10.3,
! Linux-2.4X-i386, Linux-2.6X-i386, Linux-2.6X-amd64,
  <trademark>Solaris</trademark>-2.8-SPARC,
! <trademark>Solaris</trademark>-2.9-SPARC, AIX 5.1 and 5.3,
! OpenBSD-3.5-sparc64 and &windows; 2000, XP and 2003 (32 bit).  There
! is enough diversity amongst these platforms that nothing ought to
! prevent running &slony1; on other similar platforms. </para>
  
  <sect2>
***************
*** 67,70 ****
--- 69,76 ----
  linkend="pg81funs"> on &postgres; 8.1.[0-3] </link>. </para>
  
+ <para> There is some variation in which versions of &postgres; are
+ compatible with which versions of &slony1;.  See <xref
+ linkend="installation"> for more details.</para>
+ 
  </listitem>
  
***************
*** 103,107 ****
  installation.</para>
  
! <note><para>In &slony1; version 1.1, it is possible to compile
  &slony1; separately from &postgres;, making it practical for the
  makers of distributions of <productname>Linux</productname> and
--- 109,113 ----
  installation.</para>
  
! <note><para>From &slony1; version 1.1, it is possible to compile
  &slony1; separately from &postgres;, making it practical for the
  makers of distributions of <productname>Linux</productname> and

Index: intro.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/intro.sgml,v
retrieving revision 1.25.2.2
retrieving revision 1.25.2.3
diff -C2 -d -r1.25.2.2 -r1.25.2.3
*** intro.sgml	11 Jun 2007 16:01:33 -0000	1.25.2.2
--- intro.sgml	30 Apr 2009 16:06:10 -0000	1.25.2.3
***************
*** 297,307 ****
  <listitem><para> Each SYNC applied needs to be reported back to all of
  the other nodes participating in the set so that the nodes all know
! that it is safe to purge <xref linkend="table.sl-log-1"> and <xref
! linkend="table.sl-log-2"> data, as any <quote>forwarding</quote> node
! could potentially take over as <quote>master</quote> at any time.  One
! might expect SYNC messages to need to travel through n/2 nodes to get
! propagated to their destinations; this means that each SYNC is
! expected to get transmitted n(n/2) times.  Again, this points to a
! quadratic growth in communications costs as the number of nodes
  increases.</para></listitem>
  
--- 297,307 ----
  <listitem><para> Each SYNC applied needs to be reported back to all of
  the other nodes participating in the set so that the nodes all know
! that it is safe to purge &sllog1; and &sllog2; data, as
! any <quote>forwarding</quote> node could potentially take over
! as <quote>master</quote> at any time.  One might expect SYNC messages
! to need to travel through n/2 nodes to get propagated to their
! destinations; this means that each SYNC is expected to get transmitted
! n(n/2) times.  Again, this points to a quadratic growth in
! communications costs as the number of nodes
  increases.</para></listitem>
  

Index: slonyupgrade.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonyupgrade.sgml,v
retrieving revision 1.3.2.2
retrieving revision 1.3.2.3
diff -C2 -d -r1.3.2.2 -r1.3.2.3
*** slonyupgrade.sgml	16 Mar 2007 19:01:26 -0000	1.3.2.2
--- slonyupgrade.sgml	30 Apr 2009 16:06:10 -0000	1.3.2.3
***************
*** 78,81 ****
--- 78,257 ----
  
  </variablelist>
+ 
+ <sect2> <title> TABLE ADD KEY issue in &slony1; 2.0 </title> 
+ 
+ <para> Usually, upgrades between &slony1; versions have required no
+ special attention to the condition of the existing replica.  That is,
+ you merely need to stop &lslon;s, put new binaries in
+ place, run <xref linkend="stmtupdatefunctions"> against each node, and
+ restart &lslon;s.  Schema changes have been internal to the cluster
+ schema, and <xref linkend="stmtupdatefunctions"> has been capable to
+ make all of the needed alterations.  With version 2, this changes, if
+ there are tables that used <xref linkend="stmttableaddkey">.  Version
+ 2 does not support the <quote>extra</quote> column, and
+ <quote>fixing</quote> the schema to have a proper primary key is not
+ within the scope of what <xref linkend="stmtupdatefunctions"> can
+ perform.  </para>
+ 
+ <para> When upgrading from versions 1.0.x, 1.1.x, or 1.2.x to version
+ 2, it will be necessary to have already eliminated any such
+ &slony1;-managed primary keys. </para>
+ 
+ <para> One may identify the tables affected via the following SQL
+ query: <command> select n.nspname, c.relname from pg_class c,
+ pg_namespace n where c.oid in (select attrelid from pg_attribute where
+ attname like '_Slony-I_%rowID' and not attisdropped) and reltype &lt;&gt; 0
+ and n.oid = c.relnamespace order by n.nspname, c.relname; </command>
+ </para>
+ 
+ <para> The simplest approach that may be taken to rectify the
+ <quote>broken</quote> state of such tables is as follows: </para>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> Drop the table from replication using the &lslonik;
+ command <xref linkend="stmtsetdroptable">. </para>
+ 
+ <para> This does <emphasis>not</emphasis> drop out the
+ &slony1;-generated column. </para>
+ </listitem>
+ 
+ <listitem><para> On each node, run an SQL script to alter the table,
+ dropping the extra column.</para> <para> <command> alter table
+ whatever drop column "_Slony-I_cluster-rowID";</command> </para>
+ 
+ <para> This needs to be run individually against each node.  Depending
+ on your preferences, you might wish to use <xref
+ linkend="stmtddlscript"> to do this. </para>
+ 
+ <para> If the table is a heavily updated one, it is worth observing
+ that this alteration will require acquiring an exclusive lock on the
+ table.  It will not hold this lock for terribly long; dropping the
+ column should be quite a rapid operation as all it does internally is
+ to mark the column as being dropped; it <emphasis>does not</emphasis>
+ require rewriting the entire contents of the table.  Tuples that have
+ values in that column will continue to have that value; new tuples
+ will leave it NULL, and queries will ignore the column.  Space for
+ those columns will get reclaimed as tuples get updated.  </para>
+ 
+ <para> Note that at this point in the process, this table is not being
+ replicated.  If a failure takes place, replication is not, at this
+ point, providing protection on this table.  This is unfortunate but
+ unavoidable. </para>
+ </listitem>
+ 
+ <listitem><para> Make sure the table has a legitimate candidate for
+ primary key, some set of NOT NULL, UNIQUE columns.  </para>
+ 
+ <para> The possible variations to this are the reason that the
+ developers have made no effort to try to assist automation of
+ this.</para></listitem>
+ </itemizedlist>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> If the table is a small one, it may be perfectly
+ reasonable to do alterations (note that they must be applied to
+ <emphasis>every node</emphasis>!) to add a new column, assign it via a
+ new sequence, and then declare it to be a primary key; a sketch of
+ such a change follows this list.  </para>
+ 
+ <para> If there are only a few tuples, this should take a fraction of
+ a second, and, with luck, be unnoticeable to a running
+ application. </para>
+ 
+ <para> Even if the table is fairly large, if it is not frequently
+ accessed by the application, the locking of the table that takes place
+ when you run <command>ALTER TABLE</command> may not cause much
+ inconvenience. </para></listitem>
+ 
+ <listitem> <para> If the table is a large one, and is vital to and
+ heavily accessed by the application, then it may be necessary to take
+ an application outage in order to accomplish the alterations, leaving
+ you necessarily somewhat vulnerable until the process is
+ complete. </para>
+ 
+ <para> If it is troublesome to take outages, then the upgrade to
+ &slony1; version 2 may take some planning... </para>
+ </listitem>
+ 
+ </itemizedlist>
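+ <para> A minimal sketch of that first approach, using hypothetical
+ table and sequence names; the statements must be applied consistently
+ on every node, for instance via <xref linkend="stmtddlscript">: </para>
+ 
+ <programlisting>
+ -- "mytable" and "mytable_id_seq" are placeholder names
+ create sequence mytable_id_seq;
+ alter table mytable add column id bigint;
+ update mytable set id = nextval('mytable_id_seq');
+ alter table mytable alter column id set default nextval('mytable_id_seq');
+ alter table mytable alter column id set not null;
+ alter table mytable add primary key (id);
+ </programlisting>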
+ 
+ <itemizedlist>
+ 
+ <listitem><para> Create a new replication set (<xref
+ linkend="stmtcreateset">) and re-add the table to that set (<xref
+ linkend="stmtsetaddtable">).  </para>
+ 
+ <para> If there are multiple tables, they may be handled via a single
+ replication set.</para>
+ </listitem>
+ 
+ <listitem><para> Subscribe the set (<xref linkend="stmtsubscribeset">)
+ on all the nodes desired. </para> </listitem>
+ 
+ <listitem><para> Once subscriptions are complete, merge the set(s) in,
+ if desired (<xref linkend="stmtmergeset">). </para> </listitem>
+ 
+ </itemizedlist>
+ 
+ <para> This approach should be fine for tables that are relatively
+ small, or infrequently used.  If, on the other hand, the table is
+ large and heavily used, another approach may prove necessary, namely
+ to create your own sequence, and <quote>promote</quote> the formerly
+ &slony1;-generated column into a <quote>real</quote> column in your
+ database schema.  An outline of the steps is as follows: </para>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> Add a sequence that assigns values to the
+ column. </para>
+ 
+ <para> Setup steps will include SQL <command>CREATE
+ SEQUENCE</command>, SQL <command>SELECT SETVAL()</command> (to set the
+ value of the sequence high enough to reflect values used in the
+ table), Slonik <xref linkend="stmtcreateset"> (to create a set to
+ assign the sequence to), Slonik <xref linkend="stmtsetaddsequence">
+ (to assign the sequence to the set), Slonik <xref
+ linkend="stmtsubscribeset"> (to set up subscriptions to the new
+ set)</para>
+ </listitem>
+ 
+ <listitem><para> Attach the sequence to the column on the
+ table. </para>
+ 
+ <para> This involves <command>ALTER TABLE ALTER COLUMN</command>,
+ which must be submitted via the Slonik command <xref
+ linkend="stmtddlscript">. </para>
+ </listitem>
+ 
+ <listitem><para> Rename the column
+ <envar>_Slony-I_@CLUSTERNAME@_rowID</envar> so that &slony1; won't
+ consider it to be under its control.</para>
+ 
+ <para> This involves <command>ALTER TABLE ... RENAME COLUMN</command>,
+ which must be submitted via the Slonik command <xref
+ linkend="stmtddlscript">. </para>
+ 
+ <para> Note that these two alterations might be accomplished via the
+ same <xref linkend="stmtddlscript"> request. </para>
+ </listitem>
+ 
+ </itemizedlist>
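+ <para> A rough sketch of those steps, using hypothetical table,
+ sequence, and cluster names (the actual name of the &slony1;-generated
+ column depends on your cluster name): </para>
+ 
+ <programlisting>
+ -- placeholders: table "mytable", cluster name "mycluster"
+ create sequence mytable_rowid_seq;
+ select setval('mytable_rowid_seq',
+        coalesce((select max("_Slony-I_mycluster_rowID") from mytable), 1));
+ -- the following alterations would be submitted via EXECUTE SCRIPT
+ alter table mytable
+   alter column "_Slony-I_mycluster_rowID"
+   set default nextval('mytable_rowid_seq');
+ alter table mytable
+   rename column "_Slony-I_mycluster_rowID" to row_id;
+ </programlisting>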
+ 
+ </sect2>
+ 
+ <sect2> <title> New Trigger Handling in &slony1; Version 2 </title>
+ 
+ <para> One of the major changes to &slony1; is that enabling/disabling
+ of triggers and rules now takes place as plain SQL, supported by
+ &postgres; 8.3+, rather than via <quote>hacking</quote> on the system
+ catalog. </para>
+ 
+ <para> As a result, &slony1; users should be aware of the &postgres;
+ syntax for <command>ALTER TABLE</command>, as that is how they can
+ accomplish what was formerly accomplished via <xref
+ linkend="stmtstoretrigger"> and <xref linkend="stmtdroptrigger">. </para>
+ 
+ </sect2>
  </sect1>
  <!-- Keep this comment at the end of the file

Index: installation.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/installation.sgml,v
retrieving revision 1.28.2.5
retrieving revision 1.28.2.6
diff -C2 -d -r1.28.2.5 -r1.28.2.6
*** installation.sgml	1 Mar 2008 02:53:47 -0000	1.28.2.5
--- installation.sgml	30 Apr 2009 16:06:10 -0000	1.28.2.6
***************
*** 44,48 ****
  <para>
  <screen>
! PGMAIN=/usr/local/pgsql746-freebsd-2005-04-01 \
  ./configure \
      --with-pgconfigdir=$PGMAIN/bin
--- 44,48 ----
  <para>
  <screen>
! PGMAIN=/usr/local/pgsql839-freebsd-2008-09-03 \
  ./configure \
      --with-pgconfigdir=$PGMAIN/bin
***************
*** 69,74 ****
  <application>configure</application> needed to know where your
  &postgres; source tree is, which was done with the
! <option>--with-pgsourcetree=</option> option.  As of version 1.1, this
! is no longer necessary, as &slony1; has included within its own code
  base certain parts needed for platform portability.  It now only needs
  to make reference to parts of &postgres; that are actually part of the
--- 69,74 ----
  <application>configure</application> needed to know where your
  &postgres; source tree is, which was done with the
! <option>--with-pgsourcetree=</option> option.  Since version 1.1, this
! has not been necessary, as &slony1; has included within its own code
  base certain parts needed for platform portability.  It now only needs
  to make reference to parts of &postgres; that are actually part of the
***************
*** 93,99 ****
  to provide correct client libraries. </para>
  
! <para> &postgres; version 8 installs the server header
  <command>#include</command> files by default; with version 7.4 and
! earlier, you need to make sure that the build installation included
  doing <command>make install-all-headers</command>, otherwise the
  server headers will not be installed, and &slony1; will be unable to
--- 93,99 ----
  to provide correct client libraries. </para>
  
! <para> &postgres; versions from 8.0 onwards install the server header
  <command>#include</command> files by default; with version 7.4 and
! earlier, you needed to make sure that the build installation included
  doing <command>make install-all-headers</command>, otherwise the
  server headers will not be installed, and &slony1; will be unable to
***************
*** 124,128 ****
  try to detect some quirks of your system.  &slony1; is known to need a
  modified version of <application>libpq</application> on specific
! platforms such as Solaris2.X on SPARC.  The patch for libpq version
  7.4.2 can be found at <ulink id="threadpatch" url=
  "http://developer.postgresql.org/~wieck/slony1/download/threadsafe-libpq-742.diff.gz">
--- 124,128 ----
  try to detect some quirks of your system.  &slony1; is known to need a
  modified version of <application>libpq</application> on specific
! platforms such as Solaris2.X on SPARC.  A patch for libpq version
  7.4.2 can be found at <ulink id="threadpatch" url=
  "http://developer.postgresql.org/~wieck/slony1/download/threadsafe-libpq-742.diff.gz">
***************
*** 175,179 ****
  </para>
  
! <para>The main list of files installed within the PostgreSQL instance is:</para>
  <itemizedlist>
  <listitem><para><filename> $bindir/slon</filename></para></listitem>
--- 175,180 ----
  </para>
  
! <para>The main list of files installed within the &postgres; instance
! is, for versions of &slony1; up to 1.2.x:</para>
  <itemizedlist>
  <listitem><para><filename> $bindir/slon</filename></para></listitem>
***************
*** 191,204 ****
  </itemizedlist>
  
! <para> (Note that as things change, the list of version-specific files
! may grow...) </para>
  
  <para>The <filename>.sql</filename> files are not fully substituted
! yet.  And yes, both the 7.3, 7.4 and the 8.0 files get installed on every
! system, irrespective of its version.  The <xref linkend="slonik">
! admin utility does namespace/cluster substitutions within these files,
! and loads the files when creating replication nodes.  At that point in
! time, the database being initialized may be remote and may run a
! different version of &postgres; than that of the local host.</para>
  
  <para> At the very least, the two shared objects installed in the
--- 192,207 ----
  </itemizedlist>
  
! <para> (Note that as things have changed, the list of version-specific
! files has tended to grow...) </para>
  
  <para>The <filename>.sql</filename> files are not fully substituted
! yet.  And yes, versions for all supported versions of &postgres;
! (<emphasis>e.g.</emphasis> 7.3, 7.4, 8.0) get installed on
! every system, irrespective of its version.  The <xref
! linkend="slonik"> admin utility does namespace/cluster substitutions
! within these files, and loads the files when creating replication
! nodes.  At that point in time, the database being initialized may be
! remote and may run a different version of &postgres; than that of the
! local host.</para>
  
  <para> At the very least, the two shared objects installed in the
***************
*** 207,210 ****
--- 210,232 ----
  may be able to be loaded remotely from other hosts.) </para>
  
+ <para> In &slony1; version 2.0, this changes:</para>
+ <itemizedlist>
+ <listitem><para><filename> $bindir/slon</filename></para></listitem>
+ <listitem><para><filename> $bindir/slonik</filename></para></listitem>
+ <listitem><para><filename> $libdir/slony1_funcs$(DLSUFFIX)</filename></para></listitem>
+ <listitem><para><filename> $datadir/slony1_base.sql</filename></para></listitem>
+ <listitem><para><filename> $datadir/slony1_funcs.sql</filename></para></listitem>
+ </itemizedlist>
+ 
+ <note> <para> Note the loss of <filename>xxid.so</filename> - the txid
+ data type introduced in &postgres; 8.3 makes it
+ obsolete. </para></note>
+ 
+ <note> <para> &slony1; 2.0 gives up compatibility with versions of
+ &postgres; prior to 8.3, and hence <quote>resets</quote> the
+ version-specific base function handling.  There may be function files
+ for version 8.3, 8.4, and such, as replication-relevant divergences of
+ &postgres; functionality take place.  </para></note>
+ 
  </sect2>
  
***************
*** 219,224 ****
  <para> This is only built if you specify <command>--with-docs</command></para>
  
! <para> Note that you may have difficulty building the documentation on Red
! Hat-based systems due to NAMELEN being set way too low.  Havoc
  Pennington opened a bug on this back in mid-2001, back in the days of
  Red Hat 7.1; Red Hat Software has assigned the bug, but there does not
--- 241,246 ----
  <para> This is only built if you specify <command>--with-docs</command></para>
  
! <para> Note that you may have difficulty building the documentation on
! Red Hat-based systems due to NAMELEN being set way too low.  Havoc
  Pennington opened a bug on this back in mid-2001, back in the days of
  Red Hat 7.1; Red Hat Software has assigned the bug, but there does not
***************
*** 226,231 ****
  indicates that there is intent to address the issue by bumping up the
  value of NAMELEN in some future release of Red Hat Enterprise Linux,
! but that won't likely help you in 2005. Current Fedora releases have already
! addressed this issue. </para>
  
  <para>
--- 248,254 ----
  indicates that there is intent to address the issue by bumping up the
  value of NAMELEN in some future release of Red Hat Enterprise Linux,
! but that may not help you if you are using an older release where this
! will never be rectified.  Current Fedora releases have already
! addressed this issue.  </para>
  
  <para>
***************
*** 257,261 ****
  
  <para>The RPMs are available at <ulink
! url="http://yum.pgsqlrpms.org"> &postgres RPM Repository 
  </ulink>. Please read the howto provided in the website for configuring
  yum to use that repository. Please note that the RPMs will look for RPM
--- 280,284 ----
  
  <para>The RPMs are available at <ulink
! url="http://yum.pgsqlrpms.org"> &postgres RPM Repository
  </ulink>. Please read the howto provided in the website for configuring
  yum to use that repository. Please note that the RPMs will look for RPM
***************
*** 264,268 ****
  &postgres;.</para>
  
! <para>Installing &slony1; using these RPMs is as easy as 
  installing any RPM.</para>
  
--- 287,291 ----
  &postgres;.</para>
  
! <para>Installing &slony1; using these RPMs is as easy as
  installing any RPM.</para>
  

Index: failover.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/failover.sgml,v
retrieving revision 1.23
retrieving revision 1.23.2.1
diff -C2 -d -r1.23 -r1.23.2.1
*** failover.sgml	4 Oct 2006 16:09:30 -0000	1.23
--- failover.sgml	30 Apr 2009 16:06:10 -0000	1.23.2.1
***************
*** 41,53 ****
  on node1.  Both databases are up and running and replication is more
  or less in sync.  We do controlled switchover using <xref
! linkend="stmtmoveset">.
  
  <itemizedlist>
  
  <listitem><para> At the time of this writing switchover to another
! server requires the application to reconnect to the database.  So in
! order to avoid any complications, we simply shut down the web server.
! Users who use <application>pg_pool</application> for the applications database
! connections merely have to shut down the pool.</para></listitem>
  
  <listitem><para> A small <xref linkend="slonik"> script executes the
--- 41,96 ----
  on node1.  Both databases are up and running and replication is more
  or less in sync.  We do controlled switchover using <xref
! linkend="stmtmoveset">.</para>
  
  <itemizedlist>
  
  <listitem><para> At the time of this writing switchover to another
! server requires the application to reconnect to the new database.  So
! in order to avoid any complications, we simply shut down the web
! server.  Users who use <application>pg_pool</application> for the
! application's database connections merely have to shut down the
! pool.</para>
! 
! <para> What needs to be done here is highly dependent on the way
! that the application(s) that use the database are configured.  The
! general point is thus: Applications that were connected to the old
! database must drop those connections and establish new connections to
! the database that has been promoted to the <quote/master/ role.  There
! are a number of ways that this may be configured, and therefore, a
! number of possible methods for accomplishing the change:</para>
! 
! <itemizedlist>
! 
! <listitem><para> The application may store the name of the database in
! a file.</para>
! 
! <para> In that case, the reconfiguration may require changing the
! value in the file, and stopping and restarting the application to get
! it to point to the new location.
! </para> </listitem>
! 
! <listitem><para> A clever usage of DNS might involve creating a CNAME
! <ulink url="http://www.iana.org/assignments/dns-parameters"> DNS
! record </ulink> that establishes a name for the application to use to
! reference the node that is in the <quote>master</quote> role.</para>
! 
! <para> In that case, reconfiguration would require changing the CNAME
! to point to the new server, and possibly restarting the application to
! refresh database connections.
! </para> </listitem>
! 
! <listitem><para> If you are using <application>pg_pool</application> or some
! similar <quote>connection pool manager,</quote> then the change
! involves reconfiguring this management tool, but is otherwise similar
! to the DNS/CNAME example above.  </para> </listitem>
! 
! </itemizedlist>
! 
! <para> Whether or not the application that accesses the database needs
! to be restarted depends on how it is coded to cope with failed
! database connections; if, after encountering an error, it tries to
! re-open them, then there may be no need to restart it. </para>
! 
! </listitem>
  
  <listitem><para> A small <xref linkend="slonik"> script executes the
***************
*** 77,81 ****
  seconds.</para></listitem>
  
! </itemizedlist></para>
  
  <para> You may now simply shutdown the server hosting node1 and do
--- 120,124 ----
  seconds.</para></listitem>
  
! </itemizedlist>
  
  <para> You may now simply shutdown the server hosting node1 and do
***************
*** 90,93 ****
--- 133,141 ----
  be any loss of data.</para>
  
+ <para> After performing the configuration change, you should, as
+ recommended in <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para>
+ 
  </sect2>
  <sect2><title> Failover</title>
***************
*** 141,151 ****
  will receive anything from node1 any more.</para>
  
  </listitem>
  
! <listitem>
! <para> Reconfigure and restart the application (or
  <application>pgpool</application>) to cause it to reconnect to
! node2.</para>
! </listitem>
  
  <listitem> <para> Purge out the abandoned node </para>
--- 189,205 ----
  will receive anything from node1 any more.</para>
  
+ <note><para> Note that in order for node 2 to be considered as a
+ candidate for failover, it must have been set up with the <xref
+ linkend="stmtsubscribeset"> option <command>forwarding =
+ yes</command>, which has the effect that replication log data is
+ collected in &sllog1;/&sllog2; on node 2.  If replication log data is
+ <emphasis>not</emphasis> being collected, then failover to that node
+ is not possible. </para></note>
+ 
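+ <para> For instance, for node 2 to qualify as a failover target, its
+ original subscription would need to have been created along these
+ lines (the set and node numbers here are illustrative):</para>
+ 
+ <programlisting>
+    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
+ </programlisting>
+ 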
  </listitem>
  
! <listitem> <para> Reconfigure and restart the application (or
  <application>pgpool</application>) to cause it to reconnect to
! node2.</para> </listitem>
  
  <listitem> <para> Purge out the abandoned node </para>
***************
*** 154,162 ****
  set of references to node 1 in <xref linkend="table.sl-node">, as well
  as in referring tables such as <xref linkend="table.sl-confirm">;
! since data in <xref linkend="table.sl-log-1"> is still present,
! &slony1; cannot immediately purge out the node. </para>
  
! <para> After the failover is complete and node2 accepts
! write operations against the tables, remove all remnants of node1's
  configuration information with the <xref linkend="stmtdropnode">
  command:
--- 208,216 ----
  set of references to node 1 in <xref linkend="table.sl-node">, as well
  as in referring tables such as <xref linkend="table.sl-confirm">;
! since data in &sllog1;/&sllog2; is still present, &slony1; cannot
! immediately purge out the node. </para>
  
! <para> After the failover is complete and node2 accepts write
! operations against the tables, remove all remnants of node1's
  configuration information with the <xref linkend="stmtdropnode">
  command:
***************
*** 177,184 ****
--- 231,319 ----
  
  </listitem>
+ 
+ <listitem> <para> After performing the configuration change, you
+ should, as <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para> </listitem>
+ 
  </itemizedlist>
  
  </sect2>
  
+ <sect2 id="complexfailover"> <title> Failover With Complex Node Set </title>
+ 
+ <para> Failover is relatively <quote/simple/ if there are only two
+ nodes; if a &slony1; cluster comprises many nodes, achieving a clean
+ failover requires careful planning and execution. </para>
+ 
+ <para> Consider the following diagram describing a set of six nodes at two sites.
+ 
+ <inlinemediaobject> <imageobject> <imagedata fileref="complexenv.png">
+ </imageobject> <textobject> <phrase> Symmetric Multisites </phrase>
+ </textobject> </inlinemediaobject></para>
+ 
+ <para> Let us assume that nodes 1, 2, and 3 reside at one data
+ centre, and that we find ourselves needing to perform failover due to
+ failure of that entire site.  Causes could range from a persistent
+ loss of communications to the physical destruction of the site; the
+ cause is not actually important, as what we are concerned about is how
+ to get &slony1; to properly fail over to the new site.</para>
+ 
+ <para> We will further assume that node 5 is to be the new origin,
+ after failover. </para>
+ 
+ <para> The sequence of &slony1; reconfiguration required to properly
+ fail over this sort of node configuration is as follows:
+ </para>
+ 
+ <itemizedlist>
+ 
+ <listitem><para> Resubscribe (using <xref linkend="stmtsubscribeset">
+ ech node that is to be kept in the reformation of the cluster that is
+ not already subscribed to the intended data provider.  </para>
+ 
+ <para> In the example cluster, this means we would likely wish to
+ resubscribe nodes 4 and 6 to both point to node 5.</para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    subscribe set (id = 1, provider = 5, receiver = 4);
+    subscribe set (id = 1, provider = 5, receiver = 6);
+ </programlisting>
+ 
+ </listitem>
+ <listitem><para> Drop all unimportant nodes, starting with leaf nodes.</para>
+ 
+ <para> Since nodes 1, 2, and 3 are inaccessible, we must indicate the
+ <envar>EVENT NODE</envar> so that the event reaches the still-live
+ portions of the cluster. </para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    drop node (id=2, event node = 4);
+    drop node (id=3, event node = 4);
+ </programlisting>
+ 
+ </listitem>
+ 
+ <listitem><para> Now, run <command>FAILOVER</command>.</para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    failover (id = 1, backup node = 5);
+ </programlisting>
+ 
+ </listitem>
+ 
+ <listitem><para> Finally, drop the former origin from the cluster.</para>
+ 
+ <programlisting>
+    include &lt;/tmp/failover-preamble.slonik&gt;;
+    drop node (id=1, event node = 4);
+ </programlisting>
+ </listitem>
+ 
+ </itemizedlist>
+ 
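+ <para> Each of the snippets above includes a preamble file.  Such a
+ file simply names the cluster and supplies admin conninfo lines for
+ the nodes that <xref linkend="slonik"> still needs to reach; a minimal
+ sketch (the cluster name and conninfo values are purely illustrative)
+ might look like:</para>
+ 
+ <programlisting>
+    cluster name = testcluster;
+    node 4 admin conninfo = 'host=site2-a dbname=mydb user=slony';
+    node 5 admin conninfo = 'host=site2-b dbname=mydb user=slony';
+    node 6 admin conninfo = 'host=site2-c dbname=mydb user=slony';
+ </programlisting>
+ 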
  <sect2><title> Automating <command> FAIL OVER </command> </title>
  
***************
*** 207,211 ****
  to forcibly knock the failed node off the network in order to prevent
  applications from getting confused.  This could take place via having
! an SNMP interface that does some combination of the following:
  
  <itemizedlist>
--- 342,346 ----
  to forcibly knock the failed node off the network in order to prevent
  applications from getting confused.  This could take place via having
! an SNMP interface that does some combination of the following:</para>
  
  <itemizedlist>
***************
*** 228,232 ****
  
  </itemizedlist>
- </para>
  </sect2>
  
--- 363,366 ----

Index: slony.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slony.sgml,v
retrieving revision 1.36.2.1
retrieving revision 1.36.2.2
diff -C2 -d -r1.36.2.1 -r1.36.2.2
*** slony.sgml	5 Sep 2007 21:36:31 -0000	1.36.2.1
--- slony.sgml	30 Apr 2009 16:06:10 -0000	1.36.2.2
***************
*** 45,49 ****
--- 45,61 ----
  <!ENTITY sllog1 "<xref linkend=table.sl-log-1>">
  <!ENTITY sllog2 "<xref linkend=table.sl-log-2>">
+ <!ENTITY slseqlog "<xref linkend=table.sl-seqlog>">
  <!ENTITY slconfirm "<xref linkend=table.sl-confirm>">
+ 
+ <!ENTITY slevent "<xref linkend=table.sl-event>">
+ <!ENTITY slnode "<xref linkend=table.sl-node>">
+ <!ENTITY slpath "<xref linkend=table.sl-path>">
+ <!ENTITY sllisten "<xref linkend=table.sl-listen>">
+ <!ENTITY slregistry "<xref linkend=table.sl-registry>">
+ <!ENTITY slsetsync "<xref linkend=table.sl-setsync>">
+ <!ENTITY slsubscribe "<xref linkend=table.sl-subscribe>">
+ <!ENTITY sltable "<xref linkend=table.sl-table>">
+ <!ENTITY slset "<xref linkend=table.sl-set>">
+ 
  <!ENTITY rplainpaths "<xref linkend=plainpaths>">
  <!ENTITY rlistenpaths "<xref linkend=listenpaths>">
***************
*** 51,54 ****
--- 63,67 ----
  <!ENTITY lslon "<xref linkend=slon>">
  <!ENTITY lslonik "<xref linkend=slonik>">
+ <!ENTITY lteststate "<xref linkend=testslonystate>">
  
  ]>
***************
*** 94,98 ****
--- 107,113 ----
   &listenpaths;
   &plainpaths;
+  &triggers;
   &locking;
+  &raceconditions;
   &addthings;
   &dropthings;
***************
*** 107,112 ****
   &loganalysis;
   &help;
  </article>
- 
  <article id="faq">
  
--- 122,128 ----
   &loganalysis;
   &help;
+  &supportedplatforms;
+  &releasechecklist;
  </article>
  <article id="faq">
  
***************
*** 134,139 ****
  </part>
  
- &supportedplatforms;
- &releasechecklist;
  &schemadoc;
  &bookindex;
--- 150,153 ----

Index: partitioning.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/partitioning.sgml,v
retrieving revision 1.1.2.3
retrieving revision 1.1.2.4
diff -C2 -d -r1.1.2.3 -r1.1.2.4
*** partitioning.sgml	7 Mar 2008 19:05:11 -0000	1.1.2.3
--- partitioning.sgml	30 Apr 2009 16:06:10 -0000	1.1.2.4
***************
*** 74,81 ****
  </itemizedlist>
  
! <para> There are several stored functions provided to support this,
! for &postgres; 8.1 and newer; the Gentle User may use whichever seems
! preferable.  The <quote>base function</quote> is
! <function>add_empty_table_to_replication()</function>; the others
  provide additional structure and validation of the arguments </para>
  
--- 74,80 ----
  </itemizedlist>
  
! <para> There are several stored functions provided to support this;
! the Gentle User may use whichever seems preferable.  The <quote>base
! function</quote> is <function>add_empty_table_to_replication()</function>; the others
  provide additional structure and validation of the arguments </para>
  
***************
*** 107,112 ****
  with confidence to add any table to replication that is known to be
  empty. </para> </note>
- </sect2>
  
  </sect1>
  
--- 106,111 ----
  with confidence to add any table to replication that is known to be
  empty. </para> </note>
  
+ </sect2>
  </sect1>
  

Index: logshipping.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/logshipping.sgml,v
retrieving revision 1.16.2.5
retrieving revision 1.16.2.6
diff -C2 -d -r1.16.2.5 -r1.16.2.6
*** logshipping.sgml	24 Oct 2007 17:49:35 -0000	1.16.2.5
--- logshipping.sgml	30 Apr 2009 16:06:10 -0000	1.16.2.6
***************
*** 270,274 ****
  start transaction;
  
! select "_T1".setsyncTracking_offline(1, '655', '656', '2005-09-23 18:37:40.206342');
  -- end of log archiving header
  </programlisting></para></listitem>
--- 270,274 ----
  start transaction;
  
! select "_T1".setsyncTracking_offline(1, '655', '656', '2007-09-23 18:37:40.206342');
  -- end of log archiving header
  </programlisting></para></listitem>
***************
*** 282,286 ****
  start transaction;
  
! select "_T1".setsyncTracking_offline(1, '96', '109', '2005-09-23 19:01:31.267403');
  -- end of log archiving header
  </programlisting></para>
--- 282,286 ----
  start transaction;
  
! select "_T1".setsyncTracking_offline(1, '96', '109', '2007-09-23 19:01:31.267403');
  -- end of log archiving header
  </programlisting></para>
***************
*** 344,347 ****
--- 344,370 ----
  
  </sect2>
+ 
+ <sect2><title> <application> find-triggers-to-deactivate.sh
+ </application> </title>
+ 
+ <indexterm><primary> trigger deactivation </primary> </indexterm>
+ 
+ <para> It was once pointed out (<ulink
+ url="http://www.slony.info/bugzilla/show_bug.cgi?id=19"> Bugzilla bug
+ #19</ulink>) that the dump of a schema may include triggers and rules
+ that you may not wish to have running on the log shipped node.</para>
+ 
+ <para> The tool <filename> tools/find-triggers-to-deactivate.sh
+ </filename> was created to assist with this task.  It may be run
+ against the node that is to be used as a schema source, and it will
+ list the rules and triggers present on that node that may, in turn,
+ need to be deactivated.</para>
+ 
+ <para> It includes <function>logtrigger</function> and <function>denyaccess</function>
+ triggers, which may already be left out of the extracted schema, but it is
+ still worth the Gentle Administrator verifying that such triggers are
+ kept out of the log shipped replica.</para>
+ 
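+ <para> If you wish to double-check by hand, the same sort of
+ information can be drawn from the system catalogs; the queries below
+ are merely a rough sketch of the idea, not the script's actual
+ implementation:</para>
+ 
+ <programlisting>
+ -- triggers, along with the table each is attached to
+ SELECT c.relname AS table_name, t.tgname AS trigger_name
+   FROM pg_catalog.pg_trigger t
+   JOIN pg_catalog.pg_class c ON c.oid = t.tgrelid;
+ 
+ -- rewrite rules
+ SELECT schemaname, tablename, rulename
+   FROM pg_catalog.pg_rules;
+ </programlisting>
+ 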
+ </sect2>
  <sect2> <title> <application>slony_logshipper </application> Tool </title>
  
***************
*** 382,385 ****
--- 405,409 ----
  <listitem><para> <command>post processing command = 'gzip -9 $inarchive';</command></para> <para> Pre- and post-processing commands are executed via <function>system(3)</function>. </para> </listitem>
  </itemizedlist>
+ 
  <para> An <quote>@</quote> as the first character causes the exit code to be ignored.  Otherwise, a nonzero exit code is treated as an error and causes processing to abort. </para>
  
***************
*** 399,405 ****
  <para> In the example shown, this sends an email to the DBAs upon
  encountering an error.</para> </listitem>
- </itemizedlist>
  
- <itemizedlist>
  <listitem><para> Archive File Names</para>
  
--- 423,427 ----

Index: supportedplatforms.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/supportedplatforms.sgml,v
retrieving revision 1.8.2.2
retrieving revision 1.8.2.3
diff -C2 -d -r1.8.2.2 -r1.8.2.3
*** supportedplatforms.sgml	17 Nov 2006 09:00:51 -0000	1.8.2.2
--- supportedplatforms.sgml	30 Apr 2009 16:06:10 -0000	1.8.2.3
***************
*** 1,3 ****
! <article id="supportedplatforms">
  <title>&slony1; Supported Platforms</title>
  
--- 1,3 ----
! <sect1 id="supportedplatforms">
  <title>&slony1; Supported Platforms</title>
  
***************
*** 10,14 ****
  </para>
  
! <para> Last updated: Nov 17, 2006</para>
  
  <para>If you experience problems in these platforms, please subscribe to 
--- 10,14 ----
  </para>
  
! <para> Last updated: Jun 23, 2005</para>
  
  <para>If you experience problems in these platforms, please subscribe to 
***************
*** 132,162 ****
  
       <row>
-       <entry>Fedora Core</entry>
-       <entry>5</entry>
-       <entry>x86</entry>
-       <entry>Nov 17, 2006</entry>
-       <entry>devrim at CommandPrompt.com</entry>
-       <entry>&postgres; Version: 8.1.5</entry>
-      </row>
- 
-      <row>
-       <entry>Fedora Core</entry>
-       <entry>6</entry>
-       <entry>x86</entry>
-       <entry>Nov 17, 2006</entry>
-       <entry>devrim at CommandPrompt.com</entry>
-       <entry>&postgres; Version: 8.1.5</entry>
-      </row>
- 
-      <row>
-       <entry>Fedora Core</entry>
-       <entry>6</entry>
-       <entry>x86_64</entry>
-       <entry>Nov 17, 2006</entry>
-       <entry>devrim at CommandPrompt.com</entry>
-       <entry>&postgres; Version: 8.1.5</entry>
-      </row>
- 
-      <row>
        <entry>Red Hat Linux</entry>
        <entry>9</entry>
--- 132,135 ----
***************
*** 204,208 ****
     </tgroup>
    </table>
! </article>
  <!-- Keep this comment at the end of the file
  Local variables:
--- 177,181 ----
     </tgroup>
    </table>
! </sect1>
  <!-- Keep this comment at the end of the file
  Local variables:

Index: slon.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slon.sgml,v
retrieving revision 1.29.2.4
retrieving revision 1.29.2.5
diff -C2 -d -r1.29.2.4 -r1.29.2.5
*** slon.sgml	27 Mar 2008 21:01:30 -0000	1.29.2.4
--- slon.sgml	30 Apr 2009 16:06:10 -0000	1.29.2.5
***************
*** 64,71 ****
  
       <para> The first five non-debugging log levels (from Fatal to
!      Info) are <emphasis>always</emphasis> displayed in the logs.  If
!      <envar>log_level</envar> is set to 2 (a routine, and, seemingly,
!      preferable choice), then output at debugging levels 1 and 2 will
!      also be displayed.</para>
  
      </listitem>
--- 64,74 ----
  
       <para> The first five non-debugging log levels (from Fatal to
!      Info) are <emphasis>always</emphasis> displayed in the logs.  In
!      early versions of &slony1;, the <quote>suggested</quote>
!      <envar>log_level</envar> value was 2, which would list output at
!      all levels down to debugging level 2.  In &slony1; version 2, it
!      is recommended to set <envar>log_level</envar> to 0; most of the
!      consistently interesting log information is generated at levels
!      higher than that. </para>
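+ 
+      <para> For instance (the cluster name and conninfo here are
+      purely illustrative), a &slony1; 2.0 <application>slon</application>
+      could be run at the recommended level either from the command
+      line or via its runtime configuration file:</para>
+ 
+ <programlisting>
+ slon -d0 mycluster 'host=localhost dbname=mydb user=slony'
+ 
+ # or, equivalently, in the slon runtime configuration file:
+ log_level=0
+ </programlisting>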
  
      </listitem>
***************
*** 149,153 ****
        </itemizedlist>
  
!      <para>
        Default is 10000 ms and maximum is 120000 ms. By default, you
        can expect each node to <quote>report in</quote> with a
--- 152,156 ----
        </itemizedlist>
  
!       <para>
        Default is 10000 ms and maximum is 120000 ms. By default, you
        can expect each node to <quote>report in</quote> with a
***************
*** 219,223 ****
       </para> 
       <para>
!       In &slony1; version 1.1 and later versions the <application>slon</application>
        instead adaptively <quote>ramps up</quote> from doing 1
        <command>SYNC</command> at a time towards the maximum group
--- 222,226 ----
       </para> 
       <para>
!       In &slony1; version 1.1 and later, the <application>slon</application>
        instead adaptively <quote>ramps up</quote> from doing 1
        <command>SYNC</command> at a time towards the maximum group



More information about the Slony1-commit mailing list