[Slony1-commit] By cbbrowne: Added a "best practices" document (not yet complete, but

Wed Apr 20 19:29:05 PDT 2005

Log Message:
-----------
Added a "best practices" document (not yet complete, but most of it is
there, and there is some rough material for the missing bits) and linked
it pretty widely against other sections of the documentation.

Modified Files:
--------------
    slony1-engine/doc/adminguide:
        defineset.sgml (r1.15 -> r1.16)
        faq.sgml (r1.32 -> r1.33)
        filelist.sgml (r1.10 -> r1.11)
        intro.sgml (r1.15 -> r1.16)
        slony.sgml (r1.17 -> r1.18)

Added Files:
-----------
    slony1-engine/doc/adminguide:
        bestpractices.sgml (r1.1)

-------------- next part --------------
Index: defineset.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/defineset.sgml,v
retrieving revision 1.15
retrieving revision 1.16
diff -Ldoc/adminguide/defineset.sgml -Ldoc/adminguide/defineset.sgml -u -w -r1.15 -r1.16

--- doc/adminguide/defineset.sgml
+++ doc/adminguide/defineset.sgml
@@ -156,6 +156,16 @@
 value need not be stored over and over; some thought needs to go into
 how to do that safely.</para></listitem>
 
+<listitem><para> <ulink url=
+"http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1226">
+Bug #1226 </ulink> indicates an error condition that can come up if
+you have a replication set that consists solely of sequences. </para>
+
+<para> This is documented further in the <link linkend="sequenceset">
+FAQ;</link> in short, having a replication set that consists only of
+sequences is not a particularly good
+idea.</para></listitem>
+
 </itemizedlist></para></sect2>
 
 </sect1>
Index: intro.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/intro.sgml,v
retrieving revision 1.15
retrieving revision 1.16
diff -Ldoc/adminguide/intro.sgml -Ldoc/adminguide/intro.sgml -u -w -r1.15 -r1.16
--- doc/adminguide/intro.sgml
+++ doc/adminguide/intro.sgml
@@ -60,7 +60,9 @@
 <para>&slony1; was born from an idea to create a replication system
 that was not tied to a specific version of &postgres;, which is
 allowed to be started and stopped on an existing database with out the
-need for a dump/reload cycle.</para> </sect2>
+need for a dump/reload cycle.</para>
+
+</sect2>
 
 <sect2><title> What &slony1; is not</title>
 
@@ -147,10 +149,81 @@
 <quote>standby.</quote></para></listitem>
 
 </itemizedlist></para>
+</sect2>
+
+<sect2><title>Replication Models</title>
 
 <para>There are a number of distinct models for database replication;
 it is impossible for one replication system to be all things to all
-people.</para></sect2>
+people.</para>
+
+<para> &slony1; implements a particular model, namely that of
+asynchronous replication, using triggers, where a single
+<quote>origin</quote> may be replicated to multiple
+<quote>subscribers</quote> including cascaded subscribers.</para>
+
+<para> There are a number of other replication models which are
+<emphasis> different </emphasis>; it is worth pointing out other
+approaches that exist.  &slony1; is certainly not the only approach,
+and for some applications, it is <emphasis> not </emphasis> the
+optimal approach. </para>
+
+<itemizedlist> 
+<listitem><para> Synchronous single-origin multi-subscriber replication</para>
+
+<para> In a synchronous system, updates cannot be committed at the
+origin until they have also been accepted by subscriber nodes.  This
+enhances the security property of nonrepudiation as updates will not
+be committed until they can be confirmed elsewhere.  Unfortunately,
+the requirement that changes be applied in multiple places introduces
+a performance bottleneck.  </para>
+
+<para> This approach is similar to the two-phase commit processing
+model of the XA transaction processing protocol.</para>
+</listitem>
+
+<listitem><para> Synchronous multi-origin multi-subscriber replication </para> 
+
+<para> This is the model being used by the forthcoming
+<productname>Slony-II</productname> system.  Synchronous replication
+systems all <quote>suffer</quote> from the performance bottleneck that
+updates must be accepted on all nodes before they can be
+<command>commit</command>ted anywhere.  </para>
+</listitem>
+
+<listitem><para> Asynchronous multimaster replication with conflict
+avoidance/resolution</para>
+
+<para> Perhaps the most widely used replication system of this sort is
+the <productname>PalmOS HotSync</productname> system.
+<trademark>Lotus Notes</trademark> also provides a replication system
+that functions in much this manner.</para>
+
+<para> The characteristic <quote>troublesome problem</quote> with this
+style of replication is that it is possible for conflicts to arise
+because users update the same record in different ways on different
+nodes. </para>
+
+<para> In the case of <productname>HotSync</productname>, if conflicts
+arise due to records being updated on multiple nodes, the
+<quote>resolution</quote> is to simply create a duplicate record to
+reflect the two changes, and have the user resolve the conflict
+manually. </para>
+
+<para> Some async multimaster systems try to resolve conflicts by
+finding ways to apply partial record updates.  For instance, with an
+address update, one user, on one node, might update the phone number
+for an address, and another user might update the street address, and
+the conflict resolution system might try to apply these updates in a
+non-conflicting order.</para>
+
+<para> Conflict resolution systems almost always require some domain
+knowledge of the application being used. </para>
+</listitem>
+
+</itemizedlist>
+
+</sect2>
 </sect1>
 
 <sect1 id="slonylistenercosts"><title>
--- /dev/null
+++ doc/adminguide/bestpractices.sgml
@@ -0,0 +1,164 @@
+<!-- $Id: bestpractices.sgml,v 1.1 2005/04/20 18:29:00 cbbrowne Exp $ --> 
+<sect1 id="bestpractices">
+<title> &slony1; <quote>Best Practices</quote> </title>
+
+<para> Managers commonly want to operate systems according to some
+available, documented set of <quote>best practices.</quote>
+Documenting that sort of thing is essential to ISO 9000, ISO 9001, and
+other sorts of organizational certifications. </para>
+
+<para> It is worthwhile to preface a discussion of <quote>best
+practices</quote> by mentioning that each organization that uses
+&slony1; is unique, and there may be a need for local policies to
+reflect unique local operating characteristics.  It is for that reason
+that &slony1; does <emphasis>not</emphasis> impose its own policies
+for such things as <link linkend="failover"> failover </link>; those
+will need to be determined based on the overall shape of your network,
+of your set of database servers, and of your usage patterns for those
+servers. </para>
+
+<para> There are, however, a number of things that early adopters of
+&slony1; have discovered which can at least help to suggest some
+policies you might want to consider. </para>
+
+<itemizedlist>
+
+<listitem><para> &slony1; is a complex multi-client, multi-server
+system, with the result that there is an almost innumerable set of
+places where problems can arise.  </para>
+
+<para> As a natural result, maintaining a clean environment is really
+valuable, as any sort of environmental <quote>messiness</quote> can
+either cause unexpected problems or mask the real problem. </para>
+
+<para> Numerous users have reported problems resulting from mismatches
+between &slony1; versions, local libraries, and &postgres; libraries.
+Details count; you need to be clear on what hosts are running what
+versions of what software.
+</para>
+
+</listitem>
+
+<listitem><para> Principle: Long-running transactions are Evil </para>
+
+<para> The FAQ has an entry on <link linkend="pglistenerfull"> growth
+of <envar>pg_listener</envar> </link> which discusses this in a fair
+bit of detail; the long and short is that long-running transactions
+have numerous ill effects.  They are particularly troublesome on an
+<quote>origin</quote> node, holding onto locks, preventing vacuums
+from taking effect, and the like.</para>
+</listitem>
+
+<listitem><para> <link linkend="failover"> Failover </link> policies
+should be planned for ahead of time.  </para>
+
+<para> This may simply involve deciding on a priority list of which
+node should fail over to which, as opposed to trying to automate
+it.  But knowing what to do ahead of time cuts down on the number of
+mistakes made.</para>
+
+<para> At Afilias, some internal <citation>The 3AM Unhappy DBA's Guide
+to...</citation> guides have been created to provide checklists of
+what to do when <quote>unhappy</quote> things happen; this sort of
+material is highly specific to the applications running, so you would
+need to generate your own such documents.
+</para>
+</listitem>
+
+<listitem><para> <xref linkend="stmtmoveset"> should be used for
+preventative maintenance, so that problems do not become serious
+enough to require <link linkend="failover"> failover </link>. </para>
+</listitem>
+
+<listitem><para> <command>VACUUM</command> policy needs to be
+carefully defined.</para>
+
+<para> As mentioned above, <quote>long-running transactions are
+Evil.</quote> <command>VACUUM</command>s are no exception to this.  A
+<command>VACUUM</command> on a huge table will open a long-running
+transaction with all the known ill effects.</para>
+</listitem>
+
+<listitem><para> Running all of the <xref linkend="slon"> daemons on a
+central server for each network has proven preferable. </para> 
+
+<para> Each <xref linkend="slon"> should run on a host on the same
+local network as the node that it is servicing, as it does a
+<emphasis>lot</emphasis> of communications with its database.  </para>
+
+<para> In theory, the <quote>best</quote> speed would come from
+running the <xref linkend="slon"> on the database server that it is
+servicing. </para>
+
+<para> In practice, having the <xref linkend="slon"> processes strewn
+across a dozen servers turns out to be really inconvenient to manage,
+as making changes to their configuration requires logging onto a whole
+bunch of servers.  In environments where users must use
+<application>sudo</application> to switch to application users, this
+is seriously inconvenient.  It is <emphasis>much</emphasis> easier to
+manage when the <xref linkend="slon"> processes are grouped on one
+server per local network, so that <emphasis>one</emphasis> script can
+start, monitor, terminate, and otherwise maintain
+<emphasis>all</emphasis> of the nearby nodes.</para>
+
+<para> That also has the implication that configuration data and
+configuration scripts only need to be maintained in one place,
+eliminating duplication of configuration efforts.</para>
+</listitem>
+
+<listitem><para>The <link linkend="ddlchanges"> Database Schema
+Changes </link> section outlines some practices that have been found
+useful for handling changes to database schemas. </para></listitem>
+
+<listitem><para> Handling of Primary Keys </para> 
+
+<para> As discussed in the section on <link linkend="definingsets">
+Replication Sets, </link> it is <emphasis>ideal</emphasis> if each
+replicated table has a true primary key constraint; it is
+<emphasis>acceptable</emphasis> to use a <quote>candidate primary key.</quote></para>
+
+<para> It is <emphasis>not recommended</emphasis> that a
+&slony1;-defined key be used to introduce a candidate primary key, as
+this introduces the possibility that updates to this table can fail
+due to the introduced unique index, which means that &slony1; has
+introduced a new failure mode for your application.</para>
+</listitem>
+
+<listitem><para> <link linkend="definesets"> Grouping tables into sets
+</link> suggests strategies for determining how to group tables and
+sequences into replication sets. </para> </listitem>
+
+<listitem><para> It should be obvious that actions that can delete a
+lot of data should be taken with great care; the section on <link
+linkend="dropthings"> Dropping things from &slony1; Replication</link>
+discusses the different sorts of <quote>deletion</quote> that &slony1;
+supports.  </para> </listitem>
+
+<listitem><para> listen path management </para> </listitem>
+
+<listitem><para> path configuration </para> </listitem>
+
+<listitem><para> configuring slon </para> </listitem>
+
+<listitem><para> when subscribing nodes </para> </listitem>
+
+<listitem><para> managing use of slonik </para> </listitem>
+
+</itemizedlist>
+
+</sect1>
+<!-- Keep this comment at the end of the file
+Local variables:
+mode:sgml
+sgml-omittag:nil
+sgml-shorttag:t
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:"book.sgml"
+sgml-exposed-tags:nil
+sgml-local-catalogs:("/usr/lib/sgml/catalog")
+sgml-local-ecat-files:nil
+End:
+-->
Index: faq.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/faq.sgml,v
retrieving revision 1.32
retrieving revision 1.33
diff -Ldoc/adminguide/faq.sgml -Ldoc/adminguide/faq.sgml -u -w -r1.32 -r1.33
--- doc/adminguide/faq.sgml
+++ doc/adminguide/faq.sgml
@@ -1038,7 +1038,8 @@
 </screen>
 
 <para>You then need to find the rows in <xref
-linkend="table.sl-log-1"> that have bad entries and fix them.  You may
+linkend="table.sl-log-1"> that have bad 
+entries and fix them.  You may
 want to take down the slon daemons for all nodes except the master;
 that way, if you make a mistake, it won't immediately propagate
 through to the subscribers.</para>
@@ -1418,6 +1419,34 @@
 
 </qandaentry>
 
+<qandaentry id="sequenceset"><question><para> <ulink url=
+"http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1226">
+Bug #1226 </ulink> indicates an error condition that can come up if
+you have a replication set that consists solely of sequences. </para>
+</question>
+
+<answer> <para> The short answer is that having a replication set
+consisting only of sequences is not a <link linkend="bestpractices">
+best practice.</link> </para>
+</answer>
+
+<answer>
+<para> The problem with a sequence-only set comes up only if you have
+a case where the only subscriptions that are active for a particular
+subscriber to a particular provider are for
+<quote>sequence-only</quote> sets.  If a node gets into that state,
+replication will fail, as the query that looks for data from <xref
+linkend="table.sl-log-1"> has no tables to find, so the query is
+malformed and fails.  If a replication set <emphasis>with</emphasis>
+tables is added back to the mix, everything will work out fine; it
+just <emphasis>seems</emphasis> scary.
+</para>
+
+<para> This problem should be resolved some time after &slony1;
+1.1.0.</para>
+</answer>
+</qandaentry>
+
 </qandaset>
 
 <!-- Keep this comment at the end of the file Local variables:
Index: slony.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/slony.sgml,v
retrieving revision 1.17
retrieving revision 1.18
diff -Ldoc/adminguide/slony.sgml -Ldoc/adminguide/slony.sgml -u -w -r1.17 -r1.18
--- doc/adminguide/slony.sgml
+++ doc/adminguide/slony.sgml
@@ -56,6 +56,7 @@
  &usingslonik;
  &adminscripts;
  &versionupgrade;
+ &bestpractices;
  &help;
 </article>
 
Index: filelist.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/filelist.sgml,v
retrieving revision 1.10
retrieving revision 1.11
diff -Ldoc/adminguide/filelist.sgml -Ldoc/adminguide/filelist.sgml -u -w -r1.10 -r1.11
--- doc/adminguide/filelist.sgml
+++ doc/adminguide/filelist.sgml
@@ -37,6 +37,7 @@
 <!entity problems           SYSTEM "problems.sgml">
 <!entity slonybook          SYSTEM "slony.sgml">
 <!entity logshipping        SYSTEM "logshipping.sgml">
+<!entity bestpractices      SYSTEM "bestpractices.sgml">
 
 <!-- back matter -->
 <!entity biblio     SYSTEM "biblio.sgml">