Tue Jul 25 17:36:04 PDT 2006
- Previous message: [Slony1-commit] By cbbrowne: Add a description to DDL docs as to how the
- Next message: [Slony1-commit] By darcyb: SET standard_conforming_strings to 'off' for pg 8.2 and
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Log Message:
-----------
Documentation augmented...

elein pointed out the good question "When is it OK / NOT OK to kill
off slons?"

I've added some comments on this to the best practices and FAQ.

And added a link to the "generate_sync.sh" function...

Modified Files:
--------------
slony1-engine/doc/adminguide:
        bestpractices.sgml (r1.21 -> r1.22)
        faq.sgml (r1.59 -> r1.60)
        maintenance.sgml (r1.22 -> r1.23)
-------------- next part --------------
Index: maintenance.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/maintenance.sgml,v
retrieving revision 1.22
retrieving revision 1.23
diff -Ldoc/adminguide/maintenance.sgml -Ldoc/adminguide/maintenance.sgml -u -w -r1.22 -r1.23
--- doc/adminguide/maintenance.sgml
+++ doc/adminguide/maintenance.sgml
@@ -82,7 +82,8 @@
 thereby your whole day.</para>
 </sect2>
 
-<sect2><title>Parallel to Watchdog: generate_syncs.sh</title>
+
+<sect2 id="gensync"><title>Parallel to Watchdog: generate_syncs.sh</title>
 
 <para>A new script for &slony1; 1.1 is
 <application>generate_syncs.sh</application>, which addresses the following kind of
Index: bestpractices.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/bestpractices.sgml,v
retrieving revision 1.21
retrieving revision 1.22
diff -Ldoc/adminguide/bestpractices.sgml -Ldoc/adminguide/bestpractices.sgml -u -w -r1.21 -r1.22
--- doc/adminguide/bestpractices.sgml
+++ doc/adminguide/bestpractices.sgml
@@ -154,7 +154,6 @@
 <para> In practice, strewing &lslon; processes and
 configuration across a dozen servers turns out to be
 inconvenient to manage.</para>
-
 </listitem>
 
 <listitem><para> &lslon; processes should run in the same
@@ -175,6 +174,31 @@
 condition. </para>
 </listitem>
 
+<listitem><para> Before getting too excited about having fallen into
+some big problem, consider killing and restarting all the &lslon;
+processes.  Historically, this has frequently been able to
+resolve <quote>stickiness.</quote> </para>
+
+<para> With a very few exceptions, it is generally not a big deal to
+kill off and restart the &lslon; processes.  Each &lslon; connects to
+one database for which it is the manager, and then connects to other
+databases as needed to draw in events.  If you kill off a &lslon;, all
+you do is to interrupt those connections.  If
+a <command>SYNC</command> or other event is sitting there
+half-processed, there's no problem: the transaction will roll back,
+and when the &lslon; restarts, it will restart that event from
+scratch.</para>
+
+<para> The exception, where it is undesirable to restart a &lslon;, is
+where a <command>COPY_SET</command> is running on a large replication
+set, such that stopping the &lslon; may discard several hours worth of
+load work. </para>
+
+<para> In early versions of &slony1;, it was frequently the case that
+connections could get a bit <quote>deranged</quote> which restarting
+&lslon;s would clean up.  This has become much more rare, but it has
+occasionally proven useful to restart the &lslon;.</para> </listitem>
+
 <listitem>
 <para>The <link linkend="ddlchanges"> Database Schema Changes </link>
 section outlines some practices that have been found useful for
Index: faq.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/faq.sgml,v
retrieving revision 1.59
retrieving revision 1.60
diff -Ldoc/adminguide/faq.sgml -Ldoc/adminguide/faq.sgml -u -w -r1.59 -r1.60
--- doc/adminguide/faq.sgml
+++ doc/adminguide/faq.sgml
@@ -462,6 +462,63 @@
 threaten the entire server. </para></answer>
 </qandaentry>
 
+<qandaentry>
+<question><para> When can I shut down &lslon; processes?</para></question>
+
+<question><para> Are there risks to doing so?  How about
+benefits?</para></question>
+
+<answer><para> Generally, it's no big deal to shut down a &lslon;
+process.  Each one is <quote>merely</quote> a &postgres; client,
+managing one node, which spawns threads to manage receiving events
+from other nodes. </para>
+
+<para>The <quote>event listening</quote> threads are no big deal; they
+are doing nothing fancier than periodically checking remote nodes to
+see if they have work to be done on this node.  If you kill off the
+&lslon; these threads will be closed, which should have little or no
+impact on much of anything.  Events generated while the &lslon; is
+down will be picked up when it is restarted.</para>
+
+<para> The <quote>node managing</quote> thread is a bit more
+interesting; most of the time, you can expect, on a subscriber, for
+this thread to be processing <command>SYNC</command> events.  If you
+shut off the &lslon; during an event, the transaction
+will fail, and be rolled back, so that when the &lslon; restarts, it
+will have to go back and reprocess the event.</para>
+
+<para> The only situation where this will
+cause <emphasis>particular</emphasis> <quote>heartburn</quote> is if
+the event being processed was one which takes a long time to process,
+such as <command>COPY_SET</command> for a large replication
+set. </para>
+
+<para> The other thing that <emphasis>might</emphasis> cause trouble
+is if the &lslon; runs fairly distant from nodes that it connects to;
+you could discover that database connections are left <command>idle in
+transaction</command>.  This would normally only occur if the network
+connection is destroyed without either &lslon; or database being made
+aware of it.  In that case, you may discover
+that <quote>zombied</quote> connections are left around for as long as
+two hours if you don't go in by hand and kill off the &postgres;
+backends.</para>
+
+<para> There is one other case that could cause trouble; when the
+&lslon; managing the origin node is not running,
+no <command>SYNC</command> events run against that node.  If the
+&lslon; stays down for an extended period of time, and something
+like <xref linkend="gensync"> isn't running, you could be left
+with <emphasis>one big <command>SYNC</command></emphasis> to process
+when it comes back up.  But that is only a concern if that &lslon; is
+down for an extended period of time; shutting it down for a few
+seconds shouldn't cause any great problem. </para> </answer>
+
+<answer><para> In short, if you don't have something like an 18
+hour <command>COPY_SET</command> under way, it's normally not at all a
+big deal to take a &lslon; down for a little while, or perhaps even
+cycle <emphasis>all</emphasis> the &lslon;s. </para> </answer>
+</qandaentry>
+
 </qandadiv>
 <qandadiv id="faqconfiguration">
 <title> &slony1; FAQ: Configuration Issues </title>
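The FAQ text in this commit warns that "zombied" connections can sit "idle in transaction" for up to two hours after a network drop, until someone kills the backends by hand. A minimal shell sketch of such a check follows; it is not part of this commit. The pipe-separated "pid|state" input format is an assumption for illustration, as is feeding it from something like psql -Atc "SELECT pid, state FROM pg_stat_activity" (that column layout exists only on later PostgreSQL releases; servers contemporary with this commit expose different column names).

```shell
#!/bin/sh
# find_idle_in_txn: read "pid|state" lines on stdin and print the pids
# of backends stuck "idle in transaction".  A real check would pipe in
# live pg_stat_activity output; here we use canned input so the sketch
# is self-contained.
find_idle_in_txn() {
    awk -F'|' '$2 == "idle in transaction" { print $1 }'
}

# Stand-in for live monitoring output:
printf '%s\n' \
    '4711|active' \
    '4712|idle in transaction' \
    '4713|idle' |
find_idle_in_txn
```

Any pid this prints is a candidate for the manual cleanup the FAQ describes; whether to actually kill a given backend is an operator judgment call, not something to automate blindly.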