[Slony1-commit] slony1-engine/doc/adminguide faq.sgml loganalysis.sgml slonconf.sgml

Fri Feb 2 12:23:48 PST 2007

Update of /home/cvsd/slony1/slony1-engine/doc/adminguide
In directory main:/tmp/cvs-serv30435/doc/adminguide

Modified Files:
      Tag: REL_1_2_STABLE
	faq.sgml loganalysis.sgml slonconf.sgml 
Log Message:
Added a new conf file option for slon - remote_listen_timeout

This helps if a slon has been down for a very long time, so that sl_event
has bloated and the query against it times out.

Index: slonconf.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonconf.sgml,v
retrieving revision 1.14
retrieving revision 1.14.2.1
diff -C2 -d -r1.14 -r1.14.2.1
*** slonconf.sgml	2 Aug 2006 18:34:59 -0000	1.14
--- slonconf.sgml	2 Feb 2007 20:23:46 -0000	1.14.2.1
***************
*** 431,434 ****
--- 431,445 ----
        </listitem>
      </varlistentry>
+     <varlistentry id="slon-config-remote-listen-timeout" xreflabel="slon_conf_remote_listen_timeout">
+       <term><varname>remote_listen_timeout</varname> (<type>integer</type>)</term>
+       <indexterm>
+         <primary><varname>remote_listen_timeout</varname> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+         <para>How long, in seconds, should the remote listener wait for its event selection query to complete before treating it as having timed out?
+           Range: [30-30000], default 300 (five minutes)
+         </para>
+       </listitem>
+     </varlistentry>
    </variablelist>
  </sect1>

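For illustration only: a minimal sketch of a slon config-file fragment that raises the new parameter. It assumes the postgresql.conf-style name=value syntax used by the sample slon.conf. The value 3600 is an arbitrary example inside the documented range [30-30000], read as seconds to match the 300-second default.

```conf
# Hypothetical slon.conf fragment (example value, not a recommendation).
# Give the remote listener's event selection query up to one hour
# (3600 seconds) instead of the default 300 before it is treated as
# having timed out.  Documented range: 30-30000.
remote_listen_timeout=3600
```

After editing the file, restart the slon for the lagging node and let it retry.
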
Index: faq.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/faq.sgml,v
retrieving revision 1.66.2.2
retrieving revision 1.66.2.3
diff -C2 -d -r1.66.2.2 -r1.66.2.3
*** faq.sgml	27 Oct 2006 15:35:32 -0000	1.66.2.2
--- faq.sgml	2 Feb 2007 20:23:46 -0000	1.66.2.3
***************
*** 999,1002 ****
--- 999,1035 ----
  </qandaentry>

+ <qandaentry>
+ <question><para> One of my nodes fell over (&lslon; / postmaster was
+ down) and nobody noticed for several days.  Now, when the &lslon; for
+ that node starts up, it runs for about five minutes and then terminates
+ with the error message: <command>ERROR: remoteListenThread_%d: timeout
+ for event selection</command>.  What's wrong, and what do I do? </para>
+ </question>
+ 
+ <answer><para> The problem is that the listener thread (in
+ <filename>src/slon/remote_listener.c</filename>) timed out when trying
+ to determine what events were outstanding for that node.  By default,
+ the query is given five minutes to complete; with many days' worth of
+ outstanding events, it may well take longer than that.
+  </para> </answer>
+ 
+ <answer><para> On versions of &slony1; before 1.1.7, 1.2.7, and 1.3, one answer is to increase the hard-coded timeout in
+ <filename>src/slon/remote_listener.c</filename>, recompile &lslon;, and retry.  </para> </answer>
+ 
+ <answer><para> Another answer is to treat the node as having failed:
+ use the &slonik; command <xref linkend="stmtdropnode"> to drop the
+ node, and then recreate it.  If the database is heavily updated, it
+ may well be cheaper to do this than to find a way to let the node
+ catch up.  </para> </answer>
+ 
+ <answer><para> Newer versions of &slony1; have a configuration
+ parameter, <xref linkend="slon-config-remote-listen-timeout">; raise
+ its value in the &lslon; config file and try again.  Of course, as
+ mentioned above, it could be faster to drop the node and recreate it
+ than to let it catch up across a week's worth of updates...
+ </para> </answer>
+ 
+ </qandaentry>
+ 
  </qandadiv>
  <qandadiv id="faqperformance"> <title> &slony1; FAQ: Performance Issues </title>

Index: loganalysis.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/loganalysis.sgml,v
retrieving revision 1.4.2.1
retrieving revision 1.4.2.2
diff -C2 -d -r1.4.2.1 -r1.4.2.2
*** loganalysis.sgml	30 Oct 2006 16:29:27 -0000	1.4.2.1
--- loganalysis.sgml	2 Feb 2007 20:23:46 -0000	1.4.2.2
***************
*** 610,613 ****
--- 610,634 ----
  normally...</para></listitem>

+ <listitem><para><command>ERROR: remoteListenThread_%d: timeout for event selection</command></para>
+ 
+ <para> This means that the listener thread
+ (<filename>src/slon/remote_listener.c</filename>) timed out when
+ trying to determine what events were outstanding for it.</para>
+ 
+ <para> This could occur because network connections broke, in which case restarting the &lslon; might help. </para>
+ 
+ <para> Alternatively, this might occur because the &lslon; for this
+ node has been broken for a long time, so that there is an enormous
+ number of entries in <envar>sl_event</envar> on this or other nodes
+ for the node to work through, and the query takes more than <xref
+ linkend="slon-config-remote-listen-timeout"> seconds to run.
+ In older versions of &slony1;, that configuration parameter did not
+ exist; the timeout was fixed at 300 seconds.  In newer versions, you
+ can raise the timeout in the &lslon; config file so that the query
+ can run to completion.  Then investigate why nobody was monitoring
+ things closely enough to notice that replication had been broken for
+ so long... </para>
+ </listitem>
+ 
  <listitem><para><command>ERROR: remoteWorkerThread_%d: cannot connect to data provider %d on 'dsn'</command></para>