Thu Apr 30 09:06:12 PDT 2009
Update of /home/cvsd/slony1/slony1-engine/doc/adminguide In directory main.slony.info:/tmp/cvs-serv3474 Modified Files: Tag: REL_1_2_STABLE addthings.sgml adminscripts.sgml bestpractices.sgml cluster.sgml concepts.sgml ddlchanges.sgml defineset.sgml dropthings.sgml failover.sgml faq.sgml filelist.sgml firstdb.sgml help.sgml installation.sgml intro.sgml legal.sgml listenpaths.sgml locking.sgml loganalysis.sgml logshipping.sgml maintenance.sgml monitoring.sgml partitioning.sgml prerequisites.sgml releasechecklist.sgml reshape.sgml slon.sgml slonconf.sgml slonik_ref.sgml slony.sgml slonyupgrade.sgml startslons.sgml subscribenodes.sgml supportedplatforms.sgml testbed.sgml usingslonik.sgml versionupgrade.sgml Log Message: Draw in doc updates from 2.0 branch Index: legal.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/legal.sgml,v retrieving revision 1.11 retrieving revision 1.11.2.1 diff -C2 -d -r1.11 -r1.11.2.1 *** legal.sgml 2 Aug 2006 18:34:58 -0000 1.11 --- legal.sgml 30 Apr 2009 16:06:10 -0000 1.11.2.1 *************** *** 2,6 **** <copyright> ! <year>2004-2006</year> <holder>The PostgreSQL Global Development Group</holder> </copyright> --- 2,6 ---- <copyright> ! <year>2004-2007</year> <holder>The PostgreSQL Global Development Group</holder> </copyright> *************** *** 10,14 **** <para> ! <productname>PostgreSQL</productname> is Copyright © 2004-2006 by the PostgreSQL Global Development Group and is distributed under the terms of the license of the University of California below. --- 10,14 ---- <para> ! <productname>PostgreSQL</productname> is Copyright © 2004-2007 by the PostgreSQL Global Development Group and is distributed under the terms of the license of the University of California below. Index: locking.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/locking.sgml,v retrieving revision 1.10 retrieving revision 1.10.2.1 diff -C2 -d -r1.10 -r1.10.2.1 *** locking.sgml 2 Aug 2006 18:34:59 -0000 1.10 --- locking.sgml 30 Apr 2009 16:06:10 -0000 1.10.2.1 *************** *** 14,18 **** can access <quote>old tuples.</quote> Most of the time, this allows the gentle user of &postgres; to not need to worry very much about ! locks. </para> <para> Unfortunately, there are several sorts of &slony1; events that --- 14,21 ---- can access <quote>old tuples.</quote> Most of the time, this allows the gentle user of &postgres; to not need to worry very much about ! locks. &slony1; configuration events normally grab locks on an ! internal table, <envar>sl_config_lock</envar>, which should not be ! visible to applications unless they are performing actions on &slony1; ! components. </para> <para> Unfortunately, there are several sorts of &slony1; events that Index: bestpractices.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/bestpractices.sgml,v retrieving revision 1.24.2.1 retrieving revision 1.24.2.2 diff -C2 -d -r1.24.2.1 -r1.24.2.2 *** bestpractices.sgml 16 Mar 2007 19:01:26 -0000 1.24.2.1 --- bestpractices.sgml 30 Apr 2009 16:06:10 -0000 1.24.2.2 *************** *** 104,110 **** <listitem><para> The system will periodically rotate (using <command>TRUNCATE</command> to clean out the old table) between the ! two log tables, <xref linkend="table.sl-log-1"> and <xref ! linkend="table.sl-log-2">, preventing unbounded growth of dead space ! there. 
</para></listitem> </itemizedlist> --- 104,109 ---- <listitem><para> The system will periodically rotate (using <command>TRUNCATE</command> to clean out the old table) between the ! two log tables, &sllog1; and &sllog2;, preventing unbounded growth of ! dead space there. </para></listitem> </itemizedlist> *************** *** 115,118 **** --- 114,122 ---- should be planned for ahead of time. </para> + <para> Most pointedly, any node that is expected to be a failover + target must have its subscription(s) set up with the option + <command>FORWARD = YES</command>. Otherwise, that node is not a + candidate for being promoted to origin node. </para> + <para> This may simply involve thinking about what the priority lists should be of what should fail to what, as opposed to trying to *************** *** 144,147 **** --- 148,160 ---- </listitem> + <listitem><para> If you are using the autovacuum process in recent + versions of &postgres;, you may wish to leave &slony1; tables out, as + &slony1; is a bit more intelligent about vacuuming when it is expected + to be conspicuously useful (<emphasis>e.g.</emphasis> - immediately + after purging old data) to do so than autovacuum can be. </para> + + <para> See <xref linkend="maintenance-autovac"> for more + details. </para> </listitem> + <listitem> <para> Running all of the &lslon; daemons on a central server for each network has proven preferable. </para> *************** *** 164,174 **** for managing so that the connection to that node is a <quote>local</quote> one. Do <emphasis>not</emphasis> run such links ! across a WAN. </para> ! <para> A WAN outage can leave database connections ! <quote>zombied</quote>, and typical TCP/IP behaviour <link ! linkend="multipleslonconnections"> will allow those connections to ! persist, preventing a slon restart for around two hours. </link> ! </para> <para> It is not difficult to remedy this; you need only <command>kill --- 177,189 ---- for managing so that the connection to that node is a <quote>local</quote> one. Do <emphasis>not</emphasis> run such links ! across a WAN. Thus, if you have nodes in London and nodes in New ! York, the &lslon;s managing London nodes should run in London, and the ! &lslon;s managing New York nodes should run in New York.</para> ! <para> A WAN outage (or flakiness of the WAN in general) can leave ! database connections <quote>zombied</quote>, and typical TCP/IP ! behaviour <link linkend="multipleslonconnections"> will allow those ! connections to persist, preventing a slon restart for around two ! hours. </link> </para> <para> It is not difficult to remedy this; you need only <command>kill *************** *** 193,200 **** scratch.</para> ! <para> The exception, where it is undesirable to restart a &lslon;, is ! where a <command>COPY_SET</command> is running on a large replication ! set, such that stopping the &lslon; may discard several hours worth of ! load work. </para> <para> In early versions of &slony1;, it was frequently the case that --- 208,215 ---- scratch.</para> ! <para> The exception scenario where it is undesirable to restart a ! &lslon; is where a <command>COPY_SET</command> is running on a large ! replication set, such that stopping the &lslon; may discard several ! hours worth of load work. </para> <para> In early versions of &slony1;, it was frequently the case that *************** *** 224,228 **** possibility that updates to this table can fail due to the introduced unique index, which means that &slony1; has introduced a new failure ! 
mode for your application.</para> </listitem> --- 239,249 ---- possibility that updates to this table can fail due to the introduced unique index, which means that &slony1; has introduced a new failure ! mode for your application. ! </para> ! ! <warning><para> In version 2 of &slony1;, <xref ! linkend="stmttableaddkey"> is no longer supported. You ! <emphasis>must</emphasis> have either a true primary key or a ! candidate primary key. </para></warning> </listitem> *************** *** 281,286 **** lock on them; doing so via <command>execute script</command> requires that &slony1; take out an exclusive lock on <emphasis>all</emphasis> ! replicated tables. This can prove quite inconvenient when ! applications are running; you run into deadlocks and such. </para> <para> One particularly dogmatic position that some hold is that --- 302,310 ---- lock on them; doing so via <command>execute script</command> requires that &slony1; take out an exclusive lock on <emphasis>all</emphasis> ! replicated tables. This can prove quite inconvenient if applications ! are running when running DDL; &slony1; is asking for those exclusive ! table locks, whilst, simultaneously, some application connections are ! gradually relinquishing locks, whilst others are backing up behind the ! &slony1; locks. </para> <para> One particularly dogmatic position that some hold is that *************** *** 428,433 **** </listitem> ! <listitem><para> Use <filename>test_slony_state.pl</filename> to look ! for configuration problems.</para> <para>This is a Perl script which connects to a &slony1; node and then --- 452,457 ---- </listitem> ! <listitem><para> Run &eststate; frequently to discover configuration ! problems as early as possible.</para> <para>This is a Perl script which connects to a &slony1; node and then *************** *** 443,446 **** --- 467,476 ---- tool can run through many of the possible problems for you. </para> + <para> It will also notice a number of sorts of situations where + something has broken. Not only should it be run when problems have + been noticed - it should be run frequently (<emphasis>e.g.</emphasis> - hourly, or thereabouts) as a general purpose <quote>health + check</quote> for each &slony1; cluster. </para> + </listitem> *************** *** 491,494 **** --- 521,533 ---- user out of the new subscriber because: </para> + + <para> It is also a very good idea to change &lslon; configuration for + <xref linkend="slon-config-sync-interval"> on the origin node to + reduce how many <command>SYNC</command> events are generated. If the + subscription takes 8 hours, there is little sense in there being 28800 + <command>SYNC</command>s waiting to be applied. Running a + <command>SYNC</command> every minute or so is likely to make catching + up easier.</para> + </listitem> </itemizedlist> *************** *** 575,580 **** <para> There will correspondingly be an <emphasis>enormous</emphasis> ! growth of <xref linkend="table.sl-log-1"> and <xref ! linkend="table.sl-seqlog">. Unfortunately, once the <command>COPY_SET</command> completes, users have found that the queries against these tables wind up reverting to <command>Seq --- 614,618 ---- <para> There will correspondingly be an <emphasis>enormous</emphasis> ! growth of &sllog1;, &sllog2;, and &slseqlog;. Unfortunately, once the <command>COPY_SET</command> completes, users have found that the queries against these tables wind up reverting to <command>Seq *************** *** 599,605 **** the exact form that the index setup should take. </para> !
<para> In 1.2, there is a process that runs automatically to add ! partial indexes by origin node number, which should be the optimal ! form for such an index to take. </para> </listitem> --- 637,643 ---- the exact form that the index setup should take. </para> ! <para> In 1.2 and later versions, there is a process that runs ! automatically to add partial indexes by origin node number, which ! should be the optimal form for such an index to take. </para> </listitem> Index: slonik_ref.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonik_ref.sgml,v retrieving revision 1.61.2.13 retrieving revision 1.61.2.14 diff -C2 -d -r1.61.2.13 -r1.61.2.14 *** slonik_ref.sgml 19 Jun 2008 20:34:00 -0000 1.61.2.13 --- slonik_ref.sgml 30 Apr 2009 16:06:10 -0000 1.61.2.14 *************** *** 49,55 **** The slonik command language is format free. Commands begin with keywords and are terminated with a semicolon. Most commands have ! a list of parameters, some of which have default values and are ! therefore optional. The parameters of commands are enclosed in ! parentheses. Each option consists of one or more keywords, followed by an equal sign, followed by a value. Multiple options inside the parentheses are separated by commas. All keywords are --- 49,55 ---- The slonik command language is format free. Commands begin with keywords and are terminated with a semicolon. Most commands have [...1308 lines suppressed...] + <para> + This completes the work done by <xref + linkend="stmtcloneprepare">, establishing confirmation data for + the new <quote>clone</quote> based on the status found for the + <quote>provider</quote> node. + </para> + </Refsect1> + <Refsect1><Title>Example</Title> + <Programlisting> + clone finish (id = 33, provider = 22); + </Programlisting> + </Refsect1> + <refsect1> <title> Version Information </title> + <para> This command was introduced in &slony1; 2.0. </para> + </refsect1> + </Refentry> + + </reference> <!-- Keep this comment at the end of the file Index: subscribenodes.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/subscribenodes.sgml,v retrieving revision 1.16 retrieving revision 1.16.2.1 diff -C2 -d -r1.16 -r1.16.2.1 *** subscribenodes.sgml 2 Aug 2006 18:34:59 -0000 1.16 --- subscribenodes.sgml 30 Apr 2009 16:06:10 -0000 1.16.2.1 *************** *** 94,98 **** <screen> ! 2005-04-13 07:11:28 PDT ERROR remoteWorkerThread_11: "declare LOG cursor for select log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata from "_T1".sl_log_1 where log_origin = 11 and --- 94,98 ---- <screen> ! 2007-04-13 07:11:28 PDT ERROR remoteWorkerThread_11: "declare LOG cursor for select log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata from "_T1".sl_log_1 where log_origin = 11 and Index: loganalysis.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/loganalysis.sgml,v retrieving revision 1.4.2.5 retrieving revision 1.4.2.6 diff -C2 -d -r1.4.2.5 -r1.4.2.6 *** loganalysis.sgml 22 Oct 2007 20:47:48 -0000 1.4.2.5 --- loganalysis.sgml 30 Apr 2009 16:06:10 -0000 1.4.2.6 *************** *** 26,33 **** </screen></para></sect2> <sect2><title>DEBUG Notices</title> ! <para>Debug notices are always prefaced by the name of the thread that ! the notice originates from. 
You will see messages from the following threads: --- 26,48 ---- </screen></para></sect2> + <sect2><title>INFO notices</title> + + <para> Events that take place that seem like they will generally be of + interest are recorded at the INFO level, and, just as with CONFIG + notices, are always listed. </para> + + </sect2> + <sect2><title>DEBUG Notices</title> ! <para>Debug notices are of less interest, and will quite likely only ! need to be shown if you are running into some problem with &slony1;.</para> ! ! </sect2> ! ! <sect2><title>Thread name </title> ! ! <para> Notices are always prefaced by the name of the thread from ! which the notice originates. You will see messages from the following threads: *************** *** 60,68 **** </para> ! <para> How much information they display is controlled by ! the <envar>log_level</envar> &lslon; parameter; ! ERROR/WARN/CONFIG/INFO messages will always be displayed, while ! choosing increasing values from 1 to 4 will lead to additional DEBUG ! level messages being displayed. </para> </sect2> --- 75,83 ---- <para> How much information they display is controlled by the ! <envar>log_level</envar> &lslon; parameter; ERROR/WARN/CONFIG/INFO ! messages will always be displayed, while choosing increasing values ! from 1 to 4 will lead to additional DEBUG level messages being ! displayed. </para> </sect2> *************** *** 177,185 **** <para> This section lists numerous of the error messages found in &slony1;, along with a brief explanation of implications. It is a ! fairly well comprehensive list, leaving out mostly some of ! the <command>DEBUG4</command> messages that are generally uninteresting.</para> ! <sect3 id="logshiplog"><title> Log Messages Associated with Log Shipping </title> <para> Most of these represent errors that come up if --- 192,201 ---- <para> This section lists numerous of the error messages found in &slony1;, along with a brief explanation of implications. It is a ! fairly comprehensive list, only leaving out some of the ! <command>DEBUG4</command> messages that are almost always uninteresting.</para> ! <sect3 id="logshiplog"><title> Log Messages Associated with Log ! Shipping </title> <para> Most of these represent errors that come up if --- 1046,1050 ---- <listitem><para><command>WARN: remoteWorkerThread_%d: event %d ignored - origin inactive</command></para> ! <para> This shouldn't occur now (2007) as we don't support the notion of deactivating a node... </para> </listitem> *************** *** 1044,1047 **** --- 1060,1072 ---- of <command>STORE_NODE</command> requests not propagating... </para> </listitem> + + <listitem><para><command>insert or update on table "sl_path" violates + foreign key constraint "pa_client-no_id-ref". DETAIL: Key + (pa_client)=(2) is not present on table "s1_node</command></para> + + <para> This happens if you try to do <xref linkend="stmtsubscribeset"> + when the node is unaware of a would-be new node; probably a sign of + <command>STORE_NODE</command> and <command>STORE_PATH</command> + requests not propagating...
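+ A quick check for this situation (a sketch; the cluster schema name
+ <quote>_mycluster</quote> is hypothetical) is to verify, on the
+ provider node, that the would-be subscriber is already visible in
+ both tables before issuing the subscription:
+ <programlisting>
+ -- run against the provider; the new node must appear in both
+ select no_id, no_active, no_comment from _mycluster.sl_node;
+ select pa_server, pa_client, pa_conninfo from _mycluster.sl_path;
+ </programlisting>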
</para> </listitem> </itemizedlist> </sect3> Index: slonconf.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonconf.sgml,v retrieving revision 1.14.2.4 retrieving revision 1.14.2.5 diff -C2 -d -r1.14.2.4 -r1.14.2.5 *** slonconf.sgml 7 May 2008 19:26:33 -0000 1.14.2.4 --- slonconf.sgml 30 Apr 2009 16:06:10 -0000 1.14.2.5 *************** *** 87,96 **** </indexterm> <listitem> ! <para>Debug log level (higher value ==> more output). Range: [0,4], default 2</para> <para> There are <link linkend="nineloglevels">nine log ! message types</link>; using this option, some or all of ! the <quote>debugging</quote> levels may be left out of the ! slon logs. </para> </listitem> --- 87,101 ---- </indexterm> <listitem> ! <para>Debug log level (higher value ==> more output). Range: [0,4], default 0</para> <para> There are <link linkend="nineloglevels">nine log ! message types</link>; using this option, some or all of the ! <quote>debugging</quote> levels may be left out of the slon ! logs. In &slony1; version 2, a lot of log message levels have ! been revised in an attempt to ensure the <quote>interesting ! stuff</quote> comes in at CONFIG/INFO levels, so that you ! could run at level 0, omitting all of the <quote>DEBUG</quote> ! messages, and still have meaningful contents in the ! logs. </para> </listitem> *************** *** 118,127 **** appear in each log line entry. </para> </listitem> </varlistentry> - - - <varlistentry id="slon-config-logging-log-timestamp-format" xreflabel="slon_conf_log_timestamp_format"> <term><varname>log_timestamp_format</varname> (<type>string</type>)</term> --- 123,135 ---- appear in each log line entry. </para> + + <para> Note that if <envar>syslog</envar> usage is configured, + then this is ignored; it is assumed that + <application>syslog</application> will be supplying + timestamps, and timestamps are therefore suppressed. + </para> </listitem> </varlistentry> <varlistentry id="slon-config-logging-log-timestamp-format" xreflabel="slon_conf_log_timestamp_format"> <term><varname>log_timestamp_format</varname> (<type>string</type>)</term> *************** *** 267,270 **** --- 275,285 ---- Range: [10-60000], default 100 </para> + + <para> This parameter is primarily of concern on nodes that + originate replication sets. On a non-origin node, there + will never be update activity that would induce a SYNC; + instead, the timeout value described below will induce a + SYNC every so often <emphasis>despite absence of changes to + replicate.</emphasis> </para> </listitem> </varlistentry> *************** *** 293,296 **** --- 308,346 ---- default 1000 </para> + + <para> This parameter is likely to be primarily of concern on + nodes that originate replication sets, though it does affect + how often events are generated on other nodes.</para> + + <para> + On a non-origin node, there never is activity to cause a + SYNC to get generated; as a result, there will be a SYNC + generated every <envar>sync_interval_timeout</envar> + milliseconds. There are no subscribers looking for those + SYNCs, so these events do not lead to any replication + activity. They will, however, clutter sl_event up a little, + so it would be undesirable for this timeout value to be set + too terribly low. 120000ms represents 2 minutes, which is + not a terrible value. 
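+ For instance, a &lslon; runtime configuration along the following
+ lines would reflect that advice (the values shown are illustrative,
+ not recommendations):
+ <programlisting>
+ # look for replicable activity roughly every 2 seconds
+ sync_interval = 2000
+ # but generate a SYNC at least every 2 minutes, even when idle
+ sync_interval_timeout = 120000
+ </programlisting>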
+ </para> + + <para> The two values function together in varying ways: </para> + + <para> On an origin node, <envar>sync_interval</envar> is + the <emphasis>minimum</emphasis> time period that will be + covered by a SYNC, and during periods of heavy application + activity, it may be that a SYNC is being generated + every <envar>sync_interval</envar> milliseconds. </para> + + <para> On that same origin node, there may be quiet intervals, + when no replicatable changes are being submitted. A SYNC will + be induced, anyways, + every <envar>sync_interval_timeout</envar> + milliseconds. </para> + + <para> On a subscriber node that does not originate any sets, + only the <quote>timeout-induced</quote> SYNCs will + occur. </para> + </listitem> </varlistentry> *************** *** 302,317 **** </indexterm> <listitem> <para> ! Maximum number of <command>SYNC</command> events to group ! together when/if a subscriber falls behind. ! <command>SYNC</command>s are batched only if there are that ! many available and if they are contiguous. Every other event ! type in between leads to a smaller batch. And if there is ! only one <command>SYNC</command> available, even ! <option>-g60</option> will apply just that one. As soon as a ! subscriber catches up, it will apply every single ! <command>SYNC</command> by itself. Range: [0,10000], default: ! 20 </para> </listitem> </varlistentry> --- 352,372 ---- </indexterm> <listitem> + <para> ! Maximum number of <command>SYNC</command> events that a ! subscriber node will group together when/if a subscriber ! falls behind. <command>SYNC</command>s are batched only if ! there are that many available and if they are ! contiguous. Every other event type in between leads to a ! smaller batch. And if there is only ! one <command>SYNC</command> available, even though you used ! <option>-g600</option>, the &lslon; will apply just the one ! that is available. As soon as a subscriber catches up, it ! will tend to apply each ! <command>SYNC</command> by itself, as a singleton, unless ! processing should fall behind for some reason. Range: ! [0,10000], default: 20 </para> + </listitem> </varlistentry> *************** *** 331,334 **** --- 386,420 ---- </listitem> </varlistentry> + + + <varlistentry id="slon-config-cleanup-interval" xreflabel="slon_config_cleanup_interval"> + <term><varname>cleanup_interval</varname> (<type>interval</type>)</term> + <indexterm> + <primary><varname>cleanup_interval</varname> configuration parameter</primary> + </indexterm> + <listitem> + <para> + Controls how quickly old events are trimmed out. That + subsequently controls when the data in the log tables, + <envar>sl_log_1</envar> and <envar>sl_log_2</envar>, get + trimmed out. Default: '10 minutes'. + </para> + </listitem> + </varlistentry> + + <varlistentry id="slon-config-cleanup-deletelogs" xreflabel="slon_conf_cleanup_deletelogs"> + <term><varname>cleanup_deletelogs</varname> (<type>boolean</type>)</term> + <indexterm> + <primary><varname>cleanup_deletelogs</varname> configuration parameter</primary> + </indexterm> + <listitem> + <para> + Controls whether or not we use DELETE to trim old data from the log tables, + <envar>sl_log_1</envar> and <envar>sl_log_2</envar>. + Default: false + </para> + </listitem> + </varlistentry> + <varlistentry id="slon-config-desired-sync-time" xreflabel="desired_sync_time"> <term><varname>desired_sync_time</varname> (<type>integer</type>)</term> *************** *** 443,447 **** </indexterm> <listitem> ! 
<para>How long, in milliseconds should the remote listener wait before treating the event selection criteria as having timed out? Range: [30-30000], default 300ms </para> --- 529,533 ---- </indexterm> <listitem> ! <para>How long, in milliseconds, should the remote listener wait before treating the event selection criteria as having timed out? Range: [30-30000], default 300ms </para> Index: filelist.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/filelist.sgml,v retrieving revision 1.18.2.1 retrieving revision 1.18.2.2 diff -C2 -d -r1.18.2.1 -r1.18.2.2 *** filelist.sgml 5 Sep 2007 21:36:31 -0000 1.18.2.1 --- filelist.sgml 30 Apr 2009 16:06:10 -0000 1.18.2.2 *************** *** 45,49 **** --- 45,51 ---- <!entity slonyupgrade SYSTEM "slonyupgrade.sgml"> <!entity releasechecklist SYSTEM "releasechecklist.sgml"> + <!entity raceconditions SYSTEM "raceconditions.sgml"> <!entity partitioning SYSTEM "partitioning.sgml"> + <!entity triggers SYSTEM "triggers.sgml"> <!-- back matter --> Index: reshape.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/reshape.sgml,v retrieving revision 1.20.2.1 retrieving revision 1.20.2.2 diff -C2 -d -r1.20.2.1 -r1.20.2.2 *** reshape.sgml 22 Oct 2007 20:50:55 -0000 1.20.2.1 --- reshape.sgml 30 Apr 2009 16:06:10 -0000 1.20.2.2 *************** *** 40,43 **** --- 40,48 ---- about <xref linkend="stmtstorelisten">.</para></listitem> + <listitem><para> After performing the configuration change, you + should, as <xref linkend="bestpractices">, run the &eststate; + scripts in order to validate that the cluster state remains in good + order after this change. </para> </listitem> + </itemizedlist> </para> Index: monitoring.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/monitoring.sgml,v retrieving revision 1.29.2.8 retrieving revision 1.29.2.9 diff -C2 -d -r1.29.2.8 -r1.29.2.9 *** monitoring.sgml 11 Jun 2007 16:01:33 -0000 1.29.2.8 --- monitoring.sgml 30 Apr 2009 16:06:10 -0000 1.29.2.9 *************** *** 5,8 **** --- 5,168 ---- <indexterm><primary>monitoring &slony1;</primary></indexterm> + <para> As a prelude to the discussion, it is worth pointing out that + since the bulk of &slony1; functionality is implemented via running + database functions and SQL queries against tables within a &slony1; + schema, most of the things that one might want to monitor about + replication may be found by querying tables in the schema created for + the cluster in each database in the cluster. </para> + + <para> Here are some of the tables that contain information likely to + be particularly interesting from a monitoring and diagnostic + perspective.</para> + + <glosslist> + <glossentry><glossterm><envar>sl_status</envar></glossterm> + + <glossdef><para>This view is the first, most obviously useful thing to + look at from a monitoring perspective. It looks at the local node's + events, and checks to see how quickly they are being confirmed on + other nodes.</para> + + <para> The view is primarily useful to run against an origin + (<quote>master</quote>) node, as it is only there where the events + generated are generally expected to require interesting work to be + done. The events generated on non-origin nodes tend to + be <command>SYNC</command> events that require no replication work be + done, and that are nearly no-ops, as a + result.
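+ A representative query (assuming, hypothetically, a cluster named
+ <quote>mycluster</quote>, and hence a schema named
+ <quote>_mycluster</quote>) might be:
+ <programlisting>
+ -- per receiving node: events not yet confirmed, and wall-clock lag
+ select st_received, st_lag_num_events, st_lag_time
+ from _mycluster.sl_status;
+ </programlisting>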
</para></glossdef></glossentry> + + <glossentry><glossterm>&slconfirm;</glossterm> + + <glossdef><para>Contains confirmations of replication events; this may be used to infer which events have, and <emphasis>have not</emphasis> been processed.</para></glossdef></glossentry> + + <glossentry><glossterm>&slevent;</glossterm> + <glossdef><para>Contains information about the replication events processed on the local node. </para></glossdef></glossentry> + + <glossentry><glossterm> + &sllog1; + and + &sllog2; + </glossterm> + + <glossdef><para>These tables contain replicable data. On an origin node, this is the <quote>queue</quote> of data that has not necessarily been replicated everywhere. By examining the table, you may examine the details of what data is replicable. </para></glossdef></glossentry> + + <glossentry><glossterm>&slnode;</glossterm> + <glossdef><para>The list of nodes in the cluster.</para></glossdef></glossentry> + + <glossentry><glossterm>&slpath;</glossterm> + <glossdef><para>This table holds connection information indicating how &lslon; processes are to connect to remote nodes, whether to access events, or to request replication data. </para></glossdef></glossentry> + + <glossentry><glossterm>&sllisten;</glossterm> + + <glossdef><para>This configuration table indicates how nodes listen + for events coming from other nodes. Usually this is automatically + populated; generally you can detect configuration problems by this + table being <quote>underpopulated.</quote> </para></glossdef></glossentry> + + <glossentry><glossterm>&slregistry;</glossterm> + + <glossdef><para>A configuration table that may be used to store + miscellaneous runtime data. Presently used only to manage switching + between the two log tables. </para></glossdef></glossentry> + + <glossentry><glossterm>&slseqlog;</glossterm> + + <glossdef><para>Contains the <quote>last value</quote> of replicated + sequences.</para></glossdef></glossentry> + + <glossentry><glossterm>&slset;</glossterm> + + <glossdef><para>Contains definition information for replication sets, + which is the mechanism used to group together related replicable + tables and sequences.</para></glossdef></glossentry> + + <glossentry><glossterm>&slsetsync;</glossterm> + <glossdef><para>Contains information about the state of synchronization of each replication set, including transaction snapshot data.</para></glossdef></glossentry> + + <glossentry><glossterm>&slsubscribe;</glossterm> + <glossdef><para>Indicates what subscriptions are in effect for each replication set.</para></glossdef></glossentry> + + <glossentry><glossterm>&sltable;</glossterm> + <glossdef><para>Contains the list of tables being replicated.</para></glossdef></glossentry> + + </glosslist> + + <sect2 id="testslonystate"> <title> test_slony_state</title> + + <indexterm><primary>script test_slony_state to test replication state</primary></indexterm> + + <para> This invaluable script does various sorts of analysis of the + state of a &slony1; cluster. &slony1; <xref linkend="bestpractices"> + recommend running these scripts frequently (hourly seems suitable) to + find problems as early as possible. </para> + + <para> You specify arguments including <option>database</option>, + <option>host</option>, <option>user</option>, + <option>cluster</option>, <option>password</option>, and + <option>port</option> to connect to any of the nodes on a cluster. 
+ You also specify a <option>mailprog</option> command (which should be + a program equivalent to <productname>Unix</productname> + <application>mailx</application>) and a recipient of email. </para> + + <para> You may alternatively specify database connection parameters + via the environment variables used by + <application>libpq</application>, <emphasis>e.g.</emphasis> - using + <envar>PGPORT</envar>, <envar>PGDATABASE</envar>, + <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para> + + <para> The script then rummages through <xref linkend="table.sl-path"> + to find all of the nodes in the cluster, and the DSNs to allow it to, + in turn, connect to each of them.</para> + + <para> For each node, the script examines the state of things, + including such things as: + + <itemizedlist> + <listitem><para> Checking <xref linkend="table.sl-listen"> for some + <quote>analytically determinable</quote> problems. It lists paths + that are not covered.</para></listitem> + + <listitem><para> Providing a summary of events by origin node</para> + + <para> If a node hasn't submitted any events in a while, that likely + suggests a problem.</para></listitem> + + <listitem><para> Summarizes the <quote>aging</quote> of table <xref + linkend="table.sl-confirm"> </para> + + <para> If one or another of the nodes in the cluster hasn't reported + back recently, that tends to lead to cleanups of tables like &sllog1;, + &sllog2; and &slseqlog; not taking place.</para></listitem> + + <listitem><para> Summarizes what transactions have been running for a + long time</para> + + <para> This only works properly if the statistics collector is + configured to collect command strings, as controlled by the option + <option> stats_command_string = true </option> in <filename> + postgresql.conf </filename>.</para> + + <para> If you have broken applications that hold connections open, + this will find them.</para> + + <para> If you have broken applications that hold connections open, + that has several unsalutory effects as <link + linkend="longtxnsareevil"> described in the + FAQ</link>.</para></listitem> + + </itemizedlist></para> + + <para> The script does some diagnosis work based on parameters in the + script; if you don't like the values, pick your favorites!</para> + + <note><para> Note that there are two versions, one using the + <quote>classic</quote> <filename>Pg.pm</filename> Perl module for + accessing &postgres; databases, and one, with <filename>dbi</filename> + in its name, that uses the newer Perl <function> DBI</function> + interface. It is likely going to be easier to find packaging for + <function>DBI</function>. </para> </note> + + </sect2> + <sect2> <title> &nagios; Replication Checks </title> *************** *** 95,166 **** Options[db_replication_lagtime]: gauge,nopercent,growright </programlisting> - </sect2> - - <sect2 id="testslonystate"> <title> test_slony_state</title> - - <indexterm><primary>script test_slony_state to test replication state</primary></indexterm> - - <para> This script does various sorts of analysis of the state of a - &slony1; cluster.</para> - - <para> You specify arguments including <option>database</option>, - <option>host</option>, <option>user</option>, - <option>cluster</option>, <option>password</option>, and - <option>port</option> to connect to any of the nodes on a cluster. - You also specify a <option>mailprog</option> command (which should be - a program equivalent to <productname>Unix</productname> - <application>mailx</application>) and a recipient of email. 
</para> - - <para> You may alternatively specify database connection parameters - via the environment variables used by - <application>libpq</application>, <emphasis>e.g.</emphasis> - using - <envar>PGPORT</envar>, <envar>PGDATABASE</envar>, - <envar>PGUSER</envar>, <envar>PGSERVICE</envar>, and such.</para> - - <para> The script then rummages through <xref linkend="table.sl-path"> - to find all of the nodes in the cluster, and the DSNs to allow it to, - in turn, connect to each of them.</para> - - <para> For each node, the script examines the state of things, - including such things as: ! <itemizedlist> ! <listitem><para> Checking <xref linkend="table.sl-listen"> for some ! <quote>analytically determinable</quote> problems. It lists paths ! that are not covered.</para></listitem> ! ! <listitem><para> Providing a summary of events by origin node</para> ! ! <para> If a node hasn't submitted any events in a while, that likely ! suggests a problem.</para></listitem> ! ! <listitem><para> Summarizes the <quote>aging</quote> of table <xref ! linkend="table.sl-confirm"> </para> ! <para> If one or another of the nodes in the cluster hasn't reported ! back recently, that tends to lead to cleanups of tables like <xref ! linkend="table.sl-log-1"> and <xref linkend="table.sl-seqlog"> not ! taking place.</para></listitem> ! <listitem><para> Summarizes what transactions have been running for a ! long time</para> ! <para> This only works properly if the statistics collector is ! configured to collect command strings, as controlled by the option ! <option> stats_command_string = true </option> in <filename> ! postgresql.conf </filename>.</para> ! <para> If you have broken applications that hold connections open, ! this will find them.</para> ! <para> If you have broken applications that hold connections open, ! that has several unsalutory effects as <link ! linkend="longtxnsareevil"> described in the ! FAQ</link>.</para></listitem> - </itemizedlist></para> ! <para> The script does some diagnosis work based on parameters in the ! script; if you don't like the values, pick your favorites!</para> </sect2> --- 255,292 ---- Options[db_replication_lagtime]: gauge,nopercent,growright </programlisting> ! <para> Alternatively, Ismail Yenigul points out how he managed to ! monitor slony using <application>MRTG</application> without installing ! <application>SNMPD</application>.</para> ! <para> Here is the mrtg configuration</para> ! <programlisting> ! Target[db_replication_lagtime]:`/bin/snmpReplicationLagTime.sh 2` ! MaxBytes[db_replication_lagtime]: 400000000 ! Title[db_replication_lagtime]: db: replication lag time ! PageTop[db_replication_lagtime]: <H1>db: replication lag time</H1> ! Options[db_replication_lagtime]: gauge,nopercent,growright ! </programlisting> ! <para> and here is the modified version of the script</para> ! <programlisting> ! # cat /bin/snmpReplicationLagTime.sh ! #!/bin/bash ! output=`/usr/bin/psql -U slony -h 192.168.1.1 -d endersysecm -qAt -c ! "select cast(extract(epoch from st_lag_time) as int8) FROM _mycluster.sl_status WHERE st_received = $1"` ! echo $output ! echo $output ! echo ! echo ! # end of script# ! </programlisting> ! <note><para> MRTG expects four lines from the script, and since there ! are only two lines provided, the output must be padded to four ! lines. </para> </note> </sect2> *************** *** 194,198 **** <filename>tools</filename>, may be used to generate a cluster summary compatible with the popular <ulink url="http://www.mediawiki.org/"> ! 
MediaWiki </ulink> software. </para> <para> The gentle user might use the script as follows: </para> --- 320,330 ---- <filename>tools</filename>, may be used to generate a cluster summary compatible with the popular <ulink url="http://www.mediawiki.org/"> ! MediaWiki </ulink> software. Note that the ! <option>--categories</option> permits the user to specify a set of ! (comma-delimited) categories with which to associate the output. If ! you have a series of &slony1; clusters, passing in the option ! <option>--categories=slony1</option> leads to the MediaWiki instance ! generating a category page listing all &slony1; clusters so ! categorized on the wiki. </para> <para> The gentle user might use the script as follows: </para> *************** *** 201,205 **** ~/logtail.en> mvs login -d mywiki.example.info -u "Chris Browne" -p `cat ~/.wikipass` -w wiki/index.php Doing login with host: logtail and lang: en ! ~/logtail.en> perl $SLONYHOME/tools/mkmediawiki.pl --host localhost --database slonyregress1 --cluster slony_regress1 > Slony_replication.wiki ~/logtail.en> mvs commit -m "More sophisticated generated Slony-I cluster docs" Slony_replication.wiki Doing commit Slony_replication.wiki with host: logtail and lang: en --- 333,337 ---- ~/logtail.en> mvs login -d mywiki.example.info -u "Chris Browne" -p `cat ~/.wikipass` -w wiki/index.php Doing login with host: logtail and lang: en ! ~/logtail.en> perl $SLONYHOME/tools/mkmediawiki.pl --host localhost --database slonyregress1 --cluster slony_regress1 --categories=Slony-I > Slony_replication.wiki ~/logtail.en> mvs commit -m "More sophisticated generated Slony-I cluster docs" Slony_replication.wiki Doing commit Slony_replication.wiki with host: logtail and lang: en *************** *** 213,216 **** --- 345,424 ---- </sect2> + + <sect2> <title> Analysis of a SYNC </title> + + <para> The following is (as of 2.0) an extract from the &lslon; log for node + #2 in a run of <quote>test1</quote> from the <xref linkend="testbed">. </para> + + <screen> + DEBUG2 remoteWorkerThread_1: SYNC 19 processing + INFO about to monitor_subscriber_query - pulling big actionid list 134885072 + INFO remoteWorkerThread_1: syncing set 1 with 4 table(s) from provider 1 + DEBUG2 ssy_action_list length: 0 + DEBUG2 remoteWorkerThread_1: current local log_status is 0 + DEBUG2 remoteWorkerThread_1_1: current remote log_status = 0 + DEBUG1 remoteHelperThread_1_1: 0.028 seconds delay for first row + DEBUG1 remoteHelperThread_1_1: 0.978 seconds until close cursor + INFO remoteHelperThread_1_1: inserts=144 updates=1084 deletes=0 + INFO remoteWorkerThread_1: sync_helper timing: pqexec (s/count)- provider 0.063/6 - subscriber 0.000/6 + INFO remoteWorkerThread_1: sync_helper timing: large tuples 0.315/288 + DEBUG2 remoteWorkerThread_1: cleanup + INFO remoteWorkerThread_1: SYNC 19 done in 1.272 seconds + INFO remoteWorkerThread_1: SYNC 19 sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber 0.004/1 - IUD 0.972/248 + </screen> + + <para> Here are some notes to interpret this output: </para> + + <itemizedlist> + <listitem><para> Note the line that indicates <screen>inserts=144 updates=1084 deletes=0</screen> </para> + <para> This indicates how many tuples were affected by this particular SYNC. </para></listitem> + <listitem><para> Note the line indicating <screen>0.028 seconds delay for first row</screen></para> + + <para> This indicates the time it took for the <screen>LOG + cursor</screen> to get to the point of processing the first row of + data. 
Normally, this takes a long time if the SYNC is a large one, + and one requiring sorting of a sizable result set.</para></listitem> + + <listitem><para> Note the line indicating <screen>0.978 seconds until + close cursor</screen></para> + + <para> This indicates how long processing took against the + provider.</para></listitem> + + <listitem><para> sync_helper timing: large tuples 0.315/288 </para> + + <para> This breaks off, as a separate item, the number of large tuples + (<emphasis>e.g.</emphasis> - where size exceeded the configuration + parameter <xref linkend="slon-config-max-rowsize">) and where the + tuples had to be processed individually. </para></listitem> + + <listitem><para> <screen>SYNC 19 done in 1.272 seconds</screen></para> + + <para> This indicates that it took 1.272 seconds, in total, to process + this set of SYNCs. </para> + </listitem> + + <listitem><para> <screen>SYNC 19 sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber 0.004/0 - IUD 0.972/248</screen></para> + + <para> This records information about how many queries were issued + against providers and subscribers in function + <function>sync_event()</function>, and how long they took. </para> + + <para> Note that 248 does not match against the numbers of inserts, + updates, and deletes, described earlier, as I/U/D requests are + clustered into groups of queries that are submitted via a single + <function>pqexec()</function> call on the + subscriber. </para></listitem> + + <listitem><para> <screen>sync_helper timing: pqexec (s/count)- provider 0.063/6 - subscriber 0.000/6</screen></para> + + <para> This records information about how many queries were issued + against providers and subscribers in function + <function>sync_helper()</function>, and how long they took. + </para></listitem> + + </itemizedlist> + + </sect2> </sect1> <!-- Keep this comment at the end of the file Index: usingslonik.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/usingslonik.sgml,v retrieving revision 1.18.2.1 retrieving revision 1.18.2.2 diff -C2 -d -r1.18.2.1 -r1.18.2.2 *** usingslonik.sgml 11 Jun 2007 16:01:33 -0000 1.18.2.1 --- usingslonik.sgml 30 Apr 2009 16:06:10 -0000 1.18.2.2 *************** *** 112,123 **** try { - table add key (node id = 1, fully qualified name = - 'public.history'); - } - on error { - exit 1; - } - - try { create set (id = 1, origin = 1, comment = 'Set 1 - pgbench tables'); --- 112,115 ---- *************** *** 133,137 **** set add table (set id = 1, origin = 1, id = 4, fully qualified name = 'public.history', ! key = serial, comment = 'Table accounts'); } on error { --- 125,129 ---- set add table (set id = 1, origin = 1, id = 4, fully qualified name = 'public.history', ! comment = 'Table accounts'); } on error { *************** *** 173,182 **** $PREAMBLE try { - table add key (node id = $origin, fully qualified name = - 'public.history'); - } on error { - exit 1; - } - try { create set (id = $mainset, origin = $origin, comment = 'Set $mainset - pgbench tables'); --- 165,168 ---- *************** *** 192,196 **** set add table (set id = $mainset, origin = $origin, id = 4, fully qualified name = 'public.history', ! key = serial, comment = 'Table accounts'); } on error { exit 1; --- 178,182 ---- set add table (set id = $mainset, origin = $origin, id = 4, fully qualified name = 'public.history', ! 
comment = 'Table accounts'); } on error { exit 1; *************** *** 222,231 **** $PREAMBLE try { - table add key (node id = $origin, fully qualified name = - 'public.history'); - } on error { - exit 1; - } - try { create set (id = $mainset, origin = $origin, comment = 'Set $mainset - pgbench tables'); --- 208,211 ---- Index: listenpaths.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/listenpaths.sgml,v retrieving revision 1.19.2.1 retrieving revision 1.19.2.2 diff -C2 -d -r1.19.2.1 -r1.19.2.2 *** listenpaths.sgml 11 Jun 2007 16:01:33 -0000 1.19.2.1 --- listenpaths.sgml 30 Apr 2009 16:06:10 -0000 1.19.2.2 *************** *** 26,32 **** <emphasis>all</emphasis> nodes in order to be able to conclude that <command>sync</command>s have been received everywhere, and that, ! therefore, entries in <xref linkend="table.sl-log-1"> and <xref ! linkend="table.sl-log-2"> have been applied everywhere, and can ! therefore be purged. this extra communication is needful so <productname>Slony-I</productname> is able to shift origins to other locations.</para> --- 26,32 ---- <emphasis>all</emphasis> nodes in order to be able to conclude that <command>sync</command>s have been received everywhere, and that, ! therefore, entries in &sllog1; and &sllog2; have been applied ! everywhere, and can therefore be purged. This extra communication is ! needful so <productname>Slony-I</productname> is able to shift origins to other locations.</para> Index: help.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/help.sgml,v retrieving revision 1.18.2.2 retrieving revision 1.18.2.3 diff -C2 -d -r1.18.2.2 -r1.18.2.3 *** help.sgml 16 Mar 2007 19:01:26 -0000 1.18.2.2 --- help.sgml 30 Apr 2009 16:06:10 -0000 1.18.2.3 *************** *** 11,17 **** <listitem><para> Before submitting questions to any public forum as to why <quote>something mysterious</quote> has happened to your ! replication cluster, please run the <xref linkend="testslonystate"> ! tool. It may give some clues as to what is wrong, and the results are ! likely to be of some assistance in analyzing the problem. </para> </listitem> --- 11,18 ---- <listitem><para> Before submitting questions to any public forum as to why <quote>something mysterious</quote> has happened to your ! replication cluster, be sure to run the &eststate; tool and be ! prepared to provide its output. It may give some clues as to what is ! wrong, and the results are likely to be of some assistance in ! analyzing the problem. </para> </listitem> Index: concepts.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/concepts.sgml,v retrieving revision 1.20 retrieving revision 1.20.2.1 diff -C2 -d -r1.20 -r1.20.2.1 *** concepts.sgml 2 Aug 2006 18:34:57 -0000 1.20 --- concepts.sgml 30 Apr 2009 16:06:10 -0000 1.20.2.1 *************** *** 41,45 **** <para>The cluster name is specified in each and every Slonik script via the directive:</para> <programlisting> ! cluster name = 'something'; </programlisting> --- 41,45 ---- <para>The cluster name is specified in each and every Slonik script via the directive:</para> <programlisting> !
cluster name = something; </programlisting> Index: adminscripts.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/adminscripts.sgml,v retrieving revision 1.40.2.9 retrieving revision 1.40.2.10 diff -C2 -d -r1.40.2.9 -r1.40.2.10 *** adminscripts.sgml 13 Mar 2008 16:51:26 -0000 1.40.2.9 --- adminscripts.sgml 30 Apr 2009 16:06:10 -0000 1.40.2.10 *************** *** 135,147 **** replicated.</para> </sect3> ! <sect3><title>slonik_drop_node</title> - <para>Generates Slonik script to drop a node from a &slony1; cluster.</para> </sect3> ! <sect3><title>slonik_drop_set</title> <para>Generates Slonik script to drop a replication set (<emphasis>e.g.</emphasis> - set of tables and sequences) from a &slony1; cluster.</para> </sect3> --- 135,162 ---- replicated.</para> </sect3> ! <sect3 id="slonik-drop-node"><title>slonik_drop_node</title> ! ! <para>Generates Slonik script to drop a node from a &slony1; ! cluster.</para> </sect3> ! <sect3 id="slonik-drop-set"><title>slonik_drop_set</title> <para>Generates Slonik script to drop a replication set (<emphasis>e.g.</emphasis> - set of tables and sequences) from a &slony1; cluster.</para> + + <para> This represents a pretty big potential <quote>foot gun</quote> + as this eliminates a replication set all at once. A typo that points + it to the wrong set could be rather damaging. Compare to <xref + linkend="slonik-unsubscribe-set"> and <xref + linkend="slonik-drop-node">; with both of those, attempting to drop a + subscription or a node that is vital to your operations will be + blocked (via a foreign key constraint violation) if there exists a + downstream subscriber that would be adversely affected. In contrast, + there will be no warnings or errors if you drop a set; the set will + simply disappear from replication. + </para> + </sect3> *************** *** 232,239 **** <para>This goes through and drops the &slony1; schema from each node; ! use this if you want to destroy replication throughout a cluster. ! This is a <emphasis>VERY</emphasis> unsafe script!</para> ! </sect3><sect3><title>slonik_unsubscribe_set</title> <para>Generates Slonik script to unsubscribe a node from a replication set.</para> --- 247,257 ---- <para>This goes through and drops the &slony1; schema from each node; ! use this if you want to destroy replication throughout a cluster. As ! its effects are necessarily rather destructive, this has the potential ! to be pretty unsafe.</para> ! </sect3> ! ! <sect3 id="slonik-unsubscribe-set"><title>slonik_unsubscribe_set</title> <para>Generates Slonik script to unsubscribe a node from a replication set.</para> *************** *** 344,347 **** --- 362,408 ---- </sect2> + <sect2 id="startslon"> <title>start_slon.sh</title> + + <para> This <filename>rc.d</filename>-style script was introduced in + &slony1; version 2.0; it provides automatable ways of:</para> + + <itemizedlist> + <listitem><para>Starting the &lslon;, via <command> start_slon.sh start </command> </para> </listitem> + </itemizedlist> + <para> Attempts to start the &lslon;, checking first to verify that it + is not already running, that configuration exists, and that the log + file location is writable. 
Failure cases include:</para> + + <itemizedlist> + <listitem><para> No <link linkend="runtime-config"> slon runtime configuration file </link> exists, </para></listitem> + <listitem><para> A &lslon; is found with the PID indicated via the runtime configuration, </para></listitem> + <listitem><para> The specified <envar>SLON_LOG</envar> location is not writable. </para></listitem> + <listitem><para>Stopping the &lslon;, via <command> start_slon.sh stop </command> </para> + <para> This fails (doing nothing) if the PID (indicated via the runtime configuration file) does not exist; </para> </listitem> + <listitem><para>Monitoring the status of the &lslon;, via <command> start_slon.sh status </command> </para> + <para> This indicates whether or not the &lslon; is running, and, if so, prints out the process ID. </para> </listitem> + + </itemizedlist> + + <para> The following environment variables are used to control &lslon; configuration:</para> + + <glosslist> + <glossentry><glossterm> <envar> SLON_BIN_PATH </envar> </glossterm> + <glossdef><para> This indicates where the &lslon; binary program is found. </para> </glossdef> </glossentry> + <glossentry><glossterm> <envar> SLON_CONF </envar> </glossterm> + <glossdef><para> This indicates the location of the <link linkend="runtime-config"> slon runtime configuration file </link> that controls how the &lslon; behaves. </para> + <para> Note that this file is <emphasis>required</emphasis> to contain a value for <link linkend="slon-config-logging-pid-file">log_pid_file</link>; that is necessary to allow this script to detect whether the &lslon; is running or not. </para> + </glossdef> </glossentry> + <glossentry><glossterm> <envar> SLON_LOG </envar> </glossterm> + <glossdef><para> This file is the location where &lslon; log files are to be stored, if need be. There is an option <xref linkend ="slon-config-logging-syslog"> for &lslon; to use <application>syslog</application> to manage logging; in that case, you may prefer to set <envar>SLON_LOG</envar> to <filename>/dev/null</filename>. </para> </glossdef> </glossentry> + </glosslist> + + <para> Note that these environment variables may either be set, in the + script, or overridden by values passed in from the environment. The + latter usage makes it easy to use this script in conjunction with the + <xref linkend="testbed"> so that it is regularly tested. </para> + + </sect2> + <sect2 id="launchclusters"><title> launch_clusters.sh </title> *************** *** 349,356 **** <para> This is another shell script which uses the configuration as ! set up by <filename>mkslonconf.sh</filename> and is intended to either ! be run at system boot time, as an addition to the ! <filename>rc.d</filename> processes, or regularly, as a cron process, ! to ensure that &lslon; processes are running.</para> <para> It uses the following environment variables:</para> --- 410,417 ---- <para> This is another shell script which uses the configuration as ! set up by <filename>mkslonconf.sh</filename> and is intended to ! support an approach to running &slony1; involving regularly ! (<emphasis>e.g.</emphasis> via a cron process) checking to ensure that ! &lslon; processes are running.</para> <para> It uses the following environment variables:</para> *************** *** 420,433 **** </itemizedlist> - <note> <para> This script only works properly when run against an <emphasis>origin</emphasis> node. 
</para> </note> - - <warning> <para> If this script is against a - <emphasis>subscriber</emphasis> node, the <command>pg_dump</command> - used to draw the schema from the <quote>source</quote> node will - attempt to pull the <emphasis>broken</emphasis> schema found on the - subscriber, and thus, the result will <emphasis>not</emphasis> be a - faithful representation of the schema as would be found on the origin - node. </para> </warning> - </sect2> <sect2><title> slony-cluster-analysis </title> --- 481,484 ---- *************** *** 569,573 **** cluster.</para></listitem> ! <listitem><para> <filename>create_set.slonik</filename></para> <para>This is the first script to run; it sets up the requested nodes --- 620,624 ---- cluster.</para></listitem> ! <listitem><para> <filename>create_nodes.slonik</filename></para> <para>This is the first script to run; it sets up the requested nodes *************** *** 644,648 **** <subtitle> Apache-Style profiles for FreeBSD <filename>ports/databases/slony/*</filename> </subtitle> ! <para> In the tools area, <filename>slon.in-profiles</filename> is a script that might be used to start up &lslon; instances at the time of system startup. It is designed to interact with the FreeBSD Ports --- 695,701 ---- <subtitle> Apache-Style profiles for FreeBSD <filename>ports/databases/slony/*</filename> </subtitle> ! <indexterm><primary> Apache-style profiles for FreeBSD </primary> <secondary>FreeBSD </secondary> </indexterm> ! ! <para> In the <filename>tools</filename> area, <filename>slon.in-profiles</filename> is a script that might be used to start up &lslon; instances at the time of system startup. It is designed to interact with the FreeBSD Ports *************** *** 650,653 **** --- 703,762 ---- </sect2> + + <sect2 id="duplicate-node"> <title> <filename> duplicate-node.sh </filename> </title> + <indexterm><primary> duplicating nodes </primary> </indexterm> + <para> In the <filename>tools</filename> area, + <filename>duplicate-node.sh</filename> is a script that may be used to + help create a new node that duplicates one of the ones in the + cluster. </para> + + <para> The script expects the following parameters: </para> + <itemizedlist> + <listitem><para> Cluster name </para> </listitem> + <listitem><para> New node number </para> </listitem> + <listitem><para> Origin node </para> </listitem> + <listitem><para> Node being duplicated </para> </listitem> + <listitem><para> New node </para> </listitem> + </itemizedlist> + + <para> For each of the nodes specified, the script offers flags to + specify <function>libpq</function>-style parameters for + <envar>PGHOST</envar>, <envar>PGPORT</envar>, + <envar>PGDATABASE</envar>, and <envar>PGUSER</envar>; it is expected + that <filename>.pgpass</filename> will be used for storage of + passwords, as is generally considered best practice. Those values may + inherit from the <function>libpq</function> environment variables, if + not set, which is useful when using this for testing. When + <quote>used in anger,</quote> however, it is likely that nearly all of + the 14 available parameters should be used. </para> + + <para> The script prepares files, normally in + <filename>/tmp</filename>, and will report the name of the directory + that it creates that contain SQL and &lslonik; scripts to set up the + new node. 
</para> + + <itemizedlist> + <listitem><para> <filename> schema.sql </filename> </para> + <para> This is drawn from the origin node, and contains the <quote>pristine</quote> database schema that must be applied first.</para></listitem> + <listitem><para> <filename> slonik.preamble </filename> </para> + + <para> This <quote>preamble</quote> is used by the subsequent set of slonik scripts. </para> </listitem> + <listitem><para> <filename> step1-storenode.slonik </filename> </para> + <para> A &lslonik; script to set up the new node. </para> </listitem> + <listitem><para> <filename> step2-storepath.slonik </filename> </para> + <para> A &lslonik; script to set up path communications between the provider node and the new node. </para> </listitem> + <listitem><para> <filename> step3-subscribe-sets.slonik </filename> </para> + <para> A &lslonik; script to request subscriptions for all replication sets.</para> </listitem> + </itemizedlist> + + <para> For testing purposes, this is sufficient to get a new node working. The configuration may not necessarily reflect what is desired as a final state:</para> + + <itemizedlist> + <listitem><para> Additional communications paths may be desirable in order to have redundancy. </para> </listitem> + <listitem><para> It is assumed, in the generated scripts, that the new node should support forwarding; that may not be true. </para> </listitem> + <listitem><para> It may be desirable later, after the subscription process is complete, to revise subscriptions. </para> </listitem> + </itemizedlist> + + </sect2> </sect1> <!-- Keep this comment at the end of the file Index: maintenance.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/maintenance.sgml,v retrieving revision 1.25 retrieving revision 1.25.2.1 diff -C2 -d -r1.25 -r1.25.2.1 *** maintenance.sgml 2 Aug 2006 18:34:59 -0000 1.25 --- maintenance.sgml 30 Apr 2009 16:06:10 -0000 1.25.2.1 *************** *** 11,17 **** <listitem><para> Deletes old data from various tables in the <productname>Slony-I</productname> cluster's namespace, notably ! entries in <xref linkend="table.sl-log-1">, <xref ! linkend="table.sl-log-2"> (not yet used), and <xref ! linkend="table.sl-seqlog">.</para></listitem> <listitem><para> Vacuum certain tables used by &slony1;. As of 1.0.5, --- 11,15 ---- <listitem><para> Deletes old data from various tables in the <productname>Slony-I</productname> cluster's namespace, notably ! entries in &sllog1;, &sllog2;, and &slseqlog;.</para></listitem> <listitem><para> Vacuum certain tables used by &slony1;. As of 1.0.5, *************** *** 26,30 **** vacuuming of these tables. Unfortunately, it has been quite possible for <application>pg_autovacuum</application> to not vacuum quite ! frequently enough, so you probably want to use the internal vacuums. Vacuuming &pglistener; <quote>too often</quote> isn't nearly as hazardous as not vacuuming it frequently enough.</para> --- 24,28 ---- vacuuming of these tables. Unfortunately, it has been quite possible for <application>pg_autovacuum</application> to not vacuum quite ! frequently enough, so you may prefer to use the internal vacuums. Vacuuming &pglistener; <quote>too often</quote> isn't nearly as hazardous as not vacuuming it frequently enough.</para> *************** *** 37,52 **** <listitem> <para> The <link linkend="dupkey"> Duplicate Key Violation </link> bug has helped track down some &postgres; race conditions. !
One remaining issue is that it appears that is a case where ! <command>VACUUM</command> is not reclaiming space correctly, leading ! to corruption of B-trees. </para> ! ! <para> It may be helpful to run the command <command> REINDEX TABLE ! sl_log_1;</command> periodically to avoid the problem ! occurring. </para> </listitem> <listitem><para> As of version 1.2, <quote>log switching</quote> ! functionality is in place; every so often, it seeks to switch between ! storing data in &sllog1; and &sllog2; so that it may seek to <command>TRUNCATE</command> the <quote>elder</quote> data.</para> --- 35,48 ---- <listitem> <para> The <link linkend="dupkey"> Duplicate Key Violation ! </link> bug has helped track down a number of rather obscure ! &postgres; race conditions, so that in modern versions of &slony1; and &postgres;, there should be little to worry about. ! </para> ! </listitem> <listitem><para> As of version 1.2, <quote>log switching</quote> ! functionality is in place; every so often (by default, once per week, ! though you may induce it by calling the stored ! function <function>logswitch_start()</function>), it seeks to switch ! between storing data in &sllog1; and &sllog2; so that it may seek to <command>TRUNCATE</command> the <quote>elder</quote> data.</para> *************** *** 54,62 **** cleared out, so that you will not suffer from them having grown to some significant size, due to heavy load, after which they are ! incapable of shrinking back down </para> </listitem> </itemizedlist> </para> <sect2><title> Watchdogs: Keeping Slons Running</title> --- 50,119 ---- cleared out, so that you will not suffer from them having grown to some significant size, due to heavy load, after which they are ! incapable of shrinking back down </para> ! ! <para> In version 2.0, <command>DELETE</command> is no longer used to ! clear out data in &sllog1; and &sllog2;; instead, the log switch logic ! is induced frequently, every time the cleanup loop does not find a ! switch in progress, and these tables are purely cleared out ! via <command>TRUNCATE</command>. This eliminates the need to vacuum ! these tables. </para> ! ! </listitem> </itemizedlist> </para> + <sect2 id="maintenance-autovac"> <title> Interaction with &postgres; + autovacuum </title> + + <indexterm><primary>autovacuum interaction</primary></indexterm> + + <para> Recent versions of &postgres; support an + <quote>autovacuum</quote> process which notices when tables are + modified, thereby creating dead tuples, and vacuums those tables, + <quote>on demand.</quote> It has been observed that this can interact + somewhat negatively with &slony1;'s own vacuuming policies on its own + tables. </para> + + <para> &slony1; requests vacuums on its tables immediately after + completing transactions that are expected to clean out old data, which + may be expected to be the ideal time to do so. It appears as though + autovacuum may notice the changes a bit earlier, and attempts + vacuuming when transactions are not complete, rendering the work + pretty useless. It seems preferable to configure autovacuum to avoid + vacuuming &slony1;-managed configuration tables.
</para> + + <para> The following query (change the cluster name to match your + local configuration) will identify the tables that autovacuum should + be configured not to process: </para> + + <programlisting> + mycluster=# select oid, relname from pg_class where relnamespace = (select oid from pg_namespace where nspname = '_' || 'MyCluster') and relhasindex; + oid | relname + -------+-------------- + 17946 | sl_nodelock + 17963 | sl_setsync + 17994 | sl_trigger + 17980 | sl_table + 18003 | sl_sequence + 17937 | sl_node + 18034 | sl_listen + 18017 | sl_path + 18048 | sl_subscribe + 17951 | sl_set + 18062 | sl_event + 18069 | sl_confirm + 18074 | sl_seqlog + 18078 | sl_log_1 + 18085 | sl_log_2 + (15 rows) + </programlisting> + + <para> The following query will populate + <envar>pg_catalog.pg_autovacuum</envar> with suitable configuration + information: <command> INSERT INTO pg_catalog.pg_autovacuum (vacrelid, enabled, vac_base_thresh, vac_scale_factor, anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit, freeze_min_age, freeze_max_age) SELECT oid, 'f', -1, -1, -1, -1, -1, -1, -1, -1 FROM pg_catalog.pg_class WHERE relnamespace = (SELECT OID FROM pg_namespace WHERE nspname = '_' || 'MyCluster') AND relhasindex; </command> + </para> + </sect2> + <sect2><title> Watchdogs: Keeping Slons Running</title> *************** *** 89,92 **** --- 146,150 ---- <sect2 id="gensync"><title>Parallel to Watchdog: generate_syncs.sh</title> + <indexterm><primary>generate SYNCs</primary></indexterm> <para>A new script for &slony1; 1.1 is <application>generate_syncs.sh</application>, which addresses the following kind of *************** *** 121,128 **** <indexterm><primary>testing cluster status</primary></indexterm> ! <para> In the <filename>tools</filename> directory, you may find ! scripts called <filename>test_slony_state.pl</filename> and ! <filename>test_slony_state-dbi.pl</filename>. One uses the Perl/DBI ! interface; the other uses the Pg interface. </para> --- 179,186 ---- <indexterm><primary>testing cluster status</primary></indexterm> ! <para> In the <filename>tools</filename> directory, you will find ! &eststate; scripts called <filename>test_slony_state.pl</filename> ! and <filename>test_slony_state-dbi.pl</filename>. One uses the ! Perl/DBI interface; the other uses the Pg interface. </para> *************** *** 130,136 **** &slony1; node (you can pick any one), and from that, determine all the nodes in the cluster. They then run a series of queries (read only, ! so this should be quite safe to run) which look at the various ! &slony1; tables, looking for a variety of sorts of conditions ! suggestive of problems, including: </para> --- 188,194 ---- &slony1; node (you can pick any one), and from that, determine all the nodes in the cluster. They then run a series of queries (read only, ! so this should be quite safe to run) which examine various &slony1; ! tables, looking for a variety of sorts of conditions suggestive of !
problems, including: </para> *************** *** 219,222 **** --- 277,282 ---- <sect2><title> Other Replication Tests </title> + <indexterm><primary>testing replication</primary></indexterm> + <para> The methodology of the previous section is designed with a view to minimizing the cost of submitting replication test queries; on a *************** *** 287,290 **** --- 347,446 ---- </para> </sect2> + <sect2><title>mkservice </title> + <indexterm><primary>mkservice for BSD </primary></indexterm> + + <sect3><title>slon-mkservice.sh</title> + + <para> Create a slon service directory for use with svscan from + daemontools. This uses multilog in a pretty basic way, which seems to + be standard for daemontools / multilog setups. If you want clever + logging, see logrep below. Currently this script has very limited + error handling capabilities.</para> + + <para> For non-interactive use, set the following environment + variables: <envar>BASEDIR</envar>, <envar>SYSUSR</envar>, + <envar>PASSFILE</envar>, <envar>DBUSER</envar>, <envar>HOST</envar>, + <envar>PORT</envar>, <envar>DATABASE</envar>, <envar>CLUSTER</envar>, + and <envar>SLON_BINARY</envar>. If any of the above are not set, the script + asks for configuration information interactively.</para> + + <itemizedlist> + <listitem><para> + <envar>BASEDIR</envar> where you want the service directory structure for the slon + to be created. This should <emphasis>not</emphasis> be the <filename>/var/service</filename> directory.</para></listitem> + <listitem><para> + <envar>SYSUSR</envar> the unix user under which the slon (and multilog) process should run.</para></listitem> + <listitem><para> + <envar>PASSFILE</envar> location of the <filename>.pgpass</filename> file to be used (default: <filename>~sysusr/.pgpass</filename>)</para></listitem> + <listitem><para> + <envar>DBUSER</envar> the postgres user the slon should connect as (default: slony)</para></listitem> + <listitem><para> + <envar>HOST</envar> what database server to connect to (default: localhost)</para></listitem> + <listitem><para> + <envar>PORT</envar> what port to connect to (default: 5432)</para></listitem> + <listitem><para> + <envar>DATABASE</envar> which database to connect to (default: dbuser)</para></listitem> + <listitem><para> + <envar>CLUSTER</envar> the name of your Slony1 cluster (default: database)</para></listitem> + <listitem><para> + <envar>SLON_BINARY</envar> the full path name of the slon binary (default: <command>which slon</command>)</para></listitem> + </itemizedlist> + </sect3> + + <sect3><title>logrep-mkservice.sh</title> + + <para>This uses <command>tail -F</command> to pull data from log files, allowing + you to use multilog filters (by setting the CRITERIA) to create + special-purpose log files. The goal is to provide a way to monitor log + files in near realtime for <quote>interesting</quote> data without either + hacking up the initial log file or wasting CPU/IO by re-scanning the + same log repeatedly. + </para> + + <para>For non-interactive use, set the following environment + variables: <envar>BASEDIR</envar>, <envar>SYSUSR</envar>, <envar>SOURCE</envar>, + <envar>EXTENSION</envar>, and <envar>CRITERIA</envar>. If any of the above are not set, + the script asks for configuration information interactively. + </para> + + <itemizedlist> + <listitem><para> + <envar>BASEDIR</envar> where you want the service directory structure for the logrep + to be created.
This should <emphasis>not</emphasis> be the <filename>/var/service</filename> directory.</para></listitem> + <listitem><para><envar>SYSUSR</envar> unix user under which the service should run.</para></listitem> + <listitem><para><envar>SOURCE</envar> name of the service with the log you want to follow.</para></listitem> + <listitem><para><envar>EXTENSION</envar> a tag to differentiate this logrep from others using the same source.</para></listitem> + <listitem><para><envar>CRITERIA</envar> the multilog filter you want to use.</para></listitem> + </itemizedlist> + + <para> A trivial example of this would be to provide a log file of all slon + ERROR messages which could be used to trigger a nagios alarm. + <command>EXTENSION='ERRORS'</command> + <command>CRITERIA="'-*' '+* * ERROR*'"</command> + (Reset the monitor by rotating the log using <command>svc -a $svc_dir</command>) + </para> + + <para> A more interesting application is a subscription progress log. + <command>EXTENSION='COPY'</command> + <command>CRITERIA="'-*' '+* * ERROR*' '+* * WARN*' '+* * CONFIG enableSubscription*' '+* * DEBUG2 remoteWorkerThread_* prepare to copy table*' '+* * DEBUG2 remoteWorkerThread_* all tables for set * found on subscriber*' '+* * DEBUG2 remoteWorkerThread_* copy*' '+* * DEBUG2 remoteWorkerThread_* Begin COPY of table*' '+* * DEBUG2 remoteWorkerThread_* * bytes copied for table*' '+* * DEBUG2 remoteWorkerThread_* * seconds to*' '+* * DEBUG2 remoteWorkerThread_* set last_value of sequence*' '+* * DEBUG2 remoteWorkerThread_* copy_set*'"</command> + </para> + + <para>If you have a subscription log then it's easy to determine if a given + slon is in the process of handling copies or other subscription activity. + If the log isn't empty, and doesn't end with a + <command>"CONFIG enableSubscription: sub_set:1"</command> + (or whatever set number you've subscribed to) then the slon is currently in + the middle of initial copies.</para> + + <para> If you happen to be monitoring the mtime of your primary slony logs to + determine if your slon has gone brain-dead, checking this is a good way + to avoid mistakenly clobbering it in the middle of a subscribe. As a bonus, + recall that since the slons are running under svscan, you only need to + kill it (via the svc interface) and let svscan start it up again later. + I've also found the COPY logs handy for following subscribe activity + interactively.</para> + </sect3> + + </sect2> </sect1> <!-- Keep this comment at the end of the file Index: addthings.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/addthings.sgml,v retrieving revision 1.23.2.5 retrieving revision 1.23.2.6 diff -C2 -d -r1.23.2.5 -r1.23.2.6 *** addthings.sgml 11 Jun 2007 16:01:33 -0000 1.23.2.5 --- addthings.sgml 30 Apr 2009 16:06:10 -0000 1.23.2.6 *************** *** 237,241 **** drops the schema and its contents, but also removes any columns previously added in using <xref linkend= "stmttableaddkey">. ! </para></listitem> </itemizedlist> </sect2> --- 237,247 ---- drops the schema and its contents, but also removes any columns previously added in using <xref linkend= "stmttableaddkey">. ! </para> ! ! <note><para> In &slony1; version 2.0, <xref linkend= ! "stmttableaddkey"> is <emphasis>no longer supported</emphasis>, and ! thus <xref linkend="stmtuninstallnode"> consists very simply of ! <command>DROP SCHEMA "_ClusterName" CASCADE;</command>. </para> !
</note></listitem> </itemizedlist> </sect2> *************** *** 290,298 **** </para></listitem> ! <listitem><para> At this point, it is an excellent idea to run ! the <filename>tools</filename> ! script <command>test_slony_state-dbi.pl</command>, which rummages ! through the state of the entire cluster, pointing out any anomalies ! that it finds. This includes a variety of sorts of communications problems.</para> </listitem> --- 296,303 ---- </para></listitem> ! <listitem><para> At this point, it is an excellent idea to run the ! <filename>tools</filename> script &eststate;, which rummages through ! the state of the entire cluster, pointing out any anomalies that it ! finds. This includes a variety of sorts of communications problems.</para> </listitem> *************** *** 347,355 **** originates a replication set.</para> </listitem> ! <listitem><para> Run the <filename>tools</filename> ! script <command>test_slony_state-dbi.pl</command>, which rummages ! through the state of the entire cluster, pointing out any anomalies ! that it notices, as well as some information on the status of each ! node. </para> </listitem> </itemizedlist> --- 352,359 ---- originates a replication set.</para> </listitem> ! <listitem><para> Run the <filename>tools</filename> script ! &eststate;, which rummages through the state of the entire cluster, ! pointing out any anomalies that it notices, as well as some ! information on the status of each node. </para> </listitem> </itemizedlist> Index: testbed.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/testbed.sgml,v retrieving revision 1.10.2.2 retrieving revision 1.10.2.3 diff -C2 -d -r1.10.2.2 -r1.10.2.3 *** testbed.sgml 20 Apr 2007 20:51:09 -0000 1.10.2.2 --- testbed.sgml 30 Apr 2009 16:06:10 -0000 1.10.2.3 *************** *** 148,151 **** --- 148,161 ---- <glossentry> + <glossterm><envar>TMPDIR</envar></glossterm> + + <glossdef><para> By default, the tests will generate their output in + <filename>/tmp</filename>, <filename>/usr/tmp</filename>, or + <filename>/var/tmp</filename>, unless you set your own value for this + environment variable. </para></glossdef> + + </glossentry> + + <glossentry> <glossterm><envar>SLTOOLDIR</envar></glossterm> *************** *** 180,183 **** --- 190,231 ---- </glossentry> + <glossentry> + <glossterm><envar>SLONCONF[n]</envar></glossterm> + + <glossdef><para> If set to <quote>true</quote>, for a particular node, + typically handled in <filename>settings.ik</filename> for a given + test, then configuration will be set up in a <link + linkend="runtime-config"> per-node <filename>slon.conf</filename> + runtime config file. </link> </para> </glossdef> + </glossentry> + + <glossentry> + <glossterm><envar>SLONYTESTER</envar></glossterm> + + <glossdef><para> Email address of the person who might be + contacted about the test results. This is stored in the + <envar>SLONYTESTFILE</envar>, and may eventually be aggregated in some + sort of buildfarm-like registry. </para> </glossdef> + </glossentry> + + <glossentry> + <glossterm><envar>SLONYTESTFILE</envar></glossterm> + + <glossdef><para> File in which to store summary results from tests. + Eventually, this may be used to construct a buildfarm-like repository of + aggregated test results.
</para> </glossdef> + </glossentry> + + <glossentry> + <glossterm><filename>random_number</filename> and <filename>random_string</filename> </glossterm> + + <glossdef><para> If you run <command>make</command> in the + <filename>test</filename> directory, C programs + <application>random_number</application> and + <application>random_string</application> will be built which will then + be used when generating random data in lieu of using shell/SQL + capabilities that are much slower than the C programs. </para> + </glossdef> + </glossentry> </glosslist> Index: releasechecklist.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/releasechecklist.sgml,v retrieving revision 1.3.2.6 retrieving revision 1.3.2.7 diff -C2 -d -r1.3.2.6 -r1.3.2.7 *** releasechecklist.sgml 29 Aug 2007 05:44:58 -0000 1.3.2.6 --- releasechecklist.sgml 30 Apr 2009 16:06:10 -0000 1.3.2.7 *************** *** 1,4 **** <!-- $Id$ --> ! <article id="releasechecklist"> <title> Release Checklist </title> <indexterm><primary>release checklist</primary></indexterm> --- 1,4 ---- <!-- $Id$ --> ! <sect1 id="releasechecklist"> <title> Release Checklist </title> <indexterm><primary>release checklist</primary></indexterm> *************** *** 53,59 **** <filename>configure.ac</filename></para></listitem> ! <listitem><para>Purge directory <filename>autom4te.cache</filename> so it is not included in the build </para></listitem> ! <listitem><para>Purge out .cvsignore files; this can be done with the command <command> find . -name .cvsignore | xargs rm </command> </para></listitem> <listitem><para> Run <filename>tools/release_checklist.sh</filename> </para> --- 53,63 ---- <filename>configure.ac</filename></para></listitem> ! <listitem><para>Purge directory <filename>autom4te.cache</filename> so it is not included in the build </para> ! <para> Does not need to be done by hand - the later <command> make distclean </command> step does this for you. </para> ! </listitem> ! <listitem><para>Purge out .cvsignore files; this can be done with the command <command> find . -name .cvsignore | xargs rm </command> </para> ! <para> Does not need to be done by hand - the later <command> make distclean </command> step does this for you. </para> ! </listitem> <listitem><para> Run <filename>tools/release_checklist.sh</filename> </para> *************** *** 66,70 **** <listitem><para>PACKAGE_VERSION=REL_1_1_2</para></listitem> ! <listitem><para>PACKAGE_STRING=postgresql-slony1 REL_1_1_2</para></listitem> </itemizedlist> --- 70,74 ---- <listitem><para>PACKAGE_VERSION=REL_1_1_2</para></listitem> ! <listitem><para>PACKAGE_STRING=slony1 REL_1_1_2</para></listitem> </itemizedlist> *************** *** 94,99 **** <para> Currently this is best done by issuing <command> ./configure && ! make all && make clean</command> but that is a somewhat ugly approach. </para></listitem> --- 98,105 ---- <para> Currently this is best done by issuing <command> ./configure && ! make all && make clean</command> but that is a somewhat ugly approach.</para> + <para> Slightly better may be <command> ./configure && make + src/slon/conf-file.c src/slonik/parser.c src/slonik/scan.c </command> </para></listitem> *************** *** 101,106 **** previous step(s) are removed.</para> ! <para> <command>make distclean</command> ought to do ! that... </para></listitem> <listitem><para>Generate HTML tarball, and RTF/PDF, if --- 107,118 ---- previous step(s) are removed.</para> ! 
<para> <command>make distclean</command> will do ! that... </para> ! ! <para> Note that <command>make distclean</command> also clears out ! <filename>.cvsignore</filename> files and ! <filename>autom4te.cache</filename>, thus obsoleting some former steps ! that suggested that it was needful to delete them. </para> ! </listitem> <listitem><para>Generate HTML tarball, and RTF/PDF, if *************** *** 135,139 **** </itemizedlist> ! </article> <!-- Keep this comment at the end of the file Local variables: --- 147,151 ---- </itemizedlist> ! </sect1> <!-- Keep this comment at the end of the file Local variables: Index: firstdb.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/firstdb.sgml,v retrieving revision 1.20.2.4 retrieving revision 1.20.2.5 diff -C2 -d -r1.20.2.4 -r1.20.2.5 *** firstdb.sgml 18 Feb 2009 21:02:53 -0000 1.20.2.4 --- firstdb.sgml 30 Apr 2009 16:06:10 -0000 1.20.2.5 *************** *** 31,35 **** <listitem><para> You have <option>tcpip_socket=true</option> in your ! <filename>postgresql.conf</filename> and</para></listitem> <listitem><para> You have enabled access in your cluster(s) via --- 31,35 ---- <listitem><para> You have <option>tcpip_socket=true</option> in your ! <filename>postgresql.conf</filename>;</para> <note> <para> This is no longer needed for &postgres; 8.0 and later versions.</para></note> </listitem> <listitem><para> You have enabled access in your cluster(s) via *************** *** 95,98 **** --- 95,116 ---- </programlisting> + <para> One of the tables created by + <application>pgbench</application>, <envar>history</envar>, does not + have a primary key. In earlier versions of &slony1;, a &lslonik; + command called <xref linkend="stmttableaddkey"> could be used to + introduce one. This caused a number of problems, and so this feature + has been removed in version 2 of &slony1;. It now + <emphasis>requires</emphasis> that there is a suitable candidate + primary key. </para> + + <para> The following commands, issued via <application>psql</application>, will establish a proper primary key on this table: </para> + + <programlisting> + psql -U $PGBENCHUSER -h $HOST1 -d $MASTERDBNAME -c "begin; alter table + history add column id serial; update history set id = + nextval('history_id_seq'); alter table history add primary key(id); + commit" + </programlisting> + <para>Because &slony1; depends on the databases having the pl/pgSQL procedural language installed, we had better install it now. It is *************** *** 142,177 **** procedures in the master/slave (node) databases. </para> ! <sect3><title>Using the altperl scripts</title> ! ! <indexterm><primary> altperl script usage </primary></indexterm> ! ! <para> ! Using the <xref linkend="altperl"> scripts is an easy way to get started. The ! <command>slonik_build_env</command> script will generate output providing ! details you need to omplete building a <filename>slon_tools.conf</filename>. ! An example <filename>slon_tools.conf</filename> is provided in the distribution ! to get you started. The altperl scripts will all reference ! this central configuration file in the future to ease administration. Once ! slon_tools.conf has been created, you can proceed as follows: ! </para> ! ! <programlisting> ! # Initialize cluster: ! $ slonik_init_cluster | slonik ! ! # Start slon (here 1 and 2 are node numbers) ! $ slon_start 1 ! $ slon_start 2 ! ! # Create Sets (here 1 is a set number) ! $ slonik_create_set 1 | slonik ! # subscribe set to second node (1= set ID, 2= node ID) !
$ slonik_subscribe_set 2 | slonik ! </programlisting> ! <para> You have now replicated your first database. You can skip the following section ! of documentation if you'd like, which documents more of a <quote>bare-metal</quote> approach.</para> ! </sect3> <sect3><title>Using slonik command directly</title> --- 160,180 ---- procedures in the master/slave (node) databases. </para> ! <para> The example that follows uses <xref linkend="slonik"> directly ! (or embedded directly into scripts). This is not necessarily the most ! pleasant way to get started; there exist tools for building <xref ! linkend="slonik"> scripts under the <filename>tools</filename> ! directory, including:</para> ! <itemizedlist> ! <listitem><para> <xref linkend="altperl"> - a set of Perl scripts that ! build <xref linkend="slonik"> scripts based on a single ! <filename>slon_tools.conf</filename> file. </para> </listitem> ! <listitem><para> <xref linkend="mkslonconf"> - a shell script ! (<emphasis>e.g.</emphasis> - works with Bash) which, based either on ! self-contained configuration or on shell environment variables, ! generates a set of <xref linkend="slonik"> scripts to configure a ! whole cluster. </para> </listitem> ! </itemizedlist> <sect3><title>Using slonik command directly</title> *************** *** 211,225 **** #-- - # Because the history table does not have a primary key or other unique - # constraint that could be used to identify a row, we need to add one. - # The following command adds a bigint column named - # _Slony-I_$CLUSTERNAME_rowID to the table. It will have a default value - # of nextval('_$CLUSTERNAME.s1_rowid_seq'), and have UNIQUE and NOT NULL - # constraints applied. All existing rows will be initialized with a - # number - #-- - table add key (node id = 1, fully qualified name = 'public.history'); - - #-- # Slony-I organizes tables into sets. The smallest unit a node can # subscribe is a set. The following commands create one set containing --- 214,217 ---- *************** *** 230,234 **** set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table'); set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table'); ! set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table', key = serial); #-- --- 222,226 ---- set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table'); set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table'); ! set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table'); #-- *************** *** 237,241 **** #-- ! store node (id=2, comment = 'Slave node'); store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER'); store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER'); --- 229,233 ---- #-- ! store node (id=2, comment = 'Slave node', event node=1); store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER'); store path (server = 2, client = 1, conninfo='dbname=$SLAVEDBNAME host=$SLAVEHOST user=$REPLICATIONUSER'); *************** *** 304,313 **** the database. When the copy process is finished, the replication daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying ! the accumulated replication log. It will do this in little steps, 10 ! 
seconds worth of application work at a time. Depending on the ! performance of the two systems involved, the sizing of the two ! databases, the actual transaction load and how well the two databases ! are tuned and maintained, this catchup process can be a matter of ! minutes, hours, or eons.</para> <para>You have now successfully set up your first basic master/slave --- 296,311 ---- the database. When the copy process is finished, the replication daemon on <envar>$SLAVEHOST</envar> will start to catch up by applying ! the accumulated replication log. It will do this in little steps, ! initially doing about 10 seconds worth of application work at a time. ! Depending on the performance of the two systems involved, the sizing ! of the two databases, the actual transaction load and how well the two ! databases are tuned and maintained, this catchup process may be a ! matter of minutes, hours, or eons.</para> ! ! <para> If you encounter problems getting this working, check over the ! logs for the &lslon; processes, as error messages are likely to be ! suggestive of the nature of the problem. The tool &eststate; is ! also useful for diagnosing problems with nearly-functioning ! replication clusters.</para> <para>You have now successfully set up your first basic master/slave *************** *** 362,368 **** <filename>slony-I-basic-mstr-slv.txt</filename>.</para> ! <para>If this script returns <command>FAILED</command> please contact the ! developers at <ulink url="http://slony.info/"> ! http://slony.info/</ulink></para></sect3> </sect2> </sect1> --- 360,407 ---- <filename>slony-I-basic-mstr-slv.txt</filename>.</para> ! <para>If this script returns <command>FAILED</command> please contact ! the developers at <ulink url="http://slony.info/"> ! http://slony.info/</ulink>. Be sure to be prepared with useful ! diagnostic information including the logs generated by &lslon; ! processes and the output of &eststate;. </para></sect3> ! ! <sect3><title>Using the altperl scripts</title> ! ! <indexterm><primary> altperl script example </primary></indexterm> ! ! <para> ! Using the <xref linkend="altperl"> scripts is an alternative way to ! get started; it allows you to avoid writing slonik scripts, at least ! for some of the simple ways of configuring &slony1;. The ! <command>slonik_build_env</command> script will generate output ! providing details you need to build a ! <filename>slon_tools.conf</filename>, which is required by these ! scripts. An example <filename>slon_tools.conf</filename> is provided ! in the distribution to get you started. The altperl scripts all ! reference this central configuration file, centralizing cluster ! configuration information. Once slon_tools.conf has been created, you ! can proceed as follows: ! </para> ! ! <programlisting> ! # Initialize cluster: ! $ slonik_init_cluster | slonik ! ! # Start slon (here 1 and 2 are node numbers) ! $ slon_start 1 ! $ slon_start 2 ! ! # Create Sets (here 1 is a set number) ! $ slonik_create_set 1 | slonik ! ! # subscribe set to second node (1= set ID, 2= node ID) ! $ slonik_subscribe_set 1 2 | slonik ! </programlisting> ! ! <para> You have now replicated your first database. The preceding ! section documents more of a <quote>bare-metal</quote> approach; if you ! use the altperl scripts, you need not follow it.</para> ! </sect3> !
</sect2> </sect1> Index: versionupgrade.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/versionupgrade.sgml,v retrieving revision 1.9 retrieving revision 1.9.2.1 diff -C2 -d -r1.9 -r1.9.2.1 *** versionupgrade.sgml 2 Aug 2006 18:34:59 -0000 1.9 --- versionupgrade.sgml 30 Apr 2009 16:06:10 -0000 1.9.2.1 *************** *** 28,45 **** </itemizedlist></para> ! <para> And note that this led to a 40 hour outage.</para> <para> &slony1; offers an opportunity to replace that long outage with ! one a few minutes or even a few seconds long. The approach taken is ! to create a &slony1; replica in the new version. It is possible that ! it might take much longer than 40h to create that replica, but once ! it's there, it can be kept very nearly up to date.</para> ! <para> When it is time to switch over to the new database, the ! procedure is rather less time consuming: <itemizedlist> ! <listitem><para> Stop all applications that might modify the data</para></listitem> <listitem><para> Lock the set against client application updates using --- 28,48 ---- </itemizedlist></para> ! <para> And note that this approach led to a 40 hour outage.</para> <para> &slony1; offers an opportunity to replace that long outage with ! one that is as little as a few seconds long. The approach required is to ! create a &slony1; replica in the new version. It is possible that it ! may take considerably longer than 40h to create that replica; however, ! establishing that replica requires no outage, and once it's there, it ! can be kept very nearly up to date.</para> ! <para> When it comes time to switch over to the new database, the ! portion of the procedure that requires an application ! <quote>outage</quote> is a lot less time consuming: <itemizedlist> ! <listitem><para> Stop all applications that might modify the data ! </para></listitem> <listitem><para> Lock the set against client application updates using *************** *** 50,68 **** the new one</para></listitem> ! <listitem><para> Point the applications at the new database</para></listitem> ! </itemizedlist></para> <para> This procedure should only need to take a very short time, ! likely bound more by how quickly you can reconfigure your applications than anything else. If you can automate all the steps, it might take less than a second. If not, somewhere between a few seconds and a few minutes is likely.</para> ! <para> Note that after the origin has been shifted, updates now flow into the <emphasis>old</emphasis> database. If you discover that due to some unforeseen, untested condition, your application is somehow unhappy connecting to the new database, you could easily use <xref linkend="stmtmoveset"> again to shift the origin back to the old database.</para> <para> If you consider it particularly vital to be able to shift back --- 53,72 ---- the new one</para></listitem> ! <listitem><para> Point the applications to the new database ! </para></listitem> </itemizedlist></para> <para> This procedure should only need to take a very short time, ! likely based more on how much time is required to reconfigure your ! applications than anything else. If you can automate all of these ! steps, the outage may conceivably be a second or less. If manual ! handling is necessary, then it is likely to take somewhere between a ! few seconds and a few minutes.</para> ! <para> Note that after the origin has been shifted, updates are ! replicated back into the old database.
If you ! discover that due to some unforeseen, untested condition, your ! application is somehow unhappy connecting to the new database, you may ! readily use <xref linkend="stmtmoveset"> again to reverse the process ! to shift the origin back to the old database.</para> <para> If you consider it particularly vital to be able to shift back *************** *** 82,86 **** <para> Thus, you have <emphasis>three</emphasis> nodes, one running ! the new version of &postgres;, and the other two the old version.</para></listitem> <listitem><para> Once they are roughly <quote>in sync</quote>, stop --- 86,98 ---- <para> Thus, you have <emphasis>three</emphasis> nodes, one running ! the new version of &postgres;, and the other two the old ! version.</para> ! ! <para> Note that this imposes a need to have &slony1; built against ! <emphasis>both</emphasis> databases (<emphasis>e.g.</emphasis> - at ! the very least, the binaries for the stored procedures need to have ! been compiled against both versions of &postgres;). </para> ! ! </listitem> <listitem><para> Once they are roughly <quote>in sync</quote>, stop *************** *** 120,128 **** <emphasis>considerably</emphasis> since 7.2), but that this was more workable for him than other replication systems such as ! <productname>eRServer</productname>. If you desperately need that, ! look for him on the &postgres; Hackers mailing list. It is not ! anticipated that 7.2 will be supported by any official &slony1; release.</para></note></para> </sect1> <!-- Keep this comment at the end of the file --- 132,555 ---- <emphasis>considerably</emphasis> since 7.2), but that this was more workable for him than other replication systems such as ! <productname>eRServer</productname>. &postgres; 7.2 will ! <emphasis>never</emphasis> be supported by any official &slony1; release.</para></note></para> + <sect2> <title>Example: Upgrading a single database with no existing replication </title> + + <para>This example shows names, IP addresses, ports, etc., to describe + in detail what is going on.</para> + + <sect3> + <title>The Environment</title> + <programlisting> + Database machine: + name = rome + ip = 192.168.1.23 + OS: Ubuntu 6.06 LTS + postgres user = postgres, group postgres + + Current PostgreSQL + Version = 8.2.3 + Port 5432 + Installed at: /data/pgsql-8.2.3 + Data directory: /data/pgsql-8.2.3/data + Database to be moved: mydb + + New PostgreSQL installation + Version = 8.3.3 + Port 5433 + Installed at: /data/pgsql-8.3.3 + Data directory: /data/pgsql-8.3.3/data + + Slony Version to be used = 1.2.14 + </programlisting> + </sect3> + <sect3> + <title>Installing &slony1;</title> + + <para> + How to install &slony1; is covered quite well in other parts of + the documentation (<xref linkend="installation">); we will just + provide a quick guide here.</para> + + <programlisting> + wget http://main.slony.info/downloads/1.2/source/slony1-1.2.14.tar.bz2 + </programlisting> + + <para> Unpack and build as root with</para> + <programlisting> + tar xjf slony1-1.2.14.tar.bz2 + cd slony1-1.2.14 + ./configure --prefix=/data/pgsql-8.2.3 --with-perltools=/data/pgsql-8.2.3/slony --with-pgconfigdir=/data/pgsql-8.2.3/bin + make clean + make + make install + chown -R postgres:postgres /data/pgsql-8.2.3 + mkdir /var/log/slony + chown -R postgres:postgres /var/log/slony + </programlisting> + + <para> Then repeat this for the 8.3.3 build.
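+ For instance, the corresponding 8.3.3 build might look like the + following (a sketch, assuming the same unpacked source tree and the + install layout given in the environment above): + <programlisting> + cd slony1-1.2.14 + ./configure --prefix=/data/pgsql-8.3.3 --with-perltools=/data/pgsql-8.3.3/slony --with-pgconfigdir=/data/pgsql-8.3.3/bin + make clean + make + make install + chown -R postgres:postgres /data/pgsql-8.3.3 + </programlisting>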
A very important + step is the <command>make clean</command>; it is not so + important the first time, but when building the second time, it + is essential to clean out the old binaries, otherwise the + binaries will not match the &postgres; 8.3.3 build, with the + result that &slony1; will not work there. </para> + + </sect3> + <sect3> + <title>Creating the slon_tools.conf</title> + + <para> + The slon_tools.conf is <emphasis>the</emphasis> configuration + file. It contains all the configuration information, such as: + + <orderedlist> + <listitem> + <para>All the nodes and their details (IPs, ports, db, user, + password)</para> + </listitem> + <listitem> + <para>All the tables to be replicated</para> + </listitem> + <listitem> + <para>All the sequences to be replicated</para> + </listitem> + <listitem> + <para> How the tables and sequences are arranged in sets</para> + </listitem> + </orderedlist> + </para> + <para> Make a copy of + <filename>/data/pgsql-8.2.3/etc/slon_tools.conf-sample</filename> + to <filename>slon_tools.conf</filename> and open it. The comments + in this file are fairly self-explanatory. Since this is a one-time + replication you will generally not need to split into multiple + sets. On a production machine running with 500 tables and 100 + sequences, putting them all in a single set has worked fine.</para> + + <orderedlist> + <para>A few modifications to make:</para> + <listitem> + <para> In our case we only need two nodes, so delete the <command>add_node</command> + entries for nodes 3 and 4.</para> + </listitem> + <listitem> + <para> The <envar>pkeyedtables</envar> entry needs to be updated with your tables that + have a primary key. If your tables are spread across multiple + schemas, then you need to qualify the table name with the schema + (schema.tablename).</para> + </listitem> + <listitem> + <para> <envar>keyedtables</envar> entries need to be updated + with any tables that match the comment (with good schema + design, there should not be any). + </para> + </listitem> + <listitem> + <para> <envar>serialtables</envar> (if you have any; as it says, it is wise to avoid this).</para> + </listitem> + <listitem> + <para> <envar>sequences</envar> needs to be updated with your sequences. + </para> + </listitem> + <listitem> + <para>Remove the whole set2 entry (as we are only using set1)</para> + </listitem> + </orderedlist> + <para> + This is what it looks like with all comments stripped out: + <programlisting> + $CLUSTER_NAME = 'replication'; + $LOGDIR = '/var/log/slony'; + $MASTERNODE = 1; + + add_node(node => 1, + host => 'rome', + dbname => 'mydb', + port => 5432, + user => 'postgres', + password => ''); + + add_node(node => 2, + host => 'rome', + dbname => 'mydb', + port => 5433, + user => 'postgres', + password => ''); + + $SLONY_SETS = { + "set1" => { + "set_id" => 1, + "table_id" => 1, + "sequence_id" => 1, + "pkeyedtables" => [ + 'mytable1', + 'mytable2', + 'otherschema.mytable3', + 'otherschema.mytable4', + 'otherschema.mytable5', + 'mytable6', + 'mytable7', + 'mytable8', + ], + + "sequences" => [ + 'mytable1_sequence1', + 'mytable1_sequence2', + 'otherschema.mytable3_sequence1', + 'mytable6_sequence1', + 'mytable7_sequence1', + 'mytable7_sequence2', + ], + }, + + }; + + 1; + </programlisting> + </para> + <para> As can be seen this database is pretty small, with only 8 + tables and 6 sequences.
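+ Since the file is plain Perl, you can also quickly syntax-check your + edits before going further (an optional step, not part of the original + procedure; helper functions such as <command>add_node</command> are + only resolved at run time by the altperl tools, so this checks syntax + only): + <programlisting> + # optional: compile the file without running any of the tools + perl -c slon_tools.conf + </programlisting>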
Now copy your + <filename>slon_tools.conf</filename> into + <filename>/data/pgsql-8.2.3/etc/</filename> and + <filename>/data/pgsql-8.3.3/etc/</filename> + </para> + </sect3> + <sect3> + <title>Preparing the new &postgres; instance</title> + <para> You now have a fresh second instance of &postgres; running on + port 5433 on the same machine. Now it is time to prepare it to + receive &slony1; replication data.</para> + <orderedlist> + <listitem> + <para>Slony does not replicate roles, so first create all the + users on the new instance so it is identical in terms of + roles/groups.</para> + </listitem> + <listitem> + <para> + Create your db in the same encoding as the original db, in my case + UTF8: + <command>/data/pgsql-8.3.3/bin/createdb + -E UNICODE -p5433 mydb</command> + </para> + </listitem> + <listitem> + <para> + &slony1; replicates data, not schemas, so take a dump of your schema: + <command>/data/pgsql-8.2.3/bin/pg_dump + -s mydb > /tmp/mydb.schema</command> + and then import it on the new instance: + <command>cat /tmp/mydb.schema | /data/pgsql-8.3.3/bin/psql -p5433 + mydb</command> + </para> + </listitem> + </orderedlist> + + <para>The new database is now ready to start receiving replication + data.</para> + + </sect3> + <sect3> + <title>Initiating &slony1; Replication</title> + <para>This is the point where we start changing your current + production database, by adding a new schema to it that contains + all the &slony1; replication information.</para> + <para>The first thing to do is to initialize the &slony1; + schema. Do the following as, in the example, the postgres user.</para> + <note> + <para> The commands starting with <command>slonik_</command> do not do anything + themselves; they only generate command output that can be interpreted + by the slonik binary. So issuing any of the scripts starting with + slonik_ will not, by itself, do anything to your database. Also, by default the + slonik_ scripts will look for your slon_tools.conf in the etc + directory of the &postgres; installation.
In my case + <filename>/data/pgsql-8.x.x/etc</filename>, depending on which you are working on.</para> + </note> + <para> + <command>/data/pgsql-8.2.3/slony/slonik_init_cluster + > /tmp/init.txt</command> + </para> + <para>Open /tmp/init.txt; it should look something like + this:</para> + <programlisting> + # INIT CLUSTER + cluster name = replication; + node 1 admin conninfo='host=rome dbname=mydb user=postgres port=5432'; + node 2 admin conninfo='host=rome dbname=mydb user=postgres port=5433'; + init cluster (id = 1, comment = 'Node 1 - mydb at rome'); + + # STORE NODE + store node (id = 2, event node = 1, comment = 'Node 2 - mydb at rome'); + echo 'Set up replication nodes'; + + # STORE PATH + echo 'Next: configure paths for each node/origin'; + store path (server = 1, client = 2, conninfo = 'host=rome dbname=mydb user=postgres port=5432'); + store path (server = 2, client = 1, conninfo = 'host=rome dbname=mydb user=postgres port=5433'); + echo 'Replication nodes prepared'; + echo 'Please start a slon replication daemon for each node'; + + </programlisting> + <para>The first section indicates node information and the + initialization of the cluster; then it adds the second node to the + cluster and finally stores communications paths for both nodes in + the slony schema.</para> + <para> + Now it is time to execute the command: + <command>cat /tmp/init.txt | /data/pgsql-8.2.3/bin/slonik</command> + </para> + <para>This will run pretty quickly and give you some output to + indicate success.</para> + <para> + If things do fail, the most likely reasons would be database + permissions, <filename>pg_hba.conf</filename> settings, or typos + in <filename>slon_tools.conf</filename>. Look over your problem + and solve it. If slony schemas were created but it still failed, + you can issue the script <command>slonik_uninstall_nodes</command> to + clean things up. In the worst case you may connect to each + database and issue <command>drop schema _replication cascade;</command> + to clean up. + </para> + </sect3> + <sect3> + <title>The slon daemon</title> + + <para>As the result from the last command told us, we should now + be starting a slon replication daemon for each node! The slon + daemon is what makes the replication work. All transfers and all + work is done by the slon daemon. One is needed for each node. So + in our case we need one for the 8.2.3 installation and one for the + 8.3.3.</para> + + <para> To start one for 8.2.3 you would do: + <command>/data/pgsql-8.2.3/slony/slon_start 1 --nowatchdog</command> + This starts the daemon for node 1; we pass --nowatchdog because, + since we are running a very small replication setup, we do not need a + watchdog keeping an eye on whether the slon process stays up. </para> + + <para>If it says it started successfully, have a look at the log file + in /var/log/slony/slony1/node1/; it will show that the process + started OK.</para> + + <para> We need to start one for 8.3.3 as well: + <command>/data/pgsql-8.3.3/slony/slon_start 2 --nowatchdog</command> + </para> + + <para>If it says it started successfully, have a look at the log + file in /var/log/slony/slony1/node2/; it will show that the process + started OK.</para> + </sect3> + <sect3> + <title>Adding the replication set</title> + <para>We now need to let the slon replication know which tables and + sequences it is to replicate.
We need to create the set.</para> + <para> + Issue the following: + <command>/data/pgsql-8.2.3/slony/slonik_create_set + set1 > /tmp/createset.txt</command> + </para> + + <para> <filename> /tmp/createset.txt</filename> may be quite lengthy depending on how + many tables there are; in any case, take a quick look and it should make sense, as it + defines all the tables and sequences to be replicated.</para> + + <para> + If you are happy with the result, send the file to slonik for + execution: + <command>cat /tmp/createset.txt | /data/pgsql-8.2.3/bin/slonik + </command> + You will see quite a lot rolling by, one entry for each table. + </para> + <para>You have now defined what is to be replicated.</para> + </sect3> + <sect3> + <title>Subscribing all the data</title> + <para> + The final step is to get all the data onto the new database. It is + simply done using the subscribe script: + <command>/data/pgsql-8.2.3/slony/slonik_subscribe_set + 1 2 > /tmp/subscribe.txt</command> + The first argument is the ID of the set; the second is the node that is to + subscribe. + </para> + <para> + /tmp/subscribe.txt will look something like this: + <programlisting> + cluster name = replication; + node 1 admin conninfo='host=rome dbname=mydb user=postgres port=5432'; + node 2 admin conninfo='host=rome dbname=mydb user=postgres port=5433'; + try { + subscribe set (id = 1, provider = 1, receiver = 2, forward = yes); + } + on error { + exit 1; + } + echo 'Subscribed nodes to set 1'; + </programlisting> + Send it to slonik: + <command>cat /tmp/subscribe.txt | /data/pgsql-8.2.3/bin/slonik + </command> + </para> + <para>The replication will now start. It will copy everything in the + tables/sequences that are in the set. Understandably, this can take + quite some time, depending on the size of the db and the power of the + machine.</para> + <para> + One way to keep track of the progress would be to do the following: + <command>tail -f /var/log/slony/slony1/node2/log | grep -i copy + </command> + The slony logging is pretty verbose, and doing it this way will let + you know how the copying is going. At some point it will say "copy + completed successfully in xxx seconds"; when you get this, the copy is + done! + </para> + <para>Once this is done it will start trying to catch up with all + data that has come in since the replication was started. You can + easily view the progress of this in the database. Go to the master + database; in the replication schema there is a view called + sl_status. It is pretty self-explanatory. The field of most interest + is "st_lag_num_events"; this declares how many slony events + the node is behind. 0 is best, but it all depends on how active your db is. + The field next to it, st_lag_time, is an estimate of how far behind + in time it is lagging. Take this with a grain of salt; the actual + event count is a more accurate measure of lag.</para> + <para>You now have a fully replicated database.</para> + </sect3> + <sect3> + <title>Switching over</title> + <para>Our database is fully replicated and it's keeping up. There + are a few different options for doing the actual switch-over; it all + depends on how much time you have to work with and the trade-off between + down time and data loss. The most brute-force, fast way of doing it would be: + </para> + <orderedlist> + <listitem> + <para>First modify the postgresql.conf file for the 8.3.3 + installation to use port 5432, so that it is ready for the restart.</para> + </listitem> + <listitem> + <para>From this point you will have down time.
Shut down the + 8.2.3 PostgreSQL installation.</para> + </listitem> + <listitem> + <para>Restart the 8.3.3 PostgreSQL installation; it should + come up OK.</para> + </listitem> + <listitem> + <para> + Drop all the slony stuff from the 8.3.3 installation: log in with psql to + the 8.3.3 instance and issue + <command>drop schema _replication cascade;</command> + </para> + </listitem> + </orderedlist> + <para>You have now upgraded to 8.3.3 with, hopefully, minimal down + time. This procedure represents roughly the simplest way to do + this.</para> + </sect3> + </sect2> </sect1> <!-- Keep this comment at the end of the file Index: faq.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/faq.sgml,v retrieving revision 1.66.2.7 retrieving revision 1.66.2.8 diff -C2 -d -r1.66.2.7 -r1.66.2.8 *** faq.sgml 19 Feb 2009 16:47:05 -0000 1.66.2.7 --- faq.sgml 30 Apr 2009 16:06:10 -0000 1.66.2.8 *************** *** 15,19 **** </question> - <answer><para> <productname>Frotznik Freenix</productname> is new to me, so it's a bit dangerous to give really hard-and-fast definitive --- 15,18 ---- *************** *** 233,236 **** --- 232,454 ---- </qandaentry> + <qandaentry> + <question> <para> Problem building on Fedora/x86-64 </para> + + <para> When trying to configure &slony1; on a Fedora x86-64 system, + where <application>yum</application> was used to install the package + <filename>postgresql-libs.x86_64</filename>, the following complaint + comes up: + + <screen> + configure: error: Your version of libpq doesn't have PQunescapeBytea + this means that your version of PostgreSQL is lower than 7.3 + and thus not supported by Slony-I. + </screen></para> + + <para> This happened with &postgres; 8.2.5, which is certainly rather + newer than 7.3. </para> + </question> + + <answer> <para> <application>configure</application> is looking for + that symbol by compiling a little program that calls for it, and + checking if the compile succeeds. On the <command>gcc</command> + command line it uses <command>-lpq</command> to search for the + library. </para> + + <para> Unfortunately, that package is missing a symlink from + <filename>/usr/lib64/libpq.so</filename> to + <filename>libpq.so.5.0</filename>; that is why it fails to link to + libpq. The <emphasis>true</emphasis> problem is that the compiler failed to + find a library to link to, not that libpq lacked the function call. + </para> + + <para> Eventually, this should be addressed by those that manage the + <filename>postgresql-libs.x86_64</filename> package. </para> + </answer> + + <answer> <para> Note that this same symptom can be the indication of + similar classes of system configuration problems. Bad symlinks, bad + permissions, or bad behaviour on the part of your C compiler may all + potentially lead to this same error message. </para> + + <para> Thus, if you see this error, you need to look in the log file + that is generated, <filename>config.log</filename>. Search down to + near the end, and see what the <emphasis>actual</emphasis> complaint + was. That will be helpful in tracking down the true root cause of the + problem.</para> + </answer> + + </qandaentry> + </qandadiv> + + <qandadiv id="faqhowto"> <title> &slony1; FAQ: How Do I? </title> + + <qandaentry> + + <question> <para> I need to dump a database + <emphasis>without</emphasis> getting &slony1; configuration + (<emphasis>e.g.</emphasis> - triggers, functions, and such).
</para> + </question> + + <answer> <para> Up to version 1.2, this is fairly nontrivial, + requiring careful choice of nodes, and some moderately heavy + <quote>procedure</quote>. One methodology is as follows:</para> + + <itemizedlist> + + <listitem><para> First, dump the schema from the node that has the + <quote>master</quote> role. That is the only place, pre-2.0, where + you can readily dump the schema using + <application>pg_dump</application> and have a consistent schema. You + may use the &slony1; tool <xref linkend="extractschema"> to do + this. </para> </listitem> + + <listitem><para> Take the resulting schema, which will <emphasis>not</emphasis> + include the &slony1;-specific bits, and split it into two pieces: + </para> + + <itemizedlist> + + <listitem><para> Firstly, the portion comprising all of the creations + of tables in the schema. </para> </listitem> + + <listitem><para> Secondly, the portion consisting of creations of indices, constraints, and triggers. </para> </listitem> + + </itemizedlist> + + </listitem> + + <listitem><para> Pull a data dump, using <command>pg_dump --data-only</command>, of some node of your choice. It doesn't need to be for the <quote>master</quote> node. This dump will include the contents of the &slony1;-specific tables; you can discard that, or ignore it. Since the schema dump didn't contain table definitions for the &slony1; tables, they won't be loaded. </para> </listitem> + + <listitem><para> Finally, load the three components in proper order: </para> + <itemizedlist> + <listitem><para> Schema (tables) </para> </listitem> + <listitem><para> Data dump </para> </listitem> + <listitem><para> Remainder of the schema </para> </listitem> + </itemizedlist> + </listitem> + + </itemizedlist> + + </answer> + + <answer> <para> In &slony1; 2.0, the answer becomes simpler: Just take + a <command>pg_dump --exclude-schema=_Cluster</command> against + <emphasis>any</emphasis> node. In 2.0, the schemas are no longer + <quote>clobbered</quote> on subscribers, so a straight + <application>pg_dump</application> will do what you want.</para> + </answer> + + </qandaentry> + + <qandaentry id="cannotrenumbernodes"> + <question> <para> I'd like to renumber the node numbers in my cluster. + How can I renumber nodes? </para> </question> + + <answer> <para> The first answer is <quote>you can't do that</quote> - + &slony1; node numbers are quite <quote>immutable.</quote> Node numbers + are deeply woven into the fibres of the schema, by virtue of being + written into virtually every table in the system, but much more + importantly by virtue of being used as the basis for event + propagation. 
The only time that it might be <quote>OK</quote> to
+ modify a node number is at some time where we know that it is not in
+ use, and we would need to do updates against each node in the cluster
+ in an organized fashion.</para>
+
+ <para> To do this in an automated fashion seems like a
+ <emphasis>huge</emphasis> challenge, as it changes the structure of
+ the very event propagation system that already needs to be working in
+ order for such a change to propagate.</para> </answer>
+
+ <answer> <para> If it is <emphasis>enormously necessary</emphasis> to
+ renumber nodes, this might be accomplished by dropping and re-adding
+ nodes so as to free up the node ID that needs to be assigned to
+ another node.</para> </answer>
+ </qandaentry>
+
+ </qandadiv>
+
+ <qandadiv id="faqimpossibilities"> <title> &slony1; FAQ: Impossible Things People Try </title>
+
+ <qandaentry>
+ <question><para> Can I use &slony1; to replicate changes back and forth on my database between my two offices? </para> </question>
+
+ <answer><para> At one level, it is <emphasis>theoretically
+ possible</emphasis> to do something like that, if you design your
+ application so that each office has its own distinct set of tables,
+ and you then have some system for consolidating the data to give them
+ some common view. However, this requires a great deal of design work
+ to create an application that performs this consolidation. </para>
+ </answer>
+
+ <answer><para> In practice, the term for that is <quote>multimaster
+ replication,</quote> and &slony1; does not support <quote>multimaster
+ replication.</quote> </para> </answer>
+
+ </qandaentry>
+
+ <qandaentry>
+ <question><para> I want to replicate all of the databases for a shared-database system I am managing. There are multiple databases, being used by my customers. </para> </question>
+
+ <answer><para> For this purpose, something like &postgres; PITR (Point
+ In Time Recovery) is likely to be much more suitable. &slony1;
+ requires a slon process (and multiple connections) for each
+ identifiable database, and if you have a &postgres; cluster hosting 50
+ or 100 databases, this will require hundreds of database connections.
+ Typically, in <quote>shared hosting</quote> situations, the schemas
+ and data are managed by the customers, who can change anything they
+ like whenever <emphasis>they</emphasis> want. &slony1; does not work
+ out well when not used in a disciplined manner. </para> </answer>
+ </qandaentry>
+
+ <qandaentry>
+ <question><para> I want to be able to make DDL changes, and have them replicated automatically. </para> </question>
+
+ <answer><para> &slony1; requires that <xref linkend="ddlchanges"> be planned for explicitly and carefully. &slony1; captures changes using triggers, and &postgres; does not provide a way to use triggers to capture DDL changes.</para>
+
+ <note><para> There has been quite a bit of discussion, off and on, about how
+ &postgres; might capture DDL changes in a way that would make triggers
+ useful; nothing concrete has emerged after several years of
+ discussion. </para> </note> </answer>
+ </qandaentry>
+
+ <qandaentry>
+ <question><para> I want to split my cluster into disjoint partitions that are not aware of one another. &slony1; keeps generating <xref linkend="listenpaths"> that link those partitions together. </para> </question>
+
+ <answer><para> The notion that all nodes are aware of one another is
+ deeply embedded in the design of &slony1;. 
For instance, its handling
+ of cleanup of obsolete data depends on being aware of whether any of
+ the nodes are behind, and thus might still depend on older data.
+ </para> </answer>
+ </qandaentry>
+
+ <qandaentry>
+ <question><para> I want to change some of my node numbers. How do I <quote>rename</quote> a node to have a different node number? </para> </question>
+ <answer><para> You don't. The node number is used to coordinate inter-node communications, and changing the node ID number <quote>on the fly</quote> would make it essentially impossible to keep node configuration coordinated. </para> </answer>
+ </qandaentry>
+
+ <qandaentry>
+ <question> <para> My application uses OID attributes; is it possible to replicate tables like this? </para>
+ </question>
+
+ <answer><para> It is worth noting that oids, as a regular table
+ attribute, have been deprecated since &postgres; version 8.1, back in
+ 2005. &slony1; has <emphasis>never</emphasis> collected oids to
+ replicate them, and, with that functionality being deprecated, the
+ developers do not intend to add this functionality. </para>
+
+ <para> &postgres; implemented oids as a way to link its internal
+ system tables together; to use them with application tables is
+ considered <emphasis>poor practice</emphasis>, and it is recommended
+ that you use sequences to populate your own ID column on application
+ tables. </para> </answer>
+
+ <answer><para> Of course, nothing prevents you from creating a table
+ <emphasis>without</emphasis> oids, and then adding your own
+ application column called <envar>oid</envar>, preferably with type
+ information <command>SERIAL NOT NULL UNIQUE</command>, which
+ <emphasis>can</emphasis> be replicated, and which is likely to be
+ suitable as a candidate primary key for the table. </para> </answer>
+ </qandaentry> </qandadiv> *************** *** 416,423 **** could also announce an admin to take a look... </para> </answer> - <answer><para> As of &postgres; 8.3, this should no longer be an - issue, as this version has code which invalidates query plans when - tables are altered. </para> </answer> - </qandaentry> --- 634,637 ---- *************** *** 716,732 **** <qandaentry> ! <question> <para> Replication has fallen behind, and it appears that the ! queries to draw data from <xref linkend="table.sl-log-1">/<xref ! linkend="table.sl-log-2"> are taking a long time to pull just a few <command>SYNC</command>s. </para> </question> ! <answer> <para> Until version 1.1.1, there was only one index on <xref ! linkend="table.sl-log-1">/<xref linkend="table.sl-log-2">, and if ! there were multiple replication sets, some of the columns on the index ! would not provide meaningful selectivity. If there is no index on ! column <function> log_xid</function>, consider adding it. See ! <filename>slony1_base.sql</filename> for an example of how to create ! the index. </para> </answer> --- 930,945 ---- <qandaentry> ! <question> <para> Replication has fallen behind, and it appears that ! the queries to draw data from &sllog1;/&sllog2; are taking a long time ! to pull just a few <command>SYNC</command>s. </para> </question> ! <answer> <para> Until version 1.1.1, there was only one index on ! &sllog1;/&sllog2;, and if there were multiple replication sets, some ! of the columns on the index would not provide meaningful selectivity. ! If there is no index on column <function> log_xid</function>, consider ! adding it. See <filename>slony1_base.sql</filename> for an example of ! how to create the index. 
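! A sketch of what such an index creation might look like (the cluster
! schema name <quote>_mycluster</quote> is an illustrative assumption;
! <filename>slony1_base.sql</filename> shows the canonical form, which
! may also name an operator class for the column):
! <command>create index sl_log_1_idx_xid on "_mycluster".sl_log_1
! (log_xid);</command>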
</para> </answer> *************** *** 1112,1117 **** <question><para> Replication has been slowing down, I'm seeing <command> FETCH 100 FROM LOG </command> queries running for a long ! time, <xref linkend="table.sl-log-1"> is growing, and performance is, ! well, generally getting steadily worse. </para> </question> --- 1325,1330 ---- <question><para> Replication has been slowing down, I'm seeing <command> FETCH 100 FROM LOG </command> queries running for a long ! time, &sllog1;/&sllog2; is growing, and performance is, well, ! generally getting steadily worse. </para> </question> *************** *** 1136,1142 **** <listitem><para> The cleanup thread will be unable to clean out ! entries in <xref linkend="table.sl-log-1"> and <xref ! linkend="table.sl-seqlog">, with the result that these tables will ! grow, ceaselessly, until the transaction is closed. </para> </listitem> </itemizedlist> --- 1349,1355 ---- <listitem><para> The cleanup thread will be unable to clean out ! entries in &sllog1;, &sllog2;, and &slseqlog;, with the result that ! these tables will grow, ceaselessly, until the transaction is ! closed. </para> </listitem> </itemizedlist> *************** *** 1177,1182 **** <qandaentry id="faq17"> ! <question><para>After dropping a node, <xref linkend="table.sl-log-1"> ! isn't getting purged out anymore.</para></question> <answer><para> This is a common scenario in versions before 1.0.5, as --- 1390,1395 ---- <qandaentry id="faq17"> ! <question><para>After dropping a node, &sllog1;/&sllog2; ! aren't getting purged out anymore.</para></question> <answer><para> This is a common scenario in versions before 1.0.5, as *************** *** 1242,1247 **** <listitem><para> At the start of each <function>cleanupEvent</function> run, which is the event in which old ! data is purged from <xref linkend="table.sl-log-1"> and <xref ! linkend="table.sl-seqlog"></para></listitem> </itemizedlist></para> </answer> </qandaentry> --- 1455,1460 ---- <listitem><para> At the start of each <function>cleanupEvent</function> run, which is the event in which old ! data is purged from &sllog1;, &sllog2;, and ! &slseqlog;</para></listitem> </itemizedlist></para> </answer> </qandaentry> *************** *** 1253,1263 **** sync through.</para></question> ! <answer><para> You might want to take a look at the <xref ! linkend="table.sl-log-1">/<xref linkend="table.sl-log-2"> tables, and ! do a summary to see if there are any really enormous &slony1; ! transactions in there. Up until at least 1.0.2, there needs to be a ! &lslon; connected to the origin in order for <command>SYNC</command> events to be generated.</para> <para>If none are being generated, then all of the updates until the next one is generated will collect into one rather enormous &slony1; --- 1466,1479 ---- sync through.</para></question> ! <answer><para> You might want to take a look at the tables &sllog1; ! and &sllog2; and do a summary to see if there are any really enormous ! &slony1; transactions in there. Up until at least 1.0.2, there needs ! 
to be a &lslon; connected to the origin in order for <command>SYNC</command> events to be generated.</para> + <note><para> As of 1.0.2, + function <function>generate_sync_event()</function> provides an + alternative as backup...</para> </note> + <para>If none are being generated, then all of the updates until the next one is generated will collect into one rather enormous &slony1; *************** *** 1331,1334 **** --- 1547,1569 ---- </answer> </qandaentry> + <qandaentry> + + <question><para> I'm noticing in the logs that a &lslon; is frequently + switching in and out of <quote>polling</quote> mode as it is + frequently reporting <quote>LISTEN - switch from polling mode to use + LISTEN</quote> and <quote>UNLISTEN - switch into polling + mode</quote>. </para> </question> + + <answer><para> The thresholds for switching between these modes are + controlled by the configuration parameters <xref + linkend="slon-config-sync-interval"> and <xref + linkend="slon-config-sync-interval-timeout">; if the timeout value + (which defaults to 10000, implying 10s) is kept low, that makes it + easy for the &lslon; to decide to return to <quote>listening</quote> + mode. You may want to increase the value of the timeout + parameter. </para> + </answer> + </qandaentry> + </qandadiv> <qandadiv id="faqbugs"> <title> &slony1; FAQ: &slony1; Bugs in Elder Versions </title> *************** *** 1461,1467 **** nodes. I am discovering that confirmations for set 1 never get to the nodes subscribing to set 2, and that confirmations for set 2 never get ! to nodes subscribing to set 1. As a result, <xref ! linkend="table.sl-log-1"> grows and grows and is never purged. This ! was reported as &slony1; <ulink url="http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1485"> bug 1485 </ulink>. --- 1696,1702 ---- nodes. I am discovering that confirmations for set 1 never get to the nodes subscribing to set 2, and that confirmations for set 2 never get ! to nodes subscribing to set 1. As a result, &sllog1;/&sllog2; grow ! and grow, and are never purged. This was reported as ! &slony1; <ulink url="http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1485"> bug 1485 </ulink>. *************** *** 1515,1520 **** subscriber to a particular provider are for <quote>sequence-only</quote> sets. If a node gets into that state, ! replication will fail, as the query that looks for data from <xref ! linkend="table.sl-log-1"> has no tables to find, and the query will be malformed, and fail. If a replication set <emphasis>with</emphasis> tables is added back to the mix, everything will work out fine; it --- 1750,1755 ---- subscriber to a particular provider are for <quote>sequence-only</quote> sets. If a node gets into that state, ! replication will fail, as the query that looks for data from ! &sllog1;/&sllog2; has no tables to find, and the query will be malformed, and fail. If a replication set <emphasis>with</emphasis> tables is added back to the mix, everything will work out fine; it *************** *** 1611,1614 **** --- 1846,1887 ---- linkend="stmtsetdropsequence">.</para></answer></qandaentry> + <qandaentry> + <question><para> I set up my cluster using pgAdminIII, with cluster + name <quote>MY-CLUSTER</quote>. 
Time has passed, and I tried using + Slonik to make a configuration change, and this is failing with the + following error message:</para> + + <programlisting> + ERROR: syntax error at or near - + </programlisting> + </question> + + <answer><para> The problem here is that &slony1; expects cluster names + to be valid <ulink url= + "http://www.postgresql.org/docs/8.3/static/sql-syntax-lexical.html"> + SQL Identifiers</ulink>, and &lslonik; enforces this. Unfortunately, + <application>pgAdminIII</application> did not do so, and allowed using + a cluster name that now causes <emphasis>a problem.</emphasis> </para> </answer> + + <answer> <para> If you have gotten into this spot, it's a problem that + we mayn't be help resolve, terribly much. </para> + + <para> It's <emphasis>conceivably possible</emphasis> that running the + SQL command <command>alter namespace "_My-Bad-Clustername" rename to + "_BetterClusterName";</command> against each database may work. That + shouldn't particularly <emphasis>damage</emphasis> things!</para> + + <para> On the other hand, when the problem has been experienced, users + have found they needed to drop replication and rebuild the + cluster.</para> </answer> + + <answer><para> A change in version 2.0.2 is that a function runs as + part of loading functions into the database which checks the validity + of the cluster name. If you try to use an invalid cluster name, + loading the functions will fail, with a suitable error message, which + should prevent things from going wrong even if you're using tools + other than &lslonik; to manage setting up the cluster. </para></answer> + </qandaentry> + </qandadiv> *************** *** 1818,1837 **** <para>By the time we notice that there is a problem, the seemingly ! missed delete transaction has been cleaned out of <xref ! linkend="table.sl-log-1">, so there appears to be no recovery ! possible. What has seemed necessary, at this point, is to drop the ! replication set (or even the node), and restart replication from ! scratch on that node.</para> ! <para>In &slony1; 1.0.5, the handling of purges of <xref ! linkend="table.sl-log-1"> became more conservative, refusing to purge ! entries that haven't been successfully synced for at least 10 minutes ! on all nodes. It was not certain that that would prevent the ! <quote>glitch</quote> from taking place, but it seemed plausible that ! it might leave enough <xref linkend="table.sl-log-1"> data to be able ! to do something about recovering from the condition or at least ! diagnosing it more exactly. And perhaps the problem was that <xref ! linkend="table.sl-log-1"> was being purged too aggressively, and this ! would resolve the issue completely.</para> <para> It is a shame to have to reconstruct a large replication node --- 2091,2108 ---- <para>By the time we notice that there is a problem, the seemingly ! missed delete transaction has been cleaned out of &sllog1;, so there ! appears to be no recovery possible. What has seemed necessary, at ! this point, is to drop the replication set (or even the node), and ! restart replication from scratch on that node.</para> ! <para>In &slony1; 1.0.5, the handling of purges of &sllog1; became ! more conservative, refusing to purge entries that haven't been ! successfully synced for at least 10 minutes on all nodes. It was not ! certain that that would prevent the <quote>glitch</quote> from taking ! place, but it seemed plausible that it might leave enough &sllog1; ! data to be able to do something about recovering from the condition or ! 
at least diagnosing it more exactly. And perhaps the problem was that ! &sllog1; was being purged too aggressively, and this would resolve the ! issue completely.</para> <para> It is a shame to have to reconstruct a large replication node *************** *** 1844,1850 **** <para> In one case we found two lines in the SQL error message in the log file that contained <emphasis> identical </emphasis> insertions ! into <xref linkend="table.sl-log-1">. This <emphasis> ought ! </emphasis> to be impossible as is a primary key on <xref ! linkend="table.sl-log-1">. The latest (somewhat) punctured theory that comes from <emphasis>that</emphasis> was that perhaps this PK index has been corrupted (representing a &postgres; bug), and that --- 2115,2120 ---- <para> In one case we found two lines in the SQL error message in the log file that contained <emphasis> identical </emphasis> insertions ! into &sllog1;. This <emphasis> ought </emphasis> to be impossible as ! there is a primary key on &sllog1;. The latest (somewhat) punctured theory that comes from <emphasis>that</emphasis> was that perhaps this PK index has been corrupted (representing a &postgres; bug), and that *************** *** 1951,1956 **** <para> That trigger initiates the action of logging all updates to the ! table to &slony1; <xref linkend="table.sl-log-1"> ! tables.</para></listitem> <listitem><para> On a subscriber node, this involves disabling --- 2221,2225 ---- <para> That trigger initiates the action of logging all updates to the ! table to &slony1; &sllog1;/&sllog2; tables.</para></listitem> <listitem><para> On a subscriber node, this involves disabling *************** *** 2068,2072 **** <para>The solution is to rebuild the trigger on the affected table and ! fix the entries in <xref linkend="table.sl-log-1"> by hand.</para> <itemizedlist> --- 2337,2341 ---- <para>The solution is to rebuild the trigger on the affected table and ! fix the entries in &sllog1;/&sllog2; by hand.</para> <itemizedlist> *************** *** 2087,2096 **** </screen> ! <para>You then need to find the rows in <xref ! linkend="table.sl-log-1"> that have bad ! entries and fix them. You may ! want to take down the slon daemons for all nodes except the master; ! that way, if you make a mistake, it won't immediately propagate ! through to the subscribers.</para> <para> Here is an example:</para> --- 2356,2363 ---- <para>You then need to find the rows in &sllog1;/&sllog2; that have ! bad entries and fix them. You may want to take down the slon daemons ! for all nodes except the master; that way, if you make a mistake, it ! won't immediately propagate through to the subscribers.</para> <para> Here is an example:</para> *************** *** 2215,2223 **** </question> <para> &slony1; uses sequences to provide primary key values for log entries, and therefore this kind of behaviour may (perhaps regrettably!) be expected. </para> ! <answer> <para> Calling <function>lastval()</function>, to <quote>anonymously</quote> get <quote>the most recently updated sequence value</quote>, rather than using --- 2482,2491 ---- </question> + <answer> <para> &slony1; uses sequences to provide primary key values for log entries, and therefore this kind of behaviour may (perhaps regrettably!) be expected. </para> ! 
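<para> A minimal illustration of the hazard, as a sketch (the table
! and sequence names here are placeholders, not objects from the
! &slony1; schema): </para>
! <programlisting>
! insert into my_table (val) values ('x');
! -- fragile: may report a Slony-I internal sequence, since the
! -- replication log trigger also calls nextval()
! select lastval();
! -- robust: name the application sequence explicitly
! select currval('my_table_id_seq');
! </programlisting>
!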
<para> Calling <function>lastval()</function>, to <quote>anonymously</quote> get <quote>the most recently updated sequence value</quote>, rather than using Index: dropthings.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/dropthings.sgml,v retrieving revision 1.16.2.1 retrieving revision 1.16.2.2 diff -C2 -d -r1.16.2.1 -r1.16.2.2 *** dropthings.sgml 5 Jan 2007 19:11:44 -0000 1.16.2.1 --- dropthings.sgml 30 Apr 2009 16:06:10 -0000 1.16.2.2 *************** *** 159,162 **** --- 159,172 ---- nodes.</para> </sect2>
+
+ <sect2> <title> Verifying Cluster Health </title>
+
+ <para> After performing any of these procedures, it is an excellent
+ idea to run the <filename>tools</filename> script &lteststate;, which
+ rummages through the state of the entire cluster, pointing out any
+ anomalies that it finds. This includes a variety of sorts of
+ communications problems.</para>
+
+ </sect2> </sect1> <!-- Keep this comment at the end of the file Index: cluster.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/cluster.sgml,v retrieving revision 1.13 retrieving revision 1.13.2.1 diff -C2 -d -r1.13 -r1.13.2.1 *** cluster.sgml 2 Aug 2006 18:34:57 -0000 1.13 --- cluster.sgml 30 Apr 2009 16:06:10 -0000 1.13.2.1 *************** *** 13,20 **** tables that store &slony1; configuration and replication state information. See <xref linkend="schema"> for more documentation about ! what is stored in that schema. More specifically, the tables <xref ! linkend="table.sl-log-1"> and <xref linkend="table.sl-log-2"> log ! changes collected on the origin node as they are replicated to ! subscribers. </para> <para>Each database instance in which replication is to take place is --- 13,19 ---- tables that store &slony1; configuration and replication state information. See <xref linkend="schema"> for more documentation about ! what is stored in that schema. More specifically, the tables &sllog1; ! and &sllog2; log changes collected on the origin node as they are ! replicated to subscribers. </para> <para>Each database instance in which replication is to take place is *************** *** 24,27 **** --- 23,31 ---- node #1, and for the subscriber to be node #2.</para> + <para> Note that, as recorded in the <xref linkend="faq"> under <link + linkend="cannotrenumbernodes"> How can I renumber nodes?</link>, the + node number is immutable, so it is not possible to change a node's + node number after it has been set up.</para> + <para>Some planning should be done, in more complex cases, to ensure that the numbering system is kept sane, lest the administrators be Index: defineset.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/defineset.sgml,v retrieving revision 1.25.2.2 retrieving revision 1.25.2.3 diff -C2 -d -r1.25.2.2 -r1.25.2.3 *** defineset.sgml 11 Jun 2007 16:01:33 -0000 1.25.2.2 --- defineset.sgml 30 Apr 2009 16:06:10 -0000 1.25.2.3 *************** *** 71,80 **** <listitem><para> If the table hasn't even got a candidate primary key, ! you can ask &slony1; to provide one. This is done by first using ! <xref linkend="stmttableaddkey"> to add a column populated using a ! &slony1; sequence, and then having the <xref ! linkend="stmtsetaddtable"> include the directive ! <option>key=serial</option>, to indicate that &slony1;'s own column ! 
should be used.</para></listitem> </itemizedlist> --- 71,82 ---- <listitem><para> If the table hasn't even got a candidate primary key, ! you might ask &slony1; to provide one using ! <xref linkend="stmttableaddkey">.</para> ! ! <warning><para> <xref linkend="stmttableaddkey"> was always considered ! a <quote>kludge</quote>, at best, and as of version 2.0, it is ! considered such a misfeature that it is being removed. </para> ! </warning> ! </listitem> </itemizedlist> *************** *** 83,92 **** <quote>true</quote> primary key or a mere <quote>candidate primary key;</quote> it is, however, strongly recommended that you have one of ! those instead of having &slony1; populate the PK column for you. If you ! don't have a suitable primary key, that means that the table hasn't got ! any mechanism, from your application's standpoint, for keeping values ! unique. &slony1; may, therefore, introduce a new failure mode for your ! application, and this also implies that you had a way to enter confusing ! data into the database.</para> </sect2> --- 85,94 ---- <quote>true</quote> primary key or a mere <quote>candidate primary key;</quote> it is, however, strongly recommended that you have one of ! those instead of having &slony1; populate the PK column for you. If ! you don't have a suitable primary key, that means that the table ! hasn't got any mechanism, from your application's standpoint, for ! keeping values unique. &slony1; may, therefore, introduce a new ! failure mode for your application, and this also implies that you had ! a way to enter confusing data into the database.</para> </sect2> *************** *** 119,122 **** --- 121,134 ---- the degree of the <quote>injury</quote> to performance.</para> + <para> Another issue comes up particularly frequently when replicating + across a WAN; sometimes the network connection is a little bit + unstable, such that there is a risk that a connection held open for + several hours will lead to <command>CONNECTION TIMEOUT.</command> If + that happens when 95% done copying a 50-table replication set + consisting of 250GB of data, that could ruin your whole day. If the + tables were, instead, associated with separate replication sets, that + failure at the 95% point might only interrupt, temporarily, the + copying of <emphasis>one</emphasis> of those tables. </para> + <para> These <quote>negative effects</quote> tend to emerge when the database being subscribed to is many gigabytes in size and where it *************** *** 161,166 **** <para> Each time a SYNC is processed, values are recorded for <emphasis>all</emphasis> of the sequences in the set. If there are a ! lot of sequences, this can cause <xref linkend="table.sl-seqlog"> to ! grow rather large.</para> <para> This points to an important difference between tables and --- 173,178 ---- <para> Each time a SYNC is processed, values are recorded for <emphasis>all</emphasis> of the sequences in the set. If there are a ! lot of sequences, this can cause &slseqlog; to grow rather ! large.</para> <para> This points to an important difference between tables and *************** *** 177,192 **** <para> If it is not updated, the trigger on the table on the origin ! never fires, and no entries are added to <xref ! linkend="table.sl-log-1">. The table never appears in any of the further replication queries (<emphasis>e.g.</emphasis> in the <command>FETCH 100 FROM LOG</command> queries used to find replicatable data) as they only look for tables for which there are ! 
entries in <xref linkend="table.sl-log-1">.</para></listitem> <listitem><para> In contrast, a fixed amount of work is introduced to each SYNC by each sequence that is replicated.</para> ! <para> Replicate 300 sequence and 300 rows need to be added to <xref ! linkend="table.sl-seqlog"> on a regular basis.</para> <para> It is more than likely that if the value of a particular --- 189,205 ---- <para> If it is not updated, the trigger on the table on the origin ! never fires, and no entries are added to &sllog1;/&sllog2;. The table never appears in any of the further replication queries (<emphasis>e.g.</emphasis> in the <command>FETCH 100 FROM LOG</command> queries used to find replicatable data) as they only look for tables for which there are ! entries in &sllog1;/&sllog2;.</para></listitem> <listitem><para> In contrast, a fixed amount of work is introduced to each SYNC by each sequence that is replicated.</para> ! <para> Replicate 300 sequences, and 300 rows need to be added to ! &slseqlog; on a regular basis, at least up until the 2.0 branch, ! where updates are only applied when the value of a given sequence is ! seen to change.</para> <para> It is more than likely that if the value of a particular Index: prerequisites.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/prerequisites.sgml,v retrieving revision 1.26.2.2 retrieving revision 1.26.2.3 diff -C2 -d -r1.26.2.2 -r1.26.2.3 *** prerequisites.sgml 11 Jun 2007 16:01:33 -0000 1.26.2.2 --- prerequisites.sgml 30 Apr 2009 16:06:10 -0000 1.26.2.3 *************** *** 8,17 **** <indexterm><primary> platforms where &slony1; runs </primary> </indexterm> ! <para>The platforms that have received specific testing at the time of ! this release are FreeBSD-4X-i368, FreeBSD-5X-i386, FreeBSD-5X-alpha, ! OS-X-10.3, Linux-2.4X-i386 Linux-2.6X-i386 Linux-2.6X-amd64, <trademark>Solaris</trademark>-2.8-SPARC, ! <trademark>Solaris</trademark>-2.9-SPARC, AIX 5.1, OpenBSD-3.5-sparc64 ! and &windows; 2000, XP and 2003 (32 bit).</para> <sect2> --- 8,19 ---- <indexterm><primary> platforms where &slony1; runs </primary> </indexterm> ! <para>The platforms that have received specific testing are ! FreeBSD-4X-i386, FreeBSD-5X-i386, FreeBSD-5X-alpha, OS-X-10.3, ! Linux-2.4X-i386, Linux-2.6X-i386, Linux-2.6X-amd64, <trademark>Solaris</trademark>-2.8-SPARC, ! <trademark>Solaris</trademark>-2.9-SPARC, AIX 5.1 and 5.3, ! OpenBSD-3.5-sparc64 and &windows; 2000, XP and 2003 (32 bit). There ! is enough diversity amongst these platforms that nothing ought to ! prevent running &slony1; on other similar platforms. </para> <sect2> *************** *** 67,70 **** --- 69,76 ---- linkend="pg81funs"> on &postgres; 8.1.[0-3] </link>. </para> + <para> There is variation between what versions of &postgres; are + compatible with what versions of &slony1;. See <xref + linkend="installation"> for more details.</para> + </listitem> *************** *** 103,107 **** installation.</para> ! <note><para>In &slony1; version 1.1, it is possible to compile &slony1; separately from &postgres;, making it practical for the makers of distributions of <productname>Linux</productname> and --- 109,113 ---- installation.</para> ! 
<note><para>From &slony1; version 1.1, it is possible to compile &slony1; separately from &postgres;, making it practical for the makers of distributions of <productname>Linux</productname> and Index: intro.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/intro.sgml,v retrieving revision 1.25.2.2 retrieving revision 1.25.2.3 diff -C2 -d -r1.25.2.2 -r1.25.2.3 *** intro.sgml 11 Jun 2007 16:01:33 -0000 1.25.2.2 --- intro.sgml 30 Apr 2009 16:06:10 -0000 1.25.2.3 *************** *** 297,307 **** <listitem><para> Each SYNC applied needs to be reported back to all of the other nodes participating in the set so that the nodes all know ! that it is safe to purge <xref linkend="table.sl-log-1"> and <xref ! linkend="table.sl-log-2"> data, as any <quote>forwarding</quote> node ! could potentially take over as <quote>master</quote> at any time. One ! might expect SYNC messages to need to travel through n/2 nodes to get ! propagated to their destinations; this means that each SYNC is ! expected to get transmitted n(n/2) times. Again, this points to a ! quadratic growth in communications costs as the number of nodes increases.</para></listitem> --- 297,307 ---- <listitem><para> Each SYNC applied needs to be reported back to all of the other nodes participating in the set so that the nodes all know ! that it is safe to purge &sllog1; and &sllog2; data, as ! any <quote>forwarding</quote> node could potentially take over ! as <quote>master</quote> at any time. One might expect SYNC messages ! to need to travel through n/2 nodes to get propagated to their ! destinations; this means that each SYNC is expected to get transmitted ! n(n/2) times. Again, this points to a quadratic growth in ! communications costs as the number of nodes increases.</para></listitem> Index: slonyupgrade.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slonyupgrade.sgml,v retrieving revision 1.3.2.2 retrieving revision 1.3.2.3 diff -C2 -d -r1.3.2.2 -r1.3.2.3 *** slonyupgrade.sgml 16 Mar 2007 19:01:26 -0000 1.3.2.2 --- slonyupgrade.sgml 30 Apr 2009 16:06:10 -0000 1.3.2.3 *************** *** 78,81 **** --- 78,257 ---- </variablelist>
+
+ <sect2> <title> TABLE ADD KEY issue in &slony1; 2.0 </title>
+
+ <para> Usually, upgrades between &slony1; versions have required no
+ special attention to the condition of the existing replica. That is,
+ you fairly much merely need to stop &lslon;s, put new binaries in
+ place, run <xref linkend="stmtupdatefunctions"> against each node, and
+ restart &lslon;s. Schema changes have been internal to the cluster
+ schema, and <xref linkend="stmtupdatefunctions"> has been capable of
+ making all of the needed alterations. With version 2, this changes, if
+ there are tables that used <xref linkend="stmttableaddkey">. Version
+ 2 does not support the <quote>extra</quote> column, and
+ <quote>fixing</quote> the schema to have a proper primary key is not
+ within the scope of what <xref linkend="stmtupdatefunctions"> can
+ perform. </para>
+
+ <para> When upgrading from versions 1.0.x, 1.1.x, or 1.2.x to version
+ 2, it will be necessary to have already eliminated any such
+ &slony1;-managed primary keys. 
</para>
+
+ <para> One may identify the tables affected via the following SQL
+ query: </para>
+
+ <programlisting>
+ select n.nspname, c.relname
+   from pg_class c, pg_namespace n
+  where c.oid in (select attrelid from pg_attribute
+                   where attname like '_Slony-I_%rowID'
+                     and not attisdropped)
+    and reltype <> 0
+    and n.oid = c.relnamespace
+  order by n.nspname, c.relname;
+ </programlisting>
+
+ <para> The simplest approach that may be taken to rectify the
+ <quote>broken</quote> state of such tables is as follows: </para>
+
+ <itemizedlist>
+
+ <listitem><para> Drop the table from replication using the &lslonik;
+ command <xref linkend="stmtsetdroptable">. </para>
+
+ <para> This does <emphasis>not</emphasis> drop out the
+ &slony1;-generated column. </para>
+ </listitem>
+
+ <listitem><para> On each node, run an SQL script to alter the table,
+ dropping the extra column.</para> <para> <command> alter table
+ whatever drop column "_Slony-I_cluster-rowID";</command> </para>
+
+ <para> This needs to be run individually against each node. Depending
+ on your preferences, you might wish to use <xref
+ linkend="stmtddlscript"> to do this. </para>
+
+ <para> If the table is a heavily updated one, it is worth observing
+ that this alteration will require acquiring an exclusive lock on the
+ table. It will not hold this lock for terribly long; dropping the
+ column should be quite a rapid operation as all it does internally is
+ to mark the column as being dropped; it <emphasis>does not</emphasis>
+ require rewriting the entire contents of the table. Tuples that have
+ values in that column will continue to have that value; new tuples
+ will leave it NULL, and queries will ignore the column. Space for
+ those columns will get reclaimed as tuples get updated. </para>
+
+ <para> Note that at this point in the process, this table is not being
+ replicated. If a failure takes place, replication is not, at this
+ point, providing protection on this table. This is unfortunate but
+ unavoidable. </para>
+ </listitem>
+
+ <listitem><para> Make sure the table has a legitimate candidate for
+ primary key, some set of NOT NULL, UNIQUE columns. </para>
+
+ <para> The possible variations to this are the reason that the
+ developers have made no effort to try to assist automation of
+ this.</para></listitem>
+ </itemizedlist>
+
+ <itemizedlist>
+
+ <listitem><para> If the table is a small one, it may be perfectly
+ reasonable to do alterations (note that they must be applied to
+ <emphasis>every node</emphasis>!) to add a new column, assign it via a
+ new sequence, and then declare it to be a primary key; a sketch of
+ this appears at the end of this list. </para>
+
+ <para> If there are only a few tuples, this should take a fraction of
+ a second, and, with luck, be unnoticeable to a running
+ application. </para>
+
+ <para> Even if the table is fairly large, if it is not frequently
+ accessed by the application, the locking of the table that takes place
+ when you run <command>ALTER TABLE</command> may not cause much
+ inconvenience. </para></listitem>
+
+ <listitem> <para> If the table is a large one, and is vital to and
+ heavily accessed by the application, then it may be necessary to take
+ an application outage in order to accomplish the alterations, leaving
+ you necessarily somewhat vulnerable until the process is
+ complete. </para>
+
+ <para> If it is troublesome to take outages, then the upgrade to
+ &slony1; version 2 may take some planning... 
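</para>
+ </listitem>
+
+ <listitem><para> To make the <quote>small table</quote> case above
+ concrete, the alterations might look like the following sketch (the
+ table, column, and sequence names are illustrative assumptions, and
+ the statements must be applied on every node, for which <xref
+ linkend="stmtddlscript"> may be used):
+
+ <programlisting>
+ create sequence my_table_id_seq;
+ -- adding a column with a default rewrites the table, filling in
+ -- the new column for all of the existing rows
+ alter table my_table
+     add column id bigint not null default nextval('my_table_id_seq');
+ alter table my_table add primary key (id);
+ </programlisting>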
</para> </listitem>
+
+ </itemizedlist>
+
+ <itemizedlist>
+
+ <listitem><para> Create a new replication set (<xref
+ linkend="stmtcreateset">) and re-add the table to that set (<xref
+ linkend="stmtsetaddtable">). </para>
+
+ <para> If there are multiple tables, they may be handled via a single
+ replication set.</para>
+ </listitem>
+
+ <listitem><para> Subscribe the set (<xref linkend="stmtsubscribeset">)
+ on all the nodes desired. </para> </listitem>
+
+ <listitem><para> Once subscriptions are complete, merge the set(s) in,
+ if desired (<xref linkend="stmtmergeset">). </para> </listitem>
+
+ </itemizedlist>
+
+ <para> This approach should be fine for tables that are relatively
+ small, or infrequently used. If, on the other hand, the table is
+ large and heavily used, another approach may prove necessary, namely
+ to create your own sequence, and <quote>promote</quote> the formerly
+ &slony1;-generated column into a <quote>real</quote> column in your
+ database schema. An outline of the steps is as follows: </para>
+
+ <itemizedlist>
+
+ <listitem><para> Add a sequence that assigns values to the
+ column. </para>
+
+ <para> Setup steps will include SQL <command>CREATE
+ SEQUENCE</command>, SQL <command>SELECT SETVAL()</command> (to set the
+ value of the sequence high enough to reflect values used in the
+ table), Slonik <xref linkend="stmtcreateset"> (to create a set to
+ assign the sequence to), Slonik <xref linkend="stmtsetaddsequence">
+ (to assign the sequence to the set), Slonik <xref
+ linkend="stmtsubscribeset"> (to set up subscriptions to the new
+ set)</para>
+ </listitem>
+
+ <listitem><para> Attach the sequence to the column on the
+ table. </para>
+
+ <para> This involves <command>ALTER TABLE ALTER COLUMN</command>,
+ which must be submitted via the Slonik command <xref
+ linkend="stmtddlscript">. </para>
+ </listitem>
+
+ <listitem><para> Rename the column
+ <envar>_Slony-I_@CLUSTERNAME@_rowID</envar> so that &slony1; won't
+ consider it to be under its control.</para>
+
+ <para> This involves <command>ALTER TABLE ... RENAME COLUMN</command>,
+ which must be submitted via the Slonik command <xref
+ linkend="stmtddlscript">. </para>
+
+ <para> Note that these two alterations might be accomplished via the
+ same <xref linkend="stmtddlscript"> request. </para>
+ </listitem>
+
+ </itemizedlist>
+
+ </sect2>
+
+ <sect2> <title> New Trigger Handling in &slony1; Version 2 </title>
+
+ <para> One of the major changes to &slony1; is that enabling/disabling
+ of triggers and rules now takes place as plain SQL, supported by
+ &postgres; 8.3+, rather than via <quote>hacking</quote> on the system
+ catalog. </para>
+
+ <para> As a result, &slony1; users should be aware of the &postgres;
+ syntax for <command>ALTER TABLE</command>, as that is how they can
+ accomplish what was formerly accomplished via <xref
+ linkend="stmtstoretrigger"> and <xref linkend="stmtdroptrigger">. </para>
+
+ </sect2> </sect1> <!-- Keep this comment at the end of the file Index: installation.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/installation.sgml,v retrieving revision 1.28.2.5 retrieving revision 1.28.2.6 diff -C2 -d -r1.28.2.5 -r1.28.2.6 *** installation.sgml 1 Mar 2008 02:53:47 -0000 1.28.2.5 --- installation.sgml 30 Apr 2009 16:06:10 -0000 1.28.2.6 *************** *** 44,48 **** <para> <screen> ! 
PGMAIN=/usr/local/pgsql746-freebsd-2005-04-01 \ ./configure \ --with-pgconfigdir=$PGMAIN/bin --- 44,48 ---- <para> <screen> ! PGMAIN=/usr/local/pgsql839-freebsd-2008-09-03 \ ./configure \ --with-pgconfigdir=$PGMAIN/bin *************** *** 69,74 **** <application>configure</application> needed to know where your &postgres; source tree is, which was done with the ! <option>--with-pgsourcetree=</option> option. As of version 1.1, this ! is no longer necessary, as &slony1; has included within its own code base certain parts needed for platform portability. It now only needs to make reference to parts of &postgres; that are actually part of the --- 69,74 ---- <application>configure</application> needed to know where your &postgres; source tree is, which was done with the ! <option>--with-pgsourcetree=</option> option. Since version 1.1, this ! has not been necessary, as &slony1; has included within its own code base certain parts needed for platform portability. It now only needs to make reference to parts of &postgres; that are actually part of the *************** *** 93,99 **** to provide correct client libraries. </para> ! <para> &postgres; version 8 installs the server header <command>#include</command> files by default; with version 7.4 and ! earlier, you need to make sure that the build installation included doing <command>make install-all-headers</command>, otherwise the server headers will not be installed, and &slony1; will be unable to --- 93,99 ---- to provide correct client libraries. </para> ! <para> &postgres; versions from 8.0 onwards install the server header <command>#include</command> files by default; with version 7.4 and ! earlier, you needed to make sure that the build installation included doing <command>make install-all-headers</command>; otherwise the server headers would not be installed, and &slony1; would be unable to *************** *** 124,128 **** try to detect some quirks of your system. &slony1; is known to need a modified version of <application>libpq</application> on specific ! platforms such as Solaris2.X on SPARC. The patch for libpq version 7.4.2 can be found at <ulink id="threadpatch" url= "http://developer.postgresql.org/~wieck/slony1/download/threadsafe-libpq-742.diff.gz"> --- 124,128 ---- try to detect some quirks of your system. &slony1; is known to need a modified version of <application>libpq</application> on specific ! platforms such as Solaris2.X on SPARC. A patch for libpq version 7.4.2 can be found at <ulink id="threadpatch" url= "http://developer.postgresql.org/~wieck/slony1/download/threadsafe-libpq-742.diff.gz"> *************** *** 175,179 **** </para> ! <para>The main list of files installed within the PostgreSQL instance is:</para> <itemizedlist> <listitem><para><filename> $bindir/slon</filename></para></listitem> --- 175,180 ---- </para> ! <para>The main list of files installed within the &postgres; instance ! is, for versions of &slony1; up to 1.2.x:</para> <itemizedlist> <listitem><para><filename> $bindir/slon</filename></para></listitem> *************** *** 191,204 **** </itemizedlist> ! <para> (Note that as things change, the list of version-specific files ! may grow...) </para> <para>The <filename>.sql</filename> files are not fully substituted ! yet. And yes, both the 7.3, 7.4 and the 8.0 files get installed on every ! system, irrespective of its version. The <xref linkend="slonik"> admin utility does namespace/cluster substitutions within these files, and loads the files when creating replication nodes. At that point in ! 
time, the database being initialized may be remote and may run a ! different version of &postgres; than that of the local host.</para> <para> At the very least, the two shared objects installed in the --- 192,207 ---- </itemizedlist> ! <para> (Note that as things have changed, the list of version-specific ! files has tended to grow...) </para> <para>The <filename>.sql</filename> files are not fully substituted ! yet. And yes, versions for all supported versions of &postgres; ! (<emphasis>e.g.</emphasis> 7.3, 7.4, and 8.0) get installed on ! every system, irrespective of its version. The <xref ! linkend="slonik"> admin utility does namespace/cluster substitutions ! within these files, and loads the files when creating replication ! nodes. At that point in time, the database being initialized may be ! remote and may run a different version of &postgres; than that of the ! local host.</para> <para> At the very least, the two shared objects installed in the *************** *** 207,210 **** --- 210,232 ---- may be able to be loaded remotely from other hosts.) </para>
+ <para> In &slony1; version 2.0, this changes:</para>
+ <itemizedlist>
+ <listitem><para><filename> $bindir/slon</filename></para></listitem>
+ <listitem><para><filename> $bindir/slonik</filename></para></listitem>
+ <listitem><para><filename> $libdir/slony1_funcs$(DLSUFFIX)</filename></para></listitem>
+ <listitem><para><filename> $datadir/slony1_base.sql</filename></para></listitem>
+ <listitem><para><filename> $datadir/slony1_funcs.sql</filename></para></listitem>
+ </itemizedlist>
+
+ <note> <para> Note the loss of <filename>xxid.so</filename> - the txid
+ data type introduced in &postgres; 8.3 makes it
+ obsolete. </para></note>
+
+ <note> <para> &slony1; 2.0 gives up compatibility with versions of
+ &postgres; prior to 8.3, and hence <quote>resets</quote> the
+ version-specific base function handling. There may be function files
+ for version 8.3, 8.4, and such, as replication-relevant divergences of
+ &postgres; functionality take place. </para></note>
+ </sect2> *************** *** 219,224 **** <para> This is only built if you specify <command>--with-docs</command></para> ! <para> Note that you may have difficulty building the documentation on Red ! Hat-based systems due to NAMELEN being set way too low. Havoc Pennington opened a bug on this back in mid-2001, back in the days of Red Hat 7.1; Red Hat Software has assigned the bug, but there does not --- 241,246 ---- <para> This is only built if you specify <command>--with-docs</command></para> ! <para> Note that you may have difficulty building the documentation on ! Red Hat-based systems due to NAMELEN being set way too low. Havoc Pennington opened a bug on this back in mid-2001, back in the days of Red Hat 7.1; Red Hat Software has assigned the bug, but there does not *************** *** 226,231 **** indicates that there is intent to address the issue by bumping up the value of NAMELEN in some future release of Red Hat Enterprise Linux, ! but that won't likely help you in 2005. Current Fedora releases have already ! addressed this issue. </para> <para> --- 248,254 ---- indicates that there is intent to address the issue by bumping up the value of NAMELEN in some future release of Red Hat Enterprise Linux, ! but that may not help you if you are using an elder version where this ! will never be rectified. Current Fedora releases have already ! addressed this issue. </para> <para> *************** *** 257,261 **** <para>The RPMs are available at <ulink ! 
url="http://yum.pgsqlrpms.org"> &postgres RPM Repository </ulink>. Please read the howto provided in the website for configuring yum to use that repository. Please note that the RPMs will look for RPM --- 280,284 ---- <para>The RPMs are available at <ulink ! url="http://yum.pgsqlrpms.org"> &postgres RPM Repository </ulink>. Please read the howto provided in the website for configuring yum to use that repository. Please note that the RPMs will look for RPM *************** *** 264,268 **** &postgres;.</para> ! <para>Installing &slony1; using these RPMs is as easy as installing any RPM.</para> --- 287,291 ---- &postgres;.</para> ! <para>Installing &slony1; using these RPMs is as easy as installing any RPM.</para> Index: failover.sgml =================================================================== RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/failover.sgml,v retrieving revision 1.23 retrieving revision 1.23.2.1 diff -C2 -d -r1.23 -r1.23.2.1 *** failover.sgml 4 Oct 2006 16:09:30 -0000 1.23 --- failover.sgml 30 Apr 2009 16:06:10 -0000 1.23.2.1 *************** *** 41,53 **** on node1. Both databases are up and running and replication is more or less in sync. We do controlled switchover using <xref ! linkend="stmtmoveset">. <itemizedlist> <listitem><para> At the time of this writing switchover to another ! server requires the application to reconnect to the database. So in ! order to avoid any complications, we simply shut down the web server. ! Users who use <application>pg_pool</application> for the applications database ! connections merely have to shut down the pool.</para></listitem> <listitem><para> A small <xref linkend="slonik"> script executes the --- 41,96 ---- on node1. Both databases are up and running and replication is more or less in sync. We do controlled switchover using <xref ! linkend="stmtmoveset">.</para> <itemizedlist> <listitem><para> At the time of this writing switchover to another ! server requires the application to reconnect to the new database. So ! in order to avoid any complications, we simply shut down the web ! server. Users who use <application>pg_pool</application> for the ! applications database connections merely have to shut down the ! pool.</para> ! ! <para> What needs to be done, here, is highly dependent on the way ! that the application(s) that use the database are configured. The ! general point is thus: Applications that were connected to the old ! database must drop those connections and establish new connections to ! the database that has been promoted to the <quote/master/ role. There ! are a number of ways that this may be configured, and therefore, a ! number of possible methods for accomplishing the change:</para> ! ! <itemizedlist> ! ! <listitem><para> The application may store the name of the database in ! a file.</para> ! ! <para> In that case, the reconfiguration may require changing the ! value in the file, and stopping and restarting the application to get ! it to point to the new location. ! </para> </listitem> ! ! <listitem><para> A clever usage of DNS might involve creating a CNAME ! <ulink url="http://www.iana.org/assignments/dns-parameters"> DNS ! record </ulink> that establishes a name for the application to use to ! reference the node that is in the <quote>master</quote> role.</para> ! ! <para> In that case, reconfiguration would require changing the CNAME ! to point to the new server, and possibly restarting the application to ! refresh database connections. ! </para> </listitem> ! ! 
<listitem><para> If you are using <application>pg_pool</application> or some ! similar <quote>connection pool manager,</quote> then the reconfiguration ! involves reconfiguring this management tool, but is otherwise similar ! to the DNS/CNAME example above. </para> </listitem> ! ! </itemizedlist> ! ! <para> Whether or not the application that accesses the database needs ! to be restarted depends on how it is coded to cope with failed ! database connections; if, after encountering an error it tries ! re-opening them, then there may be no need to restart it. </para> ! ! </listitem> <listitem><para> A small <xref linkend="slonik"> script executes the *************** *** 77,81 **** seconds.</para></listitem> ! </itemizedlist></para> <para> You may now simply shutdown the server hosting node1 and do --- 120,124 ---- seconds.</para></listitem> ! </itemizedlist> <para> You may now simply shutdown the server hosting node1 and do *************** *** 90,93 **** --- 133,141 ---- be any loss of data.</para>
+ <para> After performing the configuration change, you should, as
+ suggested in <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para>
+
</sect2> <sect2><title> Failover</title> *************** *** 141,151 **** will receive anything from node1 any more.</para> </listitem> ! <listitem> ! <para> Reconfigure and restart the application (or <application>pgpool</application>) to cause it to reconnect to ! node2.</para> ! </listitem> <listitem> <para> Purge out the abandoned node </para> --- 189,205 ---- will receive anything from node1 any more.</para>
+ <note><para> Note that in order for node 2 to be considered as a
+ candidate for failover, it must have been set up with the <xref
+ linkend="stmtsubscribeset"> option <command>forwarding =
+ yes</command>, which has the effect that replication log data is
+ collected in &sllog1;/&sllog2; on node 2. If replication log data is
+ <emphasis>not</emphasis> being collected, then failover to that node
+ is not possible. </para></note>
+
</listitem> ! <listitem> <para> Reconfigure and restart the application (or <application>pgpool</application>) to cause it to reconnect to ! node2.</para> </listitem> <listitem> <para> Purge out the abandoned node </para> *************** *** 154,162 **** set of references to node 1 in <xref linkend="table.sl-node">, as well as in referring tables such as <xref linkend="table.sl-confirm">; ! since data in <xref linkend="table.sl-log-1"> is still present, ! &slony1; cannot immediately purge out the node. </para> ! <para> After the failover is complete and node2 accepts ! write operations against the tables, remove all remnants of node1's configuration information with the <xref linkend="stmtdropnode"> command: --- 208,216 ---- set of references to node 1 in <xref linkend="table.sl-node">, as well as in referring tables such as <xref linkend="table.sl-confirm">; ! since data in &sllog1;/&sllog2; is still present, &slony1; cannot ! immediately purge out the node. </para> ! <para> After the failover is complete and node2 accepts write ! 
operations against the tables, remove all remnants of node1's configuration information with the <xref linkend="stmtdropnode"> command: *************** *** 177,184 **** --- 231,319 ---- </listitem>
+
+ <listitem> <para> After performing the configuration change, you
+ should, as suggested in <xref linkend="bestpractices">, run the &lteststate;
+ scripts in order to validate that the cluster state remains in good
+ order after this change. </para> </listitem>
+
</itemizedlist> </sect2>
+ <sect2 id="complexfailover"> <title> Failover With Complex Node Set </title>
+
+ <para> Failover is relatively <quote>simple</quote> if there are only two
+ nodes; if a &slony1; cluster comprises many nodes, achieving a clean
+ failover requires careful planning and execution. </para>
+
+ <para> Consider the following diagram describing a set of six nodes at two sites.
+
+ <inlinemediaobject> <imageobject> <imagedata fileref="complexenv.png">
+ </imageobject> <textobject> <phrase> Symmetric Multisites </phrase>
+ </textobject> </inlinemediaobject></para>
+
+ <para> Let us assume that nodes 1, 2, and 3 reside at one data
+ centre, and that we find ourselves needing to perform failover due to
+ failure of that entire site. Causes could range from a persistent
+ loss of communications to the physical destruction of the site; the
+ cause is not actually important, as what we are concerned about is how
+ to get &slony1; to properly fail over to the new site.</para>
+
+ <para> We will further assume that node 5 is to be the new origin,
+ after failover. </para>
+
+ <para> The sequence of &slony1; reconfiguration required to properly
+ fail over this sort of node configuration is as follows:
+ </para>
+
+ <itemizedlist>
+
+ <listitem><para> Resubscribe (using <xref linkend="stmtsubscribeset">)
+ each node that is to be kept in the reformed cluster and that is not
+ already subscribed to the intended data provider. </para>
+
+ <para> In the example cluster, this means we would likely wish to
+ resubscribe nodes 4 and 6 to both point to node 5.</para>
+
+ <programlisting>
+ include </tmp/failover-preamble.slonik>;
+ subscribe set (id = 1, provider = 5, receiver = 4);
+ subscribe set (id = 1, provider = 5, receiver = 6);
+ </programlisting>
+
+ </listitem>
+ <listitem><para> Drop all unimportant nodes, starting with leaf nodes.</para>
+
+ <para> Since nodes 1, 2, and 3 are inaccessible, we must indicate the
+ <envar>EVENT NODE</envar> so that the event reaches the still-live
+ portions of the cluster. </para>
+
+ <programlisting>
+ include </tmp/failover-preamble.slonik>;
+ drop node (id=2, event node = 4);
+ drop node (id=3, event node = 4);
+ </programlisting>
+
+ </listitem>
+
+ <listitem><para> Now, run <command>FAILOVER</command>.</para>
+
+ <programlisting>
+ include </tmp/failover-preamble.slonik>;
+ failover (id = 1, backup node = 5);
+ </programlisting>
+
+ </listitem>
+
+ <listitem><para> Finally, drop the former origin from the cluster.</para>
+
+ <programlisting>
+ include </tmp/failover-preamble.slonik>;
+ drop node (id=1, event node = 4);
+ </programlisting>
+ </listitem>
+
+ </itemizedlist>
+
+ </sect2>
+
<sect2><title> Automating <command> FAIL OVER </command> </title> *************** *** 207,211 **** to forcibly knock the failed node off the network in order to prevent applications from getting confused. This could take place via having ! 
  <sect2><title> Automating <command> FAIL OVER </command> </title>
***************
*** 207,211 ****
  to forcibly knock the failed node off the network in order to prevent
  applications from getting confused. This could take place via having
! an SNMP interface that does some combination of the following:

  <itemizedlist>
--- 342,346 ----
  to forcibly knock the failed node off the network in order to prevent
  applications from getting confused. This could take place via having
! an SNMP interface that does some combination of the following:</para>

  <itemizedlist>
***************
*** 228,232 ****
  </itemizedlist>

- </para>
  </sect2>
--- 363,366 ----
Index: slony.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slony.sgml,v
retrieving revision 1.36.2.1
retrieving revision 1.36.2.2
diff -C2 -d -r1.36.2.1 -r1.36.2.2
*** slony.sgml	5 Sep 2007 21:36:31 -0000	1.36.2.1
--- slony.sgml	30 Apr 2009 16:06:10 -0000	1.36.2.2
***************
*** 45,49 ****
--- 45,61 ----
  <!ENTITY sllog1 "<xref linkend=table.sl-log-1>">
  <!ENTITY sllog2 "<xref linkend=table.sl-log-2>">
+ <!ENTITY slseqlog "<xref linkend=table.sl-seqlog>">
  <!ENTITY slconfirm "<xref linkend=table.sl-confirm>">
+
+ <!ENTITY slevent "<xref linkend=table.sl-event>">
+ <!ENTITY slnode "<xref linkend=table.sl-node>">
+ <!ENTITY slpath "<xref linkend=table.sl-path>">
+ <!ENTITY sllisten "<xref linkend=table.sl-listen>">
+ <!ENTITY slregistry "<xref linkend=table.sl-registry>">
+ <!ENTITY slsetsync "<xref linkend=table.sl-setsync>">
+ <!ENTITY slsubscribe "<xref linkend=table.sl-subscribe>">
+ <!ENTITY sltable "<xref linkend=table.sl-table>">
+ <!ENTITY slset "<xref linkend=table.sl-set>">
+
  <!ENTITY rplainpaths "<xref linkend=plainpaths>">
  <!ENTITY rlistenpaths "<xref linkend=listenpaths>">
***************
*** 51,54 ****
--- 63,67 ----
  <!ENTITY lslon "<xref linkend=slon>">
  <!ENTITY lslonik "<xref linkend=slonik>">
+ <!ENTITY lteststate "<xref linkend=testslonystate>">
  ]>
***************
*** 94,98 ****
--- 107,113 ----
  &listenpaths;
  &plainpaths;
+ &triggers;
  &locking;
+ &raceconditions;
  &addthings;
  &dropthings;
***************
*** 107,112 ****
  &loganalysis;
  &help;
  </article>
-
  <article id="faq">
--- 122,128 ----
  &loganalysis;
  &help;
+ &supportedplatforms;
+ &releasechecklist;
  </article>

  <article id="faq">
***************
*** 134,139 ****
  </part>

- &supportedplatforms;
- &releasechecklist;
  &schemadoc;
  &bookindex;
--- 150,153 ----
Index: partitioning.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/partitioning.sgml,v
retrieving revision 1.1.2.3
retrieving revision 1.1.2.4
diff -C2 -d -r1.1.2.3 -r1.1.2.4
*** partitioning.sgml	7 Mar 2008 19:05:11 -0000	1.1.2.3
--- partitioning.sgml	30 Apr 2009 16:06:10 -0000	1.1.2.4
***************
*** 74,81 ****
  </itemizedlist>

! <para> There are several stored functions provided to support this,
! for &postgres; 8.1 and newer; the Gentle User may use whichever seems
! preferable. The <quote>base function</quote> is
! <function>add_empty_table_to_replication()</function>; the others
  provide additional structure and validation of the arguments. </para>
--- 74,80 ----
  </itemizedlist>

! <para> There are several stored functions provided to support this;
! the Gentle User may use whichever seems preferable. The <quote>base
! function</quote> is <function>add_empty_table_to_replication()</function>; the others
  provide additional structure and validation of the arguments. </para>
***************
*** 107,112 ****
  with confidence to add any table to replication that is known to be
  empty. </para> </note>

- </sect2>
  </sect1>
--- 106,111 ----
  with confidence to add any table to replication that is known to be
  empty. </para> </note>
+ </sect2>

  </sect1>
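<para> As a rough sketch of using the <quote>base function</quote>
mentioned above (the argument list shown is an assumption, not the
definitive signature; consult the schema documentation for the
installed version before relying on it), adding a known-empty
partition might look like: </para>

<programlisting>
-- Hypothetical call: set id, table id, names, and comment are all
-- invented for illustration; verify the argument order before use.
select "_T1".add_empty_table_to_replication(
    1,                   -- replication set to add the table to
    1001,                -- table id to assign to the new partition
    'public',            -- namespace of the (empty) partition
    'sales_2009_04',     -- name of the (empty) partition table
    null,                -- candidate key index; null picks the primary key
    'sales partition for April 2009');
</programlisting>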
Index: logshipping.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/logshipping.sgml,v
retrieving revision 1.16.2.5
retrieving revision 1.16.2.6
diff -C2 -d -r1.16.2.5 -r1.16.2.6
*** logshipping.sgml	24 Oct 2007 17:49:35 -0000	1.16.2.5
--- logshipping.sgml	30 Apr 2009 16:06:10 -0000	1.16.2.6
***************
*** 270,274 ****
  start transaction;

! select "_T1".setsyncTracking_offline(1, '655', '656', '2005-09-23 18:37:40.206342');
  -- end of log archiving header
  </programlisting></para></listitem>
--- 270,274 ----
  start transaction;

! select "_T1".setsyncTracking_offline(1, '655', '656', '2007-09-23 18:37:40.206342');
  -- end of log archiving header
  </programlisting></para></listitem>
***************
*** 282,286 ****
  start transaction;

! select "_T1".setsyncTracking_offline(1, '96', '109', '2005-09-23 19:01:31.267403');
  -- end of log archiving header
  </programlisting></para>
--- 282,286 ----
  start transaction;

! select "_T1".setsyncTracking_offline(1, '96', '109', '2007-09-23 19:01:31.267403');
  -- end of log archiving header
  </programlisting></para>
***************
*** 344,347 ****
--- 344,370 ----

  </sect2>
+
+ <sect2><title> <application> find-triggers-to-deactivate.sh
+ </application> </title>
+
+ <indexterm><primary> trigger deactivation </primary> </indexterm>
+
+ <para> It was once pointed out (<ulink
+ url="http://www.slony.info/bugzilla/show_bug.cgi?id=19"> Bugzilla bug
+ #19</ulink>) that the dump of a schema may include triggers and rules
+ that you may not wish to have running on the log shipped node.</para>
+
+ <para> The tool <filename> tools/find-triggers-to-deactivate.sh
+ </filename> was created to assist with this task. It may be run
+ against the node that is to be used as a schema source, and it will
+ list the rules and triggers present on that node that may, in turn,
+ need to be deactivated.</para>
+
+ <para> The listing includes the <function>logtrigger</function> and <function>denyaccess</function>
+ triggers, which may be left out of the extracted schema, but it is
+ still worth the Gentle Administrator verifying that such triggers are
+ kept out of the log shipped replica.</para>
+
+ </sect2>

  <sect2> <title> <application>slony_logshipper </application> Tool </title>
***************
*** 382,385 ****
--- 405,409 ----
  <listitem><para> <command>post processing command = 'gzip -9 $inarchive';</command></para>
  <para> Pre- and post-processing commands are executed via <function>system(3)</function>. </para> </listitem>
  </itemizedlist>
+ <para> An <quote>@</quote> as the first character causes the exit code to be ignored. Otherwise, a nonzero exit code is treated as an error and causes processing to abort. </para>
***************
*** 399,405 ****
  <para> In the example shown, this sends an email to the DBAs upon
  encountering an error.</para> </listitem>

- </itemizedlist>
-
- <itemizedlist>
  <listitem><para> Archive File Names</para>
--- 423,427 ----
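<para> To illustrate the exit-code rule just described (the placement
of the <quote>@</quote> and the archive name here are assumptions for
the sketch), a post-processing command whose failure should not abort
the log shipper could be written as: </para>

<programlisting>
# Hypothetical slony_logshipper directive: the leading @ asks for a
# nonzero exit code from gzip to be ignored rather than aborting.
post processing command = '@gzip -9 $inarchive';
</programlisting>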
Index: supportedplatforms.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/supportedplatforms.sgml,v
retrieving revision 1.8.2.2
retrieving revision 1.8.2.3
diff -C2 -d -r1.8.2.2 -r1.8.2.3
*** supportedplatforms.sgml	17 Nov 2006 09:00:51 -0000	1.8.2.2
--- supportedplatforms.sgml	30 Apr 2009 16:06:10 -0000	1.8.2.3
***************
*** 1,3 ****
! <article id="supportedplatforms">

  <title>&slony1; Supported Platforms</title>
--- 1,3 ----
! <sect1 id="supportedplatforms">

  <title>&slony1; Supported Platforms</title>
***************
*** 10,14 ****
  </para>

! <para> Last updated: Nov 17, 2006</para>

  <para>If you experience problems in these platforms, please subscribe to
--- 10,14 ----
  </para>

! <para> Last updated: Jun 23, 2005</para>

  <para>If you experience problems in these platforms, please subscribe to
***************
*** 132,162 ****

  <row>
- <entry>Fedora Core</entry>
- <entry>5</entry>
- <entry>x86</entry>
- <entry>Nov 17, 2006</entry>
- <entry>devrim at CommandPrompt.com</entry>
- <entry>&postgres; Version: 8.1.5</entry>
- </row>
-
- <row>
- <entry>Fedora Core</entry>
- <entry>6</entry>
- <entry>x86</entry>
- <entry>Nov 17, 2006</entry>
- <entry>devrim at CommandPrompt.com</entry>
- <entry>&postgres; Version: 8.1.5</entry>
- </row>
-
- <row>
- <entry>Fedora Core</entry>
- <entry>6</entry>
- <entry>x86_64</entry>
- <entry>Nov 17, 2006</entry>
- <entry>devrim at CommandPrompt.com</entry>
- <entry>&postgres; Version: 8.1.5</entry>
- </row>
-
- <row>
  <entry>Red Hat Linux</entry>
  <entry>9</entry>
--- 132,135 ----
***************
*** 204,208 ****
  </tgroup>
  </table>

! </article>

 <!-- Keep this comment at the end of the file
 Local variables:
--- 177,181 ----
  </tgroup>
  </table>

! </sect1>

 <!-- Keep this comment at the end of the file
 Local variables:
Index: slon.sgml
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/doc/adminguide/slon.sgml,v
retrieving revision 1.29.2.4
retrieving revision 1.29.2.5
diff -C2 -d -r1.29.2.4 -r1.29.2.5
*** slon.sgml	27 Mar 2008 21:01:30 -0000	1.29.2.4
--- slon.sgml	30 Apr 2009 16:06:10 -0000	1.29.2.5
***************
*** 64,71 ****
  <para> The first five non-debugging log levels (from Fatal to
! Info) are <emphasis>always</emphasis> displayed in the logs. If
! <envar>log_level</envar> is set to 2 (a routine, and, seemingly,
! preferable choice), then output at debugging levels 1 and 2 will
! also be displayed.</para>
  </listitem>
--- 64,74 ----
  <para> The first five non-debugging log levels (from Fatal to
! Info) are <emphasis>always</emphasis> displayed in the logs. In
! early versions of &slony1;, the <quote>suggested</quote>
! <envar>log_level</envar> value was 2, which would list output at
! all levels down to debugging level 2. In &slony1; version 2, it
! is recommended to set <envar>log_level</envar> to 0; most of the
! consistently interesting log information is generated at levels
! higher than that. </para>
  </listitem>
***************
*** 149,153 ****
  </itemizedlist>

! <para> Default is 10000 ms and maximum is 120000 ms.  By default,
  you can expect each node to <quote>report in</quote> with a
--- 152,156 ----
  </itemizedlist>

! <para> Default is 10000 ms and maximum is 120000 ms. By default,
  you can expect each node to <quote>report in</quote> with a
***************
*** 219,223 ****
  </para>
  <para>
! In &slony1; version 1.1 and later versions the
  <application>slon</application> instead adaptively <quote>ramps up</quote>
  from doing 1 <command>SYNC</command> at a time towards the maximum group
--- 222,226 ----
  </para>
  <para>
! In &slony1; version 1.1 and later versions, the
  <application>slon</application> instead adaptively <quote>ramps up</quote>
  from doing 1 <command>SYNC</command> at a time towards the maximum group
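<para> By way of illustration of the <envar>log_level</envar> and
<command>SYNC</command> grouping behaviour discussed in this diff
(the cluster name and conninfo are invented for the sketch), a slon
might be started as: </para>

<programlisting>
# Hypothetical invocation: -d sets the log level (0 per the version 2
# recommendation); -g caps the group size the adaptive ramp-up may reach.
slon -d 0 -g 20 T1 'dbname=mydb host=node2 user=slony'
</programlisting>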