Slony-I Administration

The PostgreSQL Global Development Group

Christopher Browne


1. Slony-I "Best Practices"

It is common for managers to have a desire to operate systems using some available, documented set of "best practices." Documenting that sort of thing is essential to ISO 9000, ISO 9001, and other sorts of organizational certifications.

It is worthwhile to preface a discussion of "best practices" by mentioning that each organization that uses Slony-I is unique, and there may be a need for local policies to reflect unique local operating characteristics. It is for that reason that Slony-I does not impose its own policies for such things as failover; those will need to be determined based on the overall shape of your network, of your set of database servers, and of your usage patterns for those servers.

There are, however, a number of things that early adopters of Slony-I have discovered which can at least help to suggest the sorts of policies you might want to consider.

  • Slony-I is a complex multi-client, multi-server system, with the result that there is an almost innumerable set of places where problems can arise.

    As a natural result, maintaining a clean, consistent environment is really valuable, as any sort of environmental "messiness" can either cause unexpected problems or mask the real problem.

    Numerous users have reported problems resulting from mismatches between Slony-I versions, local libraries, and PostgreSQL libraries. Details count: you need to be clear on what hosts are running what versions of what software.

    This is normally a matter of being disciplined about how your software is deployed, and the challenges represent a natural consequence of being a distributed system comprised of a large number of components that need to match.

  • If a slonik script does not run as expected on the first attempt, it would be foolhardy to run it again until the problem has been found and resolved.

    There are a very few slonik commands such as SLONIK STORE PATH that behave in a nearly idempotent manner; if you run SLONIK STORE PATH again, that merely updates table sl_path with the same value.

    In contrast, SLONIK SUBSCRIBE SET behaves in two very different ways depending on whether or not the subscription has already been activated; if initiating the subscription did not work on the first attempt, submitting the request again will not make it happen.
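    For illustration of the idempotent case, a rough slonik sketch (the cluster name, node numbers, and connection strings are hypothetical); re-running the STORE PATH command merely stores the same row in sl_path again, which is why repeating it is safe:

      cluster name = mycluster;
      node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
      node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

      # Tell node 2 how to reach node 1; re-running this is harmless.
      store path (server = 1, client = 2,
                  conninfo = 'dbname=mydb host=node1 user=slony');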

  • Principle: Use an unambiguous, stable time zone such as UTC or GMT.

    Users have run into problems getting slon to function properly when their system uses a time zone that PostgreSQL is unable to recognize, such as CUT0 or WST. It is necessary that you use a time zone that PostgreSQL can recognize correctly. It is furthermore preferable to use a time zone where times do not shift around due to Daylight Saving Time.

    The "geographically unbiased" choice seems to be TZ=UTC or TZ=GMT, and to make sure that systems are "in sync" by using NTP to synchronize clocks throughout the environment.

    See also Section 3.4.
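    One simple way to enforce this is to pin the time zone in the environment from which each slon (and, ideally, the postmaster) is started; the cluster name and connection string here are hypothetical:

      # in the script that starts each slon daemon
      TZ=UTC
      export TZ
      slon mycluster 'dbname=mydb host=node1 user=slony'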

  • Principle: Long running transactions are Evil

    The FAQ has an entry on growth of pg_listener which discusses this in a fair bit of detail; the long and short is that long running transactions have numerous ill effects. They are particularly troublesome on an "origin" node, holding onto locks, preventing vacuums from taking effect, and the like.

    In version 1.2, some of the "evils" should be lessened, because:

    • Events in pg_listener are only generated when replication updates are relatively infrequent, which should mean that busy systems won't generate many dead tuples in that table

    • The system will periodically rotate (using TRUNCATE to clean out the old table) between the two log tables, sl_log_1 and sl_log_2, preventing unbounded growth of dead space there.

  • Failover policies should be planned for ahead of time.

    Most pointedly, any node that is expected to be a failover target must have its subscription(s) set up with the option FORWARD = YES. Otherwise, that node is not a candidate for being promoted to origin node. A slonik sketch of such a subscription appears at the end of this item.

    This may simply involve thinking through, ahead of time, the priority list of which nodes should fail over to which, as opposed to trying to automate it. But knowing what to do ahead of time cuts down on the number of mistakes made.

    At Afilias, a variety of internal [The 3AM Unhappy DBA's Guide to...] guides have been created to provide checklists of what to do when certain "unhappy" events take place. This sort of material is highly specific to the environment and the set of applications running there, so you would need to generate your own such documents. This is one of the vital components of any disaster recovery preparations.
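    As a sketch (the usual cluster/node preamble is assumed, and the set and node numbers are hypothetical), a node intended as a failover target would be subscribed like this:

      subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);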

  • SLONIK MOVE SET should be used to allow preventative maintenance to take place before problems become serious enough to require failover.
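    A controlled switchover might look roughly like the following slonik fragment (set and node numbers hypothetical; the usual cluster/node preamble is assumed):

      # quiesce the set on the old origin, then shift the origin to node 2
      lock set (id = 1, origin = 1);
      move set (id = 1, old origin = 1, new origin = 2);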

  • VACUUM policy needs to be carefully defined.

    As mentioned above, "long running transactions are Evil." VACUUMs are no exception to this: a VACUUM on a huge table will open a long-running transaction with all the known ill effects.

  • If you are using the autovacuum process in recent versions of PostgreSQL, you may wish to leave the Slony-I tables out of its purview, as Slony-I is a bit more intelligent than autovacuum about vacuuming at times when it is expected to be conspicuously useful to do so (e.g. immediately after purging old data).

    See Section 6.1 for more details.
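    How the exclusion is expressed depends on your PostgreSQL version; on versions that support per-table storage parameters, something along these lines (with a hypothetical cluster namespace of _mycluster) keeps autovacuum away from one of the Slony-I log tables:

      -- repeat for the other Slony-I tables you wish to exclude
      ALTER TABLE _mycluster.sl_log_1 SET (autovacuum_enabled = false);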

  • Running all of the slon daemons on a central server for each network has proven preferable.

    Each slon should run on a host on the same local network as the node that it is servicing, as it does a lot of communications with its database, and that connection needs to be as reliable as possible.

    In theory, the "best" speed might be expected to come from running the slon on the database server that it is servicing.

    In practice, strewing slon processes and configuration across a dozen servers turns out to be inconvenient to manage.

  • slon processes should run in the same "network context" as the node that each is responsible for managing so that the connection to that node is a "local" one. Do not run such links across a WAN. Thus, if you have nodes in London and nodes in New York, the slons managing London nodes should run in London, and the slons managing New York nodes should run in New York.

    A WAN outage (or flakiness of the WAN in general) can leave database connections "zombied", and typical TCP/IP behaviour will allow those connections to persist, preventing a slon restart for around two hours.

    It is not difficult to remedy this; you need only kill the offending backend connection (e.g. with SIGINT). But by running the slon locally, you will generally not be vulnerable to this condition.

  • Before getting too excited about having fallen into some big problem, consider killing and restarting all the slon processes. Historically, this has frequently been able to resolve "stickiness."

    With a very few exceptions, it is generally not a big deal to kill off and restart the slon processes. Each slon connects to one database for which it is the manager, and then connects to other databases as needed to draw in events. If you kill off a slon, all you do is to interrupt those connections. If a SYNC or other event is sitting there half-processed, there's no problem: the transaction will roll back, and when the slon restarts, it will restart that event from scratch.

    The exception scenario where it is undesirable to restart a slon is where a COPY_SET is running on a large replication set, such that stopping the slon may discard several hours' worth of load work.

    In early versions of Slony-I, it was frequently the case that connections could get a bit "deranged", which restarting slons would clean up. This has become much rarer, but it has occasionally proven useful to restart the slon. If there has been any "network derangement", this can clear up the issue of defunct database connections.
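    A restart is normally as simple as killing the process and starting it again; the paths and names below are hypothetical:

      # stop the slon managing node 1, then start it afresh
      kill `cat /var/run/slony/node1.pid`
      slon -p /var/run/slony/node1.pid mycluster \
           'dbname=mydb host=node1 user=slony' &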

  • The Database Schema Changes section outlines some practices that have been found useful for handling changes to database schemas.

  • Handling of Primary Keys

    As discussed in the section on Replication Sets, it is ideal for each replicated table to have a true primary key constraint; it is also acceptable to use a "candidate primary key."

    It is not recommended that a Slony-I-defined key (created via SLONIK TABLE ADD KEY) be used to introduce a candidate primary key, as this introduces the possibility that updates to this table can fail due to the introduced unique index, which means that Slony-I has introduced a new failure mode for your application.

    Warning

    In version 2 of Slony-I, SLONIK TABLE ADD KEY is no longer supported. You must have either a true primary key or a candidate primary key.
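    A candidate primary key is designated when the table is added to a set, by naming a unique, non-null index via the key option of SLONIK SET ADD TABLE; the names and numbers in this sketch are hypothetical:

      set add table (set id = 1, origin = 1, id = 42,
                     fully qualified name = 'public.mytable',
                     key = 'mytable_uniq_idx',
                     comment = 'table using a candidate primary key');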

  • Grouping tables into sets suggests strategies for determining how to group tables and sequences into replication sets.

  • It should be obvious that actions that can delete a lot of data should be taken with great care; the section on Dropping things from Slony-I Replication discusses the different sorts of "deletion" that Slony-I supports.

  • Locking issues

    Certain Slony-I operations, notably set add table, move set, lock set, and execute script, require acquiring exclusive locks on the tables being replicated.

    Depending on the kind of activity on the databases, this may or may not have the effect of requiring a (hopefully brief) database outage.

  • What to do about DDL.

    Slony-I operates by detecting updates to table data via triggers that are attached to those tables. That means it will not notice updates that take place via methods that do not fire those triggers. ALTER TABLE, CREATE OR REPLACE FUNCTION, and CREATE TABLE all represent SQL requests that Slony-I has no way to notice.

    A philosophy underlying Slony-I's handling of this is that competent system designers do not write self-modifying code, and database schemas that get modified by the application are an instance of this. Slony-I therefore does not try hard to make it convenient to modify database schemas.

    There will be cases where schema changes are nonetheless necessary, so execute script is provided; it applies DDL changes at the same location in the transaction stream on all servers. A slonik sketch appears at the end of this item.

    Unfortunately, this introduces a great deal of locking of database objects. Altering tables requires taking out an exclusive lock on them; doing so via execute script requires that Slony-I take out an exclusive lock on all replicated tables. This can prove quite inconvenient if applications are running while the DDL is being applied: Slony-I is asking for those exclusive table locks while, simultaneously, some application connections are gradually relinquishing locks and others are backing up behind the Slony-I lock requests.

    One particularly dogmatic position that some hold is that all schema changes should always be propagated using execute script. This guarantees that nodes will be consistent, but the costs of locking and deadlocking may be too high for some users.

    At Afilias, our approach has been less dogmatic; there are sorts of changes that must be applied using execute script, but we apply others independently.

    • Changes that must be applied using execute script

      • All instances of ALTER TABLE

    • Changes that are not normally applied using execute script

      • CREATE INDEX

      • CREATE TABLE

        Tables that are not being replicated do not require Slony-I "permission".

      • CREATE OR REPLACE FUNCTION

        Typically, new versions of functions may be deployed without Slony-I being "aware" of them. The obvious exception is when a new function is being deployed to accommodate a table alteration; in that case, the new version must be added in a manner synchronized with the execute script for the table alteration.

        Similarly, CREATE TYPE, CREATE AGGREGATE, and such will commonly not need to be forcibly applied in a "perfectly synchronized" manner across nodes.

      • Security management commands, such as CREATE USER, CREATE ROLE, and GRANT, are largely irrelevant to Slony-I, as it runs as a "superuser".

        Indeed, we have frequently found it useful to have different security arrangements on different nodes. Access to the "master" node should be restricted to applications that truly need access to it; "reporting" users commonly are restricted much more there than on subscriber nodes.
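    When a change does need to go through Slony-I, the slonik request itself is small; the file name, set, and node numbers below are hypothetical:

      # apply the DDL in the named file at the same point in the
      # transaction stream on every node
      execute script (set id = 1,
                      filename = '/tmp/alter_mytable.sql',
                      event node = 1);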

  • Slony-I-specific user names.

    It has proven useful to define a slony user for use by Slony-I, as distinct from a generic postgres or pgsql user.

    If all sorts of automatic "maintenance" activities, such as vacuuming and performing backups, are performed under the "ownership" of a single PostgreSQL user, it turns out to be pretty easy to run into deadlock problems.

    For instance, a series of vacuums that unexpectedly run against a database that has a large SUBSCRIBE_SET event under way may run into a deadlock that would roll back several hours' worth of data copying work.

    If, instead, different maintenance roles are performed by different users, you may, during vital operations such as SUBSCRIBE_SET, lock out other users at the pg_hba.conf level, only allowing the slony user in, which substantially reduces the risk of problems while the subscription is in progress.
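    A sketch of such a lockout in the subscriber's pg_hba.conf (database name, user name, and addresses are hypothetical); the reject line is removed once the subscription has completed:

      # TYPE  DATABASE  USER   CIDR-ADDRESS    METHOD
      host    mydb      slony  10.0.1.0/24     md5
      host    mydb      all    0.0.0.0/0       reject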

  • Path configuration

    The section on Path Communications discusses the issues surrounding what network connections need to be in place in order for Slony-I to function.

  • Lowering Authority

    Traditionally, it has been stated that "Slony-I needs to use superuser connections." It turns out that this is not entirely true, and if there are particular concerns about excessive use of superuser accounts, it is possible to reduce this considerably.

    It is true to say that each slon must have a superuser connection in order to manage the node that it is assigned to. It needs to be able to alter the system catalogue in order to set up subscriptions and to process alterations (e.g. to run SLONIK EXECUTE SCRIPT and other events that may alter the role of replicated tables on the local node).

    However, the connections that slon processes open to other nodes to access events and process subscriptions do not need nearly so much permission. Indeed, one could set up a "weak user" assigned to all SLONIK STORE PATH requests. The minimal permissions that this user, let's call it weakuser, requires are as follows:

    • It must have read access to the Slony-I-specific namespace

    • It must have read access to all tables and sequences in that namespace

    • It must have write access to the Slony-I table sl_nodelock and sequence sl_nodelock_nl_conncnt_seq

    • At subscribe time, it must have read access to all of the replicated tables.

      Outside of subscription time, there is no need for access to the replicated tables.

    • There is some need for read access to tables in pg_catalog; it has not been verified how little access would be suitable.

    In version 1.3, the tests in Section 25 support using a WEAKUSER so that testing can regularly confirm the minimal set of permissions needed to support replication.
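    As an illustrative sketch only (the _mycluster namespace is hypothetical, and the exact grant statements depend on your cluster name and PostgreSQL version), the permissions amount to roughly the following:

      CREATE USER weakuser;
      GRANT USAGE ON SCHEMA _mycluster TO weakuser;
      -- read access to the configuration tables and sequences;
      -- repeat for each sl_* object, or use the bulk GRANT forms
      -- available in newer PostgreSQL versions
      GRANT SELECT ON _mycluster.sl_node TO weakuser;
      -- write access to the node lock table and its sequence
      GRANT SELECT, INSERT, UPDATE, DELETE ON _mycluster.sl_nodelock TO weakuser;
      GRANT SELECT, UPDATE ON _mycluster.sl_nodelock_nl_conncnt_seq TO weakuser;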

  • The section on listen paths discusses the issues surrounding the table sl_listen.

    As of Slony-I 1.1, its contents are computed automatically based on the communications information available to Slony-I, which should alleviate the problems found in earlier versions where this had to be configured by hand. Many seemingly inexplicable communications failures, where nodes failed to talk to one another even though they technically could, were a result of incorrect listen path configuration.

  • Run the test script described in Section 5.1 frequently to discover configuration problems as early as possible.

    This is a Perl script which connects to a Slony-I node and then rummages through Slony-I configuration looking for quite a variety of conditions that tend to indicate problems, including:

    • Bloating of some config tables

    • Analysis of listen paths

    • Analysis of event propagation and confirmation

    If replication mysteriously "isn't working", this tool can run through many of the possible problems for you.

    It will also notice a number of situations where something has broken. It should not only be run when problems have been noticed; it should be run frequently (e.g. hourly, or thereabouts) as a general-purpose "health check" for each Slony-I cluster.

  • Configuring slon

    As of version 1.1, slon configuration may be drawn either from the command line or from configuration files. "Best" practices have yet to emerge from the two options:

  • Configuration via command line options

    This approach has the merit that all the options that are active are visible in the process environment. (And if there are a lot of them, they may be a nuisance to read.)

    Unfortunately, if you invoke slon from the command line, you could forget to include log shipping configuration and thereby destroy the sequence of logs for a log shipping node.

  • Configuration via a configuration file

    Unlike when command line options are used, the active options are not visible in the process environment. They can only be inferred from the name and/or contents of the slon configuration file, and the environment will not reflect subsequent changes to that file.

    On the other hand, by putting the options in a file, you won't forget to include any of them, so this approach is safer for log shipping.
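    A small slon configuration file might look like the following; the parameter values, file paths, and connection string are hypothetical:

      # /etc/slony/node1.conf -- started with: slon -f /etc/slony/node1.conf
      cluster_name='mycluster'
      conn_info='dbname=mydb host=node1 user=slony'
      pid_file='/var/run/slony/node1.pid'
      # log shipping archive directory, if this node feeds log shipping
      archive_dir='/var/spool/slony/archive'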

  • Things to do when subscribing nodes

    When a new node is running the COPY_SET event for a large replication set (e.g. one which takes several hours to subscribe), it has been found desirable to lock all users other than the slony user out of the new subscriber, because:

    • Applications will run into partially-copied, half-baked data that is not totally consistent.

    • It is possible for applications (and maintenance scripts) to submit combinations of queries that will get the system into a deadlock situation, thereby terminating the COPY_SET event and requiring the subscription to start over again.

    It may be worth considering turning the PostgreSQL fsync functionality off during the copying of data, as this will improve performance, and if the database "falls over" during the COPY_SET event, you will be restarting the copy of the whole replication set anyway.

    It is also a very good idea to change slon_conf_sync_interval on the origin node to reduce how many SYNC events are generated. If the subscription takes 8 hours, there is little sense in there being 28800 SYNCs waiting to be applied; generating a SYNC every minute or so is likely to make catching up easier.
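    For instance, raising the interval in the origin slon's configuration for the duration of the subscription (the value shown is illustrative) keeps the backlog of SYNC events manageable:

      # origin slon configuration while a large COPY_SET is running;
      # restore the normal value once the subscriber has caught up
      sync_interval=60000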

  • Managing use of slonik

    The notes on Using Slonik describe some of the lessons learned from managing large numbers of slonik scripts.

    Notable principles that have fallen out of generating many slonik scripts are that:

    • Using "preamble" files is highly recommended as it means that you use heavily-verified preambles over and over.

    • Any opportunity that you have to automatically generate configuration, whether by drawing it from a database or by using a script that generates repetitively similar elements, will help prevent human error.
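    A typical preamble file (names hypothetical) that every administrative script pulls in, either via slonik's include directive or by simple concatenation:

      # preamble.slonik -- shared by all administrative scripts
      cluster name = mycluster;
      node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
      node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

      # an individual script then begins with:
      #   include <preamble.slonik>;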

  • Handling Very Large Replication Sets

    Some users have set up replication on replication sets that are tens to hundreds of gigabytes in size, which puts some added "strain" on the system, in particular where it may take several days for the COPY_SET event to complete. Here are some principles that have been observed for dealing with these sorts of situations.

  • Drop all indices other than the primary key index while the COPY_SET event is run.

    When data is copied into a table that has indices on it, PostgreSQL builds the indices incrementally, on the fly. This is much slower than simply copying the data into the table, and then recreating each index "ex nihilo", as the latter can take substantial advantage of sort memory.

    In Slony-I version 1.1.5 and later versions, indices are dropped and recreated automatically, which makes this manual practice unnecessary.

  • If there are large numbers of updates taking place as the large set is being copied, this can lead to the subscriber being behind by some enormous number of SYNC events.

    If a SYNC is generated about once per second, that leads to the subscriber "falling behind" by around 90,000 SYNCs per day, possibly for several days.

    There will correspondingly be an enormous growth of sl_log_1, sl_log_2, and sl_seqlog. Unfortunately, once the COPY_SET completes, users have found that the queries against these tables wind up reverting to Seq Scans so that even though a particular SYNC processing event is only processing a small number of those 90,000 SYNC events, it still reads through the entire table. In such a case, you may never see replication catch up.

    Several things can be done that will help, involving careful selection of slon parameters:

  • Ensure that there exists, on the "master" node, an index on sl_log_1(log_xid). If it doesn't exist, as the Slony-I instance was set up before version 1.1.1, see slony1_base.sql for the exact form that the index setup should take.

    In 1.2 and later versions, there is a process that runs automatically to add partial indexes by origin node number, which should be the optimal form for such an index to take.

  • On the subscriber's slon, increase the number of SYNC events processed together, with the slon_conf_sync_group_maxsize parameter to some value that allows it to process a significant portion of the outstanding SYNC events.

  • On the subscriber's slon, set the desired_sync_time to 0, as the adaptive SYNC grouping system will start with small groupings that will, under these circumstances, perform poorly.

  • Increase the slon_conf_sync_interval on the origin's slon so that SYNC events are generated less frequently. If a SYNC is only generated once per minute instead of once per second, that will cut down the number of events by a factor of 60.

  • It is likely to be worthwhile to use slon_conf_vac_frequency to deactivate slon-initiated vacuuming in favor of running your own vacuum scripts, as there will be a buildup of unpurgeable data while the data is copied and the subscriber starts to catch up.
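    Pulled together, the subscriber-side adjustments might look like this in the subscriber slon's configuration file; the values are illustrative only and need tuning for your environment:

      # subscriber slon settings while catching up after a huge COPY_SET:
      # group many SYNCs into each transaction, disable adaptive grouping,
      # and leave vacuuming to your own scripts
      sync_group_maxsize=500
      desired_sync_time=0
      vac_frequency=0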