Marcin Kasperski Marcin.Kasperski at softax.com.pl
Fri Jun 6 15:38:49 PDT 2008
Well, just wanted to say "Thank you" to the Slony authors, and also
report that after three evenings of reading the docs, about 4 hours
of actual "write plan + write scripts + run them" work, and
about 3 hours of waiting, I managed to:

- upgrade two databases from PostgreSQL 8.1 to 8.3, using Slony to
  migrate the data without downtime
- set up Slony replication to a backup machine and actually replicate
  my ~1.5GB database over a WAN

A few observations/remarks/suggestions:

- the Perl tools are helpful, as it is easier to patch generated
  scripts than to write them from scratch, but they are often rather
  naive. One should always review what they generate and correct it
  before running. For example:

  - slonik_create_set defaults to the public. schema, no matter where
    the tables actually live
  - slonik_init_cluster does not handle a "the same database is reached
    by different means from different places" config (which slonik
    itself handles perfectly)
  - slonik_init_cluster generates store paths for all database pairs,
    even if the given databases are totally unrelated (are not in the
    same set)
  - no script uses DEFINE and INCLUDE (see the preamble sketch below)
  - databases running on the same machine (but on different ports) get
    the same comment
  - I could run the slon daemon directly from slon_tools.conf, but
    found no way to generate a slon.conf from slon_tools.conf

- I applied the following policy (which helped me noticeably, I feel):

  - always review and edit what the slonik_* scripts produce before
    running it
  - use slonik_print_preamble to create a preamble file, add DEFINEs
    there for all node and set identifiers, then include this file in
    every script and use @identifiers instead of constants (see the
    sketch after this list)
  - always read the reference doc for every instruction the slonik_*
    scripts suggest
  - use non-overlapping numeric identifiers (so my nodes are 1, 2, 3, ...;
    my sets are 11, 21, 31, ...; my tables are 101, 102, ... (in set 11),
    201, 202, ... (in set 21), and so on - this way, in most cases I can
    deduce from the number itself what it means)
    { I understand that numeric ids are used for performance reasons;
      applying such a convention and using DEFINEs in the docs would
      make them noticeably less cryptic }
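
  { to make the preamble idea concrete, a minimal sketch - the node
    and set numbers follow my convention above, while the cluster
    name and the conninfo strings are made-up placeholders: }

    preamble.slonik:

      define cluster mycluster;
      define master 1;
      define slave 2;
      define billing_set 11;

      cluster name = @cluster;
      node @master admin conninfo = 'dbname=mydb host=db1 user=slony';
      node @slave  admin conninfo = 'dbname=mydb host=db2 user=slony';

    every other script then starts with

      include <preamble.slonik>;

    and uses @master, @slave or @billing_set wherever slonik expects
    a node or set number.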

- error messages are good and helpful, but one really should read the
  docs and have some picture of how things work

- the reference doc (slonik instructions etc.) is very good, but the
  introductory doc is worse - there are noticeable fragments clearly
  related to obsolete/legacy functionality, the pgbench replication
  example is oversimplified and can't be used as the basis for a real
  plan, and discussions of alternative techniques are somewhat
  scattered. A refresh of those docs could make sense. I found
  README.Debian, which clearly enumerates the replication steps and
  names the alternatives, to be of great help; before I found it I
  felt lost.

  { and yes, it really helps to read the reference docs - I recommend
    it to anyone who feels lost }

- the most important obstacle I faced while using Slony for the
  upgrade was ... a Slony version mismatch. Simply put, Debian
  packages slony 1.2.1 with postgres 8.1 and 1.2.13 with 8.3. I
  managed to compile 1.2.13 for 8.1, but it required some messing
  with packages (downgrading libpq-dev to an 8.1-compatible version,
  and guessing some missing packages configure did not test for - for
  example flex and bison, and postgresql-8.1-server-dev)

  Not sure whether this is a remark for Slony or rather for the
  packagers, but it would make sense to support such upgrades better
  (for example, by providing the new slony compiled against the old
  database). The Slony docs could also mention those preparations in
  the chapter about database upgrades.
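
  { for the record, it boiled down to something like the following -
    the package names are from memory and the pg_config path is
    Debian-specific, so treat this as a sketch rather than a recipe: }

    apt-get install flex bison postgresql-server-dev-8.1
    ./configure --with-pgconfigdir=/usr/lib/postgresql/8.1/bin
    make
    make install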

- at least on Debian and Ubuntu, scripts like test_slony_state
  (in general, most scripts from the tools dir) are not packaged; it
  could make sense to check why, and to recommend packaging them at
  least as samples

- on Debian and Ubuntu, for some reason debug=4 is used by default,
  filling the logs with a lot of useless information

- a few times I was caught off guard by things happening
  asynchronously (for example, while moving the master there were a
  few minutes when none of the databases could be updated); it would
  make sense to mention this in the docs (and maybe add some WAITs
  to the slonik_*-generated scripts)
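
  { e.g. something along these lines after a subscribe or a move set -
    the node numbers follow my 1 = master, 2 = slave convention and
    would need adjusting: }

    wait for event (origin = 1, confirmed = 2, wait on = 1);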

- it was a great discovery for me that I need slon_tools.conf only
  on the admin machine and can run slonik just from there - I feel
  that such a picture could be of some help somewhere in the docs

- SLONYNODES does not work anymore; there is a --config=<> option
  to the slonik_* scripts instead

- cluster initialization (/node adding) could install plpgsql
  instead of complaining that it is not found
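
  { the manual workaround is a one-liner anyway; mydb below is a
    placeholder for the node's database: }

    createlang plpgsql mydb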

- it would be nice to have a way to generate a report of the replication
  structure (who is pulling which table from whom), and maybe even to
  regenerate slonik scripts from the actual configuration
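
  { the raw data seems to be all there in the cluster schema - e.g.,
    assuming a cluster named mycluster, and that I read the catalogs
    right, something like: }

    select sub_set, sub_provider, sub_receiver, sub_active
      from _mycluster.sl_subscribe
      order by sub_set, sub_receiver;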

- I wanted to configure Slony for a low-memory, low-I/O setup; I mostly
  experimented with slon.conf for this. Some example "big fullspeed
  site" and "small light site" configs would be nice to have.

- to test whether the initial replication had finished, I used a
  simple SELECT from the destination table (which blocked
  until the initial replication finished). I am curious whether
  there is a better way.
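
  { a possible alternative, if I read the docs right: the sl_status
    view on the origin node reports the lag per receiver, e.g.: }

    select st_received, st_lag_num_events, st_lag_time
      from _mycluster.sl_status;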

- it would be nice if "node ... admin conninfo ..." checked
  whether the connection works and reported problems, a few times I
  found myself patching some pg_hba.conf while subscribing nodes or
  defining sets
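
  { in the meantime one can of course probe every conninfo by hand
    before feeding it to slonik - host, user and db below are
    placeholders: }

    psql -h db2 -U slony -d mydb -c "select 1"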

Best regards and thanks everybody for this pleasant tool.

-- 
----------------------------------------------------------------------
| Marcin Kasperski   | We want to know as early as possible whether
| http://mekk.waw.pl | the project will succeed. Thus we attack the
|                    |        high-risk areas first. (Martin)
----------------------------------------------------------------------


