Michael Alan Dorman mdorman
Sun Nov 7 02:29:26 PST 2004
I just spent the last three months of the run-up to the US Elections
working for a political organization in Washington, DC.

This organization is a MySQL shop, and I spent my first two months
there fighting with, and generally being unimpressed by, MySQL; I
hadn't used it since before InnoDB existed, but even that didn't seem
to keep it from doing badly at things that I had used PostgreSQL for
without problems.

By the time they asked me to create a system for internal election-day
communication at the beginning of October---a ludicrously abbreviated
development cycle---I told them the only way I felt I could produce a
reliable product on that sort of schedule was if I could back-end it
against PostgreSQL.  They let me do so, with the caveat that this
meant I was *IT*---the box was mine for anything but hardware or
kernel issues.

Another, even more important, election-day system was being developed
on the same insane schedule, and that developer elected to use the
same PostgreSQL install.

Just about the last thing that we got done was replication, which
wasn't up and running until the morning of November 1st---less than 24
hours before the election.  Slony-I worked great (although the docs can
be a little opaque the first time out), and we figured this was just a
security blanket.  We did some dumps and test loads, compared sizes,
etc., but we didn't have time to test failover or anything.

I arrived at the campaign HQ, pillow in hand, at 4:30AM on Tuesday
morning.  After some network silliness, we did our final pre-start
push of the software, and everyone considered getting in a nap.

At 6:00AM, with polls opening in an hour, our primary database server
went down.  When it came back up, things seemed stable for a few
minutes before its hard drive controller failed entirely.

We were back up and in operation in under 15 minutes.  If I hadn't
been so tired, we would have had replication back up within a few
minutes of the hard drives from the primary being dropped into another
chassis---I kept reinitializing the system to be a slave, and then
forgetting to start the actual replication daemon.

Slony-I saved my ass, both in its performance in the pinch, and in
being simple enough to set up that I was able to bring it up so late
in the game without it causing problems.  I owe everyone who has
contributed to it a great debt of thanks.  Or beer, take your pick.

Mike

