Christopher Browne cbbrowne at ca.afilias.info
Fri Mar 30 14:36:50 PDT 2007
Andrew Immerman <ahi_public at immerman.org> writes:

> Hello,
>
> I'm working on a fairly substantial database with several large
> replicated tables (on the order of 10^7 records each).  For the sake of this
> description, consider one database master and one slave where table B
> references table A with ON DELETE CASCADE specified.  In the course of a
> normal day, say 100,000 records are INSERTed/UPDATEd and 90,000 records
> are DELETEd from A.  Replication typically works without issue.
>
> Recently, we executed DELETE FROM A WHERE ... on the master.  As
> expected, this purged nearly 75% of the records in tables A and B.
> Our slave has been unable to maintain synchronization.  The system
> runs a nightly VACUUM ANALYZE job on all objects -- this has been
> disabled since before the massive DELETE.  Taking the advice of this
> list, we attempted a REINDEX of sl_log_1/2, which was unable to complete
> without disrupting other services -- eventually the REINDEX was killed.
> sl_log_1/2 currently have 0 and 50M records, respectively, and
> sl_log_2 is growing.
>
> Our current theory is that sl_log_1/2 have become unreasonably
> large/bloated.  Our slaves issue FETCH 100 FROM LOG queries that run for
> 30+ minutes and must occasionally be canceled to prevent service
> disruptions.
>
> As tables A and B no longer require replication, we wish to disable
> replication of those tables and purge the queued events for those
> tables.  To do this, can we do something akin to the following?
>
> 1. Disable Slony on both master and slave,

I presume you mean that you'll stop the slon processes for a little
while, and that step #5 means restarting them.  Seems right.

> 2. Reconfigure Slony to not replicate A or B,

> 3. On the master: DELETE FROM sl_log_1/2 WHERE log_tableid = # AND
> log_cmdtype = 'D'

Hmm.  I think I'd want to do this in something of the reverse order
(concrete SQL sketched below).

That is...

- On master: delete from sl_log_* where log_tableid = [table ID#]

- *Then*, on each node: delete from sl_table where tab_id = [table ID#].

- Third:  on each node: select _slony_cluster.alterTableRestore([table ID#])
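
To make that concrete, here's roughly what I'd run, assuming a
cluster named "slony_cluster" and that the table was registered with
tab_id 42 (check sl_table for the real value).  If memory serves,
alterTableRestore() looks the table up in sl_table, so I run it
*before* deleting that row:

   -- on the master only: drop the queued log rows for the table
   delete from _slony_cluster.sl_log_1 where log_tableid = 42;
   delete from _slony_cluster.sl_log_2 where log_tableid = 42;

   -- on each node: put the table back to its unreplicated state
   select _slony_cluster.alterTableRestore(42);

   -- on each node: deregister the table
   delete from _slony_cluster.sl_table where tab_id = 42;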

I don't think you can have the "and log_cmdtype = 'D'" clause; that
would leave the table partly in replication and partly not.

If you want to get fresh data onto the subscriber later, you're going
to need to do a pg_dump -t on the table; you can't safely drop the
table out of replication just for the duration of the big delete and
expect it to stay in sync.
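
For instance (a sketch -- the database names are made up; empty the
subscriber's copies first, and load a before b because of the foreign
key):

   pg_dump -a -t a masterdb | psql slavedb
   pg_dump -a -t b masterdb | psql slavedb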

> 4. On the master: VACUUM FULL ANALYZE sl_log_1/2, REINDEX TABLE
> sl_log_1/2, ANALYZE sl_log_1/2,

In principle, you can handle this in one swell foop via:
   cluster [some index name] on sl_log_1;
   cluster [some index name] on sl_log_2;

That does the cleanup and reindex all in one.  It's not MVCC-safe, but
the slons are down, so there's nothing during this maintenance for
them to see, and they're the only apps querying sl_log_[12].
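
If the sl_log index names don't spring to mind, pg_indexes will list
them (the schema name will be your cluster name with a leading
underscore):

   select schemaname, indexname
     from pg_indexes
    where tablename in ('sl_log_1', 'sl_log_2');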

And actually, this step isn't particularly vital.  The log tables will
have gaping empty spaces, but the indexes should allow access to be
reasonably efficient short term even though they're bloated.  And when
Slony-I rotates through the log tables, it'll do a TRUNCATE and properly
scrub them out over the next few days.

> 5. Re-enable replication, and

e.g. - restart slons...  Yeah, that seems appropriate.

> 6. On the slave: DELETE FROM A, VACUUM FULL ANALYZE A/B.

If you're deleting ALL the data, I'd use TRUNCATE, as that leaves the
table scrubbed clean.
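
One wrinkle: on 8.1, TRUNCATE refuses a table that is referenced by a
foreign key, and multi-table TRUNCATE isn't available yet, so B's
constraint likely has to come off for the duration (a sketch -- the
constraint and column names are made up):

   alter table b drop constraint b_a_fkey;
   truncate table b;
   truncate table a;
   alter table b add constraint b_a_fkey
       foreign key (a_id) references a (a_id) on delete cascade;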

> We currently run Linux v2.6.18, PostgreSQL v8.1.4, and Slony-I v1.2.2 on
> both master and slave.
-- 
output = reverse("ofni.sailifa.ac" "@" "enworbbc")
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

