Christopher Browne cbbrowne at ca.afilias.info
Thu Nov 29 08:43:00 PST 2007
Simon Riggs <simon at 2ndquadrant.com> writes:
> I'm trying to understand what goes on during cleanupThread_main()
>
> The code does this, in order:
>
> 1. deletes all outstanding log rows from both log tables
>
> 2. (eventually) truncates the log table
>
> 3. (allegedly) vacuums the log table 

I am the one who did a lot of reshaping of this code in the 1.2 branch,
so I'll see what I can address.

> I have a few questions on the above for 1.2.x
>
> - SQL function logswitch_finish() says it is called after cleanup thread
> has vacuumed both log tables. The code in cleanupThread_main() seems to
> avoid the vacuum. Not sure whether the SQL comment is wrong, or are we
> saying we allow autovacuum to do this for us, or we don't do it at all? 

- if autovac is enabled for the table, then yes, indeed, the cleanup
  thread lets autovac do the vacuum for it.

- if autovac is disabled, either globally or for that individual
  table, then the vacuum is indeed handled by the cleanup thread.

That's neither "yes" nor "no."  :-)
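
The two cases above amount to a simple decision. Here is a minimal
sketch of it in Python. It is not taken from the slon source: the
function name and both flags are hypothetical stand-ins for however
the cleanup thread actually detects autovacuum coverage.

```python
def cleanup_thread_should_vacuum(autovac_enabled_globally: bool,
                                 autovac_enabled_for_table: bool) -> bool:
    """Return True when the cleanup thread must VACUUM the table itself.

    Both arguments are hypothetical flags; how they would be obtained
    from the server is outside the scope of this sketch.
    """
    # Autovacuum only covers the table when it is switched on globally
    # and has not been disabled for this particular table.
    covered_by_autovac = autovac_enabled_globally and autovac_enabled_for_table
    # Anything autovacuum does not cover falls to the cleanup thread.
    return not covered_by_autovac
```

So the cleanup thread skips the vacuum only when both flags are true;
in every other combination it does the work itself.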

> - Why do we DELETE log table rows at all? We're doing it out-of-line so
> it clearly isn't a necessary step for correctness (or is it?). At the
> end of it all we TRUNCATE them anyway, so what was the point of all that
> deletion? Or alternatively, why do we truncate and log switch at all?

The older/original logic would simply use DELETE, never TRUNCATING;
originally, we were just using the one log table, so there wasn't any
option of doing a TRUNCATE.

I was trying to be as conservative/safe as possible when I changed things
to add in periodic truncation of log tables.

Yes, you're right, we could solely use TRUNCATE, never using DELETE.
That sounds like a good further incremental change to propose.

> - My understanding of the flip-flop design with 2 log tables was that it
> would allow us to avoid VACUUM entirely, yet this doesn't seem to be the
> case in 1.2. What purpose does the second log table serve?

That's not exactly it; what it provides is that if we can periodically
TRUNCATE log tables, then we can be certain that they do not
permanently bloat to any ridiculous size.

It's not, in effect, that we can get a "best case;" instead, we get to
avoid a "worst case."
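
To make the "worst case" point concrete, here is a toy model in
Python. It is only an illustration under loud assumptions: a "cycle"
is an arbitrary unit of time, table size is counted in rows (live plus
dead), deleted rows keep occupying space until a vacuum runs, and a
log switch truncates the table being switched to. None of the names
come from Slony.

```python
def peak_size_delete_only(rows_per_cycle: int, cycles: int,
                          vacuum_every: int = 0) -> int:
    """Peak size of a single log table that is only ever DELETEd from.

    vacuum_every=0 models the worst case where no vacuum ever reclaims
    the space held by deleted rows.
    """
    live = dead = peak = 0
    for cycle in range(1, cycles + 1):
        live += rows_per_cycle           # new log rows arrive
        peak = max(peak, live + dead)
        dead += live                     # cleanup deletes them: dead, but still on disk
        live = 0
        if vacuum_every and cycle % vacuum_every == 0:
            dead = 0                     # vacuum reclaims the dead rows
    return peak


def peak_size_flip_flop(rows_per_cycle: int, cycles: int,
                        switch_every: int) -> int:
    """Peak combined size of two log tables that are switched and truncated."""
    tables = [0, 0]
    active = 0
    peak = 0
    for cycle in range(1, cycles + 1):
        tables[active] += rows_per_cycle  # new rows go to the active table
        peak = max(peak, sum(tables))
        if cycle % switch_every == 0:
            active = 1 - active           # log switch
            tables[active] = 0            # TRUNCATE the table we are about to reuse
    return peak
```

With 100 rows per cycle and no effective vacuum, the delete-only table
keeps growing with the number of cycles (1000 rows after 10 cycles).
With a switch every 3 cycles, the combined size of the two tables
never exceeds 600 rows, however long the model runs.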

> - If we do have to DELETE, why do we do this to both log tables? Surely
> changes will only be found in one? Or are we assuming that the query
> will do a fast indexscan and return quickly, so why bother trying to
> avoid it?

During the period just after a switch, useful data will be found in
both log tables, so "making sure code is right" dictated applying the
delete logic to both tables.

Think: "Paranoid coding."
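
A small sketch of why the delete pass visits both tables, again in
Python and again hypothetical: the table names are placeholders, and a
plain sequence number stands in for whatever the real cleanup logic
compares against when deciding that a row is no longer needed.

```python
def delete_confirmed(log_tables: dict, confirmed_up_to: int) -> dict:
    """Delete rows with sequence <= confirmed_up_to from every log table.

    log_tables maps a (placeholder) table name to a list of sequence
    numbers; the lists are modified in place. Returns the number of
    rows removed from each table.
    """
    removed = {}
    for name, rows in log_tables.items():
        keep = [seq for seq in rows if seq > confirmed_up_to]
        removed[name] = len(rows) - len(keep)
        rows[:] = keep
    return removed


# Just after a switch: the old table still holds rows 8-10,
# the new one has started collecting 11 and 12.
tables = {"old_log": [8, 9, 10], "new_log": [11, 12]}
removed = delete_confirmed(tables, confirmed_up_to=9)
```

Here two rows are removed from the old table and none from the new
one. Applying the same pass to both tables costs little and avoids
having to reason about which table currently holds the stale rows.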

> - We rely on vacuum_delay having been set elsewhere. If we do have to do
> VACUUMs, then can we/should we force a non-zero vacuum_delay for the
> cleanup thread?

Good question.  I'm not sure.

I think the answer to that varies somewhat across PostgreSQL versions.
For 7.4, 8.0, 8.1, the vacuum_delay values make VACUUM run longer, and
a running VACUUM counts as an open transaction, which tends to keep
later VACUUMs from doing any good until it completes.  That changes in
8.2, and I think that changes how desirable (or rather, how
UNdesirable) a non-zero vacuum_delay becomes.

> - We ANALYZE the log tables when they are empty following the TRUNCATE.
> Is that done deliberately for some reason? At that point the data values
> are not available so it means later planning of SQL against the log
> tables is going to be a little strange. Should we avoid re-ANALYZEing the
> log table when we have just TRUNCATED it?

Hmm.  If it seems conspicuously unwise to do this ANALYZE, then I can
see us dropping it.  That may well be the case...
-- 
(format nil "~S@~S" "cbbrowne" "linuxdatabases.info")
http://linuxdatabases.info/info/rdbms.html
"I don't plan to maintain it, just to install it." -- Richard M. Stallman

