Scott Marlowe scott.marlowe at gmail.com
Sun Nov 8 18:13:58 PST 2009
(Note: please don't use HTML email on the list. Many folks' mail
readers can't read it and they just delete the message. Me
personally, I couldn't care less; gmail has a nice html -> text tool
that makes it easy.)

On Sun, Nov 8, 2009 at 7:40 AM, Melvin Davidson <melvin6925 at yahoo.com> wrote:
>
> >This is an 8 core opteron 2.1GHz machine with 32G ram and a 12 disk RAID-10 for pgdata >and a two disk RAID-1 for pg_xlog.
>
> OK, based on that information, I can only suggest the following tweaks. Try them
> 1 at a time to see what difference they make. Note that shared_buffers & max_files_per_process require a restart. The others can be handled from psql with
> SELECT pg_reload_conf();
>
> shared_buffers          --> increase to 48mb
> work_mem                --> increase to 8mb
> maintenance_work_mem    --> increase to 32mb
> max_files_per_process   --> increase to 4000

All those types of changes were made long ago, and these machines are
well tuned and run VERY fast, most of the time.  They started getting
slow at \d and tab completion in psql as we added ~2000 schemas, and
Tom Lane gave us a very good fix: set the "is visible" catalog
function to a higher cost.  Voila, tab completion went from 20 second
responses to sub second, and \d saw the same huge speedup.
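For anyone hitting the same wall, the fix was along these lines (a
sketch only; the exact function and cost value are from memory, so
check which visibility function your own catalog queries lean on):

    -- raise the planner's cost estimate for the visibility check so it
    -- gets applied after the cheaper quals instead of to every row
    ALTER FUNCTION pg_catalog.pg_table_is_visible(oid) COST 10;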

This setAddTable() function is, I'm assuming, running over the pg
catalogs for the most part, and those all easily fit in memory and
are kept well vacuumed.  But they are good sized, with ~30k assorted
objects in the db.  Some slowdown is expected, but this one seems
extreme.

I'm guessing that some part of the setAddTable function is looking at
all schemas / objects when it should only be operating on the set in
question.  If it stuck to the set in question, it wouldn't matter if
I had 1M schemas.

> Also, check the postgresql log to see if there are any warnings about the need to increase other parameters such as max_fsm_pages or checkpoint_segments.

You don't wanna be near one of these machines when we blow out the
FSM.  The second we have more objects to track than the FSM can
handle, things get out of control in a matter of days, if not hours.
Our master write database has max_fsm_pages set to 10M and
max_fsm_relations to 500k.  VACUUM VERBOSE shows about 2M page slots
in use across 37k relations.
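If you want to sanity-check your own headroom, the configured limits
are easy to pull up (the in-use numbers come from the summary lines
at the end of a database-wide VACUUM VERBOSE):

    SHOW max_fsm_pages;
    SHOW max_fsm_relations;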

Regular db operations are fast, in the millisecond range for most
queries, and we're pushing only about 20MB/sec to our pg_xlog drives,
which can handle about 60 to 70MB/sec sequential.

I have two very distinct problems to deal with.

The first is more bothersome than dangerous: with a large catalog,
setAddTable spins a CPU for 30 seconds, which in my past experience
smells like a bad query plan, e.g. a nested loop joining 37k records
when the planner expected 40 or so.  I can live with it; I have
enough spare CPU to burn two or three cores on a create set, and I
don't have to run it more than about once a year.
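When I do get around to chasing it, the plan is to grab the offending
query out of pg_stat_activity while setAddTable is spinning and run
EXPLAIN ANALYZE on it by hand to see where the row estimates go
sideways.  Something like this (column names as of the 8.x
pg_stat_activity view):

    -- find what the slonik backend is actually running
    SELECT procpid, current_query
      FROM pg_stat_activity
     WHERE current_query <> '<IDLE>';
    -- then prefix that query with EXPLAIN ANALYZE in psql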

The second problem is MUCH MUCH worse, and that is that when regular
old autovacuum is running it blocks setAddTable(), which in turn
blocks all access to the referenced table.  Note that in one case
autovac running on an FKed table caused this issue on a subscribing
table for setAddTable().  That problem can take out my database while
trying to subscribe, so I have to remember to turn off autovacuum, or
it gets in the way and things go boom.
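My workaround is crude but it works: flip autovacuum off before the
subscribe, reload, and flip it back on afterwards.  Roughly (a
sketch, not gospel):

    -- edit postgresql.conf:  autovacuum = off
    SELECT pg_reload_conf();
    -- run the set add table / subscribe
    -- edit postgresql.conf:  autovacuum = on
    SELECT pg_reload_conf();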

This is actually a classic example of priority inversion.  By lowering
the priority of autovacuum we slow down the one thing that is blocking
another thing, which is in turn blocking everything else.  The more we
slow down autovacuum, the worse the problem becomes.  If autovacuum
ran at maximum speed, we'd spend less time in trouble during
subscription.  But I think disabling autovac while creating the set is
the safer bet.
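If you wanted to lean the other way and let autovacuum run flat out
instead of turning it off, the knob would be the autovacuum cost delay
(again a conf edit plus a reload; 0 means no throttling at all):

    -- edit postgresql.conf:  autovacuum_vacuum_cost_delay = 0
    SELECT pg_reload_conf();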

