Rod Taylor
Tue Mar 29 00:43:10 PST 2005
> I notice your comment thus:
> 
> (modified to handle a group size larger than 100).
> 
> Is this about changing the FETCHes to pull > 100 records at a time?  Or 
> about allowing the grouping of more syncs together?

Grouping more syncs together into a single set. Right now I'm running
it in groups of about 10000.
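
For anyone following along: a group here is just the remote worker
picking up a run of unconfirmed SYNC events and replaying them as one
unit. A rough sketch of the backlog waiting to be grouped, with the
"_mycluster" schema name and the 123456 confirm point as placeholders:

    -- Rough sketch only: count the SYNC events still waiting to be processed.
    SELECT count(*)      AS pending_syncs,
           min(ev_seqno) AS first_pending,
           max(ev_seqno) AS last_pending
      FROM _mycluster.sl_event
     WHERE ev_origin = 1
       AND ev_type   = 'SYNC'
       AND ev_seqno  > 123456;  -- last SYNC already confirmed by the subscriber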

> An open question (to me) is whether it is particularly beneficial to 
> raise these sizes when behind or not.

Well, as you saw above, with the plan it's using it scans the entire
table for data. Groups of 10000 cause it to replicate 20 hours of data
in a single shot (about 6 hours of execution), whereas groups of 100
execute for 5 hours to replicate only a few minutes' worth of data.

Once it gets past the MERGE_SET events, this won't be necessary since
the plan will convert back to something sane again (using the index to
restrict the transaction range).
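
To be clear about what that plan is hitting, the provider-side log
selection is roughly of this shape. This is a simplified sketch, not
the literal query slon builds (the real one also filters on tables and
the in-progress xid list), and "_mycluster" is again a placeholder:

    -- Simplified sketch of the per-group log selection on the provider.
    SELECT log_origin, log_xid, log_tableid, log_actionseq,
           log_cmdtype, log_cmddata
      FROM _mycluster.sl_log_1
     WHERE log_origin = 1
       AND log_xid >= '12345'    -- oldest xid not yet replicated (the pinned one)
       AND log_xid <  '9999999'  -- newest xid covered by the grouped SYNCs
     ORDER BY log_actionseq;
    -- While that xid window covers essentially the whole table the planner
    -- quite reasonably picks a sequential scan; once it narrows again the
    -- index on sl_log_1 gets used.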

> 1.  Doubling the 'grouping' should cut down the number of queries by ~ 
> 1/2, which should diminish parsing overhead.  I would expect this to 
> be relatively immaterial, but could be wrong.

Without the horrible "look at everything in the table" plan the grouping
size would not be a big deal.

Another notable issue is the minimum transaction ID.

A group of 100 has a transaction range from 10 through 100.

The next group of 100 has a transaction range from 10 through 1000,
since transaction 10 was long-running.

The next group of 100 has a transaction range from 10 through 2500,
since transaction 10 was still running.

It takes me about 12 to 24 hours to copy the largest tables (each, not
collectively), so there are a number of blocks where it rescans from
transaction ID X to some larger number many times over if I use group
sizes of less than 7000.
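
If you want to see what's pinning the lower bound at any given moment,
the long-running backends show up in pg_stat_activity (assuming
stats_command_string is on; 7.4/8.0 don't expose a transaction start
time, so the query start of the big COPY is the best clue you get):

    -- The multi-hour Slony COPY will be near the top of this list.
    SELECT procpid, usename, query_start, current_query
      FROM pg_stat_activity
     ORDER BY query_start
     LIMIT 5;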

The data copy for Slony is what sets the minimum transaction ID that it
then runs into trouble handling later on. pg_dump isn't any help either
(about 16 hours to do a backup via pg_dump), but that is why we're
upgrading to 8.0.

> 2.  Doubling the 'grouping' will lead to query result sets being larger 
> on the provider, which will have some added cost in memory and in the 
> cost of sorting

Big deal. If you've fallen behind by days and a few hundred million
tuples trying to get through the initial data copy, you've probably got
a little bit of memory hanging around.

Slony is welcome to use 8GB of memory on my systems if it'll get through
the initial "copy" period faster (takes about 3 weeks currently).
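
If the bigger per-group result sets ever do start spilling sorts to
disk, the obvious knob (a guess on my part, I haven't needed it) is the
per-sort memory on the provider: sort_mem on 7.4 and earlier, renamed
work_mem in 8.0, measured in kB. For example, for whatever user the
remote slon connects as ("slony" here is a placeholder):

    -- ~256MB per sort operation, for the catch-up period only.
    ALTER USER slony SET work_mem = 262144;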

> Another open question is whether or not there may be a possibility for 
> it to be beneficial to add additional indices on some of the Slony-I 
> tables to help in such cases.

I don't think so, just optimizations in how Slony queries for and
internally groups things.



