Jeff Davis pgsql at j-davis.com
Thu Jul 12 11:02:29 PDT 2007
On Tue, 2007-07-10 at 01:12 -0400, Jan Wieck wrote:
> The speed gain in your test has a couple of sources. The parsing 
> overhead of 5 million update statements vs one copy is one of them, the 
> network latency for 5 million round trips another and the execution plan 
> (which only applies if all or at least a substantial part of the entire 
> table is updated) of course. Sure is one such mega update of the entire 
> table better done in a merge over the two sorted tuple sets. But the 
> planner might chose another strategy if you updated only 500,000 out of 
> 10 million rows.

The time it takes to do 5M "SELECT #" calls, where # is a constant.
These are run over a local socket, not a real network, so the latency is
less: 305s

The time it takes to do 5M "UPDATE" queries on a 5M record table, and
none of the WHERE clauses match any tuples (so only an index lookup is
done), in one transaction: 623s

The time it takes to do 5M of the same update queries, but prepared
first: 398s

I know tests involving this overhead have been done before, but this is
on my machine, and it could be a reference point for the other numbers I
stated earlier. 

> application). Using prepared statements to apply the changes will deal 
> with some part of the parsing and planning overhead. It will still force 
> your case to do 5 million index scans instead of two sorts and a merge. 
> But that comparison really only applies to cases where you update a very 
> substantial part of the whole table.

Yes, my test was designed to show that there are areas of potential
improvement without moving to statement-based replication.

> The log is selected by actually doing a COPY over a SELECT (the usual 
> log select). That COPY result is fed into the current log table on the 
> subscriber. A trigger on that log table will suppress the actual insert 
> operation if the subscriber is not in forwarding mode and the operation 
> is for a subscribed table (the node might be origin to something else). 
> What it also does is doing the actual leg work of applying the changes 
> via prepared SPI statements or maybe even direct heap and index updates. 
> This method cuts down on the parsing and planning, as well as on the 
> network round trips.
> 

Interesting idea. I don't think I understand the implications of doing
direct heap/index updates. Would you have to log the visibility
information as well?

Regards,
	Jeff Davis



More information about the Slony1-general mailing list