Tue Jul 3 09:33:12 PDT 2007
"Andrew Hammond" <andrew.george.hammond at gmail.com> writes: > On 6/29/07, Christopher Browne <cbbrowne at mail.libertyrms.com> wrote: > A really interesting win would be in detecting cases where you can go from > > WHERE id IN ( a list ) > > to > > WHERE a < id AND id < b > > However I think this is only possible at the time the transaction > happens (how else will you know if your sequence is contigious. And > that suggests to me that it's not reasonable to do at this time. That also seems near-nondeterministic in that we're capturing data based on the state of things when the transactions (multiple!) happen, on the data source, when the effects will be based on the state of things on destination nodes, at a different point in time. I'll see about doing an experiment on this to see if, for the DELETE case, it seems to actually help. It may be that the performance effects are small to none, so that the added code complication isn't worthwhile. > Also, ISTM that the big reason we don't like statement based > replication is that SQL has many non-deterministic aspects. However, > there is probably a pretty darn big subset of SQL which is provably > non-deterministic. And for that subset, would it be any less > rigorous to transmit those statements than to transmit the per-row > change statments like we currently do? Well, by capturing the values, we have captured a deterministic form of the update. Jan and I had a chat last week on ideas of how to do "wilder transformations" (e.g. - like adding/dropping columns, or of replicating "WHERE FOO IN ('BAR')"); what we arrived at was that, in such cases, what we'd need to do is to have custom 'logtrigger' functions that would have full access to OLD.* and NEW.* (e.g. - the two sets of columns, old and new), which would then use them, perhaps with arbitrary complexity, construct sl_log_n entries. The "fully general" logtrigger function would be *way* less efficient than the present ones; you don't get complex transformations for free. >> It would take some parsing of the log_cmddata to do this, nonetheless, >> I think it ought to be possible to compress this into some smaller >> number of queries. Again, if we limited each query to process 100 >> tuples, at most, that would still seem like enough to call it a "win." > > I can see two places to find these wins. When the statement is parsed > (probably very affordable) and, as you mentioned above, by inspecting > the log tables. I think that we'd have to be pretty clever with the > log tables to avoid having it get too expensive. I wonder if full text > indexing with an "sql stemmer" might be clever way to index that data > usefully. I have a *small* regret in this; it would be very nice if data in sl_log_[n].log_cmddata were split into two portions: 1. For an INSERT, split between the column name list and the VALUES portion; You could, in principle, join together a set of VALUES entries for the same table as long as the list of column names match. 2. For an UPDATE, split between the SET portion and the WHERE portion; You could, in principle, join together a set of entries which have identical SET portions by folding together the WHERE clauses. 3. For DELETE, there's nothing to be split :-). It's trivial to fold DELETE requests together as I previously showed. 
> Two downsides of the parser approach that I can see are
> 1) the postgresql parser / planner is already plenty complex
> 2) it doesn't group stuff across multiple statements

I don't see any possibility of using a parser-based approach; that
jumps us back into statement-based replication, which is susceptible
to nondeterminism problems.

Remember, the thought we started with was: "What if we could do
something that would make mass operations less expensive?"  I don't
want to introduce anything that can materially increase processing
costs.  The more intelligent we try to get, the more expensive the
logtrigger() function gets, and if the price is high enough, then we
gain nothing.

The only "win" I see is if we can opportunistically join some
statements together.  If we have to make the log trigger function
universally *WAY* more expensive, well, that's a performance loss :-(.

-- 
let name="cbbrowne" and tld="cbbrowne.com" in
  String.concat "@" [name;tld];;
http://cbbrowne.com/info/unix.html
Rules of the Evil Overlord #207.  "Employees will have conjugal visit
trailers which they may use provided they call in a replacement and
sign out on the timesheet.  Given this, anyone caught making out in a
closet while leaving their station unmonitored will be shot."
<http://www.eviloverlord.com/>