Created attachment 89 [details] Full cursor declaration Slony can not continue replicating after mal-formed txid_snapshot is generated. As far as i can see from the remote_worker.c src file, the log cursor is generated on the fly: in which case the txid_snapshot was calculated wrongly: In my case in the line: "pg_catalog".txid_visible_in_snapshot(log_txid, '204108434:204108466:204108434,204108435,204108435,204108439,204108441,204108444,204108447,204108449,204108451,204108453') the xip_list contains two identical entries: <snip>204108435,204108435,</snip> Due to the nature of this error, i set the severity to major (i had to do an initial set up the replication cluster). If this is a known issue, please close it. Regards
Did you happen to get a dump of sl_event from your origin containing the details of the SYNC events involved? Also what version of slony and postgresql are you using.
of course i forgot the most important info - Version infos: Slony version: 2.0.4 Postgres: 8.4.7 running on Ubuntu 10.04 LTS The only dump i have of the replication schema is dated 3 hours prior to the incident. I have been following up on the issue myself as well, and admit that the issue is not in the remote_worker, but rather in the event creator: approx. line 3834 in remote_worker.c: (void) slon_mkquery(&new_qual, "(log_txid < '%s' and " "\"pg_catalog\".txid_visible_in_snapshot(log_txid, '%s'))", event->ev_maxtxid_c, event->ev_snapshot_c);
(In reply to comment #2) The event creation code (in slony1_funcs.c) just calls "pg_catalog\".txid_current_snapshot(), to get the list of in progress txids
> "pg_catalog\".txid_current_snapshot(), I see how the list is constructed now: tranaction ids between: currenttxid and oldtxid are then racked up into the event. Of course, without a dump of the event table it is hard to reconstruct, yet the only way a situation in which two identical txids can occur in my original post is that the event table has two identical entries for this txid. Please correct me if i am wrong, but wouldn't a unique clause on the ev_snapshot hinder a problem like this from occurring? Thanks for your time.
As far as I can tell, this is erroneous data produces by Postgres itself. Slony uses txid_current_snapshot() to get that snapshot datum, which now apparently cannot pass through a data type output+input cycle. I'm going to run a few tests this afternoon and contact the PG hackers list after that.
(In reply to comment #5) > As far as I can tell, this is erroneous data produces by Postgres itself. Slony > uses txid_current_snapshot() to get that snapshot datum, which now apparently > cannot pass through a data type output+input cycle. > > I'm going to run a few tests this afternoon and contact the PG hackers list > after that. Did your tests offer any explanations for this behavior?