[Slony1-general] Uninterrupted Slony Replication

Mon Aug 8 17:08:19 PDT 2011

On Mon, 8 Aug 2011, Dilraj Singh wrote:

> Hi,
> 
> Yup, it works for 2.0.7. Thanks.
> 
> But I tried version 2.0.4 as well, and it still gives the same errors. We are
> somewhat inclined to use 2.0.4 because it is the version currently available
> through apt-get on Debian, so it can be updated easily. Is there any way I can
> make this work with 2.0.4 too?
> 
> Also, I noticed that after rebooting the machine it does not work even when I
> kill the slon process started at boot and start slon manually with something
> like ./slon conninfo=.

Once your network and PostgreSQL instances are up, you should be able to 
simply restart all of your slon processes, and replication should resume. 
Even with 2.0.4, slon should recover from the dropped connections once it 
has been restarted.
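
The restart advice above depends on PostgreSQL being reachable before slon 
is started. A minimal sketch of such a check, assuming a plain TCP probe is 
good enough for your setup (the helper name wait_for_port and the host and 
port values are illustrative, not part of Slony). An open port does not 
prove that PostgreSQL is ready to run queries, so treat it as a rough gate 
only:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have passed. This is a hypothetical helper, not a Slony API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to the server port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable, or timed out: retry until deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use (host name and port are assumptions for illustration):
#   if wait_for_port("node1.example.com", 5432):
#       ... restart the slon process for that node ...
```

Run a check like this for every node that the local slon connects to, and 
restart slon only after all of them answer.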

How are you starting slon?  Are you using a slon.conf file or passing the 
cluster name and conninfo on the command line? (You need to be doing one 
of the two.)
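
To make the two ways of starting slon concrete, here is a small sketch that 
builds the argument list for each. It assumes the usual slon invocations, 
"slon -f <configfile>" and "slon <clustername> <conninfo>". The cluster 
name is inferred from the "_four_node_rep_cluster20" schema in the quoted 
log; the config path, the conninfo string and the helper name slon_command 
are made up for illustration:

```python
def slon_command(cluster_name=None, conninfo=None, config_file=None,
                 slon_bin="slon"):
    """Build the argv list for starting slon.

    Either pass config_file (a slon.conf that sets cluster_name and
    conn_info), or pass both cluster_name and conninfo directly.
    Hypothetical helper, not part of Slony.
    """
    if config_file is not None:
        # Config-file form: slon -f /path/to/slon.conf
        return [slon_bin, "-f", config_file]
    if cluster_name and conninfo:
        # Command-line form: slon clustername "conninfo string"
        return [slon_bin, cluster_name, conninfo]
    raise ValueError("need config_file, or both cluster_name and conninfo")


# Illustrative values only; adjust to your own cluster.
by_conf = slon_command(config_file="/etc/slony1/slon.conf")
by_args = slon_command(
    cluster_name="four_node_rep_cluster20",
    conninfo="dbname=mydb host=node1 user=slony",
)
```

Either list can be handed to subprocess.Popen to launch the daemon. Note 
that the command-line form needs the cluster name as well as the conninfo.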

Steve

> 
> Regards
> Dilraj Singh
> 
> On Sat, Aug 6, 2011 at 8:42 AM, Steve Singer <ssinger_pg at sympatico.ca>
> wrote:
>       On Fri, 5 Aug 2011, Dilraj Singh wrote:
>
>             Hi,
>
>             I am using postgresql-8.4 and slony1-1.2.0.3 and I
>             have been able to implement
>             a 4 node replication cluster where nodes communicate
>             successfully with each
> 
> 
> Try upgrading to 2.0.7 and see if it fixes your problem.
> 
> 1) 2.0.3 has a bug (unrelated to your current issue) that isn't
> present in 2.0.2 or 2.0.4, so that release should be avoided.
> 
> 2) 2.0.7 has some fixes related to recovering from dropped connections
> that might fix your issue; the error you paste below looks familiar.
> 
> <snip>
> 
>
>       2011-08-05 09:25:40 PDTERROR  remoteListenThread_3: "select
>       con_origin, con_received, max(con_seqno) as con_seqno,
>       max(con_timestamp) as con_timestamp from
>       "_four_node_rep_cluster20".sl_confirm where con_received <> 2
>       group by con_origin, con_received"
>
>       2011-08-05 09:25:42 PDTERROR  remoteListenThread_3: "select
>       ev_origin, ev_seqno, ev_timestamp, ev_snapshot,
>       "pg_catalog".txid_snapshot_xmin(ev_snapshot),
>       "pg_catalog".txid_snapshot_xmax(ev_snapshot), ev_type,
>       ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6,
>       ev_data7, ev_data8 from "_four_node_rep_cluster20".sl_event e
>       where (e.ev_origin = '3' and e.ev_seqno > '5000000005') or
>       (e.ev_origin = '4' and e.ev_seqno > '5000000039') order by
>       e.ev_origin, e.ev_seqno limit 40" - no connection to the server
>
>       and then replication won't start working again until I
>       reboot all the nodes. I am guessing that the provider node
>       gets reinitialized on reboot, which is why replication
>       starts again. I know Slony is used for automated database
>       replication, so I was wondering whether there is any way I
>       can make this work without rebooting all the nodes, which
>       will be inconvenient as the number of nodes increases or on
>       a production server.
>
>       Any inputs on the above error will be greatly appreciated.
>
>       Regards
>       Dilraj Singh