Steve Singer ssinger at ca.afilias.info
Tue May 25 10:52:25 PDT 2010
Brian Fehrle wrote:

 > Hi all,

A few things I would look at:

Look at sl_event and sl_confirm.  Are there events in sl_event with a 
sequence number (ev_seqno) larger than what shows up as confirmed in 
sl_confirm?  If so, when were these events generated?  Look at the next 
unconfirmed event in sl_event and see what type of event it is.
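
As a rough illustration of that sl_event / sl_confirm comparison, here is 
a minimal Python sketch.  It runs the query against an in-memory SQLite 
stand-in with made-up rows so that it is self-contained.  The column names 
(ev_origin, ev_seqno, ev_type, ev_timestamp, con_origin, con_received, 
con_seqno) follow the Slony-I catalog, but the function names and the 
sample data are hypothetical.  On a real cluster the tables live in the 
cluster schema (for a cluster named SLONY that would be "_SLONY"), so the 
query would need to be schema-qualified and run against PostgreSQL.

```python
import sqlite3

# Events whose sequence number is above the highest one each receiver
# has confirmed for that origin.
UNCONFIRMED_SQL = """
SELECT e.ev_origin, e.ev_seqno, e.ev_type, e.ev_timestamp, c.con_received
FROM sl_event e
JOIN (SELECT con_origin, con_received, MAX(con_seqno) AS max_seqno
      FROM sl_confirm
      GROUP BY con_origin, con_received) c
  ON c.con_origin = e.ev_origin
WHERE e.ev_seqno > c.max_seqno
ORDER BY e.ev_origin, c.con_received, e.ev_seqno
"""


def unconfirmed_events(conn):
    """Return (origin, seqno, type, timestamp, receiver) for unconfirmed events."""
    return conn.execute(UNCONFIRMED_SQL).fetchall()


def demo():
    """Build a tiny stand-in for the two tables (hypothetical data) and query it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sl_event (ev_origin INTEGER, ev_seqno INTEGER,
                               ev_timestamp TEXT, ev_type TEXT);
        CREATE TABLE sl_confirm (con_origin INTEGER, con_received INTEGER,
                                 con_seqno INTEGER);
        INSERT INTO sl_event VALUES
            (1, 100, '2010-05-25 10:00:00', 'SYNC'),
            (1, 101, '2010-05-25 10:01:00', 'SYNC'),
            (1, 102, '2010-05-25 10:02:00', 'SYNC');
        -- node 2 has confirmed events from node 1 only up to 100
        INSERT INTO sl_confirm VALUES (1, 2, 99), (1, 2, 100);
    """)
    return unconfirmed_events(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The first row returned for a given receiver is the "next unconfirmed 
event"; its ev_type and ev_timestamp are the things to look at.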


You can look at sl_log_1 and sl_log_2; you should see your missing rows 
there.  In particular, the ev_snapshot from sl_event of the last 
unconfirmed SYNC event should give you the range of transactions 
(log_txid) for some of the unreplicated rows.  The set of all of the 
unconfirmed SYNC events should give you all of the rows in sl_log_1 and 
sl_log_2 that still need to be sent.
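
To make the snapshot reasoning concrete, here is a small Python sketch.  
It assumes ev_snapshot is in PostgreSQL's txid_snapshot text form 
"xmin:xmax:xip_list" and applies the usual visibility rule: a transaction 
is visible if it is below xmin, or below xmax and not in the in-progress 
list.  The function names and the sample values are hypothetical; the 
idea is only that rows still waiting to be sent have a log_txid that is 
visible in the unconfirmed SYNC's snapshot but was not visible in the 
last confirmed one.

```python
def parse_snapshot(text):
    """Split 'xmin:xmax:xip_list' into (xmin, xmax, set_of_in_progress_txids)."""
    xmin, xmax, xip = text.split(":")
    in_progress = {int(x) for x in xip.split(",") if x}
    return int(xmin), int(xmax), in_progress


def visible(txid, snap):
    """True if txid had already committed when the snapshot was taken."""
    xmin, xmax, in_progress = snap
    if txid < xmin:
        return True
    if txid >= xmax:
        return False
    return txid not in in_progress


def pending_txids(log_txids, confirmed_snapshot, unconfirmed_snapshot):
    """Transaction ids covered by the unconfirmed SYNC but not by the confirmed one."""
    old = parse_snapshot(confirmed_snapshot)
    new = parse_snapshot(unconfirmed_snapshot)
    return sorted(t for t in set(log_txids)
                  if visible(t, new) and not visible(t, old))


if __name__ == "__main__":
    # Hypothetical log_txid values and snapshots, for illustration only.
    txids = [95, 100, 101, 103, 105, 110, 120]
    print(pending_txids(txids, "100:104:101", "108:112:110"))
```

With real data, the log_txid values would come from sl_log_1 and 
sl_log_2, and the two snapshots from the last confirmed and the first 
unconfirmed SYNC rows in sl_event.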


You can also try a slonik script like

sync(id=1);
wait for event(origin=1, confirmed=all, wait on=1);


This generates a SYNC event and waits until it gets replicated.  If 
slonik exits successfully and you're still missing those rows, then 
something strange is going on (I would start to wonder whether you did 
something like an EXECUTE SCRIPT on your replica that deleted rows from 
just the replica).
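
The two slonik statements above also need the usual preamble (cluster 
name and admin conninfo for each node) before slonik will run them.  A 
minimal Python sketch that assembles the full script and, optionally, 
feeds it to slonik on stdin is below.  The helper names are hypothetical 
and the conninfo strings are placeholders to be replaced with your own; 
the cluster name SLONY is taken from the slon command lines in the 
original message.

```python
import subprocess


def build_sync_script(cluster, conninfo_by_node, origin=1):
    """Return slonik script text: preamble, then a SYNC and a WAIT FOR EVENT."""
    lines = [f"cluster name = {cluster};"]
    for node_id, conninfo in sorted(conninfo_by_node.items()):
        lines.append(f"node {node_id} admin conninfo = '{conninfo}';")
    lines.append(f"sync (id = {origin});")
    lines.append(
        f"wait for event (origin = {origin}, confirmed = all, wait on = {origin});"
    )
    return "\n".join(lines) + "\n"


def run_slonik(script):
    """Pass the script to slonik on stdin and return its exit status."""
    return subprocess.run(["slonik"], input=script, text=True).returncode


if __name__ == "__main__":
    # Placeholder connection strings; substitute the real ones.
    script = build_sync_script(
        "SLONY",
        {1: "dbname=MASTERDB host=MASTERHOST user=REPUSER",
         2: "dbname=SLAVEDB host=SLAVEHOST user=REPUSER"},
    )
    print(script)
    # run_slonik(script)  # uncomment on a machine where slonik is installed
```

An exit status of 0 from slonik would mean the SYNC was confirmed by all 
nodes.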

Steve





>     I'm having some trouble determining why replication isn't happening 
> on a replication table. I have a two node slony cluster. I have a table 
> in the slony replication set that has 72332 records on the master, 
> however it has 71225 records on the slave. It's been this way for a few 
> hours at least (could be more as that is when we first noticed it). This 
> table was added to the replication set several weeks ago, so it's not 
> stalled mid-publish. The slon daemons are running, and the logs for the 
> daemons report no abnormalities. I've restarted the slon daemons to see 
> if it would clear anything up, but it remains the same.
> 
> Looking at sl_status, the lag events never go above 1, and the lag time 
> never goes above a couple of minutes.
> 
> Best reasons I can think of are, either something is causing the 
> replication on this particular table to be on "hold" and not update the 
> remaining rows on the slave, while not alerting me via the slon logs. Or 
> something went screwy and replication for that table is out of sync and 
> I need to drop the table from the set and add it back again, let it sync 
> up (however this solution is not ideal.)
> 
> Any tips of places I should look to see what may be going on?
> 
> Thanks in advance.
> 
>        - Brian Fehrle
> 
> Data that may be important:
> 
> Commands that start the slon daemons:
> /usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node1.pid -s 
> 60000 -t 300000 SLONY "dbname=$MASTERDBNAME port=$MASTERPORT 
> host=$MASTERHOST user=$REPUSER"  > /usr/local/pgsql/log/slon.node1.log 
> 2>&1 &
> /usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node2.pid -s 
> 60000 -a /usr/local/pgsql/slon_logs -t 300000 -x "log_parsing_script" 
> SLONY "dbname=$SLAVEDBNAME port=$SLAVEPORT host=$SLAVEHOST 
> user=$REPUSER"  > /usr/local/pgsql/log/slon.node2.log 2>&1 &
> 
> slony version 1.2.20
> master PostgreSQL version 8.4.1
> slave PostgreSQL version 8.4.2
> 
> 
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general


-- 
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142

