Brian Fehrle brianf at consistentstate.com
Tue May 25 11:21:27 PDT 2010
Steve Singer wrote:
> Brian Fehrle wrote:
>
> > Hi all,
>
> A few things I would look at
>
> Look at sl_event and sl_confirm.  Are there events in sl_event that 
> are larger than what shows up as being confirmed in sl_confirm?  When 
> were these events generated?  If so, look at the next unconfirmed 
> event in sl_event and see what type of event it is.
>
All entries in sl_event and sl_confirm match exactly (each 
con_seqno from sl_confirm matches an ev_seqno from sl_event).
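For anyone following along, a query along these lines can surface any 
unconfirmed events. It is only a sketch: it assumes the cluster is named 
SLONY (matching the slon invocations quoted below), so the catalog 
schema is "_SLONY", and that node 1 is the origin and node 2 the 
subscriber.

```sql
-- Run on the master. Assumes cluster name "SLONY" (schema "_SLONY"),
-- origin node id 1, subscriber node id 2 -- adjust for your cluster.
SELECT ev_seqno, ev_timestamp, ev_type
FROM "_SLONY".sl_event
WHERE ev_origin = 1
  AND ev_seqno > (SELECT coalesce(max(con_seqno), 0)
                  FROM "_SLONY".sl_confirm
                  WHERE con_origin = 1
                    AND con_received = 2)
ORDER BY ev_seqno;
```

An empty result here is consistent with what I'm seeing: everything the 
origin generated has been confirmed by the subscriber.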
>
> You can look at sl_log_1 and sl_log_2; you should see your missing 
> rows. In particular, the ev_snapshot from sl_event of the last 
> unconfirmed SYNC event should give you the range of rows (log_txid) of 
> some of the unreplicated rows.  The set of all of the unconfirmed SYNC 
> events should give you all of the rows in sl_log_1 and sl_log_2 that 
> still need to be sent.
>
On both the master and the slave, there are zero entries in either 
sl_log_1 or sl_log_2.
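For reference, this is roughly how I checked. A per-table count makes it 
easy to see whether anything is queued for a specific table; again this 
assumes the cluster name SLONY, and note that log_tableid values map to 
the tab_id column of sl_table.

```sql
-- Run on the master. Rows in sl_log_1/sl_log_2 are changes not yet
-- applied everywhere; an empty result means nothing is queued.
SELECT 'sl_log_1' AS log_table, log_tableid, count(*) AS pending
FROM "_SLONY".sl_log_1
GROUP BY log_tableid
UNION ALL
SELECT 'sl_log_2', log_tableid, count(*)
FROM "_SLONY".sl_log_2
GROUP BY log_tableid;
```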
>
> You can also try a slonik script like
>
> sync(id=1);
> wait for event(origin=1, confirmed=all, wait on=1);
>
>
> This generates a SYNC event and waits until it gets replicated.  If 
> slonik exits successfully and you're still missing those rows, then 
> something strange is going on (I would start to wonder whether you did 
> something like an EXECUTE SCRIPT on your replica that deleted rows 
> just from the replica)
I was wondering the same thing; however, doesn't the slave node refuse 
updates/inserts/deletes on replicated tables? There are quite a few 
people who use these databases, and I can't account for everyone's 
actions. I will give the sync command a try in a bit; I need to wait 
on some things before I can try it.
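One way to check that the slave really is protected: in Slony-I 1.2, 
writes on the subscriber are blocked by a deny-access trigger on each 
replicated table, not by a locking system, so the protection can be 
verified directly in the catalogs. This is a sketch; the trigger name 
embeds the cluster name, assumed here to be SLONY as in the slon 
commands below.

```sql
-- Run on the slave. Slony-I 1.2 names the deny-access trigger
-- "_<cluster>_denyaccess_<tab_id>"; "SLONY" is assumed from the
-- slon invocations quoted below.
SELECT tgrelid::regclass AS table_name, tgname
FROM pg_trigger
WHERE tgname LIKE '%denyaccess%'
ORDER BY 1;
```

If the problem table is missing from this list, writes to it on the 
slave are not being blocked, which could explain rows disappearing only 
from the replica.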

Another thing that has come to mind: when we first added this table to 
the replication set, we had a few problems with some of our scripts, 
which resulted in a daemon attempting to start the slon daemons even if 
they were already running. Normally the daemons are smart enough to kill 
themselves; however, since this was going on during the initial 
propagation of the data to the slave, it may have done something 
unintentional.

- Brian
>
> Steve
>
>
>
>
>
>>     I'm having some trouble determining why replication isn't 
>> happening on a replicated table. I have a two-node Slony cluster. I 
>> have a table in the Slony replication set that has 72332 records on 
>> the master but 71225 records on the slave. It's been this 
>> way for a few hours at least (possibly longer, as that is only when 
>> we first noticed it). This table was added to the replication set 
>> several weeks ago, so it's not stalled mid-publish. The slon daemons 
>> are running, and their logs report no abnormalities. I've restarted 
>> the slon daemons to see if that would clear anything up, but nothing 
>> has changed.
>>
>> Looking at sl_status, the lag events never go above 1, and the lag 
>> time never goes above a couple of minutes.
>>
>> Best reasons I can think of are, either something is causing the 
>> replication on this particular table to be on "hold" and not update 
>> the remaining rows on the slave, while not alerting me via the slon 
>> logs. Or something went screwy and replication for that table is out 
>> of sync and I need to drop the table from the set and add it back 
>> again, let it sync up (however this solution is not ideal.)
>>
>> Any tips of places I should look to see what may be going on?
>>
>> Thanks in advance.
>>
>>        - Brian Fehrle
>>
>> Data that may be important:
>>
>> Commands that start the slon daemons:
>> /usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node1.pid -s 
>> 60000 -t 300000 SLONY "dbname=$MASTERDBNAME port=$MASTERPORT 
>> host=$MASTERHOST user=$REPUSER"  > 
>> /usr/local/pgsql/log/slon.node1.log 2>&1 &
>> /usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node2.pid -s 
>> 60000 -a /usr/local/pgsql/slon_logs -t 300000 -x "log_parsing_script" 
>> SLONY "dbname=$SLAVEDBNAME port=$SLAVEPORT host=$SLAVEHOST 
>> user=$REPUSER"  > /usr/local/pgsql/log/slon.node2.log 2>&1 &
>>
>> slony version 1.2.20
>> master PostgreSQL version 8.4.1
>> slave PostgreSQL version 8.4.2
>>
>>
>> _______________________________________________
>> Slony1-general mailing list
>> Slony1-general at lists.slony.info
>> http://lists.slony.info/mailman/listinfo/slony1-general
>
>


