Thu Mar 25 08:41:26 PDT 2010
- Previous message: [Slony1-general] Diagnosing a possible problem with replication
- Next message: [Slony1-general] [slony1-general] initial copy incomplete when using 2.0.3rc2
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi, Steve!

Thanks a lot for your help! I did what you told me, and after that I noticed
something interesting: this was not the first time Slony had been installed on
these machines! Of course, after uninstalling, nobody ran
"DROP SCHEMA _dbprod_cluster CASCADE". I did that, reinstalled the cluster, and
now everything is working fine!

Thanks again for your help, and best regards,
HeCSa.

On Mon, Mar 22, 2010 at 11:28 PM, Steve Singer <ssinger at ca.afilias.info> wrote:

> Hernan Saltiel wrote:
>
>> Hi!
>> I configured a Slony cluster between two nodes: the master, srvdb01, and a
>> slave, srvdb02. The database is "dbprod".
>> Both nodes are CentOS 64 bits, with these postgres packages installed:
>>
>>     create set (id = 1, origin = 1,
>>                 comment = 'Base Productiva');
>>
>> (All the sets are here, more than 120...)
>
> You never mentioned where you added tables to your sets. Could you have
> 120 replication sets with 0 tables in each?
>
> Does
>
>     SELECT * FROM _mycluster.sl_table;
>
> show you anything interesting? (Is it empty, meaning your sets seem to have
> no tables?)
>
> Did you also issue 120 subscribe set requests, or did you only subscribe the
> first one? (If you tried subscribing all 120 at once, you might want to
> tear down the slony cluster and try it again, doing only the first set
> and waiting for it to finish before moving on. It is possible there are
> some race conditions that result from trying to subscribe to multiple sets
> concurrently.)
>
> You should also check to see if there are any locks being held on slony
> tables.
>
>> store node (id = 2, comment = 'Node 2');
>> store path (server = 1, client = 2,
>>             conninfo = 'dbname=$DB1 host=$H1 user=$U password=$P');
>> store path (server = 2, client = 1,
>>             conninfo = 'dbname=$DB2 host=$H2 user=$U password=$P');
>> store listen (origin = 1, provider = 1, receiver = 2);
>> store listen (origin = 2, provider = 2, receiver = 1);
>>
>> Then, executed the script.
>> On the master and slave nodes, I ran:
>>
>>     nohup slon dbprod_cluster "dbname=dbprod user=postgres" &
>>
>> After that, created the subscribe.sh script on the slave node:
>>
>>     #!/bin/sh
>>
>>     CLUSTER=dbprod_cluster
>>     DB1=dbprod
>>     DB2=dbprod
>>     H1=srvdb01
>>     H2=srvdb02
>>     U=postgres
>>     P=Secreta01
>>
>>     slonik <<_EOF_
>>
>>     cluster name = $CLUSTER;
>>
>>     node 1 admin conninfo = 'dbname=$DB1 host=$H1 user=$U password=$P';
>>     node 2 admin conninfo = 'dbname=$DB2 host=$H2 user=$U password=$P';
>>
>>     subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
>>
>>     _EOF_
>>
>> I ran that script, and saw several SYNC, LISTEN and UNLISTEN messages in
>> the nohup.out log file of the slon process.
>> After two days of seeing those messages without any row being replicated,
>> I am concerned: is this normal, because Slony needs to do something before
>> it starts replicating, or is there some way to tell whether something is
>> going wrong?
>>
>> Here are some rows of the master nohup.out file:
>>
>>     DEBUG2 remoteWorkerThread_2: SYNC 30755 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 syncThread: new sl_action_seq 11392 - SYNC 16232
>>     DEBUG2 remoteListenThread_2: queue event 2,30756 SYNC
>>     DEBUG2 remoteListenThread_2: queue event 2,30757 SYNC
>>     DEBUG2 remoteWorkerThread_2: Received event 2,30756 SYNC
>>     DEBUG2 calc sync size - last time: 1 last length: 8611 ideal: 6 proposed size: 3
>>     DEBUG2 remoteWorkerThread_2: SYNC 30757 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 localListenThread: Received event 1,16232 SYNC
>>     DEBUG2 syncThread: new sl_action_seq 11392 - SYNC 16233
>>     DEBUG2 remoteListenThread_2: queue event 2,30758 SYNC
>>     DEBUG2 remoteWorkerThread_2: Received event 2,30758 SYNC
>>     DEBUG2 calc sync size - last time: 2 last length: 8525 ideal: 14 proposed size: 5
>>     DEBUG2 remoteWorkerThread_2: SYNC 30758 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 remoteListenThread_2: queue event 2,30759 SYNC
>>     DEBUG2 remoteWorkerThread_2: Received event 2,30759 SYNC
>>     DEBUG2 calc sync size - last time: 1 last length: 2389 ideal: 25 proposed size: 3
>>     DEBUG2 remoteWorkerThread_2: SYNC 30759 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 localListenThread: Received event 1,16233 SYNC
>>     DEBUG2 syncThread: new sl_action_seq 11392 - SYNC 16234
>>     DEBUG2 localListenThread: Received event 1,16234 SYNC
>>     DEBUG2 remoteListenThread_2: queue event 2,30760 SYNC
>>     DEBUG2 remoteListenThread_2: queue event 2,30761 SYNC
>>     DEBUG2 remoteWorkerThread_2: Received event 2,30760 SYNC
>>     DEBUG2 calc sync size - last time: 1 last length: 8570 ideal: 7 proposed size: 3
>>     DEBUG2 remoteWorkerThread_2: SYNC 30761 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 syncThread: new sl_action_seq 11392 - SYNC 16235
>>     DEBUG2 remoteListenThread_2: queue event 2,30762 SYNC
>>     DEBUG2 remoteWorkerThread_2: Received event 2,30762 SYNC
>>     DEBUG2 calc sync size - last time: 2 last length: 8519 ideal: 14 proposed size: 5
>>     DEBUG2 remoteWorkerThread_2: SYNC 30762 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 remoteListenThread_2: queue event 2,30763 SYNC
>>     DEBUG2 remoteWorkerThread_2: Received event 2,30763 SYNC
>>     DEBUG2 calc sync size - last time: 1 last length: 2350 ideal: 25 proposed size: 3
>>     DEBUG2 remoteWorkerThread_2: SYNC 30763 processing
>>     DEBUG2 remoteWorkerThread_2: no sets need syncing for this event
>>     DEBUG2 localListenThread: Received event 1,16235 SYNC
>>
>> ...and here some of the slave:
>>
>>     DEBUG2 localListenThread: Received event 2,30773 SYNC
>>     DEBUG2 remoteListenThread_1: LISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30774
>>     DEBUG2 localListenThread: Received event 2,30774 SYNC
>>     DEBUG2 remoteListenThread_1: queue event 1,16241 SYNC
>>     DEBUG2 remoteListenThread_1: UNLISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30775
>>     DEBUG2 localListenThread: Received event 2,30775 SYNC
>>     DEBUG2 remoteListenThread_1: LISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30776
>>     DEBUG2 localListenThread: Received event 2,30776 SYNC
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30777
>>     DEBUG2 remoteListenThread_1: queue event 1,16242 SYNC
>>     DEBUG2 remoteListenThread_1: UNLISTEN
>>     DEBUG2 localListenThread: Received event 2,30777 SYNC
>>     DEBUG2 remoteListenThread_1: LISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30778
>>     DEBUG2 localListenThread: Received event 2,30778 SYNC
>>     DEBUG2 remoteListenThread_1: queue event 1,16243 SYNC
>>     DEBUG2 remoteListenThread_1: UNLISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30779
>>     DEBUG2 localListenThread: Received event 2,30779 SYNC
>>     DEBUG2 remoteListenThread_1: LISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30780
>>     DEBUG2 localListenThread: Received event 2,30780 SYNC
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30781
>>     DEBUG2 remoteListenThread_1: queue event 1,16244 SYNC
>>     DEBUG2 remoteListenThread_1: UNLISTEN
>>     DEBUG2 localListenThread: Received event 2,30781 SYNC
>>     DEBUG2 remoteListenThread_1: LISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30782
>>     DEBUG2 localListenThread: Received event 2,30782 SYNC
>>     DEBUG2 remoteListenThread_1: queue event 1,16245 SYNC
>>     DEBUG2 remoteListenThread_1: UNLISTEN
>>     DEBUG2 syncThread: new sl_action_seq 1 - SYNC 30783
>>     DEBUG2 localListenThread: Received event 2,30783 SYNC
>>
>> I ran some queries against the _dbprod_cluster schema, following tips I
>> found on blogs, but I don't really know whether this indicates that things
>> are going normally or not.
>> Here are some of them:
>>
>>     select count(*) from _dbprod_cluster.sl_log_1;
>>
>>      count
>>     -------
>>      11392
>>     (1 row)
>>
>>     select count(*) from _dbprod_cluster.sl_log_2;
>>
>>      count
>>     -------
>>          0
>>     (1 row)
>>
>>     select st_lag_num_events from _dbprod_cluster.sl_status;
>>
>>      st_lag_num_events
>>     -------------------
>>                  16130
>>     (1 row)
>>
>> Could anybody help me understand what these numbers are telling me?
>> Thanks a lot in advance for your help!
>> Best regards,
>>
>> --
>> HeCSa
>>
>> _______________________________________________
>> Slony1-general mailing list
>> Slony1-general at lists.slony.info
>> http://lists.slony.info/mailman/listinfo/slony1-general
>
> --
> Steve Singer
> Afilias Canada
> Data Services Developer
> 416-673-1142

--
HeCSa
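[Editor's note] The fix HeCSa describes — removing the schema left behind by an
earlier Slony install before rebuilding the cluster — can be sketched as the
SQL below. This is a hedged example, not part of the original thread: it assumes
the `_<clustername>` schema-naming convention used throughout this thread, and
the catalog query is standard PostgreSQL.

```sql
-- Look for leftover Slony cluster schemas from a previous install
-- (Slony-I creates one schema named after the cluster, e.g. _dbprod_cluster)
SELECT nspname
  FROM pg_namespace
 WHERE nspname LIKE '\_%' ESCAPE '\';

-- If the old cluster schema is still present, drop it before reinstalling
DROP SCHEMA _dbprod_cluster CASCADE;
```

Run this on every node that previously hosted the cluster; a stale schema on
any node can leave the new install in the confused state described above.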
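[Editor's note] Steve's two diagnostics — checking whether the replication sets
actually contain tables, and checking for locks held on Slony objects — can be
sketched as the queries below. These are illustrative assumptions, not quoted
from the thread: `sl_table.tab_set` is the set-membership column in Slony-I's
schema, and `pg_locks`/`pg_class` are standard PostgreSQL catalogs; verify the
column names against the installed Slony version.

```sql
-- How many tables does each replication set contain?
-- A set missing from this result has no tables, so there is nothing to copy.
SELECT tab_set, count(*) AS tables
  FROM _dbprod_cluster.sl_table
 GROUP BY tab_set
 ORDER BY tab_set;

-- Are any sessions holding (or waiting on) locks on Slony objects?
SELECT l.pid, c.relname, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_class c ON c.oid = l.relation
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = '_dbprod_cluster';
```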