[Slony1-general] RE: node -1 error after restarting servers

Mon Aug 11 13:51:49 PDT 2008

Hi Mark - I have had this problem myself.  The only way I solved it was to remove all references to the
set creation/merge (set 99 in your case) from my sl_event table.  It's not enough to remove
references to the set from your sl_set and sl_table tables, unfortunately.

It's a little tricky to find the event number you need to remove once you've gotten rid of the set 
from sl_set, sl_table, etc.  Maybe someone else here will chime in on that?
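
As a starting point only, a query along these lines might help locate candidate events. It is a
sketch, not a tested procedure: it assumes your search_path includes the cluster's schema (as in
your session), that the set id is stored as text in ev_data1 for set-related events and in
ev_data2 for MERGE_SET, and the list of event types is my guess at the relevant ones.

```sql
-- Sketch: list events in sl_event that appear to reference set 99.
-- Assumptions: ev_data1 carries the set id for these event types, and
-- MERGE_SET carries the merged-in set id in ev_data2 (both stored as text).
SELECT ev_origin, ev_seqno, ev_timestamp, ev_type, ev_data1, ev_data2
  FROM sl_event
 WHERE ev_type IN ('STORE_SET', 'SET_ADD_TABLE', 'SUBSCRIBE_SET',
                   'ENABLE_SUBSCRIPTION', 'MERGE_SET', 'DROP_SET')
   AND (ev_data1 = '99'
        OR (ev_type = 'MERGE_SET' AND ev_data2 = '99'))
 ORDER BY ev_origin, ev_seqno;
```

Look over the rows on each node before deleting anything; the (ev_origin, ev_seqno) pair is what
identifies an individual event.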

-Jennifer

Mark Steben wrote:
> 
> Thanks Chris for the response,
> 
> I have tried to restart the slons several times with no success.
> 
> I should have added that, prior to this problem, I was successfully
> replicating 7 tables.
> Then I added an 8th table to the replication through the recommended
> scenario of
>  CREATE SET (Set add table)
>  SUBSCRIBE SET
>  MERGE SET
> 
> This worked.  Now, upon further investigation, I ran a query
> on the slave:
> 
>  slony_practice=# select * from sl_set;
>  set_id | set_origin | set_locked |          set_comment           
> --------+------------+------------+--------------------------------
>       1 |          1 |            | 7 mavmail tables
>      99 |          1 |            | merged feedback response table
> 
>  
> But on the master:
>  
>  slony_practice=# select * from sl_set;
>  set_id | set_origin | set_locked |     set_comment      
> --------+------------+------------+----------------------
>       1 |          1 |            | 7 + 1 mavmail tables
> 
> 
> This condition occurs because, when I restarted the servers, I ran a
> DROP SET / CREATE SET pair of slonik scripts on the master, where I recreated
> set 1 with all 8 tables and dropped set 99. So now obviously I need to
> get them back in sync.  I tried a DROP SET on the slave to attempt to
> drop set 99 but it errors out saying that:
>   
> "set 99 does not originate on local node"
> 
> And I tried the same thing on the master node and got the same message.
> 
> Do I need to rerun the entire MERGE SET scenario where I create set 99 on
> the master again?
> 
> Thanks for your time.
> 
> Mark
> 
> From: chris [mailto:chris at dba2.int.libertyrms.com] 
> Sent: Monday, August 11, 2008 10:56 AM
> To: Mark Steben
> Cc: slony1-general at lists.slony.info
> Subject: Re: [Slony1-general] RE: node -1 error after restarting servers
> 
> "Mark Steben" <msteben at autorevenue.com> writes:
>> I messed up and sent this originally as HTML.  Resending as plain text.
> 
> Thanks, that's helpful!
> 
>> Hi - hoping for some help.
>> I'm running Slony 1.2.14 on a simple 1 master 1 slave configuration.  Each
>> server is running Postgres 8.2.5.
>> I had to restart both master and slave servers.  When I tried to restart the
>> slons I got errors in the slave log that
>> table ids were already assigned within the set.  So I dropped the set,
>> dropped the path and the listens,
>> and recreated all with CREATE SET, STORE PATH, and STORE LISTEN on the
>> provider.
>>
>> Now when I restart the slons and subscribe the newly defined set I get the
>> following error in the logs
>> on the slave:
>>     node -1 not found in runtime configuration
>>
>> and the copy fails.
>>
>> When I query SL_NODE on the master I get:
>>
>>  no_id | no_active |   no_comment    | no_spool 
>> -------+-----------+-----------------+----------
>>      1 | t         | Master Node     | f
>>      2 | f         | <event pending> | f
>>
>> And the same query on the slave gives:
>>
>>  no_id | no_active |   no_comment    | no_spool 
>> -------+-----------+-----------------+----------
>>      1 | t         | Master Node     | f
>>      2 | t         | subscriber node | f
>> (2 rows)
>>
>> Do I have to recreate the node(s) as well?   
>>
>> Any help would be appreciated.  thanks
> 
> That error message takes place at the beginning of the function
> "copy_set", and indicates that the slon couldn't find the node in the
> in-memory configuration.
> 
> That can commonly be rectified by restarting the slon, which will
> cause the slon to reread its configuration.
> 
> Except, we should take a step back.  The trouble is that copy_set()
> was trying to access a node that it wasn't properly aware of.
> 
> The data that you have provided about sl_node on both nodes looks
> useful, actually.  It indicates that the "master" node hasn't yet
> figured out its configuration, which could quite plausibly cause
> further trouble.
> 
> If I had to guess, I'd imagine that perhaps you hadn't run a slon
> against the "master" node and that it isn't properly aware, yet, that
> node #2 has been set up.
> 
> Before you start trying to set up subscriptions, make sure you can run
> things like "STORE PATH" and see, in the logs, that this has been
> processed by both nodes.