John Sidney-Woollett
Mon Aug 15 16:05:35 PDT 2005
Thanks for the info - I was worried that my setup was damaged in some 
way. But I'm glad that appears not to be the case.

Just wanted to say that Slony 1.1 is great. It was easy to set up and it 
seems to be working fine. Also getting a slave to catch up with the 
master was QUICK - a table with 1.5 million records took only 6 minutes 
to build on the slave! Apparently postgres 8 is even faster...

Slony 1.0.5 saved our bacon following a crash on our master when it ran 
out of disk space. We switched to the slave for 24 hours, installed 
Slony 1.1, redefined the cluster, and reverted to the newly rebuilt 
master. We lost no data and had minimal downtime...

Thanks for all the great work - what an awesome product!

John

Christopher Browne wrote:
> John Sidney-Woollett wrote:
> 
> 
>>Can anyone explain why slon on (the master) node #1 stopped when a
>>MOVE SET command was issued on that node - see the FATAL error notice
>>below? When slon on node 1 was restarted, it correctly processed the
>>as-yet-unprocessed MOVE SET.
>>
>>The slonik script below was executed on fs01b, node #1, and this is
>>the server where the slon process died. The slon process on db01a,
>>node #2 stayed up fine during all the move set operations.
>>
>>No applications were running against either database during the
>>switchover, but I did have one psql session open against each db to
>>check that the moves worked OK by issuing SQL statements against the
>>appropriate tables after each move.
>>
>>We had 6 sets to move and only the first move (set #6) didn't
>>terminate the slon process. However, all the move sets seem to have
>>worked OK.
>>
>>I checked the _bpreplicate2.sl_set table (on both nodes), and the
>>set_origin is now 2, and all the tables are unlocked on node #2. All
>>sequences and tables seem to be replicating correctly (now from
>>node #2 to node #1).
>>
>>I'm using Slony 1.1 with PostgreSQL 7.4.6.
>>
>>Any ideas?
> 
> 
> Yes, I have an idea...
> 
> The place where the error message is generated is in local_listen.c;
> here's the code fragment that generates it...
> 
>                 if (PQntuples(res2) != 1)
>                 {
>                     slon_log(SLON_FATAL, "localListenThread: MOVE_SET "
>                          "but no provider found for set %d\n",
>                          set_id);
>                     dstring_free(&query2);
>                     PQclear(res2);
>                     slon_abort();
>                 }
>                
> That != 1 struck me as suspicious...  If the code were looking for
> "non-existence," I'd expect it to look for PQntuples(res2) being zero.
> 
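> If the code were only meant to catch "no provider exists at all", I'd
> expect the test to be written like this (just a sketch of the shape I
> mean, not the actual code):
> 
>                 /* abort only when no provider row was found at all */
>                 if (PQntuples(res2) == 0)
>                 {
>                     slon_log(SLON_FATAL, "localListenThread: MOVE_SET "
>                          "but no provider found for set %d\n",
>                          set_id);
>                     dstring_free(&query2);
>                     PQclear(res2);
>                     slon_abort();
>                 }
> 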
> If the query returns more than 1 entry, that code path would be taken,
> and, even before looking at the query, that seems suspicious.
> 
> Stepping backwards, the query was...
> 
>                 slon_mkquery(&query2,
>                          "select sub_provider from %s.sl_subscribe "
>                          "    where sub_receiver = %d",
>                          rtcfg_namespace, rtcfg_nodeid);
>                 res2 = PQexec(dbconn, dstring_data(&query2));
> 
> In your case, when the *first* MOVE SET takes place, node #1 becomes
> the receiver for that one set, so the query finds exactly one row,
> PQntuples(res2) returns 1, and all appears OK.
> 
> When the subsequent MOVE SETs are performed, node #1 is the receiver
> for more than one set, so the query returns multiple rows, and the
> SLON_FATAL error is raised each time.
> 
> It ought to be simple enough to add the set_id into the query, which
> would resolve the issue.  I'll hold off on that until we can get the new
> testing framework checked in, so that I can test using the new
> framework.  (Hint, hint, Darcy!)
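> 
> For what it's worth, the fix might look something like this (an
> untested sketch; sub_set is the sl_subscribe column identifying the
> set):
> 
>                 slon_mkquery(&query2,
>                          "select sub_provider from %s.sl_subscribe "
>                          "    where sub_receiver = %d "
>                          "      and sub_set = %d",
>                          rtcfg_namespace, rtcfg_nodeid, set_id);
>                 res2 = PQexec(dbconn, dstring_data(&query2));
> 
> With that extra predicate the query should return at most one row, so
> the PQntuples(res2) != 1 test becomes a legitimate "no provider found
> for this set" check.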
> 
> The problem introduced by this bug is basically that the attempt to
> reconfigure the slon after the event fails.  Restarting the slon is the
> right answer, and will work fine.  Your watchdog process is your friend,
> in this case :-).

