Christopher Browne cbbrowne at ca.afilias.info
Fri Nov 30 07:56:03 PST 2007
"Mike C" <smith.not.western at gmail.com> writes:
> The biggest concern to me is the VPN links. I note the documentation
> strongly suggests that using Slony over a flakey WAN is a bad idea.
> So, is what I have suggested feasible with slony? Is there some other
> alternative I should be investigating? (I have to rule out any
> synchronous solution due to WAN reliability).

What you're describing sounds pretty reasonable...

The transaction mix of 10% writes versus 90% reads fits well with what
has been seen to work elsewhere.

The trouble with flakey links is that they can induce delays in
replication.  The specific scenario that we have seen is this:

 -> Slons all running at Site A.
 -> A node is at Site B, managed by slon at Site A.
 -> WAN connection falls over, cutting off all of the connection links.
 -> HOWEVER, the database connection at Site B that belongs to the
    slon managing that node isn't aware that the other end of the
    connection is never coming back.

At this point, it tends to take 3h+ for TCP timeouts to take place and
allow that connection to get dropped.
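
To give a feel for where multi-hour delays like this can come from,
here is a rough sketch of the arithmetic for TCP keepalive, which is
one mechanism by which an idle connection to a vanished peer
eventually gets dropped.  It is only an illustration: which timer
applied in our case, and with which settings, is not something this
message establishes.  The first set of values is the usual Linux
default (tcp_keepalive_time=7200, tcp_keepalive_intvl=75,
tcp_keepalive_probes=9); the "tuned" values are made up for
comparison.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate seconds before an idle connection to a dead peer is dropped.

    idle:     seconds of silence before the first keepalive probe is sent
    interval: seconds between successive unanswered probes
    probes:   number of unanswered probes before the connection is dropped
    """
    return idle + interval * probes


# Usual Linux defaults (an assumption about the host, not taken from the thread).
linux_default = keepalive_detection_seconds(idle=7200, interval=75, probes=9)

# Hypothetical tighter settings, purely for comparison.
tuned = keepalive_detection_seconds(idle=60, interval=10, probes=6)

print("default: %d s (about %.1f h)" % (linux_default, linux_default / 3600.0))
print("tuned:   %d s" % tuned)
```

With the default values the dead connection lingers for a bit over two
hours; other platforms or settings can stretch that further.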

During those 3 hours, what we would observe is that the network guys
would restore the WAN link, and...

- Slon at Site A would almost immediately realize that its connection
  to Site B was dead.  It would die and get restarted within minutes.
  (This was on a 1.1 version of Slony-I; on 1.2, it would only be a
  thread that would get restarted.)

- Unfortunately, the attempt to reconnect would find that the existing
  connection on the "Site B" node was claiming to still be managing that
  node.

Things wouldn't clear up until the TCP timeout cleared out the stale
DB connection at "Site B."

We could fix this by hand, by killing off the bad DB connection, but
if this took place late at night, that required paging someone.

It happened seldom enough that we never got around to creating an
automated mechanism for this.
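
For what it's worth, a minimal sketch of what such an automated check
might look like follows.  It is not something we ran.  Everything in
it is hypothetical: the Backend record, the stale_backends() helper,
the addresses, and the idle threshold.  It assumes you can obtain, for
each backend on the Site B database, its PID, the client address, and
how long it has been idle (for instance from pg_stat_activity), and
it only picks out candidates; actually terminating them is left to
whatever your PostgreSQL version and OS provide.

```python
from collections import namedtuple

# Hypothetical record describing one database backend on the Site B node.
Backend = namedtuple("Backend", "pid client_addr idle_seconds")


def stale_backends(backends, slon_host, max_idle_seconds):
    """Return PIDs of backends from slon_host idle longer than the threshold."""
    return [
        b.pid
        for b in backends
        if b.client_addr == slon_host and b.idle_seconds > max_idle_seconds
    ]


# Made-up sample data: two connections from the slon host, one from elsewhere.
sample = [
    Backend(pid=4101, client_addr="10.0.0.5", idle_seconds=5400),
    Backend(pid=4102, client_addr="10.0.0.5", idle_seconds=12),
    Backend(pid=4103, client_addr="10.0.0.9", idle_seconds=9000),
]

pids = stale_backends(sample, slon_host="10.0.0.5", max_idle_seconds=600)
print(pids)
```

Idle time is a crude signal, so a real script would want an extra
check that it isn't about to kill the freshly restarted slon's
connection rather than the dead one.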

There was, in fact, a better solution, namely to make sure that each
slon process was running locally at the site where its database
resided; that way, failed connections *wouldn't* be troublesome.  The
only kind of connection that could be left open would be read-only
feeds against remote nodes, and this wouldn't cause nearly as much
trouble.
-- 
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://cbbrowne.com/info/languages.html

