Steve Simms <steve at deefs.net>
Mon Feb 28 23:14:42 PST 2005
On Mon, 28 Feb 2005, murphy pope wrote:

> Let's say that I have a small chain of auto-part stores.  One in
> Homeville, one in Nearville, and one in Farville (three imaginary
> cities).  Each store keeps an inventory table that keeps track of the
> parts in that store.  The Homeville store keeps its inventory table in a
> database named homeville, the Nearville store keeps its inventory in a
> database named nearville, and the Farville store keeps its inventory in
> a database named farville.

You might consider using separate schemas instead of separate databases.  The 
benefit is that you can replicate everything using one replication cluster, 
instead of needing a separate cluster for each database.

See:
http://www.postgresql.org/docs/current/static/sql-createschema.html
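
For example, each store's database would contain all three schemas, with 
Slony replicating the two remote ones.  A sketch (table definitions here are 
illustrative, not from the original post):

    CREATE SCHEMA homeville;
    CREATE SCHEMA nearville;
    CREATE SCHEMA farville;

    -- Slony-I needs a primary key (or a candidate unique index) on
    -- every replicated table.
    CREATE TABLE homeville.inventory (
        part_id  integer PRIMARY KEY,
        quantity integer NOT NULL
    );
    -- ...and likewise for nearville.inventory and farville.inventory.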

> I want the Homeville store to replicate its homeville.inventory table to
> Nearville and Farville. Likewise, Nearville and Farville each replicate
> their databases to the other two.  That way, any store can find
> (reasonably) current stock levels for all three stores.
>
> Question 1 - Is this do-able? Can a single node be both a master and a
> slave (Homeville would be a master for homeville.inventory and would
> subscribe to nearville.inventory and farville.inventory)?

Yes, a single node can be both a master (the Slony term is "origin") for one 
set and a slave ("subscriber") for other sets.

In this scenario, since you have three origins, you would have run into a 
problem with the "move set" command if you were using schemas in one database 
instead of three databases, but this has been fixed in CVS in both HEAD and 
the 1.0.x STABLE branch.
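
For reference, "move set" transfers a set's origin to another node.  A 
minimal slonik sketch (node IDs are hypothetical, and in practice the move 
must be preceded by a "lock set" on the same set):

    lock set (id = 1, origin = 1);
    move set (id = 1, old origin = 1, new origin = 2);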

> Assuming the answer to question 1 is yes, here's how I think I should
> set up my replication cluster (I'll call it 'stores').  First, I need a
> single cluster with three nodes (homeville, nearville, and farville).
>
> The Homeville node creates a single SET (set number 1) that exposes
> homeville.inventory to anyone who's interested.
> The Nearville node creates another SET (set number 2) that exposes
> nearville.inventory.
> The Farville node creates a third SET (set number 3) that exposes
> farville.inventory.
[...]
> Question 2 - Did I get that right?  Can I do all of this in a single
> replication cluster or do I need three separate clusters?

This will only work if you use schemas instead of databases.  Otherwise, you 
need three clusters, each with three nodes (nine slon processes total), and 
one set in each cluster.
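
With schemas, the set definitions for the single cluster might look like 
this (a rough slonik sketch; node IDs, table IDs, and connection strings are 
all hypothetical):

    cluster name = stores;

    # Node IDs: 1 = Homeville, 2 = Nearville, 3 = Farville.
    node 1 admin conninfo = 'dbname=stores host=homeville-host';
    node 2 admin conninfo = 'dbname=stores host=nearville-host';
    node 3 admin conninfo = 'dbname=stores host=farville-host';

    # One set per origin; table IDs must be unique across all sets.
    create set (id = 1, origin = 1, comment = 'Homeville inventory');
    set add table (set id = 1, origin = 1, id = 1,
                   fully qualified name = 'homeville.inventory');

    create set (id = 2, origin = 2, comment = 'Nearville inventory');
    set add table (set id = 2, origin = 2, id = 2,
                   fully qualified name = 'nearville.inventory');

    create set (id = 3, origin = 3, comment = 'Farville inventory');
    set add table (set id = 3, origin = 3, id = 3,
                   fully qualified name = 'farville.inventory');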

> Question 3 - Since Farville is a long way away from Homeville, I want
> Farville to pull its copy of homeville.inventory from Nearville instead
> of directly from Homeville.  That's a cascading (or chained) slave,
> right?  How do I do that?

Fill in the "provider" field when you run the "store listen" command:

http://linuxdatabases.info/info/stmtstorelisten.html
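
As I understand it, the "provider" parameter of "subscribe set" is also 
involved: it controls which node a subscriber pulls its data from.  A slonik 
sketch, continuing the hypothetical node IDs from the sketch above (1 = 
Homeville, 2 = Nearville, 3 = Farville):

    # Nearville subscribes directly to Homeville's set; forward = yes
    # makes it keep the log data needed to feed downstream nodes.
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

    # Farville pulls set 1 from Nearville instead of from Homeville.
    subscribe set (id = 1, provider = 2, receiver = 3, forward = no);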

> Question 4 - That's it - the trigger doesn't NOTIFY anybody and it
> doesn't push any data to the slaves, is that correct?

Correct.

> Question 5 - Why does slon.homeville sleep for a pre-determined (but
> adjustable) amount of time?  Why doesn't it LISTEN for a NOTIFY thrown
> by _Slony_I_logTrigger instead?  My guess is that slon.homeville would
> get flooded by notifications (and the postgres processes would waste a
> lot of time sending the notifications), is that right?

My knowledge is a little shaky here, so someone else may step in and 
add/correct, but it's worth noting that Slony doesn't require that all nodes 
be active at all times.  You can take a node offline for a week, bring it 
back online, and it will catch up (eventually).  There are definite 
performance repercussions to this if you have an active database, but it's 
possible.

> Question 6 - Why does slon.homeville have to write a SYNC event to
> sl_event and then do a NOTIFY?  Why isn't the NOTIFY sufficient to tell
> nearville that it needs to read through sl_log_1?  I guess it's because
> the _storesEvent notification can signal replication data or it can
> signal an administrative message - the sl_event record tells
> slon.nearville which event actually occurred.  Right so far?

I have to defer to someone else for this one.

> Question 7 - If slon.nearville isn't running, it obviously misses the
> NOTIFY that slon.homeville signalled.  Is that what the 60-second
> timeout thing is all about?  When slon.nearville finally gets up and
> running, it gets a SYNC notification so that it can catch up.

Same here.

> Question 8 - How does homeville know that it can start cleaning out
> sl_log_1?  homeville can't remove a modification record from sl_log_1
> until it knows that all subscribers have pulled that modification.  Is
> that what confirmations are all about?

Yes.  Once all nodes have confirmed that they've gotten past a certain 
point, the log entries up to that point can be cleared.  I believe there's a 
"just in case" buffer on top of that, or something like that, but that's the 
basic mechanism.

> Does homeville batch together
> modification records (maybe each batch corresponds to a SYNC event?) and
> keep track of which subscribers have pulled each batch?

Look at _CLUSTER.sl_confirm and the _CLUSTER.sl_status view.
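
For example, to see how far each subscriber lags behind an origin (assuming 
the cluster is named "stores", so its namespace is "_stores"):

    -- Per-subscriber replication lag, seen from the origin.
    SELECT st_origin, st_received,
           st_lag_num_events, st_lag_time
    FROM _stores.sl_status;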

> Question 9 - How does homeville know that it can remove an entry from
> sl_event?

Same process as above, I would imagine.  I'd have to double-check the code 
to be sure.

> Question 10 - (sorry, one more bonus question).  Instead of keeping
> three databases on each system, can I keep a single database and three
> schemas?

Here's how you can tell that I didn't read the whole message before starting 
my reply.  :-)  Yes, you can do this, and I'd recommend it (for whatever my 
recommendation is worth).

Hope this helps,
Steve Simms

--
Steve Simms <steve at deefs.net>
http://www.deefs.net

