Christopher Browne cbbrowne at ca.afilias.info
Mon Mar 8 08:17:20 PST 2010
Karl Denninger wrote:
> Brad Nicholson wrote:
>> On Mon, 2010-03-08 at 12:58 +0000, John Moran wrote:
>>   
>>> On Mon, Mar 8, 2010 at 7:58 AM, Ian Lea <ian.lea at gmail.com> wrote:
>>>     
>>>> If the slaves are local i.e. LAN rather than WAN and the update volume
>>>> is low, it should work OK.
>>>>
>>>> The hardware spec of the slave should be irrelevant, as long as they
>>>> can cope with the load.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>       
>>> Great. Is there hard data available on how well slony-I scales on a
>>> LAN without using cascading?
>>>
>>>     
>>
>> Not that I am aware of. 
>>
> >> It is going to be highly dependent on the write load and speed of
>> hardware.
>>
>>   
> Note that the EXPECTATION (based on the design) is that transactional 
> traffic will rise approximately quadratically as the number of 
> slaves increases at the same branch level.
>
> This PROBABLY doesn't get you in trouble before you reach a half-dozen 
> to a dozen slaves, roughly, but beyond that, it both can and will.
>
> There is no particular reason not to use a cascade (or "branched") 
> structure to control this as the number of replicated nodes increases.
That overstates what it is that behaves quadratically.

What notably increases in quadratic fashion is the coordination work 
between the nodes, that is, the sending of SYNC confirmations.  Each 
node, as a potential failover target, needs to know where the other 
nodes are at, and so this information is indeed widely reported.
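
To put rough, purely illustrative numbers on it: each SYNC has to be
confirmed by every subscriber, and each of those confirmations is then
propagated so that every other node can see it, so the confirmation
traffic grows on the order of n*(n-1) events per SYNC for n nodes.
That is about 30 such events at 6 nodes, 132 at 12 nodes, and 552 at
24 nodes - quadratic, but starting from very small absolute numbers.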

As the number of nodes increases, this coordination traffic will 
eventually become dominant, but it is by no means obvious that the 
cost becomes prohibitive at a *low* level of activity.

If SYNCs are fairly large (e.g., each SYNC on an active origin 
captures quite a lot of update activity), then the cost of 
confirmations will be only a tiny fraction of the total.  It would 
take a rather large increase in the number of nodes for the 
confirmation costs to outweigh the ordinary work, which scales 
linearly, of transferring the replicated INSERT/UPDATE/DELETE 
statements.
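
As a made-up comparison: if a busy origin's SYNCs each carry a few
thousand row changes, and each confirmation is a single small row (in
sl_confirm), then with a dozen nodes every SYNC pushes tens of
thousands of replicated rows out to subscribers while generating only
on the order of a hundred confirmation rows; the node count has to
grow a great deal before the quadratic term catches up with the
linear one.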

Two things are much more likely to cause trouble:

a) The origin nodes aren't powerful enough to cope with the combination of:
   1.  The update load induced by the application,
   2.  The added INSERTs into sl_log_1/sl_log_2 done by the Slony-I 
triggers, and
   3.  The added query load induced by subscribers pulling data from the 
origin,
or
b) Subscriber nodes aren't powerful enough to cope with the update load 
induced by the INSERT/UPDATE/DELETE requests performed by Slony-I.

Note that using "cascaded subscriptions" can be a big help for a.3; you 
need at least one node subscribing to the origin, but you can cut down 
on load against the origin by having other nodes subscribe indirectly.
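
For reference, the cascade is just a matter of which provider each
subscriber points at.  A minimal slonik sketch, assuming a cluster
with nodes 1-3, the necessary paths, and replication set 1 already
defined (the node and set numbers here are only examples):

    # Node 2 subscribes directly to the origin and may feed other nodes.
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

    # Node 3 pulls the same set from node 2 rather than from the origin,
    # so it adds no subscriber query load on node 1.
    subscribe set (id = 1, provider = 2, receiver = 3, forward = no);

The trade-off is that the indirectly-subscribed node sits one
replication hop further from the origin, so its data lags slightly
more.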

