Jeff threshar at torgo.978.org
Thu Mar 5 07:59:40 PST 2009
I've got a situation that I've lived with for a while, but it has
finally ticked me off enough (and I finally have time to deal with it!)
to do something about it. I just need to confirm one detail.

In my current setup I have a master and several slaves.  One of these
slaves services the website; another crunches data.  The problem is
that the crunch box can get badly lagged while crunching data - we're
talking 9-10 hours.  Slony is running, but the crunch gobbles up so
much IO and CPU that replication cannot keep up with our rather high
replication load.  This in turn causes a backlog to build up on the
master: because crunch is behind, the master can't purge records from
sl_log_x.  That in turn causes the web-db to lag, because the query to
fetch new data takes a long time.  I've analyzed the queries used and
they seem sane, but there is so much data in sl_log that it's CPU
bound.  (Right now I've got about 12M rows in sl_log, and it ends up
having to filter 10M of those through xxid_xx_snapshot().)

Now, (before you all start saying "bah, buy more disks") things are
fine under low load or with smaller crunches; it's only when we fire up
a massive job that this happens.  We've got some more hardware coming
in, but it will only lessen the problem, not cure it.

So, my current thinking is this: if I set up a new slave off of prod
(let's call him crunchmaster) and then have the existing crunch box
sync off of crunchmaster, will that insulate prod from crunch getting
lagged?  That is, as long as crunchmaster is in sync, will prod be able
to purge from sl_log, while crunchmaster's sl_logs continue to grow
until crunch is caught up?
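
To make the proposed topology concrete, it would look roughly like the
slonik sketch below.  The cluster name, node IDs, set ID and conninfo
strings are all made up for illustration, and it assumes crunchmaster
has already been added as a node with paths to both prod and crunch.
It only shows the layout I'm asking about; whether it actually lets
prod purge sl_log sooner is exactly the open question.

```
# Sketch only: cluster name, node IDs, set ID and conninfo are hypothetical.
cluster name = mycluster;
node 1 admin conninfo = 'dbname=app host=prod user=slony';          # prod (origin)
node 3 admin conninfo = 'dbname=app host=crunch user=slony';        # existing crunch box
node 4 admin conninfo = 'dbname=app host=crunchmaster user=slony';  # new intermediate node

# crunchmaster subscribes directly to prod; forward = yes makes it keep
# log data so it can act as a provider for other nodes.
subscribe set (id = 1, provider = 1, receiver = 4, forward = yes);

# Once crunchmaster's subscription is active, re-point the existing
# crunch subscription at crunchmaster instead of prod.
subscribe set (id = 1, provider = 4, receiver = 3, forward = no);
```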

thanks guys!

--
Jeff Trout <jeff at jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/




