Thu Jan 2 12:33:41 PST 2014
[Slony1-general] 6 hours to replicate single table, 12 hours to replicate DB
On Thu, Jan 2, 2014 at 2:35 PM, Tory M Blue <tmblue at gmail.com> wrote:

> Wondering what settings I need to speed this up. To do a rebuild of a db
> it takes a long time, 6 hours for a single table. No I/O issues, no load,
> just slon and postgres taking their sweet old time. I would like to use
> the resources available to speed this up.
>
> The table is:
>
> 2013-12-21 19:17:58 PST CONFIG remoteWorkerThread_1: Begin COPY of table "impressions"
> 2013-12-21 19:37:03 PST CONFIG remoteWorkerThread_1: 12657163552 bytes copied for table "impressions"
> 2013-12-22 01:40:22 PST CONFIG remoteWorkerThread_1: 22944.144 seconds to copy table "impressions"  <-- 6 hours
>
> Postgres 9.2.4, Slony 2.1.3
>
> This is a larger table, but because of bloat etc. we need to do ground-up
> rebuilds to clean it out every so often (vacuums don't do it).
>
> Slony config, pretty much at default other than the sync interval:
>
> # Check for updates at least this often in milliseconds.
> # Range: [10-60000], default 2000
> sync_interval=1000
> #sync_interval_timeout=10000
> # apply every single SYNC by itself.
> # Range: [0,100], default: 6
> #sync_group_maxsize=6
> #sync_max_rowsize=8192
> #sync_max_largemem=5242880
>
> I either need some advanced settings for when we are doing a rebuild, to
> speed up the process, or some configuration that stays in place during
> normal workloads as well. Under normal workloads things are replicated
> and kept in sync; it's just the rebuild portion. I would like to see it
> actually stressing my boxen :)

The one place where it might be worth modifying configuration is the amount of memory used for sorts. The step *after* the COPY of table "impressions" is to reindex it, and that step likely accounts for most of the remaining 6h of the subscription process. If the reindex takes 4h, raising the maintenance_work_mem GUC might reduce that 4h significantly (a hedged sketch follows below).

http://www.postgresql.org/docs/9.2/static/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM

Aside from that, there's not much else to improve in practice, and not terribly much even in theory. We've had discussions (not lately; I don't think anything relevant has changed much since) about multi-threading the SUBSCRIBE/COPY process. The notable idea was to split the REINDEX work off to a series of extra threads so that it could run in parallel. Given that you have one Very Very Large Table, only a limited benefit could come from that: if the biggest table involves 6h of COPY and 4h of REINDEX, then the fastest, in theory, that the subscription could complete is 10h, which probably isn't enough improvement to be worth the added code and fragility that would result.

The last time the concept was discussed, the conclusion was not to implement parallel processing of REINDEX, on the basis that:

a) it would be a fair bit of code, adding potential for new bugs;

b) it would add more potential for lock and deadlock conditions;

c) in the common case where one table's data dominates the database (which sure seems to be the case for you), there's little benefit: the theoretical minimum subscription time is still, at minimum,

   time to COPY the biggest table + time to REINDEX the biggest table,

which makes the optimization pretty futile.

Your case doesn't seem to invalidate that reasoning.

You could get *some* benefit from splitting things into two sets, one consisting of the table "impressions" and the other consisting of the rest. I'd hazard the guess that there's not much need to clean up the tables other than "impressions"; you could drop the 'outage' to the theoretical minimum if you don't bother re-subscribing the other tables.
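Two hedged sketches of the ideas above. First, the maintenance_work_mem change: a per-role setting is one way to reach the slon connections on the subscriber; the role name "slony" and the 2GB value are assumptions, and you could equally set it cluster-wide in postgresql.conf (the 9.2 default is 16MB) and reload:

    -- On the subscriber, assuming the slon daemons connect as role "slony"
    -- (adjust to your setup). New slon sessions pick this up, and the
    -- setting only affects maintenance work such as post-COPY index builds.
    ALTER ROLE slony SET maintenance_work_mem = '2GB';

Second, the two-set split. A rough slonik sketch; the cluster name, conninfo strings, and set/table IDs are all invented, and note that a table already replicated in an existing set would be moved with SET MOVE TABLE rather than re-added:

    # Set 2 holds everything *except* "impressions" and stays subscribed
    # across rebuilds; set 1 ("impressions" alone) is the only set that
    # would ever get dropped and re-subscribed.
    cluster name = replcluster;
    node 1 admin conninfo = 'dbname=mydb host=origin user=slony';
    node 2 admin conninfo = 'dbname=mydb host=subscriber user=slony';

    create set (id = 2, origin = 1, comment = 'all tables except impressions');
    set add table (set id = 2, origin = 1, id = 101,
                   fully qualified name = 'public.some_small_table');
    # ... repeat for the remaining small tables ...
    subscribe set (id = 2, provider = 1, receiver = 2, forward = no);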
I wonder if there's merit to trying to split "impressions" into several tables: one containing the "hot", heavily-updated data, and others containing "cooked"/complete data that doesn't need to be reorganized. It may not be easy to distinguish between data that is complete and data that is still changing a lot, though.
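If such a line can be drawn, a minimal SQL sketch of the shape, assuming rows stop changing after some age; the table names, the created_at column, and the 7-day cutoff are all invented for illustration:

    -- "Hot" rows live in a small table that gets the periodic ground-up
    -- rebuild; rows that have gone immutable migrate to an archive table
    -- that never needs one.
    CREATE TABLE impressions_hot (LIKE impressions INCLUDING ALL);
    CREATE TABLE impressions_archive (LIKE impressions INCLUDING ALL);

    -- Periodically move rows that have stopped changing:
    WITH moved AS (
        DELETE FROM impressions_hot
         WHERE created_at < now() - interval '7 days'
        RETURNING *
    )
    INSERT INTO impressions_archive SELECT * FROM moved;

    -- A view keeps readers unaware of the split:
    CREATE VIEW impressions_all AS
        SELECT * FROM impressions_hot
        UNION ALL SELECT * FROM impressions_archive;

With that shape, "impressions_archive" could live in the replication set that never gets re-subscribed, and only the small hot table would ever need the rebuild treatment.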