Zac Bentley zbentley at crsinc.com
Thu Jun 7 07:03:24 PDT 2012
Context:

We run Slony 2.0 with Postgres 8.4 on two CentOS 6 servers--one master, and one slave. Our database is about 30GB in size, which isn't unusual, but we do have a couple of tables that are more than 5GB each.

Recently, we needed to re-build our Slony cluster. I turned off Slony, restored identical database snapshots on the master and the slave, set up my slony.conf and slon_tools.conf, started the slons, ran slonik_init_cluster | slonik, then slonik_create_set 1 | slonik (we only have one replication set), and finally slonik_subscribe_set 1 2 | slonik. Everything looked good, and I was able to watch subscription progress in the logs.

Then the server stopped responding. I rebooted it, and saw "Kernel panic - not syncing: Out of memory and no killable processes" after it had killed everything it could.

The crash happens during the subscription process, when our first "large" (3GB) table is encountered. The logs report "so and so bytes copied for table" for the table in question, then a few dozen queued SYNC events, and then they detect a child process crash and log a watchdog-initiated restart.
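
To narrow down which process is growing before the box locks up, per-process resident memory could be sampled while the subscription runs. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names (parse_status, sample_rss) are hypothetical helpers, and the process names "slon" and "postgres" are assumptions about how the daemons appear in /proc.

```python
import os

def parse_status(text):
    """Return (name, rss_kb) from the contents of a /proc/<pid>/status file."""
    name, rss_kb = None, 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Format is "VmRSS:   123456 kB"; kernel threads have no VmRSS line.
            rss_kb = int(line.split()[1])
    return name, rss_kb

def sample_rss(names=("slon", "postgres"), proc_root="/proc"):
    """Return {pid: (name, rss_kb)} for running processes whose name matches."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited between listdir and open
        if name in names:
            result[int(entry)] = (name, rss_kb)
    return result
```

Calling sample_rss() every few seconds and logging the result to a file on another host would show whether it is a slon or a postgres backend that balloons when the large table is copied.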



What I've tried:

First I blew away the database completely, re-ran initdb, and then restored the identical snapshots again. Same kernel panic. Then I blew it away, uninstalled Postgres and Slony, and reinstalled them. I double-checked all of our memory-based settings in postgresql.conf, and they are all at stock/recommended levels (e.g. shared_buffers is at 1/4 of RAM). I ran a VACUUM FULL ANALYZE on the database before initializing the Slony cluster. As a last-ditch effort, I completely reinstalled the OS and all base software on the db servers and started from scratch. Same result every time: kernel panic, out of memory. It happens on the slave server, which runs both of the slons.
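
For anyone wanting to sanity-check the "shared_buffers at 1/4 of RAM" rule of thumb on their own machine, here is a small sketch of the arithmetic. It assumes a Linux /proc/meminfo layout; the function names are hypothetical, and the 8 GB figure in the usage note is purely illustrative, not the size of our servers.

```python
def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")

def quarter_ram_mb(meminfo_text):
    """Return one quarter of total RAM, in whole megabytes."""
    return mem_total_kb(meminfo_text) // 4 // 1024

if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print("shared_buffers guideline: %dMB" % quarter_ram_mb(fh.read()))
```

With an illustrative MemTotal of 8388608 kB (8 GB), quarter_ram_mb returns 2048, which can be compared against the shared_buffers value in postgresql.conf.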



Question:

Why is this happening?

Our database has grown fairly linearly over the past few months (at the beginning of the year it was about 23GB, now it's 30), and every other time I have had to re-initialize the Slony cluster on these same servers, it has worked fine.


Zac Bentley
Systems Administrator
Corporate Reimbursement Services, Inc.
www.crsinc.com
617-467-1949

