Sat Feb 23 10:28:57 PST 2008
- Previous message: [Slony1-general] syslog output?
- Next message: [Slony1-general] STILL can't migrate a node.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jan Wieck wrote:
> On 2/23/2008 12:20 AM, Craig James wrote:
>> A little more info on this problem...
>>
>> Craig James wrote:
>>> I'm trying to migrate a node for the second time, and no luck. Last
>>> time I tried it, it just got stuck, and due to lack of time, I didn't
>>> investigate.
>>>
>>> This time I watched -- it got stuck again, doing some sort of huge
>>> SELECT statement. I was under the impression that migrating a node
>>> was a fairly simple operation that should happen in a short time
>>> (less than a minute?) even for large databases.
>>>
>>> I waited 10 minutes, during which the entire system was completely
>>> locked up (no other process could access the database), and our web
>>> site was offline. I finally had to kill all of the slon daemons and
>>> kill Postgres to get our site back on the air, then run the
>>> node-unlock command to get Slony back in shape.
>>>
>>> This system appears to otherwise be working well. I can insert,
>>> update and delete records, and they're copied to the slave node
>>> immediately.
>>>
>>> What's up? Am I just too impatient?
>>
>> I tried it again, after vacuuming the slony tables that are subject to
>> bloat. This time I shut everything off, started the migration of the
>> master to node 2, and waited for 35 minutes, but the SELECT never
>> finished. vmstat showed massive I/O and CPU activity the whole time.
>
> What SELECT are you referring to? I don't see where in the MOVE SET you
> have to perform any SELECT.

You tell me? It is the slon daemon that is executing this select. There
were no other connections to the database the second time I tried this.

>> Again, after I killed postgres, restarted, and unlocked the node,
>> Slony went back to performing perfectly.
>
> Killing postgres is a bad idea. Stop that habit right now, before you
> physically corrupt any of your databases.

Thanks for the advice, but I don't think it's a problem.
That's one of the features of a robust relational database with a
write-ahead log -- it can withstand being killed without corrupting
data. Besides, I had no choice: my web site went offline because slon
apparently took an exclusive lock on the tables, blocking all other
activity. And I killed a SELECT, not an INSERT or UPDATE. But that's a
topic for a separate discussion ... I have to fix this Slony problem
first.

> Anyhow, apparently the LOCK SET part of the process succeeds. So what I
> now assume is that the WAIT FOR EVENT never finishes. First, you don't
> need a WAIT FOR EVENT between LOCK SET and MOVE SET. Both events are
> executed on the origin, so by the time the LOCK SET finishes, everything
> is ready for the MOVE.

I don't think it got as far as this, but I don't know the internals.
When I execute the script, the SELECT starts, and that's where
everything comes to a sudden halt.

> But what this indicates is that node 2 never confirms the LOCK SET. Can
> it be that you actually have a problem with the connection from node 2
> to node 1? What is the content of the view sl_status on both nodes?

Both nodes seem to be normal -- st_last_event_ts is just a few seconds
prior to the query, and st_lag_time is 00:00:11.465251 (node 1) and
00:00:07.50164 (node 2).

> If you want to speed up this communication in order to meet your Sat.
> noon deadline, I'll be available on IRC, channel #slony on freenode.

Thanks, but I don't use an IRC client; hope you get this.

Craig
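[For readers following this thread: the LOCK SET / MOVE SET sequence Jan
describes can be sketched as a slonik script along the following lines.
This is only an illustration; the cluster name, node IDs, and conninfo
strings are hypothetical placeholders, not taken from Craig's setup.]

```slonik
# Hypothetical cluster/connection settings -- adjust to your installation.
cluster name = mycluster;
node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';

# Lock the set on the current origin (node 1)...
lock set (id = 1, origin = 1);

# ...then move it straight to the new origin. Per Jan's point above, no
# WAIT FOR EVENT is needed between LOCK SET and MOVE SET, since both
# events execute on the origin.
move set (id = 1, old origin = 1, new origin = 2);
```

Replication health before attempting the move can be checked with
something like `SELECT st_received, st_last_event_ts, st_lag_time FROM
_mycluster.sl_status;` on each node (again, substitute your actual
cluster schema name).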