Tue Dec 4 21:35:32 PST 2007
- Previous message: [Slony1-general] postgresq freezes up on a query from slony
- Next message: [Slony1-general] Dropped a node and still has status
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Richard Yen <dba at richyen.com> writes: > On Dec 4, 2007, at 1:39 PM, Christopher Browne wrote: > >> Richard Yen <dba at richyen.com> writes: >>> Not sure if this is a slony issue or a postgres issue...I'm posting >>> on >>> both. >>> >>> I'm running slony on a one master/two subscriber system. One of the >>> subscribers seems to get stuck on a group of queries, and I can't >>> seem >>> to figure out why. >>> >>> If I do a select on pg_stat_activity, I get the following: >>> >>> datid | datname | procpid | usesysid | usename | current_query | >>> waiting | query_start | backend_start | client_addr | client_port >>> -------+---------+---------+----------+---------- >>> + >>> --------------------------------------------------------------------------------------------------------------+ >>> ---------+------------------------------- >>> +-------------------------------+--------------+------------- >>> 16384 | tii | 12204 | 16392 | slony | update only >>> "public"."m_report_stats" set date_start='2007-12-03 13:27:05.661155' >>> where objectid='56917411'; | f | 2007-12-04 11:20:23.839088-08 | >>> 2007-12-04 11:20:23.005228-08 | | -1 >>> : update only "public"."m_object_paper" set overwriteflag='t' where >>> id='56069688'; >>> : insert into "public"."m_search_list" (nodeid,id) values >>> ('0','45844662'); >>> : insert into "public"."m_search_list" (nodeid,id) values >>> ('1','45844662'); >>> : insert into "public"."m_search_list" (nodeid,id) values >>> ('4','45844662'); >>> : update only "public"."m_dg_read" set delete_flag='t' where >>> id='1474821'; >>> : insert into "public"."m_search_list" (nodeid,id) values >>> ('5','45844662'); >>> : insert into "public"."m_search_list" (nodeid,id) values >>> ('14','45844662'); >>> : update only "public"."m_user" set duration='02:52:24.744252' where >>> id='10369924'; >>> : insert into "public"."m_search_list" (nodeid,id) values >>> ('32','45844662'); >>> : >> >> 1. Are you certain that it is in fact the same query each time? >> >> It could be that it's working on a rather large SYNC (or set of >> grouped SYNCs), and you're seeing a set of queries that *look* >> similar, but it's just *looking* like it's not progressing. >> > Yes, I'm certain it's the same query every time. When I first > noticed, the query_start in pg_stat_activity was approx. 4 hours > before. The current attempt is about 3 hours old. OK, I imagined you'd likely have a clue on this, but if a dumb question helps, that's a good use of a dumb question ;-). >> I've had DBAs come to me 'asking why slon had stopped,' and what >> we discovered is that there was a frequently repeating pattern of >> queries so that it *looked* like it wasn't progressing, but >> actually was. >> >> 2. If it is stuck, could it be because of an ungranted lock? >> > No, it's not from an ungranted lock. The query below: > tii=# select * from pg_locks where not granted; > locktype | database | relation | page | tuple | transactionid | > classid | objid | objsubid | transaction | pid | mode | granted > ----------+----------+----------+------+-------+--------------- > +---------+-------+----------+-------------+-----+------+--------- > (0 rows) > > tii=# > > >> I'd look for any ungranted locks requested by process 16392, and >> see if I can correlate this with something else that has already >> grabbed some object... > > You mean process 12204? Just making sure... My bad - yes, that was 12204. If you have no ungranted locks at all, that certainly answers what I was wondering. > Also, this process is taking up 100% of CPU, if that gives any clue. > The frustrating part is that it takes up 100% CPU, but there's no > activity in the postgres log or under strace... :/ OK, well, you've answered all of what came initially to my mind. Definitely bona fide mysterious, and yes, strace-worthy. That it's getting horde of identical select() requests timing out seems suspicious, but I'm not sure of quite what. Smells like a pgsql-hackers question. It's not unimaginable that something Slony-I-related might be inducing it, but I think asking on the PG side is the right move here. -- (format nil "~S@~S" "cbbrowne" "ca.afilias.info") <http://dba2.int.libertyrms.com/> Christopher Browne (416) 673-4124 (land)
- Previous message: [Slony1-general] postgresq freezes up on a query from slony
- Next message: [Slony1-general] Dropped a node and still has status
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list