[Slony1-general] postgresq freezes up on a query from slony

Tue Dec 4 21:35:32 PST 2007

Richard Yen <dba at richyen.com> writes:
> On Dec 4, 2007, at 1:39 PM, Christopher Browne wrote:
>
>> Richard Yen <dba at richyen.com> writes:
>>> Not sure if this is a slony issue or a postgres issue...I'm posting
>>> on
>>> both.
>>>
>>> I'm running slony on a one master/two subscriber system. One of the
>>> subscribers seems to get stuck on a group of queries, and I can't
>>> seem
>>> to figure out why.
>>>
>>> If I do a select on pg_stat_activity, I get the following:
>>>
>>> datid | datname | procpid | usesysid | usename | current_query |
>>> waiting | query_start | backend_start | client_addr | client_port
>>> -------+---------+---------+----------+----------
>>> +
>>> --------------------------------------------------------------------------------------------------------------+
>>> ---------+-------------------------------
>>> +-------------------------------+--------------+-------------
>>> 16384 | tii | 12204 | 16392 | slony | update only
>>> "public"."m_report_stats" set date_start='2007-12-03 13:27:05.661155'
>>> where objectid='56917411'; | f | 2007-12-04 11:20:23.839088-08 |
>>> 2007-12-04 11:20:23.005228-08 | | -1
>>> : update only "public"."m_object_paper" set overwriteflag='t' where
>>> id='56069688';
>>> : insert into "public"."m_search_list" (nodeid,id) values
>>> ('0','45844662');
>>> : insert into "public"."m_search_list" (nodeid,id) values
>>> ('1','45844662');
>>> : insert into "public"."m_search_list" (nodeid,id) values
>>> ('4','45844662');
>>> : update only "public"."m_dg_read" set delete_flag='t' where
>>> id='1474821';
>>> : insert into "public"."m_search_list" (nodeid,id) values
>>> ('5','45844662');
>>> : insert into "public"."m_search_list" (nodeid,id) values
>>> ('14','45844662');
>>> : update only "public"."m_user" set duration='02:52:24.744252' where
>>> id='10369924';
>>> : insert into "public"."m_search_list" (nodeid,id) values
>>> ('32','45844662');
>>> :
>>
>> 1.  Are you certain that it is in fact the same query each time?
>>
>>    It could be that it's working on a rather large SYNC (or set of
>>    grouped SYNCs), and you're seeing a set of queries that *look*
>>    similar, but it's just *looking* like it's not progressing.
>>
> Yes, I'm certain it's the same query every time.  When I first
> noticed, the query_start in pg_stat_activity was approx. 4 hours
> before.  The current attempt is about 3 hours old.

OK, I imagined you'd likely have a clue on this, but if a dumb
question helps, that's a good use of a dumb question ;-).

>>  I've had DBAs come to me 'asking why slon had stopped,' and what
>>    we discovered is that there was a frequently repeating pattern of
>>    queries so that it *looked* like it wasn't progressing, but
>>    actually was.
>>
>> 2.  If it is stuck, could it be because of an ungranted lock?
>>
> No, it's not from an ungranted lock.  The query below:
> tii=# select * from pg_locks where not granted;
>  locktype | database | relation | page | tuple | transactionid |
> classid | objid | objsubid | transaction | pid | mode | granted
> ----------+----------+----------+------+-------+--------------- 
> +---------+-------+----------+-------------+-----+------+---------
> (0 rows)
>
> tii=#
>
>
>>    I'd look for any ungranted locks requested by process 16392, and
>>    see if I can correlate this with something else that has already
>>    grabbed some object...
>
> You mean process 12204?  Just making sure...

My bad - yes, that was 12204.

If you have no ungranted locks at all, that certainly answers what I
was wondering.

> Also, this process is taking up 100% of CPU, if that gives any clue.
> The frustrating part is that it takes up 100% CPU, but there's no
> activity in the postgres log or under strace... :/

OK, well, you've answered all of what came initially to my mind.
Definitely bona fide mysterious, and yes, strace-worthy.

That it's getting horde of identical select() requests timing out
seems suspicious, but I'm not sure of quite what.  Smells like a
pgsql-hackers question.  It's not unimaginable that something
Slony-I-related might be inducing it, but I think asking on the PG
side is the right move here.
-- 
(format nil "~S@~S" "cbbrowne" "ca.afilias.info")
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)