Richard Yen dba at richyen.com
Tue Dec 4 13:07:28 PST 2007
Hi All,

Not sure if this is a slony issue or a postgres issue...I'm posting on  
both.

I'm running slony on a one master/two subscriber system. One of the  
subscribers seems to get stuck on a group of queries, and I can't seem  
to figure out why.

If I do a select on pg_stat_activity, I get the following:

  datid | datname | procpid | usesysid | usename | current_query |  
waiting | query_start | backend_start | client_addr | client_port
-------+---------+---------+----------+---------- 
+ 
--------------------------------------------------------------------------------------------------------------+ 
---------+------------------------------- 
+-------------------------------+--------------+-------------
16384 | tii | 12204 | 16392 | slony | update only  
"public"."m_report_stats" set date_start='2007-12-03 13:27:05.661155'  
where objectid='56917411'; | f | 2007-12-04 11:20:23.839088-08 |  
2007-12-04 11:20:23.005228-08 | | -1
: update only "public"."m_object_paper" set overwriteflag='t' where  
id='56069688';
: insert into "public"."m_search_list" (nodeid,id) values  
('0','45844662');
: insert into "public"."m_search_list" (nodeid,id) values  
('1','45844662');
: insert into "public"."m_search_list" (nodeid,id) values  
('4','45844662');
: update only "public"."m_dg_read" set delete_flag='t' where  
id='1474821';
: insert into "public"."m_search_list" (nodeid,id) values  
('5','45844662');
: insert into "public"."m_search_list" (nodeid,id) values  
('14','45844662');
: update only "public"."m_user" set duration='02:52:24.744252' where  
id='10369924';
: insert into "public"."m_search_list" (nodeid,id) values  
('32','45844662');
:

What I've done so far is do an strace on the process, and I get the  
following:
tii-db1 postgres # strace -p 12204
Process 12204 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
Process 12204 detached
tii-db1 postgres #

I've also set log_min_error_statement=debug5 on postgresql.conf, did a  
restart, and the logs show the following:
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-1] 2007-12-04  
11:20:23.846 PST [user=slony,db=tii [local] PID:12204 XID: 
129605365]DEBUG: ProcessQuery
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-2] 2007-12-04  
11:20:23.846 PST [user=slony,db=tii [local] PID:12204 XID: 
129605365]STATEMENT: update only "public"."m_report_stats"
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-3] set  
date_start='2007-12-03 13:27:05.661155' where objectid='56917411';
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-4] update only  
"public"."m_object_paper" set overwriteflag='t' where id='56069688';
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-5] insert into  
"public"."m_search_list" (nodeid,id) values ('0','45844662');
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-6] insert into  
"public"."m_search_list" (nodeid,id) values ('1','45844662');
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-7] insert into  
"public"."m_search_list" (nodeid,id) values ('4','45844662');
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-8] update only  
"public"."m_dg_read" set delete_flag='t' where id='1474821';
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-9] insert into  
"public"."m_search_list" (nodeid,id) values ('5','45844662');
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-10] insert into  
"public"."m_search_list" (nodeid,id) values ('14','45844662');
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-11] update only  
"public"."m_user" set duration='02:52:24.744252' where id='10369924';
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-12] insert into  
"public"."m_search_list" (nodeid,id) values ('32','45844662');
Dec 4 11:20:23 tii-db1 postgres[12204]: [1134-13]

For this process, the log is just stuck here and doesn't do anything  
else, as if it's perpetually waiting for something. I killed the  
process and restarted postgres numerous times (after changing logging  
settings in postgresql.conf), and every time I start the slon daemon,  
it gets stuck on this same event/query set.

The slon daemon seems to be working happily, fetching new events from  
the other nodes and adding them to the system...even after I kill the  
slon daemon, the above process is still there, waiting...

Any suggestions on what else I can do? I'm wondering if I should just  
drop the node and rebuild it...

Thanks!
--Richard


More information about the Slony1-general mailing list