Benjamin Pineau bpineau at elma.fr
Wed Aug 20 03:15:20 PDT 2008
Hi everyone.

I have a replicating node that suddenly started to lag, on a 4-node Slony
cluster that had worked well for months. The node is powerful enough (i.e. older,
slower machines in the cluster manage to keep up fine). Network and block
devices are mostly idling (with regard to interrupts/second and throughput).
Strangely, replication on this node seems CPU bound by the postmaster
process doing the actual inserts/updates for slon (this postmaster process
has been stuck at 99% CPU usage since the beginning of the problem).
Neither Slony (at "slon -d2" level) nor PostgreSQL logged any warning or
error message, and replication did not stop on this node (it makes
progress, but too slowly to keep up, so it's now 3 days behind the master).

Any clue?

Where should I look to better diagnose this problem? (I may be missing the
obvious, I'm very new to Slony and PostgreSQL.)
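
For reference, one way to watch that lag is Slony's sl_status view; I'm
assuming here that the replication schema is _myclust, after the cluster
name showing in the logs below:

  SELECT st_origin, st_received, st_lag_num_events, st_lag_time
    FROM _myclust.sl_status;

st_lag_time gives the wall-clock delay behind each origin, and
st_lag_num_events the number of events not yet confirmed by this node.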


More details:


* Slony 1.2.13 on PostgreSQL 8.1.10
* "psql_replication_check.pl" and "test_slony_state" do not find problem
* A few hours before I noticed the problem, I had an almost (98%) full
  tablespace on this node. Which I solved by moving a large index on an
  other tablespace (with "ALTER INDEX foo SET TABLESPACE bar;").
* Selects runs fast as usual on this node (ie. faster than some other nodes)
* According to pg_stat_activity , the only activity running now on the
  busy/working postmaster is the replication's inserts/updates. And there's
  no significant remaining activity running on this machine, beside replication
* No other processes than replication do writes on the node's database.
* The replicating/inserting postmaster process eats 99% of one CPU. Slon
  uses between 4% to 20% of an other CPU (this is a 8 cpu machine).
* The block devices aren't busy at all (iostat shows very low block I/O
  activity, i.e. about 100KB to 500KB of reads and writes per second on a
  hardware RAID10 array of 15Krpm SCSI drives that can handle much more).
  Actually, block devices are mostly idling on this node.
* The replication is not totally halted, it has just become very slow (i.e.
  the node's lag behind the master node keeps increasing, but it makes
  progress). So I guess it's not locked rebuilding a large index or halted
  on some error condition. And I can see the running replication
  insert/update statements evolve over time (according to pg_stat_activity).
* This node is one of the fastest machines in the cluster (wrt both CPU and
  block devices), and the slower nodes have no problem keeping up with
  replication.
* This cluster had been working for months, and there was no activity peak
  the day it failed.
* We didn't change the replication set, and didn't alter any replicated
  table.
* Stracing the working postmaster process, I didn't see any suspicious or
  long syscalls that could explain the CPU usage or lag (just a few lseek,
  read, send, recv, but since the block devices and network link are mostly
  idle, those shouldn't cause slowdowns). I guess this process consumes CPU
  doing some internal heavy computation (?).
* I restarted the whole slon cluster twice, restarted this node's PostgreSQL,
  and even (in despair) rebooted the node.
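
The pg_stat_activity check mentioned above boils down to something like a
lookup on the busy backend's pid (17450, the same process I straced below;
procpid and current_query are the 8.1 column names):

  SELECT procpid, usename, query_start, current_query
    FROM pg_stat_activity
   WHERE procpid = 17450;

It only ever shows the replication's insert/update statements, nothing else.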
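
And for completeness, tablespace sizes can be watched from psql like this
(the dbsize functions are built into 8.1):

  SELECT spcname, pg_size_pretty(pg_tablespace_size(spcname)) AS size
    FROM pg_tablespace;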

Logs etc.:


* Example logs:
Aug 18 17:39:22 mynode cluster_myclust: 2008-08-18 17:39:22 CEST DEBUG2 remoteListenThread_2: LISTEN
Aug 18 17:39:22 mynode cluster_myclust: 2008-08-18 17:39:22 CEST DEBUG2 remoteWorkerThread_2: forward confirm 1,66023883 received by 2
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteWorkerThread_2: forward confirm 1,66023884 received by 2
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteWorkerThread_2: forward confirm 2,7658887 received by 1
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteWorkerThread_2: forward confirm 3,7751837 received by 1
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteWorkerThread_2: forward confirm 4,1550604 received by 1
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteListenThread_1: queue event 1,66023883 SYNC
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteListenThread_1: queue event 1,66023884 SYNC
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteListenThread_1: queue event 1,66023885 SYNC
Aug 18 17:39:23 mynode cluster_myclust: 2008-08-18 17:39:23 CEST DEBUG2 remoteListenThread_1: UNLISTEN
[...]
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832742 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832743 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteWorkerThread_1: Received event 1,65832742 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832744 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 calc sync size - last time: 1 last length: 1660 ideal: 36 proposed size: 3
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832745 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832746 SYNC
[...]
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832758 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteListenThread_1: queue event 1,65832759 SYNC
Aug 19 13:57:51 mynode cluster_myclust: 2008-08-19 13:57:51 CEST DEBUG2 remoteWorkerThread_1: SYNC 65832744 processing



* strace -ttt -p 17450:
1219073660.529469 _llseek(196, 17924096, [17924096], SEEK_SET) = 0
1219073660.529530 read(196, "p\v\0\0\0vke\1\0\0\0\344\0\20\33\360\37\3\330\2370\0\270"..., 8192) = 8192
1219073660.531583 _llseek(214, 6086656, [6086656], SEEK_SET) = 0
1219073660.531680 read(214, "p\v\0\0\350oke\1\0\0\0<\2\340\v\0 \3\260\233h\0|\233h"..., 8192) = 8192
1219073660.536787 send(6, "\3\0\0\0&\0\0\0\3\0\0\0*D\0\0<IDLE> in transa"..., 38, 0) = 38


