Wed Jun 12 20:57:47 PDT 2019
- Previous message: [Slony1-general] slony backup during pg_dump
- Next message: [Slony1-general] slony backup during pg_dump
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, Jun 12, 2019 at 9:33 AM Christopher Browne <cbbrowne at afilias.info> wrote:

> On Tue, Jun 11, 2019 at 3:44 PM Tory M Blue <tmblue at gmail.com> wrote:
>
>> On Tue, Jun 11, 2019 at 12:26 PM Christopher Browne <cbbrowne at afilias.info> wrote:
>>
>>> On Tue, Jun 11, 2019 at 1:30 PM Tory M Blue <tmblue at gmail.com> wrote:
>>>
>>>> I'm wondering if there is anything I can do to prevent what appears to
>>>> be slony replication backing up (not replicating) during a pg_dump. I'm
>>>> using -N slonschema, so I'm wondering if there are any other techniques
>>>> to prevent slon replication delay during pg_dumps?
>>>
>>> Hmm.  I remember the "big deal" being to ensure that the Slony schema
>>> was left out, as that would definitely lead to locking that would cause
>>> replication to fall behind.
>>>
>>> It's possible that something new has come along that would lock in a
>>> troublesome way.
>>>
>>> I suggest that you take a look in pg_catalog.pg_locks to see what
>>> objects Slony is waiting on, thus, looking for:
>>>
>>>   select * from pg_catalog.pg_locks where not granted;
>>>
>>> Some further work needs to be done to trace that to exactly what objects
>>> are locked.
>>
>> Yeah, I will take a look.  I also thought that bypassing the slon schema
>> would solve it, but apparently not; we can have a 2-10 million row count
>> in sl_log before it clears, and it's apparent that replication is not
>> happening, so I'll poke around some more.  Glad to see folks are still
>> around :)
>
> Well, growth of sl_log during the backup is fine, and not an indication of
> anything untoward.  Given that you opened a transaction when pg_dump
> began, and that has not completed, that will lead to Slony not trimming
> anything out of sl_log during the backup.  That's normal, routine stuff.
> Nothing's wrong (not on the basis of that, anyway).
>
> On the origin node, you can query the view sl_status, which will show how
> far behind replication appears to be.
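(Following up on the pg_locks suggestion quoted above: a sketch, not from the thread itself, of the "further work" needed to tie ungranted locks back to relations and sessions. It assumes PostgreSQL 9.2 or later, where pg_stat_activity has pid and query columns.)

```sql
-- Ungranted locks, resolved to the relation name and the blocked session.
-- l.relation is NULL for non-relation lock types (e.g. transactionid).
SELECT l.locktype,
       l.relation::regclass AS locked_relation,
       l.mode,
       l.pid,
       a.query
FROM pg_catalog.pg_locks l
JOIN pg_catalog.pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;
```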
> (The view is available on all nodes; it does not return terribly
> interesting results on nodes other than the origin.)
>
> Here's an example of replication being very far behind:
>
> usaorigin=# select * from _usa.sl_status limit 3;
> -[ RECORD 1 ]-------------+------------------------------
> st_origin                 | 1
> st_received               | 3
> st_last_event             | 5000002124
> st_last_event_ts          | 2019-04-10 20:02:43.833414+00
> st_last_received          | 5000002124
> st_last_received_ts       | 2019-04-10 20:02:43.885806+00
> st_last_received_event_ts | 2019-04-10 20:02:43.833414+00
> st_lag_num_events         | 0
> st_lag_time               | 62 days 20:23:14.029816
> -[ RECORD 2 ]-------------+------------------------------
> st_origin                 | 1
> st_received               | 4
> st_last_event             | 5000002124
> st_last_event_ts          | 2019-04-10 20:02:43.833414+00
> st_last_received          | 5000002122
> st_last_received_ts       | 2019-04-10 20:02:33.870577+00
> st_last_received_event_ts | 2019-04-10 20:02:23.76373+00
> st_lag_num_events         | 2
> st_lag_time               | 62 days 20:23:34.0995
> -[ RECORD 3 ]-------------+------------------------------
> st_origin                 | 1
> st_received               | 5
> st_last_event             | 5000002124
> st_last_event_ts          | 2019-04-10 20:02:43.833414+00
> st_last_received          | 5000002122
> st_last_received_ts       | 2019-04-10 20:02:33.898605+00
> st_last_received_event_ts | 2019-04-10 20:02:23.76373+00
> st_lag_num_events         | 2
> st_lag_time               | 62 days 20:23:34.0995
>
> That is a replication cluster where I shut down slon processes back in
> April :-)
>
> It's only behind by a few events, for the self-same reason :-)
>
> If the last event received coincides with the time at which you started
> the pg_dump run, well, that sure sounds like a smoking gun.
>
> If, on the other hand, SYNC events are being processed, just at a slower
> speed than you'd like, then perhaps the problem is that the I/O induced by
> pg_dump is delaying replication, and that's more a hardware problem than a
> software one.
> And if replication is a few seconds behind, while the system is busy, as
> indicated by st_lag_time being a few seconds, then it's possible that there
> is no problem there.

First, the environment: PostgreSQL 9.x on CentOS 7.2.

Replication flow: primary -> secondary -> 3 query slaves.

Okay, here it is.  My dump started:

  Start export of schema clsdb for user postgres at Wed Jun 12 19:00:48 PDT 2019

So if I'm reading this correctly, it's 20:45 but something is showing that the
last st_last_received_event_ts was at 19:38, over an hour ago?  st_lag_time of
1 hour?

Primary:

select * from _cls.sl_status limit 3;
 st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
-----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
        11 |          14 |    5036030946 | 2019-06-12 20:45:29.465785-07 |       5036030138 | 2019-06-12 19:38:11.065976-07 | 2019-06-12 19:38:03.311267-07 |               808 | 01:07:33.039694
        11 |          15 |    5036030946 | 2019-06-12 20:45:29.465785-07 |       5036030138 | 2019-06-12 19:38:12.202539-07 | 2019-06-12 19:38:03.311267-07 |               808 | 01:07:33.039694
        11 |          12 |    5036030946 | 2019-06-12 20:45:29.465785-07 |       5036030138 | 2019-06-12 19:38:08.342637-07 | 2019-06-12 19:38:03.311267-07 |               808 | 01:07:33.039694
(3 rows)

I have 2.4 million rows in the sl_log_1 table, which looks like it started to
hockey-stick right around 19:30.
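(A side note, my own sketch rather than anything from the thread: since Christopher's point is that an open pg_dump transaction keeps sl_log from being trimmed, a query like this, using pg_stat_activity column names from PostgreSQL 9.2+, shows which sessions have held a transaction open longest:)

```sql
-- Oldest open transactions; a long-running pg_dump snapshot will show up
-- here with a large xact_age while sl_log keeps growing.
SELECT pid,
       xact_start,
       now() - xact_start AS xact_age,
       query
FROM pg_catalog.pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```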
On the secondary (which replicates to my query DBs, where the dump is run from;
it is not run from the primary), sl_status shows:

Secondary:

select * from _cls.sl_status limit 3;
 st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
-----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
        12 |          11 |    5008650405 | 2019-06-12 20:45:35.365757-07 |       5008650000 | 2019-06-12 19:38:03.815508-07 | 2019-06-12 19:38:03.338486-07 |               405 | 01:07:35.106539
        12 |          13 |    5008650405 | 2019-06-12 20:45:35.365757-07 |       5008650403 | 2019-06-12 20:45:17.701455-07 | 2019-06-12 20:45:15.355233-07 |                 2 | 00:00:23.089792
        12 |          14 |    5008650405 | 2019-06-12 20:45:35.365757-07 |       5008650404 | 2019-06-12 20:45:26.642094-07 | 2019-06-12 20:45:25.360167-07 |                 1 | 00:00:13.084858
(3 rows)
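(For reference, the kind of dump invocation under discussion, with the Slony schema excluded via -N, would look roughly like this. The host and output file names are made up; "_cls" is the cluster schema assumed from the sl_status queries above.)

```shell
# Dump the clsdb database from a query slave, skipping the Slony-I
# schema so the dump does not touch replication objects.
pg_dump -h queryslave1 -U postgres -N _cls -Fc -f clsdb.dump clsdb
```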