bugzilla-daemon at main.slony.info
Thu Mar 17 10:42:33 PDT 2011
http://www.slony.info/bugzilla/show_bug.cgi?id=175

--- Comment #13 from Christopher Browne <cbbrowne at ca.afilias.info> 2011-03-17 10:42:33 PDT ---
I don't see a particularly good way to readily analyze CPU loading from this
addition, and I'm not sure that's even the best measurement, in any case.

As an alternative, I instrumented the code in two places, with the following
patch:
diff --git a/src/slon/monitor_thread.c b/src/slon/monitor_thread.c
index 108b338..d88abfd 100644
--- a/src/slon/monitor_thread.c
+++ b/src/slon/monitor_thread.c
@@ -180,6 +180,8 @@ monitorThread_main(void *dummy)
                         free(state.activity);
                     if (state.event_type != NULL)
                         free(state.event_type);
+                    slon_log(SLON_INFO, "monitor_thread - record - %s\n",
+                             dstring_data(&monquery));
                     res = PQexec(dbconn, dstring_data(&monquery));
                     if (PQresultStatus(res) != PGRES_TUPLES_OK)
                     {
@@ -384,6 +386,9 @@ monitor_state(const char *actor, int node, pid_t conn_pid, /* @null@ */ const ch
    {
        tos->event_type = NULL;
    }
+   slon_log(SLON_INFO, "monitor_state(pid:%d node:%d connpid:%d event:%lld actor[%s] activity[%s] event_type[%s]\n",
+            tos->pid, tos->node, tos->conn_pid, tos->event,
+            tos->actor, tos->activity, tos->event_type);
    pthread_mutex_unlock(&stack_lock);
 }


a) Each time an entry is pushed onto the stack, it is recorded in the log.
b) Each time an entry is about to be submitted to the DB, the query is recorded
in the log.

In running a typical regression test (e.g. test1), I saw the following
results...

The test ran from Mar 17 13:18:00 to Mar 17 13:19:06, i.e. 66 seconds.

 grep monitor_thread /var/log/messages | grep select  | wc -l

-> There were 214 calls to the component_state() function, across two nodes,
so...

214 / (66 seconds * 2 nodes) ≈ 1.6 calls per second per node

-> There were 471 calls to monitor_state(), across those nodes, that induced
these 214 calls.
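
(For reference, both counts come straight from grepping the instrumented log
lines; the exact patterns below are assumptions based on the messages added in
the patch above, since the real log lines also carry slon's own prefix:)

  grep 'monitor_thread - record' /var/log/messages | wc -l    # queries actually sent to the DB
  grep 'monitor_state(' /var/log/messages | wc -l             # entries recorded by monitor_state()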

That suggests to me that Jan's recommendation to optimize by allowing
successive calls to overwrite a repeated entry for the same thread is doing a
lot of good, reducing the DB load by more than half.
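
To make that concrete, here is a minimal sketch of the overwrite-in-place idea.
This is NOT the real monitor_thread.c code; the names and structure are
simplified assumptions, purely to illustrate why 471 monitor_state() calls can
collapse to 214 submitted queries:

/*
 * Sketch: if the calling thread already has an entry queued, overwrite it
 * in place; otherwise append a new one.  The monitor thread then submits
 * at most one row per thread per pass, rather than one row per call.
 */
#include <stdio.h>
#include <pthread.h>

#define STACK_SIZE 32

typedef struct
{
    pthread_t   thread;         /* which slon thread this entry describes */
    char        activity[64];   /* latest reported activity for that thread */
} monitor_entry;

static monitor_entry entries[STACK_SIZE];
static int entry_count = 0;
static pthread_mutex_t stack_lock = PTHREAD_MUTEX_INITIALIZER;

static void
monitor_state_sketch(const char *activity)
{
    pthread_t   self = pthread_self();
    int         i;

    pthread_mutex_lock(&stack_lock);
    for (i = 0; i < entry_count; i++)
    {
        if (pthread_equal(entries[i].thread, self))
        {
            /* repeated call from the same thread: overwrite, don't queue */
            snprintf(entries[i].activity, sizeof(entries[i].activity), "%s", activity);
            pthread_mutex_unlock(&stack_lock);
            return;
        }
    }
    if (entry_count < STACK_SIZE)
    {
        entries[entry_count].thread = self;
        snprintf(entries[entry_count].activity, sizeof(entries[entry_count].activity), "%s", activity);
        entry_count++;
    }
    pthread_mutex_unlock(&stack_lock);
}

int
main(void)
{
    monitor_state_sketch("thread_main: waiting for event");
    monitor_state_sketch("thread_main: processing event");  /* overwrites the first */
    printf("queued entries: %d\n", entry_count);            /* prints 1, not 2 */
    return 0;
}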

I find it reasonable to expect this to be a pretty nominal addition to system
load for Slony.

Poking at the table:

slonyregress1@localhost-> analyze verbose _slony_regress1.sl_components ;
INFO:  analyzing "_slony_regress1.sl_components"
INFO:  "sl_components": scanned 1 of 1 pages, containing 6 live rows and 17
dead rows; 6 rows in sample, 6 estimated total rows
ANALYZE
slonyregress1@localhost-> \c slonyregress2
You are now connected to database "slonyregress2" as user "postgres".
slonyregress2@localhost-> analyze verbose _slony_regress1.sl_components ;
INFO:  analyzing "_slony_regress1.sl_components"
INFO:  "sl_components": scanned 1 of 1 pages, containing 6 live rows and 11
dead rows; 6 rows in sample, 6 estimated total rows
ANALYZE
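
The dead tuples there are just the residue of those in-place updates; if anyone
wants to see the per-component rows themselves, the table can be inspected with
an ordinary query (nothing assumed beyond the table name shown above):

  select * from _slony_regress1.sl_components;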
