Mon Apr 13 07:47:22 PDT 2009
- Previous message: [Slony1-general] Installing Slony-I 1.2.12R and PostgreSQL 8.3
- Next message: [Slony1-general] add table to existing set/namespace
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Bug #81 Filed: http://www.slony.info/bugzilla/show_bug.cgi?id=81 ~BAS On Fri, 27 Feb 2009, Bill Moran wrote: > > I'm seeing a problem similar to that described in this thread: > http://lists.slony.info/pipermail/slony1-general/2008-August/008492.html > > However, I was unable to isolate the cause. The solution was to restart > the slons on all nodes, which unstuck things and all tests show that > the cluster is replicating happily again. > > I'm trying to determine how/why the problem happened. In this case it > was triggered by moving an origin around (as part of a routine disaster > drill). The set moved to the backup without problem, but we didn't notice > that the slon on node 2 had died until we went to move things back. At > that point, the move set hung, and eventually bailed with "timeout exceeded > while waiting for event confirmation" > > If anyone has any insight into this, I'm all ears. Otherwise, I can set > up a test bed if there's anything I can do to provide more information. > > Some basic details: > FreeBSD 6.3 > PostgreSQL 8.3.5 > Slony 1.2.15 > > Cluster = > node 1: masterdb in facil0 > node 2: localslavedb in facil0 > node 3: remoteslavedb in facil1 > > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8393-1] [55583] CONFIG moveSet: set_id=1 old_origin=1 new_origin=3 > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8394-1] [55583] DEBUG1 remoteWorkerThread_3: helper thread for provider 3 created > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8395-1] [55583] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8396-1] [55583] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=3 > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8397-1] [55583] CONFIG storeListen: li_origin=3 li_receiver=2 li_provider=3 > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8398-1] [55583] DEBUG1 remoteWorkerThread_1: helper thread for provider 1 terminated > 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8399-1] [55583] DEBUG1 remoteWorkerThread_1: disconnecting from data provider 1 > 2009 Feb 27 06:31:52 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8400-1] [55583] DEBUG1 remoteWorkerThread_3: connected to data provider 3 on 'dbname=database_name_prod host=remoteslavedb-ifx.bdb.facil1 > 2009 Feb 27 06:31:52 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8400-2] user=database_name_slony password=[redacted]' > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8401-1] [55583] INFO localListenThread: got restart notification > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [2-1] [1058] DEBUG1 slon: retry requested > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8402-1] [55583] DEBUG1 main: scheduler mainloop returned > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8403-1] [55583] INFO remoteListenThread_3: disconnecting from 'dbname=database_name_prod host=remoteslavedb-ifx.bdb.facil1 > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8403-2] user=database_name_slony password=[redacted]' > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8404-1] [55583] INFO remoteListenThread_1: disconnecting from 'dbname=database_name_prod host=masterdb-db.cluster00.facil0 > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8404-2] user=database_name_slony password=[redacted]' > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8405-1] [55583] DEBUG1 syncThread: thread done > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8406-1] [55583] DEBUG1 cleanupThread: thread done > 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8407-1] [55583] DEBUG1 remoteWorkerThread_1: thread done > 2009 Feb 27 06:32:02 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8408-1] [55583] DEBUG1 remoteListenThread_3: thread done > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8409-1] [55583] DEBUG1 remoteListenThread_1: thread done > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8410-1] [55583] DEBUG1 main: done > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [3-1] [1058] DEBUG1 slon: restart of worker > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][1058] [local2] [err] slon[1058]: [1-1] [1058] CONFIG main: slon version 1.2.15 starting up > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [2-1] [27776] CONFIG main: local node id = 2 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [3-1] [27776] CONFIG main: launching sched_start_mainloop > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [4-1] [27776] CONFIG main: loading current cluster configuration > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [5-1] [27776] CONFIG storeNode: no_id=1 no_comment='Master node' > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [6-1] [27776] CONFIG storeNode: no_id=3 no_comment='remote slave node' > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [7-1] [27776] CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="dbname=database_name_prod host=masterdb-db.cluster00.facil0 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [7-2] user=database_name_slony password=[redacted]" pa_connretry=10 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [8-1] [27776] CONFIG storePath: pa_server=3 pa_client=2 pa_conninfo="dbname=database_name_prod host=remoteslavedb-ifx.bdb.facil1 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [8-2] user=database_name_slony password=[redacted]" pa_connretry=10 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [9-1] [27776] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [10-1] [27776] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=3 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [11-1] [27776] CONFIG storeListen: li_origin=3 li_receiver=2 li_provider=3 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [12-1] [27776] CONFIG storeSet: set_id=1 set_origin=3 set_comment='All tables' > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [warning] slon[27776]: [13-1] [27776] WARN remoteWorker_wakeup: node 3 - no worker thread > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [14-1] [27776] CONFIG storeSubscribe: sub_set=1 sub_provider=3 sub_forward='t' > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [warning] slon[27776]: [15-1] [27776] WARN remoteWorker_wakeup: node 3 - no worker thread > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [16-1] [27776] CONFIG enableSubscription: sub_set=1 > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [warning] slon[27776]: [17-1] [27776] WARN remoteWorker_wakeup: node 3 - no worker thread > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [18-1] [27776] CONFIG main: configuration complete - starting threads > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [debug] slon[27776]: [19-1] [27776] DEBUG1 localListenThread: thread starts > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [20-1] [27776] FATAL localListenThread: "select "_database_name_prod".cleanupNodelock(); insert into > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [20-2] "_database_name_prod".sl_nodelock values ( 2, 0, "pg_catalog".pg_backend_pid()); " - ERROR: duplicate key value violates > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [20-3] unique constraint "sl_nodelock-pkey" > 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [2-1] [1058] DEBUG1 slon: shutdown requested > 2009 Feb 27 06:32:24 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [3-1] [1058] DEBUG1 slon: child termination timeout - kill child > 2009 Feb 27 06:32:24 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [4-1] [1058] DEBUG1 slon: done > > -- > Bill Moran > http://www.potentialtech.com > http://people.collaborativefusion.com/~wmoran/ > _______________________________________________ > Slony1-general mailing list > Slony1-general at lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > l8* -lava (Brian A. Seklecki - Pittsburgh, PA, USA) http://www.spiritual-machines.org/ "Show me a young conservative and I'll show you someone with no heart. Show me an old liberal and I'll show you someone with no brains." ~ Winston Churchill
- Previous message: [Slony1-general] Installing Slony-I 1.2.12R and PostgreSQL 8.3
- Next message: [Slony1-general] add table to existing set/namespace
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list