We get this error very often. One common scenario is the following:
We have configured master and slave database clusters on two separate servers. In the master database we created two replication sets, one for tables and another for sequences, and then merged the sequence set into the table set. After that we started the slon daemon and got this error. There is no way of recovering short of restoring the whole cluster. Although we subscribe, no replication occurs.
remoteWorkerThread_1: node -1 not found in runtime configuration.
2010-01-25 15:45:18 IST WARN remoteWorkerThread_1: data copy for set 1 failed - sleep 60 seconds
Please tell us the cause of this problem, or at least how to overcome it without restoring the whole cluster.
What we think is happening is that the subscription information for the set on the subscriber is being deleted (i.e. by an UNSUBSCRIBE SET, though a MERGE SET might behave similarly?) before the ENABLE SUBSCRIPTION is processed by the slon. By the time the event is finally processed, the row in sl_subscription has already been deleted.
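The suspected ordering problem can be illustrated with a small toy model. This is only a sketch, not Slony-I code: the dictionary standing in for sl_subscription, the function names, and the use of -1 as the "no provider found" value are all assumptions made for illustration.

```python
# Toy model of the suspected race. Nothing here is Slony-I's real code;
# the names and the -1 sentinel are illustrative assumptions.

sl_subscription = {}  # (set_id, receiver_node) -> provider_node


def subscribe_set(set_id, provider, receiver):
    """Record the subscription row, as SUBSCRIBE SET would."""
    sl_subscription[(set_id, receiver)] = provider


def unsubscribe_set(set_id, receiver):
    """Delete the subscription row, as UNSUBSCRIBE SET would."""
    sl_subscription.pop((set_id, receiver), None)


def enable_subscription(set_id, receiver):
    """Look up the provider; a missing row yields the sentinel -1."""
    provider = sl_subscription.get((set_id, receiver), -1)
    if provider == -1:
        return "node -1 not found in runtime configuration"
    return f"copying set {set_id} from node {provider}"


# Expected order: the row still exists when the event is processed.
subscribe_set(1, provider=1, receiver=2)
in_order = enable_subscription(1, receiver=2)

# Suspected race: the row is deleted before the event is processed.
unsubscribe_set(1, receiver=2)
raced = enable_subscription(1, receiver=2)
```

In this model the second call fails only because the row was removed between the subscribe and the enable step, which matches the sequence described above.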
Changed version to devel because actually fixing this requires new features.
UNSUBSCRIBE SET should continue to be issued against the subscriber. If the event originated from the set origin instead, the subscriber would have to crawl through the entire backlog before it could finally unsubscribe. That would be a waste.
On a node -1 error, the processing of ENABLE_EVENT should simply confirm the event, assuming that the subscription was canceled via UNSUBSCRIBE.
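A minimal sketch of that proposed handling, again as a toy model rather than the actual slon implementation. The function name handle_enable_event, the dictionary used for the subscription table, and the returned action strings are hypothetical.

```python
# Toy sketch of the proposed behaviour; handle_enable_event and the
# returned action strings are hypothetical, not part of Slony-I.

def handle_enable_event(sl_subscription, set_id, receiver):
    """Decide what to do with an enable event for (set_id, receiver)."""
    provider = sl_subscription.get((set_id, receiver), -1)
    if provider == -1:
        # No subscription row: assume it was canceled via UNSUBSCRIBE
        # and just confirm the event instead of retrying the copy.
        return ("confirm", None)
    # Subscription row present: proceed with the data copy.
    return ("copy", provider)


still_subscribed = handle_enable_event({(1, 2): 1}, set_id=1, receiver=2)
already_canceled = handle_enable_event({}, set_id=1, receiver=2)
```

The point of the design is that a missing row is treated as a completed cancellation rather than as an error to retry every 60 seconds.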
Upon receiving an UNSUBSCRIBE_SET event, the origin of that set will issue yet another UNSUBSCRIBE_SET in order to guard against a possible race condition where a third node, that is a forwarder for the set, receives the initial UNSUBSCRIBE_SET before processing the initial SUBSCRIBE_SET. Without this guard, the forwarder would wrongly think that the node is actually subscribed to the set.
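The effect of the second UNSUBSCRIBE_SET can be sketched from the forwarder's point of view. This is a simplified model that assumes the event re-issued by the origin reaches the forwarder after the origin's SUBSCRIBE_SET; the tuple layout and the apply_events helper are hypothetical.

```python
# Toy model of the forwarder's view. The (kind, set_id, receiver) tuples
# and apply_events are illustrative assumptions, not Slony-I structures.

def apply_events(events):
    """Replay events in arrival order and return the resulting subscriptions."""
    subscribed = set()
    for kind, set_id, receiver in events:
        if kind == "SUBSCRIBE_SET":
            subscribed.add((set_id, receiver))
        elif kind == "UNSUBSCRIBE_SET":
            subscribed.discard((set_id, receiver))
    return subscribed


# Race: the subscriber's UNSUBSCRIBE_SET arrives before the SUBSCRIBE_SET.
racy = [("UNSUBSCRIBE_SET", 1, 2), ("SUBSCRIBE_SET", 1, 2)]

# Guard: the origin re-issues UNSUBSCRIBE_SET, assumed here to arrive
# after its own SUBSCRIBE_SET.
guarded = racy + [("UNSUBSCRIBE_SET", 1, 2)]

without_guard = apply_events(racy)
with_guard = apply_events(guarded)
```

Without the guard the forwarder ends up believing node 2 is subscribed to set 1; with the re-issued event it ends up with no subscription recorded.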