People frequently ask for help figuring out what might be wrong with their cluster. If you think there might be a problem (or even if you don't), the first thing you should do is run the test state scripts. Their output may point you to where the problem lies, and it gives anyone trying to help you the same starting point.
If you aren't already running these scripts hourly against your cluster(s), you really should be.