Jun 23, 2010 14:47
A network blip (or something like it) caused ntp to go haywire, and as a result one of our RAC database nodes rebooted repeatedly. When we shut down the RAC cluster services on that node, the other node rebooted. While watching all of this happen, we saw messages about ntp synchronization being lost, so we started watching ntp closely. Sure enough, ntp was getting anomalous results and jumping the time by 30 seconds, which caused Oracle's cluster services to reboot the nodes.
We asked the administrator of the ntp hosts about the configuration, since our nodes were pointed at hosts A and B as primary and secondary ntp servers, and we had been told that the company's primary ntp server is host C.
Host A was pointed at host C for time synchronization, and host B was pointed at host A, so both of the servers we were using ultimately depended on host A.
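To make that dependency chain explicit, here is a minimal Python sketch. It only models the A, B, C relationships described above; the names `UPSTREAM`, `sync_chain`, and `shared_dependencies` are made up for illustration and nothing here talks to a real ntp daemon.

```python
# Upstream time source for each ntp host, as described in this post.
# Host C is the company's primary server, so it has no upstream in this model.
UPSTREAM = {"A": "C", "B": "A", "C": None}


def sync_chain(host, upstream=UPSTREAM):
    """Return the hosts that `host` depends on for time, nearest first."""
    chain = []
    seen = {host}
    current = upstream.get(host)
    # Stop at the top of the chain, or if the configuration loops back on itself.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = upstream.get(current)
    return chain


def shared_dependencies(hosts, upstream=UPSTREAM):
    """Return the hosts that every listed server either is or depends on."""
    paths = [set([h] + sync_chain(h, upstream)) for h in hosts]
    return set.intersection(*paths)


if __name__ == "__main__":
    for h in ("A", "B"):
        print(h, "->", " -> ".join(sync_chain(h)))
    print("shared:", sorted(shared_dependencies(["A", "B"])))
```

Running it on the "primary" and "secondary" servers shows that both paths go through host A and host C, so the pair was not giving us two independent time sources.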
We're now pointing our cluster nodes directly at host C and have been stable since making the change.