How do I fix the time server running 5-10min behind?


#1

I had been noticing this for a while, but it was only yesterday that the server fell so far behind that I was unable to get packages from either apt or dnf. I don’t remember which anymore; the hours since then have been a blur. The error was about certificates not being valid yet, which is almost never the case. I glanced at my phone on a media console nearby: it was a little over 5 minutes ahead of the computer, but less than 10.
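To illustrate why a lagging clock produces a "not valid yet" error rather than an "expired" one, here is a minimal sketch. The function name, the dates, and the 3-minute and 7-minute figures are all made up for the example; the actual certificate involved is not known.

```python
from datetime import datetime, timedelta, timezone


def certificate_status(now, not_before, not_after):
    """Classify a certificate's validity window as seen by a clock reading `now`."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical values: a certificate issued 3 minutes ago, valid for 90 days.
true_now = datetime(2020, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
not_before = true_now - timedelta(minutes=3)
not_after = not_before + timedelta(days=90)

# A clock that runs 7 minutes behind real time (assumed figure).
lagging_now = true_now - timedelta(minutes=7)

print(certificate_status(true_now, not_before, not_after))     # valid
print(certificate_status(lagging_now, not_before, not_after))  # not yet valid
```

A clock that is behind only trips over validity windows that started very recently, which would explain why the error shows up so rarely.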

That’s ideal screw-you-skew: just big enough to be significant, but not significant enough to be mistaken for one of those weird fractional time zones. Normally it would break Kerberos so fast and so silently that pinpointing the issue would be a nightmare. The only thing that could make it worse is what actually happened here: Kerberos kept working. Domain controllers get the time from the UCM and the rest of the network gets it from the DCs, as in other networks, so every system was wrong in sync, and everything behaved as if nothing had happened until traffic crossed the firewalls.
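A small sketch of the "wrong in sync" effect, assuming the common Kerberos default tolerance of 5 minutes (300 seconds). The host names and the 420-second error are hypothetical stand-ins for the skew described above, not measured values.

```python
from itertools import combinations

MAX_SKEW_SECONDS = 300  # assumed tolerance: the usual Kerberos default of 5 minutes


def within_tolerance(clock_a, clock_b, max_skew=MAX_SKEW_SECONDS):
    """Return True if two clock readings (in seconds) are close enough to each other."""
    return abs(clock_a - clock_b) <= max_skew


true_time = 1_600_000_000  # arbitrary "real" time as a Unix timestamp
shared_error = -420        # hypothetical: every internal host is 7 minutes behind

# Every internal host inherits the same error from the same upstream source.
internal_clocks = {
    name: true_time + shared_error for name in ("ucm", "dc1", "workstation")
}

# Inside the network the hosts agree with each other, so nothing complains.
for (name_a, clock_a), (name_b, clock_b) in combinations(internal_clocks.items(), 2):
    print(name_a, name_b, within_tolerance(clock_a, clock_b))

# Compared against a correct outside clock, the same error is over the limit.
print("workstation vs outside", within_tolerance(internal_clocks["workstation"], true_time))
```

The tolerance only ever compares two clocks against each other, so a shared error is invisible until one side of the comparison is outside the network.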

UCM’s daily workload

For a long time the UCM has been used as a glorified NTP server because it’s 2020 and people hate the phone. Also, I think devices may go into a deep sleep or something, because every once in a while a call to another extension would hit some connection issue… on an isolated network where everything is statically assigned. There’s no load on the UCM or on its subnet.

UCM’s network privileges

On top of doing next to nothing and having resources to spare, the UCM is also among the highest-privileged devices on the network, starting with high QoS: 802.1p/CoS 5 and the equivalent DSCP marking, full-cone NAT, and manually routed IPv6. It’s actually the only host allowed to use UDP port 123; every other host is either blocked or NATed back to the UCM. Even so, there isn’t a lot of traffic going to it, since only rogue traffic gets NATed to it and the vast majority of local NTP traffic is handled by the domain controllers. Lastly, as far as the network goes, inter-VLAN traffic is handled by very fast L3 switches. If a network bottleneck could cause this, that wouldn’t be it, because there is none.

Ruling out network interference

I set up two test NTP servers, one in a PBX and one in a firewall, both virtualized and so obviously against best practices, manually pointed a computer at them and waited for the time to catch up. It did. Soon it read the same time as the mobile phone. I reset the computer back to the time coming from the UCM and tried the next server, with the same result. Both virtualized, resource-sharing servers were more accurate than a rack-mounted dedicated box with no load on it.
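Instead of waiting for a machine's clock to drift toward each server, the comparison can also be made directly with a single SNTP-style query per server. The sketch below uses only the Python standard library and the offset formula from RFC 5905; the function names are mine, and any address passed to `query` is a placeholder for the UCM or a test server.

```python
import socket
import struct
import time

NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def from_ntp(seconds, fraction):
    """Convert a 64-bit NTP timestamp (two 32-bit halves) to Unix seconds."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def clock_offset(t1, t2, t3, t4):
    """RFC 5905 offset of the server clock relative to the local clock.

    t1: client send, t2: server receive, t3: server transmit, t4: client receive.
    A positive result means the server is ahead of the local clock.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def query(server, timeout=2.0):
    """Send one client-mode NTP packet and return (offset, stratum, leap indicator)."""
    # First byte 0x23: leap indicator 0, version 4, mode 3 (client); rest is zero.
    request = b"\x23" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        data, _ = sock.recvfrom(512)
        t4 = time.time()
    # 48-byte header: 4 single-byte fields followed by eleven 32-bit words.
    fields = struct.unpack("!BBbb11I", data[:48])
    leap = fields[0] >> 6                 # 3 means the server says it is unsynchronized
    stratum = fields[1]                   # 0 or 16 also indicate an unusable source
    t2 = from_ntp(fields[11], fields[12])  # receive timestamp
    t3 = from_ntp(fields[13], fields[14])  # transmit timestamp
    return clock_offset(t1, t2, t3, t4), stratum, leap
```

Calling `query("192.0.2.10")` (a documentation-range placeholder address) from a host that is actually allowed to reach UDP port 123 returns the offset in seconds along with the stratum and leap indicator the server reports. Running it against the UCM and against each test server from the same machine gives numbers that can be compared side by side.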

System updates & pool hopping

The last thing I attempted was updating the system, from 1.0.20.8 to 1.0.20.48. Where I live, DST was recently phased out at last, and Apple and others very quickly pushed updates to their systems. On phones at least, they were significant enough to require a reboot. It wasn’t a sudden change, though: summer time was allowed to end as usual, it just won’t come back again. And that’s an hour’s difference, not a few minutes.

Recap

I ruled out the physical network, complete with less-than-ideal successful tests. I updated the device and toyed around with different upstream servers.

I sort of found out that other servers don’t like what the UCM has to say:

I think all that’s left to do is to adjust the time manually, but keeping time is kind of the reason for its existence, and I have a feeling it won’t stay in sync anyway. The other option is to change some very hidden dev-type setting, I assume, because the visible ones are barely any:

Has this happened to anybody?

How is it fixed?

Thanks.


#2