8. Troubleshooting
Last update: June 27, 2022 16:22 UTC (1a7aee0a0)
If you have set up your software, you usually want to know whether it works. This section discusses topics related to configuration, monitoring, troubleshooting, and debugging of NTP.
8.1 Monitoring
8.1.1 How do I confirm my NTP server is working fine?
8.1.2 How do I use peerstats and loopstats?
8.1.3 How can I see the Time Difference between Client and Server?
8.1.4 What does 257 mean as value for reach?
8.1.5 How do I use statistics files?
8.1 Monitoring
Without any doubt, troubleshooting requires monitoring. Somehow you must find out that something is wrong before you wonder how to fix it.
8.1.1 How do I confirm my NTP server is working fine?
One of the quickest commands to verify that ntpd
is still up and running is ntpq -p
. That command will show all configured peers together with their performance data.
As the above command requires periodic invocation to monitor performance, it is also recommended to enable statistic files in ntpd
.
8.1.2 How do I use peerstats and loopstats?
I use the following lines in /etc/ntp.conf
to enable loopfilter statistics. New files are created every day, and the current files are available as /var/log/ntp/peers
and /var/log/ntp/loops
. Older files are archived as /var/log/ntp/peersYYYYMMDD
and /var/log/ntp/loops.YYYYMMDD
:
statistics loopstats
statsdir /var/log/ntp/
filegen peerstats file peers type day link enable
filegen loopstats file loops type day link enable
Usually I only monitor the loops
file. Table 8.1a lists the individual fields of each file.
Table 8.1a: Statistic Files
File Type |
List of Fields |
loopstats |
day, second, offset, drift compensation, estimated error, stability, polling interval |
peerstats |
day, second, address, status, offset, delay, dispersion, skew (variance) |
8.1.3 How can I see the Time Difference between Client and Server?
(By Terje Mathisen) Normally ntpd
maintains an estimate of the time offset. To inspect these offsets, you can use the following commands:
ntpq -p
will display the offsets for each reachable server in milliseconds (ntpdc -p
uses seconds instead).
ntpdc -c loopinfo
will display the combined offset in seconds, as seen at the last poll. If supported, ntpdc -c kerninfo
will display the current remaining correction, just as ntptime
does.
The first command can be used to check what ntpd
thinks the offset and jitter is currently, relative to the preferred/current server. The second command can tell you something about the estimated offset/error all the way to the stratum 1 source. Q: 8.1.2 describes a way to collect such data automatically.
If a PPS source is active, the offset displayed with the second choice is updated periodically, maybe every second.
Sometimes things are wrong and you may want to compare time offsets directly. An easy way is to use ntpdate -d server
to compare the local system time with the time taken from server
.
8.1.4 What does 257 mean as value for reach?
(Inspired by Martin Burnicki) The value displayed in column reach
is octal, and it represents the reachability register. One digit in the range of 0
to 7
represents three bits. The initial value of that register is 0
, and after every poll that register is shifted left by one position. If the corresponding time source sent a valid response, the rightmost bit is set.
During a normal startup the registers values are these: 0
, 1
, 3
, 7
, 17
, 37
, 77
, 177
, 377
Thus 257
in the dual system is 10101111
, saying that two valid responses were not received during the last eight polls. However, the last four polls worked fine.
8.1.5 How do I use statistics files?
You can do a lot of useful things with statistic files before you remove them. For example there is a utility named summary.pl
written in Perl to compute mean values and standard deviation (RMS) from the loopfilter and peer statistics. It will also show exceptional conditions found in these files. Here’s a short example output of summary.pl --dir=/var/log/ntp --start=19990518 --end=19990604
:
loops.19990518
loop 110, -30+/-36.5, rms 6.7, freq 14.95+/-1.149, var 0.612
loops.19990519
loop 113, -26+/-40.3, rms 6.9, freq 12.95+/-3.240, var 1.378
loops.19990520
loop 107, -7+/-32.0, rms 5.7, freq 13.04+/-3.253, var 1.579
loops.19990522
loop 190, 3+/-18.5, rms 2.9, freq 15.48+/-3.715, var 0.604
loops.19990523
loop 146, -5+/-9.2, rms 1.9, freq 15.77+/-0.716, var 0.305
loops.19990604
loop 73, -27+/-29.8, rms 6.9, freq 16.81+/-0.327, var 0.140
Still another utility named plot_summary.pl
can be used to make plots with these summary data. As an alternative you could plot the loopfilter file directly with gnuplot
using the command plot "/var/log/ntp/loops"
using 2:3 with linespoints.
The “GNU” in gnuplot is NOT related to the Free Software Foundation, the naming is just a coincidence (and a long story). Thus gnuplot is not covered by the Gnu copyleft, but rather by its own copyright statement, included in all source code files.
Figure 8.1a was produced with a little more complicated command. It shows yerrorbars
with the estimated errors for offset and frequency respectively.
Figure 8.1a: Plot of estimated Offset and Frequency Error (DCF77)
The reference clock, the antenna, and the computer system were located in an office room without air conditioning.
Now that we are looking at numbers and graphs, let us compare the data of a GPS clock (using PPS) with a typical low-cost clock (not using PPS). Figure 8.1b shows a very small offset for the GPS clock. The frequency is continuously adjusted. In comparison, the DCF77 clock shows a high variation for the offset, but the frequency is adjusted less drastically. Figure 8.1a shows values between those, using a better DCF77 receiver with PPS.
Figure 8.1b: Comparing Offset and Frequency Error of DCF77 and GPS