Once the NTP software distribution has been compiled and installed and the configuration file constructed, the next step is to verify correct operation and fix any bugs that may result. Usually, the command line that starts the daemon is included in the system startup file, so it is executed only at system boot time; however, the daemon can be stopped and restarted from root at any time. Usually, no command-line arguments are required, unless special actions described in the xntpd.8 man page are required. Once started, the daemon will begin sending messages, as specified in the configuration file, and interpreting received messages.
The best way to verify correct operation is to use the ntpq and xntpdc utility programs, either on the server itself or from another machine elsewhere in the network. The ntpq program implements the management functions specified in Appendix A of the NTP specification RFC-1305. The xntpdc program implements additional functions not provided in the standard. Both programs can be used to inspect the state variables defined in the specification and, in the case of xntpdc, additional ones of interest. In addition, the xntpdc program can be used to selectively enable and disable some functions of the daemon while the daemon is running.
To help track down elusive bugs, the daemon can operate in two modes, depending on the presence of the -d command-line debug switch. If the switch is not present, the daemon detaches from the controlling terminal and proceeds autonomously. If one or more -d switches are present, the daemon does not detach and generates special output useful for debugging. In general, interpretation of this output requires reference to the sources.
Some problems are immediately apparent when the daemon first starts running. The most common of these is the lack of an ntp entry (UDP port 123) in the host /etc/services file. Note that NTP does not use TCP in any form. Other problems are apparent in the system log file. The log file should show the startup banner, some cryptic initialization data, and the computed precision value. The next most common problem is incorrect DNS names. Check that each DNS name used in the configuration file responds to the Unix ping command.
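The two startup checks above can be scripted. The following is a minimal sketch in Python, not part of the NTP distribution; the function names are made up for illustration. It looks up the ntp/udp entry through the system services database and reports configured names that do not resolve. Note that it only tests name resolution, so a name that resolves may still fail to answer ping.

```python
import socket

def ntp_service_port(lookup=socket.getservbyname):
    """Return the UDP port registered for 'ntp', or None if the entry is missing."""
    try:
        return lookup("ntp", "udp")
    except OSError:
        # getservbyname raises OSError when the services database has no entry
        return None

def unresolved_names(names, resolve=socket.gethostbyname):
    """Return the names from the configuration file that fail to resolve."""
    bad = []
    for name in names:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError
            bad.append(name)
    return bad

if __name__ == "__main__":
    port = ntp_service_port()
    print("ntp/udp entry:", port if port is not None else "MISSING")
```

The lookup and resolve parameters exist only so that the functions can be exercised without touching the real services file or the network.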
When first started, the daemon normally polls the servers listed in the configuration file at 64-second intervals. In order to allow a sufficient number of samples for the NTP algorithms to reliably discriminate between correctly operating servers and possible intruders, at least four valid messages from at least one server are required before the daemon can set the local clock. However, if the current local time is more than 1000 seconds in error relative to the server time, the daemon will not set the local clock; instead, it will plant a message in the system log and shut down. In that case, it is necessary first to set the local clock to within 1000 seconds, either from a time-of-year hardware clock, by using the ntpdate program, or manually by eyeball and wristwatch.
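The startup rule can be summarized as a small decision function. This is only a sketch of the behavior described above, not the daemon's code; the function name, the return strings and the order in which the two conditions are tested are assumptions made for illustration.

```python
MIN_VALID_MESSAGES = 4        # from the text: four valid messages from one server
MAX_STARTUP_OFFSET = 1000.0   # from the text: seconds of error tolerated at startup

def startup_action(valid_messages, offset_seconds):
    """Describe what the daemon is expected to do with the first measurements.

    valid_messages is the largest count of valid messages received from any
    single server; offset_seconds is the apparent local clock error.
    """
    if valid_messages < MIN_VALID_MESSAGES:
        return "wait"            # not enough samples yet
    if abs(offset_seconds) > MAX_STARTUP_OFFSET:
        return "log-and-exit"    # operator must set the clock first
    return "set-clock"
```

For example, startup_action(4, 1500.0) gives "log-and-exit", which is the case where ntpdate or a manual setting is needed before restarting the daemon.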
After starting the daemon, run the ntpq program using the -n switch, which will avoid possible distractions due to name resolution problems. Use the pe command to display a billboard showing the status of configured peers and possibly other clients poking the daemon. After operating for a few minutes, the display should be something like:
ntpq>pe
     remote        refid      st when poll reach   delay  offset    disp
========================================================================
+128.4.2.6     132.249.16.1    2  131  256   373    9.89   16.28   23.25
*128.4.1.20    .WWVB.          1  137  256   377  280.62   21.74   20.23
-128.8.2.88    128.8.10.1      2   49  128   376  294.14    5.94   17.47
+128.4.2.17    .WWVB.          1  173  256   377  279.95   20.56   16.40
The host addresses shown in the remote column should agree with the DNS entries in the configuration file, plus any peers not mentioned in the file, at the same or a lower stratum than yours, that happen to be configured to peer with you. Be prepared for surprises in cases where the peer has multiple addresses or multiple names. The refid entry shows the current source of synchronization for each peer, while the st entry reveals its stratum and the poll entry the polling interval, in seconds. The when entry shows the time since the peer was last heard, normally in seconds, while the reach entry shows the status of the reachability register (see RFC-1305), which is in octal format. The remaining entries show the latest delay, offset and dispersion computed for the peer, in milliseconds.
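Since the reach entry is octal, it helps to expand it into bits. The sketch below assumes the register behaves as the 8-bit shift register described in RFC-1305, with the most recent poll in the least significant bit; the function name is made up for illustration.

```python
def decode_reach(reach_octal):
    """Expand the octal reach value into eight booleans, most recent poll first."""
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(8)]

if __name__ == "__main__":
    for reach in ("377", "373", "376"):
        history = decode_reach(reach)
        marks = "".join("1" if ok else "0" for ok in history)
        print(reach, marks, "answered", sum(history), "of 8 polls")
```

Under that assumption, 377 means all of the last eight polls were answered, 373 means the third most recent poll was missed, and 376 means the most recent poll was missed.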
The tattletale character at the left margin displays the synchronization status of each peer. The currently selected peer is marked *, while additional peers designated acceptable for synchronization, but not currently selected, are marked +. Peers marked * and + are included in a weighted average computation to set the local clock; the data produced by peers marked with other symbols are discarded. See the ntpq documentation for the meaning of these symbols.
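When many peers are configured, it can be convenient to pick out the contributing peers mechanically. The following sketch parses billboard text like the sample above; the function and field names are illustrative, and the nine-column layout is assumed from that sample rather than taken from the ntpq sources.

```python
SAMPLE_BILLBOARD = """\
remote refid st when poll reach delay offset disp
+128.4.2.6 132.249.16.1 2 131 256 373 9.89 16.28 23.25
*128.4.1.20 .WWVB. 1 137 256 377 280.62 21.74 20.23
-128.8.2.88 128.8.10.1 2 49 128 376 294.14 5.94 17.47
+128.4.2.17 .WWVB. 1 173 256 377 279.95 20.56 16.40
"""

def parse_pe_billboard(text):
    """Return one dict per peer line; header and separator lines are skipped."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # prompt, separator or blank line
        first = fields[0]
        # An unmarked peer has a blank tally, so its first field is the address.
        if first[0].isalnum():
            tally, remote = " ", first
        else:
            tally, remote = first[0], first[1:]
        try:
            peers.append({
                "tally": tally, "remote": remote, "refid": fields[1],
                "st": int(fields[2]), "when": fields[3], "poll": int(fields[4]),
                "reach": fields[5], "delay": float(fields[6]),
                "offset": float(fields[7]), "disp": float(fields[8]),
            })
        except ValueError:
            continue  # column header line
    return peers

def contributing_peers(peers):
    """Peers marked * or +, which the text says enter the weighted average."""
    return [p for p in peers if p["tally"] in ("*", "+")]
```

The when field is kept as a string because it is only normally in seconds.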
Additional details for each peer separately can be determined by the following procedure. First, use the as command to display an index of association identifiers, such as
ntpq>as
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 11670  7414   no   yes   ok   synchr.    reachable   1
  2 11673  7614   no   yes   ok   sys.peer   reachable   1
  3 11833  7314   no   yes   ok   outlyer    reachable   1
  4 11868  7414   no   yes   ok   synchr.    reachable   1
Each line in this billboard is associated with the corresponding line in the pe billboard above. Next, use the rv command and the respective identifier to display a detailed synopsis of the selected peer, such as
ntpq>rv 11670
status=7414 reach, auth, sel_sync, 1 event, event_reach
srcadr=128.4.2.6, srcport=123, dstadr=128.4.2.7, dstport=123, keyid=1,
stratum=2, precision=-10, rootdelay=362.00, rootdispersion=21.99,
refid=132.249.16.1,
reftime=af00bb44.849b0000 Fri, Jan 15 1993 4:25:40.517,
delay= 9.89, offset= 16.28, dispersion=23.25, reach=373, valid=8,
hmode=2, pmode=1, hpoll=8, ppoll=10, leap=00, flash=0x0,
org=af00bb48.31a90000 Fri, Jan 15 1993 4:25:44.193,
rec=af00bb48.305e3000 Fri, Jan 15 1993 4:25:44.188,
xmt=af00bb1e.16689000 Fri, Jan 15 1993 4:25:02.087,
filtdelay= 16.40 9.89 140.08 9.63 9.72 9.22 10.79 122.99,
filtoffset= 13.24 16.28 -49.19 16.04 16.83 16.49 16.95 -39.43,
filterror= 16.27 20.17 27.98 31.89 35.80 39.70 43.61 47.52
A detailed explanation of the fields in this billboard is beyond the scope of this discussion; however, most variables defined in the specification RFC-1305 can be found. The most useful portion for debugging is the last three lines, which give the roundtrip delay, clock offset and dispersion for each of the last eight measurement rounds, all in milliseconds. Note that the dispersion, which is an estimate of the error, increases as the age of the sample increases. From these data, it is usually possible to determine the incidence of severe packet loss, network congestion, and unstable local clock oscillators. There are no hard and fast rules here, since every case is unique; however, if one or more of the rounds show zeros, or if the clock offset changes dramatically in the same direction for each round, there is cause for alarm.
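The two warning signs just mentioned can be checked mechanically. The sketch below is one possible reading of them, not a rule taken from the NTP sources: a round counts as zero when both its delay and offset are zero, and "dramatically" is represented by a hypothetical drift_threshold_ms parameter that should be tuned to the path in question. The function names are illustrative.

```python
def parse_filter_line(line):
    """Split a line such as 'filtoffset= 13.24 16.28 ...,' into (name, values)."""
    name, _, rest = line.partition("=")
    values = [float(token) for token in rest.replace(",", " ").split()]
    return name.strip(), values

def filter_warnings(delays, offsets, drift_threshold_ms=50.0):
    """Return warning strings for the eight-round filter data (milliseconds)."""
    warnings = []
    # Rounds in which nothing was measured show up as zeros.
    empty = [i for i, (d, o) in enumerate(zip(delays, offsets))
             if d == 0.0 and o == 0.0]
    if empty:
        warnings.append("zero rounds at positions %s" % empty)
    # Offset moving the same way in every round suggests a frequency problem.
    steps = [a - b for a, b in zip(offsets, offsets[1:])]
    one_way = bool(steps) and (all(s > 0 for s in steps) or all(s < 0 for s in steps))
    if one_way and abs(offsets[0] - offsets[-1]) >= drift_threshold_ms:
        warnings.append("offset moves steadily in one direction")
    return warnings
```

Applied to the filtdelay and filtoffset lines of the sample billboard above, this returns an empty list: the two large offsets there coincide with delay spikes and do not form a steady trend.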
Finally, the state of the local clock can be determined using the rv command (without the argument), such as
ntpq>rv
status=0664 leap_none, sync_ntp, 6 events, event_peer/strat_chg
system="UNIX", leap=00, stratum=2, rootdelay=280.62,
rootdispersion=45.26, peer=11673, refid=128.4.1.20,
reftime=af00bb42.56111000 Fri, Jan 15 1993 4:25:38.336, poll=8,
clock=af00bbcd.8a5de000 Fri, Jan 15 1993 4:27:57.540, phase=21.147,
freq=13319.46, compliance=2
The most useful data in this billboard show when the clock was last adjusted (reftime), together with its status and most recent exception event. An explanation of these data is in the specification RFC-1305.
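The hexadecimal part of reftime and the other timestamps is an NTP timestamp: 32 bits of seconds since 1 January 1900 and 32 bits of fraction. A short sketch of the conversion follows; the function name is made up for illustration, and the result is expressed in UTC.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

def decode_ntp_timestamp(hex_stamp):
    """Convert 'ssssssss.ffffffff' (hex seconds and fraction) to a UTC datetime."""
    seconds_hex, fraction_hex = hex_stamp.split(".")
    seconds = int(seconds_hex, 16)
    fraction = int(fraction_hex, 16) / 2**32
    return NTP_EPOCH + timedelta(seconds=seconds,
                                 microseconds=round(fraction * 1_000_000))

if __name__ == "__main__":
    # Value from the peer billboard above; prints 1993-01-15 04:25:40.517990+00:00
    print(decode_ntp_timestamp("af00bb44.849b0000"))
```

The decoded value agrees with the date that ntpq prints next to the same hex string in the peer billboard.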
When nothing seems to happen in the pe billboard after some minutes, there may be a network problem. The most common network problem is an access-controlled router on the path to the selected peer. No known public NTP time server selectively restricts access at this time, although this may change in future; however, many private networks do. It also may be the case that the server is down or running in unsynchronized mode due to a local problem. Use the ntpq program to spy on the server's variables in the same way you can spy on your own.
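A quick way to tell a blocked path from a misbehaving daemon is to send a single client-mode packet and see whether anything comes back. The sketch below is independent of the NTP distribution and assumes the 48-byte header layout of RFC-1305 (version 3); the function names are illustrative. A timeout from probe() suggests a filtered path or a dead server.

```python
import socket
import struct

NTP_PORT = 123
NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 to 1970-01-01

def build_request():
    """48-byte client request: LI=0, VN=3, Mode=3, all other fields zero."""
    return bytes([0x1B]) + bytes(47)

def parse_reply(data):
    """Extract a few header fields from a server reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    first = data[0]
    xmt_seconds, xmt_fraction = struct.unpack("!II", data[40:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 7,
        "mode": first & 7,
        "stratum": data[1],
        "xmt_unix": xmt_seconds - NTP_TO_UNIX + xmt_fraction / 2**32,
    }

def probe(server, timeout=5.0):
    """Send one request over UDP; raises socket.timeout if nothing answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, NTP_PORT))
        data, _ = sock.recvfrom(512)
    return parse_reply(data)
```

A reply with a leap value of 3 indicates that the server itself reports an unsynchronized clock.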
Once the daemon has set the local clock, it will continuously track the discrepancy between local time and NTP time and adjust the local clock accordingly. There are two components of this adjustment, time and frequency. These adjustments are automatically determined by the clock discipline algorithm, which functions as a hybrid phase/frequency feedback loop. The behavior of this algorithm is carefully controlled to minimize residual errors due to network jitter and frequency variations of the local clock hardware oscillator that normally occur in practice. However, when started for the first time, the algorithm may take some time to converge on the intrinsic frequency error of the host machine.
It has sometimes been the experience that the local clock oscillator frequency error is too large for the NTP discipline algorithm, which can correct frequency errors as large as 30 seconds per day. There are two possibilities that may result in this problem. First, the hardware time-of-year clock chip must be disabled when using NTP, since this can destabilize the discipline process. This is usually done using the tickadj program and the -s command line argument, but other means may be necessary. For instance, in the Sun Solaris kernel, this must be done using a command in the system startup file.
Normally, the daemon will adjust the local clock in small steps in such a way that system and user programs are unaware of its operation. The adjustment process operates continuously unless the apparent clock error exceeds 128 milliseconds, which for most Internet paths is quite a rare event. If the event is simply an outlier due to an occasional network delay spike, the correction is simply discarded; however, if the apparent time error persists for an interval of about 20 minutes, the local clock is stepped to the new value (as an option, the daemon can be compiled to slew at an accelerated rate to the new value, rather than be stepped). This behavior is designed to resist errors due to severely congested network paths, as well as errors due to confused radio clocks upon the epoch of a leap second.
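The handling of large offsets can be pictured as a small state machine. The sketch below uses the 128-millisecond and 20-minute figures from the text; everything else, including the class name, the return strings and the way persistence is tracked, is an assumption made for illustration and is not the daemon's actual algorithm.

```python
STEP_THRESHOLD = 0.128       # seconds, from the text
PERSIST_INTERVAL = 20 * 60   # seconds, "about 20 minutes" in the text

class StepGuard:
    """Decide what to do with each new offset sample."""

    def __init__(self):
        self.exceeded_since = None  # time the offset first exceeded the threshold

    def update(self, now, offset):
        """now and offset are in seconds; returns 'adjust', 'discard' or 'step'."""
        if abs(offset) <= STEP_THRESHOLD:
            self.exceeded_since = None
            return "adjust"          # normal gradual correction
        if self.exceeded_since is None:
            self.exceeded_since = now
        if now - self.exceeded_since >= PERSIST_INTERVAL:
            self.exceeded_since = None
            return "step"            # error persisted, set the clock outright
        return "discard"             # treat as a possible delay spike

if __name__ == "__main__":
    guard = StepGuard()
    for t, off in [(0, 0.010), (64, 0.500), (128, 0.004), (192, 0.500), (1392, 0.500)]:
        print(t, off, guard.update(t, off))
```

In this run, the isolated 0.5-second sample at 64 seconds is discarded, while the one that still stands 20 minutes after it first appeared at 192 seconds results in a step.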
If the ntpq or xntpdc programs do not show that messages are being received by the daemon, or if received messages do not result in correct synchronization, verify the following:
Verify that the /etc/services file on the host machine is configured to accept UDP packets on the NTP port 123. NTP is specifically designed to use UDP and does not respond to TCP.
Check the system log for xntpd messages about configuration errors, name-lookup failures or initialization problems.
Using the xntpdc program and iostats command, verify that the received packets and packets sent counters are incrementing. If the packets sent counter does not increment and the configuration file includes designated servers, something may be wrong in the network configuration of the xntpd host. If this counter does increment and packets are actually being sent to the network, but the received packets counter does not increment, something may be wrong in the network or the server may not be responding.
If the rec timestamp in the rv billboard for the peer shows a date in 1972, received packets are probably being discarded for some reason. There is a handy, undocumented state variable flash visible in the same billboard. The value is in hex and is normally zero (OK). However, if something is wrong, the bits of this variable, reading from the right, correspond to the sanity checks listed in Section 3.4.3 of the NTP specification RFC-1305. A nonzero bit indicates that the associated sanity check failed.
If the org, rec and xmt timestamps in the rv billboard for the peer appear current, but the local clock is not set (a set clock is indicated by a stratum number less than 16 in the output of the rv command without arguments), verify that valid clock offset, roundtrip delay and dispersion are displayed for at least one peer. The clock offset should be less than 1000 seconds, the roundtrip delay less than one second and the dispersion less than one second.
Use the xntpdc program (or a temporary configuration file) and the disable pll command to prevent the xntpd daemon from setting the clock. Using the ntpq or xntpdc programs, watch the apparent offset as it varies over time to determine the intrinsic frequency error. If the error increases by more than 22 milliseconds per 64-second poll interval, the intrinsic frequency error must be reduced by some means. The easiest way to do this is with the tickadj program and the -t command line argument.
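The last check amounts to fitting a straight line through the observed offsets. The sketch below is a hedged illustration, assuming the offsets were noted by hand from ntpq while the loop was disabled; the function names are made up. It estimates the drift with a least-squares slope and compares it with the 22 milliseconds per 64 seconds limit, which works out to about 344 parts per million and agrees with the 30 seconds per day (about 347 ppm) quoted earlier.

```python
LIMIT_PPM = 22.0 / 64.0 * 1000.0   # 22 ms per 64 s; 1 ms/s equals 1000 ppm
DAILY_LIMIT_PPM = 30.0 / 86400.0 * 1e6   # 30 s per day, for comparison

def frequency_error_ppm(samples):
    """Least-squares drift estimate.

    samples is a list of (time_seconds, offset_milliseconds) pairs taken while
    the daemon is prevented from adjusting the clock.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span a nonzero time interval")
    return num / den * 1000.0   # ms per second -> parts per million

def needs_tick_adjustment(samples):
    """True if the estimated drift exceeds the limit given in the text."""
    return abs(frequency_error_ppm(samples)) > LIMIT_PPM

if __name__ == "__main__":
    observed = [(0, 0.0), (64, 10.0), (128, 20.0), (192, 30.0)]  # made-up data
    print(round(frequency_error_ppm(observed), 2), "ppm; limit", LIMIT_PPM)
```

With the made-up data shown, the drift is 10 milliseconds per poll interval, or about 156 ppm, which is within the limit.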