
PTP Time Synchronization Delay Asymmetry Case Study

Telecommunications engineering case study on PTP time synchronization error caused by packet delay asymmetry, covering timestamp exchange, offset calculation, path evidence, QoS, correction, and release validation.

Packet time synchronization can fail even when the network has low average latency. The Precision Time Protocol (PTP) estimates clock offset from two-way timestamp exchanges. If the forward and reverse packet paths have different delays, the computed offset is biased. The result is a system that appears synchronized according to protocol state, while field events, sampled waveforms or control records are shifted in time.

This case study follows a remote industrial monitoring network after a backhaul failover. Devices remain connected and packet loss is low, but time-aligned measurements disagree by tens of microseconds. The problem is traced to asymmetric packet delay through a microwave backup path and timestamping that did not account for the asymmetry.

The case is simplified for engineering learning. Real timing networks must follow the applicable time-synchronization profile, hardware capabilities, oscillator holdover design, security controls, service-provider boundaries, equipment documentation and commissioning procedure.

Case Context

A remote technical site sends event records and sampled measurements to an operations center. The measurement system requires time error within:

|TE| \le 5\ \mu\text{s}

during normal operation and during planned path failover.

The network uses a packet time-synchronization protocol with a grandmaster clock at the operations center and remote devices synchronized through access switches, an aggregation router, a packet microwave backup hop and a fiber metro segment. During normal fiber operation the system passes time-error tests. During a fiber maintenance outage, traffic moves to the microwave backup path. The protocol still reports a locked state, but independent GPS-referenced test equipment at the remote site shows a time error near 45 to 50\ \mu\text{s}.

The central decision is:

Is the time error caused by oscillator drift, packet delay variation, timestamp resolution, or a systematic delay asymmetry that must be corrected before release?

The evidence points to systematic asymmetry.

Simplified Network State

Use the following simplified data from the failed acceptance test.

Quantity | Symbol | Value
time-error requirement | TE_{max} | 5\ \mu\text{s}
protocol lock state | | locked
packet loss during test | | below 0.01\%
traffic condition | | microwave backup under production load
master-to-slave measured delay term | t_2-t_1 | 1284\ \mu\text{s}
slave-to-master measured delay term | t_4-t_3 | 1190\ \mu\text{s}
independent measured time error | | 46 to 49\ \mu\text{s}
timestamp method | | hardware timestamp at switch ingress, software timestamp at one edge device
forward path class | | shared class with telemetry bursts
reverse path class | | management/control class

The protocol messages are not being dropped. The fault is not gross reachability. It is a measurement and path-symmetry problem.

PTP Offset Calculation

In a simplified two-way timestamp exchange:

  • t_1 is the master transmit time for the Sync message;
  • t_2 is the slave receive time for the Sync message;
  • t_3 is the slave transmit time for the Delay_Req message;
  • t_4 is the master receive time for the Delay_Req message.

The ordinary symmetric-delay estimate of slave clock offset is:

\displaystyle \theta=\frac{(t_2-t_1)-(t_4-t_3)}{2}

The mean path delay estimate is:

\displaystyle d=\frac{(t_2-t_1)+(t_4-t_3)}{2}

These formulas assume that master-to-slave and slave-to-master delays are equal after all timestamp corrections. That assumption is exactly what the incident violates.
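The two formulas translate directly into code. The following is a minimal Python sketch, assuming all four timestamps are in the same unit (microseconds here) and already include any correction fields the equipment applies. The function name `ptp_offset_and_delay` and the example timestamps are hypothetical. The example uses a symmetric path with a true offset of 10 µs, the one case where the estimate is exact.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Return (offset, mean_path_delay) under the symmetric-delay assumption.

    t1: master transmit time of Sync (master clock)
    t2: slave receive time of Sync (slave clock)
    t3: slave transmit time of Delay_Req (slave clock)
    t4: master receive time of Delay_Req (master clock)
    """
    forward = t2 - t1  # master-to-slave delay term (includes +offset)
    reverse = t4 - t3  # slave-to-master delay term (includes -offset)
    offset = (forward - reverse) / 2.0
    mean_path_delay = (forward + reverse) / 2.0
    return offset, mean_path_delay


# Hypothetical symmetric path: 1000 us each way, slave clock 10 us ahead of master.
theta, d = ptp_offset_and_delay(t1=0.0, t2=1010.0, t3=2000.0, t4=2990.0)
print(theta, d)  # 10.0 1000.0
```

When the two directions differ, the same function returns a biased offset with no indication that anything is wrong.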

Step 1: Compute the Reported Offset

Use the measured delay terms:

t_2-t_1=1284\ \mu\text{s}
t_4-t_3=1190\ \mu\text{s}

Estimated offset:

\displaystyle \theta=\frac{1284-1190}{2}=47\ \mu\text{s}

Mean path delay:

\displaystyle d=\frac{1284+1190}{2}=1237\ \mu\text{s}

The protocol servo interprets the 47\ \mu\text{s} value as clock offset. If the actual clocks were nearly aligned before the asymmetric path was introduced, the servo will correct in the wrong direction and create a real time error of roughly the same magnitude.
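The wrong-direction correction can be reproduced with a toy model. It assumes fixed delays of 1284 µs forward and 1190 µs reverse (no jitter), clocks aligned at the start, and an idealized servo that removes the whole estimate in one step. The helper names are hypothetical. Real servos filter and steer gradually, but they settle at the same biased point.

```python
def exchange(true_offset: float, d_ms: float, d_sm: float) -> float:
    """One two-way exchange; returns the symmetric-delay offset estimate.

    true_offset is slave clock minus master clock, in microseconds.
    """
    t1 = 0.0                         # master sends Sync (master time)
    t2 = t1 + d_ms + true_offset     # slave receives Sync (slave time)
    t3 = t2 + 500.0                  # slave sends Delay_Req later (slave time)
    t4 = (t3 - true_offset) + d_sm   # master receives Delay_Req (master time)
    return ((t2 - t1) - (t4 - t3)) / 2.0


def run_ideal_servo(d_ms: float, d_sm: float, steps: int = 3) -> float:
    """Return the real slave time error after an idealized servo has run."""
    true_offset = 0.0  # clocks start aligned
    for _ in range(steps):
        estimate = exchange(true_offset, d_ms, d_sm)
        true_offset -= estimate  # idealized servo: remove the full estimate
    return true_offset


print(run_ideal_servo(d_ms=1284.0, d_sm=1190.0))  # -47.0
```

After the first correction the protocol estimate drops to zero, so the device looks locked, while the slave clock is actually 47 µs behind the master.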

Engineering Comment

The mean path delay looks plausible. That can mislead the review. The issue is not that the path is long. The issue is that the two directions are not equal, so half the directional difference appears as clock offset.

Step 2: Relate Asymmetry to Time Error

Let the true master-to-slave delay be:

d_{MS}

and the true slave-to-master delay be:

d_{SM}

If the protocol assumes symmetry, the offset bias caused by delay asymmetry is:

\displaystyle TE_{asym}\approx\frac{d_{MS}-d_{SM}}{2}
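The factor of one half follows from writing each measured term in terms of the true directional delays. In the sketch below, \theta_{true} denotes the true slave-minus-master offset (a symbol introduced here for the derivation), while \theta and d are the estimates defined above:

```latex
\begin{aligned}
t_2 - t_1 &= d_{MS} + \theta_{true} \\
t_4 - t_3 &= d_{SM} - \theta_{true} \\
\theta = \frac{(t_2-t_1)-(t_4-t_3)}{2} &= \theta_{true} + \frac{d_{MS}-d_{SM}}{2} \\
d = \frac{(t_2-t_1)+(t_4-t_3)}{2} &= \frac{d_{MS}+d_{SM}}{2}
\end{aligned}
```

The mean-path-delay estimate does not depend on \theta_{true} or on the split between directions, which is why it can look plausible while the offset estimate carries the full bias.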

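Treating the clocks as aligned when these terms were captured (the same assumption as in Step 1), the measured terms approximate the directional delays:

d_{MS}-d_{SM}\approx1284-1190=94\ \mu\text{s}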

Therefore:

\displaystyle TE_{asym}\approx\frac{94}{2}=47\ \mu\text{s}

The requirement is:

|TE|\le5\ \mu\text{s}

The asymmetry-driven error is:

47\ \mu\text{s}>5\ \mu\text{s}

The timing service exceeds the requirement by a factor of about 9.4, close to one order of magnitude.

Step 3: Check Whether Jitter Alone Explains the Fault

Packet delay variation is present, but the offset error is stable. During a 30-minute test, the measured directional delay statistics are:

Direction | Median delay term | p95 delay term | p99 delay term
master to slave | 1284\ \mu\text{s} | 1316\ \mu\text{s} | 1341\ \mu\text{s}
slave to master | 1190\ \mu\text{s} | 1204\ \mu\text{s} | 1217\ \mu\text{s}

Directional asymmetry at the median is:

A_{50}=1284-1190=94\ \mu\text{s}

Directional asymmetry at p95 is:

A_{95}=1316-1204=112\ \mu\text{s}

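Treating the difference of per-direction percentiles as a screen (the two p95 values need not come from the same exchange), the corresponding p95 offset bias is: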

\displaystyle TE_{95}\approx\frac{112}{2}=56\ \mu\text{s}
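The percentile screen can be sketched as follows, assuming raw per-exchange delay terms are available for each direction. It uses a simple nearest-rank percentile; other percentile definitions give slightly different values. The function names and the four-sample lists are hypothetical toy data, chosen only so the screen lands on the table's median and p95 values.

```python
import math
from typing import Sequence, Tuple


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def asymmetry_screen(ms_terms: Sequence[float], sm_terms: Sequence[float],
                     pct: float) -> Tuple[float, float]:
    """Return (A_pct, implied offset bias A_pct / 2) for one percentile."""
    a = nearest_rank_percentile(ms_terms, pct) - nearest_rank_percentile(sm_terms, pct)
    return a, a / 2.0


# Hypothetical toy samples in microseconds, not the case-study capture.
ms_terms_us = [1280.0, 1284.0, 1290.0, 1316.0]
sm_terms_us = [1188.0, 1190.0, 1195.0, 1204.0]
print(asymmetry_screen(ms_terms_us, sm_terms_us, 50))  # (94.0, 47.0)
print(asymmetry_screen(ms_terms_us, sm_terms_us, 95))  # (112.0, 56.0)
```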

Engineering Comment

Random jitter would create a time-error distribution around the correct value if the timestamp filtering and servo design were adequate. Here the median itself is wrong. That is a systematic asymmetry problem with additional packet delay variation on top.
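A small Monte Carlo sketch makes the distinction concrete. It assumes aligned clocks, fixed base delays and independent zero-mean uniform jitter of ±20 µs in each direction; this is an illustrative choice, not the measured delay distribution. The helper name is hypothetical. With a symmetric base path the median estimate sits near zero. With the backup-path base delays it sits near 47 µs regardless of sample count, because filtering removes jitter but not bias.

```python
import random
import statistics
from typing import List


def offset_estimates(n: int, d_ms: float, d_sm: float,
                     jitter_us: float, seed: int = 1) -> List[float]:
    """Per-exchange offset estimates with aligned clocks and zero-mean jitter."""
    rng = random.Random(seed)  # seeded for repeatable results
    estimates = []
    for _ in range(n):
        fwd = d_ms + rng.uniform(-jitter_us, jitter_us)
        rev = d_sm + rng.uniform(-jitter_us, jitter_us)
        estimates.append((fwd - rev) / 2.0)
    return estimates


symmetric = offset_estimates(10_001, d_ms=1237.0, d_sm=1237.0, jitter_us=20.0)
asymmetric = offset_estimates(10_001, d_ms=1284.0, d_sm=1190.0, jitter_us=20.0)
print(round(statistics.median(symmetric), 1))   # close to 0
print(round(statistics.median(asymmetric), 1))  # close to 47
```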

Step 4: Identify the Network Cause

The path review finds three contributors.

Contributor | Evidence | Timing consequence
microwave scheduler uses different queues by direction | forward event messages share a class with bursty telemetry | forward Sync messages see extra variable delay
reverse Delay_Req messages use a lower-load management class | reverse path has lower median delay | protocol sees a spurious positive offset
one edge device uses software timestamping | timestamp placed after local processing | local delay is load-dependent and not fully corrected

The fiber path had nearly symmetric delay. The microwave failover path did not. The network design treated the backup path as a bandwidth and availability path, but the service requirement was actually a timing requirement.
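One way to catch the scheduler contributors before a failover is to compare, per path, the traffic class each PTP event message receives. In the two-way exchange used here, Sync travels master to slave and Delay_Req travels slave to master, so the check is that both land in the same controlled class. The sketch assumes the class maps have already been exported from the devices into dictionaries. The path names, class names and function name are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical export: traffic class applied to each PTP event message per path.
CLASS_MAP: Dict[str, Dict[str, str]] = {
    "fiber_primary": {"Sync": "timing", "Delay_Req": "timing"},
    "microwave_backup": {"Sync": "telemetry_shared", "Delay_Req": "mgmt_control"},
}
TIMING_CLASSES: Set[str] = {"timing"}  # hypothetical controlled low-jitter classes


def audit_event_classes(class_map: Dict[str, Dict[str, str]],
                        allowed: Set[str]) -> List[str]:
    """List paths where event messages are classed differently or uncontrolled."""
    findings = []
    for path, classes in class_map.items():
        fwd = classes.get("Sync")       # master-to-slave event message
        rev = classes.get("Delay_Req")  # slave-to-master event message
        if fwd != rev:
            findings.append(f"{path}: Sync in {fwd!r}, Delay_Req in {rev!r}")
        for name, cls in (("Sync", fwd), ("Delay_Req", rev)):
            if cls not in allowed:
                findings.append(f"{path}: {name} not in a controlled class ({cls!r})")
    return findings


for finding in audit_event_classes(CLASS_MAP, TIMING_CLASSES):
    print(finding)
```

A class-map audit only covers the queueing contributor; the software-timestamping contributor still has to be found by reviewing where each device takes its timestamps.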

Step 5: Correct the Error Budget

The time-error budget must include asymmetry, timestamp uncertainty and residual packet delay variation.

Use a simple root-sum-square screen for independent residual terms after corrective action:

u_{TE}=\sqrt{u_{asym}^2+u_{ts}^2+u_{servo}^2+u_{holdover}^2}

Proposed corrective actions target:

Term | Target residual
calibrated path asymmetry | u_{asym}=1.5\ \mu\text{s}
timestamp uncertainty | u_{ts}=0.5\ \mu\text{s}
servo filtering under load | u_{servo}=1.2\ \mu\text{s}
short holdover during failover | u_{holdover}=1.0\ \mu\text{s}

Then:

u_{TE}=\sqrt{1.5^2+0.5^2+1.2^2+1.0^2}=\sqrt{4.94}
u_{TE}\approx2.2\ \mu\text{s}

Using a conservative coverage factor of 2:

TE_{expanded}\approx2u_{TE}=4.4\ \mu\text{s}

That fits within the 5\ \mu\text{s} requirement, but only if the corrective controls are actually verified under the same load and failover conditions.
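The budget arithmetic can be sketched in Python, using the target residuals from the table and the coverage factor of 2 from the text. The dictionary keys and the function name are illustrative, and the root-sum-square form assumes the residual terms are independent.

```python
import math
from typing import Dict


def rss(terms_us: Dict[str, float]) -> float:
    """Root-sum-square of residual terms assumed independent (screening only)."""
    return math.sqrt(sum(v * v for v in terms_us.values()))


# Target residuals from the corrective-action table, in microseconds.
budget_us = {"asym": 1.5, "ts": 0.5, "servo": 1.2, "holdover": 1.0}
K = 2.0          # coverage factor used in the text
TE_MAX_US = 5.0  # requirement

u_te = rss(budget_us)
expanded = K * u_te
print(f"u_TE={u_te:.2f} us, expanded={expanded:.2f} us, margin={TE_MAX_US - expanded:.2f} us")
# u_TE=2.22 us, expanded=4.45 us, margin=0.55 us
```

The margin is small. With the other three terms unchanged, a servo residual above roughly 1.66 µs would push the expanded value past 5 µs.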

Step 6: Engineering Decision

The timing service should not be released for production failover in the as-found condition. The engineering decision is:

Hold failover acceptance, classify time-synchronization packets consistently in both directions, remove software timestamping from the timing boundary where required, calibrate or eliminate fixed path asymmetry, test under production burst load, and release only after independent time-error evidence meets the requirement.

Immediate actions:

  1. keep the timing service on the primary path or declare degraded timing accuracy during backup operation;
  2. map all timing event messages to a controlled low-jitter class in both directions;
  3. verify whether switches act as boundary clocks, transparent clocks or ordinary packet devices;
  4. place hardware timestamps at the actual timing boundary;
  5. measure directional delays with independent probes where possible;
  6. document fixed asymmetry compensation if the path cannot be made symmetric;
  7. repeat the test under traffic load, rain-fade capacity reduction and route failover;
  8. update monitoring so protocol lock is not treated as proof of time accuracy.
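If the backup path cannot be made symmetric, action 6 amounts to feeding a calibrated fixed asymmetry into the offset calculation. The following is a minimal sketch, assuming the asymmetry has been measured against an independent reference and is expressed as the half-difference (d_{MS}-d_{SM})/2, the same quantity as TE_{asym}. The `corrected_offset` helper is hypothetical; real equipment applies such corrections through its own configuration, whose parameter names and sign conventions should be checked in the vendor and profile documentation.

```python
def corrected_offset(t1: float, t2: float, t3: float, t4: float,
                     asym_half_us: float) -> float:
    """Offset estimate with a static asymmetry correction (hypothetical helper).

    asym_half_us: calibrated (d_MS - d_SM) / 2 for the active path, in microseconds.
    With this sign convention the correction is subtracted from the symmetric estimate.
    """
    symmetric_estimate = ((t2 - t1) - (t4 - t3)) / 2.0
    return symmetric_estimate - asym_half_us


# Backup-path exchange with aligned clocks (delay terms 1284 us and 1190 us).
print(corrected_offset(0.0, 1284.0, 2000.0, 3190.0, asym_half_us=47.0))  # 0.0
# A calibration valid only for the fiber path leaves the bias in place.
print(corrected_offset(0.0, 1284.0, 2000.0, 3190.0, asym_half_us=0.0))   # 47.0
```

The compensation value must follow the active path, which is why the failover state has to be tested. In this case part of the asymmetry is queue-driven and load-dependent, so a static value removes only the fixed part; consistent classification and hardware timestamping still have to deal with the rest.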

Release Criteria

Release requires evidence that timing accuracy survives the operating states, not only that protocol messages are exchanged.

Criterion | Required evidence
time error | independent reference shows time error within \pm5\ \mu\text{s}
directional delay | master-to-slave and slave-to-master delay asymmetry is measured, bounded or compensated
timestamp placement | hardware timestamp boundary is documented and consistent with the correction model
QoS class | timing event messages use controlled classes in both directions
packet delay variation | p95 and p99 delay variation are within the servo design assumptions
failover state | timing remains within limit after route change and during holdover
monitoring | alarms include time-error estimate, asymmetry status and grandmaster/path state
operations handover | NOC procedures state what “locked” means and what it does not prove

Transferable Lessons

For packet timing, low latency is not enough. Symmetry, timestamp placement and correction model matter.

The practical diagnostic sequence is:

  1. compare protocol offset with an independent time reference;
  2. compute offset from the timestamp exchange;
  3. estimate directional asymmetry and divide it by two;
  4. separate median bias from random packet delay variation;
  5. inspect QoS and route differences by direction;
  6. verify hardware timestamp boundaries;
  7. test normal, degraded and failover states before release.
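Steps 1 to 4 can be combined into a small screening function for one operating state. It relies on \theta=\theta_{true}+(d_{MS}-d_{SM})/2, so the median protocol offset minus the independently measured time error estimates the half-asymmetry even after the servo has converged and reports near-zero offset. The sketch assumes the independent time error uses the same slave-minus-reference sign convention as the protocol offset. The class name, the function name and the sample values are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TimingScreen:
    half_asymmetry_us: float  # estimate of (d_MS - d_SM) / 2
    offset_spread_us: float   # spread of protocol offsets (PDV-driven part)
    independent_te_us: float  # slave minus independent reference
    within_limit: bool


def screen_state(protocol_offsets_us: Sequence[float], independent_te_us: float,
                 te_max_us: float = 5.0) -> TimingScreen:
    """Screen one operating state (normal, degraded or failover)."""
    median_offset = statistics.median(protocol_offsets_us)  # systematic part
    return TimingScreen(
        half_asymmetry_us=median_offset - independent_te_us,
        offset_spread_us=statistics.pstdev(protocol_offsets_us),
        independent_te_us=independent_te_us,
        within_limit=abs(independent_te_us) <= te_max_us,
    )


# Hypothetical failover capture: servo converged, reference shows slave 47 us behind.
result = screen_state([-0.8, 0.3, 0.0, 0.5, -0.2], independent_te_us=-47.0)
print(result.half_asymmetry_us, result.within_limit)  # 47.0 False
```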

This case is distinct from a general latency budget. A service can have acceptable packet latency and still fail a time-synchronization requirement because the offset estimate is biased by asymmetric delay.
