PTP Time Synchronization Delay Asymmetry Case Study
Telecommunications engineering case study on PTP time synchronization error caused by packet delay asymmetry, covering timestamp exchange, offset calculation, path evidence, QoS, correction, and release validation.
Packet time synchronization can fail even when the network has low average latency. Precision time protocols estimate clock offset from timestamp exchanges. If the forward and reverse packet paths have different delays, the computed offset is biased. The result is a system that appears synchronized according to protocol state, while field events, sampled waveforms or control records are shifted in time.
This case study follows a remote industrial monitoring network after a backhaul failover. Devices remain connected and packet loss is low, but time-aligned measurements disagree by tens of microseconds. The problem is traced to asymmetric packet delay through a microwave backup path and timestamping that did not account for the asymmetry.
The case is simplified for engineering learning. Real timing networks must follow the applicable time-synchronization profile, hardware capabilities, oscillator holdover design, security controls, service-provider boundaries, equipment documentation and commissioning procedure.
Case Context
A remote technical site sends event records and sampled measurements to an operations center. The measurement system requires time error within:
$$|TE| \le TE_{max} = 5\ \mu\text{s}$$
during normal operation and during planned path failover.
The network uses a packet time-synchronization protocol with a grandmaster clock at the operations center and remote devices synchronized through access switches, an aggregation router, a packet microwave backup hop and a fiber metro segment. During normal fiber operation the system passes time-error tests. During a fiber maintenance outage, traffic moves to the microwave backup path. The protocol still reports a locked state, but independent GPS-referenced test equipment at the remote site shows a time error near 45 to $50\ \mu\text{s}$.
The central decision is:
Is the time error caused by oscillator drift, packet delay variation, timestamp resolution, or a systematic delay asymmetry that must be corrected before release?
The evidence points to systematic asymmetry.
Simplified Network State
Use the following simplified data from the failed acceptance test.
| Quantity | Symbol | Value |
|---|---|---|
| time-error requirement | $TE_{max}$ | $5\ \mu\text{s}$ |
| protocol lock state | | locked |
| packet loss during test | | below $0.01\%$ |
| traffic condition | | microwave backup under production load |
| master-to-slave measured delay term | $t_2-t_1$ | $1284\ \mu\text{s}$ |
| slave-to-master measured delay term | $t_4-t_3$ | $1190\ \mu\text{s}$ |
| independent measured time error | | 46 to $49\ \mu\text{s}$ |
| timestamp method | | hardware timestamp at switch ingress, software timestamp at one edge device |
| forward path class | | shared class with telemetry bursts |
| reverse path class | | management/control class |
The protocol messages are not being dropped. The fault is not gross reachability. It is a measurement and path-symmetry problem.
PTP Offset Calculation
In a simplified two-way timestamp exchange:
- $t_1$ is the master transmit time for the Sync message;
- $t_2$ is the slave receive time for the Sync message;
- $t_3$ is the slave transmit time for the Delay_Req message;
- $t_4$ is the master receive time for the Delay_Req message.
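The ordinary symmetric-delay estimate of slave clock offset is:
$$\theta_{est} = \frac{(t_2 - t_1) - (t_4 - t_3)}{2}$$
The mean path delay estimate is:
$$d_{mean} = \frac{(t_2 - t_1) + (t_4 - t_3)}{2}$$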
These formulas assume that master-to-slave and slave-to-master delays are equal after all timestamp corrections. That assumption is exactly what the incident violates.
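The symmetric-delay estimate can be sketched in a few lines of Python. This is a minimal illustration, not a protocol implementation: the function name `ptp_offset_and_delay` and the example timestamps are assumptions chosen so that the delay terms match the failed test (1284 and 1190 microseconds), and correction fields are ignored.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, mean_path_delay) from one two-way exchange.

    All timestamps share one unit (microseconds here). The result is only
    valid if the forward and reverse delays are equal.
    """
    forward = t2 - t1  # master-to-slave term: true delay plus clock offset
    reverse = t4 - t3  # slave-to-master term: true delay minus clock offset
    offset = (forward - reverse) / 2
    mean_path_delay = (forward + reverse) / 2
    return offset, mean_path_delay


# Hypothetical timestamps (microseconds) giving terms of 1284 and 1190.
offset_us, delay_us = ptp_offset_and_delay(0.0, 1284.0, 2000.0, 3190.0)
print(offset_us, delay_us)  # 47.0 1237.0
```

The function cannot tell a real clock offset from a directional delay difference: both change the two terms in opposite directions.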
Step 1: Compute the Reported Offset
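Use the measured delay terms:
$$t_2 - t_1 = 1284\ \mu\text{s}, \qquad t_4 - t_3 = 1190\ \mu\text{s}$$
Estimated offset:
$$\theta_{est} = \frac{1284 - 1190}{2}\ \mu\text{s} = 47\ \mu\text{s}$$
Mean path delay:
$$d_{mean} = \frac{1284 + 1190}{2}\ \mu\text{s} = 1237\ \mu\text{s}$$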
The protocol servo interprets the $47\ \mu\text{s}$ value as clock offset. If the actual clocks were nearly aligned before the asymmetric path was introduced, the servo will correct in the wrong direction and create a real time error of roughly the same magnitude.
Engineering Comment
The mean path delay looks plausible. That can mislead the review. The issue is not that the path is long. The issue is that the two directions are not equal, so half the directional difference appears as clock offset.
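The way a servo turns a biased estimate into a real error can be shown with a toy loop. The sketch below assumes a proportional-only servo with a hypothetical gain of 0.5 and clocks that are perfectly aligned at failover. Real servos are usually PI controllers with filtering, so this shows only the direction and size of the steady-state error.

```python
def run_servo(bias_us, gain=0.5, steps=20):
    """Steer a slave clock using an offset estimate that carries a fixed bias.

    Returns (true_offset_us, reported_offset_us) after `steps` corrections.
    """
    true_offset_us = 0.0  # assumption: slave and master aligned before failover
    for _ in range(steps):
        reported_us = true_offset_us + bias_us  # what the protocol computes
        true_offset_us -= gain * reported_us    # servo steers toward reported zero
    return true_offset_us, true_offset_us + bias_us


true_us, reported_us = run_servo(bias_us=47.0)
print(round(true_us, 3), round(reported_us, 3))  # -47.0 0.0
```

The reported offset converges to zero, which is why the protocol state looks healthy, while the true offset settles at the negative of the bias.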
Step 2: Relate Asymmetry to Time Error
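Let the true master-to-slave delay be:
$$d_{ms}$$
and the true slave-to-master delay be:
$$d_{sm}$$
If the protocol assumes symmetry, the offset bias caused by delay asymmetry is:
$$\varepsilon_{asym} = \frac{d_{ms} - d_{sm}}{2}$$
From the measured terms, taking the true clock offset as negligible at the moment of failover:
$$d_{ms} - d_{sm} \approx (t_2 - t_1) - (t_4 - t_3) = 1284 - 1190 = 94\ \mu\text{s}$$
Therefore:
$$\varepsilon_{asym} \approx \frac{94\ \mu\text{s}}{2} = 47\ \mu\text{s}$$
The requirement is:
$$|TE| \le 5\ \mu\text{s}$$
The asymmetry-driven error is:
$$47\ \mu\text{s} \approx 9.4 \times TE_{max}$$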
The timing service fails the requirement by almost one order of magnitude.
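The relation between directional delay and offset bias can be checked by generating timestamps from known true values. In this sketch the function names, the 500-microsecond turnaround and the test offsets are illustrative assumptions. The one-way delays are the measured terms from the failed test, treated as true delays.

```python
def simulate_exchange(d_ms_us, d_sm_us, true_offset_us, t1_us=0.0, turnaround_us=500.0):
    """Build t1..t4 from true one-way delays and true slave-minus-master offset."""
    t2 = t1_us + d_ms_us + true_offset_us  # slave clock reading when Sync arrives
    t3 = t2 + turnaround_us                # slave clock reading when Delay_Req leaves
    t4 = t3 - true_offset_us + d_sm_us     # master clock reading when Delay_Req arrives
    return t1_us, t2, t3, t4


def estimated_offset(t1, t2, t3, t4):
    """Symmetric-delay offset estimate."""
    return ((t2 - t1) - (t4 - t3)) / 2


# Clocks truly aligned, delays asymmetric: the estimate is pure bias.
bias_us = estimated_offset(*simulate_exchange(1284.0, 1190.0, 0.0))
print(bias_us, bias_us / 5.0)  # 47.0 9.4

# Symmetric delays: the estimate recovers the true offset.
print(estimated_offset(*simulate_exchange(1237.0, 1237.0, 10.0)))  # 10.0
```

With asymmetric delays the estimate always equals the true offset plus half the directional difference, whatever the true offset is.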
Step 3: Check Whether Jitter Alone Explains the Fault
Packet delay variation is present, but the offset error is stable. During a 30-minute test, the measured directional delay statistics are:
| Direction | Median delay term | p95 delay term | p99 delay term |
|---|---|---|---|
| master to slave | $1284\ \mu\text{s}$ | $1316\ \mu\text{s}$ | $1341\ \mu\text{s}$ |
| slave to master | $1190\ \mu\text{s}$ | $1204\ \mu\text{s}$ | $1217\ \mu\text{s}$ |
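Directional asymmetry at the median is:
$$1284 - 1190 = 94\ \mu\text{s}$$
Directional asymmetry at p95 is:
$$1316 - 1204 = 112\ \mu\text{s}$$
The corresponding p95 offset bias is:
$$\frac{112\ \mu\text{s}}{2} = 56\ \mu\text{s}$$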
Engineering Comment
Random jitter would create a time-error distribution around the correct value if the timestamp filtering and servo design were adequate. Here the median itself is wrong. That is a systematic asymmetry problem with additional packet delay variation on top.
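A short script can separate the median bias from the tail. The delay terms below are copied from the 30-minute test table. The `classify` function and its labels are a hypothetical screen, not part of any timing profile. Subtracting per-direction percentiles follows the calculation used in this step and is only an approximation of the percentile of the paired difference.

```python
TE_MAX_US = 5.0

# Delay terms in microseconds from the 30-minute test.
delay_terms_us = {
    "median": {"ms": 1284.0, "sm": 1190.0},
    "p95": {"ms": 1316.0, "sm": 1204.0},
    "p99": {"ms": 1341.0, "sm": 1217.0},
}


def offset_bias_us(ms, sm):
    """Half the directional difference appears as offset bias."""
    return (ms - sm) / 2


def classify(biases, te_max_us=TE_MAX_US):
    """Hypothetical screen: a median outside the limit means a systematic fault."""
    if abs(biases["median"]) > te_max_us:
        return "systematic asymmetry"
    if abs(biases["p99"]) > te_max_us:
        return "delay-variation tail"
    return "within limit"


biases = {k: offset_bias_us(v["ms"], v["sm"]) for k, v in delay_terms_us.items()}
print(biases)            # {'median': 47.0, 'p95': 56.0, 'p99': 62.0}
print(classify(biases))  # systematic asymmetry
```

The gap between the median and the tail (9 to 15 microseconds of bias here) is the part that filtering and QoS can reduce. The 47 microseconds at the median must be removed or calibrated.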
Step 4: Identify the Network Cause
The path review finds three contributors.
| Contributor | Evidence | Timing consequence |
|---|---|---|
| microwave scheduler uses different queues by direction | forward event messages share a class with bursty telemetry | forward Sync messages see extra variable delay |
| reverse Delay_Req messages use a lower-load management class | reverse path has lower median delay | protocol computes a spurious positive offset |
| one edge device uses software timestamping | timestamp placed after local processing | local delay is load-dependent and not fully corrected |
The fiber path had nearly symmetric delay. The microwave failover path did not. The network design treated the backup path as a bandwidth and availability path, but the service requirement was actually a timing requirement.
Step 5: Correct the Error Budget
The time-error budget must include asymmetry, timestamp uncertainty and residual packet delay variation.
Use a simple root-sum-square screen for independent residual terms after corrective action:
$$u_c = \sqrt{u_{asym}^2 + u_{ts}^2 + u_{servo}^2 + u_{holdover}^2}$$
Proposed corrective actions target:
| Term | Target residual |
|---|---|
| calibrated path asymmetry | $u_{asym}=1.5\ \mu\text{s}$ |
| timestamp uncertainty | $u_{ts}=0.5\ \mu\text{s}$ |
| servo filtering under load | $u_{servo}=1.2\ \mu\text{s}$ |
| short holdover during failover | $u_{holdover}=1.0\ \mu\text{s}$ |
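Then:
$$u_c = \sqrt{1.5^2 + 0.5^2 + 1.2^2 + 1.0^2}\ \mu\text{s} = \sqrt{4.94}\ \mu\text{s} \approx 2.22\ \mu\text{s}$$
Using a conservative coverage factor of 2:
$$U = 2u_c \approx 4.45\ \mu\text{s}$$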
That fits within the $5\ \mu\text{s}$ requirement, but only if the corrective controls are actually verified under the same load and failover conditions.
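The root-sum-square screen over the four target residuals can be checked numerically. The helper name `expanded_uncertainty_us` is an assumption, and the calculation treats the terms as independent, as the screen itself does.

```python
import math


def expanded_uncertainty_us(terms_us, coverage_factor=2.0):
    """Return (combined, expanded) uncertainty for independent residual terms."""
    combined = math.sqrt(sum(u * u for u in terms_us.values()))
    return combined, coverage_factor * combined


# Target residuals in microseconds from the corrective-action table.
residuals_us = {"asym": 1.5, "ts": 0.5, "servo": 1.2, "holdover": 1.0}

combined_us, expanded_us = expanded_uncertainty_us(residuals_us)
print(f"{combined_us:.2f} {expanded_us:.2f}")  # 2.22 4.45
print(expanded_us <= 5.0)                      # True
```

The margin is about half a microsecond, so a modest miss on any single target, especially the asymmetry calibration, would push the expanded value over the limit.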
Step 6: Engineering Decision
The timing service should not be released for production failover in the found condition. The engineering decision is:
Hold failover acceptance, classify time-synchronization packets consistently in both directions, remove software timestamping from the timing boundary where required, calibrate or eliminate fixed path asymmetry, test under production burst load, and release only after independent time-error evidence meets the requirement.
Immediate actions:
- keep the timing service on the primary path or declare degraded timing accuracy during backup operation;
- map all timing event messages to a controlled low-jitter class in both directions;
- verify whether switches act as boundary clocks, transparent clocks or ordinary packet devices;
- place hardware timestamps at the actual timing boundary;
- measure directional delays with independent probes where possible;
- document fixed asymmetry compensation if the path cannot be made symmetric;
- repeat the test under traffic load, rain-fade capacity reduction and route failover;
- update monitoring so protocol lock is not treated as proof of time accuracy.
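Where the path cannot be made symmetric, a fixed compensation can be applied to the offset estimate. The sketch below is a simplified model under stated assumptions: `asymmetry_us` is a hypothetical calibrated value for the master-to-slave delay minus the slave-to-master delay. It has to come from an independent measurement, because the timestamp exchange alone cannot observe it.

```python
def compensated_offset_us(t1, t2, t3, t4, asymmetry_us=0.0):
    """Symmetric-delay offset estimate with a fixed asymmetry correction.

    asymmetry_us is positive when the master-to-slave direction is slower.
    Half of it is removed from the raw estimate.
    """
    raw = ((t2 - t1) - (t4 - t3)) / 2
    return raw - asymmetry_us / 2


# Hypothetical timestamps (microseconds) giving terms of 1284 and 1190.
exchange = (0.0, 1284.0, 2000.0, 3190.0)

print(compensated_offset_us(*exchange))                     # 47.0 (uncompensated)
print(compensated_offset_us(*exchange, asymmetry_us=94.0))  # 0.0
print(compensated_offset_us(*exchange, asymmetry_us=91.0))  # 1.5 (calibration off by 3)
```

A fixed correction only removes the stable part of the asymmetry. Load-dependent differences between queues still need consistent classification in both directions.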
Release Criteria
Release requires evidence that timing accuracy survives the operating states, not only that protocol messages are exchanged.
| Criterion | Required evidence |
|---|---|
| time error | independent reference shows $\lvert TE \rvert \le 5\ \mu\text{s}$ |
| directional delay | master-to-slave and slave-to-master delay asymmetry is measured, bounded or compensated |
| timestamp placement | hardware timestamp boundary is documented and consistent with the correction model |
| QoS class | timing event messages use controlled classes in both directions |
| packet delay variation | p95 and p99 delay variation are within the servo design assumptions |
| failover state | timing remains within limit after route change and during holdover |
| monitoring | alarms include time-error estimate, asymmetry status and grandmaster/path state |
| operations handover | NOC procedures state what “locked” means and what it does not prove |
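The time-error criterion can be expressed as a small acceptance gate that never accepts protocol lock on its own. Everything here is an illustrative assumption: the `TimingSample` record, the state labels and the normal-state value are hypothetical, and the failover value is picked from the 46 to 49 microsecond range seen in the failed test.

```python
from dataclasses import dataclass


@dataclass
class TimingSample:
    state: str                # hypothetical label: "normal", "loaded", "failover"
    protocol_locked: bool     # what the protocol reports
    independent_te_us: float  # time error against an independent reference


def release_ok(samples, te_max_us=5.0, required_states=("normal", "loaded", "failover")):
    """Pass only if every required state was tested and stayed within the limit."""
    seen = {s.state for s in samples}
    if not set(required_states) <= seen:
        return False  # an untested operating state is not evidence
    return all(
        s.protocol_locked and abs(s.independent_te_us) <= te_max_us
        for s in samples
    )


samples = [
    TimingSample("normal", True, 1.2),     # hypothetical value
    TimingSample("loaded", True, 2.0),     # hypothetical value
    TimingSample("failover", True, 47.5),  # within the range seen in the failed test
]
print(release_ok(samples))  # False: locked in every state, but failover exceeds 5 us
```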
Transferable Lessons
For packet timing, low latency is not enough. Symmetry, timestamp placement and correction model matter.
The practical diagnostic sequence is:
- compare protocol offset with an independent time reference;
- compute offset from the timestamp exchange;
- estimate directional asymmetry and divide it by two;
- separate median bias from random packet delay variation;
- inspect QoS and route differences by direction;
- verify hardware timestamp boundaries;
- test normal, degraded and failover states before release.
This case is distinct from a general latency budget. A service can have acceptable packet latency and still fail a time-synchronization requirement because the offset estimate is biased by asymmetric delay.