Exercise set
Packet Network Latency and Jitter Exercises
Solved packet-network exercises for propagation, queueing, shaper admission, TCP loss, microbursts, buffers, QoS and uncertainty.
These exercises practise packet-network latency and jitter calculations as service-engineering checks. They cover propagation, serialization, protocol overhead, queueing delay, tail delay, bandwidth-delay product, TCP loss-limited throughput, buffer sizing, microbursts, jitter buffers, QoS reservation, token-bucket bursts, shaper admission, packet loss, latency histograms, failover, SLA downtime budgets, and measurement uncertainty.
The goal is not only to compute delay. The goal is to decide whether a packet service can meet a stated requirement under the correct boundary, packet size, traffic class, load state, route, clock reference, and measurement method.
Assume simplified screening models unless an exercise states otherwise. Real service acceptance also requires traffic captures, device counters, QoS configuration review, route verification, synchronized clocks, active probes, passive telemetry, failover tests, and operational alarm thresholds.
How to Use These Exercises
For each problem, define:
- the service boundary, such as access switch to gateway, provider handoff to handoff, or application endpoint to endpoint;
- packet size including the headers relevant to the measured boundary;
- whether delay is one-way, round-trip, average, p95, p99, or maximum observed;
- the traffic class and service rate that actually controls the queue;
- the validation evidence needed before accepting the service.
The common mistake is quoting an average latency number without the traffic state, percentile, packet size, direction, QoS class, or clock reference. Packet services fail at boundaries and tails, not only at averages.
Release Evidence Notes
Packet latency evidence should start from a service boundary. For each result, record the endpoints, direction, packet size, encapsulation, traffic class, queueing policy, service rate, load state, route, failover state, clock reference, timestamp method and acceptance percentile. A latency or jitter number without that boundary cannot support release.
Queueing and buffer evidence should separate averages from tails. Average M/M/1 screens, p95 or p99 delay, bufferbloat, jitter-buffer sizing and token-bucket burst delay should state whether the result represents normal load, busy hour, degraded backhaul, shaped traffic or failover. A mean delay can pass while tail delay, jitter or burst behavior violates the service requirement.
QoS and availability evidence should preserve the degraded case. Reservation fit, class utilization, packet loss, outage packet count and monthly downtime budgets should be checked against scheduler configuration, policing, shaping, drop counters, failover duration, route convergence and traffic mix. A normal-path pass does not prove the backup path or maintenance state.
Measurement evidence should be strong near thresholds. Timestamp uncertainty, coarse histogram bins, RTT/2 estimates, directional delay asymmetry and clock synchronization should be visible before accepting p95, p99 or one-way delay gates. If the acceptance result depends on interpolation, symmetry assumptions or rounded counters, raw samples or finer probes are needed.
The practical release question is whether the timing model, QoS policy, route state, measurement method and operational evidence all describe the same service. If one layer disagrees, the result should trigger retest, counter review, QoS correction, route change or restricted release rather than acceptance from a single latency number.
Engineering Boundary Notes
Packet timing evidence is boundary-sensitive. A one-way delay, round-trip delay, device-to-device probe, application transaction, provider handoff test and passive capture do not measure the same thing. Each exercise result should state the endpoints, direction, encapsulation, packet size, traffic class, timestamp method and percentile being controlled.
Queueing is also a boundary. Serialization delay, propagation delay, scheduler delay, shaper delay, jitter-buffer delay, retransmission delay and route-convergence delay should not be merged into one average. A service may pass mean latency and still fail p99 latency, jitter, microburst recovery, failover packet loss or time-synchronization asymmetry.
Clock and sample evidence matter most near an SLA threshold. If the acceptance decision depends on sub-millisecond margins, the record should show clock synchronization, timestamp resolution, capture duration, packet-size mix, load state and histogram binning. RTT/2 approximations should be marked as estimates unless directional asymmetry is bounded.
Common Release Mistakes
- releasing a service from average latency while the requirement is p95, p99 or maximum delay;
- comparing one-way and round-trip results without identifying direction and timestamp reference;
- ignoring encapsulation, MTU, serialization and packet-size differences between tests;
- validating QoS reservation on a quiet path while failover or busy-hour traffic changes the scheduler state;
- using TCP throughput loss formulas without matching RTT, packet loss and congestion-control behavior;
- accepting jitter-buffer sizing without burst, packet-loss and clock-drift evidence;
- treating provider carrier-up time as service availability when packet loss, route convergence or QoS failure affects the application.
Scenario Map
| Scenario | Exercises | Primary check | Engineering decision |
|---|---|---|---|
| Fixed path and useful throughput | 1, 2, 5, 17 | Propagation, serialization, protocol efficiency, loss, bandwidth-delay product and TCP loss-limited throughput | Decide whether the path and endpoint behavior can support the useful packet service. |
| Queueing, buffer and tail-delay control | 3, 4, 6, 7, 9, 16, 18 | Average delay, p95/p99 delay, buffer drain time, microburst overflow, jitter buffer, token-bucket delay and shaper admission headroom | Check whether the service fails at the tail even when the average looks acceptable. |
| QoS, loss and availability acceptance | 8, 10, 11, 13 | Reservation fit, class utilization, packet loss ratio, outage packet count and monthly downtime budget | Decide whether degraded capacity, packet loss, recovery time or accumulated downtime breaks the service requirement. |
| Measurement, asymmetry and release evidence | 12, 14, 15 | Nominal p95 result, timestamp uncertainty, histogram percentile gates and directional delay asymmetry | Decide whether the acceptance test is strong enough to release the service. |
Validation Package Checklist
Before treating a packet-service result as release evidence, collect:
- endpoints, direction, handoff boundary, route and failover state;
- packet size, encapsulation, MTU, traffic class and scheduler policy;
- service rate, offered load, shaping, policing and queue configuration;
- metric definition: one-way, RTT, average, p95, p99, jitter, loss or availability;
- timestamp method, clock reference, capture duration and histogram resolution;
- device counters, probe records, active tests and passive telemetry;
- SLA threshold, uncertainty allowance, busy-hour or degraded-path condition;
- release decision, QoS correction, route restriction, retest or rollback action.
Exercise 1: Propagation and Serialization Delay
A packet service crosses:
- 120\ \text{km} of optical fiber;
- two free-space radio hops of 15\ \text{km} each;
- three egress links at 100\ \text{Mbit/s}, 50\ \text{Mbit/s}, and 1\ \text{Gbit/s}.
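The measured packet size at this service boundary is:
\[ L = 900\ \text{byte} \]
Use:
\[ v_{\text{fiber}} = 2 \times 10^{8}\ \text{m/s} \]
and:
\[ v_{\text{radio}} = 3 \times 10^{8}\ \text{m/s} \]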
Estimate propagation delay, serialization delay, and their subtotal.
Solution
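Fiber propagation:
\[ t_{\text{fiber}} = \frac{120 \times 10^{3}\ \text{m}}{2 \times 10^{8}\ \text{m/s}} = 0.600\ \text{ms} \]
Radio propagation:
\[ t_{\text{radio}} = \frac{2 \times 15 \times 10^{3}\ \text{m}}{3 \times 10^{8}\ \text{m/s}} = 0.100\ \text{ms} \]
Total propagation:
\[ t_{\text{prop}} = 0.600 + 0.100 = 0.700\ \text{ms} \]
Packet length in bits:
\[ L = 900 \times 8 = 7200\ \text{bit} \]
Serialization at 100\ \text{Mbit/s}:
\[ t_{100} = \frac{7200}{100 \times 10^{6}} = 0.072\ \text{ms} \]
Serialization at 50\ \text{Mbit/s}:
\[ t_{50} = \frac{7200}{50 \times 10^{6}} = 0.144\ \text{ms} \]
Serialization at 1\ \text{Gbit/s}:
\[ t_{1000} = \frac{7200}{1 \times 10^{9}} = 0.0072\ \text{ms} \]
Total serialization:
\[ t_{\text{ser}} = 0.072 + 0.144 + 0.0072 \approx 0.223\ \text{ms} \]
Subtotal:
\[ t_{\text{prop}} + t_{\text{ser}} \approx 0.700 + 0.223 = 0.923\ \text{ms} \]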
Engineering Comment
The physical path and serialization terms are below 1\ \text{ms} before device processing and queueing. If the measured service shows tens of milliseconds of delay, the cause is likely queueing, scheduling, security processing, route detour, endpoint processing, or measurement boundary mismatch.
Plausibility Check
The propagation subtotal is about 0.700 ms and the serialization subtotal is about 0.223 ms, giving 0.923 ms before queueing or processing. The 50 Mbit/s link contributes 0.144 ms of serialization, so it is the dominant serialization term even though the physical fiber distance dominates propagation.
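The fixed-path terms of Exercise 1 can be reproduced with a short script. This is a minimal sketch, not a definitive model: the propagation speeds and the 900 byte packet size are assumptions chosen to match the figures quoted in the Plausibility Check (0.144 ms at 50 Mbit/s implies 7200 bit), and the function names are illustrative.

```python
# Assumed inputs: chosen to reproduce the Plausibility Check figures.
V_FIBER_M_PER_S = 2.0e8   # assumed: roughly two thirds of c in glass fiber
V_RADIO_M_PER_S = 3.0e8   # assumed: free-space propagation at about c
PACKET_BYTES = 900        # assumed: implied by 0.144 ms at 50 Mbit/s


def propagation_delay_s(length_m: float, speed_m_per_s: float) -> float:
    """Time for a signal to travel length_m at the given speed."""
    return length_m / speed_m_per_s


def serialization_delay_s(packet_bytes: int, rate_bit_per_s: float) -> float:
    """Time to clock one packet onto a link of the given rate."""
    return packet_bytes * 8 / rate_bit_per_s


# 120 km of fiber plus two 15 km radio hops.
propagation_s = (
    propagation_delay_s(120e3, V_FIBER_M_PER_S)
    + 2 * propagation_delay_s(15e3, V_RADIO_M_PER_S)
)
# One serialization per egress link: 100 Mbit/s, 50 Mbit/s, 1 Gbit/s.
serialization_s = sum(
    serialization_delay_s(PACKET_BYTES, rate) for rate in (100e6, 50e6, 1e9)
)
subtotal_ms = (propagation_s + serialization_s) * 1e3

print(f"propagation   {propagation_s * 1e3:.3f} ms")
print(f"serialization {serialization_s * 1e3:.3f} ms")
print(f"subtotal      {subtotal_ms:.3f} ms")
```

Queueing and device processing are deliberately left out, so any measured delay well above this subtotal points at those terms rather than at the path itself.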
Exercise 2: Protocol Overhead and Goodput
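A service carries packets with:
\[ L_{\text{payload}} = 1200\ \text{byte} \]
of useful payload and:
\[ L_{\text{overhead}} = 82\ \text{byte} \]
of headers and encapsulation overhead at the measured boundary. Delivered line-rate throughput is:
\[ R = 50\ \text{Mbit/s} \]
Packet loss ratio is:
\[ p_{\text{loss}} = 0.2\% \]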
Estimate protocol efficiency and first-pass payload goodput.
Solution
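Total packet size:
\[ L_{\text{total}} = 1200 + 82 = 1282\ \text{byte} \]
Protocol efficiency:
\[ \eta = \frac{1200}{1282} \approx 0.936 \]
Convert packet loss ratio:
\[ p_{\text{loss}} = 0.2\% = 0.002 \]
First-pass goodput:
\[ G \approx R\,\eta\,(1 - p_{\text{loss}}) = 50 \times 0.936 \times 0.998 \approx 46.7\ \text{Mbit/s} \]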
Engineering Comment
The payload service receives less than the nominal line rate even before congestion-control effects. Overhead, tunnels, encryption, small packets, loss, and retransmission can materially change useful capacity. A bandwidth claim should state whether it is line rate, throughput, or application goodput.
Plausibility Check
The payload efficiency is 1200/1282=0.936, so overhead alone removes about 6.4\% of the line-rate payload. Applying the small 0.2\% packet loss leaves 46.7 Mbit/s, which is below 50 Mbit/s for the right reason rather than because of congestion in this simplified screen.
Exercise 3: Average Queueing Delay with an M/M/1 Screen
A traffic class carries 1200\ \text{byte} packets. Its reserved service rate is:
\[ C = 12\ \text{Mbit/s} \]
The offered load is:
\[ A = 7.2\ \text{Mbit/s} \]
Use a simplified M/M/1 screen to estimate utilization, packet service rate, arrival rate, average queueing delay, and average time in system.
Solution
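Packet size:
\[ L = 1200 \times 8 = 9600\ \text{bit} \]
Service rate in packets per second:
\[ \mu = \frac{12 \times 10^{6}}{9600} = 1250\ \text{packets/s} \]
Arrival rate:
\[ \lambda = \frac{7.2 \times 10^{6}}{9600} = 750\ \text{packets/s} \]
Utilization:
\[ \rho = \frac{\lambda}{\mu} = \frac{750}{1250} = 0.60 \]
Average queueing delay:
\[ W_q = \frac{\rho}{\mu - \lambda} = \frac{0.60}{500} = 0.0012\ \text{s} \]
So:
\[ W_q = 1.2\ \text{ms} \]
Average time in system:
\[ W = \frac{1}{\mu - \lambda} = \frac{1}{500} = 0.0020\ \text{s} \]
So:
\[ W = 2.0\ \text{ms} \]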
Engineering Comment
At 60\% utilization, average delay is modest in this simplified model. The same queue can still fail a real-time requirement if traffic is bursty, packet sizes vary, priority scheduling is wrong, or tail delay rather than average delay is the acceptance criterion.
Plausibility Check
The service rate is 1250 packets/s and the arrival rate is 750 packets/s, so the remaining service margin is 500 packets/s. The computed time in system is 2.0 ms; subtracting the 1/\mu=0.8 ms service time leaves the 1.2 ms average queueing delay.
Exercise 4: Tail Delay from the Same Queue
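Use the queue from Exercise 3:
\[ \mu = 1250\ \text{packets/s}, \quad \lambda = 750\ \text{packets/s}, \quad \rho = 0.60 \]
Estimate p95 and p99 time in system using:
\[ t_p = \frac{-\ln(1 - p)}{\mu - \lambda} \]
Then estimate p95 queueing delay using:
\[ t_{q,p} = \frac{\ln\!\left(\dfrac{\rho}{1 - p}\right)}{\mu - \lambda} \]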
for p>1-\rho.
Solution
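p95 time in system:
\[ t_{95} = \frac{-\ln(0.05)}{500} \approx 5.99\ \text{ms} \]
p99 time in system:
\[ t_{99} = \frac{-\ln(0.01)}{500} \approx 9.21\ \text{ms} \]
For p95 queueing delay:
\[ t_{q,95} = \frac{\ln(0.60 / 0.05)}{500} = \frac{\ln 12}{500} \]
So:
\[ t_{q,95} \approx 4.97\ \text{ms} \]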
Engineering Comment
The p99 delay is several times larger than the average queueing delay. Service requirements should state percentile and measurement interval. Average delay can pass while p95 or p99 delay violates a control, voice, or telemetry requirement.
Plausibility Check
The p99 time in system is 9.21 ms, which is about 4.6 times the 2.0 ms average time in system from Exercise 3. The p95 queueing-only delay of 4.97 ms is also much larger than the 1.2 ms average queueing delay, confirming why percentile requirements matter.
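The averages of Exercise 3 and the percentiles of Exercise 4 follow from the same two rates, so they can be checked together. The sketch below assumes the standard M/M/1 first-come-first-served results (exponential time in system with rate mu - lambda, and a waiting-time tail of rho * exp(-(mu - lambda) t)); the function names are illustrative and not taken from the exercises.

```python
import math


def mm1_screen(mu_pps: float, lam_pps: float) -> dict:
    """Average M/M/1 figures for service rate mu and arrival rate lambda."""
    if lam_pps >= mu_pps:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    rho = lam_pps / mu_pps
    margin = mu_pps - lam_pps
    return {"rho": rho, "wq_s": rho / margin, "w_s": 1.0 / margin}


def sojourn_percentile_s(mu_pps: float, lam_pps: float, p: float) -> float:
    """Percentile p (0..1) of time in system, exponential with rate mu - lambda."""
    return -math.log(1.0 - p) / (mu_pps - lam_pps)


def queueing_percentile_s(mu_pps: float, lam_pps: float, p: float) -> float:
    """Percentile p of waiting time; zero while p <= 1 - rho (empty-queue mass)."""
    rho = lam_pps / mu_pps
    if p <= 1.0 - rho:
        return 0.0
    return math.log(rho / (1.0 - p)) / (mu_pps - lam_pps)


screen = mm1_screen(1250, 750)
print(f"rho={screen['rho']:.2f}  Wq={screen['wq_s'] * 1e3:.2f} ms  W={screen['w_s'] * 1e3:.2f} ms")
print(f"p95 system  {sojourn_percentile_s(1250, 750, 0.95) * 1e3:.2f} ms")
print(f"p99 system  {sojourn_percentile_s(1250, 750, 0.99) * 1e3:.2f} ms")
print(f"p95 queue   {queueing_percentile_s(1250, 750, 0.95) * 1e3:.2f} ms")
```

Keeping the percentile helpers separate from the average screen makes it harder to quote a mean where the requirement is a tail.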
Exercise 5: Bandwidth-Delay Product and Window-Limited Throughput
A long-distance service has:
\[ C = 1\ \text{Gbit/s} \]
and:
\[ RTT = 80\ \text{ms} \]
Find the bandwidth-delay product. Then estimate the maximum throughput if an endpoint has only a 2\ \text{MB} effective transport window.
Solution
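Bandwidth-delay product:
\[ BDP = C \times RTT = 1 \times 10^{9} \times 0.080 = 80 \times 10^{6}\ \text{bit} \]
Convert to bytes:
\[ BDP = \frac{80 \times 10^{6}}{8} = 10 \times 10^{6}\ \text{byte} \]
So:
\[ BDP = 10\ \text{MB} \]
Window-limited throughput:
\[ T = \frac{W}{RTT} = \frac{2 \times 10^{6} \times 8\ \text{bit}}{0.080\ \text{s}} = 200 \times 10^{6}\ \text{bit/s} \]
So:
\[ T = 200\ \text{Mbit/s} \]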
Engineering Comment
The physical path may support 1\ \text{Gbit/s}, but the endpoint window limits this flow to about 200\ \text{Mbit/s}. High-bandwidth, high-latency services need endpoint, protocol, and application tuning, not only link capacity.
Plausibility Check
An 80 ms round trip at 1 Gbit/s needs about 10 MB in flight to fill the path. A 2 MB window is one fifth of that value, so the resulting 200 Mbit/s throughput is consistent with the window being the limiting element.
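The bandwidth-delay product and the window limit are the same relation read in two directions, which a small sketch makes explicit. The inputs are the figures quoted in the exercise text (1 Gbit/s, 80 ms, 2 MB window, decimal megabytes); the helper names are illustrative.

```python
def bdp_bytes(rate_bit_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the path full."""
    return rate_bit_per_s * rtt_s / 8


def window_limited_bit_per_s(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is delivered per round trip."""
    return window_bytes * 8 / rtt_s


RATE_BPS = 1e9          # 1 Gbit/s path
RTT_S = 0.080           # 80 ms round trip
WINDOW_BYTES = 2e6      # 2 MB effective window (decimal)

needed = bdp_bytes(RATE_BPS, RTT_S)
ceiling = window_limited_bit_per_s(WINDOW_BYTES, RTT_S)

print(f"BDP            {needed / 1e6:.1f} MB")
print(f"window ceiling {ceiling / 1e6:.0f} Mbit/s")
print(f"path fill      {WINDOW_BYTES / needed:.0%}")
```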
Exercise 6: Bufferbloat Delay from Oversized Buffers
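An egress queue has an effective buffer:
\[ B = 1.5\ \text{MB} = 12\ \text{Mbit} \]
The shaped egress rate is:
\[ R = 20\ \text{Mbit/s} \]
Estimate the delay if the buffer fills. Then find the maximum buffer size for an added queueing delay target of:
\[ t_{\text{target}} = 40\ \text{ms} \]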
Solution
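Full-buffer drain time:
\[ t_{\text{full}} = \frac{B}{R} = \frac{12\ \text{Mbit}}{20\ \text{Mbit/s}} = 0.60\ \text{s} \]
So a full buffer can add:
\[ t_{\text{full}} = 600\ \text{ms} \]
of queueing delay.
Maximum buffer for 40\ \text{ms}:
\[ B_{\max} = R \times t_{\text{target}} = 20\ \text{Mbit/s} \times 0.040\ \text{s} = 0.8\ \text{Mbit} \]
Convert:
\[ B_{\max} = \frac{0.8\ \text{Mbit}}{8} = 0.1\ \text{MB} = 100\ \text{kB} \]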
Engineering Comment
Large buffers can hide congestion while destroying latency. For real-time services, buffer sizing should be tied to delay targets, traffic shaping, active queue management, and class-specific drop policies.
Plausibility Check
At 20 Mbit/s, a 40 ms target permits only 0.8 Mbit of queued data. The installed 12 Mbit buffer is fifteen times that target-sized buffer, so a full queue delay of 600 ms is plausible and clearly outside a real-time service budget.
Exercise 7: Jitter Buffer Sizing
A voice service observes one-way network delay statistics:
| Statistic | Delay |
|---|---|
| median | 18\ \text{ms} |
| p95 | 27\ \text{ms} |
| p99 | 42\ \text{ms} |
The codec packetization interval is:
\[ t_{\text{pkt}} = 20\ \text{ms} \]
Estimate the jitter buffer needed to absorb p99 delay variation relative to the median. Then estimate median mouth-to-ear contribution from packetization, median network delay, and jitter buffer. The service target is below 80\ \text{ms} for these three terms.
Solution
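p99 delay variation relative to median:
\[ \Delta t_{99} = 42 - 18 = 24\ \text{ms} \]
Use a jitter buffer of at least:
\[ t_{\text{jb}} = 24\ \text{ms} \]
for this simplified screen.
Median contribution:
\[ t_{\text{pkt}} + t_{\text{median}} + t_{\text{jb}} = 20 + 18 + 24 = 62\ \text{ms} \]
Compare with target:
\[ 62\ \text{ms} < 80\ \text{ms} \]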
Engineering Comment
The buffer can absorb p99 variation in this screen and still meet the simplified latency target. Real voice design must also include codec algorithmic delay, playout adaptation, packet loss concealment, clock drift, endpoint processing, echo control, and whether the p99 statistic is stable during congestion.
Plausibility Check
The p99 delay is 24 ms above the median, so a 24 ms buffer targets that observed variation. Adding packetization, median network delay and the buffer gives 62 ms, leaving 18 ms of margin against the simplified 80 ms target before other voice-system delays are included.
Exercise 8: QoS Reservation in a Degraded Backhaul
A degraded backhaul has available capacity:
\[ C_{\text{deg}} = 10\ \text{Mbit/s} \]
The intended reservations are:
| Class | Reservation |
|---|---|
| strict-priority voice cap | 2.0\ \text{Mbit/s} |
| critical telemetry | 6.0\ \text{Mbit/s} |
| management | 0.5\ \text{Mbit/s} |
Critical telemetry offered load is:
\[ A_{\text{tel}} = 4.5\ \text{Mbit/s} \]
Check whether the reservations fit inside degraded capacity and estimate telemetry utilization within its class.
Solution
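Total reserved capacity:
\[ R_{\text{total}} = 2.0 + 6.0 + 0.5 = 8.5\ \text{Mbit/s} \]
Compare with degraded capacity:
\[ 8.5\ \text{Mbit/s} < 10\ \text{Mbit/s} \]
The reservations fit, leaving:
\[ 10 - 8.5 = 1.5\ \text{Mbit/s} \]
for best effort or margin.
Telemetry class utilization:
\[ \rho_{\text{tel}} = \frac{4.5}{6.0} = 0.75 \]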
Engineering Comment
The reservation plan is plausible, but 75\% utilization can still produce tail-delay problems under bursty traffic. The critical check is whether the deployed QoS policy actually enforces the voice cap and class reservation at the degraded bottleneck, not only in the design document.
Plausibility Check
The reservations total 8.5 Mbit/s, leaving 1.5 Mbit/s inside the degraded 10 Mbit/s backhaul. Telemetry uses 4.5/6.0=0.75 of its class reservation, so the design fits on paper but has enough utilization to justify a tail-delay check.
Exercise 9: Token-Bucket Burst Delay
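A traffic shaper has:
\[ R = 5\ \text{Mbit/s} \]
and token-bucket burst allowance:
\[ B_{\text{bucket}} = 1.0\ \text{Mbit} \]
A burst of:
\[ B_{\text{burst}} = 2.5\ \text{Mbit} \]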
arrives nearly instantaneously. Estimate the excess burst beyond the bucket and the added delay to drain that excess at the shaped rate. The service target allows at most 50\ \text{ms} added shaping delay.
Solution
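Excess beyond bucket:
\[ B_{\text{excess}} = 2.5 - 1.0 = 1.5\ \text{Mbit} \]
Added drain delay:
\[ t_{\text{drain}} = \frac{B_{\text{excess}}}{R} = \frac{1.5\ \text{Mbit}}{5\ \text{Mbit/s}} = 0.30\ \text{s} \]
So:
\[ t_{\text{drain}} = 300\ \text{ms} \]
Compare with target:
\[ 300\ \text{ms} > 50\ \text{ms} \]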
Engineering Comment
The shaper protects downstream capacity but can add unacceptable delay if burst parameters are too large for the service. For real-time traffic, burst allowance should be coordinated with packetization, queue size, class rate, and maximum delay.
Plausibility Check
Only 1.0 Mbit of the 2.5 Mbit burst is covered by tokens, leaving 1.5 Mbit to drain at 5 Mbit/s. A 300 ms delay is six times the 50 ms target, so the burst parameter is incompatible with this service objective.
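The shaping-delay screen is simple enough to wrap in a reusable check. This sketch follows the exercise's simplification that the burst arrives instantaneously, so tokens refilled while the burst arrives are ignored; the function name and the pass/fail print are illustrative assumptions.

```python
def shaping_delay_s(burst_bits: float, bucket_bits: float, rate_bit_per_s: float) -> float:
    """Delay to drain the part of an instantaneous burst not covered by tokens."""
    excess_bits = max(0.0, burst_bits - bucket_bits)
    return excess_bits / rate_bit_per_s


RATE_BPS = 5e6        # shaped rate, 5 Mbit/s
BUCKET_BITS = 1.0e6   # token-bucket allowance, 1.0 Mbit
BURST_BITS = 2.5e6    # arriving burst, 2.5 Mbit
TARGET_S = 0.050      # maximum added shaping delay, 50 ms

delay_s = shaping_delay_s(BURST_BITS, BUCKET_BITS, RATE_BPS)
verdict = "pass" if delay_s <= TARGET_S else "fail"
print(f"added shaping delay {delay_s * 1e3:.0f} ms -> {verdict}")

# Largest burst that still meets the target at this rate and bucket size.
max_burst_bits = BUCKET_BITS + RATE_BPS * TARGET_S
print(f"largest compliant burst {max_burst_bits / 1e6:.2f} Mbit")
```

Under these inputs the compliant burst is 1.25 Mbit, half of the burst in the exercise, which is one way to express how far the burst parameter is from the service objective.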
Exercise 10: Packet Loss Ratio from a Test Capture
An acceptance test sends:
\[ N_{\text{sent}} = 10000 \]
packets in the critical telemetry class. The receiver records:
\[ N_{\text{received}} = 9982 \]
The packet loss requirement is:
\[ p_{\text{loss}} \le 0.1\% \]
Calculate observed packet loss ratio and decide whether the test passes.
Solution
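Observed loss ratio:
\[ p_{\text{loss}} = \frac{10000 - 9982}{10000} = \frac{18}{10000} = 0.0018 \]
Convert to percent:
\[ p_{\text{loss}} = 0.18\% \]
Compare with requirement:
\[ 0.18\% > 0.1\% \]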
Engineering Comment
The test fails the stated packet-loss requirement. The engineering response should identify whether loss occurs at ingress policing, egress queue drops, radio retransmission exhaustion, optical errors, route changes, endpoint overload, or measurement setup. A loss number without a class and interface counter is incomplete evidence.
Plausibility Check
Eighteen lost packets out of 10000 is 0.18\%, which is 1.8 times the 0.1\% limit. The failure is not a rounding issue; the result is materially above the stated acceptance threshold.
Exercise 11: Failover Outage Packet Loss
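A route failover creates an outage of:
\[ t_{\text{outage}} = 80\ \text{ms} \]
A telemetry stream sends:
\[ r = 500\ \text{packets/s} \]
The service requirement is recovery below:
\[ t_{\text{recovery}} < 50\ \text{ms} \]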
Estimate the number of packets affected during failover and decide whether the recovery target is met.
Solution
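Packets affected:
\[ N = r \times t_{\text{outage}} = 500 \times 0.080 = 40\ \text{packets} \]
Compare recovery time:
\[ 80\ \text{ms} > 50\ \text{ms} \]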
Engineering Comment
The route may reconverge successfully, but it fails the recovery-time requirement. For critical services, failover validation should include packet loss, burst loss duration, jitter after recovery, route symmetry, alarm timing, and whether applications can tolerate the gap.
Plausibility Check
An 80 ms outage at 500 packets/s affects about 40 packets. The recovery target is 50 ms, so the failover is 30 ms late and should fail even if the route eventually restores cleanly.
Exercise 12: Acceptance Result with Measurement Uncertainty
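A one-way latency acceptance test reports:
\[ t_{\text{p95,meas}} = 24.2\ \text{ms} \]
The service requirement is:
\[ t_{\text{p95}} \le 25.0\ \text{ms} \]
The timestamp uncertainty contribution is:
\[ u_t = 1.5\ \text{ms} \]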
Check the nominal result and the conservative result.
Solution
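Nominal comparison:
\[ 24.2\ \text{ms} < 25.0\ \text{ms} \]
The nominal result passes.
Conservative upper result:
\[ t_{\text{upper}} = 24.2 + 1.5 = 25.7\ \text{ms} \]
Compare with requirement:
\[ 25.7\ \text{ms} > 25.0\ \text{ms} \]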
Engineering Comment
The nominal result is not enough to support acceptance when measurement uncertainty is included. The test plan should define timestamp accuracy, clock synchronization, packet size, traffic load, percentile method, sample size, route state, and the pass/fail rule before testing.
Plausibility Check
The nominal margin is only 25.0-24.2=0.8 ms, while timestamp uncertainty is 1.5 ms. Because the uncertainty is larger than the margin, the conservative value reaches 25.7 ms and the service cannot be accepted under a guarded interpretation.
Exercise 13: Monthly SLA Downtime Budget and Packet Exposure
A monitored packet service has a monthly availability requirement of:
\[ A_{\text{req}} = 99.95\% \]
Use a 30 day service month. During the month, customer-visible outages affecting the monitored traffic class lasted:
| Outage | Duration |
|---|---|
| route reconvergence incident | 4.5\ \text{min} |
| access switch reboot | 6.0\ \text{min} |
| provider handoff flap | 5.0\ \text{min} |
The critical stream sends:
\[ r = 250\ \text{packets/s} \]
Application replay can reconstruct about:
of updates affected during the outage window. A planned protection-switching drill may add another:
\[ t_{\text{drill}} = 8.0\ \text{min} \]
of customer-visible outage this month. Estimate the allowed downtime, actual availability, remaining downtime budget, packet exposure, residual unreconstructed updates, and whether the planned drill fits the same monthly SLA budget.
Solution
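Monthly service time:
\[ T_{\text{month}} = 30 \times 24 \times 60 = 43200\ \text{min} \]
Allowed downtime:
\[ t_{\text{allowed}} = (1 - 0.9995) \times 43200 = 21.6\ \text{min} \]
Actual outage time:
\[ t_{\text{out}} = 4.5 + 6.0 + 5.0 = 15.5\ \text{min} \]
Actual availability:
\[ A = 1 - \frac{15.5}{43200} \approx 0.999641 \]
So:
\[ A \approx 99.964\% \]
Remaining downtime budget:
\[ t_{\text{remaining}} = 21.6 - 15.5 = 6.1\ \text{min} \]
Packets exposed to outage windows:
\[ N_{\text{exposed}} = 15.5 \times 60 \times 250 = 232500\ \text{packets} \]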
Residual unreconstructed updates after replay:
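If the planned drill is added:
\[ t_{\text{out,drill}} = 15.5 + 8.0 = 23.5\ \text{min} \]
and:
\[ A_{\text{drill}} = 1 - \frac{23.5}{43200} \approx 0.999456 \]
So:
\[ A_{\text{drill}} \approx 99.946\% < 99.95\% \]
The planned drill does not fit the remaining monthly SLA budget because:
\[ 8.0\ \text{min} > 6.1\ \text{min} \]
It would exceed the budget by:
\[ 8.0 - 6.1 = 1.9\ \text{min} \]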
Engineering Comment
The service currently passes the monthly availability target, but the remaining downtime budget is too small for the planned drill if the drill is customer-visible. Packet exposure and SLA availability are related but not identical: replay can reduce application data loss, while the SLA still counts the outage time. The practical decision is to reschedule the drill, make it non-customer-visible through protection or maintenance routing, or obtain an explicit maintenance-window exclusion.
Plausibility Check
A 99.95\% monthly target allows only 0.05\% of 43200 minutes, or 21.6 minutes of downtime. The observed 15.5 minutes leaves 6.1 minutes, so an 8.0 minute drill cannot fit even though the service is still passing before the drill. At 250 packets/s, 15.5 minutes exposes 232500 packets, which is consistent with a high-rate critical stream over a multi-minute outage window.
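A downtime ledger of this kind is easy to keep as a small script that is rerun whenever an outage or a planned drill is added. The sketch below uses the figures quoted in the exercise and its Plausibility Check (99.95% target, 30 day month, three outages, 250 packets/s, 8.0 minute drill); the replay recovery fraction is left out, and the variable names are illustrative.

```python
def downtime_budget_min(availability_target: float, service_minutes: float) -> float:
    """Downtime allowed by an availability target over the service period."""
    return (1.0 - availability_target) * service_minutes


SERVICE_MINUTES = 30 * 24 * 60          # 30 day service month
OUTAGES_MIN = [4.5, 6.0, 5.0]           # customer-visible outages this month
STREAM_PPS = 250                        # critical stream packet rate
DRILL_MIN = 8.0                         # planned customer-visible drill

allowed_min = downtime_budget_min(0.9995, SERVICE_MINUTES)
used_min = sum(OUTAGES_MIN)
remaining_min = allowed_min - used_min
availability = 1.0 - used_min / SERVICE_MINUTES
packets_exposed = used_min * 60 * STREAM_PPS
drill_fits = DRILL_MIN <= remaining_min

print(f"allowed {allowed_min:.1f} min, used {used_min:.1f} min, remaining {remaining_min:.1f} min")
print(f"availability {availability:.4%}, packets exposed {packets_exposed:.0f}")
print(f"drill fits: {drill_fits} (over by {max(0.0, DRILL_MIN - remaining_min):.1f} min)")
```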
Exercise 14: Latency Histogram Percentile SLA Gate
An active probe test runs for:
\[ t_{\text{test}} = 15\ \text{min} \]
against a customer packet service. The one-way latency histogram for the monitored traffic class is:
| One-way latency bin | Probe count |
|---|---|
| t \le 10\ \text{ms} | 6200 |
| 10<t \le 15\ \text{ms} | 1800 |
| 15<t \le 20\ \text{ms} | 900 |
| 20<t \le 25\ \text{ms} | 500 |
| 25<t \le 35\ \text{ms} | 300 |
| 35<t \le 50\ \text{ms} | 180 |
| t>50\ \text{ms} | 120 |
The acceptance rule requires:
\[ t_{\text{p95}} \le 30\ \text{ms} \]
and:
\[ t_{\text{p99}} \le 50\ \text{ms} \]
Use nearest-rank percentile logic and a conservative upper-bin interpretation. Estimate the p95 and p99 acceptance result, the number of probes allowed above 50 ms by the p99 requirement, the excess high-latency probes, and the average excess rate during the test.
Solution
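Total probe count:
\[ N = 6200 + 1800 + 900 + 500 + 300 + 180 + 120 = 10000 \]
Nearest-rank positions:
\[ k_{95} = \lceil 0.95 \times 10000 \rceil = 9500, \quad k_{99} = \lceil 0.99 \times 10000 \rceil = 9900 \]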
Cumulative counts:
| Upper bin edge | Cumulative probes |
|---|---|
| 10\ \text{ms} | 6200 |
| 15\ \text{ms} | 8000 |
| 20\ \text{ms} | 8900 |
| 25\ \text{ms} | 9400 |
| 35\ \text{ms} | 9700 |
| 50\ \text{ms} | 9880 |
| open-ended above 50\ \text{ms} | 10000 |
The 9500th probe lies in the:
\[ 25 < t \le 35\ \text{ms} \]
bin. With a conservative upper-bin interpretation:
\[ t_{\text{p95}} = 35\ \text{ms} \]
The p95 release margin is:
\[ 30 - 35 = -5\ \text{ms} \]
So the p95 evidence does not prove acceptance under the guarded rule.
The 9900th probe lies in the open-ended:
\[ t > 50\ \text{ms} \]
bin because only 9880 probes are at or below 50 ms. Therefore the p99 requirement fails.
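The number of probes allowed above 50 ms by a p99 \le 50 ms target is:
\[ N_{\text{allowed}} = 10000 - 9900 = 100 \]
Observed probes above 50 ms:
\[ N_{\text{observed}} = 120 \]
Excess high-latency probes:
\[ N_{\text{excess}} = 120 - 100 = 20 \]
Average excess rate during the 15 minute test:
\[ \frac{20\ \text{probes}}{15\ \text{min}} \approx 1.33\ \text{probes/min} \]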
The service fails the p99 gate and cannot be released from this test evidence.
Engineering Comment
The p95 bin is too coarse around the 30 ms threshold: linear interpolation might estimate a passing value, but the histogram cannot prove it under a conservative acceptance rule. The p99 result is stronger because the count above 50 ms already exceeds the 1\% allowance. The response should preserve raw samples or use finer bins around service thresholds, then correlate high-latency probes with queue counters, QoS drops, route changes, CPU load and traffic bursts.
Plausibility Check
The 9500th probe is just 100 samples into the 25 to 35 ms bin, so a coarse histogram can hide the exact crossing around the 30 ms target. The p99 rule allows only 100 of 10000 samples above 50 ms, while the test observed 120, so the fail decision does not depend on interpolation.
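The nearest-rank gate with a conservative upper-bin reading can be expressed as a short routine over the histogram. This is a sketch under the rule stated in the exercise: the reported percentile is the upper edge of the bin containing the ranked probe, and an open-ended top bin is represented here as infinity, which is an illustrative choice rather than something the exercise prescribes.

```python
import math

# (upper bin edge in ms, probe count); the last bin is open-ended.
HISTOGRAM = [(10, 6200), (15, 1800), (20, 900), (25, 500),
             (35, 300), (50, 180), (math.inf, 120)]


def nearest_rank(percent: int, total: int) -> int:
    """Nearest-rank position, ceil(percent * total / 100), in integer arithmetic."""
    return (percent * total + 99) // 100


def upper_edge_percentile_ms(histogram, percent: int) -> float:
    """Upper edge of the bin that contains the nearest-rank probe."""
    total = sum(count for _, count in histogram)
    rank = nearest_rank(percent, total)
    cumulative = 0
    for edge_ms, count in histogram:
        cumulative += count
        if cumulative >= rank:
            return edge_ms
    raise ValueError("histogram is empty")


total = sum(count for _, count in HISTOGRAM)
p95_ms = upper_edge_percentile_ms(HISTOGRAM, 95)
p99_ms = upper_edge_percentile_ms(HISTOGRAM, 99)
allowed_above_50 = total - nearest_rank(99, total)
observed_above_50 = HISTOGRAM[-1][1]

print(f"p95 <= {p95_ms} ms (gate 30 ms): {'pass' if p95_ms <= 30 else 'not proven'}")
print(f"p99 <= {p99_ms} ms (gate 50 ms): {'pass' if p99_ms <= 50 else 'fail'}")
print(f"probes above 50 ms: {observed_above_50} observed, {allowed_above_50} allowed")
```

Integer arithmetic is used for the rank so that the gate does not depend on floating-point rounding at exactly 9500 or 9900.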
Exercise 15: Delay Asymmetry and the RTT/2 Trap
During a fiber maintenance outage, a timing-sensitive telemetry service moves to a backup path. The network dashboard reports round-trip time:
\[ RTT = 14.0\ \text{ms} \]
The dashboard estimates one-way latency as:
\[ t_{\text{est}} = \frac{RTT}{2} \]
A calibrated probe on the same traffic class independently measures the reverse-path delay as:
\[ t_{\text{rev}} = 5.6\ \text{ms} \]
The service requirement for the forward direction is:
A packet timing function on the same path has a maximum allowed time error of:
\[ \lvert TE \rvert_{\max} = 1.0\ \text{ms} \]
Assume a symmetric-delay timing algorithm, so the timing offset bias caused by directional delay asymmetry is approximately half the forward/reverse delay difference. If fixed asymmetry is later compensated, use residual uncertainty terms:
Check the RTT/2 estimate, actual forward one-way delay, asymmetry-driven timing error, and whether compensated timing evidence would pass.
Solution
The dashboard estimate is:
\[ t_{\text{est}} = \frac{14.0}{2} = 7.0\ \text{ms} \]
Using only RTT/2, the apparent one-way margin is:
That would appear to pass.
Actual forward delay from the directional measurement is:
\[ t_{\text{fwd}} = RTT - t_{\text{rev}} = 14.0 - 5.6 = 8.4\ \text{ms} \]
Actual forward-delay margin is:
So the forward direction fails the one-way requirement.
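Directional delay asymmetry is:
\[ \Delta t = t_{\text{fwd}} - t_{\text{rev}} = 8.4 - 5.6 = 2.8\ \text{ms} \]
The error made by using RTT/2 for the forward direction is:
\[ t_{\text{fwd}} - t_{\text{est}} = 8.4 - 7.0 = 1.4\ \text{ms} \]
This equals half the directional asymmetry:
\[ \frac{\Delta t}{2} = \frac{2.8}{2} = 1.4\ \text{ms} \]
The timing offset bias is therefore approximately:
\[ TE_{\text{bias}} \approx 1.4\ \text{ms} \]
Timing-error margin before compensation is:
\[ 1.0 - 1.4 = -0.4\ \text{ms} \]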
The timing service should not be released on the backup path without asymmetry correction or independent time-error evidence.
After fixed asymmetry compensation, combine the residual uncertainty terms by root-sum-square:
Compensated timing margin is:
The compensated timing evidence passes the 1.0\ \text{ms} timing-error screen, but the forward one-way telemetry latency still fails unless the service path, QoS class or requirement is changed.
Engineering Comment
RTT/2 is only safe when symmetry is justified for the measured service boundary. In this case it creates a false pass: the dashboard reports 7.0\ \text{ms} while the forward direction is actually 8.4\ \text{ms}. Timing services are even more sensitive because half the directional mismatch appears as clock-offset bias. The release decision should separate ordinary latency acceptance from timing-service acceptance.
Plausibility Check
The measured forward and reverse delays sum to the reported RTT:
\[ 8.4 + 5.6 = 14.0\ \text{ms} \]
so the arithmetic is internally consistent. The asymmetry is large enough that half of it, 1.4\ \text{ms}, exceeds the 1.0\ \text{ms} timing-error limit. After compensation, the residual uncertainty drops below the limit, but it does not change the actual forward path delay.
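The RTT/2 trap can be checked mechanically whenever one directional measurement is available. The sketch below uses the figures quoted in the Engineering Comment and Plausibility Check (14.0 ms RTT via 8.4 ms forward and 5.6 ms reverse, 1.0 ms timing-error limit) and the exercise's stated assumption that a symmetric-delay timing algorithm sees half the asymmetry as offset bias; the function names are illustrative.

```python
def forward_delay_ms(rtt_ms: float, reverse_ms: float) -> float:
    """Forward one-way delay when RTT and the reverse delay are both measured."""
    return rtt_ms - reverse_ms


def symmetric_offset_bias_ms(forward_ms: float, reverse_ms: float) -> float:
    """Clock-offset bias of a symmetric-delay algorithm: half the asymmetry."""
    return abs(forward_ms - reverse_ms) / 2


RTT_MS = 14.0
REVERSE_MS = 5.6
TIME_ERROR_LIMIT_MS = 1.0

rtt_half_ms = RTT_MS / 2
fwd_ms = forward_delay_ms(RTT_MS, REVERSE_MS)
rtt_half_error_ms = fwd_ms - rtt_half_ms
bias_ms = symmetric_offset_bias_ms(fwd_ms, REVERSE_MS)
timing_ok = bias_ms <= TIME_ERROR_LIMIT_MS

print(f"RTT/2 estimate {rtt_half_ms:.1f} ms, measured forward {fwd_ms:.1f} ms")
print(f"RTT/2 understates forward delay by {rtt_half_error_ms:.1f} ms")
print(f"uncompensated timing bias {bias_ms:.1f} ms, within limit: {timing_ok}")
```

The RTT/2 error and the timing bias are numerically equal here, which is expected: both are half of the same forward/reverse difference.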
Exercise 16: Microburst Queue Overflow and Recovery Time
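A packet service normally meets its average latency target, but interface counters show short congestion bursts. A class queue has service rate:
\[ C = 40\ \text{Mbit/s} \]
During a microburst, traffic enters the same class at:
\[ R_{\text{burst}} = 300\ \text{Mbit/s} \]
for:
\[ t_{\text{burst}} = 40\ \text{ms} \]
The queue already contains:
\[ Q_0 = 0.35\ \text{MB} \]
The finite queue capacity is:
\[ Q_{\max} = 1.50\ \text{MB} \]
Use a mean packet size of:
\[ L = 1000\ \text{byte} \]
The service delay target allows no more than:
\[ t_{\text{target}} = 80\ \text{ms} \]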
of added queueing delay. After the burst, offered traffic falls to:
Estimate the burst excess data, overflow bytes, dropped packets, peak queueing delay, time until the delay target is crossed, and recovery time back below the delay target.
Solution
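Excess rate during the burst:
\[ R_{\text{excess}} = 300 - 40 = 260\ \text{Mbit/s} \]
Burst excess data:
\[ \Delta Q = R_{\text{excess}} \times t_{\text{burst}} \]
Using Mbit/s and seconds:
\[ \Delta Q = 260 \times 0.040 = 10.4\ \text{Mbit} \]
So:
\[ \Delta Q = \frac{10.4}{8} = 1.30\ \text{MB} \]
Uncapped peak queue occupancy would be:
\[ Q_0 + \Delta Q = 0.35 + 1.30 = 1.65\ \text{MB} \]
Overflow beyond the finite queue is:
\[ 1.65 - 1.50 = 0.15\ \text{MB} = 150000\ \text{byte} \]
Approximate dropped packets:
\[ \frac{150000\ \text{byte}}{1000\ \text{byte}} = 150\ \text{packets} \]
Peak queueing delay at full queue is:
\[ t_{\text{peak}} = \frac{1.50 \times 8\ \text{Mbit}}{40\ \text{Mbit/s}} = 0.30\ \text{s} \]
or:
\[ t_{\text{peak}} = 300\ \text{ms} \]
Queue size corresponding to the service delay target:
\[ Q_{\text{target}} = \frac{40\ \text{Mbit/s} \times 0.080\ \text{s}}{8} = 0.40\ \text{MB} \]
Time from burst start until the queue crosses the delay target:
\[ \frac{(0.40 - 0.35) \times 8\ \text{Mbit}}{260\ \text{Mbit/s}} \approx 1.54\ \text{ms} \]
Time until the queue first becomes full:
\[ \frac{(1.50 - 0.35) \times 8\ \text{Mbit}}{260\ \text{Mbit/s}} \approx 35.4\ \text{ms} \]
The queue is full and dropping packets for the last:
\[ 40 - 35.4 \approx 4.6\ \text{ms} \]
of the burst.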
After the burst, the queue drains at:
Recovery time from full queue back below the delay-target queue size:
or:
Total time above the delay target is approximately:
The service should not be released from average latency evidence alone. A 40\ \text{ms} microburst creates packet loss and keeps the class above its added-delay target for about 0.63\ \text{s}.
Engineering Comment
Microbursts are dangerous because they are short enough to disappear in coarse average utilization graphs while still filling finite queues. The service can look healthy at one-minute utilization granularity and still drop packets or add hundreds of milliseconds of delay during a burst.
Release evidence should include queue-depth telemetry, drop counters by class, burst capture resolution, scheduler configuration, shaper or policer rates, active queue management state, packet-size distribution and application tolerance. If the queue target is 80\ \text{ms}, a full-buffer delay of 300\ \text{ms} is not an acceptable hidden reserve.
Plausibility Check
The incoming burst rate is 7.5 times the service rate, so a short burst can build queue quickly. The excess data is 1.30\ \text{MB}, which is almost the whole 1.50\ \text{MB} buffer before considering the initial 0.35\ \text{MB} occupancy. Dropping about 150 average packets and spending roughly 0.6\ \text{s} above the delay target is therefore plausible even though the burst itself lasts only 40\ \text{ms}.
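The burst-phase arithmetic can be bundled into one screening function. In this sketch the 40 Mbit/s service rate, 300 Mbit/s burst rate and 1000 byte mean packet size are assumptions inferred from the figures quoted in the text (a 1.50 MB buffer giving 300 ms, a burst at 7.5 times the service rate, and about 150 dropped packets); the post-burst drain phase is not modelled, and the function name is illustrative.

```python
def microburst_screen(service_bps, burst_bps, burst_s, initial_bytes,
                      capacity_bytes, mean_packet_bytes, target_delay_s):
    """Fluid-model screen of a finite queue during one constant-rate burst.

    Assumes burst_bps > service_bps; decimal bytes throughout.
    """
    excess_bps = burst_bps - service_bps
    excess_bytes = excess_bps * burst_s / 8
    uncapped_bytes = initial_bytes + excess_bytes
    overflow_bytes = max(0.0, uncapped_bytes - capacity_bytes)
    target_bytes = service_bps * target_delay_s / 8
    return {
        "excess_bytes": excess_bytes,
        "overflow_bytes": overflow_bytes,
        "dropped_packets": overflow_bytes / mean_packet_bytes,
        # Delay seen by a packet joining the queue at its peak occupancy.
        "peak_delay_s": min(uncapped_bytes, capacity_bytes) * 8 / service_bps,
        # Time from burst start until the delay target is crossed.
        "time_to_target_s": max(0.0, target_bytes - initial_bytes) * 8 / excess_bps,
        # Time until the queue is full; only meaningful when overflow occurs.
        "time_to_full_s": (capacity_bytes - initial_bytes) * 8 / excess_bps,
    }


result = microburst_screen(
    service_bps=40e6,         # assumed, see lead-in
    burst_bps=300e6,          # assumed: 7.5 times the service rate
    burst_s=0.040,
    initial_bytes=0.35e6,
    capacity_bytes=1.50e6,
    mean_packet_bytes=1000,   # assumed, see lead-in
    target_delay_s=0.080,
)
for name, value in result.items():
    print(f"{name:18s} {value:,.4f}")
```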
Exercise 17: TCP Loss-Limited Throughput from RTT and Packet Loss
A file-transfer service crosses a routed packet network. The path capacity is above:
\[ C = 100\ \text{Mbit/s} \]
but a single long-lived TCP-like flow is underperforming. The measured round-trip time is:
\[ RTT = 60\ \text{ms} \]
The maximum segment size is:
Packet loss probability on the flow is:
\[ p = 0.1\% = 0.001 \]
Use the simplified loss-limited throughput screen:
where RTT is in seconds and T is in bit/s. The service target for this single flow is:
\[ T_{\text{target}} = 25\ \text{Mbit/s} \]
Estimate the loss-limited throughput, the service deficit, and the packet loss probability required to meet the target with the same RTT and MSS.
Solution
Convert RTT to seconds:
\[ RTT = 60\ \text{ms} = 0.060\ \text{s} \]
Payload bits per segment:
Loss square-root term:
\[ \sqrt{p} = \sqrt{0.001} \approx 0.0316 \]
Throughput screen:
So:
Service deficit:
To find the packet loss probability required for (25\ \text{Mbit/s}), rearrange:
As a percentage:
The current packet loss probability is:
That is about:
times higher than the simplified loss target. The service should not be released as a 25\ \text{Mbit/s} single-flow service until loss is reduced, RTT is lowered, multiple parallel flows are accepted by the service definition, or the target is changed.
Engineering Comment
Loss-limited throughput is different from link capacity and different from window-limited throughput. A path can have enough physical bandwidth and enough transport window but still underperform when random loss, queue drops, policing, wireless retransmission exhaustion or optical errors trigger congestion control. Release evidence should identify where loss occurs and whether the application uses one flow, multiple flows, UDP, QUIC or a controlled transport profile.
Plausibility Check
A 0.1\% packet loss probability sounds small, but the square-root loss term is only 0.0316, and it appears in the denominator. With a 60\ \text{ms} RTT, the simplified screen lands near 7.5\ \text{Mbit/s}, far below a 100\ \text{Mbit/s} path. Reducing loss by roughly an order of magnitude to about 0.009\% is therefore consistent with the 25\ \text{Mbit/s} target.
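A loss-limited screen of this kind is commonly written in the Mathis form, throughput proportional to MSS / (RTT * sqrt(p)). The sketch below is one such form and is not necessarily the exact expression used in this exercise: the constant 1.22 and the 1460 byte MSS are assumptions that happen to reproduce the figures quoted in the Plausibility Check (near 7.5 Mbit/s and about 0.009%), and the function names are illustrative.

```python
import math

MATHIS_CONSTANT = 1.22   # assumed constant of the simplified screen
MSS_BYTES = 1460         # assumed maximum segment size
RTT_S = 0.060            # 60 ms round-trip time
LOSS_PROB = 0.001        # 0.1% packet loss probability
TARGET_BPS = 25e6        # 25 Mbit/s single-flow target


def loss_limited_bps(mss_bytes, rtt_s, loss_prob, k=MATHIS_CONSTANT):
    """Simplified loss-limited throughput in bit/s."""
    return k * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))


def required_loss_prob(mss_bytes, rtt_s, target_bps, k=MATHIS_CONSTANT):
    """Loss probability at which the same screen reaches target_bps."""
    return (k * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


throughput_bps = loss_limited_bps(MSS_BYTES, RTT_S, LOSS_PROB)
p_required = required_loss_prob(MSS_BYTES, RTT_S, TARGET_BPS)

print(f"loss-limited throughput {throughput_bps / 1e6:.2f} Mbit/s")
print(f"deficit against target  {(TARGET_BPS - throughput_bps) / 1e6:.2f} Mbit/s")
print(f"required loss           {p_required:.4%}")
print(f"current / required loss {LOSS_PROB / p_required:.1f}x")
```

Because throughput scales with 1 / sqrt(p), roughly tripling throughput requires loss to fall by about an order of magnitude, which matches the reasoning in the Plausibility Check.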
Exercise 18: Shaper Admission Headroom and P99 Latency Release Gate
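A priority telemetry class crosses an egress shaper with a shaped service rate of:
\[ R_s = 18\ \text{Mbit/s} \]
The fixed one-way delay outside the egress queue is:
\[ t_{\text{fixed}} = 16\ \text{ms} \]
The service requirement is:
\[ t_{\text{p99}} \le 40\ \text{ms} \]
During the busy-hour acceptance test, the existing class already has a p99 queue occupancy of:
\[ Q_{\text{existing}} = 0.030\ \text{MB} \]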
Use decimal megabytes. A proposed additional telemetry stream has a mean rate of:
The existing mean class load is:
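So the mean load appears to fit the shaper. However, the proposed stream can burst at:
\[ R_{\text{cand,burst}} = 12\ \text{Mbit/s} \]
for:
\[ t_{\text{burst}} = 60\ \text{ms} \]
while the existing class is still entering the queue at:
\[ R_{\text{existing,burst}} = 14\ \text{Mbit/s} \]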
Estimate mean utilization, maximum queue occupancy allowed by the p99 latency budget, burst queue growth, peak queue delay, total p99 latency during the burst, and the largest allowable burst duration or candidate-stream burst rate that would keep the same p99 latency gate.
Solution
Mean class load after admitting the proposed stream is:
Mean utilization is therefore:
\[ \rho = \frac{16\ \text{Mbit/s}}{18\ \text{Mbit/s}} \approx 0.889 = 88.9\% \]
The average load screen appears to pass because the mean load is below the shaped rate.
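The queue delay budget is the total p99 requirement minus fixed delay:
\[ t_{q,\max} = 40 - 16 = 24\ \text{ms} \]
The corresponding maximum queue occupancy is:
\[ Q_{\max} = \frac{18 \times 10^{6} \times 0.024}{8} = 54000\ \text{byte} \]
So:
\[ Q_{\max} = 0.054\ \text{MB} \]
The remaining queue headroom after the existing p99 queue is:
\[ Q_{\text{headroom}} = 0.054 - 0.030 = 0.024\ \text{MB} \]
During the burst, the combined ingress rate is:
\[ R_{\text{in}} = 12 + 14 = 26\ \text{Mbit/s} \]
The queue grows at the excess rate above the shaper:
\[ R_{\text{excess}} = 26 - 18 = 8\ \text{Mbit/s} \]
Burst queue growth is:
\[ \Delta Q = \frac{8 \times 10^{6} \times 0.060}{8} = 60000\ \text{byte} \]
So:
\[ \Delta Q = 0.060\ \text{MB} \]
Peak queue occupancy becomes:
\[ Q_{\text{peak}} = 0.030 + 0.060 = 0.090\ \text{MB} \]
Queue delay at the shaper is:
\[ t_q = \frac{0.090 \times 10^{6} \times 8}{18 \times 10^{6}} = 0.040\ \text{s} \]
Therefore:
\[ t_q = 40\ \text{ms} \]
Total p99 one-way latency during the burst is approximately:
\[ t_{\text{total}} \approx 16 + 40 = 56\ \text{ms} \]
The service fails the p99 release gate by:
\[ 56 - 40 = 16\ \text{ms} \]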
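For the same (8\ \text{Mbit/s}) excess rate, the largest allowable burst duration is set by the remaining headroom: (0.024\ \text{MB} \times 8\ \text{bit/byte} / 8\ \text{Mbit/s} = 192{,}000\ \text{bits} / 8\ \text{Mbit/s}).
So the burst may last at most (0.024\ \text{s}), or (24\ \text{ms}).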
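For the same (60\ \text{ms}) burst duration, the largest allowable excess rate is (0.024\ \text{MB} \times 8\ \text{bit/byte} / 0.060\ \text{s} = 3.2\ \text{Mbit/s}).
Thus the largest combined burst rate is (18\ \text{Mbit/s} + 3.2\ \text{Mbit/s} = 21.2\ \text{Mbit/s}).
Because the existing burst rate is (14\ \text{Mbit/s}), the candidate-stream burst rate would need to be limited to (21.2\ \text{Mbit/s} - 14\ \text{Mbit/s} = 7.2\ \text{Mbit/s}).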
The proposed stream should not be admitted as stated. It can be released only if the candidate burst is shortened to about (24\ \text{ms}), shaped to about (7.2\ \text{Mbit/s}) for a (60\ \text{ms}) burst, moved to a different class, or supported by a larger tested service rate and queue policy.
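The solution chain can be sketched as a small fluid-model screen. The class and method names are illustrative, not part of the exercise. The (16\ \text{ms}) fixed delay and (40\ \text{ms}) limit are inferred from the (24\ \text{ms}) queue budget and (56\ \text{ms}) total quoted in the solution, and the (12\ \text{Mbit/s}) candidate burst rate is inferred from the (8\ \text{Mbit/s}) excess and the (14\ \text{Mbit/s}) existing burst rate.

```python
from dataclasses import dataclass

BITS_PER_MB = 8_000_000  # decimal megabyte, as the exercise specifies


@dataclass
class ShaperScreen:
    shaped_rate_bps: float
    fixed_delay_s: float
    p99_limit_s: float
    existing_p99_queue_mb: float
    existing_burst_rate_bps: float
    candidate_burst_rate_bps: float
    burst_duration_s: float

    def headroom_bits(self) -> float:
        """Queue space left inside the p99 budget after the existing p99 queue."""
        budget_s = self.p99_limit_s - self.fixed_delay_s
        used_bits = self.existing_p99_queue_mb * BITS_PER_MB
        return self.shaped_rate_bps * budget_s - used_bits

    def excess_rate_bps(self) -> float:
        """Rate at which the queue grows while both sources burst."""
        combined = self.existing_burst_rate_bps + self.candidate_burst_rate_bps
        return max(0.0, combined - self.shaped_rate_bps)

    def p99_during_burst_s(self) -> float:
        """Fixed delay plus the delay of the peak queue at the shaped rate."""
        growth_bits = self.excess_rate_bps() * self.burst_duration_s
        peak_bits = self.existing_p99_queue_mb * BITS_PER_MB + growth_bits
        return self.fixed_delay_s + peak_bits / self.shaped_rate_bps

    def max_burst_duration_s(self) -> float:
        """Longest burst at the current excess rate that stays in the headroom."""
        excess = self.excess_rate_bps()
        return float("inf") if excess == 0 else self.headroom_bits() / excess

    def max_candidate_burst_rate_bps(self) -> float:
        """Highest candidate burst rate for the stated burst duration."""
        max_excess = self.headroom_bits() / self.burst_duration_s
        return self.shaped_rate_bps + max_excess - self.existing_burst_rate_bps


# Inputs as read from the exercise (see the assumptions in the lead-in).
screen = ShaperScreen(18e6, 0.016, 0.040, 0.030, 14e6, 12e6, 0.060)
print(f"p99 during burst: {screen.p99_during_burst_s() * 1e3:.1f} ms")
print(f"max burst duration: {screen.max_burst_duration_s() * 1e3:.1f} ms")
print(f"max candidate burst: {screen.max_candidate_burst_rate_bps() / 1e6:.2f} Mbit/s")
```

The sketch adds the burst growth on top of the existing p99 queue, exactly as the hand calculation does. It ignores packetization, scheduler granularity and drops, so it is a screen rather than acceptance evidence.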
Engineering Comment
Admission is not proved by mean utilization. The added stream raises mean utilization to only (88.9%), but its burst consumes more queue headroom than the p99 latency budget allows. A release review should require shaper configuration, queue occupancy telemetry, class counters, packet-size distribution, burst envelope, drop counters, timestamp method and p99 probes taken under the same traffic policy.
If the service is timing-sensitive, the operational response should be to shape the source, reduce burst size, reserve another class, increase the class rate with scheduler evidence, or reject the new stream. Accepting the stream because the average load is below (18\ \text{Mbit/s}) would hide a predictable tail-latency violation.
Plausibility Check
The p99 budget leaves only (24\ \text{ms}) for queueing, which is (0.054\ \text{MB}) at an (18\ \text{Mbit/s}) shaper. The existing queue already uses (0.030\ \text{MB}), so only (0.024\ \text{MB}) remains. An (8\ \text{Mbit/s}) excess burst for (60\ \text{ms}) adds (0.060\ \text{MB}), more than twice the remaining headroom. A failed (56\ \text{ms}) total p99 latency is therefore plausible.
Review Checklist
When reviewing packet latency and jitter calculations, ask:
- Is the service boundary explicit?
- Are packet size, traffic class, direction, percentile, and load state defined?
- Is the queue controlled by physical rate, shaped rate, class reservation, or scheduler allocation?
- Is single-flow throughput checked against RTT, transport window and packet loss rather than only path capacity?
- Are degraded modes and failover states tested, not only normal operation?
- Does the measurement uncertainty affect the pass/fail result?
- Does accumulated outage time still fit the same SLA budget after planned tests or maintenance events?
- Are histogram bins fine enough around the SLA thresholds, or are raw samples needed?
- Has RTT/2 been rejected unless route symmetry, traffic class symmetry and timestamp boundaries are proven?
- Do microburst, queue-depth, drop-counter and recovery-time measurements support the latency claim?
- Does admission of a new stream preserve p99 queue headroom under its burst envelope, not only under mean load?
- Which counters, captures, probes, and alarms would prove the calculation in service?
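The histogram question in this checklist can be made concrete with a minimal sketch. It assumes bins of the form (low, high] and uses made-up counts purely for illustration; the function names and the three-way verdict are hypothetical, not a standard tool.

```python
def percentile_bin(edges_ms, counts, pct):
    """Return the (low, high) edges of the bin that contains the pct-th percentile."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    running = 0
    for i, count in enumerate(counts):
        running += count
        # Integer-style comparison avoids rounding trouble at bin boundaries.
        if running * 100 >= pct * total:
            return edges_ms[i], edges_ms[i + 1]
    raise ValueError("percentile must be at most 100")


def histogram_gate(edges_ms, counts, pct, limit_ms):
    """Pass or fail only when the whole bin sits on one side of the limit."""
    low, high = percentile_bin(edges_ms, counts, pct)
    if high <= limit_ms:
        return "pass"
    if low >= limit_ms:
        return "fail"
    return "indeterminate"  # bin straddles the limit: raw samples are needed


# Illustrative data only: 1000 samples in five bins.
edges = [0, 10, 20, 30, 50, 100]
counts = [700, 200, 80, 15, 5]
print(percentile_bin(edges, counts, 99))
print(histogram_gate(edges, counts, 99, limit_ms=40))
```

With these illustrative counts the p99 sample falls in the 30 to 50 ms bin, so a 40 ms limit cannot be decided from the histogram alone. Finer bins around the threshold, or the raw samples, would be needed.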
Common Mistakes
- Quoting average latency while the requirement is p95, p99, maximum delay, jitter or one-way delay.
- Comparing tests with different packet sizes, encapsulation overhead, directions, traffic classes or timestamp boundaries.
- Treating RTT/2 as one-way delay without proving directional symmetry, route symmetry and clock or timestamp validity.
- Checking normal-path delay while ignoring degraded backhaul, failover convergence, maintenance routes and protection switching.
- Modeling queue delay from line rate when shaping, policing, scheduler allocation or class reservation controls the real service rate.
- Treating path capacity as application throughput while RTT, loss, window size or congestion control is limiting the flow.
- Accepting coarse histogram percentiles when bins straddle the SLA threshold and raw samples are not available.
- Averaging utilization over long intervals while microbursts fill finite queues and create packet drops.
- Admitting a new stream from mean utilization alone while its burst envelope consumes the p99 queue-delay budget.
- Sizing buffers or jitter buffers from a nominal value without burst behavior, packet loss, recovery time and application tolerance.
- Counting QoS reservation as protection while misclassification, default queues, control traffic or congestion drops remain unverified.
- Treating packet loss ratio as independent of latency when queue overflow, retransmission, timeout and jitter-buffer behavior are coupled.
- Releasing a timing-sensitive service without asymmetry compensation, holdover evidence and timestamp-uncertainty accounting.
Packet-network engineering is credible only when the timing model, QoS policy, measurement method, and operational evidence describe the same service boundary.