Packet Network Latency and Jitter Exercises

Solved packet-network exercises for propagation, queueing, shaper admission, TCP loss, microbursts, buffers, QoS and uncertainty.

Branch: Telecommunications Engineering
Content: Exercise set
Updated: Jul 03, 2026
Revision: v1.4.5 · reviewed

These exercises practise packet-network latency and jitter calculations as service-engineering checks. They cover propagation, serialization, protocol overhead, queueing delay, tail delay, bandwidth-delay product, TCP loss-limited throughput, buffer sizing, microbursts, jitter buffers, QoS reservation, token-bucket bursts, shaper admission, packet loss, latency histograms, failover, SLA downtime budgets, and measurement uncertainty.

The goal is not only to compute delay. The goal is to decide whether a packet service can meet a stated requirement under the correct boundary, packet size, traffic class, load state, route, clock reference, and measurement method.

Assume simplified screening models unless an exercise states otherwise. Real service acceptance also requires traffic captures, device counters, QoS configuration review, route verification, synchronized clocks, active probes, passive telemetry, failover tests, and operational alarm thresholds.

How to Use These Exercises

For each problem, define:

the service boundary, such as access switch to gateway, provider handoff to handoff, or application endpoint to endpoint;
packet size including the headers relevant to the measured boundary;
whether delay is one-way, round-trip, average, p95, p99, or maximum observed;
the traffic class and service rate that actually controls the queue;
the validation evidence needed before accepting the service.

The common mistake is quoting an average latency number without the traffic state, percentile, packet size, direction, QoS class, or clock reference. Packet services fail at boundaries and tails, not only at averages.

Release Evidence Notes

Packet latency evidence should start from a service boundary. For each result, record the endpoints, direction, packet size, encapsulation, traffic class, queueing policy, service rate, load state, route, failover state, clock reference, timestamp method and acceptance percentile. A latency or jitter number without that boundary cannot support release.

Queueing and buffer evidence should separate averages from tails. Average M/M/1 screens, p95 or p99 delay, bufferbloat, jitter-buffer sizing and token-bucket burst delay should state whether the result represents normal load, busy hour, degraded backhaul, shaped traffic or failover. A mean delay can pass while tail delay, jitter or burst behavior violates the service requirement.

QoS and availability evidence should preserve the degraded case. Reservation fit, class utilization, packet loss, outage packet count and monthly downtime budgets should be checked against scheduler configuration, policing, shaping, drop counters, failover duration, route convergence and traffic mix. A normal-path pass does not prove the backup path or maintenance state.

Measurement evidence should be strong near thresholds. Timestamp uncertainty, coarse histogram bins, RTT/2 estimates, directional delay asymmetry and clock synchronization should be visible before accepting p95, p99 or one-way delay gates. If the acceptance result depends on interpolation, symmetry assumptions or rounded counters, raw samples or finer probes are needed.

The practical release question is whether the timing model, QoS policy, route state, measurement method and operational evidence all describe the same service. If one layer disagrees, the result should trigger retest, counter review, QoS correction, route change or restricted release rather than acceptance from a single latency number.

Engineering Boundary Notes

Packet timing evidence is boundary-sensitive. A one-way delay, round-trip delay, device-to-device probe, application transaction, provider handoff test and passive capture do not measure the same thing. Each exercise result should state the endpoints, direction, encapsulation, packet size, traffic class, timestamp method and percentile being controlled.

Queueing is also a boundary. Serialization delay, propagation delay, scheduler delay, shaper delay, jitter-buffer delay, retransmission delay and route-convergence delay should not be merged into one average. A service may pass mean latency and still fail p99 latency, jitter, microburst recovery, failover packet loss or time-synchronization asymmetry.

Clock and sample evidence matter most near an SLA threshold. If the acceptance decision depends on sub-millisecond margins, the record should show clock synchronization, timestamp resolution, capture duration, packet-size mix, load state and histogram binning. RTT/2 approximations should be marked as estimates unless directional asymmetry is bounded.

Common Release Mistakes

releasing a service from average latency while the requirement is p95, p99 or maximum delay;
comparing one-way and round-trip results without identifying direction and timestamp reference;
ignoring encapsulation, MTU, serialization and packet-size differences between tests;
validating QoS reservation on a quiet path while failover or busy-hour traffic changes the scheduler state;
using TCP throughput loss formulas without matching RTT, packet loss and congestion-control behavior;
accepting jitter-buffer sizing without burst, packet-loss and clock-drift evidence;
treating provider carrier-up time as service availability when packet loss, route convergence or QoS failure affects the application.

Scenario Map

Scenario	Exercises	Primary check	Engineering decision
Fixed path and useful throughput	1, 2, 5, 17	Propagation, serialization, protocol efficiency, loss, bandwidth-delay product and TCP loss-limited throughput	Decide whether the path and endpoint behavior can support the useful packet service.
Queueing, buffer and tail-delay control	3, 4, 6, 7, 9, 16, 18	Average delay, p95/p99 delay, buffer drain time, microburst overflow, jitter buffer, token-bucket delay and shaper admission headroom	Check whether the service fails at the tail even when the average looks acceptable.
QoS, loss and availability acceptance	8, 10, 11, 13	Reservation fit, class utilization, packet loss ratio, outage packet count and monthly downtime budget	Decide whether degraded capacity, packet loss, recovery time or accumulated downtime breaks the service requirement.
Measurement, asymmetry and release evidence	12, 14, 15	Nominal p95 result, timestamp uncertainty, histogram percentile gates and directional delay asymmetry	Decide whether the acceptance test is strong enough to release the service.

Validation Package Checklist

Before treating a packet-service result as release evidence, collect:

endpoints, direction, handoff boundary, route and failover state;
packet size, encapsulation, MTU, traffic class and scheduler policy;
service rate, offered load, shaping, policing and queue configuration;
metric definition: one-way, RTT, average, p95, p99, jitter, loss or availability;
timestamp method, clock reference, capture duration and histogram resolution;
device counters, probe records, active tests and passive telemetry;
SLA threshold, uncertainty allowance, busy-hour or degraded-path condition;
release decision, QoS correction, route restriction, retest or rollback action.

Exercise 1: Propagation and Serialization Delay

A packet service crosses:

$120\ \text{km}$ of optical fiber;
two free-space radio hops of $15\ \text{km}$ each;
three egress links at $100\ \text{Mbit/s}$, $50\ \text{Mbit/s}$, and $1\ \text{Gbit/s}$.

The measured packet size at this service boundary is:

L=900\ \text{bytes}

Use:

t_{fiber}=5\ \mu\text{s/km}

and:

t_{radio}=3.33\ \mu\text{s/km}

Estimate propagation delay, serialization delay, and their subtotal.

Solution

Fiber propagation:

t_{fiber}=120(5)=600\ \mu\text{s}=0.600\ \text{ms}

Radio propagation:

t_{radio,total}=2(15)(3.33)=99.9\ \mu\text{s}\approx0.100\ \text{ms}

Total propagation:

t_{prop}=0.600+0.100=0.700\ \text{ms}

Packet length in bits:

L=900(8)=7200\ \text{bits}

Serialization at $100\ \text{Mbit/s}$:

\displaystyle t_{ser,100}=\frac{7200}{100\times10^6}=72\ \mu\text{s}=0.072\ \text{ms}

Serialization at $50\ \text{Mbit/s}$:

\displaystyle t_{ser,50}=\frac{7200}{50\times10^6}=144\ \mu\text{s}=0.144\ \text{ms}

Serialization at $1\ \text{Gbit/s}$:

\displaystyle t_{ser,1000}=\frac{7200}{10^9}=7.2\ \mu\text{s}=0.0072\ \text{ms}

Total serialization:

t_{ser,total}=0.072+0.144+0.0072=0.2232\approx0.223\ \text{ms}

Subtotal:

t_{fixed}=0.700+0.223=0.923\ \text{ms}

Engineering Comment

Propagation and serialization together total less than $1\ \text{ms}$ before device processing and queueing. If the measured service shows tens of milliseconds of delay, the cause is likely queueing, scheduling, security processing, route detour, endpoint processing, or measurement boundary mismatch.

Plausibility Check

The propagation subtotal is about $0.700$ ms and the serialization subtotal is about $0.223$ ms, giving $0.923$ ms before queueing or processing. The $50$ Mbit/s link contributes $0.144$ ms, which makes it the dominant serialization term, while the $120$ km of fiber dominates propagation.
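The fixed-delay arithmetic above can be reproduced with a minimal Python sketch. The function names `propagation_us` and `serialization_us` are illustrative, not part of any library, and the per-kilometre constants are the ones given in the exercise.

```python
def propagation_us(fiber_km, radio_km, fiber_us_per_km=5.0, radio_us_per_km=3.33):
    """Propagation delay in microseconds for a fiber span plus free-space radio."""
    return fiber_km * fiber_us_per_km + radio_km * radio_us_per_km


def serialization_us(packet_bytes, link_rates_bps):
    """Sum of per-link serialization delays in microseconds."""
    bits = packet_bytes * 8
    return sum(bits / rate * 1e6 for rate in link_rates_bps)


# Exercise 1 inputs: 120 km fiber, two 15 km radio hops, three egress links.
prop_us = propagation_us(fiber_km=120, radio_km=2 * 15)
ser_us = serialization_us(900, [100e6, 50e6, 1e9])
fixed_ms = (prop_us + ser_us) / 1000.0
print(f"propagation {prop_us:.1f} us, serialization {ser_us:.1f} us, "
      f"subtotal {fixed_ms:.3f} ms")
```

Changing the packet size or the list of link rates shows how quickly the slowest link takes over the serialization subtotal.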

Exercise 2: Protocol Overhead and Goodput

A service carries packets with:

L_p=1200\ \text{bytes}

of useful payload and:

L_h=82\ \text{bytes}

of headers and encapsulation overhead at the measured boundary. Delivered line-rate throughput is:

R=50\ \text{Mbit/s}

Packet loss ratio is:

PLR=0.2\%

Estimate protocol efficiency and first-pass payload goodput.

Solution

Total packet size:

L_{total}=L_p+L_h=1200+82=1282\ \text{bytes}

Protocol efficiency:

\displaystyle \eta_p=\frac{L_p}{L_{total}}=\frac{1200}{1282}=0.936

Convert packet loss ratio:

PLR=0.002

First-pass goodput:

G\approx \eta_p R(1-PLR)

G=0.936(50)(0.998)=46.7\ \text{Mbit/s}

Engineering Comment

The payload service receives less than the nominal line rate even before congestion-control effects. Overhead, tunnels, encryption, small packets, loss, and retransmission can materially change useful capacity. A bandwidth claim should state whether it is line rate, throughput, or application goodput.

Plausibility Check

The payload efficiency is $1200/1282=0.936$, so overhead alone removes about $6.4\%$ of the line rate. Applying the small $0.2\%$ packet loss leaves $46.7$ Mbit/s. In this simplified screen the shortfall from $50$ Mbit/s comes from overhead and loss, not from congestion.
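A short Python sketch of the same first-pass screen follows. The helper name `first_pass_goodput` is hypothetical, and the model deliberately ignores retransmission and congestion control, as the exercise does.

```python
def first_pass_goodput(payload_bytes, overhead_bytes, line_rate_mbps, plr):
    """Return (protocol efficiency, payload goodput in Mbit/s).

    Screening model only: goodput = efficiency * line rate * (1 - PLR).
    """
    efficiency = payload_bytes / (payload_bytes + overhead_bytes)
    return efficiency, efficiency * line_rate_mbps * (1.0 - plr)


eta, goodput = first_pass_goodput(1200, 82, 50.0, 0.002)
print(f"efficiency {eta:.3f}, goodput {goodput:.1f} Mbit/s")
```

Re-running the function with a smaller payload, for example a short telemetry packet behind the same 82 bytes of overhead, shows how strongly small packets reduce useful capacity.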

Exercise 3: Average Queueing Delay with an M/M/1 Screen

A traffic class carries $1200\ \text{byte}$ packets. Its reserved service rate is:

R=12\ \text{Mbit/s}

The offered load is:

A=7.2\ \text{Mbit/s}

Use a simplified M/M/1 screen to estimate utilization, packet service rate, arrival rate, average queueing delay, and average time in system.

Solution

Packet size:

L=1200(8)=9600\ \text{bits}

Service rate in packets per second:

\displaystyle \mu=\frac{R}{L}=\frac{12\times10^6}{9600}=1250\ \text{packets/s}

Arrival rate:

\displaystyle \lambda=\frac{A}{L}=\frac{7.2\times10^6}{9600}=750\ \text{packets/s}

Utilization:

\displaystyle \rho=\frac{\lambda}{\mu}=\frac{750}{1250}=0.60

Average queueing delay:

\displaystyle W_q=\frac{\rho}{\mu(1-\rho)}

\displaystyle W_q=\frac{0.60}{1250(1-0.60)}=0.0012\ \text{s}

So:

W_q=1.2\ \text{ms}

Average time in system:

\displaystyle W=\frac{1}{\mu-\lambda}=\frac{1}{1250-750}=0.002\ \text{s}

So:

W=2.0\ \text{ms}

Engineering Comment

At $60\%$ utilization, average delay is modest in this simplified model. The same queue can still fail a real-time requirement if traffic is bursty, packet sizes vary, priority scheduling is wrong, or tail delay rather than average delay is the acceptance criterion.

Plausibility Check

The service rate is $1250$ packets/s and the arrival rate is $750$ packets/s, so the remaining service margin is $500$ packets/s. The computed time in system is $2.0$ ms; subtracting the $1/\mu=0.8$ ms service time leaves the $1.2$ ms average queueing delay.
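The M/M/1 screen can be sketched in Python as below. The function name `mm1_screen` and the dictionary keys are illustrative assumptions. The stability guard is an addition that the exercise implies but does not state.

```python
def mm1_screen(rate_bps, load_bps, packet_bytes):
    """Average M/M/1 delays for a class with fixed mean packet size."""
    bits = packet_bytes * 8
    mu = rate_bps / bits    # service rate, packets/s
    lam = load_bps / bits   # arrival rate, packets/s
    if lam >= mu:
        raise ValueError("unstable queue: offered load must be below service rate")
    rho = lam / mu
    return {
        "mu": mu,
        "lambda": lam,
        "rho": rho,
        "wq_s": rho / (mu * (1.0 - rho)),  # average wait in queue
        "w_s": 1.0 / (mu - lam),           # average time in system
    }


result = mm1_screen(12e6, 7.2e6, 1200)
print({k: round(v, 6) for k, v in result.items()})
```

Raising the offered load toward the reserved rate in this sketch shows the nonlinear growth of delay as utilization approaches 1.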

Exercise 4: Tail Delay from the Same Queue

Use the queue from Exercise 3:

\mu-\lambda=500\ \text{packets/s}

Estimate p95 and p99 time in system using:

\displaystyle W_p=\frac{-\ln(1-p)}{\mu-\lambda}

Then estimate p95 queueing delay using:

\displaystyle W_{q,p}=\frac{\ln\left(\frac{\rho}{1-p}\right)}{\mu-\lambda}

for $p>1-\rho$.

Solution

p95 time in system:

\displaystyle W_{0.95}=\frac{-\ln(0.05)}{500}

\displaystyle W_{0.95}=\frac{2.996}{500}=0.00599\ \text{s}=5.99\ \text{ms}

p99 time in system:

\displaystyle W_{0.99}=\frac{-\ln(0.01)}{500}

\displaystyle W_{0.99}=\frac{4.605}{500}=0.00921\ \text{s}=9.21\ \text{ms}

For p95 queueing delay:

\displaystyle W_{q,0.95}=\frac{\ln(0.60/0.05)}{500}

\displaystyle W_{q,0.95}=\frac{\ln(12)}{500}=\frac{2.485}{500}=0.00497\ \text{s}

So:

W_{q,0.95}=4.97\ \text{ms}

Engineering Comment

The p99 time in system ($9.21$ ms) is several times larger than both the $2.0$ ms average time in system and the $1.2$ ms average queueing delay. Service requirements should state percentile and measurement interval. Average delay can pass while p95 or p99 delay violates a control, voice, or telemetry requirement.

Plausibility Check

The p99 time in system is $9.21$ ms, which is about $4.6$ times the $2.0$ ms average time in system from Exercise 3. The p95 queueing-only delay of $4.97$ ms is also much larger than the $1.2$ ms average queueing delay, confirming why percentile requirements matter.
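The two percentile formulas can be checked with a minimal Python sketch. The function names are illustrative. Returning zero for $p\le 1-\rho$ is an assumption based on the standard M/M/1 result that a fraction $1-\rho$ of packets do not wait at all.

```python
import math


def sojourn_percentile_s(p, mu, lam):
    """p-quantile of M/M/1 time in system (exponential with rate mu - lam)."""
    return -math.log(1.0 - p) / (mu - lam)


def queueing_percentile_s(p, mu, lam):
    """p-quantile of M/M/1 waiting time; zero when p <= 1 - rho."""
    rho = lam / mu
    if p <= 1.0 - rho:
        return 0.0
    return math.log(rho / (1.0 - p)) / (mu - lam)


w95 = sojourn_percentile_s(0.95, 1250, 750)
w99 = sojourn_percentile_s(0.99, 1250, 750)
wq95 = queueing_percentile_s(0.95, 1250, 750)
print(f"p95 {w95 * 1e3:.2f} ms, p99 {w99 * 1e3:.2f} ms, p95 queue {wq95 * 1e3:.2f} ms")
```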

Exercise 5: Bandwidth-Delay Product and Window-Limited Throughput

A long-distance service has:

R=1.0\ \text{Gbit/s}

and:

RTT=80\ \text{ms}

Find the bandwidth-delay product. Then estimate the maximum throughput if an endpoint has only a $2\ \text{MB}$ effective transport window.

Solution

Bandwidth-delay product:

BDP=R(RTT)

BDP=(1.0\times10^9)(0.080)=80\times10^6\ \text{bits}

Convert to bytes:

\displaystyle BDP=\frac{80\times10^6}{8}=10\times10^6\ \text{bytes}

So:

BDP\approx10\ \text{MB}

Window-limited throughput:

\displaystyle R_{window}=\frac{W_{bytes}(8)}{RTT}

\displaystyle R_{window}=\frac{(2\times10^6)(8)}{0.080}=200\times10^6\ \text{bit/s}

So:

R_{window}=200\ \text{Mbit/s}

Engineering Comment

The physical path may support $1\ \text{Gbit/s}$, but the endpoint window limits this flow to about $200\ \text{Mbit/s}$. High-bandwidth, high-latency services need endpoint, protocol, and application tuning, not only link capacity.

Plausibility Check

An $80$ ms round trip at $1$ Gbit/s needs about $10$ MB in flight to fill the path. A $2$ MB window is one fifth of that value, so the resulting $200$ Mbit/s throughput is consistent with the window being the limiting element.
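A small Python sketch of the bandwidth-delay product and the window limit is given below. The function names are hypothetical, decimal megabytes ($10^6$ bytes) are assumed as in the exercise, and capping the result at the link rate is an added guard rather than part of the original calculation.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to fill the path."""
    return rate_bps * rtt_s / 8.0


def window_limited_bps(window_bytes, rtt_s, link_bps):
    """Throughput of one flow limited by its window, capped at the link rate."""
    return min(link_bps, window_bytes * 8.0 / rtt_s)


bdp = bdp_bytes(1.0e9, 0.080)
rate = window_limited_bps(2e6, 0.080, 1.0e9)
print(f"BDP {bdp / 1e6:.1f} MB, window-limited rate {rate / 1e6:.0f} Mbit/s")
```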

Exercise 6: Bufferbloat Delay from Oversized Buffers

An egress queue has an effective buffer:

B=12\ \text{Mbit}

The shaped egress rate is:

R=20\ \text{Mbit/s}

Estimate the delay if the buffer fills. Then find the maximum buffer size for an added queueing delay target of:

40\ \text{ms}

Solution

Full-buffer drain time:

\displaystyle t_{drain}=\frac{B}{R}

\displaystyle t_{drain}=\frac{12}{20}=0.60\ \text{s}

So a full buffer can add:

600\ \text{ms}

of queueing delay.

Maximum buffer for $40\ \text{ms}$ :

B_{max}=R t

B_{max}=(20\times10^6)(0.040)=800000\ \text{bits}

Convert:

B_{max}=0.8\ \text{Mbit}=100\ \text{kB}

Engineering Comment

Large buffers can hide congestion while destroying latency. For real-time services, buffer sizing should be tied to delay targets, traffic shaping, active queue management, and class-specific drop policies.

Plausibility Check

At $20$ Mbit/s, a $40$ ms target permits only $0.8$ Mbit of queued data. The installed $12$ Mbit buffer is fifteen times that target-sized buffer, so a full queue delay of $600$ ms is plausible and clearly outside a real-time service budget.
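The drain-time relation and its inverse can be sketched in Python as follows. Both helper names are illustrative, and the model assumes a constant egress rate with no active queue management.

```python
def drain_time_s(buffer_bits, rate_bps):
    """Worst-case added delay when the buffer is full."""
    return buffer_bits / rate_bps


def max_buffer_bits(rate_bps, delay_target_s):
    """Largest buffer that keeps full-queue delay within the target."""
    return rate_bps * delay_target_s


full_delay = drain_time_s(12e6, 20e6)
b_max = max_buffer_bits(20e6, 0.040)
print(f"full-buffer delay {full_delay * 1e3:.0f} ms, "
      f"max buffer {b_max / 1e6:.1f} Mbit = {b_max / 8 / 1e3:.0f} kB")
```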

Exercise 7: Jitter Buffer Sizing

A voice service observes one-way network delay statistics:

Statistic	Delay
median	$18\ \text{ms}$
p95	$27\ \text{ms}$
p99	$42\ \text{ms}$

The codec packetization interval is:

20\ \text{ms}

Estimate the jitter buffer needed to absorb p99 delay variation relative to the median. Then estimate median mouth-to-ear contribution from packetization, median network delay, and jitter buffer. The service target is below $80\ \text{ms}$ for these three terms.

Solution

p99 delay variation relative to median:

J_{99}=42-18=24\ \text{ms}

Use a jitter buffer of at least:

24\ \text{ms}

for this simplified screen.

Median contribution:

t_{total}=t_{packetization}+t_{network,median}+t_{jitter\ buffer}

t_{total}=20+18+24=62\ \text{ms}

Compare with target:

62<80\ \text{ms}

Engineering Comment

The buffer can absorb p99 variation in this screen and still meet the simplified latency target. Real voice design must also include codec algorithmic delay, playout adaptation, packet loss concealment, clock drift, endpoint processing, echo control, and whether the p99 statistic is stable during congestion.

Plausibility Check

The p99 delay is $24$ ms above the median, so a $24$ ms buffer targets that observed variation. Adding packetization, median network delay and the buffer gives $62$ ms, leaving $18$ ms of margin against the simplified $80$ ms target before other voice-system delays are included.

Exercise 8: QoS Reservation in a Degraded Backhaul

A degraded backhaul has available capacity:

R_{total}=10\ \text{Mbit/s}

The intended reservations are:

Class	Reservation
strict-priority voice cap	$2.0\ \text{Mbit/s}$
critical telemetry	$6.0\ \text{Mbit/s}$
management	$0.5\ \text{Mbit/s}$

Critical telemetry offered load is:

A_{tel}=4.5\ \text{Mbit/s}

Check whether the reservations fit inside degraded capacity and estimate telemetry utilization within its class.

Solution

Total reserved capacity:

R_{res}=2.0+6.0+0.5=8.5\ \text{Mbit/s}

Compare with degraded capacity:

8.5<10.0\ \text{Mbit/s}

The reservations fit, leaving:

10.0-8.5=1.5\ \text{Mbit/s}

for best effort or margin.

Telemetry class utilization:

\displaystyle \rho_{tel}=\frac{A_{tel}}{R_{tel}}=\frac{4.5}{6.0}=0.75

Engineering Comment

The reservation plan is plausible, but $75\%$ utilization can still produce tail-delay problems under bursty traffic. The critical check is whether the deployed QoS policy actually enforces the voice cap and class reservation at the degraded bottleneck, not only in the design document.

Plausibility Check

The reservations total $8.5$ Mbit/s, leaving $1.5$ Mbit/s inside the degraded $10$ Mbit/s backhaul. Telemetry uses $4.5/6.0=0.75$ of its class reservation, so the design fits on paper but has enough utilization to justify a tail-delay check.
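The reservation-fit check is easy to repeat for other degraded capacities with a short Python sketch. The function name `reservation_check` and the class labels used as dictionary keys are illustrative assumptions.

```python
def reservation_check(total_mbps, reservations_mbps, offered_mbps):
    """Return (total reserved, headroom, per-class utilization)."""
    reserved = sum(reservations_mbps.values())
    headroom = total_mbps - reserved
    utilization = {
        name: offered_mbps[name] / reservations_mbps[name]
        for name in offered_mbps
    }
    return reserved, headroom, utilization


reserved, headroom, util = reservation_check(
    10.0,
    {"voice": 2.0, "telemetry": 6.0, "management": 0.5},
    {"telemetry": 4.5},
)
print(reserved, headroom, util)
```

A negative headroom from this function would indicate that the reservations no longer fit the degraded link.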

Exercise 9: Token-Bucket Burst Delay

A traffic shaper has:

R=5\ \text{Mbit/s}

and token-bucket burst allowance:

B=1.0\ \text{Mbit}

A burst of:

2.5\ \text{Mbit}

arrives nearly instantaneously. Estimate the excess burst beyond the bucket and the added delay to drain that excess at the shaped rate. The service target allows at most $50\ \text{ms}$ added shaping delay.

Solution

Excess beyond bucket:

B_{excess}=2.5-1.0=1.5\ \text{Mbit}

Added drain delay:

\displaystyle t_{delay}=\frac{B_{excess}}{R}

\displaystyle t_{delay}=\frac{1.5}{5}=0.30\ \text{s}

So:

t_{delay}=300\ \text{ms}

Compare with target:

300\ \text{ms}>50\ \text{ms}

Engineering Comment

The shaper protects downstream capacity but can add unacceptable delay when an arriving burst exceeds the bucket allowance and the excess must drain at the shaped rate. For real-time traffic, the burst allowance and shaped rate should be coordinated with packetization, queue size, class rate, and maximum delay.

Plausibility Check

Only $1.0$ Mbit of the $2.5$ Mbit burst is covered by tokens, leaving $1.5$ Mbit to drain at $5$ Mbit/s. A $300$ ms delay is six times the $50$ ms target, so the configured bucket and shaped rate cannot carry this burst within the service objective.
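A minimal Python sketch of the shaper screen follows. The function names are hypothetical. The second helper is an added rearrangement of the same relation, burst $\le B + R\,t$, and like the exercise it assumes the bucket is full when the burst arrives.

```python
def shaping_delay_s(burst_bits, bucket_bits, rate_bps):
    """Delay to drain the part of an instantaneous burst not covered by tokens."""
    excess = max(0.0, burst_bits - bucket_bits)
    return excess / rate_bps


def max_burst_bits(bucket_bits, rate_bps, delay_target_s):
    """Largest instantaneous burst that stays within the delay target."""
    return bucket_bits + rate_bps * delay_target_s


delay = shaping_delay_s(2.5e6, 1.0e6, 5e6)
limit = max_burst_bits(1.0e6, 5e6, 0.050)
print(f"added delay {delay * 1e3:.0f} ms, largest compliant burst {limit / 1e6:.2f} Mbit")
```

Under these assumptions the largest burst that meets the $50$ ms target is $1.25$ Mbit, half the burst in the exercise.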

Exercise 10: Packet Loss Ratio from a Test Capture

An acceptance test sends:

N_{sent}=10000

packets in the critical telemetry class. The receiver records:

N_{lost}=18

The packet loss requirement is:

PLR<0.1\%

Calculate observed packet loss ratio and decide whether the test passes.

Solution

Observed loss ratio:

\displaystyle PLR=\frac{N_{lost}}{N_{sent}}

\displaystyle PLR=\frac{18}{10000}=0.0018

Convert to percent:

PLR=0.18\%

Compare with requirement:

0.18\%>0.1\%

Engineering Comment

The test fails the stated packet-loss requirement. The engineering response should identify whether loss occurs at ingress policing, egress queue drops, radio retransmission exhaustion, optical errors, route changes, endpoint overload, or measurement setup. A loss number without a class and interface counter is incomplete evidence.

Plausibility Check

Eighteen lost packets out of $10000$ is $0.18\%$, which is $1.8$ times the $0.1\%$ limit. The failure is not a rounding issue; the result is materially above the stated acceptance threshold.

Exercise 11: Failover Outage Packet Loss

A route failover creates an outage of:

t_{failover}=80\ \text{ms}

A telemetry stream sends:

500\ \text{packets/s}

The service requirement is recovery below:

50\ \text{ms}

Estimate the number of packets affected during failover and decide whether the recovery target is met.

Solution

Packets affected:

N=f_{pkt}t

N=(500)(0.080)=40\ \text{packets}

Compare recovery time:

80\ \text{ms}>50\ \text{ms}

Engineering Comment

The route may reconverge successfully, but it fails the recovery-time requirement. For critical services, failover validation should include packet loss, burst loss duration, jitter after recovery, route symmetry, alarm timing, and whether applications can tolerate the gap.

Plausibility Check

An $80$ ms outage at $500$ packets/s affects about $40$ packets. The recovery target is $50$ ms, so the failover exceeds the target by $30$ ms and fails the requirement even if the route eventually restores cleanly.

Exercise 12: Acceptance Result with Measurement Uncertainty

A one-way latency acceptance test reports:

p95=24.2\ \text{ms}

The service requirement is:

p95<25.0\ \text{ms}

The timestamp uncertainty contribution is:

\pm1.5\ \text{ms}

Check the nominal result and the conservative result.

Solution

Nominal comparison:

24.2<25.0\ \text{ms}

The nominal result passes.

Conservative upper result:

p95_{high}=24.2+1.5=25.7\ \text{ms}

Compare with requirement:

25.7>25.0\ \text{ms}

Engineering Comment

The nominal result is not enough to support acceptance when measurement uncertainty is included. The test plan should define timestamp accuracy, clock synchronization, packet size, traffic load, percentile method, sample size, route state, and the pass/fail rule before testing.

Plausibility Check

The nominal margin is only $25.0-24.2=0.8$ ms, while timestamp uncertainty is $1.5$ ms. Because the uncertainty is larger than the margin, the conservative value reaches $25.7$ ms and the service cannot be accepted under a guarded interpretation.
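One way to write the pass/fail rule down before testing is a guarded comparison, sketched here in Python. The three-way verdict and the name `guarded_verdict` are assumptions for illustration. The exercise itself only compares the nominal and conservative upper values.

```python
def guarded_verdict(measured_ms, limit_ms, uncertainty_ms):
    """Classify a 'must be below limit' result under a +/- uncertainty band."""
    if measured_ms + uncertainty_ms < limit_ms:
        return "pass"            # passes even at the upper bound
    if measured_ms - uncertainty_ms >= limit_ms:
        return "fail"            # fails even at the lower bound
    return "inconclusive"        # the band straddles the limit


verdict = guarded_verdict(24.2, 25.0, 1.5)
print(verdict)
```

With the exercise values the band spans $22.7$ to $25.7$ ms, so the sketch reports an inconclusive result that calls for a tighter measurement, not acceptance.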

Exercise 13: Monthly SLA Downtime Budget and Packet Exposure

A monitored packet service has a monthly availability requirement of:

A_{req}=99.95\%

Use a $30$ day service month. During the month, customer-visible outages affecting the monitored traffic class lasted:

Outage	Duration
route reconvergence incident	$4.5\ \text{min}$
access switch reboot	$6.0\ \text{min}$
provider handoff flap	$5.0\ \text{min}$

The critical stream sends:

f_{pkt}=250\ \text{packets/s}

Application replay can reconstruct about:

70\%

of updates affected during the outage window. A planned protection-switching drill may add another:

8.0\ \text{min}

of customer-visible outage this month. Estimate the allowed downtime, actual availability, remaining downtime budget, packet exposure, residual unreconstructed updates, and whether the planned drill fits the same monthly SLA budget.

Solution

Monthly service time:

T_{month}=30(24)(60)=43200\ \text{min}

Allowed downtime:

D_{allow}=(1-A_{req})T_{month}

D_{allow}=(1-0.9995)(43200)=21.6\ \text{min}

Actual outage time:

D_{actual}=4.5+6.0+5.0=15.5\ \text{min}

Actual availability:

\displaystyle A_{actual}=1-\frac{D_{actual}}{T_{month}}

\displaystyle A_{actual}=1-\frac{15.5}{43200}=0.999641

So:

A_{actual}=99.9641\%

Remaining downtime budget:

D_{remaining}=21.6-15.5=6.1\ \text{min}

Packets exposed to outage windows:

N_{exposed}=f_{pkt}D_{actual}(60)

N_{exposed}=250(15.5)(60)=232500\ \text{packets}

Residual unreconstructed updates after replay:

N_{residual}=N_{exposed}(1-0.70)

N_{residual}=232500(0.30)=69750

If the planned drill is added:

D_{with\ drill}=15.5+8.0=23.5\ \text{min}

and:

\displaystyle A_{with\ drill}=1-\frac{23.5}{43200}=0.999456

So:

A_{with\ drill}=99.9456\%

The planned drill does not fit the remaining monthly SLA budget because:

8.0>6.1\ \text{min}

It would exceed the budget by:

8.0-6.1=1.9\ \text{min}

Engineering Comment

The service currently passes the monthly availability target, but the remaining downtime budget is too small for the planned drill if the drill is customer-visible. Packet exposure and SLA availability are related but not identical: replay can reduce application data loss, while the SLA still counts the outage time. The practical decision is to reschedule the drill, make it non-customer-visible through protection or maintenance routing, or obtain an explicit maintenance-window exclusion.

Plausibility Check

A $99.95\%$ monthly target allows only $0.05\%$ of $43200$ minutes, or $21.6$ minutes of downtime. The observed $15.5$ minutes leaves $6.1$ minutes, so an $8.0$ minute drill cannot fit even though the service is still passing before the drill. At $250$ packets/s, $15.5$ minutes exposes $232500$ packets, which is consistent with a high-rate critical stream over a multi-minute outage window.
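The downtime budget and packet exposure can be tracked with a short Python sketch. The function names and dictionary keys are illustrative, and the sketch assumes every listed outage counts against the SLA with no maintenance-window exclusion.

```python
def sla_budget(availability_req, days, outages_min, planned_min=0.0):
    """Monthly downtime budget, usage, and whether a planned outage still fits."""
    month_min = days * 24 * 60
    allowed = (1.0 - availability_req) * month_min
    used = sum(outages_min)
    remaining = allowed - used
    return {
        "allowed_min": allowed,
        "used_min": used,
        "remaining_min": remaining,
        "availability": 1.0 - used / month_min,
        "planned_fits": planned_min <= remaining,
    }


def packet_exposure(packets_per_s, outage_min, replay_fraction):
    """Packets sent during outages and the share replay cannot reconstruct."""
    exposed = packets_per_s * outage_min * 60
    return exposed, exposed * (1.0 - replay_fraction)


budget = sla_budget(0.9995, 30, [4.5, 6.0, 5.0], planned_min=8.0)
exposed, residual = packet_exposure(250, budget["used_min"], 0.70)
print(budget, exposed, residual)
```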

Exercise 14: Latency Histogram Percentile SLA Gate

An active probe test runs for:

T=15\ \text{min}

against a customer packet service. The one-way latency histogram for the monitored traffic class is:

One-way latency bin	Probe count
$t \le 10\ \text{ms}$	$6200$
$10<t \le 15\ \text{ms}$	$1800$
$15<t \le 20\ \text{ms}$	$900$
$20<t \le 25\ \text{ms}$	$500$
$25<t \le 35\ \text{ms}$	$300$
$35<t \le 50\ \text{ms}$	$180$
$t>50\ \text{ms}$	$120$

The acceptance rule requires:

p95 \le 30\ \text{ms}

and:

p99 \le 50\ \text{ms}

Use nearest-rank percentile logic and a conservative upper-bin interpretation. Estimate the p95 and p99 acceptance result, the number of probes allowed above $50$ ms by the p99 requirement, the excess high-latency probes, and the average excess rate during the test.

Solution

Total probe count:

N=6200+1800+900+500+300+180+120=10000

Nearest-rank positions:

r_{95}=\lceil 0.95N \rceil=\lceil 0.95(10000) \rceil=9500

r_{99}=\lceil 0.99N \rceil=\lceil 0.99(10000) \rceil=9900

Cumulative counts:

Upper bin edge	Cumulative probes
$10\ \text{ms}$	$6200$
$15\ \text{ms}$	$8000$
$20\ \text{ms}$	$8900$
$25\ \text{ms}$	$9400$
$35\ \text{ms}$	$9700$
$50\ \text{ms}$	$9880$
open-ended above $50\ \text{ms}$	$10000$

The $9500$th probe lies in the:

25<t \le 35\ \text{ms}

bin. With a conservative upper-bin interpretation:

p95_{guarded}=35\ \text{ms}

The p95 release margin is:

M_{95}=30-35=-5\ \text{ms}

So the p95 evidence does not prove acceptance under the guarded rule.

The $9900$th probe lies in the open-ended:

t>50\ \text{ms}

bin because only $9880$ probes are at or below $50$ ms. Therefore the p99 requirement fails.

The number of probes allowed above $50$ ms by a $p99 \le 50$ ms target is:

N_{>50,allow}=(1-0.99)N=0.01(10000)=100

Observed probes above $50$ ms:

N_{>50,obs}=120

Excess high-latency probes:

N_{excess}=120-100=20

Average excess rate during the $15$ minute test:

\displaystyle f_{excess}=\frac{20}{15}=1.33\ \text{probes/min}

The service fails the p99 gate and cannot be released from this test evidence.

Engineering Comment

The p95 bin is too coarse around the $30$ ms threshold: linear interpolation might estimate a passing value, but the histogram cannot prove it under a conservative acceptance rule. The p99 result is stronger because the count above $50$ ms already exceeds the $1\%$ allowance. The response should preserve raw samples or use finer bins around service thresholds, then correlate high-latency probes with queue counters, QoS drops, route changes, CPU load and traffic bursts.

Plausibility Check

The $9500$th probe is just $100$ samples into the $25$ to $35$ ms bin, so a coarse histogram can hide the exact crossing around the $30$ ms target. The $p99$ rule allows only $100$ of $10000$ samples above $50$ ms, while the test observed $120$, so the fail decision does not depend on interpolation.
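The nearest-rank, upper-bin rule can be sketched in Python as below. The function name `guarded_percentile` is hypothetical. The percentile is passed as an integer percent so the rank uses exact integer arithmetic, and the open-ended bin is represented by `math.inf`.

```python
import math


def guarded_percentile(bins, percent):
    """Upper edge of the bin holding the nearest-rank percentile.

    bins: (upper_edge_ms, count) pairs in ascending order.
    percent: integer percentile, e.g. 95 or 99.
    """
    total = sum(count for _, count in bins)
    rank = (percent * total + 99) // 100  # integer ceiling of percent * N / 100
    cumulative = 0
    for upper_edge, count in bins:
        cumulative += count
        if cumulative >= rank:
            return upper_edge
    raise ValueError("empty histogram")


bins = [(10, 6200), (15, 1800), (20, 900), (25, 500),
        (35, 300), (50, 180), (math.inf, 120)]
total = sum(count for _, count in bins)
p95 = guarded_percentile(bins, 95)
p99 = guarded_percentile(bins, 99)
allowed_above_50 = total * (100 - 99) // 100
excess = bins[-1][1] - allowed_above_50
print(p95, p99, allowed_above_50, excess)
```

An infinite p99 from this sketch means the percentile falls in the open-ended bin, so no finite bound can be claimed from the histogram.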

Exercise 15: Delay Asymmetry and the RTT/2 Trap

During a fiber maintenance outage, a timing-sensitive telemetry service moves to a backup path. The network dashboard reports round-trip time:

RTT=14.0\ \text{ms}

The dashboard estimates one-way latency as:

\displaystyle t_{RTT/2}=\frac{RTT}{2}

A calibrated probe on the same traffic class independently measures the reverse-path delay as:

t_{reverse}=5.6\ \text{ms}

The service requirement for the forward direction is:

t_{forward}\leq7.5\ \text{ms}

A packet timing function on the same path has a maximum allowed time error of:

TE_{max}=1.0\ \text{ms}

Assume a symmetric-delay timing algorithm, so the timing offset bias caused by directional delay asymmetry is approximately half the forward/reverse delay difference. If fixed asymmetry is later compensated, use residual uncertainty terms:

u_{asym}=0.25\ \text{ms},\quad u_{ts}=0.20\ \text{ms},\quad u_{servo}=0.30\ \text{ms},\quad u_{hold}=0.15\ \text{ms}

Check the RTT/2 estimate, actual forward one-way delay, asymmetry-driven timing error, and whether compensated timing evidence would pass.

Solution

The dashboard estimate is:

\displaystyle t_{RTT/2}=\frac{14.0}{2}=7.0\ \text{ms}

Using only RTT/2, the apparent one-way margin is:

M_{RTT/2}=7.5-7.0=0.5\ \text{ms}

That would appear to pass.

Actual forward delay from the directional measurement is:

t_{forward}=RTT-t_{reverse}

t_{forward}=14.0-5.6=8.4\ \text{ms}

Actual forward-delay margin is:

M_{forward}=7.5-8.4=-0.9\ \text{ms}

So the forward direction fails the one-way requirement.

Directional delay asymmetry is:

\Delta t_{asym}=t_{forward}-t_{reverse}

\Delta t_{asym}=8.4-5.6=2.8\ \text{ms}

The error made by using RTT/2 for the forward direction is:

t_{forward}-t_{RTT/2}=8.4-7.0=1.4\ \text{ms}

This equals half the directional asymmetry:

\displaystyle \frac{\Delta t_{asym}}{2}=\frac{2.8}{2}=1.4\ \text{ms}

The timing offset bias is therefore approximately:

TE_{bias}=1.4\ \text{ms}

Timing-error margin before compensation is:

M_{TE}=TE_{max}-TE_{bias}

M_{TE}=1.0-1.4=-0.4\ \text{ms}

The timing service should not be released on the backup path without asymmetry correction or independent time-error evidence.

After fixed asymmetry compensation, combine the residual uncertainty terms by root-sum-square:

u_{TE}=\sqrt{u_{asym}^2+u_{ts}^2+u_{servo}^2+u_{hold}^2}

u_{TE}=\sqrt{0.25^2+0.20^2+0.30^2+0.15^2}=0.464\ \text{ms}

Compensated timing margin is:

M_{comp}=1.0-0.464=0.536\ \text{ms}

The compensated timing evidence passes the $1.0\ \text{ms}$ timing-error screen, but the forward one-way telemetry latency still fails unless the service path, QoS class or requirement is changed.

Engineering Comment

RTT/2 is only safe when symmetry is justified for the measured service boundary. In this case it creates a false pass: the dashboard reports $7.0\ \text{ms}$ while the forward direction is actually $8.4\ \text{ms}$. Timing services are even more sensitive because half the directional mismatch appears as clock-offset bias. The release decision should separate ordinary latency acceptance from timing-service acceptance.

Plausibility Check

The measured forward and reverse delays sum to the reported RTT:

8.4+5.6=14.0\ \text{ms}

so the arithmetic is internally consistent. The asymmetry is large enough that half of it, $1.4\ \text{ms}$, exceeds the $1.0\ \text{ms}$ timing-error limit. After compensation, the residual uncertainty drops below the limit, but it does not change the actual forward path delay.
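The asymmetry checks can be reproduced with a minimal Python sketch. The function names are illustrative. The sketch assumes, as the exercise does, that the reverse-path probe and the RTT describe the same path and traffic class, and that the residual terms are independent so that root-sum-square combination applies.

```python
import math


def forward_delay_ms(rtt_ms, reverse_ms):
    """Forward one-way delay implied by RTT and a measured reverse delay."""
    return rtt_ms - reverse_ms


def asymmetry_bias_ms(forward_ms, reverse_ms):
    """Offset bias of a symmetric-delay timing algorithm: half the asymmetry."""
    return abs(forward_ms - reverse_ms) / 2.0


def rss(terms):
    """Root-sum-square combination of independent uncertainty terms."""
    return math.sqrt(sum(t * t for t in terms))


forward = forward_delay_ms(14.0, 5.6)
bias = asymmetry_bias_ms(forward, 5.6)
residual = rss([0.25, 0.20, 0.30, 0.15])
print(f"forward {forward:.1f} ms, bias {bias:.1f} ms, residual {residual:.3f} ms")
print("latency gate passes:", forward <= 7.5)
print("timing gate passes before compensation:", bias <= 1.0)
print("timing gate passes after compensation:", residual <= 1.0)
```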

Exercise 16: Microburst Queue Overflow and Recovery Time

A packet service normally meets its average latency target, but interface counters show short congestion bursts. A class queue has service rate:

R_s=40\ \text{Mbit/s}

During a microburst, traffic enters the same class at:

R_{in}=300\ \text{Mbit/s}

for:

t_b=40\ \text{ms}

The queue already contains:

Q_0=0.35\ \text{MB}

The finite queue capacity is:

Q_{max}=1.50\ \text{MB}

Use a mean packet size of:

L_p=1000\ \text{bytes}

The service delay target allows no more than:

t_{q,target}=80\ \text{ms}

of added queueing delay. After the burst, offered traffic falls to:

R_{post}=25\ \text{Mbit/s}

Estimate the burst excess data, overflow bytes, dropped packets, peak queueing delay, time until the delay target is crossed, and recovery time back below the delay target.

Solution

Excess rate during the burst:

R_{excess}=R_{in}-R_s

R_{excess}=300-40=260\ \text{Mbit/s}

Burst excess data:

\displaystyle Q_{excess}=\frac{R_{excess}t_b}{8}

Converting the rate to bit/s and the burst duration to seconds:

\displaystyle Q_{excess}=\frac{260\times10^6(0.040)}{8}=1.30\times10^6\ \text{bytes}

So:

Q_{excess}=1.30\ \text{MB}

Uncapped peak queue occupancy would be:

Q_{raw}=Q_0+Q_{excess}

Q_{raw}=0.35+1.30=1.65\ \text{MB}

Overflow beyond the finite queue is:

Q_{drop}=Q_{raw}-Q_{max}

Q_{drop}=1.65-1.50=0.15\ \text{MB}

Approximate dropped packets:

\displaystyle N_{drop}=\frac{0.15\times10^6}{1000}=150\ \text{packets}

Peak queueing delay at full queue is:

\displaystyle t_{q,peak}=\frac{Q_{max}(8)}{R_s}

\displaystyle t_{q,peak}=\frac{1.50\times10^6(8)}{40\times10^6}=0.300\ \text{s}

or:

t_{q,peak}=300\ \text{ms}

Queue size corresponding to the service delay target:

\displaystyle Q_{target}=\frac{R_s t_{q,target}}{8}

\displaystyle Q_{target}=\frac{40\times10^6(0.080)}{8}=0.40\ \text{MB}

Time from burst start until the queue crosses the delay target:

\displaystyle t_{cross}=\frac{(Q_{target}-Q_0)8}{R_{excess}}

\displaystyle t_{cross}=\frac{(0.40-0.35)\times10^6(8)}{260\times10^6}=1.54\ \text{ms}

Time until the queue first becomes full:

\displaystyle t_{full}=\frac{(Q_{max}-Q_0)8}{R_{excess}}

\displaystyle t_{full}=\frac{(1.50-0.35)\times10^6(8)}{260\times10^6}=35.4\ \text{ms}

The queue is full and dropping packets for the last:

40.0-35.4=4.6\ \text{ms}

of the burst.

After the burst, the queue drains at:

R_{drain}=R_s-R_{post}

R_{drain}=40-25=15\ \text{Mbit/s}

Recovery time from full queue back below the delay-target queue size:

\displaystyle t_{recover}=\frac{(Q_{max}-Q_{target})8}{R_{drain}}

\displaystyle t_{recover}=\frac{(1.50-0.40)\times10^6(8)}{15\times10^6}=0.587\ \text{s}

or:

t_{recover}=587\ \text{ms}

Total time above the delay target is approximately:

t_{above}=(40.0-1.54)+587=625\ \text{ms}

The service should not be released from average latency evidence alone. A $40\ \text{ms}$ microburst creates packet loss and keeps the class above its added-delay target for about $0.63\ \text{s}$.
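
The closed-form steps of this solution can be collected into one function. The sketch below is illustrative only: the function name, argument names and dictionary keys are assumptions, and it is valid only when the burst rate exceeds the service rate and the post-burst rate is below it.

```python
def microburst_screen(r_s, r_in, t_b, q0, q_max, l_p, t_target, r_post):
    """Closed-form microburst screen.

    Rates in bit/s, times in seconds, queue sizes and packet size in bytes.
    Assumes r_in > r_s during the burst and r_post < r_s afterwards.
    """
    r_excess = r_in - r_s
    q_raw = q0 + r_excess * t_b / 8           # uncapped peak occupancy
    q_drop = max(0.0, q_raw - q_max)          # bytes that overflow the queue
    q_peak = min(q_raw, q_max)
    q_target = r_s * t_target / 8             # occupancy at the delay target
    t_cross = max(0.0, (q_target - q0) * 8 / r_excess)
    t_recover = max(0.0, (q_peak - q_target) * 8 / (r_s - r_post))
    return {
        "drop_bytes": q_drop,
        "drop_packets": q_drop / l_p,
        "peak_delay_s": q_peak * 8 / r_s,
        "t_cross_s": t_cross,
        "t_recover_s": t_recover,
        "t_above_s": (t_b - t_cross) + t_recover,
    }


# Values from this exercise.
result = microburst_screen(
    r_s=40e6, r_in=300e6, t_b=0.040,
    q0=0.35e6, q_max=1.50e6, l_p=1000,
    t_target=0.080, r_post=25e6,
)
for key, value in result.items():
    print(f"{key}: {value:.4g}")
```

Keeping every quantity in bit/s, bytes and seconds inside the function avoids the mixed-unit slips that are easy to make when megabytes and Mbit/s appear in the same expression.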

Engineering Comment

Microbursts are dangerous because they are short enough to disappear in coarse average utilization graphs while still filling finite queues. The service can look healthy at one-minute utilization granularity and still drop packets or add hundreds of milliseconds of delay during a burst.

Release evidence should include queue-depth telemetry, drop counters by class, burst capture resolution, scheduler configuration, shaper or policer rates, active queue management state, packet-size distribution and application tolerance. If the queue target is $80\ \text{ms}$, a full-buffer delay of $300\ \text{ms}$ is not an acceptable hidden reserve.

Plausibility Check

The incoming burst rate is $7.5$ times the service rate, so a short burst can build queue quickly. The excess data is $1.30\ \text{MB}$, which is almost the whole $1.50\ \text{MB}$ buffer before considering the initial $0.35\ \text{MB}$ occupancy. Dropping about $150$ mean-size packets and spending roughly $0.6\ \text{s}$ above the delay target is therefore plausible even though the burst itself lasts only $40\ \text{ms}$.
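
An independent cross-check is a time-stepped fluid-queue simulation that does not reuse the closed-form expressions. The sketch below is an assumption-laden illustration: traffic is treated as a continuous fluid rather than discrete packets, the step size `dt` and horizon `t_end` are arbitrary choices, and all names are hypothetical.

```python
def simulate_fluid_queue(r_s, r_in, t_b, r_post, q0, q_max, q_target,
                         t_end=1.0, dt=1e-5):
    """Step a tail-drop fluid queue through a burst and its recovery.

    Rates in bit/s, sizes in bytes, times in seconds.
    Returns (dropped bytes, time spent above q_target).
    """
    n_steps = round(t_end / dt)
    n_burst = round(t_b / dt)
    q = q0
    dropped = 0.0
    steps_above = 0
    for i in range(n_steps):
        r_offered = r_in if i < n_burst else r_post
        q += (r_offered - r_s) * dt / 8       # net bytes added in this step
        if q > q_max:                         # tail drop at the finite buffer
            dropped += q - q_max
            q = q_max
        elif q < 0.0:                         # an empty queue cannot go negative
            q = 0.0
        if q > q_target:
            steps_above += 1
    return dropped, steps_above * dt


dropped_bytes, t_above_s = simulate_fluid_queue(
    r_s=40e6, r_in=300e6, t_b=0.040, r_post=25e6,
    q0=0.35e6, q_max=1.50e6, q_target=0.40e6,
)
print(f"dropped={dropped_bytes:.0f} bytes, time above target={t_above_s:.3f} s")
```

With these inputs the simulation should land close to the hand calculation of about $0.15\ \text{MB}$ dropped and about $0.63\ \text{s}$ above the delay target. A real queue serves whole packets and may apply active queue management, so captured queue-depth telemetry remains the acceptance evidence.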

Exercise 17: TCP Loss-Limited Throughput from RTT and Packet Loss

A file-transfer service crosses a routed packet network. The path capacity is above:

100\ \text{Mbit/s}

but a single long-lived TCP-like flow is underperforming. The measured round-trip time is:

RTT=60\ \text{ms}

The maximum segment size is:

MSS=1448\ \text{bytes}

Packet loss probability on the flow is:

p=0.001

Use the simplified loss-limited throughput screen:

\displaystyle T\approx\frac{1.22\,MSS\,8}{RTT\sqrt{p}}

where $RTT$ is in seconds, $MSS$ is in bytes and $T$ is in bit/s. The service target for this single flow is:

T_{req}=25\ \text{Mbit/s}

Estimate the loss-limited throughput, the service deficit, and the packet loss probability required to meet the target with the same RTT and MSS.

Solution

Convert RTT to seconds:

RTT=60\ \text{ms}=0.060\ \text{s}

Payload bits per segment:

MSS(8)=1448(8)=11584\ \text{bits}

Loss square-root term:

\sqrt{p}=\sqrt{0.001}=0.03162

Throughput screen:

\displaystyle T\approx\frac{1.22(11584)}{0.060(0.03162)}

T\approx7.45\times10^6\ \text{bit/s}

So:

T\approx7.45\ \text{Mbit/s}

Service deficit:

D_T=25.0-7.45=17.55\ \text{Mbit/s}

To find the packet loss probability required for $25\ \text{Mbit/s}$, rearrange:

\displaystyle p_{req}=\left(\frac{1.22\,MSS\,8}{RTT\,T_{req}}\right)^2

\displaystyle p_{req}=\left(\frac{1.22(11584)}{0.060(25\times10^6)}\right)^2

p_{req}=8.88\times10^{-5}

As a percentage:

p_{req}=0.0089\%

The current packet loss probability is:

0.001=0.10\%

That is about:

\displaystyle \frac{0.001}{8.88\times10^{-5}}=11.3

times higher than the simplified loss target. The service should not be released as a $25\ \text{Mbit/s}$ single-flow service until loss is reduced, RTT is lowered, multiple parallel flows are accepted by the service definition, or the target is changed.
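
The throughput screen and its rearrangement for required loss can be written as two small functions. This is a hedged sketch of the simplified formula used in this exercise only; the function names are assumptions, and the constant `c=1.22` is the one stated in the problem rather than a property of any specific TCP implementation.

```python
import math


def loss_limited_throughput_bps(mss_bytes, rtt_s, p, c=1.22):
    """Simplified loss-limited throughput screen, in bit/s."""
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(p))


def required_loss(mss_bytes, rtt_s, target_bps, c=1.22):
    """Loss probability at which the same screen yields target_bps."""
    return (c * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


# Values from this exercise.
mss_bytes, rtt_s, p, target_bps = 1448, 0.060, 0.001, 25e6

t_bps = loss_limited_throughput_bps(mss_bytes, rtt_s, p)
p_req = required_loss(mss_bytes, rtt_s, target_bps)

print(f"T = {t_bps / 1e6:.2f} Mbit/s, deficit = {(target_bps - t_bps) / 1e6:.2f} Mbit/s")
print(f"p_req = {p_req:.3g} ({p_req * 100:.4f} %), ratio = {p / p_req:.1f}")
```

Because throughput scales with $1/\sqrt{p}$, closing a factor of about $3.4$ in throughput requires a factor of about $11$ in loss, which is why small loss improvements rarely rescue a long-RTT single flow.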

Engineering Comment

Loss-limited throughput is different from link capacity and different from window-limited throughput. A path can have enough physical bandwidth and enough transport window but still underperform when random loss, queue drops, policing, wireless retransmission exhaustion or optical errors trigger congestion control. Release evidence should identify where loss occurs and whether the application uses one flow, multiple flows, UDP, QUIC or a controlled transport profile.

Plausibility Check

A $0.1\%$ packet loss probability sounds small, but the square-root loss term is only $0.0316$, and it appears in the denominator. With a $60\ \text{ms}$ RTT, the simplified screen lands near $7.5\ \text{Mbit/s}$, far below a $100\ \text{Mbit/s}$ path. Reducing loss by roughly an order of magnitude to about $0.009\%$ is therefore consistent with the $25\ \text{Mbit/s}$ target.

Exercise 18: Shaper Admission Headroom and P99 Latency Release Gate

A priority telemetry class crosses an egress shaper with a shaped service rate of:

R_s=18\ \text{Mbit/s}

The fixed one-way delay outside the egress queue is:

t_{fixed}=16\ \text{ms}

The service requirement is:

t_{p99,total}\le 40\ \text{ms}

During the busy-hour acceptance test, the existing class already has a p99 queue occupancy of:

Q_{base}=0.030\ \text{MB}

Use decimal megabytes. A proposed additional telemetry stream has a mean rate of:

R_{new,mean}=2\ \text{Mbit/s}

The existing mean class load is:

R_{base,mean}=14\ \text{Mbit/s}

So the mean load appears to fit the shaper. However, the proposed stream can burst at:

R_{new,burst}=12\ \text{Mbit/s}

for:

t_b=60\ \text{ms}

while the existing class is still entering the queue at:

R_{base,burst}=14\ \text{Mbit/s}

Estimate mean utilization, maximum queue occupancy allowed by the p99 latency budget, burst queue growth, peak queue delay, total p99 latency during the burst, and the largest allowable burst duration or candidate-stream burst rate that would keep the same p99 latency gate.

Solution

Mean class load after admitting the proposed stream is:

R_{mean}=R_{base,mean}+R_{new,mean}=14+2=16\ \text{Mbit/s}

Mean utilization is therefore:

\displaystyle \rho_{mean}=\frac{16}{18}=0.889

The average load screen appears to pass because the mean load is below the shaped rate.

The queue delay budget is the total p99 requirement minus fixed delay:

t_{q,allowed}=40-16=24\ \text{ms}

The corresponding maximum queue occupancy is:

\displaystyle Q_{allowed}=\frac{R_s t_{q,allowed}}{8}

\displaystyle Q_{allowed}=\frac{18\times10^6(0.024)}{8}=54000\ \text{bytes}

So:

Q_{allowed}=0.054\ \text{MB}

The remaining queue headroom after the existing p99 queue is:

Q_{headroom}=0.054-0.030=0.024\ \text{MB}

During the burst, the combined ingress rate is:

R_{in,burst}=14+12=26\ \text{Mbit/s}

The queue grows at the excess rate above the shaper:

R_{excess}=26-18=8\ \text{Mbit/s}

Burst queue growth is:

\displaystyle Q_{growth}=\frac{R_{excess}t_b}{8}

\displaystyle Q_{growth}=\frac{8\times10^6(0.060)}{8}=60000\ \text{bytes}

So:

Q_{growth}=0.060\ \text{MB}

Peak queue occupancy becomes:

Q_{peak}=0.030+0.060=0.090\ \text{MB}

Queue delay at the shaper is:

\displaystyle t_{q,peak}=\frac{Q_{peak}(8)}{R_s}

\displaystyle t_{q,peak}=\frac{0.090\times10^6(8)}{18\times10^6}=0.040\ \text{s}

Therefore:

t_{q,peak}=40\ \text{ms}

Total p99 one-way latency during the burst is approximately:

t_{p99,total}=16+40=56\ \text{ms}

The service fails the p99 release gate by:

56-40=16\ \text{ms}

For the same $8\ \text{Mbit/s}$ excess rate, the largest allowable burst duration is set by the remaining headroom:

\displaystyle t_{b,max}=\frac{Q_{headroom}(8)}{R_{excess}}

\displaystyle t_{b,max}=\frac{0.024\times10^6(8)}{8\times10^6}=0.024\ \text{s}

So:

t_{b,max}=24\ \text{ms}

For the same $60\ \text{ms}$ burst duration, the largest allowable excess rate is:

\displaystyle R_{excess,max}=\frac{Q_{headroom}(8)}{t_b}

\displaystyle R_{excess,max}=\frac{0.024\times10^6(8)}{0.060}=3.2\times10^6\ \text{bit/s}=3.2\ \text{Mbit/s}

Thus the largest combined burst rate is:

R_{in,max}=18+3.2=21.2\ \text{Mbit/s}

Because the existing burst rate is $14\ \text{Mbit/s}$, the candidate-stream burst rate would need to be limited to:

R_{new,burst,max}=21.2-14=7.2\ \text{Mbit/s}

The proposed stream should not be admitted as stated. It can be released only if the candidate burst is shortened to about $24\ \text{ms}$, shaped to about $7.2\ \text{Mbit/s}$ for a $60\ \text{ms}$ burst, moved to a different class, or supported by a larger tested service rate and queue policy.
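
The admission gate in this solution can be expressed as a reusable check. The following is a minimal sketch under the same simplified model: the function name, argument names and dictionary keys are hypothetical, the existing p99 queue occupancy is assumed to be present when the burst starts, and the queue headroom is assumed to be positive.

```python
def shaper_burst_gate(r_s, t_fixed, t_p99_max, q_base,
                      r_base_burst, r_new_burst, t_b):
    """Screen a candidate stream against a p99 latency gate at a shaper.

    Rates in bit/s, times in seconds, queue occupancy in bytes.
    """
    q_allowed = r_s * (t_p99_max - t_fixed) / 8     # occupancy at the budget
    q_headroom = q_allowed - q_base
    r_excess = max(0.0, r_base_burst + r_new_burst - r_s)
    q_peak = q_base + r_excess * t_b / 8
    t_total = t_fixed + q_peak * 8 / r_s
    t_b_max = q_headroom * 8 / r_excess if r_excess > 0 else float("inf")
    r_new_burst_max = r_s + q_headroom * 8 / t_b - r_base_burst
    return {
        "admit": t_total <= t_p99_max,
        "t_total_s": t_total,
        "q_headroom_bytes": q_headroom,
        "t_b_max_s": t_b_max,
        "r_new_burst_max_bps": r_new_burst_max,
    }


# Values from this exercise.
gate = shaper_burst_gate(
    r_s=18e6, t_fixed=0.016, t_p99_max=0.040, q_base=0.030e6,
    r_base_burst=14e6, r_new_burst=12e6, t_b=0.060,
)
print(gate)
```

Returning the limiting burst duration and burst rate alongside the pass or fail flag gives the reviewer the two shaping options directly, instead of only a rejection.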

Engineering Comment

Admission is not proved by mean utilization. The added stream raises mean utilization to only $88.9\%$, but its burst consumes more queue headroom than the p99 latency budget allows. A release review should require shaper configuration, queue occupancy telemetry, class counters, packet-size distribution, burst envelope, drop counters, timestamp method and p99 probes taken under the same traffic policy.

If the service is timing-sensitive, the operational response should be to shape the source, reduce burst size, reserve another class, increase the class rate with scheduler evidence, or reject the new stream. Accepting the stream because the average load is below $18\ \text{Mbit/s}$ would hide a predictable tail-latency violation.

Plausibility Check

The p99 budget leaves only $24\ \text{ms}$ for queueing, which is $0.054\ \text{MB}$ at an $18\ \text{Mbit/s}$ shaper. The existing queue already uses $0.030\ \text{MB}$, so only $0.024\ \text{MB}$ remains. An $8\ \text{Mbit/s}$ excess burst for $60\ \text{ms}$ adds $0.060\ \text{MB}$, more than twice the remaining headroom. A failed $56\ \text{ms}$ total p99 latency is therefore plausible.

Review Checklist

When reviewing packet latency and jitter calculations, ask:

Is the service boundary explicit?
Are packet size, traffic class, direction, percentile, and load state defined?
Is the queue controlled by physical rate, shaped rate, class reservation, or scheduler allocation?
Is single-flow throughput checked against RTT, transport window and packet loss rather than only path capacity?
Are degraded modes and failover states tested, not only normal operation?
Does the measurement uncertainty affect the pass/fail result?
Does accumulated outage time still fit the same SLA budget after planned tests or maintenance events?
Are histogram bins fine enough around the SLA thresholds, or are raw samples needed?
Has RTT/2 been rejected unless route symmetry, traffic class symmetry and timestamp boundaries are proven?
Do microburst, queue-depth, drop-counter and recovery-time measurements support the latency claim?
Does admission of a new stream preserve p99 queue headroom under its burst envelope, not only under mean load?
Which counters, captures, probes, and alarms would prove the calculation in service?

Common Mistakes

Quoting average latency while the requirement is p95, p99, maximum delay, jitter or one-way delay.
Comparing tests with different packet sizes, encapsulation overhead, directions, traffic classes or timestamp boundaries.
Treating RTT/2 as one-way delay without proving directional symmetry, route symmetry and clock or timestamp validity.
Checking normal-path delay while ignoring degraded backhaul, failover convergence, maintenance routes and protection switching.
Modeling queue delay from line rate when shaping, policing, scheduler allocation or class reservation controls the real service rate.
Treating path capacity as application throughput while RTT, loss, window size or congestion control is limiting the flow.
Accepting coarse histogram percentiles when bins straddle the SLA threshold and raw samples are not available.
Averaging utilization over long intervals while microbursts fill finite queues and create packet drops.
Admitting a new stream from mean utilization alone while its burst envelope consumes the p99 queue-delay budget.
Sizing buffers or jitter buffers from a nominal value without burst behavior, packet loss, recovery time and application tolerance.
Counting QoS reservation as protection while misclassification, default queues, control traffic or congestion drops remain unverified.
Treating packet loss ratio as independent of latency when queue overflow, retransmission, timeout and jitter-buffer behavior are coupled.
Releasing a timing-sensitive service without asymmetry compensation, holdover evidence and timestamp-uncertainty accounting.

Packet-network engineering is credible only when the timing model, QoS policy, measurement method, and operational evidence describe the same service boundary.
