Fiber Route Diversity and Backhaul Restoration Case Study

Telecommunications case study of a fiber backhaul outage caused by shared physical route risk, covering route diversity, failover capacity, latency, restoration evidence, and operational lessons.

Branch: Telecommunications Engineering
Content: Case study
Updated: Jun 22, 2026
Revision: v1.0.0 · reviewed

This case study follows a realistic telecommunications outage: a regional service site loses both nominally redundant fiber backhaul circuits after one civil-work incident. The topology diagram showed two links. The field route records showed a different truth: both circuits shared the same bridge crossing and entered the site through the same duct bank.

The case is not about fiber being unreliable. Fiber links can be extremely reliable. The case is about a common engineering mistake: treating logical redundancy as physical diversity. A resilient service needs route evidence, failure-domain mapping, failover capacity, traffic prioritization, monitoring, and restoration records that operations can use under pressure.

Case Summary

Item	Engineering relevance
Service	Backhaul for a remote operations, telemetry, and public communication site.
Normal architecture	Two leased fiber circuits from different providers plus a lower-capacity microwave backup.
Trigger event	Construction damage at a bridge duct crossing.
Hidden weakness	Both fiber circuits used the same physical crossing and same site-entry duct.
Main consequence	The site entered degraded service on microwave backup with limited capacity and higher timing variation.
Useful outcome	Route-diversity audit, failover policy correction, restoration evidence, and monitoring thresholds.

The central engineering question is:

Did the service have true physical diversity, and could it remain useful when the assumed diverse fibers failed together?

The answer was no for the original design, but the incident created the evidence needed to redesign the service boundary.

Initial Architecture

The site supports:

operational voice and messaging;
telemetry from remote equipment;
maintenance access and monitoring;
ordinary user data traffic;
emergency coordination during severe weather.

The network design lists three backhaul paths:

Path	Nominal capacity	Expected role
Fiber A	$1.0\ \text{Gbit/s}$	primary service path
Fiber B	$1.0\ \text{Gbit/s}$	redundant service path
Microwave backup	$180\ \text{Mbit/s}$	degraded service path

The operations dashboard marks the site as protected because two fiber carriers are present. The design review, however, had not required proof that the two circuits were physically separated at ducts, poles, bridges, building entry, patch panels, and local power.

Operating Requirement

The site has three traffic classes:

Traffic class	Required throughput	Latency objective	Loss tolerance	Priority
Critical voice and control	$25\ \text{Mbit/s}$	less than $40\ \text{ms}$ one way	very low	highest
Telemetry and monitoring	$60\ \text{Mbit/s}$	less than $80\ \text{ms}$ one way	moderate	high
General data and maintenance	$350\ \text{Mbit/s}$ peak	best effort	tolerant	low

Normal peak demand can exceed $400\ \text{Mbit/s}$, but the microwave backup can carry only $180\ \text{Mbit/s}$ under good RF conditions. Degraded operation must therefore use traffic prioritization, because the backup path cannot preserve all normal services.

Event Timeline

The outage sequence is reconstructed from alarms, provider tickets, site logs, and field reports.

Time	Event
08:12	Fiber A reports loss of light. Traffic moves to Fiber B.
08:14	Fiber B reports loss of light. Site failover starts microwave backup.
08:17	Monitoring shows packet loss and high queue delay on low-priority traffic.
08:25	Provider notices both fiber circuits cross the same bridge duct segment.
09:10	Field crew confirms duct damage from construction work.
09:35	Operations applies degraded-service traffic policy.
13:20	Temporary fiber splice restores Fiber A.
16:40	Fiber B is restored through the same duct, but diversity remains unresolved.
Next week	Design team opens route-diversity correction project.

The first operational mistake was assuming Fiber B represented an independent failure domain. It did not. The second was letting general data traffic compete with critical traffic during the first degraded interval.
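The intervals implied by the timeline can be worked out directly from the timestamps. The following is a minimal sketch that assumes all events fall on the same day and that the microwave backup carried traffic from 08:14 until Fiber A returned at 13:20; the dictionary keys and the helper name `minutes_between` are illustrative, not part of the incident record.

```python
from datetime import datetime

# Timestamps copied from the event timeline (assumed to be the same day).
EVENTS = {
    "fiber_a_down": "08:12",
    "fiber_b_down": "08:14",
    "policy_applied": "09:35",
    "fiber_a_restored": "13:20",
    "fiber_b_restored": "16:40",
}

def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as 'HH:MM' on one day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# Time on microwave before the degraded-service policy was applied.
unmanaged_min = minutes_between(EVENTS["fiber_b_down"], EVENTS["policy_applied"])
# Time with no fiber path in service (assumes microwave carried traffic throughout).
microwave_only_min = minutes_between(EVENTS["fiber_b_down"], EVENTS["fiber_a_restored"])
# Total outage of Fiber B.
fiber_b_outage_min = minutes_between(EVENTS["fiber_b_down"], EVENTS["fiber_b_restored"])

print(unmanaged_min, microwave_only_min, fiber_b_outage_min)
```

Under these assumptions the site ran on an unmanaged backup link for 81 minutes, which is the interval in which general data competed with critical traffic.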

Capacity Check During Degraded Operation

During the initial failover interval, measured offered load is:

Traffic class	Offered load
Critical voice and control	$22\ \text{Mbit/s}$
Telemetry and monitoring	$54\ \text{Mbit/s}$
General data and maintenance	$290\ \text{Mbit/s}$

Total offered load:

$$R_{offered}=22+54+290=366\ \text{Mbit/s}$$

Microwave backup capacity:

$$R_{backup}=180\ \text{Mbit/s}$$

Overload ratio:

$$\rho=\frac{R_{offered}}{R_{backup}}=\frac{366}{180}\approx 2.03$$

The backup path is offered about $203\%$ of its capacity. Congestion is expected unless low-priority traffic is shaped or dropped.

If the site admits only the critical and telemetry classes:

$$R_{protected}=22+54=76\ \text{Mbit/s}$$

Utilization on the backup path becomes:

$$u=\frac{76}{180}\approx 0.422$$

Protected traffic uses about $42\%$ of the backup capacity, leaving margin for protocol overhead, burstiness, retransmission, management traffic, and RF modulation changes.
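The arithmetic above can be reproduced with a short script. This is a sketch only: the loads and capacity are the figures from this section, while the names `OFFERED_MBPS`, `PROTECTED_CLASSES`, and `load_ratio` are introduced here for illustration.

```python
# Offered load per traffic class during the initial failover interval (Mbit/s),
# copied from the table in this section.
OFFERED_MBPS = {
    "critical voice and control": 22.0,
    "telemetry and monitoring": 54.0,
    "general data and maintenance": 290.0,
}
BACKUP_CAPACITY_MBPS = 180.0
PROTECTED_CLASSES = ("critical voice and control", "telemetry and monitoring")

def load_ratio(classes, offered=OFFERED_MBPS, capacity=BACKUP_CAPACITY_MBPS):
    """Summed offered load of the given classes divided by link capacity."""
    return sum(offered[name] for name in classes) / capacity

overload_ratio = load_ratio(OFFERED_MBPS)              # all three classes
protected_utilization = load_ratio(PROTECTED_CLASSES)  # critical + telemetry only
headroom_mbps = BACKUP_CAPACITY_MBPS * (1 - protected_utilization)

print(f"overload ratio: {overload_ratio:.2f}")
print(f"protected utilization: {protected_utilization:.3f}")
print(f"headroom after protected traffic: {headroom_mbps:.0f} Mbit/s")
```

The headroom figure is the capacity left after admitting only the protected classes; it is nominal and does not account for RF modulation downshifts.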

Engineering Interpretation

The backup link was not undersized for the essential service. It was undersized for the unfiltered service. The engineering failure was not only a physical route problem; it was also a traffic policy problem. Degraded operation must be designed, not discovered during the event.
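One way to make the degraded-service policy explicit is to allocate backup capacity in strict priority order, with an optional cap on low-priority traffic. The sketch below is a simplified, rate-level model under that assumption; real schedulers act per packet, and the function name `allocate` and the `caps` parameter are hypothetical. The 70 Mbit/s cap matches the bulk rate limit reported in the post-remediation acceptance values.

```python
# Offered load in descending priority order (Mbit/s), from the capacity check.
OFFERED = [
    ("critical", 22.0),
    ("telemetry", 54.0),
    ("general", 290.0),
]

def allocate(offered, capacity_mbps, caps=None):
    """Grant capacity class by class in priority order.

    offered: list of (class name, offered Mbit/s), highest priority first.
    caps: optional per-class rate limits in Mbit/s (hypothetical policy input).
    Returns a dict of class name -> admitted Mbit/s.
    """
    caps = caps or {}
    remaining = capacity_mbps
    admitted = {}
    for name, demand in offered:
        limit = min(demand, caps.get(name, demand))  # apply rate limit if any
        grant = min(limit, remaining)                # never exceed what is left
        admitted[name] = grant
        remaining -= grant
    return admitted

uncapped = allocate(OFFERED, 180.0)
capped = allocate(OFFERED, 180.0, caps={"general": 70.0})
print(uncapped)
print(capped, "total:", sum(capped.values()))
```

Without a cap, general data absorbs all remaining capacity and leaves no margin for bursts or modulation changes; with the cap, total admitted load stays below the nominal backup capacity.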

Latency and Jitter Evidence

Before traffic policy correction, the microwave backup shows:

Metric	Measured value
95th percentile one-way latency	$126\ \text{ms}$
Peak-to-peak jitter	$72\ \text{ms}$
Packet loss	$3.8\%$

After low-priority traffic is rate-limited and bulk maintenance flows are blocked:

Metric	Measured value
95th percentile one-way latency	$31\ \text{ms}$
Peak-to-peak jitter	$14\ \text{ms}$
Packet loss	$0.05\%$

The protected service then meets the critical voice and control latency target:

$$31\ \text{ms}<40\ \text{ms}$$

It also stays below the telemetry target:

$$31\ \text{ms}<80\ \text{ms}$$
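The same comparison can be applied to both measurement sets. The sketch below checks the 95th percentile one-way latency against the latency objectives from the operating requirement; the dictionary names and the phase labels are illustrative, and comparing a 95th percentile value against the objective follows the comparison made in this section rather than a stated acceptance rule.

```python
# One-way latency objectives from the operating requirement (ms).
OBJECTIVES_MS = {
    "critical voice and control": 40.0,
    "telemetry and monitoring": 80.0,
}
# Measured 95th percentile one-way latency on the microwave backup (ms).
P95_LATENCY_MS = {
    "before policy": 126.0,
    "after policy": 31.0,
}

# For each measurement phase and traffic class: is the objective met?
results = {
    (phase, traffic_class): p95 < objective
    for phase, p95 in P95_LATENCY_MS.items()
    for traffic_class, objective in OBJECTIVES_MS.items()
}

for (phase, traffic_class), ok in results.items():
    print(f"{phase:14s} {traffic_class:28s} {'meets' if ok else 'misses'} objective")
```

Before the policy correction, the measured latency misses both objectives, including the looser telemetry target.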

Engineering Interpretation

The microwave path had enough technical capacity for protected traffic, but it needed an explicit degraded-service policy. Without that policy, queueing delay dominated performance. This is why service assurance must include traffic classes, not only physical links.

Route Diversity Audit

After restoration, the team audits the physical dependency chain. The audit checks whether supposedly redundant services share any single failure point.

Dependency	Fiber A	Fiber B	Independent?
Long-haul provider	Provider 1	Provider 2	yes
Regional metro ring	North ring	South ring	yes
River crossing	Bridge duct 4	Bridge duct 4	no
Site-entry duct	East duct bank	East duct bank	no
Building patch room	Room A	Room A	no
DC power plant	Power plant 1	Power plant 1	no

The providers and metro rings are different, but the river crossing, site-entry duct, patch room, and DC power plant are shared. The network was diverse at the carrier layer but not at the physical route layer.

The audit defines a shared-risk group:

$$SRLG_1=\{\text{bridge duct 4},\ \text{east site-entry duct},\ \text{patch room A}\}$$

Two circuits that both traverse any element of $SRLG_1$ should not be counted as physically diverse with respect to each other.
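The audit comparison can be expressed as a simple check over the dependency table: any dependency with the same value on both circuits is a shared failure point. This is a minimal sketch using the audit rows above; the `AUDIT` structure and variable names are assumptions made for illustration.

```python
# Dependency audit rows: dependency -> (Fiber A value, Fiber B value).
AUDIT = {
    "long-haul provider": ("Provider 1", "Provider 2"),
    "regional metro ring": ("North ring", "South ring"),
    "river crossing": ("Bridge duct 4", "Bridge duct 4"),
    "site-entry duct": ("East duct bank", "East duct bank"),
    "building patch room": ("Room A", "Room A"),
    "DC power plant": ("Power plant 1", "Power plant 1"),
}

# A dependency is shared when both circuits rely on the same element.
shared = {dep: a for dep, (a, b) in AUDIT.items() if a == b}

# The pair counts as physically diverse only if nothing is shared.
physically_diverse = not shared

print("shared dependencies:", shared)
print("physically diverse:", physically_diverse)
```

The check also flags the shared DC power plant, which the audit table marks as not independent even though it is not a route element listed in $SRLG_1$.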

Restoration Decision

The first restoration option is to repair both fibers through the damaged bridge duct. That restores capacity quickly but does not correct diversity. The second option is to keep one circuit on the repaired bridge route and procure a second path through a separate river crossing and west site entry. That takes longer but removes the correlated failure.

The team separates immediate restoration from permanent remediation:

Restore Fiber A through the temporary splice for capacity.
Keep microwave backup active and monitored until both fiber services are stable.
Restore Fiber B only as temporary service, not as accepted diversity.
Open a route-diversity remediation package for a physically separate path.
Update service records so operations do not count Fiber B as independent until the route changes.

Engineering Interpretation

The repair that returns traffic is not necessarily the repair that restores resilience. Service restoration and resilience restoration are different states. The closeout record should say which state has been achieved.
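The distinction between the two states can be recorded explicitly in the closeout record. The sketch below is one possible encoding, assuming the state is derived from the number of fiber circuits in service and the number of route elements the circuits still share; the enum, its labels, and `closeout_state` are hypothetical names.

```python
from enum import Enum

class CloseoutState(Enum):
    DEGRADED = "degraded on backup"
    SERVICE_RESTORED = "service restored"
    RESILIENCE_RESTORED = "resilience restored"

def closeout_state(fibers_in_service: int, shared_route_elements: int) -> CloseoutState:
    """Classify the site from circuit count and remaining shared route elements."""
    if fibers_in_service == 0:
        return CloseoutState.DEGRADED
    if fibers_in_service >= 2 and shared_route_elements == 0:
        return CloseoutState.RESILIENCE_RESTORED
    # Traffic is back on fiber, but diversity is missing or unverified.
    return CloseoutState.SERVICE_RESTORED

# 16:40 on the incident day: both fibers up, three shared route elements remain.
print(closeout_state(2, 3).value)
# After remediation, assuming the audit finds no shared route elements.
print(closeout_state(2, 0).value)
```

With this encoding, restoring Fiber B through the same duct cannot move the record to the resilience-restored state.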

Failure Modes Exposed

Failure mode	Evidence	Corrective control
false physical diversity	both providers used bridge duct 4	require route evidence and SRLG mapping
unprotected site entry	both circuits entered east duct bank	add west entry or alternate aerial route
backup congestion	offered load exceeded backup capacity	degraded-service traffic policy
weak handover records	operations trusted topology diagram	attach route map and dependency record
alarm ambiguity	both fibers failed as separate tickets	correlate alarms by site and route group
restoration ambiguity	service restored before diversity restored	split service-restored and resilience-restored states

The important lesson is not that every site needs full separation of every dependency. The lesson is that the claimed availability must match the actual failure domains.
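The alarm-ambiguity control in the table above, correlating alarms by route group, can be sketched as follows. This assumes a single site, a circuit-to-route-group mapping maintained from the audit, and a five-minute correlation window; the window value and all names in the code are hypothetical.

```python
from datetime import datetime, timedelta

# Loss-of-light alarms from the event timeline: (time, circuit).
ALARMS = [("08:12", "Fiber A"), ("08:14", "Fiber B")]
# Circuit -> shared-risk group, as established by the route audit.
ROUTE_GROUP = {"Fiber A": "SRLG_1", "Fiber B": "SRLG_1"}
WINDOW = timedelta(minutes=5)  # hypothetical correlation window

def correlate(alarms, route_group, window):
    """Merge alarms in the same route group that arrive within the window."""
    incidents = []
    for stamp, circuit in sorted(alarms):  # zero-padded HH:MM sorts correctly
        t = datetime.strptime(stamp, "%H:%M")
        group = route_group[circuit]
        for incident in incidents:
            if incident["group"] == group and t - incident["last"] <= window:
                incident["circuits"].append(circuit)
                incident["last"] = t
                break
        else:
            # No open incident matched, so start a new one.
            incidents.append({"group": group, "last": t, "circuits": [circuit]})
    return incidents

incidents = correlate(ALARMS, ROUTE_GROUP, WINDOW)
for incident in incidents:
    print(incident["group"], incident["circuits"])
```

With the mapping in place, the two fiber alarms collapse into one incident against the shared route group instead of two separate provider tickets.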

Validation After Remediation

The permanent remediation adds a second physical path through a west entrance and a different river crossing. Validation evidence includes:

provider route confirmation with map references;
site walkdown photos for east and west duct entries;
optical power baseline for both paths;
optical time-domain reflectometry traces for final routes;
failover test from Fiber A to Fiber B;
failover test from fiber service to microwave backup;
traffic policy test under degraded operation;
monitoring alarms tied to route group and service impact.

Example post-remediation acceptance values:

Test	Result	Decision
Fiber A optical margin	$7.8\ \text{dB}$	pass
Fiber B optical margin	$8.4\ \text{dB}$	pass
Fiber A to Fiber B failover	$620\ \text{ms}$ service interruption	pass for this service
Fiber to microwave failover	$2.8\ \text{s}$ degraded transition	pass with traffic policy
Protected traffic on microwave	$33\ \text{ms}$ 95th percentile latency	pass
Bulk traffic during microwave mode	rate-limited to $70\ \text{Mbit/s}$	pass

The evidence supports a new operating statement: the site has two physically separated fiber paths for normal resilience and a microwave degraded-service path for temporary continuity.

Lessons for Engineering Practice

Route diversity must be proven at the layer where the failure occurs. Carrier diversity, VLAN diversity, router diversity, and logical topology diversity do not prove physical diversity. A backhoe, flood, bridge fire, building-entry failure, or patch-room error follows geography and process, not the network diagram.

Useful review questions are:

Do redundant circuits share ducts, bridges, poles, trays, risers, patch rooms, power, or maintenance procedures?
Does the backup path have enough capacity for the protected service, not the full normal load?
Are traffic classes enforced before queueing destroys latency and jitter?
Are alarms correlated by site impact and shared-risk group?
Does the restoration report distinguish restored traffic from restored resilience?
Can future engineers find the evidence without reconstructing the incident?

Transferable Takeaways

The case transfers to data centers, industrial plants, emergency networks, cellular backhaul, ports, campuses, mines, hospitals, and transportation systems. The same pattern appears whenever a service is declared redundant from a logical diagram while physical dependencies remain hidden.

A strong telecommunications design does not merely add backup links. It makes failure domains visible, sizes degraded service intentionally, tests failover under load, and leaves records that operations can trust during the next incident.
