CAN Bus Arbitration Latency Deadline Miss Case Study
Computer engineering case study on CAN bus arbitration latency, frame timing, bus utilization, worst-case response time, diagnostic traffic, deadline miss, corrective priorities, and validation evidence.
This case study analyzes an embedded control system that missed a real-time communication deadline after a firmware update added high-priority diagnostic traffic to a shared CAN bus. The control software still executed on time, but the command frame waited too long for bus access because arbitration favored other messages.
The case is useful because shared buses are often reviewed by average utilization. Real-time systems need worst-case response time, arbitration priority, blocking, bursts, error recovery, and validation evidence. A bus with acceptable average load can still miss a hard deadline.
Case Summary
| Item | Detail |
|---|---|
| System | Distributed embedded controller with a shared CAN bus |
| Bus rate | 500\ \text{kbit/s} |
| Critical message | Actuator command frame with 2\ \text{ms} deadline |
| Trigger | Firmware update added frequent high-priority diagnostic frames |
| Symptom | Actuator command occasionally arrived late during diagnostic mode |
| Root cause | Arbitration priority and diagnostic period were not included in worst-case bus timing analysis |
| Corrective action | Lower diagnostic priority, rate-limit diagnostics, define service mode, and validate worst-case response time |
The example uses a simplified CAN timing model. Real analysis should use exact frame format, identifier length, bit stuffing bound, error frames, retransmission behavior, oscillator tolerance, bus physical layer, transceiver delay, gateway behavior, and safety requirements.
Field Data
The bus carries a critical actuator command and several periodic messages.
| Message | Identifier priority | Period | Frame time | Deadline |
|---|---|---|---|---|
| safety heartbeat | higher than command | 10\ \text{ms} | 0.27\ \text{ms} | 10\ \text{ms} |
| inverter status | higher than command | 5\ \text{ms} | 0.27\ \text{ms} | 5\ \text{ms} |
| diagnostic stream after update | higher than command | 0.5\ \text{ms} | 0.27\ \text{ms} | service-mode only |
| actuator command | target message | 10\ \text{ms} | 0.27\ \text{ms} | 2\ \text{ms} |
| lower-priority telemetry | lower than command | mixed | 0.27\ \text{ms} | noncritical |
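CAN arbitration is bitwise and non-destructive: when several nodes start transmitting together, the frame with the highest-priority identifier (the numerically lowest one, because dominant bits overwrite recessive bits) wins bus access and continues uncorrupted. A lower-priority frame already in transmission cannot be pre-empted, so one lower-priority frame can block a higher-priority frame until it finishes.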
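The arbitration rule can be illustrated with a minimal Python sketch. It assumes 11-bit identifiers, one unique identifier per contending node, and that all contenders start in the same bit time. The function name `arbitrate` and the example identifiers are hypothetical and are not taken from the system in this case.

```python
def arbitrate(pending_ids, id_bits=11):
    """Return the identifier that wins bitwise arbitration.

    Assumptions: all nodes start together, identifiers are unique,
    and a 0 bit is dominant (the bus behaves like a wired-AND).
    """
    contenders = set(pending_ids)
    if not contenders:
        raise ValueError("at least one pending identifier is required")
    for bit in range(id_bits - 1, -1, -1):        # most significant bit first
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus_level = min(sent.values())             # any dominant 0 pulls the bus to 0
        # A node that sent recessive (1) but reads dominant (0) stops transmitting.
        contenders = {i for i in contenders if sent[i] == bus_level}
    (winner,) = contenders
    return winner


# Hypothetical identifiers: the numerically lowest one wins.
print(hex(arbitrate([0x120, 0x0A0, 0x300])))
```

The sketch covers only the arbitration field. It does not model the non-preemptive blocking described above, which arises because arbitration happens only when the bus becomes idle.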
Step 1: Estimate Frame Transmission Time
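Use a conservative frame length including arbitration, control, data, CRC, acknowledgement, inter-frame space, and bit-stuffing allowance:
L_{\text{frame}} = 135\ \text{bits}
Bus rate:
f_{\text{bit}} = 500\ \text{kbit/s}
Frame transmission time:
C = \frac{L_{\text{frame}}}{f_{\text{bit}}} = \frac{135\ \text{bits}}{500\ \text{kbit/s}} = 0.27\ \text{ms}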
Engineering Comment
The frame time is not only the payload size divided by bit rate. Protocol overhead and bit stuffing matter. For release analysis, use the exact frame type and a justified worst-case bound.
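As a rough cross-check of the frame time, the sketch below applies a commonly cited worst-case length bound for classical CAN data frames with 11-bit identifiers. The case does not state the frame format, so the 11-bit identifier and the 8-byte payload are assumptions; they happen to reproduce the 0.27 ms figure used here. The function names are illustrative.

```python
def worst_case_frame_bits(data_bytes):
    """Worst-case bit count for a classical CAN data frame with an 11-bit ID.

    Bound used: 8n + 47 + floor((34 + 8n - 1) / 4), where 47 is the fixed
    overhead (including 3 bits of inter-frame space), 8n is the payload,
    and the last term is the maximum number of stuff bits over the
    34 + 8n bits that are subject to stuffing.
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("classical CAN carries 0 to 8 data bytes")
    stuff_bits = (34 + 8 * data_bytes - 1) // 4
    return 8 * data_bytes + 47 + stuff_bits


def frame_time_us(data_bytes, bit_rate_bps):
    """Worst-case frame transmission time in microseconds."""
    return worst_case_frame_bits(data_bytes) * 1_000_000 / bit_rate_bps


# Assumed payload of 8 bytes at the 500 kbit/s bus rate from the case summary.
print(worst_case_frame_bits(8))        # 135
print(frame_time_us(8, 500_000))       # 270.0
```

Extended 29-bit identifiers, CAN FD frames, and error frames need different bounds, so this check should not replace the exact analysis called for in the comment above.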
Step 2: Check Bus Utilization
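For a periodic message with frame time C and period T:
U = \frac{C}{T}
Heartbeat utilization:
U_{\text{hb}} = \frac{0.27\ \text{ms}}{10\ \text{ms}} = 0.027 = 2.7\%
Inverter status utilization:
U_{\text{inv}} = \frac{0.27\ \text{ms}}{5\ \text{ms}} = 0.054 = 5.4\%
Diagnostic utilization:
U_{\text{diag}} = \frac{0.27\ \text{ms}}{0.5\ \text{ms}} = 0.54 = 54\%
Actuator command utilization:
U_{\text{cmd}} = \frac{0.27\ \text{ms}}{10\ \text{ms}} = 0.027 = 2.7\%
Subtotal for these messages:
U_{\text{subtotal}} = 0.027 + 0.054 + 0.54 + 0.027 = 0.648 = 64.8\%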
Lower-priority telemetry and error recovery add further load on top of this subtotal, so a high peak bus load observed in diagnostic mode is plausible.
Engineering Comment
The utilization is high but not above 100\%. That alone does not prove schedulability. The deadline miss comes from priority and phasing, not only total load.
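The utilization arithmetic can be reproduced with a short sketch. It uses the frame time and periods from the field data table, expressed in microseconds, and exact fractions to avoid rounding. The variable and function names are illustrative.

```python
from fractions import Fraction

FRAME_US = 270  # 0.27 ms frame time from the field data table

# Periods in microseconds for the four messages in the subtotal.
PERIODS_US = {
    "safety heartbeat": 10_000,
    "inverter status": 5_000,
    "diagnostic stream": 500,
    "actuator command": 10_000,
}


def utilization(frame_us, periods_us):
    """Return per-message utilization U = C / T as exact fractions."""
    return {name: Fraction(frame_us, period) for name, period in periods_us.items()}


per_message = utilization(FRAME_US, PERIODS_US)
total = sum(per_message.values())

for name, u in per_message.items():
    print(f"{name}: {float(u):.1%}")
print(f"subtotal: {float(total):.1%}")
```

The diagnostic stream alone contributes 0.54 of the 0.648 subtotal, which shows where the load comes from but still says nothing about which frame wins arbitration.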
Step 3: Calculate Response Time Without Diagnostic Stream
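For a fixed-priority non-preemptive bus screen, the response time of the command frame can be estimated by iterating the following recurrence until R_i stops changing:
R_i = B_i + C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j
where:
- R_i is the command frame response time;
- C_i is command frame time;
- B_i is blocking by one lower-priority frame already on the bus;
- hp(i) is the set of higher-priority messages;
- T_j is the period of higher-priority message j;
- C_j is the frame time of higher-priority message j.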
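Use:
C_i = 0.27\ \text{ms}
and one lower-priority blocking frame:
B_i = 0.27\ \text{ms}
Without the diagnostic stream, the higher-priority messages are the heartbeat and inverter status.
First iteration, with each higher-priority message interfering once:
R_i = 0.27 + 0.27 + (1)(0.27) + (1)(0.27) = 1.08\ \text{ms}
Repeating with R_i=1.08\ \text{ms} gives the same interference counts:
\left\lceil \frac{1.08}{10} \right\rceil = 1, \quad \left\lceil \frac{1.08}{5} \right\rceil = 1, \quad R_i = 1.08\ \text{ms}
The command deadline is:
D_i = 2\ \text{ms}
So the original configuration passes:
R_i = 1.08\ \text{ms} < D_i = 2\ \text{ms}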
Engineering Comment
Before the firmware update, the bus had enough arbitration margin for the command frame. This is why the issue did not appear in earlier bench tests.
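The Step 3 calculation can be sketched as a fixed-point iteration. The sketch works in integer microseconds so the ceiling terms are exact. The function name, its parameters, and the choice to start from B_i + C_i are assumptions made for illustration; this is the simplified screening recurrence used in this case, not a complete CAN schedulability analysis.

```python
def response_time_us(c_us, b_us, higher_priority, deadline_us, max_iter=100):
    """Iterate R = B + C + sum(ceil(R / T_j) * C_j) to a fixed point.

    higher_priority is a list of (period_us, frame_us) pairs.
    Returns the converged response time, or the first iterate that
    already exceeds the deadline.
    """
    r = b_us + c_us
    for _ in range(max_iter):
        # -(-r // t) is integer ceiling division.
        interference = sum(-(-r // t) * c for t, c in higher_priority)
        r_next = b_us + c_us + interference
        if r_next == r or r_next > deadline_us:
            return r_next
        r = r_next
    raise RuntimeError("response time did not converge")


# Heartbeat (10 ms) and inverter status (5 ms), each with a 0.27 ms frame.
HIGHER_PRIORITY = [(10_000, 270), (5_000, 270)]

r = response_time_us(c_us=270, b_us=270,
                     higher_priority=HIGHER_PRIORITY, deadline_us=2_000)
print(r)           # 1080
print(r <= 2_000)  # True
```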
Step 4: Calculate Response Time With Diagnostic Stream
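After the update, the diagnostic stream has higher priority than the command, with period:
T_{\text{diag}} = 0.5\ \text{ms}
Add its interference term:
R_i = B_i + C_i + \left\lceil \frac{R_i}{10} \right\rceil (0.27) + \left\lceil \frac{R_i}{5} \right\rceil (0.27) + \left\lceil \frac{R_i}{0.5} \right\rceil (0.27)
Start with:
R_i^{(0)} = 1.08\ \text{ms}
First evaluation:
R_i^{(1)} = 1.08 + \left\lceil \frac{1.08}{0.5} \right\rceil (0.27) = 1.08 + 3(0.27) = 1.89\ \text{ms}
Second evaluation:
R_i^{(2)} = 1.08 + \left\lceil \frac{1.89}{0.5} \right\rceil (0.27) = 1.08 + 4(0.27) = 2.16\ \text{ms}
Third evaluation:
R_i^{(3)} = 1.08 + \left\lceil \frac{2.16}{0.5} \right\rceil (0.27) = 1.08 + 5(0.27) = 2.43\ \text{ms}
Repeating remains at:
R_i = 1.08 + \left\lceil \frac{2.43}{0.5} \right\rceil (0.27) = 2.43\ \text{ms}
The deadline is:
D_i = 2\ \text{ms}
Therefore:
R_i = 2.43\ \text{ms} > D_i = 2\ \text{ms}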
The command can miss its deadline during diagnostic mode.
Engineering Comment
The result explains the field symptom. The command task did not necessarily run late. The command frame became late after it was ready because higher-priority diagnostic traffic repeatedly won arbitration.
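To make the effect of the diagnostic stream visible, the sketch below records every iterate of the recurrence instead of only the final value. It assumes the iteration starts from 1.08 ms, the response time found without the diagnostic stream, and it works in integer microseconds. The function name is illustrative.

```python
def iterate_response_time(start_us, c_us, b_us, higher_priority, max_iter=100):
    """Return all iterates of R = B + C + sum(ceil(R / T_j) * C_j).

    higher_priority is a list of (period_us, frame_us) pairs.
    The last element of the returned list is the fixed point.
    """
    trace = [start_us]
    for _ in range(max_iter):
        r = trace[-1]
        r_next = b_us + c_us + sum(-(-r // t) * c for t, c in higher_priority)
        if r_next == r:
            return trace
        trace.append(r_next)
    raise RuntimeError("response time did not converge")


# Heartbeat, inverter status, and the 0.5 ms diagnostic stream.
WITH_DIAGNOSTICS = [(10_000, 270), (5_000, 270), (500, 270)]

trace = iterate_response_time(1_080, c_us=270, b_us=270,
                              higher_priority=WITH_DIAGNOSTICS)
print(trace)              # [1080, 1890, 2160, 2430]
print(trace[-1] > 2_000)  # True: the 2 ms deadline is exceeded
```

Each pass admits one more diagnostic frame into the window, which is why the estimate keeps growing until five diagnostic frames fit ahead of the command.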
Step 5: Identify the Design Error
The diagnostic firmware update made two unsafe assumptions:
- diagnostic traffic was treated as harmless because it was “only messages”;
- priority identifiers were assigned for convenience, not deadline consequence.
The update created a priority inversion at the bus level. Noncritical diagnostic traffic had higher arbitration priority than a time-critical actuator command.
This is not the same as CPU priority inversion, but the engineering pattern is similar: a less important activity delayed a more important deadline because the shared resource policy was wrong.
Step 6: Correct the Message Set
The corrected design moved diagnostic frames to lower priority than control frames and limited the diagnostic period in normal operation.
New diagnostic period:
New diagnostic priority: lower than the actuator command, so it does not appear in hp(i) for the command response-time calculation.
Command response time returns to:
R_i = 0.27 + 0.27 + (1)(0.27) + (1)(0.27) = 1.08\ \text{ms} < 2\ \text{ms}
If a service mode requires faster diagnostics, the mode must explicitly relax the actuator command requirement, inhibit active control, or run with a separate validation case.
Engineering Comment
The key correction is not only reducing bus load. It is aligning arbitration priority with real-time consequence. Critical control frames must not wait behind noncritical diagnostics.
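The effect of the priority change can be sketched by deriving the blocking term and the higher-priority set from a priority rank. The `Message` type, the rank numbers, the telemetry period, and the diagnostic periods tried for the corrected set are all hypothetical placeholders. In this simplified model the command response time does not depend on the diagnostic period once diagnostics rank below the command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    priority: int    # smaller number = higher arbitration priority
    period_us: int
    frame_us: int


def command_response_us(messages, target_name, max_iter=100):
    """Response time of one message under R = B + C + sum(ceil(R/T_j) * C_j)."""
    target = next(m for m in messages if m.name == target_name)
    higher = [m for m in messages if m.priority < target.priority]
    lower = [m for m in messages if m.priority > target.priority]
    # At most one lower-priority frame can already be on the bus.
    blocking = max((m.frame_us for m in lower), default=0)
    r = blocking + target.frame_us
    for _ in range(max_iter):
        r_next = blocking + target.frame_us + sum(
            -(-r // m.period_us) * m.frame_us for m in higher)
        if r_next == r:
            return r
        r = r_next
    raise RuntimeError("response time did not converge")


def message_set(diag_priority, diag_period_us):
    """Build the message set with a configurable diagnostic stream."""
    return [
        Message("safety heartbeat", 0, 10_000, 270),
        Message("inverter status", 1, 5_000, 270),
        Message("actuator command", 3, 10_000, 270),
        Message("diagnostic stream", diag_priority, diag_period_us, 270),
        Message("telemetry", 9, 100_000, 270),  # period is a placeholder
    ]


before = command_response_us(message_set(2, 500), "actuator command")
after = command_response_us(message_set(5, 500), "actuator command")
print(before, after)  # 2430 1080
```

Even with the original 0.5 ms period, moving the diagnostic stream below the command restores 1.08 ms in this model. Rate limiting is still needed to protect bus load and the lower-priority messages.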
Step 7: Check Remaining Bus Load
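In normal operation after correction, with the rate-limited diagnostic period written as T_{\text{diag,new}}:
Subtotal utilization becomes:
U_{\text{subtotal}} = 0.027 + 0.054 + 0.027 + \frac{0.27\ \text{ms}}{T_{\text{diag,new}}} = 0.108 + \frac{0.27\ \text{ms}}{T_{\text{diag,new}}}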
This leaves capacity for lower-priority telemetry, retransmissions, and diagnostic bursts under controlled mode rules.
Engineering Comment
Low average utilization is helpful, but it is still not a full proof. Worst-case response time, error frames, gateways, interrupt service time, receive queue depth, and fault recovery still need validation.
Corrective Actions
The accepted corrective actions were:
- reserve highest arbitration priority for safety and control deadlines;
- move diagnostic and logging messages below control messages;
- rate-limit diagnostics during active control;
- create a service mode for high-rate diagnostics with explicit operating constraints;
- add bus response-time analysis to firmware release review;
- measure actual frame timing with a bus analyzer;
- test under maximum periodic load, diagnostic load, and error-recovery cases;
- monitor receive-queue high-water marks and dropped-frame counters;
- require rollback if bus deadline evidence is absent after a firmware change.
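The rate-limiting action can be sketched as a minimum-interval gate in front of the diagnostic transmit queue. The class name, the method name, and the 10 ms interval in the example are hypothetical; the real interval must come from the bus timing analysis. Timestamps are passed in explicitly so the behavior is deterministic and easy to test.

```python
class DiagnosticRateLimiter:
    """Allow a diagnostic frame only if enough time has passed since the last one."""

    def __init__(self, min_interval_us):
        if min_interval_us <= 0:
            raise ValueError("min_interval_us must be positive")
        self.min_interval_us = min_interval_us
        self._last_sent_us = None

    def allow(self, now_us):
        """Return True and record the send time if the frame may be queued."""
        if (self._last_sent_us is None
                or now_us - self._last_sent_us >= self.min_interval_us):
            self._last_sent_us = now_us
            return True
        return False


# Hypothetical 10 ms minimum spacing during active control.
limiter = DiagnosticRateLimiter(min_interval_us=10_000)
decisions = [limiter.allow(t) for t in (0, 500, 9_999, 10_000, 10_500)]
print(decisions)  # [True, False, False, True, False]
```

Enforcing a minimum spacing at the sender is what makes the period assumed in the response-time analysis hold in the running firmware.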
Validation Evidence
The corrected release should include:
- message database with identifier, period, deadline, frame length, and owner;
- worst-case response-time calculation for every hard-deadline frame;
- measured bus load in normal, startup, diagnostic, degraded, and fault-recovery modes;
- bus analyzer trace proving command frame latency below 2\ \text{ms};
- receive-queue occupancy and interrupt-load measurements;
- electromagnetic-interference and error-frame test results where relevant;
- bus-off recovery test;
- firmware configuration record matching the tested message set;
- regression test that fails if diagnostics regain higher priority than control.
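The regression test in the last item could take roughly the following form. The message database layout, the field names, and the identifier values are hypothetical. A message is treated as noncritical when its hard deadline is recorded as `None`, and a numerically lower identifier is treated as higher priority.

```python
def priority_violations(message_db):
    """Return (noncritical, critical) name pairs where the noncritical
    frame has the numerically lower, and so higher-priority, identifier."""
    critical = [m for m in message_db if m["hard_deadline_us"] is not None]
    noncritical = [m for m in message_db if m["hard_deadline_us"] is None]
    return [
        (n["name"], c["name"])
        for n in noncritical
        for c in critical
        if n["can_id"] < c["can_id"]
    ]


def make_db(diagnostic_id):
    """Hypothetical message database with a configurable diagnostic identifier."""
    return [
        {"name": "safety heartbeat", "can_id": 0x010, "hard_deadline_us": 10_000},
        {"name": "inverter status", "can_id": 0x020, "hard_deadline_us": 5_000},
        {"name": "actuator command", "can_id": 0x040, "hard_deadline_us": 2_000},
        {"name": "diagnostic stream", "can_id": diagnostic_id, "hard_deadline_us": None},
        {"name": "telemetry", "can_id": 0x300, "hard_deadline_us": None},
    ]


print(priority_violations(make_db(0x030)))  # diagnostics outrank the command
print(priority_violations(make_db(0x200)))  # []
```

A release check would fail the build whenever the returned list is not empty. A check like this complements the response-time calculation and measured traces; it does not replace them.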
Final Decision
The defensible engineering decision was:
Do not release the diagnostic firmware update until arbitration priority, diagnostic rate limiting, bus response-time analysis, and measured bus traces prove the actuator command deadline.
The main lesson is that a real-time data bus is a scheduled resource. Bandwidth, arbitration priority, frame length, burst behavior, and error recovery must be treated as part of the timing budget, not as an implementation detail after the control software is complete.