Glossary term
Stack Overflow
Engineering definition of stack overflow covering stack budget, high-water mark, recursion depth, interrupt nesting, guard margin and validation evidence.
Definition
Stack overflow is a failure in which a task, thread, interrupt or call chain uses more stack memory than the allocated stack region can safely contain.
Stack overflow appears in embedded firmware, operating systems, recursive algorithms, interrupt-heavy systems and long-running services when call depth, local variables, nested interrupts, exception handlers or library code exceed the stack budget. A useful review states allocated stack, high-water mark, guard band, worst-case call path, interrupt nesting, recursion depth, detection mechanism and validation evidence.
An overflow can corrupt adjacent memory, overwrite control data, trigger a memory fault, reset a device or cause silent state corruption before any obvious crash.
The risk is common in embedded firmware and real-time systems because stack budgets are small, interrupts can nest, library calls may allocate hidden stack frames and recursive or deeply layered code can grow beyond the tested path.
Stack Margin
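Let allocated stack size be:
$$S_{\text{alloc}}$$
measured worst-case used stack be:
$$S_{\text{used}}$$
and required guard band be:
$$S_{\text{guard}}$$
The release margin is:
$$M = S_{\text{alloc}} - S_{\text{used}} - S_{\text{guard}}$$
The design is unsafe when:
$$M < 0$$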
High-Water Mark
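A stack high-water mark estimates maximum observed stack use. If the stack is painted with a known pattern and the minimum untouched bytes after testing are:
$$S_{\text{untouched}}$$
then, with allocated stack size $S_{\text{alloc}}$, the observed stack use is:
$$S_{\text{used}} = S_{\text{alloc}} - S_{\text{untouched}}$$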
This is evidence from the tested workload, not a proof for every path.
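The painting technique can be sketched on a plain byte buffer that stands in for a task stack. This is a minimal illustration, assuming a descending stack whose deepest address is the start of the region; the pattern value `STACK_PAINT` and the function names are hypothetical, not taken from any particular RTOS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_PAINT 0xA5u /* assumed fill pattern; any recognisable byte works */

/* Fill the whole stack region with the paint pattern before the task runs. */
void stack_paint(uint8_t *region, size_t size)
{
    memset(region, STACK_PAINT, size);
}

/* Count bytes that still hold the pattern, scanning upward from the
 * lowest address, which is the deepest end of a descending stack. */
size_t stack_untouched_bytes(const uint8_t *region, size_t size)
{
    size_t n = 0;
    while (n < size && region[n] == STACK_PAINT) {
        n++;
    }
    return n;
}

/* High-water mark: allocated size minus the bytes never written. */
size_t stack_high_water(const uint8_t *region, size_t size)
{
    return size - stack_untouched_bytes(region, size);
}
```

The scan can under-report if a frame reserves space without writing it, or happens to write the paint value, which is one more reason to treat the result as evidence rather than proof.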
Static And Runtime Evidence
Stack confidence usually needs both static and runtime evidence. Static analysis can estimate stack frames, call graph depth and library contribution, but it may struggle with indirect calls, recursion, assembly, interrupts and compiler-specific prologues. Runtime high-water marks cover only the paths that were exercised.
The two views should agree within a stated uncertainty band. If static analysis predicts a deeper call path than the test reached, the test workload is incomplete. If runtime use exceeds static estimates, the build settings, interrupt path or library assumptions need review.
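One way to make the agreement rule checkable is a small classifier over the two figures. The sketch below is an assumption about how such a check might look: the enum, the function name and the idea of applying the uncertainty band symmetrically are illustrative choices, not a prescribed method.

```c
#include <stddef.h>

typedef enum {
    STACK_EVIDENCE_AGREES,  /* both views within the stated band */
    STACK_TEST_INCOMPLETE,  /* static estimate is deeper than the test reached */
    STACK_STATIC_SUSPECT    /* runtime use exceeds the static estimate */
} stack_evidence_t;

/* Compare a static worst-case estimate with a runtime high-water mark,
 * both in bytes, allowing band_bytes of disagreement in either direction. */
stack_evidence_t stack_compare_evidence(size_t static_bytes,
                                        size_t runtime_bytes,
                                        size_t band_bytes)
{
    if (runtime_bytes > static_bytes + band_bytes) {
        return STACK_STATIC_SUSPECT;   /* review build settings, ISR path, libraries */
    }
    if (static_bytes > runtime_bytes + band_bytes) {
        return STACK_TEST_INCOMPLETE;  /* extend the workload to reach the deep path */
    }
    return STACK_EVIDENCE_AGREES;
}
```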
Worked Stack Screen
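Suppose a task has allocated stack $S_{\text{alloc}}$, and post-test high-water evidence shows $S_{\text{untouched}}$ bytes never written.
The observed stack use is:
$$S_{\text{used}} = S_{\text{alloc}} - S_{\text{untouched}}$$
With guard band $S_{\text{guard}}$, the margin is:
$$M = S_{\text{alloc}} - S_{\text{used}} - S_{\text{guard}} = 108\ \text{bytes}$$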
If a rare nested interrupt path adds 384 bytes, the margin becomes -276 bytes. The release fails despite passing the original test run.
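The screen can be expressed as a short C helper. The function names are hypothetical, and the sizes in the usage note (a 2048-byte stack, 620 untouched bytes, a 512-byte guard band) are illustrative assumptions chosen only to be consistent with the 384-byte and -276-byte figures above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Release margin in bytes. The result is signed so that a deficit is
 * representable. extra_bytes models stack use on a path the test did
 * not reach, such as a rare nested interrupt. */
int32_t stack_margin(uint32_t alloc_bytes, uint32_t untouched_bytes,
                     uint32_t guard_bytes, uint32_t extra_bytes)
{
    int32_t used = (int32_t)alloc_bytes - (int32_t)untouched_bytes
                 + (int32_t)extra_bytes;
    return (int32_t)alloc_bytes - used - (int32_t)guard_bytes;
}

/* The design passes the screen only when the margin is not negative. */
bool stack_release_ok(int32_t margin_bytes)
{
    return margin_bytes >= 0;
}
```

With the assumed sizes, `stack_margin(2048, 620, 512, 0)` gives 108 and passes, while `stack_margin(2048, 620, 512, 384)` gives -276 and fails the release check.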
Recursion Depth
For recursive or deeply nested code, a simple stack screen is:
$$S_{\text{recursion}} = d_{\max} \cdot S_{\text{frame}}$$
where $d_{\max}$ is maximum call depth and $S_{\text{frame}}$ is stack per frame. Unbalanced binary trees, parsers, graph traversal, callbacks and error handling can create deeper paths than typical test inputs exercise.
Recursion should be bounded, converted to iteration or validated against worst-case input when stack is limited.
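A bounded recursion can be sketched as follows. This is an illustrative example, assuming a simple binary tree; the type `node_t`, both function names and the choice of returning -1 on a depth violation are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Screen: d_max frames of frame_bytes each must fit in budget_bytes.
 * Division is used instead of multiplication to avoid integer overflow. */
bool recursion_fits(size_t d_max, size_t frame_bytes, size_t budget_bytes)
{
    if (frame_bytes == 0) {
        return true;
    }
    return d_max <= budget_bytes / frame_bytes;
}

typedef struct node {
    struct node *left;
    struct node *right;
} node_t;

/* Count nodes, refusing to descend past max_depth levels. Returns -1 if
 * the tree is deeper than allowed. At most max_depth + 1 calls are ever
 * nested, because the deepest call returns immediately. */
long count_nodes_bounded(const node_t *n, size_t depth, size_t max_depth)
{
    if (n == NULL) {
        return 0;
    }
    if (depth >= max_depth) {
        return -1;
    }
    long l = count_nodes_bounded(n->left, depth + 1, max_depth);
    if (l < 0) {
        return -1;
    }
    long r = count_nodes_bounded(n->right, depth + 1, max_depth);
    if (r < 0) {
        return -1;
    }
    return l + r + 1;
}
```

Rejecting over-deep input turns a potential overflow into a reportable error, at the cost of refusing some valid but degenerate inputs.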
Interrupt And Exception Paths
Stack review must include interrupt entry frames, nested interrupts, exception handlers, logging from fault paths, floating-point context save, RTOS task switch frames and library calls made inside handlers. A normal application trace can understate the stack needed during fault recovery.
For safety-related firmware, a separate interrupt stack or memory protection guard can make overflow easier to detect before it corrupts unrelated state.
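A rough budget for nesting can be sketched by summing the deepest handler at each preemption level on top of the task path. This assumes handlers run on the same stack as the interrupted context and that each priority level can preempt lower ones at most once; the function name and the per-level figures in the usage note are hypothetical.

```c
#include <stddef.h>

/* Worst-case stack use in bytes: the task's own deepest path plus the
 * deepest handler (entry frame and context save included) at every
 * interrupt priority level that can nest on top of it. */
size_t stack_worst_case_with_nesting(size_t task_bytes,
                                     const size_t *isr_bytes_per_level,
                                     size_t levels)
{
    size_t total = task_bytes;
    for (size_t i = 0; i < levels; i++) {
        total += isr_bytes_per_level[i];
    }
    return total;
}
```

For example, three nesting levels costing 96, 128 and 160 bytes add 384 bytes to a 1428-byte task path, for 1812 bytes in total. With a separate interrupt stack, the handler terms would be budgeted against that stack instead.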
Guard Mechanisms
Detection options include stack canaries, red zones, memory protection regions, guard pages, RTOS overflow hooks and post-reset diagnostics. A guard mechanism is only useful if it triggers before unsafe state is acted on and if the recovery path records enough evidence to diagnose the failing task or call path.
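A software canary check can be sketched on a word buffer standing in for a task stack. This is a minimal illustration, assuming a descending stack whose deepest words sit at the start of the region; `STACK_CANARY`, `CANARY_WORDS` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xDEADBEEFu /* assumed sentinel value */
#define CANARY_WORDS 4u          /* assumed number of sentinel words */

/* Write sentinel words at the deepest end of the stack region. */
void stack_canary_init(uint32_t *stack_base)
{
    for (size_t i = 0; i < CANARY_WORDS; i++) {
        stack_base[i] = STACK_CANARY;
    }
}

/* Return true while every sentinel word is unchanged. */
bool stack_canary_intact(const uint32_t *stack_base)
{
    for (size_t i = 0; i < CANARY_WORDS; i++) {
        if (stack_base[i] != STACK_CANARY) {
            return false;
        }
    }
    return true;
}
```

A check like this reports the overflow only after the write has happened and only when it is polled, and a large frame can step over the sentinel words entirely, so it complements rather than replaces a hardware guard region.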
Validation Evidence
Useful evidence includes allocated stack per task, high-water marks after representative load, static stack analysis, call graph, recursion bound, interrupt nesting rule, compiler options, FPU context policy, stack canary status, guard region, memory protection fault evidence, watchdog reset reason and fault-injection tests.
The workload should include startup, shutdown, communication bursts, error handling, logging, retries, diagnostics, nested interrupts and maximum-size inputs. The worst stack path is often not the hottest performance path.
Design Levers
Useful levers include increasing stack size, reducing local arrays, bounding recursion, moving large buffers to static or heap storage where appropriate, splitting handlers, avoiding heavy logging in interrupts, using static analysis, enabling stack canaries, adding MPU guard regions and exposing stack high-water marks in diagnostics.
Increasing stack size is not a substitute for understanding the call path. It may hide the problem until a later feature, compiler change or rare exception path consumes the margin.
Relationship To Neighbor Terms
Worst-case execution time and interrupt latency describe timing. Stack overflow describes memory exhaustion on the execution path. Memory fragmentation and memory leaks affect heap or resident memory, while stack overflow affects a bounded stack region. A watchdog reset loop can be the visible symptom if an overflow repeatedly crashes the same path during recovery.