Glossary term
Memory Leak
Engineering definition of a memory leak covering leak rate, resident memory growth, time to exhaustion, resource leaks, reliability impact and validation evidence.
Definition
A memory leak is uncontrolled memory or resource growth caused by allocated objects, buffers, handles or references not being released when they are no longer needed.
Memory leaks appear in services, embedded firmware, device gateways, data pipelines, control software and managed runtimes when heap objects, buffers, subscriptions, file handles, sockets, timers, caches or native resources keep accumulating. A useful review states leak boundary, resident memory trend, allocation source, release path, leak rate, time to exhaustion, failure mode, restart behavior and validation evidence.
The leaking resource may be heap memory, native memory, a file descriptor, socket, timer, subscription, buffer pool entry or cached object without an eviction boundary.
Leaks matter because they often fail late. A short functional test passes, but a long soak, traffic burst or retry storm gradually consumes memory until latency rises, garbage collection increases, page faults appear, queues grow, or the process resets.
Leak Rate
Let memory growth over a steady observation interval be:
$$\Delta M = M(t_2) - M(t_1)$$
over elapsed time:
$$\Delta t = t_2 - t_1$$
The observed leak rate is:
$$r_{leak} = \frac{\Delta M}{\Delta t}$$
The interval should exclude intentional warmup unless warmup never reaches a plateau. A cache that grows to a bounded size is not the same as a leak, but an unbounded cache is still a reliability risk.
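As a rough illustration of estimating the leak rate from a soak run, the sketch below fits a least-squares slope to post-warmup memory samples. A fitted slope is a substitute for the simple endpoint difference described above and is less sensitive to a single noisy sample. The function name `leak_rate_mb_per_hour` and the sample values are hypothetical.

```python
from typing import Sequence


def leak_rate_mb_per_hour(times_h: Sequence[float], memory_mb: Sequence[float]) -> float:
    """Least-squares slope of memory (MB) against elapsed time (hours)."""
    if len(times_h) != len(memory_mb) or len(times_h) < 2:
        raise ValueError("need at least two paired samples")
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_m = sum(memory_mb) / n
    var_t = sum((t - mean_t) ** 2 for t in times_h)
    if var_t == 0:
        raise ValueError("samples must span a non-zero interval")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_h, memory_mb))
    return cov / var_t


# Hypothetical samples taken after warmup has reached its plateau.
samples_t = [0.0, 1.0, 2.0, 3.0, 4.0]            # hours
samples_m = [500.0, 504.0, 508.0, 512.0, 516.0]  # MB resident
rate = leak_rate_mb_per_hour(samples_t, samples_m)  # 4.0 MB/h for this data
```

Samples from the warmup phase should be dropped before fitting, otherwise the slope mixes intentional growth with the leak.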
Time To Exhaustion
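If the memory limit is $M_{limit}$, current memory is $M_{current}$, the required guard band is $M_{guard}$ and the observed leak rate is $r_{leak}$, then a simple exhaustion screen is:
$$T_{exhaust} = \frac{M_{limit} - M_{current} - M_{guard}}{r_{leak}}$$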
This is a planning screen, not a guarantee. Leak rate can accelerate under error paths, retries, large payloads, reconnection loops or partial dependency failure.
Worked Soak Screen
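Suppose a service has memory limit $M_{limit}$, current resident memory $M_{current}$, guard band $M_{guard}$ and measured leak rate $r_{leak}$. The time to exhaustion is:
$$T_{exhaust} = \frac{M_{limit} - M_{current} - M_{guard}}{r_{leak}}$$
If retry traffic raises the leak rate to 18 MB/h, the time to exhaustion falls to:
$$T_{retry} = \frac{M_{limit} - M_{current} - M_{guard}}{18\ \text{MB/h}}$$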
A daily restart might hide the first case and still fail the second during an incident.
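The screen can be sketched in a few lines. Every number below except the 18 MB/h retry rate is a hypothetical value chosen for illustration: a 1024 MB limit, 640 MB resident, a 96 MB guard band and a 4 MB/h baseline leak rate. The function name `time_to_exhaustion_h` is also an assumption.

```python
import math


def time_to_exhaustion_h(limit_mb: float, current_mb: float,
                         guard_mb: float, leak_rate_mb_per_h: float) -> float:
    """Hours until memory reaches the limit minus the guard band."""
    headroom_mb = limit_mb - current_mb - guard_mb
    if headroom_mb <= 0:
        return 0.0        # already inside the guard band
    if leak_rate_mb_per_h <= 0:
        return math.inf   # no measurable growth in this window
    return headroom_mb / leak_rate_mb_per_h


# Hypothetical service values (assumptions, not measurements).
LIMIT_MB, CURRENT_MB, GUARD_MB = 1024.0, 640.0, 96.0
BASELINE_RATE = 4.0    # MB/h, assumed steady-state leak rate
RETRY_RATE = 18.0      # MB/h, the retry-traffic rate from the text
RESTART_INTERVAL_H = 24.0

baseline_h = time_to_exhaustion_h(LIMIT_MB, CURRENT_MB, GUARD_MB, BASELINE_RATE)  # 72.0
retry_h = time_to_exhaustion_h(LIMIT_MB, CURRENT_MB, GUARD_MB, RETRY_RATE)        # 16.0

hidden_by_restart = baseline_h > RESTART_INTERVAL_H   # True: restart masks the leak
fails_before_restart = retry_h < RESTART_INTERVAL_H   # True: exhaustion within a day
```

With these assumed values the baseline case survives 72 hours, so a 24-hour restart hides it, while the retry case exhausts headroom in 16 hours.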
Steady-State Acceptance
A release test should define when memory is allowed to grow and when it must plateau. A warmup phase may load code, caches, connection pools and calibration data. After that phase, repeated equivalent workload should not keep increasing the retained baseline.
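One practical acceptance check compares memory after two equal windows of length $W$ that follow the end of warmup at $t_0$:
$$\Delta M_{window} = M(t_0 + 2W) - M(t_0 + W)$$
If:
$$\Delta M_{window} > \epsilon$$
where $\epsilon$ is an agreed tolerance for measurement noise, the test should be treated as a leak suspect until heap or resource evidence proves the growth is bounded.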
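A minimal sketch of that two-window comparison follows. It assumes the retained baseline of each window is summarized by the median of its samples, and that a tolerance `tolerance_mb` has been agreed for measurement noise. Both choices, and all names and sample values, are illustrative assumptions.

```python
from statistics import median
from typing import Sequence


def retained_baseline_mb(samples_mb: Sequence[float]) -> float:
    """Median of a window's samples, used as a noise-resistant baseline."""
    return median(samples_mb)


def is_leak_suspect(window_1_mb: Sequence[float],
                    window_2_mb: Sequence[float],
                    tolerance_mb: float) -> bool:
    """True when the second window's baseline exceeds the first by more than the tolerance."""
    growth = retained_baseline_mb(window_2_mb) - retained_baseline_mb(window_1_mb)
    return growth > tolerance_mb


# Hypothetical samples from two equal post-warmup windows under the same workload.
plateau_1 = [510.0, 512.0, 511.0, 513.0]
plateau_2 = [511.0, 512.0, 513.0, 512.0]
growing_2 = [530.0, 532.0, 531.0, 533.0]

stable = is_leak_suspect(plateau_1, plateau_2, tolerance_mb=5.0)   # False
suspect = is_leak_suspect(plateau_1, growing_2, tolerance_mb=5.0)  # True
```

A suspect result only flags the run for investigation. It does not distinguish a leak from a cache that is still filling toward a bound.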
Failure Path
The first visible symptom may not be an out-of-memory crash. Memory growth can increase garbage collection pause, page fault latency, allocator time, container reclaim, swap activity, event loop lag, thread-pool saturation or watchdog resets before the process dies.
This is why leak review should include latency, error rate and restart behavior, not only heap size.
Resource Boundary
The report should identify which resource leaks. Heap growth, direct memory, GPU buffers, shared memory, file descriptors, sockets, timers, database cursors and message subscriptions have different evidence and different release paths.
Reference leaks in managed runtimes keep objects alive after they are no longer needed, because a lingering reference makes them ineligible for collection. Native leaks can bypass language-level heap metrics. Resource leaks can fail before memory limits are reached.
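One way to make non-heap resources visible is to count acquisitions and releases per resource kind, so that a soak test can assert on outstanding handles as well as memory. The `ResourceLedger` class below is a hypothetical sketch of that idea, not an existing library API.

```python
from collections import Counter
from typing import Dict


class ResourceLedger:
    """Tracks outstanding acquisitions per resource kind (hypothetical helper)."""

    def __init__(self) -> None:
        self._outstanding: Counter = Counter()

    def acquire(self, kind: str) -> None:
        self._outstanding[kind] += 1

    def release(self, kind: str) -> None:
        if self._outstanding[kind] <= 0:
            raise RuntimeError(f"release without matching acquire: {kind}")
        self._outstanding[kind] -= 1

    def outstanding(self) -> Dict[str, int]:
        """Kinds that still have unreleased instances."""
        return {kind: n for kind, n in self._outstanding.items() if n > 0}


ledger = ResourceLedger()
for _ in range(3):
    ledger.acquire("socket")
ledger.release("socket")
ledger.release("socket")   # one socket is never released
ledger.acquire("timer")
ledger.release("timer")

leaked = ledger.outstanding()   # {"socket": 1}
```

A ledger like this reports the kind of resource that leaks even when heap metrics look flat.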
Validation Evidence
Useful evidence includes resident set size, heap size, native memory, allocation profile, object counts, handle counts, cache size, queue depth, per-request allocation, long-duration soak trend, restart test, failure-path test, heap dump comparison, leak detector output and traces around retries, cancellations and reconnects.
The strongest test runs long enough to show slope, not just peak memory. It should prove that memory returns to a bounded baseline after load, cancellation, timeout and dependency-failure scenarios.
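For Python services, a heap comparison of this kind can be sketched with the standard-library `tracemalloc` module, which diffs two allocation snapshots by source line. The `handle_request` function and the `retained` list are hypothetical stand-ins for a deliberately leaking code path.

```python
import tracemalloc

retained = []  # stands in for a registry that is never cleared (deliberate leak)


def handle_request() -> None:
    """Hypothetical request handler that retains 1 KiB per call."""
    retained.append(bytearray(1024))


tracemalloc.start()
before = tracemalloc.take_snapshot()

for _ in range(1000):          # repeated equivalent workload
    handle_request()

after = tracemalloc.take_snapshot()
diffs = after.compare_to(before, "lineno")   # per-line allocation differences
tracemalloc.stop()

total_growth_bytes = sum(d.size_diff for d in diffs)
top_growth = diffs[:3]         # largest changes first; inspect these lines first
```

`tracemalloc` only sees allocations made through Python's allocators, so native leaks still need process-level metrics such as resident set size.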
Design Levers
Useful levers include explicit ownership, bounded caches, eviction policies, scoped lifetimes, closing handles on every path, cancellation cleanup, backpressure, retry limits, leak tests in CI, heap-dump diffing, resource counters and restart behavior that preserves safety instead of hiding the defect.
The correct fix is usually not “increase memory.” More memory only lengthens the time to failure unless the leak is bounded or removed.
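A bounded cache with an eviction policy is one of the levers listed above. The sketch below shows a small least-recently-used cache built on `collections.OrderedDict`; the class name `BoundedCache` and the entry limit are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Any, Hashable


class BoundedCache:
    """LRU cache that never holds more than max_entries items."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)   # evict least recently used

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedCache(max_entries=3)
for i in range(10):
    cache.put(i, i * i)

size_after_load = len(cache)   # stays at 3 no matter how many keys were inserted
```

The eviction boundary turns unbounded growth into a plateau, which is what the steady-state acceptance check expects to see.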
Relationship To Neighbor Terms
Garbage collection pause can grow when leaked live objects enlarge the heap. Page fault latency can appear when resident memory pressure rises. Queue backpressure and admission control reduce the load that amplifies leaks. Watchdog reset loop is a possible failure mode when a leaking process repeatedly restarts without clearing the underlying trigger.