Glossary term

Memory Leak

Engineering definition of a memory leak covering leak rate, resident memory growth, time to exhaustion, resource leaks, reliability impact and validation evidence.

Definition

phenomenon

A memory leak is uncontrolled memory or resource growth caused by allocated objects, buffers, handles or references not being released when they are no longer needed.

Memory leaks appear in services, embedded firmware, device gateways, data pipelines, control software and managed runtimes when heap objects, buffers, subscriptions, file handles, sockets, timers, caches or native resources keep accumulating. A useful review states leak boundary, resident memory trend, allocation source, release path, leak rate, time to exhaustion, failure mode, restart behavior and validation evidence.

The leaking resource may be heap memory, native memory, a file descriptor, socket, timer, subscription, buffer pool entry or cached object without an eviction boundary.

Leaks matter because they often fail late. A short functional test passes, but a long soak, traffic burst or retry storm gradually consumes memory until latency rises, garbage collection increases, page faults appear, queues grow, or the process resets.

Leak Rate

Let memory growth over a steady observation interval be:

\Delta M

over elapsed time:

\Delta t

The observed leak rate is:

\displaystyle r_{leak}=\frac{\Delta M}{\Delta t}

The interval should exclude intentional warmup unless warmup never reaches a plateau. A cache that grows to a bounded size is not the same as a leak, but an unbounded cache is still a reliability risk.
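As a minimal sketch of the rate calculation, the Python below computes the endpoint form of the leak rate from the formula, plus a least-squares slope that is less sensitive to a single noisy sample. The least-squares variant is a swapped-in technique, not part of the definition above. The function names, the `(elapsed_hours, resident_mb)` sample format and the `warmup_hours` cutoff are assumptions made for illustration.

```python
def _steady_samples(samples, warmup_hours):
    """Keep only samples taken at or after the assumed warmup cutoff."""
    steady = [(t, m) for t, m in samples if t >= warmup_hours]
    if len(steady) < 2:
        raise ValueError("need at least two post-warmup samples")
    return steady


def leak_rate_endpoint(samples, warmup_hours=0.0):
    """Delta M / Delta t between the first and last steady samples (MB/h)."""
    steady = _steady_samples(samples, warmup_hours)
    (t0, m0), (t1, m1) = steady[0], steady[-1]
    if t1 == t0:
        raise ValueError("samples must span a nonzero time interval")
    return (m1 - m0) / (t1 - t0)


def leak_rate_least_squares(samples, warmup_hours=0.0):
    """Slope of a straight-line fit through the steady samples (MB/h)."""
    steady = _steady_samples(samples, warmup_hours)
    n = len(steady)
    mean_t = sum(t for t, _ in steady) / n
    mean_m = sum(m for _, m in steady) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in steady)
    if var_t == 0:
        raise ValueError("samples must span a nonzero time interval")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in steady)
    return cov / var_t


# Illustrative data: two hours of warmup, then steady growth.
samples = [(0, 500), (1, 600), (2, 700), (3, 708), (4, 716), (5, 724)]
endpoint_rate = leak_rate_endpoint(samples, warmup_hours=2.0)
fitted_rate = leak_rate_least_squares(samples, warmup_hours=2.0)
```

With the warmup samples excluded, both estimates describe only the steady interval. Including the first two hours would overstate the rate, because the warmup growth is intentional.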

Time To Exhaustion

If the memory limit is:

M_{limit}

current memory is:

M_{now}

and required guard band is:

M_{guard}

then a simple exhaustion screen is:

\displaystyle T_{exhaust}=\frac{M_{limit}-M_{guard}-M_{now}}{r_{leak}}

This is a planning screen, not a guarantee, and it applies only when the measured leak rate is positive. Leak rate can accelerate under error paths, retries, large payloads, reconnection loops or partial dependency failure.

Worked Soak Screen

Suppose a service has memory limit:

M_{limit}=1024\ MB

current resident memory:

M_{now}=720\ MB

guard band:

M_{guard}=64\ MB

and measured leak rate:

r_{leak}=8\ MB/h

The time to exhaustion is:

\displaystyle T_{exhaust}=\frac{1024-64-720}{8}=30\ h

If retry traffic raises the leak rate to 18 MB/h, the time to exhaustion falls to:

\displaystyle T_{exhaust,new}=\frac{1024-64-720}{18}=\frac{240}{18}\approx 13.33\ h

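A daily restart would hide the first case, because the process restarts at 24 h, before the 30 h exhaustion point. It would not protect the second case, where memory runs out after about 13 h, well inside the restart interval.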
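The exhaustion screen and the worked numbers can be reproduced with a short Python sketch. The function name and its handling of edge cases are assumptions: a non-positive leak rate returns infinity, and negative headroom is clamped to zero.

```python
import math


def time_to_exhaustion_hours(limit_mb, now_mb, guard_mb, leak_rate_mb_per_h):
    """Planning screen: (M_limit - M_guard - M_now) / r_leak, in hours."""
    if leak_rate_mb_per_h <= 0:
        # Assumption: no measured growth means the screen predicts no exhaustion.
        return math.inf
    headroom_mb = limit_mb - guard_mb - now_mb
    # Assumption: if the guard band is already breached, report zero hours left.
    return max(headroom_mb, 0.0) / leak_rate_mb_per_h


baseline_hours = time_to_exhaustion_hours(1024, 720, 64, 8)   # 240 / 8
retry_hours = time_to_exhaustion_hours(1024, 720, 64, 18)     # 240 / 18

restart_interval_hours = 24
baseline_hidden_by_restart = baseline_hours > restart_interval_hours
retry_hidden_by_restart = retry_hours > restart_interval_hours
```

Comparing the result against the restart interval shows whether a scheduled restart is masking the defect or would fail to contain it.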

Steady-State Acceptance

A release test should define when memory is allowed to grow and when it must plateau. A warmup phase may load code, caches, connection pools and calibration data. After that phase, repeated equivalent workload should not keep increasing the retained baseline.

One practical acceptance check compares memory after two equal windows:

\Delta M_{steady}=M_{end,2}-M_{end,1}

If:

\Delta M_{steady}>M_{allow}

the test should be treated as a leak suspect until heap or resource evidence proves the growth is bounded.
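The two-window comparison is small enough to express as a test helper. This is a sketch only: the function name, the megabyte units and the example threshold are assumptions, and a real allowance would come from the measured noise of the system under test.

```python
def steady_state_check(m_end_1_mb, m_end_2_mb, m_allow_mb):
    """Compare retained memory after two equal workload windows.

    Returns (delta_mb, is_leak_suspect), where delta_mb is
    M_end,2 - M_end,1 and the suspect flag is delta_mb > M_allow.
    """
    delta_mb = m_end_2_mb - m_end_1_mb
    return delta_mb, delta_mb > m_allow_mb


# Hypothetical readings taken at the end of two equal post-warmup windows.
growing = steady_state_check(m_end_1_mb=700, m_end_2_mb=716, m_allow_mb=4)
plateau = steady_state_check(m_end_1_mb=700, m_end_2_mb=702, m_allow_mb=4)
```

A suspect flag is a prompt to collect heap or resource evidence, not a verdict that a leak exists.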

Failure Path

The first visible symptom may not be an out-of-memory crash. Before the process dies, memory growth can lengthen garbage collection pauses, raise page fault latency and allocator time, trigger container reclaim and swap activity, add event loop lag, saturate thread pools or cause watchdog resets.

This is why leak review should include latency, error rate and restart behavior, not only heap size.

Resource Boundary

The report should identify which resource leaks. Heap growth, direct memory, GPU buffers, shared memory, file descriptors, sockets, timers, database cursors and message subscriptions have different evidence and different release paths.

Reference leaks in managed runtimes keep objects reachable, so the garbage collector cannot reclaim them even though the program no longer needs them. Native leaks can bypass language-level heap metrics. Resource leaks can fail before memory limits are reached.

Validation Evidence

Useful evidence includes resident set size, heap size, native memory, allocation profile, object counts, handle counts, cache size, queue depth, per-request allocation, long-duration soak trend, restart test, failure-path test, heap dump comparison, leak detector output and traces around retries, cancellations and reconnects.

The strongest test runs long enough to show slope, not just peak memory. It should prove that memory returns to a bounded baseline after load, cancellation, timeout and dependency-failure scenarios.
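For Python services, one way to gather allocation evidence is to compare two snapshots from the standard library `tracemalloc` module. The sketch below is illustrative: `handle_request_leaky` and `retained_growth_bytes` are hypothetical names, and the measurement covers only Python-level allocations made while tracing is active, not native memory.

```python
import tracemalloc

_retained = []  # Deliberately leaky: nothing ever removes entries.


def handle_request_leaky(i):
    """Hypothetical handler that keeps a buffer alive after each request."""
    _retained.append(bytearray(10_000))


def handle_request_clean(i):
    """Hypothetical handler whose buffer is released when it returns."""
    bytearray(10_000)


def retained_growth_bytes(workload, iterations):
    """Net traced-allocation growth across one workload window."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(iterations):
            workload(i)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to returns per-line differences; summing gives total growth.
    return sum(stat.size_diff for stat in after.compare_to(before, "lineno"))


leaky_bytes = retained_growth_bytes(handle_request_leaky, 100)
clean_bytes = retained_growth_bytes(handle_request_clean, 100)
```

The per-line entries returned by `compare_to` also point at the allocation source, which is the part of the evidence that a resident-memory trend alone cannot supply.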

Design Levers

Useful levers include explicit ownership, bounded caches, eviction policies, scoped lifetimes, closing handles on every path, cancellation cleanup, backpressure, retry limits, leak tests in CI, heap-dump diffing, resource counters and restart behavior that preserves safety instead of hiding the defect.
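Two of these levers, a bounded cache and an eviction policy, can be sketched together in Python. This is one possible design, assuming least-recently-used eviction and an entry-count limit; the `BoundedCache` class is hypothetical, and a byte-size limit or time-based expiry may suit a real system better.

```python
from collections import OrderedDict


class BoundedCache:
    """Cache with a hard entry limit and least-recently-used eviction."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # Mark as most recently used.
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict oldest entries so the cache cannot grow without bound.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_entries=3)
for request_id in range(10):
    cache.put(request_id, f"payload-{request_id}")
```

However many distinct keys the workload produces, the retained entry count stays at the configured limit, which turns unbounded growth into a plateau that a soak test can verify.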

The correct fix is usually not “increase memory.” More memory only lengthens the time to failure unless the leak is bounded or removed.

Relationship To Neighbor Terms

Garbage collection pause can grow when leaked live objects enlarge the heap. Page fault latency can appear when resident memory pressure rises. Queue backpressure and admission control reduce the load that amplifies leaks. A watchdog reset loop is a possible failure mode when a leaking process repeatedly restarts without clearing the underlying trigger.
