Glossary term

Page Fault Latency

Engineering definition of page fault latency covering minor and major faults, working-set residency, fault rate, tail latency, deadline margin and validation evidence.

Branch: Computer Engineering
Glossary type: metric
Content: Glossary term
Updated: Jun 26, 2026
Revision: v1.0.0 · reviewed

Definition

Page fault latency is the delay added when a processor or operating system must resolve a memory access to a virtual page that is not immediately usable.

Page fault latency appears in operating systems, virtual-memory workloads, embedded Linux devices, databases, telemetry services and latency-sensitive applications when a memory access triggers page-table repair, zero-fill, copy-on-write, demand paging, swap-in, file mapping or memory-pressure handling. A useful review states minor versus major fault type, fault rate, working set, resident memory, storage path, tail latency, timeout impact and validation evidence.

The page may need a page-table update, zero-fill, copy-on-write handling, file mapping, swap-in or fault recovery.

Not every page fault is equally harmful. A minor fault may be resolved in memory. A major fault may require storage or remote backing. In latency-sensitive systems, even rare faults can dominate p99 or maximum response time.

Fault Timing

For one fault, define:

T_{pf}=t_{resume}-t_{fault}

where t_fault is when the faulting access traps and t_resume is when useful execution resumes.

Over an observation window:

T_{obs}

with:

n_{pf}

faults, the fault rate is:

f_{pf}=\frac{n_{pf}}{T_{obs}}

The report should separate minor faults, major faults and allocation-related faults because their causes and remedies differ.
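The fault-rate definition can be sketched as a small measurement helper. This is a minimal sketch, assuming a Unix-like system where Python's `resource.getrusage` reports minor and major fault counters (`ru_minflt`, `ru_majflt`) for the current process. The names `fault_rates` and `touch_fresh_memory` are hypothetical, and the counters do not isolate allocation-related faults, which need a separate trace.

```python
import resource
import time


def fault_rates(workload):
    """Run workload() and return (minor_per_s, major_per_s, t_obs_s).

    Counts are deltas of the process-wide counters, so faults from other
    threads in the same process are included.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    t0 = time.perf_counter()
    workload()
    t_obs = time.perf_counter() - t0
    after = resource.getrusage(resource.RUSAGE_SELF)

    if t_obs <= 0:
        raise ValueError("observation window must be positive")

    n_minor = after.ru_minflt - before.ru_minflt
    n_major = after.ru_majflt - before.ru_majflt
    # f_pf = n_pf / T_obs, reported separately per fault type.
    return n_minor / t_obs, n_major / t_obs, t_obs


def touch_fresh_memory(size=16 * 1024 * 1024, step=4096):
    """Hypothetical workload: allocate a buffer and write into it."""
    buf = bytearray(size)
    for i in range(0, size, step):
        buf[i] = 1


minor_rate, major_rate, t_obs = fault_rates(touch_fresh_memory)
print(f"minor: {minor_rate:.0f}/s  major: {major_rate:.0f}/s  over {t_obs:.3f} s")
```

The printed rates depend on the machine, allocator and memory state, so they are evidence for one run rather than a property of the code.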

Working-Set Boundary

Let the hot working set be:

W_{hot}

and the available resident memory budget be:

W_{res}

A simple residency screen is:

W_{hot}+M_{guard}\leq W_{res}

where M_guard covers stack, heap growth, mapped files, runtime metadata, kernel buffers and co-located processes. If the working set does not fit, page faults can become a regular part of the timing path rather than a rare startup effect.
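The residency screen is a single inequality, so it can be checked mechanically. The following is a minimal sketch; the function name `residency_margin` and the byte figures are hypothetical and serve only to illustrate the check.

```python
MiB = 1024 * 1024


def residency_margin(w_hot, m_guard, w_res):
    """Return W_res - (W_hot + M_guard).

    A negative result means the screen W_hot + M_guard <= W_res fails.
    All arguments must use the same unit.
    """
    return w_res - (w_hot + m_guard)


# Hypothetical budget: 300 MiB hot set, 128 MiB guard, 512 MiB resident limit.
margin = residency_margin(w_hot=300 * MiB, m_guard=128 * MiB, w_res=512 * MiB)
print(margin // MiB, "MiB headroom")  # 84 MiB headroom
print("fits" if margin >= 0 else "does not fit")
```

Returning the signed margin rather than a boolean keeps the headroom visible in a review.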

Minor And Major Faults

A minor fault can be resolved without reading from backing storage. It may still update page tables, allocate a zero-filled page or repair copy-on-write state. A major fault usually waits for a page to be read from storage or another backing source, so its tail can be much larger.

The classification matters for remediation. Minor faults may point to lazy allocation, copy-on-write behavior or missing warmup. Major faults may point to oversubscription, mapped-file access, swap, cold data or storage latency. A single combined fault count can hide the real cause.

Startup Versus Steady State

Some page faults are acceptable during controlled startup. They become dangerous when they appear during steady operation, failover, retry storms or mode changes. A test plan should therefore state which memory is preloaded, which paths are warmed, and which pages are allowed to fault after the system declares itself ready.

Fault Cost Screen

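If the mean fault latency over the observation window is:

\bar{T}_{pf}

then the fraction of wall-clock time spent resolving faults is:

U_{pf}=f_{pf}\bar{T}_{pf}

With f_pf in faults per second and the latency in seconds, U_pf is dimensionless. Substituting a selected percentile for the mean usually gives a pessimistic screen rather than an observed fraction. For major faults this is often wait time, not useful CPU time. It still consumes response margin and may keep locks, workers or event-loop callbacks unavailable.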
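The cost screen can be evaluated directly from a measured fault rate and latency. The sketch below uses a hypothetical helper, `delay_fraction`, and assumed input values (4 faults/s, 2 ms mean latency) that are for illustration only.

```python
def delay_fraction(fault_rate_per_s, fault_latency_s):
    """Return U_pf = f_pf * T_pf as a dimensionless fraction of time."""
    if fault_rate_per_s < 0 or fault_latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    return fault_rate_per_s * fault_latency_s


# Assumed values: 4 faults per second, 2 ms mean fault latency.
u_pf = delay_fraction(4.0, 0.002)
print(f"U_pf = {u_pf:.3%}")  # U_pf = 0.800%
```

A small fraction does not imply a safe tail: the same inputs say nothing about how long an individual request waits when it is the one that faults.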

Worked Latency Screen

Suppose a service records:

n_{pf}=240

faults over:

T_{obs}=60\ s

so:

f_{pf}=240/60=4\ faults/s

If p99 page fault latency is:

T_{pf,p99}=18\ ms

with base response 42 ms and queue delay 9 ms, the response time of a request that takes one p99 fault is:

R=42+9+18=69\ ms

For deadline:

D=80\ ms

the margin is:

M=80-69=11\ ms

If p99.9 page fault latency reaches 46 ms, the response becomes 42 + 9 + 46 = 97 ms and the deadline margin becomes 80 - 97 = -17 ms, which is a deadline miss.
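The arithmetic of this screen can be reproduced with a few lines of code. This is a sketch that uses the numbers from the worked example; the function name `deadline_margin` is hypothetical, and it assumes one fault of the given latency lands on the request path.

```python
def deadline_margin(base_ms, queue_ms, fault_ms, deadline_ms):
    """Return (response_ms, margin_ms) for one fault on the request path."""
    response = base_ms + queue_ms + fault_ms
    return response, deadline_ms - response


for label, fault_ms in (("p99", 18), ("p99.9", 46)):
    r, m = deadline_margin(base_ms=42, queue_ms=9, fault_ms=fault_ms, deadline_ms=80)
    status = "ok" if m >= 0 else "deadline miss"
    print(f"{label}: R={r} ms, M={m} ms ({status})")
# p99: R=69 ms, M=11 ms (ok)
# p99.9: R=97 ms, M=-17 ms (deadline miss)
```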

Common Causes

Common causes include cold startup, demand-loaded code, memory-mapped files, copy-on-write after fork, container memory limits, swapping, transparent huge page behavior, overcommit, fragmented memory, excessive working set, allocator behavior and background processes competing for resident memory.

Averages can hide page fault latency. A throughput test may pass while the specific request that touches a cold mapping exceeds its timeout.

Validation Evidence

Useful evidence includes minor and major fault counts, fault latency distribution, resident set size, working-set estimate, page size, memory limit, swap activity, mapped-file behavior, storage latency, cold-start trace, warmup policy, p95, p99 and maximum response, and traces linking page faults to user-visible or deadline-visible failures.

Validation should include cold start, warm operation, memory pressure, deployment restart, traffic burst and degraded storage conditions when those conditions are possible in production.
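The p95, p99 and maximum figures named in the evidence list can be derived from raw latency samples. The sketch below uses the nearest-rank percentile definition, which is one of several conventions in use; the helper name `percentile` and the sample values are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical fault latencies in milliseconds: mostly fast, a few slow.
latencies_ms = [0.2] * 95 + [3.0, 4.0, 9.0, 18.0, 46.0]
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
print(f"max = {max(latencies_ms)} ms")
```

Reporting the sample count alongside each percentile helps a reviewer judge whether a tail figure rests on enough observations.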

Design Levers

Useful levers include pre-faulting critical memory, reducing working set, avoiding unexpected file mappings, bounding heap growth, disabling swap for hard real-time paths, warming code and data, isolating latency-critical processes, reserving memory headroom and measuring page faults in release tests.

The goal is not zero faults everywhere. The goal is to keep page faults out of the critical timing path or prove that their bounded latency fits the requirement.
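Pre-faulting, the first lever listed above, can be illustrated by touching every page of a buffer before the system declares itself ready. This is a minimal sketch using an anonymous `mmap` region; the function name `prefault` is hypothetical. Touching pages does not pin them, so under memory pressure they can still be reclaimed unless the platform's memory-locking facility is also used.

```python
import mmap


def prefault(region):
    """Write one byte into every page of a writable mmap region.

    Each write forces the page to be backed by memory now, so the first
    access on the critical path does not pay for it later.
    Returns the number of pages touched.
    """
    touched = 0
    for offset in range(0, len(region), mmap.PAGESIZE):
        region[offset] = 0
        touched += 1
    return touched


# Hypothetical critical buffer of 256 pages, warmed during startup.
buffer_pages = 256
region = mmap.mmap(-1, buffer_pages * mmap.PAGESIZE)
print("pages touched:", prefault(region))
region.close()
```

Running the warm-up before readiness is signalled moves the fault cost into controlled startup, in line with the startup versus steady-state distinction.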

Relationship To Neighbor Terms

Latency and jitter describe the observed delay and variation. Page fault latency is one contributor. Garbage collection pause and allocation stalls can interact with memory pressure. Preemption latency can look worse when the resumed task immediately faults. Timeout budget, deadline miss and data age expose the engineering consequence.
