Exercise set
Operating System Memory, Deadlock, and Resource Safety Exercises
Solved OS memory and deadlock exercises for page faults, working sets, leaks, fragmentation, stack guards, GC pauses and release gates.
These exercises focus on operating-system memory behavior and resource safety. They cover page-fault latency, working-set RAM, memory leaks, fragmentation reserve, stack guard, deadlock, garbage-collection pause, reset-loop evidence and release gates.
Assume simplified screening models unless an exercise states otherwise. Production evidence should include memory profiles, page-fault histograms, heap and stack high-water marks, lock graphs, resource-order rules, watchdog logs and workload-specific acceptance criteria.
Release Evidence Notes
Memory and resource evidence should be collected under representative load and duration. A service can pass a short performance test while still failing after heap drift, fragmentation, page-cache eviction, stack growth, deadlock or resource starvation.
Engineering Boundary Notes
These calculations do not replace runtime profiling, static analysis, lock-order review, kernel tracing, soak testing, chaos testing or incident response design. They are release screens that identify where stronger evidence is required.
Scenario Map
| Scenario | Exercises | Primary check | Engineering decision |
|---|---|---|---|
| Memory pressure | 1, 2, 3, 4, 5, 15 | Page faults, working set, leaks, fragmentation, stack and swap | Decide whether the workload has safe memory margin. |
| Deadlock and blocking | 6, 7, 8, 9, 14 | Resource cycles, lock ordering, timeout and hold time | Decide whether locking design must change. |
| Runtime pauses and recovery | 10, 11, 12, 13, 16, 17 | GC pause, reset loop, file descriptors, cache growth, evidence completion and heap alerting | Decide whether resource safety is releasable. |
| Release gate | 18 | All-of resource gate | Decide whether deployment can proceed. |
Exercise 1: Page-Fault Service Latency
A workload sees 320 major page faults per second. Each major fault costs $1.8\ \text{ms}$. Estimate CPU-equivalent blocked time per second.
Solution
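Multiply the fault rate by the cost per fault:
$$320\ \text{faults/s} \times 1.8\ \text{ms/fault} = 576\ \text{ms/s}$$
So blocked time is:
$$0.576\ \text{s of blocking per second of wall time}$$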
Engineering Comment
Major faults can dominate tail latency even if CPU utilization looks acceptable. Release should investigate working set and storage behavior.
Plausibility Check
Hundreds of faults at nearly two milliseconds each sum to more than half a second per second.
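The same screening arithmetic can be sketched in a few lines of Python. The function name and parameters are illustrative assumptions, not part of the exercise. Note that the result is an aggregate across all faulting threads, so on a multi-threaded workload it can exceed 1.0.

```python
def blocked_seconds_per_second(major_faults_per_s: float, fault_cost_ms: float) -> float:
    """Return seconds of fault-induced blocking accumulated per second of wall time."""
    return major_faults_per_s * fault_cost_ms / 1000.0


# Values from Exercise 1.
blocked = blocked_seconds_per_second(320, 1.8)
print(f"{blocked:.3f} s blocked per second")  # 0.576 s blocked per second
```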
Exercise 2: Working-Set RAM Shortfall
A service working set is $3.8\ \text{GB}$. Container memory limit is $4.5\ \text{GB}$ and the OS/runtime reserve is $0.9\ \text{GB}$. Check memory margin.
Solution
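Available to the service:
$$4.5\ \text{GB} - 0.9\ \text{GB} = 3.6\ \text{GB}$$
Shortfall:
$$3.8\ \text{GB} - 3.6\ \text{GB} = 0.2\ \text{GB}$$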
Engineering Comment
The workload exceeds guarded memory capacity. Paging, OOM kill or aggressive garbage collection is likely unless memory is reduced or limit increased.
Plausibility Check
The working set is slightly larger than available guarded memory, so the shortfall is small but real.
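A minimal sketch of this margin check, working in whole megabytes to avoid floating-point rounding. The function name, the sign convention (negative means shortfall) and the decimal conversion of 1 GB to 1000 MB are assumptions made for illustration.

```python
def memory_margin_mb(limit_mb: int, reserve_mb: int, working_set_mb: int) -> int:
    """Positive result is headroom; negative result is a shortfall."""
    available_mb = limit_mb - reserve_mb
    return available_mb - working_set_mb


# Values from Exercise 2, converted with 1 GB = 1000 MB (assumed).
margin = memory_margin_mb(limit_mb=4500, reserve_mb=900, working_set_mb=3800)
print("PASS" if margin >= 0 else f"FAIL: short by {-margin} MB")  # FAIL: short by 200 MB
```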
Exercise 3: Memory-Leak Endurance
A process has $900\ \text{MB}$ free margin after startup. Restart threshold requires $250\ \text{MB}$ remaining. Leak rate is $12\ \text{MB/h}$. Estimate time to threshold.
Solution
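Leak budget:
$$900\ \text{MB} - 250\ \text{MB} = 650\ \text{MB}$$
Time:
$$\frac{650\ \text{MB}}{12\ \text{MB/h}} \approx 54.2\ \text{h} \approx 2.3\ \text{days}$$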
Engineering Comment
Two days of endurance is weak for most services. The leak should be fixed or bounded by controlled restart and alerting.
Plausibility Check
At about ten megabytes per hour, several hundred megabytes lasts a few dozen hours.
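The endurance estimate can be sketched as a small helper. It assumes a constant leak rate, which is the simplified screening model of this exercise rather than a claim about real heap behavior; the function name is illustrative.

```python
def hours_to_restart_threshold(free_mb: float, threshold_mb: float, leak_mb_per_h: float) -> float:
    """Hours until free memory drops to the restart threshold at a constant leak rate."""
    if leak_mb_per_h <= 0:
        raise ValueError("leak rate must be positive")
    budget_mb = max(free_mb - threshold_mb, 0.0)  # memory that may leak before restart
    return budget_mb / leak_mb_per_h


# Values from Exercise 3.
hours = hours_to_restart_threshold(900, 250, 12)
print(f"{hours:.1f} h, about {hours / 24:.1f} days")  # 54.2 h, about 2.3 days
```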
Exercise 4: Fragmentation Reserve
A heap has $1.2\ \text{GB}$ free total, but the largest contiguous allocatable block is $180\ \text{MB}$. A workload needs a $220\ \text{MB}$ block. Decide status.
Solution
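The allocation requires:
$$220\ \text{MB contiguous}$$
Largest block is:
$$180\ \text{MB}$$
Since:
$$180\ \text{MB} < 220\ \text{MB}$$
the allocation can fail despite $1.2\ \text{GB}$ of total free memory.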
Engineering Comment
Fragmentation is a resource-safety issue. Pooling, compaction or allocation-size redesign may be required.
Plausibility Check
Total free memory is misleading because the required contiguous block is larger than any available block.
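The contrast between total free memory and the largest free block can be sketched as follows. The free-list contents are invented for illustration; they are chosen only so that the total is 1200 MB and the largest block is 180 MB, matching the exercise.

```python
from typing import Sequence


def can_allocate(free_blocks_mb: Sequence[int], request_mb: int) -> bool:
    """A contiguous request fits only if some single free block is large enough."""
    return max(free_blocks_mb, default=0) >= request_mb


# Hypothetical free list: totals 1200 MB, largest block 180 MB.
free_blocks_mb = [180, 170, 160, 150, 140, 130, 120, 150]

print(sum(free_blocks_mb))                # 1200
print(can_allocate(free_blocks_mb, 220))  # False
```

A check based on `sum(free_blocks_mb)` alone would report ample capacity here, which is the mistake the exercise warns about.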
Exercise 5: Stack Guard Margin
A thread stack is $1.0\ \text{MB}$. High-water use is $720\ \text{kB}$ and guard requirement is $160\ \text{kB}$. Compute remaining margin.
Solution
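Express the stack size in kB, taking $1\ \text{MB} = 1000\ \text{kB}$:
$$1.0\ \text{MB} = 1000\ \text{kB}$$
Guarded use:
$$720\ \text{kB} + 160\ \text{kB} = 880\ \text{kB}$$
Remaining margin:
$$1000\ \text{kB} - 880\ \text{kB} = 120\ \text{kB}$$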
Engineering Comment
The stack passes, but recursion, debug options and signal handlers can change high-water usage.
Plausibility Check
The guarded use is below one megabyte but not by much.
Exercise 6: Deadlock Resource Cycle
Thread A holds lock X and waits for lock Y. Thread B holds lock Y and waits for lock X. Identify whether this is a deadlock cycle.
Solution
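The wait graph has two edges: A waits for Y, which B holds, and B waits for X, which A holds:
$$A \rightarrow B \rightarrow A$$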
This is a cycle, so deadlock is possible or present.
Engineering Comment
Timeouts can recover symptoms but do not remove the circular wait condition. A lock-order rule is stronger.
Plausibility Check
Each thread waits for the resource held by the other, which is the classic two-lock cycle.
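Cycle detection on a wait-for graph can be sketched with a depth-first search. This is a generic illustration, not a description of any particular kernel or runtime detector; the names `find_cycle`, `holders` and `waits` are assumptions introduced here.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of threads, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in wait_for:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Exercise 6: derive thread-to-thread edges from lock ownership and waits.
holders = {"X": "A", "Y": "B"}  # lock -> thread holding it
waits = {"A": "Y", "B": "X"}    # thread -> lock it is waiting for
wait_for = {thread: [holders[lock]] for thread, lock in waits.items()}
print(find_cycle(wait_for))  # ['A', 'B', 'A']
```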
Exercise 7: Lock-Order Violation Count
A code audit checks 240 lock acquisitions and finds 6 order violations. The release limit is below $1\%$. Compute violation rate.
Solution
Violation rate:
$$\frac{6}{240} = 0.025 = 2.5\%$$
Since:
$$2.5\% > 1\%$$
release fails.
Engineering Comment
Lock-order violations should be corrected, not averaged away. Rare paths often cause the most expensive deadlocks.
Plausibility Check
Six out of two hundred forty is one in forty, or two and a half percent.
Exercise 8: Deadlock Timeout Availability Loss
A deadlock recovery timeout restarts a worker after $30\ \text{s}$. The incident happens 4 times per day. Estimate downtime per day.
Solution
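Downtime per day:
$$4 \times 30\ \text{s} = 120\ \text{s}$$
Convert to minutes:
$$\frac{120\ \text{s}}{60\ \text{s/min}} = 2\ \text{min}$$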
Engineering Comment
Two minutes may look small, but restarts can lose work, corrupt sessions or hide a serious locking defect.
Plausibility Check
Four half-minute events total two minutes.
Exercise 9: Lock Hold-Time Gate
A shared lock should be held less than $8\ \text{ms}$ at the 99th percentile. Trace shows $P_{99} = 11\ \text{ms}$. Decide status.
Solution
Since:
$$11\ \text{ms} > 8\ \text{ms}$$
the gate fails by:
$$11\ \text{ms} - 8\ \text{ms} = 3\ \text{ms}$$
Engineering Comment
Long lock hold time can create convoying and priority inversion. The critical section should be reduced or split.
Plausibility Check
The measured percentile exceeds the limit directly.
Exercise 10: Garbage-Collection Pause Gate
A runtime has maximum observed GC pause $72\ \text{ms}$. The service-level budget allows $60\ \text{ms}$ pause with $5\ \text{ms}$ guard. Decide status.
Solution
Guarded limit:
$$60\ \text{ms} - 5\ \text{ms} = 55\ \text{ms}$$
Since:
$$72\ \text{ms} > 55\ \text{ms}$$
the gate fails.
Engineering Comment
GC tuning should be validated with allocation rate, heap size and production-like object lifetimes.
Plausibility Check
The observed pause exceeds even the raw limit, so guarded failure is expected.
Exercise 11: Watchdog Reset-Loop Count
A service watchdog restarts a process after missed heartbeats. Policy enters safe mode after 3 restarts in $15\ \text{min}$. Restarts occur at 0, 240 and $720\ \text{s}$. Decide status.
Solution
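Span from first to third restart:
$$720\ \text{s} - 0\ \text{s} = 720\ \text{s} = 12\ \text{min}$$
Since:
$$12\ \text{min} < 15\ \text{min}$$
safe mode should be entered.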
Engineering Comment
Reset loops are resource-safety evidence, not only availability events. The system should preserve logs and stop repeated damage.
Plausibility Check
Three restarts within twelve minutes fit inside a fifteen-minute window.
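The reset-loop policy can be sketched as a sliding-window check over restart timestamps. The function name is illustrative, and treating a span exactly equal to the window as a trigger is an assumption; the exercise does not specify the boundary case.

```python
from typing import Iterable


def should_enter_safe_mode(restart_times_s: Iterable[float],
                           max_restarts: int = 3,
                           window_s: float = 15 * 60) -> bool:
    """True if any max_restarts consecutive restarts span at most window_s seconds."""
    times = sorted(restart_times_s)
    for i in range(len(times) - max_restarts + 1):
        # Span between restart i and the restart max_restarts - 1 positions later.
        if times[i + max_restarts - 1] - times[i] <= window_s:
            return True
    return False


# Exercise 11: restarts at 0 s, 240 s and 720 s.
print(should_enter_safe_mode([0, 240, 720]))  # True
```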
Exercise 12: File Descriptor Exhaustion
A process limit is 4096 file descriptors. Baseline use is 1300, peak load adds 2100 and guard is 500. Check margin.
Solution
Guarded use:
$$1300 + 2100 + 500 = 3900$$
Margin:
$$4096 - 3900 = 196\ \text{descriptors}$$
Engineering Comment
The margin is small. Leaks or retry storms can exhaust descriptors quickly.
Plausibility Check
The guarded total is just below the configured limit.
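A minimal sketch of the descriptor margin check, with an illustrative function name. The limit is passed in explicitly so the example stays portable.

```python
def descriptor_margin(limit: int, baseline: int, peak_extra: int, guard: int) -> int:
    """Descriptors left after baseline use, peak-load growth and the guard reserve."""
    guarded_use = baseline + peak_extra + guard
    return limit - guarded_use


# Values from Exercise 12.
margin = descriptor_margin(limit=4096, baseline=1300, peak_extra=2100, guard=500)
print(margin)  # 196
```

On Unix-like systems the soft limit for the current process can be read with `resource.getrlimit(resource.RLIMIT_NOFILE)` from the Python standard library, which returns a `(soft, hard)` tuple; whether that matches the deployed limit depends on how the service is launched.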
Exercise 13: Page Cache Growth
A workload writes files and page cache grows at $180\ \text{MB/min}$. Available memory before pressure threshold is $1.4\ \text{GB}$. Estimate time to threshold.
Solution
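Convert:
$$1.4\ \text{GB} = 1400\ \text{MB}$$
Time:
$$\frac{1400\ \text{MB}}{180\ \text{MB/min}} \approx 7.8\ \text{min}$$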
Engineering Comment
Page cache can cause memory pressure even when application heap is stable. Release should monitor cgroup and OS-level memory.
Plausibility Check
At nearly two hundred megabytes per minute, a little over one gigabyte lasts under ten minutes.
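The time-to-threshold estimate can be sketched under the assumption of constant cache growth and no reclaim, which is a deliberate simplification; in practice the kernel writes back and evicts pages. The function name and the decimal conversion factor are assumptions.

```python
def minutes_to_pressure(available_gb: float, growth_mb_per_min: float,
                        mb_per_gb: int = 1000) -> float:
    """Minutes until page-cache growth consumes the available memory."""
    if growth_mb_per_min <= 0:
        raise ValueError("growth rate must be positive")
    return available_gb * mb_per_gb / growth_mb_per_min


# Values from Exercise 13.
print(f"{minutes_to_pressure(1.4, 180):.1f} min")  # 7.8 min
```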
Exercise 14: Resource-Order Remediation
A system defines lock order A before B before C. One code path acquires C then A. Determine whether it violates the rule.
Solution
Required order:
$$A \rightarrow B \rightarrow C$$
Observed order:
$$C \rightarrow A$$
This violates the global order.
Engineering Comment
Global order is a deadlock prevention control only if every code path follows it.
Plausibility Check
Acquiring C before A reverses the stated ordering.
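A lock-order audit of a single acquisition path can be sketched by assigning each lock a rank and flagging any acquisition ranked below a lock acquired earlier on the same path. The sketch assumes nested acquisition, meaning earlier locks are still held when later ones are taken; `LOCK_RANK` and `order_violations` are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple

# Global order from Exercise 14: A before B before C.
LOCK_RANK: Dict[str, int] = {"A": 0, "B": 1, "C": 2}


def order_violations(path: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (held, acquired) pairs where a lower-ranked lock follows a higher-ranked one."""
    violations: List[Tuple[str, str]] = []
    held: List[str] = []
    for lock in path:
        for earlier in held:
            if LOCK_RANK[lock] < LOCK_RANK[earlier]:
                violations.append((earlier, lock))
        held.append(lock)
    return violations


print(order_violations(["C", "A"]))       # [('C', 'A')]
print(order_violations(["A", "B", "C"]))  # []
```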
Exercise 15: Swap-In Latency Budget
A request path can tolerate $25\ \text{ms}$ extra latency. It may incur 3 swap-ins at $9\ \text{ms}$ each. Check status.
Solution
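Worst-case swap-in latency:
$$3 \times 9\ \text{ms} = 27\ \text{ms}$$
Since:
$$27\ \text{ms} > 25\ \text{ms}$$
the path fails by $2\ \text{ms}$.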
Engineering Comment
Swap-in latency should not be part of a normal release path for latency-sensitive services.
Plausibility Check
Three events just under ten milliseconds each exceed twenty-five milliseconds.
Exercise 16: Memory Safety Evidence Completion
A release checklist has 12 memory-safety evidence items. Ten are accepted, one is open and one is conditionally accepted. The gate requires all accepted. Decide status.
Solution
Accepted percentage:
$$\frac{10}{12} \approx 83.3\%$$
Because the gate requires all accepted, release is blocked.
Engineering Comment
Conditional evidence is not accepted evidence for resource safety. The open and conditional items should be closed or formally dispositioned.
Plausibility Check
Two of twelve items are not accepted, so a full acceptance gate cannot pass.
Exercise 17: Heap Alert Threshold
A heap alert should fire at $80\%$ of a $2.0\ \text{GB}$ heap limit. Current guarded heap use is $1.74\ \text{GB}$. Check status.
Solution
Alert threshold:
$$0.80 \times 2.0\ \text{GB} = 1.6\ \text{GB}$$
Since:
$$1.74\ \text{GB} > 1.6\ \text{GB}$$
the alert should fire.
Engineering Comment
Alerting before the hard limit allows controlled shedding, restart or investigation rather than abrupt failure.
Plausibility Check
The current guarded use is about 87% of the heap limit, which is above the four-fifths threshold.
Exercise 18: OS Resource Safety Release Gate
A resource-safety release requires memory margin pass, page-fault pass, deadlock-order pass, stack pass and GC-pause pass. Results are pass, pass, pass, pass and fail. Decide status.
Solution
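The all-of gate fails because GC pause failed:
$$\text{pass} \land \text{pass} \land \text{pass} \land \text{pass} \land \text{fail} = \text{fail}$$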
Engineering Comment
Resource safety is only as strong as the weakest required condition. A pause failure can invalidate otherwise good memory evidence.
Plausibility Check
One failed condition blocks an all-of release gate.
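An all-of gate can be sketched as a conjunction over named conditions that also reports which conditions blocked the release. The dictionary keys and the `evaluate_gate` name are illustrative assumptions, not an established tool or schema.

```python
from typing import Dict, List, Tuple


def evaluate_gate(results: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (gate_passed, failed_conditions) for an all-of release gate."""
    failed = [name for name, passed in results.items() if not passed]
    return (len(failed) == 0, failed)


# Results from Exercise 18.
results = {
    "memory margin": True,
    "page faults": True,
    "deadlock order": True,
    "stack": True,
    "GC pause": False,
}
passed, failed = evaluate_gate(results)
print(passed, failed)  # False ['GC pause']
```

Reporting the failed names, rather than only a boolean, keeps the blocking evidence item visible in the release record.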
Common Release Mistakes
- Treating total free memory as enough when contiguous allocation fails.
- Measuring heap growth for minutes and extrapolating to months without soak evidence.
- Relying on deadlock timeouts instead of removing circular wait.
- Ignoring page faults because CPU utilization is low.
- Counting conditional checklist items as accepted.
- Tuning garbage collection on synthetic object lifetimes.
Validation Package Checklist
- Memory profile, working-set estimate and page-fault histogram under representative load.
- Leak, fragmentation, stack and descriptor high-water evidence.
- Lock graph, lock-order rule and violation audit.
- Deadlock timeout behavior plus prevention or remediation control.
- GC pause distribution with production-like heap and allocation rate.
- Watchdog/reset-loop evidence and retained diagnostic logs.
- Resource-safety release gate with every required condition accepted.