Glossary term

Replicated Log

Engineering definition of a replicated log covering append order, commit index, follower lag, log compaction, snapshot recovery and validation.

Branch: Computer Engineering
Glossary type: concept
Updated: Jun 26, 2026
Revision: v1.0.0 · reviewed

Definition

A replicated log is an ordered sequence of durable entries copied across nodes so a distributed system can apply the same commands or state transitions in the same order.

Replicated logs appear in consensus systems, metadata stores, distributed databases, lock services, controllers, event-sourced services and recovery mechanisms. A useful design states the entry format, append rule, commit rule, apply rule, durability boundary, follower lag metric, snapshot rule, compaction policy, recovery behavior and validation evidence.

The log is not just a file. It is a contract about ordering, durability, commitment, replay and recovery. A system that loses, reorders or applies log entries inconsistently can violate linearizability, duplicate commands or recover into a state that never existed.

Entry Model

A log entry at index:

k

can be represented as:

L_k=(term_k,cmd_k,meta_k)

where term_k (or epoch) identifies the authority that wrote the entry, cmd_k is the command or state transition, and meta_k may include a checksum, client request id, timestamp, dependency metadata or schema version.

The index gives a stable order:

L_1,L_2,\ldots,L_k

The meaning of each entry depends on applying prior committed entries first.
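As an illustration of the entry tuple above, the following is a minimal sketch in Python. The names `LogEntry`, `make_entry` and `checksum_ok`, and the choice of CRC32 as the checksum, are assumptions made for this example and are not prescribed by the definition.

```python
from dataclasses import dataclass, field
import zlib


@dataclass(frozen=True)
class LogEntry:
    index: int   # k, the position in the log (1-based, as in L_1..L_k)
    term: int    # term_k or epoch of the authority that wrote the entry
    cmd: bytes   # cmd_k, an opaque command or state transition
    meta: dict = field(default_factory=dict)  # meta_k


def make_entry(index, term, cmd, request_id=None):
    """Build an entry whose metadata carries a CRC32 of the command (assumed checksum)."""
    meta = {"crc32": zlib.crc32(cmd)}
    if request_id is not None:
        meta["request_id"] = request_id
    return LogEntry(index, term, cmd, meta)


def checksum_ok(entry):
    """Return True when the stored checksum matches the command bytes."""
    return zlib.crc32(entry.cmd) == entry.meta.get("crc32")
```

Storing the checksum and request id in the entry itself means they travel with the entry through replication, replay and snapshotting.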

Append and Commit

Appending an entry is not the same as committing it. A leader or writer may append locally before the entry is safely replicated.

Let:

C

be the commit index. Entries with:

k\leq C

are safe to apply according to the system’s commit rule. Entries with:

k>C

may still be speculative, uncommitted or subject to truncation during recovery.
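The commit rule itself is system-specific. As one possible sketch, assume a majority-quorum rule in which an index is committed once a majority of replicas have durably stored it. The function names and the `match_index` mapping are hypothetical, and real protocols add further conditions (Raft, for example, also restricts which term's entries a leader may commit directly) that are omitted here.

```python
def majority_commit_index(match_index):
    """Highest index stored on a majority of replicas.

    match_index maps replica name -> highest durably stored index.
    Assumption: a simple majority-quorum commit rule.
    """
    ranked = sorted(match_index.values(), reverse=True)
    quorum = len(ranked) // 2 + 1
    # The quorum-th largest value is held by at least `quorum` replicas.
    return ranked[quorum - 1]


def is_safe_to_apply(k, commit_index):
    """An entry is safe to apply only when k <= C."""
    return k <= commit_index
```

With three replicas at indices 7, 5 and 3, this sketch yields C = 5: entries 6 and 7 exist on the leader but remain speculative.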

Apply Order

Each replica should apply committed entries in index order. If:

A_i

is the highest applied index on replica:

i

then normal application advances:

A_i\leftarrow A_i+1

only when entry:

L_{A_i+1}

is available and committed. Skipping an entry can create a state that no valid log prefix represents.
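The apply rule can be sketched as a loop that advances one index at a time and stops at the first gap. The function name `apply_committed` and the dictionary-based log are assumptions for illustration only.

```python
def apply_committed(log, applied_index, commit_index, apply_fn):
    """Apply committed entries in index order and return the new applied index.

    log maps index -> command (hypothetical in-memory representation).
    """
    while applied_index < commit_index:
        nxt = applied_index + 1
        if nxt not in log:
            break  # entry not yet available: wait, never skip ahead
        apply_fn(log[nxt])
        applied_index = nxt
    return applied_index
```

If entry 3 is missing while entry 4 is present, the loop stops with the applied index at 2 and resumes once entry 3 arrives.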

Follower Lag

Follower lag can be measured in entries:

N_{lag}=C-F_i

where:

F_i

is the highest replicated index on follower:

i

If entries arrive at rate:

\lambda_e

and a follower is behind by:

T_{lag}

seconds, then:

N_{lag}\approx\lambda_e T_{lag}

For:

\lambda_e=2000\ \text{entries/s},\quad T_{lag}=1.5\ \text{s}

the follower is approximately:

N_{lag}=2000\cdot1.5=3000\ \text{entries}

behind the commit path.
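The two lag estimates can be checked with a few lines of Python. The function names are hypothetical, and clamping the entry lag at zero is an assumption made here for followers that already hold entries beyond the commit index.

```python
def lag_entries(commit_index, follower_index):
    """N_lag = C - F_i, clamped at zero (clamping is an assumption)."""
    return max(0, commit_index - follower_index)


def estimated_lag_entries(rate_per_s, lag_seconds):
    """N_lag ~= lambda_e * T_lag."""
    return rate_per_s * lag_seconds


print(estimated_lag_entries(2000, 1.5))  # 3000.0
print(lag_entries(10500, 7500))          # 3000
```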

Storage Growth and Compaction

If each entry has average size:

B_e

and append rate is:

\lambda_e

the log grows at a rate of:

G_s=B_e\lambda_e

If storage budget is:

C_s

current log storage is:

S_0

and growth rate is:

G_s

time to budget exhaustion is:

\displaystyle T_s=\frac{C_s-S_0}{G_s}

For:

C_s=12000000000\ \text{bytes},\quad S_0=3000000000\ \text{bytes},\quad G_s=900000\ \text{bytes/s}

the time is:

\displaystyle T_s=\frac{12000000000-3000000000}{900000}=10000\ \text{s}\approx 2.8\ \text{h}

Snapshots and compaction are therefore correctness and availability mechanisms, not only disk cleanup.
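The storage arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and the 450-byte average entry size is an assumed value chosen only so that, at 2000 entries/s, the growth rate matches the 900000 bytes/s used in the example.

```python
import math


def growth_rate(avg_entry_bytes, entries_per_s):
    """G_s = B_e * lambda_e, in bytes per second."""
    return avg_entry_bytes * entries_per_s


def seconds_until_full(budget_bytes, used_bytes, rate_bytes_per_s):
    """T_s = (C_s - S_0) / G_s; infinite when the log is not growing."""
    if rate_bytes_per_s <= 0:
        return math.inf
    return (budget_bytes - used_bytes) / rate_bytes_per_s


g = growth_rate(450, 2000)  # 450 bytes per entry is an assumed figure
print(g)                                                          # 900000
print(seconds_until_full(12_000_000_000, 3_000_000_000, g))       # 10000.0
```

A monitoring rule could compare this estimate against the time a snapshot plus compaction cycle is expected to take.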

Boundary With Consensus and CDC

Consensus algorithms decide which entries are committed and in what order. The replicated log stores and replays those entries. Change data capture may read a database log and publish changes downstream, but it does not necessarily provide the same command-ordering and commit-index contract as a consensus log.

A replicated log can also support linearizable reads: a replica serves a read only after its applied index has reached a commit index known to be at least as recent as the read request.
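A minimal sketch of such a read gate follows. The name `try_read` and the dictionary state are assumptions, and the step that obtains a trustworthy read index (for example, a leader confirming it still holds authority) is deliberately left out.

```python
def try_read(state, applied_index, read_index, key):
    """Return (True, value) if the replica may answer, else (False, None).

    read_index is the commit index the read must observe; how it is
    obtained is protocol-specific and not shown here.
    """
    if applied_index < read_index:
        return (False, None)  # replica is behind the required commit index
    return (True, state.get(key))
```

A caller that receives `(False, None)` would wait for the replica to catch up or redirect the read elsewhere.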

Recovery

Recovery should restore the last durable log prefix, verify checksums, replay committed entries, discard invalid speculative entries, install snapshots safely and resume from a known commit index. A follower that rejoins after a long outage may need snapshot transfer rather than entry-by-entry catch-up.

The recovery rule should define what happens when local entries conflict with the leader’s log, when snapshots are corrupt, when schema versions changed and when disk contains entries beyond the known commit index.
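One possible conflict rule, sketched below, assumes a leader-based design in which a follower drops its divergent uncommitted suffix and adopts the leader's entries, while treating any conflict at or below the commit index as a fatal error. The function name `reconcile` and the tuple layouts are hypothetical.

```python
def reconcile(local, leader_entries, commit_index):
    """Return a new follower log after merging the leader's entries.

    local: list of (term, cmd); position i holds index i + 1.
    leader_entries: contiguous list of (index, term, cmd).
    Assumption: leader-wins rule for uncommitted conflicts.
    """
    log = list(local)
    for index, term, cmd in leader_entries:
        if index <= len(log):
            if log[index - 1][0] == term:
                continue  # entry already present with the same term
            if index <= commit_index:
                # Committed entries must never be truncated.
                raise RuntimeError(f"conflict at committed index {index}")
            del log[index - 1:]  # drop the divergent speculative suffix
        if index != len(log) + 1:
            raise ValueError(f"gap before index {index}")
        log.append((term, cmd))
    return log
```

Raising on a committed-index conflict turns silent divergence into a visible failure that an operator or snapshot install has to resolve.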

Validation

Validation should include leader crash after append, crash after commit, follower restart, divergent follower logs, snapshot install, compaction during reads, checksum failure, disk-full behavior, slow follower catch-up, duplicate client command, out-of-order delivery and mixed-version replay.

Useful evidence includes append latency, commit latency, fsync time, follower lag, applied index, snapshot age, compaction duration, replay time, truncation count, checksum failures and invariant checks after recovery.
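An invariant check after recovery might look like the following sketch, which reports every committed index at which replicas hold different entries. The function name and the in-memory log layout are assumptions; a real test harness would read the logs from each replica's storage.

```python
def committed_prefix_violations(replica_logs, commit_index):
    """List (index, {entry: [replicas]}) for committed indices that disagree.

    replica_logs maps replica name -> list of hashable entries,
    where position i holds index i + 1. Replicas that have not yet
    received an index are ignored for that index.
    """
    violations = []
    for k in range(1, commit_index + 1):
        seen = {}
        for name, log in replica_logs.items():
            if len(log) >= k:
                seen.setdefault(log[k - 1], []).append(name)
        if len(seen) > 1:
            violations.append((k, seen))
    return violations
```

An empty result means every replica that holds a committed index holds the same entry there, which is the property the fault-injection scenarios above are meant to preserve.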

Failure Modes

Common failure modes include applying uncommitted entries, truncating committed entries, serving reads from a follower behind the required commit index, compacting entries before snapshots are durable, losing client request ids during replay, replaying non-idempotent commands twice, accepting corrupted snapshots and monitoring only leader health while followers fall behind.
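The duplicate-replay failure can be illustrated with a small state machine that remembers client request ids. The class `DedupCounter` is hypothetical, and an unbounded set of seen ids is a simplification; the point is only that the deduplication record must be part of the replicated (and snapshotted) state.

```python
class DedupCounter:
    """Counter state machine that ignores replayed request ids."""

    def __init__(self):
        self.value = 0
        self.seen = set()  # must be persisted in snapshots with `value`

    def apply(self, request_id, delta):
        """Apply a non-idempotent increment at most once per request id."""
        if request_id in self.seen:
            return self.value  # duplicate: return current state unchanged
        self.seen.add(request_id)
        self.value += delta
        return self.value
```

If a snapshot saved `value` but dropped `seen`, replaying the log tail after recovery would apply the same increment twice.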

A replicated log is credible only when the system can prove what entry is committed, what entry each replica has applied, and how recovery preserves a valid committed prefix.
