Topic
Algorithmic Performance and Data Structures
A practical introduction to algorithmic performance and data structures, including complexity, latency, memory use, arrays, hash tables, trees, graphs, queues, and engineering trade-offs.
Algorithmic performance is the engineering study of how a computation behaves as input size, data shape, hardware, load, and operating constraints change. Data structures are the representations that make those computations practical. Together, they determine whether a system is fast enough, predictable enough, memory-efficient enough, and robust enough for its intended use.
The topic is often introduced as a set of abstract rules about Big O notation. That is useful, but incomplete. Real engineering decisions also depend on latency, memory hierarchy, allocation behaviour, branch prediction, concurrency, queueing, failure modes, testability, maintainability, and the distribution of actual inputs. A theoretically efficient algorithm can still be the wrong choice if it has poor locality, unbounded tail latency, hard-to-test invariants, or failure behaviour that the system cannot tolerate.
What performance means
Performance is not one number. Common measures include throughput, response time, latency percentiles, memory use, energy consumption, startup time, update cost, storage cost, and worst-case execution time. The right metric depends on the system.
A batch analytics job may care about total runtime and memory footprint. A real-time controller may care about bounded execution time and predictable jitter. A database index may care about update cost, lookup latency, cache behaviour, and crash recovery. A user interface may care about tail latency because rare stalls are highly visible.
Useful performance requirements state:
- Input size and expected growth.
- Operation mix, such as reads, writes, searches, inserts, deletes, or scans.
- Latency target and percentile, not just average time.
- Memory, storage, and energy limits.
- Concurrency and contention assumptions.
- Failure behaviour when data is malformed, large, bursty, or adversarial.
Without these details, algorithm selection becomes guesswork.
Complexity and scaling
Asymptotic complexity describes how resource use grows with input size. If n is the input size, common classes include constant O(1), logarithmic O(\log n), linear O(n), linearithmic O(n\log n), quadratic O(n^2), exponential O(2^n), and factorial O(n!).
Complexity is most useful for comparing growth rates. A linear algorithm can handle much larger inputs than a quadratic algorithm as n grows. A logarithmic lookup can remain practical even when the data set is large.
However, asymptotic notation hides constants, memory access patterns, input distributions, setup cost, and implementation details. For small inputs, a simple linear scan can beat a more complex indexed structure. For large systems, memory locality can dominate arithmetic cost. Engineers should use complexity to screen designs, then use measurement to confirm the behaviour that matters.
Arrays, lists, and contiguous storage
Arrays store elements in contiguous memory. This gives direct indexing:
with constant-time access when the index is known. Contiguous storage is often fast because nearby elements are fetched together by caches and memory prefetchers. Arrays are therefore strong choices for dense numeric data, lookup tables, buffers, vectors, and workloads dominated by iteration.
The trade-off is update flexibility. Inserting or deleting in the middle of an array may require shifting many elements:
Dynamic arrays reduce reallocation frequency by reserving extra capacity, but occasional growth still requires copying existing elements. Linked lists make some insertions cheap when the position is already known, but they usually have poor cache locality and require extra pointers. In many engineering systems, arrays or packed vectors outperform pointer-heavy structures unless insertion patterns clearly justify the alternative.
Hash tables and direct lookup
Hash tables map keys to array positions through a hash function. Average lookup, insertion, and deletion can be:
when the hash function distributes keys well and the load factor remains controlled. Hash tables are useful for dictionaries, caches, symbol tables, routing tables, de-duplication, memoization, and indexing.
The engineering risks are collision behaviour, memory overhead, resizing cost, ordering requirements, and adversarial inputs. A poor hash function or overloaded table can degrade performance. In real-time systems, resizing pauses may be unacceptable unless capacity is planned or bounded data structures are used. In persistent systems, hash table layout must also be considered with serialization, compatibility, and recovery.
Trees and ordered data
Trees represent hierarchical relationships. A binary tree gives each node at most two children. If the tree is balanced, many search operations can take:
If the tree becomes unbalanced, the same operations can degrade to:
Balanced search trees are useful when data must stay ordered while supporting inserts, deletes, predecessor or successor queries, range scans, and sorted traversal. B-trees and related structures are central to storage systems because they reduce expensive disk or page accesses by using high branching factors.
Tree choice depends on more than lookup complexity. Engineers must check balancing rules, memory layout, iterator stability, update frequency, concurrency control, persistence, recursion depth, and worst-case latency. A tree that is elegant in memory may be poor on storage, and a storage-optimized tree may be too heavy for small in-memory data.
Graphs and networks
Graphs model relationships between entities. Nodes may represent devices, states, tasks, roads, components, functions, packages, or services. Edges may represent connections, dependencies, transitions, flows, or costs.
Two common representations are adjacency lists and adjacency matrices. An adjacency list stores only neighbours and is efficient for sparse graphs. An adjacency matrix stores all possible pairs and is useful for dense graphs or constant-time edge checks:
for checking a stored matrix entry.
Graph algorithms are used in routing, scheduling, dependency analysis, reachability, control-flow analysis, circuit design, logistics, social networks, and distributed systems. They can also be expensive. A design that is practical on a sparse graph may be impossible on a dense graph. Engineers should estimate vertex count, edge count, update rate, path requirements, and whether the graph can contain cycles or disconnected components.
Queues, buffers, and service systems
Queues appear whenever work arrives faster or less regularly than it can be served. They are not only software containers. They are system-level components in processors, network interfaces, operating systems, message brokers, databases, manufacturing cells, and control pipelines.
Little’s Law gives a broad consistency check:
where L is average number of items in the system, \lambda is throughput or arrival rate, and W is average time in the system. If arrival rate rises while service capacity stays fixed, waiting time can increase sharply, especially near high utilization.
Queue engineering must account for capacity, overflow policy, priority, fairness, backpressure, batching, deadlines, retries, and failure isolation. An unbounded queue can hide overload until memory is exhausted. A bounded queue can drop work or block producers. Neither is universally correct; the policy must match the system’s safety and service requirements.
Workload evolution and adversarial inputs
Algorithmic choices should be reviewed against how the workload can change after release. A data structure that is efficient for a few thousand records may become a bottleneck when customers, sensors, logs, routes, transactions, or graph edges grow by two orders of magnitude. The first implementation may also assume clean input while the deployed system receives duplicated keys, skewed distributions, bursty traffic, malformed records, missing fields, or deliberately hostile requests.
Adversarial input does not only matter for public web systems. Industrial controllers, diagnostic tools, engineering solvers, and embedded devices can all receive unexpected data through corrupted files, failing sensors, clock drift, communication retries, calibration mistakes, or operator workflows that were not in the original benchmark set. A hash table with poor collision behaviour, a recursive algorithm with unbounded depth, or a queue with no backpressure can become a reliability problem.
Good reviews include growth tests, skewed distributions, maximum-size records, empty inputs, duplicate-heavy inputs, timeout cases, and memory-pressure runs. These tests reveal whether the chosen structure degrades gracefully or whether it needs explicit limits, streaming, indexing, pagination, preallocation, or rejection rules.
Latency, throughput, and tail behaviour
Throughput measures work completed per unit time. Latency measures delay for an individual operation. Increasing throughput does not always reduce latency. Batching can improve throughput while making individual requests wait longer. Parallelism can improve throughput while increasing contention, synchronization, or queueing delay.
Tail latency often matters more than average latency. A service with a 10 ms average response but occasional 2 s stalls may be unacceptable for interactive use or control. Tail behaviour can come from cache misses, garbage collection, page faults, lock contention, retries, scheduler delay, network congestion, storage compaction, or rare input shapes.
A practical performance review should report at least median, high-percentile, and worst observed latency under defined load. It should also state whether measurements include serialization, network transfer, rendering, database access, queueing, and retries.
Memory hierarchy and locality
Modern computers are not uniform machines. Registers, caches, main memory, storage, and networks differ by orders of magnitude in latency and bandwidth. Data structures that touch memory sequentially can be much faster than structures that jump through pointers, even when both have similar abstract operation counts.
Locality can change the best choice. A sorted array can outperform a tree for small or moderately sized data because binary search touches predictable memory and iteration is compact. A cache-friendly hash table can beat a theoretically clean structure that scatters allocations. For embedded systems, static allocation and compact layout can be more important than dynamic flexibility.
Memory cost also includes fragmentation, allocator overhead, metadata, alignment padding, and peak usage during resizing or copying. These costs should be included when memory is constrained or the system must run for long periods without restart.
Concurrency and data ownership
Concurrent access can change the best data-structure choice. A structure with excellent single-thread performance may perform poorly when every operation requires a global lock. Conversely, a lock-free or sharded structure can improve throughput while making ordering, memory reclamation, debugging, and correctness harder.
Data ownership should be explicit. Engineers should know which task, thread, interrupt, process, or service can read or mutate each structure. Immutable snapshots, copy-on-write layouts, append-only logs, ring buffers, message queues, and partitioned maps can reduce contention when they match the workload. They can also increase memory use or latency if used without a clear access pattern.
Performance tests should therefore include contention, not only isolated operations. The relevant question is whether the algorithm meets its latency and reliability targets while sharing CPU, memory, cache, locks, and I/O with the rest of the system.
Correctness, invariants, and maintainability
Performance does not excuse weak correctness. Many high-performance data structures rely on invariants: sorted order, heap property, hash distribution, tree balance, reference counts, version counters, lock ordering, or acyclic dependencies. If those invariants are not clear and tested, performance gains can create fragile systems.
Maintainability is an engineering cost. A simple structure with predictable behaviour may be better than a specialized structure that only one person understands. Specialized algorithms are justified when they remove real bottlenecks, enforce important bounds, or simplify a domain-specific workload. They should include tests for edge cases, adversarial cases, and performance-sensitive cases.
Measurement workflow
A disciplined workflow for algorithm and data-structure selection is:
- Define the operations and their frequencies.
- Estimate input sizes and growth over the system lifetime.
- Identify hard limits for latency, memory, energy, and worst-case behaviour.
- Select simple baseline structures first.
- Use complexity analysis to screen obviously poor choices.
- Benchmark with representative and adversarial inputs.
- Inspect memory allocation, cache behaviour, and contention.
- Validate tail latency, not only average runtime.
- Keep the chosen structure documented with its invariants and expected workload.
This workflow avoids treating algorithm choice as either pure theory or pure profiling. The best result usually comes from using theory to narrow the design space and measurement to verify the real system.
Common mistakes
Common mistakes include choosing a data structure from average complexity alone, ignoring memory locality, benchmarking only tiny inputs, and forgetting that input distributions can change in production. Another mistake is optimizing a cold path while the hot path remains dominated by I/O, synchronization, allocation, or queueing.
Engineers also often ignore failure behaviour. What happens when the table grows, the queue fills, the tree becomes unbalanced, an input is adversarial, a dependency stalls, or memory allocation fails? Algorithmic design is complete only when the system remains understandable under stress.