Data Structures
Hash Table
O(1) lookups via hashing and chaining
A hash table maps keys to values in O(1) average time by feeding each key through a hash function to compute a bucket index. When two keys hash to the same bucket — a collision — they're stored together in a small structure, typically a linked list. It's the workhorse data structure of every dynamic language and database.
- Lookup (avg, insert, delete)O(1)
- Lookup (worst case)O(n)
- SpaceO(n)
- Ordered iterationNo (without extra structure)
- Best forPoint queries, deduplication, caching
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How a hash table works
Three pieces, working together:
- An array of buckets — usually starting around size 16 or so.
- A hash function that turns any key into an integer. Modulo by the bucket count gives an index between 0 and (capacity − 1).
- A collision strategy for when two keys land in the same bucket.
To insert (key, value): compute bucket = hash(key) % capacity, then store the entry in that bucket. To look up key: compute the same bucket, walk the bucket's contents looking for the key, return its value.
This visualization uses separate chaining — each bucket is a small linked list. The other major strategy is open addressing, where collided entries spill over into nearby empty buckets. Both have O(1) average performance under low load factor.
What makes a good hash function
The hash function's job is to distribute keys uniformly across buckets. A bad hash function — say, "use the first character of the string" — bunches everything into a few buckets and turns the hash table into a linked list with extra steps.
- Uniformity. Each bucket should receive roughly the same fraction of keys.
- Determinism. The same key always produces the same hash. (No randomness.)
- Speed. The hash itself runs in O(1). For variable-length keys (strings), O(length) — but that's still constant per character.
- Avalanche. Tiny changes in the input produce big changes in the output. This makes adversarial collision attacks harder.
For non-cryptographic uses, popular fast hash functions include FNV, MurmurHash, xxHash, and SipHash. SipHash is the default in Python (since 3.4), Ruby, Rust, and Perl — it's specifically designed to defeat hash-flooding denial-of-service attacks.
Collision strategies
| Separate chaining | Open addressing | |
|---|---|---|
| Where do collisions go | Linked list per bucket | Probe to next empty bucket |
| Memory layout | Pointers everywhere (cache-unfriendly) | Contiguous (cache-friendly) |
| Max load factor | Can exceed 1.0 | Must stay < 1.0 |
| Delete | Easy — unlink from list | Tricky — needs tombstones |
| Worst-case probe | O(n) if hash is bad | O(n) on full table |
| Used by | Java HashMap, Python (older) | Python (current), Rust, Go |
Modern systems lean toward open addressing because cache locality wins on real hardware. Java's HashMap uses chaining historically but in Java 8+ converts buckets to red-black trees once they exceed 8 entries — defending against the O(n) worst case.
Load factor and resizing
The load factor is size / capacity. As the table fills, average chain length grows, and lookups slow. Most implementations resize automatically when load factor crosses a threshold — typically 0.7 for open addressing, 0.75 for chaining.
Resizing is expensive: allocate a new bucket array (usually 2× larger), then re-hash and re-insert every entry. The amortized cost stays O(1) per insert because each entry only gets rehashed O(log n) times across all the resize events — but a single insert can sometimes be O(n) when it triggers a resize. This matters for real-time systems where any single operation must be fast.
JavaScript implementation
class HashTable {
constructor(capacity = 16) {
this.buckets = Array.from({ length: capacity }, () => []);
this.size = 0;
}
_hash(key) {
let h = 0;
const s = String(key);
for (let i = 0; i < s.length; i++) {
h = (h * 31 + s.charCodeAt(i)) | 0;
}
return Math.abs(h) % this.buckets.length;
}
set(key, value) {
const bucket = this.buckets[this._hash(key)];
const entry = bucket.find(([k]) => k === key);
if (entry) entry[1] = value;
else { bucket.push([key, value]); this.size++; }
if (this.size / this.buckets.length > 0.75) this._resize();
}
get(key) {
const bucket = this.buckets[this._hash(key)];
const entry = bucket.find(([k]) => k === key);
return entry ? entry[1] : undefined;
}
_resize() {
const old = this.buckets;
this.buckets = Array.from({ length: old.length * 2 }, () => []);
this.size = 0;
for (const bucket of old)
for (const [k, v] of bucket) this.set(k, v);
}
}
For real production use: prefer the built-in Map. JavaScript's Map is implemented in C++ inside V8, much faster than any JS-level hash table you'll write.
Python implementation
class HashTable:
def __init__(self, capacity=16):
self.buckets = [[] for _ in range(capacity)]
self.size = 0
def _hash(self, key):
return hash(key) % len(self.buckets)
def set(self, key, value):
bucket = self.buckets[self._hash(key)]
for entry in bucket:
if entry[0] == key:
entry[1] = value
return
bucket.append([key, value])
self.size += 1
if self.size / len(self.buckets) > 0.75:
self._resize()
def get(self, key):
for k, v in self.buckets[self._hash(key)]:
if k == key: return v
return None
def _resize(self):
old = self.buckets
self.buckets = [[] for _ in range(len(old) * 2)]
self.size = 0
for bucket in old:
for k, v in bucket:
self.set(k, v)
Python's built-in dict is a hash table with open addressing, written in C — use it. Hand-rolled Python hash tables are educational only.
Hash table vs other lookup structures
| Lookup | Ordered iteration | Range queries | Memory | |
|---|---|---|---|---|
| Hash table | O(1) avg | No | No | Higher (overcommits) |
| Binary search (sorted array) | O(log n) | Yes | Yes | Tight |
| Balanced BST | O(log n) | Yes | Yes | Pointer-heavy |
| Trie | O(key length) | Yes (lex order) | Prefix only | Higher |
Hash tables win when you only need point queries and don't care about order. The instant you need "give me everything between X and Y" or "what's the smallest key larger than X", reach for a tree or a skip list instead.
Frequently asked questions
Why is hash table lookup O(1)?
A hash function turns any key into an integer in O(1) time, and integer indexing into an array is O(1). The catch is "average case" — collisions can degrade individual lookups, but with a good hash function and a reasonable load factor, the average chain length stays close to 1.
What's a hash collision?
When two different keys produce the same bucket index. They're inevitable because there are infinitely many possible keys but a finite number of buckets — pigeonhole principle. Hash tables resolve collisions either by chaining (storing collided keys in a list per bucket) or by open addressing (probing nearby buckets).
What's the load factor?
The ratio of items to buckets. Below ~0.75, performance is great. Above it, chains lengthen and lookups slow. Most production hash tables auto-resize (double the bucket array, rehash everything) when the load factor crosses a threshold — typically 0.7-0.8.
When should I NOT use a hash table?
When you need ordered iteration (use a balanced BST or sorted array), when you need range queries (use a B-tree or skip list), when keys are tiny integers in a known range (use a direct array), or when memory is extremely tight (hash tables overcommit memory to keep load factor low).
Are hash tables faster than binary search?
For point queries on unsorted data, yes — O(1) vs O(log n). But binary search wins when the data is already sorted on disk, or when you also need range queries, sorted iteration, or finding the next-smaller/larger element. Choose by what queries you actually run, not by raw point-lookup speed.