Why is hash table lookup O(1)?

A hash function turns any key into an integer in O(1) time, and integer indexing into an array is O(1). The catch is "average case" — collisions can degrade individual lookups, but with a good hash function and a reasonable load factor, the average chain length stays close to 1.

What's a hash collision?

When two different keys produce the same bucket index. They're inevitable because there are infinitely many possible keys but a finite number of buckets — pigeonhole principle. Hash tables resolve collisions either by chaining (storing collided keys in a list per bucket) or by open addressing (probing nearby buckets).

What's the load factor?

The ratio of items to buckets. Below ~0.75, performance is great. Above it, chains lengthen and lookups slow. Most production hash tables auto-resize (double the bucket array, rehash everything) when the load factor crosses a threshold — typically 0.7-0.8.

When should I NOT use a hash table?

When you need ordered iteration (use a balanced BST or sorted array), when you need range queries (use a B-tree or skip list), when keys are tiny integers in a known range (use a direct array), or when memory is extremely tight (hash tables overcommit memory to keep load factor low).

Are hash tables faster than binary search?

For point queries on unsorted data, yes — O(1) vs O(log n). But binary search wins when the data is already sorted on disk, or when you also need range queries, sorted iteration, or finding the next-smaller/larger element. Choose by what queries you actually run, not by raw point-lookup speed.

Hash Table — How It Works, Complexity & Code

How a hash table works

Three pieces, working together:

An array of buckets — usually starting around size 16 or so.
A hash function that turns any key into an integer. Modulo by the bucket count gives an index between 0 and (capacity − 1).
A collision strategy for when two keys land in the same bucket.

To insert (key, value): compute bucket = hash(key) % capacity, then store the entry in that bucket. To look up key: compute the same bucket, walk the bucket's contents looking for the key, return its value.

This visualization uses separate chaining — each bucket is a small linked list. The other major strategy is open addressing, where collided entries spill over into nearby empty buckets. Both have O(1) average performance under low load factor.

What makes a good hash function

The hash function's job is to distribute keys uniformly across buckets. A bad hash function — say, "use the first character of the string" — bunches everything into a few buckets and turns the hash table into a linked list with extra steps.

Uniformity. Each bucket should receive roughly the same fraction of keys.
Determinism. The same key always produces the same hash. (No randomness.)
Speed. The hash itself runs in O(1). For variable-length keys (strings), O(length) — but that's still constant per character.
Avalanche. Tiny changes in the input produce big changes in the output. This makes adversarial collision attacks harder.

For non-cryptographic uses, popular fast hash functions include FNV, MurmurHash, xxHash, and SipHash. SipHash is the default in Python (since 3.4), Ruby, Rust, and Perl — it's specifically designed to defeat hash-flooding denial-of-service attacks.

Collision strategies

	Separate chaining	Open addressing
Where do collisions go	Linked list per bucket	Probe to next empty bucket
Memory layout	Pointers everywhere (cache-unfriendly)	Contiguous (cache-friendly)
Max load factor	Can exceed 1.0	Must stay < 1.0
Delete	Easy — unlink from list	Tricky — needs tombstones
Worst-case probe	O(n) if hash is bad	O(n) on full table
Used by	Java HashMap, Python (older)	Python (current), Rust, Go

Modern systems lean toward open addressing because cache locality wins on real hardware. Java's HashMap uses chaining historically but in Java 8+ converts buckets to red-black trees once they exceed 8 entries — defending against the O(n) worst case.

Load factor and resizing

The load factor is size / capacity. As the table fills, average chain length grows, and lookups slow. Most implementations resize automatically when load factor crosses a threshold — typically 0.7 for open addressing, 0.75 for chaining.

Resizing is expensive: allocate a new bucket array (usually 2× larger), then re-hash and re-insert every entry. The amortized cost stays O(1) per insert because each entry only gets rehashed O(log n) times across all the resize events — but a single insert can sometimes be O(n) when it triggers a resize. This matters for real-time systems where any single operation must be fast.

JavaScript implementation

class HashTable {
  constructor(capacity = 16) {
    this.buckets = Array.from({ length: capacity }, () => []);
    this.size = 0;
  }

  _hash(key) {
    let h = 0;
    const s = String(key);
    for (let i = 0; i < s.length; i++) {
      h = (h * 31 + s.charCodeAt(i)) | 0;
    }
    return Math.abs(h) % this.buckets.length;
  }

  set(key, value) {
    const bucket = this.buckets[this._hash(key)];
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value;
    else { bucket.push([key, value]); this.size++; }
    if (this.size / this.buckets.length > 0.75) this._resize();
  }

  get(key) {
    const bucket = this.buckets[this._hash(key)];
    const entry = bucket.find(([k]) => k === key);
    return entry ? entry[1] : undefined;
  }

  _resize() {
    const old = this.buckets;
    this.buckets = Array.from({ length: old.length * 2 }, () => []);
    this.size = 0;
    for (const bucket of old)
      for (const [k, v] of bucket) this.set(k, v);
  }
}

For real production use: prefer the built-in Map. JavaScript's Map is implemented in C++ inside V8, much faster than any JS-level hash table you'll write.

Python implementation

class HashTable:
    def __init__(self, capacity=16):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _hash(self, key):
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        bucket = self.buckets[self._hash(key)]
        for entry in bucket:
            if entry[0] == key:
                entry[1] = value
                return
        bucket.append([key, value])
        self.size += 1
        if self.size / len(self.buckets) > 0.75:
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._hash(key)]:
            if k == key: return v
        return None

    def _resize(self):
        old = self.buckets
        self.buckets = [[] for _ in range(len(old) * 2)]
        self.size = 0
        for bucket in old:
            for k, v in bucket:
                self.set(k, v)

Python's built-in dict is a hash table with open addressing, written in C — use it. Hand-rolled Python hash tables are educational only.

Hash table vs other lookup structures

	Lookup	Ordered iteration	Range queries	Memory
Hash table	O(1) avg	No	No	Higher (overcommits)
Binary search (sorted array)	O(log n)	Yes	Yes	Tight
Balanced BST	O(log n)	Yes	Yes	Pointer-heavy
Trie	O(key length)	Yes (lex order)	Prefix only	Higher

Hash tables win when you only need point queries and don't care about order. The instant you need "give me everything between X and Y" or "what's the smallest key larger than X", reach for a tree or a skip list instead.

Hash Table

Interactive visualization

Watch the 60-second explainer

How a hash table works

What makes a good hash function

Collision strategies

Load factor and resizing

JavaScript implementation

Python implementation

Hash table vs other lookup structures

Frequently asked questions

Interactive visualization

Watch the 60-second explainer

How a hash table works

What makes a good hash function

Collision strategies

Load factor and resizing

JavaScript implementation

Python implementation

Hash table vs other lookup structures

Frequently asked questions

Related concepts