/kæʃ/
noun … “Fast memory for frequently used data.”
Cache is a high-speed memory layer that stores copies of frequently accessed data to reduce access latency and improve overall system performance. It acts as an intermediary between slower main memory (e.g., RAM) or storage and the CPU, allowing repeated reads and writes to be served quickly. Caches are used in hardware (CPU caches, GPU caches), software (database query caching), and networking (CDN caches).
Key characteristics of Cache include:
- Speed: typically implemented with faster memory types (e.g., SRAM) to minimize latency.
- Hierarchy: CPU caches are often divided into levels—L1 (smallest, fastest), L2, and L3 (larger, slightly slower).
- Locality: cache efficiency relies on temporal and spatial locality, predicting which data will be reused.
- Coherency: ensures cached data is synchronized with main memory to prevent stale reads.
- Eviction policies: strategies like LRU (Least Recently Used) decide which entries are replaced when the cache is full.
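The eviction policy mentioned above can be sketched as a small LRU cache. This is a minimal illustration, not a production implementation; the class name `LRUCache` and its methods are assumptions made for this example, built on Python's `collections.OrderedDict`, which tracks insertion order and makes "least recently used" easy to find.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()       # order reflects recency of use

    def get(self, key):
        if key not in self.entries:
            return None                    # miss
        self.entries.move_to_end(key)      # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # touch "a" so it becomes most recently used
cache.put("c", 3)       # capacity exceeded: "b" is evicted, not "a"
print(cache.get("b"))   # → None (evicted)
print(cache.get("a"))   # → 1 (still cached)
```

Note that the access in `get` also counts as "use": touching "a" before inserting "c" is what saves it from eviction, which is exactly the temporal-locality assumption LRU exploits.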
Workflow example: When a CPU requests data:
def read_data(address):
    if address in cache:
        return cache[address]       # Hit: served from fast cache
    data = RAM.read(address)        # Miss: fall back to slower RAM
    cache[address] = data           # Store a copy for future requests
    return data

Here, the cache checks for the requested data. If present (a cache hit), it returns the value quickly. If not (a cache miss), it retrieves the data from slower RAM and updates the cache for future accesses.
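The hit/miss workflow can be made concrete with a small simulation. The `SlowStore` backing store, the `CachedReader` wrapper, and the hit/miss counters below are illustrative names introduced for this sketch; `SlowStore` simply stands in for slower main memory.

```python
import time

class SlowStore:
    """Stand-in for slow main memory: adds artificial latency per read."""
    def __init__(self, data):
        self.data = data

    def read(self, address):
        time.sleep(0.01)            # simulated access latency
        return self.data[address]

class CachedReader:
    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read_data(self, address):
        if address in self.cache:
            self.hits += 1
            return self.cache[address]      # fast path: no store access
        self.misses += 1
        data = self.store.read(address)     # slow path: read from store
        self.cache[address] = data          # keep a copy for next time
        return data

reader = CachedReader(SlowStore({0: "a", 1: "b"}))
for addr in [0, 1, 0, 0, 1]:    # repeated addresses exhibit temporal locality
    reader.read_data(addr)
print(reader.hits, reader.misses)   # → 3 2
```

Only the first access to each address pays the latency cost; the three repeats are served from the cache, which is the whole point of the structure.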
Conceptually, Cache is like keeping frequently referenced documents on your desk instead of fetching them from a filing cabinet every time—you trade a small amount of space for significant speed and convenience.
See Memory, RAM, CPU, GPU, Cache Coherency.