/ˌeɪtʃ biː ɛm/
n. "3D-stacked DRAM interface delivering terabyte-per-second bandwidth via TSVs and 1024-bit channels unlike narrow DQS DDR."
HBM is high-performance memory built by vertically stacking multiple DRAM dies connected through Through-Silicon Vias (TSVs). It provides massive bandwidth for GPUs and AI accelerators via 1024-bit-per-stack interfaces (4096 bits and wider across multiple stacks) mounted on 2.5D silicon interposers. HBM3 supports 12-Hi stacks delivering roughly 819GB/s per stack at 6.4Gbps/pin (HBM3E pushes past 1.2TB/s) while consuming roughly 30% less power per bit than GDDR6, enabling HPC matrix multiplications and other bandwidth-bound workloads that traditional DIMM architectures cannot feed.
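These headline rates follow directly from the interface width and per-pin signaling rate; a minimal sketch of that arithmetic (the generation figures are the nominal rates quoted above, not measured values):

```python
# Peak per-stack HBM bandwidth: bus width (bits) * per-pin rate (Gbps) / 8 bits-per-byte.
# Nominal figures for illustration; products may run pins faster or slower.
def stack_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8

print(stack_bandwidth_gb_s(1024, 2.4))   # HBM2  -> ~307 GB/s per stack
print(stack_bandwidth_gb_s(1024, 6.4))   # HBM3  -> ~819 GB/s per stack
print(stack_bandwidth_gb_s(1024, 9.2))   # HBM3E -> ~1178 GB/s (~1.2 TB/s) per stack
```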
Key characteristics of HBM include:
- Wide Interfaces: 1024-bit per stack (HBM2: 8 × 128-bit channels; HBM3: 16 × 64-bit channels); aggregate width scales to 8192-bit with 8 stacks.
- TSV Interconnects: dies thinned to ~170μm and vertically stacked; micro-bumps at <40μm pitch connect the stack to the interposer.
- Bandwidth Density: HBM3 ≈819GB/s per stack @6.4Gbps/pin; HBM3E ≈1.2TB/s per stack @9.2Gbps/pin.
- 2.5D Integration: Silicon interposer couples the GPU and HBM stacks with <1ns interconnect latency, vs ~10ns for off-package DDR5 paths.
- Power Efficiency: 7pJ/bit vs DDR5 12pJ/bit; logic die handles refresh/ECC.
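To put the energy-per-bit figures in context, a rough sketch of the interface energy spent streaming a working set once over HBM versus DDR5 (the 7 and 12 pJ/bit values are the approximate figures from the list above; the 80GB working set is a hypothetical example):

```python
# Interface energy to stream a working set once, using the approximate pJ/bit figures above.
def transfer_energy_joules(bytes_moved: int, picojoules_per_bit: float) -> float:
    return bytes_moved * 8 * picojoules_per_bit * 1e-12

working_set = 80 * 10**9                          # hypothetical 80GB of model weights, read once
print(transfer_energy_joules(working_set, 7.0))   # HBM  -> ~4.5 J per full pass
print(transfer_energy_joules(working_set, 12.0))  # DDR5 -> ~7.7 J per full pass
```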
A conceptual example of an HBM memory-subsystem read flow:
1. GPU tensor core requests a 32KB matrix tile from HBM0, striped across its pseudo-channels
2. The stack's 1024 data pins deliver the 32KB at ~819GB/s in roughly 40ns (HBM3 @6.4Gbps/pin; see the sketch after this flow)
3. Interposer routes the signals through 4 RDL layers with <0.5ns skew
4. HBM base logic die arbitrates access across the stack's 16 channels (HBM3) with bank-group interleaving
5. Dies in the 12-Hi stack service their portions via independent 2KB page buffers
6. Return data bypasses the L2 cache → tensor core SRAM

Conceptually, HBM is like a skyscraper apartment block built right next to the office: a stack of memory floors (DRAM dies) connected by high-speed elevators (TSVs) delivers data at terabytes per second to the GPU tenant downstairs, eliminating the slow street traffic of traditional DDR buses.
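The transfer time in step 2 can be sanity-checked with a short sketch (nominal HBM3 figures as above; a real access also pays command, activation, and queuing latency that this ignores):

```python
# Sanity check of step 2: time to stream a 32KB tile at full HBM3 stack bandwidth.
PIN_RATE_GBPS = 6.4            # nominal HBM3 per-pin data rate
BUS_WIDTH_BITS = 1024          # one stack's interface width
TILE_BYTES = 32 * 1024         # the 32KB matrix tile from step 1

stack_bw_bytes_s = BUS_WIDTH_BITS * PIN_RATE_GBPS * 1e9 / 8      # ~819.2e9 B/s
transfer_ns = TILE_BYTES / stack_bw_bytes_s * 1e9
print(f"{stack_bw_bytes_s / 1e9:.0f} GB/s per stack, {transfer_ns:.0f} ns per tile")
# -> 819 GB/s per stack, 40 ns per tile
```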
In essence, HBM fuels the AI/HPC revolution by collapsing the memory wall, feeding GPUs, AI accelerators, high-end network ASICs, and HPC clusters with data at rates that conventional off-package DDR and GDDR interfaces cannot sustain.