/ˌɛf ˌpiː sɪksˈtiːn/

n. "IEEE 754 half-precision 16-bit floating point format trading precision for 2x HBM throughput in AI training."

FP16 is the compact binary16 floating-point format using 1 sign bit, 5 exponent bits, and 10 mantissa bits to represent magnitudes up to ±6.55×10⁴ with ~3.3 decimal digits of precision. It is well suited to neural-network forward/backward passes (RNNs included) where FP32 master weights preserve accuracy during gradient accumulation. Half-precision enables several-fold higher tensor-core throughput on NVIDIA/AMD GPUs, and mixed-precision training fits models that would be infeasible in pure FP32 due to HBM capacity limits.
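These limits can be checked directly; a quick sketch, assuming NumPy is available (its float16 is IEEE 754 binary16):

```python
import numpy as np

info = np.finfo(np.float16)
print(info.max)                        # 65504.0   (~6.55e4, largest finite value)
print(info.tiny)                       # ~6.10e-05 (smallest positive normal)
print(info.smallest_subnormal)         # ~5.96e-08 (subnormals extend the range)
print(info.eps)                        # ~9.77e-04 (2^-10, ~3.3 decimal digits)
print(np.dtype(np.float16).itemsize)   # 2 bytes per value, half of float32's 4
```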

Key characteristics of FP16 include:

  • IEEE 754 Layout: 1 sign + 5 exponent bits (bias 15) + 10 fraction bits = 16 total (see the decoding sketch after this list).
  • Dynamic Range: ±6.10×10⁻⁵ (smallest normal) to ±6.55×10⁴, with subnormals down to ~6×10⁻⁸; machine epsilon 9.77×10⁻⁴.
  • Tensor Core Native: FP16×FP16→FP32 accumulation, from ~125 TFLOPS on V100 to roughly 1000 TFLOPS (dense) on H100.
  • Mixed Precision: FP16 compute with FP32 master weights/gradients for stability.
  • Memory Efficiency: 2 bytes/value enables 2x larger batches or models vs FP32.
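
The 1-5-10 bit layout from the first bullet can be inspected by reinterpreting a value's 16 bits; a minimal sketch, again assuming NumPy, that handles only normal (non-zero, non-subnormal, finite) values:

```python
import numpy as np

def decode_fp16(x: float) -> None:
    """Print the sign/exponent/fraction fields of a normal FP16 value."""
    bits = int(np.array(x, dtype=np.float16).view(np.uint16))  # reinterpret the 16 bits
    sign = bits >> 15                       # 1 sign bit
    exponent = (bits >> 10) & 0x1F          # 5 exponent bits, bias 15
    fraction = bits & 0x3FF                 # 10 fraction bits (implicit leading 1)
    value = (-1) ** sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)
    print(f"{x}: sign={sign} exp={exponent} frac={fraction:#05x} -> {value}")

decode_fp16(1.0)      # sign=0 exp=15 frac=0x000 -> 1.0
decode_fp16(-2.5)     # sign=1 exp=16 frac=0x100 -> -2.5
decode_fp16(65504.0)  # largest finite FP16: exp=30 frac=0x3ff
```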

A conceptual example of the FP16 mixed-precision training flow (sketched in code after the list):

1. Cast FP32 master weights → FP16 copies for the forward pass
2. FP16 matmul: tensor_core(A_fp16, B_fp16) → C_fp32_acc
3. Scale the loss (e.g., ×128, or dynamically) so small gradients stay above FP16's underflow threshold
4. Backward pass in FP16 → unscale gradients and cast to FP32
5. FP32 gradients × learning_rate → FP32 master weight update
6. Cast updated master weights → FP16 for the next iteration
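
Below is a minimal NumPy sketch of this loop for a plain linear model. It is illustrative only: real frameworks automate these steps, the FP32 accumulation of the FP16 matmul is emulated here by upcasting (NumPy has no tensor cores), and names like w_master and loss_scale are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 32)).astype(np.float32)   # inputs
y = rng.standard_normal((64, 1)).astype(np.float32)    # regression targets

w_master = np.zeros((32, 1), dtype=np.float32)          # FP32 master weights
lr, loss_scale = 1e-2, 128.0                            # static loss scale

for step in range(200):
    w16, x16 = w_master.astype(np.float16), X.astype(np.float16)   # step 1
    # step 2: FP16 operands with FP32 accumulation (emulated by upcasting here)
    pred = x16.astype(np.float32) @ w16.astype(np.float32)
    err = pred - y                                       # FP32 residuals
    # step 3: scale the error so tiny FP16 gradients do not underflow
    scaled_err16 = (err * loss_scale).astype(np.float16)
    # step 4: "backward pass" in FP16, then unscale and cast gradients to FP32
    grad16 = x16.T @ scaled_err16 * np.float16(2.0 / X.shape[0])
    grad32 = grad16.astype(np.float32) / loss_scale
    # step 5: FP32 optimizer update on the master weights
    w_master -= lr * grad32
    # step 6: the next iteration re-casts the updated master weights to FP16

print("final MSE:", float(np.mean((X @ w_master - y) ** 2)))
```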

Conceptually, FP16 is like riding with training wheels: the reduced-precision mantissa lets SIMD tensor cores run several times faster than FP32, while FP32 "safety copies" of the weights catch accuracy drift; a good fit for HPC training where throughput matters more than ultimate precision.
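
The "accuracy drift" those FP32 safety copies catch comes from FP16's coarse spacing near 1.0 (epsilon ≈ 9.77×10⁻⁴): an update smaller than about half that spacing rounds away entirely. A small sketch, assuming NumPy:

```python
import numpy as np

w16 = np.float16(1.0)   # weight stored only in FP16
w32 = np.float32(1.0)   # FP32 master copy of the same weight
update = 1e-4           # a typical small gradient * learning-rate step

for _ in range(1000):
    w16 = np.float16(w16 + np.float16(update))  # rounds back to 1.0 every time
    w32 = np.float32(w32 + np.float32(update))  # accumulates normally

print(w16)  # 1.0  -- every FP16 update was lost to rounding
print(w32)  # ~1.1 -- the FP32 master weight kept all 1000 updates
```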

In essence, FP16 unlocks HBM-limited AI scale, from billion-parameter inference on edge GPUs to trillion-parameter LLM training on multi-GPU clusters, with half-precision arithmetic vectorized across SIMD lanes and tensor cores.