Intermediate Representation

/ˌaɪ ˈɑːr/

noun … “The shared language between source code and machines.”

IR, short for Intermediate Representation, is an abstract, structured form of code used internally by a Compiler to bridge the gap between high-level source languages and low-level machine instructions. It is rarely written by hand and is not executed directly by hardware. Instead, IR exists as a well-defined, analyzable format that enables transformation, optimization, and portability across languages and architectures.

The core purpose of IR is separation of concerns. Front ends translate source code into IR, capturing program structure, control flow, and data flow without committing to a specific processor. Back ends then consume IR to generate target-specific machine code. By standardizing this middle layer, one optimizer can serve every language and target, and each back end can be reused by every front end: supporting M languages on N targets takes M front ends and N back ends rather than M × N separate compilers. This design is foundational to systems such as LLVM, where multiple language front ends and many hardware targets share a common optimization pipeline.
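The front end / back end split can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the toy IR is a list of `(op, arg)` stack instructions, the two "languages" are postfix and prefix arithmetic, and the two "targets" are made-up textual assembly dialects. None of it mirrors the API of a real compiler.

```python
# Toy IR (an assumption for this sketch): a list of (op, arg) tuples,
# where op is "push", "add", or "mul".
OPS = {"+": "add", "*": "mul"}

def frontend_postfix(src):
    """Front end for a postfix language, e.g. '2 3 +'."""
    return [(OPS[t], None) if t in OPS else ("push", int(t)) for t in src.split()]

def frontend_prefix(src):
    """Front end for a prefix language with binary operators, e.g. '+ 2 3'."""
    tokens = src.split()

    def parse(i):
        tok = tokens[i]
        if tok in OPS:
            left, i = parse(i + 1)
            right, i = parse(i)
            return left + right + [(OPS[tok], None)], i
        return [("push", int(tok))], i + 1

    ir, _ = parse(0)
    return ir

def backend_stack_vm(ir):
    """Back end for an imaginary stack machine."""
    return [f"PUSH {arg}" if op == "push" else op.upper() for op, arg in ir]

def backend_register(ir):
    """Back end for an imaginary register machine with unlimited registers."""
    out, stack, next_reg = [], [], 0
    for op, arg in ir:
        if op == "push":
            reg = f"r{next_reg}"
            next_reg += 1
            out.append(f"mov {reg}, {arg}")
            stack.append(reg)
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            out.append(f"{op} {lhs}, {rhs}")  # result stays in lhs
            stack.append(lhs)
    return out

# Two source languages meet in the same IR ...
ir = frontend_prefix("+ 2 3")
same_ir = frontend_postfix("2 3 +")
# ... and either back end can consume it.
stack_asm = backend_stack_vm(ir)
reg_asm = backend_register(ir)
```

Adding a third language to this sketch would mean writing one more front end; both back ends would work with it unchanged.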

A defining property of IR is that it is lower level than syntax trees but higher level than assembly. Compared to an AST, IR removes most surface syntax and focuses on explicit operations, control flow, and data dependencies. Compared to Bytecode, IR is usually richer in semantic detail and designed for aggressive optimization rather than direct interpretation. This balance makes IR ideal for program analysis, transformation, and performance tuning.
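To make the AST-versus-IR contrast concrete, the sketch below lowers a nested expression tree into flat three-address instructions. It borrows Python's standard `ast` module as a stand-in front end; the textual instruction format (`t0 = add a, b`) and the `lower` helper are invented for this example and are not any real compiler's IR.

```python
import ast
import itertools

OP_NAMES = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}

def lower(expr_src):
    """Lower an arithmetic expression into a list of three-address instructions."""
    tree = ast.parse(expr_src, mode="eval").body
    code = []
    temps = itertools.count()

    def visit(node):
        # Leaves become operands; they emit no instruction.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        # Each interior node becomes one explicit operation on named values.
        if isinstance(node, ast.BinOp) and type(node.op) in OP_NAMES:
            lhs = visit(node.left)
            rhs = visit(node.right)
            dest = f"t{next(temps)}"
            code.append(f"{dest} = {OP_NAMES[type(node.op)]} {lhs}, {rhs}")
            return dest
        raise NotImplementedError(type(node).__name__)

    visit(tree)
    return code

lowered = lower("(a + b) * (c - 2)")
# lowered == ["t0 = add a, b", "t1 = sub c, 2", "t2 = mul t0, t1"]
```

The parentheses and operator precedence of the source are gone; what remains is an ordered list of operations whose data dependencies are spelled out through the temporaries.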

Strong typing is another common characteristic of IR. Values and operations carry explicit type information, allowing compilers to reason precisely about correctness and optimization opportunities. Control flow is typically represented using basic blocks and explicit branches, which simplifies analysis such as dominance, liveness, and dependency tracking. These structural choices allow optimization passes to be composed, reordered, and reused without ambiguity.
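As one example of the analyses mentioned above, dominance can be computed with a simple fixed-point iteration once control flow is expressed as basic blocks and explicit edges. The sketch below assumes a control-flow graph given as a plain dictionary from block name to successor names (every block must appear as a key); the block names and the `dominators` helper are hypothetical, and production compilers generally use faster algorithms.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it} for a CFG.

    cfg maps each basic block name to a list of successor block names.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# An if/else diamond: entry branches to 'then' or 'else', both rejoin at 'join'.
cfg = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": [],
}
dom = dominators(cfg, "entry")
# dom["join"] == {"entry", "join"}: neither branch dominates the join block.
```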

In practical workflows, IR enables powerful optimization strategies. A compiler may convert source code into IR, run dozens of optimization passes, and repeatedly refine the program representation before emitting final machine code. The same IR can be optimized differently depending on goals such as speed, code size, or energy efficiency. In dynamic systems, IR may be generated and optimized at runtime by a JIT compiler, adapting the program based on observed execution behavior.
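The idea of running composable passes over one representation can be illustrated with a minimal sketch. It assumes a made-up IR in which each instruction is a `(dest, op, a, b)` tuple, every name is assigned at most once, integer operands are constants, and string operands are names; `constant_fold`, `dead_code_elim`, and `run_pipeline` are illustrative helpers, not part of any real toolchain.

```python
import operator

ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def constant_fold(code):
    """Propagate known constants and fold operations whose operands are constant."""
    known = {}  # name -> constant value
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            known[dest] = ARITH[op](a, b)
            out.append((dest, "const", known[dest], None))
        else:
            out.append((dest, op, a, b))
    return out

def dead_code_elim(code, live_out):
    """Drop instructions whose results are never needed for live_out."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return kept[::-1]

def run_pipeline(code, passes):
    """Apply each pass in order; every pass maps IR to IR."""
    for p in passes:
        code = p(code)
    return code

program = [
    ("x", "const", 4, None),
    ("y", "const", 5, None),
    ("t0", "mul", "x", "y"),
    ("t1", "add", "t0", "n"),      # n is an input unknown at compile time
    ("unused", "sub", "n", "x"),
]
optimized = run_pipeline(
    program,
    [constant_fold, lambda c: dead_code_elim(c, live_out={"t1"})],
)
# optimized == [("t1", "add", 20, "n")]
```

Because both passes take and return the same IR, they can be reordered or repeated; running dead-code elimination first here would leave the constants in place for folding to clean up on a later round.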

Consider a typical compilation pipeline. Source code is parsed and type-checked, then lowered into IR. Optimizers analyze loops, eliminate redundant computations, and simplify control flow within the IR. Finally, the refined IR is translated into instructions tailored for a specific CPU. At no point does the optimizer need to know which language the program came from, only how the IR behaves.
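The "eliminate redundant computations" step might look like the following sketch of common-subexpression elimination within a single basic block. It assumes a hypothetical IR of `(dest, op, a, b)` tuples in which each name is assigned once and every operation is pure; it does not treat `a * b` and `b * a` as equal, which a real optimizer typically would. Note that nothing in it depends on the source language.

```python
def local_cse(code):
    """Remove repeated computations of the same (op, a, b) inside one block.

    Returns the reduced instruction list and a map from each removed
    name to the earlier name that already holds the same value.
    """
    available = {}  # (op, a, b) -> name that already holds the result
    alias = {}      # removed name -> surviving equivalent name
    out = []
    for dest, op, a, b in code:
        # Rewrite operands that referred to an eliminated instruction.
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if key in available:
            alias[dest] = available[key]
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out, alias

block = [
    ("t0", "mul", "a", "b"),
    ("t1", "mul", "a", "b"),    # same computation as t0
    ("t2", "add", "t0", "t1"),
]
reduced, alias = local_cse(block)
# reduced == [("t0", "mul", "a", "b"), ("t2", "add", "t0", "t0")]
# alias == {"t1": "t0"}
```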

Conceptually, IR is like a universal wiring diagram. Different architects may sketch buildings in different styles, and different electricians may wire systems differently, but the diagram captures the essential connections in a standard form. Once everything is reduced to that shared diagram, improvements and adaptations become systematic rather than ad hoc.

See Compiler, LLVM, AST, Bytecode, JIT.
