A Note on LLVM
This is a note for Lopes’ Getting Started with LLVM Core Libraries. n.d. Accessed October 2, 2025. https://learning.oreilly.com/library/view/getting-started-with/9781782166924/.
Tools and Design
LLVM IR:
- Static Single Assignment form (SSA)
- An unlimited number of virtual registers
- Easy link-time optimizations by storing entire programs in an on-disk IR representation
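To make the SSA bullet concrete, here is a small sketch in C rather than IR. The function names `reassigned` and `ssa_style` are made up for illustration. The second version gives every intermediate value its own name and assigns it exactly once, which is the property SSA enforces on IR values.

```c
/* Ordinary C: the variable y is assigned twice. */
int reassigned(int x) {
    int y = x + 1;
    y = y * 2;
    return y;
}

/* SSA-style rewrite: each value has one name and exactly one definition. */
int ssa_style(int x) {
    const int y1 = x + 1;
    const int y2 = y1 * 2;
    return y2;
}
```

Both functions compute the same result; only the naming discipline differs. Real SSA also needs phi nodes where control flow merges, which this straight-line sketch does not show.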
Apart from IR, other forms of program representations are:
- Abstract Syntax Tree (AST), immediate result from parsing
- Directed Acyclic Graph (DAG), built from the IR during code generation for instruction selection
- Modules
Fun fact: LLVM originally aimed to be a VM runtime, like the JVM. I’m glad it did not go in that direction:
As the project matured, the design decision of maintaining an on-disk representation of the compiler IR remained as an enabler of link-time optimizations, giving less attention to the original idea of lifelong program optimizations. Eventually, LLVM’s core libraries formalized their lack of interest in becoming a platform by renouncing the acronym Low Level Virtual Machine, adopting just the name LLVM for historical reasons, making it clear that the LLVM project is geared to being a strong and practical C/C++ compiler rather than a Java platform competitor.
How LLVM works:
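Actually, “clang” is a driver for many LLVM tools and platform tools. For example:
- “clang -cc1”: the Clang C frontend, which generates LLVM IR
- “ld”: the system linker on Linux
- “opt”: the IR-level optimisation tool (a standalone tool; the driver runs the same optimisations inside “clang -cc1” rather than calling “opt”)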
Use clang -### to view which tools clang drives.
Those tools can be used individually:
$ clang -emit-llvm -S -c main.c -o main.ll
$ clang -emit-llvm -S -c sum.c -o sum.ll
LLVM IR
define i32 @sum(i32 %a, i32 %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4
%0 = load i32* %a.addr, align 4
%1 = load i32* %b.addr, align 4
%add = add nsw i32 %0, %1
ret i32 %add
}
attributes #0 = { nounwind ssp uwtable ... }
- Local values are the analogs of registers in assembly language.
- Local identifiers are prefixed with % and global identifiers with @.
- An array type is written as [<number of elements> x <element type>].
- attributes are translations of C/C++ function attributes, such as not throwing exceptions (nounwind) or using the stack smash protector (ssp).
- The function body is divided into basic blocks (BBs).
- Each instruction is in three-address code form.
- The alloca instruction reserves space on the stack frame.
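For reference, a C source along these lines would plausibly produce the IR above at -O0. This is a reconstruction from the IR (the parameter names a and b and the nsw add on i32 suggest a signed int addition), not necessarily the exact sum.c used in these notes.

```c
/* sum.c (reconstructed from the IR, not the original file).
 * At -O0 each parameter is first stored to its own alloca slot
 * (%a.addr, %b.addr), loaded back, and then added. */
int sum(int a, int b) {
    return a + b;
}
```

The store/load pairs in the IR exist only because no optimisation was requested; the C source itself has no explicit stack traffic.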
Optimisation on LLVM IR
The optimisation tool opt supports optimisation flags of the form -Ox. -O0 means no optimisation, -O2 enables most optimisations, -O3 is the highest level for speed, and -Oz optimises aggressively for code size rather than being the highest level.
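# Recent clang versions mark -O0 functions optnone, so opt would skip them without -disable-O0-optnone.
clang -emit-llvm -O0 -Xclang -disable-O0-optnone -S ./sum.c -o ./sum.ll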
opt -Oz ./sum.ll -o ./sum.oz.ll -S
One can also use --passes to select specific optimisation passes. See Invoke OPT.
opt operates at the LLVM IR level.
Clang Static Analyzer
The Clang Static Analyzer, usually run through the scan-build tool, leverages a set of checkers to build elaborate bug reports.
It relies on a symbolic execution engine, so its running time can grow exponentially with the number of paths through the code.
⬢ [qfeng@toolbx ❱ llvm-experiment code]$ cat ./scan-test2.c
#include <stdio.h>

int main() { return 0; }

void my_function(int unknownvalue) {
    int schroedinger_integer;
    if (unknownvalue)
        schroedinger_integer = 5;
    printf("hi");
    if (!unknownvalue)
        printf("%d", schroedinger_integer);
}
⬢ [qfeng@toolbx ❱ llvm-experiment code]$ scan-build clang ./scan-test2.c
scan-build: Using '/usr/local/bin/clang-19' for static analysis
./scan-test2.c:11:9: warning: 2nd function call argument is an uninitialized value [core.CallAndMessage]
11 | printf("%d", schroedinger_integer);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.
The analysis is path-sensitive, but it guarantees neither soundness nor completeness.
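The warning comes from the path where unknownvalue is 0: the first if is skipped, so the variable is read without ever having been written. A minimal sketch of one possible fix follows, assuming 0 is an acceptable default. `pick_value` is a hypothetical helper that returns the value instead of printing it, so both paths are easy to check.

```c
/* Hypothetical variant of my_function: the variable is initialised at its
 * declaration, so every path that reads it sees a defined value. */
int pick_value(int unknownvalue) {
    int schroedinger_integer = 0;  /* assumed default, not from the original */
    if (unknownvalue)
        schroedinger_integer = 5;
    return schroedinger_integer;
}
```

With the initialiser in place, the path through the skipped if no longer reads an undefined value, so the uninitialized-value report should not apply to this variant.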