This is a note for Lopes’ Getting Started with LLVM Core Libraries. n.d. Accessed October 2, 2025. https://learning.oreilly.com/library/view/getting-started-with/9781782166924/.

Tools and Design

LLVM IR:

  • Static Single Assignment form (SSA)
  • An unlimited number of virtual registers
  • Easy link-time optimizations by storing entire programs in an on-disk IR representation
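
To make the first two bullets concrete, here is a small sketch written in C rather than in IR. The function names poly_mutable and poly_ssa are made up for illustration. The second function mimics SSA by giving every intermediate value its own name that is assigned exactly once, which is roughly how values look in SSA-form IR, where there is no shortage of register names.

```c
/* Ordinary C: the variable x is assigned three times. */
int poly_mutable(int a) {
    int x = a;
    x = x * 2;
    x = x + 1;
    return x;
}

/* SSA-style rewrite (illustrative only): every value gets a fresh name
   and is assigned exactly once, as if each name were a new virtual
   register. */
int poly_ssa(int a) {
    const int x0 = a;
    const int x1 = x0 * 2;
    const int x2 = x1 + 1;
    return x2;
}
```

Both functions compute the same result; only the naming discipline differs. Because each name has a single definition, an optimiser can find where a value comes from without tracking reassignments.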

Apart from IR, other forms of program representations are:

Fun fact: LLVM originally aimed to be a VM runtime, like the JVM. I’m glad they did not go in this direction:

As the project matured, the design decision of maintaining an on-disk representation of the compiler IR remained as an enabler of link-time optimizations, giving less attention to the original idea of lifelong program optimizations. Eventually, LLVM’s core libraries formalized their lack of interest in becoming a platform by renouncing the acronym Low Level Virtual Machine, adopting just the name LLVM for historical reasons, making it clear that the LLVM project is geared to being a strong and practical C/C++ compiler rather than a Java platform competitor.

How LLVM works:

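Actually, “clang” is a driver: it invokes the compiler frontend and the platform tools needed to build a program. Some of the tools involved:

  • “clang -cc1”: the Clang frontend proper, which parses C and generates LLVM IR
  • “ld”: the system linker on Linux
  • “opt”: the standalone IR-level optimisation tool (the driver does not spawn it; clang runs the same optimisation passes in-process)

Use clang -### to view which commands the driver would run.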

Those tools can be used individually:

$ clang -emit-llvm -S -c main.c -o main.ll
$ clang -emit-llvm -S -c sum.c -o sum.ll

LLVM IR

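; Syntax as printed in the book (LLVM 3.x). Recent LLVM releases use the
; opaque pointer type "ptr" and print loads as "load i32, ptr %a.addr".
define i32 @sum(i32 %a, i32 %b) #0 {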
entry:
  %a.addr = alloca i32, align 4
  %b.addr = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  store i32 %b, i32* %b.addr, align 4
  %0 = load i32* %a.addr, align 4
  %1 = load i32* %b.addr, align 4
  %add = add nsw i32 %0, %1
  ret i32 %add
}

  
attributes #0 = { nounwind ssp uwtable ... }

  • Local values are the analogue of registers in assembly language.
  • Local identifiers are prefixed with % and global identifiers with @.
  • An array type is written as [<number of elements> x <element type>].
  • Attributes carry function properties derived from the C/C++ source and the compiler options, such as "does not throw exceptions" (nounwind) and "use the stack smash protector" (ssp).
  • A function body is divided into basic blocks (BBs).
  • Each instruction is in three-address form.
  • The alloca instruction reserves space on the stack frame.
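
For reference, here is a sketch of C source that lines up with the bullets above. The body of sum is an assumption, since these notes only show the IR that clang produced from sum.c. The names table and max_of are hypothetical and exist only to illustrate the array and basic-block bullets.

```c
/* Assumed contents of sum.c; the notes only show the resulting IR. */
int sum(int a, int b) {
    /* At -O0 each parameter is first stored to its own stack slot
       (%a.addr and %b.addr, created by alloca) and then loaded back. */
    return a + b; /* signed addition, hence "add nsw" in the IR */
}

/* A file-scope array of four ints has the IR type [4 x i32]. Being a
   global, it is referred to with an @ prefix. */
int table[4] = {1, 2, 3, 4};

/* Hypothetical helper: the if statement forces more than one basic
   block, because control can only enter a block at its top and leave
   it at its end. */
int max_of(int a, int b) {
    if (a > b)
        return a;
    return b;
}
```

Emitting IR for a file like this with the clang command shown earlier is a quick way to see the bullets in practice; the exact block and value names depend on the clang version.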

Optimisation on LLVM IR

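The optimisation tool opt supports the optimisation-level flags -Ox. -O0 means no optimisation, -O2 enables most optimisations, and -O3 is the most aggressive level for speed. -Os and -Oz optimise for code size instead, with -Oz being the more aggressive of the two.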

clang -emit-llvm -O0 -S ./sum.c -o ./sum.ll
opt -Oz ./sum.ll -o ./sum.oz.ll -S

One can also use --passes to select specific optimisation passes, for example --passes=mem2reg. See Invoke OPT

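opt operates at the LLVM IR level: it reads IR and writes transformed IR. Note that clang at -O0 marks every function with the optnone attribute, so opt leaves such functions alone unless the IR was generated with -Xclang -disable-O0-optnone.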
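
A function with something obvious to simplify makes the difference between optimisation levels easier to see than sum does. The following is a minimal sketch; sum_to_ten is a hypothetical name, and what a given pipeline actually does with it depends on the LLVM version and the level chosen.

```c
/* Hypothetical example: the loop bound is a compile-time constant, so
   the result (55) is known before the program runs. An unoptimised
   build keeps the loop; an optimising pipeline can often replace the
   whole body with the constant. */
int sum_to_ten(void) {
    int total = 0;
    for (int i = 1; i <= 10; ++i)
        total += i;
    return total;
}
```

Comparing the IR for this function before and after running opt shows how much of the loop survives.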

Clang Static Analyzer

The Clang Static Analyzer, usually run through the scan-build tool, leverages a set of checkers to build elaborate bug reports.

It relies on a symbolic execution engine that explores program paths one by one, so its worst-case running time grows exponentially with the number of branches.

$ cat ./scan-test2.c
#include <stdio.h>

int main() { return 0; }

void my_function(int unknownvalue) {
    int schroedinger_integer;
    if (unknownvalue)
        schroedinger_integer = 5;
    printf("hi");
    if (!unknownvalue)
        printf("%d", schroedinger_integer);
}
$ scan-build clang ./scan-test2.c
scan-build: Using '/usr/local/bin/clang-19' for static analysis
./scan-test2.c:11:9: warning: 2nd function call argument is an uninitialized value [core.CallAndMessage]
   11 |         printf("%d", schroedinger_integer);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.

The analysis is path-sensitive, but it guarantees neither soundness nor completeness: it can miss real bugs and it can report false positives.
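
The idea behind the warning above can be sketched in a few lines of C. This is a toy model, not how the analyzer is implemented: path_reads_uninitialized and path_count are hypothetical names, and the model hard-codes the structure of my_function from scan-test2.c. It treats unknownvalue as a symbol that is either zero or non-zero, follows each case as its own path, and tracks whether schroedinger_integer has been written before it is read.

```c
#include <stdbool.h>

/* Toy model of my_function: follow one path, chosen by whether the
   symbolic input is non-zero, and report whether that path reads the
   variable before any write to it. */
bool path_reads_uninitialized(bool unknown_is_nonzero) {
    bool initialized = false;      /* int schroedinger_integer;      */
    if (unknown_is_nonzero)        /* if (unknownvalue)              */
        initialized = true;        /*     schroedinger_integer = 5;  */
    if (!unknown_is_nonzero)       /* if (!unknownvalue)             */
        return !initialized;       /*     printf reads the variable  */
    return false;                  /* this path never reads it       */
}

/* Upper bound on the number of paths through n independent two-way
   branches: 2^n. Returns 0 if the count would not fit in 64 bits. */
unsigned long long path_count(unsigned n) {
    if (n >= 64)
        return 0;
    return 1ULL << n;
}
```

Only the path where unknownvalue is zero reaches the read without a prior write, which matches the reported warning. Because both if statements test the same symbol, only two of the four branch combinations are feasible here; with independent conditions the number of paths doubles with every branch, which is where the exponential cost comes from.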