High-Level SIL Optimization in the Swift Compiler

Matt Rajca discovered last year that Swift’s Array.append(element) is 6x faster than Array’s operator+=(collection). This is a shame, because the latter is semantically equivalent, easier to type, and more pleasing to the eye. Swift is a new language, so there is no shortage of opportunities for optimization.

I’ve enjoyed the recent uptick in “how to build a super simple compiler” blog posts, but there isn’t a lot of material for those who want to go deeper into the world of programming languages and compilers. That’s mostly why it took me so long to make this contribution to the Swift compiler, despite already having some experience working with LLVM.

This post is aimed at those who already understand the basics of compiler design and terminology, but have yet to contribute to an optimizing compiler for a popular language. It’ll be most useful for those who want to work on the high level optimization passes in Swift.

It’s more of a reference than a hand-holdy tutorial, because if you try to write about compilers in too much detail, it might take you a lifetime, and you may never finish.

We’ll cover:

  1. How to read SIL (the bulk of this post, because it’s so important, and you’ll be reading a lot of it)
  2. How to write a SIL test case
  3. How to write an optimization pass
  4. How adding an optimization pass can make things slower

Skip to the end if you just want my thoughts on what contributing to Swift is like. This is a very long post, not meant to be read in a single sitting.

Why High-Level SIL Optimization?

Our goal is to replace arr += [5] with arr.append(5).
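If you want to reproduce the gap yourself, a crude timing harness like the one below works (my own sketch; exact numbers will vary by machine, optimization level, and compiler version):

import Foundation

func time(_ label: String, _ body: () -> Void) {
    let start = Date()
    body()
    print(label, Date().timeIntervalSince(start), "seconds")
}

var a = [Int](), b = [Int]()
time("append") { for _ in 0..<1_000_000 { a.append(5) } }
time("+=    ") { for _ in 0..<1_000_000 { b += [5] } }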

First, we have to know where we’ll implement this optimization, and why. Prior to this, the only optimizations I had written were LLVM optimization passes, which operated on LLVM IR.

The problem with this is that function calls can be very difficult to replace after they’ve been lowered into LLVM IR. At such a low level, it’s hard to tell what instructions are part of the original function call, which significantly complicates the implementation of the optimization pass, and makes it very fragile.

The next most logical place to perform this optimization is at the AST level. But since Swift ASTs are only syntactically valid, and not yet semantically valid, we cannot safely transform them.

To solve this problem, the Swift compiler has an intermediate representation between the AST and LLVM IR, called SIL (Swift Intermediate Language).

SIL, like LLVM IR, is a Static Single Assignment (SSA) IR. Unlike LLVM IR, it has a richer type system, uses basic block parameters rather than phi nodes, and, crucially for us, allows functions to be annotated with semantic attributes.

Reading SIL

Let’s look at the generated SIL for the Swift code that we want to optimize. Build the Swift compiler, and then use it to emit canonical SIL:

// input.swift
var list = [Int]();

for _ in 0..<1_000_000 {
  list += [5] // we'll swap this to list.append(5) to compare later
}

swift-sources/build/Ninja-DebugAssert/swift-macosx-x86_64/bin/swiftc \
  -frontend \
  -target x86_64-apple-macosx10.9 \
  -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk \
  -Onone \
  -emit-sil \
  -o out.sil \
  input.swift

We’re only going to look at what’s happening in the main method. I’ll represent types with T, T1, T2, etc., so Array<T> means an Array of elements of type T.

The Setup

// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
  • // main this is a comment containing the name of the following method. SIL’s comments help humans make sense of it.
  • @main this method’s name is “main”. Identifier names in SIL are prefixed with the @ symbol.
  • @convention(c) the C calling convention should be used. Calling conventions specify how arguments and return values should be handled when a function is called.
  • (Int32, the 1st argument to this method is a 32-bit integer. By convention, we know that this integer is the number of arguments given to the program. The C equivalent of this is int argc
  • UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) there’s a bit to unpack here!
    • UnsafeMutablePointer<T> is a raw pointer (memory address) to something of type T.
    • Optional<T> means that you’ll have either Some value of type T, or None.
    • Int8 is an 8-bit integer. In context, this is a character (char).
    • UnsafeMutablePointer<Int8> a pointer to a character. In context, this is a character array, also known as a string.
    • Optional<UnsafeMutablePointer<Int8>> either a pointer to a string, or null.
    • Putting it all together, this is a null-terminated array of pointers to strings. Its equivalent in C is char *[].
  • -> Int32 this method returns a 32-bit integer. By convention, we know that this is the exit code of the program.

 

 

// %0       // user: %7
// %1       // user: %14
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  • // %0 // user: %7  a comment, noting that register %7 depends on register %0. Registers in SIL are integers prefixed with %. Because SIL, like LLVM IR, is a Static Single Assignment (SSA) IR, you can think of these registers as constants in a typical programming language; they hold a value, and once set, cannot be changed. Whenever the compiler needs a new register, it simply increments the register number. The immutable registers in SSA make optimizations easier to write. Technically, these are virtual registers; a later pass will “lower” them to use real registers on the target architecture. If we look further in the code, we see that %0 is indeed used in an instruction whose result is stored in %7. Register numbers are scoped to the method that their basic blocks live in.
  • bb0 basic block zero. A basic block is a straight-line code sequence. That means that it has no branches, except at its entry and exit.
  • (%0 : $Int32 the first parameter to the basic block is a 32-bit integer. Say that basic block bb3 has immediate predecessors bb1 and bb2. bb3 needs to refer to register %7 in bb1, or register %11 in bb2, depending on which predecessor was used to reach it. In LLVM IR, we would use a Φ (Phi) function in bb3 to “choose” between %7 or %11 and assign the chosen value to a new register. In SIL, the predecessor basic blocks bb1 and bb2 do the choosing, by passing arguments to bb3 in the branch (br) instruction.
  • %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) the second parameter, with the type already explained above.
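To make basic block parameters concrete, here’s a schematic sketch of the bb1/bb2/bb3 example from the bullet above (illustrative, not real compiler output):

bb1:
  %7 = ...               // an Int32 computed along this path
  br bb3(%7 : $Int32)    // pass it to bb3 as a branch argument
bb2:
  %11 = ...              // a different Int32 along this path
  br bb3(%11 : $Int32)
bb3(%12 : $Int32):       // %12 is %7 or %11, depending on the predecessor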

 

 

%2 = alloc_stack $IndexingIterator<CountableRange<Int>>, var, name "$i$generator", loc "input.swift":3:7, scope 2 // users: %78, %53, %59
  • alloc_stack T Allocate (uninitialized) memory on the stack to contain T, and return the address of the allocated memory.
  • $IndexingIterator<CountableRange<Int>> The type of iterator that we are allocating memory for. Types in SIL start with $.

 

 

%3 = metatype $@thin CommandLine.Type, scope 1
// function_ref CommandLine._argc.unsafeMutableAddressor
%4 = function_ref @_TFOs11CommandLineau5_argcVs5Int32 : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %5
%5 = apply %4() : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %6
%6 = pointer_to_address %5 : $Builtin.RawPointer to [strict] $*Int32, scope 1 // user: %9
%7 = struct_extract %0 : $Int32, #Int32._value, scope 1 // user: %8
%8 = struct $Int32 (%7 : $Builtin.Int32), scope 1 // user: %9
store %8 to %6 : $*Int32, scope 1 // id: %9
%10 = metatype $@thin CommandLine.Type, scope 1
// function_ref CommandLine._unsafeArgv.unsafeMutableAddressor
%11 = function_ref @_TFOs11CommandLineau11_unsafeArgvGSpGSqGSpVs4Int8___ : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %12
%12 = apply %11() : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %13
%13 = pointer_to_address %12 : $Builtin.RawPointer to [strict] $*UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>, scope 1 // user: %14
store %1 to %13 : $*UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>, scope 1 // id: %14
%15 = tuple (), scope 1

These instructions handle command-line arguments to our program. Since we never used those arguments, a future optimizing pass will remove these unnecessary instructions. To keep things concise, I’ll skip over this section.

 

alloc_global @_Tv3out4listGSaSi_, loc "input.swift":1:5, scope 1 // id: %16
%17 = global_addr @_Tv3out4listGSaSi_ : $*Array<Int>, loc "input.swift":1:5, scope 1 // users: %21, %73
  • alloc_global @foo Initialize memory for the global variable @foo.
  • global_addr @foo get the address of the global variable @foo.
  • @_Tv3out4listGSaSi_ is the mangled name of the global array of integers, Array<Int>, that we later add elements to.

 

 

// function_ref Array.init() -> [A]
%18 = function_ref @_TFSaCfT_GSax_ : $@convention(method) <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":1:16, scope 1 // user: %20
%19 = metatype $@thin Array<Int>.Type, loc "input.swift":1:12, scope 1 // user: %20
%20 = apply %18<Int>(%19) : $@convention(method) <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":1:18, scope 1 // user: %21
store %20 to %17 : $*Array<Int>, loc "input.swift":1:18, scope 1 // id: %21
  • There’s a lot going on in register %18
    • function_ref @foo : $T  create a reference to the function @foo with type T.
    • @convention(method) specify the Swift Method Calling Convention. This means that the SIL function will be called with the “self” argument last, because it is an instance method.
    • (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0> this function type takes a metatype parameter and returns an array. A metatype is the type of a type. τ_0_0 is the placeholder type of this generic function. @thin means that the metatype requires no storage, because it’s an exact type. @owned means that the recipient is responsible for destroying the value.
    • Putting it all together, this creates a reference to the generic Array.init function, and stores it in register %18.
  • metatype $T.Type create a reference to the metatype object for type T. Here we’re getting a reference to the type of the Array<Int> type. Note that this is an actual type, because it doesn’t have any placeholder types.
  • apply %0(%1, %2, ...) : $(A, B, ...) -> R call the function %0 with arguments %1, %2, ... of type A, B, ..., returning a value of type R.
  • Putting it all together, we’re calling the generic Array.init() function with the metatype Array<Int>.Type as the first and only argument, resulting in an Array<Int>. We’ve now initialized the global array that we’ll add elements to later.

 

 

// function_ref Collection<A>.makeIterator() -> IndexingIterator<A>
%22 = function_ref @_TFesRxs10Collectionwx8IteratorzGVs16IndexingIteratorx_wx8_ElementzWxS0_7Element_rS_12makeIteratorfT_GS1_x_ : $@convention(method) (@in_guaranteed τ_0_0) -> @out IndexingIterator, loc "input.swift":3:14, scope 2 // user: %53
  • Create a function reference to Collection.makeIterator(). We don’t use this until much later in basic block 13.

 

 

%23 = integer_literal $Builtin.Int64, 0, loc "input.swift":3:10, scope 2 // user: %24
%24 = struct $Int (%23 : $Builtin.Int64), loc "input.swift":3:10, scope 2 // user: %43
%25 = integer_literal $Builtin.Int64, 1000000, loc "input.swift":3:14, scope 2 // user: %26
%26 = struct $Int (%25 : $Builtin.Int64), loc "input.swift":3:14, scope 2 // user: %45
%27 = alloc_stack $CountableRange<Int>, loc "input.swift":3:11, scope 2 // users: %46, %55, %50
br bb1, loc "input.swift":3:11, scope 2 // id: %28
  • Create the integer literals 0 and 1000000, and then create values of struct type $Int with those literals
  • Allocate space on the stack for a $CountableRange<Int>
  • Branch to basic block 1 (bb1)

 

 

bb1: // Preds: bb0
 br bb2, loc "input.swift":3:11, scope 2 // id: %29
bb2: // Preds: bb1
 br bb3, loc "input.swift":3:11, scope 2 // id: %30
bb3: // Preds: bb2
 br bb4, loc "input.swift":3:11, scope 2 // id: %31
bb4: // Preds: bb3
 br bb5, loc "input.swift":3:11, scope 2 // id: %32
bb5: // Preds: bb4
 br bb6, loc "input.swift":3:11, scope 2 // id: %33
bb6: // Preds: bb5
 br bb7, loc "input.swift":3:11, scope 2 // id: %34
bb7: // Preds: bb6
 br bb8, loc "input.swift":3:11, scope 2 // id: %35
bb8: // Preds: bb7
 br bb9, loc "input.swift":3:11, scope 2 // id: %36
bb9: // Preds: bb8
 br bb10, loc "input.swift":3:11, scope 2 // id: %37
bb10: // Preds: bb9
 br bb11, loc "input.swift":3:11, scope 2 // id: %38
bb11: // Preds: bb10

These basic blocks do nothing and immediately branch to the following basic block. They will be removed during optimization. This might seem wasteful, but it simplifies the implementation of the compiler, because the initial code generation is decoupled from the optimization passes.

 

 

bb12:                                             // Preds: bb11
  // function_ref CountableRange.init(uncheckedBounds : (lower : A, upper : A)) -> CountableRange<A>
  %40 = function_ref @_TFVs14CountableRangeCfT15uncheckedBoundsT5lowerx5upperx__GS_x_ : $@convention(method) (@in τ_0_0, @in τ_0_0, @thin CountableRange<τ_0_0>.Type) -> @out CountableRange<τ_0_0>, loc "input.swift":3:11, scope 2 // user: %46
  %41 = metatype $@thin CountableRange<Int>.Type, loc "input.swift":3:11, scope 2 // user: %46
  • Create a reference to CountableRange.init(uncheckedBounds : (lower : A, upper : A)) -> CountableRange<A>.
  • Create a reference to the metatype CountableRange.Type.

 

 

  %42 = alloc_stack $Int, loc "input.swift":3:11, scope 2 // users: %43, %48, %46
  store %24 to %42 : $*Int, loc "input.swift":3:11, scope 2 // id: %43
  %44 = alloc_stack $Int, loc "input.swift":3:11, scope 2 // users: %45, %47, %46
  store %26 to %44 : $*Int, loc "input.swift":3:11, scope 2 // id: %45
  • store %0 to %1 stores the value %0 at memory address %1.
  • In basic block 0, we created two $Int struct values that hold 0 and 1000000. Now we allocate space on the stack and store those values there. We have to put them on the stack because the initializer takes them as indirect (@in) parameters, so it needs their addresses.

 

 

  %46 = apply %40<Int>(%27, %42, %44, %41) : $@convention(method) (@in τ_0_0, @in τ_0_0, @thin CountableRange<τ_0_0>.Type) -> @out CountableRange<τ_0_0>, loc "input.swift":3:11, scope 2
  • %46 We’re initializing a CountableRange with this apply instruction. For convenience, here are the arguments:
    • %27 points to space we allocated for the CountableRange.
    • %42 points to space with the $Int struct containing 0.
    • %44 points to space with the $Int struct containing 1000000.
    • %41 is a reference to the CountableRange<Int>.Type

 

  dealloc_stack %44 : $*Int, loc "input.swift":3:11, scope 2 // id: %47
  dealloc_stack %42 : $*Int, loc "input.swift":3:11, scope 2 // id: %48
  br bb13, loc "input.swift":3:11, scope 2 // id: %49
  • We don’t need those $Int structs on the stack anymore, so we deallocate them here.
  • Then we branch to basic block 13. This is another unnecessary branch, as bb13 only has one predecessor.

 

bb13:                                             // Preds: bb12
  %50 = load %27 : $*CountableRange<Int>, loc "input.swift":3:11, scope 2 // user: %52
  %51 = alloc_stack $CountableRange<Int>, loc "input.swift":3:11, scope 2 // users: %52, %54, %53
  store %50 to %51 : $*CountableRange<Int>, loc "input.swift":3:11, scope 2 // id: %52
  • load reads the CountableRange from memory address %27 and stores it in register %50.
  • Allocate space on the stack for a CountableRange and store the address in register %51.
  • store the CountableRange that we just loaded into the newly allocated space at %51, effectively copying it.

 

 

  %53 = apply %22<CountableRange<Int>, Int, Int, CountableRange<Int>, CountableRange<Int>, Int, Int, Int, Int, Int, Int, IndexingIterator, CountableRange<Int>, Int, Int, IndexingIterator, CountableRange<Int>, Int, Int, Int, Int, Int, Int, Int, Int>(%2, %51) : $@convention(method) (@in_guaranteed τ_0_0) -> @out IndexingIterator, loc "input.swift":3:14, scope 2
  dealloc_stack %51 : $*CountableRange<Int>, loc "input.swift":3:14, scope 2 // id: %54
  dealloc_stack %27 : $*CountableRange<Int>, loc "input.swift":3:14, scope 2 // id: %55
  br bb14, loc "input.swift":3:1, scope 2         // id: %56
  • We call %22, which is a function reference to Collection.makeIterator(). It gets two arguments:
    • %2 is uninitialized memory that we allocated ages ago in basic block 0 for an IndexingIterator.
    • %51 is the address where we stored a copy of the CountableRange.
    • Putting this all together, we’re creating an iterator for the CountableRange.
  • We deallocate the memory for the CountableRange and its copy. This is because the IndexingIterator we just made contains all the information we needed from the countable range.
  • Note that the copy of the CountableRange wasn’t necessary. It will probably be removed in optimization.
  • Then we branch to basic block 14. Unlike all the previous branches, this one is necessary, because bb14 is the first (and only!) basic block with two predecessors.

The Loop Header

This basic block is a Loop Header because it dominates all other basic blocks in the loop. That means that every path to the other basic blocks must go through this basic block.

bb14:                                             // Preds: bb15 bb13
  // function_ref IndexingIterator.next() -> A._Element?
  %57 = function_ref @_TFVs16IndexingIterator4nextfT_GSqwx8_Element_ : $@convention(method) (@inout IndexingIterator) -> @out Optional, loc "input.swift":3:7, scope 2 // user: %59
  %58 = alloc_stack $Optional<Int>, loc "input.swift":3:7, scope 2 // users: %61, %60, %59
  %59 = apply %57(%58, %2) : $@convention(method) (@inout IndexingIterator) -> @out Optional, loc "input.swift":3:7, scope 2
  %60 = load %58 : $*Optional<Int>, loc "input.swift":3:7, scope 2 // users: %66, %64
  dealloc_stack %58 : $*Optional<Int>, loc "input.swift":3:7, scope 2 // id: %61
  • Get a reference to IndexingIterator.next().
  • Allocate space on the stack for an Optional.
  • Call IndexingIterator.next() with two arguments:
    • %58 The address for the space we just allocated for the Optional.
    • %2, the memory that we allocated ages ago in basic block 0, which by now holds the IndexingIterator.
  • This is interesting: unlike previous method calls, the return value %59 of IndexingIterator.next() is ignored. Instead, the returned Optional is loaded from %58 into %60.

 

 

  %62 = integer_literal $Builtin.Int1, -1, loc "input.swift":3:1, scope 2 // user: %64
  %63 = integer_literal $Builtin.Int1, 0, loc "input.swift":3:1, scope 2 // user: %64
  %64 = select_enum %60 : $Optional<Int>, case #Optional.some!enumelt.1: %62, default %63 : $Builtin.Int1, loc "input.swift":3:1, scope 2 // user: %65
  cond_br %64, bb15, bb16, loc "input.swift":3:1, scope 2 // id: %65
  • select_enum %0 : $E case #foo: %1, case #bar: %2, ... default %3 if enum %0 of type $E has value foo return %1, and so on for the other cases, by default returning %3. In this case, %64 will be -1 if the Optional has .Some value, and 0 otherwise.
  • cond_br %64, bb15, bb16 a conditional branch: if %64 is equal to 1, branch to bb15, otherwise branch to bb16 with the specified arguments.
  • Putting this all into context, the code will branch to bb15 if the Optional has .Some value, and bb16 otherwise.
  • Eagle-eyed readers will notice that the integer_literal used to initialize %62 is -1, not 1. I’m not sure why that is, because the docs for cond_br only document its behavior for 0 and 1. (It’s likely because a Builtin.Int1 is a single bit, so -1 and 1 have the same representation: the lone bit is set.)
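In Swift terms, the loop header and this conditional branch are doing roughly what the desugared for-in loop below does (a sketch; the compiler generates the iterator handling for you):

var list = [Int]()
var it = (0..<1_000_000).makeIterator()
while let i = it.next() {   // bb14: call next(), then select_enum + cond_br on the result
    _ = i
    list += [5]             // bb15: the loop body
}
// bb16: the shared exit block runs once next() returns .none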

The Loop Body

bb15:                                             // Preds: bb14
  %66 = unchecked_enum_data %60 : $Optional<Int>, #Optional.some!enumelt.1, loc "input.swift":3:1, scope 2 // user: %67
  debug_value %66 : $Int, let, name "i", loc "input.swift":3:5, scope 2 // id: %67
  • unchecked_enum_data %60 : $E, #E.foo unsafely extracts the value of the enum %60 for the given case #E.foo.

At this point, the generated code diverges depending on whether we used .append(5) or += [5].

The .append(5) Case

  // function_ref Array.append(A) -> ()
  %68 = function_ref @_TFSa6appendfxT_ : $@convention(method) <τ_0_0> (@in τ_0_0, @inout Array<τ_0_0>) -> (), loc "input.swift":4:10, scope 3 // user: %73
  %69 = integer_literal $Builtin.Int64, 5, loc "input.swift":4:17, scope 3 // user: %70
  %70 = struct $Int (%69 : $Builtin.Int64), loc "input.swift":4:17, scope 3 // user: %72
  %71 = alloc_stack $Int, loc "input.swift":4:17, scope 3 // users: %72, %74, %73
  store %70 to %71 : $*Int, loc "input.swift":4:17, scope 3 // id: %72
  %73 = apply %68<Int>(%71, %17) : $@convention(method) <τ_0_0> (@in τ_0_0, @inout Array<τ_0_0>) -> (), loc "input.swift":4:18, scope 3
  dealloc_stack %71 : $*Int, loc "input.swift":4:18, scope 3 // id: %74
  br bb14, loc "input.swift":5:1, scope 2         // id: %75
  • Create a function reference to Array.append().
  • Create an integer_literal with value 5, then create an $Int struct with that literal.
  • Allocate memory on the stack for the $Int struct, then store it there.
  • Call Array.append() with the arguments
    • %71, the address of the $Int struct holding the value 5
    • %17, the address of the global Array variable
  • Deallocate memory for the $Int struct
  • Branch back to bb14, the loop header.

The +=[5] Case

  // function_ref += infix<A> (inout [A.Iterator.Element], A) -> ()
  %68 = function_ref @_TFsoi2peuRxs10CollectionrFTRGSaWx8Iterator7Element__x_T_ : $@convention(thin)  (@inout Array, @in τ_0_0) -> (), loc "input.swift":5:10, scope 3 // user: %83
  • Operators are actually functions. This creates a reference to the += function.

 

  // function_ref Array.init(arrayLiteral : [A]...) -> [A]
  %69 = function_ref @_TFSaCft12arrayLiteralGSax__GSax_ : $@convention(method)  (@owned Array, @thin Array.Type) -> @owned Array, loc "input.swift":5:13, scope 3 // user: %80
  %70 = metatype $@thin Array.Type, loc "input.swift":5:13, scope 3 // user: %80
  %71 = integer_literal $Builtin.Word, 1, loc "input.swift":5:14, scope 3 // user: %73
  • Create a reference to the Array.init(arrayLiteral : [A]...) -> [A]  function.
  • Create a reference to the metatype Array.Type.
  • Create an integer_literal with value 1. This is the number of elements in the array that we’re about to allocate memory for.

 

  // function_ref _allocateUninitializedArray<A> (Builtin.Word) -> ([A], Builtin.RawPointer)
  %72 = function_ref @_TFs27_allocateUninitializedArrayurFBwTGSax_Bp_ : $@convention(thin)  (Builtin.Word) -> (@owned Array, Builtin.RawPointer), loc "input.swift":5:14, scope 3 // user: %73
  %73 = apply %72(%71) : $@convention(thin)  (Builtin.Word) -> (@owned Array, Builtin.RawPointer), loc "input.swift":5:14, scope 3 // users: %75, %74
  • Create a reference to the _allocateUninitializedArray(count: Builtin.Word) function, which returns a tuple containing an array of count uninitialized elements and a pointer to the first element.
  • Call _allocateUninitializedArray with the integer_literal of 1 as its only argument

 

  %74 = tuple_extract %73 : $(Array, Builtin.RawPointer), 0, loc "input.swift":5:14, scope 3 // user: %80
  %75 = tuple_extract %73 : $(Array, Builtin.RawPointer), 1, loc "input.swift":5:14, scope 3 // user: %76
  %76 = pointer_to_address %75 : $Builtin.RawPointer to [strict] $*Int, loc "input.swift":5:14, scope 3 // user: %79
  • The returned value in %73 is a tuple, so tuple_extract extracts the first value (the uninitialized Array) into %74 and the second value (the pointer) into %75.
  • pointer_to_address %75 is an unchecked conversion of the pointer %75 into an address.

 

  %77 = integer_literal $Builtin.Int64, 5, loc "input.swift":5:14, scope 3 // user: %78
  %78 = struct $Int (%77 : $Builtin.Int64), loc "input.swift":5:14, scope 3 // user: %79
  store %78 to %76 : $*Int, loc "input.swift":5:14, scope 3 // id: %79
  • Create an integer_literal with value 5, then create an $Int struct with that literal.
  • Store that $Int struct at the start of the uninitialized array.
  • The array %74 is now initialized!

 

  %80 = apply %69(%74, %70) : $@convention(method)  (@owned Array, @thin Array.Type) -> @owned Array, loc "input.swift":5:14, scope 3 // user: %82
  • %69 is a reference to the Array.init(arrayLiteral : [A]...) -> [A]  function. We’re calling it on these arguments:
    • %74 is the array we just initialized
    • %70 is a reference to the metatype Array.Type.
  • %80 is now an Array initialized with the Int value 5
  %81 = alloc_stack $Array, loc "input.swift":5:13, scope 3  // users: %82, %84, %83
  store %80 to %81 : $*Array, loc "input.swift":5:13, scope 3 // id: %82
  • Allocate space on the stack for an Array, and store the array containing the Int with value 5 there.

 

  %83 = apply %68<[Int], Int, Int, CountableRange, IndexingIterator, ArraySlice, Int, Int, Int, Int, Int, Int, IndexingIterator, CountableRange, Int, Int, Int, IndexingIterator, ArraySlice, Int, Int, Int, Int, Int, Int, Int, Int>(%17, %81) : $@convention(thin)  (@inout Array, @in τ_0_0) -> (), loc "input.swift":5:10, scope 3
  • %68 is a reference to the += function. We’re calling it on these arguments:
    • %17, the address of the global Array variable
    • %81, the address on the stack where we stored the array containing the Int with value 5.

  dealloc_stack %81 : $*Array, loc "input.swift":5:15, scope 3 // id: %84
  br bb14, loc "input.swift":6:1, scope 2         // id: %85
  • Finally, we deallocate the memory for the temporary array, and branch back to the loop header

The Shared Exit Block

When the IndexingIterator is finished, the IndexingIterator.next() in the loop header will return an Optional with no value. That causes the cond_br to branch to this basic block.

bb16:                                             // Preds: bb14
  %76 = integer_literal $Builtin.Int32, 0, scope 2 // user: %77
  %77 = struct $Int32 (%76 : $Builtin.Int32), scope 2 // user: %79
  dealloc_stack %2 : $*IndexingIterator<CountableRange<Int>>, loc "input.swift":3:7, scope 2 // id: %78
  return %77 : $Int32, scope 2                    // id: %79
}
  • Create an integer_literal with value 0, then create an $Int32 struct with that literal.
  • Deallocate the memory for the IndexingIterator; we don’t need it anymore.
  • Return register %77, which is the Int32 with value 0, telling the operating system that the program exited with no errors.

The Test Case

You now know enough SIL to write a simple test case for the optimization that we’re about to implement.

// CHECK-LABEL: sil @append_contentsOf
// CHECK:   [[ACFUN:%.*]] = function_ref @arrayAppendContentsOf
// CHECK-NOT: apply [[ACFUN]]
// CHECK:   [[AEFUN:%.*]] = function_ref @_TFSa6appendfxT_
// CHECK:   apply [[AEFUN]]
// CHECK: return
sil @append_contentsOf : $@convention(thin) () -> () {

The // CHECK comments instruct the test runner to assert on a match (or the absence of a match) of the specified text, in the order that the // CHECK comments appear in. These instructions assert that the old Array.append(contentsOf:) function call has been replaced with a call to Array.append(element:). The latter call’s name has been mangled into @_TFSa6appendfxT_ because it comes from the standard library. In real code, the former call’s name would be mangled too, but because our optimization will only look at the semantic attribute of the function, we can use a more readable function name in testing.
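One practical note: the test file also needs a RUN line at the top telling the lit test runner how to execute it. In the real test suite this looks something like the following (treat the exact flags as illustrative; the pass name comes from the optimizer’s pass registry):

// RUN: %target-sil-opt -enable-sil-verify-all %s -array-element-propagation | %FileCheck %s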

  %0 = function_ref @swift_bufferAllocate : $@convention(thin) () -> @owned AnyObject
  %1 = integer_literal $Builtin.Int64, 1
  %2 = struct $MyInt (%1 : $Builtin.Int64)
  %3 = apply %0() : $@convention(thin) () -> @owned AnyObject
  %4 = metatype $@thin Array.Type
  %5 = function_ref @arrayAdoptStorage : $@convention(thin) (@owned AnyObject, MyInt, @thin Array.Type) -> @owned (Array, UnsafeMutablePointer)
  %6 = apply %5(%3, %2, %4) : $@convention(thin) (@owned AnyObject, MyInt, @thin Array.Type) -> @owned (Array, UnsafeMutablePointer)
  %7 = tuple_extract %6 : $(Array, UnsafeMutablePointer), 0
  %8 = tuple_extract %6 : $(Array, UnsafeMutablePointer), 1
  %9 = struct_extract %8 : $UnsafeMutablePointer, #UnsafeMutablePointer._rawValue
  %10 = pointer_to_address %9 : $Builtin.RawPointer to [strict] $*MyInt
  %11 = integer_literal $Builtin.Int64, 27
  %12 = struct $MyInt (%11 : $Builtin.Int64)
  store %12 to %10 : $*MyInt

Creates an array with one element. We’re not writing this SIL exactly the way that it would be generated on macOS because the test would fail on Linux, where Objective-C bridging doesn’t exist.

  %13 = alloc_stack $Array
  %14 = metatype $@thin Array.Type
  %15 = function_ref @arrayInit : $@convention(method) (@thin Array.Type) -> @owned Array
  %16 = apply %15(%14) : $@convention(method) (@thin Array.Type) -> @owned Array
  store %16 to %13 : $*Array

Creates an empty array, and stores it on the stack. It needs to be on the stack because it’s the self parameter to Array.append(contentsOf:).

  %17 = function_ref @arrayAppendContentsOf : $@convention(method) (@owned Array, @inout Array) -> ()
  %18 = apply %17(%7, %13) : $@convention(method) (@owned Array, @inout Array) -> ()

Gets a reference to Array.append(contentsOf:) and calls it with the two arrays as arguments.

  dealloc_stack %13 : $*Array
  %19 = tuple ()
  return %19 : $()
}

Cleans up the stack, and returns void, which is represented by an empty tuple in SIL.

The Optimization Pass

There are two types of SIL: Raw SIL, and Canonical SIL. The Swift compiler produces Raw SIL when it lowers the AST. Raw SIL might not be semantically valid. A series of guaranteed optimization and diagnostic passes is run on Raw SIL, and that produces Canonical SIL, which is guaranteed to be semantically valid. Functions are specialized and inlined during this transformation, but we can delay the inlining of a function by adding a semantic attribute to it:

@_semantics("sometext")

Semantic attributes instruct the Swift compiler to optimize code differently. They can be used to disable an optimization (like inlining), or to force an optimization. They’re used extensively in the Swift standard library, where hand-tuning how code is optimized is important.

Array methods are defined in stdlib/public/core/Arrays.swift.gyb, where gyb stands for “Generate Your Boilerplate”. We’ll add a semantic attribute to Array.append(contentsOf:), since that’s the function call that exists after Array’s operator+= function call is inlined. We’ll then modify include/swift/SILOptimizer/Analysis/ArraySemantic.h and its implementation file so that it’s easier to work with the new semantic attribute. Lastly, we’ll add the new semantic attribute to lib/SILOptimizer/LoopTransforms/COWArrayOpt.cpp to squash some warnings about non-exhaustive switch statements.
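The stdlib change itself is small. It looks roughly like this (simplified from Arrays.swift.gyb, with the body and the gyb templating elided):

@_semantics("array.append_contentsOf")
public mutating func append<C : Collection>(contentsOf newElements: C)
    where C.Iterator.Element == Element {
    // existing implementation, unchanged
}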

Next, we need to write code that finds initializations of array literals, finds uses of those array literals in Array.append(contentsOf:) calls, and then proves that the arrays weren’t modified and didn’t escape between those two points. Thankfully, there’s a pass that already does something very similar: ArrayElementValuePropagation. That optimization converts code like this:

let a = [1, 2, 3];
let b = a[0] + a[2];

Into code like this:

let a = [1, 2, 3];
let b = 1 + 3;

It’s close enough to the optimization that we want to perform, so we’ll modify this pass to do some additional work, rather than create an entirely new pass.
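To get a feel for the “not modified or escaped” requirement, here’s an illustrative pair of cases (hypothetical code, not from the actual test suite; someFunction is a stand-in):

var a = [Int]()

// Replaceable: the literal [5] is built, consumed by +=, and never
// observed anywhere else.
a += [5]

// Not replaceable: tmp escapes to another function, so the temporary
// array must actually be constructed.
let tmp = [5]
someFunction(tmp)
a += tmp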

The “main method” of an optimization pass is void run(), and we’re working in a SILFunctionTransform because we’re transforming function bodies. Here’s a walkthrough of the modified run() method.

  void run() override {
    auto &Fn = *getFunction(); // Get a reference to the function we're in

    // Store information about the calls that we want to replace
    llvm::SmallVector<ArrayAllocation::GetElementReplacement, 16>
      GetElementReplacements;
    llvm::SmallVector<ArrayAllocation::AppendContentsOfReplacement, 16>
      AppendContentsOfReplacements;

    // Iterate through the basic blocks in the function
    for (auto &BB : Fn) {
      // Iterate through the instructions in the basic blocks
      for (auto &Inst : BB) {
        // Filter for only apply instructions
        if (auto *Apply = dyn_cast<ApplyInst>(&Inst)) {
          // This is a helper class that tells us if an apply instruction is an array literal allocation
          // and simplifies getting the elements in that literal
          ArrayAllocation ALit;
          if (ALit.analyze(Apply)) {
            ALit.getGetElementReplacements(GetElementReplacements);
            // We call our helper method that collects the append(contentsOf:) calls we can replace
            ALit.getAppendContentOfReplacements(AppendContentsOfReplacements);
          }
        }
      }
    }

    bool Changed = false;
    
    // This was already in this optimization.
    for (ArrayAllocation::GetElementReplacement &Repl : GetElementReplacements) {
      ArraySemanticsCall GetElement(Repl.GetElementCall);
      Changed |= GetElement.replaceByValue(Repl.Replacement);
    }

    // We'll just add this call to our helper function here
    Changed |= replaceAppendCalls(AppendContentsOfReplacements);

    // You need to invalidate the analysis if you've changed something in an optimization
    if (Changed) {
      PM->invalidateAnalysis(
          &Fn, SILAnalysis::InvalidationKind::CallsAndInstructions);
    }
  }

Now let’s take a look at the replacement helper method.

  bool replaceAppendCalls(
                  ArrayRef<ArrayAllocation::AppendContentsOfReplacement> Repls) {
    auto &Fn = *getFunction();
    auto &M = Fn.getModule();
    auto &Ctx = M.getASTContext();

    if (Repls.empty())
      return false;

    DEBUG(llvm::dbgs() << "Array append contentsOf calls replaced in "
                       << Fn.getName() << " (" << Repls.size() << ")\n");

    // Get a reference to the stdlib's Array.append(element:), the function
    // we want to call instead of Array.append(contentsOf:).
    auto *AppendFnDecl = Ctx.getArrayAppendElementDecl();
    if (!AppendFnDecl)
      return false;
    auto Mangled = SILDeclRef(AppendFnDecl, SILDeclRef::Kind::Func).mangle();
    auto *AppendFn = M.findFunction(Mangled, SILLinkage::PublicExternal);
    if (!AppendFn)
      return false;

    for (ArrayAllocation::AppendContentsOfReplacement &Repl : Repls) {
      ArraySemanticsCall AppendContentsOf(Repl.AppendContentsOfCall);
      assert(AppendContentsOf && "Must be an append(contentsOf:) call");

      // Work out the substitutions needed to call the generic append
      // function on this concrete Array type.
      SILType ArrayType = Repl.Array->getType();
      auto *NTD = ArrayType.getSwiftRValueType()->getAnyNominal();
      SubstitutionMap ArraySubMap = ArrayType.getSwiftRValueType()
        ->getContextSubstitutionMap(M.getSwiftModule(), NTD);

      GenericSignature *Sig = NTD->getGenericSignature();
      assert(Sig && "Array type must have generic signature");
      SmallVector<Substitution, 4> Subs;
      Sig->getSubstitutions(ArraySubMap, Subs);
      
      // Finally we call the helper function we added to ArraySemanticsCall
      // to perform the actual replacement.
      AppendContentsOf.replaceByAppendingValues(M, AppendFn,
                                                Repl.ReplacementValues, Subs);
    }
    return true;
  }

A Wild Benchmark Regression Appears

I thought I was done at this point, but although my optimization was working, it had caused other benchmarks to regress. Not good. I had to isolate the problem by comparing the benchmark SIL generated from tip-of-tree swiftc versus my modified branch. I noticed that the function calls in my modified branch were unspecialized, and that was causing the slowdown.

The problem was that by adding semantic attributes to array.append_element and array.append_contentsOf, we delayed their inlining. Those functions contain calls to other generic functions, and the generic specializer was counting on the parent function being inlined so that it could specialize the function calls inside of them.

One solution that I considered was to add another inlining stage to the performance inliner transform. However, it seemed silly to add an entire inlining stage for just two functions.

Another solution I considered was to add a different type of semantic tag that’s only used for identifying functions, and not for changing the optimizer’s behavior.

Ultimately, eeckstein got this pull request across the finish line by special-casing the inlining of the two Array functions that I had added semantic attributes to.

Lessons Learned

It’s very difficult to casually contribute to swiftc. It took me two months to get my PR merged, and it felt like a part-time job while I was working on it. I don’t think I would make another contribution like this unless I was being paid to do it, and had an easier way to communicate with the very helpful Apple folks than the mailing list.

I switched to a 12” MacBook before I started working on my swiftc PR. It was so slow that I was only able to iterate on the code once a day, because a single compile and test run would take all night. I ended up buying a top-of-the-line 15” MacBook Pro because it was the only way to iterate on the codebase more than once a day.

Even on the new MacBook Pro, a full integration test run takes over four hours. This is a problem because only committers to the Swift repo have permissions to run tests on their CI infrastructure. Swift might be open-source, but if contributors need to drop thousands on a machine in order to work on it, it’s not going to have very many contributors.

I was one of the top 100 contributors to Swift after making my humble pull request, and most of the files I was working on had only ever been touched by ten or so people. That isn’t great for such a popular language, and I hope that changes.

Swift’s developer workflow is slow and clunky. If you switch branches, you have to wait for most of the files to be recompiled, which can take 15 or more minutes even on a beefy machine. I ended up keeping multiple Swift repositories around (20gb each!) so I could “switch branches” more easily as I was diffing the SIL from different versions of the compiler. I had to write some bash scripts to manage this mess.

Swiftc has a very steep learning curve. There is very little documentation, and the code isn’t very well commented. It’s a huge project, so even a modest change like mine required me to talk to multiple subject-matter experts. And even with a bunch of helpful Apple folks working with you…

It’s really easy to break swiftc because of how complex it is. My original pull request was approved and merged in a month. Despite only having about 200 lines of changes, I received 125 comments from six reviewers. Even after that much scrutiny, it was reverted almost immediately because it introduced a memory leak that a seventh person found after running a four hour long standard library integration test.

It was a fluke that that test caught the memory leak at all. The test was written in Swift, and my optimization broke the memory leak assertion in that test. In other words, my optimization caused a memory leak in the test itself, not the code that the test was testing. That would have been a serious bug if it had made it into production, because this optimization would have introduced memory leaks into Swift programs compiled with the buggy swiftc.

The optimization pipeline is also really easy to break. The order that passes run in, and how they are configured, is extremely important, yet not very well documented. There’s a lot of tacit knowledge in there, and it’s not obvious to me when a solution is hacky or acceptable. Special-casing the inlining of those two methods in code still seems hacky to me, but I don’t have as much context as the Apple folks do.

If you’re going to contribute to swiftc, be sure to run all the tests and benchmarks. The standard library integration test is not documented in the README, but you should run that too. Sometimes this is hard. The benchmark runner wasn’t working while I was working on this PR, so I had to write my own benchmarking script. Benchmarks on Apple’s CI were working, but that doesn’t help me when only committers have access to that.

Three Ideas For Improvement

I think that Swift has great potential as an open-source project, but I just don’t see how the majority of people can contribute to it right now.

  • Anyone, not just committers, must be able to run tests and benchmarks on CI.
  • There needs to be a lower latency way to get help from Apple employees than the mailing list.
  • The developer workflow needs to be improved. Switching between branches and recompiling should be cheap. Scripts like the benchmark runner should be well maintained.

I glossed over some parts of what it took to get this optimization to work, but this post is already far too long. I may come back later to fill in the gaps, but in the meantime you can find the final version of this pull request here.

Algebraic Data Types in Swift

An algebraic data type is a type that’s the union of other data types. That is, it’s a type that may be one of several other types. Here’s how we would implement a linked list as an algebraic data type in Swift:

enum LinkedList<Element> {  
    case empty
    indirect case node(data: Element, next: LinkedList)
}

This defines an enum called LinkedList that might either be .empty or a .node that points to another LinkedList. There are three interesting things to note. The first is that we’ve created a generic data type, so the type of Element is declared by the consumer of LinkedList. The second is that the .node case uses LinkedList recursively, and must therefore be marked with indirect. The third is that since the .empty case has no parameters, the parentheses may be omitted.

Here’s how we define instances of LinkedList:

let a: LinkedList<Int> = .empty  
let b: LinkedList<Int> = .node(data: 1, next: .node(data: 2, next: .empty))  

To work with an algebraic data type, we deconstruct it using pattern matching. Here’s how we would print a LinkedList:

enum LinkedList<Element>: CustomStringConvertible {  
    // cases omitted for brevity
    var description: String {
        switch self {
        case .empty:
            return "(end)"
        case let .node(data, next):
            return "\(data), \(next.description)"
        }
    }
}

let b: LinkedList<Int> = .node(data: 1, next: .node(data: 2, next: .empty))

print("b: (b)") // => b: 1, 2, (end)  

We’ve implemented the CustomStringConvertible protocol so that we can use string interpolation to print LinkedList instances. While it is possible in Swift to pattern match using an if case statement (sketched below), the switch is preferable because the compiler will warn us if we’ve forgotten to handle a case. This safety is one big advantage that algebraic data types have over their classical counterparts. In a traditionally implemented linked list, you would have to remember to check if the next pointer was null to know if you were at the end. This problem gets worse as the number of cases increases in more complex data structures, such as full binary range trees with sentinels.
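Here’s what the if case form looks like, for comparison (a quick sketch):

if case let .node(data, _) = b {
    print(data) // runs only when b is a .node; no warning if we never consider .empty
}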

Note that since the description instance variable only has a getter, we do not need to use the more verbose syntax:

var description: String {  
    get {
        // etc
    }
    set {
        // etc
    }
}

Our print function was useful, but in order to do interesting things we need to be able to modify algebraic data types. Rather than mutate the existing data structure, we’ll return a new data structure that represents the result after the requested operation. Since we’re not going to mutate the original data structure, we’ll follow the Swift 3 naming convention of using a gerund for our methods. Here’s how we would add an .inserting method to LinkedList:

enum LinkedList<Element>: CustomStringConvertible {  
    // cases and "description" omitted for brevity
    func inserting(e: Element) -> LinkedList<Element> {
        switch self {
            case .empty:
            return .node(data: e, next: .empty)
            case .node:
            return .node(data: e, next: self)
        }
    }
}

let c = b.inserting(e: 0)  
print("c: (c)") // => c: 0, 1, 2, (end)  

The key is that we’re returning a new LinkedList that represents the result after the insertion. Notice how in the .node case, we do not need to pattern match on .node(data, next) because data and next are not needed in order to construct the new .node; we can simply use self as the next: node.

Finally, let’s implement the classic “reverse a linked list” interview question using our algebraic data type:

enum LinkedList<Element>: CustomStringConvertible {  
    // cases, "description", and "insert" omitted for brevity
    func appending(_ e: Element) -> LinkedList<Element> {
        switch self {
            case .empty:
            return .node(data: e, next: .empty)
            case let .node(oldData, next):
            return .node(data: oldData, next: next.appending(e))
        }
    }

    func reversed() -> LinkedList<Element> {
        switch self {
            case .empty:
            return self
            case let .node(data, next):
            return next.reversed().appending(data)
        }
    }
}

print("reversed c: (c.reversed())") // => reversed c: 2, 1, 0, (end)  

I’ll leave it as an exercise to the reader to figure out what the running time for this algorithm is.

Here’s the Playground on GitHub

Isomorphisms Explained

I haven’t seen a good introduction to isomorphisms since they came into vogue last year, so I decided to write one.

Here goes!

A Parallel Postulate

Try defining “distance”. Most of us would say something like “how far two things are from each other”. Unfortunately, that makes answering a question like “how far is Alice from Bob” very difficult because there are several perfectly valid answers: Ten meters. One thousand centimeters. If Alice is ten meters away from Bob, is Bob negative ten meters away from Alice? Can I measure a zig-zag line from Alice to Bob?

To avoid ambiguity, a mathematician would use a definition like “Distance is the minimum number of one meter rulers, laid end-to-end, needed to touch both A and B”. It’s interesting how fragile yet robust these definitions are. Remove any piece of the definition and it falls apart. The word “minimum” ensures that the shortest path from A to B is measured. The use of one meter rulers ensures that distance is always an integer (no decimals or fractions). The answer will always be a positive number because you can’t have a negative number of rulers.

Now let’s change our perspective. We zoom really far out and see that Alice and Bob are on opposite sides of the Earth. Well, now “distance” depends on whether we are allowed to lay rulers through the center of the Earth. What if Alice and Bob were on the surface of a balloon? If we defined distance as the smallest angle between them from the center of the balloon, we’d have a unique answer even if the balloon inflates or deflates.

Here’s the main takeaway: the word “distance” doesn’t have any inherent meaning. Its meaning depends on our perspective, and if we are creative enough with our perspective, multiple meanings can exist at the same time. Keep this idea in mind, as it will return in the conclusion of this post.

I’m going to explain three types of isomorphism in this post, in order of complexity. The first is Graph Isomorphism. The second is Group Isomorphism. The third is Gödel Numbering.

Graph Isomorphism

I find that I learn best when the same thing is explained in at least two different ways. We’re going to start with graph isomorphisms because we can approach them both visually and theoretically.

A graph is a tuple (V, E) of a set of vertices V and a set of edges E. An edge is a tuple (Vi, Vj) where Vi and Vj are vertices. The vertices Va and Vb are adjacent if and only if there exists an edge (Va, Vb).

The graphs G and H are isomorphic if there is an isomorphism between them. An isomorphism between G and H is a bijection that maps vertices in G to vertices in H such that if two vertices are adjacent in G, the vertices that they are mapped to are also adjacent in H.

It’s easier to understand this with pictures. Two graphs are isomorphic if you can arrange their vertices and edges in such a way that they look exactly the same.

[Figures 1.1 and 1.2: two drawings of a four-vertex graph]

Figures 1.1 and 1.2 are isomorphic. You can verify this visually by rotating the second figure ninety degrees clockwise and verifying that it looks exactly the same as the first figure. The other way is to define a function f(x) that meets the strict requirements we laid out above. Here is such a function f(x):

f(1) = d
f(2) = b
f(3) = c
f(4) = a

We now verify that this function meets our requirements. The first rule is that f(x) must be a bijective function. Bijective functions are also called bijections. A bijection is both injective and surjective. Injective functions map every member of their input set to a unique member of their output set. Surjective functions map at least one member of their input set to every member of their output set. Take a moment to understand this with the help of the following diagram, and then convince yourself that f(x) is a bijection.

[Figure: injective, surjective, and bijective functions]

The second and final rule is that vertices that were adjacent in 1.1 must be adjacent in 1.2 after going through f(x). For succinctness, we’ll write “Vertex 1 is adjacent to vertex 2 and vertex 3” as 1 ADJ 2, 3. Convince yourself that f(x) preserves adjacency with the help of the table below.

1 ADJ 2, 3    => d ADJ b, c
2 ADJ 1, 3    => b ADJ d, c
3 ADJ 1, 2, 4 => c ADJ d, b, a
4 ADJ 3       => a ADJ c

That’s all there is to it. Simple? Not quite. Can you tell if these two graphs are isomorphic?

[Figures 1.3 and 1.4]

As it turns out, it’s pretty difficult to figure out if two graphs are isomorphic once they get large enough. How about these two?

[Figures 1.5 and 1.6]

There is no easy way to tell if two graphs are isomorphic. Figures 1.3 and 1.4 are isomorphic, but look nothing alike. Counting degrees (the number of edges each vertex has) does not work either — figures 1.5 and 1.6 both have degrees 2, 2, 3, 3, yet they are not isomorphic. Why is that? In order to prove that two graphs are not isomorphic, you need to show that no isomorphism could possibly exist between them. This is easy if their vertex degrees do not match, but as we just saw, it is possible for vertex degrees to match when no isomorphism is possible.
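If you’d like to experiment, here’s a naive isomorphism checker in Swift (my own sketch, emphatically not an efficient algorithm): it tries every bijection between the vertex sets, which is exactly why this approach becomes hopeless as graphs grow.

// Graphs as adjacency sets: vertex -> set of adjacent vertices.
typealias Graph = [Int: Set<Int>]

func permutations<T>(_ xs: [T]) -> [[T]] {
    if xs.count <= 1 { return [xs] }
    return xs.indices.flatMap { i -> [[T]] in
        var rest = xs
        let x = rest.remove(at: i)
        return permutations(rest).map { [x] + $0 }
    }
}

// Try every bijection f; accept if some f preserves adjacency exactly.
// O(n!) — brute force.
func areIsomorphic(_ g: Graph, _ h: Graph) -> Bool {
    let gv = Array(g.keys), hv = Array(h.keys)
    guard gv.count == hv.count else { return false }
    return permutations(hv).contains { perm in
        let f = Dictionary(uniqueKeysWithValues: zip(gv, perm))
        return g.allSatisfy { (u, neighbors) in
            Set(neighbors.map { f[$0]! }) == h[f[u]!]!
        }
    }
}

// Figures 1.1 and 1.2, using 1/2/3/4 in place of a/b/c/d in the second graph.
let g1: Graph = [1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]]
let g2: Graph = [4: [2, 3], 2: [4, 3], 3: [4, 2, 1], 1: [3]]
print(areIsomorphic(g1, g2)) // true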

For more about graph isomorphism, I recommend this blog post by Jeremy Kun.

Group Isomorphism

Groups are a tuple (S, P) where S is a set of things and P is a function P(a, b) that combines things a and b in S to produce another element of S. Two groups (G, *) and (H, +) are isomorphic if there is an isomorphism between them. The * and + are just symbols for operations on elements in the groups. They don’t necessarily mean multiplication or addition, though it is less confusing to choose a symbol that makes sense for whatever you’re doing.

The rules for group isomorphisms look similar to those for graph isomorphisms:

  1. There is a bijection P that maps elements in G to elements in H
  2. P preserves group operations P(a * b) = P(a) + P(b)

There isn’t a great way I know of to visualize this, so we’ll work through a few examples to understand what’s going on.

Degrees and Radians

Let’s show that angles in degrees and angles in radians are isomorphic under addition. Define the function P(d) = d * PI / 180. This function converts angles in degrees to angles in radians. We’re going to show that it’s an isomorphism. First we need to show that it is a bijection using a proof by contradiction. I’m only going to do this part of the proof for this example because it is quite lengthy and fairly obvious.

Show That P Is Injective

  1. Assume that P is not injective. That implies that at least one of the following is true:
    1. P is not defined for some input. This is not possible since P accepts all real numbers as input, and never divides by zero.
    2. P maps two inputs to the same output. Assume that this is possible. This implies that there exists some x and y where x != y such that P(x) = P(y). Substitute to get x * PI / 180 = y * PI / 180. Simplify to get x = y. This is a contradiction, so P cannot map two inputs to the same output.
  2. Since neither of the above is true, P must be injective.

Show That P Is Surjective

Assume that P is not surjective. That implies that there is a real number y such that there is no real number x where P(x) = y. Substitute to get x * PI / 180 = y. Rearrange to get x = y * 180 / PI. We now have a function C(y) that always returns a real number x satisfying P(x) = y for all y, which contradicts our assumption. P must therefore be surjective.

Show That P Preserves Group Operations

We need to show that P(a + b) = P(a) + P(b), where + stands for addition.

P(a + b) = (a + b) * PI / 180
         = (a * PI / 180) + (b * PI / 180)  // via the distributive property
         = P(a) + P(b)  // substitute back into definition of P

Phew, that’s it! We’ve shown that angles in degrees and angles in radians are isomorphic under addition.
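Here’s a quick numeric spot check of that property in Swift (not a proof, just a sanity check, up to floating-point rounding):

// P(a + b) == P(a) + P(b)
func toRadians(_ degrees: Double) -> Double { return degrees * .pi / 180 }

let a = 30.0, b = 45.0
print(toRadians(a + b))            // 1.3089969...
print(toRadians(a) + toRadians(b)) // 1.3089969...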

Odds And Evens

Are odd integers isomorphic to even integers under addition? Let’s give it a shot with this function: P(x) = x + 1. As you can see, given an odd number, it returns an even number. To show that it is an isomorphism, we first need to show that it is a bijection. You can do this now, or just take my word for it so we can move on.

Let’s look at whether this function preserves addition.

P(1 + 3) = P(4) = 5
P(1) + P(3) = 2 + 4 = 6

Oops, we’ve found a counterexample that violates rule number two. Can you think of a different way of mapping the odds to the evens that will preserve addition? Is there an operator besides addition that would work with the same mapping function?

Gödel Numbering

This is a very special sort of isomorphism that links formal systems with number theory. Formal systems are systems of thought. They have an alphabet, a grammar, a set of axioms, and a set of inference rules.

Formal systems let you encode statements like “5 is prime” in strings like ~Ea:Eb:(SSa*SSb)=SSSSS0, and prove the statement by using strict rules to manipulate strings. We’re going to start with a simple formal system and work our way up.

The EASY System

The EASY system is so named because it has four symbols in its alphabet: E, A, S, and Y. The grammar for this system says that “All Es appear before As, all As appear before Ss, and all Ss appear before Ys”. In other words, they appear in the order EASY.

A formula is a string made with the formal system’s alphabet. These are strings: EASS, SSSS, EY. These are not strings: XKCD, KISS, JUMP.

A formula is well-formed if it obeys the formal system’s grammar. ESYY and AAA are well-formed formulas. YES and EYES are not.

Let’s introduce some axioms and inference rules. Axioms are well-formed formulas that form the basis of all other theorems of the system. The EASY system has one axiom, A. It also has the following inference rules:

  1. A => AYY
  2. AYYY => S
  3. AS => E
  4. SY => EA

Using these inference rules, we can derive more theorems from the axioms:

A
AYY (Rule 1)
AYYYY (Rule 1)
AYYYYYY (Rule 1)
SYYY (Rule 2)
EAYY (Rule 4)
EAYYYY (Rule 1)
EASY (Rule 2)

All of these formulas are theorems because they can be derived from the formal system’s axioms using its inference rules. Note that formulas might not be well-formed, and well-formed formulas might not be theorems. This distinction is important! EY is a well-formed formula of the EASY system, but is it a theorem? Try deriving it from the axiom using the inference rules. How can you be sure that it isn’t a theorem?

The question “is EY a theorem of the EASY system” can be generalized to “is X a theorem of the Y system”. As Y gets more powerful, the questions we can ask get more interesting. The EASY system does not appear to do anything remotely useful because it is not isomorphic to something else we already know about. This is important: meaning is something that humans attach to formal systems. Formal systems are rather dumb systems of thought based on mechanically manipulating strings using pedantic rules. Let’s look at a formal system that we can attach some meaning to.

Arithmetic

Before we can express statements like “5 is prime” we need to be able to do some basic math. Let’s create a new formal system, the POES system.

The alphabet of our new system is S, 0, p, e.

The grammar of this system is {x}p{y}e{z}. The placeholders that look like {x} represent any string of zero or more S’s followed by a 0.

The axiom of this system is 0p0e0

The rules of our system are:

1. {x}p{y}e{z}  => S{x}p{y}eS{z}

2. {x}p{y}e{z}  => {x}pS{y}eS{z}

Now let’s derive some theorems from the axiom:

0p0e0
S0p0eS0 (Rule 1)
S0pS0eSS0 (Rule 2)

This formal system looks nothing like the EASY system, but it works the exact same way. You have a set of axioms, and you can derive theorems from those axioms by applying some rules.

Although it is based on the same principles as the EASY system, the POES system is more interesting because it happens to be isomorphic to a tiny part of number theory. Here’s how you can interpret theorems of this system:

  • 0 means zero
  • S means “successor of” (so SSSSS0 means five)
  • p means “plus”
  • e means “equals”

An example:

0p0e0      =>  0+0=0
S0p0eS0    =>  1+0=1
S0pS0eSS0  =>  1+1=2

If you were given a string S0p0e0, would you be able to tell if it was a theorem of the system? It fits the grammar of the system, but in order to tell if it is a theorem, we need to show that we can derive it from the axioms. Now that you know that this system models addition, you might be tempted to say “no way, because 1+0 does not equal 0”. However, that would be using human intuition to work outside the system. You can’t jump to conclusions in a formal system; you have to mechanically prove that a string is a theorem by starting from an axiom and applying rules until you get the same string.

Is it possible to show that a string is not a theorem of a formal system?
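
For the POES system, it turns out we can: both rules make a string strictly longer (each adds exactly two symbols), so no derivation that grows past the target can ever come back to it. That makes a bounded search a full decision procedure. A sketch, with hypothetical names again:

  // Decide whether `target` is a theorem of the POES system. Both rules
  // lengthen a string, so we stop exploring once a string reaches the
  // target's length.
  func isTheorem(_ target: String) -> Bool {
      // Split a well-formed {x}p{y}e{z} string into its three parts.
      func parts(_ s: String) -> (x: String, y: String, z: String)? {
          let a = s.split(separator: "p", maxSplits: 1)
          guard a.count == 2 else { return nil }
          let b = a[1].split(separator: "e", maxSplits: 1)
          guard b.count == 2 else { return nil }
          return (String(a[0]), String(b[0]), String(b[1]))
      }
      var frontier = ["0p0e0"]                  // the axiom
      while let s = frontier.popLast() {
          if s == target { return true }
          guard s.count < target.count, let (x, y, z) = parts(s) else { continue }
          frontier.append("S\(x)p\(y)eS\(z)")   // Rule 1
          frontier.append("\(x)pS\(y)eS\(z)")   // Rule 2
      }
      return false
  }

  isTheorem("S0pS0eSS0")   // true:  this is 1+1=2
  isTheorem("S0p0e0")      // false: the search exhausts without reaching it

The EASY system doesn’t give us this shortcut, because rules 2 and 3 shrink strings; that is why the bounded search above could only gather evidence about EY rather than settle the question.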

Logic

In order to express a statement like “five is prime”, we need a more powerful formal system. One such system is Typographical Number Theory, or TNT. Explaining the entire system is beyond the scope of this post, but I’ll explain how to read this example string:

~Ea:Eb:(SSa*SSb)=SSSSS0

From the left: ~ means “not”. E{x}: means “there exists {x}”. a and b are simply variables in the statement. * means “multiply”, and the parentheses () are just for grouping.

Put together, this string says “There do not exist a and b such that (2 + a) * (2 + b) = 5”. In other words, five is prime.

It bears repeating that strings by themselves have no meaning. We find meaning in strings through isomorphism. The meaning in TNT comes from propositional calculus and number theory. However, you do not need to know any of that to work with this formal system.

It takes practice to turn statements like “16 is a power of 2” into TNT, but it can be done. The more complex the thought, the larger the resulting string. For example, you could try to prove Fermat’s Last Theorem by turning it into a gigantic string of TNT and attempting to derive it from the axioms.
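
For statements that follow a pattern, though, the translation can be automated. Here is a small Swift sketch (the function name is my own) that produces the primality claim for any n, generalizing the string we read above; the claim only means “n is prime” for n >= 2:

  // Build the TNT string claiming "n is prime": there do not exist a and b
  // such that (2 + a) * (2 + b) = n. Only meaningful for n >= 2.
  func primeClaim(_ n: Int) -> String {
      "~Ea:Eb:(SSa*SSb)=" + String(repeating: "S", count: n) + "0"
  }

  primeClaim(5)   // "~Ea:Eb:(SSa*SSb)=SSSSS0"

Generating the string is the easy part; showing that it is a theorem means deriving it from TNT’s axioms, which is another matter entirely.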

Here is Ray Toal’s summary of TNT if you would like further reading.

Gödel Numbering

We’re finally ready to understand a very special type of isomorphism. Earlier we saw how otherwise meaningless strings like S0pS0eSS0 can acquire meaning through isomorphism. Now we’ll see how these strings can acquire an additional layer of meaning through a second isomorphism.

Computers store strings by encoding them into numbers. One popular encoding is ASCII. In ASCII, the lowercase letter a is encoded as the number 97. Assuming that we stay within the ASCII character set, we can encode any string into a number. Here is how the string S0pS0eSS0 can be encoded as the number 83,048,112,083,048,101,083,083,048:

S   0   p   S   0   e   S   S   0
083 048 112 083 048 101 083 083 048
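
This encoding is a one-liner in Swift. It returns the digits as a string, since Gödel numbers overflow fixed-width integers almost immediately:

  import Foundation

  // Encode each character as its zero-padded, three-digit ASCII code.
  func godelNumber(_ s: String) -> String {
      s.unicodeScalars.map { String(format: "%03d", $0.value) }.joined()
  }

  godelNumber("S0pS0eSS0")   // "083048112083048101083083048"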

Recall that we derived this string from an axiom of the POES system using inference rules. I’ve reproduced the proof below, this time with the ASCII encoding of each step.

0p0e0       048,112,048,101,048
S0p0eS0     083,048,112,048,101,083,048
S0pS0eSS0   083,048,112,083,048,101,083,083,048

Hey, this looks like the beginning of an isomorphism! We have two sets of things, and a way of mechanically converting between them. Specifically, we have a bijection between strings of a formal system and natural numbers.

To complete the isomorphism, we also need to convert our inference rules so that they work on numbers.

Gödel’s key insight was that each step of a derivation in a formal system is isomorphic to some sequence of arithmetic operations. You can multiply by ten to shift a string’s digits to the left, or divide by ten to shift them to the right. You can add to insert new characters, and subtract to remove old ones.

For example, one of the inference rules of the EASY system was A => AYY. If we encode A as 1 and Y as 2, then this rule could be expressed as “If x is equal to 1, and x is an EASY number, then x*100 + 2*10 + 2 is an EASY number”.
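
Here is that idea played out in Swift on the derivation from earlier. Let’s also encode S as 3 and E as 4; every step is then nothing but multiplying, dividing, and adding:

  var x = 1                                // A        (the axiom, encoded)
  x = x * 100 + 22                         // AYY      (Rule 1 appends YY)
  x = x * 100 + 22                         // AYYYY    (Rule 1)
  x = x * 100 + 22                         // AYYYYYY  (Rule 1)
  x = 3 * 1_000 + x % 1_000                // SYYY     (Rule 2: leading 1222 becomes 3)
  x = 41 * 100 + x % 100                   // EAYY     (Rule 4: leading 32 becomes 41)
  x = x * 100 + 22                         // EAYYYY   (Rule 1)
  x = (x / 100_000) * 100 + 30 + x % 10    // ESY      (Rule 2: the inner 1222 becomes 3)
  print(x)                                 // 432, the number of the theorem ESY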

This strategy works even for more complex rules. Recall that the rules of the POES system are:

  1. {x}p{y}e{z} => S{x}p{y}eS{z}
  2. {x}p{y}e{z} => {x}pS{y}eS{z}

The trick to translating more complex inference rules is to use multiple variables and constraints. Here, j, p, and n stand for the encodings of {x}, {y}, and {z} respectively, and for simplicity we assume {y} is a single symbol (that’s what the constraint p < 10^3 enforces). Rule two can then be translated into this:

  1. If j*10^(m+9) + 112*10^(m+6) + p*10^(m+3) + 101*10^m + n is a POES number where the following is true:
    • n < 10^m
    • p < 10^3
  2. then j*10^(m+15) + 112*10^(m+12) + 83 * 10^(m+9) + p*10^(m+6) + 101*10^(m+3) + 83*10^m + n is a POES number

Let’s see this rule in action!

Input string and number:
  S0p0eS0  =>   83,048,112,048,101,083,048

Fit the components of the input formula:
                         83,048 => n
                    101,000,000 => 101*10^m where m=6
                 48,000,000,000 => p*10^(m+3) where p=48
            112,000,000,000,000 => 112*10^(m+6)
     83,048,000,000,000,000,000 => j*10^(m+9) where j=83,048

Verify that the conditions are true:
  n < 10^m   =>  83,048 < 10^6  =>  true
  p < 10^3   =>  48 < 10^3      =>  true

Create the components of the output formula:

             83,048,000,000,000,000,000,000,000 => j*10^(m+15)
                    112,000,000,000,000,000,000 => 112*10^(m+12)
                         83,000,000,000,000,000 => 83*10^(m+9)
                             48,000,000,000,000 => p*10^(m+6)
                                101,000,000,000 => 101*10^(m+3)
                                     83,000,000 => 83*10^m
                                         83,048 => n

Sum it all up to get:
   83,048,112,083,048,101,083,083,048  =>  S0pS0eSS0
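
We can verify that arithmetic in a few lines of Swift. These values overflow UInt64, so this sketch leans on Foundation’s Decimal, which holds 38 significant digits; a serious implementation would want a real big-integer type:

  import Foundation

  let j: Decimal = 83_048, p: Decimal = 48, n: Decimal = 83_048
  let m = 6

  // The rule's side conditions: n < 10^m and p < 10^3.
  precondition(n < pow(10, m) && p < 1_000)

  // Assemble the output number from its components.
  let output = j * pow(10, m + 15) + 112 * pow(10, m + 12) + 83 * pow(10, m + 9) +
               p * pow(10, m + 6) + 101 * pow(10, m + 3) + 83 * pow(10, m) + n

  print(output)   // 83048112083048101083083048, the number of S0pS0eSS0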

It takes some practice, but the key is this: we can take any formal system and talk about it using the natural numbers and simple arithmetic — even TNT! What a wonderful new type of isomorphism. What can we do with it?

A Strange Loop

Can TNT make statements about itself? For example, can we create a string of TNT that says “S is a theorem of TNT”?

Using Gödel numbering, we can! To make things easier to follow, we’ll define two operations, tnt2nt and nt2tnt. They stand for “TNT to Number Theory” and vice versa.

The operation tnt2nt represents the tedious work of converting from formal systems to number theory. For example, tnt2nt('S0p0eS0') would spit out the number 83,048,112,048,101,083,048, and tnt2nt(Rules) would spit out the arithmetic equivalents of the inference rules. Remember that axioms are strings, so you can also pass them through tnt2nt().

The operation nt2tnt represents the far more difficult task of converting statements of number theory into TNT. For example, nt2tnt('Five is a prime number') would produce ~Ea:Eb:(SSa*SSb)=SSSSS0. nt2tnt('Four is an even number') would produce Ea:SS0*a=SSSS0. We can even use nt2tnt() to turn “There exist numbers a, b, and c where c < 10^b such that a = 101*10^(b+3) + 99*10^b + c” into a massive string of TNT. Note the similarity of this string to the previous example!

With these two operations defined, we now have a way to create TNT strings that talk about TNT:

  1. S is a theorem of TNT
  2. The string S can be derived from axioms A following the rules R
  3. The number tnt2nt(S) can be produced from the numbers tnt2nt(A) with the arithmetic operations tnt2nt(R)
  4. nt2tnt('The number tnt2nt(S) can be produced from the numbers tnt2nt(A) with the arithmetic operations tnt2nt(R)')

This mind-blowing result is what happens when you have not one, but two isomorphisms working together. We’ve created a string of TNT that talks about number theory, and through number theory, talks about other TNT strings.

In Conclusion

To summarize: an isomorphism is a bijection that preserves relationships.

  • When graphs are isomorphic to each other, the bijection is between vertices, and adjacency is preserved.
  • When groups are isomorphic to each other, the bijection is between items, and group operations are preserved.
  • The POES system is isomorphic to a tiny piece of number theory, where the bijection is between strings like 0pS0eS0 and formulas like a+b=c, and the properties of addition are preserved.
  • Formal systems and number theory are isomorphic to each other, where the bijection is between strings and natural numbers, and inference rules are preserved.