High-Level SIL Optimization in the Swift Compiler

Matt Rajca discovered last year that Swift’s Array.append(element) is 6x faster than Array operator+=(collection). This is a shame, because the latter is semantically equivalent, easier to type, and more pleasing to the eye. Swift is a new language, so there is no shortage of opportunities for optimization.

I’ve enjoyed the recent uptick in “how to build a super simple compiler” blog posts, but there isn’t a lot of material for those who want to go deeper into the world of programming languages and compilers. That’s mostly why it took me so long to make this contribution to the Swift compiler, despite already having some experience working with LLVM.

This post is aimed at those who already understand the basics of compiler design and terminology, but have yet to contribute to an optimizing compiler for a popular language. It’ll be most useful for those who want to work on the high-level optimization passes in Swift.

It’s more of a reference than a hand-holdy tutorial, because if you try to write about compilers in too much detail, it might take you a lifetime, and you may never finish.

We’ll cover:

  1. How to read SIL (the bulk of this post, because it’s so important, and you’ll be reading a lot of it)
  2. How to write a SIL test case
  3. How to write an optimization pass
  4. How adding an optimization pass can make things slower

Skip to the end if you just want my thoughts on what contributing to Swift is like. This is a very long post, not meant to be read in a single sitting.

Why High-Level SIL Optimization?

Our goal is to replace arr += [5] with arr.append(5).
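To make the goal concrete, here is a minimal sketch of the two spellings side by side. The variable names are invented for illustration and the loop count is arbitrary; the only point is that both loops build the same array, which is what makes the rewrite safe.

```swift
var viaOperator = [Int]()
var viaAppend = [Int]()

for _ in 0..<1_000 {
    viaOperator += [5]   // builds a temporary one-element array, then appends its contents
    viaAppend.append(5)  // appends the element directly
}

// Both spellings produce the same result; only the cost differs.
assert(viaOperator == viaAppend)
print(viaOperator.count) // prints 1000
```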

First, we have to know where we’ll implement this optimization, and why. Prior to this, the only optimizations I had written were LLVM optimization passes, which operated on LLVM IR.

The problem with this is that function calls can be very difficult to replace after they’ve been lowered into LLVM IR. At such a low level, it’s hard to tell what instructions are part of the original function call, which significantly complicates the implementation of the optimization pass, and makes it very fragile.

The next most logical place to perform this optimization is at the AST level. But since Swift ASTs are only syntactically valid, and not yet semantically valid, we cannot safely transform them.

To solve this problem, the Swift compiler has an intermediate representation between the AST and LLVM IR, called SIL (Swift Intermediate Language).

SIL, like LLVM IR, is a Static-Single-Assignment (SSA) IR. Unlike LLVM IR, it has a richer type system, uses basic block parameters rather than phi nodes, and crucially for us, allows functions to be annotated with semantic attributes.

Reading SIL

Let’s look at the generated SIL for the Swift that we want to optimize. Compile Swift, and then emit canonical SIL.

// input.swift
var list = [Int]();

for _ in 0..<1_000_000 {
  list += [5] // we'll swap this to list.append(5) to compare later
}

swift-sources/build/Ninja-DebugAssert/swift-macosx-x86_64/bin/swiftc \
  -frontend \
  -target x86_64-apple-macosx10.9 \
  -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk \
  -Onone \
  -emit-sil \
  -o out.sil \
  input.swift

We’re only going to look at what’s happening in the main method. I’ll represent types with T, T1, T2 etc, so Array<T> means an Array of elements of type T.

The Setup

// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
  • // main this is a comment containing the name of the following method. SIL’s comments help humans make sense of it.
  • @main this method’s name is “main”. Identifier names in SIL are prefixed with the @ symbol.
  • @convention(c) the C calling convention should be used. Calling conventions specify how arguments and return values should be handled when a function is called.
  • (Int32, the 1st argument to this method is a 32-bit integer. By convention, we know that this integer is the number of arguments given to the program. The C equivalent of this is int argc
  • UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) there’s a bit to unpack here!
    • UnsafeMutablePointer<T> is a raw pointer (memory address) to something of type T.
    • Optional<T> means that you’ll get either Some value of type T, or None.
    • Int8 is an 8-bit integer. In context, this is a character (char).
    • UnsafeMutablePointer<Int8> a pointer to a character. In context, this is a character array, also known as a string.
    • Optional<UnsafeMutablePointer<Int8>> either a pointer to a string, or null.
    • Putting it all together, this is a null-terminated array of pointers to strings. Its equivalent in C is char *argv[].
  • -> Int32 this method returns a 32-bit integer. By convention, we know that this is the exit code of the program.

 

 

// %0       // user: %7
// %1       // user: %14
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  • // %0 // user: %7  a comment, noting that register %7 depends on register %0. Registers in SIL are integers prefixed with %. Because SIL, like LLVM IR, is a Static Single Assignment (SSA) IR, you can think of these registers as constants in a typical programming language; they hold a value, and once set, cannot be changed. Whenever the compiler needs a new register, it simply increments the register number. The immutable registers in SSA make optimizations easier to write. Technically, these are virtual registers; a later pass will “lower” them to use real registers on the target architecture. If we look further in the code, we see that %0 is indeed used in an instruction whose result is stored in %7. Register numbers are scoped to the method that their basic blocks live in.
  • bb0 basic block zero. A basic block is a straight-line code sequence. That means that it has no branches, except at its entry and exit.
  • (%0 : $Int32 the first parameter to the basic block is a 32-bit integer. Say that basic block bb3 has immediate predecessors bb1 and bb2. bb3 needs to refer to register %7 in bb1, or register %11 in bb2, depending on which predecessor was used to reach it. In LLVM IR, we would use a Φ (Phi) function in bb3 to “choose” between %7 or %11 and assign the chosen value to a new register. In SIL, the predecessor basic blocks bb1 and bb2 do the choosing, by passing arguments to bb3 in the branch (br) instruction.
  • %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) the second parameter, with the type already explained above.
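One way to build intuition for basic block arguments is to model each block as a function whose parameters are the block’s arguments. The sketch below is a hypothetical illustration in Swift, not compiler output: the function names mirror the bb1, bb2 and bb3 example above, and the values 10 and 20 are invented stand-ins for %7 and %11.

```swift
// bb3(%x : $Int): the block receives its value as a parameter,
// so it never needs a phi node to pick between %7 and %11.
func bb3(_ x: Int) -> Int {
    return x * 2
}

// bb1 computes its own value (standing in for %7) and passes it along: br bb3(%7)
func bb1() -> Int {
    let r7 = 10
    return bb3(r7)
}

// bb2 does the same with a different value (standing in for %11): br bb3(%11)
func bb2() -> Int {
    let r11 = 20
    return bb3(r11)
}

// The entry block picks a predecessor, much like a cond_br would.
func entry(_ condition: Bool) -> Int {
    return condition ? bb1() : bb2()
}

print(entry(true), entry(false)) // prints "20 40"
```

The choice between the two values is made by whichever predecessor runs, which is exactly the job a phi node does from the other side in LLVM IR.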

 

 

%2 = alloc_stack $IndexingIterator<CountableRange<Int>>, var, name "$i$generator", loc "input.swift":3:7, scope 2 // users: %78, %53, %59
  • alloc_stack T Allocate (uninitialized) memory on the stack to contain T, and return the address of the allocated memory.
  • $IndexingIterator<CountableRange<Int>> The type of iterator that we are allocating memory for. Types in SIL start with $.

 

 

%3 = metatype $@thin CommandLine.Type, scope 1
// function_ref CommandLine._argc.unsafeMutableAddressor
%4 = function_ref @_TFOs11CommandLineau5_argcVs5Int32 : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %5
%5 = apply %4() : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %6
%6 = pointer_to_address %5 : $Builtin.RawPointer to [strict] $*Int32, scope 1 // user: %9
%7 = struct_extract %0 : $Int32, #Int32._value, scope 1 // user: %8
%8 = struct $Int32 (%7 : $Builtin.Int32), scope 1 // user: %9
store %8 to %6 : $*Int32, scope 1 // id: %9
%10 = metatype $@thin CommandLine.Type, scope 1
// function_ref CommandLine._unsafeArgv.unsafeMutableAddressor
%11 = function_ref @_TFOs11CommandLineau11_unsafeArgvGSpGSqGSpVs4Int8___ : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %12
%12 = apply %11() : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %13
%13 = pointer_to_address %12 : $Builtin.RawPointer to [strict] $*UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>, scope 1 // user: %14
store %1 to %13 : $*UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>, scope 1 // id: %14
%15 = tuple (), scope 1

These instructions handle command-line arguments to our program. Since we never used those arguments, a future optimizing pass will remove these unnecessary instructions. To keep things concise, I’ll skip over this section.

 

alloc_global @_Tv3out4listGSaSi_, loc "input.swift":1:5, scope 1 // id: %16
%17 = global_addr @_Tv3out4listGSaSi_ : $*Array<Int>, loc "input.swift":1:5, scope 1 // users: %21, %73
  • alloc_global @foo Initialize memory for the global variable @foo.
  • global_addr @foo get the address of the global variable @foo.
  • @_Tv3out4listGSaSi_ is the mangled name of the array of integers (Array<Int>) that we later add elements to.

 

 

// function_ref Array.init() -> [A]
%18 = function_ref @_TFSaCfT_GSax_ : $@convention(method) <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":1:16, scope 1 // user: %20
%19 = metatype $@thin Array<Int>.Type, loc "input.swift":1:12, scope 1 // user: %20
%20 = apply %18<Int>(%19) : $@convention(method) <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":1:18, scope 1 // user: %21
store %20 to %17 : $*Array<Int>, loc "input.swift":1:18, scope 1 // id: %21
  • There’s a lot going on in register %18
    • function_ref @foo : $T  create a reference to the function @foo with type T.
    • @convention(method) specify the Swift Method Calling Convention. This means that the SIL function will be called with the “self” argument last. For an initializer like this one, “self” is the metatype.
    • <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0> this function type takes a metatype parameter and returns an Array. A metatype is the type of a type. τ_0_0 is the placeholder type of this generic function. @thin means that the metatype requires no storage, because it’s an exact type. @owned means that the recipient is responsible for destroying the value.
    • Putting it all together, this creates a reference to the generic Array.init function, and stores it in register %18.
  • metatype $T.Type create a reference to the metatype object for type T. Here we’re getting a reference to the type of the Array<Int> type. Note that this is an actual type, because it doesn’t have any placeholder types.
  • apply %0(%1, %2, ...) : $(A, B, ...) -> R call the function %0 with arguments %1, %2, ... of type A, B, ..., returning a value of type R.
  • Putting it all together, we’re calling the generic Array.init() function with the metatype Array<Int>.Type as the first and only argument, resulting in an Array<Int>. We’ve now initialized the global array that we’ll add elements to later.

 

 

// function_ref Collection<A>.makeIterator() -> IndexingIterator<A>
%22 = function_ref @_TFesRxs10Collectionwx8IteratorzGVs16IndexingIteratorx_wx8_ElementzWxS0_7Element_rS_12makeIteratorfT_GS1_x_ : $@convention(method)  (@in_guaranteed τ_0_0) -> @out IndexingIterator, loc "input.swift":3:14, scope 2 // user: %53
  • Create a function reference to Collection.makeIterator(). We don’t use this until much later in basic block 13.

 

 

%23 = integer_literal $Builtin.Int64, 0, loc "input.swift":3:10, scope 2 // user: %24
%24 = struct $Int (%23 : $Builtin.Int64), loc "input.swift":3:10, scope 2 // user: %43
%25 = integer_literal $Builtin.Int64, 1000000, loc "input.swift":3:14, scope 2 // user: %26
%26 = struct $Int (%25 : $Builtin.Int64), loc "input.swift":3:14, scope 2 // user: %45
%27 = alloc_stack $CountableRange<Int>, loc "input.swift":3:11, scope 2 // users: %46, %55, %50
br bb1, loc "input.swift":3:11, scope 2 // id: %28
  • Create the integer literals 0 and 1000000, and then create values of struct type $Int with those literals
  • Allocate space on the stack for a $CountableRange
  • Branch to basic block 1 (bb1)

 

 

bb1: // Preds: bb0
 br bb2, loc "input.swift":3:11, scope 2 // id: %29
bb2: // Preds: bb1
 br bb3, loc "input.swift":3:11, scope 2 // id: %30
bb3: // Preds: bb2
 br bb4, loc "input.swift":3:11, scope 2 // id: %31
bb4: // Preds: bb3
 br bb5, loc "input.swift":3:11, scope 2 // id: %32
bb5: // Preds: bb4
 br bb6, loc "input.swift":3:11, scope 2 // id: %33
bb6: // Preds: bb5
 br bb7, loc "input.swift":3:11, scope 2 // id: %34
bb7: // Preds: bb6
 br bb8, loc "input.swift":3:11, scope 2 // id: %35
bb8: // Preds: bb7
 br bb9, loc "input.swift":3:11, scope 2 // id: %36
bb9: // Preds: bb8
 br bb10, loc "input.swift":3:11, scope 2 // id: %37
bb10: // Preds: bb9
 br bb11, loc "input.swift":3:11, scope 2 // id: %38
bb11: // Preds: bb10
 br bb12, loc "input.swift":3:11, scope 2 // id: %39

These basic blocks do nothing and immediately branch to the following basic block. They will be removed during optimization. This might seem wasteful, but it simplifies the implementation of the compiler, because the initial code generation is decoupled from the optimization passes.

 

 

bb12:                                             // Preds: bb11
  // function_ref CountableRange.init(uncheckedBounds : (lower : A, upper : A)) -> CountableRange<A>
  %40 = function_ref @_TFVs14CountableRangeCfT15uncheckedBoundsT5lowerx5upperx__GS_x_ : $@convention(method)  (@in τ_0_0, @in τ_0_0, @thin CountableRange.Type) -> @out CountableRange, loc "input.swift":3:11, scope 2 // user: %46
  %41 = metatype $@thin CountableRange<Int>.Type, loc "input.swift":3:11, scope 2 // user: %46
  • Create a reference to CountableRange.init(uncheckedBounds : (lower : A, upper : A)) -> CountableRange<A>.
  • Create a reference to the metatype CountableRange.Type.

 

 

  %42 = alloc_stack $Int, loc "input.swift":3:11, scope 2 // users: %43, %48, %46
  store %24 to %42 : $*Int, loc "input.swift":3:11, scope 2 // id: %43
  %44 = alloc_stack $Int, loc "input.swift":3:11, scope 2 // users: %45, %47, %46
  store %26 to %44 : $*Int, loc "input.swift":3:11, scope 2 // id: %45
  • store %0 to %1 stores the value %0 at memory address %1.
  • In basic block 0, we created two $Int struct values that hold 0 and 1000000. Now, we allocate space on the stack, and store those values there. We have to put them on the stack because the initializer takes them @in, that is, by address rather than by value.

 

 

  %46 = apply %40(%27, %42, %44, %41) : $@convention(method)  (@in τ_0_0, @in τ_0_0, @thin CountableRange.Type) -> @out CountableRange, loc "input.swift":3:11, scope 2
  • %46 We’re initializing a CountableRange with this apply instruction. For convenience, here are the arguments:
    • %27 points to space we allocated for the CountableRange.
    • %42 points to space with the $Int struct containing 0.
    • %44 points to space with the $Int struct containing 1000000.
    • %41 is a reference to the CountableRange<Int>.Type metatype.

  dealloc_stack %44 : $*Int, loc "input.swift":3:11, scope 2 // id: %47
  dealloc_stack %42 : $*Int, loc "input.swift":3:11, scope 2 // id: %48
  br bb13, loc "input.swift":3:11, scope 2 // id: %49
  • We don’t need those $Int structs on the stack anymore, so we deallocate them here.
  • Then we branch to basic block 13. This is another unnecessary branch, as bb13 only has one predecessor.

 

bb13:                                             // Preds: bb12
  %50 = load %27 : $*CountableRange<Int>, loc "input.swift":3:11, scope 2 // user: %52
  %51 = alloc_stack $CountableRange<Int>, loc "input.swift":3:11, scope 2 // users: %52, %54, %53
  store %50 to %51 : $*CountableRange<Int>, loc "input.swift":3:11, scope 2 // id: %52
  • load reads the CountableRange from memory address %27 and stores it in register %50.
  • Allocate space on the stack for a CountableRange and store the address in register %51.
  • store the CountableRange that we just loaded into the newly allocated space at %51, effectively copying it.

 

 

  %53 = apply %22<CountableRange<Int>, Int, Int, CountableRange<Int>, CountableRange<Int>, Int, Int, Int, Int, Int, Int, IndexingIterator, CountableRange<Int>, Int, Int, IndexingIterator, CountableRange<Int>, Int, Int, Int, Int, Int, Int, Int, Int>(%2, %51) : $@convention(method)  (@in_guaranteed τ_0_0) -> @out IndexingIterator, loc "input.swift":3:14, scope 2
  dealloc_stack %51 : $*CountableRange<Int>, loc "input.swift":3:14, scope 2 // id: %54
  dealloc_stack %27 : $*CountableRange<Int>, loc "input.swift":3:14, scope 2 // id: %55
  br bb14, loc "input.swift":3:1, scope 2         // id: %56
  • We call %22, which is a function reference to Collection.makeIterator(). It gets two arguments:
    • %2 is uninitialized memory that we allocated ages ago in basic block 0 for an IndexingIterator.
    • %51 is the address where we stored a copy of the CountableRange.
    • Putting this all together, we’re creating an iterator for the CountableRange.
  • We deallocate the memory for the CountableRange and its copy. This is because the IndexingIterator we just made contains all the information we needed from the countable range.
  • Note that the copy of the CountableRange wasn’t necessary. It will probably be removed in optimization.
  • Then we branch to basic block 14. Unlike all the previous branches, this one is necessary, because bb14 is the first (and only!) basic block with two predecessors.

The Loop Header

This basic block is a Loop Header because it dominates all other basic blocks in the loop. That means that every path to the other basic blocks must go through this basic block.
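Dominance can be checked mechanically. The following is a minimal sketch, not the compiler’s own dominator analysis: it runs the textbook iterative algorithm over just the four blocks around our loop, treating bb13 as the entry for simplicity. The dictionary layout and variable names are assumptions made for this example.

```swift
// CFG edges taken from the listing: bb13 -> bb14, bb14 -> bb15 or bb16, bb15 -> bb14
let successors: [String: [String]] = [
    "bb13": ["bb14"],
    "bb14": ["bb15", "bb16"],
    "bb15": ["bb14"],
    "bb16": [],
]
let entry = "bb13"
let blocks = Set(successors.keys)

// Invert the successor map to get predecessors.
var predecessors: [String: [String]] = [:]
for (block, succs) in successors {
    for succ in succs {
        predecessors[succ, default: []].append(block)
    }
}

// dom[b] starts as "every block" and shrinks until nothing changes.
var dom: [String: Set<String>] = [:]
for block in blocks { dom[block] = blocks }
dom[entry] = [entry]

var changed = true
while changed {
    changed = false
    for block in blocks where block != entry {
        // A block is dominated by itself plus whatever dominates all of its predecessors.
        var newDom = blocks
        for pred in predecessors[block] ?? [] {
            newDom.formIntersection(dom[pred]!)
        }
        newDom.insert(block)
        if newDom != dom[block]! {
            dom[block] = newDom
            changed = true
        }
    }
}

// bb14 dominates the loop body bb15, so it qualifies as the loop header.
print(dom["bb15"]!.contains("bb14")) // prints "true"
```

The back edge from bb15 to bb14 is what makes this a loop; the dominance check is what makes bb14 its header.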

bb14:                                             // Preds: bb15 bb13
  // function_ref IndexingIterator.next() -> A._Element?
  %57 = function_ref @_TFVs16IndexingIterator4nextfT_GSqwx8_Element_ : $@convention(method)  (@inout IndexingIterator) -> @out Optional, loc "input.swift":3:7, scope 2 // user: %59
  %58 = alloc_stack $Optional<Int>, loc "input.swift":3:7, scope 2 // users: %61, %60, %59
  %59 = apply %57(%58, %2) : $@convention(method)  (@inout IndexingIterator) -> @out Optional, loc "input.swift":3:7, scope 2
  %60 = load %58 : $*Optional<Int>, loc "input.swift":3:7, scope 2 // users: %66, %64
  dealloc_stack %58 : $*Optional<Int>, loc "input.swift":3:7, scope 2 // id: %61
  • Get a reference to IndexingIterator.next().
  • Allocate space on the stack for an Optional.
  • Call IndexingIterator.next() with two arguments:
    • %58 The address for the space we just allocated for the Optional.
    • %2 the address of the IndexingIterator. This is the memory we allocated ages ago in basic block 0, which makeIterator() initialized in basic block 13.
  • This is interesting: the direct return value %59 of IndexingIterator.next() is ignored. Because the result is marked @out, next() writes the returned Optional to the address %58, and we load it from there into %60.

 

 

  %62 = integer_literal $Builtin.Int1, -1, loc "input.swift":3:1, scope 2 // user: %64
  %63 = integer_literal $Builtin.Int1, 0, loc "input.swift":3:1, scope 2 // user: %64
  %64 = select_enum %60 : $Optional<Int>, case #Optional.some!enumelt.1: %62, default %63 : $Builtin.Int1, loc "input.swift":3:1, scope 2 // user: %65
  cond_br %64, bb15, bb16, loc "input.swift":3:1, scope 2 // id: %65
  • select_enum %0 : $E, case #foo: %1, case #bar: %2, ... default %3 if enum %0 of type $E has case foo, return %1, and so on for the other cases, by default returning %3. In this case, %64 will be -1 if the Optional has .Some value, and 0 otherwise.
  • cond_br %64, bb15, bb16 a conditional branch: if %64 is true (1), branch to bb15, otherwise branch to bb16. Neither destination takes block arguments here.
  • Putting this all into context, the code will branch to bb15 if the Optional has .Some value, and bb16 otherwise.
  • Eagle-eyed readers will notice that the integer_literal used to initialize %62 is -1, not 1. The docs for cond_br only document its behavior for 0 and 1, but Builtin.Int1 is a 1-bit integer, and a single set bit reads as -1 when it is printed as a signed (two’s complement) value. So -1 here is the same bit pattern as “true”.
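The -1 in a 1-bit integer literal is less mysterious once you sign-extend a single bit by hand. Swift has no user-visible 1-bit integer type, so this sketch fakes one inside an Int8; the helper name signExtendOneBit is made up for the illustration.

```swift
// Interpret the lowest bit of `bit` as a 1-bit two's complement number.
func signExtendOneBit(_ bit: UInt8) -> Int8 {
    // Move the single bit into the sign position...
    let inSignPosition = Int8(bitPattern: (bit & 1) << 7)
    // ...then shift it back. Right shift on a signed integer copies the sign bit.
    return inSignPosition >> 7
}

print(signExtendOneBit(1)) // prints "-1": the only set bit is also the sign bit
print(signExtendOneBit(0)) // prints "0"
```

With only one bit available, the two representable signed values are 0 and -1, which is why a printer that treats the literal as signed shows true as -1.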

The Loop Body

bb15:                                             // Preds: bb14
  %66 = unchecked_enum_data %60 : $Optional<Int>, #Optional.some!enumelt.1, loc "input.swift":3:1, scope 2 // user: %67
  debug_value %66 : $Int, let, name "i", loc "input.swift":3:5, scope 2 // id: %67
  • unchecked_enum_data %60 : $E, #E.foo unsafely extracts the value of the enum %60 for the given case #E.foo.

At this point, the generated code diverges depending on whether we used .append(5) or += [5].

The .append(5) Case

  // function_ref Array.append(A) -> ()
  %68 = function_ref @_TFSa6appendfxT_ : $@convention(method)  (@in τ_0_0, @inout Array) -> (), loc "input.swift":4:10, scope 3 // user: %73
  %69 = integer_literal $Builtin.Int64, 5, loc "input.swift":4:17, scope 3 // user: %70
  %70 = struct $Int (%69 : $Builtin.Int64), loc "input.swift":4:17, scope 3 // user: %72
  %71 = alloc_stack $Int, loc "input.swift":4:17, scope 3 // users: %72, %74, %73
  store %70 to %71 : $*Int, loc "input.swift":4:17, scope 3 // id: %72
  %73 = apply %68<Int>(%71, %17) : $@convention(method)  (@in τ_0_0, @inout Array) -> (), loc "input.swift":4:18, scope 3
  dealloc_stack %71 : $*Int, loc "input.swift":4:18, scope 3 // id: %74
  br bb14, loc "input.swift":5:1, scope 2         // id: %75
  • Create a function reference to Array.append().
  • Create an integer_literal with value 5, then create an $Int struct with that literal.
  • Allocate memory on the stack for the $Int struct, then store it there.
  • Call Array.append() with the arguments
    • %71, the address of the $Int struct holding the value 5
    • %17, the address of the global Array variable
  • Deallocate memory for the $Int struct
  • Branch back to bb14, the loop header.

The +=[5] Case

  // function_ref += infix<A> (inout [A.Iterator.Element], A) -> ()
  %68 = function_ref @_TFsoi2peuRxs10CollectionrFTRGSaWx8Iterator7Element__x_T_ : $@convention(thin)  (@inout Array, @in τ_0_0) -> (), loc "input.swift":5:10, scope 3 // user: %83
  • Operators are actually functions. This creates a reference to the += function.

 

  // function_ref Array.init(arrayLiteral : [A]...) -> [A]
  %69 = function_ref @_TFSaCft12arrayLiteralGSax__GSax_ : $@convention(method)  (@owned Array, @thin Array.Type) -> @owned Array, loc "input.swift":5:13, scope 3 // user: %80
  %70 = metatype $@thin Array.Type, loc "input.swift":5:13, scope 3 // user: %80
  %71 = integer_literal $Builtin.Word, 1, loc "input.swift":5:14, scope 3 // user: %73
  • Create a reference to the Array.init(arrayLiteral : [A]...) -> [A]  function.
  • Create a reference to the metatype Array.Type.
  • Create an integer_literal with value 1. This is the number of elements in the array that we’re about to allocate memory for.

 

  // function_ref _allocateUninitializedArray<A> (Builtin.Word) -> ([A], Builtin.RawPointer)
  %72 = function_ref @_TFs27_allocateUninitializedArrayurFBwTGSax_Bp_ : $@convention(thin)  (Builtin.Word) -> (@owned Array, Builtin.RawPointer), loc "input.swift":5:14, scope 3 // user: %73
  %73 = apply %72(%71) : $@convention(thin)  (Builtin.Word) -> (@owned Array, Builtin.RawPointer), loc "input.swift":5:14, scope 3 // users: %75, %74
  • Create a reference to the _allocateUninitializedArray(count: Builtin.Word) function, which returns a tuple containing an array of count uninitialized elements and a pointer to the first element.
  • Call _allocateUninitializedArray with the integer_literal of 1 as its only argument

 

  %74 = tuple_extract %73 : $(Array, Builtin.RawPointer), 0, loc "input.swift":5:14, scope 3 // user: %80
  %75 = tuple_extract %73 : $(Array, Builtin.RawPointer), 1, loc "input.swift":5:14, scope 3 // user: %76
  %76 = pointer_to_address %75 : $Builtin.RawPointer to [strict] $*Int, loc "input.swift":5:14, scope 3 // user: %79
  • The returned value in %73 is a tuple, so tuple_extract extracts the first value (the uninitialized Array) into %74 and the second value (the pointer) into %75.
  • pointer_to_address %75 is an unchecked conversion of the pointer %75 into an address.

 

  %77 = integer_literal $Builtin.Int64, 5, loc "input.swift":5:14, scope 3 // user: %78
  %78 = struct $Int (%77 : $Builtin.Int64), loc "input.swift":5:14, scope 3 // user: %79
  store %78 to %76 : $*Int, loc "input.swift":5:14, scope 3 // id: %79
  • Create an integer_literal with value 5, then create an $Int struct with that literal.
  • Store that $Int struct at the start of the uninitialized array.
  • The array %74 is now initialized!

 

  %80 = apply %69(%74, %70) : $@convention(method)  (@owned Array, @thin Array.Type) -> @owned Array, loc "input.swift":5:14, scope 3 // user: %82
  • %69 is a reference to the Array.init(arrayLiteral : [A]...) -> [A]  function. We’re calling it on these arguments:
    • %74 is the array we just initialized
    • %70 is a reference to the metatype Array.Type.
  • %80 is now an Array initialized with the Int value 5.

  %81 = alloc_stack $Array, loc "input.swift":5:13, scope 3  // users: %82, %84, %83
  store %80 to %81 : $*Array, loc "input.swift":5:13, scope 3 // id: %82
  • Allocate space on the stack for an Array, and store the array containing the Int with value 5 there.

 

  %83 = apply %68<[Int], Int, Int, CountableRange, IndexingIterator, ArraySlice, Int, Int, Int, Int, Int, Int, IndexingIterator, CountableRange, Int, Int, Int, IndexingIterator, ArraySlice, Int, Int, Int, Int, Int, Int, Int, Int>(%17, %81) : $@convention(thin)  (@inout Array, @in τ_0_0) -> (), loc "input.swift":5:10, scope 3
  • %68 is a reference to the += function. We’re calling it on these arguments:
    • %17, the address of the global Array variable
    • %81, the address on the stack where we stored the array containing the Int with value 5.

  dealloc_stack %81 : $*Array, loc "input.swift":5:15, scope 3 // id: %84
  br bb14, loc "input.swift":6:1, scope 2 // id: %85
  • Finally, we deallocate the memory for the temporary array, and branch back to the loop header.

The Shared Exit Block

When the IndexingIterator is finished, the IndexingIterator.next() in the loop header will return an Optional with no value. That causes the cond_br to branch to this basic block.

bb16:                                             // Preds: bb14
  %76 = integer_literal $Builtin.Int32, 0, scope 2 // user: %77
  %77 = struct $Int32 (%76 : $Builtin.Int32), scope 2 // user: %79
  dealloc_stack %2 : $*IndexingIterator<CountableRange<Int>>, loc "input.swift":3:7, scope 2 // id: %78
  return %77 : $Int32, scope 2                    // id: %79
}
  • Create an integer_literal with value 0, then create an $Int32 struct with that literal.
  • Deallocate the memory for the IndexingIterator; we don’t need it anymore.
  • Return register %77, which is the Int32 with value 0, telling the operating system that the program exited with no errors.
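Taken together, the blocks we just read correspond roughly to the hand-desugared loop below. This is a sketch of the control flow only, written to line up with the basic block numbers from the listing; it is not something the compiler emits, and the iteration count is shrunk from the post’s 1_000_000 to keep it quick to run.

```swift
var list = [Int]()

// bb12 and bb13: build the range and ask it for an iterator
var iterator = (0..<1_000).makeIterator()

loop: while true {
    // bb14 (loop header): call next() and test the Optional it returns
    switch iterator.next() {
    case .some:
        // bb15 (loop body): the statement we want to optimize
        list += [5]
    case .none:
        // bb16 (exit block): the iterator is exhausted
        break loop
    }
}

print(list.count) // prints 1000
```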

The Test Case

You now know enough SIL to write a simple test case for the optimization that we’re about to implement.

// CHECK-LABEL: sil @append_contentsOf
// CHECK:   [[ACFUN:%.*]] = function_ref @arrayAppendContentsOf
// CHECK-NOT: apply [[ACFUN]]
// CHECK:   [[AEFUN:%.*]] = function_ref @_TFSa6appendfxT_
// CHECK:   apply [[AEFUN]]
// CHECK: return
sil @append_contentsOf : $@convention(thin) () -> () {

The // CHECK comments are directives for FileCheck, the LLVM tool that the test runner uses. They assert on a match (or, for CHECK-NOT, the absence of a match) of the specified text, in the order that the // CHECK comments appear in. These instructions assert that the old Array.append(contentsOf:) function call has been replaced with a call to Array.append(element:). The latter call’s name has been mangled into @_TFSa6appendfxT_ because it comes from the standard library. In real code, the former call’s name would be mangled too, but because our optimization will only look at the semantic attribute of the function, we can use a more readable function name in testing.
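The ordering rules are easier to see in a toy model. The sketch below is a deliberately simplified stand-in for the real checker: it works line by line with plain substring matching, has no regex or variable capture, and hardcodes %17 and %20 where the real test uses [[ACFUN]] and [[AEFUN]]. The Directive type, the passes function, and both sample outputs are invented for this example.

```swift
import Foundation

enum Directive {
    case check(String)     // must match somewhere after the previous match
    case checkNot(String)  // must not appear before the next successful check
}

func passes(_ directives: [Directive], on lines: [String]) -> Bool {
    var cursor = 0                 // first line not yet consumed by a match
    var forbidden: [String] = []   // CHECK-NOT patterns waiting for the next CHECK
    for directive in directives {
        switch directive {
        case .checkNot(let pattern):
            forbidden.append(pattern)
        case .check(let pattern):
            guard let hit = lines[cursor...].firstIndex(where: { $0.contains(pattern) }) else {
                return false
            }
            // Lines skipped on the way to the match must not contain forbidden text.
            for line in lines[cursor..<hit] where forbidden.contains(where: { line.contains($0) }) {
                return false
            }
            forbidden.removeAll()
            cursor = hit + 1
        }
    }
    // Trailing CHECK-NOT patterns apply to the rest of the input.
    let rest = lines[cursor...]
    let violated = rest.contains(where: { line in forbidden.contains(where: { line.contains($0) }) })
    return !violated
}

let directives: [Directive] = [
    .check("function_ref @arrayAppendContentsOf"),
    .checkNot("apply %17"),
    .check("function_ref @_TFSa6appendfxT_"),
    .check("apply %20"),
    .check("return"),
]

let optimized = [
    "%17 = function_ref @arrayAppendContentsOf",
    "%20 = function_ref @_TFSa6appendfxT_",
    "%21 = apply %20(%19, %13)",
    "return %22",
]
let unoptimized = [
    "%17 = function_ref @arrayAppendContentsOf",
    "%18 = apply %17(%7, %13)",
    "%20 = function_ref @_TFSa6appendfxT_",
    "%21 = apply %20(%19, %13)",
    "return %22",
]

print(passes(directives, on: optimized))   // prints "true"
print(passes(directives, on: unoptimized)) // prints "false"
```

The unoptimized output fails because the forbidden apply sits between the two function_ref matches, which is precisely the situation the CHECK-NOT line in the test is guarding against.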

  %0 = function_ref @swift_bufferAllocate : $@convention(thin) () -> @owned AnyObject
  %1 = integer_literal $Builtin.Int64, 1
  %2 = struct $MyInt (%1 : $Builtin.Int64)
  %3 = apply %0() : $@convention(thin) () -> @owned AnyObject
  %4 = metatype $@thin Array.Type
  %5 = function_ref @arrayAdoptStorage : $@convention(thin) (@owned AnyObject, MyInt, @thin Array.Type) -> @owned (Array, UnsafeMutablePointer)
  %6 = apply %5(%3, %2, %4) : $@convention(thin) (@owned AnyObject, MyInt, @thin Array.Type) -> @owned (Array, UnsafeMutablePointer)
  %7 = tuple_extract %6 : $(Array, UnsafeMutablePointer), 0
  %8 = tuple_extract %6 : $(Array, UnsafeMutablePointer), 1
  %9 = struct_extract %8 : $UnsafeMutablePointer, #UnsafeMutablePointer._rawValue
  %10 = pointer_to_address %9 : $Builtin.RawPointer to [strict] $*MyInt
  %11 = integer_literal $Builtin.Int64, 27
  %12 = struct $MyInt (%11 : $Builtin.Int64)
  store %12 to %10 : $*MyInt

Creates an array with one element. We’re not writing this SIL exactly the way that it would be generated on macOS because the test would fail on Linux, where Objective-C bridging doesn’t exist.

  %13 = alloc_stack $Array
  %14 = metatype $@thin Array.Type
  %15 = function_ref @arrayInit : $@convention(method) (@thin Array.Type) -> @owned Array
  %16 = apply %15(%14) : $@convention(method) (@thin Array.Type) -> @owned Array
  store %16 to %13 : $*Array

Creates an empty array, and stores it on the stack. It needs to be on the stack because it’s the self parameter to Array.append(contentsOf:).

  %17 = function_ref @arrayAppendContentsOf : $@convention(method) (@owned Array, @inout Array) -> ()
  %18 = apply %17(%7, %13) : $@convention(method) (@owned Array, @inout Array) -> ()

Gets a reference to Array.append(contentsOf:) and calls it with the two arrays as arguments.

  dealloc_stack %13 : $*Array
  %19 = tuple ()
  return %19 : $()
}

Cleans up the stack, and returns void, which is represented by an empty tuple in SIL.

The Optimization Pass

There are two types of SIL: Raw SIL, and Canonical SIL. The Swift compiler produces Raw SIL when it lowers the AST. Raw SIL might not be semantically valid, because dataflow checks such as definite initialization have not run yet. A fixed series of guaranteed (mandatory) passes is run on Raw SIL, and that produces Canonical SIL, which is guaranteed to be semantically valid. Functions are specialized and inlined as the SIL is optimized, but we can delay the inlining of a function by adding a semantic attribute to it:

@_semantics("sometext")

Semantic attributes instruct the Swift compiler to optimize code differently. They can be used to disable an optimization (like inlining), or to force an optimization. They’re used extensively in the Swift standard library, where hand-tuning how code is optimized is important.

Array methods are defined in stdlib/public/core/Arrays.swift.gyb, where gyb stands for “Generate Your Boilerplate”. We’ll add a semantic attribute to Array.append(contentsOf:), since that’s the function call that exists after Array’s operator+= function call is inlined. We’ll then modify include/swift/SILOptimizer/Analysis/ArraySemantic.h and its implementation file so that it’s easier to work with the new semantic attribute. Lastly, we’ll add the new semantic attribute to lib/SILOptimizer/LoopTransforms/COWArrayOpt.cpp to squash some warnings about non-exhaustive switch statements.

Next, we need to write code that finds initializations of array literals, finds uses of those array literals in Array.append(contentsOf:) calls, and then proves that the arrays weren’t modified and didn’t escape in between those two things. Thankfully, there’s a pass that already does something very similar: ArrayElementValuePropagation. That optimization converts code like this:

let a = [1, 2, 3];
let b = a[0] + a[2];

Into code like this:

let a = [1, 2, 3];
let b = 1 + 3;

It’s close enough to the optimization that we want to perform, so we’ll modify this pass to do some additional work, rather than create an entirely new pass.

The “main method” of an optimization pass is void run(), and we’re working in a SILFunctionTransform because we’re transforming function bodies. Here’s a walkthrough of the modified run() method.

  void run() override {
    auto &Fn = *getFunction(); // Get a reference to the function we're in

    // Store information about the calls that we want to replace
    llvm::SmallVector<ArrayAllocation::GetElementReplacement>
      GetElementReplacements;
    llvm::SmallVector<ArrayAllocation::AppendContentOfReplacement>
      AppendContentsOfReplacements;

    // Iterate through the basic blocks in the function
    for (auto &BB :Fn) {
      // Iterate through the instructions in the basic blocks
      for (auto &Inst : BB) {
        // Filter for only apply instructions
        if (auto *Apply = dyn_cast(&Inst)) {
          // This is a helper class that tells us if an apply instruction is an array literal allocation
          // and simplifies getting the elements in that literal
          ArrayAllocation ALit;
          if (ALit.analyze(Apply)) {
            ALit.getGetElementReplacements(GetElementReplacements);
            // We call our helper method that extracts all the elements of an array literal
            ALit.getAppendContentOfReplacements(AppendContentsOfReplacements);
          }
        }
      }
    }

    bool Changed = false;
    
    // This was already in this optimization.
    for (ArrayAllocation::GetElementReplacement &Repl : GetElementReplacements) {
      ArraySemanticsCall GetElement(Repl.GetElementCall);
      Changed |= GetElement.replaceByValue(Repl.Replacement);
    }

    // We'll just add this call to our helper function here
    Changed |= replaceAppendCalls(AppendContentsOfReplacements);

    // You need to invalidate the analysis if you've changed something in an optimization
    if (Changed) {
      PM->invalidateAnalysis(
          &Fn, SILAnalysis::InvalidationKind::CallsAndInstructions);
    }
  }

Now let’s take a look at the replacement helper method.

  bool replaceAppendCalls(
                  ArrayRef<ArrayAllocation::AppendContentOfReplacement> Repls) {
    auto &Fn = *getFunction();
    auto &M = Fn.getModule();
    auto &Ctx = M.getASTContext();

    if (Repls.empty())
      return false;

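    DEBUG(llvm::dbgs() << "Array append contentsOf calls replaced in "
                       << Fn.getName() << " (" << Repls.size() << ")\n");

    // ... (Elided: the lookup of AppendFn and the head of the loop over Repls,
    // which binds Repl, AppendContentsOf, ArrayType, and NTD, the last via
    // getAnyNominal().)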
      SubstitutionMap ArraySubMap = ArrayType.getSwiftRValueType()
        ->getContextSubstitutionMap(M.getSwiftModule(), NTD);
      
      GenericSignature *Sig = NTD->getGenericSignature();
      assert(Sig && "Array type must have generic signature");
      SmallVector<Substitution, 4> Subs;
      Sig->getSubstitutions(ArraySubMap, Subs);
      
      // Finally we call the helper function we added to ArraySemanticsCall
      // to perform the actual replacement.
      AppendContentsOf.replaceByAppendingValues(M, AppendFn,
                                                Repl.ReplacementValues, Subs);
    }
    return true;
  }

A Wild Benchmark Regression Appears

I thought I was done at this point, but although my optimization was working, it had caused other benchmarks to regress. Not good. I had to isolate the problem by comparing the benchmark SIL generated from tip-of-tree swiftc versus my modified branch. I noticed that the function calls in my modified branch were unspecialized, and that was causing the slowdown.

The problem was that by adding the array.append_element and array.append_contentsOf semantic attributes, we delayed the inlining of the functions they annotate. Those functions contain calls to other generic functions, and the generic specializer was counting on the parent functions being inlined so that it could specialize the calls inside them.
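Here’s a rough sketch of why that ordering matters, written as a toy pipeline rather than anything resembling the real pass manager. All of the names (Call, Wrapper, inlineWrappers, specializeVisibleCalls, and the “append”/“reserve”/“store” labels used when calling it) are hypothetical. It assumes a simplified three-step order: an early inlining stage that skips functions carrying a semantic attribute, then specialization, then a late inlining stage that inlines everything.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of pass ordering; every name here is hypothetical, not swiftc API.
struct Call {
  std::string callee;
  bool specialized;
};
using Body = std::vector<Call>;

struct Wrapper {
  std::vector<std::string> innerGenericCalls;  // generic calls in the wrapper's body
  bool hasSemanticsAttr;  // if true, the early inlining stage leaves it alone
};
using Wrappers = std::map<std::string, Wrapper>;

// Inlines wrapper calls into the caller. The early stage skips wrappers that
// carry a semantic attribute so attribute-aware passes can still find them.
Body inlineWrappers(const Body &body, const Wrappers &wrappers, bool earlyStage) {
  Body out;
  for (const Call &call : body) {
    auto it = wrappers.find(call.callee);
    bool skip = it == wrappers.end() ||
                (earlyStage && it->second.hasSemanticsAttr);
    if (skip) {
      out.push_back(call);
      continue;
    }
    for (const std::string &inner : it->second.innerGenericCalls)
      out.push_back(Call{inner, false});
  }
  return out;
}

// Specializes only the generic calls visible in the caller right now.
void specializeVisibleCalls(Body &body, const Wrappers &wrappers) {
  for (Call &call : body)
    if (wrappers.count(call.callee) == 0)
      call.specialized = true;
}

// Early inlining, then specialization, then late inlining.
int unspecializedCallsAfterPipeline(Body body, const Wrappers &wrappers) {
  body = inlineWrappers(body, wrappers, /*earlyStage=*/true);
  specializeVisibleCalls(body, wrappers);
  body = inlineWrappers(body, wrappers, /*earlyStage=*/false);
  int count = 0;
  for (const Call &call : body)
    if (!call.specialized) ++count;
  return count;
}
```

With a caller that contains a single call to an “append” wrapper whose body calls “reserve” and “store”, this returns 0 when the wrapper has no semantic attribute and 2 when it does. In the second case the inner calls only become visible after the specializer has already run, which is the kind of regression I was seeing in the benchmark SIL.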

One solution that I considered was to add another inlining stage to the performance inliner transform. However, it seemed silly to add an entire inlining stage for just two functions.

Another solution I considered was to add a different type of semantic tag that’s only used for identifying functions, and not for changing the optimizer’s behavior.

Ultimately, eeckstein got this pull request across the finish line by special-casing the inlining of the two Array functions that I had added semantic attributes to.

Lessons Learned

It’s very difficult to casually contribute to swiftc. It took me two months to get my PR merged, and it felt like a part-time job while I was working on it. I don’t think I would make another contribution like this unless I was being paid to do it, and had an easier way to communicate with the very helpful Apple folks than the mailing list.

I switched to a 12” MacBook before I started working on my swiftc PR. It was so slow that I was only able to iterate on the code once a day, because a single compile and test run would take all night. I ended up buying a top-of-the-line 15” MacBook Pro because it was the only way to iterate on the codebase more than once a day.

Even on the new MacBook Pro, a full integration test run takes over four hours. This is a problem because only committers to the Swift repo have permission to run tests on Apple’s CI infrastructure. Swift might be open-source, but if contributors need to drop thousands on a machine in order to work on it, it’s not going to have very many contributors.

I was one of the top 100 contributors to Swift after making my humble pull request, and most of the files I was working on had only ever been touched by ten or so people. That isn’t great for such a popular language, and I hope that changes.

Swift’s developer workflow is slow and clunky. If you switch branches, you have to wait for most of the files to be recompiled, which can take 15 or more minutes even on a beefy machine. I ended up keeping multiple Swift repositories around (20 GB each!) so I could “switch branches” more easily as I was diffing the SIL from different versions of the compiler. I had to write some bash scripts to manage this mess.

Swiftc has a very steep learning curve. There is very little documentation, and the code isn’t very well commented. It’s a huge project, so even a modest change like mine required me to talk to multiple subject-matter experts. And even with a bunch of helpful Apple folks working with you…

It’s really easy to break swiftc because of how complex it is. My original pull request was approved and merged in a month. Despite only having about 200 lines of changes, I received 125 comments from six reviewers. Even after that much scrutiny, it was reverted almost immediately because it introduced a memory leak that a seventh person found after running a four-hour-long standard library integration test.

It was a fluke that that test caught the memory leak at all. The test was written in Swift, and my optimization broke the memory leak assertion in that test. In other words, my optimization caused a memory leak in the test itself, not the code that the test was testing. That would have been a serious bug if it had made it into production, because this optimization would have introduced memory leaks into Swift programs compiled with the buggy swiftc.

The optimization pipeline is also really easy to break. The order that passes run in, and how they are configured, is extremely important, yet not very well documented. There’s a lot of tacit knowledge in there, and it’s not obvious to me when a solution is hacky or acceptable. Special-casing the inlining of those two methods in code still seems hacky to me, but I don’t have as much context as the Apple folks do.

If you’re going to contribute to swiftc, be sure to run all the tests and benchmarks. The standard library integration test is not documented in the README, but you should run that too. Sometimes this is hard. The benchmark runner wasn’t working while I was working on this PR, so I had to write my own benchmarking script. Benchmarks on Apple’s CI were working, but that didn’t help me, since only committers have access to it.

Three Ideas For Improvement

I think that Swift has great potential as an open-source project, but I just don’t see how the majority of people can contribute to it right now.

  • Anyone, not just committers, must be able to run tests and benchmarks on CI.
  • There needs to be a lower latency way to get help from Apple employees than the mailing list.
  • The developer workflow needs to be improved. Switching between branches and recompiling should be cheap. Scripts like the benchmark runner should be well maintained.

I glossed over some parts of what it took to get this optimization to work, but this post is already far too long. I may come back later to fill in the gaps, but in the meantime you can find the final version of this pull request here.