High-Level SIL Optimization in the Swift Compiler

Matt Rajca discovered last year that Swift’s Array.append(element) is 6x faster than Array operator+=(collection). This is a shame, because the latter is semantically equivalent, easier to type, and more pleasing to the eye. Swift is a new language, so there is no shortage of opportunities for optimization.

I’ve enjoyed the recent uptick in “how to build a super simple compiler” blog posts, but there isn’t a lot of material for those who want to go deeper into the world of programming languages and compilers. That’s mostly why it took me so long to make this contribution to the Swift compiler, despite already having some experience working with LLVM.

This post is aimed at those who already understand the basics of compiler design and terminology, but have yet to contribute to an optimizing compiler for a popular language. It’ll be most useful for those who want to work on the high level optimization passes in Swift.

It’s more of a reference than a hand-holdy tutorial, because if you try to write about compilers in too much detail, it might take you a lifetime, and you may never finish.

We’ll cover:

  1. How to read SIL (the bulk of this post, because it’s so important, and you’ll be reading a lot of it)
  2. How to write a SIL test case
  3. How to write an optimization pass
  4. How adding an optimization pass can make things slower

Skip to the end if you just want my thoughts on what contributing to Swift is like. This is a very long post, not meant to be read in a single sitting.

Why High-Level SIL Optimization?

Our goal is to replace arr += [5] with arr.append(5).

First, we have to know where we’ll implement this optimization, and why. Prior to this, the only optimizations I had written were LLVM optimization passes, which operated on LLVM IR.

The problem with this is that function calls can be very difficult to replace after they’ve been lowered into LLVM IR. At such a low level, it’s hard to tell what instructions are part of the original function call, which significantly complicates the implementation of the optimization pass, and makes it very fragile.

The next most logical place to perform this optimization is at the AST level. But since Swift ASTs are only syntactically valid, and not yet semantically valid, we cannot safely transform them.

To solve this problem, the Swift compiler has an intermediate representation between the AST and LLVM IR, called SIL (Swift Intermediate Language).

SIL, like LLVM, is a Static-Single-Assignment (SSA) IR. Unlike LLVM IR, it has a richer type system, uses basic block parameters rather than phi nodes, and crucially for us, allows functions to be annotated with semantic attributes.

Reading SIL

Let’s look at the generated SIL for the Swift code that we want to optimize. Build the Swift compiler from source, then use it to emit canonical SIL:

// input.swift
var list = [Int]();

for _ in 0..<1_000_000 {
  list += [5] // we'll swap this to list.append(5) to compare later
}

swift-sources/build/Ninja-DebugAssert/swift-macosx-x86_64/bin/swiftc \
  -frontend \
  -target x86_64-apple-macosx10.9 \
  -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk \
  -Onone \
  -emit-sil \
  -o out.sil \
  input.swift

We’re only going to look at what’s happening in the main method. I’ll represent types with T, T1, T2 etc., so Array<T> means an Array of elements of type T.

The Setup

// main
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
  • // main this is a comment containing the name of the following method. SIL’s comments help humans make sense of it.
  • @main this method’s name is “main”. Identifier names in SIL are prefixed with the @ symbol.
  • @convention(c) the C calling convention should be used. Calling conventions specify how arguments and return values should be handled when a function is called.
  • (Int32, the 1st argument to this method is a 32-bit integer. By convention, we know that this integer is the number of arguments given to the program. The C equivalent of this is int argc
  • UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) there’s a bit to unpack here!
    • UnsafeMutablePointer<T> is a raw pointer (memory address) to something of type T.
    • Optional<T> means that you’ll either have Some value of type T, or None.
    • Int8 is an 8-bit integer. In context, this is a character (char).
    • UnsafeMutablePointer<Int8> is a pointer to a character. In context, this is a character array, also known as a string.
    • Optional<UnsafeMutablePointer<Int8>> is either a pointer to a string, or null.
    • Putting it all together, this is a null-terminated array of pointers to strings. Its equivalent in C is char* [].
  • -> Int32 this method returns a 32-bit integer. By convention, we know that this is the exit code of the program.
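
Put in Swift-ish terms, the whole signature has roughly this shape (my own illustration, with a hypothetical function name; the compiler never emits this):

// A rough Swift rendering of SIL's @main signature, for intuition only.
func cMain(argc: Int32,
           argv: UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
    // ... program body ...
    return 0 // the exit code
}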

 

 

// %0       // user: %7
// %1       // user: %14
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  • // %0 // user: %7  a comment, noting that register %7 depends on register %0. Registers in SIL are integers prefixed with %. Because SIL, like LLVM IR, is a Static Single Assignment (SSA) IR, you can think of these registers as constants in a typical programming language; they hold a value, and once set, cannot be changed. Whenever the compiler needs a new register, it simply increments the register number. The immutable registers in SSA make optimizations easier to write. Technically, these are virtual registers; a later pass will “lower” them to use real registers on the target architecture. If we look further in the code, we see that %0 is indeed used in an instruction whose result is stored in %7. Register numbers are scoped to the method that their basic blocks live in.
  • bb0 basic block zero. A basic block is a straight-line code sequence. That means that it has no branches, except at its entry and exit.
  • (%0 : $Int32 the first parameter to the basic block is a 32-bit integer. Say that basic block bb3 has immediate predecessors bb1 and bb2. bb3 needs to refer to register %7 in bb1, or register %11 in bb2, depending on which predecessor was used to reach it. In LLVM IR, we would use a Φ (Phi) function in bb3 to “choose” between %7 or %11 and assign the chosen value to a new register. In SIL, the predecessor basic blocks bb1 and bb2 do the choosing, by passing arguments to bb3 in the branch (br) instruction.
  • %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) the second parameter, with the type already explained above.
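
If basic block parameters feel abstract, here’s the idea in Swift terms (my own analogy, not compiler output):

// cond determines which branch defines x. SSA needs to merge the two
// definitions at the join point: LLVM IR does it with a phi node in the
// join block; SIL instead has each predecessor pass its value as a
// branch argument to the join block's parameter.
func choose(_ cond: Bool) -> Int {
    let x: Int
    if cond {
        x = 7       // think: br bb3(%7)
    } else {
        x = 11      // think: br bb3(%11)
    }
    return x        // think: bb3(%x : $Int): ...
}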

 

 

%2 = alloc_stack $IndexingIterator<CountableRange<Int>>, var, name "$i$generator", loc "input.swift":3:7, scope 2 // users: %78, %53, %59
  • alloc_stack T Allocate (uninitialized) memory on the stack to contain T, and return the address of the allocated memory.
  • $IndexingIterator The type of iterator that we are allocating memory for. Types in SIL start with $.

 

 

%3 = metatype $@thin CommandLine.Type, scope 1
// function_ref CommandLine._argc.unsafeMutableAddressor
%4 = function_ref @_TFOs11CommandLineau5_argcVs5Int32 : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %5
%5 = apply %4() : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %6
%6 = pointer_to_address %5 : $Builtin.RawPointer to [strict] $*Int32, scope 1 // user: %9
%7 = struct_extract %0 : $Int32, #Int32._value, scope 1 // user: %8
%8 = struct $Int32 (%7 : $Builtin.Int32), scope 1 // user: %9
store %8 to %6 : $*Int32, scope 1 // id: %9
%10 = metatype $@thin CommandLine.Type, scope 1
// function_ref CommandLine._unsafeArgv.unsafeMutableAddressor
%11 = function_ref @_TFOs11CommandLineau11_unsafeArgvGSpGSqGSpVs4Int8___ : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %12
%12 = apply %11() : $@convention(thin) () -> Builtin.RawPointer, scope 1 // user: %13
%13 = pointer_to_address %12 : $Builtin.RawPointer to [strict] $*UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>, scope 1 // user: %14
store %1 to %13 : $*UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>, scope 1 // id: %14
%15 = tuple (), scope 1

These instructions handle command-line arguments to our program. Since we never used those arguments, a future optimizing pass will remove these unnecessary instructions. To keep things concise, I’ll skip over this section.

 

alloc_global @_Tv3out4listGSaSi_, loc "input.swift":1:5, scope 1 // id: %16
%17 = global_addr @_Tv3out4listGSaSi_ : $*Array<Int>, loc "input.swift":1:5, scope 1 // users: %21, %73
  • alloc_global @foo Initialize memory for the global variable @foo.
  • global_addr @foo get the address of the global variable @foo.
  • @_Tv3out4listGSaSi_ is the mangled name of the array of integers Array<Int> that we later add elements to.

 

 

// function_ref Array.init() -> [A]
%18 = function_ref @_TFSaCfT_GSax_ : $@convention(method) <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":1:16, scope 1 // user: %20
%19 = metatype $@thin Array<Int>.Type, loc "input.swift":1:12, scope 1 // user: %20
%20 = apply %18<Int>(%19) : $@convention(method) <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":1:18, scope 1 // user: %21
store %20 to %17 : $*Array<Int>, loc "input.swift":1:18, scope 1 // id: %21
  • There’s a lot going on in register %18
    • function_ref @foo : $T  create a reference to the function @foo with type T.
    • @convention(method) specify the Swift Method Calling Convention. This means that the SIL function will be called with the “self” argument last, because it is an instance method.
    • <τ_0_0> (@thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0> this function type takes a metatype parameter and returns an Array. A metatype is the type of a type. τ_0_0 is the placeholder type of this generic function. @thin means that the metatype requires no storage, because it’s an exact type. @owned means that the recipient is responsible for destroying the value.
    • Putting it all together, this creates a reference to the generic Array.init function, and stores it in register %18.
  • metatype $T.Type create a reference to the metatype object for type T. Here we’re getting a reference to the type of the Array<Int> type. Note that this is an actual type, because it doesn’t have any placeholder types.
  • apply %0(%1, %2, ...) : $(A, B, ...) -> R call the function %0 with arguments %1, %2, ... of type A, B, ..., returning a value of type R.
  • Putting it all together, we’re calling the generic Array.init() function with the metatype Array<Int>.Type as the first and only argument, resulting in an Array<Int>. We’ve now initialized the global array that we’ll add elements to later.

 

 

// function_ref Collection<A>.makeIterator() -> IndexingIterator<A>
%22 = function_ref @_TFesRxs10Collectionwx8IteratorzGVs16IndexingIteratorx_wx8_ElementzWxS0_7Element_rS_12makeIteratorfT_GS1_x_ : $@convention(method) <τ_0_0> (@in_guaranteed τ_0_0) -> @out IndexingIterator<τ_0_0>, loc "input.swift":3:14, scope 2 // user: %53
  • Create a function reference to Collection.makeIterator(). We don’t use this until much later in basic block 13.

 

 

%23 = integer_literal $Builtin.Int64, 0, loc "input.swift":3:10, scope 2 // user: %24
%24 = struct $Int (%23 : $Builtin.Int64), loc "input.swift":3:10, scope 2 // user: %43
%25 = integer_literal $Builtin.Int64, 1000000, loc "input.swift":3:14, scope 2 // user: %26
%26 = struct $Int (%25 : $Builtin.Int64), loc "input.swift":3:14, scope 2 // user: %45
%27 = alloc_stack $CountableRange<Int>, loc "input.swift":3:11, scope 2 // users: %46, %55, %50
br bb1, loc "input.swift":3:11, scope 2 // id: %28
  • Create the integer literals 0 and 1000000, and then create values of struct type $Int with those literals
  • Allocate space on the stack for a $CountableRange
  • Branch to basic block 1 (bb1)

 

 

bb1: // Preds: bb0
 br bb2, loc "input.swift":3:11, scope 2 // id: %29
bb2: // Preds: bb1
 br bb3, loc "input.swift":3:11, scope 2 // id: %30
bb3: // Preds: bb2
 br bb4, loc "input.swift":3:11, scope 2 // id: %31
bb4: // Preds: bb3
 br bb5, loc "input.swift":3:11, scope 2 // id: %32
bb5: // Preds: bb4
 br bb6, loc "input.swift":3:11, scope 2 // id: %33
bb6: // Preds: bb5
 br bb7, loc "input.swift":3:11, scope 2 // id: %34
bb7: // Preds: bb6
 br bb8, loc "input.swift":3:11, scope 2 // id: %35
bb8: // Preds: bb7
 br bb9, loc "input.swift":3:11, scope 2 // id: %36
bb9: // Preds: bb8
 br bb10, loc "input.swift":3:11, scope 2 // id: %37
bb10: // Preds: bb9
 br bb11, loc "input.swift":3:11, scope 2 // id: %38
bb11: // Preds: bb10

These basic blocks do nothing and immediately branch to the following basic block. They will be removed during optimization. This might seem wasteful, but it simplifies the implementation of the compiler, because the initial code generation is decoupled from the optimization passes.

 

 

bb12:                                             // Preds: bb11
  // function_ref CountableRange.init(uncheckedBounds : (lower : A, upper : A)) -> CountableRange<A>
  %40 = function_ref @_TFVs14CountableRangeCfT15uncheckedBoundsT5lowerx5upperx__GS_x_ : $@convention(method) <τ_0_0> (@in τ_0_0, @in τ_0_0, @thin CountableRange<τ_0_0>.Type) -> @out CountableRange<τ_0_0>, loc "input.swift":3:11, scope 2 // user: %46
  %41 = metatype $@thin CountableRange<Int>.Type, loc "input.swift":3:11, scope 2 // user: %46
  • Create a reference to CountableRange.init(uncheckedBounds : (lower : A, upper : A)) -> CountableRange<A>.
  • Create a reference to the metatype CountableRange.Type.

 

 

  %42 = alloc_stack $Int, loc "input.swift":3:11, scope 2 // users: %43, %48, %46
  store %24 to %42 : $*Int, loc "input.swift":3:11, scope 2 // id: %43
  %44 = alloc_stack $Int, loc "input.swift":3:11, scope 2 // users: %45, %47, %46
  store %26 to %44 : $*Int, loc "input.swift":3:11, scope 2 // id: %45
  • store %0 to %1 stores the value %0 at memory address %1.
  • In basic block 0, we created two $Int struct values that hold 0 and 1000000. Now, we allocate space on the stack, and store those values there. We have to put them on the stack in order for another method to use them.

 

 

  %46 = apply %40<Int>(%27, %42, %44, %41) : $@convention(method) <τ_0_0> (@in τ_0_0, @in τ_0_0, @thin CountableRange<τ_0_0>.Type) -> @out CountableRange<τ_0_0>, loc "input.swift":3:11, scope 2
  • %46 We’re initializing a CountableRange with this apply instruction. For convenience, here are the arguments:
    • %27 points to space we allocated for the CountableRange.
    • %42 points to space with the $Int struct containing 0.
    • %44 points to space with the $Int struct containing 1000000.
    • %41 is a reference to the CountableRange<Int>.Type

 

  dealloc_stack %44 : $*Int, loc "input.swift":3:11, scope 2 // id: %47
  dealloc_stack %42 : $*Int, loc "input.swift":3:11, scope 2 // id: %48
  br bb13, loc "input.swift":3:11, scope 2 // id: %49
  • We don’t need those $Int structs on the stack anymore, so we deallocate them here.
  • Then we branch to basic block 13. This is another unnecessary branch, as bb13 only has one predecessor.

 

bb13:                                             // Preds: bb12
  %50 = load %27 : $*CountableRange<Int>, loc "input.swift":3:11, scope 2 // user: %52
  %51 = alloc_stack $CountableRange<Int>, loc "input.swift":3:11, scope 2 // users: %52, %54, %53
  store %50 to %51 : $*CountableRange<Int>, loc "input.swift":3:11, scope 2 // id: %52
  • load reads the CountableRange from memory address %27 and stores it in register %50.
  • Allocate space on the stack for a CountableRange and store the address in register %51.
  • store the CountableRange that we just loaded into the newly allocated space at %51, effectively copying it.

 

 

  %53 = apply %22<CountableRange<Int>, Int, Int, CountableRange<Int>, CountableRange<Int>, Int, Int, Int, Int, Int, Int, IndexingIterator<CountableRange<Int>>, CountableRange<Int>, Int, Int, IndexingIterator<CountableRange<Int>>, CountableRange<Int>, Int, Int, Int, Int, Int, Int, Int, Int>(%2, %51) : $@convention(method) <τ_0_0> (@in_guaranteed τ_0_0) -> @out IndexingIterator<τ_0_0>, loc "input.swift":3:14, scope 2
  dealloc_stack %51 : $*CountableRange<Int>, loc "input.swift":3:14, scope 2 // id: %54
  dealloc_stack %27 : $*CountableRange<Int>, loc "input.swift":3:14, scope 2 // id: %55
  br bb14, loc "input.swift":3:1, scope 2         // id: %56
  • We call %22, which is a function reference to Collection.makeIterator(). It gets two arguments:
    • %2 is uninitialized memory that we allocated ages ago in basic block 0 for an IndexingIterator.
    • %51 is the address where we stored a copy of the CountableRange.
    • Putting this all together, we’re creating an iterator for the CountableRange.
  • We deallocate the memory for the CountableRange and its copy. This is because the IndexingIterator we just made contains all the information we needed from the countable range.
  • Note that the copy of the CountableRange wasn’t necessary. It will probably be removed in optimization.
  • Then we branch to basic block 14. Unlike all the previous branches, this one is necessary, because bb14 is the first (and only!) basic block with two predecessors.

The Loop Header

This basic block is a Loop Header because it dominates all other basic blocks in the loop. That means that every path to the other basic blocks must go through this basic block.

bb14:                                             // Preds: bb15 bb13
  // function_ref IndexingIterator.next() -> A._Element?
  %57 = function_ref @_TFVs16IndexingIterator4nextfT_GSqwx8_Element_ : $@convention(method) <τ_0_0> (@inout IndexingIterator<τ_0_0>) -> @out Optional<τ_0_0._Element>, loc "input.swift":3:7, scope 2 // user: %59
  %58 = alloc_stack $Optional<Int>, loc "input.swift":3:7, scope 2 // users: %61, %60, %59
  %59 = apply %57<CountableRange<Int>>(%58, %2) : $@convention(method) <τ_0_0> (@inout IndexingIterator<τ_0_0>) -> @out Optional<τ_0_0._Element>, loc "input.swift":3:7, scope 2
  %60 = load %58 : $*Optional<Int>, loc "input.swift":3:7, scope 2 // users: %66, %64
  dealloc_stack %58 : $*Optional<Int>, loc "input.swift":3:7, scope 2 // id: %61
  • Get a reference to IndexingIterator.next().
  • Allocate space on the stack for an Optional.
  • Call IndexingIterator.next() with two arguments:
    • %58 The address for the space we just allocated for the Optional.
    • %2, the memory that we allocated ages ago in basic block 0 for an IndexingIterator; by now it holds the iterator itself, which next() mutates in place (@inout).
  • This is interesting: unlike previous method calls, the return value %59 of IndexingIterator.next() is ignored. Instead, the returned Optional is loaded from %58 into %60.

 

 

  %62 = integer_literal $Builtin.Int1, -1, loc "input.swift":3:1, scope 2 // user: %64
  %63 = integer_literal $Builtin.Int1, 0, loc "input.swift":3:1, scope 2 // user: %64
  %64 = select_enum %60 : $Optional<Int>, case #Optional.some!enumelt.1: %62, default %63 : $Builtin.Int1, loc "input.swift":3:1, scope 2 // user: %65
  cond_br %64, bb15, bb16, loc "input.swift":3:1, scope 2 // id: %65
  • select_enum %0 : $E case #foo: %1, case #bar: %2, ... default %3 if enum %0 of type $E has value foo return %1, and so on for the other cases, by default returning %3. In this case, %64 will be -1 if the Optional has .Some value, and 0 otherwise.
  • cond_br %64, bb15, bb16 a conditional branch: if %64 is equal to 1, branch to bb15, otherwise branch to bb16 with the specified arguments.
  • Putting this all into context, the code will branch to bb15 if the Optional has .Some value, and bb16 otherwise.
  • Eagle-eyed readers will notice that the integer_literal used to initialize %62 is -1, not 1. I’m not sure why that is, because the docs for cond_br only document its behavior for 0 and 1. (In two’s complement, -1 in a 1-bit integer is the all-ones bit pattern, which is the same single bit as 1, so the two are interchangeable here.)
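
Stepping back, bb14 is just the compiled form of the loop’s condition check. Roughly, in Swift terms (my paraphrase of what the SIL is doing, not compiler output):

// The loop header, paraphrased: call next(), test the Optional, and
// either enter the body (bb15) or leave the loop (bb16).
var iterator = (0..<1_000_000).makeIterator()
while let i = iterator.next() {  // select_enum + cond_br on the Optional
    _ = i                        // the loop body (bb15) goes here
}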

The Loop Body

bb15:                                             // Preds: bb14
  %66 = unchecked_enum_data %60 : $Optional<Int>, #Optional.some!enumelt.1, loc "input.swift":3:1, scope 2 // user: %67
  debug_value %66 : $Int, let, name "i", loc "input.swift":3:5, scope 2 // id: %67
  • unchecked_enum_data %60 : $E, #E.foo unsafely extracts the value of the enum %60 for the given case #E.foo.
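
In Swift terms, unchecked_enum_data is like a force-unwrap whose safety was already established by the cond_br in the loop header (my analogy):

let maybe: Int? = 5
if maybe != nil {
    // No second check happens here, just like unchecked_enum_data.
    let i = maybe.unsafelyUnwrapped
    print(i)
}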

At this point, the generated code diverges depending on whether we used .append(5) or += [5].

The .append(5) Case

  // function_ref Array.append(A) -> ()
  %68 = function_ref @_TFSa6appendfxT_ : $@convention(method) <τ_0_0> (@in τ_0_0, @inout Array<τ_0_0>) -> (), loc "input.swift":4:10, scope 3 // user: %73
  %69 = integer_literal $Builtin.Int64, 5, loc "input.swift":4:17, scope 3 // user: %70
  %70 = struct $Int (%69 : $Builtin.Int64), loc "input.swift":4:17, scope 3 // user: %72
  %71 = alloc_stack $Int, loc "input.swift":4:17, scope 3 // users: %72, %74, %73
  store %70 to %71 : $*Int, loc "input.swift":4:17, scope 3 // id: %72
  %73 = apply %68<Int>(%71, %17) : $@convention(method) <τ_0_0> (@in τ_0_0, @inout Array<τ_0_0>) -> (), loc "input.swift":4:18, scope 3
  dealloc_stack %71 : $*Int, loc "input.swift":4:18, scope 3 // id: %74
  br bb14, loc "input.swift":5:1, scope 2         // id: %75
  • Create a function reference to Array.append().
  • Create an integer_literal with value 5, then create an $Int struct with that literal.
  • Allocate memory on the stack for the $Int struct, then store it there.
  • Call Array.append() with the arguments
    • %71, the address of the $Int struct holding the value of 5
    • %17, the address of the global Array variable
  • Deallocate memory for the $Int struct
  • Branch back to bb14, the loop header.

The +=[5] Case

  // function_ref += infix<A> (inout [A.Iterator.Element], A) -> ()
  %68 = function_ref @_TFsoi2peuRxs10CollectionrFTRGSaWx8Iterator7Element__x_T_ : $@convention(thin) <τ_0_0 where τ_0_0 : Collection> (@inout Array<τ_0_0.Iterator.Element>, @in τ_0_0) -> (), loc "input.swift":5:10, scope 3 // user: %83
  • Operators are actually functions. This creates a reference to the += function.

 

  // function_ref Array.init(arrayLiteral : [A]...) -> [A]
  %69 = function_ref @_TFSaCft12arrayLiteralGSax__GSax_ : $@convention(method) <τ_0_0> (@owned Array<τ_0_0>, @thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":5:13, scope 3 // user: %80
  %70 = metatype $@thin Array<Int>.Type, loc "input.swift":5:13, scope 3 // user: %80
  %71 = integer_literal $Builtin.Word, 1, loc "input.swift":5:14, scope 3 // user: %73
  • Create a reference to the Array.init(arrayLiteral : [A]...) -> [A]  function.
  • Create a reference to the metatype Array.Type.
  • Create an integer_literal with value 1. This is the number of elements in the array that we’re about to allocate memory for.

 

  // function_ref _allocateUninitializedArray<A> (Builtin.Word) -> ([A], Builtin.RawPointer)
  %72 = function_ref @_TFs27_allocateUninitializedArrayurFBwTGSax_Bp_ : $@convention(thin) <τ_0_0> (Builtin.Word) -> (@owned Array<τ_0_0>, Builtin.RawPointer), loc "input.swift":5:14, scope 3 // user: %73
  %73 = apply %72<Int>(%71) : $@convention(thin) <τ_0_0> (Builtin.Word) -> (@owned Array<τ_0_0>, Builtin.RawPointer), loc "input.swift":5:14, scope 3 // users: %75, %74
  • Create a reference to the _allocateUninitializedArray(count: Builtin.Word) function, which returns a tuple containing an array of count uninitialized elements and a pointer to the first element.
  • Call _allocateUninitializedArray with the integer_literal of 1 as its only argument.

 

  %74 = tuple_extract %73 : $(Array<Int>, Builtin.RawPointer), 0, loc "input.swift":5:14, scope 3 // user: %80
  %75 = tuple_extract %73 : $(Array<Int>, Builtin.RawPointer), 1, loc "input.swift":5:14, scope 3 // user: %76
  %76 = pointer_to_address %75 : $Builtin.RawPointer to [strict] $*Int, loc "input.swift":5:14, scope 3 // user: %79
  • The returned value in %73 is a tuple, so tuple_extract extracts the first value (the uninitialized Array) into %74 and the second value (the pointer) into %75.
  • pointer_to_address %75 is an unchecked conversion of the pointer %75 into an address.

 

  %77 = integer_literal $Builtin.Int64, 5, loc "input.swift":5:14, scope 3 // user: %78
  %78 = struct $Int (%77 : $Builtin.Int64), loc "input.swift":5:14, scope 3 // user: %79
  store %78 to %76 : $*Int, loc "input.swift":5:14, scope 3 // id: %79
  • Create an integer_literal with value 5, then create an $Int struct with that literal.
  • Store that $Int struct at the start of the uninitialized array.
  • The array %74 is now initialized!

 

  %80 = apply %69<Int>(%74, %70) : $@convention(method) <τ_0_0> (@owned Array<τ_0_0>, @thin Array<τ_0_0>.Type) -> @owned Array<τ_0_0>, loc "input.swift":5:14, scope 3 // user: %82
  • %69 is a reference to the Array.init(arrayLiteral : [A]...) -> [A]  function. We’re calling it on these arguments:
    • %74 is the array we just initialized
    • %70 is a reference to the metatype Array<Int>.Type.
  • %80 is now an Array<Int> initialized with the Int value 5
  %81 = alloc_stack $Array<Int>, loc "input.swift":5:13, scope 3  // users: %82, %84, %83
  store %80 to %81 : $*Array<Int>, loc "input.swift":5:13, scope 3 // id: %82
  • Allocate space on the stack for an Array, and store the array containing the Int with value 5 there.

 

  %83 = apply %68<[Int], Int, Int, CountableRange<Int>, IndexingIterator<CountableRange<Int>>, ArraySlice<Int>, Int, Int, Int, Int, Int, Int, IndexingIterator<CountableRange<Int>>, CountableRange<Int>, Int, Int, Int, IndexingIterator<CountableRange<Int>>, ArraySlice<Int>, Int, Int, Int, Int, Int, Int, Int, Int>(%17, %81) : $@convention(thin) <τ_0_0 where τ_0_0 : Collection> (@inout Array<τ_0_0.Iterator.Element>, @in τ_0_0) -> (), loc "input.swift":5:10, scope 3
  • %68 is a reference to the += function. We’re calling it on these arguments:
    • %17, the address of the global Array variable
    • %81, the address on the stack where we stored the array containing the Int with value 5.

  dealloc_stack %81 : $*Array<Int>, loc "input.swift":5:15, scope 3 // id: %84
  br bb14, loc "input.swift":6:1, scope 2         // id: %85
  • Finally, we deallocate the memory for the temporary array, and branch back to the loop header
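
This is the heart of the performance gap. Semantically, every trip through the += loop behaves roughly like this (my summary, not compiler output):

var list = [Int]()
for _ in 0..<1_000_000 {
    let tmp = [5]   // allocate a one-element array on every iteration
    list += tmp     // then append its contents
    // ...whereas list.append(5) skips the temporary array entirely.
}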

The Shared Exit Block

When the IndexingIterator is finished, the IndexingIterator.next() in the loop header will return an Optional with no value. That causes the cond_br to branch to this basic block.

bb16:                                             // Preds: bb14
  %76 = integer_literal $Builtin.Int32, 0, scope 2 // user: %77
  %77 = struct $Int32 (%76 : $Builtin.Int32), scope 2 // user: %79
  dealloc_stack %2 : $*IndexingIterator<CountableRange<Int>>, loc "input.swift":3:7, scope 2 // id: %78
  return %77 : $Int32, scope 2                    // id: %79
}
  • Create an integer_literal with value 0, then create an $Int32 struct with that literal.
  • Deallocate the memory for the IndexingIterator; we don’t need it anymore.
  • Return register %77, which is the Int32 with value 0, telling the operating system that the program exited with no errors.

The Test Case

You now know enough SIL to write a simple test case for the optimization that we’re about to implement.

// CHECK-LABEL: sil @append_contentsOf
// CHECK:   [[ACFUN:%.*]] = function_ref @arrayAppendContentsOf
// CHECK-NOT: apply [[ACFUN]]
// CHECK:   [[AEFUN:%.*]] = function_ref @_TFSa6appendfxT_
// CHECK:   apply [[AEFUN]]
// CHECK: return
sil @append_contentsOf : $@convention(thin) () -> () {

The // CHECK comments instruct the test runner to assert on a match (or the absence of a match) of the specified text, in the order that the // CHECK comments appear in. These instructions assert that the old Array.append(contentsOf:) function call has been replaced with a call to Array.append(element:). The latter call’s name has been mangled into @_TFSa6appendfxT_ because it comes from the standard library. In real code, the former call’s name would be mangled too, but because our optimization will only look at the semantic attribute of the function, we can use a more readable function name in testing.

  %0 = function_ref @swift_bufferAllocate : $@convention(thin) () -> @owned AnyObject
  %1 = integer_literal $Builtin.Int64, 1
  %2 = struct $MyInt (%1 : $Builtin.Int64)
  %3 = apply %0() : $@convention(thin) () -> @owned AnyObject
  %4 = metatype $@thin Array<MyInt>.Type
  %5 = function_ref @arrayAdoptStorage : $@convention(thin) (@owned AnyObject, MyInt, @thin Array<MyInt>.Type) -> @owned (Array<MyInt>, UnsafeMutablePointer<MyInt>)
  %6 = apply %5(%3, %2, %4) : $@convention(thin) (@owned AnyObject, MyInt, @thin Array<MyInt>.Type) -> @owned (Array<MyInt>, UnsafeMutablePointer<MyInt>)
  %7 = tuple_extract %6 : $(Array<MyInt>, UnsafeMutablePointer<MyInt>), 0
  %8 = tuple_extract %6 : $(Array<MyInt>, UnsafeMutablePointer<MyInt>), 1
  %9 = struct_extract %8 : $UnsafeMutablePointer<MyInt>, #UnsafeMutablePointer._rawValue
  %10 = pointer_to_address %9 : $Builtin.RawPointer to [strict] $*MyInt
  %11 = integer_literal $Builtin.Int64, 27
  %12 = struct $MyInt (%11 : $Builtin.Int64)
  store %12 to %10 : $*MyInt

Creates an array with one element. We’re not writing this SIL exactly the way that it would be generated on macOS because the test would fail on Linux, where Objective-C bridging doesn’t exist.

  %13 = alloc_stack $Array<MyInt>
  %14 = metatype $@thin Array<MyInt>.Type
  %15 = function_ref @arrayInit : $@convention(method) (@thin Array<MyInt>.Type) -> @owned Array<MyInt>
  %16 = apply %15(%14) : $@convention(method) (@thin Array<MyInt>.Type) -> @owned Array<MyInt>
  store %16 to %13 : $*Array<MyInt>

Creates an empty array, and stores it on the stack. It needs to be on the stack because it’s the self parameter to Array.append(contentsOf:).

  %17 = function_ref @arrayAppendContentsOf : $@convention(method) (@owned Array<MyInt>, @inout Array<MyInt>) -> ()
  %18 = apply %17(%7, %13) : $@convention(method) (@owned Array<MyInt>, @inout Array<MyInt>) -> ()

Gets a reference to Array.append(contentsOf:) and calls it with the two arrays as arguments.

  dealloc_stack %13 : $*Array<MyInt>
  %19 = tuple ()
  return %19 : $()
}

Cleans up the stack, and returns void, which is represented by an empty tuple in SIL.

The Optimization Pass

There are two types of SIL: Raw SIL, and Canonical SIL. The Swift compiler produces Raw SIL when it lowers the AST. Raw SIL might not be semantically valid. A series of deterministic optimization passes is run on Raw SIL, and that produces Canonical SIL, which is guaranteed to be semantically valid. Functions are specialized and inlined during this transformation, but we can delay the inlining of a function by adding a semantic attribute to it:

@_semantics("sometext")

Semantic attributes instruct the Swift compiler to optimize code differently. They can be used to disable an optimization (like inlining), or to force an optimization. They’re used extensively in the Swift standard library, where hand-tuning how code is optimized is important.
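
For example, here’s roughly what tagging an Array method looks like (the @_semantics attribute syntax is real; the method and tag below are hypothetical, for illustration only):

extension Array {
    // The optimizer can look up calls to this function by its tag
    // instead of pattern-matching its lowered SIL.
    @_semantics("array.my_append_sketch")
    public mutating func appendOneByOne<C: Collection>(_ newElements: C)
        where C.Iterator.Element == Element {
        for element in newElements {
            append(element)
        }
    }
}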

Array methods are defined in stdlib/public/core/Arrays.swift.gyb, where gyb stands for “Generate Your Boilerplate”. We’ll add a semantic attribute to Array.append(contentsOf:), since that’s the function call that exists after Array’s operator+= function call is inlined. We’ll then modify include/swift/SILOptimizer/Analysis/ArraySemantic.h and its implementation file so that it’s easier to work with the new semantic attribute. Lastly, we’ll add the new semantic attribute to lib/SILOptimizer/LoopTransforms/COWArrayOpt.cpp to squash some warnings about non-exhaustive switch statements.

Next, we need to write code that finds initializations of array literals, finds uses of those array literals in Array.append(contentsOf:) calls, and then proves that the arrays weren’t modified or escaped between those two points. Thankfully, there’s a pass that already does something very similar: ArrayElementValuePropagation. That optimization converts code like this:

let a = [1, 2, 3];
let b = a[0] + a[2];

Into code like this:

let a = [1, 2, 3];
let b = 1 + 3;

It’s close enough to the optimization that we want to perform, so we’ll modify this pass to do some additional work, rather than create an entirely new pass.
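
Concretely, the extra work we’re adding will let the compiler turn code like this (sketched at the Swift level; the pass itself operates on SIL):

var a = [Int]()
a.append(contentsOf: [1, 2, 3])

into the equivalent of:

var b = [Int]()
b.append(1)
b.append(2)
b.append(3)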

The “main method” of an optimization pass is void run(), and we’re working in a SILFunctionTransform because we’re transforming function bodies. Here’s a walkthrough of the modified run() method.

  void run() override {
    auto &Fn = *getFunction(); // Get a reference to the function we're in

    // Store information about the calls that we want to replace
    llvm::SmallVector<ArrayAllocation::GetElementReplacement, 16>
      GetElementReplacements;
    llvm::SmallVector<ArrayAllocation::AppendContentOfReplacement, 16>
      AppendContentsOfReplacements;

    // Iterate through the basic blocks in the function
    for (auto &BB : Fn) {
      // Iterate through the instructions in the basic blocks
      for (auto &Inst : BB) {
        // Filter for only apply instructions
        if (auto *Apply = dyn_cast<ApplyInst>(&Inst)) {
          // This is a helper class that tells us if an apply instruction is an array literal allocation
          // and simplifies getting the elements in that literal
          ArrayAllocation ALit;
          if (ALit.analyze(Apply)) {
            ALit.getGetElementReplacements(GetElementReplacements);
            // We call our helper method that extracts all the elements of an array literal
            ALit.getAppendContentOfReplacements(AppendContentsOfReplacements);
          }
        }
      }
    }

    bool Changed = false;
    
    // This was already in this optimization.
    for (ArrayAllocation::GetElementReplacement &Repl : GetElementReplacements) {
      ArraySemanticsCall GetElement(Repl.GetElementCall);
      Changed |= GetElement.replaceByValue(Repl.Replacement);
    }

    // We'll just add this call to our helper function here
    Changed |= replaceAppendCalls(AppendContentsOfReplacements);

    // You need to invalidate the analysis if you've changed something in an optimization
    if (Changed) {
      PM->invalidateAnalysis(
          &Fn, SILAnalysis::InvalidationKind::CallsAndInstructions);
    }
  }

Now let’s take a look at the replacement helper method.

  bool replaceAppendCalls(
          ArrayRef<ArrayAllocation::AppendContentOfReplacement> Repls) {
    auto &Fn = *getFunction();
    auto &M = Fn.getModule();
    auto &Ctx = M.getASTContext();

    if (Repls.empty())
      return false;

    DEBUG(llvm::dbgs() << "Array append contentsOf calls replaced in "
                       << Fn.getName() << " (" << Repls.size() getAnyNominal();
      SubstitutionMap ArraySubMap = ArrayType.getSwiftRValueType()
        ->getContextSubstitutionMap(M.getSwiftModule(), NTD);
      
      GenericSignature *Sig = NTD->getGenericSignature();
      assert(Sig && "Array type must have generic signature");
      SmallVector<Substitution, 4> Subs;
      Sig->getSubstitutions(ArraySubMap, Subs);
      
      // Finally we call the helper function we added to ArraySemanticsCall
      // to perform the actual replacement.
      AppendContentsOf.replaceByAppendingValues(M, AppendFn,
                                                Repl.ReplacementValues, Subs);
    }
    return true;
  }

A Wild Benchmark Regression Appears

I thought I was done at this point, but although my optimization was working, it had caused other benchmarks to regress. Not good. I had to isolate the problem by comparing the benchmark SIL generated from tip-of-tree swiftc versus my modified branch. I noticed that the function calls in my modified branch were unspecialized, and that was causing the slowdown.

The problem was that by adding semantic attributes to array.append_element and array.append_contentsOf, we delayed their inlining. Those functions contain calls to other generic functions, and the generic specializer was counting on the parent function being inlined so that it could specialize the function calls inside of them.
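
To see why that matters, consider a generic function (illustrative Swift, not the stdlib’s actual code):

// Generic specialization clones sum for a concrete T (say, Int), which
// lets the optimizer devirtualize and inline the calls inside the clone.
// If the caller is never inlined, the specializer may never see the
// concrete type, and every call goes through the slow generic entry point.
func sum<T: Numeric>(_ xs: [T]) -> T {
    var total: T = 0
    for x in xs {
        total += x
    }
    return total
}

let total = sum([1, 2, 3]) // ideally this calls a specialized sum<Int>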

One solution that I considered was to add another inlining stage to the performance inliner transform. However, it seemed silly to add an entire inlining stage for just two functions.

Another solution I considered was to add a different type of semantic tag that’s only used for identifying functions, and not for changing the optimizer’s behavior.

Ultimately, eeckstein got this pull request across the finish line by special-casing the inlining of the two Array functions that I had added semantic attributes to.

Lessons Learned

It’s very difficult to casually contribute to swiftc. It took me two months to get my PR merged, and it felt like a part-time job while I was working on it. I don’t think I would make another contribution like this unless I was being paid to do it, and had an easier way to communicate with the very helpful Apple folks than the mailing list.

I switched to a 12” MacBook before I started working on my swiftc PR. It was so slow that I was only able to iterate on the code once a day, because a single compile and test run would take all night. I ended up buying a top-of-the-line 15” MacBook Pro because it was the only way to iterate on the codebase more than once a day.

Even on the new MacBook Pro, a full integration test run takes over four hours. This is a problem because only committers to the Swift repo have permissions to run tests on their CI infrastructure. Swift might be open-source, but if contributors need to drop thousands on a machine in order to work on it, it’s not going to have very many contributors.

I was one of the top 100 contributors to Swift after making my humble pull request, and most of the files I was working on had only ever been touched by ten or so people. That isn’t great for such a popular language, and I hope that changes.

Swift’s developer workflow is slow and clunky. If you switch branches, you have to wait for most of the files to be recompiled, which can take 15 or more minutes even on a beefy machine. I ended up keeping multiple Swift repositories around (20 GB each!) so I could “switch branches” more easily as I was diffing the SIL from different versions of the compiler. I had to write some bash scripts to manage this mess.

Swiftc has a very steep learning curve. There is very little documentation, and the code isn’t very well commented. It’s a huge project, so even a modest change like mine required me to talk to multiple subject-matter experts. And even with a bunch of helpful Apple folks working with you…

It’s really easy to break swiftc because of how complex it is. My original pull request was approved and merged in a month. Despite only having about 200 lines of changes, I received 125 comments from six reviewers. Even after that much scrutiny, it was reverted almost immediately because it introduced a memory leak that a seventh person found after running a four hour long standard library integration test.

It was a fluke that that test caught the memory leak at all. The test was written in Swift, and my optimization broke the memory leak assertion in that test. In other words, my optimization caused a memory leak in the test itself, not the code that the test was testing. That would have been a serious bug if it had made it into production, because this optimization would have introduced memory leaks into Swift programs compiled with the buggy swiftc.

The optimization pipeline is also really easy to break. The order that passes run in, and how they are configured, is extremely important, yet not very well documented. There’s a lot of tacit knowledge in there, and it’s not obvious to me when a solution is hacky or acceptable. Special-casing the inlining of those two methods in code still seems hacky to me, but I don’t have as much context as the Apple folks do.

If you’re going to contribute to swiftc, be sure to run all the tests and benchmarks. The standard library integration test is not documented in the README, but you should run that too. Sometimes this is hard. The benchmark runner wasn’t working while I was working on this PR, so I had to write my own benchmarking script. Benchmarks on Apple’s CI were working, but that doesn’t help me when only committers have access to that.

Three Ideas For Improvement

I think that Swift has great potential as an open-source project, but I just don’t see how the majority of people can contribute to it right now.

  • Anyone, not just committers, should be able to run tests and benchmarks on CI.
  • There needs to be a lower latency way to get help from Apple employees than the mailing list.
  • The developer workflow needs to be improved. Switching between branches and recompiling should be cheap. Scripts like the benchmark runner should be well maintained.

I glossed over some parts of what it took to get this optimization to work, but this post is already far too long. I may come back later to fill in the gaps, but in the meantime you can find the final version of this pull request here.

Toggling Transparency in iTerm2

I recently started developing full-time in Vim again because all my code has to run on remote virtual machines. I like using iTerm2 with some transparency enabled so I can see what’s going on in my browser, but it’s started giving me a headache now that I’m spending all day in my terminal. I couldn’t find a built-in hotkey for toggling transparency in iTerm2, so I cooked something up with AppleScript.

  1. Launch Automator, which comes pre-installed with your Mac.
  2. When prompted for a new document type, select “Service”.
  3. In the “Actions” tab on the left, search for “applescript”.
  4. Drag the “Run AppleScript” result into the window on the right.
  5. Copy and paste this script into the text area, then hit the hammer icon to build it.

    tell application "iTerm"
        if the transparency of the current session of the current window > 0 then
            repeat with aWindow in windows
                tell aWindow
                    repeat with aTab in tabs of aWindow
                        repeat with aSession in sessions of aTab
                            tell aSession
                                set transparency to 0
                            end tell
                        end repeat
                    end repeat
                end tell
            end repeat
        else
            repeat with aWindow in windows
                tell aWindow
                    repeat with aTab in tabs of aWindow
                        repeat with aSession in sessions of aTab
                            tell aSession
                                set transparency to 0.3
                            end tell
                        end repeat
                    end repeat
                end tell
            end repeat
        end if
    end tell
    


  6. Change the dropdown boxes on top so that they read “Service receives no input in iTerm”. You may have to click “Other…” in order to select iTerm.
  7. Save this as “Toggle Transparency”.
  8. Open System Preferences, and go to the Keyboard section, then the Shortcuts tab, then the Services category.
  9. Click on “Add Shortcut” next to “Toggle Transparency”, and record a key combination. I use Cmd + Shift + U.

Try it out! Open iTerm2, and hit your key combination a few times. You can also run the script by clicking on iTerm2 in your menu bar and going to Services. If your default profile has transparency enabled (mine does not), you may want to tweak the AppleScript I provided by swapping the transparency values of 0.3 and 0.

Algebraic Data Types in Swift

An algebraic data type is a type that’s the union of other data types. That is, it’s a type that may be one of several other types. Here’s how we would implement a linked list as an algebraic data type in Swift:

enum LinkedList<Element> {  
    case empty
    indirect case node(data: Element, next: LinkedList)
}

This defines an enum called LinkedList that might either be .empty or a .node that points to another LinkedList. There are three interesting things to note. The first is that we’ve created a generic data type, so the type of Element is declared by the consumer of LinkedList. The second is that the .node case uses LinkedList recursively, and must therefore be marked with indirect. The third is that since the .empty case has no parameters, the parentheses may be omitted.

Here’s how we define instances of LinkedList:

let a: LinkedList<Int> = .empty  
let b: LinkedList<Int> = .node(data: 1, next: .node(data: 2, next: .empty))  

To work with an algebraic data type, we deconstruct it using pattern matching. Here’s how we would print a LinkedList:

enum LinkedList<Element>: CustomStringConvertible {  
    // cases omitted for brevity
    var description: String {
        switch self {
        case .empty:
            return "(end)"
        case let .node(data, next):
            return "\(data), \(next.description)"
        }
    }
}

let b: LinkedList<Int> = .node(data: 1, next: .node(data: 2, next: .empty))

print("b: (b)") // => b: 1, 2, (end)  

We’ve implemented the CustomStringConvertible protocol so that we can use string interpolation to print LinkedList instances. While it is possible in Swift to pattern match using an if case statement, the switch is preferable because the compiler will warn us if we’ve forgotten to handle a case. This safety is one big advantage that algebraic data types have over their classical counterparts. In a traditionally implemented linked list, you would have to remember to check if the next pointer was null to know if you were at the end. This problem gets worse as the number of cases increases in more complex data structures, such as full binary range trees with sentinels.
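
For comparison, the if case form works for a quick one-off match, but the compiler won’t warn you about unhandled cases:

if case let .node(data, _) = b {
    print(data) // => 1
}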

Note that since the description instance variable only has a getter, we do not need to use the more verbose syntax:

var description: String {  
    get {
        // etc
    }
    set {
        // etc
    }
}

Our print function was useful, but in order to do interesting things we need to be able to modify algebraic data types. Rather than mutate the existing data structure, we’ll return a new data structure that represents the result after the requested operation. Since we’re not going to mutate the original data structure, we’ll follow the Swift 3 naming convention of using a gerund for our methods. Here’s how we would add an .inserting method to LinkedList:

enum LinkedList<Element>: CustomStringConvertible {  
    // cases and "description" omitted for brevity
    func inserting(e: Element) -> LinkedList<Element> {
        switch self {
            case .empty:
            return .node(data: e, next: .empty)
            case .node:
            return .node(data: e, next: self)
        }
    }
}

let c = b.inserting(e: 0)  
print("c: (c)") // => c: 0, 1, 2, (end)  

The key is that we’re returning a new LinkedList that represents the result after the insertion. Notice how in the .node case, we do not need to pattern match on .node(data, next) because data and next are not needed in order to construct the new .node; we can simply use self as the next: node.

Finally, let’s implement the classic “reverse a linked list” interview question using our algebraic data type:

enum LinkedList<Element>: CustomStringConvertible {  
    // cases, "description", and "insert" omitted for brevity
    func appending(_ e: Element) -> LinkedList<Element> {
        switch self {
            case .empty:
            return .node(data: e, next: .empty)
            case let .node(oldData, next):
            return .node(data: oldData, next: next.appending(e))
        }
    }

    func reversed() -> LinkedList<Element> {
        switch self {
            case .empty:
            return self
            case let .node(data, next):
            return next.reversed().appending(data)
        }
    }
}

print("reversed c: (c.reversed())") // => reversed c: 2, 1, 0, (end)  

I’ll leave it as an exercise to the reader to figure out what the running time for this algorithm is.

Here’s the Playground on GitHub

A Faster Horse: The Future Of Web Development

Let’s talk about the rise and fall of technologies. There’s a neat graphical representation of the life-cycle of a technology called an S-Curve:

[Figure: the S-Curve of a technology’s life cycle]

JavaScript is enjoying tremendous success right now, so I’m not going to bore you with the metrics. Exponential growth is fantastic, and as someone who uses JavaScript at almost every layer of the tech stack, I couldn’t be happier for my community. It’s certainly made my job as a web developer much easier.

Now, what about the future of JavaScript are you most excited about? I can think of a few off the top of my head:

  • New language features: ES6 generators, template strings, SIMD, …
  • Package manager/module ecosystem upgrades: parameterized scripts, private repositories, …
  • Framework updates: Angular 2, koa, …

Now, these are all quite exciting. New language features let us write more expressive code, and do faster computations. New frameworks help us write more robust applications. npm is amazing, and it is certainly the innovation that made node so successful, so any improvement to it is just icing on the cake.

However, these are all incremental improvements. We’re still sliding up the same S-Curve, and we’re going to reach maturity eventually because all technologies experience diminishing returns. You’re optimistic to a fault if you think that HTML+CSS+JavaScript is the holy grail of web development, and that we can’t do better. As much as we love our tools, we have to accept that they are far from perfect.

[Figure: a succession of overlapping S-Curves]

S-Curves don’t exist in isolation. Something else is on the horizon, and it’s not going to be an incremental improvement. This is why I think it was fantastic that TJ made a high-profile jump to a different S-Curve. Go has its own set of problems, but that’s not the point; he recognized the limits of the tools at hand, and tried something different. It’s far easier to pour more effort into the tools you are already familiar with than it is to try something completely different.

Pick any technology out there and you’ll find someone who can wax poetic about how it’s better than what you’re using right now. It doesn’t matter if you think they’re right or wrong: listen like you’re wrong, because eventually, you will be wrong. Eventually, you will be the cranky administrator who still believes that JSP is the holy grail of web development. Eventually, you will have to do something insane like write a new VM for the aging technology you’re locked into. Eventually, you will still be concatenating strings like this when everyone else is using the + operator.

What is next-generation web development going to look like? I don’t know, but I do have a small wish list:

  • Lower-level access to rendering: Lose the HTML/CSS, and go straight to the canvas
  • Multithreading support: We’re close to the limit of single-core performance, and even cell phones have multiple cores now
  • Lower-level access to memory: This is a complement to multithreading, and it’s nice to not have to rely on garbage collection if you know what you’re doing
  • Static verification: This should be an option for applications where correctness is important
  • Better error handling: This is a real pain in node right now

What do you want to see next?

[Figure: S-Curves for successive generations of web development]

Jankproof Javascript

Javascript is getting faster all the time, but things like long lists of complex cells are always going to be expensive to compute. To solve this, I wrote the unjank module last week. It helps you do expensive things in Javascript without causing the user experience to suffer.

I’m happy to report that after a week of production use, it’s clear that this technique is a significant improvement over what we were doing before for several reasons.

Device Agnostic

It doesn’t matter how fast or slow the task is; unjank benchmarks it on-the-fly and runs it as quickly as the device will allow. This means that your application is jank-free on all devices without you having to come up with magic numbers that determine how quickly a task should run.

Smooth Scrolling

An unexpected discovery was that kinetic scrolling in Webkit works very well even if the page is getting longer during the scroll. This means that if your user is scrolling down a long list as it is being rendered with unjank, they will not perceive it as slow at all. Webkit preserves the momentum of the scroll and keeps going as the page gets longer.

Aborting Tasks

The ability to abort an ongoing task is critical because most tasks are initiated by a user action. For example, if you have two tabs that have a long list each, quickly switching between the tabs will eventually crash the application unless the rendering of the lists is aborted when the tab becomes inactive.

Conclusion

I’m going to be using unjank a lot more going forward, especially where lists are involved. I pulled up the Getable app to experience it pre-unjank, and it has that signature lagginess associated with web apps, despite our use of requestAnimationFrame. With unjank, our longest lists no longer cause the browser to stutter — a small step out of the uncanny valley of hybrid apps.