This whitepaper discusses the Swift calling convention, at least as we want it to be.
It's a basic assumption in this paper that Swift shouldn't make an implicit promise to exactly match the default platform calling convention. That is, if a C or Objective-C programmer manages to derive the address of a Swift function, we don't have to promise that an obvious translation of the type of that function will be correctly callable from C. For example, this wouldn't be guaranteed to work:
    // In Swift:
    func foo(x: Int, y: Double) -> MyClass { ... }

    // In Objective-C:
    extern id _TF4main3fooFTSiSd_CS_7MyClass(intptr_t x, double y);
We do sometimes need to be able to match C conventions, both to use them and to generate implementations of them, but that level of compatibility should be opt-in and site-specific. If Swift would benefit from internally using a better convention than C/Objective-C uses, and switching to that convention doesn't damage the dynamic abilities of our target platforms (debugging, dtrace, stack traces, unwinding, etc.), there should be nothing preventing us from doing so. (If we did want to guarantee compatibility on this level, this paper would be a lot shorter!)
Function call rules in high-level languages have three major components, each operating on a different abstraction level:
- the high-level semantics of the call (pass-by-reference vs. pass-by-value),
- the ownership and validity conventions about argument and result values ("+0" vs. "+1", etc.), and
- the "physical" representation conventions of how values are actually communicated between functions (in registers, on the stack, etc.).
We'll tackle each of these in turn, then conclude with a detailed discussion of function signature lowering.
The major division in argument passing conventions between languages is between pass-by-reference and pass-by-value languages. It's a distinction that only really makes sense in languages with the concept of an l-value, but Swift does, so it's pertinent.
In general, the terms "pass-by-X" and "call-by-X" are used interchangeably. It's unfortunate. We'll prefer "pass-by-X" for consistency and to emphasize that these conventions are argument-specific.
In pass-by-reference (also called pass-by-name or pass-by-address), if A is an l-value expression, foo(A) is passed some sort of opaque reference through which the original l-value can be modified. If A is not an l-value, the language may prohibit this, or (if pass-by-reference is the default convention) it may pass a temporary variable containing the result of A.
Don't confuse pass-by-reference with the concept of a reference type. A reference type is a type whose value is a reference to a different object; for example, a pointer type in C, or a class type in Java or Swift. A variable of reference type can be passed by value (copying the reference itself) or by reference (passing the variable itself, allowing it to be changed to refer to a different object). Note that references in C++ are a generalization of pass-by-reference, not really a reference type; in C++, a variable of reference type behaves completely unlike any other variable in the language.
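The distinction can be sketched in a few lines of Swift. The `Box` class and both functions are hypothetical names introduced only for this illustration; the sketch shows a reference being passed by value and then the variable itself being passed by reference.

```swift
// Hypothetical reference type used only for this illustration.
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

// Pass-by-value of a reference: the callee receives a copy of the
// reference, so it can mutate the shared object but cannot change
// which object the caller's variable refers to.
func mutateThroughCopy(_ box: Box) {
    box.value = 10
}

// Pass-by-reference of the variable itself: the callee can make the
// caller's variable refer to a different object.
func reseat(_ box: inout Box) {
    box = Box(99)
}

var box = Box(1)
let original = box

mutateThroughCopy(box)   // the object shared with `original` changes
reseat(&box)             // `box` now refers to a new object
```

After these calls, `original` still refers to the first object (now holding 10), while `box` refers to a different object holding 99.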
Also, don't confuse pass-by-reference with the physical convention of passing an argument value indirectly. In pass-by-reference, what's logically being passed is a reference to a tangible, user-accessible object; changes to the original object will be visible in the reference, and changes to the reference will be reflected in the original object. In an indirect physical convention, the argument is still logically an independent value, no longer associated with the original object (if there was one).
If every object in the language is stored in addressable memory, pass-by-reference can be easily implemented by simply passing the address of the object. If an l-value can have more structure than just a single, independently-addressable object, more information may be required from the caller. For example, an array argument in FORTRAN can be a row or column vector from a matrix, and so arrays are generally passed as both an address and a stride. C and C++ do have unaddressable l-values because of bitfields, but they forbid passing bitfields by reference (in C++) or taking their address (in either language), which greatly simplifies pointer and reference types in those languages.
FORTRAN is the last remaining example of a language that defaults to pass-by-reference. Early FORTRAN implementations famously passed constants by passing the address of mutable global memory initialized to the constant; if the callee modified its parameter (illegal under the standard, but...), it literally changed the constant for future uses. FORTRAN now allows procedures to explicitly take arguments by value and explicitly declare that arguments must be l-values.
However, many languages do allow parameters to be explicitly marked as pass-by-reference. As mentioned for C++, sometimes only certain kinds of l-values are allowed.
Swift allows parameters to be marked as pass-by-reference with inout. Arbitrary l-values can be passed. The Swift convention is to always pass an address; if the parameter is not addressable, it must be materialized into a temporary and then written back. See the accessors proposal for more details about the high-level semantics of inout arguments.
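As a rough sketch of the materialize-and-write-back behavior, consider an inout argument that is a computed property. `Meter`, `doubled`, and `bump` are hypothetical names; the `setterCalls` counter exists only to make the write-back observable.

```swift
// Hypothetical type: `raw` is stored, `doubled` is computed and
// therefore has no address of its own.
struct Meter {
    var raw = 0
    var setterCalls = 0

    var doubled: Int {
        get { raw * 2 }
        set {
            raw = newValue / 2
            setterCalls += 1
        }
    }
}

func bump(_ x: inout Int) {
    x += 2
}

var meter = Meter()

// Stored property: an addressable l-value, so no setter is involved.
bump(&meter.raw)       // raw: 0 -> 2

// Computed property: the getter result (4) is placed in a temporary,
// the callee updates the temporary to 6, and the setter writes it back.
bump(&meter.doubled)   // raw: 2 -> 3, setterCalls: 0 -> 1
```

The setter runs exactly once, after the call returns, which is the write-back described above.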
In pass-by-value, if A is an l-value expression, foo(A) copies the current value there. Any modifications foo makes to its parameter are made to this copy, not to the original l-value.
Most modern languages are pass-by-value, with specific functions able to opt in to pass-by-reference semantics. This is exactly what Swift does.
There's not much room for variation in the high-level semantics of passing arguments by value; all the variation is in the ownership and physical conventions.
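A minimal sketch of the pass-by-value semantics, using a hypothetical `appendLocally` function: the callee's modifications apply to its own copy, and the caller's value is unchanged.

```swift
// The parameter is an independent value; mutating a local copy of it
// has no effect on the caller's array.
func appendLocally(_ values: [Int]) -> Int {
    var local = values
    local.append(4)
    return local.count
}

let numbers = [1, 2, 3]
let countInCallee = appendLocally(numbers)
```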
Arguments and results that require cleanup, like an Objective-C object reference or a non-POD C++ object, raise two questions about responsibility: who is responsible for cleaning it up, and when?
These questions arise even when the cleanup is explicit in code. C's strdup function returns newly-allocated memory which the caller is responsible for freeing, but strtok does not. Objective-C has standard naming conventions that describe which functions return objects that the caller is responsible for releasing, and outside of ARC these must be followed manually. Of course, conventions designed to be implemented by programmers are often designed around the simplicity of that implementation, rather than necessarily being more efficient.
Pass-by-reference arguments generally don't involve a transfer of ownership. It's assumed that the caller will ensure that the referent is valid at the time of the call, and that the callee will ensure that the referent is still valid at the time of return.
FORTRAN does actually allow parameters to be tagged as out-parameters, where the caller doesn't guarantee the validity of the argument before the call. Objective-C has something similar, where an indirect method argument can be marked out; ARC takes advantage of this with autoreleasing parameters to avoid a copy into the writeback temporary. Neither of these is something we semantically care about supporting in Swift.
There is one other theoretically interesting convention question here: the argument has to be valid before the call and after the call, but does it have to be valid during the call? Swift's answer to this is generally "yes". Swift does have inout aliasing rules that allow a certain amount of optimization, but the compiler is forbidden from exploiting these rules in any way that could cause memory corruption (at least in the absence of race conditions). So Swift has to ensure that an inout argument is valid whenever it does something (including calling an opaque function) that could potentially access the original l-value.
If Swift allowed local variables to be captured through inout parameters, and therefore needed to pass an implicit owner parameter along with an address, this owner parameter would behave like a pass-by-value argument and could use any of the conventions listed below. However, the optimal convention for this is obvious: it should be guaranteed, since captures are very unlikely and callers are almost always expected to use the value of an inout variable afterwards.
All conventions for this have performance trade-offs.
We're only going to discuss static conventions, where the transfer is picked at compile time. It's possible to have a dynamic convention, where the caller passes a flag indicating whether it's okay to directly take responsibility for the value, and the callee can (conceptually) return a flag indicating whether it actually did take responsibility for it. If copying is extremely expensive, that can be worthwhile; otherwise, the code cost may overwhelm any other benefits.
This discussion will ignore one particular impact of these conventions on code size. If a function has many callers, conventions that require more code in the caller are worse, all else aside. If a single call site has many possible targets, conventions that require more code in the callee are worse, all else aside. It's not really reasonable to decide this in advance for unknown code; we could maybe make rules about code calling system APIs, except that system APIs are by definition locked down, and we can't change them. It's a reasonable thing to consider changing with PGO, though.
A common refrain in this performance analysis will be whether a function has responsibility for a value. A function has to get a value from somewhere:
A caller is usually responsible for the return values it receives: the callee generated the value and the caller is responsible for destroying it. Any other convention has to rely on heavily restricting what kind of value can be returned. (If you're thinking about Objective-C autoreleased results, just accept this for now; we'll talk about that later.)
A function isn't necessarily responsible for a value it loads from memory. Ignoring race conditions, the function may be able to immediately use the value without taking any specific action to keep it valid.
A callee may or may not be responsible for a value passed as a parameter, depending on the convention it was passed with.
A value might come from a source that doesn't necessarily make the function responsible, but if the function takes an action which invalidates the source before using the value, the function has to take action to keep the value valid. At that point, the function has responsibility for the value despite its original source.
For example, a function foo() might load a reference r from a global variable x, call an unknown function bar(), and then use r in some way. If bar() can't possibly overwrite x, foo() doesn't have to do anything to keep r alive across the call; otherwise it does (e.g. by retaining it in a refcounted environment). This is a situation where humans are often much smarter than compilers. Of course, it's also a situation where humans are sometimes insufficiently conservative.
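The foo()/bar() scenario can be written out as a small sketch. The `Resource` class is a hypothetical stand-in for any reference-counted object; here bar() does overwrite x, so foo() is correct only because the local `r` keeps the old object alive across the call.

```swift
// Hypothetical reference-counted object.
final class Resource {
    let id: Int
    init(id: Int) { self.id = id }
}

var x = Resource(id: 1)

// Stands in for an "unknown" callee that happens to overwrite the global.
func bar() {
    x = Resource(id: 2)
}

func foo() -> Int {
    let r = x      // load a reference from the global
    bar()          // replaces the object that `x` used to hold
    return r.id    // still valid: `r` has responsibility for the old object
}

let seen = foo()
```

If the compiler could prove that bar() never touches x, the extra retain and release around the call would be unnecessary.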
A function may also require responsibility for a value as part of its operation:
- Since a variable is always responsible for the current value it stores, a function which stores a value into memory must first gain responsibility for that value.
- A callee normally transfers responsibility for its return value to its caller; therefore it must gain responsibility for its return value before returning it.
- A caller may need to gain responsibility for a value before passing it as an argument, depending on the parameter's ownership-transfer convention.
There are three static parameter conventions for ownership worth considering here:
The caller may transfer responsibility for the value to the callee. In SIL, we call this an owned parameter.
This is optimal if the caller has responsibility for the value and doesn't need it after the call. This is an extremely common situation; for example, it comes up whenever a call result is immediately used as an argument. By giving the callee responsibility for the value, this convention allows the callee to use the value at a later point without taking any extra action to keep it alive.
The flip side is that this convention requires a lot of extra work when a single value is used multiple times in the caller. For example, a value passed in every iteration of a loop will need to be copied/retained/whatever each time.
The caller may provide the value without any responsibility on either side. In SIL, we call this an unowned parameter. The value is guaranteed to be valid at the moment of the call, and in the absence of race conditions, that guarantee can be assumed to continue unless the callee does something that might invalidate it. As discussed above, humans are often much smarter than computers about knowing when that's possible.
This is optimal if the caller can acquire the value without responsibility and the callee doesn't require responsibility of it. In very simple code --- e.g., loading values from an array and passing them to a comparator function which just reads a few fields from each and returns --- this can be extremely efficient.
Unfortunately, this convention is completely undermined if either side has to do anything that forces it to take action to keep the value alive. Also, if that happens on the caller side, the convention can keep values alive longer than is necessary. It's very easy for both sides of the convention to end up doing extra work because of this.
The caller may assert responsibility for the value. In SIL, we call this a guaranteed parameter. The callee can rely on the value staying valid for the duration of the call.
This is optimal if the caller needs to use the value after the call and either has responsibility for it or has a guarantee like this for it. Therefore, this convention is particularly nice when a value is likely to be forwarded by value a great deal.
However, this convention does generally keep values alive longer than is necessary, since the outermost function which passed it as an argument will generally be forced to hold a reference for the duration. By the same mechanism, in refcounted systems, this convention tends to cause values to have multiple retains active at once; for example, if a copy-on-write array is created in one function, passed to another, stored in a mutable variable, and then modified, the callee will see a reference count of 2 and be forced to do a structural copy. This can occur even if the caller literally constructed the array for the sole and immediate purpose of passing it to the callee.
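The structural-copy problem can be made visible with a hand-rolled copy-on-write container. This is only a sketch: `Storage`, `CowList`, and `storeAndModify` are hypothetical names, and the uniqueness check uses the standard library's `isKnownUniquelyReferenced`.

```swift
// Hypothetical copy-on-write container, not a standard library type.
final class Storage {
    var items: [Int]
    init(_ items: [Int]) { self.items = items }
}

struct CowList {
    private var storage: Storage

    init(_ items: [Int]) { storage = Storage(items) }

    var count: Int { storage.items.count }

    // Returns true if a structural copy was needed before mutating.
    mutating func append(_ item: Int) -> Bool {
        var copied = false
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.items)
            copied = true
        }
        storage.items.append(item)
        return copied
    }
}

// The callee stores its argument in a mutable variable and modifies it
// while the caller still holds its own reference to the same storage.
func storeAndModify(_ list: CowList) -> Bool {
    var local = list
    return local.append(4)
}

let shared = CowList([1, 2, 3])
let copiedWhenShared = storeAndModify(shared)

var unique = CowList([1, 2, 3])
let copiedWhenUnique = unique.append(4)
```

Because `shared` is still held by the caller during the call, the callee sees two references and has to copy; the uniquely held `unique` is modified in place.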
Objective-C generally uses the unowned convention for object-pointer parameters. It is possible to mark a parameter as being consumed, which is basically the owned convention. As a special case, in ARC we assume that callers are responsible for keeping self values alive (including in blocks), which is effectively the guaranteed convention.
unowned causes a lot of problems without really solving any, in my experience looking at ARC-generated code and optimizer output. A human can take advantage of it, but the compiler is frequently blocked from doing so. Many common idioms (like chains of functions that just add default arguments at each step) have really awful performance because the compiler is adding retains and releases at every single level. It's just not a good convention to adopt by default. However, we might want to consider allowing specific function parameters to opt into it; sort comparators are a particularly interesting candidate for this. unowned is very similar to C++'s const & for things like that.
guaranteed is good for some things, but it causes a lot of silly code bloat when values are really only used in one place, which is quite common. The liveness / refcounting issues are also pretty problematic. But there is one example that's very nice for guaranteed: self. It's quite common for clients of a type to call multiple methods on a single value, or for methods to dispatch to multiple other methods, which are exactly the situations where guaranteed excels. And it's relatively uncommon (but not unimaginable) for a non-mutating method on a copy-on-write struct to suddenly store self aside and start mutating that copy.
owned is a good default for other parameters. It has some minor performance disadvantages (unnecessary retains if you have an unoptimizable call in a loop) and some minor code size benefits (in common straight-line code), but frankly, both of those points pale in importance to the ability to transfer copy-on-write structures around without spuriously increasing reference counts. It doesn't take too many unnecessary structural copies before any amount of reference-counting traffic (especially the Swift-native reference-counting used in copy-on-write structures) is basically irrelevant in comparison.
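For readers who want to experiment, later Swift toolchains (5.9 and newer, which is an assumption about the reader's compiler and postdates this paper) expose parameter ownership in source through the `consuming` and `borrowing` modifiers, which correspond roughly to the owned and guaranteed conventions. The function names below are hypothetical.

```swift
// `consuming` requests the owned convention: the callee takes
// responsibility for the array and can move it into a local variable.
func appending(_ values: consuming [Int], _ extra: Int) -> [Int] {
    var result = values   // last use of `values`, so it is moved, not copied
    result.append(extra)
    return result
}

// `borrowing` requests the guaranteed convention: the caller keeps
// responsibility and the callee only reads the value.
func length(_ values: borrowing [Int]) -> Int {
    values.count
}

let base = [1, 2, 3]
let extended = appending(base, 4)   // `base` is used again below, so the caller copies
let baseLength = length(base)
```

The sketch shows only the source-level spelling; whether a given call actually avoids a retain or a structural copy depends on the optimizer.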
There's no major semantic split in result conventions like that between pass-by-reference and pass-by-value. In most languages, a function has to return a value (or nothing). There are languages like C++ where functions can return references, but that's inherently limited, because the reference has to refer to something that exists outside the function. If Swift ever adds a similar language mechanism, it'll have to be memory-safe and extremely opaque, and it'll be easy to just think of that as a kind of weird value result. So we'll just consider value results here.
Value results raise some of the same ownership-transfer questions as value arguments. There's one major limitation: just like a by-reference result, an actual unowned convention is inherently limited, because something other than the result value must be keeping it valid. So that's off the table for Swift.
What Objective-C does is something more dynamic. Most APIs in Objective-C give you a very ephemeral guarantee about the validity of the result: it's valid now, but you shouldn't count on it being valid indefinitely later. This might be because the result is actually owned by some other object somewhere, or it might be because the result has been placed in the autorelease pool, a thread-local data structure which will (when explicitly drained by something up the call chain) eventually release everything that's been put into it. This autorelease pool can be a major source of spurious memory growth, and in classic manual reference-counting it was important to drain it fairly frequently. ARC's response to this convention was to add an optimization which attempts to prevent things from ending up in the autorelease pool; the net effect of this optimization is that ARC ends up with an owned reference regardless of whether the value was autoreleased. So in effect, from ARC's perspective, these APIs still return an owned reference, mediated through some extra runtime calls to undo the damage of the convention.
So there's really no compelling alternative to an owned return convention as the default in Swift.
The lowest abstraction level for a calling convention is the actual "physical" rules for the call:
- where the caller should place argument values in registers and memory before the call,
- how the callee should pass back the return values in registers and/or memory after the call, and
- what invariants hold about registers and memory over the call.
In theory, all of these could be changed in the Swift ABI. In practice, it's best to avoid changes to the invariant rules, because those rules could complicate Swift-to-C interoperation:
- Assuming a higher stack alignment would require dynamic realignment whenever Swift code is called from C.
- Assuming a different set of callee-saved registers would require additional saves and restores when either Swift code calls C or is called from C, depending on the exact change. That would then inhibit some kinds of tail call.
So we will limit ourselves to considering the rules for allocating parameters and results to registers. Our platform C ABIs are usually quite good at this, and it's fair to ask why Swift shouldn't just use C's rules. There are three general answers:
- Platform C ABIs are specified in terms of the C type system, and the Swift type system allows things to be expressed which don't have direct analogues in C (for example, enums with payloads).
- The layout of structures in Swift does not necessarily match their layout in C, which means that the C rules don't necessarily cover all the cases in Swift.
- Swift places a larger emphasis on first-class structs than C does. C ABIs often fail to allocate even small structs to registers, or use inefficient registers for them, and we would like to be somewhat more aggressive than that.
Accordingly, the Swift ABI is defined largely in terms of lowering: a Swift function signature is translated to a C function signature with all the aggregate arguments and results eliminated (possibly by deciding to pass them indirectly). This lowering will be described in detail in the final section of this whitepaper.
However, there are some specific circumstances where we'd like to deviate from the platform ABI:
As mentioned above, Swift puts a lot of focus on first-class value types. As part of this, it's very valuable to be able to return common value types fully in registers instead of indirectly. The magic number here is three: it's very common for copy-on-write value types to want about three pointers' worth of data, because that's just enough for some sort of owner pointer plus a begin/end pair.
Unfortunately, many common C ABIs fall slightly short of that. Even those ABIs that do allow small structs to be returned in registers tend to only allow two pointers' worth. So in general, Swift would benefit from a very slightly-tweaked calling convention that allocates one or two more registers to the result.
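As a small illustration of the "three pointers" shape, the following sketch defines a hypothetical value type with an owner reference plus a begin/end pair and checks its size against the pointer size. `BufferOwner` and `BufferView` are illustrative names only.

```swift
// Hypothetical owner object for some heap storage.
final class BufferOwner {}

// Owner pointer plus a begin/end pair: three pointers' worth of data.
struct BufferView {
    var owner: BufferOwner
    var begin: UnsafePointer<Int>
    var end: UnsafePointer<Int>
}

let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let viewSize = MemoryLayout<BufferView>.size
```

A convention that returns only two pointers' worth in registers would force a value of this shape to be returned indirectly.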
There are several language features in Swift which require implicit parameters:
Swift's function types are "thick" by default, meaning that a function value carries an optional context object which is implicitly passed to the function when it is called. This context object is reference-counted, and it should be passed guaranteed for straightforward reasons:
- It's not uncommon for closures to be called many times, in which case an owned convention would be unnecessarily expensive.
- While it's easy to imagine a closure which would want to take responsibility for its captured values, giving it responsibility for a retain of the context object doesn't generally allow that. The closure would only be able to take ownership of the captured values if it had responsibility for a unique reference to the context. So the closure would have to be written to do different things based on the uniqueness of the reference, and it would have to be able to tear down and deallocate the context object after stealing values from it. The optimization just isn't worth it.
- It's usually straightforward for the caller to guarantee the validity of the context reference; worst case, a single extra Swift-native retain/release is pretty cheap. Meanwhile, not having that guarantee would force many closure functions to retain their contexts, since many closures do multiple things with values from the context object. So unowned would not be a good convention.
Many functions don't actually need a context, however; they are naturally "thin". It would be best if it were possible to construct a thick function directly from a thin function without having to introduce a thunk just to move parameters around the missing context parameter. In the worst case, a thunk would actually require the allocation of a context object just to store the original function pointer; but that's only necessary when converting from a completely opaque function value. When the source function is known statically, which is far more likely, the thunk can just be a global function which immediately calls the target with the correctly shuffled arguments. Still, it'd be better to be able to avoid creating such thunks entirely.
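The thick/thin distinction can be observed from source, at least approximately. In this sketch (all names hypothetical), a global function and a capturing closure are used through the same thick function type, and the size of a thick function value is compared with a C function pointer, which carries no context.

```swift
// A naturally "thin" function: it needs no context.
func twice(_ x: Int) -> Int { x * 2 }

// A closure that captures `n` needs a context object to hold it.
func makeAdder(_ n: Int) -> (Int) -> Int {
    return { x in x + n }
}

typealias ThickFunction = (Int) -> Int
typealias CFunction = @convention(c) (Int) -> Int

// Both values are used through the same thick function type.
let functions: [ThickFunction] = [twice, makeAdder(3)]
let results = functions.map { f in f(10) }

let wordSize = MemoryLayout<UnsafeRawPointer>.size
let thickSize = MemoryLayout<ThickFunction>.size   // function pointer + context
let cSize = MemoryLayout<CFunction>.size           // function pointer only
```

Whether converting `twice` to a thick function value requires a thunk is an implementation detail that this source-level sketch cannot show.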
In order to reliably avoid creating thunks, it must be possible for code invoking an opaque thick function to pass the context pointer in a way that can be safely and implicitly ignored if the function happens to actually be thin. There are two ways to achieve this:
The context can be passed as the final parameter. In most C calling conventions, extra arguments can be safely ignored; this is because most C calling conventions support variadic arguments, and such conventions inherently can't rely on the callee knowing the extent of the arguments.
However, this is sub-optimal because the context is often used repeatedly in a closure, especially at the beginning, and putting it at the end of the argument list makes it more likely to be passed on the stack.
The context can be passed in a register outside of the normal argument sequence. Some ABIs actually even reserve a register for this purpose; for example, on x86-64 it's %r10. Neither of the ARM ABIs does, however.
Having an out-of-band register would be the best solution.
(Surprisingly, the ownership transfer convention for the context doesn't actually matter here. You might think that an owned convention would be prohibited, since the callee would fail to release the context and would therefore leak it. However, a thin function should always have a nil context, so this would be harmless.)
Either solution works acceptably with curried partial application, since the inner parameters can be left in place while transforming the context into the outer parameters. However, an owned convention would either prevent the uncurrying forwarder from tail-calling the main function or force all the arguments to be spilled. Neither is really acceptable; one more argument against an owned convention. (This is another example where guaranteed works quite nicely, since the guarantees are straightforward to extend to the main function.)
Methods (both static and instance) require a self parameter. In all of these cases, it's reasonable to expect that self will be used frequently, so it's best to pass it in a register. Also, many methods call other methods on the same object, so it's also best if the register storing self is stable across different method signatures.
In static methods on value types, self doesn't require any dynamic information: there's only one value of the metatype, and there's usually no point in passing it.
In static methods on class types, self is a reference to the class metadata, a single pointer. This is necessary because it could actually be the class object of a subclass.
In instance methods on class types, self is a reference to the instance, again a single pointer.
In mutating instance methods on value types, self is the address of an object.
In non-mutating instance methods on value types, self is a value; it may require multiple registers, or none, or it may need to be passed indirectly.
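The different forms of self listed above can be seen in a short sketch. `Shape`, `Circle`, and `Tally` are hypothetical types used only to exercise each kind of method.

```swift
class Shape {
    required init() {}

    // Static method on a class type: `self` is the class object,
    // which may actually be a subclass.
    class func make() -> Shape { self.init() }
}

final class Circle: Shape {}

struct Tally {
    var count = 0

    // Mutating method on a value type: `self` behaves like an inout
    // parameter, so changes are visible to the caller.
    mutating func increment() { count += 1 }

    // Non-mutating method on a value type: `self` is a value.
    func doubled() -> Int { count * 2 }
}

let made = Circle.make()   // dispatches on the Circle class object

var tally = Tally()
tally.increment()
let doubledCount = tally.doubled()
```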
All of these cases except mutating instance methods on value types can be partially applied to create a function closure whose type is the formal type of the method. That is, if class A has a method declared func foo(x: Int) -> Double, then for an instance a of A, a.foo yields a function of type (Int) -> Double. Assuming that we continue to feel that this is a useful language feature, it's worth considering how we could support it efficiently. The expenses associated with a partial application are (1) the allocation of a context object and (2) needing to introduce a thunk to forward to the original function. All else aside, we can avoid the allocation if the representation of self is compatible with the representation of a context object reference; this is essentially true only if self is a class instance using Swift reference counting. Avoiding the thunk is possible only if we successfully avoided the allocation (since otherwise a thunk is required in order to extract the correct self value from the allocated context object) and self is passed in exactly the same manner as a closure context would be.
It's unclear whether making this more efficient would really be worthwhile on its own, but if we do support an out-of-band context parameter, taking advantage of it for methods is essentially trivial.
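A minimal sketch of method partial application as it appears in source follows. The `scale` property is a hypothetical addition so that the method has something to compute; in current Swift, referencing the method on an instance binds self, while referencing it on the type yields the curried form.

```swift
final class A {
    let scale: Double
    init(scale: Double) { self.scale = scale }

    func foo(x: Int) -> Double { Double(x) * scale }
}

let a = A(scale: 0.5)

// Partial application: `self` is bound, leaving the method's formal type.
let bound: (Int) -> Double = a.foo

// Referencing the method on the type gives a function that takes `self` first.
let unbound: (A) -> (Int) -> Double = A.foo

let viaBound = bound(4)
let viaUnbound = unbound(a)(4)
```

Here `a` is a class instance, which is the case where the bound self could in principle double as the closure context.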
The calling convention implications of Swift's error handling design aren't yet settled. It may involve extra parameters; it may involve extra return values. Considerations:
Callers will generally need to immediately check for an error. Being able to quickly check a register would be extremely convenient.
If the error is returned as a component of the result value, it shouldn't be physically combined with the normal result. If the normal result is returned in registers, it would be unfortunate to have to do complicated logic to test for error. If the normal result is returned indirectly, contorting the indirect result with the error would likely prevent the caller from evaluating the call in-place.
It would be very convenient to be able to trivially turn a function which can't produce an error into a function which can. This is an operation that we expect higher-order code to have to do frequently, if it isn't completely inlined away. For example:

    // foo() expects its argument to follow the conventions of a
    // function that's capable of throwing.
    func foo(fn: () throws -> ()) throwsIf(fn)

    // Here we're passing foo() a function that can't throw; this is
    // allowed by the subtyping rules of the language. We'd like to be
    // able to do this without having to introduce a thunk that maps
    // between the conventions.
    func bar(fn: () -> ()) {
      foo(fn)
    }
We'll consider two ways to satisfy this.
The first is to pass a pointer argument that doesn't interfere with the normal argument sequence. The caller would initialize the memory to a zero value. If the callee is a throwing function, it would be expected to write the error value into this argument; otherwise, it would naturally ignore it. Of course, the caller then has to load from memory to see whether there's an error. This would also either consume yet another register not in the normal argument sequence or have to be placed at the end of the argument list, making it more likely to be passed on the stack.
The second is basically the same idea, but using a register that's otherwise callee-save. The caller would initialize the register to a zero value. A throwing function would write the error into it; a non-throwing function would consider it callee-save and naturally preserve it. It would then be extremely easy to check it for an error. Of course, this would take away a callee-save register in the caller when calling throwing functions. Also, if the caller itself isn't throwing, it would have to save and restore that register.
Both solutions would allow tail calls, and the zero store could be eliminated for direct calls to known functions that can throw. The second is the clearly superior solution, but definitely requires more work in the backend.
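The subtyping example can be tried in released Swift, where the role of the placeholder `throwsIf(fn)` annotation is played, roughly, by `rethrows`. This is a sketch of the source-level behavior only; `Failure` and `events` are hypothetical names, and nothing here reveals which physical convention the compiler uses.

```swift
struct Failure: Error {}

// foo() accepts a function that may throw and propagates its error.
func foo(_ fn: () throws -> Void) rethrows {
    try fn()
}

// bar() passes a non-throwing function where a throwing one is expected.
func bar(_ fn: () -> Void) {
    foo(fn)   // no `try` needed: the argument cannot throw
}

var events: [String] = []

bar { events.append("ran") }

do {
    try foo { throw Failure() }
} catch {
    events.append("caught")
}
```

The caller of a throwing function checks for an error immediately after the call, which is why a cheap register test is attractive.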
By default, Swift is resilient about default arguments and treats them as essentially one part of the implementation of the function. This means that, in general, a caller using a default argument must call a function to emit the argument, instead of simply inlining that emission directly into the call.
These default argument generation functions are unlike any other because they have very precise information about how their result will be used: it will be placed into a specific position in a specific argument list. The only reason the caller would ever want to do anything else with the result is if it needs to spill the value before emitting the call.
Therefore, in principle, it would be really nice if it were possible to tell these functions to return in a very specific way, e.g. to return two values in the second and third argument registers, or to return a value at a specific location relative to the stack pointer (although this might be excessively constraining; it would be reasonable to simply opt into an indirect return instead). The function should also preserve earlier argument registers (although this could be tricky if the default argument generator is in a generic context and therefore needs to be passed type-argument information).
This enhancement is very easy to postpone because it doesn't affect any basic language mechanics. The generators are always called directly, and they're inherently attached to a declaration, so it's quite easy to take any particular generator and compatibly enhance it with a better convention.
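As a source-level illustration of the resilience rule above, here is a minimal sketch of a call that omits a defaulted argument, next to the rough shape the caller is assumed to take after lowering. The names connect and defaultTimeoutGenerator are hypothetical; a real generator is a compiler-emitted function, and nothing here models the specialized return convention discussed above.

```swift
// Hypothetical library function with a default argument.
func connect(host: String, timeout: Int = 30) -> String {
  return "\(host):\(timeout)"
}

// Hypothetical stand-in for the compiler-emitted default argument generator.
// It lives with the callee, so the library can change the default without
// its callers being recompiled.
func defaultTimeoutGenerator() -> Int {
  return 30
}

// What the caller writes.
let written = connect(host: "example")

// Roughly what the caller does: call the generator, then place its result
// into the matching argument position.
let lowered = connect(host: "example", timeout: defaultTimeoutGenerator())
```

Because the generator's result always lands in one known argument position, it is a natural candidate for a convention that returns directly into that position.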
Most of the platforms we support have pretty good C calling conventions. The exceptions are i386 (for the iOS simulator) and ARM32 (for iOS). We really, really don't care about i386, but iOS on ARM32 is still an important platform. Switching to a better physical calling convention (only for calls from Swift to Swift, of course) would be a major improvement.
It would be great if this were as simple as flipping a switch, but unfortunately the obvious convention to switch to (AAPCS-VFP) has a slightly different set of callee-save registers: iOS treats r9 as a scratch register. So we'd really want a variant of AAPCS-VFP that did the same. We'd also need to make sure that SJ/LJ exceptions weren't disturbed by this calling convention; we aren't really supporting exception propagation through Swift frames, but completely breaking propagation would be unfortunate, and we may need to be able to catch exceptions.
So this would also require some amount of additional support from the backend.
Function signatures in Swift are lowered in two phases.
The first phase is a high-level semantic lowering, which does a number of things:
It determines a high-level calling convention: specifically, whether the function must match the C calling convention or the Swift calling convention.
It decides the types of the parameters:
- Functions exported for the purposes of C or Objective-C may need to use bridged types rather than Swift's native types. For example, a function that formally returns Swift's String type may be bridged to return an NSString reference instead.
- Functions which are values, not simply immediately called, may need their types lowered to match a specific generic abstraction pattern. This applies to functions that are parameters or results of the outer function signature.
It identifies specific arguments and results which must be passed indirectly:
- Some types are inherently address-only:
- The address of a weak reference must be registered with the runtime at all times; therefore, any struct with a weak field must always be passed indirectly.
- An existential type (if not class-bounded) may contain an inherently address-only value, or its layout may be sensitive to its current address.
- A value type containing an inherently address-only type as a field or case payload becomes itself inherently address-only.
- Some types must be treated as address-only because their layout is not known statically:
- The layout of a resilient value type may change in a later release; the type may even become inherently address-only by adding a weak reference.
- In a generic context, the layout of a type may be dependent on a type parameter. The type parameter might even be inherently address-only at runtime.
- A value type containing a type whose layout isn't known statically itself generally will not have a layout that can be known statically.
- Other types must be passed or returned indirectly because the function type uses an abstraction pattern that requires it. For example, a generic map function expects a function that takes a T and returns a U; the generic implementation of map will expect these values to be passed indirectly because their layout isn't statically known. Therefore, the signature of a function intended to be passed as this argument must pass them indirectly, even if they are actually known statically to be non-address-only types like (e.g.) Int and Float.
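The first two groups of rules above amount to a recursive predicate over types. The following is a minimal sketch of that predicate over a toy type model; TypeModel and isAddressOnly are hypothetical names for illustration, not compiler APIs, and the third group (indirectness imposed by an abstraction pattern) is deliberately not modeled because it depends on the function type rather than on the value type alone.

```swift
// Toy model of the type categories named in the rules above.
enum TypeModel {
  case scalar                  // Int, Float, a strong class reference, ...
  case weakReference           // must stay registered with the runtime
  case opaqueExistential       // existential that is not class-bounded
  case resilientValueType      // layout may change in a later release
  case typeParameter           // layout depends on a generic argument
  case aggregate([TypeModel])  // struct/tuple fields or enum case payloads
}

// True if values of this type must be passed and returned indirectly.
func isAddressOnly(_ type: TypeModel) -> Bool {
  switch type {
  case .scalar:
    return false
  case .weakReference, .opaqueExistential, .resilientValueType, .typeParameter:
    return true
  case .aggregate(let members):
    // One address-only member makes the whole aggregate address-only.
    return members.contains(where: isAddressOnly)
  }
}
```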
It expands tuples in the parameter and result types. This is done at this level both because it is affected by abstraction patterns and because different tuple elements may use different ownership conventions. (This is most likely for imported APIs, where it's the tuple elements that correspond to specific C or Objective-C parameters.)
This completely eliminates top-level tuple types from the function signature except when they are a target of abstraction and thus are passed indirectly. (A function with type (Float, Int) -> Float can be abstracted as (T) -> U, where T == (Float, Int).)
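At the source level, the abstraction case looks like the following minimal sketch. The names apply and addScaled are hypothetical. Inside apply, the argument is just an opaque T, so the tuple cannot be expanded there and is assumed to travel indirectly as a single value.

```swift
// A generic function that sees its function argument as (T) -> U.
func apply<T, U>(_ function: (T) -> U, to value: T) -> U {
  return function(value)
}

// A function value whose single parameter is the tuple (Float, Int).
let addScaled: ((Float, Int)) -> Float = { pair in
  pair.0 + Float(pair.1)
}

// Here T == (Float, Int) and U == Float.
let sum = apply(addScaled, to: (1.5, 2))
```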
It determines ownership conventions for all parameters and results.
After this phase, a function type consists of an abstract calling convention, a list of parameters, and a list of results. A parameter is a type, a flag for indirectness, and an ownership convention. A result is a type, a flag for indirectness, and an ownership convention. (Results need ownership conventions only for non-Swift calling conventions.) Types will not be tuples unless they are indirect.
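The shape described above can be sketched as a small data structure. All names here (Convention, Ownership, LoweredParameter, and so on) are hypothetical and chosen for illustration; they are not the compiler's actual representation, and the ownership values in the examples are arbitrary placeholders.

```swift
enum Convention { case swift, c }
enum Ownership { case plusZero, plusOne }

struct LoweredParameter {
  var type: String
  var isTuple: Bool
  var isIndirect: Bool
  var ownership: Ownership
}

struct LoweredResult {
  var type: String
  var isTuple: Bool
  var isIndirect: Bool
  var ownership: Ownership?  // only needed for non-Swift conventions
}

struct LoweredFunctionType {
  var convention: Convention
  var parameters: [LoweredParameter]
  var results: [LoweredResult]

  // The invariant stated above: a tuple may survive only if it is indirect.
  var isWellFormed: Bool {
    let parametersOK = parameters.allSatisfy { !$0.isTuple || $0.isIndirect }
    let resultsOK = results.allSatisfy { !$0.isTuple || $0.isIndirect }
    return parametersOK && resultsOK
  }
}

// (Float, Int) -> Float with the top-level tuple expanded.
let expanded = LoweredFunctionType(
  convention: .swift,
  parameters: [
    LoweredParameter(type: "Float", isTuple: false, isIndirect: false, ownership: .plusZero),
    LoweredParameter(type: "Int", isTuple: false, isIndirect: false, ownership: .plusZero),
  ],
  results: [LoweredResult(type: "Float", isTuple: false, isIndirect: false, ownership: nil)])

// The same function abstracted as (T) -> U: the tuple stays, but indirectly.
let abstracted = LoweredFunctionType(
  convention: .swift,
  parameters: [
    LoweredParameter(type: "(Float, Int)", isTuple: true, isIndirect: true, ownership: .plusZero)
  ],
  results: [LoweredResult(type: "Float", isTuple: false, isIndirect: true, ownership: nil)])
```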
Semantic lowering may also need to mark certain parameters and results as special, for the purposes of the special-case physical treatments of self, closure contexts, and error results.
The second phase of lowering translates a function type produced by semantic lowering into a C function signature. If the function involves a parameter or result with special physical treatment, physical lowering initially ignores this value, then adds in the special treatment as agreed upon with the backend.
Central to the operation of the physical-lowering algorithm is the generic expansion algorithm. This algorithm turns any non-address-only Swift type into a sequence of zero or more legal types, where a legal type is either:
- an integer type, with a power-of-two size no larger than the maximum integer size supported by C on the target,
- a floating-point type supported by the target, or
- a vector type supported by the target.
Obviously, this is target-specific. The target also specifies a maximum voluntary integer size. The legal type sequence only contains vector types or integer types larger than the maximum voluntary size when the type was explicit in the input.
Pointers are represented as integers in the legal type sequence. We assume there's never a reason to differentiate them in the ABI as long as the effect of address spaces on pointer size is taken into account. If that's not true, this algorithm should be adjusted.
The result of the algorithm also associates each legal type with an offset. This information is sufficient to reconstruct an object in memory from a series of values and vice-versa.
The algorithm proceeds in two steps.
First, the type is recursively analyzed to produce a typed layout. A typed layout associates ranges of bytes with either (1) a legal type (whose storage size must match the size of the associated byte range), (2) the special type opaque, or (3) the special type empty. Adjacent ranges mapped to opaque or empty can be combined.
For most of the types in Swift, this process is obvious: they either correspond to an obvious legal type (e.g. thick metatypes are pointer-sized integers), or to an obvious sequence of scalars (e.g. class existentials are a sequence of pointer-sized integers). Only a few cases remain:
Integer types that are not legal types should be mapped as opaque.
Vector types that are not legal types should be broken into smaller vectors, if their size is an even multiple of a legal vector type, or else broken into their components. (This rule may need some tinkering.)
Tuples and structs are mapped by merging the typed layouts of the fields, as padded out to the extents of the aggregate with empty-mapped ranges. Note that, if fields do not overlap, this is equivalent to concatenating the typed layouts of the fields, in address order, mapping internal padding to empty. Bit-fields should map the bits they occupy to opaque.
For example, given the following struct type:
  struct FlaggedPair {
    var flag: Bool
    var pair: (MyClass, Float)
  }
If Swift performs naive, C-like layout of this structure, and this is a 64-bit platform, typed layout is mapped as follows:
  FlaggedPair.flag := [0: i1,                        ]
  FlaggedPair.pair := [       8-15: i64, 16-19: float]
  FlaggedPair      := [0: i1, 8-15: i64, 16-19: float]
If Swift instead allocates flag into the spare (little-endian) low bits of pair.0, the typed layout map would be:
  FlaggedPair.flag := [0: i1                   ]
  FlaggedPair.pair := [0-7: i64,    8-11: float]
  FlaggedPair      := [0-7: opaque, 8-11: float]
Unions (imported from C) are mapped by merging the typed layouts of the fields, as padded out to the extents of the aggregate with empty-mapped ranges. This will often result in a fully-opaque mapping.
Enums are mapped by merging the typed layouts of the cases, as padded out to the extents of the aggregate with empty-mapped ranges. A case's typed layout consists of the typed layout of the case's directly-stored payload (if any), merged with the typed layout for its discriminator. We assume that checking for a discriminator involves a series of comparisons of bits extracted from non-overlapping ranges of the value; the typed layout of a discriminator maps all these bits to opaque and the rest to empty.
For example, given the following enum type:
  enum Sum {
    case Yes(MyClass)
    case No(Float)
    case Maybe
  }
If Swift, in its infinite wisdom, decided to lay this out sequentially, and to use invalid pointer values of the class reference to indicate that the other cases are present, the layout would look as follows:
  Sum.Yes.payload       := [0-7: i64                ]
  Sum.Yes.discriminator := [0-7: opaque             ]
  Sum.Yes               := [0-7: opaque             ]
  Sum.No.payload        := [             8-11: float]
  Sum.No.discriminator  := [0-7: opaque             ]
  Sum.No                := [0-7: opaque, 8-11: float]
  Sum.Maybe             := [0-7: opaque             ]
  Sum                   := [0-7: opaque, 8-11: float]
If Swift instead chose to just use a discriminator byte, the layout would look as follows:
  Sum.Yes.payload       := [0-7: i64             ]
  Sum.Yes.discriminator := [            8: opaque]
  Sum.Yes               := [0-7: i64,   8: opaque]
  Sum.No.payload        := [0-3: float           ]
  Sum.No.discriminator  := [            8: opaque]
  Sum.No                := [0-3: float, 8: opaque]
  Sum.Maybe             := [            8: opaque]
  Sum                   := [0-8: opaque          ]
If Swift chose to use spare low (little-endian) bits in the class pointer, and to offset the float to make this possible, the layout would look as follows:
  Sum.Yes.payload       := [0-7: i64             ]
  Sum.Yes.discriminator := [0: opaque            ]
  Sum.Yes               := [0-7: opaque          ]
  Sum.No.payload        := [           4-7: float]
  Sum.No.discriminator  := [0: opaque            ]
  Sum.No                := [0: opaque, 4-7: float]
  Sum.Maybe             := [0: opaque            ]
  Sum                   := [0-7: opaque          ]
The merge algorithm for typed layouts is as follows. Consider two typed layouts L and R. A range from L is said to conflict with a range from R if they intersect and they are mapped as different non-empty types. If two ranges conflict, and either range is mapped to a vector, replace it with mapped ranges for the vector elements. If two ranges conflict, and neither range is mapped to a vector, map them both to opaque, combining them with adjacent opaque ranges as necessary. If a range is mapped to a non-empty type, and the bytes in the range are all mapped as empty in the other map, add that range-mapping to the other map. L and R should now match perfectly; this is the result of the merge. Note that this algorithm is both associative and commutative.
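The merge can be sketched in a few lines under some simplifying assumptions. Entry, Mapping, and merge are hypothetical names. Empty ranges are represented by the absence of an entry, vectors are not modeled (so the vector-splitting rule is skipped), and any two intersecting entries that are not identical are treated as conflicting.

```swift
enum Mapping: Equatable {
  case legal(String)  // e.g. "i64", "float"
  case opaque
}

// A mapped range of bytes; both bounds are inclusive.
struct Entry: Equatable {
  var lower: Int
  var upper: Int
  var mapping: Mapping
}

func merge(_ left: [Entry], _ right: [Entry]) -> [Entry] {
  // Start from the union of both layouts, dropping exact duplicates.
  var entries = left
  for entry in right where !entries.contains(entry) {
    entries.append(entry)
  }
  // Repeatedly fold together conflicting entries, and opaque entries that
  // touch, until nothing changes.
  var changed = true
  while changed {
    changed = false
    search: for i in entries.indices {
      for j in entries.indices where j > i {
        let a = entries[i], b = entries[j]
        let intersects = a.lower <= b.upper && b.lower <= a.upper
        let touchingOpaque = a.mapping == .opaque && b.mapping == .opaque
          && (a.upper + 1 == b.lower || b.upper + 1 == a.lower)
        if intersects || touchingOpaque {
          entries[i] = Entry(lower: min(a.lower, b.lower),
                             upper: max(a.upper, b.upper),
                             mapping: .opaque)
          entries.remove(at: j)
          changed = true
          break search
        }
      }
    }
  }
  return entries.sorted { $0.lower < $1.lower }
}
```

Applied to the discriminator-byte layout of Sum above, merging the cases Yes and No makes the i64 and float payloads conflict, and the resulting opaque range then absorbs the adjacent discriminator byte, giving a single opaque range over bytes 0-8.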
Once the typed layout is constructed, it can be turned into a legal type sequence.
Note that this transformation is sensitive to the offsets of ranges in the complete type. It's possible that the simplifications described here could be integrated directly into the construction of the typed layout without changing the results, but that's not yet proven.
In all of these examples, the maximum voluntary integer size is 4 (i32) unless otherwise specified.
If any range is mapped as a non-empty, non-opaque type, but its start offset is not a multiple of its natural alignment, remap it as opaque. For these purposes, the natural alignment of an integer type is the minimum of its size and the maximum voluntary integer size; the natural alignment of any other type is its C ABI alignment. Combine adjacent opaque ranges.
For example:
[1-2: i16, 4: i8, 6-7: i16] ==> [1-2: opaque, 4: i8, 6-7: i16]
If any range is mapped as an integer type that is not larger than the maximum voluntary size, remap it as opaque. Combine adjacent opaque ranges.
For example:
  [1-2: opaque, 4: i8, 6-7: i16] ==> [1-2: opaque, 4: opaque, 6-7: opaque]
  [0-3: i32, 4-11: i64, 12-13: i16] ==> [0-3: opaque, 4-11: i64, 12-13: opaque]
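The two remapping rules so far can be sketched as a single pass, which gives the same result as applying them one after the other. The names Mapping, Entry, and demote are hypothetical, and the input is assumed to be sorted by offset with no overlapping ranges.

```swift
enum Mapping: Equatable {
  case integer(bytes: Int)
  case other(name: String, alignment: Int)  // float or vector, with its C ABI alignment
  case opaque
}

// A mapped range of bytes; both bounds are inclusive.
struct Entry: Equatable {
  var lower: Int
  var upper: Int
  var mapping: Mapping
}

func demote(_ layout: [Entry], maxVoluntary: Int) -> [Entry] {
  var result: [Entry] = []
  for var entry in layout {
    let mapping = entry.mapping
    switch mapping {
    case .integer(let bytes):
      let alignment = min(bytes, maxVoluntary)
      // Misaligned integers and integers no larger than the maximum
      // voluntary size both become opaque.
      if entry.lower % alignment != 0 || bytes <= maxVoluntary {
        entry.mapping = .opaque
      }
    case .other(_, let alignment):
      if entry.lower % alignment != 0 {
        entry.mapping = .opaque
      }
    case .opaque:
      break
    }
    // Combine with a directly preceding opaque range.
    if entry.mapping == .opaque, let last = result.last,
       last.mapping == .opaque, last.upper + 1 == entry.lower {
      result[result.count - 1].upper = entry.upper
    } else {
      result.append(entry)
    }
  }
  return result
}
```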
An aligned storage unit is an N-byte-aligned range of N bytes, where N is a power of 2 no greater than the maximum voluntary integer size. A maximal aligned storage unit has a size equal to the maximum voluntary integer size.
Note that any remaining ranges mapped as integers must fully occupy multiple maximal aligned storage units.
Split all opaque ranges at the boundaries of maximal aligned storage units. From this point on, never combine adjacent opaque ranges across these boundaries.
For example:
[1-6: opaque] ==> [1-3: opaque, 4-6: opaque]
Within each maximal aligned storage unit, find the smallest aligned storage unit which contains all the opaque ranges. Replace the first opaque range in the maximal aligned storage unit with a mapping from that aligned storage unit to an integer of the aligned storage unit's size. Remove any other opaque ranges in the maximal aligned storage unit. Note that this can create overlapping ranges in some cases. For the purposes of this calculation, the last maximal aligned storage unit should be considered "full", as if the type had an infinite amount of empty tail-padding.
For example:
  [1-2: opaque] ==> [0-3: i32]
  [0-1: opaque] ==> [0-1: i16]
  [0: opaque, 2: opaque] ==> [0-3: i32]
  [0-9: fp80, 10: opaque] ==> [0-9: fp80, 10: i8]

  // If maximum voluntary size is 8 (i64):
  [0-9: fp80, 11: opaque, 13: opaque] ==> [0-9: fp80, 8-15: i64]
(This assumes that fp80 is a legal type for illustrative purposes. It would probably be a better policy for the actual x86-64 target to consider it illegal and treat it as opaque from the start, at least when lowering for the Swift calling convention; for C, it is important to produce an fp80 mapping for ABI interoperation with C functions that take or return long double by value.)
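The split-and-cover steps for opaque ranges can be sketched as follows. The name coverOpaque is hypothetical. The function looks only at the opaque ranges and returns the byte range of each integer that replaces them; the maximum voluntary size is assumed to be a power of two, and non-opaque ranges are left to the caller.

```swift
// Given the opaque ranges of a typed layout, returns the aligned storage
// units (as inclusive byte ranges) that become integers in the legal type
// sequence. An N-byte range stands for an integer of N bytes.
func coverOpaque(_ opaque: [ClosedRange<Int>], maxVoluntary: Int) -> [ClosedRange<Int>] {
  // Split at maximal aligned storage unit boundaries by recording, for each
  // maximal unit, the extent of the opaque bytes that fall inside it.
  var extentByUnit: [Int: ClosedRange<Int>] = [:]
  for range in opaque {
    for byte in range {
      let unit = byte / maxVoluntary
      if let extent = extentByUnit[unit] {
        extentByUnit[unit] = min(extent.lowerBound, byte)...max(extent.upperBound, byte)
      } else {
        extentByUnit[unit] = byte...byte
      }
    }
  }
  // Within each maximal unit, grow a power-of-two unit until a single
  // aligned unit contains the whole extent.
  var integers: [ClosedRange<Int>] = []
  for unit in extentByUnit.keys.sorted() {
    guard let extent = extentByUnit[unit] else { continue }
    var size = 1
    while extent.lowerBound / size != extent.upperBound / size {
      size *= 2
    }
    let start = (extent.lowerBound / size) * size
    integers.append(start...(start + size - 1))
  }
  return integers
}
```

With a maximum voluntary size of 8, the opaque bytes 11 and 13 are covered by the unit 8-15, which overlaps an fp80 occupying bytes 0-9; this is the overlapping case shown in the last example above.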
The final legal type sequence is the sequence of types for the non-empty ranges in the map. The associated offset for each type is the offset of the start of the corresponding range.
Only the final step can introduce overlapping ranges, and this is only possible if there's a non-integer legal type which:
- has a natural alignment less than half of the size of the maximum voluntary integer size or
- has a store size that is not a multiple of half the size of the maximum voluntary integer size.
On our supported platforms, these conditions are only true on x86-64, and only of long double.
Given the address of an object and a legal type sequence for its type, it's straightforward to load a valid sequence or store the sequence back into memory. For the most part, it's sufficient to simply load or store each value at its appropriate offset. There are two subtleties:
- If the legal type sequence had any overlapping ranges, the integer values should be stored first to prevent overwriting parts of the other values they overlap.
- Care must be taken with the final values in the sequence; integer values may extend slightly beyond the ordinary storage size of the argument type. This is usually easy to compensate for.
The value sequence essentially has the same semantics that the value in memory would have: any bits that aren't part of the actual representation of the original type have a completely unspecified value.
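The two subtleties of storing a value sequence can be sketched against a plain byte array. LoweredValue and store are hypothetical names, and the model ignores endianness and real pointer arithmetic: integers are written first so that overlapping non-integer values overwrite them, and writes past the end of the object's storage are dropped.

```swift
struct LoweredValue {
  var offset: Int
  var bytes: [UInt8]  // raw bytes of the value, in memory order
  var isInteger: Bool
}

// Stores a value sequence into the object's storage.
func store(_ values: [LoweredValue], into memory: inout [UInt8]) {
  // First pass writes integers, second pass writes everything else.
  for integersPass in [true, false] {
    for value in values where value.isInteger == integersPass {
      for (index, byte) in value.bytes.enumerated()
      where value.offset + index < memory.count {
        // Bytes beyond the storage size are not part of the object.
        memory[value.offset + index] = byte
      }
    }
  }
}
```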
As mentioned before, in principle the process of physical lowering turns a semantically-lowered Swift function type (in implementation terms, a SILFunctionType) into a C function signature, which can then be lowered according to the usual rules for the ABI. This is, in fact, what we do when trying to match a C calling convention. However, for the native Swift calling convention, because we actively want to use more aggressive rules for results, we instead build an LLVM function type directly. We first construct a direct result type that we're certain the backend knows how to interpret according to our more aggressive desired rules, and then we use the expansion algorithm to construct a parameter sequence consisting solely of types with obvious ABI lowering that the backend can reliably handle. This bypasses the need to consult Clang for our own native calling convention.
We have this generic expansion algorithm, but it's important to understand that the physical lowering process does not just naively use the results of this algorithm. The expansion algorithm will happily expand an arbitrary structure; if that structure is very large, the algorithm might turn it into hundreds of values. It would be foolish to pass it as an argument that way; it would use up all the argument registers and basically turn into a very inefficient memcpy, and if the caller wanted it all in one place, they'd have to very painstakingly reassemble it. It's much better to pass large structures indirectly. And with result values, we really just don't have a choice; there's only so many registers you can use before you have to give up and return indirectly. Therefore, even in the Swift native convention, the expansion algorithm is basically used as a first pass. A second pass then decides whether the expanded sequence is actually reasonable to pass directly.
Recall that one aspect of the semantically-lowered Swift function type is whether we should be matching the C calling convention or not. The algorithm that follows assumes that the importer and semantic lowering have conspired in a very particular way to make that possible. Specifically, we assume that an imported C function type, lowered semantically by Swift, will follow some simple structural rules:
- If there was a by-value struct or union parameter or result in the imported C type, it will correspond to a by-value direct parameter or return type in Swift, and the Swift type will be a nominal type whose declaration links back to the original C declaration.
- Any other parameter or result will be transformed by the importer and semantic lowering to a type that the generic expansion algorithm will expand to a single legal type whose representation is ABI-compatible with the original parameter. For example, an imported pointer type will eventually expand to an integer of pointer size.
- There will be at most one result in the lowered Swift type, and it will be direct.
Given this, we go about lowering the function type as follows. Recall that, when matching the C calling convention, we're building a C function type; but that when matching the Swift native calling convention, we're building an LLVM function type directly.
The first step is to consider the results of the function.
There's a different set of rules here when we're matching the C calling convention. If there's a single direct result type, and it's a nominal type imported from Clang, then the result type of the C function type is that imported Clang type. Otherwise, concatenate the legal type sequences from the direct results. If this yields an empty sequence, the result type is void. If it yields a single legal type, the result type is the corresponding Clang type. No other kind of result could actually have come from an imported C declaration, so we don't have any real compatibility requirements; for the convenience of interoperation, this is handled by constructing a new C struct which contains the corresponding Clang types for the legal type sequence as its fields.
Otherwise, we are matching the Swift calling convention. Concatenate the legal type sequences from all the direct results. If target-specific logic decides that this is an acceptable collection to return directly, construct the appropriate IR result type to convince the backend to handle it. Otherwise, use the void IR result type and return the "direct" results indirectly by passing the address of a tuple combining the original direct results (not the types from the legal type sequence).
Finally, any indirect results from the semantically-lowered function type are simply added as pointer parameters.
After all the results are collected, it's time to collect the parameters. This is done one at a time, from left to right, adding parameters to our physically-lowered type.
If semantic lowering has decided that we have to pass the parameter indirectly, we simply add a pointer to the type. This covers both mandatory-indirect pass-by-value parameters and pass-by-reference parameters. The latter can arise even in C and Objective-C.
Otherwise, the rules are somewhat different if we're matching the C calling convention. If the parameter is a nominal type imported from Clang, then we just add the imported Clang type to the Clang function type as a parameter. Otherwise, we derive the legal type sequence for the parameter type. Again, we should only have compatibility requirements if the legal type sequence has a single element, but for the convenience of interoperation, we collect the corresponding Clang types for all of the elements of the sequence.
Finally, if we're matching the Swift calling convention, derive the legal type sequence. If the result appears to be a reasonably small and efficient set of parameters, add their corresponding IR types to the function type we're building; otherwise, ignore the legal type sequence and pass the address of the original type indirectly.
Considerations for whether a legal type sequence is reasonable to pass directly:
- There probably ought to be a maximum size. Unless it's a single 256-bit vector, it's hard to imagine wanting to pass more than, say, 32 bytes of data as individual values. The callee may decide that it needs to reconstruct the value for some reason, and the larger the type gets, the more expensive this is. It may also be reasonable for this cap to be lower on 32-bit targets, but that might be dealt with better by the next restriction.
- There should also be a cap on the number of values. A 32-byte limit might be reasonable for passing 4 doubles. It's probably not reasonable for passing 8 pointers. That many values will exhaust all the parameter registers for just a single value. 4 is probably a reasonable cap here.
- There's no reason to require the data to be homogeneous. If a struct contains three floats and a pointer, why force it to be passed in memory?
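These considerations can be sketched as a simple predicate. LegalType and shouldPassDirectly are hypothetical names, and the caps of 32 bytes and 4 values are just the tentative figures floated above, not settled limits; the possible exception for a single large vector is not modeled.

```swift
struct LegalType {
  var name: String
  var size: Int  // storage size in bytes
}

// Decides whether a legal type sequence is small enough to pass as
// individual values rather than indirectly.
func shouldPassDirectly(_ sequence: [LegalType],
                        maxBytes: Int = 32,
                        maxValues: Int = 4) -> Bool {
  let totalBytes = sequence.reduce(0) { $0 + $1.size }
  // No homogeneity requirement: only total size and value count matter.
  return totalBytes <= maxBytes && sequence.count <= maxValues
}

let fourDoubles = [LegalType](repeating: LegalType(name: "double", size: 8), count: 4)
let eightPointers = [LegalType](repeating: LegalType(name: "i32", size: 4), count: 8)
let mixed = [LegalType](repeating: LegalType(name: "float", size: 4), count: 3)
  + [LegalType(name: "i64", size: 8)]
```

Under these caps, fourDoubles and mixed would be passed directly, while eightPointers (32 bytes, but 8 values) would be passed indirectly.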
When all of the parameters have been processed in this manner, the function type is complete.