Contents
Commentary
In addition to the main spec, there are lots of open ended questions, justification, and ideas of what best practices should be. That random discussion is placed in boxes like this one to clarify what is normative and what is discussion.
This is the language reference manual for the Swift language, which is highly volatile and constantly under development. As the prototype evolves, this document should be kept up to date with what is actually implemented.
The grammar and structure of the language is defined in BNF form in yellow boxes. Examples are shown in gray boxes, and assume that the standard library is in use (unless otherwise specified).
Commentary
A non-goal of the Swift project in general is to become some amazing research project. We really want to focus on delivering a real product, and having the design and spec co-evolve.
In no particular order, and not explained well:
- Support building great frameworks and applications, with a specific focus on permiting rich and powerful APIs.
- Get the defaults right: this reduces the barrier to entry and increases the odds that the right thing happens.
- Through our support for building great APIs, we aim to provide an expressive and productive language that is fun to program in.
- Support low-level system programming. We should want to write compilers, operating system kernels, and media codecs in Swift. This means that being able to obtain high performance is really quite important.
- Provide really great tools, like an IDE, debugger, profiling, etc.
- Where possible, steal great ideas instead of innovating new things that will work out in unpredictable ways. It turns out that there are a lot of good ideas already out there.
- Memory safe by default: array overrun errors, uninitialized values, and other problems endemic to C should not occur in Swift, even if it means some amount of runtime overhead. Eventually these checks will be disablable for people who want ultimate performance in production builds.
- Efficiently implementable with a static compiler: runtime compilation is great technology and Swift may eventually get a runtime optimizer, but it is a strong goal to be able to implement swift with just a static compiler.
- Interoperate as transparently as possible with C, Objective-C, and C++ without having to write an equivalent of "extern C" for every referenced definition.
- Great support for efficient by-value types.
- Elegant and natural syntax, aiming to be familiar and easy to transition to for "C" people. Differences from the C family should only be done when it provides a significant win (e.g. eliminate declarator syntax).
- Lots of other stuff too.
A smaller wishlist goal is to support embedded sub-languages in swift, so that we don't get the OpenCL-is-like-C-but-very-different-in-many-details problem.
Commentary
Pushing as much of the language as realistic out of the compiler and into the library is generally good for a few reasons: 1) we end up with a smaller core language. 2) we force the language that is left to be highly expressive and extensible. 3) this highly expressive language core can then be used to build a lot of other great libraries, hopefully many we can't even anticipate at this point.
The basic approach in designing and implementing the Swift prototype was to start at the very bottom of the stack (simple expressions and the trivial bits of the type system) and incrementally build things up one brick at a time. There is a big focus on making things as simple as possible and having a clean internal core. Where it makes sense, sugar is added on top to make the core more expressive for common situations.
One major aspect that dovetails with expressivity, learnability, and focus on
API development is that much of the language is implemented in a :ref:`standard
library <langref.stdlib>` (inspired in part by the Haskell Standard Prelude).
This means that things like Int
and Void
are not part of the language
itself, but are instead part of the standard library.
Commentary
Because Swift doesn't rely on a C-style "lexer hack" to know what is a type and what is a value, it is possible to fully parse a file without resolving import declarations.
Swift has a strict separation between its phases of translation, and the compiler follows a conceptually simple design. The phases of translation are:
- :ref:`Lexing <langref.lexical>`: A source file is broken into tokens
according to a (nearly,
/**/
comments can be nested) regular grammar. - Parsing and AST Building: The tokens are parsed according to the grammar set out below. The grammar is context free and does not require any "type feedback" from the lexer or later stages. During parsing, name binding for references to local variables and other declarations that are not at module (and eventually namespace) scope are bound.
- :ref:`Name Binding <langref.namebind>`: At this phase, references to non-local types and values are bound, and :ref:`import directives <langref.decl.import>` are both validated and searched. Name binding can cause recursive compilation of modules that are referenced but not yet built.
- :ref:`Type Checking <langref.typecheck>`: During this phase all types are resolved within value definitions, :ref:`function application <langref.expr.call>` and <a href="#expr-infix">binary expressions</a> are found and formed, and overloaded functions are resolved.
- Code Generation: The AST is converted the LLVM IR, optimizations are performed, and machine code generated.
- Linking: runtime libraries and referenced modules are linked in.
FIXME: "import Swift" implicitly added as the last import in a source file.
Commentary
Not all characters are "taken" in the language, this is because it is still growing. As there becomes a reason to assign things into the identifier or punctuation bucket, we will do so as swift evolves.
The lexical structure of a Swift file is very simple: the files are tokenized
according to the following productions and categories. As is usual with most
languages, tokenization uses the maximal munch rule and whitespace separates
tokens. This means that "a b
" and "ab
" lex into different token
streams and are therefore different in the grammar.
Commentary
Nested block comments are important because we don't have the nestable #if
0
hack from C to rely on.
whitespace ::= ' '
whitespace ::= '\n'
whitespace ::= '\r'
whitespace ::= '\t'
whitespace ::= '\0'
comment ::= //.*[\n\r]
comment ::= /* .... */
Space, newline, tab, and the nul byte are all considered whitespace and are
discarded, with one exception: a '(
' or '[
' which does not follow a
non-whitespace character is different kind of token (called spaced)
from one which does not (called unspaced). A '(
' or '[
' at the
beginning of a file is spaced.
Comments may follow the BCPL style, starting with a "//
" and running to the
end of the line, or may be recursively nested /**/
style comments. Comments
are ignored and treated as whitespace.
Commentary
Note that ->
is used for function types () -> Int
, not pointer
dereferencing.
punctuation ::= '('
punctuation ::= ')'
punctuation ::= '{'
punctuation ::= '}'
punctuation ::= '['
punctuation ::= ']'
punctuation ::= '.'
punctuation ::= ','
punctuation ::= ';'
punctuation ::= ':'
punctuation ::= '='
punctuation ::= '->'
punctuation ::= '&' // unary prefix operator
These are all reserved punctuation that are lexed into tokens. Most other non-alphanumeric characters are matched as :ref:`operators <langref.lexical.operator>`. Unlike operators, these tokens are not overloadable.
Commentary
The number of keywords is reduced by pushing most functionality into the
library (e.g. "builtin" datatypes like Int
and Bool
). This allows us
to add new stuff to the library in the future without worrying about
conflicting with the user's namespace.
// Declarations and Type Keywords
keyword ::= 'class'
keyword ::= 'destructor'
keyword ::= 'extension'
keyword ::= 'import'
keyword ::= 'init'
keyword ::= 'func'
keyword ::= 'enum'
keyword ::= 'protocol'
keyword ::= 'struct'
keyword ::= 'subscript'
keyword ::= 'Type'
keyword ::= 'typealias'
keyword ::= 'var'
keyword ::= 'where'
// Statements
keyword ::= 'break'
keyword ::= 'case'
keyword ::= 'continue'
keyword ::= 'default'
keyword ::= 'do'
keyword ::= 'else'
keyword ::= 'if'
keyword ::= 'in'
keyword ::= 'for'
keyword ::= 'return'
keyword ::= 'switch'
keyword ::= 'then'
keyword ::= 'while'
// Expressions
keyword ::= 'as'
keyword ::= 'is'
keyword ::= 'new'
keyword ::= 'super'
keyword ::= 'self'
keyword ::= 'Self'
keyword ::= 'type'
keyword ::= '__COLUMN__'
keyword ::= '__FILE__'
keyword ::= '__LINE__'
These are the builtin keywords. Keywords can still be used as names via escaped identifiers <langref.lexical.escapedident>.
Swift uses several contextual keywords at various parts of the language. Contextual keywords are not reserved words, meaning that they can be used as identifiers. However, in certain contexts, they act as keywords, and are represented as such in the grammar below. The following identifiers act as contextual keywords within the language:
get
infix
mutating
nonmutating
operator
override
postfix
prefix
set
integer_literal ::= [0-9][0-9_]*
integer_literal ::= 0x[0-9a-fA-F][0-9a-fA-F_]*
integer_literal ::= 0o[0-7][0-7_]*
integer_literal ::= 0b[01][01_]*
Integer literal tokens represent simple integer values of unspecified
precision. They may be expressed in decimal, binary with the '0b
' prefix,
octal with the '0o
' prefix, or hexadecimal with the '0x
' prefix.
Unlike C, a leading zero does not affect the base of the literal.
Integer literals may contain underscores at arbitrary positions after the first digit. These underscores may be used for human readability and do not affect the value of the literal.
789 0789 1000000 1_000_000 0b111_101_101 0o755 0b1111_1011 0xFB
Commentary
We require a digit on both sides of the dot to allow lexing "4.km
" as
"4 . km
" instead of "4. km
" and for a series of dots to be an
operator (for ranges). The regex for decimal literals is same as Java, and
the one for hex literals is the same as C99, except that we do not allow a
trailing suffix that specifies a precision.
floating_literal ::= [0-9][0-9_]*\.[0-9][0-9_]*
floating_literal ::= [0-9][0-9_]*\.[0-9][0-9_]*[eE][+-]?[0-9][0-9_]*
floating_literal ::= [0-9][0-9_]*[eE][+-]?[0-9][0-9_]*
floating_literal ::= 0x[0-9A-Fa-f][0-9A-Fa-f_]*
(\.[0-9A-Fa-f][0-9A-Fa-f_]*)?[pP][+-]?[0-9][0-9_]*
Floating point literal tokens represent floating point values of unspecified precision. Decimal and hexadecimal floating-point literals are supported.
The integer, fraction, and exponent of a floating point literal may each
contain underscores at arbitrary positions after their first digits. These
underscores may be used for human readability and do not affect the value of
the literal. Each part of the floating point literal must however start with a
digit; 1._0
would be a reference to the _0
member of 1
.
1.0 1000000.75 1_000_000.75 0x1.FFFFFFFFFFFFFp1022 0x1.FFFF_FFFF_FFFF_Fp1_022
character_literal ::= '[^'\\\n\r]|character_escape'
character_escape ::= [\]0 [\][\] | [\]t | [\]n | [\]r | [\]" | [\]'
character_escape ::= [\]x hex hex
character_escape ::= [\]u hex hex hex hex
character_escape ::= [\]U hex hex hex hex hex hex hex hex
hex ::= [0-9a-fA-F]
character_literal
tokens represent a single character, and are surrounded
by single quotes.
The ASCII and Unicode character escapes:
\0 == nul
\n == new line
\r == carriage return
\t == horizontal tab
\u == small Unicode code points
\U == large Unicode code points
\x == raw ASCII byte (less than 0x80)
Commentary
FIXME: Forcing +
to concatenate strings is somewhat gross, a proper protocol
would be better.
string_literal ::= ["]([^"\\\n\r]|character_escape|escape_expr)*["]
escape_expr ::= [\]escape_expr_body
escape_expr_body ::= [(]escape_expr_body[)]
escape_expr_body ::= [^\n\r"()]
string_literal
tokens represent a string, and are surrounded by double
quotes. String literals cannot span multiple lines.
String literals may contain embedded expressions in them (known as "interpolated expressions") subject to some specific lexical constraints: the expression may not contain a double quote ["], newline [n], or carriage return [r]. All parentheses must be balanced.
In addition to these lexical rules, an interpolated expression must satisfy the
:ref:`expr <langref.expr>` production of the general swift grammar. This
expression is evaluated, and passed to the constructor for the inferred type of
the string literal. It is concatenated onto any fixed portions of the string
literal with a global "+
" operator that is found through normal name
lookup.
// Simple string literal. "Hello world!" // Interpolated expressions. "\(min)...\(max)" + "Result is \((4+i)*j)"
identifier ::= id-start id-continue*
// An identifier can start with an ASCII letter or underscore...
id-start ::= [A-Za-z_]
// or a Unicode alphanumeric character in the Basic Multilingual Plane...
// (excluding combining characters, which can't appear initially)
id-start ::= [\u00A8\u00AA\u00AD\u00AF\u00B2-\u00B5\u00B7-00BA]
id-start ::= [\u00BC-\u00BE\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]
id-start ::= [\u0100-\u02FF\u0370-\u167F\u1681-\u180D\u180F-\u1DBF]
id-start ::= [\u1E00-\u1FFF]
id-start ::= [\u200B-\u200D\u202A-\u202E\u203F-\u2040\u2054\u2060-\u206F]
id-start ::= [\u2070-\u20CF\u2100-\u218F\u2460-\u24FF\u2776-\u2793]
id-start ::= [\u2C00-\u2DFF\u2E80-\u2FFF]
id-start ::= [\u3004-\u3007\u3021-\u302F\u3031-\u303F\u3040-\uD7FF]
id-start ::= [\uF900-\uFD3D\uFD40-\uFDCF\uFDF0-\uFE1F\uFE30-FE44]
id-start ::= [\uFE47-\uFFFD]
// or a non-private-use, valid code point outside of the BMP.
id-start ::= [\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD]
id-start ::= [\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD]
id-start ::= [\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD]
id-start ::= [\uD0000-\uDFFFD\uE0000-\uEFFFD]
// After the first code point, an identifier can contain ASCII digits...
id-continue ::= [0-9]
// and/or combining characters...
id-continue ::= [\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]
// in addition to the starting character set.
id-continue ::= id-start
identifier-or-any ::= identifier
identifier-or-any ::= '_'
The set of valid identifier characters is consistent with WG14 N1518, "Recommendations for extended identifier characters for C and C++". This roughly corresponds to the alphanumeric characters in the Basic Multilingual Plane and all non-private-use code points outside of the BMP. It excludes mathematical symbols, arrows, line and box drawing characters, and private-use and invalid code points. An identifier cannot begin with one of the ASCII digits '0' through '9' or with a combining character.
The Swift compiler does not normalize Unicode source code, and matches identifiers by code points only. Source code must be normalized to a consistent normalization form before being submitted to the compiler.
// Valid identifiers foo _0 swift vernissé 闪亮 מבריק 😄 // Invalid identifiers ☃ // Is a symbol 0cool // Starts with an ASCII digit ́foo // Starts with a combining character (U+0301) // Is a private-use character (U+F8FF)
<a name="operator">operator</a> ::= [/=-+*%<>!&|^~]+
<a name="operator">operator</a> ::= \.+
<a href="#reserved_punctuation">Reserved for punctuation</a>: '.', '=', '->', and unary prefix '&'
<a href="#whitespace">Reserved for comments</a>: '//', '/*' and '*/'
operator-binary ::= operator
operator-prefix ::= operator
operator-postfix ::= operator
left-binder ::= [ \r\n\t\(\[\{,;:]
right-binder ::= [ \r\n\t\)\]\},;:]
<a name="any-identifier">any-identifier</a> ::= identifier | operator
operator-binary
, operator-prefix
, and operator-postfix
are
distinguished by immediate lexical context. An operator token is called
left-bound if it is immediately preceded by a character matching
left-binder
. An operator token is called right-bound if it is
immediately followed by a character matching right-binder
. An operator
token is an operator-prefix
if it is right-bound but not left-bound, an
operator-postfix
if it is left-bound but not right-bound, and an
operator-binary
in either of the other two cases.
As an exception, an operator immediately followed by a dot ('.
') is only
considered right-bound if not already left-bound. This allows a!.prop
to
be parsed as (a!).prop
rather than as a ! .prop
.
The '!
' operator is postfix if it is left-bound.
The '?
' operator is postfix (and therefore not the ternary operator) if it
is left-bound. The sugar form for Optional
types must be left-bound.
When parsing certain grammatical constructs that involve '<
' and '>
'
(such as <a href="#type-composition">protocol composition types</a>), an
operator
with a leading '<
' or '>
' may be split into two or more
tokens: the leading '<
' or '>
' and the remainder of the token, which
may be an operator
or punctuation
token that may itself be further
split. This rule allows us to parse nested constructs such as A<B<C>>
without requiring spaces between the closing '>
's.
dollarident ::= '$' id-continue+
Tokens that start with a $
are separate class of identifier, which are
fixed purpose names that are defined by the implementation.
identifier ::= '`' id-start id-continue* '`'
An identifier that would normally be a keyword <langref.lexical.keyword> may
be used as an identifier by wrapping it in backticks '\`
', for example:
func `class`() { /* ... */ } let `type` = 0.type
Any identifier may be escaped, though only identifiers that would normally be parsed as keywords are required to be. The backtick-quoted string must still form a valid, non-operator identifier:
let `0` = 0 // Error, "0" doesn't start with an alphanumeric let `foo-bar` = 0 // Error, '-' isn't an identifier character let `+` = 0 // Error, '+' is an operator
...
...
decl-var-head-kw ::= ('static' | 'class')? 'override'?
decl-var-head-kw ::= 'override'? ('static' | 'class')?
decl-var-head ::= attribute-list decl-var-head-kw? 'var'
decl-var ::= decl-var-head pattern initializer? (',' pattern initializer?)*
// 'get' is implicit in this syntax.
decl-var ::= decl-var-head identifier ':' type brace-item-list
decl-var ::= decl-var-head identifier ':' type '{' get-set '}'
decl-var ::= decl-var-head identifier ':' type initializer? '{' willset-didset '}'
// For use in protocols.
decl-var ::= decl-var-head identifier ':' type '{' get-set-kw '}'
get-set ::= get set?
get-set ::= set get
get ::= attribute-list ( 'mutating' | 'nonmutating' )? 'get' brace-item-list
set ::= attribute-list ( 'mutating' | 'nonmutating' )? 'set' set-name? brace-item-list
set-name ::= '(' identifier ')'
willset-didset ::= willset didset?
willset-didset ::= didset willset?
willset ::= attribute-list 'willSet' set-name? brace-item-list
didset ::= attribute-list 'didSet' set-name? brace-item-list
get-kw ::= attribute-list ( 'mutating' | 'nonmutating' )? 'get'
set-kw ::= attribute-list ( 'mutating' | 'nonmutating' )? 'set'
get-set-kw ::= get-kw set-kw?
get-set-kw ::= set-kw get-kw
var
declarations form the backbone of value declarations in Swift. A
var
declaration takes a pattern and an optional initializer, and declares
all the pattern-identifiers in the pattern as variables. If there is an
initializer and the pattern is :ref:`fully-typed <langref.types.fully_typed>`,
the initializer is converted to the type of the pattern. If there is an
initializer and the pattern is not fully-typed, the type of initializer is
computed independently of the pattern, and the type of the pattern is derived
from the initializer. If no initializer is specified, the pattern must be
fully-typed, and the values are default-initialized.
If there is more than one pattern in a var
declaration, they are each
considered independently, as if there were multiple declarations. The initial
attribute-list
is shared between all the declared variables.
A var declaration may contain a getter and (optionally) a setter, which will
be used when reading or writing the variable, respectively. Such a variable
does not have any associated storage. A var
declaration with a getter or
setter must have a type (call it T
). The getter function, whose
body is provided as part of the var-get
clause, has type () -> T
.
Similarly, the setter function, whose body is part of the var-set
clause
(if provided), has type (T) -> ()
.
If the var-set
or willset
clause contains a set-name
clause, the
identifier of that clause is used as the name of the parameter to the setter or
the observing accessor. Otherwise, the parameter name is newValue
. Same
applies to didset
clause, but the default parameter name is oldValue
.
FIXME: Should the type of a pattern which isn't fully typed affect the type-checking of the expression (i.e. should we compute a structured dependent type)?
Like all other declarations, var
s can optionally have a list of
:ref:`attributes <langref.decl.attribute_list>` applied to them.
The type of a variable must be :ref:`materializable
<langref.types.materializable>`. A variable is an lvalue unless it has a
var-get
clause but not var-set
clause.
Here are some examples of var
declarations:
// Simple examples. var a = 4 var b: Int var c: Int = 42 // This decodes the tuple return value into independently named parts // and both 'val' and 'err' are in scope after this line. var (val, err) = foo() // Variable getter/setter var _x: Int = 0 var x_modify_count: Int = 0 var x1: Int { return _x } var x2: Int { get { return _x } set { x_modify_count = x_modify_count + 1 _x = value } }
Note that get
, set
, willSet
and didSet
are context-sensitive
keywords.
static
keyword is allowed inside structs and enums, and extensions of
those.
class
keyword is allowed inside classes, class extensions, and
protocols.
Ambiguity 1
The production for implicit get
makes this grammar ambiguous. For example:
class A { func get(_: () -> Int) {} var a: Int { get { return 0 } // Getter declaration or call to 'get' with a trailing closure? } // But if this was intended as a call to 'get' function, then we have a // getter without a 'return' statement, so the code is invalid anyway. }
We disambiguate towards get-set
or willset-didset
production if the
first token after {
is the corresponding keyword, possibly preceeded by
attributes. Thus, the following code is rejected because we are expecting
{
after set
:
class A { var set: Foo var a: Int { set.doFoo() return 0 } }
Ambiguity 2
The production with initializer
and an accessor block is ambiguous. For
example:
func takeClosure(_: () -> Int) {} struct A { var willSet: Int var a: Int = takeClosure { willSet {} // A 'willSet' declaration or a call to 'takeClosure'? } }
We disambiguate towards willget-didset
production if the first token
after {
is the keyword willSet
or didSet
, possibly preceeded by
attributes.
Rationale
Even though it is possible to do further checks and speculatively parse more, it introduces unjustified complexity to cover (hopefully rare) corner cases. In ambiguous cases users can always opt-out of the trailing closure syntax by using explicit parentheses in the function call.
// Keywords can be specified in any order.
decl-func-head-kw ::= ( 'static' | 'class' )? 'override'? ( 'mutating' | 'nonmutating' )?
decl-func ::= attribute-list decl-func-head-kw? 'func' any-identifier generic-params? func-signature brace-item-list?
func
is a declaration for a function. The argument list and optional
return value are specified by the type production of the function, and the body
is either a brace expression or elided. Like all other declarations, functions
are can have attributes.
If the type is not syntactically a function type (i.e., has no ->
in it at
top-level), then the return value is implicitly inferred to be ()
. All of
the argument and return value names are injected into the <a
href="#namebind_scope">scope</a> of the function body.</p>
A function in an <a href="#decl-extension">extension</a> of some type (or
in other places that are semantically equivalent to an extension) implicitly
get a self
argument with these rules ... [todo]
static
and class
functions are only allowed in an <a
href="#decl-extension">extension</a> of some type (or in other places that are
semantically equivalent to an extension). They indicate that the function is
actually defined on the <a href="#metatype">metatype</a> for the type, not on
the type itself. Thus its implicit self
argument is actually of
metatype type.
static
keyword is allowed inside structs and enums, and extensions of those.
class
keyword is allowed inside classes, class extensions, and protocols.
TODO: Func should be an immutable name binding, it should implicitly add an attribute immutable when it exists.
TODO: Incoming arguments should be readonly, result should be implicitly writeonly when we have these attributes.
...
An argument name is a keyword argument if: - It is an argument to an initializer, or - It is an argument to a method after the first argument, or - It is preceded by a back-tick (`), or - Both a keyword argument name and an internal parameter name are specified.
subscript-head ::= attribute-list 'override'? 'subscript' pattern-tuple '->' type
decl-subscript ::= subscript-head '{' get-set '}'
// 'get' is implicit in this syntax.
decl-subscript ::= subscript-head brace-item-list
// For use in protocols.
decl-subscript ::= subscript-head '{' get-set-kw '}'
A subscript declaration provides support for <a href="#expr-subscript"> subscripting</a> an object of a particular type via a getter and (optional) setter. Therefore, subscript declarations can only appear within a type definition or extension.
The pattern-tuple
of a subscript declaration provides the indices that
will be used in the subscript expression, e.g., the i
in a[i]
. This
pattern must be fully-typed. The type
following the arrow provides the
type of element being accessed, which must be materializable. Subscript
declarations can be overloaded, so long as either the pattern-tuple
or
type
differs from other declarations.
The get-set
clause specifies the getter and setter used for subscripting.
The getter is a function whose input is the type of the pattern-tuple
and
whose result is the element type. Similarly, the setter is a function whose
result type is ()
and whose input is the type of the pattern-tuple
with a parameter of the element type added to the end of the tuple; the name
of the parameter is the set-name
, if provided, or value
otherwise.
// Simple bit vector with storage for 64 boolean values struct BitVector64 { var bits: Int64 // Allow subscripting with integer subscripts and a boolean result. subscript (bit : Int) -> Bool { // Getter tests the given bit get { return bits & (1 << bit)) != 0 } // Setter sets the given bit to the provided value. set { var mask = 1 << bit if value { bits = bits | mask } else { bits = bits & ~mask } } } } var vec = BitVector64() vec[2] = true if vec[3] { print("third bit is set") }
...
...
...
...
Commentary
The pattern grammar mirrors the expression grammar, or to be more specific, the grammar of literals. This is because the conceptual algorithm for matching a value against a pattern is to try to find an assignment of values to variables which makes the pattern equal the value. So every expression form which can be used to build a value directly should generally have a corresponding pattern form.
pattern-atom ::= pattern-var
pattern-atom ::= pattern-any
pattern-atom ::= pattern-tuple
pattern-atom ::= pattern-is
pattern-atom ::= pattern-enum-element
pattern-atom ::= expr
pattern ::= pattern-atom
pattern ::= pattern-typed
A pattern represents the structure of a composite value. Parts of a value can be extracted and bound to variables or compared against other values by pattern matching. Among other places, pattern matching occurs on the left-hand side of :ref:`var bindings <langref.decl.var>`, in the arguments of :ref:`func declarations <langref.decl.func>`, and in the <tt>case</tt> labels of :ref:`switch statements <langref.stmt.switch>`. Some examples:
var point = (1, 0, 0) // Extract the elements of the "point" tuple and bind them to // variables x, y, and z. var (x, y, z) = point print("x=\(x) y=\(y) z=\(z)") // Dispatch on the elements of a tuple in a "switch" statement. switch point { case (0, 0, 0): print("origin") // The pattern "_" matches any value. case (_, 0, 0): print("on the x axis") case (0, _, 0): print("on the y axis") case (0, 0, _): print("on the z axis") case (var x, var y, var z): print("x=\(x) y=\(y) z=\(z)") }
A pattern may be "irrefutable", meaning informally that it matches all values
of its type. Patterns in declarations, such as :ref:`var <langref.decl.var>`
and :ref:`func <langref.decl.func>`, are required to be irrefutable. Patterns
in the case
labels of :ref:`switch statements <langref.stmt.switch>`,
however, are not.
The basic pattern grammar is a literal "atom" followed by an optional type annotation. Type annotations are useful for documentation, as well as for coercing a matched expression to a particular kind. They are also required when patterns are used in a :ref:`function signature <langref.decl.func.signature>`. Type annotations are currently not allowed in switch statements.
A pattern has a type. A pattern may be "fully-typed", meaning informally that its type is fully determined by the type annotations it contains. Some patterns may also derive a type from their context, be it an enclosing pattern or the way it is used; this set of situations is not yet fully determined.
pattern-typed ::= pattern-atom ':' type
A type annotation constrains a pattern to have a specific type. An annotated pattern is fully-typed if its annotation type is fully-typed. It is irrefutable if and only if its subpattern is irrefutable.
Type annotations are currently not allowed in the case
labels of switch
statements; case patterns always get their type from the subject of the switch.
pattern-any ::= '_'
The symbol _
in a pattern matches and ignores any value. It is irrefutable.
pattern-var ::= 'let' pattern
pattern-var ::= 'var' pattern
The var
and let
keywords within a pattern introduces variable bindings.
Any identifiers within the subpattern bind new named variables to their
matching values. 'var' bindings are mutable within the bound scope, and 'let'
bindings are immutable.
var point = (0, 0, 0) switch point { // Bind x, y, z to the elements of point. case (var x, var y, var z): print("x=\(x) y=\(y) z=\(z)") } switch point { // Same. 'var' distributes to the identifiers in its subpattern. case var (x, y, z): print("x=\(x) y=\(y) z=\(z)") }
Outside of a <tt>var</tt> pattern, an identifier behaves as an :ref:`expression pattern <langref.pattern.expr>` referencing an existing definition.
var zero = 0 switch point { // x and z are bound as new variables. // zero is a reference to the existing 'zero' variable. case (var x, zero, var z): print("point off the y axis: x=\(x) z=\(z)") default: print("on the y axis") }
The left-hand pattern of a :ref:`var declaration <langref.decl.var>` and the
argument pattern of a :ref:`func declaration <langref.decl.func>` are
implicitly inside a var
pattern; identifiers in their patterns always bind
variables. Variable bindings are irrefutable.
The type of a bound variable must be :ref:`materializable
<langref.types.materializable>` unless it appears in a :ref:`func-signature
<langref.decl.func.signature>` and is directly of a inout
-annotated type.
pattern-tuple ::= '(' pattern-tuple-body? ')'
pattern-tuple-body ::= pattern-tuple-element (',' pattern-tuple-body)* '...'?
pattern-tuple-element ::= pattern
A tuple pattern is a list of zero or more patterns. Within a :ref:`function signature <langref.decl.func.signature>`, patterns may also be given a default-value expression.
A tuple pattern is irrefutable if all its sub-patterns are irrefutable.
A tuple pattern is fully-typed if all its sub-patterns are fully-typed, in
which case its type is the corresponding tuple type, where each
type-tuple-element
has the type, label, and default value of the
corresponding pattern-tuple-element
. A pattern-tuple-element
has a
label if it is a named pattern or a type annotation of a named pattern.
A tuple pattern whose body ends in '...'
is a varargs tuple. The last
element of such a tuple must be a typed pattern, and the type of that pattern
is changed from T
to T[]
. The corresponding tuple type for a varargs
tuple is a varargs tuple type.
As a special case, a tuple pattern with one element that has no label, has no default value, and is not varargs is treated as a grouping parenthesis: it has the type of its constituent pattern, not a tuple type.
pattern-is ::= 'is' type
is
patterns perform a type check equivalent to the x is T
<a
href="#expr-cast">cast operator</a>. The pattern matches if the runtime type of
a value is of the given type. is
patterns are refutable and thus cannot
appear in declarations.
class B {} class D1 : B {} class D2 : B {} var bs : B[] = [B(), D1(), D2()] for b in bs { switch b { case is B: print("B") case is D1: print("D1") case is D2: print("D2") } }
pattern-enum-element ::= type-identifier? '.' identifier pattern-tuple?
Enum element patterns match a value of <a href="#type-enum">enum type</a> if
the value matches the referenced case
of the enum. If the case
has a
type, the value of that type can be matched against an optional subpattern.
enum HTMLTag { case A(href: String) case IMG(src: String, alt: String) case BR } switch tag { case .BR: print("<br>") case .IMG(var src, var alt): print("<img src=\"\(escape(src))\" alt=\"\(escape(alt))\">") case .A(var href): print("<a href=\"\(escape(href))\">") }
Enum element patterns are refutable and thus cannot appear in declarations.
(They are currently considered refutable even if the enum contains only a
single case
.)
Patterns may include arbitrary expressions as subpatterns. Expression patterns
are refutable and thus cannot appear in declarations. An expression pattern is
compared to its corresponding value using the ~=
operator. The match
succeeds if expr ~= value
evaluates to true. The standard library provides
a default implementation of ~=
using ==
equality; additionally, range
objects may be matched against integer and floating-point values. The ~=
operator may be overloaded like any function.
var point = (0, 0, 0) switch point { // Equality comparison. case (0, 0, 0): print("origin") // Range comparison. case (-10...10, -10...10, -10...10): print("close to the origin") default: print("too far away") } // Define pattern matching of an integer value to a string expression. func ~=(pattern:String, value:Int) -> Bool { return pattern == "\(value)" } // Now we can pattern-match strings to integers: switch point { case ("0", "0", "0"): print("origin") default: print("not the origin") }
The order of evaluation of expressions in patterns, including whether an expression is evaluated at all, is unspecified. The compiler is free to reorder or elide expression evaluation in patterns to improve dispatch efficiency. Expressions in patterns therefore cannot be relied on for side effects.
...
...
...
stmt-return ::= 'break'
The 'break' statement transfers control out of the enclosing 'for' loop or 'while' loop.
stmt-return ::= 'continue'
The 'continue' statement transfers control back to the start of the enclosing 'for' loop or 'while' loop.
...
stmt-switch ::= 'switch' expr-basic '{' stmt-switch-case* '}'
stmt-switch-case ::= (case-label | default-label) brace-item+
stmt-switch-case ::= (case-label | default-label) ';'
case-label ::= 'case' pattern ('where' expr)? (',' pattern ('where' expr)?)* ':'
default-label ::= 'default' ':'
'switch' statements branch on the value of an expression by :ref:`pattern
matching <langref.pattern>`. The subject expression of the switch is evaluated
and tested against the patterns in its case
labels in source order. When a
pattern is found that matches the value, control is transferred into the
matching case
block. case
labels may declare multiple patterns
separated by commas. Only a single case
labels may precede a block of
code. Case labels may optionally specify a guard expression, introduced by
the where
keyword; if present, control is transferred to the case only if
the subject value both matches the corresponding pattern and the guard
expression evaluates to true. Patterns are tested "as if" in source order; if
multiple cases can match a value, control is transferred only to the first
matching case. The actual execution order of pattern matching operations, and
in particular the evaluation order of :ref:`expression patterns
<langref.pattern.expr>`, is unspecified.
A switch may also contain a default
block. If present, it receives control
if no cases match the subject value. The default
block must appear at the
end of the switch and must be the only label for its block. default
is
equivalent to a final case _
pattern. Switches are required to be
exhaustive; either the contained case patterns must cover every possible value
of the subject's type, or else an explicit default
block must be specified
to handle uncovered cases.
Every case and default block has its own scope. Declarations within a case or
default block are only visible within that block. Case patterns may bind
variables using the :ref:`var keyword <langref.pattern.var>`; those variables
are also scoped into the corresponding case block, and may be referenced in the
where
guard for the case label. However, if a case block matches multiple
patterns, none of those patterns may contain variable bindings.
Control does not implicitly 'fall through' from one case block to the next. :ref:`fallthrough statements <langref.stmt.fallthrough>` may explicitly transfer control among case blocks. :ref:`break <langref.stmt.break>` and :ref:`continue <langref.stmt.continue>` within a switch will break or continue out of an enclosing 'while' or 'for' loop, not out of the 'switch' itself.
At least one brace-item
is required in every case or default block. It is
allowed to be a no-op. Semicolon can be used as a single no-op statement in
otherwise empty cases in switch statements.
func classifyPoint(point: (Int, Int)) { switch point { case (0, 0): print("origin") case (_, 0): print("on the x axis") case (0, _): print("on the y axis") case (var x, var y) where x == y: print("on the y = x diagonal") case (var x, var y) where -x == y: print("on the y = -x diagonal") case (var x, var y): print("length \(sqrt(x*x + y*y))") } } switch x { case 1, 2, 3: print("x is 1, 2 or 3") default: ; }
stmt-fallthrough ::= 'fallthrough'
fallthrough
transfers control from a case
block of a :ref:`switch
statement <langref.stmt.switch>` to the next case
or default
block
within the switch. It may only appear inside a switch
. fallthrough
cannot be used in the final block of a switch
. It also cannot transfer
control into a case
block whose pattern contains :ref:`var bindings
<langref.pattern.var>`.
Commentary
It would be really great to have literate swift code someday, that way this could be generated directly from the code. This would also be powerful for Swift library developers to be able to depend on being available and standardized.
This describes some of the standard swift code as it is being built up. Since Swift is designed to give power to the library developers, much of what is normally considered the "language" is actually just implemented in the library.
All of this code is published by the 'swift
' module, which is implicitly
imported into each source file, unless some sort of pragma in the code
(attribute on an import?) is used to change or disable this behavior.
In the initial Swift implementation, a module named Builtin
is imported
into every file. Its declarations can only be found by <a href="#expr-dot">dot
syntax</a>. It provides access to a small number of primitive representation
types and operations defined over them that map directly to LLVM IR.
The existance of and details of this module are a private implementation detail used by our implementation of the standard library. Swift code outside the standard library should not be aware of this library, and an independent implementation of the swift standard library should be allowed to be implemented without the builtin library if it desires.
For reference below, the description of the standard library uses the
"Builtin.
" namespace to refer to this module, but independent
implementations could use another implementation if they so desire.
// Void is just a type alias for the empty tuple. typealias Void = ()
Commentary
Having a single standardized integer type that can be used by default everywhere is important. One advantage Swift has is that by the time it is in widespread use, 64-bit architectures will be pervasive, and the LLVM optimizer should grow to be good at shrinking 64-bit integers to 32-bit in many cases for those 32-bit architectures that persist.
// Fixed size types are simple structs of the right size. struct Int8 { value : Builtin.Int8 } struct Int16 { value : Builtin.Int16 } struct Int32 { value : Builtin.Int32 } struct Int64 { value : Builtin.Int64 } struct Int128 { value : Builtin.Int128 } // Int is just an alias for the 64-bit integer type. typealias Int = Int64
struct Float { value : Builtin.FPIEEE32 } struct Double { value : Builtin.FPIEEE64 }
// Bool is a simple enum. enum Bool { true, false } // Allow true and false to be used unqualified. var true = Bool.true var false = Bool.false
func * (lhs: Int, rhs: Int) -> Int func / (lhs: Int, rhs: Int) -> Int func % (lhs: Int, rhs: Int) -> Int func + (lhs: Int, rhs: Int) -> Int func - (lhs: Int, rhs: Int) -> Int
func < (lhs : Int, rhs : Int) -> Bool func > (lhs : Int, rhs : Int) -> Bool func <= (lhs : Int, rhs : Int) -> Bool func >= (lhs : Int, rhs : Int) -> Bool func == (lhs : Int, rhs : Int) -> Bool func != (lhs : Int, rhs : Int) -> Bool
func && (lhs: Bool, rhs: ()->Bool) -> Bool func || (lhs: Bool, rhs: ()->Bool) -> Bool
Swift has a simplified precedence levels when compared with C. From highest to lowest:
"exponentiative:" <<, >> "multiplicative:" *, /, %, & "additive:" +, -, |, ^ "comparative:" ==, !=, <, <=, >=, > "conjunctive:" && "disjunctive:" ||