Name	Name	Last commit message	Last commit date
parent directory ..
.doc	.doc
CMakeLists.txt	CMakeLists.txt
DeclSyntax.cpp	DeclSyntax.cpp
ExprSyntax.cpp	ExprSyntax.cpp
Format.cpp	Format.cpp
GenericSyntax.cpp	GenericSyntax.cpp
LegacyASTTransformer.cpp	LegacyASTTransformer.cpp
README.md	README.md
RawSyntax.cpp	RawSyntax.cpp
Status.md	Status.md
StmtSyntax.cpp	StmtSyntax.cpp
Syntax.cpp	Syntax.cpp
SyntaxData.cpp	SyntaxData.cpp
SyntaxFactory.cpp	SyntaxFactory.cpp
TokenSyntax.cpp	TokenSyntax.cpp
Trivia.cpp	Trivia.cpp
TypeSyntax.cpp	TypeSyntax.cpp
UnknownSyntax.cpp	UnknownSyntax.cpp

Swift Syntax and Structured Editing Library

Welcome to lib/Syntax!

This library implements data structures and algorithms for dealing with Swift syntax, striving to be safe, correct, and intuitive to use. The library emphasizes immutable, thread-safe data structures, full-fidelity representation of source, and facilities for structured editing.

What is structured editing? It's an editing strategy that is keenly aware of the structure of source code, not necessarily its representation (i.e. characters or bytes). This can be achieved at different granularities: replacing an identifier, changing a call to global function to a method call, or indenting and formatting an entire source file based on declarative rules. These kinds of diverse operations are critical to the Swift Migrator, which is the immediate client for this library, now developed in the open. Along with that, the library will also provide infrastructure for a first-class swift-format tool.

Eventually, the goal of this library is to represent Swift syntax in all of the compiler. Currently, lib/AST structures don't make a very clear distinction between syntactic and semantic information. Long term, we hope to achieve the following based on work here:

Adoption throughout the compiler
Clear separation of syntactic and semantic information
Greater stability with immutable data structures
Lower high-water memory use due to reference counting without the need for leak-forever memory contexts
Incremental re-parsing
Incremental, lazier re-type-checking, helped by separating syntactic information

This library is a work in progress and should be expected to be in a molten state for some time. Don't integrate this into other areas of the compiler or use it for anything serious just now.

You can read more about the status of the library's implementation at the Syntax Status Page. More information about opportunities to get involved to come.

Design and Implementation Guidelines

In no particular order, here is a summary of the design and implementation points for this library:

Represent Swift source with "full fidelity" - parsing a source file and printing the syntax tree should result in the same file.
Provide good structured editing APIs at all granularities.
Make public API as intuitive as possible. If you can't quickly answer, "What do I do next?", that's a bug. Compiler development shouldn't be black magic, so let's make an effort so that beginners and experts alike feel welcome here.
Don't crash.
Don't open yourself up to crashes for the sake of performance.
Don't hand raw pointers to clients. Give them realized values that they either own or borrow, and indicate optionality with types.
If it represents authored Swift source, then the data representing it is immutable. You don't change what an author wrote without their permission and initiation.
This library is not concerned with semantic analysis. Don't store types, symbolic references, lookup logic, and the like here. This is purely for syntactic constructs.
All public APIs must be covered by tests. Whenever applicable, consider:
- C++ unit tests, at a minimum, all public C++ APIs
- Round-trip lex/parse test cases
- Exercised by the formatter/migrator
- For each grammar production, as many combinations as possible, especially with respect to optional terms and expected by missing terms
All public APIs must have documentation comments.
Represent Swift grammar and use naming conventions in accordance with The Swift Programming Language book as much as possible, so people know what to look for.
Accommodate "bad syntax" - humans are imperfect and source code is constantly in a state of flux in an editor. Unforunately, we still live in a character-centric world - the library shouldn't fall over on bad syntax just because someone is in the middle of typing struct.

APIs

Make APIs

Make APIs are for creating new syntax nodes in a single call. Although you need to provide all of the pieces of syntax to these APIs, you are free to use "missing" placeholders as substructure. Make APIs return freestanding syntax nodes and do not establish parental relationships.

The SyntaxFactory

The SyntaxFactory embodies the Make APIs and is the one-stop shop for creating new syntax nodes and tokens in a single call. There are two main Make APIs exposed for each Syntax node: making the node with all of the pieces, or making a blank node with all of the pieces marked as missing. For example, SyntaxFactory has makeStructDeclSyntax and makeBlankStructDeclSyntax that both return a StructDeclSyntax.

Instead of constructors on each syntax node's class, static creation methods are all supplied here in the SyntaxFactory for better code completion - you don't need to know the exact name of the class. Just type SyntaxFactory::make and let code completion show you what you can make.

Example

// A 'typealias' keyword with one space after
auto TypeAliasKeyword = SyntaxFactory::makeTypeAliasKeyword({}, Trivia::spaces(1));

// The identifier "Element" with one space after
auto ElementID = SyntaxFactory::makeIdentifier("Element", {}, Trivia::spaces(1));

// An equal '=' token with one space after
auto Equal = SyntaxFactory::makeEqualToken({}, Trivia::spaces(1));

// A type identifier for "Int"
auto IntType = SyntaxFactory::makeTypeIdentifier("Int", {}, {})

// Finally, the actual type alias declaration syntax.
auto TypeAlias = SyntaxFactory::makeTypeAliasDecl(TypeAliasKeyword,
                                                  ElementID,
                                                  EmptyGenericParams,
                                                  Equal,
                                                  IntType);
TypeAlias.print(llvm::outs());

typealias Element = Int

With APIs

With APIs are essentially setters on Syntax nodes you already have in hand but, because they are immutable, return new Syntax nodes with only the specified substructure replaced. Raw backing storage is shared as much as possible.

Example

Say you have a MyStruct of type StructDeclSyntax representing:

struct MyStruct {}

Now, let's create a new struct with a different identifier, "YourStruct". The original struct is unharmed but identical tokens are shared.

auto NewIdentifier = SyntaxFactory::makeIdentifier("YourStruct",
    MyStruct.getIdentifier().getLeadingTrivia(),
    MyStruct.getIdentifier().getTrailingTrivia());

MyStruct.withIdentifier(NewIdentifier).print(llvm::outs());

struct YourStruct {}

Builder APIs

Builder APIs are provided for building up syntax incrementally as it appears. At any point in the building process, you can call build() and get a reasonably formed Syntax node (i.e. with no raw nullptrs) using what you've provided to the builder so far. Anything that you haven't supplied is marked as missing. This is essentially what the parser does so, looking forward to future adoption, the builders are designed with the parser in mind, with the hope that we can better specify recovery behavior and incremental (re-)parsing.

Example

StructDeclSyntaxBuilder Builder;

// We previously parsed a struct keyword, let's tell the builder to use it.
Builder.useStructKeyword(StructKeyword);

// Hm, we didn't see an identifier, but saw a left brace. Let's keep going.
Builder.useLeftBraceToken(ParsedLeftBrace)

// No members of the struct; we saw a right brace.
Builder.useRightBraceToken(ParsedRightBrace);

Let's see what we have so far.

auto StructWithoutIdentifier = Builder.build();
StructWithoutIdentifier.print(llvm::outs());

struct {}

Whoops! You forgot an identifier. Let's add one here for fun.

auto MyStructID = SyntaxFactory::createIdentifier("MyStruct", {}, Trivia::spaces(1));
Builder.useIdentifier(MyStructID);

auto StructWithIdentifier = Builder.build();
StructWithIdentifier.print(llvm::outs());

struct MyStruct {}

Much better!

Note that syntax builders own and mutate the data they will eventually use to build a syntax node. They themselves should not be shared between threads. However, anything the builder builds and returns to you is safe and immutable.

Syntax Rewriters

TODO.

Internals

RawSyntax

RawSyntax are the raw immutable backing store for all syntax. Essentially, they store a kind, whether they were missing in the source, and the layout, which is a list of children and represents the recursive substructure. Although these are tree-like in nature, they maintain no parental relationships because they can be shared among many nodes. Eventually, RawSyntax bottoms out in tokens, represented by the TokenSyntax class.

RawSyntax summary

RawSyntax are the immutable backing store for all syntax.
RawSyntax are immutable.
RawSyntax establishes the tree structure of syntax.
RawSyntax store no parental relationships and can therefore be shared among syntax nodes if they have identical content.

TokenSyntax

These are special cases of RawSyntax and represent all terminals in the grammar. Aside from the token kind and the text, they have two very important pieces of information for full-fidelity source: leading and trailing source trivia surrounding the token.

TokenSyntax summary

TokenSyntax are RawSyntax and represent the terminals in the Swift grammar.
Like RawSyntax, TokenSyntax are immutable.
TokenSyntax do not have pointer equality, as they can be shared among syntax nodes.
TokenSyntax have leading- and trailing trivia, the purely syntactic formatting information like whitespace and comments.

Trivia

You've already seen some uses of Trivia in the examples above. These are pieces of syntax that aren't really relevant to the semantics of the program, such as whitespace and comments. These are modeled as collections and, with the exception of comments, are sort of "run-length" encoded. For example, a sequence of four spaces is represented by { Kind: TriviaKind::Space, Count: 4 }, not the literal text " ".

Some examples of the "atoms" of Trivia:

Spaces
Tabs
Newlines
Single-line developer (//) comments
Block developer (/* ... */) comments
Single-line documentation (///) comments
Block documentation (/** ... */) comments
Backticks

There are two Rules of Trivia that you should obey when parsing or constructing new Syntax nodes:

A token owns all of its trailing trivia up to, but not including, the next newline character.
Looking backward in the text, a token owns all of the leading trivia up to and including the first contiguous sequence of newlines characters.

Let's take a look at how this shows up in practice with a small snippet of Swift code.

Example

func foo() {
  var x = 2
}

Breaking this down token by token:

func
- Leading trivia: none.
- Trailing trivia: Takes up the space after (Rule 1).
```
// Equivalent to:
Trivia::spaces(1)
```
foo
- Leading trivia: none. The previous func ate the space before.
- Trailing trivia: none.
(
- Leading trivia: none.
- Trailing trivia: none.
)
- Leading trivia: none.
- Trailing trivia: Takes up the space after (Rule 1).
{
- Leading trivia: none. The previous ) ate the space before.
- Trailing trivia: none. Because of Rule 1, it doesn't take the following newline.
var
- Leading trivia: One newline followed by two spaces because of Rule 2.
```
// Equivalent to:
Trivia::newlines(1) + Trivia::spaces(2)
```
- Trailing trivia: Takes the space after (Rule 1).
x
- Leading trivia: none. The previous var ate the space before.
- Trailing trivia: Takes up the space after (Rule 1).
=
- Leading trivia: none. The previous x ate the space before.
- Trailing trivia: Takes up the space after (Rule 1).
2
- Leading trivia: none. The previous = ate the space before.
- Trailing trivia: none: Because of Rule 1, it doesn't take the following newline.
}
- Leading trivia: One newline, due to Rule 2.
- Trailing trivia: none.
EOF
- Leading trivia: none.
- Trailing trivia: none.

A couple of remarks about the EOF token:

Starting with the first newline after the last non-EOF token, EOF takes all remaining trivia in the source file as its leading trivia.
Because of this, EOF never has trailing trivia.

Summary of Trivia

Trivia represent source trivia, the whitespace and comments in a Swift source file.
Trivia are immutable.
Trivia don't have pointer identity - they are primitive values.

SyntaxData

SyntaxData nodes wrap RawSyntax nodes with a few important pieces of additional information: a pointer to a parent, the position in which the node occurs in its parent, and cached children.

For example, if we have a StructDeclSyntaxData, wrapping a RawSyntax for a struct declaration, we might ask for the generic parameter clause. At first, this is only represented in the raw syntax. On first ask, we thaw those out by creating a new GenericParameterClauseSyntaxData, cache it as our child, set its parent to this, and send it back to the caller. These cached children are strong references, keeping the syntax tree alive in memory.

You can think of SyntaxData as "concrete" or "realized" syntax nodes. They represent a specific piece of source code, have an absolute location, line and column number, etc. RawSyntax are more like the integer 1 - a single theoretical entity that exists, but manifesting everywhere it occurs identically in Swift source code.

Beyond this, SyntaxData nodes have no signficant public API.

SyntaxData are immutable. However, they may mutate themselves in order to implement lazy instantiation of children and caching. That caching operation transparent and thread-safe.
SyntaxData have identity, i.e. they can be compared with "pointer equality".
SyntaxData are implementation detail have no public API.

Syntax

RawSyntax and SyntaxData are essentially implementation detail in order to maintain all of those nice properties like immutability and information sharing. Now, we get to the main players: the Syntax nodes. These have the interesting public interface: the With APIs, getters, etc. Anyone working with the Syntax library will be touching these nodes.

Internally, they are actually packaged as a strong reference to the root of the tree in which that node resides, and a weak reference to the SyntaxData representing that node. Why a weak reference to the data? We do this to prevent retain cycles and minimize retain/release traffic: all strong references point down in the tree, starting at the root.

Although it's important for the entire library to be easy to use and maintain in general, it's especially important that the APIs in Syntax nodes remain intuitive and do what you expect with no weird side effects, necessary contexts to maintain, etc. If you have a handle on a Syntax node, you're safe to query anything about it without other processes pulling out the rug from under you.

Example Object Diagram: `{ return 1 }`

Here's an example of what you might have as a result of the following C++ code:

auto LeftBrace = SyntaxFactory::makeLeftBraceToken({}, Trivia::spaces(1));

auto IntegerTok = SyntaxFactory::makeIntegerLiteralToken("1", {}, Trivia::spaces(1));
auto Integer = SyntaxFactory::makeIntegerLiteralExpr(IntegerTok);

auto ReturnKW = SyntaxFactory::makeReturnKeyword({}, Trivia::spaces(1));

// This ReturnStmtSyntax is floating, with no root.
auto Return = SyntaxFactory::makeReturnStmt(ReturnKW, Integer);

auto RightBrace = SyntaxFactory::makeLeftBraceToken({}, {});

auto Statements = SyntaxFactory::makeBlankStmtList()
  .addExpr(Return);

auto Block = SyntaxFactory::makeBlankCodeBlockStmt()
  // Takes a reference of the token directly and increments the
  // reference count.
  .withLeftBraceToken(LeftBrace)

  // Only takes a strong reference to the RawSyntax of the
  // ReturnStmtSyntax above.
  .withStatements(Statements)

  // Takes a reference of the token directly and increments the
  // reference count.
  .withRightBraceToken(RightBrace);

// Returns a new ReturnStmtSyntax with the root set to the Block
// above, and the parent set to the StmtListSyntax.
auto MyReturn = Block.getStatement(0).castTo<ReturnStmt>;

Here's what the corresponding object diagram would look like starting with MyReturn.

Legend:

Green: RawSyntax types (TokenSyntax is a RawSyntax)
Red: SyntaxData types
Blue: Syntax types
Gray: Trivia
Solid Arrows: Strong references
Dashed Arrows: Weak references

A couple of interesting points and reminders:

All strong references point downward in the tree.
One SyntaxData for each RawSyntax.
Remember, a SyntaxData is essentially a RawSyntax with a parent pointer and cached SyntaxData children.
Parent pointers are omitted here but there are weak references pointing upward among SyntaxData (red) nodes.
Clients only work with Syntax (blue) nodes and Trivia (gray), and should never see SyntaxData (red) or RawSyntax (green) nodes.

Adding new Syntax Nodes

Here's a handy checklist when implementing a production in the grammar.

Check that the corresponding lib/AST node has SourceLocs for all terms. If it doesn't, [file a Swift bug][NewSwiftBug] and fix that first.
- Add the Syntax bug label!
Check if it's not already being worked on, and then [file a Swift bug][NewSwiftBug], noting which grammar productions are affected.
- Add the Syntax bug label!
Add a kind to include/swift/Syntax/SyntaxKinds.def
Create the ${KIND}SyntaxData class.
- Cached children members as RC<${CHILDKIND}SyntaxData>
Create the ${KIND}Syntax class.
Be sure to implement the following:
- Define the Cursor enum for the syntax node. This specifies all of the terms of the production, including optional terms. For example, a same-type generic requirement is:
  same-type-requirement -> type-identifier '==' type
  
  That's three terms in the production, and you can see this reflected in the StructDeclSyntaxData class:
```
enum Cursor : CursorIndex {
  LeftTypeIdentifier,
  EqualityToken,
  RightType,
};
```
- With APIs for all layout elements (e.g. withLeftTypeIdentifier(...))
  - Add C++ unit tests.
    - Check that the resulting Syntax node has identical content except for what you changed. print the new node and check the text.
    - Check that the new node has a different parent.
- Getters for all layout elements (e.g. getLeftTypeIdentifier())
  - Caching mechanics in corresponding ${KIND}SyntaxData class.
  - Add a C++ unit test.
    - After geting the child, verify:
      - The child's parent and root are correct
      - The child's content is correct
      - The child prints the expected text.
Implement static Make APIs in SyntaxFactory
- make${KIND}Syntax(... all elements ...)
  - Add a C++ unit test.
    - Supply various inputs, some missing, some not.
    - Supply incorrect token kinds for elements that are RC<TokenSyntax>. Check that the asserts are as expected.
- makeBlank${KIND}Syntax()
  - Add a C++ unit test.
If applicable, create a ${KIND}SyntaxBuilder.
- use____(...) methods for each layout element - takes a ${KIND}Syntax for that child type.
- ${KIND}Syntax build() const
  - Add a C++ unit test.
    - build() at all stages of building, followed by print().
Add a round-trip test for the grammar production
- Create a .swift file in test/Syntax with all possible configurations of the piece of syntax, with two RUN lines:
  - check for a zero-diff print with -round-trip-lex, and
  - check for a zero-diff print with -round-trip-parse
Update lib/Syntax/Status.md if applicable.

[NewSwiftBug]: https://bugs.swift.org/secure/CreateIssue!default.jspa)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Syntax

Syntax

README.md

Swift Syntax and Structured Editing Library

Design and Implementation Guidelines

APIs

Make APIs

The SyntaxFactory

With APIs

Builder APIs

Syntax Rewriters

Internals

RawSyntax

RawSyntax summary

TokenSyntax

TokenSyntax summary

Trivia

Summary of Trivia

SyntaxData

Syntax

Example Object Diagram: `{ return 1 }`

Adding new Syntax Nodes

Files

Syntax

Directory actions

More options

Directory actions

More options

Latest commit

History

Syntax

Folders and files

parent directory

README.md

Swift Syntax and Structured Editing Library

Design and Implementation Guidelines

APIs

Make APIs

The SyntaxFactory

With APIs

Builder APIs

Syntax Rewriters

Internals

RawSyntax

RawSyntax summary

TokenSyntax

TokenSyntax summary

Trivia

Summary of Trivia

SyntaxData

Syntax

Example Object Diagram: { return 1 }

Adding new Syntax Nodes

Example Object Diagram: `{ return 1 }`