From ddddf84ea94197beed51ce4f8f09b032856b9115 Mon Sep 17 00:00:00 2001 From: Daniel Zheng Date: Thu, 26 Apr 2018 06:24:41 -0700 Subject: [PATCH] Use relative links, cleanup whitespace. --- CODE_OF_CONDUCT.md | 8 ++-- Installation.md | 1 - README.md | 46 +++++++++++++++------- Usage.md | 4 +- docs/DesignOverview.md | 14 +++---- docs/GraphProgramExtraction.md | 71 +++++++++++++++++----------------- docs/PythonInteroperability.md | 15 ++++--- docs/WhySwiftForTensorFlow.md | 14 +++---- 8 files changed, 92 insertions(+), 81 deletions(-) diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 5fff9d05a1c..1f2e4c6a508 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -19,7 +19,7 @@ Examples of unacceptable behavior by participants include: * Trolling, insulting/derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or electronic address, without explicit permission -* Conduct which could reasonably be considered inappropriate for the forum in which it occurs. +* Conduct which could reasonably be considered inappropriate for the forum in which it occurs. All TensorFlow forums and spaces are meant for professional interactions, and any behavior which could reasonably be considered inappropriate in a professional setting is unacceptable. @@ -35,7 +35,7 @@ Project maintainers have the right and responsibility to remove, edit, or reject This Code of Conduct applies to all content on tensorflow.org, TensorFlow’s GitHub organization, or any other official TensorFlow web presence allowing for community interactions, as well as at all official TensorFlow events, whether offline or online. -The Code of Conduct also applies within project spaces and in public spaces whenever an individual is representing TensorFlow or its community. 
Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed or de facto representative at an online or offline event. +The Code of Conduct also applies within project spaces and in public spaces whenever an individual is representing TensorFlow or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed or de facto representative at an online or offline event. ## Conflict Resolution @@ -44,11 +44,11 @@ Conflicts in an open source project can take many forms, from someone having a b If the behavior is threatening or harassing, or for other reasons requires immediate escalation, please see below. -However, for the vast majority of issues, we aim to empower individuals to first resolve conflicts themselves, asking for help when needed, and only after that fails to escalate further. This approach gives people more control over the outcome of their dispute. +However, for the vast majority of issues, we aim to empower individuals to first resolve conflicts themselves, asking for help when needed, and only after that fails to escalate further. This approach gives people more control over the outcome of their dispute. If you are experiencing or witnessing conflict, we ask you to use the following escalation strategy to address the conflict: -1. Address the perceived conflict directly with those involved, preferably in a real-time medium. +1. Address the perceived conflict directly with those involved, preferably in a real-time medium. 2. If this fails, get a third party (e.g. a mutual friend, and/or someone with background on the issue, but not involved in conflict) to intercede. 3. If you are still unable to resolve the conflict, and you believe it rises to harassment or another code of conduct violation, report it. 
diff --git a/Installation.md b/Installation.md index 0b3d497e9a3..ead38df35a4 100644 --- a/Installation.md +++ b/Installation.md @@ -1,4 +1,3 @@ - # Install Swift for TensorFlow To install Swift for TensorFlow, download one of the packages below and follow the instructions for your operating system. After installation, you can use the full suite of Swift tools, including `swift` (Swift REPL/interpreter) and `swiftc` (Swift compiler). See [here](Usage.md) for more details about using Swift for TensorFlow. diff --git a/README.md b/README.md index ca1ec31c21b..fc5b12a54f7 100644 --- a/README.md +++ b/README.md @@ -2,18 +2,30 @@ Welcome to the Swift for TensorFlow development community! -Swift for TensorFlow is the result of first-principles thinking applied to machine learning frameworks and aims to take TensorFlow usability to new heights. Swift for TensorFlow is based on the belief that machine learning is important enough for first-class language and compiler support, and thus works very differently from normal language bindings. +Swift for TensorFlow is the result of first-principles thinking applied to +machine learning frameworks and aims to take TensorFlow usability to new +heights. Swift for TensorFlow is based on the belief that machine learning is +important enough for first-class language and compiler support, and thus works +very differently from normal language bindings. First-class language and compiler support allow us to innovate in areas that -traditionally were out of bounds for machine learning libraries. Our programming model combines the performance of TensorFlow graphs with the flexibility and expressivity of Eager execution, while keeping a strong focus on improved usability at every level of the stack. +traditionally were out of bounds for machine learning libraries. 
Our +programming model combines the performance of TensorFlow graphs with the +flexibility and expressivity of Eager execution, while keeping a strong focus +on improved usability at every level of the stack. -**Note:** Swift for TensorFlow is an early stage research project. It has been released to enable open source development and is not yet ready for general use by machine learning developers. +**Note:** Swift for TensorFlow is an early stage research project. It has been +released to enable open source development and is not yet ready for general use +by machine learning developers. ## Installation and Usage -You can download a pre-built package for Swift for TensorFlow [here](https://github.com/tensorflow/swift/blob/master/Installation.md). After installing Swift for TensorFlow, you can learn how to use the project [here](https://github.com/tensorflow/swift/blob/master/Usage.md). +You can download a pre-built package for Swift for TensorFlow +[here](Installation.md). After installing Swift for TensorFlow, you can learn +how to use the project [here](Usage.md). -For instructions on building from source, visit [google/swift](https://github.com/google/swift/tree/tensorflow). +For instructions on building from source, visit +[google/swift](https://github.com/google/swift/tree/tensorflow). ## Documentation @@ -21,18 +33,19 @@ Below are some documents explaining the Swift for TensorFlow project. 
Conceptual: -- [Swift for TensorFlow Design Overview](https://github.com/tensorflow/swift/blob/master/docs/DesignOverview.md) -- [Why *Swift* for TensorFlow?](https://github.com/tensorflow/swift/blob/master/docs/WhySwiftForTensorFlow.md) +- [Swift for TensorFlow Design Overview](docs/DesignOverview.md) +- [Why *Swift* for TensorFlow?](docs/WhySwiftForTensorFlow.md) Deeper dives: -- [Graph Program Extraction](https://github.com/tensorflow/swift/blob/master/docs/GraphProgramExtraction.md) -- [Automatic Differentiation](https://github.com/tensorflow/swift/blob/master/docs/AutomaticDifferentiation.md) -- [Python Interoperability](https://github.com/tensorflow/swift/blob/master/docs/PythonInteroperability.md) +- [Graph Program Extraction](docs/GraphProgramExtraction.md) +- [Automatic Differentiation](docs/AutomaticDifferentiation.md) +- [Python Interoperability](docs/PythonInteroperability.md) ## Source code -Currently, the active development of Swift for TensorFlow will happen under the "tensorflow" branch of +Currently, the active development of Swift for TensorFlow will happen under +the "tensorflow" branch of [google/swift](https://github.com/google/swift/tree/tensorflow). These projects include: @@ -40,7 +53,8 @@ These projects include: - The compiler and standard libraries: [google/swift](http://github.com/google/swift/tree/tensorflow) - Debugger and REPL support: [google/swift-lldb](http://github.com/google/swift-lldb) -As the code matures, we aim to move it upstream to the corresponding [Swift.org](https://swift.org) repositories. +As the code matures, we aim to move it upstream to the corresponding +[Swift.org](https://swift.org) repositories. ## Models @@ -59,7 +73,10 @@ mailing list. ## Contributing -We welcome source code contributions: please read the [Contributor Guide](https://github.com/google/swift/blob/tensorflow/CONTRIBUTING.md) to get started. It's always a good idea to discuss your plans on the mailing list before making any major submissions. 
+We welcome source code contributions: please read the [Contributor +Guide](https://github.com/google/swift/blob/tensorflow/CONTRIBUTING.md) to get +started. It is always a good idea to discuss your plans on the mailing list +before making any major submissions. ## Code of Conduct @@ -71,4 +88,5 @@ experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. The Swift for TensorFlow community is guided by our [Code of -Conduct](CODE_OF_CONDUCT.md), which we encourage everybody to read before participating. +Conduct](CODE_OF_CONDUCT.md), which we encourage everybody to read before +participating. diff --git a/Usage.md b/Usage.md index 8f758030031..48279a863af 100644 --- a/Usage.md +++ b/Usage.md @@ -112,7 +112,7 @@ This was a simple demonstration of Swift for TensorFlow. To see example models w ## (Mac-only) Xcode -To use Swift for TensorFlow with Xcode, you must have installed a toolchain from [this page](Installation.md). +To use Swift for TensorFlow with Xcode, you must have installed a toolchain from [this page](Installation.md). 1. Open Xcode’s `Preferences`, navigate to `Components > Toolchains`, and select the installed Swift for TensorFlow toolchain. The name of the toolchain should start with "Swift for TensorFlow Development Snapshot". @@ -120,7 +120,7 @@ To use Swift for TensorFlow with Xcode, you must have installed a toolchain from Select toolchain in Xcode preferences. -2. In the menu bar, select `File > New > Playground...`. +2. In the menu bar, select `File > New > Playground...`. 3. Then, select `macOS` and `Blank` and hit `Next`. 
diff --git a/docs/DesignOverview.md b/docs/DesignOverview.md index 3acddfe1018..e1cd9d49667 100644 --- a/docs/DesignOverview.md +++ b/docs/DesignOverview.md @@ -9,7 +9,7 @@ This document provides a high level view of these subcomponents and describe how We now describe these pieces of the project: - [Swift](#swift) - - [TensorFlow](#tensorflow) + - [TensorFlow](#tensorflow) - [Graph Program Extraction](#graph-program-extraction) - [The TensorFlow module](#the-tensorflow-module) - [Automatic Differentiation](#automatic-differentiation) @@ -35,7 +35,7 @@ One warning: Swift evolved rapidly in its early years, so you should be careful ## TensorFlow -[TensorFlow](https://tensorflow.org/) is a popular and widely-used machine learning framework. TensorFlow provides a graph-based Python API where you explicitly build graph operations and then execute the graph one or more times with the session API. In addition, TensorFlow added [eager execution](https://www.tensorflow.org/programmers_guide/eager) which lets you call operations one-by-one in a Pythonic mode, but without the benefits of graphs. +[TensorFlow](https://tensorflow.org/) is a popular and widely-used machine learning framework. TensorFlow provides a graph-based Python API where you explicitly build graph operations and then execute the graph one or more times with the session API. In addition, TensorFlow added [eager execution](https://www.tensorflow.org/programmers_guide/eager) which lets you call operations one-by-one in a Pythonic mode, but without the benefits of graphs. In that context, many users will initially think Swift for TensorFlow is just a straight language binding. However, Swift for TensorFlow lets you write imperative eager execution-style code, while Swift gives you the full performance of the explicit graph APIs. The magic behind this is a [compiler transformation](#graph-program-extraction) that analyzes your code and automatically builds the TensorFlow graph and runtime calls for you.
The nice thing about this is that TensorFlow "just works", and you don’t have to think about graphs at all. @@ -46,12 +46,12 @@ Swift for TensorFlow has a low-level syntax that gives you direct access to any ```swift struct Tensor { ... - // Implement the infix `+` operator on Tensor in terms of the TensorFlow `Add` op, + // Implement the infix `+` operator on Tensor in terms of the TensorFlow `Add` op, // which takes two input tensors and returns one result. static func +(lhs: Tensor, rhs: Tensor) -> Tensor { return #tfop("Add", lhs, rhs) } - // Another example that implements a method in terms of the TensorFlow `Conv2D` op, + // Another example that implements a method in terms of the TensorFlow `Conv2D` op, // which takes two input tensors, as well as a `strides` and `padding` attribute. func convolved2D(withFilter filter: Tensor, strides: (Int32, Int32, Int32, Int32), @@ -75,7 +75,7 @@ The Graph Program Extraction transformation is the key technique that allows Ten First, the compiler finds the tensor operations in the code (which is trivial due to the low-level `#tfop` syntax described above). Next, it desugars high-level abstractions (like structs, tuples, generics, functions, variables, etc) that connect tensor operations through a process called "deabstraction". After deabstraction, the tensor operations are directly connected to each other through SSA dataflow edges and are embedded in a control flow graph represented in the [Swift Intermediate Language](https://github.com/apple/swift/blob/master/docs/SIL.rst) (SIL). The code for this is primarily implemented in [TFDeabstraction.cpp](Link to Github). -Once the tensor operations are desugared, a transformation we call "partitioning" extracts the graph operations from the program and builds a new SIL function to represent the tensor code. 
In addition to removing the tensor operations from the host code, new calls are injected that call into [our new runtime library](#runtime-entry-points-for-extraction) to start up TensorFlow, rendezvous to collect any results, and send/receive values between the host and the tensor program as it runs. The bulk of the Graph Program Extraction transformation itself lives in [TFPartition.cpp](TODO: LINK TO GITHUB). +Once the tensor operations are desugared, a transformation we call "partitioning" extracts the graph operations from the program and builds a new SIL function to represent the tensor code. In addition to removing the tensor operations from the host code, new calls are injected that call into [our new runtime library](#runtime-entry-points-for-extraction) to start up TensorFlow, rendezvous to collect any results, and send/receive values between the host and the tensor program as it runs. The bulk of the Graph Program Extraction transformation itself lives in [TFPartition.cpp](TODO: LINK TO GITHUB). Once the tensor function is formed, it has some transformations applied to it, and is eventually emitted to a TensorFlow graph using the code in [TFLowerGraph.cpp](TODO: LINK TO GITHUB). After the TensorFlow graph is formed, we serialize it to a protobuf and encode the bits directly into the executable, making it easy to load at program runtime. @@ -150,7 +150,7 @@ The most significant unimplemented piece of our compiler and runtime model is su ## Automatic Differentiation -[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) (AD) is a powerful technique that all machine learning frameworks are expected to implement, because gradients are so important for this work (e.g. with [SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)). 
TensorFlow implements automatic differentiation as a TensorFlow graph transformation, but we would like to deploy more powerful techniques to improve user experience in failure cases, enable differentiating custom data structures, recursion, and higher-order differentiation. As such, we built a stand-alone AD feature for Swift: one that is completely independent of the standard TensorFlow implementation of AD, and also completely independent of TensorFlow support in Swift. +[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) (AD) is a powerful technique that all machine learning frameworks are expected to implement, because gradients are so important for this work (e.g. with [SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)). TensorFlow implements automatic differentiation as a TensorFlow graph transformation, but we would like to deploy more powerful techniques to improve user experience in failure cases, enable differentiating custom data structures, recursion, and higher-order differentiation. As such, we built a stand-alone AD feature for Swift: one that is completely independent of the standard TensorFlow implementation of AD, and also completely independent of TensorFlow support in Swift. The way this works is by having Swift AD support arbitrary user-defined types. Swift for TensorFlow builds on this by making its Tensor types conform to the AD system, allowing them to participate as you’d expect. A nice thing about this is that Swift programmers interested in non-Tensor numerical analysis can use AD for any other types that are important for their work. @@ -233,5 +233,3 @@ We’re focusing on finishing the basic Swift for TensorFlow model, gaining more **Differentiating Opaque Closures:** Statically differentiating a function requires the body of the function to be visible to the compiler. However, this limits the expressiveness of the differential operator, e.g. 
users can’t apply the gradient operator to a function argument that has a function type because the compiler can’t always see into the body of the original function. We will discuss the possibility of introducing a new function convention - when a differentiable function is passed around, a pointer to its primal and adjoint gets passed along. This enables the compiler to directly call the primal and the adjoint, without the need to see into the function declaration. This is important for class and protocol methods. **Quantization Support:** We believe we can get a much better user experience for [fixed-point quantization tools](https://www.tensorflow.org/performance/quantization) if we integrate them into the compiler, and this should help with integrating quantization into the training process. - - diff --git a/docs/GraphProgramExtraction.md b/docs/GraphProgramExtraction.md index 25c391e08f5..24dfc87096e 100644 --- a/docs/GraphProgramExtraction.md +++ b/docs/GraphProgramExtraction.md @@ -1,6 +1,6 @@ # Graph Program Extraction -Swift for TensorFlow provides a define-by-run programming model while also providing the full benefit of graphs. This is possible because of a core "graph program extraction" algorithm that we’ve built into the Swift compiler that takes imperative Swift code and automatically builds a graph as part of the normal compilation flow. This document [frames and motivates the challenge](#motivation), explains [related work](#related-work), describes our [technique at a high level](#graph-program-extraction-a-new-define-by-run-approach) to contrast with prior work, explains an inductive mental model for [how our approach works](#building-a-programming-model), and explains the [resultant programming model](#explaining-the-swift-for-tensorflow-model-to-users) in user terms. +Swift for TensorFlow provides a define-by-run programming model while also providing the full benefit of graphs.
This is possible because of a core "graph program extraction" algorithm that we’ve built into the Swift compiler that takes imperative Swift code and automatically builds a graph as part of the normal compilation flow. This document [frames and motivates the challenge](#motivation), explains [related work](#related-work), describes our [technique at a high level](#graph-program-extraction-a-new-define-by-run-approach) to contrast with prior work, explains an inductive mental model for [how our approach works](#building-a-programming-model), and explains the [resultant programming model](#explaining-the-swift-for-tensorflow-model-to-users) in user terms. It is helpful to have an idea of how the overall design of Swift for TensorFlow works, which you can get from the [Swift for TensorFlow design overview document](DesignOverview.md). @@ -43,7 +43,7 @@ This approach has a lot of advantages, however there are also limitations of thi - The low performance of the Python interpreter can matter for some kinds of models - particularly ones that use fine grained operations. - The [GIL](https://en.wikipedia.org/wiki/Global_interpreter_lock) can force complicated workarounds (e.g. mixing in C++ code) for models that want to harness multicore CPUs as part of their work. - Even if we had an infinitely fast interpreter without a GIL, interpreters cannot "look ahead" beyond the current op. This prevents discovering future work that is dependent on work that it is currently dispatching and waiting for. This in turn prevents certain optimizations, like general op fusion, model parallelism, etc. - + Finally, while [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) (AD) is not the focus of this whitepaper, define-by-run approaches prevent the use of "[Source Code Transformation](https://en.wikipedia.org/wiki/Automatic_differentiation#Source_code_transformation_(SCT))" techniques to AD. 
The "[operator overloading](https://en.wikipedia.org/wiki/Automatic_differentiation#Operator_overloading_(OO))" approaches they use are effective, but lose the ability to translate control flow constructs in the host language to the computation graph, and make it difficult to perform optimizations or provide good error messages when differentiation fails. ### Define-by-run approaches - Tracing JITs @@ -54,7 +54,7 @@ Several active research projects are exploring the use of tracing JIT compilers - Tracing JITs fully "unroll" computations, which can lead to very large traces. - Tracing JITs are unable to "look ahead" across data dependent branches, which can lead to short traces in certain types of models, and bubbles in the execution pipeline. - Tracing JITs allow dynamic models to intermix non-Tensor Python computation, but doing so reintroduces the performance problems of Python: values produced by such computation split traces, can introduce execution bubbles, and delay trace execution. - + Overall, these approaches provide a hybrid model that provides much of the performance of graphs along with some of the usability of the interpreter-based define-by-run models, but include compromises along both axes. ### Lightweight Modular Staging (LMS) @@ -82,7 +82,7 @@ There is a final point that is worth emphasis: while TensorFlow is the critical ## Building a programming model -Our goal is to provide a simple, predictable, and reliable programming model that is easy to intuitively understand, can be explained to a user in a few paragraphs, and which the compiler can reinforce with warnings and other diagnostics. +Our goal is to provide a simple, predictable, and reliable programming model that is easy to intuitively understand, can be explained to a user in a few paragraphs, and which the compiler can reinforce with warnings and other diagnostics.
The most significant challenge is that the programming model must be amenable to reliable static analysis, but also allow the use of high-level user-defined abstractions (like layers and estimators). Our approach can be implemented in a few languages that support reliable static analysis. For a detailed discussion of the issues involved, please see our [Why *Swift* for TensorFlow?](WhySwiftForTensorFlow.md) document. @@ -113,7 +113,7 @@ func multiplyAndAdd(x: TensorHandle, w: TensorHandle, b: TensorHandle) -> Tensor } ``` -Because we added so many constraints on what we accept, it is trivial to transform this into a graph through static analysis: we can do a top-down walk over the code. Each function parameter is turned into an input to the graph. Each operation is trivial to identify, and can be transformed 1-1 into graph nodes: given a top-down walk, inputs are already graph nodes, and the op name, inputs, and any attributes (which aren’t discussed here) are immediately available. Because we have no control flow, there is exactly one return instruction, and it designates the result of the graph. +Because we added so many constraints on what we accept, it is trivial to transform this into a graph through static analysis: we can do a top-down walk over the code. Each function parameter is turned into an input to the graph. Each operation is trivial to identify, and can be transformed 1-1 into graph nodes: given a top-down walk, inputs are already graph nodes, and the op name, inputs, and any attributes (which aren’t discussed here) are immediately available. Because we have no control flow, there is exactly one return instruction, and it designates the result of the graph. This gives us a result like this: @@ -231,16 +231,16 @@ Before we explain our approach, we’ll introduce an (abstracted) example that u ```swift func hostAndGraphCommunication() -> TensorHandle { var values = #tfop("RandomInitOp") - for i in 0 ... 
1000 { let x = #tfop("SomeOp", values) - + // This is not a tensor op, it has to run on the host CPU. // It might be dispatched to a cluster of worker machines. let result = atariGameSimulator(x) - + let y = #tfop("AnotherOp", x) values = #tfop("MixOp", result, y) - } + } return result } @@ -248,7 +248,7 @@ func hostAndGraphCommunication() -> TensorHandle { In interpreter-based define-by-run systems, the host CPU is running the interpreter, and it dispatches each operation as it encounters it. When it encounters the `atariGameSimulator` call (which isn’t a TensorFlow op), the interpreter just copies the data back from the accelerator to the host, makes the call, and copies the result back to the accelerator when it gets to the `MixOp` operation that uses it. -Tracing JITs take this further by having the interpreter collect longer series of tensor operations - this "trace" of operations allows more optimization of the tensor code. This example is too simple to really show the power of this, but even here a tracing JIT should be able to build a trace that includes both the `RandomInitOp` operation and the `SomeOp` operation on the first iteration, allowing inter-op fusion between them. On the other hand, tracing JITs are forced to end a trace any time a data dependency is found: the call to `atariGameSimulator` needs the value of `x`, so the trace stops there. +Tracing JITs take this further by having the interpreter collect longer series of tensor operations - this "trace" of operations allows more optimization of the tensor code. This example is too simple to really show the power of this, but even here a tracing JIT should be able to build a trace that includes both the `RandomInitOp` operation and the `SomeOp` operation on the first iteration, allowing inter-op fusion between them. On the other hand, tracing JITs are forced to end a trace any time a data dependency is found: the call to `atariGameSimulator` needs the value of `x`, so the trace stops there.
Because of the way these systems work, neither of them can discover that `AnotherOp` can be run on the accelerator in parallel with `atariGameSimulator` on the host. Furthermore, because a tracing JIT splits the trace, data layout optimizations between `SomeOp` and `AnotherOp` are not generally possible: the two are in separate traces. @@ -263,16 +263,16 @@ First we start by duplicating the function, and replace all host code with send ```swift func hostAndGraphCommunication_ForGraph() -> TensorHandle { var values = #tfop("RandomInitOp") - for i in 0 ... 1000 { + for i in 0 ... 1000 { let x = #tfop("SomeOp", values) - + // REMOVED: let result = atariGameSimulator(x) #tfop("SendToHost", x) let result = #tfop("ReceiveFromHost") - + let y = #tfop("AnotherOp", x) values = #tfop("MixOp", result, y) - } + } return result } @@ -289,18 +289,18 @@ func hostAndGraphCommunication() -> TensorHandle { let tensorProgram = startTensorFlowGraph("... proto buf for TensorFlow graph ... ") // REMOVED: var values = #tfop("RandomInitOp") - for i in 0 ... 1000 { + for i in 0 ... 1000 { // REMOVED: let x = #tfop("SomeOp", values) let x = receiveFromTensorFlow(tensorProgram) // This is not a tensor op, it has to run on the host CPU. // It might be dispatched to a cluster of worker machines. 
let result = atariGameSimulator(x) - + sendToTensorFlow(tensorProgram, result) // REMOVED: let y = #tfop("AnotherOp", x) // REMOVED: values = #tfop("MixOp", result, y) - } + } let result = finishTensorFlowGraph(tensorProgram) return result } @@ -324,9 +324,9 @@ func countUntilKeyPressed() -> TensorHandle { while true { let stop = keyPressed() if stop { break } - + result = #tfop("HeavyDutyComputation", result) - } + } return result } @@ -341,9 +341,9 @@ func countUntilKeyPressed_ForGraph() -> TensorHandle { // REMOVED: let stop = keyPressed() let stop = #tfop("ReceiveFromHost") if stop { break } - + result = #tfop("HeavyDutyComputation", result) - } + } return result } @@ -360,9 +360,9 @@ func countUntilKeyPressed() -> TensorHandle { sendToTensorFlow(tensorProgram, stop) if stop { break } - + // REMOVED: result = #tfop("HeavyDutyComputation", result) - } + } let result = finishTensorFlowGraph(tensorProgram) return result @@ -379,7 +379,7 @@ On the other hand, this approach can be a performance concern when the latency b A final important topic is performance predictability. We like that our define-by-run model allows the user to flexibly intermix host and tensor code, but this brings in a concern that this could lead to very difficult to understand pitfalls, where large tensor values are bouncing back and forth between devices excessively. -The solution to this fits right into the standard compiler design: by default, Swift produces compiler warnings when an implicit copy is made (as in the examples above), which makes it immediately clear when a copy is being introduced. +The solution to this fits right into the standard compiler design: by default, Swift produces compiler warnings when an implicit copy is made (as in the examples above), which makes it immediately clear when a copy is being introduced. These warnings can be annoying when the copies are intentional, so the user can either disable the warning entirely (e.g. 
when doing research on small problem sizes where performance doesn’t matter at all) or the warnings can be disabled on a case-by-case basis by calling a method (currently named `x.toDevice()` and `x.toHost()`) to tell the compiler (and future maintainers of the code!) that the copy is intentional. @@ -394,9 +394,9 @@ As we discussed before, we can add any abstractions to our model as long as we h This is great because it allows users to compose high level abstractions out of tensors and other values. For example, the compiler scalarizes this code: ```swift -struct S { +struct S { var a, b: TensorHandle - var c: String + var c: String } let value = S(a: t1, b: t2, c: "Hello World") @@ -429,10 +429,10 @@ The next big jump is to add function calls, which the compiler can provably elim Fortunately, Swift has a strong static side, and all top-level functions, methods on structs, and many other things (like computed properties on structs) are consistently direct calls. This is a huge step forward in terms of our modeling power, because we can now build a reasonable user-facing set of Tensor APIs. For example, something like this: ```swift -struct Tensor { +struct Tensor { // TensorHandle is now an internal implementation detail, not user exposed! private var value: TensorHandle - + func matmul(_ b: Tensor) -> Tensor { return Tensor(#tfop("MatMul", self.value, b.value)) } @@ -446,21 +446,21 @@ func calculate(a: Tensor, b: Tensor, c: Tensor) -> Tensor { let result = a.matmul(b) + c return result } -``` +``` Desugars the body of the `calculate` function with inlining into: ```swift let tmp = Tensor(#tfop("MatMul", a.value, b.value)) let result = Tensor(#tfop("Add", tmp.value, c.value)) -``` +``` ... and then scalarizes the `Tensor` structs to produce this: - + ```swift let tmp_value = #tfop("MatMul", a_value, b_value) let result_value = #tfop("Add", tmp_value, c_value) -``` +``` ... which is trivially promotable to the graph. 
It is very nice how these simple desugaring transformations compose cleanly, but this is only the case if they are guaranteed and can be tied to simple language constructs that the user can understand. We don’t have space to go into it here, but this inlining transformation also applies to higher-order functions like `map` and `filter` so long as their closure parameters are non-escaping (which is the default): inlining a call to `map` eventually exposes a direct call to its closure. Additionally, an important design point of Swift is that the non-aliasing property we depend on even extends to `inout` arguments and the `self` argument of `mutating` struct methods. This allows the compiler to aggressively analyze and transform these values, and is a result of Swift’s [law of exclusivity](https://github.com/apple/swift-evolution/blob/master/proposals/0176-enforce-exclusive-access-to-memory.md) which grants Fortran style non-aliasing properties to these values. @@ -472,9 +472,9 @@ It is also worth mentioning that TensorFlow graphs support function calls. In t The Swift generics model can be provably desugared using generics specialization - and of course, this is also an important performance optimization for normal Swift code! This is a huge expansion of the expressive capabilities of our system: it allows the rules around the `dtype` of Tensors to be captured and enforced directly by Swift. For example, we can expand our example above to look like this: ```swift -struct Tensor { +struct Tensor { private var value: TensorHandle - + func matmul(b: Tensor) -> Tensor { return Tensor(#tfop("MatMul", self.value, b.value)) } @@ -483,7 +483,7 @@ struct Tensor { return Tensor(#tfop("Add", lhs.value, rhs.value)) } } -``` +``` The nice thing about this is that users of the Tensor API get `dtype` checking automatically: if you accidentally attempt to add a `Tensor` with a `Tensor`, you’ll get a compile-time error, instead of a runtime error from TensorFlow. 
This happens even though the underlying `TensorHandle` abstraction is untyped. @@ -546,4 +546,3 @@ Our user model fits in a single paragraph: you write normal imperative Swift cod One of the beauties of this user model is that directly aligns with several of the defaults encouraged by the Swift language (e.g. closures default to non-escaping and the use of zero-cost abstractions to build high level APIs), and the core values of Swift API design (e.g. the pervasive use of value semantics strongly encourages the use of structs over classes). We believe that this will make Swift for TensorFlow "feel nice in practice" because you don’t have to resort to anti-idiomatic design to get things to work. Our implementation work is still early, but we are shifting from an early research project into a public open source project now because we believe that the theory behind this approach has been proven out. We are far enough along in the implementation to have a good understanding of the engineering concerns facing an actual implementation of these algorithms. 
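The compile-time `dtype` checking described above can be made concrete with a short sketch. Everything here is illustrative rather than a confirmed API: `Tensor<Scalar>` follows the generic struct sketched earlier in this document, and the array-literal initializers are assumptions added for readability:

```swift
// Illustrative only: with the generic Tensor sketched above, mixing dtypes
// is rejected by the Swift type checker rather than by the TensorFlow
// runtime hours into a training run.
let floats: Tensor<Float> = [1.0, 2.0, 3.0]  // hypothetical literal init
let ints: Tensor<Int32> = [1, 2, 3]          // hypothetical literal init

// let sum = floats + ints
// error: binary operator '+' cannot be applied to operands of type
//        'Tensor<Float>' and 'Tensor<Int32>'
```

The error surfaces through ordinary Swift generics, even though (as the text notes) the underlying `TensorHandle` representation is untyped.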
- diff --git a/docs/PythonInteroperability.md b/docs/PythonInteroperability.md index 1a72fad4295..62a29f59d56 100644 --- a/docs/PythonInteroperability.md +++ b/docs/PythonInteroperability.md @@ -23,13 +23,13 @@ let np = Python.import("numpy") let a = np.arange(15).reshape(3, 5) let b = np.array([6, 7, 8]) -// Python: +// Python: // import gzip as gzip // import pickle as pickle let gzip = Python.import("gzip") let pickle = Python.import("pickle") -// Python: +// Python: // file = gzip.open("mnist.pkl.gz", "rb") // (images, labels) = pickle.load(file) // print(images.shape) // (50000, 784) @@ -40,7 +40,7 @@ print(images.shape) // (50000, 784) As you can see, the syntax here is immediately understandable to a Python programmer: the major differences are that Swift requires values to be declared before use (with `let` or `var`) and that we chose to put [Python builtin functions](https://docs.python.org/3/library/functions.html) like `import`, `type`, `slice` etc under a `Python.` namespace (simply to avoid cluttering the global scope). This is a result of a conscious balance between trying to make Python feel natural and familiar, while not compromising the global design of the Swift language. -This line is established through a simple requirement: we should not depend on *any Python-specific compiler or language features* to achieve Python interop - it should be completely implemented as a Swift library. After all, while Python is incredibly important to the machine learning community, there are other dynamic languages (Javascript, Ruby, etc) that have strong footholds in other domains, and we don’t want each of these domains to impose an endless complexity creep onto the Swift language. +This line is established through a simple requirement: we should not depend on *any Python-specific compiler or language features* to achieve Python interop - it should be completely implemented as a Swift library. 
After all, while Python is incredibly important to the machine learning community, there are other dynamic languages (Javascript, Ruby, etc) that have strong footholds in other domains, and we don’t want each of these domains to impose an endless complexity creep onto the Swift language. You can see the current implementation of our bridging layer in [Python.swift](https://github.com/google/swift/blob/tensorflow/stdlib/public/Python/Python.swift). This is pure Swift code that works with unmodified Swift 4.1. @@ -128,7 +128,7 @@ func printPythonCollection(_ collection: PyValue) { Furthermore, because `PyValue` conforms to `MutableCollection`, you get full access to the [Swift APIs for Collections](https://developer.apple.com/documentation/swift/mutablecollection), including functions like `map`, `filter`, `sort`, etc. ### Conversions to and from Swift values -Now that Swift can represent and operate on Python values, it becomes important to be able to convert between Swift native types like `Int` and `Array` and the Python equivalents. This is handled by the `PythonConvertible` protocol - to which the basic Swift types like `Int` conform to, and to the Swift collection types like `Array` and `Dictionary` conditionally conform to (when their elements conform). This makes the conversions fit naturally into the Swift model. +Now that Swift can represent and operate on Python values, it becomes important to be able to convert between Swift native types like `Int` and `Array` and the Python equivalents. This is handled by the `PythonConvertible` protocol - to which the basic Swift types like `Int` conform to, and to the Swift collection types like `Array` and `Dictionary` conditionally conform to (when their elements conform). This makes the conversions fit naturally into the Swift model. 
For example, if you know you need a Swift integer or you’d like to convert a Swift integer to Python, you can use: @@ -141,7 +141,7 @@ if let swiftInt = Int(somePythonValue) { // Succeeds if the Python value is con Similarly, aggregate types like arrays work exactly the same way: -```swift +```swift // This succeeds when somePythonValue is a collection of values that are convertible to Int. if let swiftIntArray = Array(somePythonValue) { print(swiftIntArray) @@ -226,7 +226,7 @@ Python’s approach to exception handling is similar to C++ and many other langu This is an inherent gap between the two languages, and we don’t want to paper over this difference with a language extension. Our current solution to this builds on the observation that even though any function call *could* throw, most calls do not. Furthermore, given that Swift makes error handling explicit in the language, it is reasonable for a Python-in-Swift programmer to also think about where they expect errors to be throwable and catchable. We do this with an explicit `.throwing` projection on `PyValue`. Here’s an example: ```swift - // Open a file. If this fails, the program is terminated, just like an + // Open a file. If this fails, the program is terminated, just like an // unhandled exception in Python. // file = open("foo.txt") @@ -255,11 +255,10 @@ Python slicing is more general than Swift’s slicing syntax. Right now you can We need to investigate and settle on the right model to use for subclassing of Python classes. There is currently no way to make a struct like `PyValue` work with tuple pattern matching, so we use projection properties like `.tuple2`. If this becomes a problem in practice, we can investigate adding this to Swift, but we currently don’t think it will be enough of a problem to be worth solving in the near term. 
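The `PythonConvertible` machinery described in the conversions section above can be sketched as follows. The requirement names (`init?(_:)`, `pythonValue`) are assumptions chosen for illustration, not the shipping protocol; see Python.swift for the real definition:

```swift
// Hedged sketch of the conversion protocol the text describes; requirement
// names are illustrative, not the actual Python.swift API.
protocol PythonConvertible {
  // Failable: not every Python value converts to a given Swift type.
  init?(_ value: PyValue)
  // Every convertible Swift value can produce a Python counterpart.
  var pythonValue: PyValue { get }
}

// Conditional conformance (a Swift 4.1 feature): an Array bridges exactly
// when its elements do, which is how `Array(somePythonValue)` above can
// succeed or fail as a unit.
extension Array: PythonConvertible where Element: PythonConvertible {
  // ... element-wise conversion in both directions ...
}
```

The conditional conformance is what lets the failable pattern compose: `Array<Int>` gets the conversion for free, while `Array<SomeNonConvertibleType>` simply never conforms.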
-## Summary and Conclusion +## Summary and Conclusion We feel good about this direction and think that there are several interesting aspects of this work: it is great that there are no Python specific changes in the Swift compiler or language. We are able to achieve good Python interoperability through a library written in Swift by composing Python-independent language features. We believe that other communities will be able to compose the same feature set to directly integrate with the dynamic languages (and their runtimes) that are important to other communities (e.g. JavaScript, Ruby, etc). Another interesting aspect of this work is that Python support is completely independent of the other TensorFlow and automatic differentiation logic we’re building as part of Swift for TensorFlow. This is a generally useful extension to the Swift ecosystem that can stand alone, useful for server side development or anything else that wants to interoperate with existing Python APIs. Finally, it is important to point out one major caveat in the context of Swift for TensorFlow: while you can directly call into an arbitrary Python API, the code partitioning analysis that automatically builds TensorFlow graphs for you cannot understand dynamic Python API calls. While directly using APIs for TensorFlow (sessions, Keras, etc) through the Python interop layer is technically possible, it won't benefit from the compiler analyses and transformations we've built in Swift for TensorFlow. Instead, we need to invent our own high-level APIs, and draw inspiration from Keras and other existing APIs. Please see the [Graph Program Extraction](GraphProgramExtraction.md) document for more details about this. 
- diff --git a/docs/WhySwiftForTensorFlow.md b/docs/WhySwiftForTensorFlow.md index af274f208fb..3090b46f625 100644 --- a/docs/WhySwiftForTensorFlow.md +++ b/docs/WhySwiftForTensorFlow.md @@ -78,7 +78,7 @@ First, let’s start with the hopefully non-controversial strengths of Python wh **Python APIs:** TensorFlow users benefit from and rely on a huge collection of Python APIs outside the TensorFlow product, e.g. for visualization and data science stuff. If we ignore this reality, adoption of our new system (no matter how great it is) will be very slow. -### Python challenges +### Python challenges Python is great at the points above, but it has some challenges as well. Here are some things that would make TensorFlow better if they were improved: @@ -106,14 +106,14 @@ There are subtle, but critical aspects to making this work in a usable way, and ```swift class Layer { ... } -class Convolution2DLayer : Layer { ... } -class BatchNormalizationLayer : Layer { ... } +class Convolution2DLayer : Layer { ... } +class BatchNormalizationLayer : Layer { ... } class ResNetBlock { // Layers this block is made out of. var conv1, batchNorm1, conv2, batchNorm2: Layer - init(inFilterCount: Int32, outFilterCount: Int32, + init(inFilterCount: Int32, outFilterCount: Int32, strides: (Int32, Int32)) { conv1 = Convolution2DLayer(filterShape: [3, 3, inFilterCount, outFilterCount], strides: (1, strides.0, strides.1, 1)) @@ -154,7 +154,7 @@ In addition to reliable static analysis, we benefit from a few other properties Finally, there is the topic of static vs dynamic typing. Static typing offers a number of advantages particularly relevant for our use case. Static types: - can catch bugs at compile time instead of runtime. People get very frustrated when a silly type error brings down a long training run hours into it. - - directly improve tooling experience like code completion/intellisense, jump to definition, etc. 
+ - directly improve tooling experience like code completion/intellisense, jump to definition, etc. - make it easy to know what operations are Tensor operations. On the other hand, statically typed languages can require a lot of ceremony and boilerplate, which is infuriating and directly harms productivity. There are promising middle grounds though, such as languages that are statically typed but that use type inference to eliminate explicit typing in the common case. @@ -203,7 +203,7 @@ It might be interesting to see how we evaluated Swift against the point-by-point **"Mainstream" Syntax:** Swift is designed to fit in with the "extended C family" of programming languages, and intentionally tries to feel "familiar". -**Shallow learning curve:** A key design goal of Swift is to [progressively disclose complexity](https://en.wikipedia.org/wiki/Progressive_disclosure) in the language, which makes it an extremely teachable language. This is one of the things that has enabled teaching Swift code to kids as their first programming language (targeting middle-school, 7th and 8th grade) in the [Swift Playgrounds iPad app](https://www.apple.com/swift/playgrounds/). +**Shallow learning curve:** A key design goal of Swift is to [progressively disclose complexity](https://en.wikipedia.org/wiki/Progressive_disclosure) in the language, which makes it an extremely teachable language. This is one of the things that has enabled teaching Swift code to kids as their first programming language (targeting middle-school, 7th and 8th grade) in the [Swift Playgrounds iPad app](https://www.apple.com/swift/playgrounds/). **High productivity:** Swift aims to maximize clarity of code, and thus it fights to reduce boilerplate. The top-end goal of Swift is to optimize the time it takes to write and maintain a working app, which includes debugging time and other things that go beyond just pounding out the code. 
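The "statically typed, but with type inference" middle ground mentioned above is easy to see in plain Swift, independent of any TensorFlow machinery; a minimal sketch:

```swift
// Every binding below has a static type, checked at compile time,
// yet none needs an explicit annotation.
let count = 42              // inferred as Int
let rate = 0.5              // inferred as Double
let names = ["conv", "bn"]  // inferred as [String]

// A dtype-style mistake is rejected before the program ever runs:
// let bad = count + rate
// error: binary operator '+' cannot be applied to operands of
//        type 'Int' and 'Double'

print(names.count + count)  // prints "44"
```

This is the kind of ceremony-free static checking the paragraph above argues for: the silly type error is caught at compile time, without the annotation boilerplate that makes some statically typed languages feel heavyweight.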
@@ -273,5 +273,3 @@ You can see how each of these things is used by the Swift for TensorFlow project - the culture of modularity and composability allows us to define much our "language" features (like the `Tensor` type) in the TensorFlow library, without having to make invasive changes to the language and compiler The implementation of the Swift for TensorFlow features and capabilities is still not done, but we feel good about how the project is going in practice. That said, we’d really love to see the algorithms we are exploring get applied by other languages and communities! - -