Add stop gradient example to the tutorial.
rxwei committed Mar 9, 2019
1 parent 36deb0b commit a200ff8
Showing 1 changed file with 53 additions and 3 deletions.
56 changes: 53 additions & 3 deletions docs/site/tutorials/custom_differentiation.ipynb
@@ -84,7 +84,7 @@
"metadata": {
"id": "j0a8prgZTlEO",
"colab_type": "code",
"outputId": "08eca7dd-d14b-40fb-e9e5-08a4e9c3d888",
"outputId": "b32010bb-e291-4ae1-b090-f2c13b3e061c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 85
@@ -109,7 +109,7 @@
"print(\"exp(3) =\", sillyExp(3))\n",
"print(\"𝛁exp(3) =\", gradient(of: sillyExp)(3))"
],
"execution_count": 0,
"execution_count": 50,
"outputs": [
{
"output_type": "stream",
@@ -123,6 +123,56 @@
}
]
},
{
"metadata": {
"id": "eQPX9r3R5OP-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Stop derivatives from propagating\n",
"\n",
"Commonly known as \"stop gradient\" in machine learning use cases, method [`withoutDerivative()`](https://www.tensorflow.org/swift/api_docs/Protocols/Differentiable#/s:10TensorFlow14DifferentiablePAAE17withoutDerivativexyF) stops derivatives from propagating.\n",
"\n",
"Plus, `withoutDerivative()` can sometimes help the Swift compiler with identifying what not to differentiate and producing more efficient derivaitves. When it is detectable that the derivative of a function will always be zero, the Swift compiler will produce a warning. Explicitly using `.withoutDerivative()` silences that warning."
]
},
{
"metadata": {
"id": "ctRt6vBO5Wle",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
},
"outputId": "5ff3cb6b-9a1c-4fcc-f9e2-d7daaea369d0"
},
"cell_type": "code",
"source": [
"let x: Float = 2.0\n",
"let y: Float = 3.0\n",
"gradient(at: x, y) { x, y in\n",
" sin(sin(sin(x))) + cos(cos(cos(y))).withoutDerivative()\n",
"}"
],
"execution_count": 54,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"▿ 2 elements\n",
" - .0 : -0.18009877\n",
" - .1 : 0.0\n"
]
},
"metadata": {
"tags": []
},
"execution_count": 54
}
]
},
{
"metadata": {
"id": "EeV3wXQ79WS2",
@@ -609,7 +659,7 @@
},
"cell_type": "markdown",
"source": [
"When we run a training loop, we can see that the convolution layer's activation is computed twice: once during forward propagation, and once during backpropagation."
"When we run a training loop, we can see that the convolution layer's activations are computed twice: once during layer application, and once during backpropagation."
]
},
{
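The markdown cell added in this commit describes how `.withoutDerivative()` cuts the chain of differentiation at a value. As a rough sketch of a more typical use than the `sin`/`cos` toy in the added code cell (assuming the same `gradient(at:_:in:)` and `withoutDerivative()` APIs shown there; the names `a`, `b`, and `target` are illustrative only), treating a derived quantity as a constant looks like this:

```swift
import TensorFlow

let a: Float = 0.5
let b: Float = 2.0

// `target` depends on `b`, but `.withoutDerivative()` marks it as a
// constant for differentiation purposes, so no derivative flows back
// into `b`.
let (da, db) = gradient(at: a, b) { (a, b) -> Float in
    let target = (b * b).withoutDerivative()
    let error = a - target
    return error * error
}

print(da)  // 2 * (0.5 - 4.0) = -7.0
print(db)  // 0.0, because the derivative was stopped at `target`
```

This mirrors the behavior the added code cell demonstrates: the gradient component that would flow through the stopped value comes back as zero.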
