Add caching to our tests to avoid clean rebuilds. (carbon-language#534)

This is both a bit tricky and really easy with Bazel.

The easy part, especially compared to other build systems, is that we can do this regardless of the state of the repository: Bazel hermetically checks that everything is up to date, so the cache can be a bit stale and still be totally functional. Better still, we can just ask Bazel to use an output base that we cache and restore. This is super nice and even avoids most of the Bazel installation bits.

The tricky part is that the restored output base has to reconnect correctly to the installed tree. So we exclude a crucial symlink (`install`), which then gets re-created at the right moment.

This gets really tricky due to LLVM and Clang (and we would struggle with this no matter what build system we used). Building LLVM creates a *ton* of object code. Just a huge amount. As a consequence, with 4 configurations we'd run into GitHub Actions' cache size limit (5 GB) really quickly. So we do another bit of tricky business and exclude the downloaded `external` tree. It gets rebuilt easily, and there's no real need to download it with the cached state; it's downloaded either way.

There are two follow-ups I'd like to make here. The first is to prod the Bazel team to make things like persisting your output base easier to do reliably. Even better would be making *partial* persisting easier. The second is to make our usage of LLVM *much* less wasteful. There are a bunch of steps here, from changing how we use sanitizers to changing how LLVM is built. Those will be follow-ups, though.
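To make the output-base trick concrete, here is a rough local analogue of what the cache step saves and restores. It is a sketch, not what `actions/cache` actually runs. It assumes GNU tar, and the function names `save_output_base` and `restore_output_base` are hypothetical. Only the top-level `install` symlink and the `external` tree are left out. Everything else in the output base round-trips.

```shell
#!/usr/bin/env bash
# Rough local analogue of the workflow's cache step (assumes GNU tar).
# actions/cache does its own archiving; these function names are hypothetical.
set -euo pipefail

# save_output_base OUTPUT_BASE ARCHIVE
# Archive the Bazel output base, leaving out the top-level `install` symlink
# (so Bazel reconnects to the current installation) and the `external` tree
# (re-downloaded anyway, and large enough to strain the cache size limit).
save_output_base() {
  local output_base="$1" archive="$2"
  tar -czf "$archive" \
    --exclude=./install --exclude=./external \
    -C "$output_base" .
}

# restore_output_base OUTPUT_BASE ARCHIVE
# Unpack a previously saved archive into the output base, creating it if needed.
restore_output_base() {
  local output_base="$1" archive="$2"
  mkdir -p "$output_base"
  tar -xzf "$archive" -C "$output_base"
}
```

Before saving, run `bazelisk --output_base="$HOME/.cache/bazel-output-base" shutdown`. This mirrors the workflow's final step, so the Bazel server is no longer touching files while they are archived.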
avlouis · May 24, 2021 · commit 80883db · 1 parent 59add8c
Showing 1 changed file with 62 additions and 2 deletions.
diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
@@ -33,6 +33,11 @@ jobs:
         build_mode: [fastbuild, opt]
     runs-on: ${{ matrix.os }}
     steps:
+      - name: Create environment variables
+        run: |
+          echo "BAZEL_OUTPUT_BASE_PATH=$HOME/.cache/bazel-output-base" >> $GITHUB_ENV
+          echo "BAZEL=bazelisk --output_base=$HOME/.cache/bazel-output-base" >> $GITHUB_ENV
+
       - name: Checkout
         uses: actions/checkout@v2
         with:
@@ -59,16 +64,71 @@ jobs:
           bazelisk --version
           python --version
 
+      # Make the date available to subsequent steps.
+      - name: Get date
+        id: date
+        shell: bash
+        run: |
+          echo "::set-output name=date::$(/bin/date -u "+%Y%m%d")"
+
+      # Preserve and restore the Bazel cache across builds.
+      #
+      # These cached entries are keyed on this file (which configures the
+      # command and other inputs) as well as the date. Because Bazel is
+      # hermetic, it doesn't matter if we get a "stale" entry here, but it makes
+      # the cache less efficient. So we use the date to get a fresh baseline
+      # cache once per day. We include this file so that changing the build
+      # structure itself returns us to a clean state. We can also increment the
+      # counter in this comment to forcibly rotate this hash in the event we get
+      # a corrupted cache that breaks Bazel. This is a risk because we very
+      # selectively prune directories out of the internals of the Bazel output
+      # base to reduce the storage of the cache. At any point, this might break
+      # invariants and require us to clear our caches. There is no way to
+      # actually delete them and so instead we can simply bump the counter here.
+      #
+      # Current cache version: 1
+      - name: Cache Bazel build data
+        uses: actions/cache@v2
+        with:
+          # We exclude the install symlink so that Bazel reconnects to the
+          # current installation. This helps repair any damage done by other
+          # exclusions as well. Then we exclude the `external` tree because
+          # those are all downloaded and rapidly rebuilt, so no benefit from
+          # downloading them with the cached state.
+          path: |
+            ${{ env.BAZEL_OUTPUT_BASE_PATH }}
+            !${{ env.BAZEL_OUTPUT_BASE_PATH }}/install
+            !${{ env.BAZEL_OUTPUT_BASE_PATH }}/external
+          key: |
+            bazel-${{ matrix.os }}-${{ matrix.build_mode }}-${{ hashFiles('.github/workflows/tests.yaml')
+            }}-${{ steps.date.outputs.date }}
+          # When we get a cache miss, try finding the most recent previous day's
+          # cache to start.
+          restore-keys: |
+            bazel-${{ matrix.os }}-${{ matrix.build_mode }}-${{ hashFiles('.github/workflows/tests.yaml')
+            }}-
+
+      # Print Bazel diagnostics to make debugging easier.
+      - name: Print Bazel info
+        run: |
+          ${{ env.BAZEL }} info
+
       # Build all targets first to isolate build failures.
       - name: Build (${{ matrix.build_mode }})
         run: |
-          bazelisk build -c ${{ matrix.build_mode }} --verbose_failures \
+          ${{ env.BAZEL }} build -c ${{ matrix.build_mode }} --verbose_failures \
             --deleted_packages=migrate_cpp,migrate_cpp/cpp_refactoring \
             //...:all
 
       # Run all test targets.
       - name: Test (${{ matrix.build_mode }})
         run: |
-          bazelisk test -c ${{ matrix.build_mode }} --test_output errors \
+          ${{ env.BAZEL }} test -c ${{ matrix.build_mode }} --test_output errors \
             --deleted_packages=migrate_cpp,migrate_cpp/cpp_refactoring \
             --verbose_failures //...:all
+
+      # We manually shut down the Bazel server to make sure the cached files
+      # don't interact with it.
+      - name: Shutdown Bazel
+        run: |
+          ${{ env.BAZEL }} shutdown
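The cache key and `restore-keys` prefix in the workflow above can be modelled in a few lines of shell. This makes the daily rotation easier to see. On a given day's first run for a workflow hash, the exact key misses. The prefix then falls back to an earlier day's entry for the same OS, build mode, and workflow file.

This is a hypothetical sketch with some simplifications:
- The helper names are made up.
- The workflow hash is passed in rather than computed the way `hashFiles()` computes it.
- The "newest match" rule assumes the candidate keys are listed oldest first. `actions/cache` itself returns the most recently created matching cache.

```shell
#!/usr/bin/env bash
# Hypothetical model of the cache keys used in tests.yaml. WORKFLOW_HASH is a
# stand-in for hashFiles('.github/workflows/tests.yaml'), not a real hash of it.
set -euo pipefail

# cache_key OS BUILD_MODE WORKFLOW_HASH DATE -> exact key for that day's entry.
cache_key() {
  printf 'bazel-%s-%s-%s-%s\n' "$1" "$2" "$3" "$4"
}

# restore_prefix OS BUILD_MODE WORKFLOW_HASH -> prefix used as the restore key.
restore_prefix() {
  printf 'bazel-%s-%s-%s-\n' "$1" "$2" "$3"
}

# pick_restore_key PREFIX KEY... -> last listed key that starts with PREFIX.
# Assumes keys are listed oldest first, approximating "most recent match".
# Returns 1 when nothing matches (a full cache miss).
pick_restore_key() {
  local prefix="$1" match=""
  shift
  local key
  for key in "$@"; do
    case "$key" in
      "$prefix"*) match="$key" ;;
    esac
  done
  if [ -z "$match" ]; then
    return 1
  fi
  printf '%s\n' "$match"
}
```

For example, `cache_key ubuntu-latest opt "$hash" "$(date -u "+%Y%m%d")"` builds the exact key using the same date format as the "Get date" step. Here `ubuntu-latest` and `$hash` are placeholders. Editing the workflow file changes the hash, so no earlier entry shares the prefix and the build starts from a clean state.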