Skip to content

Qi Meeting Apr 26 2024

Siddhartha Kasivajhula edited this page Nov 7, 2024 · 3 revisions

Our Take on Take

Qi Meeting Apr 26 2024

Adjacent meetings: Previous | Up | Next

Summary

We discussed the implementation of the take transformer in Qi's stream fusion optimization. We also reviewed our current test coverage.

Background

As part of Qi 4, we released an optimizing compiler including the stream fusion optimization for sequences of functional list operations. The initial release included fusing map, filter, foldl, foldr, range and car. Recently, Dominik has been extending fusion to all applicable interfaces in racket/list. take presented some challenges.

The Problem With Take

In our implementation of stream fusion, we divided interfaces into three classes:

  1. Producers
  2. Transformers
  3. Consumers

For instance, range is a producer because it is the source of the elements that are functionally processed by subsequent stages. map and filter are transformers because they translate one stream to another. foldl and foldr are consumers, because they compute some value from an input stream.

Now, consider take, which operates the following way:

(take (range 100) 5) ;=> '(0 1 2 3 4)

Which category does take belong in?

Typically, transformers like map and filter do not have an internal notion of state since they simply process one element at a time without needing any additional context. Yet, take would be an exception as it would need to keep track of state since it needs to know how many elements remain to be "taken." The operation is informed not just by the current element in the stream but also on what has happened before. So far, it is only consumers that keep track of state, by means of a loop.

So, perhaps take should be a consumer.

As a consumer, take would process an input stream and produce a list as a value.

It would work, but consider this case:

(car (map sqr (filter odd? (take (range 100000000000) 100000))))

Naively, this processes a huge list and ultimately returns a single value at the head of the resulting list after a series of transformations. By employing stream fusion, we would still need to fully construct the (large) intermediate output of the take consumer before fusing the subsequent filter, map and car stages.

We should be able to do better. take consumes a list and produces a list. Perhaps it should be a transformer after all, transforming one stream into another.

To understand how take should be implemented as a stream component that manages state, we considered its basic (ordinary) implementation. It would essentially be a simple loop with two accumulators -- one to keep track of the remaining input and one to keep track of the result. Something like:

(define (take n vs)
  (if (= 0 n)
      null
      (cons (car vs)
            (take (sub1 n) (cdr vs)))))

We considered three possible approaches.

Compilation to for forms

The intuition behind this approach is: can we transform an entire functional sequence into a single loop?

For functions that contain a loop, the Chez Scheme compiler will simply leave arguments in registers, instead of allocating stack space.

Indeed, Racket's for forms take advantage of this by ultimately compiling to for/foldr/derived which in turn compiles to a big driver loop (i.e. a named let) with many fragments that come from various specifications in various surface for forms. For instance, outer expressions may contribute bindings, loop expressions may contribute accumulators, and sequences like in-list and in-range compile to :do-in which likewise contributes a fragment to the for/foldr/derived.

We could emulate this, so that each stream component, such as map or take, contribute fragments to an underlying driver loop, either directly or perhaps via :do-in and for/foldr/derived.

CPS with set!

Another approach is to continue with the continuation-passing approach but, in order to account for the new sequence in take, mutate the state by using set!. The use of mutation seemed undesirable and we wondered whether it might block optimizations in the Chez Scheme compiler, but felt that it probably would be OK. This is the approach Dominik has already implemented. We also did some googling and found that Clojure uses this approach in its implementation of "transducers."

CPS with Tupling

Yet another approach would be to define a kind of type by annotating the state with some data (such as a position in the list), and then unwrapping it each time in subsequent stages (such as in a producer) before re-wrapping it if necessary.

This seemed like it might incur a performance cost, as it's likely that the Chez optimizer won't optimize away the pairs, and there might therefore be a performance hit from garbage collection. It's unclear if we could avoid the cost of cons cell construction here.

But we did learn that this kind of "tupling and untupling" is how Haskell does it in its implementation of fusion.

Evaluation

We agreed that it would be necessary to try out these different approaches and assess their performance before deciding on an implementation.

There may also be papers out there on implementing fusion using the continuation-passing style that could be useful references.

Towards 100% Coverage

We reviewed lines that are not covered by tests today. We expect some of them to become covered now that we are expanding the set of producers, transformers and consumers. But for one of them, the issue is the "arity fast tracking" in the Racket runtime, so that the function isn't even called during execution if the arity doesn't match. Thus, it is impossible for test execution to "cover" it.

Since it is the body of the function that isn't covered, we felt that changing:

(λ (v) v)

to

identity

… might address this case.

Next Steps

(Some of these are carried over from last time)

  • Decide on a release model, document it in the user docs, and announce the performance regression (and the remedy) to users.
  • Improve unit testing infrastructure for deforestation.
  • Discuss and work out Qi's theory of effects and merge the corresponding PR.
  • Schedule a discussion for the language composition proposal and implement a proof of concept.
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Deforest other racket/list APIs via qi/list
  • Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list))
  • Review and merge the fixes for premature termination of compiler passes.
  • Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid

Clone this wiki locally