Limit reads from the store #162

Open
YPares opened this issue Mar 10, 2020 · 1 comment

Comments

@YPares
Contributor

YPares commented Mar 10, 2020

To my knowledge, in the current state of funflow there are two kinds of hashes for step/stepIO: those obtained from ContentHashable and those obtained from the binary representation of a serialized result (and they have no reason to coincide).
In a cached step a :: (ContentHashable m x, Store y) => Flow ex eff x y, y is indexed in the store by the hash of its binary representation. That means that if the pure step a is used in this flow:

a >>> b

Then, to know whether to re-execute b, we have to read y from the store and let b hash it in order to decide whether b should run. Am I correct on that? This might be different for external tasks, as I understand it, but I think it is the case for step/stepIO.
If we were to require y to be ContentHashable too, we would use only one type of hash, and unless some step inserted between a and b modifies y or takes a subset of it, b would not need to read anything, since we would already know the hash of its input (the hashes could be passed along the flow transparently).
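
To make the two hash sources concrete, here is a self-contained toy sketch (the classes below are simplified stand-ins for ContentHashable and Store, not the real signatures):

import Data.ByteString (ByteString)

newtype ContentHash = ContentHash ByteString deriving (Eq, Show)

class ContentHashable a where
  contentHash :: a -> ContentHash   -- hash computed from the value itself

class Store a where
  encode :: a -> ByteString         -- binary serialization

-- Hash used to decide whether a cached step needs to run, computed from its input:
inputHash :: ContentHashable x => x -> ContentHash
inputHash = contentHash

-- Hash under which a step's output is indexed in the store, computed from its serialized form:
outputHash :: Store y => y -> ContentHash
outputHash = ContentHash . encode   -- a real version would digest these bytes

-- Nothing forces inputHash v == outputHash v for the same value v, hence the
-- extra read + re-hash before b can be skipped.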

We have a potential use case that would rely heavily on remote caching, whose feasibility I want to assess, and limiting reads would become important in such a remote context.

The same could be achieved by requiring the input x to be Store and always hashing the Store representation (on the logic that a step's input is often another step's output and needs to be serialized at some point anyway).
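
As a rough sketch of that alternative, reusing the toy Store class above (cacheKey is a hypothetical helper, not an existing funflow function):

-- Derive the cache key for any step input from its Store encoding, so the same
-- key works whether the value was just computed upstream or read back from the store:
cacheKey :: Store x => x -> ContentHash
cacheKey = ContentHash . encode   -- a real version would digest these bytes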

cc @nc6 @regnat

@nc6
Member

nc6 commented Apr 14, 2020

Yes, I see the issue here. I'm not sure that changing the hashing mechanism (by itself) would help that much, because we still actually try to return the value from the store for the computation. What we could do instead is return something like:

data Cached o = Cached {
  -- | Deliberately lazy (the ~ annotation needs the StrictData extension)
  cachedValue :: ~o,
  cachedHash :: ContentHash
}

The runner could then take advantage of this to "skip" steps if not needed. We would want to ensure that the hash:

  • Is deducible from the store path/metadata, and
  • Matches the hash given from serialisation

This should hopefully be the case, but we would need to check.
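
A very rough sketch of what the runner side could look like, assuming the Cached type above (lookupCache, readFromStore and runStep are hypothetical stand-ins for store access and step execution, not existing funflow code):

import System.IO.Unsafe (unsafeInterleaveIO)

runCachedStep
  :: (ContentHash -> IO (Maybe ContentHash))  -- input hash -> cached output hash, if any
  -> (ContentHash -> IO o)                    -- read an output back from the store
  -> (i -> IO (o, ContentHash))               -- actually run the step and hash its output
  -> Cached i
  -> IO (Cached o)
runCachedStep lookupCache readFromStore runStep (Cached input inHash) = do
  hit <- lookupCache inHash
  case hit of
    Just outHash -> do
      -- Cache hit: defer the store read; it only happens if something later
      -- forces cachedValue, so a chain of hits never touches the store.
      value <- unsafeInterleaveIO (readFromStore outHash)
      pure (Cached value outHash)
    Nothing -> do
      (value, outHash) <- runStep input
      pure (Cached value outHash)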
