
Storage space reservations for the sales process #161

Merged 3 commits from design-sales into master on May 11, 2023

Conversation

markspanbroek (Member):

Documents in more detail how storage space reservations are handled by the sales module, and how they are stored on disk and on chain.
design/sales.md Outdated
Comment on lines 57 to 62
First, it will select a slot from the request to fill. Then, it will reserve
space in the Repo to ensure that it keeps enough space available to store the
content. It will mark the reserved space as belonging to the slot. Next, it will
download the content, calculate a storage proof, and submit the proof on chain.
If any of these later steps fail, then the node should release the storage that
it reserved earlier.
Contributor:

I realize that this doesn't fundamentally change what is being described here, but it is probably better to reserve a chunk of storage from the RepoStore and then slice it across several availabilities with different pricing strategies and durations. You almost never want to reserve right before filling a slot, since it will most likely make the process more complicated to handle at the code level, due to potential synchronization issues and other bookkeeping woes.

markspanbroek (Member Author):

Do you mean that we should reserve space once, and then split that reserved space to fill slots? Wouldn't that lead to double bookkeeping, whereby we have to keep track of how much of the reserved space we've used?

Contributor:

> Do you mean that we should reserve space once, and then split that reserved space to fill slots?

Yes

> Wouldn't that lead to double bookkeeping, whereby we have to keep track of how much of the reserved space we've used?

Not really: the RepoStore keeps track of that. You can reserve a large chunk and then release it in smaller chunks. The RepoStore will tell you how much you have reserved, won't allow you to reserve past the allotted quota, and won't allow you to release past the total amount of storage. So all the bookkeeping is already in place, and all that is expected is that the sales module keeps track of the availabilities over the sliced total reserved storage.
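For illustration, a minimal Nim sketch of this bookkeeping. The field names follow the repo fields quoted later in this thread, but the proc signatures here are assumptions, not the actual nim-codex RepoStore API:

```nim
type RepoStore = object
  quotaMaxBytes: uint       # maximum available bytes
  quotaUsedBytes: uint      # bytes physically used on disk
  quotaReservedBytes: uint  # bytes reserved but not physically used

proc reserve(self: var RepoStore; bytes: uint): bool =
  ## Refuse to reserve past the allotted quota.
  if self.quotaUsedBytes + self.quotaReservedBytes + bytes > self.quotaMaxBytes:
    return false
  self.quotaReservedBytes += bytes
  return true

proc release(self: var RepoStore; bytes: uint): bool =
  ## Refuse to release more than is currently reserved.
  if bytes > self.quotaReservedBytes:
    return false
  self.quotaReservedBytes -= bytes
  return true
```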

emizzle (Contributor), Mar 21, 2023:

> you can reserve a large chunk and then release in smaller chunks

In codex-storage/nim-codex#340, we've been discussing a "new" model of Availabilities, such that a host will define size, duration, and min price as CLI params at startup. At startup, we could reserve size bytes in the repo. Then, once a request comes in, is matched to this availability, and finally goes through its lifecycle until the end, what is meant to happen to the space that was occupied by that request? Are you saying to release it ("release in smaller chunks")? If so, it would then be available for other node functions (e.g. caching) and the host would have to re-reserve it.

> The RepoStore will tell you how much you have reserved, won't allow you to reserve past the allotted quota

Assuming a case where the size of storage for sale is less than half of the total storage in the repo, how do we prevent duplication of reserved space on node restart?

Contributor:

I think it makes most sense, as described in the PR above, for the CLI params to dictate a "maximum desired size for sale", which the reservations module will keep track of. Once a request comes in, only match that request (and reserve the space) if the size required for the slot storage, plus the total space already reserved for the sales module, would not exceed the "maximum desired size for sale". Once the request has completed its lifecycle, that space would be released.
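As a sketch, that matching check could look like the following; canReserveForSlot and all parameter names are hypothetical, not part of the sales module as written:

```nim
proc canReserveForSlot(slotSize, reservedForSales, maxDesiredForSale: uint): bool =
  ## Only match (and reserve) if the slot fits within the configured
  ## "maximum desired size for sale".
  reservedForSales + slotSize <= maxDesiredForSale
```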

dryajov (Contributor), Mar 21, 2023:

The repo has a concept of reserved and used; these were specifically added so that the sales module can "reserve" space that it wants to then offer for rent. Any amount can be reserved and released back to the repo at will.

Let's define what reserving vs renting means:

  • Reserving: the repo allows reserving chunks of storage. A chunk can be any size, but you probably want to reserve a large chunk and later release it in smaller chunks (most likely block sized).
  • Renting: availabilities allow further logically chunking the reserved space into configurable pieces that can be rented out under different conditions, for example different duration, size, and price parameters.

The flow should look something like:

  • Reserve some (large) chunk of space from the repo (this won't be usable by anything else)
  • Slice the reserved chunk into several (or one) availabilities
  • When requests are matched against an availability, the host downloads data from the remote and writes it to disk
    • The host releases storage back to the repo in chunk/block sizes, right before writing data to disk. Releasing in small chunks avoids all sorts of concurrency issues that might arise (if this isn't enough, we can add atomic operations at the repo level, but this isn't something to worry about right now)

The above allows for a very simple and flexible mode of operation: there is no need to do much at the repo level, just "reserve enough storage and release it on demand", and all the sales module needs to worry about is releasing back to the repo as much as it needs to write, right before doing so. A sketch of this flow follows below.
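A minimal sketch of that flow, reusing the RepoStore sketch from earlier in the thread; fillSlot and writeBlock are illustrative stand-ins for the actual sales and repo code:

```nim
proc writeBlock(data: seq[byte]) =
  discard  # placeholder: a real write makes the repo count data.len as used

proc fillSlot(repo: var RepoStore; blocks: seq[seq[byte]]) =
  for blk in blocks:
    # Release a block-sized piece of the reservation right before writing,
    # so reserved + used stays within quota and large releases (and their
    # concurrency issues) are avoided.
    doAssert repo.release(blk.len.uint)
    writeBlock(blk)
```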

> In codex-storage/nim-codex#340, we've been discussing a "new" model of Availabilities, such that a host will define size, duration, and min price as CLI params at startup.

There is no reason to limit the system to the CLI; these should be dynamic parameters that can be tweaked at runtime.

There are two separate configurable parameters:

  • Total reserved space and
  • The availabilities that map to that reserved space

Limiting these two options to the CLI limits the ability to adjust these parameters without restarting the node. For example, a user might want to adjust the total number of reserved bytes (this should obviously not affect existing availabilities), or a user might want to change, remove, or add new availabilities. Limiting this to CLI parameters doesn't really make much sense.

Contributor:

OK, I think I have a better understanding of this, but please correct me if I'm wrong: storage bytes that have been written to disk need not be reserved, as they already count against the RepoStore's quota. That is why we can release chunks just before they are written to disk.

I had assumed incorrectly that bytes written to disk could still be reserved. 😅

> Renting: availabilities allow further logically chunking the reserved space into configurable pieces that can be rented out under different conditions, for example different duration, size, and price parameters.

OK, you have a good point, and now I understand why availabilities were always meant to be more than a single size, duration, and min price.

As described in codex-storage/nim-codex#340 (comment), we should probably include the following in the spec as it describes quite well how availabilities would be treated after matching:

> I would think of availabilities strictly as pre-contract "offers for sale"; in other words, this is local information that can be altered at any point by the storage node without affecting any global state, and they get discarded and/or adjusted once a slot has been occupied. Discarded if an availability gets matched to a request 1:1, altered if a request uses part of an availability, in which case the available storage in the availability gets adjusted to reflect the matched request.

This brings up another point: a host should be able to remove (or modify) availabilities that it no longer needs. An example would be if an availability offers 1000MiB for sale and a request uses 999MiB; the availability would be altered to be 1MiB in size, but with the same duration and price. The host may not think the duration and price are applicable, and thus may want to delete (or modify) that availability.
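A hedged sketch of this adjust-or-discard behaviour; the Availability type and consume proc are illustrative, not the sales module's actual types:

```nim
type Availability = object
  size: uint64      # bytes still offered for sale
  duration: uint64  # seconds the space is offered for
  minPrice: uint64  # minimum acceptable price

proc consume(avail: var Availability; requested: uint64): bool =
  ## Shrink the availability when a request uses part of it; a resulting
  ## size of 0 corresponds to a 1:1 match, which gets discarded.
  if requested > avail.size:
    return false
  avail.size -= requested
  return true
```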

> There is no reason to limit the system to the CLI; these should be dynamic parameters that can be tweaked at runtime.

I was always thinking of the CLI params as a starting point to reserve space in the repo, with the ability to modify at runtime if needed. You mentioned "the repo allows reserving chunks of storage. This can be any size, but you probably want to reserve a large chunk". How large a chunk should be reserved if there is no CLI param specifying this?

Contributor:

> OK, I think I have a better understanding of this, but please correct me if I'm wrong: storage bytes that have been written to disk need not be reserved, as they already count against the RepoStore's quota. That is why we can release chunks just before they are written to disk.

Yep, that's correct. More info here: #161 (review)

> As described in codex-storage/nim-codex#340 (comment), we should probably include the following in the spec as it describes quite well how availabilities would be treated after matching:

Yep, agree, it's probably good to include for clarity.

> I was always thinking of the CLI params as a starting point to reserve space in the repo, with the ability to modify at runtime if needed. You mentioned "the repo allows reserving chunks of storage. This can be any size, but you probably want to reserve a large chunk". How large a chunk should be reserved if there is no CLI param specifying this?

We already have a param for the total storage quota (storage-quota), but we should probably add one for reservations as well. There should be some sound defaults, like 10% of the user's drive gets allotted to Codex, but the reserved quota should only be enabled if the sales/persistence options are set. I would still keep this simple: the majority of the admin should be exposed over REST APIs, and the CLI should just enable some basic startup settings and defaults.

Contributor:

> This brings up another point: a host should be able to remove (or modify) availabilities that it no longer needs. An example would be if an availability offers 1000MiB for sale and a request uses 999MiB; the availability would be altered to be 1MiB in size, but with the same duration and price. The host may not think the duration and price are applicable, and thus may want to delete (or modify) that availability.

Yep, good point.

design/sales.md Outdated
Comment on lines 67 to 71
Releasing storage space
-----------------------

When a storage request ends, or an attempt to fill a slot fails, it is important
to release the reserved space that belonged to the slot. To ensure that the
emizzle (Contributor), Mar 21, 2023:

As I understand it, if space is released, then the node could use that space for other functions. If that happens, then some of the space that the host intended for sale could not be re-reserved. This would affect how close we could get to the "maximum desired size for sale".

Contributor:

"Release" is probably the wrong word here. As I described in https://github.com/status-im/codex-research/pull/161/files#r1142846958, we want to purge the blocks that were already written and re-reserve the space before announcing the availability again.

design/sales.md Outdated
Comment on lines 90 to 92
the state is kept on local disk by the Repo and the Datastore. How much space is
reserved by the sales module, and how much remains available to sell is
persisted on disk by the Repo. A mapping between amounts of reserved space and
Contributor:

> How much space is reserved by the sales module, and how much remains available to sell is persisted on disk by the Repo

How can we distinguish which bytes are reserved for the sales module and which bytes were reserved for other parts of the node? Perhaps we should store some of this information in the metadata datastore?

Contributor:

Actually, I think this was already covered above:

> To ensure that the right amount of space is released, and that it is only released once, we keep track of how much space is reserved in the Repo for a slot.

Maybe we should specify it as being tracked as metadata?

Contributor:

> How can we distinguish which bytes are reserved for the sales module and which bytes were reserved for other parts of the node? Perhaps we should store some of this information in the metadata datastore?

Is this needed? Not really. The repo only cares about how much is used vs reserved vs available; whether it was set by module A or module B really doesn't matter from the repo's perspective, nor should it from the perspective of the system as a whole.

> track of how much space is reserved in the Repo for a slot.

We only need to keep track of this while in the process of filling the slot (i.e. downloading data), and we only care about which blocks to delete if something goes wrong. Other than that, the repo already keeps track of usage on every write: once a block is written to disk, the repo counts this as used; if it's deleted, it counts that as free. Also, if a slot was filled correctly, the repo will make sure to clean up the blocks automatically after the slot expires.

Contributor:

> the repo will make sure to clean up the blocks automatically after the slot expires.

How will the repo know to clean up expired slots?

Contributor:

> > the repo will make sure to clean up the blocks automatically after the slot expires.
>
> How will the repo know to clean up expired slots?

Each block has an expiration TTL; once a block expires, it's picked up by the manager module and cleaned up.
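For illustration, a minimal Nim sketch of TTL-driven cleanup; BlockMeta and sweepExpired are assumed names, not the actual nim-codex maintenance code:

```nim
import std/times

type BlockMeta = object
  cid: string   # content identifier of the stored block
  expiry: Time  # time-to-live deadline

proc sweepExpired(blocks: seq[BlockMeta]; now: Time): seq[string] =
  ## Collect the CIDs of expired blocks; deleting them makes the repo
  ## count their bytes as free again.
  for meta in blocks:
    if meta.expiry <= now:
      result.add meta.cid
```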

Contributor:

> We don't foresee any other module reserving storage; only the sales module ever needs to reserve storage, and the total reserved storage should match the total storage of all the availabilities.

If, possibly in the future, a user wanted to set aside X bytes of data to be a caching node (and take advantage of bandwidth incentives), would they use this reservation API? Or is something else being considered?

Contributor:

After the large chunk is initially reserved, let's say that all that storage is eventually released by completed contracts. The next time an availability comes in, what should happen? Should the size specified in the availability be reserved? Or should the large chunk (% of user's disk) again be reserved?

Contributor:

> If, possibly in the future, a user wanted to set aside X bytes of data to be a caching node (and take advantage of bandwidth incentives), would they use this reservation API? Or is something else being considered?

The notion of reserving space is specific to the renting model (and module), where you want to set some space aside without actually using it. I don't see how a caching node would use this right now, but if the need arises, the reservation API could certainly be used for that as well. I would not, however, try to go after this hypothetical scenario right now; even if we end up requiring it, it will be sometime in the future, and right now bandwidth is a future feature.

emizzle (Contributor), Mar 23, 2023:

@dryajov ^^ (didn't want it to get buried)

Contributor:

Would we want to consider the "large chunk" CLI param to be "the maximum amount of storage to rent out"?

emizzle (Contributor) left a comment:

Overall, we should probably include what happens when:

  1. the node restarts
  2. the node restarts after a catastrophic failure

design/sales.md Outdated
will go through several steps to try and fill a slot in the request.

First, it will select a slot from the request to fill. Then, it will reserve
space in the Repo to ensure that it keeps enough space available to store the
Contributor:

As mentioned elsewhere, we should reserve enough space a priori, not once the availability has been matched; this is so that the node doesn't use this space for other purposes. Reserving a large chunk that is further sliced by the availabilities is most likely better than doing so per availability, but doing it per availability is also possible.

design/sales.md Outdated
right amount of space is released, and that it is only released once, we keep
track of how much space is reserved in the Repo for a slot.

Releasing storage space goes in three steps. First, we look up the amount of
Contributor:

We should probably use the correct terminology to avoid confusion.

The repo has 3 types of storage:

  • Available - unused and unreserved space; the repo is allowed to either physically use these bytes by writing blocks to disk or mark them as reserved
  • Reserved - bytes that are not available, but aren't physically occupied on disk. This is usually used to mark space as "reserved" for rent/sale
  • Used - physically used bytes on the drive

The repo exposes these parameters:

    quotaMaxBytes*: uint          # maximum available bytes
    quotaUsedBytes*: uint         # bytes used by the repo
    quotaReservedBytes*: uint     # bytes reserved by the repo

Furthermore, the totalUsed prop exposes the total number of reserved+used bytes:

func totalUsed*(self: RepoStore): uint =
  (self.quotaUsedBytes + self.quotaReservedBytes)

When writing a block to disk, the check totalUsed + data.len <= quotaMaxBytes is used to determine whether there is enough room available.

When talking about the repo, then, we should say reserved/released when space is set as reserved but not physically used, and used/freed when space is physically used or freed.
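A one-line sketch of that check, with the totalUsed prop inlined from the fields quoted above; hasRoomFor is an assumed name, not an actual RepoStore proc:

```nim
func hasRoomFor(self: RepoStore; data: seq[byte]): bool =
  ## totalUsed + data.len <= quotaMaxBytes, with totalUsed inlined.
  self.quotaUsedBytes + self.quotaReservedBytes + data.len.uint <= self.quotaMaxBytes
```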

AuHau (Member) commented Mar 22, 2023:

I am still wrapping my head around what is being discussed here, but here are a few remarks from my side.

I now better understand the need for REST APIs to manage availabilities. Mainly, I understand the use case where the node operator wants to terminate the node gracefully: they need to be able to withdraw all availabilities (so no new slots are accepted) and then wait until the currently filled slots expire.

That said, my original point in proposing the CLI flag was how I imagine a potentially big part of host nodes might operate. For sure, there will be professional operators who optimize things, but (hopefully) a big part of the operators will also be enthusiasts who set up a Raspberry Pi or similar, want to configure it once, and then forget about it. For these operators it will be, IMHO, crucial that they do not have to regularly check and optimize the availabilities, but can simply say "I have a 500GB disk that I want to rent out for this price" and be done with it.

(Side note: it might be interesting to support "dynamic pricing", where the price could, for example, follow the Codex token's fluctuation against a stablecoin or fiat.)

emizzle (Contributor) commented Mar 23, 2023:

> That said, my original point in proposing the CLI flag was how I imagine a potentially big part of host nodes might operate. For sure, there will be professional operators who optimize things, but (hopefully) a big part of the operators will also be enthusiasts who set up a Raspberry Pi or similar, want to configure it once, and then forget about it. For these operators it will be, IMHO, crucial that they do not have to regularly check and optimize the availabilities, but can simply say "I have a 500GB disk that I want to rent out for this price" and be done with it.

This is a really good point, Adam. I do agree some node operators would like a "set it and forget it" setup. Perhaps we could have initial availabilities created via a config file (specified on the command line). On startup, the node would parse the config and add the availabilities, reserving the needed space. These availabilities could be "auto renewed", such that each time a contract completes, availabilities in the config that no longer exist on the node could be re-added.
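A hedged sketch of that startup flow, reusing the RepoStore sketch from earlier in the thread; the AvailabilityConfig type and loadAvailabilities proc are hypothetical, not part of nim-codex:

```nim
type AvailabilityConfig = object
  size: uint      # bytes offered for sale
  duration: uint  # seconds
  minPrice: uint

proc loadAvailabilities(repo: var RepoStore; configs: seq[AvailabilityConfig]) =
  ## On startup (and on contract completion), re-add configured
  ## availabilities that no longer exist, reserving the space they need.
  for cfg in configs:
    if repo.reserve(cfg.size):
      echo "added availability of ", cfg.size, " bytes"
```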

Incorporates review comments from dryajov, emizzle and AuHau.
markspanbroek (Member Author):
I just updated the design document, and I think I captured most comments and suggestions. Could you take another look, @dryajov and @emizzle?

emizzle (Contributor) left a comment:

Looks really great, Mark! Do you think we should include in the specs what happens when the node experiences catastrophic failure and the repo/datastore are gone?

design/sales.md Outdated
Comment on lines 69 to 71
sold twice. If an availability matches, but is larger than the requested
storage, then the Sales module may decide to split the availability into a part
that we can use for the request, and a remainder that can be sold separately.
Contributor:

Suggested change:
- sold twice. If an availability matches, but is larger than the requested
- storage, then the Sales module may decide to split the availability into a part
- that we can use for the request, and a remainder that can be sold separately.
+ sold twice.

We may not really need this clarification. In the reservations module, we are simply releasing bytes as they are stored, and updating the used availability's size to reflect the remaining bytes. Once the bytes have been completely downloaded, the availability will have the correct, reduced size.

markspanbroek (Member Author):

I'd consider that to be an implementation detail. The design allows for an availability to be split, and we're free to implement that as we see fit.

repo and the storage space can be made available for sale again. The same should
happen when something went wrong in the process of selling storage.

The time-to-live value should be removed from the content in the Repo, reserved
Contributor:

Should the TTL initially be set to the request expiry, then updated once the request has started? After that, does the TTL need to be removed?

markspanbroek (Member Author):

The request can fail, and then we need to remove the TTL.

Co-authored-by: Eric Mastro <[email protected]>
markspanbroek (Member Author):

> Do you think we should include in the specs what happens when the node experiences catastrophic failure and the repo/datastore are gone?

Let's do that in a separate PR; I want these changes merged first.

markspanbroek merged commit 33cd86a into master on May 11, 2023
markspanbroek deleted the design-sales branch on May 11, 2023 14:05