Initial design to reduce metadata overhead #169
Conversation
I like this plan. I have read the doc briefly and I'll read it again more carefully shortly, but I have a question for you:
@benbierens I think that 2 elements per node is optimal when it comes to the size of the proofs. Think about the extreme case where you have N leaves and N elements per node: the size of the proof for each of the N leaves will be O(N).

@dryajov I like the idea. I'm not convinced yet that we need to store the entire tree. I think we could (should) store only a Merkle proof next to each block, and that should suffice for all the use cases in my opinion. It would take only about 2 times more disk space, but I think it would reduce the complexity of the flows and possibly improve the performance of disk operations (sequential reads only).
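For illustration only, here is a minimal sketch (hypothetical, not part of the proposal) of how proof size grows with tree arity: a k-ary Merkle tree over N leaves needs k - 1 sibling hashes per level across roughly log_k(N) levels, which is smallest for k = 2 and degenerates to O(N) when the arity equals the number of leaves.

```python
def proof_hashes(num_leaves: int, arity: int) -> int:
    """Number of sibling hashes in a Merkle proof for a tree of the given arity."""
    levels = 0
    nodes = num_leaves
    while nodes > 1:
        nodes = -(-nodes // arity)   # ceil division: node count on the next level up
        levels += 1
    return levels * (arity - 1)

# For a dataset of 2**20 blocks, proofs grow with the arity; with N elements
# per node (arity == number of leaves) the proof is O(N).
for arity in (2, 4, 16, 2**20):
    print(f"arity {arity}: {proof_hashes(2**20, arity)} hashes per proof")
```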
design/metadata-overhead.md
Outdated
1. Once new peers have been discovered and connected, go to step 1.1.1
2. Once blocks are received from the remote nodes
   1. The hashes are verified against the requested Merkle root and if they pass
      1. The block is persisted to the network
Probably you meant persisted to the repo/local store
I like it! Makes a lot of sense to use a Merkle root instead of the block hashes in the manifest.
design/metadata-overhead.md
Outdated

#### Announcing over the DHT
Also, datasets are now announced by their Merkle root instead of each individual block, as was the case in the previous implementation. Announcing individual blocks is still supported (but not required): manifests are announced exactly as before, by their cid, and per-block announcements can be useful in the case of bandwidth incentives.
Would a Codex node that stores only the data for a single slot still announce that data under the Merkle root of the entire dataset?
Good question - yes, so far I don't see a better way of doing it. There isn't a good way of announcing block ranges over a DHT, and even if there were, blocks in slots aren't contiguous. Definitely not ideal, and something to investigate further.
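As a toy illustration (hypothetical names, not the Codex networking API), announcing under the dataset root means a single provider record covers all the data a node holds, even if it only stores one slot:

```python
from collections import defaultdict

class FakeDht:
    """Toy provider table standing in for a real DHT: cid -> set of peer ids."""
    def __init__(self) -> None:
        self.providers: dict[str, set[str]] = defaultdict(set)

    def announce(self, cid: str, peer: str) -> None:
        self.providers[cid].add(peer)

    def find_providers(self, cid: str) -> set[str]:
        return self.providers[cid]

dht = FakeDht()
dataset_root_cid = "dataset-root-cid"   # hypothetical Merkle root of the whole dataset
manifest_cid = "manifest-cid"           # manifests are still announced by their own cid

# A node that stores only one slot still announces under the dataset root,
# because block ranges can't be expressed as DHT keys.
dht.announce(dataset_root_cid, peer="slot-node-1")
dht.announce(manifest_cid, peer="slot-node-1")

print(dht.find_providers(dataset_root_cid))   # {'slot-node-1'}
```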
Co-authored-by: markspanbroek <[email protected]>
Exactly, the size of the proofs increases with the arity of the tree.
The problem with this approach is that, in the new flow, you only know the root of the tree and the index of the leaf; you don't know the actual leaf (block hash) beforehand. The Merkle tree serves as a kind of index: you read the tree to fetch the leaf (block hash), and only then can you read the block from the repo.
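To make the tree-as-index flow concrete, here is a minimal hypothetical sketch (not Codex code): only the root and a leaf index are known up front; the stored tree yields the block hash, which is then used to read the block from the repo.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Binary Merkle tree stored level by level, leaves first (odd nodes are duplicated)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [h(cur[i] + cur[min(i + 1, len(cur) - 1)]) for i in range(0, len(cur), 2)]
        levels.append(nxt)
    return levels

def fetch_by_index(tree: list[list[bytes]], repo: dict[bytes, bytes], index: int) -> bytes:
    """The tree acts as an index: leaf index -> block hash -> block bytes from the repo."""
    block_hash = tree[0][index]
    return repo[block_hash]

blocks = [b"block-%d" % i for i in range(8)]   # hypothetical dataset blocks
repo = {h(b): b for b in blocks}               # repo keyed by block hash
tree = build_tree([h(b) for b in blocks])      # stored alongside the dataset
root = tree[-1][0]                             # the only identifier the manifest needs

print(fetch_by_index(tree, repo, 3))           # b'block-3'
```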
This document proposes a change to reduce metadata overhead in manifest files and to consolidate slot block handling with network block handling.