Branch support #29

Open
happydpc opened this issue Jan 19, 2022 · 12 comments
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested

Comments

@happydpc

BonsaiDb is a wonderful DB. Could it implement internal branch support for a struct with the versioned tree? Thanks.

@ecton
Member

ecton commented Jan 19, 2022

Thank you for the kind words about BonsaiDb! I'm not sure what "internal branch support for a struct" would refer to. Could you help provide a little more context or a link showing the concept in another database/codebase?

@happydpc
Author

Like a Git operation: create a branch of a database from origin/master. Then we can manage a separate database for each branch.

@ecton
Member

ecton commented Jan 19, 2022

I honestly had never considered that possibility. I'm not sure if it's something that could be in the scope of this project, but if anyone has ideas on ways to extend this format to aid in supporting this concept, I would be happy to look into it more.

For BonsaiDb we plan on supporting the replication logic at the BonsaiDb level, and we are hoping to support bi-directional replication with customizable conflict resolution. This is a form of branching, because you can clone a database using replication, modify it, and then eventually replicate again in either direction. At this time, I don't believe I'm missing any features in Nebari that will be needed to support replication for BonsaiDb.

It might be, however, that when we get to actually writing the code, we discover that it would be better to add more features to Nebari. Only time will tell. In the meantime, if anyone is interested in using Nebari directly with some sort of on-disk branching support, please add a comment or add a 👍 reaction to the issue.

@ecton added the enhancement, help wanted, and question labels Jan 19, 2022
@pauleveritt

Funny enough, this is exactly what I would like. Of course, I am highly unlikely to get around to (a) learning enough Rust to (b) plug this into Python, so feel free to throw out my comment.

That said, I've thought a lot about it over the last two years and have a specific project for it.

@ecton
Member

ecton commented Feb 1, 2022

Thank you for the comment, @pauleveritt! I'm definitely going to be tackling this, but I don't quite know what it will look like yet. I actually have opened this issue up with the intention of writing on the subject multiple times, but each time led to more ideas/questions that made me hesitate in sharing my thoughts.

The long-story-short is that I've come around to the idea that "diffing" a Nebari tree should be able to be done in a generic fashion such that BonsaiDb could utilize the same mechanism even if the Nebari trees aren't on the same machine. For Nebari users, it should be as simple as executing a function call with two Tree instances.
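As a rough illustration of what "a function call with two Tree instances" could return, here is a minimal sketch of a key-level diff. It is not Nebari's API: `Snapshot`, `Change`, and `diff` are hypothetical names, and a plain `BTreeMap` stands in for a tree.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a tree: an ordered map of keys to values.
pub type Snapshot = BTreeMap<Vec<u8>, Vec<u8>>;

/// One difference between two snapshots, identified by key.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    /// The key exists only in `left`.
    Removed(Vec<u8>),
    /// The key exists in both snapshots, with different values.
    Modified(Vec<u8>),
    /// The key exists only in `right`.
    Added(Vec<u8>),
}

/// Lists the changes needed to turn `left` into `right`.
/// Removals and modifications come first (in key order), then additions.
pub fn diff(left: &Snapshot, right: &Snapshot) -> Vec<Change> {
    let mut changes = Vec::new();
    for (key, left_value) in left {
        match right.get(key) {
            None => changes.push(Change::Removed(key.clone())),
            Some(right_value) if right_value != left_value => {
                changes.push(Change::Modified(key.clone()))
            }
            Some(_) => {} // Same key, same value: nothing to report.
        }
    }
    for key in right.keys() {
        if !left.contains_key(key) {
            changes.push(Change::Added(key.clone()));
        }
    }
    changes
}
```

Because the result is plain data, a list like this could be serialized and sent to another machine, which is the property the comment above is after. A real implementation would presumably compare tree nodes rather than every key.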

One thing that current Nebari files lack is a unique ID. For safety reasons, I'd prefer Nebari to generate a unique ID on tree creation so that it could safeguard against attempting to merge trees that are unrelated.
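The safeguard described above could look roughly like the following sketch. Everything here is hypothetical (`TreeId`, `Tree`, `branch`, `merge_from`, `MergeError`); none of it exists in Nebari. The ID generation uses only the standard library for the sake of a self-contained example, whereas a real implementation would more likely use a UUID.

```rust
use std::collections::hash_map::RandomState;
use std::collections::BTreeMap;
use std::hash::{BuildHasher, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical identifier assigned once, when a tree is created.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TreeId(u128);

impl TreeId {
    /// Combines the current time with a randomly keyed hasher.
    /// An assumption for this sketch, not a vetted ID scheme.
    pub fn generate() -> Self {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|elapsed| elapsed.as_nanos() as u64)
            .unwrap_or(0);
        let mut hasher = RandomState::new().build_hasher();
        hasher.write_u64(nanos);
        TreeId((u128::from(nanos) << 64) | u128::from(hasher.finish()))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum MergeError {
    /// The two trees were not created from the same origin.
    UnrelatedTrees,
}

/// Hypothetical tree: an origin ID plus its key-value entries.
pub struct Tree {
    origin: TreeId,
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Tree {
    /// Creates a new, empty tree with a fresh origin ID.
    pub fn create() -> Self {
        Tree { origin: TreeId::generate(), entries: BTreeMap::new() }
    }

    /// Copies the tree, keeping the origin ID so the copy can be merged back.
    pub fn branch(&self) -> Self {
        Tree { origin: self.origin, entries: self.entries.clone() }
    }

    pub fn set(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }

    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.entries.get(key).map(|value| value.as_slice())
    }

    /// Refuses to merge unrelated trees. Otherwise the other tree's values
    /// overwrite ours, a placeholder for real conflict resolution.
    pub fn merge_from(&mut self, other: &Tree) -> Result<(), MergeError> {
        if self.origin != other.origin {
            return Err(MergeError::UnrelatedTrees);
        }
        for (key, value) in &other.entries {
            self.entries.insert(key.clone(), value.clone());
        }
        Ok(())
    }
}
```

The point of the sketch is only the check at the top of `merge_from`: a branch inherits the origin ID, so related trees can be merged while an independently created tree is rejected before any data is touched.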

@pauleveritt

Thanks for the quick reply. If it would help, I could write up what I hope to do. On one hand, it could prove to be so niche that it would be a distraction. On the other hand, if I succeeded (1% chance), it would be an important usage.

@ecton
Member

ecton commented Feb 1, 2022

As I've never implemented something like this before, I would love any ideas on approaches. I have a vague approach in mind, but I haven't dedicated too much deep thought on the subject yet. I'm not sure how quickly I am planning on getting to this particular feature -- it sort of depends on what (if any) feedback I get from releasing BonsaiDb's alpha this week. If you have the time, please do share your thoughts on what you'd do or like to see from this feature.

@pauleveritt

I'll give you a super-terse version, enough for you to tell me to leave. 😀 As background, I was part of this Python database.

My interest: a Python static site generator (or engine for Sphinx) that has the cache richness of GatsbyJS (but actually works) and the performance of Hugo, but usable from Python.

Python has the PyO3 package to work with Rust, but it is likely that some things have issues at the interface. In a perfect world, I could get a tree in a pool of Python processes: (a) MVCC-ish, (b) transactional, (c) immutable-oriented, (d) most hard operations on the Rust side, (e) very scalable, (f) change-oriented.

This would mean some shared memory kind of thing, which might be a deal breaker.

What does change-oriented mean? A data model that supported incremental rebuilding across process runs. I've been leaning towards adopting a Git-style data model. I will stop at that point, as perhaps I lost you at "Python shared memory." But there's a good bit more on the next step that is pretty interesting.

@ecton
Member

ecton commented Feb 1, 2022

Thank you for the follow up. The shared memory part is something I'm not sure I have any interest in exploring -- it's just so far from the use cases I personally have.

With Nebari today, if you can synchronize writes through some form of IPC, you can use TreeFile directly and only open the file for write when you've acquired the system-wide lock. Since this is an exclusive file operation, it can be done by looping until the file is opened successfully, but that obviously isn't as ideal as being able to park and wait for a signal that the file can be opened.
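The "loop until it opens" approach might be sketched like this. It assumes a separate lock file created with `create_new` as the system-wide lock, which is one portable way to get an exclusive open using only the standard library. It is not necessarily the mechanism meant above, and `acquire_lock` and `LockGuard` are hypothetical names.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Holds the lock; the lock file is removed when the guard is dropped.
pub struct LockGuard {
    path: PathBuf,
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Best effort: ignore errors while releasing the lock.
        let _ = fs::remove_file(&self.path);
    }
}

/// Polls until the lock file can be created exclusively or `timeout` elapses.
pub fn acquire_lock(path: &Path, timeout: Duration) -> io::Result<LockGuard> {
    let deadline = Instant::now() + timeout;
    loop {
        // `create_new` fails with `AlreadyExists` if another process holds the lock.
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(_file) => return Ok(LockGuard { path: path.to_path_buf() }),
            Err(err) if err.kind() == io::ErrorKind::AlreadyExists => {
                if Instant::now() >= deadline {
                    return Err(io::Error::new(
                        io::ErrorKind::TimedOut,
                        "lock is still held by another writer",
                    ));
                }
                // Busy-wait with a short sleep; this is the polling cost
                // that parking on a signal would avoid.
                thread::sleep(Duration::from_millis(10));
            }
            Err(err) => return Err(err),
        }
    }
}
```

A writer would hold the guard only while the tree file is open for writing. One known weakness of lock files is that a crashed process leaves the file behind, so a real setup would need some form of stale-lock recovery.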

Each time a new write happened, each process would need to re-initialize its State before it would see the new data. This scan is very efficient.

If that doesn't sound too hacky, the only remaining task would be to expose TreeFile to Python. If there are good IPC primitives that could help control this state management from within Python, this might not be too hacky at all.

@pauleveritt

pauleveritt commented Feb 1, 2022

That would be interesting to see if "very efficient" is efficient enough. :) And in fact, if the injector is as good as I have in mind, message passing would suffice.

So the next part is weirder: the tree-based data model I have in mind and whether I need it. Should I continue the conversation?

Edit: I should add, this next part could likely be done atop your thing. But there are aspects where it would be better (eventually) as a first-class citizen.

@ecton
Member

ecton commented Feb 1, 2022

> That would be interesting to see if "very efficient" is efficient enough. :)

It's quite efficient, but there are too many variables to give you a good idea on the speed. The operation is fairly simple: seek to the last multiple of the page size into the file, read a few bytes to verify it's actually a header. If it is, parse the header. If it isn't, move back by 1 page and check again.
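The backward scan described here can be sketched as follows. The page size and the magic bytes that mark a header are parameters in this sketch because the thread does not state Nebari's actual values, and `find_last_header` is a hypothetical name. Parsing the header once it is found is left out.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Scans backwards, one page at a time, for the last page that starts with
/// `magic`. Returns the byte offset of that page, or `None` if none matches.
pub fn find_last_header<F: Read + Seek>(
    file: &mut F,
    page_size: u64,
    magic: &[u8],
) -> io::Result<Option<u64>> {
    let len = file.seek(SeekFrom::End(0))?;
    if len == 0 || page_size == 0 || magic.is_empty() {
        return Ok(None);
    }
    // Start at the last multiple of the page size that lies inside the file.
    let mut offset = (len - 1) / page_size * page_size;
    let mut probe = vec![0u8; magic.len()];
    loop {
        file.seek(SeekFrom::Start(offset))?;
        match file.read_exact(&mut probe) {
            Ok(()) => {
                if probe.as_slice() == magic {
                    return Ok(Some(offset));
                }
            }
            // A final page too short to hold the marker cannot be a header.
            Err(err) if err.kind() == io::ErrorKind::UnexpectedEof => {}
            Err(err) => return Err(err),
        }
        if offset == 0 {
            return Ok(None);
        }
        // Not a header: move back by one page and check again.
        offset -= page_size;
    }
}
```

In the common case the newest header is at or near the end of the file, so the loop usually runs only once or twice, which matches the claim that the scan is cheap.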

> So the next part is weirder: the tree-based data model I have in mind and whether I need it. Should I continue the conversation?

Sure. If it's directly related to the idea of keeping track of a document's history and merging it with another, we can keep the conversation here. To me this request is about synchronizing multiple files together rather than using a single file. If the idea is to have multiple processes synchronizing with a single file and build a custom tree using that setup, then let's explore that idea in either a discussion or a new issue.

@pauleveritt

Thanks @ecton and thanks for the response. I will start writing down some notes and reply in a discussion post, as it isn't really an issue.


3 participants