Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time complexity of operations #162

Open
Lars-Meijer opened this issue Mar 17, 2017 · 5 comments
Open

Time complexity of operations #162

Lars-Meijer opened this issue Mar 17, 2017 · 5 comments

Comments

@Lars-Meijer
Copy link

@tidwall

I think you have a good grasp of the (avarage/estimated) time complexity of operations. I think it would be a good idea to document these in the commands section. This would be especially useful for: SET, GET and the search commands. It might also be useful for FSET, JSET, KEYS ect.

@tidwall
Copy link
Owner

tidwall commented Mar 17, 2017

Thanks for the suggestion and I totally agree.
The complexity varies slightly depending on which options are used for each command, but for the most part it should be straight forward to document.
Each command at tile38.com/commands should have their Big O listed.

@Lars-Meijer
Copy link
Author

Lars-Meijer commented Mar 31, 2017

You've mentioned earlier that SET has a time complexity of log(mn), I am assuming one is the number of points in the set, what is the other?

If I have 1 set with 100 points I can easily update them 20k/s, but when I have 10k sets with 100 points each (1 million points in total), latency goes up a lot, is the total number of points in the DB also a factor?

I have a lot of points that I keep track of, with lots of updates but almost no get/search queries, so any boost in SET performance would greatly help :)

Edit:
This example is a little extreme, when I use 500,000 points and split it into 10 collections of 50,000 points performance seems to be only marginally better than 1 collection of 500,000 points

@tidwall
Copy link
Owner

tidwall commented Mar 31, 2017

The log(mN) is a log(m)+log(n) representing two operations.

Setting a point requires two operations:

  1. Adding the point to the btree which is ordered lexographically by the point ID.
  2. Adding the point to the rtree.

The btree insert complexity is O(log n).
The rtree insert complexity is often on average O(log n). Though it can get more complex depending on factors such as rectangle overlap, and the insertion algorithm. Just to note, the current implementation that Tile38 is using is the standard r-tree with quadratic splits. We may later change this to an r*tree split.

The more points you add the slower it gets but eventually it should level out.

This example is a little extreme, when I use 500,000 points and split it into 10 collections of 50,000 points performance seems to be only marginally better than 1 collection of 500,000 points

You will probably not see much in the way of a performance benefit from splitting up the data set over multiple collections because the collection must be located prior to setting the point. Collections are mainly good for organizing your data like namespaces.

@Lars-Meijer
Copy link
Author

You will probably not see much in the way of a performance benefit from splitting up the data set over multiple collections because the collection must be located prior to setting the point. Collections are mainly good for organizing your data like namespaces.

I see, I thought each collection would keep its own btree and rtree, thus having faster performance for smaller sets. From my testing it does seem to level out when more points are added.

@tidwall
Copy link
Owner

tidwall commented Mar 31, 2017

Each collection does have it's own btree and rtree, but the collections themselves are stored in a parent btree. So each SET must traverse the parent btree first. As far as overall complexity goes, the collection btree is just an extension of the parent btree. Thus not adding any real benefit.

If the collections were stored in a different structure such as a hashmap, then we would see a bonus... at the expense of not having ordered keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants