default limits of 101 objects? #7
Comments
limit is not actually a limit; it is the number of objects that MongoDB will send in the first chunk. |
Is there a way using mongodb to know that there are more chunks coming in and to go get them? |
about asyncmongo: |
Is there any way to override the default? I tried what I thought might work, like passing in limit=0, to no avail. |
Setting limit=0 requires getting the answer in several chunks. |
So effectively no for limit=0. Thanks! What's the override otherwise? |
Is there any reliable way to iterate over the whole set of elements matching a query, without risking losing one or getting duplicates? |
If no limit is set, the mongo server by default will return 101 documents or 1 MB of data, whichever is less, in the first batch. As @energycsdx points out, asyncmongo doesn't support retrieving subsequent batches of data. A hacky way to ensure that the first batch contains all the data is to do find(callback=mycallback, limit=1000000). However, MongoDB enforces a hard limit of 4 MB per batch of data, and there is no override.
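A minimal sketch of that workaround (the client setup, collection name, and mycallback are placeholders, and the per-batch size cap above still applies):

import asyncmongo
import tornado.ioloop

db = asyncmongo.Client(pool_id='demo', host='127.0.0.1', port=27017, dbname='test')

def mycallback(response, error):
    # `response` is the single oversized first batch (or empty on error).
    if error:
        raise error
    print 'got %d documents in one batch' % len(response)
    tornado.ioloop.IOLoop.instance().stop()

# Ask for a first batch far larger than the collection, so everything
# arrives in the first chunk.
db.mycollection.find({}, limit=1000000, callback=mycallback)
tornado.ioloop.IOLoop.instance().start() |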
Right now, I am iterating over the collection using skip and a sort order on _id. However, I am not entirely sure of the robustness of this method (it works quite well when the data doesn't change in between).
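Roughly, the skip-based approach looks like this (a sketch in synchronous pymongo for brevity; process is a placeholder handler), which shows why concurrent writes make it fragile:

import pymongo

db = pymongo.connection.Connection(host='127.0.0.1', port=27017)['test']

def process(doc):
    # placeholder per-document handler
    pass

batch_size = 100
skip = 0
while True:
    # Each query re-counts from the start of the sorted result set, so
    # inserts or deletes between batches shift what `skip` lands on,
    # which can miss or duplicate documents.
    batch = list(db.mycollection.find(skip=skip, limit=batch_size).sort('_id', 1))
    if not batch:
        break
    for doc in batch:
        process(doc)
    skip += len(batch) |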
This is the code I use to iterate over my collection. It should work pretty well, even if the DB is being modified at the same time (in theory, there would be no duplicate elements, nor should it skip over elements if there were deletions). You use it as such:

next = mongo_find_all(db.mycollection, callback=mycallback)

def mycallback(response, error):
    # do_stuff
    next()  # Call next when done to get the next batch.

next()  # Get the first batch.

Please comment or spot bugs:

def mongo_find_all(collection, spec=None, fields=None, **kwargs):
    """ Iterate over a whole mongodb query, batch by batch.

    When querying mongodb with `find()`, we do not systematically get
    all the elements that matched.
    This helper iterates over **all** the documents that match the
    description, making several calls to `find()` while doing so.
    The trick is that it orders the query on the `_id` element of the
    objects, making each subsequent query start from the last `_id`
    encountered.
    It will also inspect the requested sort order to maintain the
    sorting in the resulting query.
    This can only work on collections with the `_id` attribute.

    :param collection: The database.collection object.
    :param spec: The specification of the query.
    :param fields: The fields to return, passed through to `find()`.
    :param kwargs: all the keyword arguments that will be given
        to the find() method.
    :returns: a function; call it to fetch the next batch.
    """
    found = False
    sort = kwargs.get("sort", [])
    for field, order in sort:
        if field == "_id":
            found = True
    spec = dict(spec or {})  # Take a copy, since we're going to alter it.
    callback = kwargs.get("callback", None)
    if not callable(callback):
        raise Exception("Please provide a callback")
    # Make sure we have a sort on the _id.
    if not found:
        sort = sort + [('_id', 1)]
    kwargs["sort"] = sort

    def _next():
        collection.find(spec, fields, **kwargs)

    def _callback(response, error=None):
        if not response:
            callback([], error)
            return
        last = response[-1]
        # Constrain the next query to documents strictly after the last
        # one seen. Note: with a multi-field sort this per-field bound is
        # approximate; it is exact for the default _id-only sort.
        for field, order in sort:
            spec[field] = {"$gt" if order == 1 else "$lt": last[field]}
        callback(response, error)

    kwargs["callback"] = _callback
    return _next
|
@ceymard that looks like it'll work; there will be performance penalties compared to iterating a synchronous pymongo cursor. You might also consider a development branch in my fork: https://github.com/ajdavis/asyncmongo/commit/5d210aa25805f69843f1a8a5d3e74c0787d0fe81 See the unittest at the bottom for usage. Essentially, find() will now return a cursor, and you can do:

if cursor.alive:
    cursor.getmore(callback)

It's very far from production-ready, but could be a basis for implementing getmore() and tailable cursors in asyncmongo.
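As a rough end-to-end sketch of that branch's usage (the cursor-returning find(), cursor.alive, and getmore() are only as described above and in the linked unittest, so the exact signatures here are assumptions):

results = []

def on_batch(response, error):
    # Collect this batch, then keep pulling while the cursor is alive.
    if error:
        raise error
    results.extend(response)
    if cursor.alive:
        cursor.getmore(on_batch)
    else:
        print 'done, %d documents total' % len(results)

cursor = db.mycollection.find({}, callback=on_batch) |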
Patch: |
closing in favor of the patch in #39 (or something close to it) |
Using the limit kwarg to find() can give you more results; however, when it is not specified, asyncmongo defaults to 101? The code does not say anything about this.
import asyncmongo
import pymongo
import threading
import time
import tornado.ioloop

def start_loop_timeout(timeout=0.05):
    # Run the IOLoop, stopping it from another thread after `timeout` seconds.
    def kill():
        time.sleep(timeout)
        tornado.ioloop.IOLoop.instance().stop()
    t = threading.Thread(target=kill)
    t.start()
    tornado.ioloop.IOLoop.instance().start()

class AsyncDBTester(object):
    def __init__(self):
        self.responses = []
        self.errors = []
    def async_callback(self, response, error=None):
        # Record each response/error pair for later inspection.
        self.responses.append(response)
        self.errors.append(error)

db = asyncmongo.Client(pool_id='test', host='127.0.0.1', port=27017, dbname='test')

dbtester = AsyncDBTester()
db.dummy.remove(callback=dbtester.async_callback)
start_loop_timeout(0.1)

count = 0
for b in range(10):
    for c in range(100):
        db.dummy.insert({'a': b * c}, callback=dbtester.async_callback)
        count += 1
start_loop_timeout(0.1)

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback, limit=1000)
start_loop_timeout(0.2)
print 'inserted', count
print "can get by specifying limit", len(dbtester.responses[0])

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback)
start_loop_timeout(0.2)
print count
print "but why 101 here?", len(dbtester.responses[0])

blockingdb = pymongo.connection.Connection(host='127.0.0.1', port=27017)['test']
result = list(blockingdb.dummy.find())
print "pymongo gives me", len(result)