Difference in performance between pyvo and astroquery #633
Comments
Thank you @jwfraustro for the experiment. For context: we do have TAPPlus around to support the ESA modules, but we do not support its usage for other TAP services. For the other services the only supported client within astroquery and pyvo is the pyvo implementation.
This comment is not a request to re-open the issue, I just wanted to add to the findings. @jwfraustro Thanks for investigating further! I was also thinking I was doing something un-idiomatic, so I wanted to make sure by asking here.
I think the primary reason I recreated the service every time was that I used to use a process pool to avoid potential issues with Python's GIL. I ran into issues with passing the TAPService across process boundaries because it was not pickleable. This seems to work now though? Testing this again with a process pool and passing in the service as a function parameter, I get this:
I don't think this is actionable or a fault with pyvo; rather, passing services across processes doesn't really make sense to do.
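One way to avoid the pickling question altogether is to build the service once inside each worker process and reuse it for every query that worker handles. The following is only a sketch of that idea, not something taken from the thread: `GAIA_TAP_URL`, `make_service`, `init_worker`, `run_query`, and `query_in_processes` are hypothetical names, and the endpoint URL is an assumption.

```python
from concurrent.futures import ProcessPoolExecutor

# Assumed endpoint for the Gaia archive TAP service; swap in your own.
GAIA_TAP_URL = "https://gea.esac.esa.int/tap-server/tap"

_service = None  # one service object per worker process


def make_service(url=GAIA_TAP_URL):
    """Build a pyvo TAP client (imported lazily; needs pyvo installed)."""
    import pyvo
    return pyvo.dal.TAPService(url)


def init_worker(factory=make_service):
    """Runs once per worker: create the service here instead of pickling it."""
    global _service
    _service = factory()


def run_query(adql):
    """Run one synchronous query with this worker's own service."""
    return _service.search(adql).to_table()


def query_in_processes(queries, max_workers=4):
    """Map queries over a process pool; only strings and tables cross processes."""
    with ProcessPoolExecutor(max_workers=max_workers,
                             initializer=init_worker) as pool:
        return list(pool.map(run_query, queries))
```

With this layout the service is created once per worker rather than once per query, and only the ADQL strings and the resulting tables are sent between processes.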
Background
I am not sure this issue is that significant so feel free to close this.
I used to use astroquery and its gaia module to run Gaia queries programmatically, so that I could parallelise my queries across, say, bins in Galactic longitude, using either Python's multiprocessing or ThreadPoolExecutor.
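The dispatch pattern described above could look roughly like the sketch below. It is not the original snippet from this issue: the helper names (`longitude_bins`, `build_query`, `run_gaia_query`, `query_all_bins`), the `gaiadr3.gaia_source` table, and the selected columns are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def longitude_bins(n_bins):
    """Split 0..360 degrees of Galactic longitude into equal-width bins."""
    width = 360.0 / n_bins
    return [(i * width, (i + 1) * width) for i in range(n_bins)]


def build_query(l_min, l_max, top=1000):
    """ADQL for one bin; the table and columns are assumed, not from the issue."""
    return (
        f"SELECT TOP {top} source_id, l, b "
        "FROM gaiadr3.gaia_source "
        f"WHERE l >= {l_min} AND l < {l_max}"
    )


def run_gaia_query(adql):
    """Run one synchronous query through astroquery's Gaia module."""
    from astroquery.gaia import Gaia  # lazy import; needs astroquery installed
    return Gaia.launch_job(adql).get_results()


def query_all_bins(n_bins, run_query=run_gaia_query, max_workers=4):
    """Dispatch one query per bin on a thread pool, preserving bin order."""
    queries = [build_query(lo, hi) for lo, hi in longitude_bins(n_bins)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))
```

Passing a different `run_query` callable (for example one backed by pyvo) keeps the binning and dispatch identical, which makes the two clients easier to compare.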
I then changed to using pyvo, and by feel it seemed slower than before. This is mostly feelings-based: I didn't benchmark the two approaches at the time, because I had also changed how I dispatched the queries (ThreadPoolExecutor with astroquery, async Python with pyvo).
Benchmark results
I was curious to see if there is a measurable difference so I wrote a benchmark script (see the bottom of the post) and got these as the results.
You can see that for smaller queries, pyvo tends to be slower than astroquery, mostly due to consistently longer "read" times (read being the time taken to run the query and download the result into memory). This flips at higher numbers of objects queried, but then "write" times (writing the in-memory table to disk as a VOTable) become higher for pyvo, although the error bars imply that their performance is comparable.
The point of this issue is that although the performance isn't that different for large queries (as I thought), it does seem to be different enough for smaller queries, with pyvo being slower than astroquery.
Is this performance comparison known? I would like to use pyvo over astroquery because it is more flexible and more general across different TAP services, but it seems to be slightly worse in performance.
I know the benchmark is not solid evidence, because other factors like network and disk complicate things, so I also wanted to know if others get the same-ish results.
Benchmarking script