Is this still under development? #3
I haven't given the R version much love and attention recently. The reason is that I want to port my easy2oracle tool, which includes fast Presto access, to R. This would render RPresto obsolete. easy2oracle should already work with R; see this article I wrote: load-excel-data-into-r. Sometime in the near future I will create an R installer and binary packages for Windows, RHEL, and Ubuntu. Let me know if this works for you. Regards,
I have been looking into adapting the C version of the presto client for R. It works flawlessly, but it is about as slow as the native R version. At the moment I can see 3 possible solutions:
My personal favorite is option 2. But at the moment I do not have the time to implement it.
Hi, I am wondering whether you have made any progress on integrating the C library with R. These functions are still significantly faster than the ones found in https://github.com/prestodb/RPresto. I built the C version using cmake and just called it from inside R using system("cprestoclient server query"). While sloppy, it is very fast. When you say the C version of the presto client works flawlessly, do you mean that the C version is stable/robust?
Hi. The answer is no, because other developments rendered this unnecessary. Some people at Facebook developed a DBI driver for R (option 2 from my list) a couple of months ago. I have documented this in the README for my R client.
Unfortunately the DBI driver is slow. I think that if this code becomes stable it will have a lot of value. Using RPresto from the prestodb/RPresto repo takes 45 seconds to pull 100k rows and 22 columns. Using your Python interface to dump the data to a text file and then reading it back into R takes a total of 6 seconds (for both steps), and using your C interface in the same way takes 3 seconds. On larger queries the performance difference makes the data workflow difficult. If you have ideas about where the C version could go in terms of stability, I can offer to help. I am not looking for a clean R package with all the C files well organized; a solution that uses system("cprestoclient server query") would do well as long as you felt it was stable. Do you have any fresh ideas about what can be improved?
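The "dump to a text file, then read it back" workflow described in the comment above can be sketched roughly as follows. This is only an illustration under assumptions, not the client's actual interface: the helper names (`dump_query_to_file`, `read_delimited`, `query_via_cli`) are hypothetical, the argument order `cprestoclient server query` is taken from the thread as written, and tab-delimited output is an assumption about the client.

```python
import csv
import subprocess
import tempfile
from pathlib import Path


def dump_query_to_file(argv, out_path):
    """Run an external client command and write its stdout to out_path."""
    with open(out_path, "w", newline="") as out:
        # check=True raises CalledProcessError if the client exits non-zero.
        subprocess.run(argv, stdout=out, check=True)


def read_delimited(path, delimiter="\t"):
    """Read the dumped file back as a list of rows (lists of strings)."""
    # The tab delimiter is an assumption about the client's output format.
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))


def query_via_cli(argv, delimiter="\t"):
    """Dump the command's output to a temporary file, then parse it."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "result.txt"
        dump_query_to_file(argv, out_path)
        return read_delimited(out_path, delimiter)


# Hypothetical usage; the server name and query are placeholders:
# rows = query_via_cli(["cprestoclient", "my-server", "SELECT 1"])
```

Passing the command as an argument list avoids shell quoting problems with queries that contain spaces or quotes, which a single command string would be exposed to.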
Wow, that is a surprisingly big difference. My past experiments with reading big/huge text files into R showed me that R is simply very slow at converting text into its internal memory structures. I haven't used DBI much, but always assumed it would be faster since it has direct access to creating data.frames. Apparently the DBI overhead is huge. As an example, the data.table R package has its own methods for creating R memory structures and is way faster as a result. As for the stability of the C presto client: in my mind it is ready for production use, since I have never found any errors in it, nor seen any errors reported. But keep in mind that the userbase is probably very small ;) I wrote a blog article long ago showing how it works: load-excel-data-into-r. I never pursued this any further because the performance bottleneck is still R itself. Any help is of course welcome, but I'm not sure how to proceed.
Would love to see the C version incorporated into RPresto. Is there a way I can help in development?