-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use multipart upload instead of a temporary file #47
Comments
It is a good idea. I'll keep this issue up as a reminder, but I'm not sure when I'll have time to look at it just yet. |
Thanks for your prompt response @willmcgugan, I really appreciate. If somebody or myself want to pick this and submit a PR, would you:
|
I would be delighted to accept a PR...
Third party libs would be okay, as long as they are of a good quality and don't have too many dependancies themselves. But I'd prefer to avoid C extensions if possible.
It's rare to have one method being better than another in every way, so I'd need to consider the pros and cons. For instance, if you write a line at a time, that will be quite efficient with the current tmpfile, but I imagine it would be very slow to do a request for every line. But if there is some buffering on top of that, it might not be an issue. If you need it for something commercial, I am available for contract work. |
Thanks for proposing. Actually, I would have needed it for commercial stuff, but it's too late already. I did things otherwise (and it was not huge enough to spend more time on it anyways). I may contribute in my spare time for fun though, yet not sure if and not sure when, just like you 😉 . Cheers. |
Hi,
If I understand the current implementation correctly, it creates a temporary file on disk, writes everything to it, and just when the file is closed, uploads it to S3. This implementation is not applicable for very large files and is probably slightly suboptimal in some circumstances. For these reasons, I think it would be beneficial should s3fs could take advantage of multipart uploads.
There's a lot of gotchas and boilerplate associated with multipart uploads though. One possibly simpler implementation could be to delegate the hassle to a third-party library that already does the proxying stuff, such as smart_open (great library by the way). Needless to say that all this could be opt-in.
Thank you so much for all the great work on pyfilesystem2 and s3fs 👍
The text was updated successfully, but these errors were encountered: