-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement zero copy writes for the WebSocket writer #9634
base: master
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #9634 will degrade performances by 13.08%Comparing Summary
Benchmarks breakdown
|
❌ 6 Tests Failed:
View the top 3 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
Benchmarks barely show any improvement. I think the message payload would have to be rather large for it to matter. We don't have any benchmarks for that |
Still probably not a good benchmark. Will need to extend the round trip WebSocket benchmark to have another one that sends large messages to really get an idea of how it will perform |
Nevermind. I was misremembering. It’s reads that get paused which makes more sense |
Anyways. Let’s see what the profile looks like with now with all the memcpy reduction optimizations |
We still end up with more fragmentation somehow. Probably need to work on the reader some more |
need to look at this with syscalls |
I got this working so it was a net win for the StreamWriter in #9839 I think we need better logic to handle larger payloads only with Running it again now so I can look at the flame graphs once codspeed finishes as it will likely give a better idea on where the breakpoint is |
The websocket data queue is very small 65k so as soon as we get a large message, we always pause the reader |
For now, +-10% difference doesn't look very impressive. |
no it definitely doesn't |
also codspeed doesn't show syscalls since they can't be consistently instrumented so anything that does syscalls has to be checked manually in the codspeed UI since it can be a significant improvement or significant degradation. |
@bdraco if you are looking for possible optimizations, replacing https://docs.python.org/3/library/asyncio-protocol.html#asyncio.Protocol with https://docs.python.org/3/library/asyncio-protocol.html#asyncio.BufferedProtocol could be an interesting experiment. |
That could help quite a bit if they are small. Not sure if it will help if they are not small though since we have to combine them at some point to wrap it in Still most messages are small so it should save one |
Oh but we still have to do one slice to split of the headers and message body.....hmmm |
No, I'm talking about a little different situation. Memcpy stays as is, but preallocated membuffers could eliminate alloc/free calls. |
Yeah I expect it won't make much difference. Any idea on how to avoid the copy at aiohttp/aiohttp/_http_parser.pyx Line 766 in be31bed
|
I'm far from my laptop right now, I cannot watch at generated Cython code. |
diff --git a/aiohttp/_http_parser.pyx b/aiohttp/_http_parser.pyx
index 30767dae6..804f3f8b5 100644
--- a/aiohttp/_http_parser.pyx
+++ b/aiohttp/_http_parser.pyx
@@ -11,6 +11,7 @@ from cpython cimport (
PyBytes_AsStringAndSize,
PyObject_GetBuffer,
)
+from cpython.memoryview cimport PyMemoryView_FromMemory
from cpython.mem cimport PyMem_Free, PyMem_Malloc
from libc.limits cimport ULLONG_MAX
from libc.string cimport memcpy
@@ -763,7 +764,7 @@ cdef int cb_on_headers_complete(cparser.llhttp_t* parser) except -1:
cdef int cb_on_body(cparser.llhttp_t* parser,
const char *at, size_t length) except -1:
cdef HttpParser pyparser = <HttpParser>parser.data
- cdef bytes body = at[:length]
+ body = PyMemoryView_FromMemory(<char*>at, length, 0);
try:
pyparser._payload.feed_data(body)
except BaseException as underlying_exc: This kinda works but only if we don't release the buffer. Ideally we find a way for this to defer releasing the buffer until all the memory views are consumed... |
Implement zero copy writes for the WebSocket writer
closes #9633
python/cpython#91166 made
writelines
zero copy for Python 3.12+. For older versions it will still join the bytesWould like another set of 👀 on this to make sure it makes sense.