-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] feat: use native Go package for decoding Ogg Vorbis audio #118
base: master
Are you sure you want to change the base?
Conversation
5d70cac
to
f7c46d7
Compare
Together with #121 this new Vorbis decoder is in fact more efficient than the existing libvorbis implementation, by about 20% on my laptop (Asahi Linux, which is arm64 like modern Raspberry Pis but perhaps not very comparable in terms of performance). The main thing it's missing now is seek indices. That will likely require an addition to the oggvorbis package. |
Added support for seeking. This requires a patch to the oggvorbis package: jfreymuth/oggvorbis@master...aykevl:oggvorbis:add-SetPositionAfter In fact, with this change seeking becomes precise: it will seek to exactly the sample calculated in I'll be testing this PR for a while, to see whether there are any regressions. |
Seems to be working well. (While testing I found a bug though: if you play a bit of a loud part of a song, pause, seek to a quiet part, and then resume, you'll hear a short bit of the loud part. But that bug was already present before this change). |
Did some benchmarking on a Raspberry Pi 3, and unfortunately the new decoder is about 37% slower there. So maybe this is not such a good idea after all. |
This is great for removing a lot of Vorbis-specific code by offloading it to a library and also for making cross-compiling easier. It is peculiar that it is that much slower than the native implementation, but floating-point math has always been killing performance on librespot-java too. I am surprised floating-point in Go is slower than in C, perhaps the native library is just less optimized? |
My guess is that Go is missing some optimizations that are part of C compilers. The Go compiler hasn't had nearly the amount of performance work that GCC and Clang have had. And the Apple M1 Pro is probably good enough to detect all that and make it fast regardless, but the Cortex-A53 (in the Raspberry Pi 3) isn't. Specifically, I saw that a C compiler is able to use the So that would probably be a significant performance hit. (Even loading a |
Apparently this is a known issue: golang/go#19715 |
Went way too deep into this rabbit hole and made a Go patch to support Unfortunately this won't help the vorbis package directly, some changes are needed so that the compiler can recognize the stores are in fact next to each other in memory. But it probably won't help that much anyway, and I suspect there's lower hanging fruit (like bounds check elimination). |
This is great! 😄
Yes, it's not probably a matter of a few instructions. |
I did a quick test with TinyGo (which uses LLVM, same as Clang and Rust), and the function that seems to be most important ( When I disable bounds checking in TinyGo, the difference is even bigger: it's now 54% faster (or 50% when optimizing for code size). Of course that's an unfair comparison, but it shows that the Go code could be much faster with some optimizations (mostly to avoid bounds checks) and a more advanced optimizer. Disabling bounds checking in the Go compiler has a much smaller effect. Honestly I probably shouldn't put much more time in this. The existing vorbis decoder works fine. But if you're willing to deal with the slowdown, this new decoder could be worth it. |
IMO, the new decoder is great for reducing the amount of code to maintain and for removing a native dependency, but if we have to then maintain the native library too it doesn't make much sense. It may be worth reconsidering this in the future. |
This is something I'm experimenting with. It's nowhere near finished, though it does work. The main thing that's missing is seek indices (which makes seeking a bit slower than it could be).
Advantages:
I did some benchmarking, to see whether using a native Go version would perhaps be faster than using libvorbis through cgo because of the CGo call overhead. What I found was that the native version is a lot slower, but not for the reason you might think: the audio decryptor actually becomes twice as slow while the decoder itself is only about 8% slower. So what I'm suspecting is that the new decoder does a lot of unnecessary seeking or reads more data than it actually needs.
I wanted to share this as an experiment. Maybe I'll continue working on it to see whether I can fix the decryptor slowdowns and of course to implement seeking.