EPUB3 Media Overlays: playback with fast speech rate => bad synchronization at word level #1068
Comments
Hello Manfred, thank you for reporting this. I have not experienced this problem during my tests. |
It's Windows 10 Pro, version 1809. |
I have just shown Thorium to some colleagues at our audiobook recording studio. We all like the progress that you have made regarding media overlays - congratulations! But I have to report that, even at 1x speed, the highlighting is a little behind the audio. |
Thank you for the feedback Manfred, much appreciated. Could you please clarify which revision of Thorium you used for testing? Recently some important bug fixes and improvements were introduced, you may install the latest automated build of Thorium to test the obvious TTS readaloud presentation/rendering changes (and some underlying less obvious EPUB3 Media Overlays updates): I am not able to reproduce any audio lag, is there a particular TTS or Media Overlays EPUB you would like me to try? (I have a slower Windows computer which I can use to test this) |
We used 1.3.1 alpha 1.40 - but I have just re-tested it with 1.54. |
Here is a short video showing the behaviour. |
Wow, this is quite a perceivable delay / glitch. Thank you for the video screen capture. I have not experienced this with any of my test EPUBs, including several commercial fixed-layout word-level children’s talking books (like the one in the video). So, would you mind sharing the EPUB with me please, so I can run some tests? Thank you! |
Hello Manfred, thank you for sharing the test EPUBs. That being said, I can see how/why there would be a delay in some circumstances, due to the asynchronous nature of the communication between the audio playback engine (which runs perfectly smooth) and the "highlighting" mechanism which requires IPC (Inter Process Communication) messages with the |
Hello Daniel |
Hello Manfred, I strongly suspect that the performance issues were due to a bug in Thorium's database store. Could you please re-install the app from the automated releases: https://github.com/edrlab/thorium-reader/releases There should still be a few milliseconds delay, but nothing like the lag observed in your video. |
Hi Daniel |
Mac, Thorium 1.3.1-alpha.1.2515. The lag is there but only slight at 1x speed; the highlighting falls far behind at faster speeds. The start and stop times are also out of step at faster speeds, so audio from the next page is played while the delayed media overlay highlights are still catching up. Video linked, and I can also share the file if you wish. https://www.dropbox.com/s/lk378r9f8xsvfco/Thorium%20Delayed%20highlights.mp4?dl=0 |
Very useful feedback Ken, thank you. I suspect the performance bottleneck is the Electron IPC asynchronous events which instruct the embedded sandboxed webview(s) to update the synchronized highlights. The audio playback itself occurs in the parent BrowserWindow runtime that hosts one or two webviews (two in the case of fixed-layout two-page spreads). If I’m right, this is going to be a tough technical problem to work around, but first I need to be able to consistently reproduce the highlight lag. So would you mind sharing the FXL title so I can run tests? (I have a slower Mac laptop which I can use to run stress tests) |
No problem, here you go https://www.dropbox.com/t/xEBH8GBAgXXjCmXe |
Thank you Ken, 100% reproducible with I am pretty confident that this is due to “slow” Electron IPC messaging (with text-audio sync, a few milliseconds suffice to break seamless playback) |
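To illustrate why a delay of only a few milliseconds hurts more at faster playback rates, here is a minimal TypeScript sketch. It is not Thorium code: the function name `lagFraction`, the 300 ms word duration, and the 30 ms latency are all made-up values chosen for illustration, not measurements.

```typescript
// Hypothetical helper (not part of Thorium): fraction of a word's audible time
// that has already elapsed when a highlight arrives after a fixed latency.
function lagFraction(clipSeconds: number, playbackRate: number, latencySeconds: number): number {
  // Wall-clock time during which the word is audible at this playback rate.
  const audibleSeconds = clipSeconds / playbackRate;
  // A latency longer than the word itself means the highlight misses it entirely.
  return Math.min(1, latencySeconds / audibleSeconds);
}

// Assumed values for illustration only: a 300 ms word and a 30 ms highlight delay.
const word = 0.3;
const latency = 0.03;
for (const rate of [1, 1.5, 2]) {
  const pct = Math.round(lagFraction(word, rate, latency) * 100);
  console.log(`rate ${rate}x: highlight late for ~${pct}% of the word`);
}
```

The same fixed latency takes up a larger share of each word as the rate goes up, which would be consistent with the lag being barely noticeable at 1x and obvious at faster speeds.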
Interesting: I investigated this further and my findings seem to indicate that no significant delay is caused by the passing of asynchronous messages across the IPC boundary (Inter Process Communication) between the Media Overlays "audio controller" (which resides in the top-level renderer process) and target XHTML documents (which are hosted by independent / isolated webview renderer processes, inside sandboxed iframes). I also measured execution timings between the moment an instruction is given by the MO controller to play the next logical synchronised fragment defined in SMIL, and the moment an instruction is given by the webview API to highlight the synchronisation target (i.e. to apply the authored CSS class to the destination XHTML element). There is no apparent delay here either. Yet, at accelerated playback speed the highlighted utterances clearly lag behind by a few significant milliseconds (this could indicate that the queued Note that I have been using To speed up my tests, I hacked into ...to no avail. I am not able to pin-point a particular performance hog in our code logic, the perceived "sluggish" Media Overlays highlighting caused by short word-level playback synchronisation occurs regardless. Penultimate technical note: Thorium makes use of the ES6-2015 transpiled output instead of the ES8-2017 Javascript code generated by the TypeScript build system in all => Final technical note: in the above code snippet, note the |
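The kind of timing measurement described above (time between the "play this fragment" instruction and the "highlight this element" instruction) could be sketched roughly as follows. This is an assumption-laden illustration, not the instrumentation actually used in Thorium: `HighlightLatencyProbe`, its method names, and the fragment ids are hypothetical.

```typescript
// Hypothetical instrumentation (not Thorium's code): records when the controller
// asks for a fragment to be played and when the matching highlight is applied.
type Clock = () => number; // returns milliseconds

class HighlightLatencyProbe {
  private readonly clock: Clock;
  private readonly requested = new Map<string, number>();
  private readonly latencies: number[] = [];

  constructor(clock: Clock = () => Date.now()) {
    this.clock = clock;
  }

  // Call when the controller instructs playback of the fragment.
  markPlayRequested(fragmentId: string): void {
    this.requested.set(fragmentId, this.clock());
  }

  // Call when the CSS class is applied to the target element.
  markHighlighted(fragmentId: string): void {
    const start = this.requested.get(fragmentId);
    if (start === undefined) return; // highlight without a matching request
    this.requested.delete(fragmentId);
    this.latencies.push(this.clock() - start);
  }

  // Largest observed delay in milliseconds, or 0 if nothing was recorded.
  worstCase(): number {
    return this.latencies.reduce((max, ms) => Math.max(max, ms), 0);
  }
}
```

The clock is injectable so the probe can be tested deterministically. When the two marks are taken in different renderer processes, the timestamps need a shared time base, which is why the default here is wall-clock `Date.now()` rather than a per-process timer.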
Based on these findings, I think that the next logical step in this process of elimination is to migrate the Media Overlays "finite state machine" (i.e. granular SMIL orchestration based on the clock tick from audio playback) from the top-level controller context (i.e. principal Electron renderer process) directly into the XHTML content webviews (i.e. secondary isolated renderer processes). This approach introduces a design inconsistency, in that the audio playback engine which generates the clock ticks at the user-chosen playback rate will then be located in the publication documents themselves. This is architecturally incorrect, but this is an acceptable trade-off if it yields the required rendering performance. In practice, the continuous audio playback from one FXL page to another is implemented physically at the file level (e.g. single MP3 resource for an entire book), however at the logical level, utterances are delineated at authoring time in such a way that a pause in the narration is expected between page turns. So with the new approach where audio playback is not controlled from the top-level application context, the perceivable break when switching from one document to another (i.e. one FXL page to the next) due to the audio player instance being reset / reinstantiated, should not degrade the user experience much. Again, this would be an acceptable trade-off given the expected gain in highlighting fluidity. To be continued... |
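For readers unfamiliar with the idea, the "finite state machine" driven by audio clock ticks can be pictured with the minimal TypeScript sketch below. It is only an illustration under stated assumptions: the `Par` shape, `findActivePar`, and `OverlayStateMachine` are hypothetical names, the fragments are assumed to be sorted and non-overlapping, and the real SMIL orchestration in Thorium is more involved.

```typescript
// Hypothetical shape of one SMIL <par>: an audio clip range paired with a text target.
interface Par {
  textId: string;    // id of the XHTML element to highlight
  clipBegin: number; // seconds
  clipEnd: number;   // seconds
}

// Binary search for the fragment whose [clipBegin, clipEnd) range contains time t.
// Assumes `pars` is sorted by clipBegin and ranges do not overlap.
function findActivePar(pars: Par[], t: number): number {
  let lo = 0;
  let hi = pars.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const par = pars[mid];
    if (t < par.clipBegin) hi = mid - 1;
    else if (t >= par.clipEnd) lo = mid + 1;
    else return mid;
  }
  return -1; // no fragment is active at time t
}

class OverlayStateMachine {
  private readonly pars: Par[];
  private readonly onHighlight: (textId: string | null) => void;
  private active = -1;

  constructor(pars: Par[], onHighlight: (textId: string | null) => void) {
    this.pars = pars;
    this.onHighlight = onHighlight;
  }

  // Feed the audio element's current time (in seconds) on every clock tick.
  tick(currentTime: number): void {
    const next = findActivePar(this.pars, currentTime);
    if (next !== this.active) {
      this.active = next;
      // Only notify on transitions, so the highlight is not reapplied on every tick.
      this.onHighlight(next >= 0 ? this.pars[next].textId : null);
    }
  }
}
```

Because `tick` takes media time rather than wall-clock time, the same logic applies at any playback rate. Running it inside the content webview, as proposed above, would let the highlight callback touch the DOM directly instead of crossing a process boundary.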
Thorium v1.3.1-alpha.1.40
Books with media overlays and word level synchronization.
If I set the playback to a faster speed, the highlighting of the text falls behind the audio.