-
Notifications
You must be signed in to change notification settings - Fork 16
Increased timeout to 16MB spiffs flash problem #127
Increased timeout to 16MB spiffs flash problem #127
Conversation
There is a greater than 25s delay when writing a 13MB partition to 16MB flash So 30s timeout might be insufficient in some cases Hence increased to 40s
src/const.ts
Outdated
@@ -185,7 +185,7 @@ export const USB_RAM_BLOCK = 0x800; | |||
export const ESP_RAM_BLOCK = 0x1800; | |||
|
|||
// Timeouts | |||
export const DEFAULT_TIMEOUT = 3000; | |||
export const DEFAULT_TIMEOUT = 40000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't just increase the timeout for all tasks to 40 seconds. In that case we might as well have no timeout at all.
- Why does it take 25 seconds before we get a response to writing a block?
- If we can't solve that, can we detect we're writing a spiffs partition and increase the timeout only then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've looked into this a little more and I think maybe I'm misunderstanding something about how the stub vs main loader work. I based my code for using esp-web-flasher on the esp-web-tools repo ... https://github.com/esphome/esp-web-tools - in this repo all flashing is done with the stub (https://github.com/esphome/esp-web-tools/blob/main/src/flash.ts line 185). In this case the timeout used is the DEFAULT_TIMEOUT. I guess if some (or all) of the flashing was done with the main instance of ESPLoader then the timeout would be calculated using the ERASE_REGION_TIMEOUT_PER_MB value which is 30 seconds per megabyte and that would be ample. However, I have tried flashing the spiffs partition with the main ESPLoader instance and I get a different error "Didn't get enough status bytes". The issue here seems to be that a packet with 2 status bytes is the first thing received after switching to the main ESPLoader and this throws an error because 4 status bytes are expected when not using the stub. I can "fix" this by simply commenting out the code which checks the status bytes altogether and it "works" - but obviously some error checking has been removed. I'm not sure how to proceed as I don't really understand when the stub should be used and when it shouldn't - for uploading the bootloader I guess - but for the partition table too? And maybe there is a correct way to switch from using the stub to using the main ESPLoader that I'm not following?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeoutPerMb
is only applied for when it needs to erase. An image is broken into blocks which are individually written with a command. I still don't understand why that would require higher timeouts when more writes are being done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick video of flashing the same files with esptool https://youtu.be/NezsdWi1kqY and the ~24 second delay can be seen. I don't know why it stops at this point. I guess it would be possible to borrow the timeout setting code which calculates a timeout based on the size of the file to be flashed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we adapt the timeout code that ESPTool uses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic which controls timeout in esptool is around line 496 in cmds.py (https://github.com/espressif/esptool/blob/master/esptool/cmds.py) and is as follows:
block_timeout = max(
DEFAULT_TIMEOUT,
timeout_per_mb(ERASE_WRITE_TIMEOUT_PER_MB, block_uncompressed),
)
if not esp.IS_STUB:
timeout = (
block_timeout # ROM code writes block to flash before ACKing
)
esp.flash_defl_block(block, seq, timeout=timeout)
When flashing larger blocks with the stub this results in timout being calculated from the block size and, in the case of the ~14MB block the result is a bit over 367 seconds. The reason it is done like this I guess is that the blocks are compressed so a small write command can expand into a large amount of flash writing - in my case the file system is mainly empty initially.
In the esp-web-flasher the equivalent logic seems to be in esp_loader.ts around line 748 (https://github.com/NabuCasa/esp-web-flasher/blob/main/src/esp_loader.ts) and looks like this:
if (this.IS_STUB) {
writeSize = size; // stub expects number of bytes here, manages erasing internally
timeout = DEFAULT_TIMEOUT;
} else {
writeSize = eraseBlocks * flashWriteSize; // ROM expects rounded up to erase block size
timeout = timeoutPerMb(ERASE_REGION_TIMEOUT_PER_MB, writeSize); // ROM performs the erase up front
}
buffer = pack("<IIII", writeSize, numBlocks, flashWriteSize, offset);
await this.checkCommand(ESP_FLASH_DEFL_BEGIN, buffer, 0, timeout);
The logic seems to be the opposite way around regarding the stub. So that in the case of esp-web-flasher the timeout for any size block (when flashing with the stub) is 3 seconds.
However, this only calculates the timeout for the flashDeflBegin function and the same timeout value needs to be used in the flashDeflBlock function.
I have updated the PR as follows:
- reverted the original change to the DEFAULT_TIMEOUT.
- reversed the logic described above i.e. line 748 of esp_loader.ts now says: if (!this.IS_STUB) {
- return the timeout value from the flashDeflBegin function (it used to return the numBlocks but the return value was ignored (line 761)
- use the timeout value returned from flashDeflBegin (line 569) as the timeout parameter in the call to flashDeflBlock (line 611)
- in the flashDeflBlock function pass the timeout parameter to the checkCommand function (line 659)
This works for my application. I haven't done any other testing but I expect that it is a relatively safe change as the behaviour is now much closer to that of esptool.
Why not making this timeout depending on the binary size? |
src/const.ts
Outdated
@@ -1,303 +1,303 @@ | |||
import { toByteArray } from "./util"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've changed the new lines of this file, please revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted - apologies
@@ -745,7 +747,7 @@ export class ESPLoader extends EventTarget { | |||
let timeout = 0; | |||
let buffer; | |||
|
|||
if (this.IS_STUB) { | |||
if (!this.IS_STUB) { | |||
writeSize = size; // stub expects number of bytes here, manages erasing internally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if-check also checks for write size, not just timeout. You're changing how that is determined based on it being a stub now too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are correct - my mistake. I have now moved the timeout logic to be the other way around so the writeSize calculation is as it was previously.
Released in ESP Web Flasher 5.1.4 |
When flashing an ESP32 with 16MB flash chip, during flash of spiffs partition (~13MB) there is a long delay when the first block is written. This causes a time-out and results in the flash failing. Increasing the DEFAULT_TIMEOUT value from 3000 to 40000 fixes this problem. I found that the actual delay is around 25 seconds so a timeout of 40 seconds seems reasonable and probably doesn't make the user experience too much worse in cases where communication really does break down.