Board freezes periodically #3

Open
SmarTripDood opened this issue Jun 10, 2021 · 21 comments

Comments

@SmarTripDood

First, thank you for this cool project. It works great except I find it freezes every few hours (both the sign and the LED on the Matrix Portal on the back) and I have to reset it. Has this problem occurred for others and is there a fix?

@NBlair52

I just recently made 2 boards (thanks so much!) and noticed I also have this problem. Simply unplugging and plugging back in seems to fix the problem, but is not ideal.

@Swimmable7861

I don't have a board yet, but have been following as I want to get one eventually but $. @erikrrodriguez maybe has some ideas. They have a fork of this repo with a couple changes to allow multiple stations and a walking distance modifier to ignore trains you won't make in time. Their latest commit says they were attempting to fix the board reset issues.

@erikrrodriguez

Unfortunately I have not successfully solved this, and haven't been able to pin down why it happens. I have to reset my board every 2 or 3 days. But if I ever figure it out I'll be sure to push the code to my repo!

@SmarTripDood
Author

Is it possible there is something with WMATA's data feed that causes this? I don't think that's it but wanted to check.

@erikrrodriguez

Perhaps, but I don't think so. When I've tested on my PC using Python's requests library, the program has run for days with no issues. So I believe it's to do with the board's internal requests library, perhaps getting overloaded.

I've also tried to query MetroHero's API instead of WMATA's, but I can't even get a response using the board's requests library (PC testing again works fine).

@condeepadunov

My solution to this problem: I plug it into a smart plug and have it turn OFF at 59 minutes into the hour and back ON on the hour. Not necessarily on an hourly basis, but periodically throughout the day. That solves the issue for me.

@ghost

ghost commented Mar 6, 2022

I am running into this issue also. I don't know how to troubleshoot it. The display gets stuck and stops updating at some point. I've tried adding print statements to help troubleshoot. With serial console open, it stops sending out messages too. If anyone knows better ways to troubleshoot, please let me know.

@condeepadunov

See the solution above with the smart plug. It's imperfect but it works. Have the thing switch off and on every hour (which is far more often than it freezes) and it'll keep auto restarting and the problem goes away.

@ghost

ghost commented Mar 6, 2022

Thanks, that is a nice workaround. I am hoping to find a programming fix of the root cause though, assuming it's possible and the firmware or other hardware problem isn't the issue.

@dylanjtastet

My hunch is a memory leak. The portal has very little memory and the adafruit libraries have become notoriously heavy. I tried to load their version of the datetime library and it immediately crashed the board in a similar fashion.

@ghost

ghost commented May 29, 2022

@dylanjtastet I thought so too. But I am monitoring memory with gc module and it doesn't show a loss in memory. Would the leak be detectable any other way?
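
One way to make the gc readings easier to compare over a long run is to log the lowest free-memory value seen so far, not just the current one. The following is a minimal sketch under assumptions: `free_memory` and `MemoryLog` are hypothetical helper names, not part of this project, and `gc.mem_free()` is only available on CircuitPython/MicroPython, so the helper returns `None` elsewhere.

```python
import gc


def free_memory():
    """Return free heap bytes after a collection, or None where unsupported."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)  # present on CircuitPython only
    return mem_free() if mem_free else None


class MemoryLog:
    """Hypothetical helper: remember the lowest free-memory reading seen."""

    def __init__(self):
        self.lowest = None

    def record(self, label, free=None):
        # Take a fresh reading unless the caller supplies one.
        free = free_memory() if free is None else free
        if free is not None and (self.lowest is None or free < self.lowest):
            self.lowest = free
        print("{}: free={} lowest={}".format(label, free, self.lowest))
        return free
```

Calling `record("before fetch")` and `record("after fetch")` around each request would show whether the low-water mark keeps dropping. If it stays flat until the freeze, that would point away from a leak in the Python heap, though it says nothing about memory used inside the WiFi co-processor.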

@SmarTripDood
Author

I'm using a smart plug, but I really think the only permanent solution is to set this up on different hardware. There are a number of similar boards out there that do the same thing, using Raspberry Pi.

@ScottKekoaShay

I tried various cords and plugs just for kicks, and mine does the same thing--just craps out usually within an hour, but sometimes it lasts longer. I had it connected to my computer to see the console, and this is what I get:

Retrieving data...Received response from WMATA api...
Reply received.
Successfully updated.
Refreshing train information...
Retrieving data...Traceback (most recent call last):
  File "code.py", line 25, in <module>
  File "train_board.py", line 41, in refresh
  File "code.py", line 22, in <lambda>
  File "code.py", line 15, in refresh_trains
  File "metro_api.py", line 17, in fetch_train_predictions
  File "metro_api.py", line 23, in _fetch_train_predictions
  File "adafruit_portalbase/network.py", line 518, in fetch
  File "adafruit_requests.py", line 823, in get
  File "adafruit_requests.py", line 679, in request
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 138, in recv
  File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 210, in available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 776, in socket_available
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 332, in _send_command_get_response
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _wait_response_cmd
  File "adafruit_esp32spi/adafruit_esp32spi.py", line 278, in _wait_spi_char
TimeoutError: Timed out waiting for SPI char

Code done running.

So I think it is an issue in the library. Unfortunately, that's beyond my knowledge, but perhaps someone who has a deeper understanding can figure out a solution (esp. one that doesn't involve actually updating the library). I am thinking it should be possible to catch the error and then have the thing restart itself, if nothing else? But I wasn't able to do that. It doesn't seem to be able to recover from the error gracefully. If anyone can figure it out, let me know!
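
For reference, the "catch the error and restart" idea could look roughly like the sketch below. It is an illustration under assumptions, not code from this repo: `refresh_or_reset` is a hypothetical name, `refresh` stands in for whatever function fetches the train data, and `microcontroller.reset()` is the CircuitPython call that reboots the board. It can only help when an exception is actually raised; a request that hangs forever never reaches the `except` branch.

```python
import time

try:
    import microcontroller  # CircuitPython only
except ImportError:
    microcontroller = None  # allows the sketch to run on a PC


def refresh_or_reset(refresh, reset=None, max_failures=3, delay=5, sleep=time.sleep):
    """Call refresh(); after max_failures consecutive errors, call reset()."""
    failures = 0
    while failures < max_failures:
        try:
            return refresh()
        except (RuntimeError, OSError) as error:
            # TimeoutError is an OSError subclass in desktop Python; whether
            # that holds on a given firmware version is an assumption.
            failures += 1
            print("Refresh failed ({}/{}): {}".format(failures, max_failures, error))
            sleep(delay)
    # Fall back to a full board reset when no reset callable was supplied.
    if reset is None and microcontroller is not None:
        reset = microcontroller.reset
    if reset is not None:
        reset()
```

Passing `reset` and `sleep` as parameters is only there so the logic can be exercised on a PC without the hardware.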

@dylanjtastet

I'm going to try using @erikrrodriguez's fork. Digging through the network library, it looks like there's a bit of redundancy in using the adafruit_requests library for HTTP, as adafruit_esp32spi_wifimanager.ESPSPI_WiFiManager already provides this API. It also uses the requests library, which is a global singleton, so the network library is creating redundant sockets and re-initializing the requests library.

This is where I stopped digging, but my hunch now is that all of this is crashing the wifi coprocessor. Erik's code will also reset the coprocessor if the board is failing requests which should keep things turning if all else fails.
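
A rough sketch of the "reset the coprocessor when requests keep failing" idea follows. It is not taken from Erik's fork, whose implementation is not shown in this thread. `CoprocessorGuard` is a hypothetical name, and `esp` is assumed to be any object with a `reset()` method, such as the `adafruit_esp32spi` control object.

```python
class CoprocessorGuard:
    """Hypothetical helper: reset the WiFi co-processor after repeated failures."""

    def __init__(self, esp, threshold=3):
        self.esp = esp              # assumed to expose reset()
        self.threshold = threshold  # consecutive failures before a reset
        self.failures = 0

    def call(self, request):
        """Run request(); return its result, or None if it raised."""
        try:
            result = request()
        except (RuntimeError, OSError) as error:
            self.failures += 1
            print("Request failed ({} in a row): {}".format(self.failures, error))
            if self.failures >= self.threshold:
                self.esp.reset()
                self.failures = 0
            return None
        # Any success clears the failure streak.
        self.failures = 0
        return result
```

Resetting only the co-processor keeps the display code running, which is the main difference from rebooting the whole board.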

@erikrrodriguez

Thanks @dylanjtastet I was going to ping and also recommend @ScottKekoaShay try my fork.

I will admit that my board sometimes also freezes, and I haven't been able to discover why. It seems like it hangs after the request is made and ignores the timeout in waiting for a response. So I think it is ultimately still an issue in the adafruit_requests library.

But my board has been running the past 4 days without me needing to manually reset it 🎉

@ScottKekoaShay

Thanks, I will give it a try as well!

@ghost

ghost commented Apr 9, 2023

I forgot to update this when I resolved my issue. For me, on the Matrix Portal M4, the root issue turned out to be the CircuitPython 7.3 firmware, which had a buggy DMA feature that caused SPI failures of all kinds. Dan Halbert, one of the core developers of CircuitPython, helped track it down and fixed it, and the fix should be in later versions. So if your board has old firmware, try flashing a new version.
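
To check which firmware a board is running before reflashing, something like the sketch below can be pasted into `code.py` or the REPL. The `firmware_is_older` helper is a hypothetical name, and the `(8, 0, 0)` minimum is an assumption based on reports in this thread that version 8 worked; the exact release containing the fix is not stated here.

```python
import sys


def firmware_is_older(version, minimum=(8, 0, 0)):
    """Return True if the (major, minor, micro) version is below minimum."""
    return tuple(version[:3]) < tuple(minimum)


impl = sys.implementation
if impl.name == "circuitpython":
    if firmware_is_older(impl.version):
        print("CircuitPython", impl.version, "is older than 8.0.0; consider reflashing.")
    else:
        print("CircuitPython", impl.version, "looks recent enough.")
else:
    # Running on a PC or another Python implementation.
    print("Not running on CircuitPython:", impl.name)
```

The version also appears in `boot_out.txt` on the CIRCUITPY drive, which avoids running any code at all.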

@erikrrodriguez

erikrrodriguez commented Apr 9, 2023

Makes sense, I'm currently using Circuit Python 8 as of the other week. Thanks for getting in touch with Dan, I'm glad he could push a fix!

Edit: Something else I did was update the ESP firmware separate from Circuit Python using this guide: https://learn.adafruit.com/upgrading-esp32-firmware/upgrade-all-in-one-esp32-airlift-firmware

@dylanjtastet

Yep makes sense, I was getting the same issue after switching to @erikrrodriguez's fork. Will try updating firmware now.

@SmarTripDood
Author

Upgrading firmware to 8 and the latest files from @erikrrodriguez's fork solved it for me -- many thanks!

@ScottKekoaShay

Updating the ESP firmware did the trick for me--mine's been running several days without crapping out. Thanks for the tips @erikrrodriguez !

7 participants