This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

"Cannot start hyperg" while it's already running #5143

Open
etam opened this issue Mar 24, 2020 · 2 comments
Labels
bug P3 Severity-Low/Effort-hard

Comments

@etam
Contributor

etam commented Mar 24, 2020

Description

Golem Version: f5a985e

OS: Linux

Branch: b0.23

Reproducible: sometimes

Description of the issue:

When golem is started while hyperg is already running, it usually detects the running instance correctly, but sometimes it fails with:

2020-03-21 04:13:33 CRITICAL golem.client                        Can't start network. Giving up.
Traceback (most recent call last):
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/client.py", line 373, in start
    self.start_network()
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/client.py", line 480, in start_network
    self.daemon_manager.start()
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/network/hyperdrive/daemon_manager.py", line 116, in start
    return self._start()
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/report.py", line 173, in wrapper
    return func(*args, **kwargs)
  File "/home/buildbot-worker/worker/test_node_integration/build/golem/network/hyperdrive/daemon_manager.py", line 138, in _start
    raise RuntimeError("Cannot start {}".format(self._executable))
RuntimeError: Cannot start hyperg

Actual result:

Golem fails to start.

Steps To Reproduce

  1. Start hyperg
  2. Start golem

Expected behavior

Golem should always detect running hyperg.
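
One way to make detection less sensitive to a single failed probe is to retry the check a few times before giving up. The following is a minimal sketch under stated assumptions, not Golem's actual detection code: the function names, the host and port arguments, and the retry counts are hypothetical, and it only checks that something accepts TCP connections on the given address.

```python
import socket
import time


def daemon_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


def wait_for_daemon(host: str, port: int,
                    attempts: int = 5, delay: float = 0.5) -> bool:
    """Probe repeatedly so that one failed connection is not treated as fatal."""
    for attempt in range(attempts):
        if daemon_is_listening(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give a slow daemon time to start accepting
    return False
```

A caller would attempt to start its own hyperg only if `wait_for_daemon(...)` returns False, and would raise only if the daemon is still unreachable afterwards.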

Logs and any additional context

https://buildbot.golem.network/buildbot/#builders/15/builds/979 (test test_task_timeout)
https://buildbot.golem.network/buildbot/#/builders/15/builds/981 (test test_frame_restart)

@etam etam added bug brass P3 Severity-Low/Effort-hard labels Mar 24, 2020
@etam
Contributor Author

etam commented Mar 27, 2020

Hypothesis: before starting hyperg, golem tries to connect to a potentially existing instance. This might be undefined behavior in Twisted if it is called from a thread.

@maaktweluit
Contributor

AFAIR the node_integration_tests are responsible for starting their own hyperg.
I think it can be related to:

  • not properly closing hyperg after test1, making test2 fail (zombie-g)
  • a race when starting hyperg on the same machine from multiple nodes at the same time
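
If the second cause applies, one possible mitigation is to serialize the check-then-start step across processes on the same machine. The sketch below uses an advisory file lock via `fcntl.flock` (Linux/Unix only). It is an assumption about how this could be done, not something Golem or the integration tests are known to do; `startup_lock`, `start_daemon_once`, and the `is_running`/`start` callables are hypothetical names.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def startup_lock(path: str):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def start_daemon_once(lock_path, is_running, start) -> bool:
    """Call `start()` only if `is_running()` is still False once the lock is held.

    Returns True if this caller started the daemon, False if it was already up.
    """
    with startup_lock(lock_path):
        if is_running():
            return False
        start()
        return True
```

Because the check and the start happen under the same lock, two nodes launched at the same moment cannot both conclude that hyperg is missing and both try to start it. This does not address the first cause, a hyperg left over from a previous test.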

@badb badb removed the brass label Jun 22, 2020
4 participants