Hi there 🙂 Not really a bug report (since it's expected behavior), but what is the recommended way to let a container with an attached volume self-heal from a crash?
Scenario

- Dockerized Neo4j (in a Kubernetes cluster, deployed standalone with Helm)
- Pod (e.g. managed by a StatefulSet) crashes; the pod gets restarted, i.e. a new container but the same volume
- The usual lock file error appears, because the original process was killed and not properly stopped: `Lock file has been locked by another process: /data/databases/store_lock`
If I can ensure that the container/pod is the only one using that volume, is there a recommended way to clear the lock file at startup?
I understand that the lock makes sense from a concurrency perspective, but if the container runtime crashes unexpectedly, the pod can't come up again on its own (CrashLoopBackOff) and has to be restarted manually...
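Concretely, what I have in mind is a cleanup step at container startup, e.g. in an initContainer that mounts the same volume. A rough sketch (assuming this pod is the only user of the volume; the path matches the error above):

```sh
# Sketch: clear a possibly stale lock file before Neo4j starts.
# Only safe when this pod is the sole consumer of the data volume.
LOCK=/data/databases/store_lock
if [ -f "$LOCK" ]; then
  echo "removing possibly stale $LOCK"
  rm -f "$LOCK"
fi
```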
I played around with the init process (the Helm chart runs the init.sh script from the config map) and tried to check the lock file status from the command line. Interestingly, after the crash the file (/data/databases/store_lock) exists and can be locked and unlocked with flock on the CLI, yet the Java process still fails to obtain the lock...
This was my test snippet at the start of the container:

```sh
# Try to take an exclusive, non-blocking flock(2) lock on the lock file;
# if that succeeds, no flock-style lock is currently held on it.
if flock -x -n /data/databases/store_lock echo -n; then
  echo "/data/databases/store_lock not locked, continuing"
else
  echo "/data/databases/store_lock is locked; deleting"
  rm -f /data/databases/store_lock
fi
```
which prints `... not locked`, though the Java process still fails afterwards. Is there another way to check the lock file, or could I simply delete it at startup (in my particular setup, where no other Neo4j instance operates on that filesystem)?
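One detail that might explain the mismatch (just a hypothesis on my side): on Linux, flock(2) locks and POSIX fcntl locks live in separate namespaces, and the JVM's FileChannel.lock() is typically backed by fcntl, so a flock probe cannot observe a lock taken by Java. Since kernel locks of either kind disappear when their owning process dies, a crash should leave no lock behind at all, and a more telling startup check may be whether any live process still has the file open. A sketch using fuser (assuming it is present in the image):

```sh
# If no process has the lock file open, no kernel lock can be held on it
# (locks die with their owner), so the leftover file is safe to remove.
if fuser -s /data/databases/store_lock 2>/dev/null; then
  echo "store_lock is still open by a live process; leaving it alone"
else
  echo "no process holds store_lock; removing stale file"
  rm -f /data/databases/store_lock
fi
```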
@sdaschner In regular (non-k8s) Docker, you can just delete the store_lock file if you know that no running Neo4j instance is using the mounted data volume. However, you specifically asked about Kubernetes and our Helm charts, so I know this doesn't exactly answer your question.
Could you re-create this issue on the GitHub repo for the Helm charts, please? They'll be better able to answer your question / feature request: https://github.com/neo4j-contrib/neo4j-helm
(I can't transfer the issue myself because it's owned by a different account).
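For the plain-Docker case, that could look something like the following wrapper entrypoint (a sketch, not an official recommendation; it assumes fuser is available in the image and that /docker-entrypoint.sh neo4j is the image's usual start command, as in the official Neo4j image):

```sh
#!/bin/sh
# Sketch of a wrapper entrypoint: remove a stale store_lock left behind
# by a crashed container, then hand off to the image's usual entrypoint.
# Only safe when this container is the sole user of the data volume.
LOCK=/data/databases/store_lock
if [ -f "$LOCK" ] && ! fuser -s "$LOCK" 2>/dev/null; then
  echo "removing stale $LOCK"
  rm -f "$LOCK"
fi
exec /docker-entrypoint.sh neo4j
```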