Power cycling is the process of turning hardware off and then on again. The powercycle test is designed to run across two machines: a "server" that controls and monitors the workflow, and a "client" that runs the MongoDB server and is regularly crashed remotely by the "server".
In Evergreen, the localhost that runs the task acts as the "server", and the remote host created by the Evergreen host.create command acts as the "client".
The powercycle test is part of resmoke. Running resmoke requires Python 3.10+ with a Python venv (python3 from mongodbtoolchain is highly recommended). To set up the venv, run the following from the root of the mongo repo:
python3 -m venv python3-venv
source python3-venv/bin/activate
pip install -r buildscripts/requirements.txt
If the Python venv is already set up, activate it before running resmoke:
source python3-venv/bin/activate
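Before invoking resmoke, a quick check that the active interpreter meets the Python 3.10+ requirement and is running inside a venv can save a confusing failure later. The sketch below is a hypothetical helper, not part of resmoke; check_python_for_resmoke() and in_virtualenv() are illustrative names.

```python
import sys


def in_virtualenv() -> bool:
    # A venv created with `python3 -m venv` sets sys.prefix to the venv
    # directory, while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def check_python_for_resmoke(min_version=(3, 10)) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < min_version:
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (min_version + sys.version_info[:2])
        )
    if not in_virtualenv():
        problems.append("no virtual environment is active")
    return problems


if __name__ == "__main__":
    for problem in check_python_for_resmoke():
        print("warning:", problem)
```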
Several commands are available through the resmoke powercycle subcommand; list them with:
python buildscripts/resmoke.py powercycle --help
The main entry point of the resmoke powercycle subcommand is located in this file:
buildscripts/resmokelib/powercycle/__init__.py
Evergreen host.create command
- in Evergreen, the remote host is created with the same distro as the localhost, and some initial connections are made to ensure it is up before further steps
Resmoke powercycle setup-host command
- prepares the remote host via ssh to run the powercycle test:
python buildscripts/resmoke.py powercycle setup-host
Powercycle setup-host operations are located in buildscripts/resmokelib/powercycle/setup/__init__.py.
The configuration for these operations is loaded from the expansions.yml file, which is created by the expansions.write command in Evergreen.
It runs several operations via ssh:
- create a directory on the remote host
- copy buildscripts and MongoDB executables from the localhost to the remote host
- set up a Python venv on the remote host
- set up curator to collect system & process stats on the remote host
- install NotMyFault to crash Windows (only on Windows)
The implementation of remote operations via ssh is located in buildscripts/resmokelib/powercycle/lib/remote_operations.py.
The following operations are supported:
- copy_to - copy files from the localhost to the remote host
- copy_from - copy files from the remote host to the localhost
- shell - run a shell command on the remote host
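The three operations map naturally onto scp and ssh invocations. The sketch below assumes plain OpenSSH scp/ssh binaries; RemoteCommandBuilder is a hypothetical class for illustration and is not the interface defined in remote_operations.py.

```python
import shlex
from typing import List, Optional


class RemoteCommandBuilder:
    """Hypothetical helper that builds argv lists for copy_to, copy_from and shell."""

    def __init__(self, user_host: str, ssh_options: Optional[str] = None):
        self.user_host = user_host
        # Extra options such as "-i /path/to/key" are split the way a shell would.
        self.ssh_options = shlex.split(ssh_options) if ssh_options else []

    def copy_to(self, local_files: List[str], remote_dir: str) -> List[str]:
        # scp <opts> file1 file2 ... user@host:remote_dir
        return ["scp", *self.ssh_options, *local_files, f"{self.user_host}:{remote_dir}"]

    def copy_from(self, remote_files: List[str], local_dir: str) -> List[str]:
        # scp <opts> user@host:file1 user@host:file2 ... local_dir
        sources = [f"{self.user_host}:{path}" for path in remote_files]
        return ["scp", *self.ssh_options, *sources, local_dir]

    def shell(self, command: str) -> List[str]:
        # ssh <opts> user@host "<command>"; the remote shell parses the command.
        return ["ssh", *self.ssh_options, self.user_host, command]
```

The returned lists can be passed to subprocess.run(argv, check=True). Building argv lists rather than shell strings avoids an extra round of local shell quoting; only the remote command string is interpreted by a shell, on the remote side.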
Resmoke powercycle run command
- runs the powercycle test on the localhost, which runs remote operations on the remote host via ssh and performs local validation checks:
python buildscripts/resmoke.py powercycle run \
    --sshUserHost=${user_name}@${host_ip} \
    --sshConnection="-i ${ssh_public_key_file}" \
    --taskName=${task_name}
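When the command is launched from Python rather than from a shell, building the argument list directly sidesteps the quoting around --sshConnection: each list element reaches resmoke as exactly one argument. powercycle_run_argv() is a hypothetical helper, and the user, host, key path and task name below are placeholders.

```python
import shlex
import sys


def powercycle_run_argv(user_name: str, host_ip: str, ssh_key_file: str,
                        task_name: str) -> list:
    """Build the argv for `resmoke.py powercycle run` without going through a shell.

    The --sshConnection value keeps its embedded space because it is a
    single list element, so no extra quoting is needed.
    """
    return [
        sys.executable, "buildscripts/resmoke.py", "powercycle", "run",
        f"--sshUserHost={user_name}@{host_ip}",
        f"--sshConnection=-i {ssh_key_file}",
        f"--taskName={task_name}",
    ]


if __name__ == "__main__":
    argv = powercycle_run_argv("user", "10.0.0.1", "/path/to/key", "my_task")
    # Print a copy-pasteable shell form; shlex.join quotes elements where needed.
    print(shlex.join(argv))
    # To launch it from the repo root: subprocess.run(argv, check=True)
```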
The arguments for the resmoke powercycle run command are defined in the add_subcommand() function in buildscripts/resmokelib/powercycle/__init__.py. When the powercycle test runs remote operations, it calls the copy of this script on the remote host. Thus, some resmoke powercycle run arguments exist only for that remote call and should not be used when calling the script on the localhost.
The --taskName argument is used to look up the powercycle task configuration stored in buildscripts/resmokeconfig/powercycle/powercycle_tasks.yml.
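Selecting a task configuration by name can be sketched as a lookup over the parsed YAML. This assumes the file parses (for example with PyYAML's yaml.safe_load) into a mapping with a top-level "tasks" list whose entries carry a "name" key; that layout is an assumption, so check powercycle_tasks.yml for the real structure. find_task_config() is a hypothetical name.

```python
from typing import Any, Dict


def find_task_config(tasks_config: Dict[str, Any], task_name: str) -> Dict[str, Any]:
    """Return the task entry whose "name" matches task_name.

    Assumes a hypothetical layout: {"tasks": [{"name": ..., <settings>}, ...]}.
    """
    for task in tasks_config.get("tasks", []):
        if task.get("name") == task_name:
            return task
    raise ValueError(f"no powercycle task named {task_name!r}")
```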
There is a known issue, described in SERVER-47621, where --setParameter mongod options are processed incorrectly when passed in mongod_options.
The main implementation of the powercycle test is located in the main() function in buildscripts/resmokelib/powercycle/powercycle.py.
The value of the --remoteOperation argument determines whether the script is running on the localhost or on the remote host.
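The pattern of one script serving both roles can be sketched with argparse: a boolean --remoteOperation flag selects the remote code path, and its absence means the script is driving the test from the localhost. The parser below is a hypothetical, trimmed-down illustration; the real argument definitions live in add_subcommand() and may differ, for example in how remote operation names are passed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the mode flag plus a list of operation names.
    parser = argparse.ArgumentParser(prog="powercycle-sketch")
    parser.add_argument("--remoteOperation", dest="remote_operation",
                        action="store_true",
                        help="set when the script is invoked on the remote host")
    parser.add_argument("operations", nargs="*",
                        help="remote operations to perform, e.g. start_mongod")
    return parser


def select_mode(argv) -> str:
    args = build_parser().parse_args(argv)
    # The same script runs on both machines; the flag tells the two roles apart.
    return "remote" if args.remote_operation else "local"
```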
The remote_handler() function performs the following remote operations:
- noop - do nothing
- crash_server - internally crash the server
- kill_mongod - kill the mongod process
- install_mongod - install mongod
- start_mongod - start the mongod process
- stop_mongod - stop the mongod process
- shutdown_mongod - run the shutdown command using the mongo client
- rsync_data - back up mongod data
- seed_docs - seed a collection with random document values
- set_fcv - run the set FCV command using the mongo client
- check_disk - run the chkdsk command on Windows
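Dispatching these named operations can be sketched as a lookup table from operation name to callable. remote_handler_sketch() and the handlers mapping are hypothetical and only show the shape of the technique; the real remote_handler() may structure this differently.

```python
from typing import Callable, Dict, List


def remote_handler_sketch(operations: List[str],
                          handlers: Dict[str, Callable[[], int]]) -> int:
    """Run each named operation in order and stop at the first failure.

    handlers maps operation names (noop, start_mongod, ...) to callables
    returning 0 on success; unknown names are rejected before anything runs.
    """
    unknown = [op for op in operations if op not in handlers]
    if unknown:
        raise ValueError(f"unsupported remote operation(s): {unknown}")
    for op in operations:
        ret = handlers[op]()
        if ret != 0:
            return ret
    return 0


if __name__ == "__main__":
    demo = {"noop": lambda: 0, "seed_docs": lambda: 0}
    print(remote_handler_sketch(["noop", "seed_docs"], demo))
```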
When running on the localhost, each powercycle test loop performs the following steps:
- Rsync the database post-crash, pre-recovery on the remote host (starting from the 2nd loop)
  - makes a backup before recovery
- Start mongod on the secret port on the remote host and wait for it to recover
  - also sets FCV and seeds documents on the 1st loop
- Validate the canary from the localhost (starting from the 2nd loop)
  - uses the mongo client to connect to the remote mongod
- Validate collections from the localhost
  - calls resmoke to perform the validation on the remote mongod
- Shut down mongod on the remote host
- Rsync the database post-recovery on the remote host
  - makes a backup after recovery
- Start mongod on the standard port on the remote host
- Start CRUD and FSM clients on the localhost
  - calls resmoke to run the CRUD and FSM clients
- Generate a canary document from the localhost
  - uses the mongo client to connect to the remote mongod
- Crash the remote server or kill mongod on the remote host
  - most powercycle tasks crash the server
- Run check disk on the remote host (on Windows)
- Exit the loop if one of these occurs:
  - the loop number is exceeded
  - any step fails
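The loop structure above, with some steps gated on the loop number and an exit on either the loop limit or the first failing step, can be sketched as follows. run_loops() and the step descriptions are hypothetical stand-ins for the real operations in powercycle.py.

```python
from typing import Callable, List, Tuple

# A step is (description, callable returning True on success, first loop to run on).
Step = Tuple[str, Callable[[int], bool], int]


def run_loops(steps: List[Step], num_loops: int) -> Tuple[int, str]:
    """Run the steps in order for up to num_loops loops.

    Returns (loops_completed, reason). A step whose first loop is 2 is skipped
    on loop 1, mirroring the "starting from the 2nd loop" steps above.
    """
    for loop_num in range(1, num_loops + 1):
        for description, step, first_loop in steps:
            if loop_num < first_loop:
                continue
            if not step(loop_num):
                return loop_num - 1, f"step failed on loop {loop_num}: {description}"
    return num_loops, "loop number exceeded"


if __name__ == "__main__":
    def ok(loop_num: int) -> bool:
        return True

    steps = [
        ("rsync data post-crash, pre-recovery", ok, 2),
        ("start mongod on the secret port and wait for recovery", ok, 1),
        ("validate canary", ok, 2),
        ("validate collections", ok, 1),
        ("shut down mongod", ok, 1),
        ("rsync data post-recovery", ok, 1),
        ("start mongod on the standard port", ok, 1),
        ("run CRUD and FSM clients", ok, 1),
        ("generate canary document", ok, 1),
        ("crash the server or kill mongod", ok, 1),
    ]
    print(run_loops(steps, num_loops=3))  # (3, 'loop number exceeded')
```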
The exit_handler() function writes a report and performs cleanup whenever the test run exits.
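Guaranteeing that a report is written whenever the run ends is commonly done with Python's atexit module. The sketch below illustrates that general technique under assumptions and is not the real exit_handler(): REPORT, the JSON format and the report path are hypothetical. atexit handlers run on normal interpreter exit and after sys.exit(), but not after os._exit() or when the process is terminated by a signal it does not handle (and SIGKILL cannot be handled at all), so terminating signals need separate handling.

```python
import atexit
import json
import os
import tempfile

# Hypothetical report state; the real exit_handler() has its own report format.
REPORT = {"loops_completed": 0, "status": "running"}
_already_ran = False


def exit_handler(report_path: str) -> None:
    """Write the report and clean up; later calls are no-ops."""
    global _already_ran
    if _already_ran:
        return
    _already_ran = True
    with open(report_path, "w") as fh:
        json.dump(REPORT, fh)
    # Cleanup such as stopping local client processes would go here.


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "powercycle_report.json")
    # Runs exit_handler(path) on normal interpreter exit, including sys.exit().
    atexit.register(exit_handler, path)
    REPORT.update(loops_completed=3, status="passed")
```

Making the handler idempotent lets it be called both from a signal handler and from atexit without writing the report twice.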
Resmoke powercycle save-diagnostics command
- copies powercycle diagnostics files from the remote host to the localhost (mainly used by Evergreen):
python buildscripts/resmoke.py powercycle save-diagnostics
Powercycle save-diagnostics operations are located in buildscripts/resmokelib/powercycle/save_diagnostics/__init__.py.
The configuration for these operations is loaded from the expansions.yml file, which is created by the expansions.write command in Evergreen.
It runs several operations via ssh:
- gatherRemoteEventLogs
  - runs on Windows
- tarEC2Artifacts
  - on success archives mongod.log
  - on failure additionally archives data files and all before-recovery and after-recovery backups
  - on failure on Windows additionally archives event logs
- copyEC2Artifacts
  - from the remote host to the localhost
- copyEC2MonitorFiles
  - from the remote host to the localhost
- gatherRemoteMongoCoredumps
  - copies all mongo core dumps to a single directory
- copyRemoteMongoCoredumps
  - from the remote host to the localhost
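The tarEC2Artifacts rules can be expressed as a small selection function. artifacts_to_archive() is hypothetical, and every path is a caller-supplied placeholder; the real operation's file locations are not described here.

```python
from typing import List


def artifacts_to_archive(success: bool, is_windows: bool, mongod_log: str,
                         data_dir: str, backup_dirs: List[str],
                         event_logs: str) -> List[str]:
    """Choose what a tarEC2Artifacts-style step would include in the archive.

    mongod.log is always archived; on failure the data files and all
    before-recovery and after-recovery backups are added, and on a Windows
    failure the event logs are added as well.
    """
    paths = [mongod_log]
    if not success:
        paths.append(data_dir)
        paths.extend(backup_dirs)
        if is_windows:
            paths.append(event_logs)
    return paths
```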
Resmoke powercycle remote-hang-analyzer command
- runs the hang analyzer on the remote host (mainly used by Evergreen):
python buildscripts/resmoke.py powercycle remote-hang-analyzer
The powercycle remote-hang-analyzer command calls the resmoke hang analyzer on the remote host and is located in buildscripts/resmokelib/powercycle/remote_hang_analyzer/__init__.py.
The configuration for this command is loaded from the expansions.yml file, which is created by the expansions.write command in Evergreen.