forked from flux-framework/flux-core
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pull] master from flux-framework:master #38
Open
pull
wants to merge
9,562
commits into
Mattlk13:master
Choose a base branch
from
flux-framework:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
DeepCode's analysis on #fde90d found:
Top issues
👉 View analysis in DeepCode’s Dashboard | Configure the bot |
DeepCode's analysis on #b3b81a found:
Top issues
👉 View analysis in DeepCode’s Dashboard | Configure the bot |
Codecov Report
@@ Coverage Diff @@
## master #38 +/- ##
==========================================
+ Coverage 81.28% 82.98% +1.69%
==========================================
Files 247 428 +181
Lines 38262 75675 +37413
==========================================
+ Hits 31103 62800 +31697
- Misses 7159 12875 +5716
|
Problem: the sdexec code references 'flux imp kill' in an expanatory comment, but we plan to remove the kill subcommand from the IMP and have it forward signals to the cgroup. Drop that comment.
Problem: sdexec_start_transient_unit() does not support the SendSIGKILL attribute, which may need to be set to "off" for Flux guest jobs. Add support for the SendSIGKILL atttribute. To use, set the subprocess command option "SDEXEC_PROP_SendSIGKILL" to "off" (or other valid systemd boolean config value).
Problem: there is no test coverage that confirms sdexec can set KillMode and SendSIGKILL. Add some tests.
Problem: For multi-user jobs spawned via SDEXEC, the systemd user instance running as the flux user does not have permission to kill guest processes, yet it does try and in the process may kill off the only process that does have permission to continue cleanup efforts, the IMP. When the job is run by the IMP and sdexec, Set KillMode=process so that systemd only delivers signals to the IMP, which it should forward to the shell and/or cgroup per RFC 15. Also set SendSIGKILL to "off" so that SIGKILL is never deployed against the IMP. Fixes #6399
sdexec: set KillMode=process SendSIGKILL=no for multi-user jobs
Problem: There is a small chance of receiving EADDRINUSE from an overlay_bind() call if a prior test has not fully cleaned itself up. Destroy and recreate the zeroMQ context to ensure things are cleaned up before continuing on to additional tests. Fixes #6404
broker/test: avoid race in zeromq cleanup
Problem: The `defaults` job frobnicator plugin fails to override global defaults with queue defaults because the global default is applied to the jobspec before the queue defaults are examined, and since a value is already set in jobspec, the queue default is ignored. Instead of applying global defaults before examining queue defaults, update the `queue_defaults()` method to take a defaults dict and update it with queue specific defaults. Then the combined defaults are applied in one pass, avoiding the issue above. Fixes #6400
Problem: A queue-specific default that overrides a global default is not tested in the job frobnicator tests. Add a test to t2112-job-ingest-frobnicator.t that ensures global defaults can be overridden by queue defaults.
frobnicator: allow queue-specific defaults to override global defaults
Problem: One test in t0005-exec.t uses spaces instead of a tab for indentation. Fix tabbing for consistency.
Problem: In the near future additional code flux-exec will use the zlistx_t data structure. Some other code in flux-exec already uses the zhashx_t data structure. One lingering data structure still uses the zlist_t data structure. For consistency going forward, convert the lingering use of zlist_t to zlistx_t.
Problem: libsubprocess now supports stdin flow control via credits, but that is not used in flux-exec. Support credits and flow control in flux-exec to avoid overflowing the stdin buffer. Fixes #4572
Problem: sdexec does not yet support flow control Disable flow control if the service is set to sdexec. Add a --stdin-flow=on|off hidden option to force it either way.
Problem: There is no coverage to ensure that stdin flow control truly works with flux-exec. Add a test to t0005-exec.t that sends a 5 meg file while only using a 4K buffer. A buffer overflow would almost certainly happen if flow control was not working.
flux-exec: use subprocess credits to avoid overflowing stdin buffers
Problem: When the a prolog is canceled, but the cancel times out, no job exception is raised and the job proceeds to start. No only this, but the epilog can end up being run twice since there's an assumption in the code that there will be no finish event. Add a new flag to the perilog_proc struct to capture when a canceled prolog times out. Emit a specific exception in emit_finish_event() when this flag is set and force the prolog exit status to 1.
Problem: An error message is logged using `flux_log_error(3)` when the prolog times out, but there is no errno to decode, so the log message always displays `Success`. Use `flux_log(3)` with `LOG_ERR` instead of `flux_log_error(3)`.
Problem: No tests check that the perilog plugin handles the case where a job prolog is canceled and the cancel times out. Add a test to t2274-manager-perilog-per-rank.t that cancels a prolog that ignores SIGTERM and ensure the job does not start.
resolve perilog plugin issue that lets a job start after prolog timeout when cancellation fails
Problem: libsubprocess duplicates the child ends of channel socketpairs onto stdin, stdout, stderr in the child, but leave the original copies open. Exclude those channels from the idset generated by subprocess_childfds(). Both fork.c and posix_spawn.c already include logic to close descriptors that do not appear in this idset after stdio is duplicated. Fixes #6415
Problem: there are no tests that check for leaked file descriptors in subprocesses. Add test.
libsubprocess: close extra file descriptors
Problem: flux core now requires the IMP signal forwarding features of flux-security 0.13.0, but configure only checks for >= 0.9.0. Modify configure to require >= 0.13.0.
Problem: flux core now requires the IMP signal forwarding features of flux-security 0.13.0, but CI specifies requires 0.11.0. Modify docker-run-checks.sh to require the newer version.
Problem: job-exec uses 'flux imp kill' to deliver signals to multi-user jobs but that command is deprecated. Don't call bulk_exec_set_imp_path().
Problem: compilation of test program fails due to missing jansson headers. Add $(JANSSON_CFLAGS) to AM_CPPFLAGS.
Problem: compilation fails on macos when %ju or %jd is used with non-matching integer types. Cast them to [u]intmax_t.
fix macos portability issues (fifth batch)
Problem: The Rv1 scheduling key as JGF can be large and time-consuming to generate and distribute, but currently it is required on all nodes. Since the key is really only used by the scheduler on rank 0, it would be ideal if resource.scheduling was only read on rank 0. Only process the resource.scheduling config key on rank 0. Fixes #6480
resource: only read `resource.scheduling` config on rank 0
Problem: If a task closes stdin and the job shell tries to write data to it, the shell is immediately terminated with SIGPIPE, but it would be better to handle an EPIPE error instead of terminating the entire job. Block SIGPIPE in the job shell. Fixes #6487
Problem: There's no test to ensure the job shell is not killed by SIGPIPE when writing data to a task that has closed stdin. Attempt to add a test for this issue. Add a note that the test is inherently racy, and it is just a best effort at this time.
shell: ignore SIGPIPE
Problem: libflux-optparse.so has unresolved jansson symbols on focal. A recent macos portability change dropped the linker option to garbage collect unused sections, resulting in unresolved json_* symbols in libflux-optparse.so from libmissing.la on focal. The change, introduced in #6476, was supposed to set a make variable $ld_gc_sections when that option is available, but the change was incomplete and the variable was never set. Add the missing configure logic. Confirmed: - json_ symbols are no longer unresolved in libflux-optparse.so on focal - macos still builds, and json_ symbols do not appear there Fixes #6496
build: use -Wl,--gc-sections when appropriate
Problem: a unit test uses SIGRTMAX + 1 as an invalid signal, but SIGRTMAX is not defined on macos. Use NSIG instead. Although it is not standardized, it works on both linux and macos.
Problem: EBADE was remapped to EINVAL on macos in the KVS but a unit test is still looking for it. Remap there as well.
Problem: a subprocess unit test fails to compile on macos because 'environ' is undefined. Define it as an extern, as has been done elsewhere.
Problem: flux-core has been partially ported to macos, to the point of building but not running, but there is no test to notify of regressions. Add a simple build test to the CI workflow. Fixes #6484
ci: add macos builder
Problem: There's no interface to update the jobspec of a job outside of a jobtap callback for that job. Add flux_jobtap_jobspec_update_id_pack(). This function just does some nominal checks on the target job, then wraps event_post_pack().
Problem: The jobtap API call flux_jobtap_jobspec_update_id_pack() is not tested. Add it to the testing in t2212-job-manager-plugins.t.
jobtap: add `flux_jobtap_jobspec_update_id_pack()`
Problem: It would be useful to query the builtin intree or installed configuration path from libflux, but flux_conf_builtin_get() does not support that specific value. Add confdir to the list of supported keys for flux_conf_builtin_get().
Problem: Python programs do not have access to libflux builtin configuration values via flux_conf_builtin_get(). Add flux.conf_builtin.conf_builtin_get() which provides access to this function from Python. Unfortunately, the bahavior of FLUX_CONF_AUTO must be duplicated because the builtin libflux intree vs installed tests do not work when `python` is the executable (since it isn't part of Flux).
Problem: There are no unit tests for the Python interface to flux_conf_builtin_get(). Add some very simple tests.
add `flux.conf_builtin.conf_builtin_get()` to give Python access to compiled-in config values
Problem: some code does not conform to RFC 7 and project norms. Break long parameter lists to one per line. Format pack/unpack with key-value pairs on the same line.
Problem: some public functions have new footprints but the section 3 manual does not reflect this. Update manual.
Problem: some libflux interfaces use the C int type to represent buffer and I/O sizes, but size_t, as an unsigned type with more range, is more appropriate in public interfaces. Convert the following public API interfaces to use size_t: flux_msg_get_payload() flux_msg_set_payload() flux_request_decode_raw() flux_request_encode_raw() flux_response_decode_raw() flux_response_encode_raw() flux_respond_raw() flux_event_encode_raw() flux_event_decode_raw() flux_event_publish_raw() flux_rpc_raw() flux_rpc_get_raw() flux_kvs_lookup_get_raw() flux_kvs_txn_put_raw() Update tests and in-tree users, where appropriate.
Problem: several functions in the internal blobref class use int types where size_t would be more appropriate. Switch to size_t.
Problem: several functions in the internal filemap and fileref classes use int types where size_t would be more appropriate. Switch to size_t. Update unit test.
Problem: a function in the internal mpart class uses an int type where size_t would be more appropriate. Switch to size_t.
Problem: several functions in the internal stdlog class use int types where size_t would be more appropriate. Switch to size_t. Update unit test. Update users.
libflux: update API to use size_t where appropriate
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by pull[bot]. Want to support this open source service? Please star it : )