Skip to content

Commit

Permalink
Merge branch 'ds/maintenance-part-3'
Browse files Browse the repository at this point in the history
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.

* ds/maintenance-part-3:
  maintenance: add troubleshooting guide to docs
  maintenance: use 'incremental' strategy by default
  maintenance: create maintenance.strategy config
  maintenance: add start/stop subcommands
  maintenance: add [un]register subcommands
  for-each-repo: run subcommands on configured repos
  maintenance: add --schedule option and config
  maintenance: optionally skip --auto process
  • Loading branch information
gitster committed Nov 18, 2020
2 parents c042c45 + 0016b61 commit 7660da1
Show file tree
Hide file tree
Showing 17 changed files with 759 additions and 8 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@
/git-filter-branch
/git-fmt-merge-msg
/git-for-each-ref
/git-for-each-repo
/git-format-patch
/git-fsck
/git-fsck-objects
Expand Down
25 changes: 25 additions & 0 deletions Documentation/config/maintenance.txt
Original file line number Diff line number Diff line change
@@ -1,10 +1,35 @@
maintenance.auto::
This boolean config option controls whether some commands run
`git maintenance run --auto` after doing their normal work. Defaults
to true.

maintenance.strategy::
This string config option provides a way to specify one of a few
recommended schedules for background maintenance. This only affects
which tasks are run during `git maintenance run --schedule=X`
commands, provided no `--task=<task>` arguments are provided.
Further, if a `maintenance.<task>.schedule` config value is set,
then that value is used instead of the one provided by
`maintenance.strategy`. The possible strategy strings are:
+
* `none`: This default setting implies no task are run at any schedule.
* `incremental`: This setting optimizes for performing small maintenance
activities that do not delete any data. This does not schedule the `gc`
task, but runs the `prefetch` and `commit-graph` tasks hourly and the
`loose-objects` and `incremental-repack` tasks daily.

maintenance.<task>.enabled::
This boolean config option controls whether the maintenance task
with name `<task>` is run when no `--task` option is specified to
`git maintenance run`. These config values are ignored if a
`--task` option exists. By default, only `maintenance.gc.enabled`
is true.

maintenance.<task>.schedule::
This config option controls whether or not the given `<task>` runs
during a `git maintenance run --schedule=<frequency>` command. The
value must be one of "hourly", "daily", or "weekly".

maintenance.commit-graph.auto::
This integer config option controls how often the `commit-graph` task
should be run as part of `git maintenance run --auto`. If zero, then
Expand Down
59 changes: 59 additions & 0 deletions Documentation/git-for-each-repo.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
git-for-each-repo(1)
====================

NAME
----
git-for-each-repo - Run a Git command on a list of repositories


SYNOPSIS
--------
[verse]
'git for-each-repo' --config=<config> [--] <arguments>


DESCRIPTION
-----------
Run a Git command on a list of repositories. The arguments after the
known options or `--` indicator are used as the arguments for the Git
subprocess.

THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.

For example, we could run maintenance on each of a list of repositories
stored in a `maintenance.repo` config variable using

-------------
git for-each-repo --config=maintenance.repo maintenance run
-------------

This will run `git -C <repo> maintenance run` for each value `<repo>`
in the multi-valued config variable `maintenance.repo`.


OPTIONS
-------
--config=<config>::
Use the given config variable as a multi-valued list storing
absolute path names. Iterate on that list of paths to run
the given arguments.
+
These config values are loaded from system, global, and local Git config,
as available. If `git for-each-repo` is run in a directory that is not a
Git repository, then only the system and global config is used.


SUBPROCESS BEHAVIOR
-------------------

If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
then the `git for-each-repo` process returns that exit code without running
more subprocesses.

Each `git -C <repo> <arguments>` subprocess inherits the standard file
descriptors `stdin`, `stdout`, and `stderr`.


GIT
---
Part of the linkgit:git[1] suite
99 changes: 98 additions & 1 deletion Documentation/git-maintenance.txt
Original file line number Diff line number Diff line change
Expand Up @@ -29,13 +29,55 @@ Git repository.
SUBCOMMANDS
-----------

register::
Initialize Git config values so any scheduled maintenance will
start running on this repository. This adds the repository to the
`maintenance.repo` config variable in the current user's global
config and enables some recommended configuration values for
`maintenance.<task>.schedule`. The tasks that are enabled are safe
for running in the background without disrupting foreground
processes.
+
The `register` subcomand will also set the `maintenance.strategy` config
value to `incremental`, if this value is not previously set. The
`incremental` strategy uses the following schedule for each maintenance
task:
+
--
* `gc`: disabled.
* `commit-graph`: hourly.
* `prefetch`: hourly.
* `loose-objects`: daily.
* `incremental-repack`: daily.
--
+
`git maintenance register` will also disable foreground maintenance by
setting `maintenance.auto = false` in the current repository. This config
setting will remain after a `git maintenance unregister` command.

run::
Run one or more maintenance tasks. If one or more `--task` options
are specified, then those tasks are run in that order. Otherwise,
the tasks are determined by which `maintenance.<task>.enabled`
config options are true. By default, only `maintenance.gc.enabled`
is true.

start::
Start running maintenance on the current repository. This performs
the same config updates as the `register` subcommand, then updates
the background scheduler to run `git maintenance run --scheduled`
on an hourly basis.

stop::
Halt the background maintenance schedule. The current repository
is not removed from the list of maintained repositories, in case
the background maintenance is restarted later.

unregister::
Remove the current repository from background maintenance. This
only removes the repository from the configured list. It does not
stop the background maintenance processes from running.

TASKS
-----

Expand Down Expand Up @@ -110,7 +152,18 @@ OPTIONS
only if certain thresholds are met. For example, the `gc` task
runs when the number of loose objects exceeds the number stored
in the `gc.auto` config setting, or when the number of pack-files
exceeds the `gc.autoPackLimit` config setting.
exceeds the `gc.autoPackLimit` config setting. Not compatible with
the `--schedule` option.

--schedule::
When combined with the `run` subcommand, run maintenance tasks
only if certain time conditions are met, as specified by the
`maintenance.<task>.schedule` config value for each `<task>`.
This config value specifies a number of seconds since the last
time that task ran, according to the `maintenance.<task>.lastRun`
config value. The tasks that are tested are those provided by
the `--task=<task>` option(s) or those with
`maintenance.<task>.enabled` set to true.

--quiet::
Do not report progress or other information over `stderr`.
Expand All @@ -122,6 +175,50 @@ OPTIONS
`maintenance.<task>.enabled` configured as `true` are considered.
See the 'TASKS' section for the list of accepted `<task>` values.


TROUBLESHOOTING
---------------
The `git maintenance` command is designed to simplify the repository
maintenance patterns while minimizing user wait time during Git commands.
A variety of configuration options are available to allow customizing this
process. The default maintenance options focus on operations that complete
quickly, even on large repositories.

Users may find some cases where scheduled maintenance tasks do not run as
frequently as intended. Each `git maintenance run` command takes a lock on
the repository's object database, and this prevents other concurrent
`git maintenance run` commands from running on the same repository. Without
this safeguard, competing processes could leave the repository in an
unpredictable state.

The background maintenance schedule runs `git maintenance run` processes
on an hourly basis. Each run executes the "hourly" tasks. At midnight,
that process also executes the "daily" tasks. At midnight on the first day
of the week, that process also executes the "weekly" tasks. A single
process iterates over each registered repository, performing the scheduled
tasks for that frequency. Depending on the number of registered
repositories and their sizes, this process may take longer than an hour.
In this case, multiple `git maintenance run` commands may run on the same
repository at the same time, colliding on the object database lock. This
results in one of the two tasks not running.

If you find that some maintenance windows are taking longer than one hour
to complete, then consider reducing the complexity of your maintenance
tasks. For example, the `gc` task is much slower than the
`incremental-repack` task. However, this comes at a cost of a slightly
larger object database. Consider moving more expensive tasks to be run
less frequently.

Expert users may consider scheduling their own maintenance tasks using a
different schedule than is available through `git maintenance start` and
Git configuration options. These users should be aware of the object
database lock and how concurrent `git maintenance run` commands behave.
Further, the `git gc` command should not be combined with
`git maintenance run` commands. `git gc` modifies the object database
but does not take the lock in the same way as `git maintenance run`. If
possible, use `git maintenance run --task=gc` instead of `git gc`.


GIT
---
Part of the linkgit:git[1] suite
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -694,6 +694,7 @@ TEST_BUILTINS_OBJS += test-advise.o
TEST_BUILTINS_OBJS += test-bloom.o
TEST_BUILTINS_OBJS += test-chmtime.o
TEST_BUILTINS_OBJS += test-config.o
TEST_BUILTINS_OBJS += test-crontab.o
TEST_BUILTINS_OBJS += test-ctype.o
TEST_BUILTINS_OBJS += test-date.o
TEST_BUILTINS_OBJS += test-delta.o
Expand Down Expand Up @@ -1089,6 +1090,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o
BUILTIN_OBJS += builtin/fetch.o
BUILTIN_OBJS += builtin/fmt-merge-msg.o
BUILTIN_OBJS += builtin/for-each-ref.o
BUILTIN_OBJS += builtin/for-each-repo.o
BUILTIN_OBJS += builtin/fsck.o
BUILTIN_OBJS += builtin/gc.o
BUILTIN_OBJS += builtin/get-tar-commit-id.o
Expand Down
1 change: 1 addition & 0 deletions builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix);
int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
int cmd_format_patch(int argc, const char **argv, const char *prefix);
int cmd_fsck(int argc, const char **argv, const char *prefix);
int cmd_gc(int argc, const char **argv, const char *prefix);
Expand Down
58 changes: 58 additions & 0 deletions builtin/for-each-repo.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
#include "cache.h"
#include "config.h"
#include "builtin.h"
#include "parse-options.h"
#include "run-command.h"
#include "string-list.h"

static const char * const for_each_repo_usage[] = {
N_("git for-each-repo --config=<config> <command-args>"),
NULL
};

static int run_command_on_repo(const char *path,
void *cbdata)
{
int i;
struct child_process child = CHILD_PROCESS_INIT;
struct strvec *args = (struct strvec *)cbdata;

child.git_cmd = 1;
strvec_pushl(&child.args, "-C", path, NULL);

for (i = 0; i < args->nr; i++)
strvec_push(&child.args, args->v[i]);

return run_command(&child);
}

int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
{
static const char *config_key = NULL;
int i, result = 0;
const struct string_list *values;
struct strvec args = STRVEC_INIT;

const struct option options[] = {
OPT_STRING(0, "config", &config_key, N_("config"),
N_("config key storing a list of repository paths")),
OPT_END()
};

argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
PARSE_OPT_STOP_AT_NON_OPTION);

if (!config_key)
die(_("missing --config=<config>"));

for (i = 0; i < argc; i++)
strvec_push(&args, argv[i]);

values = repo_config_get_value_multi(the_repository,
config_key);

for (i = 0; !result && i < values->nr; i++)
result = run_command_on_repo(values->items[i].string, &args);

return result;
}
Loading

0 comments on commit 7660da1

Please sign in to comment.