Commit 34e849b

Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition to
the packed object data coming over the wire.

* jt/cdn-offload:
  upload-pack: fix a sparse '0 as NULL pointer' warning
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http-fetch: refactor into function
  http: refactor finish_http_pack_request()
  http: use --stdin when indexing dumb HTTP pack
2 parents 1046282 + cae2ee1 commit 34e849b

19 files changed: +750 -167 lines changed

Documentation/git-http-fetch.txt

+8-1
@@ -9,7 +9,7 @@ git-http-fetch - Download from a remote Git repository via HTTP
 SYNOPSIS
 --------
 [verse]
-'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin] <commit> <url>
+'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile=<hash> | <commit>] <url>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,13 @@ commit-id::
 
 	<commit-id>['\t'<filename-as-in--w>]
 
+--packfile=<hash>::
+	Instead of a commit id on the command line (which is not expected in
+	this case), 'git http-fetch' fetches the packfile directly at the given
+	URL and uses index-pack to generate corresponding .idx and .keep files.
+	The hash is used to determine the name of the temporary file and is
+	arbitrary. The output of index-pack is printed to stdout.
+
 --recover::
 	Verify that everything reachable from target is fetched. Used after
 	an earlier fetch is interrupted.
+78
@@ -0,0 +1,78 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises the `packfile-uris` capability.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients should then download and index all the given URIs (in addition to
+downloading and indexing the packfile given in the `packfile` section of the
+response) before performing the connectivity check.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<object-hash>
+<pack-hash> <uri>` entries. Whenever the list of objects to be sent is assembled,
+all such blobs are excluded, replaced with URIs. The client will download those
+URIs, expecting them to each point to packfiles containing single blobs.
+
+Client design
+-------------
+
+The client has a config variable `fetch.uriprotocols` that determines which
+protocols the end user is willing to use. By default, this is empty.
+
+When the client downloads the given URIs, it should store them with "keep"
+files, just like it does with the packfile in the `packfile` section. These
+additional "keep" files can only be removed after the refs have been updated -
+just like the "keep" file for the packfile in the `packfile` section.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, more sophisticated means of excluding objects (e.g. by
+   specifying a commit to represent that commit and all objects that it
+   references).
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
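
Editorial illustration (not part of the commit): the design above has the client name the protocols it accepts in a `packfile-uris` argument, which protocol-v2.txt describes as a comma-separated list. The following is a minimal C sketch of testing whether one protocol appears in such a list. The function name `protocol_in_list` is hypothetical; pack-objects.c in this commit instead receives each protocol through a separate `--uri-protocol` option and keeps them in a `string_list`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the commit: return 1 if
 * `protocol` appears as a complete item in the comma-separated
 * `list`, otherwise 0.
 */
int protocol_in_list(const char *list, const char *protocol)
{
	size_t want = strlen(protocol);

	while (*list) {
		const char *comma = strchr(list, ',');
		size_t len = comma ? (size_t)(comma - list) : strlen(list);

		/* Compare whole items so "http" does not match "https". */
		if (len == want && !strncmp(list, protocol, len))
			return 1;
		if (!comma)
			break;
		list = comma + 1;
	}
	return 0;
}
```

With this sketch, `protocol_in_list("http,https", "https")` is true, while an empty list (the documented default of `fetch.uriprotocols`) matches nothing.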

Documentation/technical/protocol-v2.txt

+38-10
@@ -325,13 +325,26 @@ included in the client's request:
 	indicating its sideband (1, 2, or 3), and the server may send "0005\2"
 	(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
 
+If the 'packfile-uris' feature is advertised, the following argument
+can be included in the client's request as well as the potential
+addition of the 'packfile-uris' section in the server's response as
+explained below.
+
+    packfile-uris <comma-separated list of protocols>
+	Indicates to the server that the client is willing to receive
+	URIs of any of the given protocols in place of objects in the
+	sent packfile. Before performing the connectivity check, the
+	client should download from all given URIs. Currently, the
+	protocols supported are "http" and "https".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
-header.
+header. Most sections are sent only when the packfile is sent.
 
-    output = *section
-    section = (acknowledgments | shallow-info | wanted-refs | packfile)
-	      (flush-pkt | delim-pkt)
+    output = acknowledgments flush-pkt |
+	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -349,13 +362,17 @@ header.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
     acknowledgments section
-	* If the client determines that it is finished with negotiations
-	  by sending a "done" line, the acknowledgments sections MUST be
-	  omitted from the server's response.
+	* If the client determines that it is finished with negotiations by
+	  sending a "done" line (thus requiring the server to send a packfile),
+	  the acknowledgments sections MUST be omitted from the server's
+	  response.
 
 	* Always begins with the section header "acknowledgments"
 
@@ -406,9 +423,6 @@ header.
 	  which the client has not indicated was shallow as a part of
 	  its request.
 
-	* This section is only included if a packfile section is also
-	  included in the response.
-
     wanted-refs section
 	* This section is only included if the client has requested a
 	  ref using a 'want-ref' line and if a packfile section is also
@@ -422,6 +436,20 @@ header.
 	* The server MUST NOT send any refs which were not requested
 	  using 'want-ref' lines.
 
+    packfile-uris section
+	* This section is only included if the client sent
+	  'packfile-uris' and the server has at least one such URI to
+	  send.
+
+	* Always begins with the section header "packfile-uris".
+
+	* For each URI the server sends, it sends a hash of the pack's
+	  contents (as output by git index-pack) followed by the URI.
+
+	* The hashes are 40 hex characters long. When Git upgrades to a new
+	  hash algorithm, this might need to be updated. (It should match
+	  whatever index-pack outputs after "pack\t" or "keep\t".)
+
     packfile section
 	* This section is only included if the client has sent 'want'
 	  lines in its request and either requested that no more
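
Editorial illustration (not part of the commit): the `packfile-uri` grammar in this file describes each entry as a pack hash, a space, and a URI. The sketch below shows one way a client could split such an entry in C. It assumes the pkt-line length header and the trailing LF have already been removed by the caller, and it follows the prose ("40 hex characters long") by requiring exactly 40 hex digits. The name `parse_packfile_uri_line` and the macro `PACK_HASH_HEXLEN` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define PACK_HASH_HEXLEN 40	/* assumption: hash length stated in the prose */

/*
 * Hypothetical helper, not taken from the commit. On success, copies
 * the hash into hash_out (NUL-terminated), points *uri_out at the URI
 * inside `line`, and returns 0. Returns -1 if the line is malformed.
 */
int parse_packfile_uri_line(const char *line,
			    char hash_out[PACK_HASH_HEXLEN + 1],
			    const char **uri_out)
{
	const char *p;
	size_t i;

	/* isxdigit() rejects NUL, so a short line fails before overrunning. */
	for (i = 0; i < PACK_HASH_HEXLEN; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	if (line[PACK_HASH_HEXLEN] != ' ')
		return -1;

	/* The grammar allows only bytes in the range 0x20-0xff in the URI. */
	for (p = line + PACK_HASH_HEXLEN + 1; *p; p++)
		if ((unsigned char)*p < 0x20)
			return -1;

	memcpy(hash_out, line, PACK_HASH_HEXLEN);
	hash_out[PACK_HASH_HEXLEN] = '\0';
	*uri_out = line + PACK_HASH_HEXLEN + 1;
	return 0;
}
```

The sketch only validates the shape of the line; whether the URI's protocol is one the client asked for is a separate check.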

builtin/fetch-pack.c

+11-6
@@ -48,8 +48,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct ref **sought = NULL;
 	int nr_sought = 0, alloc_sought = 0;
 	int fd[2];
-	char *pack_lockfile = NULL;
-	char **pack_lockfile_ptr = NULL;
+	struct string_list pack_lockfiles = STRING_LIST_INIT_DUP;
+	struct string_list *pack_lockfiles_ptr = NULL;
 	struct child_process *conn;
 	struct fetch_pack_args args;
 	struct oid_array shallow = OID_ARRAY_INIT;
@@ -134,7 +134,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 		}
 		if (!strcmp("--lock-pack", arg)) {
 			args.lock_pack = 1;
-			pack_lockfile_ptr = &pack_lockfile;
+			pack_lockfiles_ptr = &pack_lockfiles;
 			continue;
 		}
 		if (!strcmp("--check-self-contained-and-connected", arg)) {
@@ -235,10 +235,15 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	}
 
 	ref = fetch_pack(&args, fd, ref, sought, nr_sought,
-			 &shallow, pack_lockfile_ptr, version);
-	if (pack_lockfile) {
-		printf("lock %s\n", pack_lockfile);
+			 &shallow, pack_lockfiles_ptr, version);
+	if (pack_lockfiles.nr) {
+		int i;
+
+		printf("lock %s\n", pack_lockfiles.items[0].string);
 		fflush(stdout);
+		for (i = 1; i < pack_lockfiles.nr; i++)
+			warning(_("Lockfile created but not reported: %s"),
+				pack_lockfiles.items[i].string);
 	}
 	if (args.check_self_contained_and_connected &&
 	    args.self_contained_and_connected) {
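
Editorial illustration (not part of the commit): the hunk above keeps the one-line `lock <path>` output for the first lockfile and only warns about any further ones. The following standalone C sketch mirrors that reporting rule with plain arrays and stdio in place of Git's `string_list` and `warning()`. The function name `report_lockfiles` and the literal "warning: " prefix are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the commit: report the first
 * lockfile on `out` in the "lock <path>" form and mention every other
 * lockfile on `err`. Returns the number of lockfiles that were created
 * but not reported.
 */
size_t report_lockfiles(FILE *out, FILE *err,
			const char *const *paths, size_t nr)
{
	size_t i;

	if (!nr)
		return 0;

	fprintf(out, "lock %s\n", paths[0]);
	fflush(out);
	for (i = 1; i < nr; i++)
		fprintf(err, "warning: Lockfile created but not reported: %s\n",
			paths[i]);
	return nr - 1;
}
```

A caller reading the `lock` line therefore still sees exactly one path; the return value is the number of lockfiles that went unreported.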

builtin/pack-objects.c

+76
@@ -117,6 +117,8 @@ static unsigned long window_memory_limit = 0;
 
 static struct list_objects_filter_options filter_options;
 
+static struct string_list uri_protocols = STRING_LIST_INIT_NODUP;
+
 enum missing_action {
 	MA_ERROR = 0, /* fail if any missing objects are encountered */
 	MA_ALLOW_ANY, /* silently allow ALL missing objects */
@@ -125,6 +127,15 @@ enum missing_action {
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
 
+struct configured_exclusion {
+	struct oidmap_entry e;
+	char *pack_hash_hex;
+	char *uri;
+};
+static struct oidmap configured_exclusions;
+
+static struct oidset excluded_by_config;
+
 /*
  * stats
  */
@@ -969,6 +980,25 @@ static void write_reused_pack(struct hashfile *f)
 	unuse_pack(&w_curs);
 }
 
+static void write_excluded_by_configs(void)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+
+	oidset_iter_init(&excluded_by_config, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+
+		if (!ex)
+			BUG("configured exclusion wasn't configured");
+		write_in_full(1, ex->pack_hash_hex, strlen(ex->pack_hash_hex));
+		write_in_full(1, " ", 1);
+		write_in_full(1, ex->uri, strlen(ex->uri));
+		write_in_full(1, "\n", 1);
+	}
+}
+
 static const char no_split_warning[] = N_(
 "disabling bitmap writing, packs are split due to pack.packSizeLimit"
 );
@@ -1266,6 +1296,25 @@ static int want_object_in_pack(const struct object_id *oid,
 		}
 	}
 
+	if (uri_protocols.nr) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+		int i;
+		const char *p;
+
+		if (ex) {
+			for (i = 0; i < uri_protocols.nr; i++) {
+				if (skip_prefix(ex->uri,
+						uri_protocols.items[i].string,
+						&p) &&
+				    *p == ':') {
+					oidset_insert(&excluded_by_config, oid);
+					return 0;
+				}
+			}
+		}
+	}
+
 	return 1;
 }
 
@@ -2864,6 +2913,29 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 				 pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
+		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
+		const char *oid_end, *pack_end;
+		/*
+		 * Stores the pack hash. This is not a true object ID, but is
+		 * of the same form.
+		 */
+		struct object_id pack_hash;
+
+		if (parse_oid_hex(v, &ex->e.oid, &oid_end) ||
+		    *oid_end != ' ' ||
+		    parse_oid_hex(oid_end + 1, &pack_hash, &pack_end) ||
+		    *pack_end != ' ')
+			die(_("value of uploadpack.blobpackfileuri must be "
+			      "of the form '<object-hash> <pack-hash> <uri>' (got '%s')"), v);
+		if (oidmap_get(&configured_exclusions, &ex->e.oid))
+			die(_("object already configured in another "
+			      "uploadpack.blobpackfileuri (got '%s')"), v);
+		ex->pack_hash_hex = xcalloc(1, pack_end - oid_end);
+		memcpy(ex->pack_hash_hex, oid_end + 1, pack_end - oid_end - 1);
+		ex->uri = xstrdup(pack_end + 1);
+		oidmap_put(&configured_exclusions, ex);
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -3462,6 +3534,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("do not pack objects in promisor packfiles")),
 		OPT_BOOL(0, "delta-islands", &use_delta_islands,
 			 N_("respect islands during delta compression")),
+		OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
+				N_("protocol"),
+				N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
 		OPT_END(),
 	};
 
@@ -3650,6 +3725,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	}
 
 	trace2_region_enter("pack-objects", "write-pack-file", the_repository);
+	write_excluded_by_configs();
 	write_pack_file();
 	trace2_region_leave("pack-objects", "write-pack-file", the_repository);
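
Editorial illustration (not part of the commit): in `want_object_in_pack()` above, a configured blob is excluded only when its URI starts with one of the requested protocols immediately followed by `:`. The following C sketch reproduces that test in isolation, using `strncmp()` as a stand-in for Git's `skip_prefix()`. The names `uri_uses_protocol` and `uri_allowed` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the commit: return 1 if `uri`
 * begins with `protocol` immediately followed by ':', otherwise 0.
 * Requiring the ':' stops "http" from matching "https://...".
 */
int uri_uses_protocol(const char *uri, const char *protocol)
{
	size_t len = strlen(protocol);

	return !strncmp(uri, protocol, len) && uri[len] == ':';
}

/* Return 1 if `uri` uses any of the `nr` protocols, otherwise 0. */
int uri_allowed(const char *uri, const char *const *protocols, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (uri_uses_protocol(uri, protocols[i]))
			return 1;
	return 0;
}
```

Under this rule a client that lists only `http` does not cause blobs configured with `https://` URIs to be excluded from the pack.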

connected.c

+5-3
@@ -43,10 +43,12 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 
 	if (transport && transport->smart_options &&
 	    transport->smart_options->self_contained_and_connected &&
-	    transport->pack_lockfile &&
-	    strip_suffix(transport->pack_lockfile, ".keep", &base_len)) {
+	    transport->pack_lockfiles.nr == 1 &&
+	    strip_suffix(transport->pack_lockfiles.items[0].string,
+			 ".keep", &base_len)) {
 		struct strbuf idx_file = STRBUF_INIT;
-		strbuf_add(&idx_file, transport->pack_lockfile, base_len);
+		strbuf_add(&idx_file, transport->pack_lockfiles.items[0].string,
+			   base_len);
 		strbuf_addstr(&idx_file, ".idx");
 		new_pack = add_packed_git(idx_file.buf, idx_file.len, 1);
 		strbuf_release(&idx_file);
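
Editorial illustration (not part of the commit): the hunk above derives the `.idx` path from the single `.keep` lockfile by stripping the suffix and appending a new one. The following C sketch does the same with a caller-supplied buffer instead of Git's `strip_suffix()` and `strbuf`. The function name `keep_to_idx` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the commit: write the ".idx"
 * path that corresponds to `keep_path` into `out`. Returns 0 on
 * success, or -1 if `keep_path` does not end in ".keep" or `out` is
 * too small.
 */
int keep_to_idx(const char *keep_path, char *out, size_t out_size)
{
	static const char keep[] = ".keep";
	static const char idx[] = ".idx";
	size_t len = strlen(keep_path);
	size_t suffix = sizeof(keep) - 1;
	size_t base_len;

	if (len < suffix || strcmp(keep_path + len - suffix, keep))
		return -1;
	base_len = len - suffix;
	if (base_len + sizeof(idx) > out_size)
		return -1;

	memcpy(out, keep_path, base_len);
	memcpy(out + base_len, idx, sizeof(idx));	/* copies the NUL too */
	return 0;
}
```

Note that the commit takes this shortcut only when exactly one lockfile exists (`pack_lockfiles.nr == 1`); the sketch covers only the path manipulation.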
