Planning for pinning #143

Open
sdboyer opened this issue Nov 18, 2015 · 12 comments

@sdboyer
Member

sdboyer commented Nov 18, 2015

tl;dr: this is how we think glide can ensure replicable builds and still be easy to use, for all use cases we've been able to think of.

To help ensure we do pinning properly, @mattfarina @technosophos and I spent a bit of time on videochat, and drew a purty thing. I'm putting it up here, along with my understanding of what we discussed, for reference:

[Image: img_20151118_122939, the hand-drawn matrix of disk states against user commands]

The matrix describes the set of possible disk states glide might encounter, against the set of possible commands a user might run. Our goal was to articulate a basic class of behavior for each of those combinations.

For this diagram to make sense, it's important to understand the overall strategy glide is pursuing: glide ALWAYS tries to create the most reproducible build possible. This entails that, for each main package and its corresponding glide.yaml, all dependencies (direct and transitive) should be pinned to specific, immutable versions. There's no wild-westing, no managing some packages but not others. Glide's sole behavior is to strive for completely deterministic builds; you don't get to turn it off.

Other language package managers have historically been lax in this regard. We think that we can make glide's forced determinism easy and transparent enough that even those who "just want to run HEAD" won't be bothered by it, and will still get all the benefits of deterministic builds.

We think this approach is worthwhile because it makes glide's operation less complex and more predictable, and exposes the user to fewer possibly confusing results. It allows us to say that, for every command that is run, we target exactly one class of output disk states (in the matrix, it's the state with an orange star):

  • a valid glide.yaml file, comprising (at least) all direct dependencies
  • a lock file that is correct with respect to the version bounds established in the glide.yaml file
  • a vendor directory containing packages that match exactly the pinned versions described in the lock file

Because there is only one class of target output states, there is no ambiguity about how glide should deal with different types of user actions. This is not only because the end goal is clear, but also because glide itself would never intentionally leave the disk in anything other than the correct output state. Therefore, any on-disk state glide finds represents user intent.

These rules amount to removing circular dependencies. M. C. Escher is great, but you don't want him in your package manager.

The disk states in the matrix aren't quite exhaustive, but they cover most cases. They are:

  • No glide.yaml: basically, when you first start out. No glide.yaml implies no lock file and no vendor (if they exist, they are considered detritus)
  • Has glide.yaml, no lock file:
  • Has glide.yaml, outdated lock file: the lock file always contains a hash of the dependencies from the version of the glide.yaml file from which it was generated. If the hash does not match, we know the lock file is out of date.
  • Has glide.yaml, lock file, wrong local state: there is a discrepancy between what's in the lock file and what's checked out in the vendor directory.
  • Has glide.yaml, lock file, correct local state: this is the 'blessed' state that all commands seek to reach.
  • Has empty glide.yaml and .go files contain some external dependencies: commonly the case when starting to use glide on a new project.
  • Has empty glide.yaml and .go files contain no external dependencies: also common for new projects.
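
The first five states above can be sketched as a small classifier. This is only an illustration of the idea, not glide's code: the type and function names, the Snapshot struct, the use of SHA-256, and the "name@constraint" encoding of dependencies are all assumptions made for the example. The two "empty glide.yaml" states are left out because telling them apart requires scanning the project's .go files.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// DiskState names the on-disk situations from the list above.
// The identifiers are hypothetical, not taken from glide's source.
type DiskState int

const (
	NoYaml DiskState = iota
	YamlNoLock
	YamlOutdatedLock
	YamlLockWrongVendor
	Blessed
)

func (s DiskState) String() string {
	return [...]string{"NoYaml", "YamlNoLock", "YamlOutdatedLock", "YamlLockWrongVendor", "Blessed"}[s]
}

// DepHash hashes the sorted dependency entries, so the result does not
// depend on listing order. SHA-256 and the input encoding are assumptions.
func DepHash(deps []string) string {
	sorted := append([]string(nil), deps...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))
	return hex.EncodeToString(sum[:])
}

// Snapshot is a hypothetical summary of what was found on disk.
type Snapshot struct {
	HasYaml  bool
	YamlDeps []string          // "name@constraint" entries from glide.yaml
	HasLock  bool
	LockHash string            // hash recorded in the lock file
	Locked   map[string]string // package -> pinned revision (lock file)
	Vendored map[string]string // package -> revision checked out in vendor/
}

// Classify checks the conditions in order, from least to most complete state.
func Classify(s Snapshot) DiskState {
	switch {
	case !s.HasYaml:
		return NoYaml
	case !s.HasLock:
		return YamlNoLock
	case s.LockHash != DepHash(s.YamlDeps):
		return YamlOutdatedLock
	case !sameRevisions(s.Locked, s.Vendored):
		return YamlLockWrongVendor
	default:
		return Blessed
	}
}

// sameRevisions reports whether both maps pin the same packages to the same revisions.
func sameRevisions(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for pkg, rev := range a {
		if other, ok := b[pkg]; !ok || other != rev {
			return false
		}
	}
	return true
}

func main() {
	deps := []string{"example.com/a@^1.0.0"}
	pins := map[string]string{"example.com/a": "rev1"}
	s := Snapshot{HasYaml: true, YamlDeps: deps, HasLock: true,
		LockHash: DepHash(deps), Locked: pins, Vendored: pins}
	fmt.Println(Classify(s))
	s.LockHash = "stale"
	fmt.Println(Classify(s))
}
```

Checking the cheapest conditions first means an outdated lock file is reported before any comparison against vendor/ is attempted.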

The other dimension is the user action (so, the command run by the user). These are pretty self-explanatory, I think?

We used three marks in the matrix. We weren't very rigorous about their definition, but it goes something like this:

  • Red X: Running this command from this disk state is an error, and the command will exit without modifying disk state at all.
  • Orange triangle: Running this command from this disk state is acceptable. glide will probably do more than the user had directly intended, but the results should still be predictable and non-problematic. For example, running glide get with a yaml and no lock or vendor will effectively add your new dep to the glide.yaml, then run a full glide install in order to write out the entire lockfile and fully populate vendor/.
  • Green circle: from this disk state, this command will do pretty much just what the user would expect it would do.
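
One way to picture the matrix in code is a lookup table keyed by disk state and command. The sketch below is hypothetical: the names are invented, and only the single cell spelled out in the text (glide get with a yaml but no lock or vendor) is filled in, since the rest of the matrix lives in the image. Any other cell reports Unknown.

```go
package main

import "fmt"

// Outcome mirrors the three marks used in the matrix.
type Outcome int

const (
	Unknown    Outcome = iota // cell not described in the text
	Error                     // red X: exit without touching disk state
	Acceptable                // orange triangle: does more than asked, predictably
	Expected                  // green circle: does what the user expects
)

func (o Outcome) String() string {
	switch o {
	case Error:
		return "red X"
	case Acceptable:
		return "orange triangle"
	case Expected:
		return "green circle"
	default:
		return "unknown"
	}
}

// YamlNoLockNoVendor is a hypothetical label for one disk state.
const YamlNoLockNoVendor = "yaml, no lock, no vendor"

// cell identifies one square of the matrix.
type cell struct {
	state   string
	command string
}

// matrix holds only the cell documented in the text above.
var matrix = map[cell]Outcome{
	cell{YamlNoLockNoVendor, "get"}: Acceptable,
}

// Lookup returns the mark for a state/command pair, or Unknown.
func Lookup(state, command string) Outcome {
	return matrix[cell{state, command}]
}

func main() {
	fmt.Println(Lookup(YamlNoLockNoVendor, "get"))
	fmt.Println(Lookup(YamlNoLockNoVendor, "install"))
}
```
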
@mattfarina mattfarina added this to the 0.8.0 milestone Nov 18, 2015
mattfarina added a commit that referenced this issue Nov 23, 2015
- Removed legacy guess command now that create does this.
- Added GuardYaml command to guard against overwriting yaml file.
- Works with -y global flag.
@imikushin

Regarding prune command: it would be great to have an option (--aggressive?) to remove all *.go files from packages not imported (transitively) in the project's Go sources, not just the unspecified subpackages (in glide.yaml). All non-.go files except LICENSE(.md), and empty dirs, would also be removed.

@technosophos
Member

What's the use case for --aggressive? I don't think I've ever seen a package manager that behaved that way.

@imikushin

Well, if you want to store Go libraries under version control, you find yourself wanting to minimize the amount of vendored code to the bare minimum. Docker does it with a cryptic shell script. I'd like to use glide :)


@sdboyer
Member Author

sdboyer commented Nov 24, 2015

to remove all *.go files from packages not imported (transitively) in the project's Go sources

@imikushin i think what you're looking for is unreachable/dead code elimination, yes? I'd say that's probably out of scope for a package manager (though I can understand why, if you were to need that, it would be convenient to attach it to the package manager).

I'm not sure there's really anything in Go-dom that does that level of analysis outside of the compiler itself. The closest thing I can readily find is go tool vet -unreachable. But that's really limited static analysis; afaik it is based solely on whether or not an identifier is ever referenced, at all, by anything else in the search scope. What I suspect you're looking for is determining if it's possible to eliminate any Go files based on the specific identifiers that are transitively called from your entry point (the current main.main()).

That's a harder and more expensive problem. You'd probably have to run some version of a connected components algorithm to figure out which identifiers actually can be safely eliminated (though, thinking through it right now, it seems like a tree/map could be sufficient...), then see if there are any files comprised entirely of unused identifiers. Worth doing in a compiler if you're already traversing all the code, but...
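
To make that concrete, here is a rough sketch of the "which files are made up entirely of unused identifiers" idea, using a plain breadth-first walk over a reference graph. The maps are hand-written stand-ins for what a real tool would have to extract from parsed source, all names are hypothetical, and the sketch ignores things like init functions, reflection, and build tags that a real analysis could not ignore.

```go
package main

import (
	"fmt"
	"sort"
)

// Reachable returns every identifier that can be reached from entry by
// following the references in refs (identifier -> identifiers it uses).
func Reachable(refs map[string][]string, entry string) map[string]bool {
	seen := map[string]bool{entry: true}
	queue := []string{entry}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range refs[cur] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return seen
}

// RemovableFiles returns, in sorted order, the files whose declared
// identifiers are all unreachable (decls maps file -> identifiers it declares).
func RemovableFiles(decls map[string][]string, reachable map[string]bool) []string {
	var out []string
	for file, ids := range decls {
		used := false
		for _, id := range ids {
			if reachable[id] {
				used = true
				break
			}
		}
		if !used {
			out = append(out, file)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical reference graph and file layout.
	refs := map[string][]string{
		"main.main": {"lib.Parse"},
		"lib.Parse": {"lib.scan"},
		"lib.Dump":  {"lib.scan"},
	}
	decls := map[string][]string{
		"lib/parse.go": {"lib.Parse", "lib.scan"},
		"lib/dump.go":  {"lib.Dump"},
	}
	fmt.Println(RemovableFiles(decls, Reachable(refs, "main.main")))
}
```

A file survives as soon as one of its identifiers is reachable, which is why the savings from this kind of pruning depend heavily on how a dependency splits its code across files.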

End of the day, though, I'm inclined to think it's not something glide should do because it breaks the basic guarantee that glide leaves the disk state (aka vendor/) an exact reflection of the information recorded in the lock file. It makes for at least one additional row in the diagram - disk in "normal" or "aggressively pruned" state. That necessarily introduces a possible gotcha+step in local workflows: when a developer working off of a dependency's godoc calls a new function that aggressive pruning had eliminated, glide has to rerun in order to recompute the new aggressively pruned disk state. ...but the developer won't know - all they'll see is a compiler error for a missing function, and they'll scratch their head, because it's right there in the docs.

TBH, I haven't experienced a case myself where this kind of thing would really add much value. Could you provide a link to Docker's "cryptic shell script"?

@imikushin

@sdboyer Thanks for your reply. The proposed --aggressive option is an ideal (very) nice to have feature, but definitely not a hard requirement.

A must-have, though, is the prune command that will remove the unspecified packages (per glide.yaml), i.e. *.go files from the unspecified dirs. This is possible without analyzing the project's source code and provides a one-to-one relationship between the disk state and glide.yaml content (independent of the project's source code).

The cryptic shell script I mentioned is this one:
https://github.com/docker/docker/blob/master/hack/.vendor-helpers.sh
And here is its usage: https://github.com/docker/docker/blob/master/hack/vendor.sh

I failed to adapt it to my project and instead patched glide :)

@mattfarina
Member

@imikushin what's the use case? Why prune all the packages that aren't specified at a more detailed level? "As a developer, ...."?

@mattfarina
Member

e370601 has glide update writing a lock file (glide.lock).

@sdboyer
Member Author

sdboyer commented Nov 25, 2015

@imikushin ...tbh i'm still a bit confused about what you're asking for. You say:

*.go files from the unspecified dirs

I'm not sure that's what you actually want? To literally do what you're saying there, it means not removing non-*.go files from package directories that are otherwise unused, and even preserving potentially empty directories. This specific phrasing is why I inferred you were looking for dead/unreachable code elimination.

However, from this:

This is possible without projects source code analysis and provides one-to-one relationship between the disk state and glide.yaml content (independent of the projects source code).

And judging from what the docker script does, I think you're actually just asking for a tool that removes whole repositories that are present under vendor/, but not specified in glide.yaml. If that's the case, that is very much the plan. There's an argument to be made that it should be done by all commands, being part of the guarantee of the "blessed state" originally described. If it's not, though, then that's exactly what prune would do, without any --aggressive option. In fact, I can't think of anything prune would do other than this, so I'm not sure what work you were imagining prune would do WITHOUT --aggressive.
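
As a minimal sketch of that kind of pruning, assuming the repository roots under vendor/ and the packages named in glide.yaml have already been collected into string slices (the function name and the example.com paths are hypothetical), the work reduces to a set difference:

```go
package main

import (
	"fmt"
	"sort"
)

// StaleRepos returns, in sorted order, the vendored repository roots
// that are not listed among the specified packages.
func StaleRepos(vendored, specified []string) []string {
	wanted := make(map[string]bool, len(specified))
	for _, name := range specified {
		wanted[name] = true
	}
	var stale []string
	for _, name := range vendored {
		if !wanted[name] {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	vendored := []string{"example.com/x/old", "example.com/x/kept"}
	specified := []string{"example.com/x/kept"}
	// Each returned root is a whole directory that could be removed from vendor/.
	fmt.Println(StaleRepos(vendored, specified))
}
```

Because whole repositories are removed or kept, nothing inside a surviving checkout is touched.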

@sdboyer
Member Author

sdboyer commented Nov 25, 2015

Oh, I also meant to note in my original response - carving up stuff under vendor/ is also really out of scope for glide because that would make for dirty trees from the VCS' perspective, which would considerably complicate the interactions with them. No bueno.

@imikushin

Sorry for not expressing my use case clearly enough. Here it is, in full. Maybe I'm doing something you guys think is stupid. If that's the case, I want to know what it is. Here's the use case:

As a maintainer of a project written in the Go programming language that uses quite a few source code dependencies (one of which is docker, which is pretty large), I need to manage these dependencies. The project has a policy (imposed by the project owner) to store all source-level dependencies under version control.

I don't need to store these dependencies entirely. Storing just the relevant parts is fine, as long as the tool ensures repeatability: given the same dependency specification, the tool should put the same content into the ./vendor directory.

Now, minimizing dependencies' footprint is important because most of these files are just taking up space and bandwidth, slowing down the project checkouts and builds.

So, I need a tool to retrieve the specified dependencies and strip the unneeded parts.

I'm currently using my own glide fork to work on RancherOS:

  1. Edit glide.yaml
  2. Run ~/bin/glide up -u --quick --skip-gopath --cache --delete. This puts all dependencies into the ./vendor dir. To my surprise, --delete only works before retrieving dependencies, so I also need to:
  3. Run ~/bin/glide del, which is my custom command, based on a patched version of glide's delete.go

This is good enough, but glide can do better:

  • To have the del (or call it prune) command implemented in upstream glide
  • Better yet have it as an option to the up command (--delete doesn't work, who'd guess?)
  • Have prune remove the packages not explicitly specified in glide.yaml. (Ideally, remove the non-imported code, but I agree, that is way too much for the tool to stay simple.)

By removing packages I mean removing all files from the corresponding directories (except LICENSEs and READMEs) and then remove empty dirs as well.

Why not just remove the dirs with rm -rf? Because, we might need a subpackage, but not its parent package, e.g. like this:

- package: github.com/docker/libcompose
  version: 0919e089edff3ba95d84119228f46d414882ded1
  subpackages:
  - cli
  - docker
  - logger
  - lookup
  - project
  - utils

In this particular case I'd like ./vendor/github.com/docker/libcompose dir to only contain cli, docker, logger, lookup, project and utils sub-directories and maybe the license notice and a readme, but nothing else.
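
The rule described above could be sketched roughly as follows. This is not the patched delete.go. It is a hypothetical helper that takes slash-separated file paths relative to the package root and returns the ones to delete. Two assumptions are baked in: LICENSE and README files are kept wherever they appear, and everything beneath a listed subpackage directory is kept, including nested directories. The file names in main are invented for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isNotice reports whether a file looks like a license or readme,
// judged only by its base name (an assumption of this sketch).
func isNotice(file string) bool {
	base := strings.ToUpper(path.Base(file))
	return strings.HasPrefix(base, "LICENSE") || strings.HasPrefix(base, "README")
}

// underKept reports whether file lives inside one of the kept subpackage directories.
func underKept(file string, keep []string) bool {
	for _, sub := range keep {
		if strings.HasPrefix(file, sub+"/") {
			return true
		}
	}
	return false
}

// PruneList returns the files to delete, preserving input order.
func PruneList(files, keep []string) []string {
	var remove []string
	for _, f := range files {
		if underKept(f, keep) || isNotice(f) {
			continue
		}
		remove = append(remove, f)
	}
	return remove
}

func main() {
	keep := []string{"cli", "docker", "logger", "lookup", "project", "utils"}
	files := []string{ // hypothetical contents of the package root
		"LICENSE", "README.md", "Makefile",
		"cli/main.go", "docker/service.go", "version/version.go",
	}
	fmt.Println(PruneList(files, keep))
}
```

Directories left empty after deleting the returned files would then be removed in a second pass.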

@mattfarina
Member

Anyone following along, now is a good time to test the feat/lockfile branch. The init, update, and install commands are all in working shape. Docs do still need updating.

@mattfarina
Member

This should remain open until purge has been implemented. The other elements are ready to be tested.
