forked from github/platform-samples
Merge pull request github#248 from github/ls/scripts
move scripts over from git-repo-analysis
Showing 12 changed files with 657 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Git Repo Analysis Scripts | ||
|
||
Git can become slow if a repository exceeds certain thresholds ([read this for details](http://larsxschneider.github.io/2016/09/21/large-git-repos)). Use the scripts explained below to identify possible culprits in a repository. The scripts have been tested on macOS but they should run on Linux as is. | ||
|
||
_Hint:_ The scripts can run for a long time and output a lot of lines. Pipe their output to a file (`./script > myfile`) for further processing. | ||
|
||
## Large by File Size | ||
Use the [git-find-large-files](git-find-large-files) script to identify large files in your Git repository that you could move to [Git LFS](https://git-lfs.github.com/) (e.g. using [git-lfs-migrate](https://github.com/git-lfs/git-lfs/blob/master/docs/man/git-lfs-migrate.1.ronn)). | ||
|
||
Use the [git-find-lfs-extensions](git-find-lfs-extensions) script to identify certain file types that you could move to [Git LFS](https://git-lfs.github.com/). | ||
|
||
## Large by File Count | ||
Use the [git-find-dirs-many-files](git-find-dirs-many-files) and [git-find-dirs-unwanted](git-find-dirs-unwanted) scripts to identify directories with a large number of files. These might indicate 3rd party components that could be extracted. | ||
|
||
Use the [git-find-dirs-deleted-files](git-find-dirs-deleted-files) script to identify directories that have been deleted and used to contain a lot of files. If you purge all files under these directories from your history, you might be able to significantly reduce the overall size of your repository. | ||
|
||
|
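
The hint about piping output to a file can be illustrated with a small post-processing step. This is only a sketch: it assumes a report was saved from `git-find-dirs-deleted-files` in its documented `[count] [deleted|present] [directory]` layout, and the function name `deleted_dirs_over` and the sample rows are hypothetical.

```shell
# Keep only directories that are gone from HEAD and that lost at least
# MIN files. Expects "<count> <deleted|present> <directory>" lines.
deleted_dirs_over() {
  min=$1
  file=$2
  awk -v min="$min" '$2 == "deleted" && $1 + 0 >= min + 0' "$file"
}

# Hypothetical sample standing in for:
#   ./git-find-dirs-deleted-files > report.txt
report=$(mktemp) || exit 1
printf '%s\n' \
  '  120 deleted vendor/libfoo' \
  '    3 present src' \
  '   40 deleted tmp' > "$report"

result=$(deleted_dirs_over 100 "$report")
printf '%s\n' "$result"
rm -f "$report"
```

With the sample rows, only the `vendor/libfoo` line passes the filter, because it is the only deleted directory with at least 100 files.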
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# Fix an invalid committer/author in all commits of your repository. | ||
# | ||
# Usage: | ||
# git-change-author <old-email> <new-name> <new-email> | ||
# | ||
# Author: Lars Schneider, https://github.com/larsxschneider | ||
# | ||
|
||
filter=$(cat <<EOF | ||
OLD_EMAIL='$1' | ||
NEW_NAME='$2' | ||
NEW_EMAIL='$3' | ||
if [ "\$GIT_COMMITTER_EMAIL" = "\$OLD_EMAIL" ] | ||
then | ||
export GIT_COMMITTER_NAME="\$NEW_NAME" | ||
export GIT_COMMITTER_EMAIL="\$NEW_EMAIL" | ||
fi | ||
if [ "\$GIT_AUTHOR_EMAIL" = "\$OLD_EMAIL" ] | ||
then | ||
export GIT_AUTHOR_NAME="\$NEW_NAME" | ||
export GIT_AUTHOR_EMAIL="\$NEW_EMAIL" | ||
fi | ||
EOF | ||
) | ||
|
||
git filter-branch --env-filter "$filter" --tag-name-filter cat -- --all |
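
The escaping in the here-document is easy to misread, so here is a hedged sketch of how it behaves without touching a repository. The script's arguments are expanded when the filter text is built, while the `\$` variables stay literal and are expanded only when the filter is evaluated. The `build_filter` helper, the sample identity, and the `eval` call (which stands in for filter-branch evaluating the filter once per commit) are assumptions for illustration.

```shell
# Build the env-filter text. $1..$3 are expanded now; the escaped \$
# variables remain literal and are expanded when the filter is evaluated.
build_filter() {
cat <<EOF
OLD_EMAIL='$1'
NEW_NAME='$2'
NEW_EMAIL='$3'
if [ "\$GIT_COMMITTER_EMAIL" = "\$OLD_EMAIL" ]
then
    export GIT_COMMITTER_NAME="\$NEW_NAME"
    export GIT_COMMITTER_EMAIL="\$NEW_EMAIL"
fi
EOF
}

filter=$(build_filter 'old@example.com' 'New Name' 'new@example.com')

# Simulate one commit's environment (hypothetical values) and apply the filter.
GIT_COMMITTER_NAME='Old Name'
GIT_COMMITTER_EMAIL='old@example.com'
eval "$filter"
echo "$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
```

Because the arguments are wrapped in single quotes inside the generated text, a name or email containing a single quote would break the filter. That limitation applies to the original script as well.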
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# Print the number of deleted files per directory. The output indicates | ||
# if the directory is present in the HEAD revision. | ||
# | ||
# A deleted directory with a lot of files could indicate a 3rd party | ||
# component that has been deleted. These are usually good candidates for | ||
# purging to make Git repositories smaller (see `git-purge-files`). | ||
# | ||
# The script must be called from the root of the Git repository. | ||
# | ||
# Usage: | ||
# git-find-dirs-deleted-files | ||
# | ||
# Output: [deleted file count] [directory still in HEAD revision?] [directory] | ||
# | ||
# Author: Lars Schneider, https://github.com/larsxschneider | ||
# | ||
|
||
git -c diff.renameLimit=30000 log --diff-filter=D --summary | | ||
grep ' delete mode ...... ' | | ||
sed 's/ delete mode ...... //' | | ||
while read -r F ; do | ||
D=$(dirname "$F"); | ||
if ! [ -d "$D" ]; then | ||
while ! [ -d "$(dirname "$D")" ] ; do D=$(dirname "$D"); done; | ||
echo "deleted $D"; | ||
else | ||
echo "present $D"; | ||
fi; | ||
done | | ||
sort | | ||
uniq -c | | ||
sort -k 2,2 -r |
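
The extraction step of this pipeline can be tried on canned input. The sketch below is a simplified variant, not the script itself: it counts deletions per immediate parent directory and skips the walk up to the topmost missing directory, which needs a real working tree. The function name `count_deleted_per_dir` and the sample lines (shaped like `git log --summary` output) are assumptions.

```shell
# Read `git log --summary` style text on stdin and print
# "<deleted file count> <directory>", sorted by directory.
count_deleted_per_dir() {
  grep ' delete mode ...... ' |
    sed 's/ delete mode ...... //' |
    while read -r f; do dirname "$f"; done |
    awk '{ count[$0]++ } END { for (d in count) print count[d], d }' |
    sort -k 2
}

result=$(printf '%s\n' \
  ' delete mode 100644 vendor/lib/a.c' \
  ' delete mode 100644 vendor/lib/b.c' \
  ' create mode 100644 src/main.c' \
  ' delete mode 100755 tools/run.sh' | count_deleted_per_dir)
printf '%s\n' "$result"
```

Counting in `awk` instead of `uniq -c` avoids the platform-dependent padding that `uniq -c` puts in front of each count.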
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# Print directories with the number of files underneath them. | ||
# | ||
# A directory with a lot of files could indicate a 3rd party component. | ||
# These are usually good candidates for purging to make Git repositories | ||
# smaller (see `git-purge-files`). | ||
# | ||
# The script must be called from the root of the Git repository. | ||
# | ||
# Usage: | ||
# git-find-dirs-many-files [file count threshold] | ||
# | ||
# Author: Lars Schneider, https://github.com/larsxschneider | ||
# | ||
|
||
if [ -z "$1" ]; then | ||
FILE_COUNT=100 | ||
else | ||
FILE_COUNT=$1 | ||
fi | ||
|
||
IFS=$'\n'; | ||
DIRS=$(find . -type d -not -path "./.git/*" -exec bash -c 'COUNT=$(find "$0" -type f | wc -l); echo "$COUNT $0"' {} \; | sort -nr) | ||
|
||
for DIR in $DIRS; do | ||
if [ $(($(echo $DIR | sed 's/\..*//'))) -le $FILE_COUNT ]; then | ||
break | ||
fi | ||
echo $DIR | ||
done |
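
The counting logic can be exercised on a throwaway directory tree. This is a minimal sketch under stated assumptions: the function name `dirs_with_many_files` and the demo tree are hypothetical, and the loop filters every row against the threshold instead of breaking out of a sorted list as the script does.

```shell
# Print "<file count> <directory>" for each directory below ROOT whose
# recursive file count is greater than THRESHOLD, largest first.
dirs_with_many_files() {
  root=$1
  threshold=$2
  find "$root" -type d ! -path '*/.git/*' | while read -r dir; do
    # tr strips the padding some wc implementations add.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$count" -gt "$threshold" ]; then
      echo "$count $dir"
    fi
  done | sort -nr
}

# Hypothetical demo tree: 4 files in total, 3 of them under vendor/.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/vendor/lib" "$demo/src"
touch "$demo/vendor/lib/a" "$demo/vendor/lib/b" "$demo/vendor/c" "$demo/src/main"

result=$(cd "$demo" && dirs_with_many_files . 2)
printf '%s\n' "$result"
rm -rf "$demo"
```

With a threshold of 2, only `.` (4 files) and `./vendor` (3 files) are reported. `./vendor/lib` has exactly 2 files and is dropped, matching the script's "greater than" comparison.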
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# git-find-dirs-unwanted.sh | ||
# | ||
# Search the entire history of a Git repository for (potentially) | ||
# unwanted directories. E.g. 3rd party directories, temp, build or | ||
# Perforce stream directories. | ||
# | ||
# The script prints the number of files under each directory to see the | ||
# impact on the Git tree. Directories with a large number of files can | ||
# be good candidates for exclusions in repository migrations to Git. | ||
# | ||
# The script must be called in the Git root directory. | ||
# | ||
# Author: Lars Schneider, https://github.com/larsxschneider | ||
# | ||
|
||
DIRS=$(git -c diff.renameLimit=30000 log --all --name-only --pretty=format: \ | ||
| awk -F'[^/]*$' '{print $1}' \ | ||
| sort -u \ | ||
| grep -i \ | ||
-e 3p \ | ||
-e 3rd \ | ||
-e artifacts \ | ||
-e assemblies \ | ||
-e backup \ | ||
-e bin \ | ||
-e build \ | ||
-e components \ | ||
-e debug \ | ||
-e deploy \ | ||
-e generated \ | ||
-e install \ | ||
-e lib \ | ||
-e modules \ | ||
-e obj \ | ||
-e output \ | ||
-e packages \ | ||
-e party \ | ||
-e recycle.bin \ | ||
-e release \ | ||
-e resources \ | ||
-e streams \ | ||
-e temp \ | ||
-e third \ | ||
-e tmp \ | ||
-e tools \ | ||
-e util \ | ||
-e vendor \ | ||
-e x64 \ | ||
-e x86 \ | ||
) | ||
|
||
IFS=$'\n' | ||
for I in $DIRS; do | ||
if [ -e "$I" ]; then | ||
FILE_COUNT=$(find "$I" -type f | wc -l) | ||
echo "$FILE_COUNT $I" | ||
else | ||
while ! [ -e "$(dirname "$I")" ]; do | ||
I=$(dirname "$I")/; | ||
done; | ||
echo "deleted $I" | ||
fi | ||
done | sort -n -r | uniq |
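
The `awk -F'[^/]*$'` idiom deserves a short illustration. Using the trailing run of non-slash characters as the field separator leaves the directory part of each path in `$1`. The sketch below applies it to canned paths with a shortened keyword list. The function name `candidate_dirs` and the sample paths are hypothetical.

```shell
# Reduce file paths on stdin to their unique directory parts and keep
# only those matching a few of the "unwanted" keywords (case-insensitive).
candidate_dirs() {
  awk -F'[^/]*$' '{ print $1 }' |
    sort -u |
    grep -i -e vendor -e tmp -e build
}

result=$(printf '%s\n' \
  'vendor/libfoo/foo.c' \
  'vendor/libfoo/bar.c' \
  'src/main.c' \
  'Build/out.o' | candidate_dirs)
printf '%s\n' "$result"
```

The keywords are matched as plain substrings, so short entries in the script's list such as `lib` or `bin` will also match names like `library/` or `cabinet/`. The output is a list of candidates to review, not a definitive verdict.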
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# Find all files present in the index and working tree ignored by .gitignore. | ||
# | ||
# Usage: | ||
# git-find-ignored-files [-s | --sort-by-size] [--help] | ||
# | ||
# Author: Patrick Lühne, https://www.luehne.de/ | ||
# | ||
|
||
function print_help | ||
{ | ||
grep -A 1 "^# Usage" < "$0" | cut -c 3- | ||
} | ||
|
||
if [[ $# -gt 1 ]] | ||
then | ||
print_help | ||
exit 1 | ||
fi | ||
|
||
case "$1" in | ||
-h|--help) | ||
print_help | ||
exit 0 | ||
;; | ||
-s|--sort-by-size) | ||
;; | ||
*) | ||
if [[ $# -gt 0 ]] | ||
then | ||
(>&2 echo "error: unknown option “$1”") | ||
print_help | ||
exit 1 | ||
fi | ||
;; | ||
esac | ||
|
||
# Find all ignored files | ||
files=$(git ls-files --ignored --exclude-standard) | ||
|
||
# Stop if no ignored files were found | ||
if [[ -z $files ]] | ||
then | ||
(>&2 echo "info: no ignored files in working tree or index") | ||
exit 0 | ||
fi | ||
|
||
# Compute the file sizes of all these files | ||
file_sizes=$(echo "$files" | tr '\n' '\0' | xargs -0 du -sh) | ||
|
||
# Obtain the origins why these files are ignored | ||
gitignore_origins=$(echo "$files" | git check-ignore --verbose --stdin --no-index) | ||
|
||
# Merge the two lists into one | ||
command="join -1 2 -2 2 -t $'\t' -o 1.1,1.2,2.1 <(echo \"$file_sizes\") <(echo \"$gitignore_origins\")" | ||
|
||
if [[ $1 =~ ^(-s|--sort-by-size)$ ]] | ||
then | ||
command="$command | sort -h" | ||
fi | ||
|
||
eval "$command" |
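
The merge step can be sketched in isolation. `join` pairs the `du` rows (size, path) with the `git check-ignore --verbose` rows (origin, path) on the path column. The sample rows below are hypothetical, and the sketch uses temporary files instead of process substitution so that it also runs in a plain POSIX shell. Note that `join` expects both inputs to be sorted on the join field.

```shell
tab=$(printf '\t')
tmp=$(mktemp -d) || exit 1

# Hypothetical stand-ins for the two intermediate lists, tab-separated
# and already sorted by path (the second column).
printf '4.0K\tbuild/a.o\n12K\tdebug.log\n' > "$tmp/sizes"
printf '.gitignore:1:build/\tbuild/a.o\n.gitignore:2:*.log\tdebug.log\n' > "$tmp/origins"

# Join on column 2 of both files; print size, path, then ignore origin.
merged=$(join -1 2 -2 2 -t "$tab" -o 1.1,1.2,2.1 "$tmp/sizes" "$tmp/origins")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

Passing file names to `join` directly also avoids building a command string for `eval`, which is one way to sidestep quoting problems with unusual file names.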
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# Print the largest files in a Git repository. The script must be called | ||
# from the root of the Git repository. You can pass a threshold to print | ||
# only files greater than a certain size (compressed size in Git database, | ||
# default is 500 KB). | ||
# | ||
# Files that have a large compressed size should usually be stored in | ||
# Git LFS [2]. | ||
# | ||
# Based on script from Antony Stubbs [1] and improved with ideas from Peff. | ||
# | ||
# [1] http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/ | ||
# [2] https://git-lfs.github.com/ | ||
# | ||
# Usage: | ||
# git-find-large-files [size threshold in KB] | ||
# | ||
# Author: Lars Schneider, https://github.com/larsxschneider | ||
# | ||
|
||
if [ -z "$1" ]; then | ||
MIN_SIZE_IN_KB=500 | ||
else | ||
MIN_SIZE_IN_KB=$1 | ||
fi | ||
|
||
# Use "look" if it is available, otherwise use "grep" (e.g. on Windows) | ||
if look >/dev/null 2>&1; then | ||
# On Debian the "-b" is available and required to make "look" perform | ||
# a binary search (see https://unix.stackexchange.com/a/499312/275508 ). | ||
if look 2>&1 | grep -q .-b; then | ||
search="look -b" | ||
else | ||
search=look | ||
fi | ||
else | ||
search=grep | ||
fi | ||
|
||
# set the internal field separator to line break, | ||
# so that we can iterate easily over the cat-file output | ||
IFS=$'\n'; | ||
|
||
# list all objects including their size, sort by compressed size | ||
OBJECTS=$( | ||
git cat-file \ | ||
--batch-all-objects \ | ||
--batch-check='%(objectsize:disk) %(objectname)' \ | ||
| sort -nr | ||
) | ||
|
||
TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/git-find-large-files.XXXXXX") || exit | ||
trap "rm -rf '$TMP_DIR'" EXIT | ||
|
||
git rev-list --all --objects | sort > "$TMP_DIR/objects" | ||
git rev-list --all --objects --max-count=1 | sort > "$TMP_DIR/objects.1" | ||
|
||
for OBJ in $OBJECTS; do | ||
# extract the compressed size in kilobytes | ||
COMPRESSED_SIZE=$(($(echo $OBJ | cut -f 1 -d ' ')/1024)) | ||
|
||
if [ $COMPRESSED_SIZE -le $MIN_SIZE_IN_KB ]; then | ||
break | ||
fi | ||
|
||
# extract the SHA | ||
SHA=$(echo $OBJ | cut -f 2 -d ' ') | ||
|
||
# find the object's location in the repository tree | ||
LOCATION=$($search $SHA "$TMP_DIR/objects" | sed "s/$SHA //") | ||
if $search $SHA "$TMP_DIR/objects.1" >/dev/null; then | ||
# Object is in the head revision | ||
HEAD="Present" | ||
elif [ -e "$LOCATION" ]; then | ||
# Object's path is in the head revision | ||
HEAD="Changed" | ||
else | ||
# Neither the object nor its path is in the head revision | ||
HEAD="Deleted" | ||
fi | ||
|
||
echo "$COMPRESSED_SIZE,$HEAD,$LOCATION" >> "$TMP_DIR/output" | ||
done | ||
|
||
if [ -f "$TMP_DIR/output" ]; then | ||
column -t -s ',' < "$TMP_DIR/output" | ||
fi | ||
|
||
rm -rf "$TMP_DIR" | ||
exit 0 |
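
The size-threshold loop can be tried without a repository. The sketch below assumes input lines in the `<size in bytes> <object name>` layout that the `--batch-check` format above produces. The function name `filter_large` and the sample object names are hypothetical.

```shell
# Read "<bytes> <object>" lines on stdin, sort them largest first, and
# print "<size in KB> <object>" until the size drops to MIN_KB or below.
filter_large() {
  min_kb=$1
  sort -nr | while read -r size sha; do
    kb=$((size / 1024))
    if [ "$kb" -le "$min_kb" ]; then
      break
    fi
    echo "$kb $sha"
  done
}

result=$(printf '%s\n' '2048000 aaa' '1024 bbb' '614400 ccc' | filter_large 500)
printf '%s\n' "$result"
```

Because the list is sorted in descending order first, the loop can stop at the first object that falls under the threshold instead of scanning every object.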