--dirstat-by-file: Make it faster and more correct
Currently, --dirstat-by-file first performs the full --dirstat analysis
(using diffcore_count_changes()), and then resets 'damage' to 1 if any
damage was found by diffcore_count_changes().

But --dirstat-by-file is not interested in the file damage per se. It only
cares whether the file changed at all; in other words, whether the blob
object for the file has changed. We therefore only need to compare the
object names of each file pair in the diff queue: we can skip the entire
--dirstat analysis and simply set 'damage' to 1 for each entry whose
object name has changed. If either object name is unknown (sha1_valid is
not set), the file is conservatively counted as changed.

This makes --dirstat-by-file faster (the file contents no longer need to
be loaded and compared), and also bypasses --dirstat's practice of
ignoring rearranged lines within a file.

The patch also adds a testcase verifying that --dirstat-by-file now
detects changes that only rearrange lines within a file.

Signed-off-by: Johan Herland <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
jherland authored and gitster committed Apr 11, 2011
1 parent 204f01a commit 0133dab
Showing 3 changed files with 25 additions and 5 deletions.
25 changes: 20 additions & 5 deletions diff.c
@@ -1539,9 +1539,27 @@ static void show_dirstat(struct diff_options *options)
 		struct diff_filepair *p = q->queue[i];
 		const char *name;
 		unsigned long copied, added, damage;
+		int content_changed;
 
 		name = p->one->path ? p->one->path : p->two->path;
 
+		if (p->one->sha1_valid && p->two->sha1_valid)
+			content_changed = hashcmp(p->one->sha1, p->two->sha1);
+		else
+			content_changed = 1;
+
+		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) {
+			/*
+			 * In --dirstat-by-file mode, we don't really need to
+			 * look at the actual file contents at all.
+			 * The fact that the SHA1 changed is enough for us to
+			 * add this file to the list of results
+			 * (with each file contributing equal damage).
+			 */
+			damage = content_changed ? 1 : 0;
+			goto found_damage;
+		}
+
 		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
 			diff_populate_filespec(p->one, 0);
 			diff_populate_filespec(p->two, 0);
@@ -1564,14 +1582,11 @@ static void show_dirstat(struct diff_options *options)
 		/*
 		 * Original minus copied is the removed material,
 		 * added is the new material. They are both damages
-		 * made to the preimage. In --dirstat-by-file mode, count
-		 * damaged files, not damaged lines. This is done by
-		 * counting only a single damaged line per file.
+		 * made to the preimage.
 		 */
 		damage = (p->one->size - copied) + added;
-		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE) && damage > 0)
-			damage = 1;
 
+found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
 		dir.files[dir.nr].name = name;
 		dir.files[dir.nr].changed = damage;
2 changes: 2 additions & 0 deletions t/t4013-diff-various.sh
@@ -302,6 +302,8 @@ diff master master^ side
 diff --dirstat master~1 master~2
 # --dirstat doesn't notice changes that simply rearrange existing lines
 diff --dirstat initial rearrange
+# ...but --dirstat-by-file does notice changes that only rearrange lines
+diff --dirstat-by-file initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
3 changes: 3 additions & 0 deletions t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
@@ -0,0 +1,3 @@
+$ git diff --dirstat-by-file initial rearrange
+ 100.0% dir/
+$
