Skip to content

Commit

Permalink
augment JournalDelta with unclean paths on snapshot hash change
Browse files Browse the repository at this point in the history
Summary:
We were previously generating a simple JournalDelta consisting of
just the from/to snapshot hashes.  This is great from a `!O(repo)` perspective
when recording what changed but makes it difficult for clients downstream
to reason about changes that are not tracked in source control.

This diff adds a concept of `uncleanPaths` to the journal; these are paths
that we think are/were different from the hashes in the journal entry.

Since JournalDelta needs to be able to be merged I've opted for a simple
list of the paths that have a differing status; I'm not including all of
the various dirstate states for this because it is not obvious how to
reconcile the state across successive snapshot change events.

The `uncleanPaths` set is populated with an initial set of different paths as
the first part of the checkout call (prior to changing the hash), and then is
updated after the hash has changed to capture any additional differences.

Care needs to be taken to avoid recursively attempting to grab the parents lock
so I'm replicating just a little bit of the state management glue in the
`performDiff` method.

The Journal was not setting the from/to snapshot hashes when merging deltas.
This manifested in the watchman integration tests; we'd see the null revision
as the `from` and the `to` revision held the `from` revision(!).

On the watchman side we need to ask source control to expand the list of
files that changed when the from/to hashes are different; I've added code
to handle this.  This doesn't do anything smart in the case that the
source control aware queries are in use.  We'll look at that in a following
diff as it isn't strictly eden specific.

`watchman clock` was returning a basically empty clock unconditionally,
which meant that most since queries would report everything since the start
of time.  This is most likely contributing to poor Buck performance, although
I have not investigated the performance aspect of this.  It manifested itself
in the watchman integration tests.

Reviewed By: simpkins

Differential Revision: D5896494

fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
  • Loading branch information
wez authored and facebook-github-bot committed Oct 17, 2017
1 parent 52b51c2 commit 2c6765e
Show file tree
Hide file tree
Showing 8 changed files with 257 additions and 38 deletions.
5 changes: 5 additions & 0 deletions facebook/TARGETS
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
include_defs("//eden/DEFS")

python_binary(
name = "watchman-diag",
srcs = {"watchman-diag": "watchman_diag.py"},
Expand Down Expand Up @@ -27,6 +29,9 @@ python_binary(
deps = [
"@/eden/cli:cli",
"@/eden/fs/service:edenfs",
"@/eden/hg/eden:eden",
"@/eden/hooks/hg:post-clone",
"@/eden/integration/hg/lib:testutil",
"@/eden/integration/lib:lib",
"@/libfb/py:pathutils",
"@/watchman:runtests-py3",
Expand Down
87 changes: 66 additions & 21 deletions scm/Mercurial.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,23 @@ W_CAP_REG("scm-hg")

namespace watchman {

using Options = ChildProcess::Options;
ChildProcess::Options Mercurial::makeHgOptions() const {
ChildProcess::Options opt;
// Ensure that the hgrc doesn't mess with the behavior
// of the commands that we're runing.
opt.environment().set("HGPLAIN", w_string("1"));
// This method is called from the eden watcher and can trigger before
// mercurial has finalized writing out its history data. Setting this
// environmental variable allows us to break the view isolation and read
// information about the commit before the transaction is complete.
opt.environment().set("HG_PENDING", getRootPath());
opt.nullStdin();
opt.pipeStdout();
opt.pipeStderr();
opt.chdir(getRootPath());

return opt;
}

Mercurial::infoCache::infoCache(std::string path) : dirStatePath(path) {
memset(&dirstate, 0, sizeof(dirstate));
Expand Down Expand Up @@ -84,18 +100,9 @@ w_string Mercurial::mergeBaseWith(w_string_piece commitId) const {
}
}

Options opt;
// Ensure that the hgrc doesn't mess with the behavior
// of the commands that we're runing.
opt.environment().set("HGPLAIN", w_string("1"));
opt.nullStdin();
opt.pipeStdout();
opt.pipeStderr();
opt.chdir(getRootPath());

auto revset = to<std::string>("ancestor(.,", commitId, ")");
ChildProcess proc(
{"hg", "log", "-T", "{node}", "-r", revset}, std::move(opt));
{"hg", "log", "-T", "{node}", "-r", revset}, makeHgOptions());

auto outputs = proc.communicate();
auto status = proc.wait();
Expand Down Expand Up @@ -125,19 +132,10 @@ w_string Mercurial::mergeBaseWith(w_string_piece commitId) const {

std::vector<w_string> Mercurial::getFilesChangedSinceMergeBaseWith(
w_string_piece commitId) const {
Options opt;
// Ensure that the hgrc doesn't mess with the behavior
// of the commands that we're runing.
opt.environment().set("HGPLAIN", w_string("1"));
opt.nullStdin();
opt.pipeStdout();
opt.pipeStderr();
opt.chdir(getRootPath());

// The "" argument at the end causes paths to be printed out relative to the
// cwd (set to root path above).
ChildProcess proc(
{"hg", "status", "-n", "--rev", commitId, ""}, std::move(opt));
{"hg", "status", "-n", "--rev", commitId, ""}, makeHgOptions());

auto outputs = proc.communicate();

Expand All @@ -157,4 +155,51 @@ std::vector<w_string> Mercurial::getFilesChangedSinceMergeBaseWith(

return lines;
}

SCM::StatusResult Mercurial::getFilesChangedBetweenCommits(
w_string_piece commitA,
w_string_piece commitB) const {
// The "" argument at the end causes paths to be printed out relative to the
// cwd (set to root path above).
ChildProcess proc(
{"hg", "status", "--print0", "--rev", commitA, "--rev", commitB, ""},
makeHgOptions());

auto outputs = proc.communicate();

std::vector<w_string> lines;
w_string_piece(outputs.first).split(lines, '\0');

auto status = proc.wait();
if (status) {
throw std::runtime_error(to<std::string>(
"failed query for the hg status; command returned with status ",
status,
" out=",
outputs.first,
" err=",
outputs.second));
}

SCM::StatusResult result;
log(ERR, "processing ", lines.size(), " status lines\n");
for (auto& line : lines) {
if (line.size() < 3) {
continue;
}
w_string fileName(line.data() + 2, line.size() - 2);
switch (line.data()[0]) {
case 'A':
result.addedFiles.emplace_back(std::move(fileName));
break;
case 'D':
result.removedFiles.emplace_back(std::move(fileName));
break;
default:
result.changedFiles.emplace_back(std::move(fileName));
}
}

return result;
}
} // namespace watchman
9 changes: 8 additions & 1 deletion scm/Mercurial.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,10 @@

#include <string>
#include <unordered_map>
#include "ChildProcess.h"
#include "FileInformation.h"
#include "SCM.h"
#include "watchman_synchronized.h"
#include "FileInformation.h"

namespace watchman {

Expand All @@ -17,8 +18,14 @@ class Mercurial : public SCM {
w_string mergeBaseWith(w_string_piece commitId) const override;
std::vector<w_string> getFilesChangedSinceMergeBaseWith(
w_string_piece commitId) const override;
SCM::StatusResult getFilesChangedBetweenCommits(
w_string_piece commitA,
w_string_piece commitB) const override;

private:
// Returns options for invoking hg
ChildProcess::Options makeHgOptions() const;

struct infoCache {
std::string dirStatePath;
FileInformation dirstate;
Expand Down
13 changes: 13 additions & 0 deletions scm/SCM.h
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,19 @@ class SCM {
virtual std::vector<w_string> getFilesChangedSinceMergeBaseWith(
w_string_piece commitId) const = 0;

struct StatusResult {
std::vector<w_string> changedFiles;
std::vector<w_string> addedFiles;
std::vector<w_string> removedFiles;
};

// Compute the set of paths that have changed between the two
// specified commits. This is purely a history operation and
// does not consider the working copy status.
virtual StatusResult getFilesChangedBetweenCommits(
w_string_piece commitA,
w_string_piece commitB) const = 0;

private:
w_string rootPath_;
w_string scmRoot_;
Expand Down
12 changes: 11 additions & 1 deletion tests/TARGETS
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
include_defs("//eden/DEFS")

include_defs("//common/defs/environment")

if os_getenv('SANDCASTLE') is not None:
Expand All @@ -17,15 +19,23 @@ if os_getenv('SANDCASTLE') is not None:
],
)

# We test only with the oss version of eden because it builds faster
# and we do not expect that testing against the internal version
# will provide any additional signal.
oss_suffix = get_oss_suffix()

artifacts = get_test_env_and_deps(oss_suffix)

custom_unittest(
name = "runtests-py3",
command = [
"$(location @/watchman/facebook:runtests-py3)",
"--testpilot-json",
],
env = artifacts["env"],
tags = ["run_as_bundle"],
type = "json",
deps = [
deps = artifacts["deps"] + [
"@/watchman/facebook:runtests-py3",
],
)
Expand Down
39 changes: 32 additions & 7 deletions tests/integration/WatchmanEdenTestCase.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,15 @@
import WatchmanInstance
import os
import tempfile
try:
import unittest2 as unittest
except ImportError:
import unittest

TestParent = object
try:
import configparser # python3
from eden.integration.lib import (edenclient, hgrepo)
from eden.integration.hg.lib.hg_extension_test_base import (
EDEN_EXT_DIR,
POST_CLONE,
)

def is_sandcastle():
return 'SANDCASTLE' in os.environ
Expand Down Expand Up @@ -50,6 +51,11 @@ def setUp(self):
self.system_config_dir = os.path.join(self.etc_eden_dir, 'config.d')
os.mkdir(self.system_config_dir)

self.hooks_dir = os.path.join(self.etc_eden_dir, 'hooks')
os.mkdir(self.hooks_dir)

self.edenrc = os.path.join(self.eden_home, '.edenrc')

# where we'll mount the eden client(s)
self.mounts_dir = self.mkdtemp(prefix='eden_mounts')

Expand Down Expand Up @@ -79,14 +85,17 @@ def tearDown(self):
if self.eden_watchman:
self.eden_watchman.stop()

def makeEdenMount(self, populate_fn=None):
def makeEdenMount(self, populate_fn=None, enable_hg=False):
''' populate_fn is a function that accepts a repo object and
that is expected to populate it as a pre-requisite to
starting up the eden mount for it. '''
starting up the eden mount for it.
if enable_hg is True then we enable the hg extension
and post-clone hooks to populate the .hg control dir.
'''

repo_path = self.mkdtemp(prefix='eden_repo_')
repo_name = os.path.basename(repo_path)
repo = hgrepo.HgRepository(repo_path)
repo = self.repoForPath(repo_path)
repo.init()

if populate_fn:
Expand All @@ -95,8 +104,24 @@ def makeEdenMount(self, populate_fn=None):
self.eden.add_repository(repo_name, repo_path)

mount_path = os.path.join(self.mounts_dir, repo_name)

if enable_hg:
config = configparser.ConfigParser()
config.read(self.edenrc)
config['hooks'] = {}
config['hooks']['hg.edenextension'] = EDEN_EXT_DIR
config['repository %s' % repo_name]['hooks'] = self.hooks_dir
post_clone_hook = os.path.join(self.hooks_dir, 'post-clone')
os.symlink(POST_CLONE, post_clone_hook)

with open(self.edenrc, 'w') as f:
config.write(f)

self.eden.clone(repo_name, mount_path)
return mount_path

def repoForPath(self, path):
return hgrepo.HgRepository(path)

def setDefaultConfiguration(self):
self.setConfiguration('local', 'bser')
62 changes: 62 additions & 0 deletions tests/integration/test_eden_journal.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# vim:ts=4:sw=4:et:
# Copyright 2017-present Facebook, Inc.
# Licensed under the Apache License, Version 2.0

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# no unicode literals

import WatchmanEdenTestCase


class TestEdenJournal(WatchmanEdenTestCase.WatchmanEdenTestCase):

def test_eden_journal(self):
def populate(repo):
repo.write_file('hello', 'hola\n')
repo.commit('initial commit.')

root = self.makeEdenMount(populate, enable_hg=True)
repo = self.repoForPath(root)
initial_commit = repo.get_head_hash()

res = self.watchmanCommand('watch', root)
self.assertEqual('eden', res['watcher'])

clock = self.watchmanCommand('clock', root)

self.touchRelative(root, 'newfile')

res = self.watchmanCommand('query', root, {
'fields': ['name'],
'since': clock})
clock = res['clock']
self.assertFileListsEqual(res['files'], ['newfile'])

repo.add_file('newfile')
repo.commit(message='add newfile')
res = self.watchmanCommand('query', root, {
'expression': ['not', ['dirname', '.hg']],
'fields': ['name'],
'since': clock})
clock = res['clock']
self.assertFileListsEqual(res['files'], [
'newfile'],
message='We expect to report the files changed in the commit')

# Test the the journal has the correct contents across a "reset" like
# operation where the parents are poked directly. This is using
# debugsetparents rather than reset because the latter isn't enabled
# by default for hg in the watchman test machinery.
self.touchRelative(root, 'unclean')
repo.hg('debugsetparents', initial_commit)
res = self.watchmanCommand('query', root, {
'expression': ['not', ['dirname', '.hg']],
'fields': ['name'],
'since': clock})
self.assertFileListsEqual(res['files'], [
'newfile',
'unclean'],
message=('We expect to report the file changed in the commit '
'as well as the unclean file'))
Loading

0 comments on commit 2c6765e

Please sign in to comment.