forked from iterative/dvc
Merge pull request iterative#740 from efiop/master
Add initial support for s3/gs outputs/deps/cache_dirs
Showing 31 changed files with 772 additions and 459 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,210 +1,38 @@ | ||
import os | ||
import json | ||
import shutil | ||
|
||
from dvc.state import State, LinkState | ||
from dvc.system import System | ||
from dvc.logger import Logger | ||
from dvc.utils import move, remove | ||
from dvc.lock import Lock | ||
from dvc.exceptions import DvcException | ||
from dvc.config import Config | ||
from dvc.remote import Remote | ||
|
||
|
||
class Cache(object): | ||
CACHE_DIR = 'cache' | ||
CACHE_DIR_LOCK = 'cache.lock' | ||
CACHE_TYPES = ['reflink', 'hardlink', 'symlink', 'copy'] | ||
CACHE_TYPE_MAP = { | ||
'copy': shutil.copyfile, | ||
'symlink': System.symlink, | ||
'hardlink': System.hardlink, | ||
'reflink': System.reflink, | ||
} | ||
|
||
def __init__(self, root_dir, dvc_dir, cache_dir=None, cache_type=None): | ||
self.cache_type = cache_type | ||
def __init__(self, project): | ||
config = project.config._config[Config.SECTION_CACHE] | ||
|
||
cache_dir = cache_dir if cache_dir else self.CACHE_DIR | ||
if os.path.isabs(cache_dir): | ||
self.cache_dir = cache_dir | ||
local = config.get(Config.SECTION_CACHE_LOCAL, None) | ||
if local: | ||
sect = project.config._config[Config.SECTION_REMOTE_FMT.format(local)] | ||
else: | ||
self.cache_dir = os.path.abspath(os.path.realpath(os.path.join(dvc_dir, cache_dir))) | ||
|
||
if not os.path.exists(self.cache_dir): | ||
os.mkdir(self.cache_dir) | ||
|
||
self.state = State(self.cache_dir) | ||
self.link_state = LinkState(root_dir, dvc_dir) | ||
self.lock = Lock(self.cache_dir, name=self.CACHE_DIR_LOCK) | ||
|
||
@staticmethod | ||
def init(root_dir, dvc_dir, cache_dir=None): | ||
return Cache(root_dir, dvc_dir, cache_dir=None) | ||
|
||
def all(self): | ||
with self.lock: | ||
clist = [] | ||
for entry in os.listdir(self.cache_dir): | ||
subdir = os.path.join(self.cache_dir, entry) | ||
if not os.path.isdir(subdir): | ||
continue | ||
|
||
for cache in os.listdir(subdir): | ||
path = os.path.join(subdir, cache) | ||
clist.append(path) | ||
|
||
return clist | ||
|
||
def get(self, md5): | ||
if not md5: | ||
sect = {} | ||
cache_dir = config.get(Config.SECTION_CACHE_DIR, self.CACHE_DIR) | ||
if not os.path.isabs(cache_dir): | ||
cache_dir = os.path.abspath(os.path.realpath(os.path.join(project.dvc_dir, cache_dir))) | ||
sect[Config.SECTION_REMOTE_URL] = cache_dir | ||
t = config.get(Config.SECTION_CACHE_TYPE, None) | ||
if t: | ||
sect[Config.SECTION_CACHE_TYPE] = t | ||
|
||
self.local = Remote(project, sect) | ||
|
||
self.s3 = self._get_remote(project, config, Config.SECTION_CACHE_S3) | ||
self.gs = self._get_remote(project, config, Config.SECTION_CACHE_GS) | ||
self.ssh = self._get_remote(project, config, Config.SECTION_CACHE_SSH) | ||
|
||
def _get_remote(self, project, config, name): | ||
remote = config.get(name, None) | ||
if not remote: | ||
return None | ||
|
||
return os.path.join(self.cache_dir, md5[0:2], md5[2:]) | ||
|
||
def path_to_md5(self, path): | ||
relpath = os.path.relpath(path, self.cache_dir) | ||
return os.path.dirname(relpath) + os.path.basename(relpath) | ||
|
||
def _changed(self, md5): | ||
cache = self.get(md5) | ||
if self.state.changed(cache, md5=md5): | ||
if os.path.exists(cache): | ||
Logger.warn('Corrupted cache file {}'.format(os.path.relpath(cache))) | ||
remove(cache) | ||
return True | ||
|
||
return False | ||
|
||
def changed(self, md5): | ||
with self.lock: | ||
return self._changed(md5) | ||
|
||
def link(self, src, link): | ||
dname = os.path.dirname(link) | ||
if not os.path.exists(dname): | ||
os.makedirs(dname) | ||
|
||
if self.cache_type != None: | ||
types = [self.cache_type] | ||
else: | ||
types = self.CACHE_TYPES | ||
|
||
for typ in types: | ||
try: | ||
self.CACHE_TYPE_MAP[typ](src, link) | ||
self.link_state.update(link) | ||
return | ||
except Exception as exc: | ||
msg = 'Cache type \'{}\' is not supported'.format(typ) | ||
Logger.debug(msg) | ||
if typ == types[-1]: | ||
raise DvcException(msg, cause=exc) | ||
|
||
@staticmethod | ||
def load_dir_cache(path): | ||
if os.path.isabs(path): | ||
relpath = os.path.relpath(path) | ||
else: | ||
relpath = path | ||
|
||
try: | ||
with open(path, 'r') as fd: | ||
d = json.load(fd) | ||
except Exception as exc: | ||
msg = u'Failed to load dir cache \'{}\'' | ||
Logger.error(msg.format(relpath), exc) | ||
return [] | ||
|
||
if not isinstance(d, list): | ||
msg = u'Dir cache file format error \'{}\': skipping the file' | ||
Logger.error(msg.format(relpath)) | ||
return [] | ||
|
||
return d | ||
|
||
@staticmethod | ||
def get_dir_cache(path): | ||
res = {} | ||
d = Cache.load_dir_cache(path) | ||
|
||
for entry in d: | ||
res[entry[State.PARAM_RELPATH]] = entry[State.PARAM_MD5] | ||
|
||
return res | ||
|
||
def dir_cache(self, cache): | ||
res = {} | ||
dir_cache = self.get_dir_cache(cache) | ||
|
||
for relpath, md5 in dir_cache.items(): | ||
res[relpath] = self.get(md5) | ||
|
||
return res | ||
|
||
@staticmethod | ||
def is_dir_cache(cache): | ||
return cache.endswith(State.MD5_DIR_SUFFIX) | ||
|
||
def _checkout(self, path, md5): | ||
cache = self.get(md5) | ||
|
||
if not cache or not os.path.exists(cache) or self._changed(md5): | ||
if cache: | ||
Logger.warn(u'\'{}({})\': cache file not found'.format(os.path.relpath(cache), | ||
os.path.relpath(path))) | ||
remove(path) | ||
return | ||
|
||
if os.path.exists(path): | ||
msg = u'Data \'{}\' exists. Removing before checkout' | ||
Logger.debug(msg.format(os.path.relpath(path))) | ||
remove(path) | ||
|
||
msg = u'Checking out \'{}\' with cache \'{}\'' | ||
Logger.debug(msg.format(os.path.relpath(path), os.path.relpath(cache))) | ||
|
||
if not self.is_dir_cache(cache): | ||
self.link(cache, path) | ||
return | ||
|
||
dir_cache = self.dir_cache(cache) | ||
for relpath, c in dir_cache.items(): | ||
p = os.path.join(path, relpath) | ||
self.link(c, p) | ||
|
||
def checkout(self, path, md5): | ||
with self.lock: | ||
return self._checkout(path, md5) | ||
|
||
def _save_file(self, path): | ||
md5 = self.state.update(path) | ||
cache = self.get(md5) | ||
if self._changed(md5): | ||
move(path, cache) | ||
self.state.update(cache) | ||
self._checkout(path, md5) | ||
|
||
def _save_dir(self, path): | ||
md5 = self.state.update(path) | ||
cache = self.get(md5) | ||
dname = os.path.dirname(cache) | ||
dir_info = self.state.collect_dir(path) | ||
|
||
for entry in dir_info: | ||
relpath = entry[State.PARAM_RELPATH] | ||
p = os.path.join(path, relpath) | ||
|
||
self._save_file(p) | ||
|
||
if not os.path.isdir(dname): | ||
os.makedirs(dname) | ||
|
||
with open(cache, 'w+') as fd: | ||
json.dump(dir_info, fd, sort_keys=True) | ||
|
||
def save(self, path): | ||
with self.lock: | ||
if os.path.isdir(path): | ||
self._save_dir(path) | ||
else: | ||
self._save_file(path) | ||
sect = project.config._config[Config.SECTION_REMOTE_FMT.format(remote)] | ||
return Remote(project, sect) |
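
The rewritten `Cache.__init__` in the diff above appears to resolve the local cache in two ways: if the cache section names a remote, that remote's section is used; otherwise a section is built from the cache directory and the optional cache type. The sketch below mirrors that logic on plain dicts so it can be run in isolation. It is only an illustration under assumptions: the string values given to the `SECTION_*` names are hypothetical stand-ins for the `Config.SECTION_*` constants (their real values are not shown in this diff), `resolve_local_section` and `resolve_named_section` are hypothetical helpers that are not part of dvc, and they return the section dict rather than a `Remote` object. Unlike the diff, a missing cache section is treated as empty instead of raising `KeyError`.

```python
import os

# Hypothetical stand-ins for the Config.SECTION_* constants used in the diff;
# the real values are not shown there.
SECTION_CACHE = 'cache'
SECTION_CACHE_LOCAL = 'local'
SECTION_CACHE_DIR = 'dir'
SECTION_CACHE_TYPE = 'type'
SECTION_REMOTE_FMT = 'remote "{}"'
SECTION_REMOTE_URL = 'url'
DEFAULT_CACHE_DIR = 'cache'


def resolve_local_section(config, dvc_dir):
    """Return the remote-style section that describes the local cache."""
    cache = config.get(SECTION_CACHE, {})

    local = cache.get(SECTION_CACHE_LOCAL)
    if local:
        # The cache section names a remote: use that remote's section as-is.
        return config[SECTION_REMOTE_FMT.format(local)]

    # Otherwise build a section from the cache dir and the optional type.
    cache_dir = cache.get(SECTION_CACHE_DIR, DEFAULT_CACHE_DIR)
    if not os.path.isabs(cache_dir):
        # Relative cache dirs are resolved against the .dvc directory.
        cache_dir = os.path.abspath(
            os.path.realpath(os.path.join(dvc_dir, cache_dir)))

    sect = {SECTION_REMOTE_URL: cache_dir}
    cache_type = cache.get(SECTION_CACHE_TYPE)
    if cache_type:
        sect[SECTION_CACHE_TYPE] = cache_type
    return sect


def resolve_named_section(config, name):
    """Counterpart of `_get_remote`: None when the cache section lacks `name`."""
    remote = config.get(SECTION_CACHE, {}).get(name)
    if not remote:
        return None
    return config[SECTION_REMOTE_FMT.format(remote)]
```

With this shape, an s3 or gs cache would be optional: `resolve_named_section(config, 's3')` returns `None` unless the cache section points at a configured remote (the key `'s3'` is again an assumed value).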