Switched to downloading the data files with curl.
Introduced scripts for creating TGZ files that one could
upload to EOS, and for downloading such archives and
unpacking them.
krasznaa committed Mar 20, 2023
1 parent 6c797fb commit a11175f
Showing 6 changed files with 215 additions and 12 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/builds.yml
@@ -39,13 +39,13 @@ jobs:
         shell: bash
     steps:
       - name: Dependencies
-        run: apt-get install -y git-lfs
+        run: apt-get install -y git-lfs wget
       - uses: actions/checkout@v2
         with:
           submodules: true
           lfs: true
-      - name: Unpack data files
-        run: data/extract_files.sh
+      - name: Download data files
+        run: data/traccc_data_get_files.sh
       - name: Configure
         run: |
           source ${GITHUB_WORKSPACE}/.github/ci_setup.sh ${{ matrix.platform.name }}
12 changes: 3 additions & 9 deletions README.md
@@ -153,7 +153,7 @@ flowchart LR
 linkStyle 24 stroke: brown;
 ```

-## Requirements and dependencies
+## Requirements and dependencies

 ### OS & compilers:

@@ -179,12 +179,6 @@ and toolchains that are currently known to work (last updated 2022/01/24):
 | --- | --- | --- | --- | --- |
 | CUDA | Ubuntu 20.04 | 9.3.0 | 11.5 | runs on CI |

-### Data directory
-
-The `data` directory is a submodule hosted as `git lfs` on `https://gitlab.cern.ch/acts/traccc-data`.
-After cloning the submodule, you need to run the script `extract_files.sh`, that is located
-in the submodule checkout. It will unpack the `tar.gz` files which contain the test data.
-
 ### Prerequisites

 - [Boost](https://www.boost.org/): program_options
@@ -211,7 +205,7 @@ cmake --build <build_directory> <options>

 ### Build options

-| Option | Description |
+| Option | Description |
 | --- | --- |
 | TRACCC_BUILD_CUDA | Build the CUDA sources included in traccc |
 | TRACCC_BUILD_SYCL | Build the SYCL sources included in traccc |
@@ -230,7 +224,7 @@ cmake --build <build_directory> <options>
 ### cpu reconstruction chain

 ```sh
-<build_directory>/bin/traccc_seq_example --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --input_directory=tml_pixels/ --events=10
+<build_directory>/bin/traccc_seq_example --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --input_directory=tml_pixels/ --events=10
 ```

 ### cuda reconstruction chain
15 changes: 15 additions & 0 deletions data/.gitignore
@@ -0,0 +1,15 @@
#
# (c) 2023 CERN for the benefit of the ACTS project
#
# Mozilla Public License Version 2.0
#
cca_test/
detray_simulation/
single_module/
tml_detector/
tml_full/
tml_pixel_barrel/
tml_pixels/
two_modules/
*.tar.gz
*.md5
16 changes: 16 additions & 0 deletions data/README.md
@@ -0,0 +1,16 @@
# Data Directory

This subdirectory is what normally holds the data files used by the tests
and examples.

To download the "default version" of the files (corresponding to the version
of the code in the repository), just execute

```
./traccc_data_get_files.sh
```

without any additional arguments.
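
As a rough illustration of what "default version" means here, the sketch below mirrors the `${VAR:-default}` idiom and the URL layout of `traccc_data_get_files.sh`. The `traccc_data_urls` function and the `my-test-data` name are hypothetical, used only for this example; the function prints the two URLs that would be fetched and does not touch the network.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): print the URLs that the
# download script would fetch for the current environment settings.
traccc_data_urls() {
   # Environment variables, when set, take precedence over the defaults.
   local name="${TRACCC_DATA_NAME:-traccc-data-v1}"
   local web="${TRACCC_WEB_DIRECTORY:-https://acts.web.cern.ch/traccc/data}"
   echo "${web}/${name}.tar.gz"
   echo "${web}/${name}.md5"
}

# With nothing set, the default archive is selected.
traccc_data_urls

# A different archive can be selected for a single call ("my-test-data" is
# an invented name).
TRACCC_DATA_NAME="my-test-data" traccc_data_urls
```

The same variables can presumably be exported before calling the real script, since it reads them through the same expansion.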

To produce a new tarball of data files, use the `traccc_data_package_files.sh`
script.
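
A rough sketch of the package-and-verify round trip, runnable locally without any upload. It assumes that `tar`, `md5sum` and `cmp` stand in for the `cmake -E` commands used by the scripts; the directory and file names are invented for the illustration.

```shell
#!/bin/bash
set -e
set -o pipefail

# Work in a throwaway directory; all names below are illustrative only.
workdir="$(mktemp -d)"
mkdir -p "${workdir}/demo_data" "${workdir}/unpacked"
echo "hit,1,2,3" > "${workdir}/demo_data/event0.csv"

# Package the directory and record the checksum next to the archive.
tar czf "${workdir}/demo-data.tar.gz" -C "${workdir}" demo_data
md5sum "${workdir}/demo-data.tar.gz" > "${workdir}/demo-data.md5"

# Recompute the checksum and compare it with the recorded one, mirroring
# the md5sum + compare_files pair of the download script.
md5sum "${workdir}/demo-data.tar.gz" > "${workdir}/demo-data.md5-test"
cmp "${workdir}/demo-data.md5" "${workdir}/demo-data.md5-test"

# Unpack into a separate directory and read the file back.
tar xzf "${workdir}/demo-data.tar.gz" -C "${workdir}/unpacked"
roundtrip="$(cat "${workdir}/unpacked/demo_data/event0.csv")"

# Clean up and report what survived the round trip.
rm -rf "${workdir}"
echo "${roundtrip}"
```

Because of `set -e`, a checksum mismatch reported by `cmp` stops the sketch before anything is unpacked.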
97 changes: 97 additions & 0 deletions data/traccc_data_get_files.sh
@@ -0,0 +1,97 @@
#!/bin/bash
#
# (c) 2023 CERN for the benefit of the ACTS project
#
# Mozilla Public License Version 2.0
#
# Script downloading the traccc data file(s) through HTTPS, and unpacking them.
#

# Stop on errors.
set -e
set -o pipefail

# Function printing the usage information for the script.
usage() {
   echo "Script downloading/unpacking data TGZ/MD5 files"
   echo ""
   echo "Usage: traccc_data_get_files.sh [options]"
   echo ""
   echo "Options:"
   echo "  -f <filename>         Name of the data file, without its extension"
   echo "  -d <webDirectory>     Directory holding the data and MD5 files"
   echo "  -o <dataDirectory>    Main data directory"
   echo "  -c <cmakeExecutable>  CMake executable to use in the script"
   echo "  -w <curlExecutable>   curl executable to use in the script"
   echo ""
}

# Default script arguments.
TRACCC_DATA_NAME=${TRACCC_DATA_NAME:-"traccc-data-v1"}
TRACCC_WEB_DIRECTORY=${TRACCC_WEB_DIRECTORY:-"https://acts.web.cern.ch/traccc/data"}
TRACCC_DATA_DIRECTORY=${TRACCC_DATA_DIRECTORY:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}
TRACCC_CMAKE_EXECUTABLE=${TRACCC_CMAKE_EXECUTABLE:-cmake}
TRACCC_CURL_EXECUTABLE=${TRACCC_CURL_EXECUTABLE:-curl}

# Parse the command line argument(s).
while getopts ":f:d:o:c:w:h" opt; do
   case $opt in
      f)
         TRACCC_DATA_NAME=$OPTARG
         ;;
      d)
         TRACCC_WEB_DIRECTORY=$OPTARG
         ;;
      o)
         TRACCC_DATA_DIRECTORY=$OPTARG
         ;;
      c)
         TRACCC_CMAKE_EXECUTABLE=$OPTARG
         ;;
      w)
         TRACCC_CURL_EXECUTABLE=$OPTARG
         ;;
      h)
         usage
         exit 0
         ;;
      :)
         echo "Argument -$OPTARG requires a parameter!"
         usage
         exit 1
         ;;
      \?)
         echo "Unknown argument: -$OPTARG"
         usage
         exit 1
         ;;
   esac
done

# Go into the target directory.
cd "${TRACCC_DATA_DIRECTORY}"

# Download the TGZ and MD5 files.
"${TRACCC_CURL_EXECUTABLE}" --retry 5 --retry-connrefused --retry-delay 10 \
   --output "${TRACCC_DATA_NAME}.tar.gz" \
   "${TRACCC_WEB_DIRECTORY}/${TRACCC_DATA_NAME}.tar.gz"
"${TRACCC_CURL_EXECUTABLE}" --retry 5 --retry-connrefused --retry-delay 10 \
   --output "${TRACCC_DATA_NAME}.md5" \
   "${TRACCC_WEB_DIRECTORY}/${TRACCC_DATA_NAME}.md5"

# Verify that the download succeeded.
"${TRACCC_CMAKE_EXECUTABLE}" -E md5sum "${TRACCC_DATA_NAME}.tar.gz" > \
   "${TRACCC_DATA_NAME}.md5-test"
"${TRACCC_CMAKE_EXECUTABLE}" -E compare_files "${TRACCC_DATA_NAME}.md5" \
   "${TRACCC_DATA_NAME}.md5-test"

# Extract the data files.
"${TRACCC_CMAKE_EXECUTABLE}" -E tar xf "${TRACCC_DATA_NAME}.tar.gz"

# Clean up.
"${TRACCC_CMAKE_EXECUTABLE}" -E remove "${TRACCC_DATA_NAME}.tar.gz" \
   "${TRACCC_DATA_NAME}.md5" "${TRACCC_DATA_NAME}.md5-test"

# Leave the user with a message.
"${TRACCC_CMAKE_EXECUTABLE}" -E echo \
   "Files from ${TRACCC_DATA_NAME} unpacked under '${TRACCC_DATA_DIRECTORY}'"
81 changes: 81 additions & 0 deletions data/traccc_data_package_files.sh
@@ -0,0 +1,81 @@
#!/bin/bash
#
# (c) 2023 CERN for the benefit of the ACTS project
#
# Mozilla Public License Version 2.0
#
# Script generating TGZ/MD5 files that could then be uploaded to the ACTS web
# service.
#

# Stop on errors.
set -e
set -o pipefail

# Function printing the usage information for the script.
usage() {
   echo "Script generating data TGZ/MD5 files"
   echo ""
   echo "Usage: traccc_data_package_files.sh [options]"
   echo ""
   echo "Options:"
   echo "  -o <outputName>       Set the name of the output file(s)"
   echo "  -i <inputDirectory>   Additional input directory to pick up"
   echo "  -d <dataDirectory>    Main data directory"
   echo "  -c <cmakeExecutable>  CMake executable to use in the script"
   echo ""
}

# Default script arguments.
TRACCC_DATA_NAME=${TRACCC_DATA_NAME:-"traccc-data-v2"}
TRACCC_DATA_DIRECTORY_NAMES=("cca_test" "detray_simulation" "single_module"
   "tml_detector" "tml_full" "tml_pixel_barrel" "tml_pixels" "two_modules")
TRACCC_DATA_DIRECTORY=${TRACCC_DATA_DIRECTORY:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}
TRACCC_CMAKE_EXECUTABLE=${TRACCC_CMAKE_EXECUTABLE:-cmake}

# Parse the command line argument(s).
while getopts ":o:i:d:c:h" opt; do
   case $opt in
      o)
         TRACCC_DATA_NAME=$OPTARG
         ;;
      i)
         TRACCC_DATA_DIRECTORY_NAMES+=("$OPTARG")
         ;;
      d)
         TRACCC_DATA_DIRECTORY=$OPTARG
         ;;
      c)
         TRACCC_CMAKE_EXECUTABLE=$OPTARG
         ;;
      h)
         usage
         exit 0
         ;;
      :)
         echo "Argument -$OPTARG requires a parameter!"
         usage
         exit 1
         ;;
      \?)
         echo "Unknown argument: -$OPTARG"
         usage
         exit 1
         ;;
   esac
done

# Go into the source directory.
cd "${TRACCC_DATA_DIRECTORY}"

# Compress the directories.
"${TRACCC_CMAKE_EXECUTABLE}" -E tar czf "${TRACCC_DATA_NAME}.tar.gz" \
   "${TRACCC_DATA_DIRECTORY_NAMES[@]}"

# Generate an MD5 file.
"${TRACCC_CMAKE_EXECUTABLE}" -E md5sum "${TRACCC_DATA_NAME}.tar.gz" > \
   "${TRACCC_DATA_NAME}.md5"

# Leave the user with a message.
"${TRACCC_CMAKE_EXECUTABLE}" -E echo \
   "Generated files '${TRACCC_DATA_NAME}.tar.gz' and '${TRACCC_DATA_NAME}.md5'"

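Both scripts parse their options with `getopts`, which relies on two conventions: a leading `:` in the option string selects silent error reporting, and a `:` directly after an option letter marks that option as taking an argument. A minimal sketch of that behaviour, using a hypothetical `parse_demo` function and a reduced option set that are not part of the repository:

```shell
#!/bin/bash
# Hypothetical demo function, reduced to two options: -w takes an argument,
# -h does not.
parse_demo() {
   local opt OPTIND=1 OPTARG curl_exe="curl"
   while getopts ":w:h" opt; do
      case $opt in
         w) curl_exe=$OPTARG ;;
         h) echo "help"; return 0 ;;
         :) echo "missing argument for -$OPTARG"; return 1 ;;
         \?) echo "unknown option -$OPTARG"; return 1 ;;
      esac
   done
   echo "curl=${curl_exe}"
}

parse_demo -w /usr/bin/curl   # the value after -w is delivered in OPTARG
parse_demo -w || true         # silent mode routes a missing value to ':'
parse_demo -x || true         # unknown letters are reported as '?'
```

Without the `:` after an option letter, `getopts` treats the option as a plain flag, leaves `OPTARG` unset, and reads the following word as a separate argument.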