A Harmony service to convert NetCDF4 files to Zarr files. Takes conventional Harmony messages and translates their input granules to Zarr using xarray.
This library intentionally does very little checking of the input files and file extensions. It is designed to work on NetCDF granules. It ought to work with any other file type that can be opened with xarray.open_mfdataset using the h5netcdf driver. This includes some HDF5 EOSDIS datasets. Individual collections must be tested to ensure compatibility.
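A quick compatibility check for a candidate granule could look like the sketch below. It is an illustration only, not code from this service: the function name can_open_with_h5netcdf and its opener parameter are hypothetical, and the only xarray call it assumes is xarray.open_mfdataset with engine="h5netcdf", as named above. The default opener needs xarray, dask and h5netcdf installed.

```python
def can_open_with_h5netcdf(paths, opener=None):
    """Report whether `paths` can be opened with the h5netcdf driver.

    Returns (True, sorted variable names) on success, or
    (False, error description) if the files cannot be opened.
    `opener` is a hypothetical hook so the real call can be swapped out.
    """
    if opener is None:
        # Imported lazily so this sketch loads even without xarray installed.
        import xarray
        opener = xarray.open_mfdataset

    try:
        dataset = opener(paths, engine="h5netcdf")
    except Exception as error:  # any failure means the granule is not usable
        return False, f"{type(error).__name__}: {error}"

    try:
        return True, sorted(str(name) for name in dataset.variables)
    finally:
        # Always release the underlying file handles.
        dataset.close()
```

Opening a file this way only shows that xarray can read it. It does not guarantee that the resulting Zarr store is correct, so each collection still needs to be tested through the service itself.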
It is recommended that the NetCDF-to-Zarr service is tested and developed using a local Harmony instance. This can be established following the instructions in the Harmony repository.
It is possible to develop and run this service locally using only Docker. This is the recommended option for validation and small changes. Install Docker on your development machine.
This service uses the harmony-service-lib-py, and requires that certain environment variables be set, as shown in the Harmony Service Lib README. For example, STAGING_BUCKET and STAGING_PATH are required, and EDL_USERNAME and EDL_PASSWORD are required for any data behind Earthdata Login. For local testing (not integrated into Harmony in a dev environment or AWS deployment), use the example .env file in this repo:
$ cp example/dotenv .env
and update the .env with the correct values.
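To catch a missing value before running the service, a small pre-flight check along these lines may help. It is a sketch under stated assumptions: only the four variable names come from the text above, while the function name missing_variables and the needs_earthdata_login flag are hypothetical. The Harmony Service Lib README remains the authoritative list of variables.

```python
import os

# Variable names taken from the text above; the grouping is an assumption.
ALWAYS_REQUIRED = ("STAGING_BUCKET", "STAGING_PATH")
EDL_REQUIRED = ("EDL_USERNAME", "EDL_PASSWORD")


def missing_variables(environ=None, needs_earthdata_login=True):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    required = ALWAYS_REQUIRED + (EDL_REQUIRED if needs_earthdata_login else ())
    return [name for name in required if not environ.get(name)]
```

Calling missing_variables() with no arguments inspects the current process environment, so it should be run in the same shell (or with the same .env loaded) as the service.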
If you would like to do local development outside of Docker, install Python (3.7.4) and create a Python virtual environment.
Install project dependencies:
$ python -m pip install --upgrade pip
$ make install
If you'd rather not build the image locally (as instructed below), you can simply pull the latest image:
$ docker pull harmonyservices/netcdf-to-zarr
Some of the Makefile targets referenced below include an optional argument that allows us to use a local copy of harmony-service-lib-py (which is useful for concurrent development):
$ make target-name LOCAL_SVCLIB_DIR=../harmony-service-lib-py
To run unit tests, coverage reports, or run the service on a sample message outside of the entire Harmony stack, start by building new runtime and test images:
IMPORTANT: If Minikube is installed, be sure to do these steps in a shell that has not been updated to point to the Minikube Docker daemon. That update is usually done via a shell eval command. Building in a shell that points to the Minikube Docker daemon will cause tests and the service to fail due to limitations in Minikube.
$ make build-image
$ make build-test-image
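If it is unclear whether the current shell has been pointed at the Minikube Docker daemon, a check like the following sketch can be run first. It assumes that `minikube docker-env` exports a MINIKUBE_ACTIVE_DOCKERD variable; verify that against the output of your Minikube version. The function name points_at_minikube is hypothetical.

```python
import os


def points_at_minikube(environ=None):
    """Return True if the environment looks like `minikube docker-env` was applied."""
    environ = os.environ if environ is None else environ
    # Assumption: `minikube docker-env` exports MINIKUBE_ACTIVE_DOCKERD.
    return bool(environ.get("MINIKUBE_ACTIVE_DOCKERD"))
```

If this returns True, open a fresh shell before building the runtime and test images.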
Run unit tests and generate coverage reports. This will mount the local directory into the container and run the unit tests, so all tests will reflect local changes to the service.
$ make test-in-docker
Finally, run the service using an example Harmony operation request (example/harmony-operation.json) as input. This will reflect local changes to this repo, but will not include local changes to the Harmony Service Lib.
$ make run-in-docker
Without local Harmony Service Lib changes:
If using Minikube, be sure your environment is pointed to the Minikube Docker daemon:
$ eval $(minikube docker-env)
Build the image:
$ make build-image
You can now run a workflow in your local Harmony stack and it will execute using this image.
Restart the services in your local Harmony instance (the script below is contained in the Harmony repository):
$ bin/restart-services
This will require credentials for the Harmony Sandbox NGAPShApplicationDeveloper to be present in your ~/.aws/credentials file.
Run tests with coverage reports:
$ make test
Run an example:
$ dotenv run python3 -m harmony_netcdf_to_zarr --harmony-action invoke --harmony-input "$(bin/replace.sh example/harmony-operation.json)"
You may be concurrently developing on this service as well as the harmony-service-lib-py. If so, and you want to test changes to it along with this service, install the harmony-service-lib-py in 'development mode'.
Install it using pip and the path to the local clone of the service library:
$ pip install -e ../harmony-service-lib-py
Now any changes made to that local repo will be visible in this project when you run tests, etc.
Finally, you can test & run the service in Harmony just as shown in the Development with Docker section above.
Developers working on the NetCDF-to-Zarr service will need to create a feature branch for their work. The code in the repository has a unittest suite, which should be updated when any code is added or updated within the repository.
When a feature branch is ready for review, a Pull Request (PR) should be opened against the main branch. This will automatically trigger a GitHub workflow that will run the unittest suite (see: .github/workflows/run_tests_on_pull_requests.yml).
When a PR is merged against the main branch, a different workflow (see: .github/workflows/publish_docker_image.yml) will check if there are updates to the version.txt file. This file should contain a semantic version number.
If there are updates to version.txt, the GitHub workflow will:
- Extract the semantic version number from that file.
- Extract the latest release notes from CHANGELOG.md.
- Run the unittest suite.
- Tag the most recent commit on the main branch with the semantic version number.
- Create a GitHub release using the release notes and semantic version number.
- Publish the NetCDF-to-Zarr service Docker image to ghcr.io. It will be tagged with the semantic version number.
For this reason, when releasing, please be sure to update both:
- version.txt
- CHANGELOG.md
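Before opening a release PR, the two files can be sanity-checked locally with something like the sketch below. It is not the logic used by the GitHub workflow: the function names are hypothetical, the version pattern is a simplified form of semantic versioning, and the changelog layout (one '## ' heading per release, newest first) is an assumption about CHANGELOG.md.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH with an optional pre-release suffix.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?$")


def read_version(version_text):
    """Return the version from the contents of version.txt, or raise ValueError."""
    version = version_text.strip()
    if not SEMVER.match(version):
        raise ValueError(f"Not a semantic version number: {version!r}")
    return version


def latest_release_notes(changelog_text):
    """Return the body of the first '## ' section of a changelog.

    Assumes each release starts with a '## ' heading and that the
    newest release comes first in the file.
    """
    notes = []
    in_latest = False
    for line in changelog_text.splitlines():
        if line.startswith("## "):
            if in_latest:
                break  # reached the heading of the previous release
            in_latest = True
            continue
        if in_latest:
            notes.append(line)
    return "\n".join(notes).strip()
```

Both functions take file contents rather than paths, so they can be fed the text of version.txt and CHANGELOG.md read however is convenient. An empty result from latest_release_notes suggests the changelog entry for the new version is missing or uses a different heading style.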