A tool designed to convert IMOS NetCDF and CSV files into Cloud Optimised formats such as Zarr and Parquet.
Visit the documentation on ReadTheDocs for detailed information.
- Conversion of CSV/NetCDF files to Cloud Optimised formats (Zarr/Parquet)
- YAML configuration approach, with parent and child YAML configurations when multiple datasets are very similar (e.g. Radar ACORN, GHRSST; see config) — a config merge sketch follows this list
- Generic handlers for most datasets (GenericParquetHandler, GenericZarrHandler)
- Specific handlers can be written that inherit methods from a generic handler (e.g. the Argo handler, Mooring Timeseries handler)
- Clustering capability:
  - Local Dask cluster
  - Remote Coiled cluster
  - Driven by configuration and easily overridden
  - Zarr: gridded datasets are processed in batches and in parallel with xarray.open_mfdataset (see the batch-open sketch below)
  - Parquet: tabular files are processed in batches and in parallel as independent tasks using futures (see the futures sketch below)
- Reprocessing:
  - Zarr: reprocessing is achieved by writing to specific regions with slices; non-contiguous regions are handled (see the region-write sketch below)
  - Parquet: reprocessing is done via pyarrow's internal overwrite mechanism, but can also be forced when an input file has changed significantly
- Chunking:
  - Parquet: to facilitate querying of geospatial data, polygon and timestamp slices are created as partitions (see the partitioning sketch below)
  - Zarr: chunking is set via the dataset configuration
- Metadata:
  - Parquet: metadata is written as a sidecar _metadata.parquet file (see the sidecar sketch below)
- Unit testing of modules: very close to integration testing; a local cluster is used to create Cloud Optimised files
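The sketches below illustrate a few of these features with plain Python; all paths, names, and parameters are illustrative, not the library's actual API. First, the parent/child configuration idea: a child YAML for one dataset overlays shared defaults from a parent. The file names and merge helper here are hypothetical.

```python
# Minimal sketch of a parent/child YAML overlay; file names are hypothetical.
import yaml

def merge_config(parent: dict, child: dict) -> dict:
    """Recursively overlay child keys on top of parent defaults."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

with open("radar_parent.yaml") as f:
    parent = yaml.safe_load(f)   # settings shared by all similar datasets
with open("radar_site.yaml") as f:
    child = yaml.safe_load(f)    # overrides for one site/product

config = merge_config(parent, child)
```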
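For the gridded (Zarr) path, a batch conversion boils down to something like the following: open many NetCDF files in parallel on a Dask cluster and write a single Zarr store. The glob pattern, the TIME dimension name, and the chunk size are assumptions.

```python
# Sketch only: batch-open gridded NetCDF files and write one Zarr store.
import xarray as xr
from dask.distributed import Client

client = Client()  # local Dask cluster; a Coiled cluster could be used instead

# Each file open becomes a Dask task; files are combined along coordinates
ds = xr.open_mfdataset("data/*.nc", combine="by_coords", parallel=True)

# Rechunk (size is illustrative) and write the whole grid in parallel
ds = ds.chunk({"TIME": 100})
ds.to_zarr("dataset.zarr", mode="w")
```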
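For the tabular (Parquet) path, each input file is an independent task submitted to the cluster as a future. A minimal sketch, assuming a hypothetical convert_one helper and local CSV files:

```python
# Sketch only: convert tabular files in parallel as independent futures.
import pandas as pd
from dask.distributed import Client, as_completed

def convert_one(path: str) -> str:
    """Hypothetical per-file conversion: CSV in, Parquet out."""
    df = pd.read_csv(path)
    out = path.replace(".csv", ".parquet")
    df.to_parquet(out)
    return out

client = Client()
futures = client.map(convert_one, ["a.csv", "b.csv", "c.csv"])
for future in as_completed(futures):   # handle results as tasks finish
    print("wrote", future.result())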
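Region-based Zarr reprocessing looks roughly like this: load the affected slice, fix it, then write it back to the same indices. The store path, the TIME dimension, and the index range are assumptions for illustration.

```python
# Sketch only: overwrite one time slice of an existing Zarr store.
import xarray as xr

store = "dataset.zarr"
patch = xr.open_zarr(store).isel(TIME=slice(100, 200)).load()

# ... reprocess `patch` here ...

# Region writes only accept variables that vary along the region dimension,
# so drop scalars/other coords and the TIME index itself before writing
drop = [v for v in patch.variables if "TIME" not in patch[v].dims] + ["TIME"]
patch = patch.drop_vars(drop, errors="ignore")

patch.to_zarr(store, region={"TIME": slice(100, 200)})
```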
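The partitioning scheme can be pictured with plain pyarrow: partition columns become directories, so queries filtering on a time slice or spatial cell only touch matching files. The column names and values below are assumptions, not the library's exact scheme.

```python
# Sketch only: hive-style partitions on timestamp and polygon columns.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "timestamp": [1704067200, 1704067200, 1706745600],  # time-slice key (assumed)
    "polygon": ["cell_a", "cell_a", "cell_b"],           # spatial-cell key (assumed)
    "TEMP": [18.2, 18.4, 17.9],
})

# Files land under dataset.parquet/timestamp=.../polygon=.../
pq.write_to_dataset(table, root_path="dataset.parquet",
                    partition_cols=["timestamp", "polygon"])
```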
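Because dataset-level metadata lives in a sidecar Parquet file, it can be inspected like any other table; the path below is illustrative.

```python
# Sketch only: read the sidecar metadata table (path is illustrative).
import pandas as pd

meta = pd.read_parquet("dataset.parquet/_metadata.parquet")
print(meta)
```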
Requirements:
- Python >= 3.10.14
- AWS SSO to push files to S3
- An account on Coiled for remote clustering (optional)
```bash
curl -s https://raw.githubusercontent.com/aodn/aodn_cloud_optimised/main/install.sh | bash
```
Otherwise, go to the release page.
Notebooks can be imported directly into Google Colab.
You can also click the Binder button below to spin up the environment and execute the notebooks (note that Binder is free but has limited resources).