Module Overview

Generic Cloud Optimised Creation

The generic_cloud_optimised_creation script is installed as part of the aodn_cloud_optimised package.

.. automodule:: aodn_cloud_optimised.bin.generic_cloud_optimised_creation
    :members:
    :undoc-members:

Create Dataset Configuration (semi-automatic)

The cloud_optimised_create_dataset_config script is installed as part of the aodn_cloud_optimised package and does much of the work of creating a full dataset configuration. The generated file still needs to be edited manually; see the full documentation at :ref:`dataset-config-doc`.

Usage Example

Expand the video:

.. asciinema:: _static/recordings/dataset_config.cast
   :preload: 1
   :theme: solarized-dark
   :autoplay: true
   :speed: 0.80


Definition

.. autofunction:: aodn_cloud_optimised.bin.create_dataset_config.main


Create AWS Registry dataset entry

The cloud_optimised_create_aws_registry_dataset script is installed as part of the aodn_cloud_optimised package.

Usage Example

.. asciinema:: _static/recordings/create_aws_registry.cast
   :preload: 1
   :theme: solarized-dark
   :autoplay: true
   :speed: 0.80
   :cols: 100
   :rows: 30


Definition

.. autofunction:: aodn_cloud_optimised.bin.create_aws_registry_dataset.main


Common Handler

.. automodule:: aodn_cloud_optimised.lib.CommonHandler
    :members:


Parquet Handlers

Handler steps

The conversion process is broken down into a series of ordered steps, each responsible for a specific task. These steps include:

  1. delete_existing_matching_parquet: Deletes existing Parquet files that match the current processing criteria.
  2. preprocess_data: Generates a DataFrame and Dataset from the input NetCDF file.
  3. publish_cloud_optimised: Creates Parquet files containing the processed data.

     - _add_timestamp_df: Adds timestamp information to the DataFrame. Useful for partitioning.
     - _add_columns_df: Adds generic columns such as site_code and filename to the DataFrame.
     - _add_columns_df_custom: Adds custom columns (useful for specific handlers).
     - _rm_bad_timestamp_df: Removes rows with bad timestamps from the DataFrame.
     - _add_metadata_sidecar: Adds metadata from the PyArrow table to the xarray dataset as sidecar attributes.

  4. postprocess: Cleans up resources used during data processing.
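
The ordering of the steps above can be illustrated with a small, self-contained sketch. This is not the aodn_cloud_optimised implementation: the class ``SketchHandler``, the method ``run_all_steps``, the ``TIME`` column, and the monthly partition granularity are all hypothetical, and plain lists of dictionaries stand in for the DataFrame and the Parquet store. Only the step names are taken from the list above, and the order of the helper calls inside ``publish_cloud_optimised`` is assumed from the order in which they are listed.

```python
from datetime import datetime, timezone


class SketchHandler:
    """Toy stand-in for a Parquet handler (hypothetical, for illustration only)."""

    def __init__(self, input_rows, filename):
        self.input_rows = input_rows
        self.filename = filename
        self.steps_run = []  # records the order in which the steps execute
        self.published = []  # stands in for the Parquet files in object storage
        self._rows = None    # working table built by preprocess_data

    def delete_existing_matching_parquet(self):
        # Step 1: drop rows previously published from the same input file.
        self.steps_run.append("delete_existing_matching_parquet")
        self.published = [r for r in self.published if r["filename"] != self.filename]

    def preprocess_data(self):
        # Step 2: build the working table (a DataFrame in the real handlers).
        self.steps_run.append("preprocess_data")
        self._rows = [dict(row) for row in self.input_rows]

    def _add_timestamp_df(self):
        # Partition key: epoch seconds at the start of the row's month (assumed).
        for row in self._rows:
            t = row.get("TIME")
            if isinstance(t, datetime):
                month_start = datetime(t.year, t.month, 1, tzinfo=timezone.utc)
                row["timestamp"] = int(month_start.timestamp())
            else:
                row["timestamp"] = None  # flagged for removal later

    def _add_columns_df(self):
        # Generic provenance column shared by every row.
        for row in self._rows:
            row["filename"] = self.filename

    def _rm_bad_timestamp_df(self):
        # Discard rows whose timestamp could not be computed.
        self._rows = [r for r in self._rows if r["timestamp"] is not None]

    def publish_cloud_optimised(self):
        # Step 3: enrich the table, then "write" it.
        self.steps_run.append("publish_cloud_optimised")
        self._add_timestamp_df()
        self._add_columns_df()
        self._rm_bad_timestamp_df()
        self.published.extend(self._rows)

    def postprocess(self):
        # Step 4: release the working table.
        self.steps_run.append("postprocess")
        self._rows = None

    def run_all_steps(self):
        self.delete_existing_matching_parquet()
        self.preprocess_data()
        self.publish_cloud_optimised()
        self.postprocess()


handler = SketchHandler(
    input_rows=[
        {"TIME": datetime(2024, 3, 15, tzinfo=timezone.utc), "TEMP": 18.2},
        {"TIME": None, "TEMP": 99.0},  # bad timestamp, removed before publishing
    ],
    filename="example.nc",
)
handler.run_all_steps()
print(handler.steps_run)
print(handler.published)
```

Because the first step removes anything already published for the same input file, calling ``run_all_steps`` a second time in this sketch replaces the earlier rows rather than duplicating them.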

Generic Parquet Handler definition

.. automodule:: aodn_cloud_optimised.lib.GenericParquetHandler
   :members:
   :show-inheritance:

.. inheritance-diagram:: aodn_cloud_optimised.lib.GenericParquetHandler


Argo Parquet Handler

.. automodule:: aodn_cloud_optimised.lib.ArgoHandler
   :members:
   :show-inheritance:

.. inheritance-diagram:: aodn_cloud_optimised.lib.ArgoHandler


Mooring Hourly Timeseries Parquet Handler

.. automodule:: aodn_cloud_optimised.lib.AnmnHourlyTsHandler
   :members:
   :show-inheritance:

.. inheritance-diagram:: aodn_cloud_optimised.lib.AnmnHourlyTsHandler


Zarr Handler

Handler steps

Handler definition

.. automodule:: aodn_cloud_optimised.lib.GenericZarrHandler
   :members:

.. inheritance-diagram:: aodn_cloud_optimised.lib.GenericZarrHandler