diff --git a/notebooks/dataset_organization.ipynb b/notebooks/dataset_organization.ipynb index ec8f0b7..c5f34c6 100644 --- a/notebooks/dataset_organization.ipynb +++ b/notebooks/dataset_organization.ipynb @@ -3,51 +3,146 @@ { "cell_type": "markdown", "id": "7b4c0499-4e55-4976-924d-0754bc252b39", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ - "# Dataset organization\n", + "# Dataset structure\n", "\n", - " The SPREE dataset contains collocations of GPM constellation sensors with *reference preciptiation measurements* from different sources. All collocations are provided on two grids: The native grid of the GPM PMW sensor and regridded to a regular lat/lon grid with a resolution of 0.036$^\\circ$. These two types of collocations will be referred to as *native* and *regridded*.\n", + "SPEED consists of collocations of GPM PMW sensors with *reference precipitation estimates* from multiple *reference data sources*. All collocations are provided on two grids: The native grid of the respective GPM PMW sensor and regridded to a regular lat/lon grid with a resolution of 0.036$^\\circ$. 
These two types of collocations will be referred to as *native* and *gridded*.\n", " " ] }, { "cell_type": "markdown", "id": "15fd0b4d-2539-4567-a421-eca05af9e8d6", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ - "For a given source of reference data, here named ``reference``, the data is organized as follows:\n", + "## Organization\n", + "\n", + "For a given source of reference data, here named ``reference``, the data is organized into folders as shown below.\n", "\n", "````\n", - "reference\n", - " ├── native\n", - " │   ├── sensor1\n", - " │   │   ├── reference_sensor1_YYYYMMDDHHMMSS.nc\n", + "\n", + " ├── \n", + " │   ├── native\n", + " │   │   ├── __YYYYMMDDHHMMSS.nc\n", " │   │   └── ...\n", - " │   └── sensor2\n", - " │   ├── reference_sensor2_YYYYMMDDHHMMSS.nc\n", + " │   └── gridded\n", + " │   ├── __YYYYMMDDHHMMSS.nc\n", " │   └── ...\n", - " └── regridded\n", - " ├── reference\n", - " │   └── reference_YYYYMMDDHHMMSS.nc\n", - " ├── sensor1\n", - " │   ├── sensor1_YYYYMMDDHHMMSS.nc\n", + " └── \n", + " ├── native\n", + " │   ├── __YYYYMMDDHHMMSS.nc\n", " │   └── ...\n", - " └── sensor2\n", - " ├── sensor2_YYYYMMDDHHMMSS.nc\n", - " └── ...\n", + " └── gridded\n", + "    ├── __YYYYMMDDHHMMSS.nc\n", + "    └── ...\n", "````\n", "\n", - "The ``native`` sub-folder contains folders for each of the input sources, here ``sensor1`` and ``sensor2``. These sensors folder contain the collocations in NetCDF4 format. The ``regridded`` folder contains the corresponding regridded collocations organized into folders for each input sources and an additional folder containing the reference data." + "At the highest level, the data is separated by reference data source. The collocations for every reference data source are split up into a ``native`` sub-folder containing the collocations on the native grids and a ``gridded`` folder containing the gridded collocations. 
\n", + "Within the ``native`` folder, collocation files are organized into different folders with respect to the sensor they are derived from (``sensor1`` and ``sensor2`` in the example)." + ] + }, + { + "cell_type": "markdown", + "id": "fc135c3f-bd78-4961-bf7c-3cd97a150c19", + "metadata": {}, + "source": [ + "# File content\n", + "\n", + "The file structure of the native and gridded data is slightly different but they share the same variable names. Native-grid files contain both *input* and *reference* data in separate groups, whereas for the gridded data the reference data is provided as a separate file.\n", + "\n", + "## Variable names" ] }, { - "cell_type": "code", - "execution_count": null, - "id": "b66b8e5f-1b9d-4464-819a-d7c284c9e06f", + "cell_type": "markdown", + "id": "a5ce0ff7-abd0-412b-aa90-43c32e1ed416", "metadata": {}, - "outputs": [], - "source": [] + "source": [ + "### Input data\n", + "\n", + "The input data files all contain the following variables:" + ] + }, + { + "cell_type": "markdown", + "id": "c636db5f-e361-4e17-a4a1-aa8580660b8f", + "metadata": {}, + "source": [ + "#### Observations\n", + "\n", + "| Variable name | Explanation | Unit |\n", + "|-----------------------------|---------------------------------------------------------|--------------|\n", + "| ``tbs_mw`` | Microwave brightness temperatures | K | \n", + "| ``tbs_ir`` | 11 $\\mu m$ brightness temperatures | K |" + ] + }, + { + "cell_type": "markdown", + "id": "3dc30c08-a7b8-4096-a9da-6ef5adb81732", + "metadata": {}, + "source": [ + "#### Ancillary data\n", + "\n", + "| Variable name | Explanation | Unit |\n", + "|-----------------------------|---------------------------------------------------------|--------------|\n", + "| ``earth_incidence_angle`` | Earth incidence angle | Degree |\n", + "| ``wet_bulb_temperature`` | Wet-bulb temperature | K |\n", + "| ``lapse_rate`` | Lapse rate | K / km |\n", + "| ``total_column_water_vapor``| Total-column water vapor | kg / m$^2$ |\n", + "| 
``surface_temperature`` | Surface temperature | K |\n", + "| ``two_meter_temperature`` | Two-meter temperature | K |\n", + "| ``convective_precipitation``| ERA5 convective precipitation | mm / h |\n", + "| ``moisture_convergence`` | ERA5 moisture convergence | kg / m$^2$ |\n", + "| ``leaf_area_index`` | Leaf-area index | m$^2$ / m$^2$|\n", + "| ``snow_depth`` | Snow depth | mm |\n", + "| ``orographic_wind`` | ERA5 orographic wind | m / s |\n", + "| ``10m_wind`` | ERA5 10-m wind | m / s |\n", + "| ``mountain_type`` | Mountain type | --- |\n", + "| ``land_fraction`` | Land fraction | % |\n", + "| ``ice_fraction`` | Ice fraction | % |\n", + "| ``l1c_quality_flag`` | GPM L1C quality flag | --- |\n", + "| ``sunglint_angle`` | Sunglint angle | Degree |\n", + "| ``surface_type`` | CSU surface type | --- |\n", + "| ``airlifting_index`` | Airlifting index | --- |\n", + "\n", + "### Geolocation and time\n", + "\n", + "| Variable name | Explanation | Unit |\n", + "|-----------------------------|---------------------------------------------------------|--------------|\n", + "| ``latitude``* | Latitude | Degree N |\n", + "| ``longitude``* | Longitude | Degree E |\n", + "| ``scan_time``* | Time stamp marking the start of the scan line | --- |\n", + "\n", + "## Native grids\n", + "\n", + "\n", + "## Input data\n", + "\n", + "## Reference data\n", + "\n", + "| Variable name | Explanation | Unit |\n", + "|-----------------------------|---------------------------------------------------------|--------------|\n", + "| ``surface_precip`` | Ground-truth surface precipitation | mm/h | \n", + "| ``surface_precip_cmb`` | Surface precip from GPM CMB | mm/h |\n", + "| ``surface_precip_mirs`` | Surface precip from MIRS | mm/h |\n", + "\n", + "> **Note**: Not all reference data variables are present in all files. The ``surface_precip_cmb`` and ``surface_precip_mirs``\n", + " fields, for example, are only present in reference data derived from GPM CMB. 
The ``precip_type`` and ``radar_quality_index``\n", + " fields, on the other hand, are present only in files derived from MRMS." + ] + } ], "metadata": { @@ -66,7 +161,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.13" } }, "nbformat": 4,