Update dataset organization.
simonpf committed Dec 12, 2023
1 parent a9520e7 commit 5c345b3
Showing 1 changed file with 121 additions and 26 deletions.
147 changes: 121 additions & 26 deletions notebooks/dataset_organization.ipynb
Expand Up @@ -3,51 +3,146 @@
{
"cell_type": "markdown",
"id": "7b4c0499-4e55-4976-924d-0754bc252b39",
"metadata": {},
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"# Dataset organization\n",
"# Dataset structure\n",
"\n",
" The SPREE dataset contains collocations of GPM constellation sensors with *reference preciptiation measurements* from different sources. All collocations are provided on two grids: The native grid of the GPM PMW sensor and regridded to a regular lat/lon grid with a resolution of 0.036$^\\circ$. These two types of collocations will be referred to as *native* and *regridded*.\n",
"SPEED consists of collocations of GPM PMW sensors with *reference preciptiation estimates* from multiple *reference data sources*. All collocations are provided on two grids: The native grid of the respective GPM PMW sensor and regridded to a regular lat/lon grid with a resolution of 0.036$^\\circ$. These two types of collocations will be referred to as *native* and *gridded*.\n",
" "
]
},
{
"cell_type": "markdown",
"id": "15fd0b4d-2539-4567-a421-eca05af9e8d6",
"metadata": {},
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"For a given source of reference data, here named ``reference``, the data is organized as follows:\n",
"## Organization\n",
"\n",
"For a given source of reference data, here named ``reference``, the data is organized into folders as shown below.\n",
"\n",
"````\n",
"reference\n",
" ├── native\n",
" │   ├── sensor1\n",
" │   │   ├── reference_sensor1_YYYYMMDDHHMMSS.nc\n",
"<reference_data>\n",
" ├── <sensor_1>\n",
" │   ├── native\n",
" │   │   ├── <reference_data>_<sensor_2>_YYYYMMDDHHMMSS.nc\n",
" │   │   └── ...\n",
" │   └── sensor2\n",
" │   ├── reference_sensor2_YYYYMMDDHHMMSS.nc\n",
" │   └── gridded\n",
" │   ├── <reference_data>_<sensor_2>_YYYYMMDDHHMMSS.nc\n",
" │   └── ...\n",
" └── regridded\n",
" ├── reference\n",
" │   └── reference_YYYYMMDDHHMMSS.nc\n",
" ├── sensor1\n",
" │   ├── sensor1_YYYYMMDDHHMMSS.nc\n",
" └── <sensor_2>\n",
" ├── native\n",
" │   ├── <reference_data>_<sensor2>_YYYYMMDDHHMMSS.nc\n",
" │   └── ...\n",
" └── sensor2\n",
" ├── sensor2_YYYYMMDDHHMMSS.nc\n",
" └── ...\n",
" └── gridded\n",
"    ├── <reference_data>_<sensor_2>_YYYYMMDDHHMMSS.nc\n",
"    └── ...\n",
"````\n",
"\n",
"The ``native`` sub-folder contains folders for each of the input sources, here ``sensor1`` and ``sensor2``. These sensors folder contain the collocations in NetCDF4 format. The ``regridded`` folder contains the corresponding regridded collocations organized into folders for each input sources and an additional folder containing the reference data."
"At the highest-level, the data is separated by reference data source. The collocations for every reference data souce are split up into a ``native`` sub-folder containing the collocations on the native grids and a ``gridded`` folder containing the gridded collocations. \n",
"Within the ``native`` folder, collcation files are organized into different folders with respect to the sensor they are derived from (``sensor1`` and ``sensor2`` in the example)."
]
},
{
"cell_type": "markdown",
"id": "fc135c3f-bd78-4961-bf7c-3cd97a150c19",
"metadata": {},
"source": [
"# File content\n",
"\n",
"The file structure of the native and regridded data is slightly different but they share the same variable names. Native-grid files contain both *input* and *reference* data in separate groups, whereas for the regridded data the reference data is provided as a separate file.\n",
"\n",
"## Variable names"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b66b8e5f-1b9d-4464-819a-d7c284c9e06f",
"cell_type": "markdown",
"id": "a5ce0ff7-abd0-412b-aa90-43c32e1ed416",
"metadata": {},
"outputs": [],
"source": []
"source": [
"### Input data\n",
"\n",
"The input data files all contain the following variables:"
]
},
{
"cell_type": "markdown",
"id": "c636db5f-e361-4e17-a4a1-aa8580660b8f",
"metadata": {},
"source": [
"#### Observations\n",
"\n",
"| Variable name | Explanation | Unit |\n",
"|-----------------------------|---------------------------------------------------------|--------------|\n",
"| ``tbs_mw`` | Microwave brightness temperatures | K | \n",
"| ``tbs_ir`` | 11 $\\mu m$ brightness temperatures | K |"
]
},
{
"cell_type": "markdown",
"id": "3dc30c08-a7b8-4096-a9da-6ef5adb81732",
"metadata": {},
"source": [
"#### Ancillary data\n",
"\n",
"| Variable name | Explanation | Unit |\n",
"|-----------------------------|---------------------------------------------------------|--------------|\n",
"| ``earth_incidence_angle`` | Earth incidence angle | Degree |\n",
"| ``wet_bulb_temperature`` | Wet-bulb temperature | K |\n",
"| ``lapse_rate`` | Lapse rate | K / km |\n",
"| ``total_column_water_vapor``| Total-column water vapor | kg / m$^2$ |\n",
"| ``surface_temperature`` | Surface temperature | K |\n",
"| ``two_meter_temperature`` | Two-meter temperature | K |\n",
"| ``convective_precipitation``| ERA5 convective precipitation | mm / h |\n",
"| ``moisture_convergence`` | ERA5 moisture convergence | kg / m$^2$ |\n",
"| ``leaf_area_index`` | Leaf-area index | m$^2$ / m$^2$|\n",
"| ``snow_depth`` | Snow depth | mm |\n",
"| ``orographic_wind`` | ERA5 orographic wind | m / s |\n",
"| ``10m_wind`` | ERA5 10-m wind | m / s |\n",
"| ``mountain_type`` | Mountain type | --- |\n",
"| ``land_fraction`` | Land fraction | % |\n",
"| ``ice_fraction`` | Ice fraction | % |\n",
"| ``l1c_quality_flag`` | GPM L1C quality flag | --- |\n",
"| ``sunglint_angle`` | Sunglint angle | Degree |\n",
"| ``surface_type`` | CSU surface type | --- |\n",
"| ``airlifting_index`` | Airlifting index | --- |\n",
"\n",
"### Geolocation and time\n",
"\n",
"| Variable name | Explanation | Unit |\n",
"|-----------------------------|---------------------------------------------------------|--------------|\n",
"| ``latitude``* | Latitude | Degree N |\n",
"| ``longitude``* | Longitude | Degree E |\n",
"| ``scan_time``* | Time stamp marking the start of the scan line | --- |\n",
"\n",
"## Native grids\n",
"\n",
"\n",
"## Input data\n",
"\n",
"## Reference data\n",
"\n",
"| Variable name | Explanation | Unit |\n",
"|-----------------------------|---------------------------------------------------------|--------------|\n",
"| ``surface_precip`` | Ground-truth surface precipitation | mm/h | \n",
"| ``surface_precip_cmb`` | Surface precip from GPM CMB | mm/h |\n",
"| ``surface_precip_mirs`` | Surface precip from MIRS | mm/h |\n",
"\n",
"> **Note**: Not all reference data variables are present in all files. The ``surface_precip_cmb`` and ``surface_precip_mirs``\n",
" fields, for example, are only present in reference data derived from GPM CMB. The ``precip_type`` and ``radar_quality_index``\n",
" field, on the other hand, are present only in files derived from MRMS."
]
}
],
"metadata": {
Expand All @@ -66,7 +161,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.13"
}
},
"nbformat": 4,
Expand Down
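
The folder layout described in the updated notebook can be turned into a small path helper. The following is a minimal sketch, assuming the sensor-first layout shown in the tree (`<reference_data>/<sensor>/<native|gridded>/`) and file names of the form `<reference_data>_<sensor>_YYYYMMDDHHMMSS.nc`. The function names, the root folder `speed_data`, and the example values `mrms` and `gmi` are illustrative and not part of the dataset's tooling. The parser additionally assumes that the reference data name contains no underscore.

```python
from datetime import datetime
from pathlib import Path

# Timestamp format assumed from the 'YYYYMMDDHHMMSS' placeholder in the tree.
TIME_FORMAT = "%Y%m%d%H%M%S"


def collocation_path(root, reference_data, sensor, grid, time):
    """Build the expected path of one collocation file (hypothetical helper)."""
    if grid not in ("native", "gridded"):
        raise ValueError("grid must be 'native' or 'gridded'")
    filename = f"{reference_data}_{sensor}_{time.strftime(TIME_FORMAT)}.nc"
    return Path(root) / reference_data / sensor / grid / filename


def parse_collocation_filename(path):
    """Split '<reference_data>_<sensor>_YYYYMMDDHHMMSS.nc' into its parts.

    Assumes the reference data name itself contains no underscore.
    """
    stem = Path(path).stem
    # The timestamp is the last underscore-separated token.
    prefix, stamp = stem.rsplit("_", 1)
    # Everything before the first underscore is taken as the reference name.
    reference_data, sensor = prefix.split("_", 1)
    return reference_data, sensor, datetime.strptime(stamp, TIME_FORMAT)


if __name__ == "__main__":
    example = collocation_path(
        "speed_data", "mrms", "gmi", "native", datetime(2020, 1, 1, 0, 30, 0)
    )
    print(example)
    print(parse_collocation_filename(example))
```

Listing all files for one sensor and grid would then amount to globbing `*.nc` in the parent folder of such a path, for example with `Path.glob`.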