{ "cells": [ { "cell_type": "markdown", "id": "0a58f202", "metadata": {}, "source": [ "# Converting MetObs Toolkit data to xarray Datasets\n", "\n", "This notebook demonstrates the `to_xr()` methods of the `Station` and `Dataset` classes from the `metobs_toolkit` package using the built-in demo dataset.\n", "\n", "We show:\n", "1. Loading the demo dataset.\n", "2. Converting a single station to an `xarray.Dataset`.\n", "3. Inspecting the structure (dimensions, variables, attributes).\n", "4. Converting the full multi-station dataset to xarray.\n", "5. Exploring selections (e.g. picking observation values vs. labels).\n" ] }, { "cell_type": "markdown", "id": "cd590cef", "metadata": {}, "source": [ "## What is xarray?\n", "\n", "[xarray](https://xarray.dev) is a Python library that brings the labeled data concepts of pandas to N-dimensional arrays (NetCDF-style). It enables:\n", "- Named dimensions (e.g. `datetime`, `kind`, `name`)\n", "- Coordinate-based indexing and selection\n", "- Rich metadata via attributes\n", "- Easy export to formats like NetCDF / Zarr / GRIB (with plugins)\n", "\n", "It is especially useful for structured time series, gridded data, or any multi-dimensional scientific data." ] }, { "cell_type": "code", "execution_count": 1, "id": "f8be2ee7", "metadata": {}, "outputs": [], "source": [ "# Imports\n", "import metobs_toolkit\n", "import xarray as xr" ] }, { "cell_type": "code", "execution_count": 2, "id": "e89caa2e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.\n", "Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.\n", "Neerslagsom is present in the datafile, but not found in the template! 
This column will be ignored.\n", "Rukwind is present in the datafile, but not found in the template! This column will be ignored.\n", "Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.\n", "Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.\n", "The following columns are present in the data file, but not in the template! They are skipped!\n", " ['Globe Temperatuur', 'Luchtdruk', 'Rukwind', 'Neerslagsom', 'Neerslagintensiteit', 'Luchtdruk_Zeeniveau']\n", "The following columns are found in the metadata, but not in the template and are therefore ignored: \n", "['stad', 'sponsor', 'Network', 'benaming']\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of stations: 28\n", "First 5 station names: ['vlinder01', 'vlinder02', 'vlinder03', 'vlinder04', 'vlinder05']\n" ] } ], "source": [ "# 1. Load the demo dataset into a Dataset object\n", "dataset = metobs_toolkit.Dataset()\n", "dataset.import_data_from_file(\n", " template_file=metobs_toolkit.demo_template,\n", " input_metadata_file=metobs_toolkit.demo_metadatafile,\n", " input_data_file=metobs_toolkit.demo_datafile,\n", ")\n", "\n", "print(f\"Number of stations: {len(dataset.stations)}\")\n", "print(\"First 5 station names:\", [s.name for s in dataset.stations[:5]])" ] }, { "cell_type": "code", "execution_count": 3, "id": "d5103a1e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/qc_collection/repetitions_check.py:78: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. 
Pass `(name,)` instead of `name` to silence this warning.\n", " groups.get_group(\n" ] } ], "source": [ "# 2. Pick one station (e.g. 'vlinder05') and run a simple QC check to add labels\n", "station = dataset.get_station('vlinder05')\n", "station.repetitions_check(max_N_repetitions=200)\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "fe84dec8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 311kB\n",
       "Dimensions:         (kind: 2, datetime: 4320)\n",
       "Coordinates:\n",
       "  * kind            (kind) <U5 40B 'obs' 'label'\n",
       "  * datetime        (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T...\n",
       "    lat             float64 8B 51.05\n",
       "    lon             float64 8B 3.675\n",
       "    altitude        float64 8B nan\n",
       "    LCZ             float64 8B nan\n",
       "    school          <U12 48B 'Sint-Barbara'\n",
       "Data variables:\n",
       "    wind_speed      (kind, datetime) float64 69kB 1.611 1.611 1.611 ... 0.0 0.0\n",
       "    wind_direction  (kind, datetime) float64 69kB 45.0 45.0 45.0 ... 0.0 0.0 0.0\n",
       "    humidity        (kind, datetime) float64 69kB 61.0 61.0 61.0 ... 0.0 0.0 0.0\n",
       "    temp            (kind, datetime) float64 69kB 21.1 21.1 21.1 ... 5.0 5.0 5.0
" ], "text/plain": [ " Size: 311kB\n", "Dimensions: (kind: 2, datetime: 4320)\n", "Coordinates:\n", " * kind (kind) \n", " .geemap-dark {\n", " --jp-widgets-color: white;\n", " --jp-widgets-label-color: white;\n", " --jp-ui-font-color1: white;\n", " --jp-layout-color2: #454545;\n", " background-color: #383838;\n", " }\n", "\n", " .geemap-dark .jupyter-button {\n", " --jp-layout-color3: #383838;\n", " }\n", "\n", " .geemap-colab {\n", " background-color: var(--colab-primary-surface-color, white);\n", " }\n", "\n", " .geemap-colab .jupyter-button {\n", " --jp-layout-color3: var(--colab-primary-surface-color, white);\n", " }\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'temp' (kind: 2, datetime: 4320)> Size: 69kB\n",
       "array([[21.10000038, 21.10000038, 21.10000038, ..., 17.39999962,\n",
       "        17.39999962, 17.39999962],\n",
       "       [ 0.        ,  0.        ,  0.        , ...,  5.        ,\n",
       "         5.        ,  5.        ]], shape=(2, 4320))\n",
       "Coordinates:\n",
       "  * kind      (kind) <U5 40B 'obs' 'label'\n",
       "  * datetime  (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n",
       "    lat       float64 8B 51.05\n",
       "    lon       float64 8B 3.675\n",
       "    altitude  float64 8B nan\n",
       "    LCZ       float64 8B nan\n",
       "    school    <U12 48B 'Sint-Barbara'\n",
       "Attributes:\n",
       "    obstype_name:                      temp\n",
       "    obstype_desc:                      2m - temperature\n",
       "    obstype_unit:                      degree_Celsius\n",
       "    MetObs toolkit version:            1.0.0\n",
       "    QC checks:                         ['duplicated_timestamp', 'repetitions']\n",
       "    QC:repetitions.max_N_repetitions:  200\n",
       "    QC:repetitions.sensorwhiteset:     0 whitelisted timestamps\n",
       "    GF methods:                        []\n",
       "    Label:ok:                          0\n",
       "    Label:repetitions outlier:         5
" ], "text/plain": [ " Size: 69kB\n", "array([[21.10000038, 21.10000038, 21.10000038, ..., 17.39999962,\n", " 17.39999962, 17.39999962],\n", " [ 0. , 0. , 0. , ..., 5. ,\n", " 5. , 5. ]], shape=(2, 4320))\n", "Coordinates:\n", " * kind (kind) \n", " .geemap-dark {\n", " --jp-widgets-color: white;\n", " --jp-widgets-label-color: white;\n", " --jp-ui-font-color1: white;\n", " --jp-layout-color2: #454545;\n", " background-color: #383838;\n", " }\n", "\n", " .geemap-dark .jupyter-button {\n", " --jp-layout-color3: #383838;\n", " }\n", "\n", " .geemap-colab {\n", " background-color: var(--colab-primary-surface-color, white);\n", " }\n", "\n", " .geemap-colab .jupyter-button {\n", " --jp-layout-color3: var(--colab-primary-surface-color, white);\n", " }\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'temp' (datetime: 4320)> Size: 35kB\n",
       "array([0., 0., 0., ..., 5., 5., 5.], shape=(4320,))\n",
       "Coordinates:\n",
       "  * datetime  (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n",
       "    kind      <U5 20B 'label'\n",
       "    lat       float64 8B 51.05\n",
       "    lon       float64 8B 3.675\n",
       "    altitude  float64 8B nan\n",
       "    LCZ       float64 8B nan\n",
       "    school    <U12 48B 'Sint-Barbara'\n",
       "Attributes:\n",
       "    obstype_name:                      temp\n",
       "    obstype_desc:                      2m - temperature\n",
       "    obstype_unit:                      degree_Celsius\n",
       "    MetObs toolkit version:            1.0.0\n",
       "    QC checks:                         ['duplicated_timestamp', 'repetitions']\n",
       "    QC:repetitions.max_N_repetitions:  200\n",
       "    QC:repetitions.sensorwhiteset:     0 whitelisted timestamps\n",
       "    GF methods:                        []\n",
       "    Label:ok:                          0\n",
       "    Label:repetitions outlier:         5
" ], "text/plain": [ " Size: 35kB\n", "array([0., 0., 0., ..., 5., 5., 5.], shape=(4320,))\n", "Coordinates:\n", " * datetime (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n", " kind \n", " .geemap-dark {\n", " --jp-widgets-color: white;\n", " --jp-widgets-label-color: white;\n", " --jp-ui-font-color1: white;\n", " --jp-layout-color2: #454545;\n", " background-color: #383838;\n", " }\n", "\n", " .geemap-dark .jupyter-button {\n", " --jp-layout-color3: #383838;\n", " }\n", "\n", " .geemap-colab {\n", " background-color: var(--colab-primary-surface-color, white);\n", " }\n", "\n", " .geemap-colab .jupyter-button {\n", " --jp-layout-color3: var(--colab-primary-surface-color, white);\n", " }\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'temp' (datetime: 4320)> Size: 328kB\n",
       "array(['ok', 'ok', 'ok', ..., 'repetitions outlier',\n",
       "       'repetitions outlier', 'repetitions outlier'],\n",
       "      shape=(4320,), dtype='<U19')\n",
       "Coordinates:\n",
       "  * datetime  (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n",
       "    kind      <U5 20B 'label'\n",
       "    lat       float64 8B 51.05\n",
       "    lon       float64 8B 3.675\n",
       "    altitude  float64 8B nan\n",
       "    LCZ       float64 8B nan\n",
       "    school    <U12 48B 'Sint-Barbara'\n",
       "Attributes:\n",
       "    obstype_name:                      temp\n",
       "    obstype_desc:                      2m - temperature\n",
       "    obstype_unit:                      degree_Celsius\n",
       "    MetObs toolkit version:            1.0.0\n",
       "    QC checks:                         ['duplicated_timestamp', 'repetitions']\n",
       "    QC:repetitions.max_N_repetitions:  200\n",
       "    QC:repetitions.sensorwhiteset:     0 whitelisted timestamps\n",
       "    GF methods:                        []\n",
       "    Label:ok:                          0\n",
       "    Label:repetitions outlier:         5
" ], "text/plain": [ " Size: 328kB\n", "array(['ok', 'ok', 'ok', ..., 'repetitions outlier',\n", " 'repetitions outlier', 'repetitions outlier'],\n", " shape=(4320,), dtype=' xr.DataArray:\n", " # Build the label -> number map from the 'Label:<name>' attributes.\n", " # removeprefix drops only the literal 'Label:' prefix; str.strip would\n", " # remove any of those characters from both ends of the key.\n", " label_to_numeric_map = {str(key).removeprefix(\"Label:\"): val for key, val in da.attrs.items() if str(key).startswith('Label:')}\n", "\n", " # Invert the map: number -> label\n", " numeric_to_label_map = {v: k for k, v in label_to_numeric_map.items()}\n", "\n", " # Apply the mapping element-wise; values without a label are kept as-is\n", " to_label = lambda t: numeric_to_label_map.get(t, t)\n", " vfunc = np.vectorize(to_label)\n", " da.data = vfunc(da.data)\n", " return da\n", "\n", "obs_labels = numeric_labels_to_string_labels(ds_station['temp'].sel(kind='label'))\n", "obs_labels" ] }, { "cell_type": "code", "execution_count": 8, "id": "c8da90e2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'temp' (datetime: 4320)> Size: 35kB\n",
       "array([21.10000038, 21.10000038, 21.10000038, ..., 17.39999962,\n",
       "       17.39999962, 17.39999962], shape=(4320,))\n",
       "Coordinates:\n",
       "  * datetime  (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n",
       "    kind      <U5 20B 'obs'\n",
       "    lat       float64 8B 51.05\n",
       "    lon       float64 8B 3.675\n",
       "    altitude  float64 8B nan\n",
       "    LCZ       float64 8B nan\n",
       "    school    <U12 48B 'Sint-Barbara'\n",
       "Attributes:\n",
       "    obstype_name:                      temp\n",
       "    obstype_desc:                      2m - temperature\n",
       "    obstype_unit:                      degree_Celsius\n",
       "    MetObs toolkit version:            1.0.0\n",
       "    QC checks:                         ['duplicated_timestamp', 'repetitions']\n",
       "    QC:repetitions.max_N_repetitions:  200\n",
       "    QC:repetitions.sensorwhiteset:     0 whitelisted timestamps\n",
       "    GF methods:                        []\n",
       "    Label:ok:                          0\n",
       "    Label:repetitions outlier:         5
" ], "text/plain": [ " Size: 35kB\n", "array([21.10000038, 21.10000038, 21.10000038, ..., 17.39999962,\n", " 17.39999962, 17.39999962], shape=(4320,))\n", "Coordinates:\n", " * datetime (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n", " kind \n", " .geemap-dark {\n", " --jp-widgets-color: white;\n", " --jp-widgets-label-color: white;\n", " --jp-ui-font-color1: white;\n", " --jp-layout-color2: #454545;\n", " background-color: #383838;\n", " }\n", "\n", " .geemap-dark .jupyter-button {\n", " --jp-layout-color3: #383838;\n", " }\n", "\n", " .geemap-colab {\n", " background-color: var(--colab-primary-surface-color, white);\n", " }\n", "\n", " .geemap-colab .jupyter-button {\n", " --jp-layout-color3: var(--colab-primary-surface-color, white);\n", " }\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 8MB\n",
       "Dimensions:         (name: 28, kind: 2, datetime: 4320)\n",
       "Coordinates:\n",
       "  * name            (name) <U9 1kB 'vlinder01' 'vlinder02' ... 'vlinder28'\n",
       "  * kind            (kind) <U5 40B 'obs' 'label'\n",
       "  * datetime        (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T...\n",
       "    lat             (name) float64 224B 50.98 51.02 51.32 ... 51.16 51.06 51.04\n",
       "    lon             (name) float64 224B 3.816 3.71 4.952 ... 4.998 3.728 3.77\n",
       "    altitude        float64 8B nan\n",
       "    LCZ             float64 8B nan\n",
       "    school          (name) <U29 3kB 'UGent' 'UGent' ... 'GO! Ath.'\n",
       "Data variables:\n",
       "    wind_speed      (name, kind, datetime) float64 2MB 1.556 1.528 ... 0.0 0.0\n",
       "    wind_direction  (name, kind, datetime) float64 2MB 65.0 75.0 ... 0.0 0.0\n",
       "    humidity        (name, kind, datetime) float64 2MB 65.0 65.0 ... 0.0 0.0\n",
       "    temp            (name, kind, datetime) float64 2MB 18.8 18.8 ... 0.0 0.0
" ], "text/plain": [ " Size: 8MB\n", "Dimensions: (name: 28, kind: 2, datetime: 4320)\n", "Coordinates:\n", " * name (name) \n", " .geemap-dark {\n", " --jp-widgets-color: white;\n", " --jp-widgets-label-color: white;\n", " --jp-ui-font-color1: white;\n", " --jp-layout-color2: #454545;\n", " background-color: #383838;\n", " }\n", "\n", " .geemap-dark .jupyter-button {\n", " --jp-layout-color3: #383838;\n", " }\n", "\n", " .geemap-colab {\n", " background-color: var(--colab-primary-surface-color, white);\n", " }\n", "\n", " .geemap-colab .jupyter-button {\n", " --jp-layout-color3: var(--colab-primary-surface-color, white);\n", " }\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'temp' (kind: 2, datetime: 4320)> Size: 69kB\n",
       "array([[21.10000038, 21.10000038, 21.10000038, ..., 17.39999962,\n",
       "        17.39999962, 17.39999962],\n",
       "       [ 0.        ,  0.        ,  0.        , ...,  5.        ,\n",
       "         5.        ,  5.        ]], shape=(2, 4320))\n",
       "Coordinates:\n",
       "  * kind      (kind) <U5 40B 'obs' 'label'\n",
       "  * datetime  (datetime) datetime64[ns] 35kB 2022-09-01 ... 2022-09-15T23:55:00\n",
       "    lat       float64 8B 51.05\n",
       "    lon       float64 8B 3.675\n",
       "    altitude  float64 8B nan\n",
       "    LCZ       float64 8B nan\n",
       "    school    <U29 116B 'Sint-Barbara'\n",
       "    name      <U9 36B 'vlinder05'\n",
       "Attributes:\n",
       "    obstype_name:            temp\n",
       "    obstype_desc:            2m - temperature\n",
       "    obstype_unit:            degree_Celsius\n",
       "    MetObs toolkit version:  1.0.0\n",
       "    QC checks:               ['duplicated_timestamp']\n",
       "    GF methods:              []\n",
       "    Label:ok:                0
" ], "text/plain": [ " Size: 69kB\n", "array([[21.10000038, 21.10000038, 21.10000038, ..., 17.39999962,\n", " 17.39999962, 17.39999962],\n", " [ 0. , 0. , 0. , ..., 5. ,\n", " 5. , 5. ]], shape=(2, 4320))\n", "Coordinates:\n", " * kind (kind)