Converting MetObs Toolkit data to xarray Datasets

This notebook demonstrates the to_xr() methods of the Station and Dataset classes from the metobs_toolkit package using the built-in demo dataset.

We show:

Loading the demo dataset.
Converting a single station to an xarray.Dataset.
Inspecting the structure (dimensions, variables, attributes).
Converting the full multi-station dataset to xarray.
Exploring selections (e.g. picking observation values vs. labels).

What is xarray?

xarray is a Python library that brings the labeled data concepts of pandas to N-dimensional arrays (NetCDF-style). It enables:

Named dimensions (e.g. datetime, kind, name)
Coordinate-based indexing and selection
Rich metadata via attributes
Easy export to formats like NetCDF and Zarr (GRIB files can be read through plugins such as cfgrib)

It is especially useful for structured time series, gridded data, or any multi-dimensional scientific data.
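As a minimal sketch of these ideas, independent of metobs_toolkit, the cell below builds a small toy xarray.Dataset by hand. The station names, the values and the source attribute are made up for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy data: two hypothetical stations, four 5-minute timestamps
times = pd.date_range("2022-09-01", periods=4, freq="5min")
toy = xr.Dataset(
    data_vars={
        "temp": (
            ("name", "datetime"),
            np.array([[18.0, 17.5, 17.0, 16.5], [19.0, 18.5, 18.0, 17.5]]),
        )
    },
    coords={"name": ["station_a", "station_b"], "datetime": times},
    attrs={"source": "toy example"},  # dataset-level metadata
)
toy["temp"].attrs["units"] = "degree_Celsius"  # variable-level metadata

# Coordinate-based selection: pick one value by station name and timestamp
value = float(
    toy["temp"].sel(name="station_a", datetime=pd.Timestamp("2022-09-01 00:05"))
)

# Reduce over a named dimension instead of a positional axis
mean_per_station = toy["temp"].mean(dim="datetime")

print(value)
print(mean_per_station.values)
```

Selecting and reducing by dimension name rather than by axis position is what keeps the later cells in this notebook readable.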

[2]:

# Imports
import metobs_toolkit
import xarray as xr

[3]:

# 1. Load the demo dataset into a Dataset object
dataset = metobs_toolkit.Dataset()
dataset.import_data_from_file(
    template_file=metobs_toolkit.demo_template,
    input_metadata_file=metobs_toolkit.demo_metadatafile,
    input_data_file=metobs_toolkit.demo_datafile,
)

print(f"Number of stations: {len(dataset.stations)}")
print("First 5 station names:", [s.name for s in dataset.stations[:5]])

Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
Rukwind is present in the datafile, but not found in the template! This column will be ignored.
Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
The following columns are present in the data file, but not in the template! They are skipped!
 ['Luchtdruk_Zeeniveau', 'Rukwind', 'Neerslagsom', 'Globe Temperatuur', 'Luchtdruk', 'Neerslagintensiteit']
The following columns are found in the metadata, but not in the template and are therefore ignored:
['benaming', 'sponsor', 'Network', 'stad']

Number of stations: 28
First 5 station names: ['vlinder01', 'vlinder02', 'vlinder03', 'vlinder04', 'vlinder05']

[4]:

# 2. Pick one station (e.g. 'vlinder05') and run a simple QC check to add labels
station = dataset.get_station('vlinder05')
station.repetitions_check(max_N_repetitions=200)

/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/metobs_toolkit/qc_collection/repetitions_check.py:62: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  groups.get_group(

[6]:

# 3. Convert the single station to an xarray Dataset
ds_station = station.to_xr()

ds_station

[6]:

<xarray.Dataset> Size: 311kB
Dimensions:         (datetime: 4320, kind: 2)
Coordinates:
  * datetime        (datetime) datetime64[ns, UTC] 35kB 2022-09-01 00:00:00+0...
  * kind            (kind) <U5 40B 'obs' 'label'
    lat             float64 8B 51.05
    lon             float64 8B 3.675
    altitude        float64 8B nan
    LCZ             float64 8B nan
    school          <U12 48B 'Sint-Barbara'
Data variables:
    wind_direction  (kind, datetime) object 69kB 45.0 45.0 45.0 ... 'ok' 'ok'
    temp            (kind, datetime) object 69kB 21.100000381469727 ... 'repe...
    wind_speed      (kind, datetime) object 69kB 1.6111112833023071 ... 'ok'
    humidity        (kind, datetime) object 69kB 61.0 61.0 61.0 ... 'ok' 'ok'

Structure of the station-level Dataset

For each observed variable (e.g. temp, humidity), a DataArray is created with the following dimensions:

Dimension kind: separates ‘obs’ (values) and ‘label’ (QC and gap-fill labels)
Dimension datetime: corresponding timestamp

Attributes on each variable include:

obstype_name, obstype_desc, obstype_unit
QC: dictionary of applied quality control checks
GF: dictionary of applied gap-fill methods
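To make this layout concrete, the sketch below rebuilds a tiny DataArray with the same dimensions and attribute names by hand. It is not produced by to_xr(); the values are invented and only the structure (a kind dimension, a datetime dimension, object dtype, nested QC dictionary) mirrors the output shown above.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2022-09-01", periods=3, freq="5min")

# Row 0 holds the observation values, row 1 the matching labels.
# Mixing floats and strings forces dtype=object, as in the output above.
values = np.empty((2, 3), dtype=object)
values[0, :] = [21.1, 21.1, 17.4]
values[1, :] = ["ok", "ok", "repetitions outlier"]

temp = xr.DataArray(
    values,
    dims=("kind", "datetime"),
    coords={"kind": ["obs", "label"], "datetime": times},
    name="temp",
    attrs={
        "obstype_name": "temp",
        "obstype_desc": "2m - temperature",
        "obstype_unit": "degree_Celsius",
        "QC": {
            "duplicated_timestamp": {"settings": {}},
            "repetitions": {"settings": {"max_N_repetitions": 200}},
        },
        "GF": {},
    },
)

# The attributes are plain Python objects and can be inspected directly
applied_checks = sorted(temp.attrs["QC"])
repetition_limit = temp.attrs["QC"]["repetitions"]["settings"]["max_N_repetitions"]
print(temp.dims, temp.dtype)
print(applied_checks, repetition_limit)
```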

[7]:

# 4. Inspect one variable (e.g. temperature)
ds_station['temp']

[7]:

<xarray.DataArray 'temp' (kind: 2, datetime: 4320)> Size: 69kB
array([[21.100000381469727, 21.100000381469727, 21.100000381469727, ...,
        17.399999618530273, 17.399999618530273, 17.399999618530273],
       ['ok', 'ok', 'ok', ..., 'repetitions outlier',
        'repetitions outlier', 'repetitions outlier']],
      shape=(2, 4320), dtype=object)
Coordinates:
  * datetime  (datetime) datetime64[ns, UTC] 35kB 2022-09-01 00:00:00+00:00 ....
  * kind      (kind) <U5 40B 'obs' 'label'
    lat       float64 8B 51.05
    lon       float64 8B 3.675
    altitude  float64 8B nan
    LCZ       float64 8B nan
    school    <U12 48B 'Sint-Barbara'
Attributes:
    obstype_name:  temp
    obstype_desc:  2m - temperature
    obstype_unit:  degree_Celsius
    QC:            {'duplicated_timestamp': {'settings': {}}, 'repetitions': ...
    GF:            {}

[10]:

# 5. Inspect the QC labels (kind='label')
labels = ds_station['temp'].sel(kind='label')
labels

[10]:

<xarray.DataArray 'temp' (datetime: 4320)> Size: 35kB
array(['ok', 'ok', 'ok', ..., 'repetitions outlier',
       'repetitions outlier', 'repetitions outlier'],
      shape=(4320,), dtype=object)
Coordinates:
  * datetime  (datetime) datetime64[ns, UTC] 35kB 2022-09-01 00:00:00+00:00 ....
    kind      <U5 20B 'label'
    lat       float64 8B 51.05
    lon       float64 8B 3.675
    altitude  float64 8B nan
    LCZ       float64 8B nan
    school    <U12 48B 'Sint-Barbara'
Attributes:
    obstype_name:  temp
    obstype_desc:  2m - temperature
    obstype_unit:  degree_Celsius
    QC:            {'duplicated_timestamp': {'settings': {}}, 'repetitions': ...
    GF:            {}
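A common next step is to count how often each label occurs. The sketch below does this on a small hand-made label array that stands in for the labels selection above; the same calls should apply to the real object, but that has not been verified against metobs_toolkit output here.

```python
import numpy as np
import xarray as xr

# Stand-in for the label selection: a 1-D object array of label strings
toy_labels = xr.DataArray(
    np.array(
        ["ok", "ok", "repetitions outlier", "ok", "repetitions outlier"],
        dtype=object,
    ),
    dims="datetime",
    name="temp",
)

# Cast to str so np.unique can sort the labels, then count each one
unique, counts = np.unique(toy_labels.values.astype(str), return_counts=True)
label_counts = dict(zip(unique.tolist(), counts.tolist()))
print(label_counts)
```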

[11]:

# ... or select the observation values (kind='obs')
records = ds_station['temp'].sel(kind='obs')
records

[11]:

<xarray.DataArray 'temp' (datetime: 4320)> Size: 35kB
array([21.100000381469727, 21.100000381469727, 21.100000381469727, ...,
       17.399999618530273, 17.399999618530273, 17.399999618530273],
      shape=(4320,), dtype=object)
Coordinates:
  * datetime  (datetime) datetime64[ns, UTC] 35kB 2022-09-01 00:00:00+00:00 ....
    kind      <U5 20B 'obs'
    lat       float64 8B 51.05
    lon       float64 8B 3.675
    altitude  float64 8B nan
    LCZ       float64 8B nan
    school    <U12 48B 'Sint-Barbara'
Attributes:
    obstype_name:  temp
    obstype_desc:  2m - temperature
    obstype_unit:  degree_Celsius
    QC:            {'duplicated_timestamp': {'settings': {}}, 'repetitions': ...
    GF:            {}
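Note that the selected observations still have dtype object, because values and labels share one array. For numerical work it can help to cast the observations to float and mask them with the labels. The sketch below shows one way to do this on a hand-made array with the same (kind, datetime) layout; the values are invented, and keeping only records labelled 'ok' is an illustrative choice, not a toolkit recommendation.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2022-09-01", periods=4, freq="5min")
values = np.empty((2, 4), dtype=object)
values[0, :] = [21.0, 20.5, 17.5, 17.5]
values[1, :] = ["ok", "ok", "repetitions outlier", "repetitions outlier"]
temp = xr.DataArray(
    values,
    dims=("kind", "datetime"),
    coords={"kind": ["obs", "label"], "datetime": times},
    name="temp",
)

# drop=True removes the scalar 'kind' coordinate so both selections
# carry identical coordinates when they are combined below
obs = temp.sel(kind="obs", drop=True).astype(float)
labels = temp.sel(kind="label", drop=True)

# Keep observations labelled 'ok'; everything else becomes NaN
clean = obs.where(labels == "ok")
mean_ok = float(clean.mean())  # NaN values are skipped by default
print(obs.dtype, mean_ok)
```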

Converting the full Dataset

We can also call to_xr() on a Dataset object. This adds an extra dimension, name, to the resulting xarray.Dataset, with one entry per station.

[12]:

# 6. Convert the entire collection of stations
ds_all = dataset.to_xr()

ds_all

[12]:

<xarray.Dataset> Size: 8MB
Dimensions:         (name: 28, kind: 2, datetime: 4320)
Coordinates:
  * datetime        (datetime) datetime64[ns, UTC] 35kB 2022-09-01 00:00:00+0...
  * kind            (kind) <U5 40B 'obs' 'label'
    lat             (name) float64 224B 50.98 51.02 51.32 ... 51.16 51.06 51.04
    lon             (name) float64 224B 3.816 3.71 4.952 ... 4.998 3.728 3.77
    altitude        float64 8B nan
    LCZ             float64 8B nan
    school          (name) <U29 3kB 'UGent' 'UGent' ... 'GO! Ath.'
  * name            (name) <U9 1kB 'vlinder01' 'vlinder02' ... 'vlinder28'
Data variables:
    wind_direction  (name, kind, datetime) object 2MB 65.0 75.0 ... 'ok' 'ok'
    temp            (name, kind, datetime) object 2MB 18.799999237060547 ... ...
    wind_speed      (name, kind, datetime) object 2MB 1.5555555820465088 ... ...
    humidity        (name, kind, datetime) object 2MB 65.0 65.0 ... 'ok' 'ok'
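Because lat, lon and school are stored as coordinates along name, they can be used to pick subsets of stations. The sketch below illustrates this on a hand-made dataset with the same (name, kind, datetime) layout; the station names, latitudes, values and the 51.0 threshold are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

names = ["station_a", "station_b", "station_c"]
times = pd.date_range("2022-09-01", periods=2, freq="5min")

# Fill a (name, kind, datetime) object array: values first, labels second
values = np.empty((3, 2, 2), dtype=object)
for i, row in enumerate([[18.0, 19.0], [19.0, 20.0], [17.0, 16.0]]):
    values[i, 0, :] = row
    values[i, 1, :] = ["ok", "ok"]

ds = xr.Dataset(
    {"temp": (("name", "kind", "datetime"), values)},
    coords={
        "name": names,
        "kind": ["obs", "label"],
        "datetime": times,
        "lat": ("name", [50.98, 51.02, 51.32]),
    },
)

# Use the per-station 'lat' coordinate to find the stations of interest
northern_names = ds["name"].values[ds["lat"].values > 51.0].tolist()
northern = ds.sel(name=northern_names)

# Per-station mean of the observation values
obs = northern["temp"].sel(kind="obs", drop=True).astype(float)
mean_per_station = obs.mean(dim="datetime")
print(northern_names)
print(mean_per_station.values)
```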

[13]:

# 7. Selecting a single station from the multi-station Dataset
ds_one = ds_all.sel(name='vlinder05')
ds_one['temp']

[13]:

<xarray.DataArray 'temp' (kind: 2, datetime: 4320)> Size: 69kB
array([[21.100000381469727, 21.100000381469727, 21.100000381469727, ...,
        17.399999618530273, 17.399999618530273, 17.399999618530273],
       ['ok', 'ok', 'ok', ..., 'repetitions outlier',
        'repetitions outlier', 'repetitions outlier']],
      shape=(2, 4320), dtype=object)
Coordinates:
  * datetime  (datetime) datetime64[ns, UTC] 35kB 2022-09-01 00:00:00+00:00 ....
  * kind      (kind) <U5 40B 'obs' 'label'
    lat       float64 8B 51.05
    lon       float64 8B 3.675
    altitude  float64 8B nan
    LCZ       float64 8B nan
    school    <U29 116B 'Sint-Barbara'
    name      <U9 36B 'vlinder05'
Attributes:
    obstype_name:  temp
    obstype_desc:  2m - temperature
    obstype_unit:  degree_Celsius
    QC:            {'duplicated_timestamp': {'settings': {}}}
    GF:            {}

Dimension summary (multi-station)

name: name of the station
kind: sub-type of the data (e.g. ‘obs’, ‘label’, and possibly ‘model’ if model time series have been added)
datetime: consolidated time axis (union across stations)

If model time series (e.g. ERA5) are imported, an additional internal dimension (e.g. models) appears inside the model DataArrays (stacked under kind='model').
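To illustrate what a union time axis means in xarray terms, the sketch below combines two hand-made station series with partly different timestamps. It shows generic xarray behaviour (an outer join followed by concatenation along name) and is not a description of how to_xr() is implemented internally; the station names and values are invented.

```python
import pandas as pd
import xarray as xr

# Two hypothetical stations whose records start five minutes apart
times_a = pd.date_range("2022-09-01 00:00", periods=3, freq="5min")
times_b = pd.date_range("2022-09-01 00:05", periods=3, freq="5min")
a = xr.DataArray(
    [18.0, 18.5, 19.0], dims="datetime", coords={"datetime": times_a}, name="temp"
)
b = xr.DataArray(
    [15.0, 15.5, 16.0], dims="datetime", coords={"datetime": times_b}, name="temp"
)

# An outer join aligns both series on the union of their timestamps;
# timestamps missing at a station are filled with NaN
a_aligned, b_aligned = xr.align(a, b, join="outer")

# Stack the aligned series along a new 'name' dimension
combined = xr.concat(
    [a_aligned, b_aligned],
    dim=pd.Index(["station_a", "station_b"], name="name"),
)
n_missing = int(combined.isnull().sum())
print(combined.sizes)
print(n_missing)
```

With a union axis, every station shares the same datetime coordinate, at the cost of missing values wherever a station has no record for a timestamp.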