Toolkit objects overview

This toolkit is a Python package based on object-oriented programming (OOP). This page gives a short description of the most commonly used classes. The introduction example contains a notebook that illustrates these classes.

Dataset

The Dataset class is at the heart of the toolkit and holds a set of Stations. All methods applied on a Dataset are applied equally on each of its Stations, so every method that can be applied on a Station can also be applied on a Dataset. See the Dataset API and Station API documentation for more details.

import metobs_toolkit

# Create an empty Dataset to import your raw data into
your_dataset = metobs_toolkit.Dataset()

The Dataset holds methods for:

  • Importing raw data

  • Resampling/synchronizing timeseries

  • Extracting metadata

  • Visualizing

  • Quality control

  • Gap filling
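To picture how a method called on a Dataset reaches each of its Stations, here is a minimal sketch of that delegation. The classes `MiniStation` and `MiniDataset` and the method `apply_offset` are hypothetical and only illustrate the idea; they are not the toolkit's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MiniStation:
    """Hypothetical stand-in for a Station: a name and its observed values."""
    name: str
    values: list[float] = field(default_factory=list)

    def apply_offset(self, offset: float) -> None:
        # Station-level method: shift this station's values in place.
        self.values = [v + offset for v in self.values]


@dataclass
class MiniDataset:
    """Hypothetical stand-in for a Dataset: it only holds stations."""
    stations: list[MiniStation] = field(default_factory=list)

    def apply_offset(self, offset: float) -> None:
        # Dataset-level method: delegate the same call to every station.
        for station in self.stations:
            station.apply_offset(offset)


ds = MiniDataset([MiniStation("station_A", [1.0, 2.0]),
                  MiniStation("station_B", [3.0])])
ds.apply_offset(0.5)
print([s.values for s in ds.stations])  # [[1.5, 2.5], [3.5]]
```

Because the Dataset-level method only loops over its stations, any logic written once at the station level is automatically available on the whole dataset.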

Station

A Station holds:

  • timeseries related to sensors (as SensorData)

  • metadata related to the site of the station (as a Site)

  • timeseries originating from models at the station site (as ModelTimeSeries).

All stations are assumed to have a unique name. The data of each sensor is stored in SensorData objects, and all metadata is stored in the Site object of the station.

You cannot create a Station directly, but you can extract one from a Dataset. A Station has the same attributes and methods as a Dataset, but all its observations are limited to that specific station.

# Extract a single station by its (unique) name
your_station = your_dataset.get_station(stationname='station_A')
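The composition of a Station and the lookup by unique name can be sketched with plain dataclasses. All `Mini*` classes below are hypothetical simplifications (the real Site, SensorData and ModelTimeSeries hold much more); only the `get_station(stationname=...)` call mirrors the documented usage.

```python
from dataclasses import dataclass, field


@dataclass
class MiniSite:
    """Hypothetical stand-in for a Site: metadata of the station location."""
    lat: float
    lon: float


@dataclass
class MiniSensorData:
    """Hypothetical stand-in for SensorData: values of one observation type."""
    obstype: str
    values: list[float]


@dataclass
class MiniStation:
    """Hypothetical stand-in for a Station: site metadata plus data per obstype."""
    name: str
    site: MiniSite
    sensordata: dict[str, MiniSensorData] = field(default_factory=dict)
    modeldata: dict[str, list[float]] = field(default_factory=dict)


@dataclass
class MiniDataset:
    """Hypothetical stand-in for a Dataset: a collection of uniquely named stations."""
    stations: list[MiniStation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the unique-name assumption when the dataset is built.
        names = [s.name for s in self.stations]
        if len(names) != len(set(names)):
            raise ValueError("Station names must be unique")

    def get_station(self, stationname: str) -> MiniStation:
        # Names are unique, so the first match is the only match.
        for station in self.stations:
            if station.name == stationname:
                return station
        raise KeyError(f"No station named {stationname!r}")


ds = MiniDataset([
    MiniStation("station_A", MiniSite(51.0, 3.7),
                {"temp": MiniSensorData("temp", [12.1, 12.4])}),
    MiniStation("station_B", MiniSite(51.1, 3.8)),
])
station = ds.get_station(stationname="station_A")
print(station.sensordata["temp"].values)  # [12.1, 12.4]
```

Keying the sensor and model data by observation type reflects the text above: one SensorData (and optionally one ModelTimeSeries) per observation type at each station.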

SensorData

A SensorData object holds the timeseries data for a specific observation type (e.g., temperature, humidity, wind speed) at a station. Each station can have multiple SensorData objects, one for each observation type. SensorData manages the actual measurements, the associated timestamps, and the quality control labels for its observation type. If present, gaps are also stored in the SensorData. SensorData objects are not created directly by users; they are managed by the toolkit when importing or processing data.

In practice, you do not need to interact with this class directly. You can inspect the observations through the df attribute of a Station or Dataset.

records_dataframe = your_dataset.df

See the SensorData API documentation for more details.
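Conceptually, the df attribute stacks the records of all SensorData objects into one long table. The sketch below assumes a hypothetical `MiniSensorData` class and `combine_records` helper, and the QC label names are made up; the real df is a pandas DataFrame whose exact index and columns may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MiniSensorData:
    """Hypothetical stand-in for SensorData: one obstype at one station."""
    stationname: str
    obstype: str
    timestamps: list[datetime]
    values: list[float]
    labels: list[str]  # one QC label per record (label names are made up here)

    def records(self) -> list[tuple]:
        # One row per timestamp: (name, obstype, datetime, value, label).
        return [
            (self.stationname, self.obstype, ts, val, lab)
            for ts, val, lab in zip(self.timestamps, self.values, self.labels)
        ]


def combine_records(all_sensordata: list[MiniSensorData]) -> list[tuple]:
    # Stack the rows of every SensorData into one long table,
    # sorted by station name, obstype and time.
    rows = [row for sd in all_sensordata for row in sd.records()]
    return sorted(rows, key=lambda r: (r[0], r[1], r[2]))


t0, t1 = datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 1)
table = combine_records([
    MiniSensorData("station_B", "temp", [t0, t1], [4.2, 4.0], ["ok", "ok"]),
    MiniSensorData("station_A", "temp", [t0, t1], [5.1, 55.0], ["ok", "outlier"]),
])
for row in table:
    print(row)
```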

Gaps

A gap is a period in the timeseries where observations are missing. Gaps are automatically detected when importing raw data or after resampling/synchronizing timeseries. Each gap is represented by a Gap object, which is stored by the corresponding SensorData.

Regular users do not interact directly with Gap objects. Instead, gaps can be inspected and filled via methods on the Dataset and Station classes. For example, you can view gaps using the gapsdf attribute or visualize them with plotting methods.

# Inspect gaps for a dataset
print(your_dataset.gapsdf)

See the Gap API documentation for more details.
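How gaps can be found in a regular timeseries is sketched below: after resampling, any jump between consecutive timestamps larger than the expected frequency marks a run of missing records. The `find_gaps` function is hypothetical and only detects gaps between observed records, not missing records before the first or after the last one; the toolkit's own detection may differ.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps: list[datetime],
              freq: timedelta) -> list[tuple[datetime, datetime]]:
    """Return (first missing, last missing) timestamps of each internal gap.

    Assumes `timestamps` are sorted and aligned to `freq`
    (e.g. after resampling/synchronizing).
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > freq:
            # Everything strictly between prev and curr is missing.
            gaps.append((prev + freq, curr - freq))
    return gaps


base = datetime(2024, 1, 1, 0, 0)
step = timedelta(minutes=10)
# Records at 00:00, 00:10, 00:40 and 00:50: 00:20 and 00:30 are missing.
obs_times = [base, base + step, base + 4 * step, base + 5 * step]
print(find_gaps(obs_times, step))
```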

Analysis

The Analysis class is created from a Dataset and holds the observations that are assumed to be correct. Unlike the Dataset methods, the Analysis methods do not change the observations; instead, they filter and aggregate them to give insight into diurnal/seasonal patterns and landcover effects.

See the Analysis example for more details.

# Build an Analysis from the Dataset created earlier
your_dataset_analysis = metobs_toolkit.Analysis(Dataholder=your_dataset)

Note

Creating an Analysis of a Station is not recommended, since an analysis of a single station has little scientific value.
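As an example of the kind of aggregation the Analysis focuses on, a diurnal cycle can be obtained by grouping observations by hour of the day and averaging them. The `diurnal_cycle` function and the synthetic data are hypothetical; the Analysis class may use other groupings and statistics.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def diurnal_cycle(records: list[tuple[datetime, float]]) -> dict[int, float]:
    """Average the values per hour of the day (0-23)."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for ts, value in records:
        by_hour[ts.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}


start = datetime(2024, 7, 1, 0)
# Two days of synthetic hourly records; day 2 is 2 degrees warmer at every hour.
records = [(start + timedelta(hours=h), 15.0 + (h % 24) + 2.0 * (h // 24))
           for h in range(48)]
cycle = diurnal_cycle(records)
print(cycle[0], cycle[14])  # 16.0 30.0
```

The aggregation only reads the records and returns a new summary, matching the idea that the Analysis does not change the observations themselves.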

Geedatasetmanagers

A Geedatasetmanager is a class that manages the interaction between the toolkit and a specific dataset on Google Earth Engine (GEE). These managers do not store model data themselves (that is done by ModelTimeSeries objects), but provide the interface to extract and interpret data from GEE.

There are two types of Geedatasetmanagers:

  • GEEStaticDatasetManager: Handles GEE datasets without a time dimension (static). Used to extract static properties (e.g., land cover, altitude, LCZ) at station locations or within buffers.

  • GEEDynamicDatasetManager: Handles GEE datasets with a time dimension (dynamic). Used to extract timeseries data (e.g., ERA5 temperature) at station locations. This manager uses ModelObstype definitions to map GEE dataset bands to observation types and handle unit conversions.

Default managers for common datasets are provided and accessible via metobs_toolkit.default_GEE_datasets. You can also define your own managers for custom GEE datasets.

See the Geedatasetmanagers API documentation and the Gee example for more details.
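For static datasets such as land cover, extracting a property within a buffer can boil down to counting pixel classes around the station. This sketch assumes the pixel values have already been retrieved (they are mocked here) and uses a hypothetical class mapping and `cover_fractions` helper; it does not call GEE or any Geedatasetmanager.

```python
from collections import Counter

# Hypothetical mapping from class codes of a categorical land-cover band to names.
LANDCOVER_CLASSES = {10: "tree", 50: "built-up", 80: "water"}


def cover_fractions(pixel_codes: list[int],
                    classes: dict[int, str]) -> dict[str, float]:
    """Fraction of the buffer covered by each known class (unknown codes are ignored)."""
    counts = Counter(code for code in pixel_codes if code in classes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {classes[code]: n / total for code, n in counts.items()}


# Mocked pixel values inside a buffer around one station. In practice a static
# GEE dataset would be sampled server-side; that step is not shown here.
pixels = [50, 50, 50, 10, 10, 80, 50, 10]
print(cover_fractions(pixels, LANDCOVER_CLASSES))
# {'built-up': 0.5, 'tree': 0.375, 'water': 0.125}
```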

ModelTimeSeries

A ModelTimeSeries object stores timeseries data extracted from a dynamic GEE dataset (e.g., ERA5) for a specific observation type at a station. It is similar to the SensorData class. These timeseries represent modelled or reanalysis data, and are typically used for comparison with observations, quality control, or gap filling.

ModelTimeSeries objects are stored in the Station objects. Regular users rarely need to interact with ModelTimeSeries objects directly: model data can be inspected via the modeldatadf attribute on the Dataset and Station classes, and a single ModelTimeSeries can be retrieved from a Station when needed.

# Access modelled temperature timeseries for a station
temp_modeldata = your_station.get_modeltimeseries('temp')

# View the timeseries DataFrame
print(temp_modeldata.df)

# Plot the modelled data
temp_modeldata.plot()

See the ModelTimeSeries API documentation and the Gee example for more details.
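One simple way model timeseries can support gap filling is to replace missing observations by model values corrected for the mean bias between observations and model. The `fill_with_model` function below is a hypothetical sketch of that technique and is not necessarily the method the toolkit uses.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def fill_with_model(
    obs: dict[datetime, Optional[float]],
    model: dict[datetime, float],
) -> dict[datetime, Optional[float]]:
    """Replace missing observations (None) by bias-corrected model values.

    The bias is the mean of (observation - model) over the timestamps where
    both are available. Timestamps without model data stay missing.
    """
    overlap = [t for t, v in obs.items() if v is not None and t in model]
    if not overlap:
        return dict(obs)  # no common timestamps: nothing to calibrate on
    bias = mean(obs[t] - model[t] for t in overlap)
    filled: dict[datetime, Optional[float]] = {}
    for t, v in obs.items():
        if v is None and t in model:
            filled[t] = model[t] + bias
        else:
            filled[t] = v
    return filled


start = datetime(2024, 1, 1, 0)
times = [start + timedelta(hours=i) for i in range(4)]
obs = {times[0]: 11.0, times[1]: 12.0, times[2]: None, times[3]: 14.0}
model = {times[0]: 10.0, times[1]: 11.0, times[2]: 12.5, times[3]: 13.0}
filled = fill_with_model(obs, model)
print(filled[times[2]])  # 13.5
```

Correcting for the bias matters because reanalysis data at a station location can be systematically warmer or colder than the local sensor.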

Obstype and ModelObstype

An Obstype defines an observation type, such as temperature, humidity, or wind speed. It specifies the standard name, standard unit, and a description for the observation type. Obstypes are used throughout the toolkit to ensure consistency in data handling, unit conversion, and quality control.

A ModelObstype extends the concept of an Obstype to model or reanalysis data (e.g., from GEE datasets). In addition to the standard attributes, a ModelObstype defines the corresponding band name and unit in the model dataset. This allows the toolkit to map model data bands to observation types and handle unit conversions automatically.

You typically do not need to create these objects directly; common obstypes and modelobstypes are predefined and used internally by the toolkit and GEE dataset managers.

See the Obstype API documentation for more details.
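The relation between an Obstype and a ModelObstype can be sketched as a small class hierarchy in which the model variant adds the band name, its unit and a conversion to the standard unit. `MiniObstype`, `MiniModelObstype` and the linear `scale`/`offset` conversion are assumptions for illustration only; the band name is illustrative too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniObstype:
    """Hypothetical stand-in for an Obstype: name, standard unit, description."""
    name: str
    std_unit: str
    description: str


@dataclass(frozen=True)
class MiniModelObstype(MiniObstype):
    """Hypothetical stand-in for a ModelObstype: adds the model band and its unit.

    The conversion to the standard unit is modelled as a linear map
    (value * scale + offset), which covers e.g. kelvin -> degrees Celsius.
    """
    model_band: str = ""
    model_unit: str = ""
    scale: float = 1.0
    offset: float = 0.0

    def to_std_unit(self, model_values: list[float]) -> list[float]:
        # Convert raw band values from the model unit to the standard unit.
        return [v * self.scale + self.offset for v in model_values]


temp_era5 = MiniModelObstype(
    name="temp",
    std_unit="Celsius",
    description="2m air temperature",
    model_band="temperature_2m",  # illustrative band name
    model_unit="kelvin",
    offset=-273.15,
)
print(temp_era5.to_std_unit([273.15, 283.15]))
```

Here the kelvin values of the model band are shifted to degrees Celsius, the standard unit of the hypothetical `temp` obstype.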