Demo: Applying Quality Control

In this example, we apply Quality Control (QC) to the demo data.

[1]:
import metobs_toolkit

Create your dataset

We start by creating a dataset.

[2]:

# Create the dataset and import data using the demo files
dataset = metobs_toolkit.Dataset()
dataset.import_data_from_file(
    template_file=metobs_toolkit.demo_template,
    input_data_file=metobs_toolkit.demo_datafile,
    input_metadata_file=metobs_toolkit.demo_metadatafile,
)
Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
Rukwind is present in the datafile, but not found in the template! This column will be ignored.
Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
The following columns are present in the data file, but not in the template! They are skipped!
 ['Luchtdruk_Zeeniveau', 'Luchtdruk', 'Globe Temperatuur', 'Neerslagintensiteit', 'Rukwind', 'Neerslagsom']
The following columns are found in the metadata, but not in the template and are therefore ignored:
['Network', 'benaming', 'sponsor', 'stad']

When importing raw data, two quality control checks are applied by default:

  • Duplicate timestamp check: The timestamps are checked for duplicates.

  • Invalid value check: All observations are cast to numeric values. If a raw value cannot be converted to a numeric value, the timestamp is ignored. This results in a gap.
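The idea behind these two default checks can be sketched in plain Python. This is only an illustration under assumptions, not the toolkit's implementation: the helper name `split_raw_records` and the sample records are made up, and the choice to set aside every record that shares a duplicated timestamp is an assumption of this sketch.

```python
from collections import Counter


def split_raw_records(records):
    """Split raw (timestamp, value) pairs into valid records,
    duplicated timestamps and timestamps with non-numeric values."""
    counts = Counter(ts for ts, _ in records)
    duplicates = sorted(ts for ts, n in counts.items() if n > 1)

    valid, invalid = {}, []
    for ts, raw in records:
        if counts[ts] > 1:
            continue  # set aside by the duplicate timestamp check
        try:
            valid[ts] = float(raw)
        except (TypeError, ValueError):
            invalid.append(ts)  # no usable value: this timestamp becomes a gap
    return valid, duplicates, invalid


# Hypothetical raw records, for illustration only
raw = [
    ("2022-09-01 00:00", "18.8"),
    ("2022-09-01 00:05", "n/a"),   # cannot be cast to a number
    ("2022-09-01 00:10", "18.6"),
    ("2022-09-01 00:10", "18.7"),  # same timestamp twice
]
valid, duplicates, invalid = split_raw_records(raw)
print(valid, duplicates, invalid)
```

In this sketch the record with the value "n/a" ends up in `invalid`, which corresponds to the gap described above.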

We can inspect the outliers at any time using the .outliersdf attribute.

[3]:
dataset.outliersdf
[3]:
value label details
datetime obstype name

An empty outliers dataframe indicates that no outliers have been found yet.

Quality Control Checks

In the MetObs-toolkit, a set of quality-control checks is implemented. Each check aims to find outliers using different techniques. In general, we can group QC checks into two categories: individual checks and social checks. Individual checks take only the time series under investigation into account. Social checks use time series from other sensors to assess the quality of a target time series.

The following individual checks are available:

  • Dataset.gross_value_check(): Checks whether observations fall between given thresholds.

  • Dataset.persistence_check(): Tests whether observations change over a specific period.

  • Dataset.repetitions_check(): Tests whether an observation remains unchanged for several records.

  • Dataset.step_check(): Tests whether observations produce unrealistic spikes in the time series.

  • Dataset.window_variation_check(): Tests whether the variation exceeds the threshold within moving time windows.

Note: For a detailed description, see the API documentation for these methods.

Note: All these checks can be applied to Dataset and Station objects.
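To give a feeling for what an individual check does, here is a minimal plain-Python sketch of a gross value test and a repetitions test on a single series. The function names and the sample values are hypothetical, and the exact boundary rules (for example whether a run of exactly the maximum length is flagged) are assumptions; the API documentation remains the reference for the actual behaviour.

```python
def gross_value_outliers(values, lower_threshold, upper_threshold):
    """Indices of values outside [lower_threshold, upper_threshold]."""
    return [i for i, v in enumerate(values)
            if v < lower_threshold or v > upper_threshold]


def repetition_outliers(values, max_n_repetitions):
    """Indices in runs of identical values longer than max_n_repetitions."""
    flagged, start = [], 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes
        if i == len(values) or values[i] != values[start]:
            if i - start > max_n_repetitions:
                flagged.extend(range(start, i))
            start = i
    return flagged


# Hypothetical temperature series (degrees Celsius)
temps = [18.8, 18.6, 35.0, 18.1, 17.9, 17.9, 17.9, 17.9, 17.6]
print(gross_value_outliers(temps, -10.0, 26.3))
print(repetition_outliers(temps, 3))
```

Both functions look only at the series itself, which is what makes them individual checks rather than social checks.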

The following social checks are available:

  • Dataset.buddy_check(): Spatial buddy check.

  • Dataset.buddy_check_with_safetynets(): Spatial buddy check with safety nets.

Note: Social checks can only be applied to Dataset objects.

QC pipeline on temperature observations

As an example, we apply a QC pipeline to temperature observations. Since some checks depend on the time resolution, it is highly recommended to resample your data before applying QC.
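One way to see why the time resolution matters: a threshold expressed as a rate per second allows a different change between two consecutive records depending on how far apart those records are. The small calculation below assumes the threshold is applied between consecutive records; the helper name is hypothetical and the rate of 8 degrees Celsius per hour is only an example value.

```python
def max_step_per_record(rate_per_second, resolution_seconds):
    """Largest allowed change between two consecutive records."""
    return rate_per_second * resolution_seconds


max_increase_per_second = 8.0 / 3600.0  # 8 degrees Celsius per hour

for label, seconds in [("5min", 300), ("10min", 600), ("1h", 3600)]:
    step = max_step_per_record(max_increase_per_second, seconds)
    print(f"{label}: {step:.2f} degrees per record")
```

Resampling to a known frequency first makes the effective per-record threshold the same for every station.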

[4]:
target = 'temp'

# Resampling
dataset.resample(target_freq='10min')

# 1. gross value check
dataset.gross_value_check(
    obstype=target,
    lower_threshold=-10.0,
    upper_threshold=26.3,
)

# 2. persistence check
dataset.persistence_check(
    obstype=target,
    timewindow='60min',
    min_records_per_window=3,
)

# 3. repetitions check
dataset.repetitions_check(
    obstype=target,
    max_N_repetitions=5,
)

# 4. step check
dataset.step_check(
    obstype=target,
    max_increase_per_second=8.0 / 3600.0,  # depends on the standard unit!
    max_decrease_per_second=-10.0 / 3600.0,  # depends on the standard unit!
)

# 5. window variation check
dataset.window_variation_check(
    obstype=target,
    timewindow='60min',
    min_records_per_window=3,
    max_increase_per_second=8.0 / 3600.0,  # depends on the standard unit!
    max_decrease_per_second=-10.0 / 3600.0,  # depends on the standard unit!
)
The present gaps are removed, new gaps are constructed for wind_direction data of station vlinder02..
The present gaps are removed, new gaps are constructed for wind_speed data of station vlinder02..
The present gaps are removed, new gaps are constructed for humidity data of station vlinder02..
The present gaps are removed, new gaps are constructed for temp data of station vlinder02..
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/sensordata.py:524: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
  self.outliers_values_bin = pd.concat([self.outliers_values_bin,
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/qc_collection/checks/repetitions_check.py:88: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  groups.get_group(
[5]:
# 6. buddy check

# Note: This check can only be applied to a Dataset
dataset.buddy_check(
    obstype=target,
    # main check settings
    spatial_buddy_radius=15000,  # 15 km definition of buddy radius
    spatial_z_threshold=2.0,  # outlier threshold
    # requirements
    min_sample_size=5,
    max_alt_diff=None,  # Maximum elevation difference between stations
    N_iter=3,  # Number of iterations
    instantaneous_tolerance='4min',  # Maximum timestamp tolerance for 'at the same time'
    lapserate=None,  # Specify the variation with altitude; if None, no correction is applied
    min_sample_spread=1.0,  # Minimum spread of the sample (in the same unit as the observations)
    use_z_robust_method=True,  # Use a robust method for score calculations (median based)
)

Note: Dataset.buddy_check() and Dataset.buddy_check_with_safetynets() are complex methods. For a full explanation, see the API documentation for these methods.
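As a rough intuition for a buddy-style comparison, the sketch below scores one observation against the values of nearby stations at the same time. It is a simplified illustration and not the toolkit's algorithm: distance selection, iterations, altitude filtering and lapse-rate correction are left out, the function `buddy_z_score` is hypothetical, and the scaling of the median absolute deviation by 1.4826 is a common convention assumed here. The parameter names only echo the settings shown in the cell above.

```python
from statistics import mean, median, stdev


def buddy_z_score(target_value, buddy_values, min_sample_size=5,
                  min_sample_spread=1.0, robust=True):
    """Score target_value against its buddies; None if too few buddies."""
    if len(buddy_values) < min_sample_size:
        return None  # requirement not met, the record cannot be tested
    if robust:
        center = median(buddy_values)
        # Median absolute deviation, scaled to be comparable to a std
        spread = 1.4826 * median(abs(v - center) for v in buddy_values)
    else:
        center = mean(buddy_values)
        spread = stdev(buddy_values)
    # A floor on the spread avoids huge scores for very tight samples
    spread = max(spread, min_sample_spread)
    return abs(target_value - center) / spread


# Hypothetical buddy temperatures (degrees Celsius) at one timestamp
buddies = [18.8, 19.4, 17.0, 15.9, 18.1, 17.5]
z = buddy_z_score(25.0, buddies)
print(z, z > 2.0)
```

With a threshold of 2.0, the value 25.0 would be flagged in this sketch, while a value close to the buddy median would pass.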

Each check tests the temperature records and may mark some of them as outliers. Records that are already marked as outliers are skipped by the subsequent checks.

When a record is marked as an outlier:

  • It is added to the set of outliers. (See Dataset.outliersdf.)

  • Its value is set to NaN in the time series. (See Dataset.df.)
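The bookkeeping described above can be sketched as a small sequential pipeline. This is an illustration under assumptions, not the toolkit's code: `run_pipeline`, the check labels and the sample series are all hypothetical.

```python
import math


def run_pipeline(series, checks):
    """Apply checks in order. Flagged records are stored with their label
    and replaced by NaN, so later checks no longer see them."""
    values = dict(series)
    outliers = {}
    for label, is_outlier in checks:
        flagged = [ts for ts, v in values.items()
                   if not math.isnan(v) and is_outlier(v)]
        for ts in flagged:
            outliers[ts] = (values[ts], label)
            values[ts] = math.nan
    return values, outliers


# Hypothetical series and checks, for illustration only
series = {"00:00": 18.8, "00:10": 45.0, "00:20": 18.5}
checks = [
    ("first check", lambda v: v > 26.3),
    ("second check", lambda v: v > 40.0),
]
values, outliers = run_pipeline(series, checks)
print(values)
print(outliers)
```

The value 45.0 would also fail the second check, but it keeps the label of the first check because it was already removed from the series.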

[6]:
# The collection of outliers:
dataset.outliersdf
[6]:
datetime                   obstype  name       value  label                details
2022-09-01 00:00:00+00:00  temp     vlinder05  21.1   persistence outlier  constant values in timewindow of 0 days 01:00:00
2022-09-01 00:10:00+00:00  temp     vlinder05  21.1   persistence outlier  constant values in timewindow of 0 days 01:00:00
2022-09-01 00:20:00+00:00  temp     vlinder05  21.1   persistence outlier  constant values in timewindow of 0 days 01:00:00
2022-09-01 00:30:00+00:00  temp     vlinder05  21.1   persistence outlier  constant values in timewindow of 0 days 01:00:00
2022-09-01 00:40:00+00:00  temp     vlinder05  21.1   persistence outlier  constant values in timewindow of 0 days 01:00:00
...                        ...      ...        ...    ...                  ...
2022-09-15 23:40:00+00:00  temp     vlinder05  17.4   persistence outlier  constant values in timewindow of 0 days 01:00:00
                                    vlinder21  15.7   persistence outlier  constant values in timewindow of 0 days 01:00:00
2022-09-15 23:50:00+00:00  temp     vlinder05  17.4   persistence outlier  constant values in timewindow of 0 days 01:00:00
                                    vlinder21  15.7   persistence outlier  constant values in timewindow of 0 days 01:00:00
                                    vlinder23  13.9   persistence outlier  constant values in timewindow of 0 days 01:00:00

15751 rows × 3 columns

[7]:
# Values are set to NaN in the time series

# Filter to temperature records of a problematic station as an illustration.
dataset.df.xs('vlinder05', level='name').xs('temp', level='obstype')
[7]:
datetime                   value  label
2022-09-01 00:00:00+00:00  21.1   persistence outlier
2022-09-01 00:10:00+00:00  21.1   persistence outlier
2022-09-01 00:20:00+00:00  21.1   persistence outlier
2022-09-01 00:30:00+00:00  21.1   persistence outlier
2022-09-01 00:40:00+00:00  21.1   persistence outlier
...                        ...    ...
2022-09-15 23:10:00+00:00  17.4   persistence outlier
2022-09-15 23:20:00+00:00  17.4   persistence outlier
2022-09-15 23:30:00+00:00  17.4   persistence outlier
2022-09-15 23:40:00+00:00  17.4   persistence outlier
2022-09-15 23:50:00+00:00  17.4   persistence outlier

2160 rows × 2 columns

We can visually inspect the effect of QC by plotting the time series and setting colorby='label'.

[8]:
dataset.make_plot(obstype=target,
                  colorby='label',
                  title='Timeseries after QC pipeline is applied on temperature')
[8]:
<Axes: title={'center': 'Timeseries after QC pipeline is applied on temperature'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
../_images/examples_qc_example_16_2.png

To see how the applied QC performed, use the get_qc_stats() method, which gives an overview of the summary statistics.

[9]:
fig = dataset.get_qc_stats(obstype=target,
                      make_plot=True)
../_images/examples_qc_example_18_1.png

If you need to inspect the quality control results for each observational record, including details for each check, you can use the Dataset.qc_overview_df() method. It creates a dataframe for all records, including the “ok” records, in which the columns give the label and details of every performed check.

The difference compared to the Dataset.outliersdf dataframe is that the latter only presents the final label, while the former shows the details and flags of all individual checks.

[10]:
dataset.qc_overview_df(subset_obstypes=['temp'])
[10]:
value label details
checkname buddy_check duplicated_timestamp gross_value persistence repetitions step window_variation NaN buddy_check duplicated_timestamp gross_value persistence repetitions step window_variation NaN
datetime obstype name
2022-09-01 00:00:00+00:00 temp vlinder01 18.799999 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder02 19.400000 condition_unmet passed passed passed passed passed passed Not applied iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details Not applied
vlinder03 17.000000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder04 15.900000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder05 21.100000 unchecked passed passed flagged unchecked unchecked unchecked NaN iteration 0:[NA --> NA --> NA] \niteration 1:... no details no details constant values in timewindow of 0 days 01:00:00 no details no details NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-09-15 23:50:00+00:00 temp vlinder24 11.300000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder25 14.200000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder26 13.400000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder27 14.400000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN
vlinder28 13.200000 condition_unmet passed passed passed passed passed passed NaN iteration 0:[Insufficient buddy sample size (n... no details no details no details no details no details NaN

60480 rows × 17 columns
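
Because the columns form a two-level index (label or details on top, the check name below), standard pandas selection can be used to find flagged records. The following is a minimal sketch on a hand-made frame that mimics that layout with only two checks; it is an assumption that the real output can be handled the same way, and the cell values here are made up.

```python
import pandas as pd

# Toy frame mimicking the two-level column layout of the overview shown above.
ts = pd.Timestamp("2022-09-01 00:00", tz="UTC")
columns = pd.MultiIndex.from_tuples(
    [
        ("label", "gross_value"),
        ("label", "persistence"),
        ("details", "gross_value"),
        ("details", "persistence"),
    ],
    names=[None, "checkname"],
)
index = pd.MultiIndex.from_tuples(
    [(ts, "temp", "vlinder04"), (ts, "temp", "vlinder05")],
    names=["datetime", "obstype", "name"],
)
overview = pd.DataFrame(
    [
        ["passed", "passed", "no details", "no details"],
        ["passed", "flagged", "no details", "constant values in timewindow"],
    ],
    index=index,
    columns=columns,
)

# Selecting the top-level "label" key leaves one column per check.
labels = overview["label"]

# Records flagged by at least one check.
flagged_any = labels.eq("flagged").any(axis=1)
flagged_records = overview.loc[flagged_any]

# Number of flagged records per check.
flags_per_check = labels.eq("flagged").sum()
print(flags_per_check)
```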

Advanced: Protecting Specific Records with WhiteSet#

In some cases, you may know that certain observations are valid despite appearing anomalous (e.g., verified extreme weather events, station-specific conditions). The WhiteSet class allows you to protect such records from being flagged as outliers while still including them in all QC calculations.

All quality control checks accept a whiteset parameter.

Example:

[11]:
import pandas as pd
dataset = metobs_toolkit.Dataset()
dataset.import_data_from_file(
    template_file=metobs_toolkit.demo_template,
    input_data_file=metobs_toolkit.demo_datafile,
    input_metadata_file=metobs_toolkit.demo_metadatafile,
)
dataset.resample(target_freq='1h')

# Create a WhiteSet to protect specific timestamps
protected_times = pd.date_range(start='2022-09-03 12:00',
                                end='2022-09-05 16:00', freq='1h', tz='UTC')
whiteset = metobs_toolkit.WhiteSet(pd.Index(protected_times, name='datetime'))

# Apply QC with whitelisting
dataset.gross_value_check(
    obstype='temp',
    lower_threshold=0.0,
    upper_threshold=20.3,
    whiteset=whiteset  # Protected records won't be flagged
)
dataset.make_plot(obstype='temp',
                  colorby='label')
WARNING:<metobs_toolkit>:Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Rukwind is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:The following columns are present in the data file, but not in the template! They are skipped!
 ['Luchtdruk_Zeeniveau', 'Luchtdruk', 'Globe Temperatuur', 'Neerslagintensiteit', 'Rukwind', 'Neerslagsom']
WARNING:<metobs_toolkit>:The following columns are found in the metadata, but not in the template and are therefore ignored:
['Network', 'benaming', 'sponsor', 'stad']
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for wind_direction data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for wind_speed data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for humidity data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for temp data of station vlinder02..
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/sensordata.py:524: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
  self.outliers_values_bin = pd.concat([self.outliers_values_bin,
[11]:
<Axes: title={'center': 'temp data.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
../_images/examples_qc_example_22_3.png

For detailed information and examples, see the page on Using WhiteSet for Quality Control in the topics section.
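
To make the idea of protected records concrete, here is a conceptual sketch in plain pandas. It is not the toolkit's implementation of WhiteSet: it only illustrates how a threshold test can be combined with a set of protected timestamps, so that values outside the bounds are flagged unless their timestamp is protected. The temperature values are made up; the thresholds match the cell above.

```python
import pandas as pd

# Hypothetical hourly temperatures for one station.
times = pd.date_range("2022-09-03 10:00", periods=6, freq="1h", tz="UTC")
temps = pd.Series([19.5, 20.1, 21.0, 22.4, 20.0, 20.9], index=times, name="temp")

# Timestamps we consider verified and want to protect.
protected_times = pd.date_range(
    "2022-09-03 12:00", "2022-09-03 13:00", freq="1h", tz="UTC"
)

# Gross-value style test: outside [lower, upper] is suspicious.
outside = (temps < 0.0) | (temps > 20.3)

# Protected records are never flagged, even when outside the thresholds.
protected = temps.index.isin(protected_times)
flagged = outside & ~protected

print(temps[flagged])
```

In this toy series, three values exceed the upper threshold, but two of them fall on protected timestamps, so only the remaining one is flagged.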

Important notes on QC#

  1. The settings used in this demo are for illustration only.

  2. Suitable settings depend on:

    • The climate and season, in particular for the gross value check

    • The time frequency of your observations; resampling before running the QC pipeline is therefore recommended

    • The observation type
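
The dependence on time frequency can be illustrated with a small pandas sketch: a fixed one-hour window holds a different number of records at 10-minute and at hourly resolution, so window-based settings need to be chosen with the frequency in mind. The series is synthetic, and the plain pandas resampling used here is an assumption for illustration; it is not necessarily how Dataset.resample selects records.

```python
import pandas as pd

# Synthetic 10-minute series covering three hours.
times_10min = pd.date_range("2022-09-01 00:00", periods=18, freq="10min", tz="UTC")
temps_10min = pd.Series(range(18), index=times_10min, dtype=float)

# Hourly version of the same series (first record of each hour).
temps_1h = temps_10min.resample("1h").first()

# Maximum number of records that fall inside a one-hour rolling window.
records_10min = temps_10min.rolling("1h").count().max()
records_1h = temps_1h.rolling("1h").count().max()

print(records_10min, records_1h)
```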