Demo: Applying Quality Control#
In this example, we apply Quality Control (QC) to the demo data.
[1]:
import metobs_toolkit
Create your dataset#
We start by creating a dataset.
[2]:
# Create the dataset and import data using the demo files
dataset = metobs_toolkit.Dataset()
dataset.import_data_from_file(
template_file=metobs_toolkit.demo_template,
input_data_file=metobs_toolkit.demo_datafile,
input_metadata_file=metobs_toolkit.demo_metadatafile,
)
Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
Rukwind is present in the datafile, but not found in the template! This column will be ignored.
Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
The following columns are present in the data file, but not in the template! They are skipped!
['Luchtdruk_Zeeniveau', 'Luchtdruk', 'Globe Temperatuur', 'Neerslagintensiteit', 'Rukwind', 'Neerslagsom']
The following columns are found in the metadata, but not in the template and are therefore ignored:
['Network', 'benaming', 'sponsor', 'stad']
When importing raw data, two quality control checks are applied by default:
Duplicate timestamp check: The timestamps are checked for duplicates.
Invalid value check: All observations are cast to numeric values. If a raw value cannot be converted to a numeric value, the record at that timestamp is dropped, which leaves a gap.
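As a rough illustration of these two import-time checks, here is a minimal standard-library sketch. The helper names `to_numeric_or_gap` and `find_duplicate_timestamps` and the sample records are hypothetical; this is not the toolkit's implementation, only the idea behind it.

```python
from datetime import datetime


def to_numeric_or_gap(raw_records):
    """Split raw (timestamp, text) pairs into numeric records and gap timestamps."""
    numeric, gaps = [], []
    for timestamp, raw_value in raw_records:
        try:
            numeric.append((timestamp, float(raw_value)))
        except (TypeError, ValueError):
            # The raw value is not numeric, so this timestamp becomes a gap.
            gaps.append(timestamp)
    return numeric, gaps


def find_duplicate_timestamps(timestamps):
    """Return each timestamp that occurs more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for timestamp in timestamps:
        if timestamp in seen and timestamp not in duplicates:
            duplicates.append(timestamp)
        seen.add(timestamp)
    return duplicates


# Hypothetical raw records: one non-numeric value and one repeated timestamp.
raw = [
    (datetime(2022, 9, 1, 0, 0), "18.8"),
    (datetime(2022, 9, 1, 0, 5), "error"),
    (datetime(2022, 9, 1, 0, 10), "18.6"),
    (datetime(2022, 9, 1, 0, 10), "18.6"),
]
numeric, gaps = to_numeric_or_gap(raw)
duplicates = find_duplicate_timestamps([ts for ts, _ in raw])
```

In this sketch, the 00:05 record ends up in `gaps` and the 00:10 timestamp is reported once in `duplicates`.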
We can inspect the outliers at any time using the .outliersdf attribute.
[3]:
dataset.outliersdf
[3]:
| datetime | obstype | name | value | label | details |
|---|---|---|---|---|---|
An empty outliers dataframe indicates that no outliers have been found yet.
Quality Control Checks#
In the MetObs-toolkit, a set of quality-control checks is implemented. Each check aims to find outliers using different techniques. In general, we can group QC checks into two categories: individual checks and social checks. Individual checks take only the time series under investigation into account. Social checks use time series from other sensors to assess the quality of a target time series.
The following individual checks are available:
Dataset.gross_value_check(): Checks whether observations fall between given thresholds.
Dataset.persistence_check(): Tests whether observations change over a specific period.
Dataset.repetitions_check(): Tests whether an observation remains unchanged for several records.
Dataset.step_check(): Tests whether observations produce unrealistic spikes in the time series.
Dataset.window_variation_check(): Tests whether the variation exceeds the threshold within moving time windows.
Note: For a detailed description, see the API documentation for these methods.
Note: All these checks can be applied to Dataset and Station objects.
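To make the idea of an individual check concrete, the following sketch implements simplified versions of a gross value check and a repetitions check on a plain list of values. The function names and the exact flagging rules (closed interval for the thresholds, flag the whole run when it is longer than the maximum) are assumptions made for illustration; the toolkit's own methods may differ in detail.

```python
def gross_value_flags(values, lower_threshold, upper_threshold):
    """Flag values outside the closed interval [lower_threshold, upper_threshold]."""
    return [not (lower_threshold <= v <= upper_threshold) for v in values]


def repetition_flags(values, max_n_repetitions):
    """Flag every record in a run of identical consecutive values
    that is longer than max_n_repetitions."""
    flags = [False] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start > max_n_repetitions:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = i
    return flags


# Hypothetical temperature series with one spike and one run of four equal values.
temperatures = [18.8, 18.6, 35.0, 18.1, 18.1, 18.1, 18.1, 17.9]
gross = gross_value_flags(temperatures, lower_threshold=-10.0, upper_threshold=26.3)
repeated = repetition_flags(temperatures, max_n_repetitions=3)
```

Here only the value 35.0 is flagged by the threshold test, and the four consecutive 18.1 records are flagged by the repetitions test. Both functions look at a single series only, which is what makes them individual checks.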
The following social checks are available:
Dataset.buddy_check(): Spatial buddy check.
Dataset.buddy_check_with_safetynets(): Spatial buddy check with safety nets.
Note: Social checks can only be applied to Dataset objects.
QC pipeline on temperature observations#
As an example, we apply a QC pipeline to temperature observations. Since some checks depend on the time resolution, it is highly recommended to resample your data before applying QC.
[4]:
target = 'temp'
# Resampling
dataset.resample(target_freq='10min')
# 1. gross value check
dataset.gross_value_check(
obstype=target,
lower_threshold=-10.0,
upper_threshold=26.3)
# 2. persistence check
dataset.persistence_check(
obstype=target,
timewindow='60min',
min_records_per_window=3)
# 3. repetitions check
dataset.repetitions_check(
obstype=target,
max_N_repetitions=5
)
# 4. step check
dataset.step_check(
obstype=target,
max_increase_per_second=8.0 / 3600.0, # depends on the standard unit!
max_decrease_per_second=-10.0 / 3600.0) # depends on the standard unit!
# 5. window variation check
dataset.window_variation_check(
obstype=target,
timewindow='60min',
min_records_per_window=3,
max_increase_per_second=8.0 / 3600.0, # depends on the standard unit!
max_decrease_per_second=-10.0 / 3600.0, # depends on the standard unit!
)
The present gaps are removed, new gaps are constructed for wind_direction data of station vlinder02..
The present gaps are removed, new gaps are constructed for wind_speed data of station vlinder02..
The present gaps are removed, new gaps are constructed for humidity data of station vlinder02..
The present gaps are removed, new gaps are constructed for temp data of station vlinder02..
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/sensordata.py:524: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
self.outliers_values_bin = pd.concat([self.outliers_values_bin,
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/qc_collection/checks/repetitions_check.py:88: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
groups.get_group(
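The per-second thresholds in the step check above can be hard to read at a glance. The sketch below shows the arithmetic behind them and a simplified step test on a 10-minute series. The function `step_flags` and the sample values are hypothetical, and comparing each record only with its direct predecessor is an assumption of this sketch, not a description of the toolkit's algorithm.

```python
from datetime import datetime, timedelta


def step_flags(timestamps, values, max_increase_per_second, max_decrease_per_second):
    """Flag records whose rate of change from the previous record exceeds the limits."""
    flags = [False] * len(values)
    for i in range(1, len(values)):
        elapsed = (timestamps[i] - timestamps[i - 1]).total_seconds()
        rate = (values[i] - values[i - 1]) / elapsed
        if rate > max_increase_per_second or rate < max_decrease_per_second:
            flags[i] = True
    return flags


# Hypothetical 10-minute temperature series (degrees Celsius).
start = datetime(2022, 9, 1)
timestamps = [start + timedelta(minutes=10 * i) for i in range(5)]
values = [18.0, 18.5, 21.0, 20.8, 18.0]

# 8 degrees per hour is 8/3600 per second, so about 1.33 degrees per 10-minute step.
max_step_up = 8.0 / 3600.0 * 600

flags = step_flags(timestamps, values, 8.0 / 3600.0, -10.0 / 3600.0)
```

With these thresholds, the jump from 18.5 to 21.0 and the drop from 20.8 to 18.0 are flagged, while the smaller changes pass. Because the limits are expressed per second, the same settings stay meaningful if the data is resampled to another resolution, but they do depend on the unit in which the observations are stored.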
[5]:
# 6. buddy check
# Note: This check can only be applied to a Dataset
dataset.buddy_check(
obstype=target,
# main check settings
spatial_buddy_radius=15000, # 15 km definition of buddy radius
spatial_z_threshold=2.0, # outlier threshold
# requirements
min_sample_size=5,
max_alt_diff=None, # Maximum elevation difference between stations
N_iter=3, # Number of iterations
instantaneous_tolerance='4min', # Maximum timestamp tolerance for 'at the same time'
lapserate=None, # Specify the variation with altitude; if None, no correction is applied
min_sample_spread=1.0, # Minimum spread of the sample (in the same unit as the observations)
use_z_robust_method=True, # Use a robust method for score calculations (median based)
)
Note: Dataset.buddy_check() and Dataset.buddy_check_with_safetynets() are complex methods. For a full explanation, see the API documentation for these methods.
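For intuition only, the core of a median-based outlier score can be sketched in a few lines. The function `robust_z_score` is hypothetical; its parameter names mirror the settings used above, but scoring with the median and the scaled median absolute deviation (MAD) is an assumption about what a robust method could look like, not a statement of how the toolkit computes its scores.

```python
from statistics import median


def robust_z_score(target_value, buddy_values, min_sample_size=5, min_sample_spread=1.0):
    """Score a target against its buddies using the median and the scaled MAD.

    Returns None when there are too few buddies to form a sample.
    """
    if len(buddy_values) < min_sample_size:
        return None
    center = median(buddy_values)
    mad = median(abs(v - center) for v in buddy_values)
    # 1.4826 scales the MAD to be comparable to a standard deviation for normal data.
    # The floor keeps a very tight sample from inflating the score.
    spread = max(1.4826 * mad, min_sample_spread)
    return (target_value - center) / spread


# Hypothetical temperatures of six neighbouring stations at one timestamp.
buddies = [15.9, 16.2, 16.0, 15.7, 16.4, 16.1]

z_suspect = robust_z_score(21.1, buddies)   # far from the buddy median
z_normal = robust_z_score(16.5, buddies)    # close to the buddy median
z_too_few = robust_z_score(21.1, buddies[:3])
```

With a threshold of 2.0, the first target would be treated as an outlier in this sketch and the second would not. The third call returns None because the sample is too small, which is the kind of situation reported as an unmet condition rather than as a pass or a flag.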
Each check tests the temperature records that have not yet been flagged and may mark some of them as outliers. Records already marked as outliers are skipped by the subsequent QC checks.
When a record is marked as an outlier:
It is added to the set of outliers. (See Dataset.outliersdf.)
Its value is set to NaN in the time series. (See Dataset.df.)
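The sequential behaviour, where a record flagged by one check is no longer tested by later checks, can be sketched as follows. The function `run_pipeline` and the second check are hypothetical and only serve to show the ordering effect.

```python
def run_pipeline(values, checks):
    """Apply checks in order; a record flagged by one check is skipped by later ones."""
    labels = ["ok"] * len(values)
    for label, is_outlier in checks:
        for i, value in enumerate(values):
            if labels[i] != "ok":
                continue  # already an outlier, so this check does not test it
            if is_outlier(value):
                labels[i] = label
    return labels


# Two illustrative checks; the second one is made up for this example.
checks = [
    ("gross value outlier", lambda v: not (-10.0 <= v <= 26.3)),
    ("hypothetical warm outlier", lambda v: v > 25.0),
]
labels = run_pipeline([18.8, 35.0, 25.9, 17.2], checks)
```

The value 35.0 would fail both checks, but it keeps the label of the first check that flagged it. This is why the order of the checks in a pipeline matters.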
[6]:
# The collection of outliers:
dataset.outliersdf
[6]:
| datetime | obstype | name | value | label | details |
|---|---|---|---|---|---|
| 2022-09-01 00:00:00+00:00 | temp | vlinder05 | 21.1 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| 2022-09-01 00:10:00+00:00 | temp | vlinder05 | 21.1 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| 2022-09-01 00:20:00+00:00 | temp | vlinder05 | 21.1 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| 2022-09-01 00:30:00+00:00 | temp | vlinder05 | 21.1 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| 2022-09-01 00:40:00+00:00 | temp | vlinder05 | 21.1 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| ... | ... | ... | ... | ... | ... |
| 2022-09-15 23:40:00+00:00 | temp | vlinder05 | 17.4 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| | | vlinder21 | 15.7 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| 2022-09-15 23:50:00+00:00 | temp | vlinder05 | 17.4 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| | | vlinder21 | 15.7 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
| | | vlinder23 | 13.9 | persistence outlier | constant values in timewindow of 0 days 01:00:00 |
15751 rows × 3 columns
[7]:
# Values are set to NaN in the time series
# Filter to temperature records of a problematic station as an illustration.
dataset.df.xs('vlinder05', level='name').xs('temp', level='obstype')
[7]:
| datetime | value | label |
|---|---|---|
| 2022-09-01 00:00:00+00:00 | 21.1 | persistence outlier |
| 2022-09-01 00:10:00+00:00 | 21.1 | persistence outlier |
| 2022-09-01 00:20:00+00:00 | 21.1 | persistence outlier |
| 2022-09-01 00:30:00+00:00 | 21.1 | persistence outlier |
| 2022-09-01 00:40:00+00:00 | 21.1 | persistence outlier |
| ... | ... | ... |
| 2022-09-15 23:10:00+00:00 | 17.4 | persistence outlier |
| 2022-09-15 23:20:00+00:00 | 17.4 | persistence outlier |
| 2022-09-15 23:30:00+00:00 | 17.4 | persistence outlier |
| 2022-09-15 23:40:00+00:00 | 17.4 | persistence outlier |
| 2022-09-15 23:50:00+00:00 | 17.4 | persistence outlier |
2160 rows × 2 columns
We can visually inspect the effect of QC by plotting the time series and setting colorby='label'.
[8]:
dataset.make_plot(obstype=target,
colorby='label',
title='Timeseries after QC pipeline is applied on temperature')
[8]:
<Axes: title={'center': 'Timeseries after QC pipeline is applied on temperature'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
If you are interested in the performance of the applied QC, you can use the get_qc_stats() method to get an overview of the summary statistics.
[9]:
fig = dataset.get_qc_stats(obstype=target,
make_plot=True)
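As a rough idea of what such summary statistics express, the share of each label in a series can be computed by hand. The helper `label_percentages` and the label list are hypothetical; this sketch does not reproduce the output format of get_qc_stats().

```python
from collections import Counter


def label_percentages(labels):
    """Return the percentage of records carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical final labels of ten records.
labels = ["ok"] * 7 + ["persistence outlier"] * 2 + ["gross value outlier"]
percentages = label_percentages(labels)
```

In this made-up series, 70 percent of the records are "ok", 20 percent are persistence outliers and 10 percent are gross value outliers.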
If you need to inspect the quality control performance for each observational record, including details for each check, you can use the Dataset.qc_overview_df method. It creates a dataframe for all records, including the “ok” records, and the columns indicate the label and details of all performed checks.
The difference compared to the Dataset.outliersdf dataframe is that the latter only presents the final label, while the former shows the details and flags of all individual checks.
[10]:
dataset.qc_overview_df(subset_obstypes=['temp'])
[10]:
| value | label | details | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| checkname | buddy_check | duplicated_timestamp | gross_value | persistence | repetitions | step | window_variation | NaN | buddy_check | duplicated_timestamp | gross_value | persistence | repetitions | step | window_variation | NaN | |||
| datetime | obstype | name | |||||||||||||||||
| 2022-09-01 00:00:00+00:00 | temp | vlinder01 | 18.799999 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |
| vlinder02 | 19.400000 | condition_unmet | passed | passed | passed | passed | passed | passed | Not applied | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | Not applied | |||
| vlinder03 | 17.000000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |||
| vlinder04 | 15.900000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |||
| vlinder05 | 21.100000 | unchecked | passed | passed | flagged | unchecked | unchecked | unchecked | NaN | iteration 0:[NA --> NA --> NA] \niteration 1:... | no details | no details | constant values in timewindow of 0 days 01:00:00 | no details | no details | NaN | |||
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-09-15 23:50:00+00:00 | temp | vlinder24 | 11.300000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |
| vlinder25 | 14.200000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |||
| vlinder26 | 13.400000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |||
| vlinder27 | 14.400000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |||
| vlinder28 | 13.200000 | condition_unmet | passed | passed | passed | passed | passed | passed | NaN | iteration 0:[Insufficient buddy sample size (n... | no details | no details | no details | no details | no details | NaN | |||
60480 rows × 17 columns
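A common follow-up is to keep only the records that at least one check flagged. The sketch below does not call the toolkit. It builds a small stand-in dataframe by hand, and the two-level column layout (value, label and details on top, check names below) is an assumption read off the printed output above. The helper name flagged_by_any_check is hypothetical.

```python
import pandas as pd

# Stand-in for the overview; the two-level columns are assumed from the printed output.
columns = pd.MultiIndex.from_tuples(
    [
        ("value", ""),
        ("label", "gross_value"),
        ("label", "persistence"),
        ("details", "gross_value"),
        ("details", "persistence"),
    ]
)
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2022-09-01 00:00", tz="UTC"), "temp", "vlinder04"),
        (pd.Timestamp("2022-09-01 00:00", tz="UTC"), "temp", "vlinder05"),
    ],
    names=["datetime", "obstype", "name"],
)
overview = pd.DataFrame(
    [
        [15.9, "passed", "passed", "no details", "no details"],
        [21.1, "passed", "flagged", "no details",
         "constant values in timewindow of 0 days 01:00:00"],
    ],
    index=index,
    columns=columns,
)


def flagged_by_any_check(df):
    """Return the rows where at least one check label equals 'flagged'."""
    labels = df["label"]  # sub-frame with one column per check
    return df[labels.eq("flagged").any(axis=1)]


flagged = flagged_by_any_check(overview)
print(flagged.index.get_level_values("name").tolist())
```

If the real overview uses the same layout, the same helper could be applied to the result of dataset.qc_overview_df(subset_obstypes=['temp']).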
Advanced: Protecting Specific Records with WhiteSet#
In some cases, you may know that certain observations are valid despite appearing anomalous (e.g., verified extreme weather events, station-specific conditions). The WhiteSet class allows you to protect such records from being flagged as outliers while still including them in all QC calculations.
All quality control checks accept a whiteset parameter.
Example:
[11]:
import pandas as pd
dataset = metobs_toolkit.Dataset()
dataset.import_data_from_file(
template_file=metobs_toolkit.demo_template,
input_data_file=metobs_toolkit.demo_datafile,
input_metadata_file=metobs_toolkit.demo_metadatafile,
)
dataset.resample(target_freq='1h')
# Create a WhiteSet to protect specific timestamps
protected_times = pd.date_range(start='2022-09-03 12:00',
end='2022-09-05 16:00', freq='1h', tz='UTC')
whiteset = metobs_toolkit.WhiteSet(pd.Index(protected_times, name='datetime'))
# Apply QC with whitelisting
dataset.gross_value_check(
obstype='temp',
lower_threshold=0.0,
upper_threshold=20.3,
whiteset=whiteset # Protected records won't be flagged
)
dataset.make_plot(obstype='temp',
colorby='label')
WARNING:<metobs_toolkit>:Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Rukwind is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
WARNING:<metobs_toolkit>:The following columns are present in the data file, but not in the template! They are skipped!
['Luchtdruk_Zeeniveau', 'Luchtdruk', 'Globe Temperatuur', 'Neerslagintensiteit', 'Rukwind', 'Neerslagsom']
WARNING:<metobs_toolkit>:The following columns are found in the metadata, but not in the template and are therefore ignored:
['Network', 'benaming', 'sponsor', 'stad']
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for wind_direction data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for wind_speed data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for humidity data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for temp data of station vlinder02..
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/sensordata.py:524: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
self.outliers_values_bin = pd.concat([self.outliers_values_bin,
[11]:
<Axes: title={'center': 'temp data.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
For detailed information and examples, see the page on Using WhiteSet for Quality Control in the topics section.
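The idea behind whitelisting can also be illustrated without the toolkit. The sketch below is not the WhiteSet implementation. It is a plain pandas approximation of a gross value check in which protected timestamps are evaluated like any other record but never end up flagged. The function name gross_value_flags and the sample values are made up for illustration.

```python
import pandas as pd


def gross_value_flags(series, lower, upper, protected=None):
    """Flag values outside [lower, upper], except at protected timestamps."""
    outside = (series < lower) | (series > upper)
    if protected is not None:
        # Protected records are still evaluated, but their flag is cleared.
        outside = outside & ~series.index.isin(protected)
    return outside


times = pd.date_range("2022-09-03 10:00", periods=4, freq="1h", tz="UTC")
temps = pd.Series([19.5, 21.0, 22.4, 20.1], index=times)
protected_times = pd.date_range(
    "2022-09-03 12:00", "2022-09-03 13:00", freq="1h", tz="UTC"
)

plain = gross_value_flags(temps, lower=0.0, upper=20.3)
with_whitelist = gross_value_flags(
    temps, lower=0.0, upper=20.3, protected=protected_times
)
print(plain.tolist())
print(with_whitelist.tolist())
```

In this toy series, the value at 11:00 is flagged in both runs, while the value at 12:00 exceeds the upper threshold but stays unflagged once its timestamp is protected.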
Important notes on QC#
The settings used in this demo are illustrative only.
Suitable settings depend on:
The climate and season, which determine sensible thresholds for the gross value check
The time resolution of your observations; resampling to a fixed frequency before running the QC pipeline is therefore recommended
The observation type
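To see why the time resolution matters, consider a check that evaluates a fixed time window. The sketch below uses only pandas and assumes a window of one hour, chosen for illustration. It counts how many time steps fit in that window at different resolutions.

```python
import pandas as pd

# Hypothetical window length for a time-window-based check.
window = pd.Timedelta("1h")

# Number of time steps that fit in the window for each resolution.
steps = {freq: window / pd.Timedelta(freq) for freq in ["5min", "10min", "1h"]}

for freq, n_steps in steps.items():
    print(f"{freq}: {n_steps:.0f} time steps per window")
```

The same window spans twelve steps at 5-minute resolution but only one step at hourly resolution, so settings tuned for one resolution may behave very differently at another.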