WhiteSet#
The WhiteSet class manages whitelisted records for quality control operations. It allows specific observations (identified by station name, observation type, and/or timestamp) to be exempted from outlier flagging during QC checks. Whitelisted records participate in all QC calculations but are not flagged as outliers in the final results.
A regular user interacts with WhiteSet instances when calling QC methods on Dataset and Station classes, providing fine-grained control over which records should be protected from being flagged as outliers.
Important
In practice, users create WhiteSet instances only to pass them as parameters to QC methods. Users generally do not need to interact with this class through its methods.
Constructor#
Whitelist container for multiple stations and observation types.
Methods#
A summary of all methods in the WhiteSet class.
Retrieve and optionally print detailed information about the WhiteSet.
Create a sensor-specific whitelist for a station and observation type.
Index Structure#
The white_records index can have one or more of the following levels:
name: Station identifier (matches the station name)
obstype: Observation type (e.g., ‘temp’, ‘humidity’)
datetime: Specific timestamps to whitelist
If the ‘datetime’ level is absent, all timestamps for matching station/obstype combinations are whitelisted.
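As an illustration of the level rule above, the sketch below builds a two-level index with only `name` and `obstype`. Per the description, omitting the `datetime` level means every timestamp of the matching combinations is whitelisted. The station names are placeholders, and only plain pandas is used so the snippet runs without a dataset.

```python
import pandas as pd

# Placeholder station names; replace with names present in your dataset.
white_records = pd.MultiIndex.from_arrays(
    [
        ["station1", "station2"],
        ["temp", "humidity"],
    ],
    names=["name", "obstype"],
)

# No 'datetime' level: all timestamps of these station/obstype pairs
# would be whitelisted once the index is passed to WhiteSet.
print(list(white_records.names))
```

The resulting index can be passed to the constructor in the same way as in the Examples section, i.e. `metobs_toolkit.WhiteSet(white_records)`.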
Timezone Handling#
Important
When the white_records index contains a ‘datetime’ level, timestamps are automatically handled during WhiteSet initialization:
Timezone-aware timestamps are converted to UTC
Timezone-naive timestamps are localized to UTC (with a warning)
It is strongly recommended to provide timezone-aware timestamps to avoid ambiguity and ensure correct matching during quality control operations.
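The two cases can be reproduced with plain pandas to see what the resulting timestamps look like. This is only a sketch of the equivalent pandas operations (`tz_convert` and `tz_localize`), not the WhiteSet implementation itself; the example timestamps and the +02:00 offset are arbitrary choices.

```python
import pandas as pd

# Timezone-aware input (fixed +02:00 offset): converted to UTC,
# so the instant in time is preserved.
aware = pd.DatetimeIndex(["2022-09-01 12:00+02:00"], name="datetime")
aware_utc = aware.tz_convert("UTC")
print(aware_utc[0])  # 2022-09-01 10:00:00+00:00

# Timezone-naive input: interpreted as UTC, the wall-clock time is kept.
naive = pd.DatetimeIndex(["2022-09-01 12:00"], name="datetime")
naive_utc = naive.tz_localize("UTC")
print(naive_utc[0])  # 2022-09-01 12:00:00+00:00
```

Both inputs read 12:00, yet they end up two hours apart after normalization. If the naive value was actually recorded in local time, it would no longer match the intended observation, which is the ambiguity the recommendation above guards against.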
Examples#
Create a WhiteSet with datetime-only whitelisting:
import pandas as pd
import metobs_toolkit

# Whitelist specific timestamps across all stations
timestamps = pd.date_range('2022-09-01 00:00', periods=10, freq='1h', tz='UTC')
whiteset = metobs_toolkit.WhiteSet(
    pd.Index(timestamps, name='datetime')
)

# Use in QC check (`dataset` is an existing Dataset with observations loaded)
dataset.gross_value_check(
    obstype='temp',
    lower_threshold=10.0,
    upper_threshold=25.0,
    whiteset=whiteset
)
Create a WhiteSet with station and datetime:
# Whitelist specific timestamps for specific stations
# (utc=True makes the timestamps timezone-aware, as recommended above)
white_records = pd.MultiIndex.from_arrays([
    ['station1', 'station1', 'station2'],
    pd.to_datetime(['2022-09-01 12:00', '2022-09-01 13:00', '2022-09-01 14:00'], utc=True)
], names=['name', 'datetime'])
whiteset = metobs_toolkit.WhiteSet(white_records)

dataset.persistence_check(
    obstype='temp',
    timewindow='2h',
    whiteset=whiteset
)
Create a WhiteSet with full specification:
# Whitelist specific records with all levels
# (utc=True makes the timestamps timezone-aware, as recommended above)
white_records = pd.MultiIndex.from_arrays([
    ['station1', 'station1', 'station2'],
    ['temp', 'humidity', 'temp'],
    pd.to_datetime(['2022-09-01 12:00', '2022-09-01 13:00', '2022-09-01 14:00'], utc=True)
], names=['name', 'obstype', 'datetime'])
whiteset = metobs_toolkit.WhiteSet(white_records)
Notes#
Whitelisted records participate in all QC calculations (e.g., mean, standard deviation in buddy checks) but are protected from being flagged as outliers.
An empty WhiteSet (default) means no records are whitelisted.
The WhiteSet is validated upon initialization to ensure it has the correct index structure.
When using WhiteSet with Dataset-level QC methods, the whitelist is automatically filtered for each station and observation type combination.
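To make the first note concrete, the following sketch applies a simple z-score rule to a small series. It is not the toolkit's QC code: the `flag_outliers` helper, the threshold, and the data are all hypothetical. The only point is that whitelisted records still enter the mean and standard deviation while being left out of the flagged set.

```python
import pandas as pd


def flag_outliers(series, white_index, z_threshold=1.0):
    """Hypothetical z-score check; whitelisted labels are never flagged."""
    # Statistics use ALL records, whitelisted ones included.
    z_scores = (series - series.mean()) / series.std()
    flagged = z_scores.abs() > z_threshold
    # Whitelisted records are only removed from the final flags.
    return flagged & ~series.index.isin(white_index)


timestamps = pd.date_range("2022-09-01 00:00", periods=6, freq="1h", tz="UTC")
temps = pd.Series([15.0, 15.2, 14.9, 30.0, 15.1, 2.0], index=timestamps)

# Whitelist the 03:00 record (the 30.0 value).
white_index = pd.DatetimeIndex([timestamps[3]])

flags = flag_outliers(temps, white_index)
print(flags)
```

Without the whitelist, both the 30.0 and the 2.0 values exceed the threshold. With it, only the 2.0 value is flagged, even though the 30.0 value still inflates the mean and standard deviation used to judge the other records.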