metobs_toolkit.dataset.Dataset.persistence_check

Dataset.persistence_check(obstype: str = 'temp', timewindow: str | Timedelta = Timedelta('0 days 01:00:00'), min_records_per_window: int = 5, whiteset: WhiteSet = WhiteSet(empty), use_mp: bool = True) → None

Check if values are not constant in a moving time window.

Perform a persistence check on a time series to identify periods where observations remain constant within a specified time window. If the values are constant, all records in the moving window are flagged as outliers.
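The idea described above can be sketched with plain pandas. This is an illustrative sketch only, not the toolkit's implementation: the function name `persistence_flags`, the brute-force loop, and the window edge convention (records in the half-open interval `(t - timewindow, t]`) are all assumptions made for the example.

```python
import pandas as pd


def persistence_flags(series, timewindow="1h", min_records_per_window=5):
    """Hypothetical helper: True where a record lies in a constant window."""
    window = pd.Timedelta(timewindow)
    flags = pd.Series(False, index=series.index)
    for end in series.index:
        # Assumed convention: the window covers (end - window, end].
        mask = (series.index > end - window) & (series.index <= end)
        in_window = series.loc[mask].dropna()
        # Windows with too few non-NaN records are skipped, not flagged.
        if len(in_window) < min_records_per_window:
            continue
        # A constant window flags every record it contains.
        if in_window.nunique() == 1:
            flags.loc[in_window.index] = True
    return flags


# Twelve records at 10-minute resolution, with a one-hour constant stretch.
idx = pd.date_range("2024-01-01 00:00", periods=12, freq="10min")
temps = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 7.0], index=idx
)
flags = persistence_flags(temps)
print(temps[flags])
```

In this example only the six consecutive records with value 5.0 are flagged, because they are the only one-hour window that holds at least five records and a single distinct value.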

Parameters:
  • obstype (str, optional) – The observation type to check. The default is “temp”.

  • timewindow (str or pandas.Timedelta) – The size of the rolling time window to check for persistence. The default is pandas.Timedelta(“60min”).

  • min_records_per_window (int) – The minimum number of non-NaN records required within the time window for the check to be valid. The default is 5.

  • whiteset (WhiteSet, optional) – A WhiteSet instance containing timestamps that should be excluded from outlier detection. Records matching the whiteset criteria will not be flagged as outliers. The default is an empty WhiteSet().

  • use_mp (bool, optional) – If True, the function will use multiprocessing to speed up the calculations. The default is True.
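Because timewindow accepts either a string or a pandas.Timedelta, the same duration can be written in several ways. The snippet below assumes the string form is parsed the way pandas.Timedelta parses it; that is an assumption about the method's internals, not something stated on this page.

```python
import pandas as pd

# The default shown in the signature and the "60min" form from the parameter
# description denote the same duration.
default_window = pd.Timedelta("0 days 01:00:00")
same_as_str = pd.Timedelta("60min")

print(default_window == same_as_str)
print(default_window.total_seconds())
```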

Return type:

None

Notes

  • This method modifies the outliers in place and does not return anything. You can use the outliersdf property to view all flagged outliers.

  • If a window contains fewer records than min_records_per_window, the function logs a warning and skips the persistence check for that window.

  • This function can be computationally expensive for large datasets or small time windows.

  • The repetitions check is similar to the persistence check, but not identical. The persistence check uses thresholds that are meteorologically based (i.e. the moving window is defined by a duration), whereas the repetitions check uses thresholds that are instrumentally based (i.e. the “window” is defined by a number of records).
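To make the contrast in the last note concrete, here is a minimal sketch of a count-based repetitions test. The function name `repetition_flags` and the parameter `max_repeats` are hypothetical and do not reflect the toolkit's actual repetitions check. Because it only counts consecutive identical records, the result is the same whatever the time resolution, unlike a duration-based window.

```python
import pandas as pd


def repetition_flags(series, max_repeats=5):
    """Hypothetical helper: True for runs longer than max_repeats records."""
    # A new run starts whenever the value differs from the previous record.
    run_id = (series != series.shift()).cumsum()
    # Length of the run each record belongs to.
    run_length = series.groupby(run_id).transform("size")
    return run_length > max_repeats


values = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 7.0]
idx_10min = pd.date_range("2024-01-01", periods=12, freq="10min")
idx_1min = pd.date_range("2024-01-01", periods=12, freq="1min")

flags_10min = repetition_flags(pd.Series(values, index=idx_10min))
flags_1min = repetition_flags(pd.Series(values, index=idx_1min))

# The timestamps play no role: both series flag the same six positions.
print(flags_10min.to_numpy().tolist() == flags_1min.to_numpy().tolist())
```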

Warning

If the minimum number of records per window is not met anywhere in the time series, a warning is logged and no records are flagged as outliers; the method still returns None.