metobs_toolkit.station.Station.fill_gaps_with_weighted_diurnal_debiased_modeldata

Station.fill_gaps_with_weighted_diurnal_debiased_modeldata(obstype: str, leading_period_duration: Timedelta | str = Timedelta('1 days 00:00:00'), trailing_period_duration: Timedelta | str = Timedelta('1 days 00:00:00'), min_lead_debias_sample_size: int = 2, min_trail_debias_sample_size: int = 2, overwrite_fill=False, modelname: str | None = None, modelvariable: str | None = None, max_gap_duration_to_fill: str | Timedelta = Timedelta('0 days 12:00:00'), min_value: float | None = None, max_value: float | None = None)

Fill the gaps with model data corrected by a weighted sum of diurnal biases, with weights based on the position of each gap record between the start and the end of the gap.

This method fills the gaps with model data corrected for its diurnal bias. A bias, the difference between the observed and the modelled value, is computed for each timestamp in the leading and in the trailing period. For each period separately, these biases are then averaged per time of day (hour, minute and second), which yields a diurnal bias for each time of day.
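A minimal sketch of the diurnal-bias step in plain Python, assuming each record of one period is a (timestamp, observed, modelled) tuple and that the bias is defined as observed minus modelled. Both the sign convention and the function name diurnal_bias are assumptions for illustration; this is not the toolkit's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def diurnal_bias(records):
    """Average the biases of one period per time of day (sketch).

    records: iterable of (timestamp, observed, modelled) tuples.
    Assumption: bias = observed - modelled.
    """
    groups = defaultdict(list)
    for ts, observed, modelled in records:
        if observed is None or modelled is None:
            continue  # no usable observation/model pair for this timestamp
        # Group on hour, minute and second, ignoring the date
        groups[(ts.hour, ts.minute, ts.second)].append(observed - modelled)
    # One mean bias per time of day
    return {tod: mean(biases) for tod, biases in groups.items()}


# Example: two days of hourly records with a bias of +1 and +3
start = datetime(2024, 1, 1)
records = [
    (start + timedelta(days=day, hours=hour), 15.0 + bias, 15.0)
    for day, bias in ((0, 1.0), (1, 3.0))
    for hour in range(24)
]
bias_per_tod = diurnal_bias(records)
print(bias_per_tod[(0, 0, 0)])  # 2.0
```

Records with a missing observation or model value are skipped, so a time of day can end up with fewer biases than there are days in the period.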

In addition, a normalized weight is computed for each gap record, expressing its distance in time to the start and the end of the gap. The correction applied to the time-interpolated model data is therefore a weighted sum of the corrections derived from the leading and the trailing period.
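The weighting can be sketched as follows, assuming a linear normalization in which a record at the gap start takes its correction entirely from the leading period and a record at the gap end entirely from the trailing period. The helper names gap_weights and weighted_correction are hypothetical, and the additive correction assumes the bias is defined as observed minus modelled.

```python
from datetime import datetime, timedelta


def gap_weights(gap_timestamps, gap_start, gap_end):
    """Return a (lead_weight, trail_weight) pair per gap record (sketch).

    Assumption: weights vary linearly with the position in the gap and
    always sum to 1.
    """
    span = (gap_end - gap_start).total_seconds()
    weights = []
    for ts in gap_timestamps:
        # Normalized distance to the gap start: 0 at the start, 1 at the end
        frac = (ts - gap_start).total_seconds() / span if span else 0.5
        weights.append((1.0 - frac, frac))
    return weights


def weighted_correction(model_value, lead_bias, trail_bias, lead_w, trail_w):
    """Correct one time-interpolated model value with both diurnal biases."""
    return model_value + lead_w * lead_bias + trail_w * trail_bias


gap_start = datetime(2024, 1, 2, 0)
gap_end = gap_start + timedelta(hours=4)
stamps = [gap_start + timedelta(hours=h) for h in range(5)]
for ts, (w_lead, w_trail) in zip(stamps, gap_weights(stamps, gap_start, gap_end)):
    print(ts.hour, w_lead, w_trail)

print(weighted_correction(15.0, 2.0, 4.0, 0.75, 0.25))  # 17.5
```

A single-record gap (start equal to end) falls back to equal weights in this sketch; how the toolkit handles that case is not stated above.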

Parameters:
  • obstype (str) – The target obstype to fill the gaps for.

  • leading_period_duration (str or pandas.Timedelta, optional) – The duration of the leading period, i.e. the period before the gap that is used for bias estimation. The default is “24h”.

  • trailing_period_duration (str or pandas.Timedelta, optional) – The duration of the trailing period, i.e. the period after the gap that is used for bias estimation. The default is “24h”.

  • min_lead_debias_sample_size (int, optional) – The minimum number of leading samples required for bias estimation. If this condition is not met, the gap is not filled. The default is 2.

  • min_trail_debias_sample_size (int, optional) – The minimum number of trailing samples required for bias estimation. If this condition is not met, the gap is not filled. The default is 2.

  • overwrite_fill (bool, optional) – If True, the gap status and any existing gap-fill data are ignored and overwritten. If False, only gaps without gap-fill data are filled. The default is False.

  • modelname (str, optional) – The model name to filter by when multiple model data sources exist for the same observation type. If None, no filtering by model name is applied. The default is None.

  • modelvariable (str, optional) – The model variable to filter by when multiple model variables exist for the same observation type and model. If None, no filtering by model variable is applied. The default is None.

  • max_gap_duration_to_fill (str or pandas.Timedelta, optional) – The maximum duration of a gap to fill. Gaps longer than this are not filled. The limit is expressed as a duration, so it is independent of the time resolution of the gap. The default is 12 hours.

  • min_value (float, optional) – Minimum threshold for the filled values. Values below this threshold will be clipped to this minimum. If None, no minimum threshold is applied. The default is None.

  • max_value (float, optional) – Maximum threshold for the filled values. Values above this threshold will be clipped to this maximum. If None, no maximum threshold is applied. The default is None.
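Several of these parameters act as simple guards and bounds. The sketch below shows one plausible reading of max_gap_duration_to_fill, overwrite_fill, min_value and max_value, using datetime.timedelta in place of pandas.Timedelta. The helper names and the inclusive comparison against the maximum duration are assumptions, not the toolkit's code.

```python
from datetime import timedelta


def gap_is_eligible(gap_duration, max_gap_duration, has_fill, overwrite_fill=False):
    """Decide whether a gap is filled (hypothetical helper).

    Durations are compared as time spans, so the outcome does not depend
    on how many records the gap contains.
    """
    if has_fill and not overwrite_fill:
        return False  # keep the existing gap-fill data
    # Assumption: a gap exactly as long as the maximum is still filled
    return gap_duration <= max_gap_duration


def clip(value, min_value=None, max_value=None):
    """Clip a filled value to [min_value, max_value]; None disables a bound."""
    if min_value is not None:
        value = max(value, min_value)
    if max_value is not None:
        value = min(value, max_value)
    return value


limit = timedelta(hours=12)  # same as the documented default
print(gap_is_eligible(timedelta(hours=6), limit, has_fill=False))   # True
print(gap_is_eligible(timedelta(hours=18), limit, has_fill=False))  # False
print(clip(-2.5, min_value=0.0))  # 0.0
```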

Return type:

None

Notes

A schematic description of the weighted diurnal debiased model data gap fill:

  1. Check whether the obstype is known, and whether the corresponding model data is present.

  2. Iterate over the gaps of the obstype.

  3. Check the compatibility of the ModelTimeSeries with the gap.

  4. Construct a leading and a trailing sample, and test whether they meet the required conditions. These conditions are tested on the sample sizes per time of day (hour, minute and second), for the leading and trailing periods separately.

  5. Compute a leading and a trailing set of diurnal biases by grouping the biases by hour, minute and second, and averaging them.

  6. Compute a weight for each gap record: its normalized distance to the start and the end of the gap.

  7. Fill the gap records with the raw (time-interpolated) model data, corrected by a weighted sum of the corresponding leading and trailing diurnal biases.

  8. Clip filled values to the range [min_value, max_value] if specified.

  9. Update the gap attributes with the filled values, labels, and details.
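The sample-size test of step 4 can be sketched as a count of the available samples per (hour, minute, second) key. This sketch assumes the condition is checked only for the times of day that occur in the gap, which the steps above do not state explicitly; sample_sizes_ok is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta


def sample_sizes_ok(sample_timestamps, gap_timestamps, min_sample_size):
    """Check the per-time-of-day sample size of one period (sketch).

    sample_timestamps: timestamps in the leading (or trailing) period that
        have both an observation and a model value.
    gap_timestamps: timestamps of the gap records.
    Assumption: only times of day that occur in the gap are checked.
    """
    def tod(ts):
        return (ts.hour, ts.minute, ts.second)

    counts = Counter(tod(ts) for ts in sample_timestamps)
    required = {tod(ts) for ts in gap_timestamps}
    # Counter returns 0 for a time of day with no samples at all
    return all(counts[key] >= min_sample_size for key in required)


start = datetime(2024, 1, 1)
leading = [start + timedelta(hours=h) for h in range(48)]      # two full days
gap = [start + timedelta(days=2, hours=h) for h in range(6)]   # 00:00 to 05:00
print(sample_sizes_ok(leading, gap, 2))  # True
print(sample_sizes_ok(leading, gap, 3))  # False
```

Under this reading, the gap is left unfilled as soon as either the leading or the trailing check fails.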

Note that suitable values for min_lead_debias_sample_size and min_trail_debias_sample_size depend on the durations of the leading and trailing periods, and on the time resolution of the gap (i.e. the time resolution of the corresponding SensorData).
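To get a feel for suitable minimum sample sizes, the sketch below counts how many records each time-of-day group can receive for a given period duration and regular time resolution. It assumes a half-open period [start, start + duration), a resolution that divides one day evenly, and no missing records; the function name is hypothetical. For example, a 36-hour period at hourly resolution gives 12 times of day two records and the other 12 only one.

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)


def samples_per_time_of_day(period_duration, resolution):
    """Return (fewest, most) records per (hour, minute, second) group.

    Assumptions: regular records with no missing values, a resolution
    that divides one day evenly, and a half-open period
    [start, start + period_duration).
    """
    if ONE_DAY % resolution:
        raise ValueError("resolution must divide one day evenly")
    n_records = period_duration // resolution  # records in the period
    per_day = ONE_DAY // resolution            # distinct times of day
    full_days, remainder = divmod(n_records, per_day)
    # Every time of day occurs once per full day; the remaining partial
    # day adds one extra record to some (not all) times of day.
    return full_days, full_days + (1 if remainder else 0)


print(samples_per_time_of_day(timedelta(hours=24), timedelta(hours=1)))     # (1, 1)
print(samples_per_time_of_day(timedelta(hours=36), timedelta(hours=1)))     # (1, 2)
print(samples_per_time_of_day(timedelta(hours=48), timedelta(minutes=10)))  # (2, 2)
```

Missing observations can only lower these counts, so the "fewest" value is an upper bound on what a per-time-of-day minimum can safely require.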

References

Jacobs A, et al. (2024) Filling gaps in urban temperature observations by debiasing ERA5 reanalysis data.