metobs_toolkit.dataset.Dataset.fill_gaps_with_debiased_modeldata
- Dataset.fill_gaps_with_debiased_modeldata(obstype: str, leading_period_duration: str | Timedelta = Timedelta('1 days 00:00:00'), min_leading_records_total: int = 60, trailing_period_duration: str | Timedelta = Timedelta('1 days 00:00:00'), min_trailing_records_total: int = 60, overwrite_fill: bool = False, modelname: str | None = None, modelvariable: str | None = None, max_gap_duration_to_fill: str | Timedelta = Timedelta('0 days 12:00:00'), min_value: float | None = None, max_value: float | None = None) -> None
Fill the gaps using model data corrected for the bias.
The bias is estimated from a leading period (before the gap) and a trailing period (after the gap). The two periods are combined, and the model data is compared with the observations in them that are not labeled as outliers. The model data is then interpolated to the timestamps of the missing records and corrected with the estimated bias.
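To make the bias estimate concrete, here is a minimal pandas sketch. It is not taken from the metobs_toolkit source: the helper name `estimate_bias`, its arguments, and the choice of a single mean difference (model minus observation) as the bias are all assumptions made for illustration.

```python
import pandas as pd


def estimate_bias(obs, model, gap_start, gap_end,
                  leading=pd.Timedelta("24h"), trailing=pd.Timedelta("24h"),
                  outliers=None):
    """Hypothetical helper: mean (model - obs) over the combined windows.

    gap_start and gap_end are the first and last missing timestamps.
    """
    # Leading window lies before the gap, trailing window after it.
    in_leading = (obs.index >= gap_start - leading) & (obs.index < gap_start)
    in_trailing = (obs.index > gap_end) & (obs.index <= gap_end + trailing)
    sample = obs[in_leading | in_trailing]
    if outliers is not None:
        # Drop observations flagged as outliers (boolean Series).
        flagged = outliers.reindex(sample.index, fill_value=False)
        sample = sample[~flagged]
    sample = sample.dropna()
    # Assumed bias definition: average of model minus observation.
    return float((model.reindex(sample.index) - sample).mean())


# Synthetic data: the model is 2.0 units too warm everywhere.
idx = pd.date_range("2024-01-01", periods=432, freq="10min")
obs = pd.Series(10.0, index=idx)
model = pd.Series(12.0, index=idx)
obs.iloc[200:230] = float("nan")        # the gap
obs.iloc[150] = 99.0                    # a spurious observation ...
outliers = pd.Series(False, index=idx)
outliers.iloc[150] = True               # ... labeled as an outlier

bias = estimate_bias(obs, model, idx[200], idx[229], outliers=outliers)
```

With this synthetic input the flagged record is excluded, so the estimate equals the constant model offset. Subtracting it from the model values inside the gap recovers the underlying series.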
- Parameters:
obstype (str) – The target obstype to fill the gaps for.
leading_period_duration (str or pandas.Timedelta, optional) – The duration of the leading period. The default is “24h”.
min_leading_records_total (int, optional) – The minimum number of records required in the leading period. The default is 60.
trailing_period_duration (str or pandas.Timedelta, optional) – The duration of the trailing period. The default is “24h”.
min_trailing_records_total (int, optional) – The minimum number of records required in the trailing period. The default is 60.
overwrite_fill (bool, optional) – If True, the status of a gap and present gapfill info will be ignored and overwritten. If False, only gaps without gapfill data are filled. The default is False.
modelname (str, optional) – The model name to filter by when multiple model data sources exist for the same observation type. If None, no filtering by model name is applied. The default is None.
modelvariable (str, optional) – The model variable to filter by when multiple model variables exist for the same observation type and model. If None, no filtering by model variable is applied. The default is None.
max_gap_duration_to_fill (str or pandas.Timedelta, optional) – The maximum duration of a gap to fill with interpolation. The result is independent of the time resolution of the gap. The default is “12h”.
min_value (float, optional) – Minimum threshold for the filled values. Values below this threshold will be clipped to this minimum. If None, no minimum threshold is applied. The default is None.
max_value (float, optional) – Maximum threshold for the filled values. Values above this threshold will be clipped to this maximum. If None, no maximum threshold is applied. The default is None.
- Return type:
None
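The threshold parameters can be read as follows. This is a hedged sketch of their semantics using plain pandas, not the library's implementation: the helper names `gap_is_fillable`, `window_is_valid` and `clip_filled` are hypothetical, and measuring the gap duration as the span between the first and last missing timestamp is an assumption.

```python
import pandas as pd


def gap_is_fillable(gap_start, gap_end, max_gap_duration_to_fill="12h"):
    """Hypothetical check: the gap is not longer than the allowed duration."""
    return (gap_end - gap_start) <= pd.Timedelta(max_gap_duration_to_fill)


def window_is_valid(n_records, min_records_total=60):
    """Hypothetical check: enough records in a leading or trailing window."""
    return n_records >= min_records_total


def clip_filled(values, min_value=None, max_value=None):
    """Clip filled values; a bound of None leaves that side unbounded."""
    return values.clip(lower=min_value, upper=max_value)


# Duration strings and Timedelta objects are interchangeable.
same_duration = pd.Timedelta("24h") == pd.Timedelta("1 days 00:00:00")

short_gap = gap_is_fillable(pd.Timestamp("2024-01-01 00:00"),
                            pd.Timestamp("2024-01-01 06:00"))
long_gap = gap_is_fillable(pd.Timestamp("2024-01-01 00:00"),
                           pd.Timestamp("2024-01-01 18:00"))

clipped = clip_filled(pd.Series([-3.0, 4.0, 45.0]),
                      min_value=0.0, max_value=40.0)
```

One practical consequence, assuming the minimum counts records inside the window: a 24-hour window holds about 24 records at hourly resolution, which is below the default of 60, while 10-minute data provides 144. For coarse time series the minimum record counts or the period durations likely need adjusting.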
Notes
A schematic description of the debiased model data gap fill:
1. Check if the obstype is known, and if the corresponding modeldata is present.
2. Iterate over the gaps of the obstype.
3. Check the compatibility of the ModelTimeSeries with the gap.
4. Construct a leading and trailing sample, and test if they meet the required conditions.
5. Compute the bias of the modeldata (combine leading and trailing samples).
6. Fill the gap records by using raw (interpolated) modeldata that is corrected by subtracting the bias.
7. Clip filled values to the range [min_value, max_value] if specified.
8. Update the gap attributes with the interpolated values, labels, and details.
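The fill and clip steps of the schematic above can be sketched for a single gap as follows. This is an illustration under stated assumptions, not the library's code: `fill_gap_debiased` is a hypothetical helper, the bias is passed in as a known number, and time-weighted linear interpolation is assumed for bringing the coarser model data onto the observation timestamps.

```python
import numpy as np
import pandas as pd


def fill_gap_debiased(obs, model, bias, min_value=None, max_value=None):
    """Hypothetical helper: fill NaNs in obs with debiased model data."""
    missing = obs.index[obs.isna()]
    # Interpolate the model onto the missing timestamps (assumed: linear in time).
    model_on_gap = (
        model.reindex(model.index.union(missing))
        .interpolate(method="time")
        .reindex(missing)
    )
    # Correct by subtracting the bias, then clip to the optional bounds.
    fill = (model_on_gap - bias).clip(lower=min_value, upper=max_value)
    filled = obs.copy()
    filled.loc[missing] = fill
    return filled


# Observations every 10 minutes, model data every 60 minutes, offset by +2.0.
obs_idx = pd.date_range("2024-01-01 00:00", periods=19, freq="10min")
truth = pd.Series(10.0 + np.arange(19) / 6.0, index=obs_idx)
model_idx = pd.date_range("2024-01-01 00:00", periods=4, freq="60min")
model = pd.Series([12.0, 13.0, 14.0, 15.0], index=model_idx)

obs = truth.copy()
obs.iloc[6:13] = np.nan                 # gap from 01:00 to 02:00

filled = fill_gap_debiased(obs, model, bias=2.0)
```

Because the synthetic series is linear and the model offset is constant, the filled values coincide with the original ones here. Real data will not behave this cleanly, since both the interpolation and a single bias value are approximations.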