metobs_toolkit.dataset.Dataset.interpolate_gaps
- Dataset.interpolate_gaps(obstype: str, method: str = 'time', max_gap_duration_to_fill: str | Timedelta = Timedelta('0 days 03:00:00'), n_leading_anchors: int = 1, n_trailing_anchors: int = 1, max_lead_to_gap_distance: Timedelta | None = None, max_trail_to_gap_distance: Timedelta | None = None, overwrite_fill: bool = False, method_kwargs: dict = {}) → None
Fill the gap(s) using interpolation of SensorData.
This method fills all the gaps of a specific obstype by directly interpolating the corresponding SensorData. Each gap is interpolated using the records in its leading and trailing periods. Different interpolation methods can be selected. By placing restrictions on the leading and trailing periods, one can ensure that a gap is only interpolated when enough leading and trailing data are available.
- Parameters:
obstype (str) – The target obstype to fill the gaps for.
method (str, optional) – Interpolation technique to use. See the method argument of pandas.DataFrame.interpolate for possible values. Make sure that n_leading_anchors, n_trailing_anchors and method_kwargs are set appropriately for the method (higher-order interpolation techniques require more leading and trailing anchors). Defaults to “time”.
max_gap_duration_to_fill (str or pandas.Timedelta, optional) – The maximum duration of a gap to fill with interpolation. The result is independent of the time resolution of the gap. Defaults to 3 hours.
n_leading_anchors (int, optional) – The number of leading anchors to use for the interpolation. A leading anchor is a nearby record (not rejected by QC) just before the start of the gap that is used for interpolation. Higher-order interpolation techniques require multiple leading anchors. Defaults to 1.
n_trailing_anchors (int, optional) – The number of trailing anchors to use for the interpolation. A trailing anchor is a nearby record (not rejected by QC) just after the end of the gap that is used for interpolation. Higher-order interpolation techniques require multiple trailing anchors. Defaults to 1.
max_lead_to_gap_distance (pandas.Timedelta or None, optional) – The maximum time difference between the start of the gap and the leading anchor(s). If None, no time restriction is applied to the leading anchors. Defaults to None.
max_trail_to_gap_distance (pandas.Timedelta or None, optional) – The maximum time difference between the end of the gap and the trailing anchor(s). If None, no time restriction is applied to the trailing anchors. Defaults to None.
overwrite_fill (bool, optional) – If True, the status of a gap and any existing gapfill info are ignored and overwritten. If False, only gaps without gapfill data are filled. Defaults to False.
method_kwargs (dict, optional) – Extra arguments that are passed to pandas.DataFrame.interpolate() structured in a dict. Defaults to {}.
- Return type:
None
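As an illustration of how the parameters above fit together, the following minimal sketch wraps a call to interpolate_gaps using only the keyword arguments listed in the signature. The helper name fill_short_gaps, the obstype name "temp" and the one-hour anchor distances are assumptions made for this example; dataset stands for an already populated Dataset.

```python
import pandas as pd


def fill_short_gaps(dataset, obstype="temp"):
    """Hypothetical helper: linearly interpolate short gaps of one obstype.

    `dataset` is assumed to be a populated metobs_toolkit Dataset; the
    obstype name "temp" and the 1-hour anchor limits are example values.
    """
    dataset.interpolate_gaps(
        obstype=obstype,
        method="time",                                  # time-weighted linear interpolation
        max_gap_duration_to_fill=pd.Timedelta("3h"),    # longer gaps are left unfilled
        n_leading_anchors=1,                            # one valid record before the gap
        n_trailing_anchors=1,                           # one valid record after the gap
        max_lead_to_gap_distance=pd.Timedelta("1h"),    # leading anchor must be close to the gap
        max_trail_to_gap_distance=pd.Timedelta("1h"),   # trailing anchor must be close to the gap
        overwrite_fill=False,                           # keep gaps that already have gapfill data
        method_kwargs={},                               # extra arguments for pandas interpolate
    )
```

The method returns None, so the filled values are stored on the gaps of the dataset itself rather than returned to the caller.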
Notes
A schematic description:
1. Iterate over all gaps related to the target obstype.
2. Get the leading and trailing periods of the gap.
3. Check if the leading and trailing periods are valid.
4. Create a combined DataFrame with the leading, trailing, and gap data.
5. Interpolate the missing records using the specified method.
6. Update the gap attributes with the interpolated values, labels, and details.
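The schematic description can be sketched for a single gap with plain pandas. This is not the toolkit's implementation: the function interpolate_one_gap, its arguments, and the way the anchor distance is measured (from the gap edge to the farthest anchor) are assumptions made for illustration, and a Series is used instead of a DataFrame for brevity.

```python
import pandas as pd


def interpolate_one_gap(records, gap_start, gap_end, freq,
                        n_leading=1, n_trailing=1,
                        max_lead_distance=None, max_trail_distance=None,
                        method="time", **method_kwargs):
    """Fill one gap [gap_start, gap_end] from surrounding valid records.

    Returns the interpolated values at the gap timestamps, or None when
    the leading or trailing anchors do not satisfy the restrictions.
    """
    # Leading and trailing anchors: nearest valid records around the gap.
    leading = records[records.index < gap_start].dropna().tail(n_leading)
    trailing = records[records.index > gap_end].dropna().head(n_trailing)

    # Validity checks: enough anchors, and anchors close enough to the gap.
    if len(leading) < n_leading or len(trailing) < n_trailing:
        return None
    if (max_lead_distance is not None
            and gap_start - leading.index.min() > max_lead_distance):
        return None
    if (max_trail_distance is not None
            and trailing.index.max() - gap_end > max_trail_distance):
        return None

    # Combine the anchors with the (empty) gap records.
    gap_index = pd.date_range(gap_start, gap_end, freq=freq)
    gap = pd.Series(float("nan"), index=gap_index)
    combined = pd.concat([leading, gap, trailing]).sort_index()

    # Interpolate the missing records and keep only the gap timestamps.
    filled = combined.interpolate(method=method, **method_kwargs)
    return filled.loc[gap_index]


# Hourly records with a three-hour gap between 01:00 and 05:00.
one_hour = pd.Timedelta(hours=1)
index = pd.date_range("2024-01-01", periods=7, freq=one_hour)
records = pd.Series([10.0, 11.0, None, None, None, 15.0, 16.0], index=index)

filled = interpolate_one_gap(records, index[2], index[4], freq=one_hour)
print(filled)  # values rise linearly between the anchors 11.0 and 15.0
```

With max_lead_distance=pd.Timedelta(minutes=30) the same call returns None, because the only leading anchor lies one hour before the start of the gap.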
If you want to use a higher-order method of interpolation, make sure to increase the n_leading_anchors and n_trailing_anchors accordingly. For example, for a cubic interpolation, you need at least 2 leading and 2 trailing anchors.
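To see why a cubic needs four anchors: a cubic polynomial has four coefficients, so two leading and two trailing anchors are the minimum that determine it. The sketch below fits a single cubic through four anchors with NumPy. It only illustrates the anchor count; it is not the scheme pandas applies for method="cubic", and the function name cubic_through_anchors and the sample values are hypothetical.

```python
import numpy as np


def cubic_through_anchors(anchor_hours, anchor_values, gap_hours):
    """Evaluate a cubic fitted to the anchors at the gap timestamps.

    With exactly four anchors the cubic passes through all of them;
    with fewer than four it is not uniquely determined.
    """
    if len(anchor_hours) < 4:
        raise ValueError(
            "a cubic needs at least 4 anchors (e.g. 2 leading + 2 trailing)"
        )
    coeffs = np.polyfit(anchor_hours, anchor_values, deg=3)
    return np.polyval(coeffs, gap_hours)


# Two leading anchors (hours 0 and 1) and two trailing anchors (hours 5 and 6),
# sampled from y = t**3 so the expected gap values are easy to check.
anchor_hours = np.array([0.0, 1.0, 5.0, 6.0])
anchor_values = anchor_hours ** 3
gap_hours = np.array([2.0, 3.0, 4.0])

filled = cubic_through_anchors(anchor_hours, anchor_values, gap_hours)
print(filled)  # close to 8, 27 and 64, up to floating-point error
```

Passing only one leading and one trailing anchor raises the ValueError, which mirrors the reason for raising n_leading_anchors and n_trailing_anchors with higher-order methods.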