Demo example: filling gaps
This example demonstrates how to fill gaps, using temperature observations. First we import the demo data, then we introduce some gaps into it, and finally we fill those gaps.
[1]:
import metobs_toolkit
dataset = metobs_toolkit.Dataset()  # Create a new dataset object
# Load the data
dataset.import_data_from_file(
    template_file=metobs_toolkit.demo_template,  # The template file
    input_data_file=metobs_toolkit.demo_datafile,  # The data file
    input_metadata_file=metobs_toolkit.demo_metadatafile,  # The metadata file
)
Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
Rukwind is present in the datafile, but not found in the template! This column will be ignored.
Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
The following columns are present in the data file, but not in the template! They are skipped!
['Neerslagintensiteit', 'Neerslagsom', 'Rukwind', 'Luchtdruk_Zeeniveau', 'Globe Temperatuur', 'Luchtdruk']
The following columns are found in the metadata, but not in the template and are therefore ignored:
['benaming', 'sponsor', 'Network', 'stad']
Identifying gaps
As was shown in the introduction notebook, gaps are constructed when:
raw data is imported
resample() or Dataset.sync_records() is applied
We can inspect the gaps by plotting the time series with make_plot(colorby='label'), or via the gapsdf dataframe attribute. Both approaches work at Dataset level as well as at Station level.
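To make the notion of a gap concrete, here is a minimal sketch in plain Python that does not use the toolkit. It assumes a regular 10-minute record and treats every run of consecutive missing timestamps as one gap. The function name find_gaps and the sample timestamps are hypothetical and are not part of metobs_toolkit; the toolkit's own gap construction may differ in detail.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, freq):
    """Return (start, end) pairs for runs of missing timestamps on a regular grid."""
    present = set(timestamps)
    gaps = []
    current = min(timestamps)
    last = max(timestamps)
    gap_start = None
    while current <= last:
        if current not in present:
            # A missing record: open a gap if none is open yet
            if gap_start is None:
                gap_start = current
        elif gap_start is not None:
            # A record is present again: close the open gap at the previous step
            gaps.append((gap_start, current - freq))
            gap_start = None
        current += freq
    return gaps


freq = timedelta(minutes=10)
start = datetime(2022, 9, 1, 0, 0)
# Hypothetical record: steps 0..8 of ten minutes, with steps 3, 4 and 6 missing
observed = [start + i * freq for i in (0, 1, 2, 5, 7, 8)]

gaps = find_gaps(observed, freq)
for gap_start, gap_end in gaps:
    print(gap_start, "->", gap_end)
```

In this sketch the two adjacent missing records form a single gap, and the isolated missing record forms a second one.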
[2]:
dataset.make_plot(obstype='temp', colorby='label')
[2]:
<Axes: title={'center': 'temp data.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
As the plot shows, the number of gaps is rather limited, but there are clearly some issues with the observations. A common approach is to apply a quality control (QC) pipeline and treat the outliers as gaps.
In the toolkit we can run a QC pipeline (see the QC introduction) and then use the .convert_outliers_to_gaps() method to convert the outliers to gaps. This is another way to introduce gaps in the dataset.
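Conceptually, treating outliers as gaps means discarding the flagged value and marking the record as missing. The sketch below illustrates that idea on a small list of records in plain Python. It is not the toolkit's implementation: the function outliers_to_gaps, the label strings, and the sample values are all assumptions made for illustration.

```python
def outliers_to_gaps(records, ok_label="ok", gap_label="gap"):
    """Return a copy of records in which every non-ok record becomes a missing value."""
    converted = []
    for timestamp, value, label in records:
        if label == ok_label:
            converted.append((timestamp, value, label))
        else:
            # Drop the suspicious value and relabel the record as a gap
            converted.append((timestamp, None, gap_label))
    return converted


# Hypothetical (timestamp, temperature, QC label) records
records = [
    ("2022-09-01 00:00", 18.2, "ok"),
    ("2022-09-01 00:10", 45.0, "gross value outlier"),
    ("2022-09-01 00:20", 18.0, "ok"),
    ("2022-09-01 00:30", 18.0, "repetitions outlier"),
]

converted = outliers_to_gaps(records)
for row in converted:
    print(row)
```

Note that the original QC label is overwritten in this sketch, so after the conversion it is no longer possible to tell which check flagged a record.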
[3]:
# Apply quality control on temperature (source: the QC introduction)
target = 'temp'
# Resampling
dataset.resample(target_freq='10min')
# 1. gross value check
dataset.gross_value_check(
    obstype=target,
    lower_threshold=-10.0,
    upper_threshold=26.3)
# 2. persistence check
dataset.persistence_check(
    obstype=target,
    timewindow='60min',
    min_records_per_window=3)
# 3. repetitions check
dataset.repetitions_check(
    obstype=target,
    max_N_repetitions=5
)
# 4. step check
dataset.step_check(
    obstype=target,
    max_increase_per_second=8.0 / 3600.0,  # depends on standard unit!
    max_decrease_per_second=-10.0 / 3600.0)  # depends on standard unit!
# 5. window variation check
dataset.window_variation_check(
    obstype=target,
    timewindow='60min',
    min_records_per_window=3,
    max_increase_per_second=8.0 / 3600.0,  # depends on standard unit!
    max_decrease_per_second=-10.0 / 3600.0,  # depends on standard unit!
)
# 6. buddy check
dataset.buddy_check(
    obstype=target,
    # main check settings
    spatial_buddy_radius=15000,  # 15 km definition of buddy radius
    spatial_z_threshold=2.0,  # outlier threshold
    # requirements
    min_sample_size=5,
    max_alt_diff=None,  # Maximum elevation difference between stations
    N_iter=3,  # Number of iterations
    instantaneous_tolerance='4min',  # Max timestamp tolerance for 'at the same time'
    lapserate=None,  # Specify the variation with altitude, if None no correction is applied
    min_sample_spread=1.0,  # Minimum spread of the sample (in the same unit as the observations)
)
dataset.make_plot(obstype='temp',
                  colorby='label',
                  title='After quality control on temperature observations.')
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for wind_direction data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for temp data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for humidity data of station vlinder02..
WARNING:<metobs_toolkit>:The present gaps are removed, new gaps are constructed for wind_speed data of station vlinder02..
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/sensordata.py:524: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
self.outliers_values_bin = pd.concat([self.outliers_values_bin,
/home/thoverga/Documents/VLINDER_github/MetObs_toolkit/src/metobs_toolkit/qc_collection/checks/repetitions_check.py:88: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
groups.get_group(
[3]:
<Axes: title={'center': 'After quality control on temperature observations.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
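The step-check thresholds in the QC cell are given per second, while the records are resampled to 10 minutes. The short calculation below shows what those rates mean per 10-minute step. It is a sketch under the assumption that a step check compares the change between two consecutive records with the rate multiplied by the elapsed time. The helper names max_step and is_step_outlier are hypothetical, and the toolkit's internal logic may differ.

```python
def max_step(rate_per_second, step_seconds):
    """Largest change allowed over one time step for a given rate per second."""
    return rate_per_second * step_seconds


# Rates taken from the QC cell: 8 degrees per hour up, 10 degrees per hour down
max_increase_per_second = 8.0 / 3600.0
max_decrease_per_second = -10.0 / 3600.0
step_seconds = 10 * 60  # 10-minute resolution after resampling

max_rise = max_step(max_increase_per_second, step_seconds)
max_drop = max_step(max_decrease_per_second, step_seconds)


def is_step_outlier(previous, current):
    """Flag a record whose change since the previous record exceeds the limits."""
    change = current - previous
    return change > max_rise or change < max_drop


print(f"max rise per 10 min: {max_rise:.3f} degrees Celsius")
print(f"max drop per 10 min: {max_drop:.3f} degrees Celsius")
print(is_step_outlier(18.0, 20.0))
```

Under these assumptions a record may rise by about 1.33 degrees Celsius or fall by about 1.67 degrees Celsius between two consecutive 10-minute records before it is flagged.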
Now we convert all the temperature outliers to gaps.
Note: the outliers are removed in this step, so the QC frequency statistics are lost.
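If the QC frequencies are still needed afterwards, one option is to tally the labels before the conversion. The sketch below shows the idea in plain Python with collections.Counter. The label strings and the sample list are hypothetical, and it does not use any toolkit method.

```python
from collections import Counter

# Hypothetical QC labels for eight records, collected before the conversion
labels = [
    "ok", "ok", "gross value outlier", "ok",
    "repetitions outlier", "ok", "ok", "ok",
]

counts = Counter(labels)
# Relative frequency of each label
frequencies = {label: count / len(labels) for label, count in counts.items()}

for label, fraction in frequencies.items():
    print(f"{label}: {fraction:.1%}")
```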
[4]:
# Convert all outliers to gaps
dataset.convert_outliers_to_gaps(obstype='temp')
# Inspect the gaps in a plot
dataset.make_plot(obstype='temp',
                  colorby='label',
                  title='Temperature QC outliers are converted to gaps.')
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder01.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder01.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder01.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder01.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder02.!
WARNING:<metobs_toolkit>:Flushing current gaps for wind_direction data of station vlinder02.
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder02.!
WARNING:<metobs_toolkit>:Flushing current gaps for temp data of station vlinder02.
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder02.!
WARNING:<metobs_toolkit>:Flushing current gaps for humidity data of station vlinder02.
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder02.!
WARNING:<metobs_toolkit>:Flushing current gaps for wind_speed data of station vlinder02.
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder03.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder03.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder03.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder03.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder04.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder04.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder04.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder04.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder05.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder05.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder05.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder05.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder06.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder06.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder06.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder06.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder07.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder07.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder07.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder07.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder08.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder08.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder08.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder08.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder09.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder09.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder09.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder09.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder10.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder10.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder10.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder10.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder11.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder11.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder11.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder11.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder12.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder12.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder12.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder12.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder13.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder13.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder13.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder13.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder14.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder14.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder14.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder14.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder15.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder15.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder15.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder15.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder16.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder16.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder16.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder16.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder17.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder17.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder17.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder17.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder18.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder18.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder18.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder18.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder19.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder19.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder19.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder19.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder20.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder20.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder20.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder20.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder21.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder21.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder21.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder21.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder22.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder22.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder22.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder22.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder23.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder23.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder23.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder23.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder24.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder24.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder24.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder24.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder25.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder25.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder25.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder25.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder26.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder26.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder26.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder26.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder27.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder27.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder27.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder27.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_direction data of station vlinder28.!
WARNING:<metobs_toolkit>:Outliers are flushed for temp data of station vlinder28.!
WARNING:<metobs_toolkit>:Outliers are flushed for humidity data of station vlinder28.!
WARNING:<metobs_toolkit>:Outliers are flushed for wind_speed data of station vlinder28.!
[4]:
<Axes: title={'center': 'Temperature QC outliers are converted to gaps.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
Since there are many gaps across the different stations, it is clearer to plot the time series of a single station.
[5]:
dataset.get_station('vlinder02').make_plot(obstype='temp',
colorby='label')
[5]:
<Axes: title={'center': 'temp data for station vlinder02'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
We now have a dataset with gaps that can be filled.
Inspecting a gap in detail#
The gaps are stored as a list of Gap instances in the SensorData. We can access them like this:
[6]:
#Get one specific gap
specific_gap = (dataset
.get_station('vlinder02') #select a target station
.get_sensor('temp') #select the target sensordata
.gaps[0] #select the first of the stored gaps
)
specific_gap.get_info()
================================================================================
General info of Gap
================================================================================
--- Gap details ---
Gap of temp for station: vlinder02
-From 2022-09-01 20:10:00+00:00 -> 2022-09-01 21:00:00+00:00
-Duration gap: 0 days 00:50:00
--- Gap filling details ---
-Gap status: unfilled
-Gapfill settings used:
Often it is more convenient to use the gapsdf attribute:
[7]:
dataset.get_station('vlinder02').gapsdf
[7]:
| datetime | obstype | value | label | details |
|---|---|---|---|---|
| 2022-09-01 20:10:00+00:00 | temp | NaN | gap | no details |
| 2022-09-01 20:20:00+00:00 | temp | NaN | gap | no details |
| 2022-09-01 20:30:00+00:00 | temp | NaN | gap | no details |
| 2022-09-01 20:40:00+00:00 | temp | NaN | gap | no details |
| 2022-09-01 20:50:00+00:00 | temp | NaN | gap | no details |
| ... | ... | ... | ... | ... |
| 2022-09-15 01:50:00+00:00 | temp | NaN | gap | no details |
| 2022-09-15 02:00:00+00:00 | temp | NaN | gap | no details |
| 2022-09-15 02:10:00+00:00 | temp | NaN | gap | no details |
| 2022-09-15 02:20:00+00:00 | temp | NaN | gap | no details |
| 2022-09-15 02:30:00+00:00 | temp | NaN | gap | no details |
465 rows × 3 columns
These missing observations are indicated in time series plots as vertical lines.
Gap overview#
For a higher-level summary of gaps, you can use the gap_overview_df() method. This provides a gap summary where each gap is represented by a single row (unlike gapsdf where all records inside each gap are listed):
[8]:
# Gap status overview at dataset level
dataset.gap_overview_df()
[8]:
| gapstart | obstype | name | gapend | gapsize | label | details |
|---|---|---|---|---|---|---|
| 2022-09-01 00:00:00+00:00 | temp | vlinder05 | 2022-09-01 07:10:00+00:00 | 0 days 07:10:00 | unfilled | unidetail gap: no details |
| 2022-09-01 01:40:00+00:00 | temp | vlinder25 | 2022-09-01 02:30:00+00:00 | 0 days 00:50:00 | unfilled | unidetail gap: no details |
| 2022-09-01 02:10:00+00:00 | temp | vlinder19 | 2022-09-01 03:00:00+00:00 | 0 days 00:50:00 | unfilled | unidetail gap: no details |
| 2022-09-01 07:30:00+00:00 | temp | vlinder05 | 2022-09-01 10:20:00+00:00 | 0 days 02:50:00 | unfilled | unidetail gap: no details |
| 2022-09-01 11:10:00+00:00 | temp | vlinder27 | 2022-09-01 16:40:00+00:00 | 0 days 05:30:00 | unfilled | unidetail gap: no details |
| ... | ... | ... | ... | ... | ... | ... |
| 2022-09-15 21:00:00+00:00 | temp | vlinder24 | 2022-09-15 21:50:00+00:00 | 0 days 00:50:00 | unfilled | unidetail gap: no details |
| 2022-09-15 21:50:00+00:00 | temp | vlinder21 | 2022-09-15 22:40:00+00:00 | 0 days 00:50:00 | unfilled | unidetail gap: no details |
| 2022-09-15 22:00:00+00:00 | temp | vlinder08 | 2022-09-15 23:00:00+00:00 | 0 days 01:00:00 | unfilled | unidetail gap: no details |
| 2022-09-15 23:40:00+00:00 | temp | vlinder21 | 2022-09-15 23:50:00+00:00 | 0 days 00:10:00 | unfilled | unidetail gap: no details |
| 2022-09-15 23:50:00+00:00 | temp | vlinder23 | 2022-09-15 23:50:00+00:00 | 0 days 00:00:00 | unfilled | unidetail gap: no details |
647 rows × 4 columns
This is particularly useful for gap filling analysis and getting a concise overview of data completeness. The same method is available at station and sensor level.
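The difference between a per-record listing and a per-gap overview can be illustrated with a small standalone sketch. It does not use the toolkit; the helper `summarize_gaps` and the timestamps are hypothetical and only assume that a gap is a run of consecutive missing records at the sensor's time resolution.

```python
from datetime import datetime, timedelta

def summarize_gaps(missing_timestamps, resolution):
    """Group consecutive missing timestamps into (start, end, duration) tuples."""
    gaps = []
    for ts in sorted(missing_timestamps):
        if gaps and ts - gaps[-1][1] == resolution:
            # Directly follows the previous missing record: extend that gap
            gaps[-1][1] = ts
        else:
            # Not adjacent to the previous missing record: start a new gap
            gaps.append([ts, ts])
    return [(start, end, end - start) for start, end in gaps]

# Missing records of one sensor at 10-minute resolution (illustrative values)
res = timedelta(minutes=10)
missing = [datetime(2022, 9, 1, 20, 10) + i * res for i in range(6)]
missing.append(datetime(2022, 9, 6, 12, 10))  # an isolated missing record

overview = summarize_gaps(missing, res)
for start, end, size in overview:
    print(start, "->", end, "| gapsize:", size)
```

Seven missing records collapse into two rows here. A gap consisting of a single record gets a duration of zero, in line with the `0 days 00:00:00` entries in the overview table.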
Fill gaps#
In the toolkit, two groups of gap-filling methods are implemented: interpolation methods and methods that make use of external modeldata.
NOTE: In this example, we use a single station (vlinder02) for demonstration. All methods can also be applied directly to a Dataset, so there is no need to apply them to each station separately.
Interpolation methods#
The most straightforward method to fill a gap is by using interpolation. Linear interpolation is the best-known form of interpolation, but there are also more advanced forms of interpolation. In the toolkit, we can easily interpolate the gaps by making use of the .interpolate_gaps() method.
[9]:
#apply linear interpolation
dataset.get_station('vlinder02').interpolate_gaps(
obstype='temp', #Which gaps to fill
overwrite_fill = True, #Overwrite previous filled values if they are present
method='linear', #which interpolation method
#Limitations:
max_gap_duration_to_fill='10h', #maximum duration of a gap to fill (here 10 hours)
)
#make a plot
dataset.get_station('vlinder02').make_plot(
obstype='temp',
colorby='label',
title='Applying linear interpolation to fill the gaps.')
WARNING:<metobs_toolkit>:Cannot fill Gap(station=vlinder02, obstype=temp, start=2022-09-02 12:10:00+00:00, end=2022-09-03 01:30:00+00:00, status=failed gapfill) because the gap is too large (gapsize: 0 days 13:20:00 > 0 days 10:00:00 : max_gapsize). Increase the max_gapsize or use another gapfill method.
WARNING:<metobs_toolkit>:Cannot fill Gap(station=vlinder02, obstype=temp, start=2022-09-07 06:50:00+00:00, end=2022-09-08 14:10:00+00:00, status=failed gapfill) because the gap is too large (gapsize: 1 days 07:20:00 > 0 days 10:00:00 : max_gapsize). Increase the max_gapsize or use another gapfill method.
[9]:
<Axes: title={'center': 'Applying linear interpolation to fill the gaps.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
As you can see, some gaps are filled and others are not. This is because some gaps are longer than the limit set by max_gap_duration_to_fill.
By using the .gap_overview_df() method (or, for more detail, the .gapsdf attribute), it becomes clear why some gaps could not be filled.
[10]:
dataset.get_station('vlinder02').gap_overview_df()
[10]:
| gapstart | obstype | gapend | gapsize | label | details |
|---|---|---|---|---|---|
| 2022-09-01 20:10:00+00:00 | temp | 2022-09-01 21:00:00+00:00 | 0 days 00:50:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-02 12:10:00+00:00 | temp | 2022-09-03 01:30:00+00:00 | 0 days 13:20:00 | failed gapfill | unidetail gap: Gap is too large (0 days 13:20:... |
| 2022-09-03 02:40:00+00:00 | temp | 2022-09-03 03:40:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-03 14:50:00+00:00 | temp | 2022-09-03 15:00:00+00:00 | 0 days 00:10:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-04 15:40:00+00:00 | temp | 2022-09-04 17:10:00+00:00 | 0 days 01:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-05 11:20:00+00:00 | temp | 2022-09-05 16:50:00+00:00 | 0 days 05:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-05 18:00:00+00:00 | temp | 2022-09-05 18:10:00+00:00 | 0 days 00:10:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-06 12:10:00+00:00 | temp | 2022-09-06 12:10:00+00:00 | 0 days 00:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-06 12:30:00+00:00 | temp | 2022-09-06 13:40:00+00:00 | 0 days 01:10:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-07 06:50:00+00:00 | temp | 2022-09-08 14:10:00+00:00 | 1 days 07:20:00 | failed gapfill | unidetail gap: Gap is too large (1 days 07:20:... |
| 2022-09-09 00:30:00+00:00 | temp | 2022-09-09 09:00:00+00:00 | 0 days 08:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-09 21:40:00+00:00 | temp | 2022-09-09 23:10:00+00:00 | 0 days 01:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-10 00:30:00+00:00 | temp | 2022-09-10 01:30:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-10 17:10:00+00:00 | humidity | 2022-09-10 17:10:00+00:00 | 0 days 00:00:00 | unfilled | unidetail gap: no details |
| 2022-09-10 17:10:00+00:00 | temp | 2022-09-10 17:10:00+00:00 | 0 days 00:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-10 17:10:00+00:00 | wind_direction | 2022-09-10 17:10:00+00:00 | 0 days 00:00:00 | unfilled | unidetail gap: no details |
| 2022-09-10 17:10:00+00:00 | wind_speed | 2022-09-10 17:10:00+00:00 | 0 days 00:00:00 | unfilled | unidetail gap: no details |
| 2022-09-11 04:10:00+00:00 | temp | 2022-09-11 05:10:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 00:20:00+00:00 | temp | 2022-09-14 01:20:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 05:50:00+00:00 | temp | 2022-09-14 08:40:00+00:00 | 0 days 02:50:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 10:20:00+00:00 | temp | 2022-09-14 11:20:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 17:50:00+00:00 | temp | 2022-09-14 18:50:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-15 01:40:00+00:00 | temp | 2022-09-15 02:30:00+00:00 | 0 days 00:50:00 | successful gapfill | unidetail gap: Successful interpolation |
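The pattern in the table above, where short gaps are interpolated and gaps longer than max_gap_duration_to_fill are marked as failed, can be sketched in plain Python. This is a simplified illustration and not the toolkit's implementation: the function `fill_gap_linear` and the temperature values are hypothetical, and the gap size is assumed to be the time between the first and last missing record.

```python
from datetime import datetime, timedelta

def fill_gap_linear(before, after, gap_times, max_gap_duration):
    """Linearly interpolate between two anchors, or return None if the gap is too long.

    before, after: (timestamp, value) of the good records enclosing the gap.
    gap_times: sorted timestamps of the missing records.
    """
    gapsize = gap_times[-1] - gap_times[0]
    if gapsize > max_gap_duration:
        return None  # plays the role of a 'failed gapfill'
    (t0, v0), (t1, v1) = before, after
    span = (t1 - t0).total_seconds()
    return [v0 + (v1 - v0) * (t - t0).total_seconds() / span for t in gap_times]

res = timedelta(minutes=10)
start = datetime(2022, 9, 1, 20, 10)
gap_times = [start + i * res for i in range(6)]  # 20:10 ... 21:00
before = (start - res, 18.0)                     # last good record (illustrative value)
after = (gap_times[-1] + res, 16.6)              # first good record after the gap

filled = fill_gap_linear(before, after, gap_times, timedelta(hours=10))
too_long = fill_gap_linear(before, after, gap_times, timedelta(minutes=30))
print(filled)    # six values decreasing evenly from the leading to the trailing anchor
print(too_long)  # None: the 50-minute gap exceeds the 30-minute limit
```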
Higher order interpolation demo#
More advanced interpolation methods often require multiple anchors (the good records that serve as anchor points for the interpolation). In the toolkit we refer to a leading period and a trailing period: the anchor observations before and after the gap, respectively.
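To see why a higher-order method needs more than one anchor on each side, consider the following standalone sketch. It fits the unique third-order polynomial through two leading and two trailing anchors using Lagrange's formula. This is an illustration of the idea only: it is not how the toolkit or pandas computes the fill, and the helper `lagrange_value` and the synthetic `truth` curve are hypothetical.

```python
def lagrange_value(anchors, x):
    """Evaluate the unique polynomial through the (x_i, y_i) anchors at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(anchors):
        weight = 1.0
        for j, (xj, _) in enumerate(anchors):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def truth(x):
    """Synthetic 'temperature' curve, used only to generate anchor values."""
    return 0.02 * x**3 - 0.3 * x**2 + 0.5 * x + 15.0

# x counts 10-minute steps; the gap covers steps 2..5
leading = [(0, truth(0)), (1, truth(1))]    # 2 anchors before the gap
trailing = [(6, truth(6)), (7, truth(7))]   # 2 anchors after the gap
anchors = leading + trailing                # 4 points define a 3rd-order polynomial

filled = {x: lagrange_value(anchors, x) for x in (2, 3, 4, 5)}
for x, value in filled.items():
    print(x, round(value, 3), round(truth(x), 3))
```

With fewer than four anchors in total, a third-order polynomial is not uniquely determined, which is why a minimum number of leading and trailing anchors has to be available.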
Here is an example of applying polynomial interpolation to gaps.
[11]:
import pandas as pd
target_station = dataset.get_station('vlinder02').copy()
#apply polynomial interpolation
target_station.interpolate_gaps(
obstype='temp',
method='polynomial',
overwrite_fill=False, #gaps that are already filled are skipped (e.g. when chaining gap-fill methods)
n_leading_anchors=3, #at least 3 leading anchors are needed for 3rd-order polynomial interpolation
n_trailing_anchors=4, #at least 3 trailing anchors are needed for 3rd-order polynomial interpolation
max_gap_duration_to_fill=pd.Timedelta('15h'), #maximum gap size to fill
max_lead_to_gap_distance='60min', #the maximum distance (in time) between the leading anchors and the start of the gap
method_kwargs={'order':3}, #all extra arguments to pass to the pandas.DataFrame.interpolate method
)
#make plot
target_station.make_plot(
obstype='temp',
colorby='label',
title='linear interpolation (small gaps) + 3th-order-polynomial (for medium gaps)')
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-01 20:10:00+00:00, end=2022-09-01 21:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-03 02:40:00+00:00, end=2022-09-03 03:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-03 14:50:00+00:00, end=2022-09-03 15:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-04 15:40:00+00:00, end=2022-09-04 17:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-05 11:20:00+00:00, end=2022-09-05 16:50:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-05 18:00:00+00:00, end=2022-09-05 18:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-06 12:10:00+00:00, end=2022-09-06 12:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-06 12:30:00+00:00, end=2022-09-06 13:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Cannot fill Gap(station=vlinder02, obstype=temp, start=2022-09-07 06:50:00+00:00, end=2022-09-08 14:10:00+00:00, status=failed gapfill) because the gap is too large (gapsize: 1 days 07:20:00 > 0 days 15:00:00 : max_gapsize). Increase the max_gapsize or use another gapfill method.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-09 00:30:00+00:00, end=2022-09-09 09:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-09 21:40:00+00:00, end=2022-09-09 23:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-10 00:30:00+00:00, end=2022-09-10 01:30:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-10 17:10:00+00:00, end=2022-09-10 17:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-11 04:10:00+00:00, end=2022-09-11 05:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 00:20:00+00:00, end=2022-09-14 01:20:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 05:50:00+00:00, end=2022-09-14 08:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 10:20:00+00:00, end=2022-09-14 11:20:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 17:50:00+00:00, end=2022-09-14 18:50:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-15 01:40:00+00:00, end=2022-09-15 02:30:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
[11]:
<Axes: title={'center': 'linear interpolation (small gaps) + 3th-order-polynomial (for medium gaps)'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
Warning#
As can be seen, interpolation works well for small gaps but is not suitable for larger ones.
The figure above shows that this holds even for higher-order methods. To fill larger gaps, we can use an external dataset, as described in the next section.
Fill gaps using external modeldata#
Another technique is to use an external source to fill the gaps. In the metobs-toolkit, multiple variants of this technique are implemented, listed here from simplest to most complex:
.fill_gaps_with_raw_modeldata
.fill_gaps_with_debiased_modeldata
.fill_gaps_with_diurnal_debiased_modeldata
.fill_gaps_with_weighted_diurnal_debias_modeldata
All of these methods use external modeldata; the more complex the method, the more elaborate the correction applied to that modeldata.
For a detailed description we refer to the API documentation.
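As a rough intuition for what a "diurnal" correction could look like, the sketch below computes a separate model bias for each time of day instead of a single overall bias. This is an assumption based on the method names, not a description of the toolkit's algorithm (the API documentation is authoritative). The function `diurnal_bias`, the helper `at` and all values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def diurnal_bias(obs, model):
    """Mean (model - obs) per time of day, over timestamps present in both."""
    diffs = defaultdict(list)
    for ts, value in obs.items():
        if ts in model:
            diffs[ts.time()].append(model[ts] - value)
    return {tod: sum(d) / len(d) for tod, d in diffs.items()}

def at(day, hour):
    """Shorthand for a timestamp in September 2022."""
    return datetime(2022, 9, day, hour)

# Two days with observations, followed by one day that is a gap
obs = {at(1, 6): 12.0, at(1, 14): 22.0, at(2, 6): 11.0, at(2, 14): 21.0}
model = {at(1, 6): 13.0, at(1, 14): 21.5, at(2, 6): 12.0, at(2, 14): 20.5,
         at(3, 6): 12.5, at(3, 14): 23.0}

bias = diurnal_bias(obs, model)  # model is too warm in the morning, too cold in the afternoon
gap_times = [at(3, 6), at(3, 14)]
filled = {t: model[t] - bias[t.time()] for t in gap_times}
print(filled)
```

A single overall bias would average the morning and afternoon errors together; grouping by time of day keeps them apart, at the cost of needing enough good records for every time of day.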
As an example we will demonstrate the use of the .fill_gaps_with_debiased_modeldata method. To start, we need to extract external modeldata. We will extract ERA5-land temperature data, as is demonstrated in the GEE example.
[12]:
#Extract ERA5 temperature modeldata
era5_manager = metobs_toolkit.default_GEE_datasets['ERA5-land']
#Extract the timeseries
era5_temp = dataset.get_station('vlinder02').get_gee_timeseries_data(
gee_manager=era5_manager, #The datasetmanager to use
startdt_utc=None,
enddt_utc=None,
obstypes=['temp'], #the observationtypes to extract, must be known modelobstypes
force_direct_transfer=True
)
[13]:
#Make a plot for illustration
dataset.get_station('vlinder02').make_plot(
obstype='temp',
colorby='label',
show_modeldata=True,
title='Partially interpolated timeseries and raw ERA5 temperature data.')
[13]:
<Axes: title={'center': 'Partially interpolated timeseries and raw ERA5 temperature data.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
As we can see, we have external modeldata over the same period as the observations, so we can use it to fill the gaps. In .fill_gaps_with_debiased_modeldata, we specify a leading and a trailing period (adjacent periods with good records before and after the gap). These periods are used to compute a model bias, for which the gap-fill values are then corrected.
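Conceptually, the bias correction can be sketched as follows. This is a minimal standalone illustration under stated assumptions: the bias is taken as the mean of (model - observation) over the good records in the leading and trailing periods, and the exact definition used by the toolkit may differ. The function `debiased_fill` and all values are hypothetical; time is expressed as 10-minute step indices.

```python
def debiased_fill(obs, model, anchor_times, gap_times):
    """Correct the model values inside a gap with the mean bias over the anchor records."""
    diffs = [model[t] - obs[t] for t in anchor_times]
    bias = sum(diffs) / len(diffs)
    return {t: model[t] - bias for t in gap_times}, bias

# Observations exist at steps 0-2 (leading) and 6-8 (trailing); steps 3-5 form the gap
obs = {0: 15.0, 1: 15.5, 2: 16.0, 6: 17.0, 7: 16.5, 8: 16.0}
model = {0: 16.0, 1: 16.5, 2: 17.5, 3: 18.0, 4: 18.5,
         5: 18.0, 6: 18.5, 7: 18.0, 8: 17.0}

filled, bias = debiased_fill(obs, model,
                             anchor_times=[0, 1, 2, 6, 7, 8],
                             gap_times=[3, 4, 5])
print("bias:", bias)
print("filled:", filled)
```

If too few good records are available in either period, the bias estimate becomes unreliable, which is why a minimum number of leading and trailing records is required.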
[14]:
import pandas as pd
#Fill the gaps with debiased modeldata
dataset.get_station('vlinder02').fill_gaps_with_debiased_modeldata(
obstype='temp',
leading_period_duration=pd.Timedelta('4h'),
min_leading_records_total=20,
trailing_period_duration=pd.Timedelta('4h'),
min_trailing_records_total=20,
max_gap_duration_to_fill=pd.Timedelta('48h'))
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-01 20:10:00+00:00, end=2022-09-01 21:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Cannot fill Gap(station=vlinder02, obstype=temp, start=2022-09-02 12:10:00+00:00, end=2022-09-03 01:30:00+00:00, status=failed gapfill) because no valid trailing period can be found.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-03 02:40:00+00:00, end=2022-09-03 03:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-03 14:50:00+00:00, end=2022-09-03 15:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-04 15:40:00+00:00, end=2022-09-04 17:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-05 11:20:00+00:00, end=2022-09-05 16:50:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-05 18:00:00+00:00, end=2022-09-05 18:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-06 12:10:00+00:00, end=2022-09-06 12:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-06 12:30:00+00:00, end=2022-09-06 13:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-09 00:30:00+00:00, end=2022-09-09 09:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-09 21:40:00+00:00, end=2022-09-09 23:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-10 00:30:00+00:00, end=2022-09-10 01:30:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-10 17:10:00+00:00, end=2022-09-10 17:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-11 04:10:00+00:00, end=2022-09-11 05:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 00:20:00+00:00, end=2022-09-14 01:20:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 05:50:00+00:00, end=2022-09-14 08:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 10:20:00+00:00, end=2022-09-14 11:20:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 17:50:00+00:00, end=2022-09-14 18:50:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-15 01:40:00+00:00, end=2022-09-15 02:30:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
[15]:
#Make a plot for illustration
dataset.get_station('vlinder02').make_plot(
obstype='temp',
colorby='label',
show_modeldata=False,
title='Partially interpolated timeseries and partially filled with debiased ERA5 temperature data.')
[15]:
<Axes: title={'center': 'Partially interpolated timeseries and partially filled with debiased ERA5 temperature data.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
As you can see, one of the remaining gaps is successfully filled with the debiased modeldata. One gap is not filled; to see why, we can use the .gap_overview_df() method.
[16]:
dataset.get_station('vlinder02').get_sensor('temp').gap_overview_df()
[16]:
| gapstart | gapend | gapsize | label | details |
|---|---|---|---|---|
| 2022-09-01 20:10:00+00:00 | 2022-09-01 21:00:00+00:00 | 0 days 00:50:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-02 12:10:00+00:00 | 2022-09-03 01:30:00+00:00 | 0 days 13:20:00 | failed gapfill | unidetail gap: Too few trailing records (17 fo... |
| 2022-09-03 02:40:00+00:00 | 2022-09-03 03:40:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-03 14:50:00+00:00 | 2022-09-03 15:00:00+00:00 | 0 days 00:10:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-04 15:40:00+00:00 | 2022-09-04 17:10:00+00:00 | 0 days 01:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-05 11:20:00+00:00 | 2022-09-05 16:50:00+00:00 | 0 days 05:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-05 18:00:00+00:00 | 2022-09-05 18:10:00+00:00 | 0 days 00:10:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-06 12:10:00+00:00 | 2022-09-06 12:10:00+00:00 | 0 days 00:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-06 12:30:00+00:00 | 2022-09-06 13:40:00+00:00 | 0 days 01:10:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-07 06:50:00+00:00 | 2022-09-08 14:10:00+00:00 | 1 days 07:20:00 | successful gapfill | multi_details gap: bias corrected: 14.75 + -0.... |
| 2022-09-09 00:30:00+00:00 | 2022-09-09 09:00:00+00:00 | 0 days 08:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-09 21:40:00+00:00 | 2022-09-09 23:10:00+00:00 | 0 days 01:30:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-10 00:30:00+00:00 | 2022-09-10 01:30:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-10 17:10:00+00:00 | 2022-09-10 17:10:00+00:00 | 0 days 00:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-11 04:10:00+00:00 | 2022-09-11 05:10:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 00:20:00+00:00 | 2022-09-14 01:20:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 05:50:00+00:00 | 2022-09-14 08:40:00+00:00 | 0 days 02:50:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 10:20:00+00:00 | 2022-09-14 11:20:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-14 17:50:00+00:00 | 2022-09-14 18:50:00+00:00 | 0 days 01:00:00 | successful gapfill | unidetail gap: Successful interpolation |
| 2022-09-15 01:40:00+00:00 | 2022-09-15 02:30:00+00:00 | 0 days 00:50:00 | successful gapfill | unidetail gap: Successful interpolation |
We see that the issue is too few trailing records (17 found but 20 required in a 4-hour period). You can verify this by inspecting the .df attribute of the station. You can also see it in the timeseries plot: if you zoom in on the trailing period, there is an interpolated gap (insert %matplotlib qt in the cell with the .make_plot() call to use the zoom tool).
Note that successfully interpolated records are not included in the trailing period.
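The count of 17 can be reproduced with a short calculation. The sketch assumes a 10-minute resolution, a trailing period that starts at the first timestamp after the gap end (01:30) and spans 4 hours, and that every record in that window is a good observation except the interpolated gap from 02:40 to 03:40 listed in the overview above. The exact window boundaries used by the toolkit are an assumption here.

```python
from datetime import datetime, timedelta

res = timedelta(minutes=10)
gap_end = datetime(2022, 9, 3, 1, 30)
trailing_period_duration = timedelta(hours=4)

# Candidate timestamps in the trailing period: 01:40 up to and including 05:30
n_slots = trailing_period_duration // res
window = [gap_end + (i + 1) * res for i in range(n_slots)]

# The next gap (02:40 -> 03:40) was already filled by interpolation
interp_start = datetime(2022, 9, 3, 2, 40)
interp_end = datetime(2022, 9, 3, 3, 40)
interpolated = {t for t in window if interp_start <= t <= interp_end}

# Interpolated records do not count as good trailing records
good_records = [t for t in window if t not in interpolated]
print(len(window), len(interpolated), len(good_records))  # 24 7 17
```

Of the 24 timestamps in the window, 7 are interpolated, leaving 17 good records, which is below the required 20.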
In order to solve this, we can lessen the restriction for the trailing period.
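The record count described above can be reproduced with plain pandas. The following is a minimal sketch on synthetic data, not the toolkit's implementation: the labels `ok` and `interpolated`, the helper `count_usable_records` and the timestamps are all hypothetical, and the 10-minute resolution is an assumption chosen so that a 4-hour window holds 24 records.

```python
import pandas as pd

# Synthetic 10-minute series covering one 4-hour window (24 records).
# The label names are hypothetical stand-ins, not the toolkit's own labels.
index = pd.date_range("2022-09-01 00:00", periods=24, freq="10min", tz="UTC")
labels = pd.Series("ok", index=index)

# Pretend a short gap inside the window was filled by interpolation earlier.
labels.iloc[5:12] = "interpolated"


def count_usable_records(labels, window_start, duration, usable_label="ok"):
    """Count records with `usable_label` in [window_start, window_start + duration)."""
    window_end = window_start + duration
    in_window = (labels.index >= window_start) & (labels.index < window_end)
    return int((labels[in_window] == usable_label).sum())


n_usable = count_usable_records(labels, index[0], pd.Timedelta("4h"))

# Compare against a strict and a relaxed minimum number of records.
for required in (20, 15):
    verdict = "enough" if n_usable >= required else "too few"
    print(f"{n_usable} usable records, {required} required: {verdict}")
```

In this sketch the 7 interpolated records are excluded, leaving 17 usable records: too few for a minimum of 20, but enough for a minimum of 15. In the toolkit, the minimum is set with the min_trailing_records_total argument.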
[17]:
#Fill the gaps with debiased modeldata
dataset.get_station('vlinder02').fill_gaps_with_debiased_modeldata(
obstype='temp',
leading_period_duration=pd.Timedelta('4h'),
min_leading_records_total=20,
trailing_period_duration=pd.Timedelta('4h'),
min_trailing_records_total=15, #lessened restriction 20 -> 15
max_gap_duration_to_fill=pd.Timedelta('48h'))
#Make a plot for illustration
dataset.get_station('vlinder02').make_plot(
obstype='temp',
colorby='label',
show_modeldata=False,
title='Partially interpolated timeseries and partially filled with debiased ERA5 temperature data.')
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-01 20:10:00+00:00, end=2022-09-01 21:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-03 02:40:00+00:00, end=2022-09-03 03:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-03 14:50:00+00:00, end=2022-09-03 15:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-04 15:40:00+00:00, end=2022-09-04 17:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-05 11:20:00+00:00, end=2022-09-05 16:50:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-05 18:00:00+00:00, end=2022-09-05 18:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-06 12:10:00+00:00, end=2022-09-06 12:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-06 12:30:00+00:00, end=2022-09-06 13:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-07 06:50:00+00:00, end=2022-09-08 14:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-09 00:30:00+00:00, end=2022-09-09 09:00:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-09 21:40:00+00:00, end=2022-09-09 23:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-10 00:30:00+00:00, end=2022-09-10 01:30:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-10 17:10:00+00:00, end=2022-09-10 17:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-11 04:10:00+00:00, end=2022-09-11 05:10:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 00:20:00+00:00, end=2022-09-14 01:20:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 05:50:00+00:00, end=2022-09-14 08:40:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 10:20:00+00:00, end=2022-09-14 11:20:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-14 17:50:00+00:00, end=2022-09-14 18:50:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
WARNING:<metobs_toolkit>:Gap(station=vlinder02, obstype=temp, start=2022-09-15 01:40:00+00:00, end=2022-09-15 02:30:00+00:00, status=successful gapfill) cannot be filled (because it has a fill status successful gapfill), and overwrite fill is False.
[17]:
<Axes: title={'center': 'Partially interpolated timeseries and partially filled with debiased ERA5 temperature data.'}, xlabel='Timestamps (in UTC)', ylabel='temp (degree_Celsius)'>
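To illustrate the idea behind a debiased fill, here is a minimal pandas sketch on invented values. It assumes one common definition of the bias, namely the mean difference between observations and model values over the records surrounding the gap, and fills the gap with the model value plus that bias (the fill details in the table above have the same "value + bias" shape). The toolkit's own bias computation may differ, so treat this as a conceptual sketch only.

```python
import pandas as pd

# Synthetic hourly data; all values are invented for illustration.
index = pd.date_range("2022-09-01 00:00", periods=8, freq="1h", tz="UTC")
model = pd.Series([14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.0, 17.5], index=index)
obs = pd.Series([13.5, 14.0, None, None, None, 16.0, 16.5, 17.0], index=index)

# Records that have an observation play the role of the leading and
# trailing periods around the gap.
has_obs = obs.notna()

# Assumed bias definition: mean of (observation - model) over those records.
bias = (obs[has_obs] - model[has_obs]).mean()

# Keep observations where present, otherwise use the bias-corrected model value.
filled = obs.where(has_obs, model + bias)

print(f"bias = {bias:.2f}")
print(filled)
```

Requiring a minimum number of leading and trailing records, as done above, guards against estimating such a bias from too small a sample.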
Other gap filling methods#
A set of gap-filling methods is implemented in the MetObs-toolkit. For an overview and the details of each method, navigate to the Gaps related methods section in the documentation (see Dataset and Station).