metobs_toolkit.dataset.Dataset
- class Dataset
Dataset class for managing and processing meteorological observation data.
This class provides functionality to handle datasets containing observations from multiple stations. It includes methods for data synchronization, quality control, gap filling, and integration with Google Earth Engine (GEE) datasets. The Dataset class also supports importing and exporting data, as well as generating plots for visualization.
- stations
A list of Station objects representing the stations in the dataset.
- Type:
list
- obstypes
A dictionary of observation types known to the dataset.
- Type:
dict
- template
The template instance used for data import and processing.
- Type:
Template
Methods
__init__() Initialize a Dataset instance.
add_new_observationtype(obstype) Add a new observation type to the dataset's known observation types.
buddy_check([obstype, spatial_buddy_radius, ...]) Spatial buddy check.
buddy_check_with_LCZ_safety_net() Deprecated alias for buddy_check_with_safetynets.
buddy_check_with_safetynets([obstype, ...]) Spatial buddy check with configurable safety nets.
convert_outliers_to_gaps([all_observations, ...]) Convert outlier values in the observation data to gaps.
copy([deep]) Return a copy of the Dataset.
Compute pairwise great-circle distances between all stations.
fill_gaps_with_debiased_modeldata(obstype[, ...]) Fill the gaps using model data corrected for the bias.
Fill the gaps using model data corrected for the diurnal bias.
fill_gaps_with_raw_modeldata(obstype[, ...]) Fill the gap(s) using model data without correction.
Fill the gaps using a weighted sum of model data corrected for the diurnal bias and weights with respect to the start of the gap.
Create gap status overview DataFrame with one row per gap period.
get_LCZ([update_metadata, initialize_gee, ...]) Retrieve Local Climate Zone (LCZ) for the stations using Google Earth Engine (GEE).
get_altitude([update_metadata, initialize_gee]) Retrieve altitude for the stations using Google Earth Engine (GEE).
get_gee_timeseries_data(gee_dynamic_manager) Extract time series data from GEE.
get_info([printout]) Retrieve and optionally print detailed information about the station.
get_landcover_fractions([buffers, ...]) Get landcover fractions for a circular buffer at the stations using GEE.
get_qc_stats([obstype, make_plot]) Summarize QC label frequencies across all stations for a given observation type.
get_static_gee_buffer_fraction_data(...[, ...]) Extract circular buffer fractions of a GEE dataset at Stations locations.
get_static_gee_point_data(gee_static_manager) Extract static data from GEE dataset at Stations locations.
get_station(stationname) Retrieve a Station by name.
gross_value_check([obstype, ...]) Identify outliers based on thresholds.
import_data_from_file(template_file[, ...]) Import observational data and metadata from files.
import_gee_data_from_file(filepath, ...[, ...]) Import Google Earth Engine (GEE) data from a CSV file.
interpolate_gaps(obstype[, method, ...]) Fill the gap(s) using interpolation of SensorData.
make_gee_plot([gee_manager, timeinstance, ...]) Create an interactive spatial plot of the GEE dataset and stations.
make_plot([obstype, colorby, ...]) Generate a time series plot for observational data.
make_plot_of_modeldata([obstype, colormap, ...]) Generate a time series plot of model data for a specific observation type.
persistence_check([obstype, timewindow, ...]) Check if values are not constant in a moving time window.
qc_overview_df([subset_stations, ...]) Build a QC overview DataFrame for all stations in a Dataset.
rename_stations(renamedict) Rename stations in the dataset.
repetitions_check([obstype, ...]) Test if an observation changes after a number of repetitions.
resample(target_freq[, obstype, ...]) Resample observation data to a specified frequency.
save_dataset_to_pkl([filepath, overwrite]) Save the dataset to a pickle (.pkl) file.
step_check([obstype, ...]) Check for 'spikes' and 'dips' in a time series.
subset_by_stations(stationnames[, deepcopy]) Create a subset of the dataset by selecting specific stations.
sync_records([obstype, ...]) Synchronize records of sensor data across stations.
to_csv([filepath, overwrite]) Save the dataset observations to a CSV file.
to_netcdf([filepath, overwrite]) Save the Dataset as a netCDF file.
to_parquet([filepath, overwrite]) Save the dataset observations to a parquet file.
to_xr() Concatenate multiple station Datasets into one along a new 'name' dimension.
window_variation_check([obstype, ...]) Test if the increase/decrease in a time window exceeds a threshold.
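One of the methods listed above computes pairwise great-circle distances between all stations. As a rough illustration of what such a computation involves, the following standalone sketch uses the haversine formula with only the Python standard library. It is not the toolkit's implementation: the function names, the station names, the coordinates, and the Earth radius constant are all assumptions made for this example.

```python
import math

# Mean Earth radius in km (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pairwise_distances(coords):
    """Return a nested dict {name_a: {name_b: distance_km}} for all pairs.

    `coords` maps a station name to a (latitude, longitude) tuple in degrees.
    """
    names = list(coords)
    return {
        a: {b: haversine_km(*coords[a], *coords[b]) for b in names}
        for a in names
    }


# Hypothetical station names and coordinates, for illustration only.
stations = {
    "station_A": (51.05, 3.72),
    "station_B": (50.85, 4.35),
    "station_C": (51.22, 4.40),
}
matrix = pairwise_distances(stations)
```

The result is symmetric with zeros on the diagonal, so `matrix["station_A"]["station_B"]` and `matrix["station_B"]["station_A"]` hold the same distance. A distance matrix of this kind is the sort of input a spatial check such as a buddy check would plausibly need to find neighbours within a radius.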
Attributes
Dataset DataFrame constructor.
Get the latest end datetime from the observation data.
Construct a DataFrame representation of all the gaps.
Construct a DataFrame representation of metadata.
Construct a DataFrame representation of all the present model data.
Get the dictionary of Obstypes known to the Dataset.
Construct a DataFrame representation of all the outliers.
Get a list of all the present observation types.
Get the earliest start datetime from the observation data.
Get the list of Stations present in the Dataset.
Get the Template instance used when the data was imported.