metobs_toolkit.Dataset.combine_all_to_obsspace
- Dataset.combine_all_to_obsspace(repr_outl_as_nan=False, overwrite_outliers_by_gaps_and_missing=True)
Make one dataframe with all observations and their labels.
Combine all observations, outliers, missing observations and gaps into one dataframe. All observation types are combined, and a label is added in a separate column.
When gaps and missing records are updated from outliers, one has to choose whether to represent these records as outliers or as gaps/missing, because the returned dataframe cannot contain duplicates.
By default, the observed values of the outliers are kept; one can choose to use these values or NaNs instead.
- Parameters:
repr_outl_as_nan (bool, optional) – If True, NaNs are used for the values of the outliers. The default is False.
overwrite_outliers_by_gaps_and_missing (bool, optional) –
- If True, records that are labeled both as gap/missing and as outlier are
labeled as gaps/missing. This only has an effect when the gaps/missing observations are updated from the outliers. The default is True.
- Returns:
combdf – A dataframe containing a continuous time resolution of records, where each record is labeled.
- Return type:
pandas.DataFrame
Examples
>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...     input_data_file=metobs_toolkit.demo_datafile,
...     input_metadata_file=metobs_toolkit.demo_metadatafile,
...     template_file=metobs_toolkit.demo_template,
... )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1H')
>>>
>>> # Apply quality control on the temperature observations
>>> dataset.apply_quality_control(obstype='temp')  # Using the default QC settings
>>> dataset
Dataset instance containing:
     *28 stations
     *['temp', 'humidity', 'radiation_temp', 'pressure', 'pressure_at_sea_level', 'precip', 'precip_sum', 'wind_speed', 'wind_gust', 'wind_direction'] observation types
     *10080 observation records
     *1932 records labeled as outliers
     *0 gaps
     *3 missing observations
     *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration: 14 days 23:00:00)
     *time zone of the records: UTC
     *Coordinates are available for all stations.
>>>
>>> # Combine all records to one dataframe in Observation-resolution
>>> overview_df = dataset.combine_all_to_obsspace()
>>> overview_df.head(12)
                                                              value label toolkit_representation
name      datetime                  obstype
vlinder01 2022-09-01 00:00:00+00:00 humidity                   65.0    ok            observation
                                    precip                      0.0    ok            observation
                                    precip_sum                  0.0    ok            observation
                                    pressure               101739.0    ok            observation
                                    pressure_at_sea_level  102005.0    ok            observation
                                    radiation_temp              NaN    ok            observation
                                    temp                       18.8    ok            observation
                                    wind_direction             65.0    ok            observation
                                    wind_gust                  11.3    ok            observation
                                    wind_speed                  5.6    ok            observation
          2022-09-01 01:00:00+00:00 humidity                   65.0    ok            observation
                                    precip                      0.0    ok            observation
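The effect of the two parameters can be illustrated with a small, self-contained toy model. This is only a sketch of the behaviour described above, not the toolkit's implementation: the function `combine_records`, the plain-dict data layout, and the label strings `"outlier"` and `"gap"` are assumptions made for illustration (only the `"ok"` label appears in the output above).

```python
import math


def combine_records(observations, outliers, gaps,
                    repr_outl_as_nan=False,
                    overwrite_outliers_by_gaps_and_missing=True):
    """Toy model: merge three record sources into one labeled mapping.

    observations: {key: value}
    outliers:     {key: (value, label)}
    gaps:         set of keys (gap/missing records carry no value)
    Each key appears exactly once in the result, so there are no duplicates.
    """
    combined = {}
    for key, value in observations.items():
        combined[key] = (value, "ok")
    for key, (value, label) in outliers.items():
        # Optionally hide the flagged value behind NaN.
        combined[key] = (math.nan if repr_outl_as_nan else value, label)
    for key in gaps:
        if key in outliers and not overwrite_outliers_by_gaps_and_missing:
            continue  # keep the outlier representation for this record
        combined[key] = (math.nan, "gap")  # hypothetical label name
    return dict(sorted(combined.items()))


# Hypothetical records, keyed by (name, datetime, obstype).
k0 = ("vlinder01", "2022-09-01 00:00", "temp")
k1 = ("vlinder01", "2022-09-01 01:00", "temp")
k2 = ("vlinder01", "2022-09-01 02:00", "temp")

obs = {k0: 18.8}
outl = {k1: (45.0, "outlier")}  # hypothetical label name
gaps = {k1, k2}                 # k1 is both an outlier and a gap

default = combine_records(obs, outl, gaps)
keep_outliers = combine_records(
    obs, outl, gaps, overwrite_outliers_by_gaps_and_missing=False
)
masked = combine_records(
    obs, outl, gaps,
    repr_outl_as_nan=True,
    overwrite_outliers_by_gaps_and_missing=False,
)
```

In this sketch, the record `k1` is labeled as a gap with the defaults, keeps its outlier label and value when `overwrite_outliers_by_gaps_and_missing=False`, and keeps the outlier label with a NaN value when `repr_outl_as_nan=True` is added.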