Demo: Analysis#

This example is the continuation of the previous example: filling gaps. This example serves as an introduction to the Analysis module. In the MetObs-toolkit there is a Analysis class, that holds some common methods used in research.

To start, we import the demo dataset.

[6]:

import metobs_toolkit
dataset = metobs_toolkit.Dataset() #Create a new dataset object

#Load the data
dataset.import_data_from_file(
                    template_file=metobs_toolkit.demo_template, #The template file
                    input_data_file=metobs_toolkit.demo_datafile, #The data file
                    input_metadata_file=metobs_toolkit.demo_metadatafile, #The metadata file
                    )

Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
Rukwind is present in the datafile, but not found in the template! This column will be ignored.
Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
The following columns are present in the data file, but not in the template! They are skipped!
 ['Luchtdruk', 'Luchtdruk_Zeeniveau', 'Neerslagintensiteit', 'Globe Temperatuur', 'Neerslagsom', 'Rukwind']
The following columns are found in the metadata, but not in the template and are therefore ignored:
['benaming', 'stad', 'sponsor', 'Network']

Later in this demo we will some landcover information, we extract this for all are stations in the dataset.

[7]:

#Get LCZ, and landcover fractions will be used later on
_lczseries = dataset.get_LCZ()

Creating an Analysis#

The built-in analysis functionality is centered around the Analysis class. This class holds only records that are assumed to be correct. Thus there are no QC outliers or gaps present defined, the data hold by an Analysis is hold in a single dataframe.

We can create an Analysis instance from a Dataset (or from a Station).

[8]:

analysis = metobs_toolkit.Analysis(Dataholder=dataset)
analysis.get_info()

================================================================================
                            General info of Analysis
================================================================================

  -Number of records: 120960
  -Observation types: ['humidity', 'temp', 'wind_direction', 'wind_speed']
  -Available metadata columns: ['lat', 'lon', 'altitude', 'LCZ', 'school', 'g...
  -Stations: ['vlinder01', 'vlinder02', 'vlinder03', 'vlinder04', 'vlinder05'...
  -All known time-derivatives: ['year', 'month', 'hour', 'minute', 'second', ...

[9]:

analysis.df.index.get_level_values('datetime').tz

[9]:

datetime.timezone.utc

We can inspect the stored data from the Analysis.df and Analysis.metadf attributes.

[10]:

analysis.df.head(10)

[10]:

		humidity	temp	wind_direction	wind_speed
datetime	name
2022-09-01 00:00:00+00:00	vlinder01	65.0	18.799999	65.0	1.555556
	vlinder02	62.0	19.400000	25.0	0.194444
	vlinder03	65.0	17.000000	115.0	0.333333
	vlinder04	66.0	15.900000	25.0	0.083333
	vlinder05	61.0	21.100000	45.0	1.611111
	vlinder06	65.0	17.700001	95.0	0.138889
	vlinder07	63.0	18.100000	15.0	0.611111
	vlinder08	59.0	19.200001	165.0	0.222222
	vlinder09	63.0	18.000000	55.0	0.888889
	vlinder10	63.0	19.100000	5.0	0.472222

[11]:

analysis.metadf.head(10)

[11]:

	lat	lon	altitude	LCZ	school	geometry
name
vlinder01	50.980438	3.815763	NaN	Low plants (LCZ D)	UGent	POINT (3.81576 50.98044)
vlinder02	51.022381	3.709695	NaN	Large lowrise	UGent	POINT (3.7097 51.02238)
vlinder03	51.324581	4.952109	NaN	Open midrise	Heilig Graf	POINT (4.95211 51.32458)
vlinder04	51.335522	4.934732	NaN	Sparsely built	Heilig Graf	POINT (4.93473 51.33552)
vlinder05	51.052654	3.675183	NaN	Water (LCZ G)	Sint-Barbara	POINT (3.67518 51.05265)
vlinder06	51.027100	4.516300	NaN	Scattered Trees (LCZ B)	BimSem	POINT (4.5163 51.0271)
vlinder07	51.030888	4.478445	NaN	Compact midrise	PTS	POINT (4.47845 51.03089)
vlinder08	51.028130	4.477398	NaN	Compact midrise	TSM	POINT (4.4774 51.02813)
vlinder09	50.927166	4.075722	NaN	Scattered Trees (LCZ B)	SMI	POINT (4.07572 50.92717)
vlinder10	50.935555	4.041389	NaN	Compact midrise	SMI	POINT (4.04139 50.93555)

Analysis methods#

An overview of the available analysis methods can be seen in the documentation of the Analysis class. The relevant methods depend on your data and your interests. As an example, a demonstration of the filter and diurnal cycle of the demo data.

Filtering data#

It is common to filter your data according to specific meteorological phenomena or periods in time. To do this you can use the apply_filter_on_records() method.

NOTE: The filtering will remove data

[12]:

print(f'The initial number of records: {analysis.df.shape[0]}')

#filter to non-windy afternoons
analysis.apply_filter_on_records('(wind_speed <= 2.5) & (hour > 12) & (hour < 20)')

#We can apply multiple consecutive filterings
analysis.apply_filter_on_records('season=="autumn" | season=="winter"') #Be aware of quotation!


print(f'The number of records after filtering: {analysis.df.shape[0]}')

The initial number of records: 120960
The number of records after filtering: 28536

We can also use the metadata to filter to by using apply_filter_on_metadata() method.

[13]:

analysis.apply_filter_on_metadata("LCZ == 'Large lowrise'")

Diurnal cycle#

To make a diurnal cycle plot of your Analysis use the get_diurnal_statistics() method:

[14]:

analysis.plot_diurnal_cycle(colorby='name', #each station is plotted, and colored differently
                            obstype = 'humidity',
                                return_data = False,
                                )

[14]:

<Axes: title={'center': 'Diurnal cycle of humidity grouped per name.'}, xlabel='Diurnal Timestamps (in UTC)', ylabel='humidity (percent)'>

../_images/examples_analysis_example_16_2.png

Note: Be aware that we filtered the data to wind-still afternoons in autumn!

If you want to work with the aggregated values, you can use the aggregate_df() method. As illustration we undo the filtering to have some more variability in the data. Then we aggregate the data per LCZ.

[15]:

import numpy as np

#Start with an unfiltered analysis
analysis=metobs_toolkit.Analysis(Dataholder=dataset)

aggdf = analysis.aggregate_df(
                        obstype='temp',
                        agg=["LCZ", "hour"], #by adding hour, we keep the diurnal variation
                        method=np.mean) #the aggregation function to use.
aggdf

[15]:

		temp
LCZ	hour
Compact midrise	0	17.739286
	1	17.230398
	2	16.657461
	3	16.421507
	4	16.209682
...	...	...
Water (LCZ G)	19	19.380926
	20	18.933889
	21	18.553888
	22	18.320370
	23	18.126852

240 rows × 1 columns

Diurnal cycle of differences#

The diurnal cycle of differences is also implemented. The values are the aggregated diurnal differences wrt a reference station.

As an example the diurnal temperature difference cycle is plotted with station vlinder02 as the reference. The aggregation is done per LCZ.

[16]:

analysis.plot_diurnal_cycle_with_reference_station(ref_station='vlinder02',
                                                  obstype='temp',
                                                  colorby='LCZ')

[16]:

<Axes: title={'center': 'Diurnal cycle of temp differences with vlinder02 as reference, grouped per LCZ.'}, xlabel='Diurnal Timestamps (in UTC)', ylabel='temp (degree_Celsius) difference'>

../_images/examples_analysis_example_20_2.png