metobs_toolkit.Dataset.import_data_from_file

Dataset.import_data_from_file(long_format=True, obstype=None, obstype_unit=None, obstype_description=None, freq_estimation_method=None, freq_estimation_simplify=None, freq_estimation_simplify_error=None, kwargs_data_read={}, kwargs_metadata_read={})

Read observations from a CSV file.

The input paths are defined in Settings.input_file. The columns of the input file must be described by a template stored in Settings.template_list.

If the metadata is stored in a separate file and Settings.input_metadata_file is set correctly, then the metadata is also imported (provided a suitable template is present in Settings.template_list).

By default, the dataset is assumed to be in long format (each column represents an observation type, and one column holds the station name). Wide format can be used if either:

  • the ‘wide’ option is present in the template (this is done automatically if the template was made using metobs_toolkit.build_template_prompt()), or

  • ‘long_format’ is set to False and the observation type is specified (obstype, obstype_unit and obstype_description).
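To make the two layouts concrete, here is a small pandas sketch. The column names (name, datetime, temp, humidity) and values are illustrative assumptions, not the toolkit's required template, and the reshaping with pandas.melt is not a claim about how the toolkit converts data internally. It does show why wide-format input needs obstype: the observation type is not stored anywhere in a wide table.

```python
import pandas as pd

# Long format: one row per (station, timestamp); each observation type has
# its own column and one column holds the station name.
long_df = pd.DataFrame(
    {
        "name": ["st_A", "st_A", "st_B", "st_B"],
        "datetime": pd.to_datetime(["2022-09-01 00:00", "2022-09-01 00:05"] * 2),
        "temp": [18.2, 18.0, 17.5, 17.4],
        "humidity": [65, 66, 70, 71],
    }
)

# Wide format: one row per timestamp; each station has its own column and
# all values belong to a single observation type (here: temperature).
wide_df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2022-09-01 00:00", "2022-09-01 00:05"]),
        "st_A": [18.2, 18.0],
        "st_B": [17.5, 17.4],
    }
)

# Reshaping wide -> long needs the obstype name supplied by the caller,
# because the wide table itself does not say what the values are.
obstype = "temp"
wide_as_long = wide_df.melt(id_vars="datetime", var_name="name", value_name=obstype)
print(wide_as_long.sort_values(["name", "datetime"]).reset_index(drop=True))
```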

The observation frequency is estimated per station. This estimate is used to find missing observations and gaps.
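A minimal pandas sketch of the idea, assuming one station's timestamps are held in a Series. The helpers estimate_freq and find_missing are hypothetical; they mirror the 'highest' and 'median' options described under freq_estimation_method but are not the toolkit's internal code.

```python
import pandas as pd


def estimate_freq(timestamps: pd.Series, method: str = "highest") -> pd.Timedelta:
    """Hypothetical helper: estimate one station's observation interval."""
    intervals = timestamps.sort_values().diff().dropna()
    if method == "highest":
        # Highest frequency = shortest interval between consecutive records.
        return intervals.min()
    if method == "median":
        return intervals.median()
    raise ValueError(f"unknown method: {method!r}")


def find_missing(timestamps: pd.Series, freq: pd.Timedelta) -> pd.DatetimeIndex:
    """Hypothetical helper: timestamps expected at `freq` but absent from the data."""
    expected = pd.date_range(timestamps.min(), timestamps.max(), freq=freq)
    return expected.difference(pd.DatetimeIndex(timestamps))


# Made-up record times for one station: mostly 10-minute steps, one 5-minute step.
obs_times = pd.Series(pd.to_datetime([
    "2022-09-01 00:00", "2022-09-01 00:10", "2022-09-01 00:20",
    "2022-09-01 00:25", "2022-09-01 00:40", "2022-09-01 00:50",
]))

for method in ("highest", "median"):
    freq = estimate_freq(obs_times, method)
    missing = find_missing(obs_times, freq)
    print(method, freq, len(missing), "missing")
```

The choice of method matters: with these times, 'highest' treats the station as a 5-minute station and reports five missing records, while 'median' estimates 10 minutes and reports only 00:30 as missing.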

The Dataset attributes are set and the following checks are executed:
  • Duplicate check

  • Invalid input check

  • Find missing observations

  • Find gaps
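The first two checks can be pictured with plain pandas. This sketch uses made-up column names and data; it flags repeated (station, timestamp) pairs as duplicates and non-numeric values as invalid input. The toolkit's own checks may apply different rules (for example, which duplicate is kept), so treat this as an illustration only.

```python
import pandas as pd

# Hypothetical raw input: one duplicated record and one non-numeric value.
raw = pd.DataFrame(
    {
        "name": ["st_A", "st_A", "st_A", "st_B"],
        "datetime": pd.to_datetime([
            "2022-09-01 00:00", "2022-09-01 00:00",
            "2022-09-01 00:05", "2022-09-01 00:00",
        ]),
        "temp": ["18.2", "18.2", "n/a", "17.5"],
    }
)

# Duplicate check: the same station reporting the same timestamp more than once.
dup_mask = raw.duplicated(subset=["name", "datetime"], keep="first")
deduplicated = raw[~dup_mask]

# Invalid input check: values present in the file that cannot be parsed as numbers.
temp = pd.to_numeric(deduplicated["temp"], errors="coerce")
invalid = deduplicated[temp.isna() & deduplicated["temp"].notna()]

print(f"{dup_mask.sum()} duplicate(s), {len(invalid)} invalid value(s)")
```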

Parameters:
  • long_format (bool, optional) – True if the input data is in long format, False if it is in wide format. The default is True.

  • obstype (str, optional) – If the data format is wide, specify which observation type the observations represent. The obstype should be an element of metobs_toolkit.observation_types. The default is None.

  • obstype_unit (str, optional) – If the data format is wide, specify the unit of the obstype. The default is None.

  • obstype_description (str, optional) – If the data format is wide, specify the description of the obstype. The default is None.

  • freq_estimation_method ('highest' or 'median', optional) – Select which method to use for the frequency estimation. If ‘highest’, the highest frequency found in the data is used. If ‘median’, the median of the frequencies found in the data is used. If None, the method stored in Dataset.settings.time_settings[‘freq_estimation_method’] is used. The default is None.

  • freq_estimation_simplify (bool, optional) – If True, the likely frequency is converted to round hours or round minutes, with ‘freq_estimation_simplify_error’ used as a constraint. If the constraint is not met, the simplification is not performed. If None, the value stored in Dataset.settings.time_settings[‘freq_estimation_simplify’] is used. The default is None.

  • freq_estimation_simplify_error (Timedelta or str, optional) – The tolerance, as a string or Timedelta, giving the maximum time shift allowed when simplifying the frequency estimate. For example, ‘5T’ is 5 minutes and ‘1H’ is one hour. If None, the value stored in Dataset.settings.time_settings[‘freq_estimation_simplify_error’] is used. The default is None.

  • kwargs_data_read (dict, optional) – Keyword arguments collected in a dictionary to pass to the pandas.read_csv() function on the data file. The default is {}.

  • kwargs_metadata_read (dict, optional) – Keyword arguments collected in a dictionary to pass to the pandas.read_csv() function on the metadata file. The default is {}.
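One plausible reading of freq_estimation_simplify and freq_estimation_simplify_error, sketched with pandas: snap the raw estimate to the nearest whole hour, otherwise to the nearest whole minute, and keep the snapped value only if the shift stays within the tolerance. simplify_freq is a hypothetical helper, and trying hours before minutes is an assumption rather than documented toolkit behavior.

```python
import pandas as pd


def simplify_freq(freq: pd.Timedelta, max_error) -> pd.Timedelta:
    """Hypothetical helper: snap `freq` to whole hours, else whole minutes,
    but only if the shift stays within `max_error` (Timedelta or str)."""
    max_error = pd.Timedelta(max_error)
    for step in (pd.Timedelta(hours=1), pd.Timedelta(minutes=1)):
        # Nearest whole multiple of `step` (Python's round(): ties go to even).
        rounded = step * round(freq / step)
        if rounded > pd.Timedelta(0) and abs(rounded - freq) <= max_error:
            return rounded
    # Constraint not met for any step: keep the raw estimate.
    return freq


print(simplify_freq(pd.Timedelta(minutes=58), "5min"))
print(simplify_freq(pd.Timedelta(minutes=4, seconds=57), "5min"))
print(simplify_freq(pd.Timedelta(minutes=7, seconds=25), "10s"))
```

With a 5-minute tolerance, 58 minutes becomes one hour and 4 min 57 s becomes 5 minutes; with a 10-second tolerance, 7 min 25 s is left unchanged because the nearest whole minute is 25 seconds away.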

Note

In practice, the default arguments will be sufficient for most applications.

Note

If options are present in the template, these will have priority over the arguments of this function.
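The priority rule in this note behaves like a dictionary merge in which template options are applied last. The sketch below is only a mental model: resolve_options is a hypothetical helper, and the option names are taken from this function's parameters for illustration.

```python
def resolve_options(call_args: dict, template_options: dict) -> dict:
    """Hypothetical helper: options from the template override function arguments."""
    # In a dict merge, later keys win, so template_options takes priority.
    return {**call_args, **template_options}


resolved = resolve_options(
    {"long_format": True, "obstype": "temp"},  # arguments passed to the function
    {"long_format": False},                    # option found in the template
)
print(resolved)
```

Arguments that the template does not set (here obstype) pass through unchanged.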

Return type:

None.

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()
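kwargs_data_read and kwargs_metadata_read are forwarded to pandas.read_csv(). As a pandas-only illustration (no toolkit call), a semicolon-separated file that uses decimal commas could be read with the dictionary below; the file content and the dictionary values are made-up assumptions.

```python
import io

import pandas as pd

# Stand-in for a semicolon-separated data file that uses decimal commas.
csv_text = (
    "name;datetime;temp\n"
    "st_A;2022-09-01 00:00;18,2\n"
    "st_A;2022-09-01 00:05;18,0\n"
)

# Hypothetical options a user might pass as kwargs_data_read.
kwargs_data_read = {"sep": ";", "decimal": ","}

# The dictionary is unpacked into pandas.read_csv() as keyword arguments.
df = pd.read_csv(io.StringIO(csv_text), **kwargs_data_read)
print(df.dtypes)
```

With the toolkit, the same dictionary would be supplied as dataset.import_data_from_file(kwargs_data_read=kwargs_data_read).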