metobs_toolkit.dataset.Dataset.import_data_from_file

Dataset.import_data_from_file(template_file: str | Path, input_data_file: str | Path | None = None, input_metadata_file: str | Path | None = None, freq_estimation_method: Literal['highest', 'median'] = 'median', freq_estimation_simplify_tolerance: str | Timedelta = '2min', origin_simplify_tolerance: str | Timedelta = '5min', timestamp_tolerance: str | Timedelta = '4min', kwargs_data_read: dict = {}, kwargs_metadata_read: dict = {}, templatefile_is_url: bool = False) -> None

Import observational data and metadata from files.

Importing data requires a `Template`, which is constructed from a template file (JSON). Use `metobs_toolkit.build_template_prompt()` to create a template file.

If input_data_file is provided, the method reads the raw observational data (CSV). Basic quality control (checks for duplicate timestamps and invalid input) is performed, and the observation frequency is estimated. Based on the estimated frequency, any gaps in the data are identified.
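The frequency estimate for a single time series can be sketched with the standard library as below. This is a hypothetical helper (`estimate_frequency` is not part of metobs_toolkit), and it assumes that 'median' takes the median spacing between consecutive timestamps while 'highest' takes the smallest spacing, i.e. the highest observation frequency.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_frequency(timestamps, method="median"):
    """Estimate the observation frequency of one time series (hypothetical helper).

    'median'  -> median spacing between consecutive timestamps
    'highest' -> smallest spacing, i.e. the highest observation frequency
    """
    ordered = sorted(set(timestamps))  # drop exact duplicates, sort chronologically
    if len(ordered) < 2:
        raise ValueError("need at least two distinct timestamps")
    diffs = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if method == "median":
        return median(diffs)
    if method == "highest":
        return min(diffs)
    raise ValueError(f"unknown method: {method!r}")


# Mostly 10-minute data with one irregular 5-minute step.
t0 = datetime(2024, 1, 1)
stamps = [t0 + timedelta(minutes=m) for m in (0, 10, 20, 30, 35, 50)]
print(estimate_frequency(stamps, "median"))   # 0:10:00
print(estimate_frequency(stamps, "highest"))  # 0:05:00
```

On data like this, 'median' ignores the occasional irregular step, whereas 'highest' follows it.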

The method performs the following steps:

  • Estimates the frequency of observations using the `freq_estimation_method`.

  • Simplifies the estimated frequency and origin timestamps based on tolerances.

  • Aligns the raw timestamps with target timestamps (defined by the origin and frequency) using a nearest merge, within the specified timestamp tolerance.

  • Checks for duplicate timestamps and invalid input.

  • Identifies gaps in the data.
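The alignment and gap-identification steps can be approximated by snapping each raw timestamp to the nearest point on a perfect-frequency grid. The sketch below is a stdlib-only illustration, not the toolkit's implementation: `align_to_targets` is hypothetical, and rejecting a second record that lands on an already-filled target is an assumption. In pandas, a comparable nearest merge is available through `pandas.merge_asof(..., direction='nearest', tolerance=...)`.

```python
from datetime import datetime, timedelta


def align_to_targets(raw, origin, freq, tolerance):
    """Snap raw timestamps onto a perfect-frequency grid (hypothetical helper).

    Returns (aligned, rejected, gaps):
      aligned  -- dict: target timestamp -> raw timestamp matched to it
      rejected -- raw timestamps that could not be matched to a free target
      gaps     -- grid timestamps (origin .. last matched target) without a match
    """
    aligned, rejected = {}, []
    for ts in sorted(raw):
        # Nearest grid point: round the offset from the origin to whole steps.
        steps = round((ts - origin) / freq)
        target = origin + steps * freq
        if steps < 0 or abs(ts - target) > tolerance:
            rejected.append(ts)  # nearest target precedes the origin, or shift too large
        elif target in aligned:
            rejected.append(ts)  # assumption: the first record per target wins
        else:
            aligned[target] = ts

    gaps = []
    if aligned:
        n_steps = round((max(aligned) - origin) / freq)
        grid = [origin + i * freq for i in range(n_steps + 1)]
        gaps = [t for t in grid if t not in aligned]
    return aligned, rejected, gaps


t0 = datetime(2024, 1, 1)
raw = [t0 + timedelta(minutes=m) for m in (1, 9, 12, 24.5, 41)]
aligned, rejected, gaps = align_to_targets(
    raw, origin=t0, freq=timedelta(minutes=10), tolerance=timedelta(minutes=4)
)
print([t.strftime("%H:%M") for t in sorted(aligned)])  # ['00:00', '00:10', '00:40']
print([t.strftime("%H:%M:%S") for t in rejected])       # ['00:12:00', '00:24:30']
print([t.strftime("%H:%M") for t in gaps])              # ['00:20', '00:30']
```

Here 00:24:30 is 4.5 minutes from its nearest target, which exceeds the 4-minute tolerance (the method's default `timestamp_tolerance`), so it is not aligned; the targets left empty between the origin and the last match are reported as gaps.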

If input_metadata_file is provided, the method reads the metadata (CSV).

Parameters:
  • template_file (str or Path) – Path to the template (JSON) file used to interpret the raw data/metadata files.

  • input_data_file (str or Path, optional) – Path to the input data file containing observations. If None, no data is read.

  • input_metadata_file (str or Path, optional) – Path to the input metadata file. If None, no metadata is read.

  • freq_estimation_method ({'highest', 'median'}, optional) – Method to estimate the frequency of observations (per station per observation type).

  • freq_estimation_simplify_tolerance (str or pd.Timedelta, optional) – The maximum allowed error in simplifying the target frequency.

  • origin_simplify_tolerance (str or pd.Timedelta, optional) – The maximum allowed shift when simplifying the origin. For each time series, the origin (first occurring timestamp) is determined and then simplified within this tolerance.

  • timestamp_tolerance (str or pd.Timedelta, optional) – The maximum allowed time shift when aligning raw timestamps to the target (perfect-frequency) timestamps.

  • kwargs_data_read (dict, optional) – Additional keyword arguments to pass to pandas.read_csv() when reading the data file.

  • kwargs_metadata_read (dict, optional) – Additional keyword arguments to pass to pandas.read_csv() when reading the metadata file.

  • templatefile_is_url (bool, optional) – If True, the template_file is interpreted as a URL to an online template file. If False, it is interpreted as a local file path.
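The two simplification tolerances can be illustrated with a small stdlib sketch. It assumes that "simplifying" means snapping to the coarsest "round" value from a preference list that lies within the tolerance; the list `ROUND_STEPS` and the helpers `simplify_frequency` and `simplify_origin` are hypothetical and not part of metobs_toolkit. The sketch uses `datetime.timedelta` in place of the str or pd.Timedelta values the method accepts.

```python
from datetime import datetime, timedelta

# Hypothetical preference list, coarsest ("roundest") step first.
# These values are illustrative and not taken from metobs_toolkit.
ROUND_STEPS = [
    timedelta(hours=1),
    timedelta(minutes=30),
    timedelta(minutes=15),
    timedelta(minutes=10),
    timedelta(minutes=5),
    timedelta(minutes=1),
]


def simplify_frequency(estimate, tolerance):
    """Return the roundest step within `tolerance` of the estimate, else the estimate."""
    for candidate in ROUND_STEPS:
        if abs(candidate - estimate) <= tolerance:
            return candidate
    return estimate


def simplify_origin(origin, tolerance):
    """Round the origin to the coarsest step whose rounding shifts it by <= `tolerance`."""
    midnight = origin.replace(hour=0, minute=0, second=0, microsecond=0)
    offset = origin - midnight
    for step in ROUND_STEPS:
        rounded = midnight + round(offset / step) * step
        if abs(rounded - origin) <= tolerance:
            return rounded
    return origin


# Defaults from the signature: '2min' for the frequency, '5min' for the origin.
print(simplify_frequency(timedelta(minutes=9, seconds=50), timedelta(minutes=2)))  # 0:10:00
print(simplify_origin(datetime(2024, 1, 1, 6, 17), timedelta(minutes=5)))       # 2024-01-01 06:15:00
```

Under these assumptions, a series starting at 06:17 gets origin 06:15, because 06:00 and 06:30 both lie more than 5 minutes away.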

Return type:

None