metobs_toolkit.dataset.Dataset.import_data_from_file
- Dataset.import_data_from_file(template_file: str | Path, input_data_file: str | Path = None, input_metadata_file: str | Path = None, freq_estimation_method: Literal['highest', 'median'] = 'median', freq_estimation_simplify_tolerance: str | Timedelta = '2min', origin_simplify_tolerance: str | Timedelta = '5min', timestamp_tolerance: str | Timedelta = '4min', kwargs_data_read: dict = {}, kwargs_metadata_read: dict = {}, templatefile_is_url: bool = False) -> None
Import observational data and metadata from files.
Importing data requires a Template, which is constructed from a template file (JSON). Use metobs_toolkit.build_template_prompt() to create a template file.
If input_data_file is provided, the method reads the raw observational data (CSV). A basic quality control (duplicate timestamps and invalid input) is performed, and a frequency estimation is made. Based on the estimated frequency, gaps are identified if present.
The method performs the following steps:
Estimates the frequency of observations using the method given by freq_estimation_method.
Simplifies the estimated frequency and origin timestamps based on tolerances.
Aligns the raw timestamps with target timestamps (defined by the origin and frequency) using a nearest merge, within the specified timestamp tolerance.
Executes checks for duplicates and invalid input.
Identifies gaps in the data.
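The steps above can be sketched in plain Python to show the idea behind them. This is a minimal illustration under stated assumptions, not the toolkit's implementation: the helper names (estimate_frequency, simplify_frequency, simplify_origin, align), the candidate resolutions, and the sample timestamps are all hypothetical, and 'highest' is read here as "the smallest observed time step". The duplicate and invalid-input checks are left out.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical raw timestamps of one station; the 00:30 record is missing.
raw = [
    datetime(2024, 1, 1, 0, 0, 30),
    datetime(2024, 1, 1, 0, 10, 10),
    datetime(2024, 1, 1, 0, 19, 50),
    datetime(2024, 1, 1, 0, 40, 5),
]


def estimate_frequency(timestamps, method="median"):
    """Estimate the time step from differences between consecutive timestamps."""
    ordered = sorted(timestamps)
    steps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if method == "median":
        return timedelta(seconds=median(steps))
    if method == "highest":
        # Assumption: the highest frequency is the smallest observed time step.
        return timedelta(seconds=min(steps))
    raise ValueError(f"unknown method: {method!r}")


def simplify_frequency(freq, tolerance):
    """Round freq to the coarsest candidate resolution that stays within tolerance."""
    for minutes in (60, 30, 20, 15, 10, 5, 1):  # illustrative candidates only
        resolution = timedelta(minutes=minutes)
        rounded = round(freq / resolution) * resolution
        if rounded > timedelta(0) and abs(rounded - freq) <= tolerance:
            return rounded
    return freq  # nothing simpler fits, keep the estimate


def simplify_origin(origin, freq, tolerance):
    """Snap the origin to a multiple of freq since midnight, if within tolerance."""
    midnight = origin.replace(hour=0, minute=0, second=0, microsecond=0)
    rounded = midnight + round((origin - midnight) / freq) * freq
    return rounded if abs(rounded - origin) <= tolerance else origin


def align(timestamps, origin, freq, tolerance):
    """Map each target timestamp to its nearest raw timestamp, or None if too far."""
    aligned = {}
    target = origin
    while target <= max(timestamps) + tolerance:
        nearest = min(timestamps, key=lambda ts: abs(ts - target))
        aligned[target] = nearest if abs(nearest - target) <= tolerance else None
        target += freq
    return aligned


freq = simplify_frequency(estimate_frequency(raw, "median"), timedelta(minutes=2))
origin = simplify_origin(min(raw), freq, timedelta(minutes=5))
aligned = align(raw, origin, freq, timedelta(minutes=4))

# Targets without a raw record within the tolerance are reported as gaps.
gaps = [target for target, match in aligned.items() if match is None]

print(freq)  # 0:10:00
print(gaps)  # [datetime.datetime(2024, 1, 1, 0, 30)]
```

With the default tolerances used here ('2min', '5min', '4min'), the irregular raw records map onto a clean 10-minute grid starting at midnight, and the target at 00:30 is left unmatched.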
If input_metadata_file is provided, the method reads the metadata (CSV).
- Parameters:
template_file (str or Path) – Path to the template (JSON) file used to interpret the raw data/metadata files.
input_data_file (str or Path, optional) – Path to the input data file containing observations. If None, no data is read.
input_metadata_file (str or Path, optional) – Path to the input metadata file. If None, no metadata is read.
freq_estimation_method ({'highest', 'median'}, optional) – Method to estimate the frequency of observations (per station per observation type). Default is 'median'.
freq_estimation_simplify_tolerance (str or pd.Timedelta, optional) – The maximum allowed error in simplifying the target frequency. Default is '2min'.
origin_simplify_tolerance (str or pd.Timedelta, optional) – For each time series, the origin (first occurring timestamp) is set and then simplified. This tolerance bounds how far the simplified origin may deviate from the original one. Default is '5min'.
timestamp_tolerance (str or pd.Timedelta, optional) – The maximum allowed time shift when aligning raw timestamps to target (perfect-frequency) timestamps. Default is '4min'.
kwargs_data_read (dict, optional) – Additional keyword arguments to pass to pandas.read_csv() when reading the data file.
kwargs_metadata_read (dict, optional) – Additional keyword arguments to pass to pandas.read_csv() when reading the metadata file.
templatefile_is_url (bool, optional) – If True, the template_file is interpreted as a URL to an online template file. If False, it is interpreted as a local file path.
- Return type:
None
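A call might look like the following sketch. The wrapper function load_observations and its arguments are hypothetical, the no-argument metobs_toolkit.Dataset() constructor is assumed, and the semicolon separator is only an example of a keyword forwarded to pandas.read_csv(). The tolerances shown are the defaults from the signature, written out for clarity.

```python
from pathlib import Path


def load_observations(template_file, data_file, metadata_file=None):
    """Hypothetical helper: build a Dataset and import data plus optional metadata."""
    # Imported here so the sketch can be defined without the toolkit installed.
    import metobs_toolkit

    dataset = metobs_toolkit.Dataset()  # assumed: constructor takes no arguments
    dataset.import_data_from_file(
        template_file=Path(template_file),
        input_data_file=Path(data_file),
        input_metadata_file=Path(metadata_file) if metadata_file else None,
        freq_estimation_method="median",
        freq_estimation_simplify_tolerance="2min",
        origin_simplify_tolerance="5min",
        timestamp_tolerance="4min",
        kwargs_data_read={"sep": ";"},  # forwarded to pandas.read_csv()
    )
    # import_data_from_file returns None, so return the Dataset object itself.
    return dataset
```

Because the method returns None, keep a reference to the Dataset object rather than to the result of the call.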