{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Mapping to the toolkit\n", "\n", "The MetObs-toolkit uses standard names and formats for your data. To use the toolkit,\n", "your observational data must be converted to the toolkit standards this is referred to as **mapping**.\n", "\n", "To specify how the mapping must be done a **template** is used. This template contains\n", "all the information on how to convert your tabular data to the toolkit standards.\n", "A template is saved as a file (JSON file) and can be reused or shared. In practice you only need to \n", "make one template file, for your network.\n", "\n", "On this page, you can find information on how to construct a template." ] }, { "cell_type": "markdown", "id": "3", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-input" ] }, "source": [ "## Raw data Structures\n", "\n", "To make a template you must be aware of which format your data is in. The toolkit can handle the following data structures:" ] }, { "cell_type": "markdown", "id": "4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Long-format\n", "Observations are stacked in rows per station. One column represents the station names." ] }, { "cell_type": "markdown", "id": "5", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "| Timestamp | 2m Temperature | 2m Humidity | ID |\n", "| -------- | ------- | ------- | ------- |\n", "| 2022-06-07 13:20:00 | 16.4 | 77.3 | Station_A |\n", "| 2022-06-07 13:30:00 | 16.7 | 75.6 | Station_A |\n", "| 2022-06-07 13:20:00 | 18.3 | 68.9 | Station_B |\n", "| 2022-06-07 13:30:00 | 18.6 | 71.9 | Station_B |\n" ] }, { "cell_type": "markdown", "id": "6", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Single-station-format\n", "The same as a long format but without a column indicating the station names. Be aware that the toolkit interprets it as observations coming from one station." ] }, { "cell_type": "markdown", "id": "7", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "| Timestamp | 2m Temperature | 2m Humidity |\n", "| -------- | ------- | ------- |\n", "| 2022-06-07 13:20:00 | 16.4 | 77.3 |\n", "| 2022-06-07 13:30:00 | 16.7 | 75.6 |" ] }, { "cell_type": "markdown", "id": "8", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Wide-format\n", "Columns represent different stations. The data represents one observation type." ] }, { "cell_type": "markdown", "id": "9", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "| Timestamp | Station_A | Station_B |\n", "| -------- | ------- | ------- |\n", "| 2022-06-07 13:20:00 | 16.4 | 18.3 |\n", "| 2022-06-07 13:30:00 | 16.7 | 18.6 |" ] }, { "cell_type": "markdown", "id": "10", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Template creation\n", "\n", "Once you have converted your tabular data files to either long-, wide-, or single-station-format, and saved them as a .csv file, a template can be made." ] }, { "cell_type": "markdown", "id": "11", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "\n", "
\n", "Note: If you want to use a metadata file, make sure it is converted to a wide-format and saved as a .csv file.\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "12", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The fastest and simplest way to make a template is by using the ``metobs_toolkit.build_template_prompt()`` function." ] }, { "cell_type": "markdown", "id": "13", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "```python \n", "import metobs_toolkit\n", "\n", "#create a template\n", "metobs_toolkit.build_template_prompt()\n", "```" ] }, { "cell_type": "markdown", "id": "15", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "This function will prompt questions and build a template that matches your data file (and metadata) file. The *template.json* file will be stored at a location of your choice." ] }, { "cell_type": "markdown", "id": "14", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "
\n", "Note: When the prompt asks if you need further help, and you type yes, some more questions are prompted. Once all information is given to the prompt, it will print out a piece of code that you have to run to load your data into the toolkit.\n", "
" ] }, { "cell_type": "markdown", "id": "a565e1df", "metadata": {}, "source": [ "Use the template file when importing the raw data. " ] }, { "cell_type": "code", "execution_count": 1, "id": "16", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.\n", "Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.\n", "Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.\n", "Rukwind is present in the datafile, but not found in the template! This column will be ignored.\n", "Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.\n", "Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.\n", "The following columns are present in the data file, but not in the template! They are skipped!\n", " ['Neerslagsom', 'Neerslagintensiteit', 'Rukwind', 'Globe Temperatuur', 'Luchtdruk', 'Luchtdruk_Zeeniveau']\n", "The following columns are found in the metadata, but not in the template and are therefore ignored: \n", "['benaming', 'sponsor', 'Network', 'stad']\n" ] } ], "source": [ "import metobs_toolkit\n", "\n", "dataset = metobs_toolkit.Dataset() #initiate an empty dataset\n", "dataset.import_data_from_file(\n", " input_data_file= metobs_toolkit.demo_datafile, #Path to your data (csv) file\n", " input_metadata_file=metobs_toolkit.demo_metadatafile, #Path to your metadata (csv) file\n", " template_file=metobs_toolkit.demo_template) #Path to your template (json) file.\n" ] }, { "cell_type": "markdown", "id": "17", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The template (file) is read when calling the ``Dataset.import_data_from_file()`` method, and converted to a ``metobs_toolkit.Template`` which is accessible for each dataset." ] }, { "cell_type": "markdown", "id": "17a1cfa0", "metadata": {}, "source": [ "The template file is used to create a ``Template`` object, that will convert raw data to a standard format interpretable by the toolkit. This object is stored as a ``Dataset.template`` attribute. It is only used when importing raw data, and has no further use." ] }, { "cell_type": "code", "execution_count": 2, "id": "18", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.template" ] }, { "cell_type": "markdown", "id": "19", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "An overview of the template can be printed using the `show()` on the `Template` instance:" ] }, { "cell_type": "code", "execution_count": 3, "id": "20", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "================================================================================\n", " General info of Template \n", "================================================================================\n", "\n", "\n", "--- Data obstypes map ---\n", "\n", " -temp: Temperatuur\n", " -raw data in degC\n", " -description: 2mT passive\n", " -humidity: Vochtigheid\n", " -raw data in percent\n", " -description: 2m relative humidity passive\n", " -wind_speed: Windsnelheid\n", " -raw data in km/h\n", " -description: Average 2m 10-min windspeed\n", " -wind_direction: Windrichting\n", " -raw data in degrees\n", " -description: Average 2m 10-min windspeed, north is zero in CW direction...\n", "\n", "--- Data extra mapping info ---\n", "\n", " -name column (data) <---> Vlinder\n", "\n", "--- Data timestamp map ---\n", "\n", " -datetimecolumn <---> None\n", " -time_column <---> Tijd (UTC)\n", " -date_column <---> Datum\n", " -fmt <---> %Y-%m-%d %H:%M:%S\n", " -Timezone <---> UTC\n", "\n", "--- Metadata map ---\n", "\n", " -name <---> Vlinder\n", " -lat <---> lat\n", " -lon <---> lon\n", " -school <---> school\n", "\n" ] } ], "source": [ "dataset.template.get_info() #Get information about the template" ] } ], "metadata": { "kernelspec": { "display_name": "metobs_dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.10" } }, "nbformat": 4, "nbformat_minor": 5 }