Adding Transformations

Here, you can learn how to add your own custom transformations to the Kuwala transformation catalog.

Each transformation block you can pick from the Kuwala transformation catalog is a dbt macro under the hood. Based on the parameters that are passed to the macro it creates a dbt model representing the transformation and its corresponding YAML file.

We recommend you familiarize yourself with Jinja first so you can take advantage of all its possibilities. Dbt has written a neat introduction.

In order to develop a new transformation, you should run Kuwala in development mode. Follow these instructions for more detail:

1. Chose a transformation category

Before creating a new transformation, make sure you check out the existing ones in the transformation catalog.

Each transformation has to belong to one of the following categories: time, text, numeric, geo, merging, or general.

2. Transformation boilerplate

All macros for the transformation blocks are stored under kuwala/core/backend/app/dbt/kuwala_blocks/macros.

In the following, we explain each part of a macro by following the example of the apply_operation transformation.

2.1 Macro parameters

The declaration of a macro works similarly to the declaration of a function in Python. The macro name, in our example "apply_operation", is followed by parentheses which contain the parameters.

{% macro apply_operation(dbt_model, block_columns, column, operator, value, result_name) %}

{% endmacro %}

The first two parameters, dbt_model and block_columns , are part of most macros. They are passed down from the canvas automatically so you just simply need to declare them here. The parameter dbt_model specifies on top of which data or transformation block the transformation will be applied. The block_columns specify which columns should be selected after the transformation has been applied.

For transformations that are based on multiple blocks, e.g., join_by_id,instead of having only one dbt_model parameter you might need to have dbt_model_left and dbt_model_right.

2.2 Setting dbt relations

In order to create a lineage graph of your models, dbt uses the ref() syntax which resolves to the correct view or table in your data warehouse.

{% macro apply_operation(dbt_model, block_columns, column, operator, value, result_name) %}
    {% set rel = '{{ ref("' + dbt_model + '") }}' %}

{% endmacro %}

2.3 Custom functions

For some transformations, you might require custom functions such as mapping a parameter like the operator. You can create helper macros and store them under kuwala/core/backend/app/dbt/kuwala_blocks/macros/utils.

{% macro apply_operation(dbt_model, block_columns, column, operator, value, result_name) %}
    {% set rel = '{{ ref("' + dbt_model + '") }}' %}
    {% set operator_value = get_operator_value(operator) %}

{% endmacro %}

2.4 The actual transformation

The actual transformation that requires you to write custom SQL code is saved in a query variable which we'll use later on as a subquery and wrap it.

{% macro apply_operation(dbt_model, block_columns, column, operator, value, result_name) %}
    {% set rel = '{{ ref("' + dbt_model + '") }}' %}
    {% set operator_value = get_operator_value(operator) %}

    {% set query %}
        SELECT *, {{ column }} {{ operator_value }} {{ value }} AS {{ result_name }}
        FROM {{ rel }}
    {% endset %}

{% endmacro %}

If you require a different syntax for different data warehouses, you can use target.type in Jinja expressions.

For example:

{% if target.type == 'bigquery' %}

{% else %}

{% endif %}

2.5 Getting the final query

We pass the transformation query along with the block_columns to the helper macro to get the final result we will save as our dbt model.

{% macro apply_operation(dbt_model, block_columns, column, operator, value, result_name) %}
    {% set rel = '{{ ref("' + dbt_model + '") }}' %}
    {% set operator_value = get_operator_value(operator) %}

    {% set query %}
        SELECT *, {{ column }} {{ operator_value }} {{ value }} AS {{ result_name }}
        FROM {{ rel }}
    {% endset %}

    {% set result = get_result_query(block_columns, query) %}

{% endmacro %}

2.6 Returning the result

As the final step, we are logging and returning the generated query when the macro is executed. We are logging the result so we can pick it up from the subprocess call and save it as an SQL file. The result is also returned so it can be used in other macros as well.

{% macro apply_operation(dbt_model, block_columns, column, operator, value, result_name) %}
    {% set rel = '{{ ref("' + dbt_model + '") }}' %}
    {% set operator_value = get_operator_value(operator) %}

    {% set query %}
        SELECT *, {{ column }} {{ operator_value }} {{ value }} AS {{ result_name }}
        FROM {{ rel }}
    {% endset %}

    {% set result = get_result_query(block_columns, query) %}

    {% if execute %}
        {{ log(result, info=True) }}
        {% do return(result) %}
    {% endif %}
{% endmacro %}

3. Transformation specifications

Now, the only thing that's left to do before your transformation shows up in the transformation catalog is to write the transformation specification. The transformation specifications are stored under kuwala/core/backend/app/resources/transformation_catalog. Put your file under the fitting transformation category and when you start the backend it will read those files and store them in the backend database.

3.1 JSON file

The specification of a transformation is written in a JSON file and looks like this (example for our transformation apply_operation):

kuwala/core/backend/app/dbt/kuwala_blocks/macros/numeric/apply_operation.sql
{
  "id": "apply_operation",
  "category": "numeric",
  "name": "Apply operation",
  "icon": "calculator",
  "description": "With this transformation, you can manipulate a numeric column by applying basic arithmetic operations.",
  "required_column_types": ["numeric"],
  "optional_column_types": [],
  "min_number_of_input_blocks": 1,
  "max_number_of_input_blocks": 1,
  "macro_parameters": [
    {
      "id": "column",
      "name": "Column",
      "type": "text"
    }, {
      "id": "operator",
      "name": "Operator",
      "type": "text",
      "options": [
        {
          "id": "add",
          "name": "Add"
        }, {
          "id": "subtract",
          "name": "Subtract"
        }, {
          "id": "multiply",
          "name": "Multiply"
        }, {
          "id": "divide",
          "name": "Divide"
        }
      ]
    }, {
      "id": "value",
      "name": "Value",
      "type": "numeric"
    }, {
      "id": "result_name",
      "name": "Result name",
      "type": "text"
    }
  ],
  "examples_before": [
    {
      "columns": [
        "order_number",
        "weight_in_kg"
      ],
      "rows": [
        [
          "63472",
          "1.2"
        ],
        [
          "63473",
          "0.6"
        ],
        [
          "63474",
          "0.75"
        ],
        [
          "63475",
          "2.0"
        ]
      ]
    }
  ],
  "examples_after": [
    {
      "columns": [
        "order_number",
        "weight_in_kg",
        "weight_in_g"
      ],
      "rows": [
        [
          "63472",
          "1.2",
          "1200"
        ],
        [
          "63473",
          "0.6",
          "600"
        ],
        [
          "63474",
          "0.75",
          "750"
        ],
        [
          "63475",
          "2.0",
          "2000"
        ]
      ]
    }
  ]
}

3.1.1 ID

The ID has to be unique, all lower and snake case.

3.1.2 Category

Each transformation has to belong to one of the following categories: time, text, numeric, geo, merging, or general.

3.1.3 Icon

The icons come from FontAwesome. You can pick a fitting one and use the id as the value for the icon property.

If you are using a new icon that has not been used in the frontend yet, you also need to add it to the IconsLoader under kuwala/core/canvas/src/utils/IconsLoader.js.

3.1.4 Column and parameter types

The column and parameter types have to be either "text", "numeric", "date", "timestamp", or "boolean".

4. Publish your transformation

All you have to do now is to restart the backend and reload the frontend. Your transformation is now displayed in the transformation catalog and can be used on the canvas.

4.1 Production use

To use your transformation in production via our Docker image, create a fork and submit a PR to the base repository.

5. Locating the dbt project

All the models that are generated by the transformation blocks are saved in a dbt project. You can find them under kuwala/tmp/kuwala/backend/dbt. The folder names are the IDs of the corresponding data sources.

Last updated