Adding Transformations
Here, you can learn how to add your own custom transformations to the Kuwala transformation catalog.
Each transformation block you can pick from the Kuwala transformation catalog is a dbt macro under the hood. Based on the parameters passed to the macro, it creates a dbt model that represents the transformation, along with the model's corresponding YAML file.
We recommend familiarizing yourself with Jinja first so you can take advantage of everything it offers. dbt has written a helpful introduction.
To develop a new transformation, you should run Kuwala in development mode. The development setup instructions explain how to do this in more detail.
Before creating a new transformation, make sure you check out the existing ones in the transformation catalog.
Each transformation has to belong to one of the following categories: time, text, numeric, geo, merging, or general.
All macros for the transformation blocks are stored under kuwala/core/backend/app/dbt/kuwala_blocks/macros.
Below, we explain each part of a macro using the apply_operation transformation as an example.
The declaration of a macro works similarly to the declaration of a function in Python. The macro name, in our example "apply_operation", is followed by parentheses which contain the parameters.
The first two parameters, dbt_model and block_columns, are part of most macros. They are passed down from the canvas automatically, so you only need to declare them here. The dbt_model parameter specifies the data or transformation block on top of which the transformation is applied. The block_columns parameter specifies which columns are selected after the transformation has been applied.
For transformations based on multiple blocks, such as join_by_id, you may need two parameters, dbt_model_left and dbt_model_right, instead of a single dbt_model parameter.
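To illustrate how a two-input declaration differs from a single-input one, here is a rough Python analogue of a join macro. Only dbt_model_left and dbt_model_right come from this page; block_columns is included on the assumption that join macros also receive it, and column_left, column_right, the INNER JOIN, and the l/r aliases are hypothetical choices for the sketch. The ref() call is left as literal Jinja text, which is an assumption about the generated output.

```python
def join_by_id(dbt_model_left: str, dbt_model_right: str, block_columns: list[str],
               column_left: str, column_right: str) -> str:
    """Hypothetical Python analogue of a macro that takes two input models."""
    # Each input block gets its own ref(), so dbt sees both upstream models.
    left = f"{{{{ ref('{dbt_model_left}') }}}}"
    right = f"{{{{ ref('{dbt_model_right}') }}}}"
    selected = ", ".join(block_columns)
    # The join type and key parameters are illustrative, not Kuwala's actual ones.
    return (
        f"SELECT {selected} FROM {left} AS l "
        f"INNER JOIN {right} AS r ON l.{column_left} = r.{column_right}"
    )
```

For example, join_by_id("orders", "customers", ["l.id", "r.name"], "customer_id", "id") yields a query that references both upstream models.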
To build a lineage graph of your models, dbt uses the ref() syntax, which resolves to the correct view or table in your data warehouse.
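A toy model shows why ref() matters for lineage: each call both resolves a name to a relation and records a dependency edge. The LineageRecorder class and its "analytics" schema are hypothetical. dbt itself collects refs when it parses a project, so this is a simplified illustration, not dbt's implementation.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


class LineageRecorder:
    """Toy stand-in for dbt's ref(): resolves a model name to a relation
    name and records the dependency so a lineage graph can be built."""

    def __init__(self, schema: str = "analytics") -> None:
        # The schema name is an assumption for this sketch.
        self.schema = schema
        # Maps each model to the set of models it references.
        self.parents: dict[str, set[str]] = defaultdict(set)

    def ref(self, current_model: str, upstream_model: str) -> str:
        """Record that current_model depends on upstream_model and resolve it."""
        self.parents[current_model].add(upstream_model)
        return f"{self.schema}.{upstream_model}"

    def build_order(self) -> list[str]:
        """Return models ordered so that every parent precedes its children."""
        return list(TopologicalSorter(self.parents).static_order())
```

Because every reference goes through ref(), the recorder can always compute a valid build order, just as dbt can derive lineage from ref() calls but not from hard-coded table names.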
Some transformations require custom functions, such as one that maps a parameter like the operator to the SQL it represents. You can create helper macros for this and store them under kuwala/core/backend/app/dbt/kuwala_blocks/macros/utils.
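As a rough Python analogue of such a helper, the function below maps an operator name to a SQL operator and rejects anything it does not know. The operator names and the get_sql_operator function are hypothetical and do not reflect the identifiers Kuwala actually uses.

```python
# Hypothetical mapping from operator names chosen on the canvas to SQL operators.
SQL_OPERATORS = {
    "add": "+",
    "subtract": "-",
    "multiply": "*",
    "divide": "/",
}


def get_sql_operator(operator: str) -> str:
    """Python analogue of a utils helper macro that maps an operator name."""
    try:
        return SQL_OPERATORS[operator]
    except KeyError:
        # Failing loudly keeps unsupported input out of the generated SQL.
        raise ValueError(f"Unsupported operator: {operator!r}") from None
```

Keeping the mapping in one helper means every macro that accepts an operator validates it the same way.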
The actual transformation, which is the part where you write custom SQL, is stored in a query variable. Later, this query is used as a subquery and wrapped to produce the final model.
If you need different syntax for different data warehouses, you can check target.type in Jinja expressions. For example, you can emit one SQL dialect when target.type is 'postgres' and another when it is 'bigquery'.
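In Jinja, this is typically an {% if target.type == 'bigquery' %} branch. The Python sketch below mirrors the same idea for truncating a date column to the day. The function name, the set of adapters handled, and DATE_TRUNC as the example are assumptions chosen because the syntax differs between these warehouses.

```python
def truncate_to_day(column: str, target_type: str) -> str:
    """Return a day-truncation expression in the dialect named by target_type.

    Mirrors branching on target.type in a Jinja expression (hypothetical helper).
    """
    if target_type == "bigquery":
        # BigQuery puts the date part second and does not quote it (DATE columns).
        return f"DATE_TRUNC({column}, DAY)"
    if target_type in ("postgres", "snowflake"):
        # Postgres and Snowflake take the date part as a quoted first argument.
        return f"DATE_TRUNC('day', {column})"
    raise ValueError(f"No syntax defined for target type {target_type!r}")
```

Raising for unknown targets makes a missing dialect show up immediately instead of producing SQL that fails in the warehouse.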
We pass the transformation query along with the block_columns to the helper macro, which returns the final result that we save as our dbt model.
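The flow from the macro declaration to the helper call can be mirrored in Python. In the sketch below, the query variable holds the custom SQL, and a helper wraps it as a subquery that keeps only block_columns. The helper name select_block_columns, the parameters column, operator, value, and result_name, and the "transformation" alias are hypothetical. The ref() call is left as literal Jinja text for dbt to resolve, which is an assumption about the output format.

```python
def build_operation_query(dbt_model: str, column: str, operator: str,
                          value: float, result_name: str) -> str:
    """The 'query' variable: custom SQL for the transformation itself."""
    return (
        f"SELECT *, {column} {operator} {value} AS {result_name} "
        f"FROM {{{{ ref('{dbt_model}') }}}}"
    )


def select_block_columns(query: str, block_columns: list[str]) -> str:
    """Hypothetical helper: wrap the query as a subquery, keep block_columns."""
    if not block_columns:
        raise ValueError("block_columns must contain at least one column")
    return f"SELECT {', '.join(block_columns)} FROM ({query}) AS transformation"


def apply_operation(dbt_model: str, block_columns: list[str], column: str,
                    operator: str, value: float, result_name: str) -> str:
    """Python analogue of the macro: build the query, then wrap it."""
    query = build_operation_query(dbt_model, column, operator, value, result_name)
    return select_block_columns(query, block_columns)
```

For example, apply_operation("orders", ["order_id", "price_with_tax"], "price", "*", 1.19, "price_with_tax") returns SELECT order_id, price_with_tax FROM (SELECT *, price * 1.19 AS price_with_tax FROM {{ ref('orders') }}) AS transformation. Separating the two steps lets every macro reuse the same column-selection helper.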
As the final step, the macro logs and returns the generated query when it is executed. Logging the result lets us pick it up from the subprocess call and save it as an SQL file. Returning it means other macros can use it as well.
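On the backend side, picking the query up from a subprocess and saving it could look roughly like the sketch below. To keep it runnable without dbt, stand_in_command is a hypothetical child process that just prints a query. Real dbt output contains additional log lines and prefixes that would have to be stripped, and this sketch does not do that. Both function names are illustrative.

```python
import subprocess
import sys
from pathlib import Path


def stand_in_command(query: str) -> list[str]:
    """Hypothetical stand-in for the dbt invocation: a child process that
    simply prints (logs) the query it generated."""
    return [sys.executable, "-c", f"print({query!r})"]


def run_and_save_query(command: list[str], target_file: Path) -> str:
    """Run a command that logs a generated query and save that query as SQL.

    Assumes stdout contains only the query (real dbt logs would need parsing).
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    query = result.stdout.strip()
    target_file.parent.mkdir(parents=True, exist_ok=True)
    target_file.write_text(query + "\n")
    return query
```

Using check=True surfaces a failing macro as an exception instead of silently writing an empty model file.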
Now, the only thing left to do before your transformation shows up in the transformation catalog is to write the transformation specification. The transformation specifications are stored under kuwala/core/backend/app/resources/transformation_catalog. Put your file in the folder of the matching transformation category. When you start the backend, it reads those files and stores them in the backend database.
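Reading the catalog can be sketched as a glob over one sub-folder per category. This assumes the specification files sit directly inside their category folder with a .json extension. The function name and the category_folder key are hypothetical, and the step that stores the specs in the backend database is left out.

```python
import json
from pathlib import Path


def load_transformation_specs(catalog_dir: Path) -> list[dict]:
    """Read every <category>/<name>.json file under the catalog directory."""
    specs = []
    # Sorting keeps the load order deterministic across file systems.
    for spec_file in sorted(catalog_dir.glob("*/*.json")):
        spec = json.loads(spec_file.read_text())
        # Remember which category folder the file came from (hypothetical key).
        spec["category_folder"] = spec_file.parent.name
        specs.append(spec)
    return specs
```

Pointed at kuwala/core/backend/app/resources/transformation_catalog, this would return one dictionary per specification file.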
The specification of a transformation, such as the one for our apply_operation transformation, is written as a JSON file.
The ID has to be unique and written in lowercase snake case.
The category has to be one of the categories listed earlier: time, text, numeric, geo, merging, or general.
The icons come from Font Awesome. Pick a fitting one and use its ID as the value of the icon property.
If you use an icon that is not used anywhere in the frontend yet, you also need to add it to the IconsLoader under kuwala/core/canvas/src/utils/IconsLoader.js.
The column and parameter types have to be one of "text", "numeric", "date", "timestamp", or "boolean".
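The rules above for IDs, categories, and types can be checked before restarting the backend. The validator below is a sketch. The keys "id", "category", "column_types", and "parameters" (each parameter with a "type") are assumptions about the specification layout and may not match the real JSON files.

```python
import re

CATEGORIES = {"time", "text", "numeric", "geo", "merging", "general"}
TYPES = {"text", "numeric", "date", "timestamp", "boolean"}
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def validate_specs(specs: list[dict]) -> list[str]:
    """Return a list of problems found in the given specifications.

    The spec keys used here are assumptions made for this sketch.
    """
    problems = []
    seen_ids = set()
    for spec in specs:
        spec_id = spec.get("id", "")
        # IDs must be lowercase snake case and unique across the catalog.
        if not SNAKE_CASE.match(spec_id):
            problems.append(f"{spec_id!r}: ID must be lowercase snake case")
        if spec_id in seen_ids:
            problems.append(f"{spec_id!r}: ID is not unique")
        seen_ids.add(spec_id)
        if spec.get("category") not in CATEGORIES:
            problems.append(f"{spec_id!r}: unknown category {spec.get('category')!r}")
        # Column and parameter types share the same allowed set.
        declared = list(spec.get("column_types", []))
        declared += [p.get("type") for p in spec.get("parameters", [])]
        for type_name in declared:
            if type_name not in TYPES:
                problems.append(f"{spec_id!r}: unsupported type {type_name!r}")
    return problems
```

Returning all problems at once, rather than stopping at the first, makes it easier to fix a new specification in one pass.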
All you have to do now is restart the backend and reload the frontend. Your transformation then appears in the transformation catalog and can be used on the canvas.
To use your transformation in production via our Docker image, fork the repository and submit a PR to the base repository.
All models generated by the transformation blocks are saved in a dbt project under kuwala/tmp/kuwala/backend/dbt. The folder names are the IDs of the corresponding data sources.
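To inspect what the blocks generated, you can group the SQL files by data source ID. This sketch assumes the models are .sql files located somewhere below each data-source folder, and list_generated_models is a hypothetical helper.

```python
from pathlib import Path


def list_generated_models(dbt_root: Path) -> dict[str, list[str]]:
    """Map each data source ID (a folder under dbt_root) to its model SQL files."""
    models = {}
    for source_dir in sorted(p for p in dbt_root.iterdir() if p.is_dir()):
        # Search recursively, since the exact project layout is not assumed.
        models[source_dir.name] = sorted(
            sql_file.relative_to(source_dir).as_posix()
            for sql_file in source_dir.rglob("*.sql")
        )
    return models
```

From the repository root, list_generated_models(Path("kuwala/tmp/kuwala/backend/dbt")) would list the models per data source.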