Version: 0.5

Overview of Transformations

What is a transformation?

A transformation is a function that specifies logic to run against data retrieved from external data sources.

Transformations are central to how Tecton computes features: feature pipelines, defined by Feature Views, call transformations to compute feature values.

Transformation modes

Transformations support the pandas, pyspark, python, snowflake_sql, snowpark, and spark_sql modes. See Transformation Modes for details.
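The modes differ mainly in what the transformation function works with. In the SQL modes (spark_sql, snowflake_sql) the function returns a query string, as the spark_sql example on this page does. In the DataFrame and Python modes (pandas, pyspark, snowpark, python) it operates on data and returns transformed data. The sketch below is a plain-Python stand-in for that second style: it does not use the Tecton SDK, and the function name rename_columns_rows and the list-of-dictionaries input are hypothetical, chosen only to show "data in, data out".

```python
from typing import Any, Dict, List

# Hypothetical row type: one record from the data source as a dictionary.
Row = Dict[str, Any]


def rename_columns_rows(rows: List[Row]) -> List[Row]:
    """Stand-in for a non-SQL transformation (not Tecton's signature).

    It receives data (a list of row dictionaries) and returns transformed
    data, instead of returning a query string as the SQL modes do.
    """
    return [
        {
            "entity_id": row["entity_id"],
            "timestamp": row["timestamp"],
            "feature_one": row["column_a"],  # rename column_a -> feature_one
            "feature_two": row["column_b"],  # rename column_b -> feature_two
        }
        for row in rows
    ]


raw = [
    {"entity_id": "user_1", "timestamp": "2022-06-01 00:00:00", "column_a": 1.5, "column_b": 10.0},
]
print(rename_columns_rows(raw))
```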

Where transformations can be defined

A transformation can be defined inside or outside of a Feature View.

Compared to defining a transformation inside a Feature View, defining it outside a Feature View has these main advantages:

  • Reusability
    • Transformations can be reused by multiple Feature Views.
    • A Feature View can call multiple transformations.
  • Discoverability: Transformations can be searched in the Web UI.
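The two reusability points can be sketched without the Tecton SDK. In this illustration each transformation is a plain Python function that takes an input relation (a table name or a subquery) and returns a SQL string. The names rename_columns, drop_null_features, feature_view_a, and feature_view_b are hypothetical; in Tecton, transformations and Feature Views are declared with the SDK's decorators instead (see Defining a Transformation Outside of a Feature View).

```python
# Hypothetical stand-ins for transformations defined outside a Feature View.
# Each takes an input relation (table name or subquery) and returns SQL text.


def rename_columns(input_relation: str) -> str:
    """Shared transformation: rename raw columns to feature names."""
    return (
        "SELECT entity_id, timestamp, "
        "column_a AS feature_one, column_b AS feature_two "
        f"FROM {input_relation}"
    )


def drop_null_features(input_relation: str) -> str:
    """A second transformation: keep only rows with both features present."""
    return (
        "SELECT * "
        f"FROM {input_relation} "
        "WHERE feature_one IS NOT NULL AND feature_two IS NOT NULL"
    )


def feature_view_a(input_data: str) -> str:
    # Reuse: this "feature view" calls the shared transformation directly.
    return rename_columns(input_data)


def feature_view_b(input_data: str) -> str:
    # Composition: this "feature view" chains two transformations by
    # wrapping the first query as an aliased subquery of the second.
    return drop_null_features(f"({rename_columns(input_data)}) AS renamed")


print(feature_view_a("events"))
print(feature_view_b("events"))
```

Wrapping one query as an aliased subquery of the next is one simple way to chain SQL-returning functions; each function stays independent of how its input relation was produced.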

Defining a transformation inside of a Feature View

The following example shows a Feature View whose transformation is implemented in the body of the Feature View function my_feature_view. The transformation runs in spark_sql mode and renames the data source columns column_a and column_b to feature_one and feature_two.

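```python
from tecton import batch_feature_view


@batch_feature_view(
    mode="spark_sql",
    # ...
)
def my_feature_view(input_data):
    # Return a Spark SQL query that renames the source columns to feature names.
    # The f-string interpolates input_data, the Feature View's input.
    return f"""
        SELECT
            entity_id,
            timestamp,
            column_a AS feature_one,
            column_b AS feature_two
        FROM {input_data}
    """
```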

Defining a transformation function outside of a Feature View

See Defining a Transformation Outside of a Feature View.

Transformation input and output

Input

The input to a transformation contains the columns in the data source.

Output

When a transformation is defined inside a Feature View, the output of the transformation is a DataFrame that must include:

  1. The join keys of all entities in the Feature View's entities list.
  2. A timestamp column. If there is more than one timestamp column, the timestamp_key parameter must be set to specify which column holds the timestamp of the feature values.
  3. Feature value columns. All columns other than the join keys and the timestamp column are treated as features of the Feature View.
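The three output rules can be expressed as a small check. The sketch below is a hypothetical helper, not part of Tecton: it takes the output column names, the entity join keys, and the names of the columns the caller knows to hold timestamps, applies the rules as worded above, and returns the columns that would count as features.

```python
from typing import Iterable, List, Optional


def feature_columns(
    output_columns: Iterable[str],
    join_keys: List[str],
    timestamp_columns: List[str],
    timestamp_key: Optional[str] = None,
) -> List[str]:
    """Hypothetical check of a transformation's output against the rules above.

    timestamp_columns must be supplied by the caller, because column names
    alone do not reveal which columns hold timestamps.
    """
    columns = list(output_columns)

    # Rule 1: every entity join key must be present in the output.
    missing = [key for key in join_keys if key not in columns]
    if missing:
        raise ValueError(f"missing join keys: {missing}")

    # Rule 2: exactly one timestamp column, unless timestamp_key picks one.
    present_ts = [c for c in timestamp_columns if c in columns]
    if timestamp_key is None:
        if len(present_ts) != 1:
            raise ValueError(
                f"expected one timestamp column, found {present_ts}; set timestamp_key"
            )
        timestamp_key = present_ts[0]
    elif timestamp_key not in columns:
        raise ValueError(f"timestamp_key {timestamp_key!r} is not an output column")

    # Rule 3, read literally: every column other than the join keys and the
    # chosen timestamp is a feature (including any unchosen timestamp column).
    return [c for c in columns if c not in join_keys and c != timestamp_key]


print(feature_columns(
    ["entity_id", "timestamp", "feature_one", "feature_two"],
    join_keys=["entity_id"],
    timestamp_columns=["timestamp"],
))
```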
