Version: 0.6

0.3 to 0.4 Upgrade Guide

What's new in 0.4

Tecton 0.4 includes the next generation feature definition framework with native support for Snowflake transformations. The major goal for this release is to simplify core concepts, while increasing flexibility and maintaining high performance.

Tecton 0.4 includes many updates and improvements. For a full list see the Tecton 0.4 Release Notes.

0.3 and 0.4 side-by-side comparison

Tecton version 0.4 introduces new classes which replace classes that were available in version 0.3. The following tables list the mapping between 0.3 and 0.4 classes, parameters, and methods.

Class Renames/Changes

0.3 Definition	0.4 Definition
*Data Sources*
BatchDataSource	BatchSource
StreamDataSource	StreamSource
RequestDataSource	RequestSource
*Data Source Configs*
FileDSConfig	FileConfig
HiveDSConfig	HiveConfig
KafkaDSConfig	KafkaConfig
KinesisDSConfig	KinesisConfig
RedshiftDSConfig	RedshiftConfig
SnowflakeDSConfig	SnowflakeConfig
*Feature Views*
@batch_window_aggregate_feature_view	@batch_feature_view
@stream_window_aggregate_feature_view	@stream_feature_view
*Misc Classes*
FeatureAggregation	Aggregation
*New Classes*
-	AggregationMode
-	KafkaOutputStream
-	KinesisOutputStream
-	FilteredSource
*Deprecated Classes in 0.3*
Input	-
BackfillConfig	-
MonitoringConfig	-

Feature View/Table Parameter Changes

0.3 Definition	0.4 Definition	Type Changes
inputs	sources	Dict -> List of Data Sources or Filtered Source
name_override	name
aggregation_slide_period	aggregation_interval	str -> datetime.timedelta
timestamp_key	timestamp_field
batch_cluster_config	batch_compute
stream_cluster_config	stream_compute
online_config	online_store
offline_config	offline_store
output_schema	schema	See release notes for type changes
batch_schedule, ttl, max_batch_aggregation_interval		str -> datetime.timedelta
family	- (removed)
schedule_offset(nested in Input)	- (removed, see DataSource data_delay)
window(nested in Input)	- (removed, see start_time_offset)
-	start_time_offset (window - Feature View's batch_schedule)
monitoring.alert_email (nested in MonitoringConfig)	alert_email
monitoring.monitor_freshness (nested in MonitoringConfig)	monitor_freshness
monitoring.expected_feature_freshness (nested in MonitoringConfig)	expected_feature_freshness	str -> datetime.timedelta

Data Source Parameter Changes

0.3 Definition	0.4 Definition	Type Changes
*Data Sources*
batch_ds_config	batch_config
stream_ds_config	stream_config
request_schema	schema	See release notes for type changes
*Data Sources Configs*
timestamp_column_name	timestamp_field
timestamp_field	timestamp_field
raw_batch_translator	post_processor
raw_stream_translator	post_processor
default_watermark_delay_threshold	watermark_delay_threshold	str -> datetime.timedelta
default_initial_stream_position	initial_stream_position
schedule_offset (defined in Feature View)	data_delay

Interactive Method Changes

In addition to declarative classes, interactive FeatureView and FeatureTable methods with overlapping functionality have been consolidated.

0.3 Interactive Method	0.4 Interactive Method
get_historical_features	get_historical_features
preview	get_historical_features
get_feature_dataframe	get_historical_features
get_features	get_historical_features
get_online_features	get_online_features
get_feature_vector	get_online_features
run	run*

*arguments have changed in 0.4

Incremental upgrades from 0.3 to 0.4

Feature repository updates and CLI updates can be decoupled using the compat library for 0.3 feature definitions. Feature definitions using the 0.3 objects can continue to be used after updating to 0.4 CLI when imports are migrated to the compat module. 0.3 objects cannot be applied using tecton~=0.4 without changing from tecton import paths to from tecton.compat. For example:

# 0.3 object imports can only be applied using tecton 0.3
from tecton import Entity

# 0.3 object imports from `compat` can be applied using tecton 0.4
from tecton.compat import Entity

See the table below for a compatibility matrix.

	CLI 0.3	CLI 0.4
Framework 0.3	`from tecton import` (as normal)	`from tecton.compat import` required
Framework 0.4	Not supported	`from tecton import` (as normal)

⚠️ Important: 0.4 Feature Services and Feature Views

The format of feature column names for offline features has changed from feature_view_name.feature_name to feature_view_name__feature_name. Feature view names cannot contain double underscores. Online feature columns will continue to use the format feature_view_name.feature_name.

Steps to upgrade to the Tecton 0.4 CLI

The steps below will allow you to upgrade the 0.3 CLI to the 0.4 CLI and use the 0.4 CLI with a 0.3 feature repo.

1. Verify the repo is a version 0.3 repo

pip install tecton~=0.3.0 and run tecton plan. This should yield no diffs before beginning the migration.

2. Migrate imports to the `compat` library

Update all imports from the tecton module that are found in the 0.3.0 column of the table above to import from tecton.compat instead. This is important! You should only update 0.3.0 classes to import from tecton.compat. Do not run tecton apply yet!

3. Upgrade your CLI to 0.4

Install tecton~=0.4.0 and run tecton plan. This should yield no diffs.

4. Congrats!

You now have a 0.3 repo compatible with the 0.4 CLI.

Upgrade the 0.3 feature repo to use 0.4 Tecton object definitions

Once you've upgraded to the Tecton 0.4 CLI and updated the 0.3 feature repo to make it compatible with the 0.4 CLI, you can begin updating the feature repo to use 0.4 Tecton object definitions.

The fundamentals of 0.3 and 0.4 are similar except for class and parameter names. Most upgrades will require light refactoring of each Tecton object definition, refer to the class and parameter changes above. However, there are specific scenarios detailed below that require additional steps and refactoring particularly to avoid re-materializing data. Please read them carefully before upgrading your feature repo. Note: Objects must be semantically identical when doing an upgrade; no property changes can be combined with an upgrade.

Important: With the exception of Feature Services, 0.3 objects can only depend on 0.3 objects, and 0.4 objects can only depend on 0.4 objects. Feature repos can be upgraded incrementally, but Feature Views must be upgraded in lock-step with any upstream dependencies (Transformations, Entities, Data Sources).

Upgrade Hive Data Sources using `date_partition_column`

In 0.3, date_partition_column was deprecated and replaced with datetime_partition_columns. Users must migrate off of date_partition_column as a separate step before migrating to 0.4 non-compat.

Consider the following Hive Data Source config using date_partition_column:

...
hive_config = HiveDSConfig(
    table="credit_scores",
    database="demo_fraud",
    timestamp_column_name="timestamp",
    date_partition_column="day",
)
...

1.) Migrate off `date_partition_column` using `--suppress-recreates`

Update your definition to use datetime_partition_columns in 0.4 compat:

...
hive_config = HiveDSConfig(
    table="credit_scores",
    database="demo_fraud",
    timestamp_column_name="timestamp",
    datetime_partition_columns=[DatetimePartitionColumn(column_name="day", datepart="date", zero_padded=True)],
)
...

Run tecton plan --suppress-recreates to avoid recreating your data source, and re-materializing dependent feature views.

2.) Upgrade to 0.4 non-compat

Update your config to the 0.4 definition:

...
hive_config = HiveConfig(
    table="credit_scores",
    database="demo_fraud",
    timestamp_field="timestamp",
    datetime_partition_columns=[DatetimePartitionColumn(column_name="day", datepart="date", zero_padded=True)],
)
...

Run tecton apply to upgrade your data source.

Upgrade Feature Views with Aggregations

note

When following this upgrade procedure, feature data will not be rematerialized.

In 0.3, in a Batch Window Aggregate Feature View or a Stream Window Aggregate Feature View, multiple time_windows can be specified in the FeatureAggregation definition that is used in the aggregations parameter of the Feature View. For example:

...
aggregation_slide_period = "1day"
aggregations = ([FeatureAggregation(column="transaction", function="count", time_windows=["24h", "30d", "90d"])],)
...

In 0.4, FeatureAggregation has been replaced with Aggregation, which only supports one time window per definition; when upgrading you must use multiple Aggregation definitions if there are multiple time windows. Maintain the same ordering of aggregations when upgrading your Feature View.

Additionally, when upgrading, for each Aggregation you must specify the name parameter, which uses the format <column>_<function>_<time window>_<aggregation interval> where:

<column> is the column value in the 0.3 FeatureAggregation
<function> is the function value in the 0.3 FeatureAggregation
<time window> is one of the elements in the time_windows list in the 0.3 FeatureAggregation
<aggregation interval> is the value of the aggregation_interval (in string format), in the 0.4 Aggregation

For example, to upgrade the aggregations example above, where the value of aggregation_interval is timedelta(days=1) rewrite aggregations to:

...
aggregations = (
    [
        Aggregation(
            column="transaction",
            function="count",
            time_window=timedelta(days=1),
            name="transaction_count_24h_1day",
        ),
        Aggregation(
            column="transaction",
            function="count",
            time_window=timedelta(days=30),
            name="transaction_count_30d_1day",
        ),
        Aggregation(
            column="transaction",
            function="count",
            time_window=timedelta(days=90),
            name="transaction_count_90d_1day",
        ),
    ],
)
...

By using the format for the name parameter as explained above, the 0.4 feature output column names will remain the same as the 0.3 definition.

Upgrade Feature Views containing the `window` parameter in `Input` object definitions

note

When following this upgrade procedure, feature data will not be rematerialized.

In 0.3, an Input in a Feature View can have a window that defines how long to look back for data from the current time.

In 0.4, a FilteredSource can define a start_time_offset (the equivalent of window), which pre-filters the Data Source to the time range beginning with start_time + start_time_offset of the current materialization window, and ending with end_time of the current materialization window.

To upgrade to FilteredSource, set start_time_offset = -1 * (window - batch_schedule), where window is from the 0.3 Input definition and batch_schedule is from the Feature View definition. If your previous window was WINDOW_UNBOUNDED_PRECEDING, then set the start_time_offset=timedelta.min.

For a concrete example, refer to Example 4 in the section below.

Upgrade Feature Views containing `schedule_offset`

note

When following this upgrade procedure, feature data will not be rematerialized.

In 0.3, the inputs parameter of a Feature View can optionally specify a schedule_offset as follows:

inputs = {"transactions": Input(transactions_batch, schedule_offset="1hr")}

schedule_offset specifies how long to wait after the end of the batch_schedule period before starting the next materialization job. This parameter is typically used to ensure that all data has landed, and in most cases should be the same for all Feature View Inputs that use the same Data Source.

In 0.4, the equivalent parameter, data_delay, is configured in the config object (such as HiveConfig and FileConfig) that is referenced in a Data Source.

Check the schedule_offset for all Feature Views that use a given Data Source and follow the steps below based on whether they are currently equivalent. Note: Follow the second set of instructions if schedule_offset is only set in some Feature View Inputs that use the same Data Source.

If all Feature View `Input`s referring to a given Data Source use the same `schedule_offset` value

Upgrade the Data Source to 0.4 and set the data_delay parameter of the Data Source config object to the value of schedule_offset defined in the Feature Views. Run tecton apply with these changes. You should see an output similar to the following:

$ tecton apply
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  ~ Upgrade BatchDataSource to the latest Tecton framework
    name:            transactions_batch
    description:     Batch Data Source for transactions_stream

  ~ Upgrade StreamDataSource to the latest Tecton framework
    name:            transactions_stream
    description:     Kafka Data Source with transaction data

↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑

Feature Views using that Data Source can be incrementally upgraded at any time.

If Feature View `Input`s referring to a given Data Source use different `schedule_offset` values

If the Inputs referring to the same Data Source use different schedule_offset values, verify, as required by your use case, if they can be updated to the same value. Note: Stream Feature Views cannot use different schedule_offset values for the same Stream Data Source without re-materializing data.

note

For most use cases, the Inputs referring to the same Data Source can use the same schedule_offset value. Using the same value is desireable because the upgrade from a 0.3 to a 0.4 Feature View is simpler.

Use of one `schedule_offset` value

If one schedule_offset value can be used for all usages of the Data Source, follow these steps:

Prior to upgrading each of the Feature Views to 0.4, update all occurrences of schedule_offset in Input for a particular Data Source to use the same value. To avoid re-materializing the data for this Feature View, run tecton apply --suppress-recreates.

You should get an output similar the following:

$tecton plan --suppress-recreates
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

  ~ Update BatchFeatureView
    name:            user_has_great_credit
    description:     Whether the user has a great credit score (over 740).
    pipeline.root.transformation_node.inputs[0].node.data_source_node.schedule_offset: 1h -> 2h
    materialization: No new materialization jobs

↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑

Upgrade the Data Source object to the 0.4 (non-compat) definition and set the data_delay parameter of the Data Source config object to the value of schedule_offset. Feature Views using the Data Source can be upgraded incrementally at any time.

Use of multiple `schedule_offset` values

If all Input objects that refer to the same Data Source cannot use the same schedule_offset value, then each Input object containing a unique schedule_offset value will require a separate Data Source.

The following example describes how to upgrade from features views that require the use of multiple schedule_offset to Data Sources that use data_delay.

Suppose you have three feature views, as follows, where each Input object uses the transactions_batch data source with a different schedule_offset value:

...
inputs = ({"transactions": Input(transactions_batch)},)
...

...
inputs = ({"transactions": Input(transactions_batch, schedule_offset="1hr")},)
...

...
inputs = ({"transactions": Input(transactions_batch, schedule_offset="2hr")},)
...

Make two copies of the source file containing the transactions_batch compat Data Source. In the copied source files, replace transactions_batch with transactions_batch_copy_1 and transactions_batch_copy_2.
Update the Input object definition in two of the features views, replacing transactions_batch with the new name.

...
inputs = ({"transactions": Input(transactions_batch)},)
...

...
inputs = ({"transactions": Input(transactions_batch_copy_1, schedule_offset="1hr")},)
...

...
inputs = ({"transactions": Input(transactions_batch_copy_2, schedule_offset="2hr")},)
...

Apply your changes with tecton apply. If upgrading Batch Feature Views, you can use the suppress-recreates flag to avoid re-materializing data.

The suppress-recreates flag for Stream Feature Views, you must re-materialize data if you need different schedule_offset values.

You can see the two new Data Sources were created and the corresponding feature views were updated in the plan output below:

$tecton apply --suppress-recreates
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 26 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓

    + Create BatchDataSource
        name:            transactions_batch_copy_1
        description:     Copy of transactions_batch with offset 1hr

    + Create BatchDataSource
        name:            transactions_batch_copy_2
        description:     Copy of transactions_batch with offset 2hr

    ~ Update BatchFeatureView
        name:            feature_view_2
        description:     Feature View with schedule_offset of 1hr
        DependencyChanged(VirtualDataSource):  -> transactions_batch_copy_1
        materialization: No new materialization jobs

    ~ Update BatchFeatureView
        name:            feature_view_3
        description:     Feature View with schedule_offset of 2hr
        DependencyChanged(VirtualDataSource):  -> transactions_batch_copy_2
        materialization: No new materialization jobs

↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑

Upgrade the three Data Sources to 0.4 (non-compat). During this process, set the data_delay value in each Data Source to the corresponding schedule_offset value in the Feature View which refers to that Data Source.

transactions_batch = BatchSource(
    batch_config=HiveConfig(
        database="transactions",
        # No data delay needed
    ),
    # ...
)

transactions_batch_copy_1 = BatchSource(
    batch_config=HiveConfig(
        database="transactions",
        data_delay=time_delta(hours=1),
    ),
    # ...
)

transactions_batch_copy_2 = BatchSource(
    batch_config=HiveConfig(
        database="transactions",
        data_delay=time_delta(hours=2),
    ),
    # ...
)

Upgrade the three Feature Views to non-compat 0.4. They can be upgraded all at once, or incrementally. For example,

...
inputs = ({"transactions": Input(transactions_batch, schedule_offset="1hr")},)
...

becomes:

...
sources = (FilteredSource(transactions_batch),)
...

Upgrade Feature Views that use `tecton_sliding_window`

In 0.4, tecton_sliding_window is deprecated, and no longer managed by Tecton.

note

We recommend migrating away from tecton_sliding_window and using incremental backfills in 0.4. Please reach out to customer success for more information.

We have provided an implementation of the tecton_sliding_window transformation in this repository. You can copy and paste that transformation into your repository and continue to maintain it. Follow the instructions in the repository README to upgrade.

Examples of upgrading 0.3 objects to 0.4 objects

Example 1: Converting a 0.3 BatchDataSource to a 0.4 BatchSource

# 0.3
from tecton.compat import HiveDSConfig
from tecton.compat import BatchDataSource

hive_config = HiveDSConfig(
  table="transaction_log",
  database="accounting"
)

batch03 = BatchDataSource(
  name="tx_log",
  batch_ds_config=hive_config
)

# 0.4
from tecton import HiveConfig
from tecton import BatchSource

hive_config = HiveConfig(
  table="transaction_log",
  database="accounting"
)

batch04 = BatchSource(
  name="tx_log",
  batch_config=hive_config
)

Example 2: Converting a 0.3 StreamDataSource to a 0.4 StreamSource

# 0.3
from tecton.compat import HiveDSConfig
from tecton.compat import KinesisDSConfig
from tecton.compat import StreamDataSource

hive_config = HiveDSConfig(
  table="transaction_log",
  database="accounting"
)

def noop_translator(df):
  return df

kinesis_config = KinesisDSConfig(
  stream_name="transaction_stream",
  raw_stream_translator=noop_translator,
  region="us-west-2"
)

stream03 = StreamDataSource(
  name="transactions_03",
  batch_ds_config=hive_config,
  stream_ds_config=stream_config
)

# 0.4
from tecton import HiveConfig
from tecton import KinesisConfig
from tecton import StreamSource

hive_config = HiveConfig(
  table="transaction_log",
  database="accounting"
)

def noop_translator(df):
  return df

kinesis_config = KinesisConfig(
  stream_name="transaction_stream",
  post_processor=noop_translator,
  region="us-west-2"
)

stream04 = StreamSource(
  name="transactions_04",
  batch_config=hive_config,
  stream_config=kinesis_config
)

Example 3: Converting a 0.3 Stream Window Aggregate Feature View to a 0.4 Stream Feature View

# 0.3
from tecton.compat import stream_window_aggregate_feature_view
from tecton.compat import FeatureAggregation
from tecton.compat import Input

from datetime import datetime

@stream_window_aggregate_feature_view(
    inputs={'ad_impressions': Input(ad_impressions_stream)},
    entities=[user],
    mode='spark_sql',
    aggregation_slide_period='1h',
    aggregations=[
      FeatureAggregation(
        column='impression',
        function='count',
        time_windows=['1h', '24h','72h']
      )
    ],
    online=False,
    offline=False,
    batch_schedule='1d',
    feature_start_time=datetime(2022, 5, 1),
)
def user_ad_impression_counts(ad_impressions):
    return f"""
        SELECT
            user_uuid as user_id,
            ad_id,
            1 as impression,
            timestamp
        FROM
            {ad_impressions}
        """

# 0.4
from tecton import stream_feature_view
from tecton import Aggregation
from tecton import FilteredSource

from datetime import datetime, timedelta

@stream_feature_view(
    source=FilteredSource(ad_impressions_stream),
    entities=[user],
    mode='spark_sql',
    aggregation_interval=timedelta(hours=1),
    aggregations=[
        Aggregation(column='impression', function='count', time_window=timedelta(hours=1), name="impression_count_1h_1d"),
        Aggregation(column='impression', function='count', time_window=timedelta(hours=24), name="impression_count_24h_1d"),
        Aggregation(column='impression', function='count', time_window=timedelta(hours=72), name="impression_count_72h_1d"),
    ],
    online=False,
    offline=False,
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2022, 5, 1),
)
def user_impression_counts(ad_impressions):
    return f'''
        SELECT
            user_uuid as user_id,
            1 as impression,
            timestamp
        FROM
            {ad_impressions}
        '''

Example 4: Converting a 0.3 Batch Feature View (non-aggregate) to a 0.4 Batch Feature View

# 0.3
from tecton.compat import batch_feature_view
from tecton.compat import Input
from tecton.compat import tecton_sliding_window
from tecton.compat import transformation
from tecton.compat import const
from tecton.compat import BackfillConfig

from datetime import datetime

# Counts distinct ad ids for each user and window. The timestamp
# for the feature is the end of the window, which is set by using
# the tecton_sliding_window transformation
@transformation(mode='spark_sql')
def user_distinct_ad_count_transformation(window_input_df):
    return f'''
        SELECT
            user_uuid as user_id,
            approx_count_distinct(ad_id) as distinct_ad_count,
            window_end as timestamp
        FROM
            {window_input_df}
        GROUP BY
            user_uuid, window_end
        '''

@batch_feature_view(
    inputs={'ad_impressions': Input(ad_impressions_batch, window='7d')},
    entities=[user],
    mode='pipeline',
    ttl='4d',
    batch_schedule='1d',
    online=False,
    offline=True,
    feature_start_time=datetime(2022, 5, 1),
    backfill_config=BackfillConfig("multiple_batch_schedule_intervals_per_job"),
)
def user_distinct_ad_count_7d(ad_impressions):
    return user_distinct_ad_count_transformation(
        # Use tecton_sliding_transformation to create trailing 7 day time windows.
        # The slide_interval defaults to the batch_schedule (1 day).
        tecton_sliding_window(ad_impressions,
            timestamp_key=const('timestamp'),
            window_size=const('7d')))

# 0.4
from tecton import batch_feature_view
from tecton import FilteredSource
from tecton import materialization_context

from datetime import datetime, timedelta


@batch_feature_view(
    sources=[FilteredSource(ad_impressions_batch, start_time_offset=timedelta(days=-6))],
    entities=[user],
    mode='spark_sql',
    ttl=timedelta(days=4),
    batch_schedule=timedelta(days=1),
    incremental_backfills=True,
    online=False,
    offline=True,
    feature_start_time=datetime(2022, 5, 1)
)
def user_distinct_ad_count_7d(ad_impressions, context=materialization_context()):
    return f'''
        SELECT
            user_uuid as user_id,
            approx_count_distinct(ad_id) as distinct_ad_count,
            TO_TIMESTAMP("{context.end_time}") - INTERVAL 1 MICROSECOND as timestamp
        FROM
            {ad_impressions}
        GROUP BY
            user_uuid
    '''

Example 5: Converting a 0.3 Feature Table to a 0.4 Feature Table

# 0.3
from pyspark.sql.types import (
  StructType,
  StructField,
  FloatType,
  ArrayType,
  StringType,
  TimestampType
)
from tecton.compat import Entity
from tecton.compat import FeatureTable
from tecton.compat import DeltaConfig


schema = StructType([
    StructField('user_id', StringType()),
    StructField('timestamp', TimestampType()),
    StructField('user_embedding', ArrayType(FloatType())),
])


user_embeddings = FeatureTable(
    name='user_embeddings',
    entities=[user],
    schema=schema,
    online=True,
    offline=True,
    ttl='10day',
    description='Precomputed user embeddings pushed into Tecton.'
)

# 0.4
from tecton.types import Field, String, Timestamp, Array, Float64
from tecton import Entity, FeatureTable, DeltaConfig
from datetime import timedelta


schema = [
    Field('user_id', String),
    Field('timestamp', Timestamp),
    Field('user_embedding', Array(Float64))
]


user_embeddings = FeatureTable(
    name='user_embeddings',
    entities=[user],
    schema=schema,
    online=True,
    offline=True,
    ttl=timedelta(days=10),
    description='Precomputed user embeddings pushed into Tecton.'
)

Example 6: Converting a 0.3 On-Demand Feature View to a 0.4 On-Demand Feature View

# 0.3
from tecton.compat import RequestDataSource
from tecton.compat import Input
from tecton.compat import on_demand_feature_view

from pyspark.sql.types import StructType, StructField, FloatType, ArrayType, DoubleType

from ads.features.feature_tables.user_embeddings import user_embeddings

request_schema = StructType([StructField('query_embedding', ArrayType(FloatType()))])
request = RequestDataSource(request_schema=request_schema)

output_schema = StructType([StructField('cosine_similarity', DoubleType())])


@on_demand_feature_view(
    inputs={
        'request': Input(request),
        'user_embedding': Input(user_embeddings)
    },
    mode='python',
    output_schema=output_schema
)
def user_query_embedding_similarity(request, user_embedding):
    import numpy as np
    from numpy.linalg import norm

    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        # Handle the case where there is no precomputed user embedding.
        if a is None or b is None:
            return -1.0

        return np.dot(a, b)/(norm(a)*norm(b))

    result = {}
    result["cosine_similarity"] = cosine_similarity(
      user_embedding["user_embedding"], request["query_embedding"]
    ).astype('float64')

    return result

# 0.4
from tecton import RequestSource
from tecton import on_demand_feature_view
from tecton.types import Field, Array, Float64

from ads.features.feature_tables.user_embeddings import user_embeddings

request_schema = [Field('query_embedding', Array(Float64))]
request = RequestSource(schema=request_schema)

output_schema = [Field('cosine_similarity', Float64)]


@on_demand_feature_view(
    sources=[request, user_embeddings],
    mode='python',
    schema=output_schema
)
def user_query_embedding_similarity(request, user_embedding):
    import numpy as np
    from numpy.linalg import norm

    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        # Handle the case where there is no precomputed user embedding.
        if a is None or b is None:
            return -1.0

        return np.dot(a, b)/(norm(a)*norm(b))

    result = {}
    result["cosine_similarity"] = cosine_similarity(
      user_embedding["user_embedding"], request["query_embedding"]
    ).astype('float64')

    return result

FAQs

How do Batch/Stream Window Aggregate Feature Views in 0.3 map to 0.4 Feature Views?

A 0.4 batch_feature_view that has the aggregations and aggregation_interval parameters set will behave the same as a 0.3 batch_window_aggregate_feature_view (the same is true for stream_feature_view). See the Batch Feature View Time-Windowed Aggregations Section for more info.

When should I use incremental backfills?

When Tecton's built-in aggregations aren't an option, using incremental_backfills=True will instruct Tecton to execute your query every batch_schedule with each job being responsible for a single time-window aggregation. See the Incremental Backfill guide for more information.

When should I use `FilteredSource`?

FilteredSource should be used whenever possible for Spark users (Databricks or EMR) to push down time filtering to Data Sources. This will make incremental materialization jobs much more efficient since costly scans across long time ranges can be avoided.

How long will Tecton 0.3 be supported for?

See Release Notes for more details.

Was this page helpful?

Happy React is loading...

0.3 to 0.4 Upgrade Guide

What's new in 0.4​

0.3 and 0.4 side-by-side comparison​

Class Renames/Changes​

Feature View/Table Parameter Changes​

Data Source Parameter Changes​

Interactive Method Changes​

Incremental upgrades from 0.3 to 0.4​

⚠️ Important: 0.4 Feature Services and Feature Views​

Steps to upgrade to the Tecton 0.4 CLI​

1. Verify the repo is a version 0.3 repo​

2. Migrate imports to the compat library​

3. Upgrade your CLI to 0.4​

4. Congrats!​

Upgrade the 0.3 feature repo to use 0.4 Tecton object definitions​

Upgrade Hive Data Sources using date_partition_column​

1.) Migrate off date_partition_column using --suppress-recreates​

2.) Upgrade to 0.4 non-compat​

Upgrade Feature Views with Aggregations​

Upgrade Feature Views containing the window parameter in Input object definitions​

Upgrade Feature Views containing schedule_offset​

If all Feature View Inputs referring to a given Data Source use the same schedule_offset value​

If Feature View Inputs referring to a given Data Source use different schedule_offset values​

Use of one schedule_offset value​

Use of multiple schedule_offset values​

Upgrade Feature Views that use tecton_sliding_window​

Examples of upgrading 0.3 objects to 0.4 objects​

Example 1: Converting a 0.3 BatchDataSource to a 0.4 BatchSource​

Example 2: Converting a 0.3 StreamDataSource to a 0.4 StreamSource​

Example 3: Converting a 0.3 Stream Window Aggregate Feature View to a 0.4 Stream Feature View​

Example 4: Converting a 0.3 Batch Feature View (non-aggregate) to a 0.4 Batch Feature View​

Example 5: Converting a 0.3 Feature Table to a 0.4 Feature Table​

Example 6: Converting a 0.3 On-Demand Feature View to a 0.4 On-Demand Feature View​

FAQs​

How do Batch/Stream Window Aggregate Feature Views in 0.3 map to 0.4 Feature Views?​

When should I use incremental backfills?​

When should I use FilteredSource?​

How long will Tecton 0.3 be supported for?​

Was this page helpful?

What's new in 0.4

0.3 and 0.4 side-by-side comparison

Class Renames/Changes

Feature View/Table Parameter Changes

Data Source Parameter Changes

Interactive Method Changes

Incremental upgrades from 0.3 to 0.4

⚠️ Important: 0.4 Feature Services and Feature Views

Steps to upgrade to the Tecton 0.4 CLI

1. Verify the repo is a version 0.3 repo

2. Migrate imports to the `compat` library

3. Upgrade your CLI to 0.4

4. Congrats!

Upgrade the 0.3 feature repo to use 0.4 Tecton object definitions

Upgrade Hive Data Sources using `date_partition_column`

1.) Migrate off `date_partition_column` using `--suppress-recreates`

2.) Upgrade to 0.4 non-compat

Upgrade Feature Views with Aggregations

Upgrade Feature Views containing the `window` parameter in `Input` object definitions

Upgrade Feature Views containing `schedule_offset`

If all Feature View `Input`s referring to a given Data Source use the same `schedule_offset` value

If Feature View `Input`s referring to a given Data Source use different `schedule_offset` values

Use of one `schedule_offset` value

Use of multiple `schedule_offset` values

Upgrade Feature Views that use `tecton_sliding_window`

Examples of upgrading 0.3 objects to 0.4 objects

Example 1: Converting a 0.3 BatchDataSource to a 0.4 BatchSource

Example 2: Converting a 0.3 StreamDataSource to a 0.4 StreamSource

Example 3: Converting a 0.3 Stream Window Aggregate Feature View to a 0.4 Stream Feature View

Example 4: Converting a 0.3 Batch Feature View (non-aggregate) to a 0.4 Batch Feature View

Example 5: Converting a 0.3 Feature Table to a 0.4 Feature Table

Example 6: Converting a 0.3 On-Demand Feature View to a 0.4 On-Demand Feature View

FAQs

How do Batch/Stream Window Aggregate Feature Views in 0.3 map to 0.4 Feature Views?

When should I use incremental backfills?

When should I use `FilteredSource`?

How long will Tecton 0.3 be supported for?