0.3 to 0.4 Upgrade Guide
What's new in 0.4
Tecton 0.4 includes the next-generation feature definition framework with native support for Snowflake transformations. The major goal of this release is to simplify core concepts while increasing flexibility and maintaining high performance.
Tecton 0.4 includes many updates and improvements. For a full list see the Tecton 0.4 Release Notes.
0.3 and 0.4 side-by-side comparison
Tecton version 0.4 introduces new classes which replace classes that were available in version 0.3. The following tables list the mapping between 0.3 and 0.4 classes, parameters, and methods.
Class Renames/Changes
| 0.3 Definition | 0.4 Definition |
|---|---|
| **Data Sources** | |
| `BatchDataSource` | `BatchSource` |
| `StreamDataSource` | `StreamSource` |
| `RequestDataSource` | `RequestSource` |
| **Data Source Configs** | |
| `FileDSConfig` | `FileConfig` |
| `HiveDSConfig` | `HiveConfig` |
| `KafkaDSConfig` | `KafkaConfig` |
| `KinesisDSConfig` | `KinesisConfig` |
| `RedshiftDSConfig` | `RedshiftConfig` |
| `SnowflakeDSConfig` | `SnowflakeConfig` |
| **Feature Views** | |
| `@batch_window_aggregate_feature_view` | `@batch_feature_view` |
| `@stream_window_aggregate_feature_view` | `@stream_feature_view` |
| **Misc Classes** | |
| `FeatureAggregation` | `Aggregation` |
| **New Classes** | |
| - | `AggregationMode` |
| - | `KafkaOutputStream` |
| - | `KinesisOutputStream` |
| - | `FilteredSource` |
| **Deprecated Classes in 0.3** | |
| `Input` | - |
| `BackfillConfig` | - |
| `MonitoringConfig` | - |
Feature View/Table Parameter Changes
| 0.3 Definition | 0.4 Definition | Type Changes |
|---|---|---|
| `inputs` | `sources` | Dict -> List of Data Sources or `FilteredSource` |
| `name_override` | `name` | |
| `aggregation_slide_period` | `aggregation_interval` | str -> `datetime.timedelta` |
| `timestamp_key` | `timestamp_field` | |
| `batch_cluster_config` | `batch_compute` | |
| `stream_cluster_config` | `stream_compute` | |
| `online_config` | `online_store` | |
| `offline_config` | `offline_store` | |
| `output_schema` | `schema` | See release notes for type changes |
| `batch_schedule`, `ttl`, `max_batch_aggregation_interval` | (names unchanged) | str -> `datetime.timedelta` |
| `family` | - (removed) | |
| `schedule_offset` (nested in `Input`) | - (removed; see Data Source `data_delay`) | |
| `window` (nested in `Input`) | - (removed; see `start_time_offset`) | |
| - | `start_time_offset` (set to `-1 * (window - batch_schedule)` from the 0.3 definition) | |
| `monitoring.alert_email` (nested in `MonitoringConfig`) | `alert_email` | |
| `monitoring.monitor_freshness` (nested in `MonitoringConfig`) | `monitor_freshness` | |
| `monitoring.expected_feature_freshness` (nested in `MonitoringConfig`) | `expected_feature_freshness` | str -> `datetime.timedelta` |
Data Source Parameter Changes
| 0.3 Definition | 0.4 Definition | Type Changes |
|---|---|---|
| **Data Sources** | | |
| `batch_ds_config` | `batch_config` | |
| `stream_ds_config` | `stream_config` | |
| `request_schema` | `schema` | See release notes for type changes |
| **Data Source Configs** | | |
| `timestamp_column_name` | `timestamp_field` | |
| `timestamp_field` | `timestamp_field` | |
| `raw_batch_translator` | `post_processor` | |
| `raw_stream_translator` | `post_processor` | |
| `default_watermark_delay_threshold` | `watermark_delay_threshold` | str -> `datetime.timedelta` |
| `default_initial_stream_position` | `initial_stream_position` | |
| `schedule_offset` (defined in Feature View) | `data_delay` | |
Interactive Method Changes
In addition to the declarative class changes, interactive `FeatureView` and `FeatureTable` methods with overlapping functionality have been consolidated.
| 0.3 Interactive Method | 0.4 Interactive Method |
|---|---|
| `get_historical_features` | `get_historical_features` |
| `preview` | `get_historical_features` |
| `get_feature_dataframe` | `get_historical_features` |
| `get_features` | `get_historical_features` |
| `get_online_features` | `get_online_features` |
| `get_feature_vector` | `get_online_features` |
| `run` | `run`* |

\*Arguments have changed in 0.4.
Incremental upgrades from 0.3 to 0.4
Feature repository updates and CLI updates can be decoupled using the `compat` library for 0.3 feature definitions. Feature definitions using the 0.3 objects can continue to be used after updating to the 0.4 CLI when imports are migrated to the `compat` module. 0.3 objects cannot be applied using `tecton~=0.4` without changing `from tecton` import paths to `from tecton.compat`. For example:

```python
# 0.3 object imports can only be applied using tecton 0.3
from tecton import Entity

# 0.3 object imports from `compat` can be applied using tecton 0.4
from tecton.compat import Entity
```

See the table below for a compatibility matrix.
| | CLI 0.3 | CLI 0.4 |
|---|---|---|
| Framework 0.3 | `from tecton import` (as normal) | `from tecton.compat import` required |
| Framework 0.4 | Not supported | `from tecton import` (as normal) |
⚠️ Important: 0.4 Feature Services and Feature Views
The format of feature column names for offline features has changed from `feature_view_name.feature_name` to `feature_view_name__feature_name`. Feature view names cannot contain double underscores. Online feature columns will continue to use the format `feature_view_name.feature_name`.
Steps to upgrade to the Tecton 0.4 CLI
The steps below will allow you to upgrade the 0.3 CLI to the 0.4 CLI and use the 0.4 CLI with a 0.3 feature repo.

1. Verify the repo is a version 0.3 repo

Run `pip install tecton~=0.3.0` and then `tecton plan`. This should yield no diffs before beginning the migration.

2. Migrate imports to the `compat` library

Update all imports from the `tecton` module that are found in the 0.3 columns of the tables above to import from `tecton.compat` instead. This is important! You should only update 0.3 classes to import from `tecton.compat`. Do not run `tecton apply` yet!

3. Upgrade your CLI to 0.4

Install `tecton~=0.4.0` and run `tecton plan`. This should yield no diffs.

4. Congrats!

You now have a 0.3 repo compatible with the 0.4 CLI.
Upgrade the 0.3 feature repo to use 0.4 Tecton object definitions
Once you've upgraded to the Tecton 0.4 CLI and updated the 0.3 feature repo to make it compatible with the 0.4 CLI, you can begin updating the feature repo to use 0.4 Tecton object definitions.
The fundamentals of 0.3 and 0.4 are similar except for class and parameter names. Most upgrades will require light refactoring of each Tecton object definition; refer to the class and parameter changes above. However, there are specific scenarios detailed below that require additional steps and refactoring, particularly to avoid re-materializing data. Please read them carefully before upgrading your feature repo. Note: Objects must be semantically identical when doing an upgrade; no property changes can be combined with an upgrade.
Important: With the exception of Feature Services, 0.3 objects can only depend on 0.3 objects, and 0.4 objects can only depend on 0.4 objects. Feature repos can be upgraded incrementally, but Feature Views must be upgraded in lock-step with any upstream dependencies (Transformations, Entities, Data Sources).
Upgrade Hive Data Sources using `date_partition_column`

In 0.3, `date_partition_column` was deprecated and replaced with `datetime_partition_columns`. Users must migrate off of `date_partition_column` as a separate step before migrating to 0.4 non-compat.

Consider the following Hive Data Source config using `date_partition_column`:

```python
...
hive_config = HiveDSConfig(
    table="credit_scores",
    database="demo_fraud",
    timestamp_column_name="timestamp",
    date_partition_column="day",
)
...
```
1.) Migrate off `date_partition_column` using `--suppress-recreates`

Update your definition to use `datetime_partition_columns` in 0.4 compat:

```python
...
hive_config = HiveDSConfig(
    table="credit_scores",
    database="demo_fraud",
    timestamp_column_name="timestamp",
    datetime_partition_columns=[
        DatetimePartitionColumn(column_name="day", datepart="date", zero_padded=True)
    ],
)
...
```

Run `tecton plan --suppress-recreates` to avoid recreating your data source and re-materializing dependent feature views.
2.) Upgrade to 0.4 non-compat

Update your config to the 0.4 definition:

```python
...
hive_config = HiveConfig(
    table="credit_scores",
    database="demo_fraud",
    timestamp_field="timestamp",
    datetime_partition_columns=[
        DatetimePartitionColumn(column_name="day", datepart="date", zero_padded=True)
    ],
)
...
```

Run `tecton apply` to upgrade your data source.
Upgrade Feature Views with Aggregations
When following this upgrade procedure, feature data will not be rematerialized.
In 0.3, in a Batch Window Aggregate Feature View or a Stream Window Aggregate Feature View, multiple `time_windows` can be specified in the `FeatureAggregation` definition that is used in the `aggregations` parameter of the Feature View. For example:

```python
...
aggregation_slide_period = "1day"
aggregations = [FeatureAggregation(column="transaction", function="count", time_windows=["24h", "30d", "90d"])]
...
```

In 0.4, `FeatureAggregation` has been replaced with `Aggregation`, which only supports one time window per definition; when upgrading, you must use multiple `Aggregation` definitions if there are multiple time windows. Maintain the same ordering of aggregations when upgrading your Feature View.
Additionally, when upgrading, for each `Aggregation` you must specify the `name` parameter, which uses the format `<column>_<function>_<time window>_<aggregation interval>` where:

- `<column>` is the `column` value in the 0.3 `FeatureAggregation`
- `<function>` is the `function` value in the 0.3 `FeatureAggregation`
- `<time window>` is one of the elements in the `time_windows` list in the 0.3 `FeatureAggregation`
- `<aggregation interval>` is the value of the `aggregation_interval` (in string format) in the 0.4 `Aggregation`

For example, to upgrade the `aggregations` example above, where the value of `aggregation_interval` is `timedelta(days=1)`, rewrite `aggregations` to:
```python
...
aggregations = [
    Aggregation(
        column="transaction",
        function="count",
        time_window=timedelta(days=1),
        name="transaction_count_24h_1day",
    ),
    Aggregation(
        column="transaction",
        function="count",
        time_window=timedelta(days=30),
        name="transaction_count_30d_1day",
    ),
    Aggregation(
        column="transaction",
        function="count",
        time_window=timedelta(days=90),
        name="transaction_count_90d_1day",
    ),
]
...
```
By using the format for the `name` parameter as explained above, the 0.4 feature output column names will remain the same as in the 0.3 definition.
Upgrade Feature Views containing the `window` parameter in `Input` object definitions
When following this upgrade procedure, feature data will not be rematerialized.
In 0.3, an `Input` in a Feature View can have a `window` that defines how far back to look for data from the current time.

In 0.4, a `FilteredSource` can define a `start_time_offset` (the equivalent of `window`), which pre-filters the Data Source to the time range beginning with `start_time + start_time_offset` of the current materialization window and ending with `end_time` of the current materialization window.

To upgrade to `FilteredSource`, set `start_time_offset = -1 * (window - batch_schedule)`, where `window` is from the 0.3 `Input` definition and `batch_schedule` is from the Feature View definition. If your previous window was `WINDOW_UNBOUNDED_PRECEDING`, set `start_time_offset=timedelta.min`.

For a concrete example, refer to Example 4 in the section below.
Upgrade Feature Views containing `schedule_offset`
When following this upgrade procedure, feature data will not be rematerialized.
In 0.3, the `inputs` parameter of a Feature View can optionally specify a `schedule_offset` as follows:

```python
inputs = {"transactions": Input(transactions_batch, schedule_offset="1hr")}
```

`schedule_offset` specifies how long to wait after the end of the `batch_schedule` period before starting the next materialization job. This parameter is typically used to ensure that all data has landed, and in most cases should be the same for all Feature View `Input`s that use the same Data Source.

In 0.4, the equivalent parameter, `data_delay`, is configured in the config object (such as `HiveConfig` or `FileConfig`) that is referenced in a Data Source.
Check the `schedule_offset` for all Feature Views that use a given Data Source and follow the steps below based on whether they are currently equivalent. Note: follow the second set of instructions if `schedule_offset` is only set in some Feature View `Input`s that use the same Data Source.
If all Feature View `Input`s referring to a given Data Source use the same `schedule_offset` value

Upgrade the Data Source to 0.4 and set the `data_delay` parameter of the Data Source config object to the value of `schedule_offset` defined in the Feature Views. Run `tecton apply` with these changes. You should see an output similar to the following:

```bash
$ tecton apply
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
~ Upgrade BatchDataSource to the latest Tecton framework
    name: transactions_batch
    description: Batch Data Source for transactions_stream
~ Upgrade StreamDataSource to the latest Tecton framework
    name: transactions_stream
    description: Kafka Data Source with transaction data
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
```

Feature Views using that Data Source can be incrementally upgraded at any time.
If Feature View `Input`s referring to a given Data Source use different `schedule_offset` values

If the `Input`s referring to the same Data Source use different `schedule_offset` values, verify, as required by your use case, whether they can be updated to the same value. Note: Stream Feature Views cannot use different `schedule_offset` values for the same Stream Data Source without re-materializing data.

For most use cases, the `Input`s referring to the same Data Source can use the same `schedule_offset` value. Using the same value is desirable because the upgrade from a 0.3 to a 0.4 Feature View is simpler.
Use of one `schedule_offset` value

If one `schedule_offset` value can be used for all usages of the Data Source, follow these steps:

1. Prior to upgrading each of the Feature Views to 0.4, update all occurrences of `schedule_offset` in `Input` for a particular Data Source to use the same value. To avoid re-materializing the data for this Feature View, run `tecton apply --suppress-recreates`. You should get an output similar to the following:

```bash
$ tecton plan --suppress-recreates
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 1 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
~ Update BatchFeatureView
    name: user_has_great_credit
    description: Whether the user has a great credit score (over 740).
    pipeline.root.transformation_node.inputs[0].node.data_source_node.schedule_offset: 1h -> 2h
    materialization: No new materialization jobs
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
```

2. Upgrade the Data Source object to the 0.4 (non-compat) definition and set the `data_delay` parameter of the Data Source config object to the value of `schedule_offset`. Feature Views using the Data Source can be upgraded incrementally at any time.
Use of multiple `schedule_offset` values

If all `Input` objects that refer to the same Data Source cannot use the same `schedule_offset` value, then each `Input` object containing a unique `schedule_offset` value will require a separate Data Source.

The following example describes how to upgrade from Feature Views that require multiple `schedule_offset` values to Data Sources that use `data_delay`. Suppose you have three Feature Views, as follows, where each `Input` object uses the `transactions_batch` data source with a different `schedule_offset` value:
```python
...
inputs = {"transactions": Input(transactions_batch)}
...
```

```python
...
inputs = {"transactions": Input(transactions_batch, schedule_offset="1hr")}
...
```

```python
...
inputs = {"transactions": Input(transactions_batch, schedule_offset="2hr")}
...
```
1. Make two copies of the source file containing the `transactions_batch` compat Data Source. In the copied source files, replace `transactions_batch` with `transactions_batch_copy_1` and `transactions_batch_copy_2`.

2. Update the `Input` object definition in two of the Feature Views, replacing `transactions_batch` with the new names:

```python
...
inputs = {"transactions": Input(transactions_batch)}
...
```

```python
...
inputs = {"transactions": Input(transactions_batch_copy_1, schedule_offset="1hr")}
...
```

```python
...
inputs = {"transactions": Input(transactions_batch_copy_2, schedule_offset="2hr")}
...
```
3. Apply your changes with `tecton apply`. If upgrading Batch Feature Views, you can use the `--suppress-recreates` flag to avoid re-materializing data. The `--suppress-recreates` flag does not help for Stream Feature Views: you must re-materialize data if you need different `schedule_offset` values.

You can see that the two new Data Sources were created and the corresponding Feature Views were updated in the plan output below:
```bash
$ tecton apply --suppress-recreates
Using workspace "my_workspace" on cluster https://my_app.tecton.ai
✅ Imported 26 Python modules from the feature repository
✅ Collecting local feature declarations
✅ Performing server-side feature validation: Finished generating plan.
↓↓↓↓↓↓↓↓↓↓↓↓ Plan Start ↓↓↓↓↓↓↓↓↓↓
+ Create BatchDataSource
name: transactions_batch_copy_1
description: Copy of transactions_batch with offset 1hr
+ Create BatchDataSource
name: transactions_batch_copy_2
description: Copy of transactions_batch with offset 2hr
~ Update BatchFeatureView
name: feature_view_2
description: Feature View with schedule_offset of 1hr
DependencyChanged(VirtualDataSource): -> transactions_batch_copy_1
materialization: No new materialization jobs
~ Update BatchFeatureView
name: feature_view_3
description: Feature View with schedule_offset of 2hr
DependencyChanged(VirtualDataSource): -> transactions_batch_copy_2
materialization: No new materialization jobs
↑↑↑↑↑↑↑↑↑↑↑↑ Plan End ↑↑↑↑↑↑↑↑↑↑↑↑
```
4. Upgrade the three Data Sources to 0.4 (non-compat). During this process, set the `data_delay` value in each Data Source to the corresponding `schedule_offset` value in the Feature View which refers to that Data Source.

```python
transactions_batch = BatchSource(
    batch_config=HiveConfig(
        database="transactions",
        # No data delay needed
    ),
    # ...
)

transactions_batch_copy_1 = BatchSource(
    batch_config=HiveConfig(
        database="transactions",
        data_delay=timedelta(hours=1),
    ),
    # ...
)

transactions_batch_copy_2 = BatchSource(
    batch_config=HiveConfig(
        database="transactions",
        data_delay=timedelta(hours=2),
    ),
    # ...
)
```
5. Upgrade the three Feature Views to non-compat 0.4. They can be upgraded all at once or incrementally. For example:

```python
...
inputs = {"transactions": Input(transactions_batch, schedule_offset="1hr")}
...
```

becomes:

```python
...
sources = [FilteredSource(transactions_batch)]
...
```
Upgrade Feature Views that use `tecton_sliding_window`

In 0.4, `tecton_sliding_window` is deprecated and no longer managed by Tecton. We recommend migrating away from `tecton_sliding_window` and using incremental backfills in 0.4. Please reach out to customer success for more information.

We have provided an implementation of the `tecton_sliding_window` transformation in this repository. You can copy and paste that transformation into your repository and continue to maintain it. Follow the instructions in the repository README to upgrade.
Examples of upgrading 0.3 objects to 0.4 objects
Example 1: Converting a 0.3 BatchDataSource to a 0.4 BatchSource
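The original side-by-side table is not reproduced here; below is a minimal before/after sketch based on the mapping tables above, assuming a Hive-backed source. The `transactions_batch` name and Hive table details are illustrative.

0.3:

```python
from tecton.compat import BatchDataSource, HiveDSConfig

transactions_batch = BatchDataSource(
    name="transactions_batch",
    batch_ds_config=HiveDSConfig(
        database="demo_fraud",
        table="transactions",
        timestamp_column_name="timestamp",
    ),
)
```

0.4:

```python
from tecton import BatchSource, HiveConfig

transactions_batch = BatchSource(
    name="transactions_batch",
    batch_config=HiveConfig(
        database="demo_fraud",
        table="transactions",
        timestamp_field="timestamp",
    ),
)
```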
Example 2: Converting a 0.3 StreamDataSource to a 0.4 StreamSource
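A minimal before/after sketch following the same renames (`StreamDataSource` -> `StreamSource`, `stream_ds_config` -> `stream_config`, `raw_stream_translator` -> `post_processor`), assuming a Kinesis-backed stream. The stream name, region, `hive_config` batch config, and `parse_stream` post-processor are illustrative.

0.3:

```python
from tecton.compat import StreamDataSource, KinesisDSConfig

transactions_stream = StreamDataSource(
    name="transactions_stream",
    batch_ds_config=hive_config,  # batch source backing the stream
    stream_ds_config=KinesisDSConfig(
        stream_name="transactions",
        region="us-west-2",
        raw_stream_translator=parse_stream,
        timestamp_field="timestamp",
        default_initial_stream_position="latest",
        default_watermark_delay_threshold="24h",
    ),
)
```

0.4:

```python
from datetime import timedelta

from tecton import StreamSource, KinesisConfig

transactions_stream = StreamSource(
    name="transactions_stream",
    batch_config=hive_config,  # batch source backing the stream
    stream_config=KinesisConfig(
        stream_name="transactions",
        region="us-west-2",
        post_processor=parse_stream,
        timestamp_field="timestamp",
        initial_stream_position="latest",
        watermark_delay_threshold=timedelta(hours=24),
    ),
)
```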
Example 3: Converting a 0.3 Stream Window Aggregate Feature View to a 0.4 Stream Feature View
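A minimal before/after sketch combining the decorator rename with the `Aggregation` changes described earlier; the entity, source, and transformation body are illustrative, and the 0.4 `@stream_feature_view` shown here is assumed to take a single source.

0.3:

```python
from tecton.compat import stream_window_aggregate_feature_view, Input, FeatureAggregation

@stream_window_aggregate_feature_view(
    inputs={"transactions": Input(transactions_stream)},
    entities=[user],
    mode="spark_sql",
    aggregation_slide_period="1day",
    aggregations=[FeatureAggregation(column="transaction", function="count", time_windows=["24h", "30d"])],
)
def user_transaction_counts(transactions):
    return f"SELECT user_id, 1 AS transaction, timestamp FROM {transactions}"
```

0.4:

```python
from datetime import timedelta

from tecton import stream_feature_view, FilteredSource, Aggregation

@stream_feature_view(
    source=FilteredSource(transactions_stream),
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column="transaction", function="count", time_window=timedelta(hours=24), name="transaction_count_24h_1day"),
        Aggregation(column="transaction", function="count", time_window=timedelta(days=30), name="transaction_count_30d_1day"),
    ],
)
def user_transaction_counts(transactions):
    return f"SELECT user_id, 1 AS transaction, timestamp FROM {transactions}"
```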
Example 4: Converting a 0.3 Batch Feature View (non-aggregate) to a 0.4 Batch Feature View
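A minimal before/after sketch of the `window` -> `start_time_offset` conversion described earlier: with `window="30d"` and `batch_schedule="1d"`, `start_time_offset = -1 * (30d - 1d) = timedelta(days=-29)`. All other names and the SQL body are illustrative.

0.3:

```python
from tecton.compat import batch_feature_view, Input

@batch_feature_view(
    inputs={"transactions": Input(transactions_batch, window="30d")},
    entities=[user],
    mode="spark_sql",
    batch_schedule="1d",
    ttl="2d",
)
def user_distinct_merchant_count_30d(transactions):
    return f"SELECT user_id, COUNT(DISTINCT merchant) AS distinct_merchant_count FROM {transactions} GROUP BY user_id"
```

0.4:

```python
from datetime import timedelta

from tecton import batch_feature_view, FilteredSource

@batch_feature_view(
    sources=[FilteredSource(transactions_batch, start_time_offset=timedelta(days=-29))],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=2),
)
def user_distinct_merchant_count_30d(transactions):
    return f"SELECT user_id, COUNT(DISTINCT merchant) AS distinct_merchant_count FROM {transactions} GROUP BY user_id"
```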
Example 5: Converting a 0.3 Feature Table to a 0.4 Feature Table
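A minimal before/after sketch, assuming the 0.3 Feature Table declared its schema with Spark types and the 0.4 version uses `tecton.types`; the entity and column names are illustrative.

0.3:

```python
from pyspark.sql.types import StructType, StructField, StringType, LongType, TimestampType
from tecton.compat import FeatureTable

user_clicks = FeatureTable(
    name="user_clicks",
    entities=[user],
    schema=StructType([
        StructField("user_id", StringType()),
        StructField("timestamp", TimestampType()),
        StructField("click_count", LongType()),
    ]),
    ttl="7d",
)
```

0.4:

```python
from datetime import timedelta

from tecton import FeatureTable
from tecton.types import Field, String, Timestamp, Int64

user_clicks = FeatureTable(
    name="user_clicks",
    entities=[user],
    schema=[
        Field("user_id", String),
        Field("timestamp", Timestamp),
        Field("click_count", Int64),
    ],
    ttl=timedelta(days=7),
)
```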
Example 6: Converting a 0.3 On-Demand Feature View to a 0.4 On-Demand Feature View
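A minimal before/after sketch showing `RequestDataSource` -> `RequestSource`, `inputs` -> `sources`, and `output_schema` -> `schema`; the request fields and feature logic are illustrative.

0.3:

```python
from pyspark.sql.types import StructType, StructField, DoubleType, BooleanType
from tecton.compat import RequestDataSource, on_demand_feature_view, Input

transaction_request = RequestDataSource(
    request_schema=StructType([StructField("amount", DoubleType())])
)

@on_demand_feature_view(
    inputs={"transaction_request": Input(transaction_request)},
    mode="python",
    output_schema=StructType([StructField("transaction_amount_is_high", BooleanType())]),
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > 100}
```

0.4:

```python
from tecton import RequestSource, on_demand_feature_view
from tecton.types import Field, Float64, Bool

transaction_request = RequestSource(schema=[Field("amount", Float64)])

@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=[Field("transaction_amount_is_high", Bool)],
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] > 100}
```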
FAQs
How do Batch/Stream Window Aggregate Feature Views in 0.3 map to 0.4 Feature Views?

A 0.4 `batch_feature_view` that has the `aggregations` and `aggregation_interval` parameters set will behave the same as a 0.3 `batch_window_aggregate_feature_view` (the same is true for `stream_feature_view`). See the Batch Feature View Time-Windowed Aggregations section for more info.
When should I use incremental backfills?

When Tecton's built-in aggregations aren't an option, setting `incremental_backfills=True` will instruct Tecton to execute your query every `batch_schedule`, with each job being responsible for a single time-window aggregation. See the Incremental Backfill guide for more information.
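For illustration, a hypothetical Feature View using this setting might look like the sketch below; the source, entity, and window values are illustrative.

```python
from datetime import timedelta

from tecton import batch_feature_view, FilteredSource

@batch_feature_view(
    sources=[FilteredSource(transactions_batch, start_time_offset=timedelta(days=-29))],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    # Each materialization/backfill job computes one full 30-day window.
    incremental_backfills=True,
    ttl=timedelta(days=2),
)
def user_distinct_merchant_count_30d(transactions):
    return f"SELECT user_id, COUNT(DISTINCT merchant) AS distinct_merchant_count FROM {transactions} GROUP BY user_id"
```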
When should I use `FilteredSource`?

`FilteredSource` should be used whenever possible by Spark users (Databricks or EMR) to push down time filtering to Data Sources. This makes incremental materialization jobs much more efficient, since costly scans across long time ranges are avoided.
How long will Tecton 0.3 be supported?

See the Release Notes for details.