Overview
Tecton 0.4 was released in June 2022 and includes the following framework improvements and changes:
- Snowflake support
- API simplification & improvements
- Materialization info diffs
Snowflake Support
Tecton 0.4 includes compatibility with Snowflake for processing and storing features. Once connected to a Snowflake warehouse, users can define features in Snowflake SQL or Snowpark.
```python
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="snowflake_sql",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column="TRANSACTION", function="sum", time_window=timedelta(days=1)),
        Aggregation(column="TRANSACTION", function="sum", time_window=timedelta(days=7)),
        Aggregation(column="TRANSACTION", function="sum", time_window=timedelta(days=40)),
        Aggregation(column="AMT", function="mean", time_window=timedelta(days=1)),
        Aggregation(column="AMT", function="mean", time_window=timedelta(days=7)),
        Aggregation(column="AMT", function="mean", time_window=timedelta(days=40)),
    ],
    online=True,
    feature_start_time=datetime(2020, 10, 10),
    description="User transaction totals over a series of time windows, updated daily.",
)
def user_transaction_metrics(transactions):
    return f"""
        SELECT
            USER_ID,
            1 as TRANSACTION,
            AMT,
            TIMESTAMP
        FROM
            {transactions}
        """
```
API Simplification and Improvements
0.4 includes a large set of changes to simplify and improve Tecton’s
declarative Feature Repository API.
SDK 0.4 maintains backwards compatibility through the `tecton.compat` submodule.
Users can upgrade from 0.3 to 0.4 without changing their Feature Repo by
importing Tecton objects from `tecton.compat` instead of `tecton`.
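For example, a repo that still uses 0.3-era names can run on the 0.4 SDK by changing only the import line (a sketch; the data source shown and its config values are illustrative, not part of these release notes):

```python
# Before (0.3):
# from tecton import BatchDataSource, HiveDSConfig

# After (0.4 SDK, keeping the 0.3-style definitions unchanged):
from tecton.compat import BatchDataSource, HiveDSConfig

# Illustrative data source -- database/table names are placeholders.
clicks = BatchDataSource(
    name="clicks",
    batch_ds_config=HiveDSConfig(
        database="demo",
        table="clicks",
        timestamp_column_name="timestamp",
    ),
)
```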
Functional Changes
- Removed `batch_window_aggregate_feature_view` and `stream_window_aggregate_feature_view` types.
  - `batch_feature_view` and `stream_feature_view` now support Tecton window aggregations.
  - Rationale: These object types overlapped significantly and unnecessarily increased the number of concepts that new users had to learn.
 
- Changes to materialization timestamp filtering.
  - During materialization, the output of Feature Views will now be automatically filtered to the materialization period (i.e. the window of time that is being backfilled or updated incrementally at steady state).
  - Data Sources no longer require a timestamp column to be defined because the time filter is now applied to the output of the Feature View.
  - Users have two options for optimizing query performance by pushing down timestamp filtering:
    - Handle time filtering with custom logic using the `materialization_context`.
    - Use `FilteredSource` to have Tecton automatically filter the Data Source to the correct period before the Feature View transformation is applied.
  - Rationale: Tecton's previous timestamp filtering logic worked well when a Feature View had exactly one Data Source and that Data Source had a timestamp column that was used directly as the Feature View feature time. Outside of that case, Tecton's timestamp filtering logic was unintuitive and a frequent source of bugs. The new logic should be simpler for most users while simultaneously providing more flexibility for power users.
  - See this batch feature view overview for more information.
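Conceptually, the automatic output filter keeps only rows whose feature timestamp falls inside the materialization period. A minimal pure-Python sketch of that behavior (illustrative helper, not a Tecton API):

```python
from datetime import datetime, timedelta

def filter_to_materialization_period(rows, window_start, window_end, ts_field="timestamp"):
    """Keep only rows whose timestamp falls in [window_start, window_end),
    mirroring the filter Tecton applies to Feature View output."""
    return [r for r in rows if window_start <= r[ts_field] < window_end]

rows = [
    {"user_id": "u1", "amt": 10.0, "timestamp": datetime(2022, 6, 1, 5)},
    {"user_id": "u2", "amt": 25.0, "timestamp": datetime(2022, 6, 2, 9)},  # outside the period
]
start = datetime(2022, 6, 1)
filtered = filter_to_materialization_period(rows, start, start + timedelta(days=1))
print([r["user_id"] for r in filtered])  # ['u1']
```

`FilteredSource` applies the same window to the Data Source *before* the transformation runs, which is what enables partition pruning.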
 
- Introduce “Incremental Backfilling” to Batch Feature Views.
  - `incremental_backfills` is a new parameter for Batch Feature Views that changes how Tecton backfills the feature view. If set to `True`, Tecton will backfill every period in the backfill window in its own job. In some cases (e.g. custom aggregations), this can lead to much simpler query definitions.
  - Rationale: Provide a means for users to easily and correctly implement Feature Views with custom aggregations.
  - See this guide for more info.
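To illustrate how `incremental_backfills=True` changes job planning, here is a rough sketch (plain Python, not Tecton internals) of splitting a backfill range into one job per `batch_schedule` period:

```python
from datetime import datetime, timedelta

def plan_backfill_jobs(feature_start_time, now, batch_schedule):
    """One job per schedule period, as with incremental_backfills=True."""
    jobs = []
    start = feature_start_time
    while start < now:
        end = min(start + batch_schedule, now)
        jobs.append((start, end))
        start = end
    return jobs

jobs = plan_backfill_jobs(datetime(2022, 6, 1), datetime(2022, 6, 8), timedelta(days=1))
print(len(jobs))  # 7 daily jobs, one per period
```

Because each job sees exactly one period, a query like "sum of transactions per user per day" can be written without window functions or period bookkeeping.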
 
- Configurable `data_delay` on Data Sources.
  - Replaces `schedule_offset`, a Feature View parameter.
  - By default, incremental (i.e. non-backfill) materialization jobs run immediately at the end of the batch schedule period. `data_delay` configures how long materialization jobs should wait to run after the end of a period, typically to ensure that all data has landed. For example, if a feature view has a `batch_schedule` of 1 day and one of its data source inputs has a `data_delay` of 1 hour, then incremental materialization jobs will run at 01:00 UTC (one hour after the period has ended).
  - Rationale: This parameter delays materialization due to upstream data delays, which logically fits as a Data Source property. Feature Views now inherit data delays from all dependent Data Sources.
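The scheduling rule from the example above can be sketched in plain Python (illustrative helper, not a Tecton API; we assume the largest delay across a Feature View's sources is what applies):

```python
from datetime import datetime, timedelta

def next_job_run_time(period_end, source_data_delays):
    """Incremental jobs run at the period end plus the max data_delay
    inherited from the Feature View's Data Sources (assumed semantics)."""
    return period_end + max(source_data_delays, default=timedelta(0))

# batch_schedule of 1 day; one source declares data_delay=1 hour.
period_end = datetime(2022, 6, 7)  # midnight UTC, end of the daily period
run_at = next_job_run_time(period_end, [timedelta(hours=1)])
print(run_at)  # 2022-06-07 01:00:00
```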
- Support custom names for aggregate features.
  - Allow users to set custom names for aggregate features. (Previously, users had to use Tecton auto-generated names like `amount_mean_7d_1d`.)
  - Example:

```python
@batch_feature_view(
    # ...
    aggregations=[
        Aggregation(
            name="transaction_amount_daily_avg",
            column="amount",
            function="mean",
            time_window=timedelta(days=1),
        ),
        Aggregation(
            name="transaction_amount_weekly_avg",
            column="amount",
            function="mean",
            time_window=timedelta(days=7),
        ),
    ],
)
def user_transaction_counts(transactions):
    return f"""
        SELECT
            user_id,
            timestamp,
            amount
        FROM {transactions}
        """
```
Non-functional Changes
- Tecton data types
  - Tecton now uses `tecton.types` when defining Feature View schemas and Request Data Sources.
  - Example:

```python
from tecton import on_demand_feature_view, RequestSource
from tecton.types import Int64, Bool, Field

transaction_request = RequestSource(schema=[Field("amount", Int64)])

@on_demand_feature_view(
    sources=[transaction_request],
    mode="python",
    schema=[Field("transaction_amount_is_high", Bool)],
)
def transaction_amount_is_high(transaction_request):
    return {"transaction_amount_is_high": transaction_request["amount"] >= 10000}
```

  - Rationale: Previously, Tecton used PySpark data types to define all schemas, which made PySpark a required dependency of the Tecton SDK; with Snowflake support, Tecton can now be used without Spark. Tecton will continue to use native data types (PySpark, Snowflake, etc.) in data-platform-specific contexts, e.g. when providing an explicit schema for a Spark Data Source.
- Use `timedelta` for duration parameters instead of pytime strings.
  - E.g. `time_window=timedelta(hours=12)` instead of `time_window="12h"`.
  - Rationale: Consistent with the API's usage of `datetime` objects, removes an API dependency on the pytime implementation, and is less ambiguous.
- Use functional style to define Feature View overrides in Feature Services.
  - Example:

```python
transaction_fraud_service = FeatureService(
    name="transaction_fraud_service",
    features=[
        # Select a subset of features from a feature view.
        transaction_features[["amount"]],
        # Rename a feature view and/or rebind its join keys. In this example, we want user features for both the
        # transaction sender and recipient, so we include the feature view twice and bind it to two different
        # feature service join keys.
        user_features.with_name("sender_features").with_join_key_map({"user_id": "sender_id"}),
        user_features.with_name("recipient_features").with_join_key_map({"user_id": "recipient_id"}),
    ],
)
```
Parameter/Class Changes
Class Renames/Changes
| 0.3 Definition | 0.4 Definition | 
|---|---|
| Data Sources | |
| BatchDataSource | BatchSource | 
| StreamDataSource | StreamSource | 
| FileDSConfig | FileConfig | 
| HiveDSConfig | HiveConfig | 
| KafkaDSConfig | KafkaConfig | 
| KinesisDSConfig | KinesisConfig | 
| RedshiftDSConfig | RedshiftConfig | 
| RequestDataSource | RequestSource | 
| SnowflakeDSConfig | SnowflakeConfig | 
| Feature Views | |
| @batch_window_aggregate_feature_view | @batch_feature_view | 
| @stream_window_aggregate_feature_view | @stream_feature_view | 
| Misc Classes | |
| FeatureAggregation | Aggregation | 
| New Classes | |
| - | AggregationMode | 
| - | KafkaOutputStream | 
| - | KinesisOutputStream | 
| - | FilteredSource | 
| Deprecated Classes in 0.3 | |
| Input | - | 
| BackfillConfig | - | 
| MonitoringConfig | - | 
Feature View/Table Parameter Changes
| 0.3 Definition | 0.4 Definition | 
|---|---|
| inputs | sources | 
| name_override | name | 
| aggregation_slide_period | aggregation_interval | 
| timestamp_key | timestamp_field | 
| batch_cluster_config | batch_compute | 
| stream_cluster_config | stream_compute | 
| online_config | online_store | 
| offline_config | offline_store | 
| output_schema | schema | 
| family | - (removed) | 
| schedule_offset | - (removed, see DataSource data_delay) | 
| monitoring.alert_email (nested) | alert_email | 
| monitoring.monitor_freshness (nested) | monitor_freshness | 
| monitoring.expected_freshness (nested) | expected_freshness | 
Data Source Parameter Changes
| 0.3 Definition | 0.4 Definition | 
|---|---|
| timestamp_column_name | timestamp_field | 
| batch_ds_config | batch_config | 
| stream_ds_config | stream_config | 
| raw_batch_translator | post_processor | 
| default_watermark_delay_threshold | watermark_delay_threshold | 
| default_initial_stream_position | initial_stream_position | 
Materialization info in `tecton plan`
`tecton plan` will now print a summary of the backfill and incremental
materialization jobs that will result from applying a plan. This feature should
help users avoid applying changes that trigger more new jobs than expected.
```
$ tecton apply
...
  + Create FeatureView
    name:            user_transaction_counts
    owner:           matt@tecton.ai
    description:     User transaction totals over a series of time windows, updated daily.
    materialization: 10 backfills, 1 recurring batch job
    > backfill:      9 Backfill jobs 2020-10-03 00:00:00 UTC to 2022-04-14 00:00:00 UTC writing to the Offline Store
                     1 Backfill job 2022-04-14 00:00:00 UTC to 2022-06-06 00:00:00 UTC writing to both the Online and Offline Store
    > incremental:   1 Recurring Batch job scheduled every 1 day writing to both the Online and Offline Store
```