0.5 to 0.6 Upgrade Guide
Sunsetting Python 3.7 support
Starting in 0.6, the Tecton SDK and CLI no longer run in Python 3.7 environments. The Tecton SDK and CLI retain compatibility with Python 3.8 and Python 3.9.
⚠️ In rare cases, updating Python versions can cause Tecton to identify an unexpected diff in transformation logic. In these scenarios, it's typically safe to use the --suppress-recreates option to override the diff. Tecton recommends updating your Python version separately from your Tecton SDK version. For example, if you are currently using Python 3.7 with Tecton 0.5, you could first update to Python 3.8, and then perform the Tecton 0.6 upgrade.
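Before upgrading, it can help to confirm which interpreter an environment actually runs. The following is a minimal sketch, not part of the Tecton SDK: the helper name `is_supported_python` and the `SUPPORTED_MINORS` constant are hypothetical, and the version set simply mirrors the versions named above.

```python
import sys

# Minor versions this guide lists as compatible with the 0.6 SDK and CLI.
# (Hypothetical constant; adjust if your Tecton release notes say otherwise.)
SUPPORTED_MINORS = {(3, 8), (3, 9)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the interpreter is one of the versions named in this guide."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "listed as supported" if is_supported_python() else "not listed as supported"
    print(f"Python {current} is {status} for Tecton 0.6")
```

Running this in the environment used for `tecton plan` (for example, in CI) gives an early signal before the SDK upgrade itself.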
Sample Upgrade Process for Feature Repositories
This pull request shows the upgrade process from 0.5.5 to 0.6 for a sample Feature Repository.
Breaking changes to Feature Repositories
Changes to default feature names when using the last_distinct() aggregation
Impact: Feature Views using the last_distinct() aggregation will cause a tecton plan error unless feature names are explicitly defined.
With the introduction of the last() aggregation function, Tecton has changed the default feature name for last_distinct() aggregations to avoid confusion between the two functions.
Previously, when using the last_distinct() aggregation and not specifying the name argument, the default name would be set based on the number of values to be returned, the aggregation time window, and the aggregation interval. For example, the following Aggregation definition would result in a feature column named my_column_lastn_15_7d_1d.
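from datetime import timedelta

from tecton import Aggregation, batch_feature_view
from tecton.aggregation_functions import last_distinct

@batch_feature_view(
    # ...
    aggregations=[Aggregation(column="my_column", function=last_distinct(15), time_window=timedelta(days=7))],
    aggregation_interval=timedelta(days=1),
)
def my_fv(data_source):
    pass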
In 0.6, the new default name will be my_column_last_distinct_15_7d_1d.
To upgrade to 0.6 when you previously relied on the default feature name, set the name argument to match the legacy naming convention. For example:
@batch_feature_view(
    # ...
    aggregations=[
        Aggregation(
            column="my_column",
            function=last_distinct(15),
            time_window=timedelta(days=7),
            name="my_column_lastn_15_7d_1d",
        )
    ],
    aggregation_interval=timedelta(days=1),
)
def my_fv(data_source):
    pass
If the explicitly set name matches the existing one, then no difference should show during tecton plan.
If you do not set the name parameter, you will see an error during the upgrade process.
$ tecton plan
Using workspace "prod" on cluster https://your-instance.tecton.ai
Imported 47 Python modules from the feature repository
Collecting local feature declarations
Performing server-side feature validation: Finished generating plan.
Errors in `user_recent_transactions`(FeatureView) while changing SDK from 0.5.5 to 0.6.0. The default aggregation column name was changed in this SDK from:
amt_lastn10_1h_10m -> amt_last_distinct_10_1h_10m,
please explicitly set 'name' to the legacy name to avoid rematerializing the feature view, such as Aggregation(..., name="amt_lastn10_1h_10m")
=================== StreamFeatureView user_recent_transactions declared in fraud/features/stream_features/last_transactions.py ===================
0025: def user_recent_transactions(transactions):
0026: return f'''
0027: SELECT
0028: user_id,
0029: cast(amt as string) as amt,
0030: timestamp
0031: FROM
0032: {transactions}
0033: '''
0034:
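When many Feature Views are affected, the legacy names can be collected from the plan output rather than copied by hand. The sketch below is illustrative only: `extract_renames` is a hypothetical helper, and it assumes the rename lines keep the `legacy -> new,` shape shown in the error above.

```python
import re

# Matches lines such as "amt_lastn10_1h_10m -> amt_last_distinct_10_1h_10m,"
# (assumed format, taken from the sample error output in this guide).
RENAME_LINE = re.compile(r"^\s*(\w+)\s*->\s*(\w+),?\s*$")


def extract_renames(plan_output):
    """Return a list of (legacy_name, new_default_name) pairs found in the text."""
    renames = []
    for line in plan_output.splitlines():
        match = RENAME_LINE.match(line)
        if match:
            renames.append((match.group(1), match.group(2)))
    return renames


sample_output = """The default aggregation column name was changed in this SDK from:
amt_lastn10_1h_10m -> amt_last_distinct_10_1h_10m,
please explicitly set 'name' to the legacy name"""

for legacy, new in extract_renames(sample_output):
    # Each legacy name is the value to pass as Aggregation(..., name=legacy).
    print(f'name="{legacy}"  # instead of the new default {new}')
```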
Feature View Unit Testing Changes
Use Tecton SDK version 0.6.5 or higher when upgrading to test_run().
Impact: Unit tests run during tecton plan will fail unless updated to use the new interfaces.
Tecton has made a few minor changes to methods used for running unit tests:
- FeatureView.run() has been renamed to FeatureView.test_run(). This new name helps differentiate between the method for unit testing and the method for interactive execution in notebook environments.
- start_time and end_time are now required parameters for BatchFeatureView.test_run() and StreamFeatureView.test_run().
- FeatureView.test_run() does not have a spark parameter for specifying the Spark session. By default, FeatureView.test_run() will use the Tecton-defined Spark session. You can override the Spark session with tecton.set_tecton_spark_session().
- Some internal changes were made to ensure the unit testing code path appropriately reflects the production code path. It's possible some minor changes in behavior will cause tests to fail.
- The old FeatureView.run() returned Spark DataFrames, whereas FeatureView.test_run() returns Tecton DataFrames. You may need to migrate code like fv.run().toPandas() to fv.test_run().to_pandas().
- Some internal changes were made to how the data source schema is processed when running unit tests. Some data source schema fields (like batch_config.timestamp_field and batch_config.date_time_partition_columns) are required to be set in the mock data.
See the Unit Testing guide for more details on how to write unit tests with Tecton 0.6.
To upgrade:
- Replace use of FeatureView.run() with FeatureView.test_run().
- Add start_time and end_time parameters to the test_run() call for batch and stream feature views. Typically the time range should span your test data.
- If you were using the Tecton-provided Spark session, remove use of the spark parameter from the test_run() call.
- If you were initializing a Spark session separately, use the tecton.set_tecton_spark_session() method prior to the test_run() call.
- Ensure that the mock data schema matches the schema of the real data source. Fields that are used in the Feature View (e.g. in the SQL query) or in the Data Source config (e.g. timestamp_field and date_time_partition_columns) are required to be provided.
- Since test_run() now returns a TectonDataFrame, you'll need to convert your results to Spark or pandas DataFrames using tecton_df.to_spark() and tecton_df.to_pandas() respectively.
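Because start_time and end_time are now required and should span the test data, one option is to derive them from the mock rows instead of hard-coding them. This is a minimal sketch using only the standard library: `spanning_time_range` is a hypothetical helper, and aligning the range to whole days is an assumption made for illustration, not a Tecton requirement stated in this guide.

```python
from datetime import datetime, timedelta


def spanning_time_range(timestamps):
    """Return (start_time, end_time) covering every timestamp in the mock data.

    start_time is midnight on the day of the earliest row; end_time is midnight
    of the day after the latest row, so every row falls strictly before end_time.
    """
    if not timestamps:
        raise ValueError("mock data must contain at least one timestamp")
    first_day = min(timestamps).replace(hour=0, minute=0, second=0, microsecond=0)
    last_day = max(timestamps).replace(hour=0, minute=0, second=0, microsecond=0)
    return first_day, last_day + timedelta(days=1)


# Timestamps as they might appear in the timestamp column of the mock data.
mock_timestamps = [datetime(2022, 5, 1, 12, 0), datetime(2022, 5, 3, 8, 30)]
start_time, end_time = spanning_time_range(mock_timestamps)
# These values would then be passed to the unit test call, for example
# my_fv.test_run(start_time=start_time, end_time=end_time, ...).
```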
For Tecton on Snowflake, SnowflakeConfig definitions no longer allow database or schema when they are using the query parameter
Impact: tecton plan will fail if SnowflakeConfig specifies both query and database or schema parameters.
Previously, declaring a SnowflakeConfig for use with a Batch Data Source for Tecton on Snowflake always required setting the database and schema parameters, even though they were ignored when using the query parameter instead. Now database and schema cannot be set together with query.
To upgrade, remove the database and schema parameters from your SnowflakeConfig definition.
During tecton plan, you will see an Upgrade operation. This upgrade will not cause any operational impact.
~ Upgrade Batch Data Source
name: transactions
snowflake_ds_config.database: TECTON_DEMO_DATA ->
snowflake_ds_config.schema: FRAUD ->
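The rule described in this section can be expressed as a small pre-flight check over the keyword arguments intended for SnowflakeConfig. This is only a sketch: `conflicting_snowflake_params` is a hypothetical helper that works on a plain dict and does not call the Tecton SDK.

```python
def conflicting_snowflake_params(config_kwargs):
    """Return the parameter names that can no longer accompany `query`.

    config_kwargs is a plain dict of the keyword arguments you intend to pass
    to SnowflakeConfig; nothing here touches the Tecton SDK.
    """
    if config_kwargs.get("query") is None:
        return []
    return [name for name in ("database", "schema") if config_kwargs.get(name) is not None]


legacy_kwargs = {
    "query": "SELECT * FROM FRAUD.TRANSACTIONS",  # illustrative query
    "database": "TECTON_DEMO_DATA",
    "schema": "FRAUD",
}
to_remove = conflicting_snowflake_params(legacy_kwargs)
# Dropping these keys yields the arguments that 0.6 accepts alongside `query`.
upgraded_kwargs = {k: v for k, v in legacy_kwargs.items() if k not in to_remove}
```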
temp_s3 parameter is removed from RedshiftConfig
Impact: RedshiftConfig definitions using the temp_s3 parameter will cause an error during tecton plan.
The temp_s3 parameter previously had no effect, since this setting was moved to a backend, cluster-level configuration. Removing the parameter does not have any operational impact.
Non-breaking changes to Feature Repositories
prevent_destroy tag is now a top-level parameter
Impact: tecton plan will show a warning if a Feature View or Feature Service uses a tag parameter with a prevent_destroy attribute. This will become an error in future versions.
Previously, adding tags={"prevent_destroy": "true"} would fail any tecton plan that caused the object to be recreated.
Now you can achieve the same functionality by using the prevent_destroy parameter for Feature View or Feature Service definitions.
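The migration amounts to moving one key out of the tags dictionary. The sketch below shows that split on a plain dict; `split_prevent_destroy` is a hypothetical helper, and treating the new top-level parameter as a boolean is an assumption to verify against the SDK reference.

```python
def split_prevent_destroy(tags):
    """Separate the legacy prevent_destroy tag from the remaining tags.

    Returns (remaining_tags, prevent_destroy), where prevent_destroy is True
    only if the legacy tag was set to "true" (case-insensitive).
    """
    remaining = {key: value for key, value in tags.items() if key != "prevent_destroy"}
    prevent_destroy = str(tags.get("prevent_destroy", "false")).lower() == "true"
    return remaining, prevent_destroy


legacy_tags = {"prevent_destroy": "true", "team": "fraud"}
tags, prevent_destroy = split_prevent_destroy(legacy_tags)
# The definition would then use tags=tags and prevent_destroy=prevent_destroy
# (assumed boolean) instead of carrying the flag inside tags.
```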
New default databricks_version and emr_version parameters
Impact: Unless modified, materialization jobs will begin to run on new Databricks Runtimes or EMR versions.
During your first tecton plan using 0.6, Tecton on Databricks and Tecton on EMR users will see an Update to the new Databricks runtimes and EMR versions, respectively.
If you did not previously specify batch_compute or stream_compute, you will see a warning about the Spark version change.
~ Update Stream Feature View
name: user_ad_impression_counts
owner: matt@tecton.ai
description: The count of impressions between a given user and a given ad
warning: Changing spark version for stream materialization from 9.1.x-scala2.12 to 10.4.x-scala2.12. Though uncommon, feature computation behavior could change across different versions.
If you did specify batch_compute or stream_compute, then there will also be a diff showing the pinned_spark_version change.
~ Update Stream Feature View
name: content_keyword_click_counts
owner: ravi@tecton.ai
description: The count of ad impressions for a content_keyword
batch_compute.new_databricks.pinned_spark_version: -> 10.4.x-scala2.12
stream_compute.new_databricks.pinned_spark_version: -> 10.4.x-scala2.12
warning: Changing spark version for batch materialization from 9.1.x-scala2.12 to 10.4.x-scala2.12. Though uncommon, feature computation behavior could change across different versions.
warning: Changing spark version for stream materialization from 9.1.x-scala2.12 to 10.4.x-scala2.12. Though uncommon, feature computation behavior could change across different versions.
If you would like to remain on the prior Spark version, specify the databricks_version or emr_version parameter. Otherwise no action is needed.
aggregation_mode is deprecated; use stream_processing_mode
Impact: No behavior change in 0.6.
In versions prior to 0.6, Stream Feature Views used aggregation_mode=AggregationMode.TIME_INTERVAL or aggregation_mode=AggregationMode.CONTINUOUS to configure sliding or continuous aggregations.
Now that continuous processing is available for Stream Feature Views without aggregations, the aggregation_mode parameter is being deprecated and replaced by stream_processing_mode.
If you did not explicitly set aggregation_mode, then this change has no impact on your repository.
To upgrade:
- Update imports for relevant Feature View definitions: change from tecton import AggregationMode to from tecton import StreamProcessingMode.
- If your Feature View has set aggregation_mode=AggregationMode.TIME_INTERVAL, replace it with stream_processing_mode=StreamProcessingMode.TIME_INTERVAL.
- If your Feature View has set aggregation_mode=AggregationMode.CONTINUOUS, replace it with stream_processing_mode=StreamProcessingMode.CONTINUOUS.
If done correctly, no difference will show when running tecton plan.
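For repositories with many Stream Feature Views, the renames listed above can be applied mechanically. The following is a rough sketch of such a rewrite over source text: `migrate_aggregation_mode` is a hypothetical helper, and it assumes the identifiers are not aliased or used for unrelated purposes in your files, so review the resulting diff before committing.

```python
import re


def migrate_aggregation_mode(source):
    """Rewrite aggregation_mode/AggregationMode usages to the 0.6 names."""
    # Rename the keyword argument, preserving any spacing before "=".
    source = re.sub(r"\baggregation_mode(\s*)=", r"stream_processing_mode\1=", source)
    # Rename the enum wherever it appears, including in import statements.
    return re.sub(r"\bAggregationMode\b", "StreamProcessingMode", source)


before = (
    "from tecton import AggregationMode\n"
    "aggregation_mode=AggregationMode.CONTINUOUS,\n"
)
after = migrate_aggregation_mode(before)
print(after)
```

Running tecton plan afterwards is the real check: as noted above, a correct migration should show no difference.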
Import changed for materialization_context
Impact: tecton plan will show an Upgrade for relevant Transformations. No operational effect.
During tecton plan, you will see an Upgrade operation for any Transformation that referenced the materialization_context. The diff will show a change to the imports automatically configured by Tecton; no action is needed.
~ Upgrade Transformation
name: x
user_function.body:
-from tecton_spark.materialization_context import materialization_context
+from tecton_core.materialization_context import materialization_context
def x(ds, materialization_context=materialization_context()):
    return f'select * from {ds}'
Breaking changes to interactive SDK Objects
Impact: Attempting to reference any of these properties/attributes with the 0.6 SDK will cause an error.
Removed properties for Feature Views
Removed Property | Replacement |
---|---|
FeatureView.features | FeatureView.get_feature_columns() |
FeatureView.timestamp_field | FeatureView.get_timestamp_field() |
FeatureView.is_on_demand | isinstance(feature_view, OnDemandFeatureView) |
FeatureView.is_temporal | isinstance(fv, (BatchFeatureView, StreamFeatureView)) and len(fv.aggregations) == 0 |
FeatureView.is_temporal_aggregate | isinstance(fv, (BatchFeatureView, StreamFeatureView)) and len(fv.aggregations) > 0 |
Removed properties for Feature Services
Removed Property | Replacement |
---|---|
FeatureService.features | FeatureService.get_feature_columns(). Note that FeatureService.features has been repurposed to return List[FeatureReference]. |
FeatureService.logging | None |
FeatureService.feature_views | None. See the example below to obtain the list of distinct Feature Views in a Feature Service. |
feature_views = [ref.feature_definition for ref in feature_service.features]
deduplicated_feature_views = set(feature_views)
Removed properties for Data Sources
Removed Property | Replacement |
---|---|
DataSource.columns | DataSource.get_columns() |
Non-breaking changes to interactive SDK Objects
Deprecated properties for Feature Views
Deprecated Property | Replacement |
---|---|
FeatureView.max_data_delay | FeatureView.max_source_data_delay |
Deprecated properties for Data Sources
Deprecated Property | Replacement |
---|---|
DataSource.is_streaming | isinstance(ds, StreamSource) |
Non-breaking CLI Changes
tecton api-key is deprecated; use tecton service-account
The tecton service-account command introduced in 0.6 is the preferred way to create and manage Service Accounts from the command line.
tecton api-key can still be used to create Service Accounts, but may be missing some options, such as configuring the Service Account name.
tecton api-key is deprecated in 0.6 and will be removed in future Tecton versions.