Version: Beta 🚧

The Feature Development Workflow

In Tecton, features are developed and tested in a notebook and then productionised as code with a GitOps workflow.

This gives the benefit of fast iteration speed in a notebook, while preserving DevOps best practices for productionisation like version control, code reviews, and CI/CD.

A typical development workflow for building a feature and testing it in a training data set looks like this:

  1. Create and validate a new feature definition in a notebook
  2. Run the feature pipeline interactively to ensure correct feature data
  3. Fetch a set of registered features from a workspace and create a new feature set
  4. Generate training data to test the new feature in a model
  5. Copy the new feature definition into your feature repo
  6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline
note

If you do not need to test the feature in a model, you can skip steps 3 and 4.

This page will walk through these steps in detail.

If you have not already done so, install the Tecton CLI on your local machine and in your notebook environment. Also ensure that your notebook has access to the relevant compute for your Feature Views.

1. Create and validate a local feature definition in a notebook

Any Tecton object can be defined and validated in a notebook. We call these definitions "local objects".

Simply write the definition in a notebook cell and call .validate() on the object. Tecton will ensure the definition is correct and run automatic schema validations on feature pipelines.

from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from datetime import datetime, timedelta

user = Entity(name="user", join_keys=["user_id"])

user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/fraud_demo/customers/data.pq",
        file_format="parquet",
        timestamp_field="signup_timestamp",
    ),
)


@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

Tecton objects must first be validated before they can be queried interactively. You can either explicitly validate objects with .validate() or call tecton.set_validation_mode('auto') once in your notebook for automatic lazy validations at the time of usage.

user_credit_card_issuer.validate()  # or call tecton.set_validation_mode('auto')
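The CASE expression in the query above keys off the first digit of the card number. As a quick sanity check, the same mapping can be sketched in plain Python (`issuer_for` is a hypothetical helper for illustration, not part of the Tecton SDK):

```python
def issuer_for(cc_num: int) -> str:
    # Mirrors the SQL CASE: the card number's first digit picks the issuer.
    return {
        "3": "AmEx",
        "4": "Visa",
        "5": "MasterCard",
        "6": "Discover",
    }.get(str(cc_num)[0], "other")

print(issuer_for(4111111111111111))  # Visa
print(issuer_for(9999000000000000))  # other
```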

Depending on remote objects

Remote objects are ones that are registered with workspaces. During development, local objects can depend on remote objects fetched from a workspace as part of their definitions.

For example, your team's production workspace may include a registered data source and entity that you want to build features off of. There is no need to rewrite those definitions in the notebook; simply fetch them from the workspace and use them in your new feature definition:

import tecton
from tecton import batch_feature_view, FilteredSource
from datetime import timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")

# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

2. Test objects interactively

Interactive methods can be called on objects to test their output or get additional information. Refer to the SDK Reference for available methods on Tecton objects.

start = datetime(2017, 1, 1)
end = datetime(2022, 1, 1)

# Get a range of historical feature data
df = user_credit_card_issuer.get_historical_features(start_time=start, end_time=end)

display(df.to_pandas())

|   | user_id | signup_timestamp | credit_card_issuer | _effective_timestamp |
|---|---------|------------------|--------------------|----------------------|
| 0 | user_709462196403 | 2017-04-06 00:50:31 | Visa | 2017-04-07 00:00:00 |
| 1 | user_687958452057 | 2017-05-08 16:07:51 | Discover | 2017-05-09 00:00:00 |
| 2 | user_884240387242 | 2017-06-15 19:33:18 | other | 2017-06-16 00:00:00 |
| 3 | user_205125746682 | 2017-09-03 03:42:14 | AmEx | 2017-09-04 00:00:00 |
| 4 | user_950482239421 | 2017-09-08 19:26:25 | Visa | 2017-09-09 00:00:00 |

3. Fetch a set of registered features from a workspace and create a new feature set

After creating a new feature you may want to test it in a new feature set for a model. You can do this by creating a local Feature Service object. As needed, additional features can be fetched from a workspace and added to the new Feature Service.

Commonly you may want to fetch a feature set from an existing Feature Service and add your new feature to it. You can get the list of features in a Feature Service by calling .features on it and then include that list in a new local Feature Service.

import tecton
from tecton import FeatureService

ws = tecton.get_workspace("prod")
features_list = ws.get_feature_service("fraud_detection").features

fraud_detection_v2 = FeatureService(
    name="fraud_detection_v2",
    features=features_list + [user_credit_card_issuer],
)
fraud_detection_v2.validate()
note

Tecton objects are immutable, so a new local Feature Service is created rather than the registered one being modified.

4. Generate training data to test the new feature in a model

Training data can be generated for a list of training events by calling get_historical_features(spine=training_events) on a Feature Service. Tecton will join in the historically accurate value of each feature for each event in the provided spine.

Feature values will be fetched from the Offline Store if they have been materialized offline and computed on the fly if not.

training_events = spark.read.parquet("s3://tecton.ai.public/tutorials/fraud_demo/transactions/")
training_data = fraud_detection_v2.get_historical_features(spine=training_events)

display(training_data.to_pandas())

|   | user_id | timestamp | merchant | amt | is_fraud | user_transaction_amount_averages__amt_mean_1d_1d | user_transaction_amount_averages__amt_mean_3d_1d | user_transaction_amount_averages__amt_mean_7d_1d | user_credit_card_issuer__credit_card_issuer | transaction_amount_is_higher_than_average__transaction_amount_is_higher_than_average |
|---|---------|-----------|----------|-----|----------|------|------|------|------|------|
| 0 | user_131340471060 | 2021-01-01 10:44:12 | Spencer-Runolfsson | 332.61 | 0 | nan | nan | nan | Visa | True |
| 1 | user_131340471060 | 2021-01-04 22:48:21 | Schroeder, Hauck and Treutel | 105.33 | 1 | nan | 332.61 | 332.61 | Visa | False |
| 2 | user_131340471060 | 2021-01-05 15:14:06 | O'Reilly, Mohr and Purdy | 15.39 | 0 | 105.33 | 105.33 | 218.97 | Visa | False |
| 3 | user_131340471060 | 2021-01-06 02:51:49 | Donnelly PLC | 66.07 | 0 | 15.39 | 60.36 | 151.11 | Visa | False |
| 4 | user_131340471060 | 2021-01-07 00:59:43 | Huel Ltd | 113.63 | 0 | 66.07 | 62.2633 | 129.85 | Visa | False |
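The point-in-time join behind get_historical_features can be sketched in plain Python (the data and the as_of function below are hypothetical stand-ins, not Tecton APIs): for each event, return the most recent feature value whose timestamp does not exceed the event time.

```python
from datetime import datetime

# Hypothetical feature rows: (user_id, feature_timestamp, value)
feature_rows = [
    ("user_1", datetime(2021, 1, 1), "Visa"),
    ("user_1", datetime(2021, 6, 1), "AmEx"),
]

def as_of(user_id: str, event_time: datetime):
    # Keep rows for this user at or before the event time, then take the
    # most recent one -- the "historically accurate" value for that event.
    candidates = [
        (ts, value)
        for uid, ts, value in feature_rows
        if uid == user_id and ts <= event_time
    ]
    return max(candidates)[1] if candidates else None

print(as_of("user_1", datetime(2021, 3, 15)))  # Visa
print(as_of("user_1", datetime(2021, 7, 1)))   # AmEx
```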

5. Copy definitions into your team's Feature Repository

Objects and helper functions can be copied directly into a Feature Repository for productionisation. References to remote workspace objects should be changed to local definitions in the repo.

note

You do not need to call .validate() in a Feature Repo. Validation is run on all objects during tecton apply.

features/user_credit_card_issuer.py

from tecton import batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Change to local object references
from entities import user
from data_sources import user_sign_ups

# Set materialization parameters to materialize this feature online and offline
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
    online=True,
    offline=True,
    feature_start_time=datetime(2017, 1, 1),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as credit_card_issuer
        FROM
            {user_sign_ups}
        """

feature_services/fraud_detection.py

from tecton import FeatureService
from features.user_transaction_amount_averages import user_transaction_amount_averages
from features.transaction_amount_is_higher_than_average import transaction_amount_is_higher_than_average
from features.user_credit_card_issuer import user_credit_card_issuer

fraud_detection = FeatureService(
    name="fraud_detection",
    features=[user_transaction_amount_averages, transaction_amount_is_higher_than_average],
)

# Add the new Feature Service and change the feature list to local feature references
fraud_detection_v2 = FeatureService(
    name="fraud_detection:v2",
    features=[transaction_amount_is_higher_than_average, user_transaction_amount_averages, user_credit_card_issuer],
)

6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline

Feature repositories get "applied" to workspaces using the Tecton CLI in order to register definitions and productionise feature pipelines.

To apply changes, follow these steps in your terminal:

  1. Log into your organization's Tecton account using tecton login my-org.tecton.ai
  2. Select the workspace you want to apply your changes to using tecton workspace select [workspace-name]
  3. Run tecton plan to check the changes that would be applied to the workspace.
  4. Run tecton apply to apply your changes.
tip

Many organizations integrate the tecton apply step into their CI/CD pipelines. This means that rather than running tecton apply directly in the CLI, you may simply create a git PR for your changes.
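As a sketch, a CI job that runs tecton plan on every push might look like the following hypothetical GitHub Actions fragment. The workflow name, workspace name, and authentication setup are placeholders; adapt them to your CI system and your organization's Tecton account.

```yaml
# Hypothetical CI fragment -- not an official Tecton-provided workflow.
name: tecton-plan
on: [pull_request]
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install tecton
      # Non-interactive authentication (e.g. an API key from your CI's
      # secret store) must be configured here; the interactive
      # `tecton login` flow does not work in CI.
      - run: tecton workspace select prod
      - run: tecton plan
```

A companion job would then run tecton apply on merges to the main branch, so reviewed changes are registered automatically.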

Development Workspaces

If you want to save your changes in Tecton without spinning up production services, you can apply your repo to a "development" workspace.

Development workspaces have no compute or serving resources and therefore incur no cost. They are primarily used to visualize feature pipelines and share your work.

To create a development workspace, run tecton workspace create [my-workspace] in your CLI.
