Version: 0.6

Creating Feature 2

In this topic, you will create and test the second feature, user_transaction_counts. This feature calculates the number of transactions (per user), over the last day, 30 days, and 90 days.

In your local feature repository, open the file features/batch_features/user_transaction_counts.py. In the file, uncomment the following code, which is a definition of the Feature View.

from tecton import batch_feature_view, FilteredSource, Aggregation
from entities import user
from data_sources.transactions import transactions
from datetime import datetime, timedelta


@batch_feature_view(
    sources=[FilteredSource(transactions)],
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column="transaction_id", function="count", time_window=timedelta(days=1)),
        Aggregation(column="transaction_id", function="count", time_window=timedelta(days=30)),
        Aggregation(column="transaction_id", function="count", time_window=timedelta(days=90)),
    ],
    online=True,
    offline=True,
    feature_start_time=datetime(2021, 1, 1),
    description="User transaction totals over a series of time windows, updated daily.",
    name="user_transaction_counts",
)
def user_transaction_counts(transactions):
    return f"""
        SELECT
            user_id,
            transaction_id,
            timestamp
        FROM
            {transactions}
        """

In your terminal, run tecton apply to apply this Feature View to your workspace.

The Feature View's transformation

The aggregations parameter

The @batch_feature_view decorator contains the aggregations parameter. The presence of this parameter indicates that a Feature View uses one or more built-in aggregations. Built-in aggregations are much easier to use than defining the equivalent aggregations on your own.

The aggregations parameter value specifies three Aggregation objects, which define three built-in aggregations. An Aggregation object takes three inputs: the column to aggregate, the function to apply to that column, and a time_window, which is the trailing time period over which the aggregation is computed.


The transformation function

Unlike the credit_card_issuer transformation function shown previously, the user_transaction_counts transformation function does not implement the transformation logic because its associated Feature View uses built-in aggregations.

The columns in the SELECT statement of the user_transaction_counts transformation function specify inputs to send to the Aggregations, as follows:

The columns, in order, are:

1. user_id — the column for the Aggregation function to group by. This is also the entity name. Entities are used as join keys when multiple features are joined together; you will see an example of this in part 2 of the tutorial.
2. transaction_id — the column value in the Aggregation (the column the function is applied to).
3. timestamp — the field name of the timestamp in the external data source.

Internally, the built-in Aggregation with time_window=timedelta(days=30) is translated into a SQL statement that is nearly equivalent to:

SELECT
    user_id,
    COUNT(transaction_id),
    timestamp
FROM
    {transactions}
WHERE
    timestamp >= [start timestamp of the current materialization time window] - INTERVAL 30 DAYS
    AND timestamp < [end timestamp of the current materialization time window]
GROUP BY
    user_id

Feature View output

When the Feature View runs, it outputs each aggregation in the following format.

<column name in the Aggregation>_<function name in the Aggregation>_<time_window value in the Aggregation>_<aggregation_interval value>

For example, when the user_transaction_counts Feature View runs, the column name for the 30-day aggregation is transaction_id_count_30d_1d. You will see the output for all of the Feature View columns when testing the Feature View in the next section.
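The naming scheme above can be reproduced with a small helper. This function is purely illustrative (it is not part of the Tecton SDK) and assumes whole-day time windows, which is all this Feature View uses:

```python
from datetime import timedelta


def _window_suffix(td: timedelta) -> str:
    # Render a whole-day timedelta in the shorthand used by output column
    # names, e.g. timedelta(days=30) -> "30d". Illustrative only.
    return f"{td.days}d"


def output_column_name(column: str, function: str,
                       time_window: timedelta,
                       aggregation_interval: timedelta) -> str:
    # <column>_<function>_<time_window>_<aggregation_interval>
    return "_".join([column, function,
                     _window_suffix(time_window),
                     _window_suffix(aggregation_interval)])


print(output_column_name("transaction_id", "count",
                         timedelta(days=30), timedelta(days=1)))
# -> transaction_id_count_30d_1d
```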

Test the Feature View

To test the Feature View interactively, follow these steps. Note that a unit test is not shown.

In your notebook, get the Feature View from the workspace:

fv = ws.get_feature_view("user_transaction_counts")

In your notebook, call the run method of the Feature View to get feature data for the timestamp range of 2022-01-01 to 2022-04-10, and display the generated feature values.

offline_features = fv.run(datetime(2022, 1, 1), datetime(2022, 4, 10)).to_spark().limit(10)
offline_features.show()

Sample Output:

user_id           | timestamp           | transaction_id_count_1d_1d | transaction_id_count_30d_1d | transaction_id_count_90d_1d
user_131340471060 | 2022-01-02 00:00:00 | 1                          | 1                           | 1
user_131340471060 | 2022-01-03 00:00:00 | 1                          | 2                           | 2
user_131340471060 | 2022-01-04 00:00:00 | 1                          | 3                           | 3
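The sample counts follow directly from the windowed-count semantics: with one transaction per day, the 1-day count stays at 1 while the 30-day and 90-day counts accumulate. A plain-Python sketch (illustrative data and logic, not the Tecton implementation) reproduces them:

```python
from datetime import datetime, timedelta

# One transaction per day for this user, as in the sample data.
txn_times = [datetime(2022, 1, 1), datetime(2022, 1, 2), datetime(2022, 1, 3)]


def windowed_count(as_of: datetime, window: timedelta) -> int:
    # Count transactions in [as_of - window, as_of), mirroring the
    # translated SQL shown earlier.
    return sum(1 for t in txn_times if as_of - window <= t < as_of)


for as_of in [datetime(2022, 1, 2), datetime(2022, 1, 3), datetime(2022, 1, 4)]:
    counts = tuple(windowed_count(as_of, timedelta(days=d)) for d in (1, 30, 90))
    print(as_of.date(), counts)
# -> 2022-01-02 (1, 1, 1)
#    2022-01-03 (1, 2, 2)
#    2022-01-04 (1, 3, 3)
```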

Materialization scheduling

The aggregation_interval parameter specifies how often materialization jobs run for Feature Views that use built-in aggregations. In this Feature View, aggregation_interval=timedelta(days=1), so a materialization job runs once per day.
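As a rough illustration (not Tecton internals), the daily materialization windows implied by aggregation_interval=timedelta(days=1) can be sketched with the standard library:

```python
from datetime import datetime, timedelta

interval = timedelta(days=1)  # aggregation_interval from the Feature View
start = datetime(2022, 1, 1)  # an arbitrary illustrative start date

# Each materialization job covers one interval: [start + i*interval, start + (i+1)*interval).
windows = [(start + i * interval, start + (i + 1) * interval) for i in range(3)]
for lo, hi in windows:
    print(lo.date(), "->", hi.date())
# -> 2022-01-01 -> 2022-01-02
#    2022-01-02 -> 2022-01-03
#    2022-01-03 -> 2022-01-04
```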
