Version: 0.5

Testing Batch Features

Import libraries and select your workspace

import tecton
import pandas
from datetime import datetime, timedelta

ws = tecton.get_workspace("prod")

Load a Batch Feature View

fv = ws.get_feature_view("user_transaction_counts")
fv.summary()

Run a Feature View transformation pipeline

The BatchFeatureView::run function performs a dry run of a Feature View transformation pipeline over a given time range. This is useful for checking the output of your feature transformation logic or for debugging a materialization job.

caution

The output is not guaranteed to match the feature values that would be materialized for the same time range. For example:

  • When using incremental backfills, feature data for a given time range may depend on multiple executions of the Feature View transformation pipeline.
  • Feature values may depend on scheduling information (e.g. batch_schedule, data_delay, feature_start_time) that doesn't match the start_time and end_time you provide.
  • Aggregations may require more input data than the window you provide with start_time and end_time.

If you want to produce feature values for a given time range, use get_historical_features(start_time, end_time) instead.
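To make the last caveat concrete, here is a minimal pure-Python sketch (not Tecton code) of a 3-day trailing count. The event data and the `trailing_count` helper are hypothetical and exist only for illustration. It shows how restricting the input to the requested range can change an aggregated value.

```python
from datetime import datetime, timedelta

# Hypothetical raw events: one transaction per day, Jan 1 through Jan 5.
events = [datetime(2022, 1, day) for day in range(1, 6)]


def trailing_count(timestamps, as_of, window):
    """Count events in the half-open interval [as_of - window, as_of)."""
    return sum(1 for ts in timestamps if as_of - window <= ts < as_of)


start_time = datetime(2022, 1, 4)
end_time = datetime(2022, 1, 6)
window = timedelta(days=3)

# Input restricted to [start_time, end_time), as a dry run over that range might see it.
in_range = [ts for ts in events if start_time <= ts < end_time]

full_history_value = trailing_count(events, end_time, window)
restricted_value = trailing_count(in_range, end_time, window)

print(full_history_value)  # 3 (Jan 3, Jan 4, Jan 5)
print(restricted_value)    # 2 (Jan 3 is missing from the input)
```

The two values differ because the 3-day window reaches back before start_time.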

result_dataframe = fv.run(start_time=datetime(2021, 1, 1), end_time=datetime(2022, 1, 2)).to_pandas()
display(result_dataframe)
   user_id            signup_timestamp     credit_card_issuer
0  user_600003278485  2021-01-01 06:25:57  other
1  user_469998441571  2021-01-01 07:16:06  Visa
2  user_502567604689  2021-01-01 04:39:10  Visa
3  user_930691958107  2021-01-01 10:52:31  Visa
4  user_782510788708  2021-01-01 20:15:25  other

Run with mock sources

Mock input data sources can be passed into the BatchFeatureView::run function using the same source names from the Feature View definition.

users_data = pandas.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_2"],
        "cc_num": ["423456789012", "567890123456", "678901234567"],
        "signup_timestamp": [
            datetime(2022, 1, 1, 2),
            datetime(2022, 1, 1, 4),
            datetime(2022, 1, 1, 3),
        ],
    }
)

result_dataframe = fv.run(
    start_time=datetime(2022, 1, 1),
    end_time=datetime(2022, 1, 2),
    users=users_data,  # `users` is the name of this FeatureView input.
).to_pandas()

display(result_dataframe)
   user_id  signup_timestamp     credit_card_issuer
0  user_1   2022-01-01 02:00:00  Visa
1  user_1   2022-01-01 04:00:00  MasterCard
2  user_2   2022-01-01 03:00:00  Discover

Run a Batch Feature View with tiled aggregations

BatchFeatureView::run works the same way for Feature Views with aggregations. The only difference is that it also supports the aggregation_level parameter.

When a Feature View uses tiled aggregations, the query runs in three logical steps:

  1. The Feature View query is run over the provided time range. The user-defined transformations are applied to the data source.
  2. The result of #1 is aggregated into tiles the size of the aggregation_interval.
  3. The tiles from #2 are combined to form the final feature values. The number of tiles combined is determined by the time_window of the aggregation.

To see the output of #1, use aggregation_level="disabled". For #2, use aggregation_level="partial". For #3, use aggregation_level="full".

aggregation_level="full" is the default behavior.

For more details on aggregate_tiles, refer to Creating Features that use Time-Windowed Aggregations.
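The three logical steps can be pictured with a small pure-Python sketch. This is not Tecton's implementation: the rows, the `tile_start` and `full_count` helpers, and the half-open window boundaries are assumptions chosen only to illustrate raw rows, per-interval tiles, and tiles combined over a time window.

```python
from collections import Counter
from datetime import datetime, timedelta

# Step 1 (comparable to aggregation_level="disabled"): one row per transformed event.
# Hypothetical rows of (user_id, timestamp).
rows = [
    ("user_1", datetime(2022, 5, 1, 1, 50)),
    ("user_1", datetime(2022, 5, 1, 7, 11)),
    ("user_1", datetime(2022, 5, 2, 15, 18)),
    ("user_2", datetime(2022, 5, 2, 19, 45)),
]

aggregation_interval = timedelta(days=1)
epoch = datetime(1970, 1, 1)


def tile_start(ts):
    """Floor a timestamp to the start of the tile that contains it."""
    return ts - (ts - epoch) % aggregation_interval


# Step 2 (comparable to aggregation_level="partial"): count rows per (user_id, tile).
tiles = Counter((user_id, tile_start(ts)) for user_id, ts in rows)


def full_count(user_id, as_of, time_window):
    """Step 3 (comparable to aggregation_level="full"): sum the tiles whose
    start falls in [as_of - time_window, as_of)."""
    return sum(
        count
        for (uid, start), count in tiles.items()
        if uid == user_id and as_of - time_window <= start < as_of
    )


as_of = datetime(2022, 5, 3)
print(tiles[("user_1", datetime(2022, 5, 1))])          # 2
print(full_count("user_1", as_of, timedelta(days=1)))   # 1
print(full_count("user_1", as_of, timedelta(days=30)))  # 3
```

Because the final values are built from tiles, a longer time_window needs more tiles, and therefore more input data, than a single aggregation_interval covers.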

agg_fv = ws.get_feature_view("user_transaction_counts")

result_dataframe = agg_fv.run(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    aggregation_level="disabled",
).to_pandas()

display(result_dataframe)
   user_id            transaction  timestamp
0  user_222506789984  1            2022-05-01 21:04:38
1  user_26990816968   1            2022-05-01 19:45:14
2  user_337750317412  1            2022-05-01 15:18:48
3  user_337750317412  1            2022-05-01 07:11:31
4  user_337750317412  1            2022-05-01 01:50:51

result_dataframe = agg_fv.run(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    aggregation_level="partial",
).to_pandas()

display(result_dataframe)
   user_id            transaction_count_1d  tile_start_time      tile_end_time
0  user_222506789984  1                     2022-05-01 00:00:00  2022-05-02 00:00:00
1  user_26990816968   1                     2022-05-01 00:00:00  2022-05-02 00:00:00
2  user_337750317412  4                     2022-05-01 00:00:00  2022-05-02 00:00:00
3  user_402539845901  2                     2022-05-01 00:00:00  2022-05-02 00:00:00
4  user_461615966685  1                     2022-05-01 00:00:00  2022-05-02 00:00:00

end = datetime(2022, 5, 2)

# Note: to get an interesting "full" aggregation, we need to provide adequate input data.
result_dataframe = agg_fv.run(
    start_time=end - timedelta(days=90),
    end_time=end,
    aggregation_level="full",
).to_pandas()

display(result_dataframe)
   user_id            timestamp            transaction_count_1d_1d  transaction_count_30d_1d  transaction_count_90d_1d
0  user_131340471060  2022-04-30 00:00:00  1                        6                         22
1  user_131340471060  2022-04-23 00:00:00  1                        6                         21
2  user_131340471060  2022-04-18 00:00:00  1                        7                         20
3  user_131340471060  2022-04-15 00:00:00  2                        7                         19
4  user_131340471060  2022-04-08 00:00:00  1                        6                         17

Get a Range of Feature Values from the Offline Store

BatchFeatureView::get_historical_features can read a range of feature values from the offline store between a given start_time and end_time.

Pass from_source=True to bypass the offline store and compute features on the fly against the raw data source. This is useful for testing the expected output of feature values.

Use from_source=False (default) to see what data is materialized in the offline store.
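One way to use the two modes together is to compare what is materialized against what is recomputed from the source. The sketch below is a hypothetical, pure-Python helper, not part of the Tecton SDK. It assumes each result has already been converted to a list of row dictionaries (for example with pandas `DataFrame.to_dict("records")`) and that both sides contain the same columns.

```python
def diff_feature_rows(materialized, recomputed, key_columns=("user_id", "timestamp")):
    """Return {key: (materialized_row, recomputed_row)} for keys whose rows differ.

    A row that is missing on one side is reported as None.
    """

    def index(rows):
        return {tuple(row[c] for c in key_columns): row for row in rows}

    left, right = index(materialized), index(recomputed)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(left.keys() | right.keys())
        if left.get(key) != right.get(key)
    }


# Hypothetical rows standing in for the two query results.
materialized = [
    {"user_id": "user_1", "timestamp": "2022-05-01", "transaction_count_1d_1d": 2},
    {"user_id": "user_2", "timestamp": "2022-05-01", "transaction_count_1d_1d": 1},
]
recomputed = [
    {"user_id": "user_1", "timestamp": "2022-05-01", "transaction_count_1d_1d": 2},
    {"user_id": "user_2", "timestamp": "2022-05-01", "transaction_count_1d_1d": 3},
]

mismatches = diff_feature_rows(materialized, recomputed)
print(mismatches)
```

Here only the user_2 row is reported, because its materialized count differs from the recomputed one. Columns that appear on one side only would need to be dropped before comparing.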

result_dataframe = fv.get_historical_features(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2)
).to_pandas()
display(result_dataframe)
   user_id            timestamp            transaction_count_1d_1d  transaction_count_30d_1d  transaction_count_90d_1d  _effective_timestamp
0  user_205125746682  2022-05-01 00:00:00  2                        8                         34                        2022-05-01 00:00:00
1  user_222506789984  2022-05-01 00:00:00  1                        42                        141                       2022-05-01 00:00:00
2  user_268514844966  2022-05-01 00:00:00  1                        29                        66                        2022-05-01 00:00:00
3  user_394495759023  2022-05-01 00:00:00  1                        21                        68                        2022-05-01 00:00:00
4  user_459842889956  2022-05-01 00:00:00  1                        14                        39                        2022-05-01 00:00:00

Read the Latest Features from Online Feature Store

danger

For performance reasons, this function should only be used for testing and not in a production environment. To read features online efficiently, see Reading Features for Inference.

fv.get_online_features({"user_id": "user_609904782486"}).to_dict()
Out: {
    "transaction_count_1d_1d": 1,
    "transaction_count_30d_1d": 17,
    "transaction_count_90d_1d": 56,
}

Read Historical Features from Offline Feature Store with Time-Travel

Create a spine DataFrame with events to look up. For more information on spines, check out Selecting Sample Keys and Timestamps.

spine_df = pandas.DataFrame(
    {
        "user_id": ["user_722584453020", "user_461615966685"],
        "timestamp": [datetime(2022, 5, 1, 3, 20, 0), datetime(2022, 6, 6, 2, 30, 0)],
    }
)
display(spine_df)
   user_id            timestamp
0  user_722584453020  2022-05-01 03:20:00
1  user_461615966685  2022-06-06 02:30:00

Pass from_source=True to bypass the offline store and compute features on the fly against the raw data source. This is slower than reading feature data that has already been materialized to the offline store.

result_dataframe = fv.get_historical_features(spine_df, from_source=True).to_pandas()
display(result_dataframe)
   user_id            timestamp            user_transaction_counts__transaction_count_1d_1d  user_transaction_counts__transaction_count_30d_1d  user_transaction_counts__transaction_count_90d_1d
0  user_461615966685  2022-06-06 02:30:00  0                                                 13                                                 40
1  user_722584453020  2022-05-01 03:20:00  0                                                 28                                                 73
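Conceptually, a time-travel lookup returns, for each spine row, the feature value that was current at that row's timestamp. The pure-Python sketch below illustrates that "as of" idea only. The `history` data and the `point_in_time_lookup` helper are hypothetical, and the sketch ignores scheduling details such as batch_schedule and data_delay. Returning None when no earlier value exists is also an assumption.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history per user: (feature timestamp, value), sorted by timestamp.
history = {
    "user_1": [
        (datetime(2022, 5, 1), 10),
        (datetime(2022, 5, 2), 12),
        (datetime(2022, 5, 3), 11),
    ]
}


def point_in_time_lookup(user_id, event_time):
    """Return the latest value whose timestamp is at or before event_time, else None."""
    rows = history.get(user_id, [])
    timestamps = [ts for ts, _ in rows]
    position = bisect_right(timestamps, event_time)
    return rows[position - 1][1] if position else None


# Hypothetical spine of (user_id, event timestamp).
spine = [
    ("user_1", datetime(2022, 5, 2, 3, 20)),
    ("user_1", datetime(2022, 4, 30)),
    ("user_2", datetime(2022, 5, 2)),
]
results = [point_in_time_lookup(user_id, ts) for user_id, ts in spine]
print(results)  # [12, None, None]
```

The first spine row gets the May 2 value rather than the later May 3 value, which is the property that keeps future data from leaking into training examples.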
