# Python Environments for On-Demand Feature Views
This feature is currently in Public Preview.
Build more powerful On-Demand Feature Views by leveraging popular Python packages available in Python Environments.
Python Environments for On-Demand Feature Views are isolated compute environments where transformations are run during Online feature retrieval. Specifying an environment enables the use of common Python libraries when building real-time features.
## Available Python Environments
Tecton publishes a set of Python Environments that include common feature transformation packages.
Python Environments are identified by a name and a version number, such as `tecton-python-core:0.1`. By pinning your environment, you can be sure that your transformation logic will continue to run reliably.
The following Python Environments are available for use:
- `tecton-python-core` is a lightweight environment with the minimal set of dependencies available
- `tecton-python-extended` offers a larger set of common feature transformation packages
The table below lists all available versions for these environments.
| Environment | Date published |
| --- | --- |
| `tecton-python-core:0.1` | 2023-07-26 |
| `tecton-python-extended:0.1` | 2023-07-26 |
## Specifying Environments for On-Demand Feature Views and Feature Services
The `environments` parameter on an On-Demand Feature View definition specifies the set of Environments that the transformation logic is compatible with. If the dependencies required for your feature view are available in multiple environments, you can include that set of environments in this list.
The `on_demand_environment` parameter on the Feature Service definition specifies the single environment that will be used when running all On-Demand Feature Views in that Feature Service during Online retrieval.
For example, let’s say we have:

- A Feature View with a dependency on `numpy`, which is available in both `tecton-python-core:0.1` and `tecton-python-extended:0.1`.
- A Feature View with a dependency on `fuzzywuzzy`, which is only available in `tecton-python-extended:0.1`.
- A Feature Service that contains both of these Feature Views.
```python
@on_demand_feature_view(
    sources=[transaction_request, user_transaction_amount_metrics],
    mode="python",
    schema=output_schema,
    environments=["tecton-python-core:0.1", "tecton-python-extended:0.1", "my-new-env"],
)
def my_on_demand_feature_view_with_numpy(request, user_metrics):
    import numpy

    ...


@on_demand_feature_view(
    sources=[search_request, product_description],
    mode="python",
    schema=output_schema,
    environments=["tecton-python-extended:0.1", "my-new-env"],
)
def my_on_demand_feature_view_with_fuzzywuzzy(request, user_metrics):
    import fuzzywuzzy

    ...


my_fs = FeatureService(
    name="my_fs_with_extended_environment",
    features=[my_on_demand_feature_view_with_numpy, my_on_demand_feature_view_with_fuzzywuzzy],
    on_demand_environment="tecton-python-extended:0.1",
)

my_fs_2 = FeatureService(
    name="my_fs_with_extended_environment:v2",
    features=[my_on_demand_feature_view_with_numpy, my_on_demand_feature_view_with_fuzzywuzzy],
    on_demand_environment="my-new-env",
)
```
Note that:

- If `environments` is not specified for an On-Demand Feature View, then it is assumed to be compatible with all environments.
- At execution time, all On-Demand Feature Views in a Feature Service must run in the same Environment. As a result, the `on_demand_environment` specified by the Feature Service must be on the `environments` list of every On-Demand Feature View included in the `features` list (or those Feature Views must not specify any `environments`). Conversely, if an On-Demand Feature View specifies an `environments` constraint, then any Feature Service that includes that Feature View must specify an `on_demand_environment` from that list.
- Configuring an `on_demand_environment` can have an impact on `get-features` latency. See the section below.
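To make the compatibility rule concrete, the following plain-Python sketch (not part of the Tecton SDK; the function name is hypothetical) checks a Feature Service's `on_demand_environment` against each Feature View's `environments` list:

```python
def incompatible_feature_views(on_demand_environment, feature_view_environments):
    """Return the Feature Views that cannot run in on_demand_environment.

    feature_view_environments maps a Feature View name to its `environments`
    list, or None when the Feature View specifies no constraint (and is
    therefore compatible with all environments).
    """
    return [
        name
        for name, envs in feature_view_environments.items()
        if envs is not None and on_demand_environment not in envs
    ]


# Mirrors the example above: only tecton-python-extended:0.1 (or my-new-env)
# is valid for a Feature Service that contains both Feature Views.
views = {
    "my_on_demand_feature_view_with_numpy": [
        "tecton-python-core:0.1", "tecton-python-extended:0.1", "my-new-env",
    ],
    "my_on_demand_feature_view_with_fuzzywuzzy": [
        "tecton-python-extended:0.1", "my-new-env",
    ],
}
print(incompatible_feature_views("tecton-python-extended:0.1", views))  # []
print(incompatible_feature_views("tecton-python-core:0.1", views))
# ['my_on_demand_feature_view_with_fuzzywuzzy']
```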
## Configuring Notebook and Testing environments to be compatible with package requirements
The Environment configurations above are used only during the online execution of On-Demand Feature Views. In order to develop and test these Feature Views in offline environments, you’ll need to make sure any relevant dependencies are installed.
Below are our suggestions on how to configure offline environments, but there are other ways to achieve the same goal of having the appropriate dependencies installed.
### Installing dependencies in your Notebook environment
**Databricks**

Install individual packages in your notebook with `%pip install`. Alternatively, copy the full set of dependencies for the relevant version into a `requirements.txt` file to install all the dependencies at once.

**EMR**

To install individual packages, see the documentation for installing PyPI packages in EMR notebooks.
### Installing dependencies in your Unit Testing environment
In order to run unit tests for On-Demand Feature Views with specific Python dependencies, ensure that the local Python environment executing the unit tests has the proper dependency versions installed.
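As a sketch, in `"python"` mode the transformation body takes plain dictionaries and returns a dictionary, so its logic can be exercised directly once the dependency is installed locally. The function and field names below are hypothetical:

```python
# A minimal unit-test sketch, assuming numpy is installed in the local
# environment running the tests.
import numpy as np


def transaction_amount_is_high(transaction_request):
    # Hypothetical transformation logic for illustration.
    return {
        "transaction_amount_is_high": bool(
            np.greater(transaction_request["amount"], 10_000)
        )
    }


# Pytest-style assertions run against the transformation directly:
assert transaction_amount_is_high({"amount": 15_000}) == {"transaction_amount_is_high": True}
assert transaction_amount_is_high({"amount": 5_000}) == {"transaction_amount_is_high": False}
```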
## Impact of using Environments on `get-features` latency
The total latency observed is highly dependent on the complexity of the On-Demand Feature View transformation. For example, if the transformation contains `sleep(1)`, then it will take at least 1 second to run.
Configuring the `on_demand_environment` for a Feature Service creates some per-request overhead on top of the transformation time when calling that Feature Service with the `get-features` API.
Executing transformations with an environment typically adds 20-50ms on top of the transformation time. This latency will be higher if there is a sudden spike in traffic, as the service scales to match the new load.
If the On-Demand Feature View includes another Feature View as a source, then it must wait for the upstream Feature View to return before executing, making the latency additive. Otherwise, the On-Demand Feature View will be executed in parallel with other Feature Views in the Feature Service.
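To make the additive-versus-parallel behavior concrete, here is a back-of-the-envelope sketch; all millisecond figures are made-up values for illustration:

```python
# Hypothetical per-component latencies in milliseconds.
batch_feature_lookup_ms = 30   # time to serve the upstream Feature View
odfv_transformation_ms = 10    # On-Demand Feature View execution time
environment_overhead_ms = 35   # per-request environment overhead (typically 20-50 ms)

# Independent ODFV: runs in parallel with the other Feature View lookups,
# so the request is bounded by the slowest branch.
parallel_ms = max(batch_feature_lookup_ms,
                  odfv_transformation_ms + environment_overhead_ms)

# ODFV with an upstream Feature View as a source: it must wait for the
# upstream lookup before executing, so the latencies add up.
chained_ms = (batch_feature_lookup_ms
              + odfv_transformation_ms
              + environment_overhead_ms)

print(parallel_ms)  # 45
print(chained_ms)   # 75
```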
To inspect the impact of your On-Demand Feature Views on the total latency of your `get-features` request, you can compare the `serverTimeSeconds` and `sloServerTimeSeconds` values in the `metadataOptions` response object. The `serverTimeSeconds` value represents the entire time it took for Tecton to fulfill the request, while the `sloServerTimeSeconds` measurement removes time spent on On-Demand Feature View execution.
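For example, the difference between the two measurements estimates the time attributable to On-Demand Feature View execution. The sketch below assumes the JSON response has already been parsed into a dictionary; the numeric values are made up for illustration:

```python
# Metadata fields from a get-features response, already parsed from JSON.
response_metadata = {
    "serverTimeSeconds": 0.085,     # total time Tecton spent on the request
    "sloServerTimeSeconds": 0.040,  # same, minus On-Demand Feature View execution
}

# Time attributable to On-Demand Feature View execution (including any
# environment overhead) is the difference between the two measurements.
odfv_seconds = (response_metadata["serverTimeSeconds"]
                - response_metadata["sloServerTimeSeconds"])
print(f"On-Demand execution time: {odfv_seconds * 1000:.0f} ms")  # 45 ms
```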