Setup
Before you begin, we recommend reading the Tecton Concepts page for an understanding of key Tecton concepts and product interfaces.
Follow these steps to set up your environment prior to beginning the tutorial.
Connect Tecton to a data platform (if not already connected)
If your Tecton instance is not already connected to a data platform (Databricks or EMR), follow these steps:
Databricks
Follow the steps in Configuring Databricks.
Follow the steps in Connecting Databricks Notebooks.
EMR
Follow the steps in Configuring EMR.
Follow the steps in Connecting EMR Notebooks up to, but not including, the section Configure the notebook.
Install the Tecton CLI
If you have not already done so, install the Tecton CLI on your local machine.
Clone the tutorial feature repository
Clone the tutorial feature repository, which is located on GitHub.
This repository contains only the skeleton structure for the feature pipelines you will build. As you work through this tutorial, you will add the feature definitions and other feature-related code.
Log in to your Tecton account with the Tecton CLI
In your terminal, run:
tecton login <account-name>.tecton.ai
Replace <account-name> with your Tecton account name. This is the same as the URL for your Web UI.
Logging in to your account allows you to create and manage workspaces, apply feature repo definitions to a workspace, manage API keys, and more.
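Because the login address matches the Web UI URL, one way to avoid typos is to derive it from the URL you already have open in your browser. The following is a minimal sketch, not part of the Tecton CLI or SDK: the helper name login_host_from_web_ui_url, the account name acme, and the /app path are all hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def login_host_from_web_ui_url(web_ui_url: str) -> str:
    """Return the host part of a Web UI URL (hypothetical helper)."""
    # urlparse only recognizes the host when a scheme is present,
    # so add one if the caller pasted a bare host name.
    if "://" not in web_ui_url:
        web_ui_url = "https://" + web_ui_url
    host = urlparse(web_ui_url).hostname
    if not host:
        raise ValueError(f"Could not find a host name in {web_ui_url!r}")
    return host


# "acme" is a made-up account name; substitute your own Web UI URL.
print("tecton login " + login_host_from_web_ui_url("https://acme.tecton.ai/app"))
```

The printed line is the command to paste into your terminal.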
Create a workspace
You will initially create a development workspace to use for testing your features during early development. Later, you will create a live workspace to productionize features.
Run the following command to create a development workspace, replacing <workspace name> with a name of your choosing, such as tecton-fundamentals-tutorial.
tecton workspace create <workspace name>
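If you prefer to script this step, the command can be assembled in Python before it is run. This is only a sketch under stated assumptions: the helper workspace_create_command is hypothetical, and its one check (rejecting an empty name or a placeholder that still has angle brackets) is a guard against copy-paste mistakes, not a statement of Tecton's naming rules.

```python
import subprocess


def workspace_create_command(workspace_name: str) -> list[str]:
    """Build the argument list for `tecton workspace create` (hypothetical helper)."""
    name = workspace_name.strip()
    # Catch the common mistake of leaving the placeholder unreplaced.
    if not name or "<" in name or ">" in name:
        raise ValueError(f"Replace the placeholder with a real name, got {workspace_name!r}")
    return ["tecton", "workspace", "create", name]


command = workspace_create_command("tecton-fundamentals-tutorial")
print(" ".join(command))

# To actually run it (requires the Tecton CLI and a prior `tecton login`):
# subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting issues if you later choose a different name.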
Create a new notebook
Create a new notebook in Databricks or EMR.
Install JARs in your notebook (EMR only)
If using EMR, configure the notebook by installing the Tecton JARs.
Import necessary modules in your notebook
In your notebook, run:
Databricks:
%pip install mlflow
EMR:
sc.install_pypi_package("scikit-learn")
On EMR, you may receive an error stating that scikit-learn depends on other packages. If so, install those packages (using sc.install_pypi_package(...)) and then run sc.install_pypi_package("scikit-learn") again.
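The retry described above amounts to installing the missing dependencies first and the target package last. A minimal sketch of that ordering follows. The helper install_in_order is hypothetical, and the dependency names in the example are placeholders, since the actual names come from the error message you receive. The demonstration uses a recording stub so the cell runs anywhere; in an EMR notebook you would pass sc.install_pypi_package instead.

```python
def install_in_order(install, package, dependencies=()):
    """Install each dependency, then the package itself (hypothetical helper).

    `install` is any callable that takes a package name, for example
    sc.install_pypi_package in an EMR notebook.
    """
    installed = []
    for name in list(dependencies) + [package]:
        install(name)
        installed.append(name)
    return installed


# Stand-in for sc.install_pypi_package that only records what it is asked to install.
requested = []
order = install_in_order(requested.append, "scikit-learn", ["dep-a", "dep-b"])
print(order)

# In an EMR notebook (dependency names taken from the error message):
# install_in_order(sc.install_pypi_package, "scikit-learn", ["<dependency>"])
```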
In your next notebook cell, run:
sc.install_pypi_package("aws-secretsmanager-caching")
Then, run the following in the next notebook cell:
import tecton
import pandas as pd
from datetime import datetime, timedelta
Retrieve a reference to your new workspace
In your next notebook cell, run:
ws = tecton.get_workspace("<workspace name>")
This workspace object (ws) will be used to read feature definitions from the workspace.