Running Tecton in Production
This page introduces the key topics involved in operating Tecton in a production environment.
Tecton is a platform for managing, serving, and monitoring ML features. Running Tecton in production requires coordinating several components, including:
- Feature repositories with feature definitions
- Feature data materialized in an offline store and online store
- REST API servers for low latency feature serving
- Monitoring and alerting to maintain data quality and uptime
Why Running in Production Matters
As you deploy machine learning models to production, the features that serve as model input become a critical part of your infrastructure. Tecton provides capabilities to help you:
- Ensure high uptime and low latency for feature data
- Maintain data quality and integrity over time
- Track feature usage and model performance
- Control infrastructure costs by monitoring jobs and usage
Key Capabilities
Some of the key capabilities Tecton offers for running in production include:
Scalable Feature Serving
Tecton's REST API and gRPC servers are built to handle high query volumes with low latency. You can scale out the number of servers to sustain aggregate throughput of 1M+ queries per second.
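For illustration, here is a minimal sketch of a client fetching features from the HTTP serving endpoint with Python's `requests`. The cluster URL, workspace, feature service name, and join key are placeholders for your own deployment.

```python
import os

import requests

# Placeholder cluster URL -- substitute your own deployment's endpoint.
TECTON_URL = "https://yourcluster.tecton.ai/api/v1/feature-service/get-features"
API_KEY = os.environ["TECTON_API_KEY"]  # a service account key with read access

response = requests.post(
    TECTON_URL,
    headers={"Authorization": f"Tecton-key {API_KEY}"},
    json={
        "params": {
            # Placeholder workspace, feature service, and join key values.
            "workspace_name": "prod",
            "feature_service_name": "fraud_detection_feature_service",
            "join_key_map": {"user_id": "user_12345"},
        }
    },
    timeout=1.0,  # keep a tight timeout budget on online inference paths
)
response.raise_for_status()

# The feature vector is returned under result.features in the response body.
features = response.json()["result"]["features"]
print(features)
```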
Monitoring and Alerting
Tecton offers monitoring dashboards and alerting to help you maintain uptime and data quality (see the configuration sketch after this list). This includes:
- Uptime and error rate monitoring for the REST API
- Email alerts for failed feature materialization jobs
- Summary metrics on feature view usage and freshness
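Freshness monitoring and failure alerting can be configured directly on a feature view definition. The sketch below is illustrative: it assumes a `transactions` batch source and a `user` entity already defined elsewhere in the feature repository, and the schedule, freshness window, and email address are placeholders.

```python
from datetime import datetime, timedelta

from tecton import batch_feature_view

# Assumes `transactions` (a BatchSource) and `user` (an Entity) are defined
# elsewhere in the feature repository.
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    timestamp_field="timestamp",
    batch_schedule=timedelta(days=1),
    feature_start_time=datetime(2024, 1, 1),
    # Monitoring configuration: flag the view if materialized data grows
    # staler than two days, and email the on-call list when a job fails.
    monitor_freshness=True,
    expected_feature_freshness=timedelta(days=2),
    alert_email="ml-oncall@example.com",
)
def user_transaction_counts(transactions):
    return f"""
        SELECT user_id, timestamp, COUNT(*) AS transaction_count
        FROM {transactions}
        GROUP BY user_id, timestamp
    """
```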
Cost Management Tools
Tecton applies tags to resources in your cloud environment so you can view costs by feature view or workspace. Tecton also offers tools for suppressing rematerialization and optimizing cluster configurations to control infrastructure usage.
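As a sketch, tags are set in the object definition itself. The service name and tag keys below are placeholders, and `user_transaction_counts` is the hypothetical feature view from the monitoring sketch above.

```python
from tecton import FeatureService

# Placeholder service and tag values; tags applied here let cost reports be
# grouped by team, workspace, or feature view.
fraud_detection_feature_service = FeatureService(
    name="fraud_detection_feature_service",
    features=[user_transaction_counts],
    tags={"team": "fraud-ml", "cost-center": "ml-platform"},
)
```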
Integrations with CI/CD
You can connect your Git repository containing Tecton configurations to CI/CD tools like GitHub Actions. This enables automatic deployment of new features to your production workspaces.
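A minimal GitHub Actions workflow might look like the following sketch. The branch, directory, and secret names are assumptions, and the exact non-interactive authentication flow for the CLI depends on your version; a service account API key is the typical CI pattern.

```yaml
# .github/workflows/deploy-features.yml (illustrative)
name: deploy-features
on:
  push:
    branches: [main]

jobs:
  tecton-apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - name: Install the Tecton CLI
        run: pip install tecton
      - name: Apply feature definitions to the prod workspace
        working-directory: feature_repo  # path to your feature repository
        env:
          # Assumed auth pattern: a service account key stored as a secret.
          TECTON_API_KEY: ${{ secrets.TECTON_API_KEY }}
        # Depending on CLI version, apply may prompt for confirmation;
        # consult the CLI docs for the non-interactive option.
        run: tecton apply
```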
Best Practices
To run Tecton successfully in production, we recommend following these best practices:
- Scale your feature serving capacity to handle your production traffic volumes
- Use monitoring and alerts to gain visibility into jobs, uptime, and data quality
- Follow recommended strategies for updating features, models, and feature services
- Manage your infrastructure costs by optimizing cluster configurations and suppressing unnecessary recomputations
Please see our full production best practices guide for more details.