tecton.interactive.StreamDataSource
class tecton.interactive.StreamDataSource

StreamDataSource is an abstraction over streaming data sources.
StreamFeatureViews and StreamWindowAggregateFeatureViews ingest data from StreamDataSources.
A StreamDataSource contains a stream data source config, as well as a batch data source config for backfills.
Methods

- get_dataframe – Returns this data source's data as a Tecton DataFrame.
- start_stream_preview – Starts a streaming job to write incoming records from this data source's stream to a temporary table with a given name.
- summary – Displays a human-readable summary of this data source.
get_dataframe(start_time=None, end_time=None, *, apply_translator=True)

Returns this data source's data as a Tecton DataFrame.
- Parameters
  - start_time (Union[DateTime, datetime, None]) – The interval start time from which to retrieve source data. If no timezone is specified, defaults to UTC. Can only be set if apply_translator is True.
  - end_time (Union[DateTime, datetime, None]) – The interval end time until which to retrieve source data. If no timezone is specified, defaults to UTC. Can only be set if apply_translator is True.
  - apply_translator (bool) – If True, the transformation specified by raw_batch_translator is applied to the dataframe for the data source.
- Returns
  A Tecton DataFrame containing the data source's raw or translated source data.
- Raises
  TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
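As a minimal sketch, one day of source data can be pulled with timezone-aware bounds. The Tecton lookup and the data source name shown in the comments are assumptions for illustration, not part of this page; only the timestamp construction is runnable as-is.

```python
from datetime import datetime, timezone

# Timezone-aware bounds; per the docs above, naive datetimes default to UTC.
start_time = datetime(2023, 5, 1, tzinfo=timezone.utc)
end_time = datetime(2023, 5, 2, tzinfo=timezone.utc)

# In a Tecton-connected notebook (assumed environment; the lookup call and
# "transactions_stream" name are hypothetical):
#
#   import tecton
#   ds = tecton.get_data_source("transactions_stream")
#   df = ds.get_dataframe(start_time=start_time, end_time=end_time)
```

Passing either bound with apply_translator=False would raise TectonValidationError, since time filtering is only supported on translated data.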
start_stream_preview(table_name, *, apply_translator=True, option_overrides=None)

Starts a streaming job to write incoming records from this data source's stream to a temporary table with the given name.

After records have been written to the table, they can be queried using spark.sql(). If run in a Databricks notebook, Databricks will also automatically visualize the number of incoming records.

This is a testing method, most commonly used to verify that a StreamDataSource is correctly receiving streaming events. Note that the table will grow without bound, so this is only useful for debugging in notebooks.
- Parameters
  - table_name (str) – The name of the temporary table that this method will write to.
  - apply_translator (bool) – Whether to apply this data source's raw_stream_translator. When True, the translated data will be written to the table. When False, the raw, untranslated data will be written.
  - option_overrides (Optional[Dict[str, str]]) – A dictionary of Spark readStream options that will override any readStream options set by the data source. Can be used to configure behavior only for the preview, e.g. setting startingOffsets: latest to preview only the most recent events in a Kafka stream.
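The option_overrides example above can be sketched as follows. The preview invocation itself is shown in comments because it requires a Databricks/Tecton notebook; `ds` and the "preview_events" table name are hypothetical.

```python
# Kafka override from the docs above: preview only the newest events,
# regardless of the startingOffsets configured on the data source itself.
option_overrides = {"startingOffsets": "latest"}

# In a Databricks/Tecton notebook (assumed environment; `ds` is a
# StreamDataSource obtained elsewhere, "preview_events" a scratch table):
#
#   ds.start_stream_preview(
#       "preview_events",
#       apply_translator=True,
#       option_overrides=option_overrides,
#   )
#   spark.sql("SELECT COUNT(*) FROM preview_events").show()
```

Because the preview table grows without bound, stop the streaming job once the data source has been verified.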
summary()

Displays a human-readable summary of this data source.
Attributes

- columns – Returns the streaming data source's columns, if present.
- created_at – Returns the creation date of this Tecton Object.
- defined_in – Returns the filename where this Tecton Object was declared.
- description – The description of this Tecton Object, set by the user.
- family – Deprecated.
- id – Returns a unique ID for the data source.
- is_streaming – Whether or not this is a StreamDataSource.
- name – The name of this Tecton Object.
- owner – The owner of this Tecton Object (typically the email of the primary maintainer).
- tags – Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).
- workspace – Returns the workspace this Tecton Object was created in.