tecton.interactive.StreamDataSource¶

class tecton.interactive.StreamDataSource¶

StreamDataSource is an abstraction over streaming data sources.
StreamFeatureViews and StreamWindowAggregateFeatureViews ingest data from StreamDataSources.
A StreamDataSource contains a stream data source config, as well as a batch data source config for backfills.
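As a short usage sketch, an interactive StreamDataSource can be fetched from a workspace and inspected in a notebook. The workspace and data source names here are hypothetical, and the exact retrieval call may differ across Tecton SDK versions:

```python
import tecton

# Hypothetical workspace and data source names -- substitute your own.
ws = tecton.get_workspace("prod")
transactions_stream = ws.get_data_source("transactions_stream")

# Display a human-readable summary of the source's configuration.
transactions_stream.summary()
```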
Methods

get_dataframe – Returns this data source’s data as a Tecton DataFrame.
start_stream_preview – Starts a streaming job to write incoming records from this DS’s stream to a temporary table with a given name.
summary – Displays a human readable summary of this data source.
get_dataframe(start_time=None, end_time=None, *, apply_translator=True)¶

Returns this data source’s data as a Tecton DataFrame.

- Parameters
  - start_time (Union[DateTime, datetime, None]) – The interval start time from which to retrieve source data. If no timezone is specified, defaults to UTC. Can only be defined if apply_translator is True.
  - end_time (Union[DateTime, datetime, None]) – The interval end time until which to retrieve source data. If no timezone is specified, defaults to UTC. Can only be defined if apply_translator is True.
  - apply_translator (bool) – If True, the transformation specified by raw_batch_translator will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor.
- Returns
  A Tecton DataFrame containing the data source’s raw or translated source data.
- Raises
  TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
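A minimal sketch of reading a bounded window of source data, assuming the transactions_stream object from above (the time range is illustrative):

```python
from datetime import datetime, timezone

# Hypothetical time window -- pick a range your batch source actually covers.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 2, tzinfo=timezone.utc)

# Returns a Tecton DataFrame wrapping the (translated) source data.
df = transactions_stream.get_dataframe(start_time=start, end_time=end)

# Convert to a pandas DataFrame for local inspection.
pandas_df = df.to_pandas()
print(pandas_df.head())
```

Note that passing start_time or end_time while apply_translator=False raises TectonValidationError, per the contract above.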
start_stream_preview(table_name, *, apply_translator=True, option_overrides=None)¶

Starts a streaming job to write incoming records from this DS’s stream to a temporary table with a given name.

After records have been written to the table, they can be queried using spark.sql(). If run in a Databricks notebook, Databricks will also automatically visualize the number of incoming records.

This is a testing method, most commonly used to verify that a StreamDataSource is correctly receiving streaming events. Note that the table will grow infinitely large, so this method is only suitable for debugging in notebooks.

- Parameters
  - table_name (str) – The name of the temporary table that this method will write to.
  - apply_translator (bool) – Whether to apply this data source’s raw_stream_translator. When True, the translated data will be written to the table. When False, the raw, untranslated data will be written. apply_translator is not applicable to stream sources configured with spark_stream_config because it does not have a post_processor.
  - option_overrides (Optional[Dict[str, str]]) – A dictionary of Spark readStream options that will override any readStream options set by the data source. Can be used to configure behavior only for the preview, e.g. setting startingOffsets: latest to preview only the most recent events in a Kafka stream.
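A sketch of previewing the stream, again assuming the transactions_stream object from above; the table name is hypothetical, and spark refers to the active notebook SparkSession:

```python
# Start writing incoming events to a temporary table. Overriding
# startingOffsets to "latest" previews only events arriving from now on.
transactions_stream.start_stream_preview(
    table_name="transactions_stream_preview",  # hypothetical table name
    apply_translator=True,
    option_overrides={"startingOffsets": "latest"},
)

# Once events have arrived, query the table with Spark SQL.
spark.sql("SELECT * FROM transactions_stream_preview LIMIT 10").show()
```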
summary()¶

Displays a human readable summary of this data source.
Attributes

- columns – Returns the streaming DS’s columns, if present.
- created_at – Returns the creation date of this Tecton Object.
- defined_in – Returns the filename where this Tecton Object has been declared.
- description – The description of this Tecton Object, set by the user.
- family – Deprecated.
- id – Returns a unique ID for the data source.
- is_streaming – Whether or not this is a StreamDataSource.
- name – The name of this Tecton Object.
- owner – The owner of this Tecton Object (typically the email of the primary maintainer).
- tags – Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).
- workspace – Returns the workspace this Tecton Object was created in.
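A short sketch of reading these attributes on the transactions_stream object from above (printed values are illustrative):

```python
# Inspect metadata on the interactive object.
print(transactions_stream.name)          # e.g. "transactions_stream"
print(transactions_stream.is_streaming)  # True for a StreamDataSource
print(transactions_stream.columns)       # schema columns of the stream source
print(transactions_stream.workspace)     # workspace the object was applied to
```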