tecton.interactive.StreamDataSource¶

class tecton.interactive.StreamDataSource¶

StreamDataSource is an abstraction over streaming data sources.
StreamFeatureViews and StreamWindowAggregateFeatureViews ingest data from StreamDataSources.
A StreamDataSource contains a stream data source config, as well as a batch data source config for backfills.
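As a short usage sketch, an interactive StreamDataSource can be fetched from a workspace and inspected in a notebook. The workspace and data source names here are hypothetical, and the exact retrieval call may differ across Tecton SDK versions:

```python
import tecton

# Hypothetical workspace and data source names -- substitute your own.
ws = tecton.get_workspace("prod")
transactions_stream = ws.get_data_source("transactions_stream")

# Display a human-readable summary of the source's configuration.
transactions_stream.summary()
```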
Methods

get_dataframe – Returns this data source’s data as a Tecton DataFrame.
start_stream_preview – Starts a streaming job to write incoming records from this DS’s stream to a temporary table with a given name.
summary – Displays a human readable summary of this data source.
get_dataframe(start_time=None, end_time=None, *, apply_translator=True)¶

Returns this data source’s data as a Tecton DataFrame.

- Parameters
  - start_time (Union[DateTime, datetime, None]) – The interval start time from which to retrieve source data. If no timezone is specified, defaults to UTC. Can only be defined if apply_translator is True.
  - end_time (Union[DateTime, datetime, None]) – The interval end time until which to retrieve source data. If no timezone is specified, defaults to UTC. Can only be defined if apply_translator is True.
  - apply_translator (bool) – If True, the transformation specified by raw_batch_translator will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor.
- Returns
  A Tecton DataFrame containing the data source’s raw or translated source data.
- Raises
  TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
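A minimal sketch of reading a bounded window of source data, assuming the transactions_stream object from above (the time range is illustrative):

```python
from datetime import datetime, timezone

# Hypothetical time window -- pick a range your batch source actually covers.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 2, tzinfo=timezone.utc)

# Returns a Tecton DataFrame wrapping the (translated) source data.
df = transactions_stream.get_dataframe(start_time=start, end_time=end)

# Convert to a pandas DataFrame for local inspection.
pandas_df = df.to_pandas()
print(pandas_df.head())
```

Note that passing start_time or end_time while apply_translator=False raises TectonValidationError, per the contract above.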
start_stream_preview(table_name, *, apply_translator=True, option_overrides=None)¶

Starts a streaming job to write incoming records from this DS’s stream to a temporary table with a given name.

After records have been written to the table, they can be queried using spark.sql(). If run in a Databricks notebook, Databricks will also automatically visualize the number of incoming records.

This is a testing method, most commonly used to verify that a StreamDataSource is correctly receiving streaming events. Note that the table will grow infinitely large, so this method is only suitable for debugging in notebooks.

- Parameters
  - table_name (str) – The name of the temporary table that this method will write to.
  - apply_translator (bool) – Whether to apply this data source’s raw_stream_translator. When True, the translated data will be written to the table. When False, the raw, untranslated data will be written. apply_translator is not applicable to stream sources configured with spark_stream_config because it does not have a post_processor.
  - option_overrides (Optional[Dict[str, str]]) – A dictionary of Spark readStream options that will override any readStream options set by the data source. Can be used to configure behavior only for the preview, e.g. setting startingOffsets: latest to preview only the most recent events in a Kafka stream.
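A sketch of previewing the stream, again assuming the transactions_stream object from above; the table name is hypothetical, and spark refers to the active notebook SparkSession:

```python
# Start writing incoming events to a temporary table. Overriding
# startingOffsets to "latest" previews only events arriving from now on.
transactions_stream.start_stream_preview(
    table_name="transactions_stream_preview",  # hypothetical table name
    apply_translator=True,
    option_overrides={"startingOffsets": "latest"},
)

# Once events have arrived, query the table with Spark SQL.
spark.sql("SELECT * FROM transactions_stream_preview LIMIT 10").show()
```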
summary()¶

Displays a human readable summary of this data source.
Attributes

- columns – Returns the streaming DS’s columns, if present.
- created_at – Returns the creation date of this Tecton Object.
- defined_in – Returns the filename where this Tecton Object has been declared.
- description – The description of this Tecton Object, set by the user.
- family – Deprecated.
- id – Returns a unique ID for the data source.
- is_streaming – Whether or not this is a StreamDataSource.
- name – The name of this Tecton Object.
- owner – The owner of this Tecton Object (typically the email of the primary maintainer).
- tags – Tags associated with this Tecton Object (key-value pairs of arbitrary metadata set by the user).
- workspace – Returns the workspace this Tecton Object was created in.
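A short sketch of reading these attributes on the transactions_stream object from above (printed values are illustrative):

```python
# Inspect metadata on the interactive object.
print(transactions_stream.name)          # e.g. "transactions_stream"
print(transactions_stream.is_streaming)  # True for a StreamDataSource
print(transactions_stream.columns)       # schema columns of the stream source
print(transactions_stream.workspace)     # workspace the object was applied to
```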