Applies to: Databricks SQL and Databricks Runtime 11.3 LTS and above
Returns the file-level state of an Auto Loader or read_files stream.
Syntax
cloud_files_state( { TABLE ( table_name ) | checkpoint } )
Arguments
- table_name: The identifier of the streaming table that's being written to by read_files. The name must not include a temporal specification. Available in Databricks Runtime 13.3 LTS and above.
- checkpoint: A STRING literal. The checkpoint directory for a stream using the Auto Loader source. See What is Auto Loader?.
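For instance, the table_name form can inspect a streaming table populated by read_files. A minimal sketch, assuming a streaming table created with Databricks SQL; the table name and input path are illustrative:

-- Create a streaming table from a cloud files source (illustrative names)
> CREATE OR REFRESH STREAMING TABLE raw_events AS
    SELECT * FROM STREAM read_files('/some/input/path', format => 'json');

-- Inspect the file-level state of that table's Auto Loader source
> SELECT * FROM cloud_files_state(TABLE(raw_events));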
Returns
Returns a table with the following schema:
- path STRING NOT NULL PRIMARY KEY: The path of a file.
- size BIGINT NOT NULL: The size of a file in bytes.
- create_time TIMESTAMP NOT NULL: The time that a file was created.
- discovery_time TIMESTAMP NOT NULL: The time that a file was discovered. Applies to Databricks Runtime 16.4 and above.
- processed_time TIMESTAMP NOT NULL: The time that a file was processed. If a batch encounters a failure and is retried, a file might be processed multiple times; when retries happen, this field contains the most recent processing time. Applies to Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled. See Auto Loader options.
- commit_time TIMESTAMP: The time that a file was committed to the checkpoint after processing. NULL if the file is not yet processed. There is no guaranteed latency for marking a file as committed; a file might be processed but marked as committed arbitrarily later. Marking the file as committed means that Auto Loader does not require the file for processing again. Applies to Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled. See Auto Loader options.
- archive_time TIMESTAMP: The time that a file was archived. NULL if the file has not been archived. Applies to Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled. See Auto Loader options.
- archive_mode STRING: MOVE if cloudFiles.cleanSource was set to MOVE when the file was archived. DELETE if cloudFiles.cleanSource was set to DELETE when the file was archived. NULL if cloudFiles.cleanSource was set to OFF (default). Applies to Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled. See Auto Loader options.
- move_location STRING: The full path of where the file was moved to during the archival operation when cloudFiles.cleanSource was set to MOVE. NULL if the file has not been archived or cloudFiles.cleanSource is one of DELETE or OFF. Applies to Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled. See Auto Loader options.
- source_id STRING: The ID of the Auto Loader source in the streaming query. This value is '0' for streams that ingest from a single cloud object store location.
- flow_name STRING: Represents a specific streaming flow in DLT that contains one or more cloud files sources. NULL if no table_name was given. Applies to Databricks SQL and Databricks Runtime 13.3 and above.
- ingestion_state STRING: Whether the file has been ingested, indicated by one of the following states. Applies to Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled. See Auto Loader options.
  - NULL: The file has not been processed yet, or the file state cannot be determined by Auto Loader.
  - PROCESSING: The file is being processed.
  - SKIPPED_CORRUPTED: The file was not ingested because it was corrupt.
  - SKIPPED_MISSING: The file was not ingested because it was not found during processing.
  - INGESTED: The file has been processed by the sink at least once. It might be processed again by non-idempotent sinks like foreachBatch in case of failures in the stream. Only files with a non-null commit_time field that are in the INGESTED state have completed processing.
  - NOT_RECOGNIZED_BY_DBR: Reserved for version compatibility. This state is displayed for states introduced in later Databricks Runtime versions that are not recognized by earlier Databricks Runtime versions.
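As noted above, a file has completed processing only when it is in the INGESTED state and has a non-null commit_time. A hedged sketch of a query that lists fully processed files, assuming Databricks Runtime 16.4 or above with cloudFiles.cleanSource enabled; the checkpoint path is illustrative:

-- List files that have fully completed processing
> SELECT path, commit_time
  FROM cloud_files_state('/some/checkpoint')
  WHERE ingestion_state = 'INGESTED' AND commit_time IS NOT NULL;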
Permissions
You need to have:
- OWNER privileges on the streaming table if using a streaming table identifier.
- READ FILES privileges on the checkpoint location if providing a checkpoint under an external location.
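For the checkpoint case, READ FILES can be granted on the enclosing external location. A minimal sketch, where the external location name and principal are placeholders:

-- Grant read access to the external location containing the checkpoint
> GRANT READ FILES ON EXTERNAL LOCATION checkpoint_location TO `user@example.com`;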
Examples
-- Simple example from checkpoint
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint');
/some/input/path
/other/input/path
-- Simple example from source subdir
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint/sources/0');
/some/input/path
/other/input/path
-- Simple example from streaming table
> SELECT path FROM CLOUD_FILES_STATE(TABLE(my_streaming_table));
/some/input/path
/other/input/path
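The state columns can also drive monitoring queries. A hedged example that summarizes files by ingestion state, assuming Databricks Runtime 16.4 or above with cloudFiles.cleanSource enabled and the same illustrative table as above:

-- Count files in each ingestion state
> SELECT ingestion_state, count(*) AS file_count
  FROM cloud_files_state(TABLE(my_streaming_table))
  GROUP BY ingestion_state;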