Polybase accessing s3 storage error - Content of the directory cannot be listed

Haijin Li 21 Reputation points
2024-08-14T00:24:25.8666667+00:00

I have the following script for accessing on-prem S3-compatible storage.

CREATE DATABASE SCOPED CREDENTIAL s3
 WITH IDENTITY = 'S3 Access Key',
 SECRET = '****:****' ;
END
GO 

CREATE EXTERNAL DATA SOURCE ds_s3
WITH
(   LOCATION = 's3://s3server.test.com:9000/'
,   CREDENTIAL = s3
);
GO

-- query with error 
SELECT *
FROM
    OPENROWSET(
        BULK 'testbucket/test1/*.parquet',
        DATA_SOURCE = 'ds_s3',
        FORMAT='PARQUET'
    ) AS src
GO

-- Error msg
-- Content of directory on path '/testbucket/test1/*.parquet' cannot be listed.

Additional information

  1. I am not sure the port number is needed for the data source. With the port number included, I got the error in 80 seconds. Without the port number, I got the same error immediately.
  2. There is no log entries related to above errors at C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\Log\Polybase
  3. The credential (access key and secret key) works. I tested it with a script accessing those parquet files via https.

I also accessed the parquet files using AWS CLI successfully, after I supplied the access key using aws configure command. aws s3 ls --endpoint-url https://s3server.test.com s3://testbucket/test1 --recursive

AWS CLI does not take * wildcard so the command below output nothing. But I do not think it is the issue. aws s3 ls --endpoint-url https://s3server.test.com s3://testbucket/test1/*.parquet --recursive

Thank you.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,494 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Alejandro Echeverria 6 Reputation points
    2024-08-14T14:25:40.5933333+00:00

    Take a look at the documentation: https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configure-s3-compatible?view=sql-server-ver16

    There is a part that mentions your exact same problem "cannot be listed".


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.