Conversation
…s3 tables only support a single namespace per listing request, so for every namespace a separate call to getTables is made; ThreadPoolCallbackRunnerLocal is used to perform these calls in a thread pool. tryGetTableMetadata injects https://s3.{region}.amazonaws.com as the endpoint so that requests don't go to bucket.s3.amazonaws.com. It also pre-strips the s3://bucket-name/ prefix, leaving just the relative path metadata/00001-....json, so downstream code constructs the correct S3 key, and it detects missing/empty credentials, injecting the catalog's own IAM credentials from credentials_provider->GetAWSCredentials().
301f82e to dbcf541
AI audit note: This review comment was generated by AI (gpt-5.3-codex). Audit update for PR #1617 (Added support for s3 tables / S3 Tables Iceberg REST catalog): Confirmed defects: none in reviewed scope. Coverage summary:
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dbcf541b8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
/// Iceberg REST catalog for Amazon S3 Tables (SigV4, signing name `s3tables`).
/// https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html
class S3TablesCatalog final : public RestCatalog
Override S3 Tables drop to request purge
S3TablesCatalog inherits RestCatalog::dropTable, which hardcodes ?purgeRequested=False in the DELETE URL; AWS S3 Tables documents that purge=false is rejected with 400 Bad Request, so DROP TABLE against catalog_type='s3tables' will fail consistently. Please override the drop path for this catalog to send a purge-enabled delete request.
Addressed, added support for dropTable
{
    if (settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::GLUE
        || settings[DatabaseDataLakeSetting::catalog_type].value == DB::DatabaseDataLakeCatalogType::S3_TABLES)
Enforce non-empty warehouse for s3tables
validateSettings() now groups S3_TABLES with Glue and skips the warehouse check, but this catalog still relies on warehouse for loadConfig() (/v1/config?warehouse=...) and for deriving config.prefix when the server does not return one; allowing an empty warehouse defers this to runtime failures on namespace/table requests instead of rejecting invalid configuration at CREATE DATABASE time.
Addressed: added a separate check for S3_TABLES
LOG_DEBUG(log, "Requesting: {}", url.toString());

return DB::BuilderRWBufferFromHTTP(url)
I see the base class implementation has some sort of retry mechanism, why not have it here as well?
try
{
return create_buffer(false);
}
catch (const DB::HTTPException & e)
{
const auto status = e.getHTTPStatus();
if (update_token_if_expired &&
(status == Poco::Net::HTTPResponse::HTTPStatus::HTTP_UNAUTHORIZED
|| status == Poco::Net::HTTPResponse::HTTPStatus::HTTP_FORBIDDEN))
{
return create_buffer(true);
}
throw;
}
const auto & context = getContext();

Poco::URI url(endpoint, /* enable_url_encoding */ false);
/// enable_url_encoding=false to allow using tables with encoded sequences in their names, like 'foo%2Fbar'
}

void signRequestWithAWSV4(
Nit: Perhaps name it like signHeaders as the output of this method is a list of headers and not a request?
if (!payload.empty())
{
auto body_stream = Aws::MakeShared<std::stringstream>("AWSV4Signer");
body_stream->write(payload.data(), static_cast<std::streamsize>(payload.size()));
body_stream->seekg(0);
request.AddContentBody(body_stream);
}
static constexpr bool sign_body = true;
if (!signer.SignRequest(request, region.c_str(), service.c_str(), sign_body))
The request body is also signed
static constexpr bool sign_body = true;
if (!signer.SignRequest(request, region.c_str(), service.c_str(), sign_body))
    throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "AWS SigV4 signing failed");
LOGICAL_ERROR will crash ClickHouse, it should be something else
changed to S3_ERROR
    Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Always,
    /* urlEscapePath = */ false);

config = loadConfig();
This is kind of dangerous. You've made createReadBuffer virtual, and loadConfig calls it. If S3TablesCatalog::createReadBuffer relies on members that are initialized after loadConfig, it is UB
Looking at the code, there doesn't seem to be an imminent problem, but i would at least add a comment about this?
I don't see a way around this "problem" tbh
Maybe this solves the problem? https://github.com/Altinity/ClickHouse/pull/1617/changes#r3053334453
    return true;
}

DB::HTTPHeaderEntries S3TablesCatalog::getAuthHeaders(bool /* update_token */) const
Why not use this method to implement the signing instead of overwriting the createReadBuffer method?
moved logic to getAuthHeaders
Added support for drop table
…r to getAuthHeaders
arthurpassos left a comment
I haven't finished reviewing it yet, a few files are still pending
Poco::Net::HTTPBasicCredentials credentials{};

DB::ReadWriteBufferFromHTTPPtr createReadBuffer(
virtual DB::ReadWriteBufferFromHTTPPtr createReadBuffer(
I think you no longer need the virtual modifier here and in sendRequest
    }
}

DB::Names S3TablesCatalog::getTables() const
Can you add a comment explaining why this has to be overriden? The base method seems to be kind of similar
if (!result.requiresCredentials())
    return true;

bool need_credentials = !result.hasStorageCredentials() || !result.getStorageCredentials();
As far as I can tell, neither TableMetadata::hasStorageCredentials or TableMetadata::getStorageCredentials are virtual. And their default implementation reference the same member TableMetadata::storage_credentials. Which means !result.hasStorageCredentials() || !result.getStorageCredentials() is redundant and can be simplified to !result.hasStorageCredentials()
Actually, isn't it simpler and equivalent to just do:
bool need_credentials = true;
if (const auto storage_credentials = result.getStorageCredentials())
{
auto creds = std::dynamic_pointer_cast<S3Credentials>(storage_credentials);
if (creds && !creds->isEmpty())
need_credentials = false;
}
LOG_DEBUG(log, "S3 Tables: no vended credentials for {}.{}, injecting catalog IAM credentials", namespace_name, table_name);
auto aws_creds = credentials_provider->GetAWSCredentials();
result.setStorageCredentials(std::make_shared<S3Credentials>(
    String(aws_creds.GetAWSAccessKeyId().c_str(), aws_creds.GetAWSAccessKeyId().size()),
Do you really need to grab the internal const char pointer and then create a String from it? I suspect you can just do: std::make_shared<S3Credentials>(aws_creds.GetAWSAccessKeyId())
    String(aws_creds.GetSessionToken().c_str(), aws_creds.GetSessionToken().size())));
}

if (result.getEndpoint().empty())
IIRC, AWS s3 supports several types of endpoint formats. Are you sure it is ok to support only s3.<region>.amazonaws.com? And I see it is also hard-coded to https.
See Line 35 in 9ab63c5
    result.setEndpoint(regional_endpoint);
}

if (result.hasDataLakeSpecificProperties())
I think this can be simplified to if (const auto props = result.getDataLakeSpecificProperties()). No need to check for hasDataLakeSpecificProperties and props.has_value()
@codex review
Codex Review: Didn't find any major issues. 🎉
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support for S3 Tables (Iceberg REST catalog), closes ClickHouse#95340
Documentation entry for user-facing changes
Support for S3 Tables (Iceberg serverless REST catalog) using SigV4 authentication.
Can be enabled with allow_experimental_database_s3_tables
CI/CD Options
Exclude tests:
Regression jobs to run: