fix(iceberg): default namespace location so fresh CTAS does not race metadata write (#9074) (#9246)

* fix(iceberg): advertise default namespace location for REST clients

Trino's REST catalog has two code paths for CREATE TABLE depending on
whether a table location can be resolved before the catalog call:

    // TrinoRestCatalog.newCreateTableTransaction (Trino 479)
    if (location.isEmpty()) {
        return tableBuilder.create().newTransaction();   // EAGER: REST POST now
    }
    return tableBuilder.withLocation(...).createTransaction(); // DEFERRED

`tableLocation` resolves to null when the user does not pass
`location = '...'` AND `defaultTableLocation` returns null. The latter
happens whenever the namespace's `loadNamespaceMetadata` response has no
`location` property — and our handler returned exactly that.

In the eager branch Trino calls REST POST /v1/.../tables immediately, so
our handleCreateTable persists `<location>/metadata/v1.metadata.json` to
the filer. Trino's IcebergMetadata.beginCreateTable then runs
`fileSystem.listFiles(location).hasNext()` on the same path, finds the
metadata file we just wrote, and throws "Cannot create a table on a
non-empty location" — even on a brand-new schema and table name.

Synthesize a default `location` of `s3://<bucket>/<flattened-namespace>`
on Get/Create namespace responses when one is not stored. With a non-
null `defaultTableLocation`, Trino takes the deferred branch, picks a
unique `<namespace-location>/<table>-<UUID>` path (UUID added by the
standard `iceberg.unique-table-location=true` setting), and the empty-
location check passes. The actual REST POST is deferred to commit time,
so our metadata write lands alongside the data files Trino has already
produced.

Existing namespaces with an explicit `location` property are untouched —
the synthesis only kicks in when the property is absent.

Refs #9074

* test(trino): regression for fresh CREATE TABLE without explicit location

Exercises the follow-up scenario reported on issue #9074: a CREATE TABLE
on a brand-new schema and brand-new table name with NO `location = '...'`
clause, followed by a CTAS on top — exactly the SQL pattern from the
report. Before advertising a default namespace location, the first
CREATE TABLE failed with

    Cannot create a table on a non-empty location:
    s3://iceberg-tables/<schema>/<table>, set
    'iceberg.unique-table-location=true' in your Iceberg catalog
    properties to use unique table locations for every table.

even though the bucket was genuinely empty. The bug surfaced because our
handleCreateTable persists `<location>/metadata/v1.metadata.json` during
Trino's eager-create branch, and Trino's post-create listFiles
emptiness check then trips on the metadata file we just wrote.

The test asserts the CTAS succeeds AND the resulting table contains the
source rows, since the reporter saw the table get created with empty
data when the query failed.

Refs #9074
This commit is contained in:
Chris Lu
2026-04-27 16:37:33 -07:00
committed by GitHub
parent cba2f7b1dd
commit f50917224a
3 changed files with 113 additions and 2 deletions
@@ -0,0 +1,76 @@
package catalog_trino
import (
"fmt"
"strings"
"testing"
)
// TestTrinoIssue9074CTASFreshTable is a regression test for the follow-up
// report on https://github.com/seaweedfs/seaweedfs/issues/9074.
//
// The original fix purged stale data on DROP so a re-CREATE at the same
// path could succeed. The reporter then observed that even a fresh table
// name fails — they ran a CTAS without an explicit `location =` clause and
// hit the same "Cannot create a table on a non-empty location" error.
//
// Root cause: when (a) the user does not pass `location = ...`, and (b) the
// namespace has no Iceberg `location` property, Trino's REST catalog goes
// through TrinoRestCatalog.newCreateTableTransaction's eager branch
// (`tableBuilder.create().newTransaction()`) which immediately invokes the
// REST POST. Our handleCreateTable persists `<location>/metadata/v1.metadata.json`
// as part of that call. Trino's IcebergMetadata.beginCreateTable then runs
// `fileSystem.listFiles(location).hasNext()` and trips on the metadata file
// we just wrote.
//
// Fix: handleGetNamespace / handleCreateNamespace now advertise a default
// `location` property of `s3://<bucket>/<flattened-namespace>` when the
// namespace doesn't already carry one. Trino's `defaultTableLocation` then
// returns `<namespace-location>/<table>-<UUID>` (UUID added by the standard
// `iceberg.unique-table-location=true` connector setting), and the REST
// catalog takes the deferred `createTransaction` branch that does NOT call
// REST POST until commit-time. The empty-location check sees a genuinely
// empty UUID-suffixed path and passes; on commit, our metadata write lands
// alongside the data files Trino has already produced.
//
// This test exercises the exact pattern from the report: a fresh schema, a
// fresh table name, no `location =` clause, and a CTAS on top.
func TestTrinoIssue9074CTASFreshTable(t *testing.T) {
env := setupTrinoTest(t)
defer env.Cleanup(t)
schemaName := "issue9074_" + randomString(6)
srcName := "src_" + randomString(6)
ctasName := "ctas_" + randomString(6)
srcQualified := fmt.Sprintf("iceberg.%s.%s", schemaName, srcName)
ctasQualified := fmt.Sprintf("iceberg.%s.%s", schemaName, ctasName)
runTrinoSQL(t, env.trinoContainer, fmt.Sprintf("CREATE SCHEMA IF NOT EXISTS iceberg.%s", schemaName))
defer runTrinoSQLAllowNamespaceNotEmpty(t, env.trinoContainer, fmt.Sprintf("DROP SCHEMA IF EXISTS iceberg.%s", schemaName))
defer runTrinoSQL(t, env.trinoContainer, fmt.Sprintf("DROP TABLE IF EXISTS %s", srcQualified))
defer runTrinoSQL(t, env.trinoContainer, fmt.Sprintf("DROP TABLE IF EXISTS %s", ctasQualified))
t.Logf(">>> CREATE source %s (no explicit location)", srcQualified)
runTrinoSQL(t, env.trinoContainer, fmt.Sprintf(
"CREATE TABLE %s (id INTEGER, name VARCHAR)", srcQualified))
runTrinoSQL(t, env.trinoContainer, fmt.Sprintf(
"INSERT INTO %s VALUES (1, 'a'), (2, 'b'), (3, 'c')", srcQualified))
ctasSQL := fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s AS (SELECT * FROM %s)", ctasQualified, srcQualified)
t.Logf(">>> CTAS (no location override): %s", ctasSQL)
runTrinoSQL(t, env.trinoContainer, ctasSQL)
countOutput := runTrinoSQL(t, env.trinoContainer, fmt.Sprintf("SELECT count(*) FROM %s", ctasQualified))
got := mustParseCSVInt64(t, countOutput)
if got != 3 {
t.Fatalf("CTAS target: expected 3 rows, got %d (resulting table should not be empty)", got)
}
out := runTrinoSQL(t, env.trinoContainer, fmt.Sprintf("SELECT name FROM %s ORDER BY id", ctasQualified))
for _, want := range []string{"a", "b", "c"} {
if !strings.Contains(out, want) {
t.Fatalf("CTAS target missing row %q; got:\n%s", want, out)
}
}
}
+2 -2
View File
@@ -144,7 +144,7 @@ func (s *Server) handleCreateNamespace(w http.ResponseWriter, r *http.Request) {
result := CreateNamespaceResponse{
Namespace: Namespace(createResp.Namespace),
Properties: normalizeNamespaceProperties(createResp.Properties),
Properties: withDefaultNamespaceLocation(createResp.Properties, bucketName, createResp.Namespace),
}
writeJSON(w, http.StatusOK, result)
}
@@ -188,7 +188,7 @@ func (s *Server) handleGetNamespace(w http.ResponseWriter, r *http.Request) {
result := GetNamespaceResponse{
Namespace: Namespace(getResp.Namespace),
Properties: normalizeNamespaceProperties(getResp.Properties),
Properties: withDefaultNamespaceLocation(getResp.Properties, bucketName, getResp.Namespace),
}
writeJSON(w, http.StatusOK, result)
}
+35
View File
@@ -169,3 +169,38 @@ func normalizeNamespaceProperties(properties map[string]string) map[string]strin
}
return properties
}
// namespaceLocationProperty is the well-known Iceberg key used to advertise a
// default base path for tables created in a namespace.
const namespaceLocationProperty = "location"
// defaultNamespaceLocation returns the s3 path under which tables in the
// namespace should be stored when the namespace itself does not carry an
// explicit "location" property. The flattened namespace path mirrors the on-
// disk layout used by handleCreateTable so a deferred client-side commit ends
// up at the same place a server-computed one would.
func defaultNamespaceLocation(bucket string, namespace []string) string {
if bucket == "" || len(namespace) == 0 {
return ""
}
return fmt.Sprintf("s3://%s/%s", bucket, flattenNamespacePath(namespace))
}
// withDefaultNamespaceLocation populates the Iceberg "location" property on a
// namespace response when the storage layer does not carry one. Trino's REST
// catalog falls back to an eager createTable code path when the namespace
// has no location, which races our metadata write against Trino's emptiness
// check and surfaces as a "Cannot create a table on a non-empty location"
// error on the very first CREATE TABLE. Advertising a default location lets
// Trino take the deferred-transaction path and pick a unique per-table path.
// See issue #9074.
func withDefaultNamespaceLocation(properties map[string]string, bucket string, namespace []string) map[string]string {
properties = normalizeNamespaceProperties(properties)
if _, ok := properties[namespaceLocationProperty]; ok {
return properties
}
if def := defaultNamespaceLocation(bucket, namespace); def != "" {
properties[namespaceLocationProperty] = def
}
return properties
}