[SPARK-33902][SQL] Support CREATE TABLE LIKE for V2 #54809

Open

viirya wants to merge 15 commits into apache:master from viirya:create-table-like-v2

Conversation

@viirya
Member

@viirya viirya commented Mar 14, 2026

What changes were proposed in this pull request?

Previously, CREATE TABLE LIKE was implemented only via CreateTableLikeCommand, which bypassed the V2 catalog pipeline entirely. This meant:

  • 3-part names (catalog.namespace.table) caused a parse error
  • 2-part names targeting a V2 catalog caused NoSuchDatabaseException

This PR adds a V2 execution path for CREATE TABLE LIKE:

  • Grammar: change tableIdentifier (2-part max) to identifierReference (N-part) for both target and source, consistent with all other DDL commands
  • Parser: emit CreateTableLike (new V2 logical plan) instead of CreateTableLikeCommand directly
  • Add createTableLike API to TableCatalog for connector-delegated copy semantics
  • ResolveCatalogs: resolve the target UnresolvedIdentifier to ResolvedIdentifier
  • ResolveSessionCatalog: route back to CreateTableLikeCommand when both target and source are V1 tables/views in the session catalog (V1->V1 path)
  • DataSourceV2Strategy: convert CreateTableLike to new CreateTableLikeExec
  • CreateTableLikeExec: physical exec that calls TableCatalog.createTableLike() on the target catalog
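For illustration, here is a minimal, self-contained sketch of the flow described above. All types are illustrative stand-ins, not Spark's real classes: the exec resolves the source table and asks the target catalog to create a table like it, honoring IF NOT EXISTS. In this sketch the catalog's createTableLike simply copies the source's column layout; a real connector decides what to copy.

```java
import java.util.*;

// Stand-in for a TableCatalog with createTable/loadTable/createTableLike.
// Identifiers are flattened to dotted strings for simplicity.
class SimpleCatalog {
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    public void createTable(String ident, Map<String, String> columns) {
        if (tables.containsKey(ident)) {
            throw new IllegalStateException("table already exists: " + ident);
        }
        tables.put(ident, new LinkedHashMap<>(columns));
    }

    public Map<String, String> loadTable(String ident) {
        return tables.get(ident);
    }

    // Conceptual CreateTableLikeExec step: resolve the source, then copy
    // its column layout into the target catalog, honoring IF NOT EXISTS.
    public void createTableLike(String targetIdent, String sourceIdent,
                                boolean ifNotExists) {
        Map<String, String> source = loadTable(sourceIdent);
        if (source == null) {
            throw new NoSuchElementException("no such table: " + sourceIdent);
        }
        if (tables.containsKey(targetIdent)) {
            if (ifNotExists) return;  // IF NOT EXISTS: silently succeed
            throw new IllegalStateException("table already exists: " + targetIdent);
        }
        tables.put(targetIdent, new LinkedHashMap<>(source));
    }
}
```

The 3-part names that previously failed to parse (e.g. `testcat.ns.dst`) now flow through this path as ordinary multi-part identifiers.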

Why are the changes needed?

CREATE TABLE LIKE was implemented solely via CreateTableLikeCommand, a V1-only command that bypasses the DataSource V2 analysis pipeline entirely. As a result, it was impossible to use CREATE TABLE LIKE to create a table in a non-session V2 catalog (e.g., testcat.dst): a 2-part name like testcat.dst was misinterpreted as database testcat in the session catalog and threw NoSuchDatabaseException, while a 3-part name like testcat.ns.dst was a parse error because the grammar only accepted 2-part tableIdentifier.

This change routes CREATE TABLE LIKE through the standard V2 DDL pipeline so that V2 catalog targets are fully supported, while preserving the existing V1 behavior when both target and source resolve to the session catalog.

Does this PR introduce any user-facing change?

Yes. The CREATE TABLE LIKE DDL command now supports V2 catalogs: multi-part names such as testcat.ns.dst are accepted and resolve through the V2 catalog pipeline.

How was this patch tested?

  • CreateTableLikeSuite: new integration tests covering V2 target with V1/V2 source, cross-catalog, views as source, IF NOT EXISTS, property behavior, and V1 fallback regression
  • DDLParserSuite: updated existing create table like test to match the new CreateTableLike plan shape; added 3-part name parsing test

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

@viirya viirya changed the title [][SQL] Support CREATE TABLE LIKE for V2 [SPARK-XXXXX][SQL] Support CREATE TABLE LIKE for V2 Mar 14, 2026
@viirya viirya changed the title [SPARK-XXXXX][SQL] Support CREATE TABLE LIKE for V2 [SPARK-55994][SQL] Support CREATE TABLE LIKE for V2 Mar 14, 2026
@viirya viirya changed the title [SPARK-55994][SQL] Support CREATE TABLE LIKE for V2 [SPARK-33902][SQL] Support CREATE TABLE LIKE for V2 Mar 15, 2026
@viirya viirya force-pushed the create-table-like-v2 branch 2 times, most recently from 6e695fe to 6e3053c Compare March 16, 2026 01:51
@aokolnychyi
Contributor

I'll take a look later today.

// For CREATE TABLE LIKE, use the v1 command if both the target and source are in the session
// catalog (or a V1-compatible catalog extension). If source is in a different catalog, fall
// through to the V2 execution path (CreateTableLikeExec via DataSourceV2Strategy).
case CreateTableLike(
Contributor

What does this mean for DSv2 connectors that override the session catalog?

Contributor

An example is Iceberg session catalog.

Member

yea, agree, we should add a test for sessionCatalog

Member Author

Good catch. When a connector like Iceberg overrides the session catalog, the target resolves through ResolvedV1Identifier (since supportsV1Command returns true for the session catalog), but the source — a native Iceberg Table — does NOT match ResolvedV1TableOrViewIdentifier (which requires V1Table). So ResolveSessionCatalog falls through and CreateTableLikeExec handles it, passing the Iceberg Table directly. The target createTable call goes to V2SessionCatalog, which delegates to the Iceberg catalog extension. This should work, but deserves a test. I can add one if you'd like.

// CHAR/VARCHAR types are preserved as declared (without internal metadata expansion).
val columns = sourceTable match {
case v1: V1Table =>
val rawSchema = CharVarcharUtils.getRawSchema(v1.catalogTable.schema)
Contributor

Do we have tests for this?

Member Author

Yes — the test "CHAR and VARCHAR types are preserved from v1 source to v2 target" in CreateTableLikeSuite covers this. It creates a V1 source with CHAR(10) and VARCHAR(20), runs CREATE TABLE testcat.dst LIKE src, and asserts schema("name").dataType === CharType(10) and schema("tag").dataType === VarcharType(20).
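For context, the round trip that test exercises can be sketched as follows. Spark internally replaces CHAR(n)/VARCHAR(n) with STRING and stashes the declared type in field metadata, and getRawSchema recovers it before the copy. This is a simplified illustrative model, not Spark's CharVarcharUtils; the metadata key mirrors Spark's convention but is used here only for demonstration.

```java
import java.util.*;

class CharVarcharSketch {
    // Metadata key under which the declared CHAR/VARCHAR type is stashed.
    static final String RAW_TYPE_KEY = "__CHAR_VARCHAR_TYPE_STRING";

    // Internal representation: char/varchar become string + metadata.
    static Map<String, String> toInternal(String declaredType) {
        Map<String, String> field = new HashMap<>();
        if (declaredType.startsWith("char(") || declaredType.startsWith("varchar(")) {
            field.put("type", "string");
            field.put(RAW_TYPE_KEY, declaredType);
        } else {
            field.put("type", declaredType);
        }
        return field;
    }

    // What getRawSchema does conceptually: recover the declared type
    // from metadata so the target table sees CHAR(10), not STRING.
    static String toRawType(Map<String, String> field) {
        return field.getOrDefault(RAW_TYPE_KEY, field.get("type"));
    }
}
```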

case class CreateTableLikeExec(
targetCatalog: TableCatalog,
targetIdent: Identifier,
sourceTable: Table,
Contributor

Does this mean it would only work for creating V2 table from another V2 table?

Contributor

Oh, this can be V1Table that wraps CatalogTable?

Member Author

Correct on both. sourceTable: Table is the V2 Table interface, which can be any implementation. For session catalog sources, ResolveRelations wraps the CatalogTable in a V1Table, which implements Table. So V1→V2 works: the source is a V1Table and we handle it explicitly in the match block at line 57 to preserve CHAR/VARCHAR types.

val partitioning = sourceTable.partitioning

// 3. Resolve provider: USING clause overrides, else copy from source.
val resolvedProvider = provider.orElse {
Contributor

Isn't this source provider but not target? Can we actually populate this?

Contributor

What does DSv1 do and is it applicable?

Member Author

Yes, this is the source provider being copied to the target — which is exactly the semantics of CREATE TABLE LIKE: the target inherits the source's format unless overridden by a USING clause. This matches V1 CreateTableLikeCommand behavior, which also copies the source provider. The copied provider goes into PROP_PROVIDER in finalProps and is passed to catalog.createTable. Whether the target catalog uses it is catalog-specific: InMemoryCatalog stores it as-is; V2SessionCatalog validates it via DataSource.lookupDataSource.

locationProp

try {
// Constraints from the source table are intentionally NOT copied for several reasons:
Contributor

This comment is too long to be included here, let's shorten it?

Member Author

Okay. I'll shorten it.

@aokolnychyi
Contributor

@gengliangwang @cloud-fan, can you folks help review as well?

@aokolnychyi
Contributor

cc @szehon-ho as well

// If constraint copying is desired, use ALTER TABLE ADD CONSTRAINT after creation.
// If we wanted to support them in the future, the right approach would be to add an
// INCLUDING CONSTRAINTS clause (as PostgreSQL does) rather than copying blindly.
val tableInfo = new TableInfo.Builder()
Contributor

Owner?

Member Author

Good point. CatalogV2Util.convertTableProperties (used by CreateTableExec) calls withDefaultOwnership to add the current user as owner. We should do the same by adding CatalogV2Util.withDefaultOwnership(finalProps). I'll add that.
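The ownership-injection step being discussed is conceptually simple; a sketch of what CatalogV2Util.withDefaultOwnership does (the key name and user lookup here are simplified stand-ins, not Spark's exact constants):

```java
import java.util.*;

class OwnershipSketch {
    // Stand-in for TableCatalog.PROP_OWNER.
    static final String PROP_OWNER = "owner";

    // Add the current user as owner unless the property is already set.
    static Map<String, String> withDefaultOwnership(Map<String, String> props,
                                                    String currentUser) {
        Map<String, String> result = new HashMap<>(props);
        result.putIfAbsent(PROP_OWNER, currentUser);
        return result;
    }
}
```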

* - Source table's TBLPROPERTIES (user-specified `properties` are used instead)
* - Statistics, owner, create time
*/
case class CreateTableLikeExec(
Contributor

@aokolnychyi aokolnychyi Mar 16, 2026

Do we have V1 -> V2 within as well across catalog tests?

Member Author

Yes — "v2 target, v1 source: schema and partitioning are copied" tests V1 source (default.src in session catalog) → V2 target (testcat.dst). The "cross-catalog" and "3-part name" tests cover V2→V2 across catalogs.

@sarutak
Member

sarutak commented Mar 16, 2026

The proposed behavior seems different from CREATE TABLE LIKE in Databricks Runtime. Is it OK?

I wonder if we can delegate what to copy on CREATE TABLE LIKE to each table format implementation?

ResolvedTable(_, _, table, _),
fileFormat: CatalogStorageFormat, provider, properties, ifNotExists) =>
CreateTableLikeExec(
catalog.asTableCatalog, ident, table, fileFormat, provider, properties, ifNotExists) :: Nil
Member

The three CreateTableLike match cases (for ResolvedTable, ResolvedPersistentView, ResolvedTempView) are nearly identical. Consider consolidating into a single pattern:

    case CreateTableLike(
        ResolvedIdentifier(catalog, ident), source,
        fileFormat: CatalogStorageFormat, provider, properties, ifNotExists) =>
      val table = source match {
        case ResolvedTable(_, _, t, _) => t
        case ResolvedPersistentView(_, _, meta) => V1Table(meta)
        case ResolvedTempView(_, meta) => V1Table(meta)
      }
      CreateTableLikeExec(
        catalog.asTableCatalog, ident, table, fileFormat, provider, properties, ifNotExists) :: Nil

Member Author

@viirya viirya Mar 17, 2026

Good suggestion. I can refactor to the single pattern you proposed, with the source table resolved in an inner match. This is cleaner and removes the duplication.

targetCatalog: TableCatalog,
targetIdent: Identifier,
sourceTable: Table,
fileFormat: CatalogStorageFormat,
Member

fileFormat: CatalogStorageFormat carries inputFormat/outputFormat/serde fields, but only locationUri is used (line 84). Consider narrowing the exec's parameter to location: Option[URI] to make the contract explicit, leaving the full CatalogStorageFormat only in the logical plan (where the V1 fallback path needs it).

Member Author

Valid. Only locationUri is used in CreateTableLikeExec. I'll change the exec's parameter to location: Option[URI] and extract it at the DataSourceV2Strategy callsite.

val v1 = "CREATE TABLE table1 LIKE table2"
// Helper to extract fields from the new CreateTableLike unresolved plan.
// The parser now emits CreateTableLike (v2 logical plan) instead of
// CreateTableLikeCommand, so both name and source are unresolved identifiers.
Member

The source is UnresolvedTableOrView, not an unresolved identifier:

Suggested change
// CreateTableLikeCommand, so both name and source are unresolved identifiers.
// CreateTableLikeCommand, so the name is an UnresolvedIdentifier and the source is an UnresolvedTableOrView.

Member Author

Good catch, I'll apply the suggestion.

@gengliangwang
Member

BTW, consider unifying to a single CreateTableLikeExec — The current PR keeps two execution paths: V1 fallback via CreateTableLikeCommand (for V1-V1 cases) and the new CreateTableLikeExec (for V2 targets). The test "v2 source, v1 target" already proves CreateTableLikeExec works for session catalog targets via V2SessionCatalog.

* @param properties User-specified TBLPROPERTIES.
* @param ifNotExists IF NOT EXISTS flag.
*/
case class CreateTableLike(
Member

can we have one single command (UnaryRunnableCommand)? I thought that's the preferred way now to reduce plan complexity in the different stages

@viirya
Member Author

viirya commented Mar 17, 2026

BTW, consider unifying to a single CreateTableLikeExec — The current PR keeps two execution paths: V1 fallback via CreateTableLikeCommand (for V1-V1 cases) and the new CreateTableLikeExec (for V2 targets). The test "v2 source, v1 target" already proves CreateTableLikeExec works for session catalog targets via V2SessionCatalog.

I considered this, but prefer to keep the separation consistent with how every other V2 DDL command in Spark is structured — ResolveSessionCatalog routes session catalog targets back to V1 commands, and DataSourceV2Strategy handles the rest. Unifying would require CreateTableLikeExec to absorb Hive serde semantics (STORED AS, ROW FORMAT, etc.) that properly belong to CreateTableLikeCommand. The V2→V1 test passing with a simple parquet table doesn't cover those cases. I think we'd rather keep the boundary clean and follow the established pattern.

@viirya
Member Author

viirya commented Mar 17, 2026

The proposed behavior seems different from CREATE TABLE LIKE in Databricks Runtime. Is it OK?

I wonder if we can delegate what to copy on CREATE TABLE LIKE to each table format implementation?

DBR's behavior is format-aware: what gets copied depends on the source/target format combination. A Delta target gets constraints and configuration; a non-Delta target doesn't. This implies the copy logic is delegated to the table format (Delta) rather than being hardcoded in a single generic exec node.

In our implementation, CreateTableLikeExec is format-agnostic — it always copies only columns, partitioning, and provider, regardless of what the source or target catalog supports.

The "delegate to format implementation" design (where Delta controls what it copies) is appealing but would require a new TableCatalog API (e.g. createTableLike), which is a larger scope; formats that want it could opt in as specialized behavior. For this PR we kept the behavior simple and consistent with the existing V1 CreateTableLikeCommand, which can be treated as the default behavior. I'm happy to file a follow-up JIRA to explore catalog-delegated copy semantics, including comment and constraint propagation for supporting catalogs.

@aokolnychyi
Contributor

I took a look at the PR with fresh eyes today. I do think @sarutak brings a valid point.
What about actually offering a new method in TableCatalog?

default Table createTableLike(
    Identifier ident,
    TableInfo tableInfo,
    Table sourceTable) throws TableAlreadyExistsException, NoSuchNamespaceException {
  throw QueryExecutionErrors.unsupportedCreateTableLikeError();
}

Where tableInfo will contain override / explicit options and connectors can access sourceTable for any additional information? It will be a bit harder to standardize what props / state has to be copied. The danger right now is that Spark may copy properties that can't be copied and the connector can't distinguish between CREATE and CREATE LIKE.

This is similar to what Iceberg does with its snapshot and migrate procedures.

@sarutak
Member

sarutak commented Mar 17, 2026

@aokolnychyi Yeah, that's exactly what I'm thinking about.

@aokolnychyi
Contributor

aokolnychyi commented Mar 17, 2026

The downside of adding a new method like createTableLike is that the eventual behavior becomes connector-based, meaning Spark can't fully control it. The question is whether we can populate enough information via the public properties to make this command usable for Delta and Iceberg. For instance, Iceberg needs to pick the format version and Delta needs to pick the protocol version (inherit it). One way to expose that is via table properties.

@viirya, can you do a deeper analysis to ensure we pick the right design but also make this feature truly usable?

@aokolnychyi
Contributor

aokolnychyi commented Mar 17, 2026

The danger right now is that Spark may copy properties that can't be copied and the connector can't distinguish between CREATE and CREATE LIKE.

We could mitigate this by adding a new TableCatalogCapability and, say, a new PROP_SOURCE_PROVIDER or similar option to TableCatalog so that connectors can distinguish between a regular CREATE and CREATE TABLE LIKE. Once connectors know they are in a CREATE TABLE LIKE context, they can read the source table's properties (they are copied over) to access additional information like protocol version. Connectors can also filter out properties, because they know they are in CREATE TABLE LIKE.

The main question: is this enough to get away without createTableLike, @viirya?

@viirya
Member Author

viirya commented Mar 18, 2026

@viirya, can you do a deeper analysis to ensure we pick the right design but also make this feature truly usable?

The analysis is quite long. I put it into a Google Doc:
https://docs.google.com/document/d/1CJmOoLnASBjEpkRqlqgkkG22XKBD4OoSSFdgOxa_0a8/edit?usp=sharing

The main question is this enough to get away without createTableLike, @viirya?

In short, the TableCatalogCapability + Marker + Source Properties approach should be able to achieve the same features as createTableLike, but it could be fragile for some features like property filtering.

Given that fragility in the createTable + TableCatalogCapability + Marker + Source Properties combination, maybe we should go for createTableLike.

The secondary reason is evolvability. If in the future we need to pass additional context (e.g., ifNotExists, a version hint, or source catalog metadata), the createTableLike approach can simply add a parameter. The capability approach would need to introduce yet another synthetic property key, making the convention grow unboundedly.

The capability approach's only real advantage is avoiding a new TableCatalog method. But given that TableCatalog already has methods like createTable, alterTable, dropTable, adding createTableLike follows the established pattern and is a natural extension. The Spark connector API has precedent for adding new methods with default implementations or a thrown exception for backward compatibility; connectors opt in when ready.

@jzhuge
Member

jzhuge commented Mar 18, 2026

+1 to the createTableLike API direction. I had an earlier attempt at this in #40963 — sorry it did not land. Glad to see it picked up and taken further.

Two concrete cases where the current exec falls short for Iceberg: sort order is not captured by Table.partitioning(), and format-version is a table property the exec intentionally skips. CREATE TABLE LIKE on an Iceberg v2 table would silently produce a v1 table with no sort order. createTableLike lets the connector handle this correctly.

Thanks @viirya, @aokolnychyi, @gengliangwang, @szehon-ho, and @sarutak for pushing this forward.

@viirya
Member Author

viirya commented Mar 18, 2026

@jzhuge Thanks for the Iceberg examples — sort order and format-version are exactly the kind of format-specific semantics that can't be handled generically. This further confirms createTableLike is the right direction.

@viirya
Member Author

viirya commented Mar 18, 2026

How can we address / minimize discrepancies between implementations? For instance, what if one connector preserves constraints while the other drops them? What if one connector copies props from the source while the other drops them? Anything we can do beyond docs?

Documentation helps but doesn't enforce consistency. Here's a more structural idea.

The root cause of discrepancy is that the target connector has to know what is important to copy from the source — but it has no way to know source-format-specific semantics. Delta knows which properties are internal and must be filtered. Iceberg knows its sort order and format version must be preserved. A generic target connector cannot know any of this.

One way to address this is to introduce a source-side contract: instead of the target being responsible for deciding what to copy, the source table exposes a pre-built TableInfo representing what it considers correct to copy for a LIKE operation. Something like a new method on the Table interface:

default TableInfo tableInfoForLike() {
  // default: schema + partitioning only (current behavior)
}

The flow would then be:

  1. Spark calls targetCatalog.createTableLike(ident, userOverridesTableInfo, sourceTable)
  2. Target connector calls sourceTable.tableInfoForLike() itself and merges with userOverridesTableInfo using its own logic

This cleanly separates responsibilities:

  • Source controls what it exposes for copying — Delta filters out delta.columnMapping.maxColumnId, coordinated commits props, etc.; Iceberg includes sort order and format version
  • Target applies user overrides and creates the table using its own format semantics

The remaining discrepancy is only on the target side — e.g., a target that doesn't support constraints simply drops them. This is unavoidable and acceptable: you cannot force a target format to support features it doesn't have. But at least the source's intent is clearly expressed and correctly packaged, rather than relying on the target to reverse-engineer it.

@viirya
Member Author

viirya commented Mar 18, 2026

What would be the meaning of TableInfo in this API? Will it only contain overrides or will it contain a merged state? One potential benefit of passing merged state from Spark is to have more consistent behavior. That said, connectors still will have to filter the props separately and it was kind of nice to have overrides explicitly.

I'd recommend overrides only — tableInfo should contain only what the user explicitly specified in the command (TBLPROPERTIES, LOCATION, USING). Pre-merging in Spark would bring back the disambiguation problem that connectors would need to reverse-engineer which properties came from the user vs Spark, losing the clean separation that makes createTableLike better than the capability approach. With overrides-only, the contract is clear: read source state from sourceTable, apply user intent from tableInfo, connector decides what gets copied and how to merge. This matches your earlier intuition that "it was kind of nice to have overrides explicitly."
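To make the overrides-only contract concrete, here is a sketch of how a connector might merge inside createTableLike. This is illustrative only: the method name and the internal-key filtering are assumptions about a typical connector, not a defined Spark API.

```java
import java.util.*;

class LikeMergeSketch {
    // Overrides-only contract: tableInfo carries only what the user wrote
    // (TBLPROPERTIES, LOCATION, USING); the connector reads source state
    // itself and decides what is copyable.
    static Map<String, String> mergeForLike(Map<String, String> sourceProps,
                                            Map<String, String> userOverrides,
                                            Set<String> internalKeys) {
        Map<String, String> merged = new LinkedHashMap<>();
        // 1. Copy source properties the connector considers copyable,
        //    filtering out its own internal bookkeeping keys.
        sourceProps.forEach((k, v) -> {
            if (!internalKeys.contains(k)) merged.put(k, v);
        });
        // 2. Explicit user overrides always win.
        merged.putAll(userOverrides);
        return merged;
    }
}
```

Because Spark never pre-merges, the connector can always tell user intent apart from inherited source state.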

@gengliangwang
Member

How about adding a capability-gated createTableLike to TableCatalog:

  // New capability                                                                                                                           
  enum TableCatalogCapability {                                                                                                               
      SUPPORT_CREATE_TABLE_LIKE
  }                                                                                                                                           
                  
  // New default method in TableCatalog
  default Table createTableLike(
      Identifier ident,                                                                                                                       
      Identifier sourceIdent,                                                                                                                 
      TableInfo overrides) {                                                                                                                  
      throw new UnsupportedOperationException(                                                                                                
          name() + " does not support CREATE TABLE LIKE");
  }

And in Spark's execution layer:

  if (catalog.capabilities().contains(SUPPORT_CREATE_TABLE_LIKE)) {                                                                           
    // Connector handles everything: loads source, filters internal props, applies overrides
    catalog.createTableLike(ident, sourceIdent, overrides)                                                                                    
  } else {                                                                                                                                    
    // Fallback: Spark loads source, builds merged TableInfo, calls createTable                                                               
    val source = catalog.loadTable(sourceIdent)                                                                                               
    val merged = buildTableInfoFromSource(source, overrides)                                                                                  
    catalog.createTable(ident, merged)                                                                                                        
  }

This gives connectors like Delta full control over internal property filtering / protocol inheritance while still providing a reasonable fallback for simpler connectors that don't opt in. The fallback path handles the "good enough" case; connectors with complex internal state (Delta, Iceberg) implement the capability for correctness.

@sarutak
Member

sarutak commented Mar 19, 2026

I'm +1 to @viirya 's solution for now.

@gengliangwang
Passing Identifier sourceIdent instead of Table sourceTable breaks cross-catalog CREATE TABLE LIKE right? Consider:

-- Iceberg source -> Delta target
CREATE TABLE delta_catalog.db.target LIKE iceberg_catalog.db.source

In this solution, delta_catalog receives sourceIdent = db.source, but it cannot call loadTable(db.source) because that identifier belongs to iceberg_catalog, not delta_catalog. The target catalog simply has no way to resolve an identifier from a different catalog.

@aokolnychyi
Contributor

aokolnychyi commented Mar 19, 2026

I don't see how tableInfoForLike described here helps. It would still mean the implementation is connector-dependent. Also, there would be no way to express things like Iceberg sort order as TableInfo in Spark can't express it.

I also don't think createTableLike with sourceIdent is any better. In fact, it would break cross-catalog migration and would move the resolution further into the connector.

All in all, I think createTableLike(ident, tableInfo, sourceTable) with tableInfo containing overrides is the best approach we considered. Any thoughts?

@sarutak
Member

sarutak commented Mar 19, 2026

I think @aokolnychyi's createTableLike(ident, tableInfo, sourceTable) with overrides-only tableInfo works well too. Passing the resolved Table object lets connectors use instanceof for format-specific metadata (Iceberg sort order, Delta protocol version, etc.), and the overrides-only tableInfo keeps the contract clean.

If the need for a source-side contract like tableInfoForLike() becomes clear in practice, it can be added later as a default method on Table without breaking existing implementations. No reason to block on it now.

@gengliangwang
Member

@sarutak I see. Using sourceTable instead of sourceIdent makes sense.
@aokolnychyi sounds good

@viirya
Member Author

viirya commented Mar 19, 2026

Agree on that. Okay since we got consensus, I will proceed with this direction and update this PR.

@viirya viirya force-pushed the create-table-like-v2 branch 2 times, most recently from 8236b58 to c5af564 Compare March 20, 2026 04:38
*/
default Table createTableLike(Identifier ident, TableInfo tableInfo, Table sourceTable)
throws TableAlreadyExistsException, NoSuchNamespaceException {
return createTable(ident, tableInfo);
Contributor

@aokolnychyi aokolnychyi Mar 20, 2026

Is this actually safe or should we throw an exception by default?

Member Author

Yea, this is what I wanted to ask. Should we provide at least a default CREATE TABLE LIKE implementation for connectors that don't implement this API, doing just the minimum (columns and partitioning)? But by doing that, we have to carry columns and partitions in the TableInfo, which raises the question in #54809 (comment).

A consistent way might be just throw an exception.

Member Author

I'm going to remove the default implementation now.

// read source metadata, including properties and constraints, directly from sourceTable.
val tableInfo = new TableInfo.Builder()
.withColumns(columns)
.withPartitions(partitioning)
Contributor

@aokolnychyi aokolnychyi Mar 20, 2026

This seems inconsistent. It does NOT seem like TableInfo contains only overrides, it seems like it actually copies some but not all state from the source table, making it really confusing. And if it copies partitioning, why doesn't it copy constraints?

Contributor

@aokolnychyi aokolnychyi Mar 20, 2026

Same question for properties too. If we copied partitioning, why didn't we copy properties? This isn't strictly overrides or am I missing something?

Member Author

Removed them, since there is no default implementation now.

Member

@gengliangwang gengliangwang left a comment

Review

The design cleanly follows the established V2 DDL pipeline (parser → logical plan → ResolveCatalogs → ResolveSessionCatalog V1 fallback → DataSourceV2Strategy → exec → catalog API). The new createTableLike(ident, tableInfo, sourceTable) API with overrides-only tableInfo is a good contract for connector-delegated copy semantics.

Two general observations:

  • Stale Scaladoc in SparkSqlParser.scala:1144: The comment still says "Create a [[CreateTableLikeCommand]] command" but the method now returns a CreateTableLike logical plan.
  • Missing CHAR/VARCHAR preservation test: The old CreateTableLikeExec had explicit CharVarcharUtils.getRawSchema handling for V1Table sources. The refactor to the createTableLike API removed this (correctly — it’s now the connector’s responsibility), but the CHAR/VARCHAR test mentioned in earlier review comments is no longer present. Consider adding a test that creates a V1 source with CHAR/VARCHAR columns, runs CREATE TABLE testcat.dst LIKE src, and verifies type preservation through the connector path.

viirya and others added 15 commits March 21, 2026 00:52
## What changes were proposed in this pull request?

Previously, `CREATE TABLE LIKE` was implemented only via `CreateTableLikeCommand`,
which bypassed the V2 catalog pipeline entirely. This meant:
- 3-part names (catalog.namespace.table) caused a parse error
- 2-part names targeting a V2 catalog caused `NoSuchDatabaseException`

This PR adds a V2 execution path for `CREATE TABLE LIKE`:

- Grammar: change `tableIdentifier` (2-part max) to `identifierReference`
  (N-part) for both target and source, consistent with all other DDL commands
- Parser: emit `CreateTableLike` (new V2 logical plan) instead of
  `CreateTableLikeCommand` directly
- `ResolveCatalogs`: resolve the target `UnresolvedIdentifier` to
  `ResolvedIdentifier`
- `ResolveSessionCatalog`: route back to `CreateTableLikeCommand` when both
  target and source are V1 tables/views in the session catalog (V1->V1 path)
- `DataSourceV2Strategy`: convert `CreateTableLike` to new `CreateTableLikeExec`
- `CreateTableLikeExec`: physical exec that copies schema and partitioning from
  the resolved source `Table` and calls `TableCatalog.createTable()`

## How was this patch tested?

- `CreateTableLikeSuite`: new integration tests covering V2 target with V1/V2
  source, cross-catalog, views as source, IF NOT EXISTS, property behavior,
  and V1 fallback regression
- `DDLParserSuite`: updated existing `create table like` test to match the new
  `CreateTableLike` plan shape; added 3-part name parsing test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two tests covering the case where the source is a V2 table in a
non-session catalog and the target resolves to the session catalog.
These exercise the CreateTableLikeExec → V2SessionCatalog path and
confirm that schema and partitioning are correctly propagated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two tests to CreateTableLikeSuite documenting that pure V2 catalogs
(e.g. InMemoryCatalog) accept any provider string without validation,
while V2SessionCatalog rejects non-existent providers by delegating to
DataSource.lookupDataSource. This is consistent with how CreateTableExec
handles the USING clause for other V2 DDL commands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…CREATE TABLE LIKE

Two new tests covering previously untested code paths in CreateTableLikeExec:
- Source provider is copied to V2 target as PROP_PROVIDER when no USING override
  is given, consistent with how CreateTableExec handles other V2 DDL.
- CHAR(n)/VARCHAR(n) types declared on a V1 source are preserved in the V2
  target via CharVarcharUtils.getRawSchema, not collapsed to StringType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add inline comment explaining the six reasons withConstraints is
intentionally omitted: V1 behavior parity, ForeignKey cross-catalog
dangling references, constraint name collision risk, validation status
semantics on empty tables, NOT NULL already captured in nullability,
and PostgreSQL precedent (INCLUDING CONSTRAINTS opt-in). Also notes
the path forward if constraint copying is added in the future.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clarify that V1 tables (CatalogTable) have no constraint objects at all
since CHECK/PRIMARY KEY/UNIQUE/FOREIGN KEY are V2-only concepts added in
Spark 4.1.0, rather than saying CreateTableLikeCommand "never copied"
them which implies an intentional decision rather than absence of the
feature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed identifiers

After the CREATE TABLE LIKE V2 change, the target and source identifiers
in CreateTableLikeCommand are now fully qualified (spark_catalog.default.*)
because ResolvedV1Identifier explicitly adds the catalog component via
ident.asTableIdentifier.copy(catalog = Some(catalog.name)), and
ResolvedV1TableIdentifier returns t.catalogTable.identifier which also
includes the catalog. Update the analyzer golden file accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Narrow CreateTableLikeExec parameter from fileFormat: CatalogStorageFormat
  to location: Option[URI] since only locationUri is used; extract at the
  DataSourceV2Strategy callsite (gengliangwang)
- Add withDefaultOwnership to finalProps so the target table records the
  current user as owner, consistent with CreateTableExec (aokolnychyi)
- Consolidate three CreateTableLike pattern match cases in DataSourceV2Strategy
  into a single case with an inner match on the source (gengliangwang)
- Shorten the constraint comment and add a note on source provider inheritance
  (aokolnychyi)
- Fix DDLParserSuite comment: source is UnresolvedTableOrView, not an
  unresolved identifier (gengliangwang)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests that when a DSv2 connector overrides the session catalog via
CatalogExtension (e.g. Iceberg SparkSessionCatalog), CREATE TABLE LIKE
with a native V2 source correctly uses CreateTableLikeExec and creates
the target as a native V2 table in the extension catalog.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed copy semantics

Add a new default method `createTableLike(Identifier, TableInfo, Table)` to
`TableCatalog` so that connectors (Delta, Iceberg, etc.) can implement
format-specific CREATE TABLE LIKE semantics by accessing the resolved source
`Table` object directly (e.g. Delta protocol inheritance, Iceberg sort order
and format version).

`TableInfo` contains user-specified overrides (TBLPROPERTIES, LOCATION),
resolved provider, and current user as owner. Source TBLPROPERTIES and
constraints are NOT bulk-copied; connectors read them from `sourceTable`.
The default implementation falls back to `createTable(ident, tableInfo)`.

`CreateTableLikeExec` is updated to call `createTableLike` instead of
`createTable`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
InMemoryTableCatalog overrides createTableLike to demonstrate connector-specific
copy semantics: source properties are merged into the target (user overrides win),
and source constraints are copied from sourceTable.constraints() directly.
BasicInMemoryTableCatalog does not override createTableLike and uses the default
fallback, which copies only schema, partitioning, and user-specified overrides.

Tests added to CatalogSuite covering:
- User-specified properties in tableInfo are applied to the target
- Source properties are copied by the connector implementation
- User-specified properties override source properties
- Source constraints are copied by the connector implementation
- Default fallback does not copy source properties

CreateTableLikeSuite updated to reflect that InMemoryTableCatalog's createTableLike
copies source properties, and adds a test for user override precedence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
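The precedence described in this commit (source properties merged into the target, with user-specified overrides winning on collisions) can be sketched with plain Java maps. `MergeProps` and `mergeForCreateLike` are hypothetical illustrative names, not part of the Spark API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the connector's property-merge semantics for
// CREATE TABLE LIKE: start from the source table's properties, then let
// user-specified TBLPROPERTIES overrides win on key collisions.
public class MergeProps {
    static Map<String, String> mergeForCreateLike(
            Map<String, String> sourceProps, Map<String, String> userOverrides) {
        Map<String, String> merged = new HashMap<>(sourceProps); // copy source props
        merged.putAll(userOverrides);                            // user overrides win
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> source = Map.of("format", "parquet", "owner", "alice");
        Map<String, String> user = Map.of("owner", "bob");
        // "owner" comes from the user override, "format" from the source
        System.out.println(mergeForCreateLike(source, user));
    }
}
```

This is the same precedence CREATE TABLE AS and most SQL engines use for explicit clauses: anything the user states in the DDL statement beats anything inherited from the source.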
… by default

Key changes:
- TableCatalog.createTableLike default now throws UnsupportedOperationException
  instead of falling back to createTable; connectors must explicitly implement it
- TableInfo passed to createTableLike contains only user-specified overrides
  (TBLPROPERTIES, LOCATION, resolved provider, owner); schema, partitioning,
  and constraints are NOT pre-populated -- connectors read all source metadata
  directly from sourceTable
- CreateTableLikeExec no longer extracts columns/partitioning into TableInfo;
  removed CharVarcharUtils usage (CHAR/VARCHAR preservation is connector-specific)
- InMemoryTableCatalog.createTableLike updated to read columns, partitioning,
  and constraints from sourceTable directly
- TableInfo.Builder.columns defaults to empty array (no longer null) so
  properties-only builds succeed while requireNonNull guard is preserved
- Removed V2->V1 (session catalog target) support and related tests; that path
  is unsupported -- connectors targeting the session catalog must override
  createTableLike themselves
- Updated CatalogSuite test to verify UnsupportedOperationException is thrown
  for catalogs that do not override createTableLike

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
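A minimal model of the resulting contract, using hypothetical stand-in types (`MiniTableCatalog`, with plain strings in place of `Identifier`/`TableInfo`/`Table`) rather than the real Spark interfaces:

```java
// Hypothetical sketch: the default createTableLike throws, so a catalog
// supports CREATE TABLE LIKE only by explicitly overriding the method.
interface MiniTableCatalog {
    default String createTableLike(String ident, String tableInfo, String sourceTable) {
        throw new UnsupportedOperationException(
            "CREATE TABLE LIKE is not supported by this catalog");
    }
}

// A connector that opts in. A real implementation would read schema,
// partitioning, and constraints from sourceTable and apply the
// user-specified overrides carried in tableInfo.
class CopyingCatalog implements MiniTableCatalog {
    @Override
    public String createTableLike(String ident, String tableInfo, String sourceTable) {
        return ident + " (copied from " + sourceTable + ")";
    }
}

public class Demo {
    public static void main(String[] args) {
        MiniTableCatalog basic = new MiniTableCatalog() {}; // no override
        try {
            basic.createTableLike("dst", "{}", "src");
        } catch (UnsupportedOperationException e) {
            System.out.println("default throws: " + e.getMessage());
        }
        System.out.println(new CopyingCatalog().createTableLike("dst", "{}", "src"));
    }
}
```

Throwing by default keeps the contract unambiguous: there is no half-copied table created by a generic fallback that a connector never reviewed.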
…dOperationException

The session catalog does not implement createTableLike, so CREATE TABLE LIKE
targeting it with a V2 source should throw UnsupportedOperationException.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
"There is no default implementation" was contradictory since the method
is declared with 'default' and has a body. Reword to accurately say
the default implementation throws UnsupportedOperationException.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… stale Scaladoc

- InMemoryTableCatalog.createTableLike now applies CharVarcharUtils.getRawSchema
  when the source is a V1Table, preserving CHAR/VARCHAR column types as declared
  rather than collapsed to StringType. This illustrates the pattern connectors
  should follow to preserve declared types from V1 sources.
- Add test "CHAR and VARCHAR types are preserved from v1 source to v2 target"
  in CreateTableLikeSuite to verify end-to-end type fidelity.
- Fix stale Scaladoc in SparkSqlParser.scala: "Create a [[CreateTableLikeCommand]]
  command" → "Create a [[CreateTableLike]] logical plan."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@viirya viirya force-pushed the create-table-like-v2 branch from eb58fad to 3b94076 Compare March 21, 2026 07:52