[SPARK-56510][SQL] Fix ReplaceData DML without metadata attributes not projecting out the operation column #55372
ZiyaZa wants to merge 4 commits into apache:master
Conversation
szehon-ho
left a comment
Good bug fix for the no-metadata-attributes code path in ReplaceData. The core change (introducing DataWithProjectionWritingSparkTask) is correct and well-targeted — it ensures the __row_operation column is projected out even when there are no metadata attributes. A few suggestions below.
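To picture the fix: the task must drop the trailing operation tag before a row reaches the connector's writer. The sketch below is a self-contained simplification, not the actual Spark code (the real task uses InternalRow projections); the `Row` type and column layout are illustrative.

```scala
// Simplified model: a row arrives with its data columns plus a trailing
// __row_operation tag; only the data columns may reach the writer.
case class Row(values: Vector[Any]) // stand-in for Spark's InternalRow

def projectDataColumns(rowWithTag: Row, dataColumnCount: Int): Row =
  Row(rowWithTag.values.take(dataColumnCount))

val incoming = Row(Vector(1, "a", 6)) // last value is the operation tag
val toWrite = projectDataColumns(incoming, dataColumnCount = 2)
assert(toWrite == Row(Vector(1, "a"))) // tag column projected out
```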
```scala
final val WRITE_WITHOUT_METADATA_OPERATION: Int = 5
final val WRITE_OPERATION: Int = 6
```
nit: The rename swaps both names AND integer values — WRITE_OPERATION goes from value 5 to 6. Since the values are arbitrary and only consumed by match statements in the same codebase, consider just renaming without swapping the integer values. This would avoid any (unlikely but possible) risk with code that may have hard-coded the integer values.
```scala
val operation = row.getInt(0)

operation match {
  case WRITE_OPERATION | WRITE_WITHOUT_METADATA_OPERATION =>
```
nit: Worth a brief comment here explaining why both operation types are handled. This task is only used when there are no metadata attributes, yet WRITE_OPERATION-tagged rows (carryover/update rows in MERGE, all rows in DELETE/UPDATE) still arrive because the rewrite rules tag them with WRITE_OPERATION regardless of whether metadata attributes exist. Without the comment, a reader might wonder why a no-metadata writing task handles WRITE_OPERATION.
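Something along these lines, as a hedged sketch (the constant values, object, and method shape are illustrative, not the actual Spark source):

```scala
object Ops {
  final val WRITE_WITHOUT_METADATA_OPERATION: Int = 5
  final val WRITE_OPERATION: Int = 6
}

def dispatch(operation: Int): String = operation match {
  // Both tags are handled here: even though this task only runs when the
  // table reports no metadata attributes, the rewrite rules tag carryover
  // and update rows in MERGE (and all rows in DELETE/UPDATE) with
  // WRITE_OPERATION regardless of whether metadata attributes exist.
  case Ops.WRITE_OPERATION | Ops.WRITE_WITHOUT_METADATA_OPERATION => "write"
  case other => throw new IllegalStateException(s"Unexpected operation: $other")
}
```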
```scala
package org.apache.spark.sql.connector

class GroupBasedNoMetadataDeleteFromTableSuite extends DeleteFromTableSuiteBase {
```
suggestion: Six new test files, each ~10 lines of actual code, is significant boilerplate. Consider adding a noMetadata flag to the existing suites and running them with both configurations (e.g., via a shared trait or parameterization). This would avoid class proliferation and keep the test matrix more maintainable.
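As a rough sketch of the parameterization idea (the flag, property key, and class names below are hypothetical, not taken from the PR):

```scala
// Hypothetical: a flag on the shared base suite plus a one-line override
// per variant, instead of six standalone ~10-line test files.
abstract class RowLevelOperationSuiteBase {
  // Whether the in-memory table should expose metadata attributes.
  protected def noMetadata: Boolean = false

  protected def tableProps: Map[String, String] =
    if (noMetadata) Map("no-metadata" -> "true") else Map.empty
}

abstract class DeleteFromTableSuiteBase extends RowLevelOperationSuiteBase

class GroupBasedDeleteFromTableSuite extends DeleteFromTableSuiteBase

class GroupBasedNoMetadataDeleteFromTableSuite extends DeleteFromTableSuiteBase {
  override protected def noMetadata: Boolean = true
}
```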
```scala
override def build(): Write = new Write with RequiresDistributionAndOrdering {
  override def requiredDistribution: Distribution = {
    Distributions.clustered(Array(PARTITION_COLUMN_REF))
override def build(): Write = if (noMetadata) {
```
question: Is it intentional that the noMetadata path bypasses RequiresDistributionAndOrdering? This exercises a different physical plan (no shuffle/sort). If the goal is just to test the no-metadata code path, consider keeping the distribution/ordering requirements so these tests cover the same physical plan shape as the existing suites.
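Concretely, the suggestion is to keep the clustering requirement in both branches, e.g. (a simplified sketch with stand-in types; the real code uses Spark's `Write`/`Distribution` interfaces, and only the metadata-attribute reporting would differ elsewhere):

```scala
// Stand-in types; Spark's actual interfaces live in
// org.apache.spark.sql.connector.write / .distributions.
trait Write
trait RequiresDistributionAndOrdering extends Write {
  def requiredDistribution: String
}

def build(noMetadata: Boolean): Write =
  // Same distribution requirement regardless of the flag, so the
  // no-metadata tests exercise the same plan shape (shuffle + sort).
  new RequiresDistributionAndOrdering {
    override val requiredDistribution = "clustered(partition)"
  }
```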
```scala
val pk = id.getInt(0)
buffer.deletes += pk
val logEntry = new GenericInternalRow(Array[Any](DELETE, pk, meta.copy(), null))
val metaCopy = if (meta != null) meta.copy() else null
```
This null guard is needed because DeltaWritingSparkTask passes null for metadata when requiredMetadataAttributes() is empty. However, the DeltaWriter API methods (delete(meta, id), update(meta, id, row), reinsert(meta, row)) don't document that meta can be null. Third-party connectors could hit the same NPE. Consider adding Javadoc on those API methods to clarify the contract.
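For example, something like this on the API (wording and trait shape are illustrative only, not the current Spark Javadoc):

```scala
trait DeltaWriter[T] {
  /**
   * Deletes a row.
   *
   * @param meta values of the metadata columns that were projected for this
   *             row, or null if the connector reports no required metadata
   *             attributes (i.e. requiredMetadataAttributes() is empty)
   * @param id values of the row ID columns
   */
  def delete(meta: T, id: T): Unit
}
```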
Let me take a look.
What changes were proposed in this pull request?
Previously, all DSv2 tests used an in-memory table that had some metadata attributes, so the code path for tables without metadata attributes was never exercised. This PR introduces a new property, `no-metadata`, for testing with an `InMemoryTable` that has no metadata attributes.

The previous implementation had a bug for `ReplaceData` plans: it used `DataWritingSparkTask` without a projection, which means the connector would receive one extra column (the `__row_operation` column) in addition to the row data to write. This PR fixes that by introducing a new writing task, `DataWithProjectionWritingSparkTask`, that projects out everything except the row data.

Additionally, the following clean-ups are done:
- Renamed `WRITE_WITH_METADATA_OPERATION` to `WRITE_OPERATION` and `WRITE_OPERATION` to `WRITE_WITHOUT_METADATA_OPERATION` to make the intention clearer. Previously, it was confusing that `DataWithProjectionWritingSparkTask` used `WRITE_WITH_METADATA_OPERATION` when there are no metadata attributes.
- Introduced `RowLevelWriteExec` as a parent of `ReplaceDataExec`/`WriteDeltaExec`, which now holds a helper `getMetricValue` for metric computation.

Why are the changes needed?
To fix a bug.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New unit tests.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.6