Skip to content

fix(csv-import): preserve inherited domains across all importers without materializing them (#28523)#28655

Open
sonika-shah wants to merge 2 commits into
mainfrom
worktree-fix-28523-inherited-domain-bulk-edit
Open

fix(csv-import): preserve inherited domains across all importers without materializing them (#28523)#28655
sonika-shah wants to merge 2 commits into
mainfrom
worktree-fix-28523-inherited-domain-bulk-edit

Conversation

@sonika-shah
Copy link
Copy Markdown
Collaborator

@sonika-shah sonika-shah commented Jun 3, 2026

Problem

Closes #28523.

Bulk-editing a data asset (tags, certification, tier, description, …) fails with:

Rule [Data Product Domain Validation] validation failed: Entity does not satisfy the rule.

when the entity inherits its domain (e.g. a table inheriting from its schema/database) and has a data product assigned in that same domain. The error appears at the "Next"/validate step of bulk edit.

Root cause

The UI "bulk edit" is a CSV import, not the single-entity PATCH path. The exported CSV carries no explicit domain for an entity whose domain is inherited (inherited domains aren't written to the domains column). On import, the importer loads the existing entity (which still has its data products) but the empty domain column wiped the inherited domain to null, so the rule saw dataProducts present + domains empty and treated it as a genuine mismatch.

The single-entity PATCH path works because it loads original with inherited domains + data products hydrated, so the rule passes.

Fix

1. Preserve the inherited domain for rule validation — across every importer

A 4-arg getDomains(printer, csvRecord, fieldNumber, current) overload returns the entity's already-loaded domains (including inherited) when the CSV domains column is empty. Wired into every importer that can trigger the rule, not just the Drive entities:

  • DatabaseSchemaRepositorytable bulk-edit (the issue's exact path), non-recursive
  • EntityCsv.createTableEntitytable, recursive import (its dry-run load previously excluded domains, so that exclusion is removed)
  • DatabaseRepository, DatabaseServiceRepository, DriveServiceRepository, SecurityServiceRepository
  • DirectoryRepository, FileRepository, SpreadsheetRepository, WorksheetRepository (Drive entities)

All of these load the entity with its inherited domain already in memory, so reusing it costs no extra DB calls.

2. Never persist the inherited domain as a direct domain (dropInheritedDomains)

Supplying the inherited domain to the entity for validation introduced a subtle correctness bug: the bulk-import persistence path (updateManyEntitiesForImport → storeRelationshipsInternal → storeDomains) stores whatever domains the entity carries as direct relationships — it does not filter inherited refs the way the interactive PATCH/PUT EntityUpdater path does. Left unaddressed, the inherited domain would be materialized into a permanent direct domain.

dropInheritedDomains(entity) is called once, centrally, in EntityCsv.createEntity (both overloads) — after rule evaluation, before the entity is queued for persistence. It removes only inherited=true references and keeps any explicitly-provided direct domain. The entity therefore keeps re-inheriting from its parent on read and never acquires a stored direct domain. This also covers the original Drive-entity paths.

Performance

No additional DB calls on any non-recursive path (the inherited domain is reused from the existing * load). The only added work is one in-load domain resolution per table row in the recursive dry-run import (un-excluding domains), which is intrinsic — the inherited domain can't be known without loading it — and is dwarfed by the column load that path already performs. The rule engine and dropInheritedDomains are pure in-memory.

Tests

  • DatabaseSchemaResourceIT (integration, real end-to-end) — two new tests (non-recursive and recursive) that drive the actual importer: a domain on the database → a table that inherits it → a data product in that domain assigned to the table → bulk-import a CSV row with an empty domain column. Each asserts (a) the rule passes on both dry-run and real import, and (b) the inherited domain is not materialized as a direct domain afterward. Both are genuine RED→GREEN reproductions.
  • EntityCsvTest (unit) — drives the real getDomains overload and the real RuleEngine/Data Product Domain Validation rule, plus a fast assertion that the inherited domain is dropped before persistence. 63/63 pass.

Ran mvn spotless:apply on the changed modules.

Copilot AI review requested due to automatic review settings June 3, 2026 04:40
Comment thread openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a CSV-import (bulk edit) validation failure when an entity has data products set but its domain is only inherited (and therefore not present in the CSV row), causing the “Data Product Domain Validation” semantics rule to incorrectly fail during RuleEngine.evaluateUpdate(...).

Changes:

  • Add evaluateUpdateRules(...) in EntityCsv to temporarily hydrate inherited domains via EntityRepository.getForInheritance(...) for rule evaluation only, then restore the imported domains value so persistence remains unchanged.
  • Wire the new rule-evaluation helper into both createEntity(...) update paths in EntityCsv.
  • Add unit tests covering the matching-domain pass case and mismatched-domain failure case for CSV bulk edits involving inherited domains + data products.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java Hydrates inherited domains only during rule evaluation for CSV updates with data products but no explicit domains, preventing false “Data Product Domain Validation” failures.
openmetadata-service/src/test/java/org/openmetadata/csv/EntityCsvTest.java Adds regression tests ensuring inherited-domain validation passes when domains match and still fails on genuine domain mismatches, and that inherited domains are not persisted from validation.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java Outdated
…omain Validation doesn't falsely fail (#28523)

Bulk-editing an entity via CSV import (tags, certification, ...) failed with
"Rule [Data Product Domain Validation] validation failed" when the entity had
an inherited domain and an assigned data product.

Root cause: for an update row, the Drive importers load the existing entity
with all fields (getEntityByName(..., "*")) — which already resolves the
inherited domain — then overwrite `domains` with the (empty) CSV domain
column, discarding it. The platform rule then sees a data product with an
empty domain and treats it as a genuine mismatch. The single-entity edit path
is unaffected because it never discards the inherited domain.

Fix: treat an empty domain column as "leave domains unchanged" rather than
"clear domains". A new EntityCsv.getDomains(...) overload returns the entity's
already-loaded domains when the CSV column is empty, and the File / Directory /
Spreadsheet / Worksheet importers pass their current domains as the fallback.
This needs no extra DB lookup (the domain was already loaded) and no cache.
Inherited references are filtered out before persistence (EntityUpdater uses
getEntityReferences which drops inherited refs), so they are never materialized
as a direct domain; a genuine data product / domain mismatch still fails.

Tests: unit test for the empty-column-keeps-existing-domains behavior, plus
rule-level tests (real RuleEngine + real rule) asserting a matching inherited
domain passes and a genuine mismatch fails.
Copilot AI review requested due to automatic review settings June 4, 2026 01:25
@sonika-shah sonika-shah force-pushed the worktree-fix-28523-inherited-domain-bulk-edit branch from 36f0f4e to e1f2b4f Compare June 4, 2026 01:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

🟡 Playwright Results — all passed (15 flaky)

✅ 4258 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 299 0 0 4
✅ Shard 2 803 0 0 9
🟡 Shard 3 802 0 2 8
🟡 Shard 4 850 0 4 12
🟡 Shard 5 719 0 1 47
🟡 Shard 6 785 0 8 8
🟡 15 flaky test(s) (passed on retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 2 retries)
  • Flow/ExploreAggregationCountsMatching.spec.ts › should verify left panel counts and tab search results for normal search (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Markdown (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Enum (shard 4, 1 retry)
  • Pages/DataContractInheritance.spec.ts › Run Validation - Inherited contract validation uses entity-based validation (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Contract Status badge should be visible on condition if Contract Tab is present/hidden by Persona (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Delete Topic (shard 5, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for dashboardDataModel -> mlModel (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage service filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/TestSuite.spec.ts › Logical TestSuite (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

…p materializing it (#28523)

Extends the inherited-domain handling beyond the four Drive entities to every CSV importer that can trigger 'Data Product Domain Validation': the non-recursive table path (DatabaseSchemaRepository), the recursive table path (EntityCsv.createTableEntity, whose dry-run load previously excluded domains), and the Database/DatabaseService/DriveService/SecurityService importers. All load the entity with its inherited domain already in memory, so reusing it on an empty domain column adds no DB calls.

Also fixes a materialization bug in the shared import path: feeding the inherited domain in for rule validation caused it to be persisted as a direct domain, because updateManyEntitiesForImport -> storeRelationshipsInternal -> storeDomains does not filter inherited refs the way the interactive EntityUpdater path does. dropInheritedDomains() now strips inherited domains after rule evaluation and before persistence, so the entity keeps re-inheriting on read. This also fixes the original Drive-entity paths, whose mocked unit tests never persisted and so never caught it.

Adds real end-to-end integration tests (non-recursive and recursive) in DatabaseSchemaResourceIT that drive the actual importer with an inherited domain plus an assigned data product, asserting the rule passes and the inherited domain is not materialized as a direct domain. EntityCsvTest gains a unit assertion for the strip.
@sonika-shah sonika-shah changed the title fix(csv-import): resolve inherited domains before Data Product Domain Validation (#28523) fix(csv-import): preserve inherited domains across all importers without materializing them (#28523) Jun 4, 2026
Comment on lines +422 to +426
public List<EntityReference> getDomains(
CSVPrinter printer, CSVRecord csvRecord, int fieldNumber, List<EntityReference> current) {
List<EntityReference> parsed = getDomains(printer, csvRecord, fieldNumber);
return parsed != null ? parsed : current;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Edge Case: Empty domain column no longer clears an explicit (direct) domain

The new 4-arg getDomains(printer, csvRecord, fieldNumber, current) returns current (the loaded entity's entire domain list, both inherited and direct) whenever the CSV domains column is empty. dropInheritedDomains then strips only inherited=true refs — direct domains in current are kept and persisted.

Before this commit, the 3-arg overload returned null for an empty column, so an empty domains cell cleared any existing domain. After wiring the 4-arg overload into DatabaseSchemaRepository (table), DatabaseRepository, DatabaseServiceRepository (database/schema/databaseService), DriveServiceRepository, SecurityServiceRepository, and EntityCsv.createTableEntity, an empty domains column now means "leave the existing direct domain unchanged" rather than "clear it."

This is the documented intent for inherited domains, but it is also collaterally applied to explicitly-assigned (direct) domains: since the exporter writes a direct domain into the column, a user who edits an exported CSV and empties that column to remove the domain will no longer be able to clear it via bulk edit. If clearing a direct domain via empty column was a supported workflow, this is a silent regression. Confirm this is intended; if not, only fall back to current for inherited refs (e.g. return the inherited subset of current when the column is empty, not the direct domains).

Was this helpful? React with 👍 / 👎

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jun 4, 2026

Code Review ⚠️ Changes requested 3 resolved / 4 findings

Preserves inherited domains during CSV bulk import to satisfy Data Product Domain Validation, but the fix incorrectly treats an empty CSV domain column as a signal to retain existing direct domains instead of clearing them.

⚠️ Edge Case: Empty domain column no longer clears an explicit (direct) domain

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:422-426 📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:434-445 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java:968 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java:290 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java:374 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java:451 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseRepository.java:795 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DriveServiceRepository.java:220

The new 4-arg getDomains(printer, csvRecord, fieldNumber, current) returns current (the loaded entity's entire domain list, both inherited and direct) whenever the CSV domains column is empty. dropInheritedDomains then strips only inherited=true refs — direct domains in current are kept and persisted.

Before this commit, the 3-arg overload returned null for an empty column, so an empty domains cell cleared any existing domain. After wiring the 4-arg overload into DatabaseSchemaRepository (table), DatabaseRepository, DatabaseServiceRepository (database/schema/databaseService), DriveServiceRepository, SecurityServiceRepository, and EntityCsv.createTableEntity, an empty domains column now means "leave the existing direct domain unchanged" rather than "clear it."

This is the documented intent for inherited domains, but it is also collaterally applied to explicitly-assigned (direct) domains: since the exporter writes a direct domain into the column, a user who edits an exported CSV and empties that column to remove the domain will no longer be able to clear it via bulk edit. If clearing a direct domain via empty column was a supported workflow, this is a silent regression. Confirm this is intended; if not, only fall back to current for inherited refs (e.g. return the inherited subset of current when the column is empty, not the direct domains).

✅ 3 resolved
Performance: Extra per-row DB query during bulk import validation

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1131-1145
evaluateUpdateRules calls repository.getForInheritance(...) for every updated row that has data products but no explicit domain. getForInheritance issues a find(id, include) plus a relationship lookup (findToRelationshipsForEntity), so a bulk CSV import of N such drive entities (Directory/File/Spreadsheet/Worksheet) adds N extra DB round-trips on top of the existing findMatchForImport query — including during dry-run validation, where the original code performed rule evaluation purely in memory.

This mirrors the single-entity edit path and is gated behind a narrow condition (data products present, domains empty, inherited domain found), so the impact is limited and likely acceptable. If large bulk imports of domain-inheriting assets with data products become common, consider batching/caching the inheritance resolution per import run to avoid the per-row query.

Edge Case: Inherited-domain resolution depends on parent ref being set on parsed entity

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1132-1146
The fix now resolves inherited domains via repository.setInheritedFieldsUntyped(List.of(entity), getFields("domains")). The batch setInheritedFields(List, Fields) builds its parent map from getParentReference(entity) (e.g. entity.getDatabaseSchema(), entity.getService()), and only falls back to the per-entity getParentEntity(entity, fields) path when no parent ref is found. This is correct ONLY if the CSV-parsed import entity has its parent EntityReference (or the data needed by getParentEntity) populated at the point evaluateUpdateRules runs. The previous implementation resolved inheritance off the persisted original (loaded by id) via getForInheritance, which did not rely on the parsed entity carrying a parent reference. If, for some Drive importer, the parsed entity lacks the parent reference, inheritance silently resolves to empty and the "Data Product Domain Validation" rule will incorrectly fail — the exact bug this PR fixes. The unit tests stub setInheritedFieldsUntyped, so they do not exercise the real parent-resolution path. Recommend an integration-level test (or manual verification) against a real Drive entity import to confirm the parent reference is populated on the parsed entity before rule evaluation.

Quality: Removed test for once-per-parent inheritance memoization leaves it uncovered

📄 openmetadata-service/src/test/java/org/openmetadata/csv/EntityCsvTest.java:2967
The commit deletes test_bulkEditInheritedDomainResolvesOncePerParent, which previously asserted inherited-domain resolution happened only once per parent across multiple rows. The memoization responsibility now lives in EntityRepository's request-scoped parent cache (getCachedInheritanceParent/cacheInheritanceParent), so the per-import dedup behavior is no longer verified anywhere in EntityCsvTest. The remaining tests stub setInheritedFieldsUntyped directly and never assert call counts across multiple rows, so a regression that re-loads the parent per row (the original perf concern that was previously fixed) would not be caught. Consider keeping a multi-row test that verifies the repository's parent cache is consulted, or asserting the per-parent resolution count, to guard the performance behavior described in the PR's perf commit.

🤖 Prompt for agents
Code Review: Preserves inherited domains during CSV bulk import to satisfy Data Product Domain Validation, but the fix incorrectly treats an empty CSV domain column as a signal to retain existing direct domains instead of clearing them.

1. ⚠️ Edge Case: Empty domain column no longer clears an explicit (direct) domain
   Files: openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:422-426, openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:434-445, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java:968, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java:290, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java:374, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java:451, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseRepository.java:795, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DriveServiceRepository.java:220

   The new 4-arg `getDomains(printer, csvRecord, fieldNumber, current)` returns `current` (the loaded entity's *entire* domain list, both inherited and direct) whenever the CSV `domains` column is empty. `dropInheritedDomains` then strips only `inherited=true` refs — direct domains in `current` are kept and persisted.
   
   Before this commit, the 3-arg overload returned `null` for an empty column, so an empty `domains` cell cleared any existing domain. After wiring the 4-arg overload into DatabaseSchemaRepository (table), DatabaseRepository, DatabaseServiceRepository (database/schema/databaseService), DriveServiceRepository, SecurityServiceRepository, and EntityCsv.createTableEntity, an empty `domains` column now means "leave the existing direct domain unchanged" rather than "clear it."
   
   This is the documented intent for inherited domains, but it is also collaterally applied to explicitly-assigned (direct) domains: since the exporter writes a direct domain into the column, a user who edits an exported CSV and empties that column to remove the domain will no longer be able to clear it via bulk edit. If clearing a direct domain via empty column was a supported workflow, this is a silent regression. Confirm this is intended; if not, only fall back to `current` for inherited refs (e.g. return the inherited subset of `current` when the column is empty, not the direct domains).

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 4, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Bulk Edit Fails for Tables with Inherited Domain Due to Data Product Domain Validation

2 participants