fix(csv-import): preserve inherited domains across all importers without materializing them (#28523)#28655
fix(csv-import): preserve inherited domains across all importers without materializing them (#28523)#28655sonika-shah wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a CSV-import (bulk edit) validation failure when an entity has data products set but its domain is only inherited (and therefore not present in the CSV row), causing the “Data Product Domain Validation” semantics rule to incorrectly fail during RuleEngine.evaluateUpdate(...).
Changes:
- Add
evaluateUpdateRules(...)inEntityCsvto temporarily hydrate inherited domains viaEntityRepository.getForInheritance(...)for rule evaluation only, then restore the imported domains value so persistence remains unchanged. - Wire the new rule-evaluation helper into both
createEntity(...)update paths inEntityCsv. - Add unit tests covering the matching-domain pass case and mismatched-domain failure case for CSV bulk edits involving inherited domains + data products.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java | Hydrates inherited domains only during rule evaluation for CSV updates with data products but no explicit domains, preventing false “Data Product Domain Validation” failures. |
| openmetadata-service/src/test/java/org/openmetadata/csv/EntityCsvTest.java | Adds regression tests ensuring inherited-domain validation passes when domains match and still fails on genuine domain mismatches, and that inherited domains are not persisted from validation. |
b63a03f to
b7c1a1a
Compare
b7c1a1a to
36f0f4e
Compare
…omain Validation doesn't falsely fail (#28523) Bulk-editing an entity via CSV import (tags, certification, ...) failed with "Rule [Data Product Domain Validation] validation failed" when the entity had an inherited domain and an assigned data product. Root cause: for an update row, the Drive importers load the existing entity with all fields (getEntityByName(..., "*")) — which already resolves the inherited domain — then overwrite `domains` with the (empty) CSV domain column, discarding it. The platform rule then sees a data product with an empty domain and treats it as a genuine mismatch. The single-entity edit path is unaffected because it never discards the inherited domain. Fix: treat an empty domain column as "leave domains unchanged" rather than "clear domains". A new EntityCsv.getDomains(...) overload returns the entity's already-loaded domains when the CSV column is empty, and the File / Directory / Spreadsheet / Worksheet importers pass their current domains as the fallback. This needs no extra DB lookup (the domain was already loaded) and no cache. Inherited references are filtered out before persistence (EntityUpdater uses getEntityReferences which drops inherited refs), so they are never materialized as a direct domain; a genuine data product / domain mismatch still fails. Tests: unit test for the empty-column-keeps-existing-domains behavior, plus rule-level tests (real RuleEngine + real rule) asserting a matching inherited domain passes and a genuine mismatch fails.
36f0f4e to
e1f2b4f
Compare
🟡 Playwright Results — all passed (15 flaky)✅ 4258 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 88 skipped
🟡 15 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
…p materializing it (#28523) Extends the inherited-domain handling beyond the four Drive entities to every CSV importer that can trigger 'Data Product Domain Validation': the non-recursive table path (DatabaseSchemaRepository), the recursive table path (EntityCsv.createTableEntity, whose dry-run load previously excluded domains), and the Database/DatabaseService/DriveService/SecurityService importers. All load the entity with its inherited domain already in memory, so reusing it on an empty domain column adds no DB calls. Also fixes a materialization bug in the shared import path: feeding the inherited domain in for rule validation caused it to be persisted as a direct domain, because updateManyEntitiesForImport -> storeRelationshipsInternal -> storeDomains does not filter inherited refs the way the interactive EntityUpdater path does. dropInheritedDomains() now strips inherited domains after rule evaluation and before persistence, so the entity keeps re-inheriting on read. This also fixes the original Drive-entity paths, whose mocked unit tests never persisted and so never caught it. Adds real end-to-end integration tests (non-recursive and recursive) in DatabaseSchemaResourceIT that drive the actual importer with an inherited domain plus an assigned data product, asserting the rule passes and the inherited domain is not materialized as a direct domain. EntityCsvTest gains a unit assertion for the strip.
| public List<EntityReference> getDomains( | ||
| CSVPrinter printer, CSVRecord csvRecord, int fieldNumber, List<EntityReference> current) { | ||
| List<EntityReference> parsed = getDomains(printer, csvRecord, fieldNumber); | ||
| return parsed != null ? parsed : current; | ||
| } |
There was a problem hiding this comment.
⚠️ Edge Case: Empty domain column no longer clears an explicit (direct) domain
The new 4-arg getDomains(printer, csvRecord, fieldNumber, current) returns current (the loaded entity's entire domain list, both inherited and direct) whenever the CSV domains column is empty. dropInheritedDomains then strips only inherited=true refs — direct domains in current are kept and persisted.
Before this commit, the 3-arg overload returned null for an empty column, so an empty domains cell cleared any existing domain. After wiring the 4-arg overload into DatabaseSchemaRepository (table), DatabaseRepository, DatabaseServiceRepository (database/schema/databaseService), DriveServiceRepository, SecurityServiceRepository, and EntityCsv.createTableEntity, an empty domains column now means "leave the existing direct domain unchanged" rather than "clear it."
This is the documented intent for inherited domains, but it is also collaterally applied to explicitly-assigned (direct) domains: since the exporter writes a direct domain into the column, a user who edits an exported CSV and empties that column to remove the domain will no longer be able to clear it via bulk edit. If clearing a direct domain via empty column was a supported workflow, this is a silent regression. Confirm this is intended; if not, only fall back to current for inherited refs (e.g. return the inherited subset of current when the column is empty, not the direct domains).
Was this helpful? React with 👍 / 👎
Code Review
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
|



Problem
Closes #28523.
Bulk-editing a data asset (tags, certification, tier, description, …) fails with:
when the entity inherits its domain (e.g. a table inheriting from its schema/database) and has a data product assigned in that same domain. The error appears at the "Next"/validate step of bulk edit.
Root cause
The UI "bulk edit" is a CSV import, not the single-entity PATCH path. The exported CSV carries no explicit domain for an entity whose domain is inherited (inherited domains aren't written to the
domainscolumn). On import, the importer loads the existing entity (which still has its data products) but the empty domain column wiped the inherited domain to null, so the rule sawdataProductspresent +domainsempty and treated it as a genuine mismatch.The single-entity PATCH path works because it loads
originalwith inherited domains + data products hydrated, so the rule passes.Fix
1. Preserve the inherited domain for rule validation — across every importer
A 4-arg
getDomains(printer, csvRecord, fieldNumber, current)overload returns the entity's already-loaded domains (including inherited) when the CSVdomainscolumn is empty. Wired into every importer that can trigger the rule, not just the Drive entities:DatabaseSchemaRepository— table bulk-edit (the issue's exact path), non-recursiveEntityCsv.createTableEntity— table, recursive import (its dry-run load previously excludeddomains, so that exclusion is removed)DatabaseRepository,DatabaseServiceRepository,DriveServiceRepository,SecurityServiceRepositoryDirectoryRepository,FileRepository,SpreadsheetRepository,WorksheetRepository(Drive entities)All of these load the entity with its inherited domain already in memory, so reusing it costs no extra DB calls.
2. Never persist the inherited domain as a direct domain (
dropInheritedDomains)Supplying the inherited domain to the entity for validation introduced a subtle correctness bug: the bulk-import persistence path (
updateManyEntitiesForImport → storeRelationshipsInternal → storeDomains) stores whatever domains the entity carries as direct relationships — it does not filter inherited refs the way the interactive PATCH/PUTEntityUpdaterpath does. Left unaddressed, the inherited domain would be materialized into a permanent direct domain.dropInheritedDomains(entity)is called once, centrally, inEntityCsv.createEntity(both overloads) — after rule evaluation, before the entity is queued for persistence. It removes onlyinherited=truereferences and keeps any explicitly-provided direct domain. The entity therefore keeps re-inheriting from its parent on read and never acquires a stored direct domain. This also covers the original Drive-entity paths.Performance
No additional DB calls on any non-recursive path (the inherited domain is reused from the existing
*load). The only added work is one in-load domain resolution per table row in the recursive dry-run import (un-excludingdomains), which is intrinsic — the inherited domain can't be known without loading it — and is dwarfed by the column load that path already performs. The rule engine anddropInheritedDomainsare pure in-memory.Tests
DatabaseSchemaResourceIT(integration, real end-to-end) — two new tests (non-recursive and recursive) that drive the actual importer: a domain on the database → a table that inherits it → a data product in that domain assigned to the table → bulk-import a CSV row with an empty domain column. Each asserts (a) the rule passes on both dry-run and real import, and (b) the inherited domain is not materialized as a direct domain afterward. Both are genuine RED→GREEN reproductions.EntityCsvTest(unit) — drives the realgetDomainsoverload and the realRuleEngine/Data Product Domain Validationrule, plus a fast assertion that the inherited domain is dropped before persistence. 63/63 pass.