fix: use BINARY type for RAW_BYTES URL fetcher schema #5133

Merged
Yicong-Huang merged 4 commits into apache:main from Ma77Ball:fix/RawBytesDecoder on May 21, 2026

@Ma77Ball
Contributor

@Ma77Ball Ma77Ball commented May 19, 2026

What changes were proposed in this PR?

URLFetcherOpDesc.sourceSchema() advertised AttributeType.ANY for RAW_BYTES decoding, even though the executor already emits a concrete byte[]. This change returns AttributeType.BINARY instead, matching the runtime payload and unblocking Iceberg materialization (which rejects ANY). The existing URLFetcherOpDescSpec test that pinned the old behavior is flipped to assert BINARY.
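The mismatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual Texera code: the `AttributeType`, `DecodingMethod`, `Attribute`, and `Schema` definitions are simplified stand-ins, the `UTF_8` mode and the `content` column name are assumptions, and `materialize` is a hypothetical storage-side check that only mimics a backend refusing `ANY`.

```scala
// Simplified stand-ins for illustration; these are not the real Texera classes.
sealed trait AttributeType
object AttributeType {
  case object STRING extends AttributeType
  case object BINARY extends AttributeType
  case object ANY    extends AttributeType
}

sealed trait DecodingMethod
object DecodingMethod {
  case object UTF_8     extends DecodingMethod // assumed text mode, not named in this PR
  case object RAW_BYTES extends DecodingMethod
}

final case class Attribute(name: String, attributeType: AttributeType)
final case class Schema(attributes: List[Attribute])

object UrlFetcherSchemaSketch {
  private val ColumnName = "content" // hypothetical column name

  // Behavior before the fix: RAW_BYTES advertised ANY.
  def sourceSchemaBefore(decoding: DecodingMethod): Schema = {
    val attributeType: AttributeType = decoding match {
      case DecodingMethod.UTF_8     => AttributeType.STRING
      case DecodingMethod.RAW_BYTES => AttributeType.ANY
    }
    Schema(List(Attribute(ColumnName, attributeType)))
  }

  // Behavior after the fix: RAW_BYTES advertises BINARY, matching the byte[] payload.
  def sourceSchemaAfter(decoding: DecodingMethod): Schema = {
    val attributeType: AttributeType = decoding match {
      case DecodingMethod.UTF_8     => AttributeType.STRING
      case DecodingMethod.RAW_BYTES => AttributeType.BINARY
    }
    Schema(List(Attribute(ColumnName, attributeType)))
  }

  // Hypothetical check standing in for a storage layer that rejects ANY columns.
  def materialize(schema: Schema): Either[String, Schema] =
    if (schema.attributes.exists(_.attributeType == AttributeType.ANY))
      Left("ANY type is not supported in Iceberg")
    else
      Right(schema)
}
```

Under these assumptions, `materialize(sourceSchemaBefore(DecodingMethod.RAW_BYTES))` yields a `Left` carrying the error from the linked issue, while the `sourceSchemaAfter` variant yields a `Right`.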

Test JSON:
url-fetcher-raw-bytes-test.json

Any related issues, documentation, or discussions?

Closes: #5074

How was this PR tested?

The updated URLFetcherOpDescSpec covers the schema; URLFetcherOpExecSpec already pins the runtime field as Array[Byte], so the static schema and the runtime type now agree. The CI Scala test job is expected to pass.
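The agreement between the declared schema type and the runtime value could be checked along these lines. This is only a sketch: `FieldType`, its members, and `SchemaRuntimeAgreement.conforms` are hypothetical names introduced for illustration and do not appear in the specs mentioned above.

```scala
// Hypothetical, simplified type tags; not the project's AttributeType.
sealed trait FieldType
case object BinaryType extends FieldType
case object StringType extends FieldType
case object AnyType    extends FieldType

object SchemaRuntimeAgreement {
  // True when the runtime value is an instance of what the declared type promises.
  def conforms(declared: FieldType, value: Any): Boolean =
    (declared, value) match {
      case (BinaryType, _: Array[Byte]) => true
      case (StringType, _: String)      => true
      case (AnyType, _)                 => true // ANY promises nothing concrete
      case _                            => false
    }
}
```

For example, `SchemaRuntimeAgreement.conforms(BinaryType, "abc".getBytes("UTF-8"))` holds, whereas a `StringType` declaration paired with the same byte array does not. The `AnyType` case accepts every value, which illustrates why a downstream consumer that needs a concrete column type cannot work with it.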

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7 in compliance with ASF

@Ma77Ball
Contributor Author

/request-review @aglinxinyuan

@aglinxinyuan
Contributor

Can you provide a workflow JSON that I can use to test the difference before and after this change?

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

@Yicong-Huang Yicong-Huang left a comment

LGTM!

@Yicong-Huang Yicong-Huang enabled auto-merge (squash) May 19, 2026 07:09
@Yicong-Huang Yicong-Huang disabled auto-merge May 19, 2026 07:09
@Yicong-Huang
Contributor

Can you provide a workflow JSON that I can use to test the difference before and after this change?

@Ma77Ball please provide a test JSON

@codecov-commenter

codecov-commenter commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.86%. Comparing base (e4557ee) to head (c0a186c).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5133      +/-   ##
============================================
- Coverage     43.34%   42.86%   -0.49%     
+ Complexity     2213     2210       -3     
============================================
  Files          1049     1045       -4     
  Lines         40561    40188     -373     
  Branches       4322     4251      -71     
============================================
- Hits          17581    17226     -355     
  Misses        21888    21888              
+ Partials       1092     1074      -18     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 39.53% <ø> (ø) | |
| agent-service | 33.72% <ø> (ø) | Carriedforward from e225037 |
| amber | 43.80% <100.00%> (-0.01%) ⬇️ | |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 0.00% <ø> (ø) | |
| file-service | 32.18% <ø> (ø) | |
| frontend | 33.87% <ø> (-0.75%) ⬇️ | Carriedforward from e225037 |
| python | 89.14% <ø> (-1.37%) ⬇️ | Carriedforward from e225037 |
| workflow-compiling-service | 56.81% <ø> (ø) | |

*This pull request uses carry forward flags.

@Yicong-Huang
Contributor

@Ma77Ball can you make coverage happy?

@Ma77Ball
Contributor Author

I added a test JSON file and improved code coverage. @Yicong-Huang, please review and merge.

@Yicong-Huang Yicong-Huang added this pull request to the merge queue May 21, 2026
Merged via the queue into apache:main with commit 62883b8 May 21, 2026
20 checks passed
@Ma77Ball Ma77Ball deleted the fix/RawBytesDecoder branch May 21, 2026 07:42

Development

Successfully merging this pull request may close these issues.

RAW_BYTES decoding broken: sourceSchema() returns ANY, fails with "ANY type is not supported in Iceberg"

5 participants