ref(vortex-io): unify cloud object storage api #8259

Open
m7kss1 wants to merge 8 commits into vortex-data:develop from m7kss1:feat-cloud-object-store

@m7kss1 m7kss1 commented Jun 4, 2026

Contributor

Summary

Cloud object store construction was duplicated across vortex-jni, vortex-python, and
vortex-duckdb, each with different scheme coverage and credential behavior.

New API

// Resolve any URL or path
let vxf = session.open_options().open_url("s3://bucket/key/file.vortex").await?;

// Typed result for callers that need the store directly
let (store, path) = FileLocation::resolve(url)?.into_remote()?;
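
To illustrate the idea of resolving "any URL or path" into a typed result, here is a minimal std-only sketch. It is not the PR's implementation: `Location`, `resolve`, `into_remote`, and the `REMOTE_SCHEMES` list are hypothetical names chosen for illustration, and the set of schemes is assumed from the CLI examples in this description. The real `FileLocation` returns an object store and a path, while this sketch only returns the bucket and key strings.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a resolved location (not the PR's `FileLocation`).
#[derive(Debug, PartialEq)]
enum Location {
    Local(PathBuf),
    Remote { scheme: String, bucket: String, key: String },
}

/// Schemes treated as remote in this sketch (an assumption).
const REMOTE_SCHEMES: [&str; 3] = ["s3", "gs", "az"];

/// Classify an input string as a local path or a remote object URL.
fn resolve(input: &str) -> Result<Location, String> {
    match input.split_once("://") {
        // `file://` URLs are mapped to plain local paths.
        Some(("file", rest)) => Ok(Location::Local(PathBuf::from(rest))),
        Some((scheme, rest)) if REMOTE_SCHEMES.contains(&scheme) => {
            // Everything before the first '/' is the bucket, the rest is the key.
            let (bucket, key) = rest.split_once('/').unwrap_or((rest, ""));
            if bucket.is_empty() {
                return Err(format!("missing bucket in {input}"));
            }
            Ok(Location::Remote {
                scheme: scheme.to_string(),
                bucket: bucket.to_string(),
                key: key.to_string(),
            })
        }
        Some((scheme, _)) => Err(format!("unsupported scheme: {scheme}")),
        // No scheme at all: treat the input as a local path.
        None => Ok(Location::Local(PathBuf::from(input))),
    }
}

impl Location {
    /// Return `(bucket, key)` for remote locations, or an error for local ones.
    fn into_remote(self) -> Result<(String, String), String> {
        match self {
            Location::Remote { bucket, key, .. } => Ok((bucket, key)),
            Location::Local(path) => Err(format!("{} is not remote", path.display())),
        }
    }
}
```

Under these assumptions, `resolve("s3://bucket/key/file.vortex")?.into_remote()?` yields the bucket `bucket` and the key `key/file.vortex`, while a bare path such as `data/hits.vortex` resolves to a local location and fails `into_remote`.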

CLI: remote files now work out of the box:

$ AWS_REGION=us-east-1 vx query s3://bucket/hits.vortex --sql "select count(*) from hits"
$ vx tree gs://bucket/hits.vortex
$ vx browse az://bucket/hits.vortex

Closes: #000

Testing

AI disclosure: XXX

@m7kss1 m7kss1 force-pushed the feat-cloud-object-store branch 2 times, most recently from bd6efc0 to 70f771d Compare June 5, 2026 06:59
@m7kss1 m7kss1 changed the title from "[WIP] ref(vortex-io): unify cloud object storage api" to "ref(vortex-io): unify cloud object storage api" Jun 5, 2026
m7kss1 commented Jun 5, 2026

Contributor Author

Note: verified only on S3-compatible storage; Azure and GCS remain untested for now.

@AdamGS AdamGS self-assigned this Jun 5, 2026
AdamGS commented Jun 5, 2026

Contributor

Hi @m7kss1, thanks for your PR! I'll review it next week.

@m7kss1 m7kss1 requested a review from a team June 5, 2026 18:15
m7kss1 added 8 commits June 7, 2026 19:04
Consolidate cloud storage builders and resolvers from multiple integrations
(JNI, DuckDB, datafusion-bench) into a single canonical core module.

New:
- vortex-io/src/object_store/{cloud,registry}.rs with Registry and resolve_url
- VortexOpenOptions::open_url dispatcher for CLI and library users
- 'cloud' umbrella feature gating S3, GCS, Azure, HTTP support

Migrate:
- JNI, DuckDB, datafusion-bench to use core make_object_store
- CLI commands to accept remote URLs (s3://, gs://, az://)

Improvements:
- Consistent credential/endpoint handling across integrations
- First-class remote file support in vortex query/tree/browse/segments
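
One plausible way a shared registry can give every integration the same store for the same bucket is sketched below. This is an assumption about the general shape, not the code in `registry.rs`: `Store`, `Registry`, `get_or_create`, and `len` are hypothetical names, and a real store would wrap an object-store client configured with credentials and endpoints.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical store handle; a real one would wrap a configured client.
#[derive(Debug)]
struct Store {
    scheme: String,
    bucket: String,
}

/// Hypothetical registry that builds each (scheme, bucket) store only once.
#[derive(Default)]
struct Registry {
    stores: Mutex<HashMap<(String, String), Arc<Store>>>,
}

impl Registry {
    /// Return the cached store for this scheme and bucket, creating it on first use.
    fn get_or_create(&self, scheme: &str, bucket: &str) -> Arc<Store> {
        let mut stores = self.stores.lock().unwrap();
        let entry = stores
            .entry((scheme.to_string(), bucket.to_string()))
            .or_insert_with(|| {
                Arc::new(Store {
                    scheme: scheme.to_string(),
                    bucket: bucket.to_string(),
                })
            });
        Arc::clone(entry)
    }

    /// Number of distinct stores built so far.
    fn len(&self) -> usize {
        self.stores.lock().unwrap().len()
    }
}
```

With a single construction path like this, two callers asking for `("s3", "bucket")` receive the same `Arc<Store>`, so credential and endpoint handling cannot diverge between them.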

Signed-off-by: Maxim Dergousov <dergousovmaxim99@gmail.com>
Signed-off-by: Dergousov Maksim <dergousovmaxim99@gmail.com>
@m7kss1 m7kss1 force-pushed the feat-cloud-object-store branch from 097a411 to c952c6d Compare June 7, 2026 19:08
codspeed-hq Bot commented Jun 8, 2026


Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 4 regressed benchmarks
✅ 1505 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | baseline_eq[4, 65536] | 186.5 µs | 239.2 µs | -22.02% |
| Simulation | baseline_lt[16, 65536] | 219 µs | 276.8 µs | -20.87% |
| Simulation | baseline_lt[4, 65536] | 202.3 µs | 254.4 µs | -20.47% |
| Simulation | baseline_eq[16, 65536] | 231.4 µs | 261.7 µs | -11.57% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 275.3 ns | 216.9 ns | +26.89% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 336.9 ns | 278.6 ns | +20.94% |
| Simulation | bitwise_not_vortex_buffer_mut[2048] | 400.6 ns | 342.2 ns | +17.05% |
| Simulation | encode_varbin[(1000, 2)] | 164.1 µs | 143.1 µs | +14.65% |
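
For readers checking the numbers, the Efficiency column appears consistent with `BASE / HEAD - 1`, up to rounding of the displayed timings. That formula is inferred from the rows above rather than taken from CodSpeed documentation, and the function names and the threshold parameter below are hypothetical.

```rust
/// Relative change of HEAD against BASE in percent (formula inferred from the table).
/// Positive values mean HEAD is faster; negative values mean it is slower.
fn efficiency_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Flag a benchmark as regressed when it slows down by more than `threshold_percent`.
fn is_regression(base: f64, head: f64, threshold_percent: f64) -> bool {
    efficiency_percent(base, head) < -threshold_percent
}
```

Applied to the first row, `efficiency_percent(186.5, 239.2)` gives roughly -22%, matching the reported value to within rounding.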

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing m7kss1:feat-cloud-object-store (c952c6d) with develop (e06d80b)
