
[chore] Sync upstream apache/datafusion-sqlparser-rs into sigma fork #47

Open
ayman-sigma wants to merge 122 commits into main from ayman/sync-upstream-20260406-v2



ayman-sigma commented Apr 7, 2026

The changes were made by Claude Code. The commits that most need review are the last two.

Summary

Syncs the sigmacomputing fork with the latest upstream apache/datafusion-sqlparser-rs main branch (HEAD 913cf0e7), bringing in all changes since v0.60.0 (120 upstream commits).

Approach: a new branch was created from upstream/main HEAD, and origin/main (sigma's changes) was then merged into it. This keeps the full upstream commit history as the primary (first-parent) ancestry.

Upstream changes included

v0.61.0 release (66 commits from 22 contributors) — highlights:

  • Performance: reduced string allocations, optimized make_word() keyword lookup
  • New DDL: ALTER OPERATOR, ALTER OPERATOR CLASS/FAMILY, DROP OPERATOR (PostgreSQL)
  • New DDL: CREATE/ALTER/DROP POLICY (PostgreSQL)
  • Oracle: quote-delimited strings (Q'...'), MERGE predicates, hierarchical queries (CONNECT BY)
  • Databricks: time-travel TIMESTAMP/VERSION AS OF, OPTIMIZE TABLE
  • MySQL: bitwise shift operators, && as boolean AND, SELECT modifiers, CAST(... AS ... ARRAY)
  • MSSQL: parenthesized EXEC, standalone BEGIN...END, IF/ELSE without semicolons
  • Snowflake: SAMPLE clause on subqueries, TRUNCATE IF EXISTS
  • PostgreSQL: PARTITION OF, force row-level security, ANALYZE with optional table/column
  • C-style comments (/* ... */), pipe operator |> fixes, tokenizer custom mapper support
  • core::error::Error impl for ParserError and TokenizerError
  • Extensive refactoring: replaced dialect_of! macro checks with trait methods
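The dialect_of! refactor swaps checks on the concrete dialect type for capability methods on the dialect trait. Below is a hypothetical, heavily simplified model of that pattern, not the crate's actual `Dialect` definition: each feature is a trait method with a conservative default, and a dialect opts in by overriding it. The two method names are the ones this PR lists as sigma's Databricks overrides. Everything else (the trimmed-down trait, the `main` driver) is illustrative.

```rust
// Hypothetical, simplified model of capability methods on a dialect trait.
// The real sqlparser trait has many more methods; only the two names that
// this PR mentions are mirrored here.
trait Dialect {
    /// Whether `/* ... /* ... */ ... */` comments nest. Conservative default: no.
    fn supports_nested_comments(&self) -> bool {
        false
    }

    /// Whether `\` escapes characters inside string literals. Default: no.
    fn supports_string_literal_backslash_escape(&self) -> bool {
        false
    }
}

struct GenericDialect;
struct DatabricksDialect;

// The generic dialect keeps every default.
impl Dialect for GenericDialect {}

// Databricks opts in by overriding the defaults, mirroring sigma's changes.
impl Dialect for DatabricksDialect {
    fn supports_nested_comments(&self) -> bool {
        true
    }

    fn supports_string_literal_backslash_escape(&self) -> bool {
        true
    }
}

fn main() {
    // Parser-side code asks the dialect through the trait object instead of
    // matching on its concrete type, so a new dialect gains a feature by
    // overriding a single method.
    let dialects: [(&str, &dyn Dialect); 2] = [
        ("generic", &GenericDialect),
        ("databricks", &DatabricksDialect),
    ];
    for (name, d) in dialects {
        println!(
            "{name}: nested_comments={} backslash_escape={}",
            d.supports_nested_comments(),
            d.supports_string_literal_backslash_escape()
        );
    }
}
```

The design advantage over a macro-based type check is that support for a feature lives next to the dialect that has it, rather than in a list of dialect types scattered through the parser.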

3 unreleased commits beyond v0.61.0:

Sigma-specific features preserved

All sigma custom features have been carried through the merge:

  • PassThroughQuery table factor: pass-through SQL strings that skip parsing
  • Token::Mustache and tokenize_with_range(): Mustache template tokenization with byte-offset ranges
  • Expr::InExpr: parameter-substitution hack for [ NOT ] IN <expr>
  • Databricks JSON path: has_colon field on JsonPath, AllElements variant, get_next_precedence() support for :
  • Databricks backslash escapes: supports_string_literal_backslash_escape() returns true
  • Databricks nested comments: supports_nested_comments() returns true
  • supports_semi_structured_array_all_elements(): Databricks [*] syntax
  • GitHub semgrep workflow and other CI additions
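To make the JSON path items above concrete, here is a hypothetical, simplified model of how a `has_colon` flag and an `AllElements` element could round-trip Databricks-style paths such as `a:b[*].c`. Keys are plain strings, and the Display rules are assumptions for illustration, not the fork's code; in particular, it is assumed here that `has_colon` means "the path was introduced with `:`".

```rust
use std::fmt;

/// Hypothetical, simplified path element; the fork's real variants hold
/// expressions and carry more information.
#[derive(Debug, Clone, PartialEq)]
enum JsonPathElem {
    /// `.key`, or `:key` when it starts a colon path
    Dot { key: String },
    /// `['key']`
    Bracket { key: String },
    /// `[*]`, the Databricks "all elements" subscript
    AllElements,
}

/// Hypothetical, simplified path. Assumption: `has_colon` records that the
/// path began with `:` as in `col:field`.
#[derive(Debug, Clone, PartialEq)]
struct JsonPath {
    has_colon: bool,
    path: Vec<JsonPathElem>,
}

impl fmt::Display for JsonPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (i, elem) in self.path.iter().enumerate() {
            match elem {
                // Only a leading key follows `:`; later keys use `.`.
                JsonPathElem::Dot { key } if i == 0 && self.has_colon => write!(f, ":{key}")?,
                JsonPathElem::Dot { key } => write!(f, ".{key}")?,
                JsonPathElem::Bracket { key } => write!(f, "['{key}']")?,
                JsonPathElem::AllElements => write!(f, "[*]")?,
            }
        }
        Ok(())
    }
}

fn main() {
    // a:b[*].c : colon path with an all-elements subscript in the middle
    let colon_path = JsonPath {
        has_colon: true,
        path: vec![
            JsonPathElem::Dot { key: "b".to_string() },
            JsonPathElem::AllElements,
            JsonPathElem::Dot { key: "c".to_string() },
        ],
    };
    // a['x'][*] : bracket path without a leading colon
    let bracket_path = JsonPath {
        has_colon: false,
        path: vec![
            JsonPathElem::Bracket { key: "x".to_string() },
            JsonPathElem::AllElements,
        ],
    };
    println!("a{colon_path}");
    println!("a{bracket_path}");
}
```

Running it prints `a:b[*].c` and then `a['x'][*]`.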

Conflict resolutions

Conflicts arose in 5 files. They were resolved by preserving both sides, except where noted:

| File | Sigma change | Upstream change | Resolution |
|---|---|---|---|
| src/ast/mod.rs | AllElements variant, has_colon field, InExpr expr | ColonBracket variant, extensive new DDL exports | Kept all |
| src/ast/spans.rs | InExpr span handler | AllElements/ColonBracket span handlers | Kept all |
| src/lib.rs | #![expect(clippy::unnecessary_unwrap)] | #![forbid(clippy::unreachable)], #![forbid(missing_docs)] | Kept upstream's forbids; removed sigma's now-unfulfilled expect |
| src/parser/mod.rs | has_colon tracking, InExpr/PassThroughQuery parsing | ColonBracket handling in parse_json_path | Merged both |
| tests/sqlparser_snowflake.rs | Sigma snowflake tests | Upstream snowflake tests | Kept all |

Post-merge compilation fixes:

  • Added pos: 0 to the State initializer in tokenizer hint parsing (upstream's new code path did not initialize sigma's pos field)
  • Fixed the PassThroughQuery Display impl, into which the Derived table factor's sample handling had leaked during the merge, and restored that sample check in Derived's Display
  • Added missing doc comments to sigma types (InExpr, PassThroughQuery, TokenWithRange) required by upstream's new #![forbid(missing_docs)]
  • Updated JsonPath test initializers to include has_colon: true for Snowflake colon-path tests
  • Updated Databricks a:['b'] test to use new ColonBracket AST node (upstream refined the representation; display output is identical)
  • Excluded DatabricksDialect from parse_array_subscript test — Databricks uses : for JSON paths, conflicting with array slice syntax arr[1:2]
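The first compilation fix follows from a Rust rule: a struct literal must initialize every field, so once sigma's `State` gained `pos`, any upstream code that builds a `State` without it fails to compile (rustc error E0063, missing field in initializer). Below is a hypothetical, simplified sketch. It assumes `pos` is a byte offset advanced alongside line and column tracking to support `tokenize_with_range()`. The other fields and the `next` helper are illustrative, not the fork's exact definition.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical, simplified stand-in for the tokenizer state.
struct State<'a> {
    peekable: Peekable<Chars<'a>>,
    line: u64,
    col: u64,
    /// Assumed: byte offset into the input, used for token byte ranges.
    pos: usize,
}

impl<'a> State<'a> {
    fn new(input: &'a str) -> Self {
        // Every field must appear here; omitting `pos` is a compile error,
        // which is why merged upstream code needed `pos: 0` added.
        State {
            peekable: input.chars().peekable(),
            line: 1,
            col: 1,
            pos: 0,
        }
    }

    /// Consume one char, advancing the byte offset by its UTF-8 width and
    /// updating line/column counters.
    fn next(&mut self) -> Option<char> {
        let ch = self.peekable.next()?;
        self.pos += ch.len_utf8();
        if ch == '\n' {
            self.line += 1;
            self.col = 1;
        } else {
            self.col += 1;
        }
        Some(ch)
    }
}

fn main() {
    // "é" is two bytes in UTF-8, so pos ends at 5 for 4 chars.
    let mut s = State::new("ab\né");
    while s.next().is_some() {}
    println!("line={} col={} pos={}", s.line, s.col, s.pos);
}
```

Tracking bytes rather than chars matters because Rust string slicing (`&input[start..end]`) is byte-indexed, so byte offsets can be used directly to recover a token's source text.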

Test plan

  • cargo clippy --all-features --all-targets — clean
  • cargo fmt — no changes
  • cargo test --all-features --all-targets — all 21 test suites pass (0 failures)

🤖 Generated with Claude Code

romanoff and others added 30 commits December 3, 2025 05:14
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
LucaCappelletti94 and others added 27 commits February 27, 2026 12:38
…ed names (apache#2244)

Co-authored-by: Yoav Cohen <59807311+yoavcloud@users.noreply.github.com>
…2228)

Signed-off-by: Guan-Ming Chiu <guanmingchiu@gmail.com>
Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
Co-authored-by: Ifeanyi Ubah <ifeanyi@validio.io>
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
# Conflicts:
#	src/ast/mod.rs
#	src/ast/spans.rs
#	src/lib.rs
#	src/parser/mod.rs
#	tests/sqlparser_snowflake.rs
- Add missing `pos` field to State initializer in tokenizer hint parsing
- Remove spurious `sample` reference that leaked into PassThroughQuery Display
- Add `sample` AfterTableAlias check back to Derived Display
- Add missing doc comments to sigma types (InExpr, PassThroughQuery, TokenWithRange)
  required by upstream's new #![forbid(missing_docs)]
- Remove unfulfilled #![expect(clippy::unnecessary_unwrap)] from lib.rs
- Add has_colon: true to JsonPath initializers in Snowflake-style colon-path tests
- Update Databricks a:['b'] test to use new ColonBracket AST node (upstream change)
- Exclude DatabricksDialect from parse_array_subscript test since Databricks uses
  `:` for JSON paths, conflicting with array slice syntax arr[1:2]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ayman-sigma requested review from deifactor and jmhain April 7, 2026 01:32
       has_colon: false,
       path: vec![
    -      JsonPathElem::Bracket {
    +      JsonPathElem::ColonBracket {
ayman-sigma (author):

@deifactor, can you confirm the impact of this change? Upstream appears to support the colon case through this new ColonBracket variant. Do we need to change anything on our side for that?
