Skip to content

feat(optimizer): annotate types for ORDER BY alias references#7281

Open
doripo wants to merge 2 commits intotobymao:mainfrom
doripo:feat-orderby-alias-types
Open

feat(optimizer): annotate types for ORDER BY alias references#7281
doripo wants to merge 2 commits intotobymao:mainfrom
doripo:feat-orderby-alias-types

Conversation

@doripo
Copy link
Contributor

@doripo doripo commented Mar 12, 2026

When ORDER BY references a projection alias, annotate_types leaves the column typed as UNKNOWN:

import sqlglot
from sqlglot import exp
from sqlglot.optimizer import qualify, annotate_types

query = qualify.qualify(sqlglot.parse_one("SELECT x + 1 AS y FROM t ORDER BY y"), schema={"t": {"x": "INT"}})
annotated = annotate_types.annotate_types(query, schema={"t": {"x": "INT"}})
order = annotated.find(exp.Order)
print(order.expressions[0].this.type)  # UNKNOWN — should be INT

This happens because qualify_columns intentionally preserves ORDER BY alias refs (they're valid SQL in all dialects), so the annotator has no table-qualified column to resolve against.

This PR adds _resolve_order_by_alias, called from the column annotation path in _annotate_expression. When a bare column in ORDER BY matches a projection alias, it forces the projection's annotation via a recursive call if needed, then copies the type. Since qualify_columns has already added table qualifiers to projection columns, the recursive call resolves them via the normal path and does not re-enter the alias-ref code.

Test coverage includes basic alias resolution, shadowing/collisions, sort modifiers, compound expressions, set operations, window functions, subquery-as-projection, type coercion, and regression guards.

When ORDER BY references a projection alias (e.g., SELECT x+1 AS y ...
ORDER BY y), the column's type was left as UNKNOWN. qualify_columns
intentionally preserves these alias refs (they're valid SQL in all
dialects), so the single-pass annotator has no table-qualified column
to resolve against.

Add a post-pass (_fixup_order_by_aliases) in annotate_scope that runs
after projections are fully typed. It builds an alias-to-type map, fixes
matching bare columns in ORDER BY, and re-derives parent types on
compound expressions (e.g., ORDER BY y + 1) via _reannotate_subtree.

This approach avoids modifying _annotate_expression, the core annotation
loop. _reannotate_subtree clears non-leaf types (preserving Column/Literal
ground truth), prunes at Subquery boundaries, and re-invokes
_annotate_expression sequentially.
@doripo
Copy link
Contributor Author

doripo commented Mar 12, 2026

Happy to adjust the approach if you'd prefer this handled differently. A few notes on design choices:

  • Post-pass vs. modifying _annotate_expression: I went with a post-pass to avoid touching the core annotation loop. The tradeoff is two walks over the ORDER BY subtree, but it keeps the change self-contained.
  • _reannotate_subtree as a separate method: Seemed cleaner and potentially reusable for traversal-order gaps, but can be inlined into _fixup_order_by_aliases if preferred.


# Iterate through all the expressions of the current scope in post-order, and annotate
self._annotate_expression(scope.expression, scope)
self._fixup_order_by_aliases(scope)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we introducing a separate pass here? This method seems very costly. Is it possible to solve this more simply by resolving ORDER BY alias references to their projection expression types during the existing column annotation path (lines 390-425)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review!

Switched to an in-loop approach in the new commit — _resolve_order_by_alias forces the projection's annotation via a recursive call when needed. Updated the PR description accordingly.

Replace the post-pass (_fixup_order_by_aliases + _reannotate_subtree)
with _resolve_order_by_alias, called from the column annotation path
in _annotate_expression. When a bare column in ORDER BY matches a
projection alias, it forces the projection's annotation via a recursive
call if needed, then copies the type.

This resolves alias types during the existing annotation pass instead
of walking the ORDER BY subtree twice after the fact.

Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
@doripo
Copy link
Contributor Author

doripo commented Mar 13, 2026

Note: _resolve_order_by_alias has extra logic to collect the last matching alias rather than returning on first match, to replicate behavior empirically observed on a local DuckDB and pinned as a test. On the other hand, duplicate alias resolution appears unspecified across dialects -- this could be simplified to an early return and the duplicate alias test dropped if preferred.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants