Summary
flatten_join_alias_var_optimizer in src/backend/optimizer/util/clauses.c unconditionally called pfree(havingQual) even when flatten_join_alias_vars returned the same pointer (i.e., nothing was changed). This caused a use-after-free that led to non-deterministic ORCA fallback to the Postgres planner for correlated subqueries with GROUP BY () HAVING <outer_ref>.
Root Cause
In the original code:
    Node *havingQual = queryNew->havingQual;
    if (NULL != havingQual)
    {
        queryNew->havingQual = flatten_join_alias_vars(queryNew, havingQual);
        pfree(havingQual); // ← always freed, even when pointer unchanged
    }
When flatten_join_alias_vars returns the same pointer (e.g., havingQual is an outer-reference Var with varlevelsup=1 — nothing to flatten), the code frees the live node and leaves queryNew->havingQual pointing to freed memory.
Observed Mechanism (Debug Instrumentation)
For the query:
SELECT v.c, (SELECT count(*) FROM gstest2 GROUP BY () HAVING v.c)
FROM (VALUES (false),(true)) v(c) ORDER BY v.c;
The inner subquery's havingQual is v.c (a T_Var, nodeTag=150). Debug output:
DEBUG flatten_join_alias_var_optimizer: pfree havingQual=0x55fc9d054080 (same=1) nodeTag_before=150
DEBUG after pfree: havingQual=0x55fc9d054080 nodeTag_after=2139062143 ← freed (0x7F7F7F7F)
DEBUG copyQuery: havingQual=0x55fc9d054080 nodeTag=596 ← memory REUSED as T_List!
Step-by-step:
1. pfree(v.c Var) at address 0x55fc9d054080: the chunk is returned to the palloc free pool.
2. EliminateDistinctClause calls gpdb::CopyObject(query) → copyObjectImpl(T_Query).
   - While copying earlier fields (targetList, groupingSets…), palloc reuses 0x55fc9d054080 for a new T_List node (nodeTag=596).
3. COPY_NODE_FIELD(havingQual) calls copyObjectImpl(0x55fc9d054080), which now sees a T_List instead of a T_Var.
4. pqueryEliminateDistinct->havingQual = copy of a random T_List.
   - ORCA's query translator receives a T_List as the HAVING expression (it expects a scalar boolean).
   - ORCA finds a RangeTblEntry for gstest2 inside that list and throws:
       GPORCA does not support the following feature:
       ({RTE :alias <> :eref {ALIAS :aliasname gstest2 ...} :rtekind 0 ...})
   - This is a non-ExmaGPDB GPOS exception → caught in CGPOptimizer::PlannedStmtFromQueryInternal → ORCA falls back to the Postgres planner.
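The address-reuse sequence seen in the debug output can be replayed with a toy model. The sketch below is not PostgreSQL's allocator: ToyChunk, toy_alloc, toy_free and toy_replay are hypothetical names, the pool is a static array with a LIFO free list, and the 0x7F fill is an assumption chosen to match the 0x7F7F7F7F value in the log. It only illustrates why a stale pointer first observes a clobbered tag and then the tag of whatever node is allocated next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: fixed-size chunks in a static pool with a LIFO free list.
 * All names here are hypothetical; this is not PostgreSQL's memory manager. */
enum { TOY_T_VAR = 150, TOY_T_LIST = 596 };   /* tag values taken from the debug log */
enum { TOY_POOL_SIZE = 4 };

typedef struct ToyChunk {
    uint32_t tag;                  /* stands in for the node tag */
    struct ToyChunk *next_free;    /* free-list link, meaningful only while free */
} ToyChunk;

static ToyChunk toy_pool[TOY_POOL_SIZE];
static ToyChunk *toy_free_list = NULL;
static int toy_next_unused = 0;

static ToyChunk *toy_alloc(uint32_t tag)
{
    ToyChunk *c;

    if (toy_free_list != NULL) {
        /* Reuse the most recently freed chunk first. */
        c = toy_free_list;
        toy_free_list = c->next_free;
    } else {
        assert(toy_next_unused < TOY_POOL_SIZE);
        c = &toy_pool[toy_next_unused++];
    }
    c->tag = tag;
    c->next_free = NULL;
    return c;
}

static void toy_free(ToyChunk *c)
{
    memset(c, 0x7F, sizeof(*c));   /* assumed clobber pattern, as in the log */
    c->next_free = toy_free_list;
    toy_free_list = c;
}

/* Records the tag that a stale pointer observes at each stage.
 * Reading through the stale pointer is well defined here only because
 * the pool is static storage; with a real allocator it is a use-after-free. */
void toy_replay(uint32_t seen[3])
{
    ToyChunk *having = toy_alloc(TOY_T_VAR);
    ToyChunk *list;

    seen[0] = having->tag;         /* live Var */
    toy_free(having);
    seen[1] = having->tag;         /* clobbered chunk */
    list = toy_alloc(TOY_T_LIST);  /* the free chunk is handed out again */
    (void) list;
    seen[2] = having->tag;         /* stale pointer now sees the new node */
}
```

Calling toy_replay with a three-element array fills it with the tag seen while the node is live, after the free, and after the next allocation, which corresponds to the three DEBUG lines above.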
Why the Bug Went Unnoticed
The Postgres planner fallback produced the correct result (f | NULL), so no regression test ever failed. The memory corruption was silently masked.
The bug was exposed when fixing the same function's list_free guards on targetList/returningList (adding pointer-equality checks before freeing). After that fix, ORCA no longer fell back for this query — but ORCA's decorrelation logic for GROUP BY () HAVING <outer_ref> was incorrect, producing wrong results (f | 0 instead of f | NULL).
Fix
Guard pfree with a pointer-equality check (same pattern already applied to targetList, returningList, scatterClause, limitOffset, limitCount):
    Node *havingQual = queryNew->havingQual;
    if (NULL != havingQual)
    {
        queryNew->havingQual = flatten_join_alias_vars(queryNew, havingQual);
        if (havingQual != queryNew->havingQual) // ← only free if mutated
            pfree(havingQual);
    }
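The guard can be exercised in isolation with stand-ins for the real functions. This is a minimal sketch under stated assumptions: MockNode, mock_flatten, mock_pfree and flatten_field_guarded are hypothetical names, malloc/free replace palloc/pfree, and mock_flatten only models the behaviour described above (the same pointer comes back for an outer reference, a fresh copy otherwise).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Var node; not the real PostgreSQL struct. */
typedef struct MockNode {
    int tag;
    int varlevelsup;
} MockNode;

static int mock_free_calls = 0;    /* counts frees so the guard can be checked */

static void mock_pfree(MockNode *n)
{
    mock_free_calls++;
    free(n);
}

/* Models the assumed contract: an outer reference has nothing to flatten,
 * so the input pointer is returned unchanged; anything else is copied. */
static MockNode *mock_flatten(MockNode *n)
{
    MockNode *copy;

    if (n->varlevelsup > 0)
        return n;
    copy = malloc(sizeof(*copy));
    assert(copy != NULL);
    *copy = *n;
    return copy;
}

/* Guarded replacement: free the old node only if a different one came back. */
MockNode *flatten_field_guarded(MockNode *old)
{
    MockNode *result;

    if (old == NULL)
        return NULL;
    result = mock_flatten(old);
    if (result != old)
        mock_pfree(old);
    return result;
}
```

With an outer-reference node the function returns the input pointer and performs no free, so the field stays valid. With a node that mock_flatten copies, the old node is freed exactly once and the caller owns the copy.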
Related ORCA Fix
The ORCA decorrelation bug exposed by this fix (an incorrect COALESCE(count(*), 0) applied to GROUP BY () HAVING <outer_ref>) was fixed separately in src/backend/gporca/libgpopt/src/xforms/CSubqueryHandler.cpp. That change detects the correlated-HAVING pattern in SSubqueryDesc::Psd() and forces m_fCorrelatedExecution = true, so the subquery is routed through the SubPlan (correlated execution) path instead of the incorrect Left Outer Join + COALESCE decorrelation path.