Skip to content

feat:Generalize LiteralList lowering to support float, string and mixed-type lists#185

Open
yogendra-17 wants to merge 2 commits intoarxlang:mainfrom
yogendra-17:feature-Literallist
Open

feat:Generalize LiteralList lowering to support float, string and mixed-type lists#185
yogendra-17 wants to merge 2 commits intoarxlang:mainfrom
yogendra-17:feature-Literallist

Conversation

@yogendra-17
Copy link
Contributor

Notes

  • This repository uses an AI bot for reviews. Keep your PR in Draft while
    you work. When you’re ready for a review, change the status to Ready for
    review
    to trigger a new review round. If you make additional changes and
    don’t want to trigger the bot, switch the PR back to Draft.
  • AI-bot comments may not always be accurate. Please review them critically and
    share your feedback; it helps us improve the tool.
  • Avoid changing code that is unrelated to your proposal. Keep your PR as short
    as possible to increase the chances of a timely review. Large PRs may not be
    reviewed and may be closed.
  • Don’t add unnecessary comments. Your code should be readable and
    self-documenting
    (guidance).
  • Don’t change core features without prior discussion with the community. Use
    our Discord to discuss ideas, blockers, or issues
    (https://discord.gg/Nu4MdGj9jB).
  • Do not include secrets (API keys, tokens, passwords), credentials, or
    sensitive data/PII in code, configs, logs, screenshots, or commit history. If
    something leaks, rotate the credentials immediately, invalidate the old key,
    and note it in the PR so maintainers can assist.
  • Do not commit large binaries or generated artifacts. If large datasets are
    needed for tests, prefer small fixtures or programmatic downloads declared in
    makim.yaml (e.g., a task that fetches data at test time). If a large binary is
    unavoidable, discuss first and consider Git LFS.

Pull Request description

This PR enhances LiteralList lowering by removing the restriction that only homogeneous integer constant lists are supported. Previously, the visitor raised TypeError for valid constructs such as float lists ([1.0, 2.0]) and string lists (["a", "b"]). The implementation has been generalized to infer a common element type, perform necessary type coercions (including integer widening and int-to-float promotion), and correctly handle both constant and runtime-valued elements. Constant lists continue to use the efficient constant-array path, while non-constant lists are lowered using alloca and store, ensuring consistent and type-safe LLVM IR generation.

How to test these changes

  • ...

Pull Request checklists

This PR is a:

  • bug-fix
  • new feature
  • maintenance

About this PR:

  • it includes tests.
  • the tests are executed on CI.
  • the tests generate log file(s) (path).
  • pre-commit hooks were executed locally.
  • this PR requires a project documentation update.

Author's checklist:

  • I have reviewed the changes and it contains no misspelling.
  • The code is well commented, especially in the parts that contain more
    complexity.
  • New and old tests passed locally.

Additional information

Reviewer's checklist

Copy and paste this template for your review's note:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved .

@github-actions
Copy link

github-actions bot commented Mar 4, 2026

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness/API: The two code paths return different types: constant lists yield a [N x T] value, but mixed/runtime lists return T* (pointer to first element). This will break downstream expectations and can miscompile call sites that assume a value vs a pointer. Unify the result type. Suggest building an SSA array aggregate for the runtime path and returning the array value, not a pointer. This also avoids stack-escape issues (see next point). Replace the alloca+GEP+store block with insertvalue on an undef aggregate (L.1938):
    def _build_array_aggregate(self, elems: list[ir.Value], arr_ty: ir.ArrayType) -> ir.Value:
    "title: Build SSA array aggregate from elements via insertvalue"
    agg: ir.Value = ir.Constant.undef(arr_ty)
    for i, v in enumerate(elems):
    agg = self._llvm.ir_builder.insert_value(agg, v, [i], name="list_ins")
    return agg

    usage

    arr_val = self._build_array_aggregate(coerced, arr_ty)
    self.result_stack.append(arr_val)

  • Correctness/Safety: Returning a pointer to stack memory (alloca) can escape the function and become dangling if the list literal is returned, captured, or stored beyond the statement. Even if not escaping, this introduces lifetime hazards. The insertvalue approach above removes this risk (L.1952).

  • Correctness: _coerce_to reads v.constant for ir.Constant. In llvmlite this attribute is not guaranteed across all constant kinds; accessing a missing attribute will raise at runtime. Guard it and fail fast with a clear error for non-scalar constants (L.1996):
    def _const_scalar(self, c: ir.Constant) -> int | float:
    "title: Extract Python scalar from an integer/float Constant"
    if is_int_type(c.type) or is_fp_type(c.type):
    if hasattr(c, "constant"):
    return c.constant
    raise TypeError("LiteralList: unsupported constant kind for coercion")

    Then use raw = self._const_scalar(v) in _coerce_to.

  • Correctness: Integer widening uses sext and int→fp uses sitofp unconditionally. If ASTx supports unsigned integers, this will corrupt values. You need zext and uitofp when the source is unsigned (L.2010, L.2017). If signedness isn’t tracked in ir.Type, carry it from ASTx metadata into coercion.

  • Correctness: Silent narrowing (int width downcast and fp fptrunc) happens implicitly. If the language doesn’t allow implicit narrowing, this should raise instead of truncating. Consider guarding integer narrowing and fp fptrunc unless an explicit cast was requested (L.2012, L.2023).


tests/test_literal_list.py

  • Correctness: The float-related tests only assert the element type/count; they won’t detect wrong coercion (e.g., bitcast instead of numeric convert). Please also assert the actual values for:
    • homogeneous float list
    • int→float promotion case
  • Correctness: Mixed-width int widening is only tested with positive values. Add a negative (e.g., i16(-1), i32(2)) to verify sign-extension semantics.
  • Test fragility: _make_visitor_in_function mutates private internals (visitor._llvm.*, ir_builder). This tightly couples tests to internals and may break on refactors. Consider exposing a public helper/fixture to create a live insertion point instead.
  • Flakiness risk: The helper creates a function named "_test_dummy". If the module is reused, this can collide with an existing symbol. Prefer a unique name per call (e.g., suffix with a counter/UUID).
  • Maintainability: Several tests are parametrized with builder_class but the helper ignores it and hardcodes LLVMLiteIR. If another builder is added, these tests won’t exercise it.

@github-actions
Copy link

github-actions bot commented Mar 4, 2026

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • High-risk type/lifetime mismatch: constant path returns an array value, non-constant path returns T* (pointer to first element). This will break downstream code expecting a uniform type and can cause subtle bugs in overload resolution and GEP users. Unify the result type. Suggest: for the constant path, materialize a private constant global and return a pointer to its first element to match the non-constant path (L.1920).
    Suggested change:
    def _emit_const_list_ptr(self, arr_ty: ir.ArrayType, elems: list[ir.Constant]) -> ir.Value:
    """
    title: Emit a private constant global for a list and return T* to its first element.
    """
    zero32 = ir.Constant(self._llvm.INT32_TYPE, 0)
    g = ir.GlobalVariable(self._llvm.module, arr_ty, name=self._llvm.fresh("list_const"))
    g.linkage = "internal"
    g.global_constant = True
    g.initializer = ir.Constant(arr_ty, elems)
    return self._llvm.ir_builder.gep(g, [zero32, zero32], inbounds=True)

    Then replace the constant-array return with:
    ptr = self._emit_const_list_ptr(arr_ty, coerced)
    self.result_stack.append(ptr)

  • Potential UB: returning a pointer to stack-allocated memory may escape the function. If the list literal’s result can be stored in a longer-lived location or returned, this is invalid. At minimum, ensure the allocation is in the entry block; ideally, detect escape and use a heap or global for constants (L.1930).
    Suggested change:
    def _alloca_in_entry_block(self, ty: ir.Type, name: str) -> ir.instructions.AllocaInstr:
    """
    title: Create an alloca in the function entry block.
    """
    func = self._llvm.ir_builder.function
    entry = func.entry_basic_block
    b = ir.IRBuilder(entry)
    b.position_at_beginning(entry)
    return b.alloca(ty, name=name)

    And replace the alloca call:
    alloca = self._alloca_in_entry_block(arr_ty, name="list_tmp")

  • Signedness correctness: _coerce_to unconditionally uses signed casts (sext/sitofp/fptosi). If the language has unsigned integers, this is incorrect. You need signedness info to choose zext/uitofp/fptoui (L.2010).
    Suggested change:
    def _int_cast(self, b: ir.IRBuilder, v: ir.Value, dst: ir.IntType, signed: bool) -> ir.Value:
    """
    title: Cast an integer value to another integer type honoring signedness.
    """
    if v.type.width < dst.width:
    return b.sext(v, dst, "list_sext") if signed else b.zext(v, dst, "list_zext")
    if v.type.width > dst.width:
    return b.trunc(v, dst, "list_trunc")
    return v

    Use in _coerce_to for int->int and int<->fp, selecting sitofp/uitofp and fptosi/fptoui based on signed.

  • Constant narrowing correctness: constant-path int narrowing via ir.Constant(target_ty, int(raw)) can raise on out-of-range values and diverge from the runtime truncation semantics (L.1985). Mask/truncate explicitly before building the constant.
    Suggested change:
    def _const_int_to_type(self, val: int, dst: ir.IntType) -> ir.Constant:
    """
    title: Truncate or widen a Python int to dst bitwidth with LLVM semantics.
    """
    bw = dst.width
    mask = (1 << bw) - 1
    truncated = val & mask
    # Preserve negative two’s-complement for values with top bit set
    if truncated >> (bw - 1):
    signed = truncated - (1 << bw)
    return ir.Constant(dst, signed)
    return ir.Constant(dst, truncated)

    Then replace constant int->int coercion with self._const_int_to_type(int(raw), target_ty).

  • Minor but impactful portability: GEP indices are hardcoded as i32. On some targets the preferred index size is i64; consider using the target’s ptr index type from DataLayout or a central helper to avoid mixed index widths (L.1940).


tests/test_literal_list.py

  • Potential correctness gap: Mixed-width int widening claims sign-extension but no negative case is tested. Please add a case to ensure signed values are preserved when widening (e.g., i16(-1) -> i32(-1)). (L.140)
    def test_literal_list_mixed_int_widths_signext(builder_class: Type[Builder]) -> None:
    """
    title: Mixed-width signed integers sign-extend to widest width.
    summary: '[-1:i16, 2:i32] widens to [2 x i32] with values [-1, 2].'
    parameters:
    builder_class:
    type: Type[Builder]
    """
    visitor = _make_visitor_in_function()
    visitor.visit(
    astx.LiteralList(elements=[astx.LiteralInt16(-1), astx.LiteralInt32(2)])
    )
    const = visitor.result_stack.pop()
    assert isinstance(const, ir.Constant)
    assert isinstance(const.type, ir.ArrayType)
    assert const.type.count == 2
    assert const.type.element == ir.IntType(32)
    assert _array_i32_values(const) == [-1, 2]

  • Fragility/brittleness: _make_visitor_in_function reaches into private internals and overwrites the ir_builder object, which may desynchronize state if other components hold references. Prefer using the public ir.VoidType() and repositioning an existing builder if present. (L.18, L.23)
    def _make_visitor_in_function() -> LLVMLiteIRVisitor:
    """
    title: Return a visitor whose ir_builder is inside a live basic block.
    """
    builder = LLVMLiteIR()
    visitor = builder.translator
    visitor.result_stack.clear()
    fn_ty = ir.FunctionType(ir.VoidType(), [])
    fn = ir.Function(visitor._llvm.module, fn_ty, name="_test_dummy")
    bb = fn.append_basic_block("entry")
    if getattr(visitor._llvm, "ir_builder", None) is not None:
    visitor._llvm.ir_builder.position_at_end(bb)
    else:
    visitor._llvm.ir_builder = ir.IRBuilder(bb)
    return visitor


@yogendra-17 yogendra-17 force-pushed the feature-Literallist branch from c87d4ab to 87b6e11 Compare March 5, 2026 08:16
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • High-risk type inconsistency: constant path returns an ArrayType value, runtime path returns T* (pointer to first element). This will break downstream code that assumes a uniform representation. Consider unifying to always return T* (e.g., materialize constants as a global/alloca and return a GEP) or always return an array value and add an explicit “decay to pointer” where needed. The “Return pointer to first element” block is the divergence point.

  • Potential lifetime/escape bug: returning a pointer to an alloca (“list_tmp”) is only valid within the function and until return. If the list value can escape (stored in longer-lived memory or returned), this is a UAF. Either enforce non-escape at the IRx level or materialize as a global/heap allocation when escape is detected. This applies to the alloca and the final returned pointer.

  • Signedness assumptions in coercions: _coerce_to uses sext/trunc/sitofp/fptosi, which assume signed ints. If IRx supports unsigned ints or booleans (i1), this yields wrong semantics (should use zext/uitofp/fptoui and avoid sext from i1). Please add signedness handling to integer widening/narrowing and int<->fp conversions.

  • Silent narrowing/precision loss: both constant and runtime paths allow lossy conversions (int truncation, fp->int, fp narrowing) without diagnostics. This can introduce hard-to-find bugs. Consider rejecting narrowing by default (or gating behind an explicit cast) and emitting a clear error.

  • Alloca placement/perf: creating allocas at the current insertion point in loops can degrade performance and hinder mem2reg. Prefer placing the array alloca in the function entry block and using GEP/stores at the use site to avoid repeated stack allocations.


tests/test_literal_list.py

  • _make_visitor_in_function hardcodes LLVMLiteIR and couples to private VOID_TYPE. This will silently bypass the parametrize when more builders are added and is brittle to internals. Suggest: accept builder_class and use ir.VoidType() instead of visitor._llvm.VOID_TYPE. Also update call sites to pass builder_class. (L.33, L.44, L.47, L.86, L.129, L.157, L.187)
    def _make_visitor_in_function(builder_class: Type[Builder]) -> LLVMLiteIRVisitor:
    """
    title: Return a visitor whose ir_builder is inside a live basic block.
    """
    builder: Builder = builder_class()
    visitor = cast(LLVMLiteIRVisitor, builder.translator)
    visitor.result_stack.clear()
    fn_ty = ir.FunctionType(ir.VoidType(), [])
    fn = ir.Function(visitor._llvm.module, fn_ty, name="_test_dummy")
    bb = fn.append_basic_block("entry")
    visitor._llvm.ir_builder = ir.IRBuilder(bb)
    return visitor

  • The widening test only uses positive ints, so it can’t catch incorrect zero-extension. Add a negative case to ensure sign-extension. (Insert after the widens test, e.g. L.120)
    @pytest.mark.skipif(not HAS_LITERAL_LIST, reason="astx.LiteralList not available")
    @pytest.mark.parametrize("builder_class", [LLVMLiteIR])
    def test_literal_list_mixed_int_widths_signext(builder_class: Type[Builder]) -> None:
    """
    title: Mixed-width integers are sign-extended when widening.
    """
    visitor = _make_visitor_in_function(builder_class)
    visitor.visit(astx.LiteralList(elements=[astx.LiteralInt16(-1), astx.LiteralInt32(2)]))
    const = visitor.result_stack.pop()
    vals = _array_i32_values(const)
    assert vals == [-1, 2]


@yogendra-17 yogendra-17 force-pushed the feature-Literallist branch from 87b6e11 to 65ceece Compare March 6, 2026 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant