Fix LiteralUTF8Char lowering for non-ASCII UTF-8 chars by tmdeveloper007 · Pull Request #207 · arxlang/irx

tmdeveloper007 · 2026-03-07T01:44:58Z

Pull Request description

This PR fixes a Unicode lowering bug in LiteralUTF8Char.

When lowering `astx.LiteralUTF8Char`, the code correctly computed the UTF-8 byte length but initialized the backing global constant with ASCII-encoded bytes. As a result, translation failed with a `UnicodeEncodeError` for valid non-ASCII characters such as `é` (U+00E9, two bytes in UTF-8).
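A minimal, stdlib-only sketch of the mismatch described above. It assumes the lowering derives both the array length and the initializer bytes from the character's string value. `buggy_initializer` and `fixed_initializer` are hypothetical helpers for illustration, not functions in irx.

```python
from typing import Tuple


def buggy_initializer(ch: str) -> Tuple[int, bytes]:
    """Hypothetical model of the pre-fix lowering: (array length, initializer bytes)."""
    # Array length comes from the UTF-8 encoding, which is correct...
    length = len(ch.encode("utf-8"))
    # ...but the initializer bytes come from ASCII, which raises
    # UnicodeEncodeError for any character above U+007F.
    return length, ch.encode("ascii")


def fixed_initializer(ch: str) -> Tuple[int, bytes]:
    """Hypothetical model of the post-fix lowering."""
    # Encode once and use the same bytes for both length and contents,
    # so the two can never disagree.
    data = ch.encode("utf-8")
    return len(data), data


if __name__ == "__main__":
    print(fixed_initializer("é"))  # (2, b'\xc3\xa9')
    try:
        buggy_initializer("é")
    except UnicodeEncodeError as exc:
        print("pre-fix path fails:", exc.reason)  # ordinal not in range(128)
```

Encoding once and reusing the result is what keeps the declared length and the stored contents consistent; ASCII-only input behaves identically on both paths.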

Changes made:

- use UTF-8 bytes when initializing the LiteralUTF8Char storage
- add a regression test covering a multibyte UTF-8 char literal
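To illustrate what the corrected storage should hold, the sketch below renders a byte string as an LLVM IR `i8` array constant in the textual `c"..."` form. `llvm_i8_array_constant` is a hypothetical helper, not part of irx or llvmlite. It assumes the global is emitted as an `[N x i8]` array, and real IR printers may differ in hex-digit casing.

```python
def llvm_i8_array_constant(data: bytes) -> str:
    """Render raw bytes as an LLVM IR i8-array constant (hypothetical helper)."""
    parts = []
    for byte in data:
        # Printable ASCII other than the double quote and backslash is
        # emitted verbatim; every other byte becomes a \XX hex escape.
        if 0x20 <= byte < 0x7F and byte not in (0x22, 0x5C):
            parts.append(chr(byte))
        else:
            parts.append("\\%02X" % byte)
    body = "".join(parts)
    return '[%d x i8] c"%s"' % (len(data), body)


if __name__ == "__main__":
    # "é" is two bytes in UTF-8, so the global needs a [2 x i8] array.
    print(llvm_i8_array_constant("é".encode("utf-8")))  # [2 x i8] c"\C3\A9"
```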

Addresses #208

How to test these changes

- Run `pytest tests/test_string.py -q -k "utf8_char_non_ascii_translate"`.
- Confirm that the test passes.
- Optionally, verify that lowering a module containing `astx.LiteralUTF8Char("é")` no longer raises `UnicodeEncodeError`.

Pull Request checklists

This PR is a:

bug-fix
new feature
maintenance

About this PR:

it includes tests.
the tests are executed on CI.
the tests generate log file(s) (path).
pre-commit hooks were executed locally.
this PR requires a project documentation update.

Author's checklist:

I have reviewed the changes and they contain no misspellings.
The code is well commented, especially in the parts that contain more complexity.
New and old tests passed locally.

Additional information

Validation run locally:
- pytest tests/test_string.py -q -k "utf8_char_non_ascii_translate"
- ruff check src/irx/builders/llvmliteir.py tests/test_string.py
I kept the change minimal and limited it to the UTF-8 char literal lowering path and one regression test.

Reviewer's checklist

Copy and paste this template into your review note:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved.

yuvimittal · 2026-03-10T11:04:28Z

@tmdeveloper007, the tests currently only check that LLVM IR generation doesn't crash, not that the UTF-8 character lowering is correct.
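One way to address this point is to assert on the generated IR text itself. The helper below is a hypothetical sketch, not an irx API. It assumes the char literal's global appears in the IR as an `[N x i8] c"..."` constant, checks for the exact UTF-8 bytes, and ignores hex-digit casing.

```python
import re


def utf8_char_in_ir(ir_text: str, ch: str) -> bool:
    """Hypothetical test helper: does ir_text hold ch's UTF-8 bytes as an i8 array?"""
    data = ch.encode("utf-8")
    # Build a regex for each byte: printable ASCII (except '"' and '\')
    # matches itself; anything else must appear as a backslash + two hex digits.
    byte_patterns = [
        re.escape(chr(b)) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C)
        else r"\\" + "%02x" % b
        for b in data
    ]
    pattern = r'\[%d x i8\] c"%s"' % (len(data), "".join(byte_patterns))
    # IGNORECASE accepts both \C3 (LLVM's printer) and \c3 (other printers).
    return re.search(pattern, ir_text, flags=re.IGNORECASE) is not None
```

In the regression test, an assertion such as `assert utf8_char_in_ir(ir_text, expected)` would then check the stored bytes rather than only that translation did not raise. How `ir_text` is obtained depends on the builder API already used in `tests/test_string.py`.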

yuvimittal · 2026-03-10T11:05:25Z

tests/test_string.py

+    char_literal = astx.LiteralUTF8Char(expected)
+
+    decl_tmp = astx.VariableDeclaration(
+        name="tmp", type_=astx.String(), value=char_literal


What is the purpose of `tmp` here?

Commit 0bf7ccf: Fix LiteralUTF8Char lowering for non-ASCII UTF-8 chars

yuvimittal reviewed Mar 10, 2026

tmdeveloper007 wants to merge 1 commit into arxlang:main from tmdeveloper007:ISSUE-205
