
amdflang: array constructor assignment to private arrays in GPU_PARALLEL_LOOP produces wrong values (m_ibm.fpp, m_collisions.fpp) #1449

@sbryngelson


Summary

Array constructor assignments (var = [a, b, c]) to private local arrays inside GPU_PARALLEL_LOOP regions produce silently wrong values on AMD GPUs when compiled with amdflang using both -fopenmp-assume-threads-oversubscription and -fopenmp-assume-teams-oversubscription. MFC uses this flag combination for all AMD GPU builds.

The bug is present in all AFAR drops we tested: 23.1.0, 23.2.0, 23.2.1 (gfx90a / MI250X).

Errors are large: O(10–100) in absolute value, well beyond floating-point noise. They affect ~50% of loop iterations, and which iterations are affected is non-deterministic.

Trigger Conditions

The bug fires when all three conditions hold:

  1. -fopenmp-assume-threads-oversubscription and -fopenmp-assume-teams-oversubscription are both present (either flag alone does not trigger it)
  2. An array constructor [expr1, expr2, ...] is assigned to a private array inside a target teams distribute parallel do
  3. The loop has enough iterations (≥ ~64)
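
Condition 1 can be checked by building the minimal reproducer from this issue under each of the four flag combinations. The script below is a rough sketch of that: the compiler flags are the ones quoted in this issue, while `print_matrix`, `DRY_RUN`, and the `minimal_<name>` output names are hypothetical helpers made up for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: build the minimal reproducer under each combination of the two
# oversubscription flags. DRY_RUN and the minimal_<name> output names are
# illustrative, not part of MFC.
SRC=minimal_array_constructor.f90
BASE="amdflang -fopenmp --offload-arch=gfx90a -fopenmp-target-fast"
THREADS="-fopenmp-assume-threads-oversubscription"
TEAMS="-fopenmp-assume-teams-oversubscription"

# Print one compile command per flag combination.
print_matrix() {
    for name in none threads teams both; do
        case "$name" in
            none)    extra="" ;;
            threads) extra="$THREADS" ;;
            teams)   extra="$TEAMS" ;;
            both)    extra="$THREADS $TEAMS" ;;
        esac
        echo "$BASE${extra:+ $extra} -o minimal_$name $SRC"
    done
}

# Dry run by default: only show the commands. With DRY_RUN=0, compile and
# run each binary (requires amdflang and a gfx90a device).
print_matrix | while read -r cmd; do
    echo "+ $cmd"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        out=${cmd##* -o }; out=${out%% *}   # extract the -o argument
        $cmd && "./$out"
    fi
done
```

If the trigger conditions hold as described, only `minimal_both` should report FAIL when run with `DRY_RUN=0`; the other three builds should print PASS.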

Affected Code in MFC

src/simulation/m_ibm.fpp

s_ibm_correct_state (line ~195 GPU_PARALLEL_LOOP, lines 207/209):

real(wp), dimension(3) :: physical_loc   ! declared private
...
$:GPU_PARALLEL_LOOP(private='[..., physical_loc, ...]')
do i = 1, num_gps
    ...
    if (p > 0) then
        physical_loc = [x_cc(j), y_cc(k), z_cc(l)]   ! BUG: wrong values ~50% of iterations
    else
        physical_loc = [x_cc(j), y_cc(k), 0._wp]      ! BUG: same
    end if
    ...
end do

This subroutine is called every time step for all IBM cases. The wrong physical_loc values mean ghost point corrections are computed using corrupted spatial coordinates.

s_find_ghost_points (line ~397 GPU_PARALLEL_LOOP, lines 407/409): same pattern.

src/simulation/m_collisions.fpp

s_detect_ib_collisions_n2 (lines 337–338):

centroid_1 = [patch_ib(pid1)%x_centroid, patch_ib(pid1)%y_centroid, 0._wp]
centroid_2 = [patch_ib(pid2)%x_centroid, patch_ib(pid2)%y_centroid, 0._wp]

Both assignments are inside a GPU_PARALLEL_LOOP. They are only reachable when collision_model > 0, a configuration that CI does not currently exercise.

Why CI Passes

The affected code paths (s_ibm_correct_state, s_find_ghost_points) are exercised by the GPU CI IBM test suite. However, the current regression tests appear to tolerate the corrupted ghost point coordinates. The likely reason is that the test geometries and flow conditions are insensitive enough that wrong physical_loc values don't push the final solution norm outside the 1e-10 tolerance. This is a silent correctness bug.

Minimal Reproducer

program minimal_array_constructor
    implicit none
    integer, parameter :: N = 64
    real(8) :: x(N), y(N), z(N), out(N,3)
    real(8) :: loc(3)
    integer :: i, nerr

    do i = 1, N
        x(i) = real(i,8)*0.1d0; y(i) = real(i,8)*0.2d0; z(i) = real(i,8)*0.3d0
    end do

    !$omp target teams distribute parallel do map(to:x,y,z) map(from:out) private(loc)
    do i = 1, N
        loc = [x(i), y(i), z(i)]
        out(i,1) = loc(1); out(i,2) = loc(2); out(i,3) = loc(3)
    end do
    !$omp end target teams distribute parallel do

    nerr = 0
    do i = 1, N
        if (abs(out(i,1)-x(i)) > 1d-14 .or. &
            abs(out(i,2)-y(i)) > 1d-14 .or. &
            abs(out(i,3)-z(i)) > 1d-14) nerr = nerr + 1
    end do
    if (nerr == 0) then
        print *, "PASS"
    else
        print *, "FAIL:", nerr, "of", N, "(max abs error O(10-100))"
    end if
end program

Compile and run:

amdflang -fopenmp --offload-arch=gfx90a -fopenmp-target-fast \
  -fopenmp-assume-threads-oversubscription \
  -fopenmp-assume-teams-oversubscription \
  -o minimal minimal_array_constructor.f90
./minimal
# FAIL: 31 of 64 (max abs error O(10-100))

Workaround

Replace array constructor assignments with explicit element-by-element assignment:

! Instead of:
physical_loc = [x_cc(j), y_cc(k), z_cc(l)]

! Use:
physical_loc(1) = x_cc(j)
physical_loc(2) = y_cc(k)
physical_loc(3) = z_cc(l)

This is confirmed to produce correct results.
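
To find other candidate sites for this workaround, a coarse text search for array constructor assignments may help. The sketch below assumes it is run from the MFC repository root; `find_constructors` is a hypothetical helper name, and the pattern only flags lines that assign a `[...]` or `(/ ... /)` constructor. It cannot tell whether a hit sits inside a GPU_PARALLEL_LOOP or targets a private array, so each hit still needs manual review.

```shell
#!/usr/bin/env bash
# Sketch: list lines that assign an array constructor, written either as
# "= [ ... ]" or as the older "= (/ ... /)" form. This is a text filter
# only; it does not parse Fortran or know about GPU loop boundaries.
find_constructors() {
    grep -nE '=[[:space:]]*(\[|\(/)' "$@"
}

# Assumed layout: the two files named in this issue, relative to the repo root.
for f in src/simulation/m_ibm.fpp src/simulation/m_collisions.fpp; do
    if [ -f "$f" ]; then
        echo "== $f"
        find_constructors "$f"
    fi
done
```

Clauses such as `private='[..., physical_loc, ...]'` are not matched, because a quote character sits between the `=` and the `[`.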

Upstream Compiler Bug

This is being investigated in ROCm/llvm-project. A related fix for private VLA arrays is in ROCm/llvm-project#2422. The array constructor bug is a separate issue; an upstream report will follow.

cc @danieljvickers
