ee/lwip: enable LWIP_NETIF_LOOPBACK + bump MEMP_NUM_TCP_PCB #839

Draft
fjtrujy wants to merge 2 commits into ps2dev:master from fjtrujy:tune/ee-lwip-loopback-pcb

@fjtrujy fjtrujy commented May 8, 2026

Summary

Two small EE-side lwipopts.h tunings, kept separate from the lwIP 2.2.1 upgrade (#838) so each can be evaluated and reverted on its own. Stacked on top of #838.

If reviewers prefer, this PR can be split into two single-change PRs (one per tuning). Bundled here for now because both touch only ee/network/tcpip/src/include/lwipopts.h and neither is independently load-bearing.

Changes

LWIP_NETIF_LOOPBACK = 1 (with explicit LWIP_HAVE_LOOPIF = 0)

Enables in-place loopback delivery: when an EE-side app sends to its own netif IP, lwIP loops the packet through the netif's input callback rather than emitting it on the wire. Useful for self-tests where an EE app acts as both client and server (notably ps2_drivers/samples/tcp_echo_test_ee, our diagnostic loopback test).

LWIP_HAVE_LOOPIF is forced to 0 because lwIP would otherwise auto-default it to (LWIP_NETIF_LOOPBACK && !LWIP_SINGLE_NETIF) = 1, which auto-creates a 127.0.0.1 netif at init. We don't need a separate loopback netif when in-place loopback delivery is enough, and forcing it off keeps init lean.
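
The defaulting rule quoted above can be reproduced with a few lines of preprocessor logic. This is a stand-alone model under stated assumptions, not the contents of lwIP's opt.h or of this PR's lwipopts.h: the fallback block only mimics the expression cited in the text, and the two helper functions are hypothetical, added so the resulting values can be inspected.

```c
#include <assert.h>

/* What this PR sets in lwipopts.h (values taken from the description). */
#define LWIP_NETIF_LOOPBACK 1
#define LWIP_HAVE_LOOPIF    0   /* explicit override */

/* Model of the fallback applied when lwipopts.h leaves a macro undefined. */
#ifndef LWIP_SINGLE_NETIF
#define LWIP_SINGLE_NETIF 0
#endif
#ifndef LWIP_HAVE_LOOPIF
#define LWIP_HAVE_LOOPIF (LWIP_NETIF_LOOPBACK && !LWIP_SINGLE_NETIF)
#endif

/* The explicit override must survive the fallback block above. */
static_assert(LWIP_HAVE_LOOPIF == 0, "override must win over the fallback");

/* Hypothetical helper: the value the build actually ends up with. */
int loopif_effective(void)
{
    return LWIP_HAVE_LOOPIF;
}

/* Hypothetical helper: the value the fallback would have produced. */
int loopif_without_override(void)
{
    return (LWIP_NETIF_LOOPBACK && !LWIP_SINGLE_NETIF);
}
```

Because the override is defined before the `#ifndef` guard is reached, the fallback expression is never used, which is why the define has to be explicit rather than simply omitted.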

Cost: struct netif gains two pointers (loop_first, loop_last) when LWIP_NETIF_LOOPBACK=1, i.e. +8 bytes per netif on 32-bit. With one SMAP netif, that is +8 bytes static. The CPU cost is a single "is dst == our netif IP?" check on outbound ip4_output_if_src, which is negligible.
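
As a rough illustration of the two costs named above, the following stand-alone sketch models the extra per-netif state and the outbound self-destination test. The struct and function names are hypothetical, the real struct netif has many more fields, and addresses are plain 32-bit integers here rather than lwIP's own address type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two pointers added to the netif. */
struct loop_state_model {
    void *loop_first;  /* head of the queue of looped-back packets */
    void *loop_last;   /* tail of that queue */
};

/* Two pointers and nothing else, so no padding is expected. */
static_assert(sizeof(struct loop_state_model) == 2 * sizeof(void *),
              "model should be exactly two pointers");

/* Extra bytes per netif in this model: 8 where pointers are 32-bit. */
size_t loop_state_bytes(void)
{
    return sizeof(struct loop_state_model);
}

/* Nonzero when an outbound packet is addressed to the netif itself
 * and would be looped back instead of transmitted. */
int is_self_destined(uint32_t dst_ip, uint32_t netif_ip)
{
    return dst_ip == netif_ip;
}
```

On a host with 64-bit pointers `loop_state_bytes()` reports 16 rather than 8, so the figure in the text applies only to 32-bit pointer builds.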

MEMP_NUM_TCP_PCB = 32 (up from lwIP default 5)

Defensive headroom for server workloads where the server does the active TCP close (each closed pcb sits in TIME_WAIT for 2×MSL ≈ 60 s holding a slot). The default of 5 is enough for the common case where the client closes first (e.g. our http_burst.py tests via http.client), but a server-active-close pattern exhausts it after a handful of fast back-to-back requests.
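
The exhaustion pattern described above can be approximated with a small model. This is a sketch under simplifying assumptions, not a measurement of lwIP: requests arrive at a fixed interval, the server closes each connection immediately, a slot is freed only when its TIME_WAIT period expires (no early reclamation), and the function name and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Simplified model: requests arrive every interval_ms, the server closes
 * each one immediately, and the closed pcb holds its slot for time_wait_ms.
 * Returns how many requests are served before a new connection finds no
 * free slot (or max_requests if that never happens).
 */
int requests_served(int pool_size, int listeners, long interval_ms,
                    long time_wait_ms, int max_requests)
{
    assert(pool_size > 0 && listeners >= 0);
    assert(interval_ms > 0 && time_wait_ms >= 0 && max_requests >= 0);

    int served = 0;
    for (int k = 0; k < max_requests; k++) {
        long now = (long)k * interval_ms;
        int held = 0;  /* earlier connections still in TIME_WAIT at `now` */
        for (int j = 0; j < k; j++) {
            if ((long)j * interval_ms + time_wait_ms > now) {
                held++;
            }
        }
        if (listeners + held >= pool_size) {
            break;  /* every slot is taken by the listener or TIME_WAIT */
        }
        served++;
    }
    return served;
}
```

With the figures from the text (pool of 5, one listening pcb, roughly 60 s in TIME_WAIT) and requests 100 ms apart, `requests_served(5, 1, 100, 60000, 100)` yields 4, while a pool of 32 yields 31. Spacing the requests 20 s apart keeps at most two pcbs in TIME_WAIT, so the pool of 5 never runs out in this model.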

Cost: each struct tcp_pcb is roughly 140 B on the EE-side build, so 27 extra entries cost ≈ 3.8 KB of BSS in the MEMP_TCP_PCB pool, which is set up at lwip_init. No CPU cost: memp_malloc is O(1).
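
The memory figure is simple arithmetic and can be checked with a hypothetical helper like the one below. The 140 B entry size is the approximate value quoted in the text, and the sketch ignores any per-entry alignment padding the pool may add.

```c
#include <assert.h>
#include <stddef.h>

/* Extra bytes needed when a fixed-size pool grows from old_count to
 * new_count entries of entry_bytes each. */
size_t extra_pool_bytes(size_t old_count, size_t new_count, size_t entry_bytes)
{
    assert(new_count >= old_count);
    return (new_count - old_count) * entry_bytes;
}
```

Going from 5 to 32 entries at 140 B each gives `extra_pool_bytes(5, 32, 140)`, i.e. 27 × 140 = 3780 bytes, which matches the ≈ 3.8 KB estimate.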

Why not in #838?

#838 is the strict upgrade-only PR ("upgrade to lwIP 2.2.1, nothing else"), which is much easier to review and revert if a regression surfaces. These two tunings are:

  • not required for the upgrade to work (verified empirically: ps2_http sustains 100/100 burst without them),
  • separately motivated (loopback is for tests; PCB bump is for hypothetical server load),
  • arguably orthogonal to lwIP's version.

Asymmetry note

If the PCB bump is desired, a future change should mirror it on the IOP-side lwipopts.h for parity (the IOP-side ps2_http path benefits equally). Held off here to keep this PR focused on a single side.

Test plan

  • PCSX2: tcp_echo_test_ee — 100+ self-loopback iterations, all OK.
  • PCSX2: ps2_http — 30/30 HTTP 200 with this branch (also passes without it; this just doesn't regress).
  • EE-side network samples sweep: 0 TLB misses, no spurious select wakes.

🤖 Generated with Claude Code

fjtrujy and others added 2 commits May 8, 2026 15:58
Bumps the vendored lwIP from STABLE-2_0_3_RELEASE to STABLE-2_2_1_RELEASE
(7+ years of upstream fixes / features) on both the IOP and EE
networking stacks.

ps2sdk-side adjustments to track the upstream API and tune for the
PS2 targets:

iop/tcpip and ee/network/tcpip lwipopts.h:
- LWIP_TCPIP_CORE_LOCKING=1 + LWIP_TCPIP_CORE_LOCKING_INPUT=1 (lwIP
  2.2.1 default). Socket / netconn API calls take a binary-sem mutex on
  the calling app thread instead of round-tripping through the tcpip
  thread mailbox. lwIP releases the lock before any blocking I/O wait
  so the tcpip thread + netif input still make progress. Saves a
  context switch + sem wait per API call versus message passing.
- DEFAULT_THREAD_STACKSIZE=0x600 on IOP (matches the historical lwIP
  2.0.3 budget; the deep socket call chains run on app threads under
  core-locking, the tcpip thread only handles timers + tcpip_callback
  dispatch, max ~450 B measured via -fstack-usage).
- LWIP_DHCP_DOES_ACD_CHECK=0 + LWIP_ACD=0: no AutoIP, controlled LAN,
  the conflict-detection timer/code is dead weight here.
- Pruned three options removed/renamed in upstream:
  LWIP_SOCKET_SET_ERRNO (gone since 2017's commit 0ee6ad0a),
  DHCP_DOES_ARP_CHECK (renamed to LWIP_DHCP_DOES_ACD_CHECK in 2.2.0),
  LWIP_DHCP_CHECK_LINK_UP (no longer referenced anywhere in 2.2.1).

ee/network/tcpip lwipopts.h MEM_ALIGNMENT:
- Drop from 64 (historical "EE cache design" value, attributed to
  SP193) to 16, matching the alignment newlib's malloc returns on EE.
  With MEM_ALIGNMENT=64, pbuf_alloc(PBUF_RAM) computed
    payload = LWIP_MEM_ALIGN(p + SIZEOF_STRUCT_PBUF + offset)
  inside an allocation sized assuming p was already 64-byte aligned;
  newlib only guarantees 16, set by newlib/newlib/configure.host:254
    mips64r5900*)
        machine_dir=r5900
        newlib_cflags="${newlib_cflags} -DMALLOC_ALIGNMENT=16"
        ;;
  so the payload pointer slid past the allocation by up to 48 bytes
  and scribbled into the next chunk's header — TLB misses inside
  _malloc_r / _free_r after the first close cycle of any TCP server
  (real hw locks up; PCSX2 logs the misses but limps on).
  PBUF_POOL pbufs (the only ones touched by IOP->EE DMA + cache
  invalidate) come from memp's static pools and are 64-byte aligned
  regardless of MEM_ALIGNMENT, so the cache-design invariant the old
  comment cited is preserved by memp, not by MEM_ALIGNMENT. The
  IOP-side lwipopts has used MEM_ALIGNMENT=4 forever for the same
  reason.
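
The "up to 48 bytes" figure follows from rounding a 16-byte-aligned pointer up to a 64-byte boundary. The sketch below works through that arithmetic with hypothetical helper names. It is a simplified model of the rounding step only, not lwIP's macro or pbuf_alloc itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Bytes by which rounding moves a pointer forward. */
uintptr_t align_slide(uintptr_t addr, uintptr_t alignment)
{
    return align_up(addr, alignment) - addr;
}

/* Worst-case slide when the allocator guarantees `guaranteed` alignment
 * but the caller rounds up to `required` (both powers of two). */
uintptr_t worst_case_slide(uintptr_t guaranteed, uintptr_t required)
{
    return required > guaranteed ? required - guaranteed : 0;
}
```

For example, an allocation at 0x1010 is 16-byte aligned, yet `align_up(0x1010, 64)` is 0x1040, so the payload starts 48 bytes later than the size calculation assumed. With both values at 16 the slide is always 0.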

iop/tcpip API:
- tcpip_callback_with_block was promoted to tcpip_callback (the macro
  became the real function in lwIP 2.1.0); update exports.tab,
  imports.lst, and call sites.
- iop/network/smap/src/imports.lst: corresponding rename for the
  netif callbacks the smap driver imports from ps2ip-nm.irx.
- ps2ip.c: define `int errno` in .data so socket error paths have
  somewhere to write to (lwIP 2.2.1 always writes errno; the section
  attribute was already corrected in the previous commit).

iop/tcpip-base/sys_arch.c:
- Add sys_mbox_trypost_fromisr stub (new in lwIP 2.2.1, called from
  tcpip_input when LWIP_TCPIP_CORE_LOCKING_INPUT=0; harmless in our
  build but the symbol must be present).

ee/network/tcpip/src/sys_arch.c:
- Replace the previous DI/EI-based sys_arch_protect with a per-thread
  recursive semaphore. The DI/EI variant deadlocked the EE: any code
  path inside a SYS_ARCH_PROTECT region that ended up calling newlib's
  malloc/free would WaitSema on the heap recursive mutex with
  interrupts disabled. The sema-based variant lets nested waits work
  normally and removes the EE-specific incompatibility between lwIP
  and any other library that uses newlib's locks. lwIP allows
  SYS_ARCH_PROTECT to nest, so the implementation tracks the owning
  thread + a recursion counter and only Wait/Signal on the outermost
  transitions.
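
The owner-plus-counter scheme described above can be sketched as follows. This is a single-threaded model of the nesting logic only, not the code in sys_arch.c: the `fake_*` functions are hypothetical stand-ins for the kernel's semaphore and thread-id calls and merely count invocations, and all other names are invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel primitives; they only count calls. */
static int g_wait_calls;
static int g_signal_calls;
static int g_current_thread = 1;

static void fake_sema_wait(void)   { g_wait_calls++; }
static void fake_sema_signal(void) { g_signal_calls++; }
static int  fake_thread_id(void)   { return g_current_thread; }

static int g_owner = -1;  /* thread holding the protect lock, -1 if none */
static int g_depth;       /* nesting depth of the owning thread */

/* Enter the protected region; only the outermost entry takes the sema. */
void model_protect(void)
{
    int self = fake_thread_id();
    if (g_owner == self) {
        g_depth++;          /* nested entry: no semaphore traffic */
        return;
    }
    fake_sema_wait();       /* outermost entry */
    g_owner = self;
    g_depth = 1;
}

/* Leave the protected region; only the outermost exit signals the sema. */
void model_unprotect(void)
{
    assert(g_owner == fake_thread_id() && g_depth > 0);
    if (--g_depth == 0) {
        g_owner = -1;
        fake_sema_signal();
    }
}

/* Accessors so the counters can be inspected from outside this file. */
int model_wait_calls(void)   { return g_wait_calls; }
int model_signal_calls(void) { return g_signal_calls; }
int model_depth(void)        { return g_depth; }
```

Two nested `model_protect()` calls produce a single wait, and the signal is issued only by the second `model_unprotect()`, which is the "outermost transitions" behaviour the message describes.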

iop/tcpip/tcpip/Makefile + ps2api_IPV4 list:
- acd.c (new in lwIP 2.2.0) intentionally not added; LWIP_ACD=0 makes
  it a no-op TU and we'd rather drop the few KB outright.

Verified on real hardware: ps2link boots, execee + reset cycle works
repeatedly, IOP printf-over-UDP (KPRTTY) coexists cleanly with module
loading. EE-side TCP server (ps2_drivers/samples/tcp_server_ee) and
mongoose-based ps2_http both sustain 100/100 burst tests with 0 TLB
misses, exercising the EE lwipopts MEM_ALIGNMENT and sys_arch.c
adjustments above.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LWIP_NETIF_LOOPBACK=1 lets the EE-side lwIP route packets sent to a
netif's own IP back through the netif's input path, so an EE app can
act as both client and server to itself for testing without needing
inbound TCP delivery from the host (which PCSX2 Sockets mode doesn't
provide). Required for ps2_drivers' tcp_echo_test_ee sample.

Forces LWIP_HAVE_LOOPIF=0 explicitly: lwIP would otherwise auto-default
it to (LWIP_NETIF_LOOPBACK && !LWIP_SINGLE_NETIF) = 1, which creates a
127.0.0.1 netif at init and breaks DHCP DISCOVER routing in PCSX2.

Bumps MEMP_NUM_TCP_PCB from the lwIP default of 5 to 32. The default
is way too small for a server pattern: each completed request leaves
the closing-side pcb in TIME_WAIT for 2*MSL (~60 s) holding a slot,
plus there's always one slot for the listening pcb itself. With only
4 effective slots an HTTP server runs out as soon as a few back-to-back
requests close. 32 leaves headroom for sustained traffic plus the
SYN_RECV in-flight state for a burst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>