Skip to content

Commit 7537c69

Browse files
committed
Document OWN_GIL safety mechanisms and lock ordering
- CHANGELOG: Add OWN_GIL safety fixes section for 2.2.0 - owngil_internals.md: Add Safety Mechanisms section covering interp_id validation, lock ordering (ABBA prevention), callback re-entry limitation - event_loop_architecture.md: Add Per-Process Namespace Management section with lock ordering and cleanup behavior documentation - process-bound-envs.md: Add Interpreter ID Validation and Cleanup Safety sections explaining cross-interpreter protection
1 parent e6126ee commit 7537c69

4 files changed

Lines changed: 137 additions & 0 deletions

File tree

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,21 @@
22

33
## 2.2.0 (unreleased)
44

5+
### Fixed
6+
7+
- **OWN_GIL Safety Fixes** - Critical fixes for OWN_GIL subinterpreter mode
8+
- **Mutex leak in erlang module** - `async_futures_mutex` now always destroyed in
9+
`erlang_module_free()` regardless of `pipe_initialized` flag
10+
- **ABBA deadlock prevention** - Fixed lock ordering in `event_loop_down()` and
11+
`event_loop_destructor()` to acquire GIL before `namespaces_mutex`, matching the
12+
normal execution path and preventing deadlocks
13+
- **Dangling env pointer detection** - Added `interp_id` validation in
14+
`owngil_execute_*_with_env()` functions to detect and reject env resources
15+
created by a different interpreter, returning `{error, env_wrong_interpreter}`
16+
- **OWN_GIL callback documentation** - Documented that `erlang.call()` from OWN_GIL
17+
contexts uses `thread_worker_call()` rather than suspension/resume protocol;
18+
re-entrant calls to the same OWN_GIL context are not supported
19+
520
### Added
621

722
- **PyBuffer API** - Zero-copy WSGI input buffer for streaming HTTP bodies

docs/event_loop_architecture.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -242,3 +242,45 @@ pthread_mutex_unlock PyGILState_Release
242242
| GIL acquisitions | 1 per batch | Not per-task |
243243
| Handle allocations | ~0 (pooled) | After warmup |
244244
| Time syscalls | 1 per iteration | Cached within iteration |
245+
246+
## Per-Process Namespace Management
247+
248+
Each Erlang process can have an isolated Python namespace within an event loop. These namespaces are tracked in a linked list protected by `namespaces_mutex`.
249+
250+
### Lock Ordering
251+
252+
To prevent ABBA deadlocks, locks must always be acquired in this order:
253+
254+
```
255+
1. GIL (PyGILState_Ensure)
256+
2. namespaces_mutex (pthread_mutex_lock)
257+
```
258+
259+
This ordering is enforced in:
260+
- `ensure_process_namespace()` - Called with GIL held, then acquires mutex
261+
- `event_loop_down()` - Acquires GIL first, then mutex for cleanup
262+
- `event_loop_destructor()` - Acquires GIL first, then mutex for cleanup
263+
264+
### Cleanup Behavior
265+
266+
When a monitored process dies (`event_loop_down`) or the event loop is destroyed:
267+
268+
**For main interpreter (`interp_id == 0`):**
269+
```c
270+
PyGILState_STATE gstate = PyGILState_Ensure();
271+
pthread_mutex_lock(&loop->namespaces_mutex);
272+
// Py_XDECREF(ns->globals), etc.
273+
pthread_mutex_unlock(&loop->namespaces_mutex);
274+
PyGILState_Release(gstate);
275+
```
276+
277+
**For subinterpreters (`interp_id != 0`):**
278+
```c
279+
pthread_mutex_lock(&loop->namespaces_mutex);
280+
// Skip Py_XDECREF - cannot safely acquire subinterpreter GIL
281+
// Objects freed when interpreter is destroyed
282+
enif_free(ns);
283+
pthread_mutex_unlock(&loop->namespaces_mutex);
284+
```
285+
286+
This design accepts a minor memory leak (Python dicts not decrefd) to avoid the complexity and risk of acquiring a subinterpreter's GIL from an arbitrary thread.

docs/owngil_internals.md

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -395,11 +395,65 @@ Use shared-GIL (subinterp) when:
395395
- High call frequency
396396
- Resource constraints
397397

398+
## Safety Mechanisms
399+
400+
### Interpreter ID Validation
401+
402+
Process-local environments (`py_env_resource_t`) store the Python interpreter ID when created. Before execution, OWN_GIL functions validate that the env belongs to the current interpreter:
403+
404+
```c
405+
PyInterpreterState *current_interp = PyInterpreterState_Get();
406+
if (current_interp != NULL && penv->interp_id != PyInterpreterState_GetID(current_interp)) {
407+
// Return {error, env_wrong_interpreter}
408+
}
409+
```
410+
411+
This prevents dangling pointer access when an env resource outlives its interpreter.
412+
413+
### Lock Ordering (ABBA Deadlock Prevention)
414+
415+
Lock ordering must be consistent to prevent deadlocks:
416+
417+
**Correct order: GIL first, then namespaces_mutex**
418+
419+
Normal execution path:
420+
```
421+
PyGILState_Ensure() // 1. Acquire GIL
422+
pthread_mutex_lock() // 2. Acquire mutex
423+
// ... work ...
424+
pthread_mutex_unlock() // 3. Release mutex
425+
PyGILState_Release() // 4. Release GIL
426+
```
427+
428+
Cleanup paths (`event_loop_down`, `event_loop_destructor`) follow the same order:
429+
```c
430+
// For main interpreter: GIL first, then mutex
431+
PyGILState_STATE gstate = PyGILState_Ensure();
432+
pthread_mutex_lock(&loop->namespaces_mutex);
433+
// ... cleanup with Py_XDECREF ...
434+
pthread_mutex_unlock(&loop->namespaces_mutex);
435+
PyGILState_Release(gstate);
436+
```
437+
438+
For subinterpreters (where `PyGILState_Ensure` cannot be used), cleanup skips `Py_DECREF` - the objects will be freed when the interpreter is destroyed.
439+
440+
### Callback Re-entry Limitation
441+
442+
OWN_GIL contexts do not support the suspension/resume protocol used for `erlang.call()` callbacks. When Python code in an OWN_GIL context calls `erlang.call()`:
443+
444+
1. The call is routed to `thread_worker_call()` (not the OWN_GIL thread)
445+
2. The call executes on a thread worker, not the calling OWN_GIL context
446+
3. Re-entrant calls back to the same OWN_GIL context are not supported
447+
448+
This is because the OWN_GIL thread cannot be suspended - it owns its GIL and must remain responsive to process requests.
449+
398450
## Files
399451
400452
| File | Description |
401453
|------|-------------|
402454
| `c_src/py_nif.h` | Structure definitions, request types |
403455
| `c_src/py_nif.c` | Thread main, dispatch, execute functions |
456+
| `c_src/py_callback.c` | Callback handling, thread worker dispatch |
457+
| `c_src/py_event_loop.c` | Event loop and namespace management |
404458
| `src/py_context.erl` | Erlang API for context management |
405459
| `test/py_owngil_features_SUITE.erl` | Test suite |

docs/process-bound-envs.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -249,6 +249,32 @@ Environments are stored as NIF resources with the following lifecycle:
249249

250250
For subinterpreters, environments are created inside the target interpreter using its memory allocator - critical for memory safety.
251251

252+
### Interpreter ID Validation
253+
254+
Each `py_env_resource_t` stores the Python interpreter ID (`interp_id`) when created. For OWN_GIL contexts, before any operation using a process-local env, the system validates that the env belongs to the current interpreter:
255+
256+
```c
257+
PyInterpreterState *current_interp = PyInterpreterState_Get();
258+
if (penv->interp_id != PyInterpreterState_GetID(current_interp)) {
259+
return {error, env_wrong_interpreter};
260+
}
261+
```
262+
263+
This prevents:
264+
- Using an env from a destroyed interpreter (dangling pointer)
265+
- Using an env created for a different OWN_GIL context
266+
- Memory corruption from cross-interpreter dict access
267+
268+
### Cleanup Safety
269+
270+
For the main interpreter (`interp_id == 0`), the destructor acquires the GIL and decrefs the Python dicts normally.
271+
272+
For subinterpreters, the destructor skips `Py_DECREF` because:
273+
1. `PyGILState_Ensure` cannot safely acquire a subinterpreter's GIL
274+
2. The Python objects will be freed when the subinterpreter is destroyed via `Py_EndInterpreter`
275+
276+
This design prioritizes safety over avoiding minor memory leaks during edge cases.
277+
252278
## See Also
253279

254280
- [Context Affinity](context-affinity.md) - Context binding and routing

0 commit comments

Comments
 (0)