Commit 7921bbe

Document OWN_GIL mode features and usage
CHANGELOG.md:
- Add OWN_GIL Context Mode with feature list
- Add Process-Local Environments for OWN_GIL
- Add Per-Process Event Loop Namespaces
- Add OWN_GIL Test Suites section
- Add Changed section for asyncio compatibility fixes

docs/owngil_internals.md:
- Add Quick Start section with usage examples
- Add Feature Compatibility table
- Add Benchmarking section with example output

docs/scalability.md:
- Add OWN_GIL to mode comparison table
- Add OWN_GIL Mode section with architecture, usage, process-local envs
- Update subinterp section to clarify shared-GIL behavior
- Add "When to use OWN_GIL" guidance
1 parent 7537c69 commit 7921bbe

3 files changed

Lines changed: 178 additions & 11 deletions


CHANGELOG.md

Lines changed: 38 additions & 0 deletions
@@ -67,6 +67,44 @@
- `examples/bench_async_task.erl` - Erlang benchmark runner
- `priv/test_async_task.py` - Python async task implementation

- **OWN_GIL Context Mode** - True parallel Python execution (Python 3.12+)
  - `py_context:start_link(Id, owngil)` - Create context with dedicated pthread and GIL
  - Each OWN_GIL context runs in its own thread with independent Python GIL
  - Enables true CPU parallelism across multiple Python contexts
  - Full feature support: channels, buffers, callbacks, PIDs, reactor, async tasks
  - `py_context:get_nif_ref/1` - Get NIF reference for low-level operations
  - New benchmark: `examples/bench_owngil.erl` comparing SHARED_GIL vs OWN_GIL
  - See [OWN_GIL Internals](docs/owngil_internals.md) for architecture details

- **Process-Local Environments for OWN_GIL** - Namespace isolation within shared contexts
  - `py_context:create_local_env/1` - Create isolated Python namespace for calling process
  - `py_nif:context_exec(Ref, Code, Env)` - Execute with process-local environment
  - `py_nif:context_eval(Ref, Expr, Locals, Env)` - Evaluate with process-local environment
  - `py_nif:context_call(Ref, Mod, Func, Args, Kwargs, Env)` - Call with process-local environment
  - Multiple Erlang processes can share an OWN_GIL context with isolated namespaces
  - Interpreter ID validation prevents cross-interpreter env usage

- **Per-Process Event Loop Namespaces** - Process isolation for event loop API
  - `py_nif:event_loop_exec/2` - Execute code in calling process's namespace
  - `py_nif:event_loop_eval/2` - Evaluate expression in calling process's namespace
  - Functions defined via exec are callable via `create_task` with `__main__` module
  - Automatic cleanup when Erlang process exits
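The per-process event loop namespace API above can be sketched as follows. This is a hypothetical usage sketch: the argument order of `py_nif:event_loop_exec/2` and the exact `py_event_loop:create_task` call shape are assumptions for illustration, not confirmed signatures.

```erlang
%% Hypothetical sketch -- argument order and create_task shape are assumed.
%% Each Erlang process sees its own namespace in the event loop.
ok = py_nif:event_loop_exec(Loop, <<"counter = 0">>),
ok = py_nif:event_loop_exec(Loop, <<
    "async def bump():\n"
    "    global counter\n"
    "    counter += 1\n"
    "    return counter"
>>),
%% Functions defined via exec live in __main__ and can be scheduled:
{ok, Task} = py_event_loop:create_task(Loop, '__main__', bump, []).
%% When this Erlang process exits, its namespace is cleaned up automatically.
```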
- **OWN_GIL Test Suites** - Feature verification
  - `py_context_owngil_SUITE` - Core OWN_GIL functionality (15 tests)
  - `py_owngil_features_SUITE` - Feature integration (44 tests covering channels,
    buffers, callbacks, PIDs, reactor, async tasks, asyncio, local envs)

### Changed

- **Event Loop Lock Ordering** - GIL acquired before `namespaces_mutex` in cleanup paths
  to prevent ABBA deadlocks with normal execution path

- **Asyncio Compatibility** - Fixed for Python 3.12+ with subinterpreters
  - Thread-local event loop context in `process_ready_tasks`
  - Eager task execution handling for Python 3.12+
  - Deprecation warning fix: use `erlang.run()` instead of `erlang.install()`
## 2.1.0 (2026-03-12)
### Added

docs/owngil_internals.md

Lines changed: 84 additions & 0 deletions
@@ -4,6 +4,50 @@
OWN_GIL mode provides true parallel Python execution using Python 3.12+ per-interpreter GIL (`PyInterpreterConfig_OWN_GIL`). Each OWN_GIL context runs in a dedicated pthread with its own subinterpreter and GIL.

## Quick Start

```erlang
%% Create an OWN_GIL context (requires Python 3.12+)
{ok, Ctx} = py_context:start_link(1, owngil),

%% Basic operations work the same as other modes
{ok, 4.0} = py_context:call(Ctx, math, sqrt, [16], #{}),
ok = py_context:exec(Ctx, <<"x = 42">>),
{ok, 42} = py_context:eval(Ctx, <<"x">>),

%% True parallelism: multiple OWN_GIL contexts execute simultaneously
{ok, Ctx2} = py_context:start_link(2, owngil),
%% Ctx and Ctx2 run in parallel with independent GILs

%% Process-local environments for namespace isolation
{ok, Env} = py_context:create_local_env(Ctx),
CtxRef = py_context:get_nif_ref(Ctx),
ok = py_nif:context_exec(CtxRef, <<"my_var = 'isolated'">>, Env),

%% Cleanup
py_context:stop(Ctx),
py_context:stop(Ctx2).
```
## Feature Compatibility

All major erlang_python features work with OWN_GIL mode:

| Feature | Status | Notes |
|---------|--------|-------|
| `py_context:call/5` | Full | Function calls |
| `py_context:eval/2` | Full | Expression evaluation |
| `py_context:exec/2` | Full | Statement execution |
| Channels (`py_channel`) | Full | Bidirectional messaging |
| Buffers (`py_buffer`) | Full | Zero-copy streaming |
| Callbacks (`erlang.call`) | Partial | Uses thread_worker, not re-entrant |
| PIDs (`erlang.Pid`) | Full | Round-trip serialization |
| Send (`erlang.send`) | Full | Fire-and-forget messaging |
| Reactor (`erlang.reactor`) | Full | FD-based protocols |
| Async Tasks | Full | `py_event_loop:create_task` |
| Asyncio | Full | `asyncio.sleep`, `gather`, etc. |
| Process-local envs | Full | Namespace isolation |
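The "Process-local envs" row can be made concrete with a short sketch. The calls follow the Quick Start signatures (`py_context:create_local_env/1`, `py_context:get_nif_ref/1`, `py_nif:context_exec/3`, `py_nif:context_eval/4`); the two spawned processes and the variable name `who` are illustrative only.

```erlang
%% Two Erlang processes share one OWN_GIL context, but each creates its own
%% process-local env, so identical variable names do not collide.
{ok, Ctx} = py_context:start_link(1, owngil),
CtxRef = py_context:get_nif_ref(Ctx),
Parent = self(),
[spawn(fun() ->
     {ok, Env} = py_context:create_local_env(Ctx),  %% env bound to this process
     ok = py_nif:context_exec(CtxRef, <<"who = ", Name/binary>>, Env),
     {ok, Val} = py_nif:context_eval(CtxRef, <<"who">>, #{}, Env),
     Parent ! {isolated, self(), Val}
 end) || Name <- [<<"'proc_a'">>, <<"'proc_b'">>]].
%% Each process reads back only its own binding of `who`.
```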
## Architecture

@@ -395,6 +439,46 @@ Use shared-GIL (subinterp) when:
- High call frequency
- Resource constraints

## Benchmarking

Run the benchmark to compare modes on your system:

```bash
rebar3 compile && escript examples/bench_owngil.erl
```

Example output:
```
========================================================
          OWN_GIL vs SHARED_GIL Benchmark
========================================================

System Information
------------------
Erlang/OTP: 27
Schedulers: 8
Python:     3.14.0
Subinterp:  true

1. Single Context Latency (1000 calls to math.sqrt)
   Mode        us/call   calls/sec
   ----        -------   ---------
   subinterp   2.5       400000
   owngil      10.2      98000

2. Parallel Throughput (4 contexts, 10000 calls each)
   Mode        total_ms   calls/sec
   ----        --------   ---------
   subinterp   100.5      398000
   owngil      28.3       1415000   <- 3.5x faster

3. CPU-Bound Speedup (fibonacci(30) x 4 contexts)
   Mode        total_ms   speedup
   ----        --------   -------
   subinterp   800.2      1.0x
   owngil      205.1      3.9x      <- near-linear scaling
```
## Safety Mechanisms

### Interpreter ID Validation

docs/scalability.md

Lines changed: 56 additions & 11 deletions
@@ -21,22 +21,61 @@ py:num_executors().

| Mode | Python Version | Parallelism | GIL Behavior | Best For |
|------|----------------|-------------|--------------|----------|
| **free_threaded** | 3.13+ (nogil build) | True N-way | None | Maximum throughput |
| **owngil** | 3.12+ | True N-way | Per-interpreter (dedicated thread) | CPU-bound parallel |
| **subinterp** | 3.12+ | None (shared GIL) | Shared GIL (pool) | High call frequency |
| **multi_executor** | Any | GIL contention | Shared, round-robin | I/O-bound, compatibility |

### Free-Threaded Mode (Python 3.13+)

When running on a free-threaded Python build (compiled with `--disable-gil`), erlang_python executes Python calls directly without any executor routing. This provides maximum parallelism for CPU-bound workloads.

### OWN_GIL Mode (Python 3.12+)

Creates dedicated pthreads with independent GILs for true parallel Python execution. Each OWN_GIL context runs in its own thread, enabling CPU parallelism.

**Architecture:**
- Each context gets a dedicated pthread with its own subinterpreter and GIL
- Requests dispatched via mutex/condvar IPC (not dirty schedulers)
- True parallel execution across multiple OWN_GIL contexts
- Higher per-call latency (~10μs vs ~2.5μs) but better parallelism

**Usage:**
```erlang
%% Create OWN_GIL contexts for parallel execution
{ok, Ctx1} = py_context:start_link(1, owngil),
{ok, Ctx2} = py_context:start_link(2, owngil),

%% These execute in parallel with independent GILs
spawn(fun() -> py_context:call(Ctx1, heavy_compute, run, [Data1], #{}) end),
spawn(fun() -> py_context:call(Ctx2, heavy_compute, run, [Data2], #{}) end).
```

**Process-Local Environments:**
```erlang
%% Multiple processes can share an OWN_GIL context with isolated namespaces
{ok, Env} = py_context:create_local_env(Ctx),
CtxRef = py_context:get_nif_ref(Ctx),
ok = py_nif:context_exec(CtxRef, <<"x = 42">>, Env),
{ok, 42} = py_nif:context_eval(CtxRef, <<"x">>, #{}, Env).
```

**When to use OWN_GIL:**
- CPU-bound Python workloads that benefit from parallelism
- Long-running computations
- When you need true concurrent Python execution
- Scientific computing, ML inference, data processing

**See also:** [OWN_GIL Internals](owngil_internals.md) for architecture details.
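Continuing from the Usage snippet above (`Ctx1`, `Ctx2`), the parallelism can be observed directly with `timer:tc/1`. A minimal sketch, assuming a pure-Python `fib` helper defined via `py_context:exec/2` and that exec'd definitions are reachable through the `__main__` module (both assumptions for illustration):

```erlang
%% Define a CPU-bound function in each context (illustrative helper `fib`).
Code = <<"def fib(n):\n"
         "    return n if n < 2 else fib(n-1) + fib(n-2)">>,
ok = py_context:exec(Ctx1, Code),
ok = py_context:exec(Ctx2, Code),

%% Time two fib(30) calls running concurrently on independent GILs.
Parent = self(),
{Micros, _} = timer:tc(fun() ->
    Pids = [spawn(fun() ->
                {ok, _} = py_context:call(C, '__main__', fib, [30], #{}),
                Parent ! {done, self()}
            end) || C <- [Ctx1, Ctx2]],
    [receive {done, P} -> ok end || P <- Pids]
end),
io:format("2x fib(30) in parallel: ~p ms~n", [Micros div 1000]).
```

With shared-GIL subinterpreters the same wall-clock time would be roughly the sum of both calls; with OWN_GIL contexts it approaches the time of a single call.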
### Sub-interpreter Mode (Python 3.12+)

Uses Python's sub-interpreter feature with a shared GIL pool. Multiple contexts share the GIL but have isolated namespaces. Best for high call frequency with low latency.

**Architecture:**
- Pool of pre-created subinterpreters with shared GIL
- Execution on dirty schedulers with `PyThreadState_Swap`
- Lower latency (~2.5μs) but no true parallelism
- Best throughput for short operations

**Note:** Each sub-interpreter has isolated state. Use the [Shared State](#shared-state) API to share data between workers.
4281

@@ -74,11 +113,17 @@ Runs N executor threads that share the GIL. Requests are distributed round-robin
- You're running CPU-bound workloads
- Memory efficiency is important

**Use OWN_GIL (Python 3.12+) when:**
- You need true CPU parallelism across Python contexts
- Running long computations (ML inference, data processing)
- Workload benefits from multiple independent Python interpreters
- You can tolerate higher per-call latency for better throughput

**Use Subinterpreters/Shared-GIL (Python 3.12+) when:**
- You need high call frequency with low latency
- Individual operations are short
- You want namespace isolation without thread overhead
- Memory efficiency is important (shared interpreter pool)

**Use Multi-Executor (Python < 3.12) when:**
- Running on older Python versions
