docs/owngil_internals.md
OWN_GIL mode provides true parallel Python execution using Python 3.12+ per-interpreter GIL (`PyInterpreterConfig_OWN_GIL`). Each OWN_GIL context runs in a dedicated pthread with its own subinterpreter and GIL.
## Quick Start
```erlang
%% Create an OWN_GIL context (requires Python 3.12+)
{ok, Ctx} = py_context:start_link(1, owngil),
```

| Mode | Python | Parallelism | GIL | Best for |
|---|---|---|---|---|
| **subinterp** | 3.12+ | None (shared GIL) | Shared GIL (pool) | High call frequency |
| **multi_executor** | Any | GIL contention | Shared, round-robin | I/O-bound, compatibility |

### Free-Threaded Mode (Python 3.13+)
When running on a free-threaded Python build (compiled with `--disable-gil`), erlang_python executes Python calls directly without any executor routing. This provides maximum parallelism for CPU-bound workloads.
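
As a sketch of what direct execution allows, the fragment below fans eight concurrent calls out from ordinary Erlang processes. Note that `py_context:call/4` is a hypothetical API assumed only for illustration (this document shows just `py_context:start_link/2`), and `Ctx` stands for any already-started context:

```erlang
%% Hypothetical sketch: py_context:call/4 is assumed for illustration;
%% Ctx is any started context. On a free-threaded build there is no
%% executor routing, so all eight calls can run Python bytecode
%% simultaneously.
Parent = self(),
[spawn(fun() ->
           Parent ! py_context:call(Ctx, <<"math">>, <<"sqrt">>, [2.0])
       end) || _ <- lists:seq(1, 8)],
Results = [receive R -> R end || _ <- lists:seq(1, 8)].
```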
### OWN_GIL Mode (Python 3.12+)
Creates dedicated pthreads with independent GILs for true parallel Python execution. Each OWN_GIL context runs in its own thread, enabling CPU parallelism.
**Architecture:**
- Each context gets a dedicated pthread with its own subinterpreter and GIL
- Requests dispatched via mutex/condvar IPC (not dirty schedulers)
- True parallel execution across multiple OWN_GIL contexts
- Higher per-call latency (~10μs vs ~2.5μs) but better parallelism
**Usage:**
```erlang
%% Create OWN_GIL contexts for parallel execution
{ok, Ctx1} = py_context:start_link(1, owngil),
{ok, Ctx2} = py_context:start_link(2, owngil),

%% These execute in parallel with independent GILs
```

**Use cases:**

- CPU-bound Python workloads that benefit from parallelism
- Long-running computations
- When you need true concurrent Python execution
- Scientific computing, ML inference, data processing
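
To make the parallelism concrete, here is a minimal sketch under the same assumption of a hypothetical `py_context:call/4` API: two contexts, two Erlang processes, and two CPU-bound calls that can overlap in real time.

```erlang
%% Hypothetical sketch: py_context:call/4 is assumed for illustration.
%% Each context owns a pthread with an independent GIL, so the two
%% CPU-bound calls below can run simultaneously.
{ok, Ctx1} = py_context:start_link(1, owngil),
{ok, Ctx2} = py_context:start_link(2, owngil),
Parent = self(),
[spawn(fun() ->
           Parent ! {Ctx, py_context:call(Ctx, <<"math">>, <<"factorial">>, [100000])}
       end) || Ctx <- [Ctx1, Ctx2]],
[receive {C, R} -> R end || C <- [Ctx1, Ctx2]].
```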
**See also:** [OWN_GIL Internals](owngil_internals.md) for architecture details.
### Sub-interpreter Mode (Python 3.12+)
Uses Python's sub-interpreter feature with a shared GIL pool. Multiple contexts share the GIL but have isolated namespaces. Best for high call frequency with low latency.
**Architecture:**
- Pool of pre-created subinterpreters with shared GIL
- Execution on dirty schedulers with `PyThreadState_Swap`
- Lower latency (~2.5μs) but no true parallelism
- Best throughput for short operations
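
For contrast with OWN_GIL, a short sketch of creating pool-backed contexts; the `subinterp` mode atom is inferred from the comparison table above and is an assumption, not a confirmed API:

```erlang
%% Assumed sketch: the `subinterp` mode atom is inferred from the
%% comparison table above. The contexts get isolated namespaces but
%% share one GIL, so CPU-bound calls serialize while per-call latency
%% stays low (~2.5μs).
{ok, CtxA} = py_context:start_link(1, subinterp),
{ok, CtxB} = py_context:start_link(2, subinterp),
```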
**Note:** Each sub-interpreter has isolated state. Use the [Shared State](#shared-state) API to share data between workers.
- You're running CPU-bound workloads
- Memory efficiency is important
**Use OWN_GIL (Python 3.12+) when:**
- You need true CPU parallelism across Python contexts
- Running long computations (ML inference, data processing)
- Workload benefits from multiple independent Python interpreters
- You can tolerate higher per-call latency for better throughput